Data Science Ethics

Today we will discuss your (HW) case studies in potential ethics violations, as well as our broader responsibilities as data scientists. For optional further reading, see this book chapter from "Modern Data Science with R". We will use the following real-world examples to guide our discussion of how decisions made by data scientists can impact individuals:

Ethics of Releasing Datasets:
Read this article from Wired, which discusses the implications of releasing a large dataset from OKCupid, a dating website. For more details, the OKCupid publication can be downloaded here.

Ethics of Creating Classification Software:
Some data scientists released this software, which takes a person's name and location as input and predicts that person's race. Read this article about the ethical considerations surrounding "filling in" race information. For added context, an updated 2022 analysis of the software can be found here, as well as an example of its use during COVID.

Ethics of Securing Data:
In 2021, the popular new stock trading app, Robinhood, had a major data breach. Read about the breach here, along with the 2022 lawsuit settlement that resulted from it.


The Data Science Oath
(adapted from the Hippocratic Oath -- a side-by-side comparison)

National Academies of Sciences, Engineering, and Medicine. 2018. Data Science for Undergraduates: Opportunities and Options. Washington, DC: The National Academies Press. https://doi.org/10.17226/25104.

I swear to fulfill, to the best of my ability and judgment, this covenant:

I will respect the hard-won scientific gains of those data scientists in whose steps I walk and gladly share such knowledge as is mine with those who follow.

I will apply, for the benefit of society, all measures which are required, avoiding misrepresentations of data and analysis results.

I will remember that there is art to data science as well as science and that consistency, candor, and compassion should outweigh the algorithm's precision or the interventionist's influence.

I will not be ashamed to say, "I know not," nor will I fail to call in my colleagues when the skills of another are needed for solving a problem.

I will respect the privacy of my data subjects, for their data are not disclosed to me that the world may know, so I will tread with care in matters of privacy and security. If it is given to me to do good with my analyses, all thanks. But it may also be within my power to do harm, and this responsibility must be faced with humbleness and awareness of my own limitations.

I will remember that my data are not just numbers without meaning or context, but represent real people and situations, and that my work may lead to unintended societal consequences, such as inequality, poverty, and disparities due to algorithmic bias. My responsibility must consider potential consequences of my extraction of meaning from data and ensure my analyses help make better decisions.

I will perform personalization where appropriate, but I will always look for a path to fair treatment and nondiscrimination.

I will remember that I remain a member of society, with special obligations to all my fellow human beings, those who need help and those who don't.

If I do not violate this oath, may I enjoy vitality and virtuosity, respected for my contributions and remembered for my leadership thereafter. May I always act to preserve the finest traditions of my calling and may I long experience the joy of helping those who can benefit from my work.


Download our oath worksheet for an in-class activity on aligning our ethical topics to the DS Oath.