Recently I read an article about ethics in data science. The ethics here is not about plagiarism, disclosure of confidential data, or dishonesty, but about the ethical considerations that go into designing a model. It got me thinking, though I have not reached any conclusions.
Many countries have a long and painful history of racism. In America, setting aside the history of slavery, a recent verdict against a Chinese-American police officer sparked a nationwide campaign among Asian Americans, given the history of the Chinese Exclusion Act. Hiring today must, formally, not be based on race, but we all know that racism in the job market still exists in practice. When a public policy is enacted with the help of an algorithm, as in the article, any racial bias in that algorithm becomes a real problem. For some algorithms, people might not know that race has been used as an input unless someone is monitoring it. A data scientist could quietly include it at no cost. But is that ethical?
Or the data may be old enough that it carries a race-biased history, even though we know race is not a relevant factor in the situation at hand. We might simply drop race from the model; or, worse, we might need a “counter-term” that offsets this dark history in the data in order to build a useful predictive model.
Sometimes it might even be desirable to include race in the model, in a way that leaves underprivileged groups happy too. For example, suppose that instead of a public policy I am building a dating website. There, race, gender, and sexual orientation matter alongside personality type, age difference, and so on.
Because many algorithms, such as SVMs or neural networks, work like black boxes, we do not immediately see any biased effect. If the bias turns out not to be noticeable, or people are simply happy with the results, it may seem not to matter. But does it?
Or are we overthinking this? People might not care as much as we think, yet the scientists may still be held liable. Political correctness can be a killer; perhaps that is why there are so many headline stories in the current presidential primary campaign.
- Cathy O’Neil, “The Ethical Data Scientist,” Slate.com (2016).