Ethics in Data Science

There is a very noticeable effort in thinking about ethics towards data collection online and in general dealing with data about people. This is especially (or, rather, especially for me) clear in the field of Data Science. There are multiple reasons why Data Science broadly is seen as a tool to weaponize snooping on users to a ridiculous degree. Examples of that are: Palantir spying in New Orleans, China spying on its citizens Minority Report style as well as literally any adtech business.

As a quick aside for the latter example, one of the argument tends to be “If you are not paying for content, then you should be OK with adtech companies collecting information about to serve personalized ads. That’s how they make money, so deal with it or prepare to pay for all content”. Recently I’ve came across this argument once again on HackerNews and one of the replies resonated with me. I won’t be able to find a link, but the main idea is that ads exist for centuries. When you watch TV you don’t expect to be served personalized ads and yet those ad-spots are valuable to companies. This tells us that it is possible to have profitable ad-business without extensive and creepy data-collection.

Anyways, for me those examples are interesting, but what motivated me to write about this topic mostly comes from two more examples of attitude that practicing Data Scientists have towards this topic.

First of all, I’m listening to Dataframed podcast from Datacamp. The episode that I’m talking about is episode 11 named Data Science at BuzzFeed and the Digital Media Landscape (with Adam Kelleher). I don’t know about you, but to me BuzzFeed is a poster child of clickbait. Views differ, but personally I think that they are also a big reason of what is now called “fake news” since they showed a working example of how to create “news” that people will click on. Now, I wasn’t expecting Adam from BuzzFeed to come on this podcast with some sort of mea culpa about his employer, but in 1 hour that this podcast takes I thought one of the topics must have been on ethics of what BuzzFeed is doing and his take on it. Maybe I’ve missed it…

Second example comes from AMA by Yann LeCun, Eric Horvitz and Peter Norvig, specifically answers by Yann LeCun. There are multiple questionable answers by him, but my “favorite” is this one:

Question: What specific measures are you taking to insure these technologies will decrease inequality rather than increase it? How will it be placed in the hands of its users and creators rather than owners?

Answer: YLC: That’s a political question. I’m merely a scientist. For starters, we publish our research. Technological progress (not just AI) has a natural tendency to increase inequality. The way to prevent that from happening is through progressive fiscal policy. Sadly, in certain countries, people seem to elect leaders that enact the exact opposite policies. Blaming AI scientists for that would be a bit like blaming metallurgists or chemists for the high level of gun death in the US.

Just reading it again baffles me… Specifically the part about being “merely a scientist”.

On the one hand, I kinda get what he is saying. He is doing what he thinks is interesting and whether his work is used politically or not is not up to him.

On the other hand, when you are a Chief Scientist in one of the biggest companies on the planet with ~2 billion users you would expect to take some responsibility for your company since you expect that he will have at least some say in things Facebook is doing. Again, I don’t think he needs to be some sort of ethics crusader who goes around Facebook evangelizing, but at the same time what should people under him think? With this attitude why should anyone be surprised when Facebook runs experiments on people without their consent? They are merely scientists after all.

My take on it is fairly standard (or at least I thought so) and can be summarized in the following link - Manifesto for Data Practices as well as the attitude that GDPR introduces. There are many good points that this manifesto raises and I agree with all of them.

But words without actions are fairly meaningless, so I’ve decided to remove Google Analytics integration from this blog. Hopefully, it doesn’t sound too grandstandy, but frankly the main reason is that I’ve never even went to the console to check stats, so overall it is useless as is, so why even have it. Next on the chopping block is Disqus integration. I even have couple of links saved with possible alternatives so once I’ll manage to have some sort of comments section without too many hoops I’ll drop this integration as well.


There aren't any comments yet. Be the first to comment!

Leave a comment

Thank you

Your comment has been submitted and will be published once it has been approved.



Your post has not been submitted. Please return to the form and make sure that all fields are entered. Thank You!