Big Data. Magic Data.

I was listening to Steve Levitt’s People I Mostly Admire podcast, specifically the “I’m Not as Childlike as I’d Like to Be” episode, which covers a lot of ground. About 30 minutes in there is a segment on big data. One of the key messages from Steve and Steve is that people believe big data is a substitute for good thinking, rather than a complement to it. In my experience this is too often the case. But why? Maybe it is the labels we use when discussing the subject. “Big data”, “machine learning”, and “artificial intelligence” all have implicit assumptions built into their names. Maybe if we focused on “data science” we would not be led astray as easily. Why? I’ll steal a story.

There is a great article, “Machine Bias”, published by ProPublica, that covers big data “failures” in the US justice system. Like many overloaded systems, the justice system is turning to automation to reduce its backlog and to remove human biases from its decisions. It succeeded in the efficiency goal, but failed miserably at removing human biases. Instead, it consistently applied the biases present in the historical, human-generated data, adversely affecting the fate of thousands of individuals. I wonder how the developers of these systems introduce themselves at parties. Do they say they are developing the next generation of artificial intelligence, or that they are data scientists?

To have successful outcomes, we need the science. Without it, data is just data, algorithms are just algorithms, and machines are just machines. If we are scientists, we continually test our theories and improve both the quality of the data we collect and the algorithms we use to analyze it. We know this works. Facebook, Amazon, Microsoft, Alibaba, Tencent, and their cohorts have all become increasingly skilled at developing systems that keep our attention where they want it. They have achieved this by hiring skilled data scientists, giving them access to datasets and data generators (us), and allowing them to continually run experiments and adapt based on the results. We need this same level of science applied to the systems we rely on to support our more consequential decisions.

Some questions to consider when exploring further:

Can a machine be biased?
Who sees me as a data generator? How and why? Do I care?
Are we collecting data that can be used to improve our performance?
Does someone else have data that can help me?

To explore further:

Tristan Harris, just Google him!
Yuval Noah Harari – read his books. On this subject see Homo Deus. Google him as well.
- See his Wired Magazine interview video with Tristan Harris.
The book Dataclysm, by Christian Rudder
Netflix – watch The Social Network