Causality and Big Data

You see there is only one constant. One universal. It is the only real truth: Causality. Action, reaction. Cause and effect.

The Frenchman

America Online’s 2006 crisis and Edward Snowden’s disclosures drew a great deal of public attention, along with criticism of surveillance practices and the ethics of big data usage. In this article, I want to look beyond privacy and talk instead about institutionalization.

The first thing we should ask is a fundamental one: how does a search engine work? Like any statistical correlation study, big data starts with sampling, but instead of drawing a random sample it collects everything possible, so the sample becomes equal to the whole data set. Analysts then run a number of correlation studies, but the interpretation usually ends in probabilistic causality (one factor increases the probability of another). This is understandable, because as the numbers get bigger, distortion becomes less visible and we start to see things we had never thought about. Still, just as cigarette packages carry ‘smoking causes cancer’ warnings, big data should carry a ‘big data does not mean causality’ warning, because without regulation things are going wrong and we have only started to see the implications.
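To make the warning concrete, here is a minimal sketch in Python of how a strong correlation can appear without any causal link. Everything in it is an assumption made for illustration: the hidden factor z, the noise levels, and the function names pearson and simulate are hypothetical and do not come from any real data set. Two measurements x and y both depend on z, but x never influences y. When x is instead assigned at random, as in an experiment, the correlation disappears.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def simulate(n, seed=0):
    """Return (observed correlation, correlation when x is randomized).

    Assumed toy model: a hidden factor z drives both x and y,
    and x has no effect on y at all.
    """
    rng = random.Random(seed)
    observed_x, randomized_x, ys = [], [], []
    for _ in range(n):
        z = rng.gauss(0, 1)            # hidden common cause (not recorded)
        x = z + rng.gauss(0, 0.5)      # what the data collector sees
        y = z + rng.gauss(0, 0.5)      # outcome; depends on z only
        observed_x.append(x)
        randomized_x.append(rng.gauss(0, 1))  # x assigned at random, ignoring z
        ys.append(y)
    return pearson(observed_x, ys), pearson(randomized_x, ys)


if __name__ == "__main__":
    observed, randomized = simulate(100_000)
    print(f"observed correlation:   {observed:.2f}")
    print(f"randomized correlation: {randomized:.2f}")
```

Under these assumptions the observed correlation comes out high while the randomized one stays close to zero. Collecting more rows makes the first number more stable, not more causal.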

Causality is the relationship between one specific event and another, and is generally used to express cause and effect. From a logical point of view, it is defined as follows: “If x is a necessary cause of y, then the presence of y necessarily implies the presence of x. The presence of x, however, does not imply that y will occur.” In reality, every cause is the effect of another, forming an infinite causal chain. Mathematical equations use the constant K as a hidden variable to represent the influence of free will, so free will can be disregarded when information is manipulated. Consequently, choice becomes meaningless and people become passive actors who change nothing in the world. It sounds odd, but with big data correlations we are facing it every day.

Internet neutrality is the principle that service providers treat all kinds of data alike rather than discriminating against any type. A neutrality clause in human rights also protects users from telecom companies that might try to force them onto the companies’ own sites for economic benefit. Although digital freedoms are promising, we are still missing data neutrality and algorithm neutrality. As a result, companies like Facebook tell us which friends we are closer to by looking at our “like” clicks. Is that correlation or causality? What are the probable consequences?

An insurance company may charge you more because of your Foursquare check-ins at unhealthy restaurants; a bank may decide on your loan based on your Amazon spending; and a resume analyzer may compare your wording with that of successful candidates. There are many other examples, and most of them improve our lives, but can they really account for all the factors involved? Or, from another perspective, do we still have free will? Maybe not. A prayer gathering may be flagged as a probable terrorist event, or irregular social network behavior may be read as a sign of criminal behavior. Don’t get me wrong, big data is not bad at all. It also reveals other correlations related to the same event, provided the analyst asks the right questions and leaves proper room for error in the correlation. That’s why a person flying with an eyebrow-raising name (Bin Laden, for instance) can enter the US without any immigration problems.

We need to set rules and regulations, at least to educate ourselves, rather than punish people unjustly, because most of the time different types of information from different sources do not align perfectly. Otherwise, we could have written off Thomas Edison, Albert Einstein or Steve Jobs on the basis of their childhood indicators of success. Were they just lucky? Luck is not a rational explanation; it is more likely that our analysis is missing some indicators.

Moreover, can we say that certain groups of people holding power are exploiting the means of information? Probably not. After all, we do not know the causality behind their cruel manner, but one thing is certain: we should find a way to preserve neutrality. Viktor Mayer-Schönberger and Kenneth Cukier’s book Big Data offers several solutions. They recommend using general topics rather than smaller relationships, accepting the messiness of data, describing findings as correlation instead of causality, redefining the justice and freedom act, training lawyers to fight data cases and, lastly, reducing permanent data storage.

In the end, we are humans, not pieces in a game defined by rules, as predictable as chess pawns. We are clearly not looking for a dictatorship. Causality is an infinite chain, and YouTube will never know why I liked that video, so let us use data and not let data use us.

Erdem Tokmakoglu

Erdem recently graduated from Yonsei University's master's degree program, majoring in International Finance with a minor in Global Strategy. He is currently assisting Temka Tour's entrepreneurial drive and has work experience in various business fields. He is also co-authoring a book about new management paradigms based on the adoption of technological advancements.
