Friday, November 15, 2024

What is big data and why is there so much buzz around it?

In all my interactions with so-called ‘big data gurus’ and ‘data scientists’ in Pakistan, my one alarming observation was that they all proudly claim that data will show whatever they want it to show and do, writes Aneel Salman.

‘Policies in Pakistan are failing!’ This is the chant in almost every sphere in Pakistan, whether public, private, NGOs or think tanks. The lack of solutions was earlier laid at the feet of ‘research’: there simply was not enough of it, and even if there was, no one within policy corridors was interested in it or had time for it.

What is big data?

Over the past 2-3 years, a new buzzword has entered the scene: ‘big data’. Now, as in a gold rush, every institution thinks the answer lies in it. No, not the ordinary ‘data’ which the understaffed, underpaid statisticians churn out at the Statistics Bureau and which no one (not even various government departments) ever trusts.

Big data refers to extremely large data sets that may be analyzed computationally to reveal patterns, trends, and associations, especially relating to human behavior and interactions. The supposed idea is that with a large enough sample size, data points can show statistically significant correlations because at some level everything is related to everything else. Psychologist Paul Meehl famously called this the ‘Crud Factor’: it leads us to believe there are real relationships in the data when in fact the linkage is, or may be, entirely trivial. But for those within policy circles who already don’t keep abreast of anything except the next election cycle, or who are part of the groupthink yes-man or yes-woman mindset, throwing around words like big data or meta data is like impressing someone you want a favor from.
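
To make the sample-size point concrete, here is a minimal sketch in Python. It assumes a plain Pearson correlation and uses a normal approximation for the p-value; the function name and the chosen numbers are illustrative only and do not come from Meehl or any dataset mentioned here.

```python
import math


def correlation_p_value(r: float, n: int) -> float:
    """Approximate two-sided p-value for a Pearson correlation r seen in n records."""
    # Standard t-statistic for testing whether a correlation differs from zero.
    t = r * math.sqrt((n - 2) / (1 - r * r))
    # Normal approximation to the t distribution (an assumption; fine for large n).
    return math.erfc(abs(t) / math.sqrt(2))


# The same practically meaningless correlation (r = 0.01) at growing sample sizes.
for n in (1_000, 100_000, 10_000_000):
    p = correlation_p_value(0.01, n)
    print(f"n={n:>10,}  r=0.01  p={p:.3g}")
```

Under these assumptions, a correlation of 0.01 is nowhere near ‘significant’ at a thousand records, yet it comfortably passes the usual 0.05 threshold at a hundred thousand. The relationship has not become any more real; only the pile of data has grown.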

In reality, all this big data is a pile of trash unless mined by geniuses. It is a new phenomenon invented by data nerds to create and save their own jobs: telling the computer to do the work while making things more complicated.

According to Uta Frith, a developmental psychologist at University College London, ‘data is often full of errors and the bigger the set the more errors there are likely to be. Interpretations can be distorted by random events. The idea that big data will enable more control of behavior may be a lot of hype.’

Is it accurate?

My main problem with data is its authenticity and veracity. Who checks that? And then the bigger question: are they trustworthy, and if so, on what basis? Just because something comes out of the data set of a major financial or health institution doesn’t mean it can be trusted. Because, let’s face it, if the data is bad, the fancy machine learning tools being marketed and employed to analyze it will be utterly useless. In 2011, a blogger noticed that whenever actress Anne Hathaway was mentioned in the news, the stock price of Warren Buffett’s Berkshire Hathaway (BRK.A) rose as well. The hypothesis, apparently, was that the ‘automated, robotic trading programming was picking up the same chatter on the Internet about ‘Hathaway’ and applying it to the stock market.’ Correlation is not causation, which is why context is so critical.

Algorithms require constant monitoring and tinkering to adapt to changing behaviors and trends. In 2018, Strava’s online exercise-tracking map unwittingly revealed remote US military outposts and even the identities of soldiers based there. The situation shows how data collection can lead to unintended consequences.

How big datasets are misused

Companies like Cambridge Analytica amassed data on voters through the use of an app that collected data on approximately 50 million Facebook users, including 30 million psychographic profiles. Political parties and governments continue to want access to social media intelligence and continue to develop profiles on us, to shape and manipulate us. There are data brokers who use this big data without our knowledge and without any restraints, and sell it to banks and insurance companies for big profits. The data which dummies are using to make decisions may be neither organic nor real. In fact, most data on consumers is tailored and created by data brokers. In short, much of what our policy makers call big datasets, and base their decisions on, is neither authentic nor organic; the resulting policies and decisions are therefore simply not sticking. The on-ground realities are different.

The real tragedy is that we, as citizens, have no choice but to go along with the so-called numbers being thrown at us, whether from the ministry of finance on inflation, commerce on trade, or health on, say, COVID-19. Data can so easily be simulated, made-up, fraudulent or, for lack of a better word, ‘synthetic’.

Now, thanks to the magic of social media, organizations, parties and governments can literally create a ‘following’ or ‘consent’ out of numbers on a screen or social media platform, to convince us that what they are doing and saying, and how they are doing and saying it, is the ‘truth’: that it is what WE as citizens have been wanting, demanding, craving all along.

Big data in Pakistan

In Pakistan, we like to follow the developed world’s fads whether they are useful for us or not. Since ‘big data’ is the new fad, let’s restructure our government departments and think tanks and collect trash data, or rather data put together by someone else, somewhere else, about us, our region, our lives, the way we think or should feel, and what we want (or should want), and let’s find gold in it.

Under the label of ‘big nudging’ (big data with nudging), our governments and organizations want to make sure that we do the things (or THINK the things) that they consider to be right, proper and, here in Pakistan, ‘patriotic’. Why have elections or referendums, or do mass surveys and national debates, when making constitutional changes and amendments like those in India or Pakistan, Singapore or even China for that matter? Why involve citizens in democratic processes when we, the citizens, can now be governed by a data-empowered ‘wise king (or queen)’, who would be able to produce desired economic and social outcomes with a digital magic wand?

In all my interactions with so-called ‘big data gurus’ and ‘data scientists’ in Pakistan, my one alarming observation was that they all proudly claim that data will show whatever they want it to show and do. In fact, they all firmly believe in the words of economist Ronald Coase: beat the data long enough and hard enough, and it will eventually confess to anything.
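
The arithmetic behind that confession is simple enough to sketch in Python. The snippet assumes independent tests run on pure noise at the conventional 5% significance level; the function name is hypothetical and the calculation is a textbook simplification, not anything taken from Coase.

```python
def chance_of_false_find(num_tests: int, alpha: float = 0.05) -> float:
    """Probability that at least one of num_tests independent tests on pure
    noise comes out 'significant' at level alpha."""
    # Each test stays quiet with probability (1 - alpha); all must stay quiet
    # for the analyst to come away empty-handed.
    return 1.0 - (1.0 - alpha) ** num_tests


# The more comparisons an analyst tries, the likelier a spurious 'finding'.
for k in (1, 10, 50, 200):
    print(f"{k:>3} comparisons -> {chance_of_false_find(k):.0%} chance of a false 'finding'")
```

On these assumptions, ten attempts already give about a 40% chance of something that looks like a result, and fifty attempts push it past 90%. Beating the data simply means running enough comparisons.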

American cultural imperialism 

Even the West is struggling with how to work with big data in an ethical and responsible way. The GAFA companies (Google, Apple, Facebook and Amazon) have shown amazing data-driven success, which has pushed states and companies to frantically embrace the data-driven mindset, regardless of whether this approach is toxic or not. In reality, GAFA is the new name for American cultural imperialism.

Walmart and Best Buy have some of the biggest databases (call it meta data), but their revenues are still decreasing and data analytics has not solved their problems. The irony is that one can have data without information, but thanks to the new reality we now live in, we have been forced to believe that we can’t have information without data. During the swine flu epidemic in Germany in 2009, for example, everybody was encouraged to be vaccinated. However, we now know that a certain percentage of those who received the immunization were affected by narcolepsy.

Governments have been collecting data on their citizens, listening to phone calls and tracking emails, but terrorism and 5th- or 6th-generation (I lost count after the 4th) hybrid warfare still take place. When adequate transparency and democratic control are lacking, there can be ‘erosion of the system from the inside. Search algorithms and recommendation systems can be influenced.’

What about how big data can lead to the creation of echo chambers? After all, if mass-scale manipulation is to stay unnoticed, ideas and opinions need to be sufficiently customized to each individual. So, in the end, all we get is our own opinions reflected back at us in our ‘echo chamber’. This causes social polarization, resulting in the formation of separate groups that don’t understand each other and come increasingly into conflict with each other, leading to extremism and discrimination (something we in Pakistan are all too familiar with).

Less is more

At my doctoral alma mater in the States, my Behavioral Economics Professor would always say that ‘less is more’, which meant: collect less but more relevant data, talk to your citizens directly, observe and understand their behaviors, and stay close to them. Do that and you will know when, why and where the problem is, and even how best to solve it, rather than looking for patterns in numbers which one can easily conjure out of thin air. Local, indigenous, grassroots knowledge and facts are important to reach good solutions and decisions. The great caliphs of Islam used to remain within proximity of their followers and were fully aware of their conditions – they did not need numbers on a screen or a one-page policy memo to know the plight of the people they served.

There is no going back from big data and data analytics, but the real future lies in collective intelligence and socio-diversity, in crowdsourcing, online discussion, collaboration, deliberation platforms and citizen science, where ideas and innovation from real people and not bots or trolls or digital soldiers are considered, curated and constituted.

Aneel Salman is a Behavioral Economist based in Islamabad, Pakistan. He can be reached at aneelsalman@gmail.com.