The Trafficking of Data
Data Cartels: Artificial Intelligence Information Heists I
Of late, a quick scan of the headlines would suggest that it is hard to publish anything with a hint of controversy. While preparing an article on ‘Data Cartels’, which I had hoped to publish in HBR or some other reputable outlet, I came to conclude that it might be best to let the article rest somewhere else. I would normally share my thoughts on SSRN or anonymously on Twitter, but this time that proved difficult: the series Data Cartels: Artificial Intelligence Information Heists happens to be littered with sci-hub links, and SSRN’s parent, Elsevier, would not be too happy about that. The writing that follows is also too long for Twitter and too unpolished to be sculpted into anything approaching a book. Please excuse the incoherence that follows; I hope it offers you some food for thought. If it does and you have additional questions, reach out to me at d.snow@live.com
Introduction
The legal trafficking of financial data has reached epic proportions. It is a silent storm brewing in the background, behind mountains of legalese. Data is being shared, swapped, transformed, and sold to the highest bidder or forfeited to the highest authority. This phenomenon is spurred on by the renewed science of prediction. Everyone is at fault: companies, investors, ministries, policy groups, and even regulators. Decades of prediction folly have harmed analysts and pundits alike. They need a solution for their battered reputations, and they need it now. The folly has been especially rampant in finance and economics. Expert financial asset managers have made many foot faults, with substandard performance across the board. It is known within the industry that funds (those very funds where you have put away your pension and your retirement savings) vastly underperform naïve investment strategies, like asking your kids what to invest in or letting a chimp throw darts at a list of securities. Before the deluge of data, managers sought out strategies to obscure the true performance of their funds. In the age of data, many have come to believe that they might in fact be able to outperform a random benchmark and need not rely on their sales and marketing teams alone. This chance to prove one’s expertise is a core driver of the movement. The biggest concern is that large, merged datasets from multiple sources get released in one go, as opposed to the isolated leaks of the past.
And so it starts: investors seek to squeeze every ounce of profit out of their allocation and trading strategies. They are buying and merging terabytes of data and throwing it at the latest open source machine learning model, hoping that it will stick. Finance, being an adaptive market, does not always allow these models to stick, and when they do, they do not stick for long. So inevitably the whole enterprise collapses: some fail, some succeed; those that fail blame the model, and those that succeed take the praise and build a monopoly. It would seem that the only real loser is the individual whose data has been sold and scrutinised for their consumption habits. If only that were true, we could deal with it outright through anonymisation and de-identification policies; unfortunately it is not, and the problem is bigger than one might think at first. The first reason to be bleak is that financial data is extremely cheap. Not for everyone, of course, but for the big players. These companies collectively have trillions of dollars under management and an open cheque book when it comes to new alternative data sources. The brokers of financial data, by contrast, have many customers but are increasingly competing with many other data providers. The market is progressively turning into a space where everyone is trafficking the same salt, and soon enough the salt will become cheap. Herein lies the problem: something that becomes cheap but is fundamentally destabilising could be a real threat to the functioning of civilisation. If all it took was a microwave to develop plutonium, the world would be out of luck[1].
It is not just investors that are at fault. The push for sensitive financial data has also come from various policy initiatives and ministries hoping to nowcast the economy or human behaviour. What we once reprimanded governments for has become an openly shared secret among private companies and policy institutes. Policy institutes are promoting ‘national safety’, but this is not unlike the 2000s, when governments were acting under the guise of national security. The third push comes from large corporations that seek to understand user behaviour and characteristics such as wealth and expendable income. Some might find fault in the fact that I have not called out corporations before state bodies. The reason is simply that governments have access to, or the ability to obtain access to, far more sensitive data, and therefore warrant at least a similar level of scrutiny. At a time when practising professionals enthusiastically proclaim that "all data is credit data"[2], nobody really knows where to draw the line anymore. Is it insider trading if you have exclusive access to a dataset? Is it personal data if it has been anonymised?
[1] Even regulation to make the data more expensive can be useful.
[2] https://conifer.rhizome.org/snowde/the-finance-parlour/20201104095616/https:/archive.nytimes.com/query.nytimes.com/gst/fullpage-9A0CE7DD153CF936A15750C0A9649D8B63.html