The Competition Landscape and Behavioural Consequences
Data Cartels: Artificial Intelligence Information Heists
Future monopolies will primarily sit close to an automatable user-activity interface and secondarily possess the right to generate and access the data that interface produces. The democratisation of software and forthcoming data protection laws favour companies that operate closer to the hardware, and therefore the user, all else equal. For example, instead of Uber, the future belongs to Tesla or a Tesla equivalent: there is not much stopping an established automaker from usurping Uber’s business model. Uber could get ‘closer’ to the customer to obtain more data, but that would amount to acquiring its own cars and removing data privacy constraints. Can Facebook get closer to the consumer for more data? Yes, but that would require developing products that sit nearer the user, e.g., a mobile phone or an operating system. For the most part, physical device companies can vertically integrate because they have access to a wider range and a greater quantity of data with which to tailor the user experience.
In 2018, lacking a physical device or operating system of its own, Facebook secretly paid teenagers to install a Facebook ‘research’ VPN that siphoned off all of the user’s web activity, much like the Onavo Protect app that had been banned four months earlier; the research app was essentially the Onavo backend with a different UI. Any company can do this, and many do. Google did something similar with its Screenwise Meter, which let users as young as 13 earn gift cards. In all fairness, Google took it to the next level: it even sent listening devices to your house to record what you were watching or listening to, and a router to intercept your entire household’s internet usage. Usually these companies go through Nielsen, which might perpetuate the collecting, but they were brazen enough to do it themselves.
Facebook has an abysmal track record in its quest for granular user-level behavioural data, collected in the name of ‘user safety’ or ‘product improvements’. Facebook’s Onavo VPN service gathered data on how much time users spent on mobile apps, how much data they used, their device type, and their country. Facebook understands the importance of device-level data and created the HTC First ‘Facebook Phone’ as one possible solution; it flopped. Amazon also understands the importance of device-level data and made the Fire Phone, which likewise failed. Only Google has partly succeeded with its Pixel, probably because it already had useful user-experience data from its Android operating system. More recently, every company worth its salt is branching out into physical devices; one good example is the fight for the smart-home market. All these companies aim to get closer to the user, and closer to granular behavioural data.
Software companies have to find other ways to obtain device-level data, which normally involves all sorts of trickery. Uber recently started video-recording some rides to capture more objective data about what happens inside vehicles, under the guise of user safety. This is reminiscent of companies deploying two-factor authentication simply to obtain your mobile number. Uber collaborates with Nauto, a company that uses machine learning to analyse video from vehicles; before that, Uber also let riders and drivers record audio and send it in. Even earlier, in 2016, Uber was shown to track customers’ locations after rides had ended, with some riders reporting that the tracking lasted days to weeks. Again the company vaguely claimed this was to protect customer safety. This form of data collection goes far beyond application-level metadata and masks the company’s real ambitions. Uber wants device-level data that can typically only be obtained, with the necessary consent, from the phone manufacturer, the operating system, or the vehicles themselves. Because it does not have those privileges, it must engineer scenarios or hack its way towards them.
In recent years, the technology and finance sectors have vied for the same employees and the same computational resources. In many respects, the finance sector is now an echo of the technology sector. This raises questions about the extent to which finance remains a supportive function meant to enable technological productivity; the current behaviour of financial firms instead suggests a heightened model of profiteering and capital capture. Just as technology companies capture your behaviour and information exchange, finance companies capture your transactional information.
Talent acquisition and retention can also be problematic for firms that do not adopt alternative data techniques. Interesting datasets are needed to retain employees; the screen adaptations of the Jordan Belfort, Patrick Bateman, and Gordon Gekko stories might not be enough to keep people in finance, especially now that big-tech salaries have caught up. Data scientists want to explore new ideas and are attracted by the cutting-edge use cases that quantitative hedge funds offer over the daily mundanities of traditional portfolio management. The previous generation’s greatest minds went into finance to learn how to make more money from money; today they actively compete against the other half of the greatest minds, who are trying to entice you into clicking on more ads.
The importance of data in finance has become especially prominent since the turn of the century. Renaissance Technologies’ former co-CEO Robert Mercer says that “There’s no data like more data”, and ZestFinance CEO Douglas Merrill proclaims that “All data is credit data.” Some even talk about pre-emptive data acquisition, hoping that one day it will supply both the right answers and the right questions. This is best captured by a description of the world’s most successful investment fund in the early 2000s: “Soon, researchers were tracking newspaper and newswire stories, internet posts, and more obscure data—such as offshore insurance claims—racing to get their hands on pretty much any information that could be quantified and scrutinised for its predictive value. [Renaissance’s] Medallion fund became something of a data sponge, soaking up a terabyte, or one trillion bytes, of information annually, buying expensive disk drives and processors to digest, store, and analyse it all, looking for reliable patterns.”
“Profits were piling up as Renaissance began digesting new kinds of information. The team collected every trade order, including those that hadn’t been completed, along with annual and quarterly earnings reports, records of stock trades by corporate executives, government reports, and economic predictions.” At Renaissance, the line between data and profit barely existed: more data means more trading signals, which means more profit.
Suppose Renaissance found all these profitable opportunities by assembling data. Why do large companies like Google, with access to so much data, not participate in the spoils of algorithmic trading? A big reason is that what they could have kept secret has already been made public: Google Search trends are public, Amazon’s orders can be proxied by its public reviews, and LinkedIn’s data can be scraped. Moreover, using the data that has not been made public carries strict legal consequences; Google can’t simply read the emails of the multinationals that use its free email service. Sergey Brin, Google’s co-founder, once suggested that Google start a hedge fund because it has so much data, but Eric Schmidt, the CEO at the time, said that of all Sergey’s ideas this one was the worst, since the company could face serious legal problems by starting a hedge fund. Instead, companies that collect primary data, like Visa, would rather package and sell it to hedge funds than use it themselves. Some parties sell data rather than use it because they lack the infrastructure to exploit it; in real estate terms, you picked up a property at a low price, but you have no expertise in selling or auctioneering.
Let’s now look at the state of the playing field. Many open-source algorithms are close to the industry state of the art; data cleaning and engineering expertise can be obtained fairly easily from freelancer websites; and computing power, although important, has become considerably cheaper and within a few decades will be a minor constraint, in the same way that storage capacity is a minor concern today. However, domain knowledge, in the form of assembling signals from cleaned data and bringing open-source algorithms to life with new data sources, will remain in short supply, as it is now. So, in the medium run, talent, computing power, and datasets will be the primary inputs to a successful finance business, and unfortunately two of these are a direct function of size. It would be one of the first times that not just the network effect of a brand name but also the compounding effect of the underlying technologies promotes concentration, leading to even larger companies.
Another question is the patentability of machine-learning technologies. In 2018, a court heard iSentium’s claim that Bloomberg, after a falling-out between the two companies, had incorporated iSentium’s social-media sentiment analysis into its terminal. The patent was held invalid because it was directed to the abstract idea of interpreting a written statement on social media: selecting information, analysing it “with mathematical techniques and reporting the results is an abstract idea that is not eligible for patent protection”. There are several similar cases in which “[t]he patent examiner rejected the application on the grounds that the invention is not implemented on a specific apparatus, merely manipulates an abstract idea, and solves a purely mathematical problem”.
The competition for creativity could leave underqualified employees leveraging extremely powerful datasets. Employees who use a model without understanding how its quality interacts with fairness, interpretability, robustness, privacy, and causal factors will fail to develop sustainable long-term solutions. A second constraint is whether a data scientist with the skills to assess those criteria also has some level of domain expertise. For example, an AI company that developed a natural-disaster simulation system acknowledged that it had misjudged the risks of many commercial areas because its damage calculations relied largely on residential census data. In fact, the start-up did not employ a single person experienced in existing disaster-management solutions, while still selling a product other people relied on.
The competition over computing power could also push firms towards more obscure models and more heavily compressed data signals in the name of processing efficiency, making the models’ predictions harder to interpret. Code could also be rewritten in pre-compiled C++ rather than more readable Python, leading to a more opaque code infrastructure.
More than any other, the following quote shows just how commoditised the machine-learning algorithm itself has become: a quantitative analyst claims that the only remaining edge comes from adjusting hyperparameters, something that can itself be automated. “Even though most machine-learning algorithms are open source, it is possible to keep a technological edge thanks to fine-tuning of hyperparameters for specific tasks and agile management of machine-learning pipelines to continuously incorporate technological improvement,” Nicolas Jamet, a senior quantitative analyst at RAM, said in the report.
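To make the claim concrete, here is a minimal sketch of what “edge from hyperparameter fine-tuning” looks like in practice: the algorithm (gradient boosting) is open source, and the only proprietary part is the grid of settings being searched. The dataset is synthetic and scikit-learn is assumed available; this illustrates the idea rather than any firm’s actual pipeline.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for a proprietary labelled dataset.
X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# The "edge": which hyperparameter combinations to search.
grid = GridSearchCV(
    GradientBoostingClassifier(random_state=0),
    param_grid={"learning_rate": [0.05, 0.1], "max_depth": [2, 3]},
    cv=3,  # 3-fold cross-validation scores each combination
)
grid.fit(X, y)

print(grid.best_params_)  # the tuned hyperparameters
print(grid.best_score_)   # mean cross-validated accuracy of the best setting
```

Note that the search loop itself is generic, which is exactly the analyst’s point: tools like `GridSearchCV` automate it, so even this edge is one step from being commoditised.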
The competition for data and expertise takes many forms in the financial industry. Among hedge funds (HFs), a quiet revolution may be brewing in the background as sovereign wealth funds (SWFs) start pulling the plug. Historically, SWFs have relied on HFs to manage a significant share of their investment portfolios. For years SWFs have asked for better standards in the HF industry, and for a better fee model, especially after the 2008 crisis, during which institutional investors were unable to get their money out of HFs.
SWFs have decided to bring wealth management back in-house by tapping the best academic talent available. The world’s long-term asset owners, that is, public pension funds, sovereign wealth funds, endowments, and foundations, manage around $100 trillion in investable capital.
I am not saying sovereign wealth funds are good; such a large concentration of wealth could easily be used to damage other countries and private institutions, and these funds can be mismanaged even when they are meant for the public good, as the recent case in Malaysia shows. Real life has eclipsed fiction, leaving us all in a sort of post-fiction blues, and that holds true for the Malaysian sovereign wealth fund: money obtained through laundering and fraud at the fund helped produce The Wolf of Wall Street, a film about fraud and scams. Sometimes it feels that the only difference between reality and fiction is who can come up with it first.
And for the most part, the well-run SWFs are right: these hedge funds are essentially giant marketing firms that absolve pension funds from doing due diligence; they know how to skirt tax bodies and work their way around lawyers and auditors, all while relying on generating supposed cross-sectional ‘alpha’. AGQIX (Global Equity Fund), one of the oldest funds of one of the world’s largest quantitative managers, has underperformed its MSCI World benchmark (5.45% vs 6.30% annualised) since its 2010 inception. QMNIX (Market-Neutral) has underperformed its benchmark (3-month T-bills) since its 2014 inception (-1.16% vs 1.00%). These are not cherry-picked examples; on average, underperformance holds across the board.
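A sub-percentage-point annual gap sounds small, but it compounds. A quick back-of-the-envelope calculation, using the annualised figures quoted above for AGQIX and assuming roughly a decade of compounding (the ten-year horizon is my assumption, not from the source):

```python
# How the reported annualised returns compound over an assumed ten years.
years = 10
fund_annual, bench_annual = 0.0545, 0.0630  # AGQIX vs MSCI World, as quoted

fund_growth = (1 + fund_annual) ** years    # growth of $1 in the fund
bench_growth = (1 + bench_annual) ** years  # growth of $1 in the benchmark

print(f"$1 in fund      -> ${fund_growth:.2f}")
print(f"$1 in benchmark -> ${bench_growth:.2f}")
print(f"shortfall       -> {bench_growth / fund_growth - 1:.1%}")
```

Under these assumptions the investor ends up with several percent less wealth than a passive benchmark holding, before even counting the fees paid for the privilege.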
SWFs depend on other parties for the lion’s share of their access to technology, and as a result incur hefty costs that compound over time; investors end up subsidising other parties’ technology advantages. That doesn’t mean they have to create their own ‘cloud’, but, for the sake of extending welfare to portfolio managers, run your own factor model; I am sure I can do it in four lines of code. “Investors typically spend just 1 or 2 basis points (0.01–0.02%) of the total assets in their care on technology, data, research, innovation, and related efforts. Contrast that with the 50 or more (sometimes way more) basis points routinely spent on fees to external asset managers. But is it sane to think that external managers are doing so 20, 50, or even 100 times better than Investors could do themselves?” I doubt it, so it only makes sense that they stop subsidising others’ technology (and, cough-cough, data acquisition). If SWFs spent a mere “ten basis points of their total assets under management on technology and innovation, then they’d surpass the annual technology and innovation budgets for Apple, Amazon, Facebook, Google, Microsoft, and IBM . . . COMBINED”.
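The “factor model in four lines of code” boast above is only a slight exaggeration. A hedged sketch: regress an asset’s returns on factor returns with ordinary least squares. The factor data here is synthetic (in practice you would load, say, the Fama-French factors); only NumPy is assumed.

```python
import numpy as np

# Synthetic daily factor returns (market, size, value) and an asset whose
# true exposures to them are known, plus idiosyncratic noise.
rng = np.random.default_rng(0)
factors = rng.normal(0, 0.01, size=(250, 3))
true_betas = np.array([1.1, 0.4, -0.2])
returns = factors @ true_betas + rng.normal(0, 0.002, 250)

# The "four lines": add an intercept column, solve least squares, read off
# alpha (intercept) and the factor betas.
X = np.column_stack([np.ones(len(factors)), factors])
alpha, *betas = np.linalg.lstsq(X, returns, rcond=None)[0]
print("alpha:", round(float(alpha), 4), "betas:", np.round(betas, 2))
```

The recovered betas land close to the true exposures, which is the whole due-diligence point: an asset owner can check for itself how much of a manager’s return is factor exposure rather than genuine alpha.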
Source of the quoted Renaissance passages: The Man Who Solved the Market, Chapter 12.