Machine Learning & Quant Finance

Expansionary Effects of Competitive Data

Data Cartels: Artificial Intelligence Information Heists IV

DS
Jan 5, 2021

The demand for alternative data is inversely correlated with hedge fund performance. Hinesh Kalian, the director of data science at Man Group, said that since the pandemic “the demand has skyrocketed”. Hedge funds would be quick to mention that the demand is correlated with volatility. Either way, there is a feeling that something has been fundamentally misunderstood and that more data might rectify the problem. Hinesh’s LinkedIn byline reads: “Do you have a dataset you want to monetize and you think we should consider? We want to hear from you! Reach me on unspecified@man.com”. Alternative data is combined with fundamental datasets and used in the traditional setting, but it is also used in a new setting for now-casting and real-time prediction. Every two-letter acronym, IB, WM, PE, S&T, HF, and ER, benefits from data.

Interestingly enough, the use of alternative data has increased volatility in retailers’ stocks, the most affected sector. Credit card data has often helped insiders to forecast earnings correctly. However, when credit card data has pointed at the wrong figures, the market has often witnessed wider-than-normal swings on the day of earnings, as institutional investors re-position themselves. Alternative data could put the whole industry into an adverse feedback effect. There is no guarantee that customers will have access to, or be able to share in the value of, the AI-driven insights from their data, leading to greater information asymmetry and complexity to the advantage of industry over customers. Data- and AI-driven markets have tended towards platform monopolies, which reduce competition; existing digital platforms such as Facebook already exhibit monopolistic behaviours. Regulators of financial markets do not have the skills to understand AI either: they have economic, not technological, expertise.

Demand increases because of anecdotes that make you feel like you are missing out. Michael Spellacy, Accenture’s capital markets lead, said that hedge funds benefited from comparing social media posts with official government statements to gauge the virus’s impact, as well as from collecting data on the movement of Chinese container ships[1]. BNP Paribas Asset Management now spends about 10 per cent of its market data budget on alternative sources, up from an “insignificant” amount five years ago, according to chief executive Frédéric Janbon.

Traditionally, data for fundamental analysis is sourced directly from the company and the exchange: one might consider attending investor calls, subscribing to a market feed, perusing company filings, and deriving ratios. Alternative data is agnostic to the source and driven by the amount of signal (i.e. money-making potential) embedded in the data. The sources that have become popular include web traffic, search trends, social media posts, transaction data, news feeds, emails, location logs, satellite imagery, and logistics data. This only scratches the surface; the point is that it can be any dataset, even job listings, executive jet records, etc. Alternative data is not new in any sense, but it is becoming more useful because modern models capture unstructured data and find patterns better than hand-engineered, rule-based methods.

The type and quantity of data needed differ significantly depending on the trading strategy that asset managers pursue. Fundamental investment strategies might be more of a ‘small data’ problem than a big data problem. The amount of data trafficking could theoretically be kept to a minimum. For example, if you want to investigate distressed debt investment opportunities, you might scour the internet to learn more about the executives, try to find some paperwork, and understand the credit negotiations, judge rulings, and other legal manoeuvrings. In such a scenario the privacy impact seems contained to the company itself. However, that is a ruse: these formerly fundamental shops have now turned into quantamental shops that learn from large datasets to guide their fundamental premonitions. There are large databases of directors with attributes such as predicted IQ, type five personality weights, the number of dependents, and college entry exam score. Employee datasets have also become popular, e.g., where employees studied, what grades they obtained, and what assets they own.

Previously, equity researchers or analysts covering pharmaceutical companies might have tracked acquisition activity by subscribing to expensive financial news feeds, but now they predict acquisition activity and success well in advance. In the past, news reports or company disclosures were followed closely so that your firm could be the first to act on the announcement of a new drug or breakthrough. Analysts had to wait for the approval of a drug, but now they can predict the likelihood of approval by sifting through reams of data on the Food and Drug Administration commissioner’s historical record. It doesn’t stop there; the dataset could be multifaceted thanks to the ability of machine learning models to compute a probability from multiple features. Robots or web scrapers could sit on job boards and constantly refresh until they uncover new titbits of information. This is not science fiction: in 2018, the shares of a small cancer drug company called Geron Corporation spiked 25% after its partner Johnson & Johnson posted a job listing suggesting that a key regulatory decision was imminent.
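
As a rough illustration of what "computing a probability from multiple features" can look like, here is a minimal sketch of a logistic regression trained with plain gradient descent. Everything in it is an assumption made for illustration: the three features, the toy rows, and the labels are invented, and a real approval model would need far more data and validation.

```python
import math

def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict_proba(weights, bias, x):
    """Probability of the positive class for one feature vector."""
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))

def fit_logistic(rows, labels, lr=0.1, epochs=500):
    """Fit weights and bias with stochastic gradient descent on log-loss."""
    weights = [0.0] * len(rows[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(rows, labels):
            error = predict_proba(weights, bias, x) - y
            bias -= lr * error
            weights = [w - lr * error * xi for w, xi in zip(weights, x)]
    return weights, bias

# Hypothetical toy features per drug application (all values invented):
# [reviewer's historical approval rate, trial met primary endpoint (0/1),
#  regulatory job posting spotted (0/1)]
rows = [
    [0.9, 1, 1], [0.8, 1, 0], [0.7, 1, 1], [0.5, 1, 0],
    [0.6, 0, 0], [0.4, 0, 0], [0.3, 0, 1], [0.2, 0, 0],
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]  # 1 = approved, 0 = rejected (invented)

weights, bias = fit_logistic(rows, labels)
p_strong = predict_proba(weights, bias, [0.9, 1, 1])
p_weak = predict_proba(weights, bias, [0.2, 0, 0])
print(f"strong case: {p_strong:.2f}, weak case: {p_weak:.2f}")
```

The point is only that several weak signals, including something as odd as a job listing, can be folded into a single probability.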

Every investment vertical would have its own data staples, be it health care, information technology, real estate, energy, hospitality, or utilities. Professional career sites like LinkedIn can be scraped to assess employee growth. The possibilities here are endless. Imagine you can get hold of the orders being sent to offices: bouquets of flowers and champagne could be indicative of a good financial year, while late-night pizza or take-out deliveries paid for with the audit partners’ credit cards at Deloitte or EY could signal botched accounts or suspicious activity at a publicly traded corporation. After you set the hypothesis, it is up to you to find the avenue to collect the data: you can use a human spotter, you can recruit an employee at Domino’s, or you can plug into Domino’s transaction data centre for a price. This data could pose national security risks. What if late-night pizza orders from the Pentagon are 40% correlated with foreign military intervention? The problem is that companies selling data do not know what their data is worth until it is in the hands of someone who can utilise it for some ulterior purpose.
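
At its simplest, a hypothesis like the late-night pizza one can be checked with a correlation between the alternative series and the outcome you care about. The sketch below computes a Pearson correlation by hand; the two series are invented for illustration, and with a sample this small the result says very little on its own.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length, at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)

# Hypothetical data, one entry per quarter (all values invented):
# late-night deliveries to the audit team, and whether the audited
# company later restated its accounts (1) or not (0).
late_night_orders = [3, 5, 2, 9, 4, 11, 3, 8]
later_restated = [0, 0, 0, 1, 0, 1, 0, 1]

r = pearson(late_night_orders, later_restated)
print(f"correlation: {r:.2f}")
```

A correlation like this is a starting point for a hypothesis, not evidence of a tradable signal.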

I feel the need to re-emphasise that not all alternative data is bad. Recent examples include satellite imagery that uses shadow length to track the progress of construction projects, or even the shadows cast onto the lids of oil tanks to measure the true global supply of oil. Yes, this would give traders an advantage, but this should also be publicly known information. Instead of flying an analyst to a construction project to measure the progress of a building, I can rely on satellite imagery. It has also been used by Google to help prevent illegal fishing[2]. However, this exact same satellite contractor can be used to spy on competitor companies or foreign powers. Like most things in life, it has a good and a bad side; what we want to know is whether the balance of good to bad can be justified. Alternative data could also be internal data[3]. As mentioned before, asset allocators and buy-side analysts need data, and they mostly rely on external data acquisition. Sell-side firms also have the incentive to collect new datasets, and they are able to collect a lot of data from their clients’ internal transaction history. The question is whether they should be allowed to use this data to, for example, adjust their customers’ margin requirements by using some behavioural scoring model. Should internal data use be regulated, or should only the acquisition of data be regulated? As a wider range of data drives a shift from discretionary to systematic investment, dropping the price of data and making it widely available could carry its own systemic risks. If some alternative data is outlawed, its “black market value” increases, and you would expect a higher potential alpha from accessing that data.
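
Returning to the oil tank example above, the geometry is simple enough to sketch. The assumption here is a floating-roof tank: the lid floats on the oil, so the tank wall casts a shadow onto the lid that shrinks as the tank fills, while the shadow the tank casts on the ground depends only on the full wall height. Both shadows are measured along the sun direction in the same image. The function names and the numbers are illustrative, not taken from any vendor's method.

```python
import math

def tank_height_m(exterior_shadow_m: float, sun_elevation_deg: float) -> float:
    """Wall height implied by the ground shadow and the sun's elevation angle."""
    if not 0 < sun_elevation_deg < 90:
        raise ValueError("sun elevation must be between 0 and 90 degrees")
    return exterior_shadow_m * math.tan(math.radians(sun_elevation_deg))

def fill_fraction(interior_shadow_m: float, exterior_shadow_m: float) -> float:
    """Fraction of the tank that is full.

    The interior shadow scales with the empty height above the lid and the
    exterior shadow with the full wall height, so the sun angle cancels out.
    """
    if exterior_shadow_m <= 0:
        raise ValueError("exterior shadow must be positive")
    if not 0 <= interior_shadow_m <= exterior_shadow_m:
        raise ValueError("interior shadow must lie between 0 and the exterior shadow")
    return 1.0 - interior_shadow_m / exterior_shadow_m

# Hypothetical measurements from one image (values invented):
fraction = fill_fraction(interior_shadow_m=5.0, exterior_shadow_m=20.0)
height = tank_height_m(exterior_shadow_m=20.0, sun_elevation_deg=45.0)
print(f"fill: {fraction:.0%}, wall height: {height:.1f} m")
```

Summing such estimates over many tanks, weighted by tank capacity, is what would turn individual images into a supply figure.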

Firms like hiQ scrape LinkedIn data and other publicly available data for business intelligence. LinkedIn did not like it, as it felt the scraping breached not only its terms of service but also the Computer Fraud and Abuse Act and a few other laws. The Ninth Circuit affirmed that any data that required no authorization to access and was freely available by default was fair game for scraping, making web scrapers around the world rejoice. LinkedIn appears to have interpreted the court’s ruling as meaning that any and all data that requires a login is private and that LinkedIn can revoke access to it. As a result, LinkedIn now requires users to log in before they can browse the platform.

[1] https://www.winston-fox.co.uk/news-and-updates/news-detail/hedge-funds-altdata-appetite-exploding/2020/08/04/hedge-funds-altdata-appetite-exploding

[2] https://www.forbes.com/sites/bernardmarr/2018/04/09/the-amazing-ways-google-uses-artificial-intelligence-and-satellite-data-to-prevent-illegal-fishing/#5fcfab271c14

[3] Alternative investment and alternative internal data.

© 2023 Derek Snow