Legal Ramifications of Alternative Data
Data Cartels: Artificial Intelligence Information Heists III
The legal risk for alternative data stems from competition, privacy, insider trading, and discrimination laws. When user data is applied to some form of financial modelling, it is typically more intrusive and could be subject to regulatory oversight. This could pertain to large swaths of financial data like credit card transactions or aggregated trade flows. Finance firms have, for example, recently started showing an interest in biometric and geolocation data, both of which could increase the likelihood of imminent regulatory action.
Germany’s Federal Cartel Office (FCO), in a landmark decision in February 2019, said that the manner and extent to which data is collected could not only violate data protection rules (e.g., GDPR) but also constitute an abuse of market power. The decision was aimed at Facebook, which collects its own data and also acquires data through third-party sources. In 2020, Germany’s Federal Court of Justice agreed with the FCO’s decision prohibiting Facebook from merging user data without specific, as opposed to general, consent, referencing the potential adverse competitive effects of such action. The U.K.’s Competition and Markets Authority (CMA) has recently moved on this front too, leading to the establishment of a ‘Digital Markets Unit’ to tackle market power.
In the U.S., the FTC and more than 40 states have sued Facebook, alleging that it illegally bought up companies to stifle competition, without incorporating the privacy element. In Europe, by contrast, there is seemingly some convergence between competition and privacy laws. This probably comes down to the U.S. not having adopted any substantial federal data privacy law, as has been the case with GDPR in Europe. Large financial companies that do business in Europe and rely on the collection and merging of user data should therefore prepare for legislative developments in what seems to be a fast-evolving area of law.
Privacy regulation can be found in a morass of different laws: the Gramm-Leach-Bliley Act (GLBA), the new California Consumer Privacy Act (CCPA), and the E.U.’s General Data Protection Regulation (GDPR). So far, these rules have largely created mechanisms to prevent customers’ data from being collected and used without consent. This applies not just to personally identifiable information (PII) in the U.S., or the broader category of personal data in the E.U., but also to data that can be traced back to a person, i.e., data at risk of re-identification and de-anonymisation. The data collected and distributed by alternative data providers are often de-identified and aggregated. Yet one might still be able to identify users within a specific ZIP code region for certain types of purchases, as was shown with credit card data. Brokers and portfolio managers have to work closely with data auditors and compliance departments to validate not only the source of the data but also the quality of the applied de-identification and anonymisation techniques. There might soon be a need for dedicated data departments within financial institutions that can confirm data provenance and data privacy.
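To make the re-identification risk concrete, the following is a minimal sketch of one common screening heuristic, a k-anonymity check, which is not named in the text above and is offered only as an illustration. It counts how many records share the same combination of quasi-identifiers (here a ZIP code and a merchant category) and flags combinations shared by fewer than k records. The field names, the sample records, and the threshold k = 2 are all hypothetical assumptions, not a description of any provider's actual process.

```python
from collections import Counter

def group_sizes(records, quasi_identifiers):
    """Count how many records share each combination of quasi-identifier values."""
    return Counter(tuple(r[q] for q in quasi_identifiers) for r in records)

def smallest_group_size(records, quasi_identifiers):
    """Size of the smallest group; the dataset is k-anonymous for any k up to this."""
    return min(group_sizes(records, quasi_identifiers).values())

def risky_groups(records, quasi_identifiers, k):
    """Combinations shared by fewer than k records (candidates for re-identification)."""
    sizes = group_sizes(records, quasi_identifiers)
    return {key: n for key, n in sizes.items() if n < k}

# Hypothetical 'de-identified' card transactions: no names, but ZIP code and
# merchant category together can still single out an individual.
records = [
    {"zip": "10001", "merchant_category": "pharmacy", "amount": 23.10},
    {"zip": "10001", "merchant_category": "pharmacy", "amount": 11.75},
    {"zip": "10001", "merchant_category": "grocery", "amount": 54.00},
    {"zip": "10002", "merchant_category": "grocery", "amount": 12.40},
    {"zip": "10002", "merchant_category": "grocery", "amount": 80.15},
    {"zip": "10003", "merchant_category": "jewellery", "amount": 1500.00},
]
QUASI_IDENTIFIERS = ["zip", "merchant_category"]  # assumed choice of fields
```

Calling risky_groups(records, QUASI_IDENTIFIERS, k=2) on this sample returns the two combinations that occur only once, which is the kind of result a compliance team might ask a vendor to explain before the data is used. A check like this is only a first screen; passing it does not by itself show that a dataset is safe from re-identification.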
With the recent advance of alternative data, there is a call to understand the extent to which alternative data could be considered insider information. Matthew Beddall, the CEO of investment manager Havelock London, said that “Alt data at its worst could be a way of legalising insider information and regulation has lagged the trade in this type of data.” There is also an incentive to obtain more granular data, because aggregated data has less alpha potential. Further, there is evidence that alternative data could increase the information asymmetry between sophisticated investors.
Although alternative data might not directly stem from corporate insiders, the information is such that it sometimes exceeds what management knows internally, especially when data has been gathered across social media databases, email databases, and transaction records pertaining to the company’s clients, suppliers, management, and employees. For example, Visa, Mastercard, and American Express are known to sell ‘anonymised’ transaction data, discount brokerages like Robinhood are known to sell flow data, and providers like Return Path are said to sell personal email data covering approximately 70% of worldwide email accounts.
Therefore, beyond the competition and privacy implications of alternative data, trading firms also have to assess whether alternative data falls within the scope of insider information. In contrast to the U.S., the U.K. and E.U. consider only two attributes for insider trading: whether the information is material and whether it is non-public. The U.S. additionally requires a breach of duty or some form of misappropriation. All jurisdictions also tend to take the view that if an announcement is routinely expected to move the market, then the information is ‘inside’ even if it was publicly sourced from a mosaic of facts.
This principle is reflected in a case where a WSJ journalist leaked information about a pre-published article to his roommate. The Court of Appeals for the Second Circuit said that he misappropriated information that belonged to his employer. The court then speculated that it would not have been misappropriation had the Journal itself traded on the information, since it legally belonged to the Journal. In the age of alternative data, it would be interesting to know what the courts think about the WSJ prepackaging its news articles and selling them to hedge funds before giving access to the general public. At first glance, and setting aside the tastefulness of such an action, it would seem to be acceptable if done in a commercially agreeable manner, as there would be no breach of duty.
However, this might not necessarily be true at the state level, nor in the U.K. and the E.U. The closest a company has come to prosecution in the U.S. was under the former Attorney General of New York, Eric Schneiderman, who pressured Reuters under New York’s Martin Act to stop providing exclusive content to its premium subscribers before the general public. Other outlets participating in these tactics also caved to the pressure. It is unlikely that this would amount to insider trading in the U.S., because it doesn’t involve non-public information in a strict sense, nor is there any misappropriation or breach of duty. Even though it is not insider trading, New York’s Martin Act does not allow for this form of exclusive content.
That said, the growth in alternative data could pose some risks around misappropriation at the federal level. In the Second Circuit’s 2009 decision in SEC v. Dorozhko, the defendant hacked into a healthcare company to gather financial data to inform his trading decisions. In this case, the data was not part of a legitimate commercial relationship between a buyer and seller, and it was deemed misappropriated. This court decision could have implications for web-scraping (a method of collecting data online). It should be noted that the courts mostly take the view that public websites are legally scrapable regardless of the warnings posted in the website’s terms of service; they hold that scraping public websites is not in breach of the Computer Fraud and Abuse Act (CFAA).
The Ninth Circuit affirmed that any data that requires no authorisation to access and is freely available is fair game for scraping, making web-scrapers worldwide rejoice. LinkedIn, whose data was at issue in the original suit, appears to have interpreted the court’s ruling as meaning that any and all data behind a registration wall is private and not scrapable. As a result, LinkedIn now requires users to log in before they can browse the platform. Social media companies like LinkedIn would prefer to sell the data themselves. They have good data points on employee growth, expert networks, and usage patterns, all of which could be very valuable to other parties.
This case is not entirely resolved, though, because the First Circuit has said in an earlier judgement that the CFAA can indeed be used to go after data scrapers. For that reason, we might still see a Supreme Court case to settle the conflicting opinions. The Supreme Court has in fact stated that it is interested in hearing such a case. Claimants also realistically have more than just the CFAA at their disposal. They could also make a claim on the grounds of copyright infringement, common law misappropriation, or violation of FTC Section 5; however, these might be blunt instruments compared to a claim under the CFAA.
So, what might happen if you are scraping data behind a registration wall without express permission? The Second Circuit’s opinion in SEC v. Dorozhko suggests that if a user registers for an account on the false pretence that they will comply with the terms of service, and later breaches those terms, this could meet the Second Circuit’s standard for “deception” under Section 10(b) and be a form of misappropriation, thus giving rise to insider trading liability.
Another form of this type of affirmative misrepresentation could be a scraper that takes affirmative steps to evade scrutiny by a website operator by rotating IP addresses. The web-scraper can additionally introduce random lags between requests to resemble a real user, as a further evasion and misrepresentation tactic. The closest case resembling this description was brought by Craigslist, where the courts agreed with Craigslist that an IP block is sufficient notice of access revocation under the CFAA, even where the data is otherwise publicly available. Therefore, click-through agreements represent higher levels of legal risk, and ignoring cease-and-desist requests or access revocations similarly poses a high risk, even when the data would otherwise be public to others. So, in general, scraping publicly available data seems fair game for now, as long as you don’t also violate copyright laws, unless of course the scraping is done behind a registration wall, in which case it becomes a grey area.
Web-scraping primarily deals with data that is public or ‘almost’ public; what if the alternative data were not publicly accessible? In the Capital One case, two rogue analysts obtained material non-public information by analysing credit card transactions, and they obtained access without the consent of the data owners. As a result, they were found to have misappropriated the data and engaged in insider trading, and they were forced to pay $18mn in disgorgement and penalties. Another question then arises: what if Capital One had agreed for the transaction data to be sold and had indeed obtained user permission? In this scenario, if the data is sold into the open market, it will be deemed public. However, if it was known that the publication of this data is routinely expected to move the market, then it will be deemed ‘inside’, in which case, if user permission for the collection or distribution had not been granted, it could be considered insider trading under the misappropriation theory and meet the Second Circuit’s standard for “deception” under Section 10(b). In the same vein, if it can be shown that alternative data has not been obtained legally, it could be classified as insider information under the misappropriation rule. So obtain your data legally! And make sure you have user permission!
In the U.K. and E.U., a breach of fiduciary duty or misappropriation is not even needed, so data acquirers have to be especially careful. This is not just true in theory but has been evidenced in the past, e.g., Greenlight Capital was found to have breached U.K. insider trading rules and was fined more than £7mn in 2012 by the U.K.’s Financial Services Authority, even though its conduct was believed to be in line with U.S. standards. It therefore becomes essential to assess the meaning of non-public. As a result of this lower threshold, it is believed that the U.K. will bring the first case against alternative data providers and buyers, because the decision does not hinge on whether or not the information was acquired in good faith, but simply on whether the information is non-public and material.
At first glance, alternative data appears non-public and material, because fund managers would not pay exorbitant amounts of money for immaterial public information. However, the benefit of some datasets might lie in their collection and processing, and the materiality might only be known once the data is in use. It is expected that a future case on alternative data for insider trading will create some bright-line test. Although exclusive content or alternative data could be obtained from publicly available sources, this data might require an inordinate amount of processing capacity that is only available to large market participants, or the data could be ‘publicly’ accessible but require a large fee for access, or the data could be publicly ‘available’ but only for exclusive purchase. Any of these could lead to the data being deemed non-public and inside, and to the practice being considered uncompetitive.
It is possible that in the future exclusive datasets will not be allowed in the U.K. We can speculate about the future rules, such as a requirement to give all market participants access to the data at the same price. As reported in the Financial Times, several large hedge funds, such as Man Group and AQR Capital Management, say exclusive datasets are not worth the expense or legal risk. “Exclusive datasets are a double-edged sword.” Future cases might touch more on this topic of ‘expensive’ and ‘exclusive’ datasets.
If, as a result, funds can’t buy exclusive datasets, it might lead to the internalisation of data expertise or the acquisition of data vendors, and possibly even customer-facing service providers, to obtain location and transactional data. However, as seen before, internalising and merging user data has recently drawn scrutiny from the FCO and others under both competition and privacy law. This leaves fund managers in a precarious position: the powers that be are interested not just in stopping exclusive data deals, but also in stopping acquisitions of companies that would otherwise have a similar effect.
At this point, not much has been said about discrimination laws and the ethics of alternative data use. If alternative data is used to provide financial advice to customers or to model credit lending decisions, it could have especially damning discriminatory outcomes. A historical example would be U.S. credit bureaus that in the 1960s assessed creditworthiness using, among others, attributes such as “poorly kept yards” and “effeminate gestures”, as suggested by reports circulating at the time. It is well known that data has been driving a lot of credit scoring decisions. This is in spite of the fact that already in 2004, the National Association of State Public Interest Research Groups (SPIRG) found that a whopping 80% of credit reports had errors in them. It is especially concerning in an era where CEOs of credit lending firms proclaim that “all data is credit data.” This won’t last another decade; instead, I foresee data becoming a sin-word in finance.
 Note there is an appeals proceeding that is still pending on this decision that will likely be heard in 2021.
 In the E.U., the relevant bodies of law are the Market Abuse Regulation (MAR) and the Markets in Financial Instruments Directive (MiFID).
 New York’s Martin Act, for providing early access to potentially market-moving information. https://dealbook.nytimes.com/2013/07/07/thomson-reuters-to-suspend-early-peeks-at-key-index/
 See EF Cultural Travel BV v. Zefer Corp., 318 F.3d 58, 63 (1st Cir. 2003)
 The Financial Services Authority, Decision Notice to David Einhorn, Jan. 12, 2012, http://www.fsa.gov.uk/static/pubs/decisions/dn-einhorn-greenlight.pdf