The Disappearance of Corporate Socialism
Data Cartels: Artificial Intelligence Information Heists VII
The basic theory is that a model can always be improved with more data and more variables, i.e., by growing the dataset in both length and width. What is not always said is that models can be too good for the public’s own good. After OpenAI developed their GPT-2 language model, they decided not to make the large version available for fear that it would be used to produce fake news and social media “bots”. They later backtracked with GPT-3, making it publicly available through a commercial API. This model is clearly too good to be out there in the wild, and the disadvantages are likely to outweigh any advantages. The release falls in line with OpenAI’s founding mantra to “democratise artificial intelligence.” Elon Musk, when prodded by journalists, has said that he would rather everyone have access to dangerous AI than have it concentrated in the hands of only some individuals.
The developer of the popular object-detection algorithm YOLO stopped his research into computer vision to avoid enabling potential misuse of the technology, citing, among other things, military applications and privacy concerns. An excerpt from the developer follows:
But maybe a better question is: “What are we going to do with these detectors now that we have them?” A lot of the people doing this research are at Google and Facebook. I guess at least we know the technology is in good hands and definitely won’t be used to harvest your personal information and sell it to.... wait, you’re saying that’s exactly what it will be used for??
Models that are user-facing can also be so good that they produce the “uncanny valley” effect: the feeling of unease that occurs when you observe or interact with a robot that closely resembles human behaviour. Although it has been studied scientifically, it is possibly just a short-term hurdle that we have to get across[1].
Furthermore, prediction accuracy can be in direct trade-off with fairness, interpretability, and privacy. When you improve the fairness of a model, you could in the short run experience a degradation in prediction quality, as it takes time for historical injustice to exit a negative feedback loop. When you use a more interpretable model, you often have a simpler model with fewer parameters, and if you are modelling complex phenomena, the more interpretable model will tend to perform worse. Highly complex models also have the capacity to fit the training data very closely, which allows adversaries to reconstruct the training dataset more effectively than they could from less accurate models, whose statistical bias would otherwise obscure the underlying records.
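A minimal sketch of the interpretability/accuracy side of this trade-off, assuming scikit-learn and a synthetic dataset (the data, models, and parameters below are illustrative choices, not taken from any study discussed here):

```python
# Illustrative sketch: simple interpretable model vs. complex model on synthetic data.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import GradientBoostingClassifier

# Synthetic "credit-like" data with non-linear structure and interacting features.
X, y = make_classification(n_samples=5000, n_features=20, n_informative=10,
                           n_redundant=5, flip_y=0.05, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Simple, interpretable model: a handful of coefficients you can read off directly.
simple = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Complex model: hundreds of trees, far harder to explain, and it hugs the
# training data closely, the same property that helps reconstruction attacks.
complex_model = GradientBoostingClassifier(n_estimators=300, max_depth=3,
                                           random_state=0).fit(X_train, y_train)

for name, model in [("logistic", simple), ("boosting", complex_model)]:
    print(name,
          "train acc:", round(model.score(X_train, y_train), 3),
          "test acc:", round(model.score(X_test, y_test), 3))
# Typically the boosted model wins on test accuracy here, but the gap between its
# train and test scores is also wider, a crude proxy for how much it memorises.
```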
The question of fairness also extends to the use of features that seem unrelated to the prediction problem. It is too expensive to find direct causal features, and it could even be intrusive, so instead of judging an individual’s personal characteristics, modern lending companies end up using far-removed criteria like your connections on social media and the activities you partake in. These algorithms are more likely to grant a loan if you are perceived to have a “higher-quality” network. Even worse, when you don’t use social media you are severely punished for the lack of data, which reminds me of a quote by Eduardo Saverin, a co-founder of Facebook: “I don’t like showing my privacy online.” This phenomenon is commonplace because machine learning is largely a science of correlation. A company called Faception claimed to look at facial images and bone structure to assess people’s IQ; even if it were accurate, it could just be perpetuating biases and adverse feedback mechanisms. Amazon[2] and third-party recruitment software like HireVue[3] scan video, voice, and textual data to assess candidate quality based on historical observations; in the case of Amazon, the algorithm discounted candidates whose CVs contained female-themed words like “women’s”, as in “women’s chess club captain”. In another story, as explained by the employment attorney Girouard, “after an audit of the algorithm, the resume screening company found that the algorithm found two factors to be most indicative of job performance: their name was Jared, and whether they played high school lacrosse. Girouard’s client did not use the tool.”[4]
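To make the proxy problem concrete, here is a hedged sketch in which every variable name, coefficient, and rate is hypothetical: a model that never sees a protected attribute can still reproduce its effect through a correlated, “far-removed” feature such as perceived network quality.

```python
# Illustrative sketch: a dropped protected attribute leaks back in through a proxy feature.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 10_000
group = rng.integers(0, 2, n)                    # protected attribute, never given to the model
network_quality = rng.normal(group * 1.0, 1.0)   # proxy: correlated with group membership
income = rng.normal(3.0, 1.0, n)                 # a legitimate feature
# Historical approvals were partly driven by group membership itself.
p = 1 / (1 + np.exp(-(0.8 * income + 1.5 * group - 3.0)))
approved = rng.random(n) < p

X = np.column_stack([income, network_quality])   # note: no group column
model = LogisticRegression().fit(X, approved)
pred = model.predict(X)

for g in (0, 1):
    print(f"group {g}: predicted approval rate = {pred[group == g].mean():.2f}")
# Approval rates differ by group even though the model never saw the attribute,
# because the proxy carries the same information.
```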
There is something to be said about taking the uncertainty and randomness out of credit decisions. As fairness in accuracy increases, e.g., through fewer false-negative predictions, fairness in luck decreases. In some cultures, the element of luck, second chances, and renewed beginnings takes precedence over fairness in accuracy. We might not want to improve credit models any further if doing so excludes certain groups from the opportunity to obtain a loan. Compare this with fraud-detection models, where we don’t want luck to play any role. The same is happening in every commercial transaction: if you are a good customer you are automatically put through to quality phone operators with minimal wait time; if you are a bad customer, good luck having your request dealt with[5]. Loyalty cards have finally enabled a system of preferential treatment that lines the pockets of shareholders at the expense of the average citizen.
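As an illustration of the accuracy-versus-luck tension, the following sketch (scores, cutoff, and probabilities all invented) contrasts a hard cutoff with a rule that deliberately preserves a small element of chance for borderline applicants:

```python
# Illustrative sketch: deterministic cutoff vs. a rule that keeps some "second chance" luck.
import numpy as np

rng = np.random.default_rng(1)
scores = rng.normal(600, 80, 10_000)   # hypothetical credit scores
cutoff = 640

deterministic = scores >= cutoff
# Give borderline applicants (within 40 points of the cutoff) a 20% chance anyway.
borderline = (scores < cutoff) & (scores >= cutoff - 40)
lucky = borderline & (rng.random(scores.size) < 0.20)
with_luck = deterministic | lucky

print("approved, deterministic:       ", round(deterministic.mean(), 3))
print("approved, with second chances: ", round(with_luck.mean(), 3))
# The second rule is less "accurate" with respect to the score, but it re-opens
# a door that a hard cutoff would permanently close for borderline applicants.
```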
It becomes problematic when your credit history is so baked into models that it fixes the course of your life. In some countries credit data is used not just for credit modelling, but also for rental assessments and job applications. This is clearly a horrendous idea: your credit should have nothing to do with your ability to obtain a job. And if it does affect your chances of securing a rental or a job, that information should immediately be shared with welfare services so that your position is taken into account.
Like the 21st-century miracle of matching your genetic makeup to your height, weight, eye colour, and even your personality disposition, the data trail from human activity has similar predictive power, embedding you into a deterministic probability space, what Fourcade and Healy call “life chances”. Somewhere on someone’s computer you sit within a statistical cluster where, if you looked to your left and to your right, you would see digital twins you are unfortunately not acquainted with, so don’t fantasise about establishing a consumer union just yet.
Automating arbitrage in such instances is thus also beneficial, as it reduces the length of time for which such mispricing can exist. However, there may be no additional benefit from exploiting arbitrage situations that would have resolved in milliseconds anyway. In such situations, instantaneous exercise of arbitrage opportunities can even be counterproductive in terms of the overall welfare of participants. For example, Wah and Wellman (2013) find that the high-frequency trading practice of latency arbitrage between fragmented markets can reduce total surplus in the market. In addition, the practice may engender a costly latency arms race (Budish et al. 2015).
The FCA released a scathing report showing that high-frequency traders are a tax on ordinary traders, going as far as putting a value on that tax. They seem only to disrupt otherwise slow-to-propagate but more orderly markets: they provide liquidity when the going is good, but disappear during bad times, sucking up all liquidity and further exacerbating the problem. Before the computerisation of trading, sell orders were slow to execute, leading to a stickier market; not only that, but the market was bounded by the speed of human thought, which at most could find refuge in Anki cards. This doesn’t mean a tax would solve the situation. Most policy advisors simply multiply the proposed tax rate by current volumes, without understanding that much of the volume will evaporate; this is called static scoring, and it remains in wide use because of this kind of illiteracy. Such a tax would also need global partners: when it is done in isolation, à la Sweden in 1984, the policy fails. In Sweden, futures volumes fell 98 percent, options volume virtually disappeared, and revenues came in at only 3% of what the finance ministry had originally forecast; the fall in taxable revenues ended up costing Sweden money, and needless to say the tax was repealed in 1991[6].
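A back-of-the-envelope sketch of static versus dynamic scoring, loosely reusing the Swedish figures cited above; the base volume and tax rate are made-up round numbers:

```python
# Illustrative sketch: why static scoring overstates transaction-tax revenue.
base_volume = 100_000_000      # hypothetical annual taxable volume
tax_rate = 0.005               # hypothetical 0.5% transaction tax

static_forecast = base_volume * tax_rate       # assumes volume is unchanged by the tax
dynamic_volume = base_volume * (1 - 0.98)      # volume migrates or evaporates (Swedish-style drop)
dynamic_revenue = dynamic_volume * tax_rate

print(f"static forecast : {static_forecast:,.0f}")
print(f"dynamic revenue : {dynamic_revenue:,.0f}")
print(f"share of forecast actually collected: {dynamic_revenue / static_forecast:.0%}")
```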
Josh Lauer in Creditworthy shows that during the 1960s, credit decisions were often made on the basis of judgment and character[7]. This was not fair or unbiased, but if allowed today it would offer you an escape route from the hands of over-optimised correlation machines. Unfortunately, sincerity and character are not easily represented in a NumPy array. Moreover, there is little reason to include them, as there is little reason to suspect they would improve already successful data-driven models in this domain.
The paradox is that the solution for better modelling of people who are stuck in the central tendencies of their peers’ payment behaviour might be the collection of more data, or the development of additional features that distinguish them from others with shared attributes. These finer-grained central tendencies might be fairer, but the additional features might not have much explanatory value; regardless, the renewed faith in correlation would always put borrowers at the whim of statistical “life chances” over which they have no control.
As with credit lending, the introduction of large datasets might undermine the insurance industry. Whereas insurance schemes used to face asymmetric information, insurers are now far better at predicting risk and pricing premiums. Do these new datasets further endanger the mutualisation of risk and present new risks of discrimination and exclusion from coverage? For one, we will no longer be able to spread risks and subsidise those with risky behaviour. There is a real push in insurance towards “segments of one”, i.e. the ultimate price discrimination, which leads to fairer and more competitive quotes. It is accurate and fair in the meritocratic sense, but it is ultimately regressive: if you live in an unsafe neighbourhood and eat an unhealthy diet, you will be penalised more than ever. With the coming decrease in corporate socialism, governments will have to take an active stand.
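A small sketch of what “segments of one” does to mutualisation, with invented claim costs and group sizes:

```python
# Illustrative sketch: pooled premiums vs. individualised "segment of one" pricing.
expected_cost = {"low_risk": 200.0, "medium_risk": 500.0, "high_risk": 1400.0}
population =    {"low_risk": 700,   "medium_risk": 250,   "high_risk": 50}

# Pooled premium: everyone pays the average expected cost, so low-risk members
# subsidise high-risk members.
total_cost = sum(expected_cost[g] * population[g] for g in population)
total_people = sum(population.values())
pooled_premium = total_cost / total_people

print(f"pooled premium for everyone: {pooled_premium:.0f}")
for g in expected_cost:
    # "Segment of one" premium: each group pays its own expected cost.
    print(f"{g}: individual premium {expected_cost[g]:.0f} "
          f"(vs {pooled_premium:.0f} pooled)")
# Individualised pricing is actuarially "fair", but the high-risk group's premium
# jumps from the pooled figure to its own expected cost: exclusion by price.
```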
[1] https://www.sciencedirect.com/science/article/pii/S2405844018339586
[2] https://www.reuters.com/article/us-amazon-com-jobs-automation-insight/amazon-scraps-secret-ai-recruiting-tool-that-showed-bias-against-women-idUSKCN1MK08G
[4] https://qz.com/1427621/companies-are-on-the-hook-if-their-hiring-algorithms-are-biased/
[5] https://www.wsj.com/articles/on-hold-for-45-minutes-it-might-be-your-secret-customer-score-1541084656
[6] Inside the Black Box: A Simple Guide to Quantitative and High Frequency Trading, Rishi K. Narang
[7] https://www.goodreads.com/book/show/33197482-creditworthy