Model Innovations, Only a Secondary Concern
Data Cartels: Artificial Intelligence Information Heists V
More data would simply be more data without improvements in modelling, and modelling can be improved through model innovation and gains in computer processing efficiency. The data frenzy we are witnessing now is an outgrowth of the collection and storage of ever more data, spurred on by increased interconnectivity and by improvements in processing that finally allow us to fully exploit modelling innovations that have been around for some time. Like all progress, there is a feedback loop: the flood of data has also led to new model innovations, such as the ability to process new types of data, which in turn increases the demand for new data. We are currently in a position where all three developments, data collection, model innovation, and computer processing improvements, shape the future of machine learning. A good example is OpenAI's GPT-3 language model, released a year after GPT-2, which draws on a larger share of the text available on the internet, a newly developed language model, and improved hardware. At present, the difficulty of achieving improvements also increases from left to right in the aforementioned list.
The fastest improvements are those gained by simply adding more data, and the world is awash with financial firms looking for quick solutions. Machine learning, as it has come to be called in recent years, has been around for at least half a century. Early machine learning models did not need many datapoints to work. These early models are referred to as parametric models, and they have performed fairly well over the years. In 1632 Galileo Galilei set the theory of errors in motion while trying to predict planetary movements; he essentially wanted a method for drawing the best line through a few datapoints. What was needed was a robust formal method for fitting such a line, and in 1805 and 1809 Legendre and Gauss published a neat solution known as the least squares method. Within a few decades, the model was adopted by actuaries to price insurance premiums. Further innovations in the 1890s led to multiple regression, which allowed additional data, in the form of features, variables, or attributes, to be included in the least squares linear regression model. Each model revolution produced a model able to incorporate larger datasets more efficiently. In finance, multiple regression was used to develop the arbitrage theory of capital asset pricing in 1973, whereas the simple regression model had been used to develop the capital asset pricing model about a decade earlier.
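To make the idea concrete, the minimal sketch below fits a straight line by least squares and then repeats the exercise with an extra column of data, which is essentially all multiple regression adds. The use of numpy and the invented numbers are illustrative assumptions, not anything prescribed by the history above.

```python
# Minimal illustration of ordinary least squares: first one attribute
# (a straight line), then several attributes (multiple regression).
# The numbers are invented purely for illustration.
import numpy as np

# One attribute: fit y = a + b*x by least squares.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
X = np.column_stack([np.ones_like(x), x])        # add an intercept column
coef, *_ = np.linalg.lstsq(X, y, rcond=None)     # solves min ||Xb - y||^2
print("intercept, slope:", coef)

# Several attributes: the same call handles multiple regression, the
# model revolution that let analysts add more columns of data.
X_multi = np.column_stack([np.ones_like(x), x, x**2])  # a second, derived feature
coef_multi, *_ = np.linalg.lstsq(X_multi, y, rcond=None)
print("multiple regression coefficients:", coef_multi)
```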
A further innovation had appeared earlier, in 1943: ridge regression, which allowed modellers to add many more attributes to a prediction problem because the method penalises and shrinks coefficients, keeping the model stable even when attributes are numerous, correlated, or only weakly useful. The method was introduced to economists in a 1978 book by Phoebus Dhrymes. This was a boon to analysts performing supervised regression tasks. Those performing supervised classification tasks also had a model of their own, logistic regression, popularised by Gaddum in 1933 and Bliss in 1934. As early as 1954, Farrell used it to study the ownership of cars of different vintages as a function of household income. The multidimensional, or multivariate, study of data became increasingly popular in the mid-1950s and gave rise to many of the decomposition techniques we still use so frequently, such as principal component analysis and linear discriminant analysis, both of which can reduce large datasets to much smaller ones through statistical compression. So, since the 1950s, we have not only had models that can deal with more data; large datasets could also be compressed into fewer columns while maintaining a similar level of accuracy. In many respects, the models developed up until the 1960s are still the models being used today.
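The two ideas in this passage, shrinking coefficients so that many attributes can be added safely and compressing a wide dataset into a few columns, can be sketched in a handful of lines. scikit-learn and the synthetic data below are assumptions made purely for illustration; the text does not prescribe a particular library.

```python
# Ridge regression copes with many (possibly redundant) attributes by
# shrinking coefficients; PCA compresses a wide dataset into a few columns.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50))                  # 200 rows, 50 attributes
y = X[:, :3] @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.1, size=200)

# Ridge: the penalty keeps the fit well behaved even though only a few
# of the 50 attributes actually matter.
ridge = Ridge(alpha=1.0).fit(X, y)

# PCA: compress the 50 columns down to 5 while retaining most of the variance.
pca = PCA(n_components=5)
X_small = pca.fit_transform(X)
print("explained variance retained:", pca.explained_variance_ratio_.sum())
print("compressed shape:", X_small.shape)
```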
The next innovation that seriously changed the amount of data that could be used in models was the development of nonparametric, or data-based, methods. In 1957 Rosenblatt developed the perceptron, viewed by many as the first neural network. In 1963 Morgan and Sonquist developed the Automatic Interaction Detector, the first supervised decision-tree method. Four years later, in 1967, Cover and Hart developed the famous nearest neighbour algorithm, and a few decades later, in 1990, Schapire developed boosting, the first ensemble method and a favoured technique in data science competitions such as Kaggle. Developments in computing architecture and software from 1957 onwards have driven the proliferation of these models into their modern-day forms. Within a few centuries we have gone from drawing lines through the stars to automating a modern derivatives operation with neural networks and reinforcement learning models.
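As a rough illustration of how accessible these nonparametric families have become, the sketch below fits all four of them, a perceptron, a decision tree, nearest neighbours, and a boosting ensemble, to a toy classification task. scikit-learn and the generated data are assumptions for illustration only.

```python
# The four nonparametric families mentioned above, in roughly the order
# they appeared, applied to a synthetic classification problem.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import Perceptron
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import AdaBoostClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

models = {
    "perceptron (1957)": Perceptron(),
    "decision tree (1963)": DecisionTreeClassifier(max_depth=3),
    "nearest neighbours (1967)": KNeighborsClassifier(n_neighbors=5),
    "boosting": AdaBoostClassifier(),  # a descendant of Schapire's 1990 boosting idea
}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    print(name, round(model.score(X_te, y_te), 3))  # held-out accuracy
```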
Improvements in processing power were instrumental, allowing the average citizen scientist to experiment with these models and corporations to apply them to their datasets at little cost. These improvements have also prompted the release of software packages and solutions. Although ridge regression was formalised in the 1940s, no statistical package implemented it until 1985. Even logistic regression needed a packaged routine for maximum likelihood estimation, and the first computer packages able to perform this routine were only released in 1977. The extent to which these models are used today is well captured by the following example: the French investment firm Natixis runs more than three million simulations every night using unsupervised learning methods to establish new patterns of connection between assets.
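To give a sense of how far packaged software has come, what once required a dedicated maximum likelihood routine is now a few lines of open-source code. The sketch below fits a logistic regression to invented car-ownership data in the spirit of Farrell's study; statsmodels and the simulated numbers are illustrative assumptions.

```python
# Fitting a logistic regression by maximum likelihood, the routine that
# first became available in packaged form in 1977. Data are simulated.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
income = rng.normal(50, 15, size=300)              # hypothetical household income
prob = 1 / (1 + np.exp(-0.1 * (income - 50)))      # true logistic relationship
owns_car = rng.binomial(1, prob)                   # 1 = owns a car, 0 = does not

X = sm.add_constant(income)                        # intercept plus income
result = sm.Logit(owns_car, X).fit(disp=False)     # maximum likelihood estimation
print(result.params)                               # estimated intercept and slope
```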