Top Paper Highlight
The Anatomy of Machine Learning-Based Portfolio Performance
This financial economics paper focuses on improving the understanding of how various firm characteristics and machine learning methods contribute to successful stock return forecasts.
Background: The paper addresses a gap in the understanding of financial strategies that use machine learning to forecast stock returns. While these strategies have been effective in generating profits, particularly through long-short portfolios (buying stocks with high forecasts and selling those with low forecasts), the specific reasons for their success remain unclear.
Methodology - Shapley Values: The authors propose a methodology based on Shapley values, a concept from cooperative game theory. Shapley values are used to fairly allocate the contributions of individual predictors (or groups of predictors) in prediction models. This approach allows for a detailed understanding of how different factors contribute to portfolio performance.
Shapley-based Portfolio Performance Contribution (SPPC): The paper introduces the SPPC method, which is versatile and can be applied across various prediction models, strategies, and performance metrics. Unlike conventional model interpretation tools, SPPC directly measures the impact of predictors on portfolio performance.
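To make the allocation idea concrete, here is a minimal sketch of exact Shapley values computed over predictor groups for a portfolio performance metric. It is a generic illustration, not the authors' SPPC implementation: the function name `shapley_contributions`, the three group labels used, and every Sharpe ratio in `TOY_SHARPE` are hypothetical, and the summary does not say how the paper evaluates a portfolio when a group is left out.

```python
from itertools import combinations
from math import factorial


def shapley_contributions(groups, value):
    """Exact Shapley value of each group for a coalition value function.

    `value` maps a frozenset of group names to a performance number
    (for example, the Sharpe ratio of the resulting portfolio).
    """
    n = len(groups)
    result = {}
    for g in groups:
        others = [h for h in groups if h != g]
        total = 0.0
        for k in range(n):
            # Probability weight of a coalition of size k that excludes g.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for coalition in combinations(others, k):
                s = frozenset(coalition)
                # Marginal gain in performance from adding group g.
                total += weight * (value(s | {g}) - value(s))
        result[g] = total
    return result


# Hypothetical toy numbers: portfolio Sharpe ratio when only the listed
# predictor groups are available to the forecasting model.
TOY_SHARPE = {
    frozenset(): 0.0,
    frozenset({"Risk"}): 0.6,
    frozenset({"Momentum"}): 0.4,
    frozenset({"Earnings"}): 0.3,
    frozenset({"Risk", "Momentum"}): 0.9,
    frozenset({"Risk", "Earnings"}): 0.8,
    frozenset({"Momentum", "Earnings"}): 0.6,
    frozenset({"Risk", "Momentum", "Earnings"}): 1.2,
}


def toy_value(coalition):
    """Look up the (made-up) Sharpe ratio for a set of predictor groups."""
    return TOY_SHARPE[frozenset(coalition)]


contributions = shapley_contributions(["Risk", "Momentum", "Earnings"], toy_value)
for name, phi in contributions.items():
    print(f"{name}: {phi:.3f}")
```

By construction the contributions sum to the full-model metric minus the empty-model metric, which is what makes the allocation "fair". Exact enumeration grows exponentially with the number of groups, so with many groups a sampling approximation would likely be needed.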
Empirical Application: The methodology is applied in a comprehensive study using the XGBoost algorithm and 207 firm characteristics to forecast individual stock returns from 1973 to 2021. The long-short portfolio based on these forecasts showed impressive performance, with high Sharpe and Calmar ratios and significant alphas.
Analyzing Predictor Contributions: The authors group the firm characteristics into 20 categories based on economic concepts and use SPPC to estimate their contributions to portfolio performance. Key groups like Risk, Earnings, Seasonal momentum, and Momentum were found to play significant roles in the portfolio's success.
Insights and Implications: The SPPC method not only quantifies the contribution of different predictor groups but also offers insights into the changing dynamics over time. For example, it was observed that the performance of the XGBoost portfolio declined after 2002, partly due to changing contributions from the Risk and Momentum groups.
Conclusion: SPPC is presented as a valuable tool in the financial machine learning pipeline, providing transparency and economic relevance to the results.
Philippe Goulet Coulombe was nice enough to provide a writing sample that I summarise above.
SSRN
Recently Published
Quantitative
Learning Algorithms and Spoofing: The paper presents a model to test if a trading algorithm can manipulate the limit order book, concluding that market conditions can allow such manipulation. (2023-11-21, shares: 18.0)
Machine Learning for Path-Dependent Contracts: The study introduces machine learning algorithms for pricing certain financial products and a new method for calculating sensitivities using Chebyshev interpolation techniques. (2023-11-28, shares: 5.0)
Reinforcement Learning for Quadratic Hedging: The study compares Reinforcement Learning and Deep Trajectory-based Stochastic Optimal Control in hedging a European call option under different market conditions. (2023-11-20, shares: 3.0)
Shared Causal Manifolds for Risk Management: A finance webinar presented a machine learning-based framework for optimizing portfolio sensitivities and predicting future positions. (2023-11-18, shares: 2.0)
Machine Learning for Asset Management: The paper discusses the use of machine learning in finance, its history, and its applications in asset management (looks like it was written with ChatGPT). (2023-11-20, shares: 6.0)
Financial
Epistemic Limits of Empirical Finance: Causal Reductionism: The research criticizes the use of one-way causation in capital market studies, suggesting that current quantitative finance tools may only be suitable for after-the-fact causal inference. (2023-11-28, shares: 31.0)
Mispricing and Factor Models: The paper introduces a new measure of expected mispricing at the firm level using machine learning, which outperforms existing methods in predicting future mispricing. (2023-11-20, shares: 9.0)
Disagreement of Disagreement: Investor Disagreement Proxies Assessment: The study introduces a new comprehensive framework to measure investor disagreement, unveiling a unique nonlinear composite measure that predicts returns better than existing measures. (2023-11-28, shares: 39.0)
Stochastic Volatility Models for Derivative Valuation: The article explores the use of stochastic volatility models in valuing derivative securities, highlighting the effectiveness of affine Heston and lognormal models. (2023-11-27, shares: 6.0)
Recently Updated
Quantitative
Multi-Task Learning in Financial NLP: Multi-task learning in financial NLP can be improved by considering skill diversity, task relatedness, and aggregation size. (2023-05-25, shares: 50.0)
Firm Fixed Effects Models Revamped with Machine Learning: Using firm fixed effects in corporate finance research may negate the impact of persistent economic factors, but advanced machine learning can provide alternative insights. (2022-06-27, shares: 2.0)
Dynamic Portfolio Choice with Transaction Costs using Machine Learning: A new computational framework is introduced for solving dynamic portfolio choice problems, using Gaussian process regression and Bayesian active learning, suggesting that more assets can mitigate some illiquidity. (2023-08-18, shares: 2.0)
AI Trader: Reinforcement Learning for Portfolio Management: AI Trader, a model based on reinforcement learning, shows superior risk-gain performance in the Chinese market by incorporating industry effects. (2023-08-01, shares: 2.0)
Financial
Asset Pricing - Deep Structural Model: The paper introduces a new modelling framework using machine learning to estimate structural model parameters, showing its superior predictive power and its ability to offer insights into systematic risk compensation and firm leverage. (2023-11-10, shares: 2.0)
Insider Trading Detection with Big Data and Machine Learning: A machine learning approach using account-level data can effectively identify suspicious insider trading, with these insiders earning higher returns and often using multiple accounts to trade around major information events. (2022-10-17, shares: 2.0)
Frictions and Constraints in Intermediary Markets: Arbitrage activity in various markets is affected by funding and balance sheet segmentation, making it sensitive to localized funding and individual balance sheet shocks. (2021-11-10, shares: 777.0)
ArXiv
Finance
Generative ML for Multivariate Equity Returns: The study uses machine learning techniques to model the returns of S&P 500 equities. (2023-11-21, shares: 9)
High Order Universal Portfolios: The research explores the properties of the Cover universal portfolio and its enhancements as a new synthetic asset. (2023-11-22, shares: 7)
Optimal Tracking Portfolio in Incomplete Markets: The paper uses capital injection to solve an optimal tracking portfolio problem in incomplete market models, showcasing the q-learning algorithm's performance. (2023-11-24, shares: 6)
Narratives from GPT Networks of News: The study uses natural language processing and network analysis to examine news content over time, linking the results to financial market dislocations. (2023-11-24, shares: 6)
FinMem: Performance-Enhanced Large Language Model Trading Agent: The research presents FinMem, a Large Language Model-based agent for financial decision-making, demonstrating its superior trading performance in stocks and funds. (2023-11-23, shares: 6)
Machine Learning for Path-Dependent Contracts: The paper presents a comparison of machine learning algorithms for pricing financial products with early-termination features and introduces a new method for calculating sensitivities. (2023-11-28, shares: 4)
Enhanced Data Generation for Asset Allocation: The study presents a method for creating synthetic datasets to evaluate asset allocation methods and build portfolios within the fixed income universe. (2023-11-27, shares: 4)
Miscellaneous
Approaches for Control Tasks in Curriculum and Imitation Learning: The study examines the effectiveness of curriculum and imitation learning in managing complex time-series data, suggesting the former improves performance while the latter should be used with caution. (2023-11-22, shares: 5)
Hilbert Transforms for PDEs with Mixed Boundary Conditions: Exact Solutions: The research introduces new techniques for solving specific boundary value problems in linear parabolic Partial Differential Equations using odd and even Hilbert transforms. (2023-11-20, shares: 4)
Exploring Analytically Tractable Trial Functions: Quantum-inspired Nonlinear Galerkin Ansatz for High-dimensional PDEs: The research investigates the use of Neural Galerkin methods in solving Hamilton-Jacobi-Bellman partial differential equations, offering trial functions with solvable evolution equations. (2023-11-20, shares: 4)
Crypto & Blockchain
Cryptocurrency Price Forecasting with Deep Learning: The study uses Machine Learning and Natural Language Processing to predict Bitcoin and Ethereum prices using Twitter and Reddit data, improving forecasting accuracy. (2023-11-23, shares: 68)
Deep State-Space Model for Crypto Price Prediction: The research introduces a deep state-space model for predicting cryptocurrency prices, which is more accurate than current and traditional dynamical models. (2023-11-21, shares: 8)
Process Mining and Sequence Clustering Application: The paper uses process mining to address bottlenecks in industrial weaving processes, improving production flow, worker performance, product quality, and lead time. (2023-11-26, shares: 4)
Heuristics for CoinJoin Transaction Detection on Bitcoin Blockchain: The research explores CoinJoin, a method that combines multiple transactions into one for increased privacy in Bitcoin transactions, and develops new ways to identify CoinJoin transactions on the blockchain. (2023-11-21, shares: 4)
Historical Trending
Handling Missing Values in ML Portfolios: The study shows that using cross-sectional means for simple imputation is effective in dealing with missing values in machine learning-constructed portfolios, as complex imputations can cause underperformance due to estimation noise. (2022-07-21, shares: 41)
Comparison of RL and Deep Trajectory-Based Hedging: The research compares the effectiveness of Reinforcement Learning and Deep Trajectory-based Stochastic Optimal Control as data-driven hedging strategies in a simulated environment, offering guidelines for creating autonomous hedging agents. (2023-02-16, shares: 35)
StockEmotions: Investor Sentiment: The article introduces StockEmotions, a new dataset for detecting emotions in the stock market from StockTwits, with the DistilBERT and Temporal Attention LSTM models showing the best results. (2023-01-23, shares: 17)
Online Utility-Based Shortfall Risk Estimation and Optimization: The paper proposes a stochastic gradient descent based algorithm for Utility-Based Shortfall Risk optimization, providing non-asymptotic bounds on its convergence, using stochastic approximation-based estimations. (2021-11-16, shares: 14)
Theory and Stock Return Predictions: The research suggests that the predictability of cross-sectional return predictors decreases by half in post-sample scenarios, indicating that theory doesn't improve prediction and peer-review often misinterprets mispricing as risk. (2022-12-20, shares: 48)
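The cross-sectional mean imputation mentioned in the Handling Missing Values entry above can be sketched in a few lines. This is only an illustration under stated assumptions: the row layout, the column name `book_to_market`, the helper `impute_cross_sectional_mean`, and the sample numbers are all hypothetical and are not taken from the study.

```python
from collections import defaultdict


def impute_cross_sectional_mean(rows, column):
    """Fill missing values (None) in `column` with the mean of that
    column across all stocks observed on the same date."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        if row[column] is not None:
            sums[row["date"]] += row[column]
            counts[row["date"]] += 1

    filled = []
    for row in rows:
        new_row = dict(row)  # copy so the input rows are left untouched
        if new_row[column] is None and counts[row["date"]] > 0:
            new_row[column] = sums[row["date"]] / counts[row["date"]]
        filled.append(new_row)
    return filled


# Hypothetical panel: one characteristic, two dates, three stocks.
panel = [
    {"date": "2021-01", "stock": "A", "book_to_market": 0.5},
    {"date": "2021-01", "stock": "B", "book_to_market": 1.5},
    {"date": "2021-01", "stock": "C", "book_to_market": None},
    {"date": "2021-02", "stock": "A", "book_to_market": 2.0},
    {"date": "2021-02", "stock": "B", "book_to_market": None},
    {"date": "2021-02", "stock": "C", "book_to_market": 4.0},
]

imputed = impute_cross_sectional_mean(panel, "book_to_market")
for row in imputed:
    print(row["date"], row["stock"], row["book_to_market"])
```

The mean uses only information from the same date, so the fill does not look ahead in time. If characteristics are rank-standardised within each date beforehand, this amounts to filling with a constant.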
RePec
Finance
Optimization models for liquidity-constrained index tracking: The article discusses two models for integrating liquidity constraints in index tracking portfolio optimization, revealing higher liquidity and tracking errors in such portfolios. (2023-11-29, shares: 34.0)
Factor Models in Bond Portfolios: The chapter highlights the use of factor models in understanding bond portfolio risk and return, stressing the importance of model specification. (2023-11-29, shares: 16.0)
Yield Curve Attribution for Global Bonds: The chapter outlines a yield curve-based method for attributing performance in global bond portfolios, emphasizing the need for accurate data and pricing. (2023-11-29, shares: 16.0)
Australian superannuation fund asset allocation: The paper analyzes the asset class switching behavior of Australian superannuation funds using a Markov Regime Switching framework, indicating smaller funds are more aggressive and larger ones are more conservative. (2023-11-29, shares: 25.0)
Machine Learning
Real Estate Appraisals: ML vs Traditional Methods: Research indicates that XGBoost, a machine learning method, offers the most precise estimates in automated property valuations, suggesting a need for regulators to use multiple methods. (2023-11-29, shares: 31.0)
Bitcoin Futures Forecasting with ML: Machine learning algorithms have proven to be more effective than traditional models in predicting Bitcoin futures prices, maintaining an average accuracy rate above 50%. (2023-11-29, shares: 18.0)
Sentiment Difficulty in ABSA: A study investigates sentence difficulty in aspect-based sentiment analysis, using different learning models and text representations, and identifies the hardest sentences using a mix of classifiers. (2023-11-29, shares: 27.0)
GitHub
Finance
Numerical Methods for Math Finance: The article provides lecture materials on numerical methods in mathematical finance. (2020-04-11, shares: 21.0)
Koopa: Learning Time Series Dynamics: The repository releases Koopa, a method for learning non-stationary time series dynamics with Koopman predictors. (2023-08-22, shares: 83.0)
ML Code Implementation: The article explains machine learning algorithms mathematically and provides Python code examples. (2019-02-13, shares: 1459.0)
Friends Don't Let Friends Visualize Bad Data: The article discusses ineffective data visualization techniques and explains why they are not advisable. (2022-05-24, shares: 3197.0)
UnbiasedGBM: Gradient Boosting Decision Tree Repository: The repository introduces an unbiased feature importance measure for gradient boosting decision trees. (2023-05-14, shares: 19.0)
Quantitative
Deep Learning for Causal Inference: Bernard Koch of UCLA provides a tutorial on the integration of causal inference, econometrics, and machine learning, with a focus on neural networks. (2023-11-19, shares: 7)
Machine Learning for Portfolio Returns: The use of machine learning in creating maximally predictable portfolios (MPP) greatly impacts return predictability, particularly in portfolios using a Kelly criterion style strategy. (2023-11-17, shares: 5)
Empirical Bayes for Out-of-Sample Returns: A recent study introduces the use of empirical Bayes (EB) for analyzing out-of-sample returns in 70,000 long-short trading strategies. (2023-11-28, shares: 1)
Options and Strategies: The paper delves into 0DTE options and various strategies related to them. (2023-11-27, shares: 1)
AI's Impact on Financial Markets: The article examines the revolutionary effects of Generative AI on financial markets and services, focusing on regulatory aspects of its implementation. (2023-11-27, shares: 1)
Equity Returns: The paper explores the predictability of signs in equity returns, proposing a long/short strategy based on future positive returns as a more efficient and safer alternative to the momentum strategy. (2023-11-19, shares: 1)
Miscellaneous
TSMixer: MLP for Time Series Forecasting: Google Research has created TSMixer, a new time series forecasting tool using an all-MLP architecture, with Python code accessible for users. (2023-11-17, shares: 1)
Decoding Intentions: AI and Signals: A recent AI policy paper explores the application of artificial intelligence in interpreting intentions and costly signals. (2023-11-22, shares: 0)
Code Generation Evaluation: LLMs: An analysis of how well large language models perform at code generation. (2023-11-19, shares: 0)
Trending
Finance Workshop on Transfer Learning Applications: The ICAIF'23 is organizing a workshop on Transfer Learning and its Applications in Finance, focusing on its current research and use in resolving financial issues. (2023-11-27, shares: 1.0)
Wiener Process: Key to Stochastic Processes: The Wiener process, a continuous-time stochastic process named after Norbert Wiener, is significant in quantitative finance for modeling asset prices and developing option pricing models. (2023-11-27, shares: 1.0)
Transition Density Function for Local Volatility Models: Local volatility models, used to determine the transition probability density for stock prices, are based on equations where the stock's volatility depends on its current price. (2023-11-27, shares: 1.0)
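The Wiener process and local volatility entries above fit together in a short simulation sketch: an Euler-Maruyama discretisation of dS = sigma(S) * S * dW, driven by Wiener increments. The CEV-style function `local_vol`, its parameters, the zero-drift assumption, and the helper `simulate_path` are illustrative assumptions and are not taken from the linked posts.

```python
import math
import random


def local_vol(s, sigma0=0.2, s_ref=100.0, beta=0.5):
    """Hypothetical CEV-style local volatility:
    sigma(S) = sigma0 * (S / s_ref) ** (beta - 1)."""
    return sigma0 * (s / s_ref) ** (beta - 1.0)


def simulate_path(s0, horizon, n_steps, rng, vol=local_vol):
    """Euler-Maruyama path of dS = vol(S) * S * dW (zero drift assumed)."""
    dt = horizon / n_steps
    path = [s0]
    for _ in range(n_steps):
        s = path[-1]
        # A Wiener increment over dt is normal with mean 0 and variance dt.
        dw = rng.gauss(0.0, math.sqrt(dt))
        s_next = s + vol(s) * s * dw
        # Floor at a tiny positive number so the power in vol() stays defined.
        path.append(max(s_next, 1e-8))
    return path


rng = random.Random(42)
terminal = [simulate_path(100.0, 1.0, 252, rng)[-1] for _ in range(2000)]
mean_terminal = sum(terminal) / len(terminal)
print(f"mean terminal price over {len(terminal)} paths: {mean_terminal:.2f}")
```

A histogram of `terminal` gives a rough Monte Carlo picture of the transition density from the starting price to the horizon. A finer time step and more paths would tighten that approximation.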
Informative
Quantitative Risk and Portfolio Management Textbook Published: Cambridge University Press has published a new textbook, Quantitative Risk and Portfolio Management, designed to teach students the theory and practice of mathematical finance. (2023-11-27, shares: 1.0)
Webinar: Refining GenAI with Random Matrix Theory: MoroccoAI is conducting a webinar on refining GenAI using Random Matrix Theory, featuring Dr. Mohamed El Amine Seddik, an AI Lead Researcher at Technology Innovation Institute. (2023-11-27, shares: 1.0)
Harvesting Volatility Risk Premium Strategies: The article explores long-term trading strategies to leverage the volatility risk premium in financial markets, highlighting its unique features and suggesting ways to benefit from it. (2023-11-27, shares: 1.0)
Videos
Quantitative
Spread Options Calibration: Prof. Matthew Dixon spoke about calibrating spread options using a seasonal commodity forward model at the first Thalesian Talk. (2023-11-17, shares: 1.0)
Yield Farming Analysis: Dr. Thomas Li presented a mathematical model to analyze the economic dynamics of yield farming using onchain data from decentralized exchanges. (2023-11-27, shares: 0.0)
Generative AI Potential: DeepMind is exploring how machine learning and generative AI can speed up software development and drive major transformations. (2023-11-21, shares: 89.0)
Foreign Exchange Introduction: Saeed Amen discussed the role of foreign exchange as an asset class and the future of alternative data and machine learning in finance at the second Thalesian talk. (2023-11-17, shares: 1.0)
Decision Tree: Splitting and Predictions: The video explains the process of decision trees in making predictions, emphasizing the need for a computer due to the high volume of calculations. (2023-11-19, shares: 10.0)
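To show why decision tree splitting is calculation-heavy, here is a minimal sketch of the search for a single split on one feature, scored by the sum of squared errors. The video may well use a different criterion, such as Gini impurity for classification. The helper names and the toy data are hypothetical.

```python
def sse(values):
    """Sum of squared errors around the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(xs, ys):
    """Try every midpoint between distinct sorted feature values and
    return (threshold, score, left_mean, right_mean) for the lowest score."""
    pairs = sorted(zip(xs, ys))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i][0] == pairs[i - 1][0]:
            continue  # no threshold can separate identical feature values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        score = sse(left) + sse(right)
        if best is None or score < best[1]:
            best = (threshold, score,
                    sum(left) / len(left), sum(right) / len(right))
    return best


# Hypothetical toy data with an obvious break between x = 3 and x = 10.
xs = [1, 2, 3, 10, 11, 12]
ys = [1.0, 1.0, 1.0, 5.0, 5.0, 5.0]

threshold, score, left_mean, right_mean = best_split(xs, ys)
print(f"split at x <= {threshold}: predict {left_mean} on the left, "
      f"{right_mean} on the right")
```

A full tree repeats this search for every feature at every node, which is where the volume of calculations comes from.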
Quantitative
Time Series Models' Forecasting Limitations: The author is researching the limitations of time series models in financial forecasting due to the complex variables that affect stock prices. (2023-11-23, shares: 29.0)
Book Recommendations for Microstructure Effects Understanding: The author has started working on microstructure signals. (2023-11-24, shares: 54.0)
Lack of Algo Trading Forex Popularity: The author discusses the underutilization of forex trading among algo traders despite its numerous benefits. (2023-11-25, shares: 50.0)
PNL and Sharpe for PM Hiring in Multistrategy Funds: The author is trying to estimate the daily earnings from a trading strategy to attract major funds like Millennium and Citadel. (2023-11-26, shares: 47.0)
Hedge Fund Offices: NY vs London Work Culture and Life: The author is considering the best location for permanent employment. (2023-11-23, shares: 95.0)
Rising
HFT vs Hedge Fund Compensation: The article discusses the pros and cons of working in hedge funds versus prop shops, highlighting potential benefits of the latter. (2023-11-26, shares: 30.0)
Fixed Income Job Outlook: The author expresses concern about the potential lack of job opportunities in fixed income at a buyside firm. (2023-11-16, shares: 27.0)
HFT Traders' Daily Routine: The article questions the role of quant high-frequency traders, given that many tasks are handled by other teams. (2023-11-26, shares: 135.0)
Quant Workspace: The author explores the potential workspace environments for a quant, considering various office layouts. (2023-11-18, shares: 68.0)
Advanced Course Selection: Bayesian Stats, Machine Learning, or PDEs: The article details a statistics student's decision-making process between three advanced courses: Bayesian statistics, Machine learning, and PDEs. (2023-11-25, shares: 54.0)
Papers with Code
Trending
Enhancing Diffusion Model Sample Quality: Denoising diffusion models (DDMs) are becoming increasingly popular due to their high-quality and diverse generation capabilities. (2023-11-23, shares: 16676.0)
Filtering-based Attention for NLP: The article presents Localized Filtering-based Attention (LFA), a method that incorporates local language dependencies into Attention, and provides the corresponding code on GitHub. (2023-11-29, shares: 307.0)
Realtime Neural Radiance Caching for Path Tracing: The article suggests a novel method for managing dynamic scenes in neural networks by adapting and training the radiance cache during rendering, rather than pretraining, and includes the code on GitHub. (2023-11-21, shares: 172.0)
Accelerating DNN with Approximate Matrix Multiplication: The article explores the importance of MatMul in shifting from traditional High Performance Computing to deep learning. (2023-11-22, shares: 146.0)
Learning to Learn: The model's inner loop is equivalent to linear attention or self-attention, based on the type of learner used. (2023-11-16, shares: 100.0)
ArXiv ML
Recently Published
More is Better: Optimal Overparameterization: The paper offers theoretical support for the idea that larger models, more data, and increased computation enhance performance in random feature regression, a type of model similar to shallow networks. (2023-11-24, shares: 66)
σ-PCA: Unified Neural Model for PCA: The research proposes a unified neural model for PCA as single-layer autoencoders, capable of learning a semi-orthogonal transformation that reduces dimensionality and orders components by variance, without rotational indeterminacy. (2023-11-22, shares: 16)
Optimality in Mean Estimation: The research delves into the mean estimation problem, concluding that no estimator can surpass the sub-Gaussian error rate for any distribution. (2023-11-21, shares: 21)
Efficient High-Dimensional Bandit Learning: The study presents a sample-efficient algorithm for high-dimensional multi-armed contextual bandits with batched feedback, achieving regret bounds similar to those in fully sequential settings with fewer batches. (2023-11-22, shares: 17)
Historical Trending
Generative Diffusion Models: Memory Mechanisms: Research indicates that generative diffusion models, a machine learning method, can be seen as energy-based models and can help understand how long-term memory is formed, connecting creativity and memory recall. (2023-09-29, shares: 703)
Banach-Tarski Embeddings: Interpretable Transformers: A novel method for embedding recursive data structures into high-dimensional vectors has been developed, offering an interpretable model for a transformer's latent state vectors and enabling computations without decoding. (2023-11-15, shares: 79)
VeriCompress: Streamlining Verified Robust Compressed Neural Networks: VeriCompress, a new tool that automates the search and training of compressed models with robustness guarantees, has been launched, providing faster training, improved accuracy, and reduced memory and inference time for deployment on resource-limited platforms. (2022-11-17, shares: 22)
Topological Properties of Basins in Width Bounded Neural Networks: A study has shown that autoencoders trained with standard SGD methods create bounded basins of attraction around their training data, answering a previous question and providing insight into why certain neural network functions are not dense in continuous function spaces. (2020-11-10, shares: 22)