Research Methodology
Overview
PortfolioBench is a rigorous research benchmark designed to answer a fundamental question: Can artificial intelligence consistently outperform traditional market benchmarks? By testing the world's most advanced language models in a real-world stock selection scenario, we aim to provide empirical evidence about AI's capabilities in financial decision-making.
The Prompt
Each AI model receives an identical prompt that simulates the role of an elite hedge fund manager. The prompt provides structured access to fundamental financial data for all publicly traded US companies.
# PortfolioBench: The Large Language Model Stock Selection Benchmark
You are participating in PortfolioBench, a competitive research benchmark to test the ability of Large Language Models (LLMs) to select US stocks that will outperform the market. Imagine yourself as a highly competitive, elite world-class professional hedge fund manager.
Your Objective:
From the dataset provided, select a single US stock that you believe will achieve the highest percentage increase in price over the next year.
Your goal is to outperform the S&P 500 and all other LLMs. Your result will be directly compared against both the market and other models.
The highest alpha wins.
Data Provided:
You have structured access to the following metrics for all 3,704 publicly traded US companies:
* Market capitalization (marketcap)
* Stock price in USD (price_usd)
* Total earnings from the last 4 quarters (earnings_ttm)
* Total revenue from the last 4 quarters (revenue_ttm)
* Total Number of Employees (employees_count)
* Price-to-earnings ratio from the last 4 quarters (pe_ratio_ttm)
* Dividend yield (%) (dividend_yield_ttm)
* Operating margin from the last 4 quarters (operating_margin_ttm)
* Total assets (total_assets)
* Total liabilities (total_liabilities)
* Total debt (total_debt)
Output Format:
📶BUY-$[TICKER] ([Company Name]) and hold till [Future Date].
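The output format that closes the prompt is regular enough to parse mechanically. The following is a minimal sketch of how a response line could be parsed, not the benchmark's actual ingestion code. The names `Pick`, `PICK_PATTERN`, and `parse_pick` are hypothetical, the ticker and company in the usage note are placeholders, and the date is kept as raw text because the prompt does not fix a date format.

```python
import re
from typing import NamedTuple, Optional


class Pick(NamedTuple):
    ticker: str
    company: str
    hold_until: str  # raw text; the prompt does not specify a date format


# Assumed shape: BUY-$TICKER (Company Name) and hold till <date>.
# Any leading emoji or surrounding text is ignored by using search().
PICK_PATTERN = re.compile(
    r"BUY-\$(?P<ticker>[A-Z][A-Z0-9.\-]*)\s*"
    r"\((?P<company>[^)]+)\)\s*"
    r"and hold till\s+(?P<date>[^.\n]+)"
)


def parse_pick(response: str) -> Optional[Pick]:
    """Return the first pick found in a model response, or None if absent."""
    match = PICK_PATTERN.search(response)
    if match is None:
        return None
    return Pick(
        ticker=match["ticker"],
        company=match["company"].strip(),
        hold_until=match["date"].strip(),
    )
```

For example, `parse_pick("📶BUY-$ABC (Example Corp) and hold till December 31, 2025.")` yields a `Pick` with ticker `ABC`. This sketch does not handle company names that themselves contain parentheses.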
Timeline & Cycles
PortfolioBench operates on annual cycles, with each cycle beginning on January 1st and ending on December 31st. This provides a full year of market performance data for analysis.
January 1st
New cycle begins. All AI models make their stock selections based on current market data.
Throughout the Year
Performance is tracked daily, the leaderboard is updated in real time, and metrics are monitored continuously.
December 31st
Cycle ends. Final results are archived and a comprehensive analysis is published.
Next January 1st
New cycle begins with fresh picks from all models.
Participating Models
The 2025 cycle includes 30 AI models from 7 major AI research labs, representing different architectures, training approaches, and capabilities.
Metrics & Calculations
Alpha (α)
α = Rp - [Rf + β(Rm - Rf)]
Where Rp is portfolio return, Rf is risk-free rate, β is beta, and Rm is market return.
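As a worked illustration of the alpha formula, the sketch below plugs in hypothetical figures. The function name `capm_alpha` and all input values are assumptions made for the example and are not benchmark results.

```python
def capm_alpha(rp: float, rf: float, beta: float, rm: float) -> float:
    """Return alpha = Rp - [Rf + beta * (Rm - Rf)], with returns as decimals."""
    expected_return = rf + beta * (rm - rf)  # return implied by market exposure
    return rp - expected_return


# Hypothetical inputs, chosen only to illustrate the arithmetic:
# the pick returned 25%, the risk-free rate was 4%, beta was 1.2,
# and the market returned 10%.
alpha = capm_alpha(rp=0.25, rf=0.04, beta=1.2, rm=0.10)
print(f"alpha = {alpha:.3f}")  # alpha = 0.138
```

Here the expected return is 0.04 + 1.2 × 0.06 = 0.112, so the pick's alpha is 0.25 − 0.112 = 0.138, or 13.8 percentage points.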
Beta (β)
β = Cov(Rp, Rm) / Var(Rm)
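Measures how sensitive the stock's returns are to market returns. A β of 1 means the stock tends to move in step with the market; a β above 1 means it tends to amplify market moves.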
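The beta formula can be estimated from two aligned series of periodic returns. This is a minimal sketch using only the standard library; the function name `estimate_beta` and the choice of return frequency are assumptions, since the source does not state which sampling interval is used.

```python
from statistics import mean
from typing import Sequence


def estimate_beta(stock_returns: Sequence[float],
                  market_returns: Sequence[float]) -> float:
    """Estimate beta = Cov(Rp, Rm) / Var(Rm) from aligned return series."""
    if len(stock_returns) != len(market_returns) or len(stock_returns) < 2:
        raise ValueError("need two equal-length series with at least 2 points")
    mean_s = mean(stock_returns)
    mean_m = mean(market_returns)
    # Sums of cross-deviations and squared deviations; the 1/(n-1)
    # normalisation appears in both covariance and variance, so it cancels.
    cov_sum = sum((s - mean_s) * (m - mean_m)
                  for s, m in zip(stock_returns, market_returns))
    var_sum = sum((m - mean_m) ** 2 for m in market_returns)
    if var_sum == 0:
        raise ValueError("market returns have zero variance")
    return cov_sum / var_sum
```

A stock whose returns are always exactly twice the market's would produce an estimate of 2 under this sketch.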
Sharpe Ratio
Sharpe = (Rp - Rf) / σp
Risk-adjusted return metric where σp is the standard deviation of portfolio returns.
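A per-period Sharpe ratio can be sketched as follows. The function name `sharpe_ratio` is hypothetical, the risk-free rate is assumed constant per period, and no annualisation is applied because the source does not specify one.

```python
from statistics import mean, stdev
from typing import Sequence


def sharpe_ratio(returns: Sequence[float], rf_per_period: float = 0.0) -> float:
    """Return (mean(Rp) - Rf) / stdev(Rp) for a series of periodic returns."""
    sigma_p = stdev(returns)  # sample standard deviation of portfolio returns
    if sigma_p == 0:
        raise ValueError("returns have zero standard deviation")
    excess = [r - rf_per_period for r in returns]
    return mean(excess) / sigma_p
```

Because the ratio divides by volatility, two picks with the same average return can rank differently: the steadier one scores higher.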
Information Ratio
IR = (Rp - Rb) / TE
Where Rb is the benchmark return and TE is tracking error (the standard deviation of Rp - Rb), measuring consistency of excess returns.
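The information ratio can be sketched in the same way, taking Rb to be the benchmark's return (the S&P 500 in this benchmark) and computing tracking error as the sample standard deviation of the per-period differences. The function name `information_ratio` is an assumption for illustration.

```python
from statistics import mean, stdev
from typing import Sequence


def information_ratio(portfolio_returns: Sequence[float],
                      benchmark_returns: Sequence[float]) -> float:
    """Return mean(Rp - Rb) / TE, where TE is the stdev of (Rp - Rb)."""
    if len(portfolio_returns) != len(benchmark_returns):
        raise ValueError("return series must have equal length")
    # Active return: how much the pick beat or trailed the benchmark each period.
    active = [p - b for p, b in zip(portfolio_returns, benchmark_returns)]
    tracking_error = stdev(active)
    if tracking_error == 0:
        raise ValueError("tracking error is zero")
    return mean(active) / tracking_error
```

Unlike the Sharpe ratio, which compares against the risk-free rate, this measure compares against the benchmark itself.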
Data Sources
- Market Data: Real-time stock prices from the Yahoo Finance API
- Fundamental Data: Company financials from SEC filings
- Benchmark Data: S&P 500 index performance
- Update Frequency: Prices updated every 60 seconds during market hours
Fair Play & Integrity
Several measures ensure the integrity of the benchmark:
- All models receive identical prompts and data access
- Stock selections are made before the performance period begins
- No model has access to future price information
- All picks are publicly documented and timestamped
- Results are calculated using the same methodology for all participants
Community Participation
The "People's Pick" feature allows website visitors to vote for their preferred stock, creating a crowd-sourced benchmark to compare against AI performance. Votes are limited to one per user per day to prevent manipulation.
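The one-vote-per-user-per-day rule can be illustrated with a small in-memory sketch. This is not the site's implementation: the class `DailyVoteLimiter`, its method names, and the use of a string user ID are all hypothetical, and a real deployment would need persistent storage and a reliable notion of user identity.

```python
from datetime import date
from typing import Dict, Set, Tuple


class DailyVoteLimiter:
    """Accept at most one vote per user per calendar day (in-memory sketch)."""

    def __init__(self) -> None:
        self._seen: Set[Tuple[str, date]] = set()  # (user_id, day) pairs
        self.tallies: Dict[str, int] = {}          # ticker -> vote count

    def cast_vote(self, user_id: str, ticker: str, day: date) -> bool:
        """Record the vote and return True, or return False if already voted."""
        key = (user_id, day)
        if key in self._seen:
            return False
        self._seen.add(key)
        self.tallies[ticker] = self.tallies.get(ticker, 0) + 1
        return True
```

A second vote from the same user on the same day is rejected and leaves the tallies unchanged; the same user can vote again the following day.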
Research Goals
- Determine if AI can consistently generate alpha in stock selection
- Compare different AI architectures and their investment strategies
- Analyze the risk profiles of AI-selected portfolios
- Track improvement in AI financial reasoning over time
- Provide empirical data for AI capability assessment
Limitations & Disclaimers
This research has several important limitations:
- Single stock selection may not reflect full portfolio management capabilities
- One-year timeframe may not capture long-term investment strategies
- Past performance does not guarantee future results
- This is research, not investment advice
- Real-world factors like transaction costs and market impact are not considered