Methodology - PortfolioBench

Overview

PortfolioBench is a rigorous research benchmark designed to answer a fundamental question: Can artificial intelligence consistently outperform traditional market benchmarks? By testing the world's most advanced language models in a real-world stock selection scenario, we aim to provide empirical evidence about AI's capabilities in financial decision-making.

The Prompt

Each AI model receives an identical prompt that casts it in the role of an elite hedge fund manager. The prompt provides structured access to fundamental financial data for the 3,704 publicly traded US companies in the dataset.

# PortfolioBench: The Large Language Model Stock Selection Benchmark

You are participating in PortfolioBench, a competitive research benchmark to test the ability of Large Language Models (LLMs) to select US stocks that will outperform the market.

Imagine yourself as a highly competitive, elite world-class professional hedge fund manager.

Your Objective: From the dataset provided, select a single US stock that you believe will achieve the highest percentage increase in price over the next year. Your goal is to outperform the S&P 500 and all other LLMs. Your result will be directly compared against both the market and other models. The highest alpha wins.

Data Provided: You have structured access to the following metrics for all 3,704 publicly traded US companies:

* Market capitalization (marketcap)
* Stock price in USD (price_usd)
* Total earnings from the last 4 quarters (earnings_ttm)
* Total revenue from the last 4 quarters (revenue_ttm)
* Total Number of Employees (employees_count)
* Price-to-earnings ratio from the last 4 quarters (pe_ratio_ttm)
* Dividend yield (%) (dividend_yield_ttm)
* Operating margin from the last 4 quarters (operating_margin_ttm)
* Total assets (total_assets)
* Total liabilities (total_liabilities)
* Total debt (total_debt)

Output Format: 📶BUY-$[TICKER] ([Company Name]) and hold till [Future Date].
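Because every model answers in the same output format, a reply can be turned into a structured pick mechanically. The following is a minimal sketch of such a parser, not the benchmark's actual code: the names `Pick`, `PICK_RE`, and `parse_pick` are hypothetical, the set of characters allowed in a ticker is an assumption, and the hold date is kept as raw text because the prompt does not fix a date format.

```python
import re
from typing import NamedTuple, Optional


class Pick(NamedTuple):
    ticker: str
    company: str
    hold_until: str  # raw text; the prompt does not prescribe a date format


# Mirrors the template "BUY-$[TICKER] ([Company Name]) and hold till [Future Date]."
# Assumption: tickers are uppercase letters, digits, dots, or hyphens.
PICK_RE = re.compile(
    r"BUY-\$(?P<ticker>[A-Z][A-Z0-9.\-]*)\s+"
    r"\((?P<company>.+?)\)\s+and hold till\s+"
    r"(?P<date>.+?)\.?\s*$"
)


def parse_pick(response: str) -> Optional[Pick]:
    """Return the first line of the response that matches the template, if any."""
    for line in response.splitlines():
        match = PICK_RE.search(line)
        if match:
            return Pick(match["ticker"], match["company"], match["date"])
    return None
```

For example, the made-up reply `📶BUY-$XYZ (Example Corp) and hold till December 31, 2025.` would yield the ticker `XYZ`, the company `Example Corp`, and the hold date `December 31, 2025`. A reply that does not follow the template returns `None`, which a harness could treat as an invalid submission.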

Timeline & Cycles

PortfolioBench operates on annual cycles, with each cycle beginning on January 1st and ending on December 31st. This provides a full year of market performance data for analysis.

January 1st

New cycle begins. All AI models make their stock selections based on current market data.

Throughout the Year

Daily tracking of performance, real-time updates to leaderboard, continuous monitoring of metrics.

December 31st

Cycle ends. Final results are archived and a comprehensive analysis is published.

Next January 1st

New cycle begins with fresh picks from all models.

Participating Models

The 2025 cycle includes 30 AI models from 7 major AI research labs, representing different architectures, training approaches, and capabilities.

OpenAI: 8 models
Google: 6 models
Anthropic: 7 models
xAI: 2 models
DeepSeek: 2 models
Microsoft: 2 models
Perplexity: 3 models

Metrics & Calculations

Alpha (α)

α = R_p - [R_f + β(R_m - R_f)]

Where R_p is portfolio return, R_f is risk-free rate, β is beta, and R_m is market return.
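As a worked illustration of the alpha formula, the sketch below evaluates it for a single set of inputs. The function name `capm_alpha` and all numbers are made up for illustration; they are not benchmark results.

```python
def capm_alpha(r_p: float, r_f: float, beta: float, r_m: float) -> float:
    """Alpha = R_p - [R_f + beta * (R_m - R_f)], with all returns as decimals."""
    expected_return = r_f + beta * (r_m - r_f)
    return r_p - expected_return


# Illustrative inputs only: a stock that returned 30% with beta 1.5,
# a 4% risk-free rate, and a 10% market return.
alpha = capm_alpha(r_p=0.30, r_f=0.04, beta=1.5, r_m=0.10)
print(f"alpha = {alpha:.2%}")
```

With these inputs the return expected from market exposure alone is 0.04 + 1.5 × 0.06 = 0.13, so the alpha is 0.30 - 0.13 = 0.17, or 17 percentage points.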

Beta (β)

β = Cov(R_p, R_m) / Var(R_m)

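Measures how sensitive the stock's returns are to market movements: a β of 1 means the stock tends to move in line with the market, while a β above 1 means it tends to amplify market moves.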
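The beta formula can be estimated from two aligned series of periodic returns. This is a minimal sketch using sample covariance and variance; the function name `estimate_beta` is hypothetical, and the choice of return frequency (daily, weekly, and so on) is an assumption that the methodology does not specify.

```python
from statistics import mean
from typing import Sequence


def estimate_beta(stock_returns: Sequence[float], market_returns: Sequence[float]) -> float:
    """Beta = Cov(R_p, R_m) / Var(R_m), using sample (n - 1) estimators."""
    n = len(stock_returns)
    if n != len(market_returns) or n < 2:
        raise ValueError("need two aligned series with at least two observations")
    mean_p = mean(stock_returns)
    mean_m = mean(market_returns)
    cov = sum((p - mean_p) * (m - mean_m)
              for p, m in zip(stock_returns, market_returns)) / (n - 1)
    var_m = sum((m - mean_m) ** 2 for m in market_returns) / (n - 1)
    if var_m == 0:
        raise ValueError("market returns have zero variance")
    return cov / var_m
```

A stock whose return in every period is exactly twice the market's return has a beta of 2 under this estimator, regardless of any constant offset added to its returns.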

Sharpe Ratio

Sharpe = (R_p - R_f) / σ_p

Risk-adjusted return metric where σ_p is the standard deviation of portfolio returns.
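The Sharpe ratio can be sketched from a series of periodic returns as follows. The function name `sharpe_ratio` and its parameters are hypothetical, and the square-root-of-time annualization (for example `periods_per_year=252` for daily returns) is a common convention assumed here, not something the methodology states.

```python
import math
from statistics import mean, stdev
from typing import Sequence


def sharpe_ratio(returns: Sequence[float],
                 risk_free_per_period: float = 0.0,
                 periods_per_year: int = 1) -> float:
    """Mean excess return divided by the sample standard deviation of returns.

    The result is scaled by sqrt(periods_per_year); the default of 1 applies
    no annualization.
    """
    sigma_p = stdev(returns)  # requires at least two observations
    if sigma_p == 0:
        raise ValueError("returns have zero standard deviation")
    excess = mean(r - risk_free_per_period for r in returns)
    return excess / sigma_p * math.sqrt(periods_per_year)
```

Because the risk-free rate is treated as constant per period, the standard deviation of excess returns equals the standard deviation of raw returns, which matches the σ_p in the formula.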

Information Ratio

IR = (R_p - R_b) / TE

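Where R_b is the benchmark return (here, the S&P 500) and TE is the tracking error, i.e. the standard deviation of the excess return R_p - R_b. A higher IR indicates more consistent excess returns.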
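A minimal sketch of the information ratio computed from aligned periodic return series is shown below. The function name `information_ratio` is hypothetical, and tracking error is taken here as the sample standard deviation of the per-period excess returns over the benchmark, which is the usual definition but is an assumption about how the benchmark computes it.

```python
from statistics import mean, stdev
from typing import Sequence


def information_ratio(portfolio_returns: Sequence[float],
                      benchmark_returns: Sequence[float]) -> float:
    """IR = mean(R_p - R_b) / TE, where TE is the stdev of (R_p - R_b)."""
    if len(portfolio_returns) != len(benchmark_returns):
        raise ValueError("return series must be aligned and of equal length")
    active = [p - b for p, b in zip(portfolio_returns, benchmark_returns)]
    tracking_error = stdev(active)  # requires at least two observations
    if tracking_error == 0:
        raise ValueError("tracking error is zero")
    return mean(active) / tracking_error
```

Two picks with the same average excess return can have very different IRs: the one whose excess return varies less from period to period scores higher.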

Data Sources

Market Data: Real-time stock prices from Yahoo Finance API
Fundamental Data: Company financials from SEC filings
Benchmark Data: S&P 500 index performance
Update Frequency: Prices updated every 60 seconds during market hours

Fair Play & Integrity

Several measures ensure the integrity of the benchmark:

All models receive identical prompts and data access
Stock selections are made before the performance period begins
No model has access to future price information
All picks are publicly documented and timestamped
Results are calculated using the same methodology for all participants

Community Participation

The "People's Pick" feature allows website visitors to vote for their preferred stock, creating a crowd-sourced benchmark to compare against AI performance. Votes are limited to one per user per day to prevent manipulation.
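The one-vote-per-user-per-day rule could be enforced with a small amount of state per user. This is an illustrative in-memory sketch only; the class `DailyVoteLimiter` and its method are hypothetical, how users are identified is left open, and the site's actual implementation is not described in this document.

```python
from datetime import date
from typing import Dict


class DailyVoteLimiter:
    """Accepts at most one vote per user per calendar day."""

    def __init__(self) -> None:
        # user_id -> date of that user's most recent accepted vote
        self._last_vote: Dict[str, date] = {}

    def try_vote(self, user_id: str, today: date) -> bool:
        """Record the vote and return True, or return False if already voted today."""
        if self._last_vote.get(user_id) == today:
            return False
        self._last_vote[user_id] = today
        return True
```

Passing the current date in as an argument, rather than reading the clock inside the method, keeps the rule easy to test and makes the time zone used for the day boundary an explicit decision of the caller.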

Research Goals

Determine if AI can consistently generate alpha in stock selection
Compare different AI architectures and their investment strategies
Analyze the risk profiles of AI-selected portfolios
Track improvement in AI financial reasoning over time
Provide empirical data for AI capability assessment

Limitations & Disclaimers

This research has several important limitations:

Single stock selection may not reflect full portfolio management capabilities
One-year timeframe may not capture long-term investment strategies
Past performance does not guarantee future results
This is research, not investment advice
Real-world factors like transaction costs and market impact are not considered
