Research Methodology

Overview

PortfolioBench is a research benchmark designed to answer a fundamental question: can artificial intelligence consistently outperform traditional market benchmarks? By testing today's most capable language models on a real-world stock-selection task, we aim to provide empirical evidence about AI's capabilities in financial decision-making.

The Prompt

Each AI model receives an identical prompt that casts it in the role of an elite hedge fund manager. The prompt grants structured access to fundamental financial data for 3,704 publicly traded US companies.

# PortfolioBench: The Large Language Model Stock Selection Benchmark

You are participating in PortfolioBench, a competitive research benchmark to test the ability of Large Language Models (LLMs) to select US stocks that will outperform the market. Imagine yourself as a highly competitive, elite world-class professional hedge fund manager.

Your Objective: From the dataset provided, select a single US stock that you believe will achieve the highest percentage increase in price over the next year. Your goal is to outperform the S&P 500 and all other LLMs. Your result will be directly compared against both the market and other models. The highest alpha wins.

Data Provided: You have structured access to the following metrics for all 3,704 publicly traded US companies:

* Market capitalization (marketcap)
* Stock price in USD (price_usd)
* Total earnings from the last 4 quarters (earnings_ttm)
* Total revenue from the last 4 quarters (revenue_ttm)
* Total Number of Employees (employees_count)
* Price-to-earnings ratio from the last 4 quarters (pe_ratio_ttm)
* Dividend yield (%) (dividend_yield_ttm)
* Operating margin from the last 4 quarters (operating_margin_ttm)
* Total assets (total_assets)
* Total liabilities (total_liabilities)
* Total debt (total_debt)

Output Format: 📶BUY-$[TICKER] ([Company Name]) and hold till [Future Date].
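For illustration, a single company record in the dataset might look like the following. The field names come from the prompt above; the ticker key and every value are hypothetical.

```python
# Hypothetical example of one company record in the PortfolioBench dataset.
# Field names follow the prompt; the ticker key and all values are invented
# for illustration and do not describe any real company.
record = {
    "ticker": "XYZ",                  # assumed identifier, not listed in the prompt
    "marketcap": 52_000_000_000,      # USD
    "price_usd": 148.25,
    "earnings_ttm": 2_100_000_000,    # trailing twelve months, USD
    "revenue_ttm": 9_800_000_000,
    "employees_count": 24_000,
    "pe_ratio_ttm": 24.8,             # consistent with marketcap / earnings_ttm
    "dividend_yield_ttm": 1.2,        # percent
    "operating_margin_ttm": 0.27,
    "total_assets": 31_000_000_000,
    "total_liabilities": 18_000_000_000,
    "total_debt": 6_500_000_000,
}
```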

Timeline & Cycles

PortfolioBench operates on annual cycles, with each cycle beginning on January 1st and ending on December 31st. This provides a full year of market performance data for analysis.

January 1st

New cycle begins. All AI models make their stock selections based on current market data.

Throughout the Year

Daily tracking of performance, real-time leaderboard updates, and continuous monitoring of metrics (a minimal sketch of the tracking computation follows the timeline).

December 31st

Cycle ends. Final results are archived and a comprehensive analysis is published.

Next January 1st

New cycle begins with fresh picks from all models.
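As a rough illustration of the daily tracking step, the snapshot below compares a model's pick against the S&P 500 from the cycle's starting prices. This is a minimal sketch under assumed inputs, not the benchmark's actual pipeline; every name here is illustrative.

```python
def daily_snapshot(pick_start: float, pick_today: float,
                   spx_start: float, spx_today: float) -> dict:
    """Year-to-date return of a model's pick versus the S&P 500.

    pick_start / spx_start: closing prices at the start of the cycle
    pick_today / spx_today: latest closing prices
    """
    pick_return = pick_today / pick_start - 1.0
    spx_return = spx_today / spx_start - 1.0
    return {
        "pick_return_pct": 100.0 * pick_return,
        "spx_return_pct": 100.0 * spx_return,
        "excess_return_pct": 100.0 * (pick_return - spx_return),
    }
```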

Participating Models

The 2025 cycle includes 30 AI models from 7 major AI research labs, representing different architectures, training approaches, and capabilities.

OpenAI: 8 models
Google: 6 models
Anthropic: 7 models
xAI: 2 models
DeepSeek: 2 models
Microsoft: 2 models
Perplexity: 3 models

Metrics & Calculations

Alpha (α)

α = Rp - [Rf + β(Rm - Rf)]

Where Rp is the portfolio return, Rf the risk-free rate, β the portfolio's beta, and Rm the market return.

Beta (β)

β = Cov(Rp, Rm) / Var(Rm)

Measures how strongly the stock moves with the market: a beta above 1 indicates greater-than-market volatility, below 1 less.

Sharpe Ratio

Sharpe = (Rp - Rf) / σp

Risk-adjusted return metric where σp is the standard deviation of portfolio returns.

Information Ratio

IR = (Rp - Rb) / TE

Where Rb is the benchmark return and TE is the tracking error (the standard deviation of Rp - Rb), measuring the consistency of excess returns.
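For concreteness, the sketch below computes all four metrics from daily return series with NumPy. It assumes a year of daily returns for the pick and the S&P 500, with the risk-free rate expressed per day to match; annualization (e.g., multiplying the Sharpe ratio by √252) is omitted, and all names are illustrative.

```python
import numpy as np

def benchmark_metrics(rp: np.ndarray, rm: np.ndarray, rf: float) -> dict:
    """Compute the four PortfolioBench metrics from daily return series.

    rp: daily returns of the model's pick
    rm: daily returns of the market benchmark (S&P 500)
    rf: risk-free rate, expressed per day to match the series
    """
    beta = np.cov(rp, rm)[0, 1] / np.var(rm, ddof=1)    # β = Cov(Rp, Rm) / Var(Rm)
    alpha = rp.mean() - (rf + beta * (rm.mean() - rf))  # α = Rp - [Rf + β(Rm - Rf)]
    sharpe = (rp.mean() - rf) / rp.std(ddof=1)          # Sharpe = (Rp - Rf) / σp
    te = (rp - rm).std(ddof=1)                          # tracking error
    info_ratio = (rp.mean() - rm.mean()) / te           # IR = (Rp - Rb) / TE
    return {"alpha": alpha, "beta": beta,
            "sharpe": sharpe, "information_ratio": info_ratio}
```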

Data Sources

Fair Play & Integrity

Several measures ensure the integrity of the benchmark:

Community Participation

The "People's Pick" feature allows website visitors to vote for their preferred stock, creating a crowd-sourced benchmark to compare against AI performance. Votes are limited to one per user per day to prevent manipulation.

Research Goals

Limitations & Disclaimers

This research has several important limitations: