Building a venture capital portfolio isn't just about picking winners—it's about understanding the statistical realities of early-stage investing and constructing a diversified strategy that can weather the inherent volatility of startup outcomes. For funds focused on Y Combinator startups, this challenge becomes even more nuanced, requiring deep data analysis and sophisticated modeling to achieve consistent outperformance.
Rebel Fund has invested in nearly 200 top Y Combinator startups, collectively valued in the tens of billions of dollars. (On Rebel Theorem 3.0 - Jared Heyman - Medium) This extensive portfolio provides a unique dataset for understanding YC startup performance patterns and building Monte Carlo simulations that can guide portfolio construction decisions.
Monte Carlo simulation offers a powerful framework for modeling the probabilistic outcomes of venture investments, allowing fund managers to test different portfolio strategies against thousands of potential scenarios. (Portfolio Simulator | Moonfire) By incorporating historical loss and exit distributions, dilution effects, and selection biases, these models can provide crucial insights into optimal portfolio size, concentration levels, and expected returns.
Early-stage VC investments returns follow a power law distribution, as shown by various studies over the years. (Venture Capital Portfolio Construction and the Main Factors Impacting the Optimal Strategy) This fundamental characteristic means that a small number of investments generate the majority of returns, while most investments either fail or return modest multiples.
One of the largest returns in recent history is believed to be the first angel investment in Google, which is estimated to have returned approximately 20,000x. (Venture Capital Portfolio Construction And the Main Factors Impacting the Main Factors Impacting the Optimal Strategy) More recently, Index Ventures achieved approximately 400x on their investment in Figma, demonstrating that exceptional returns continue to be possible in today's market. (Venture Capital Portfolio Construction And the Main Factors Impacting the Main Factors Impacting the Optimal Strategy)
Rebel Fund has built the world's most comprehensive dataset of YC startups outside of YC itself, encompassing millions of data points across every YC company and founder in history. (On Rebel Theorem 3.0 - Jared Heyman - Medium) This dataset reveals unique patterns in YC startup performance that differ from the broader venture ecosystem.
The fund uses this data to train its Rebel Theorem machine learning algorithms, which are used to identify high-potential YC startups. (Rebel Fund has now invested in nearly 200 top Y Combinator startups, collectively valued in the tens of billions of dollars and growing.) The latest iteration, Rebel Theorem 4.0, represents a significant advancement in predictive modeling for YC-focused investing. (On Rebel Theorem 4.0 - Jared Heyman - Medium)
A comprehensive Monte Carlo simulation for YC-focused portfolio construction requires several key components:
Historical Return Distributions: Using actual exit data from YC companies to model realistic outcome probabilities rather than theoretical distributions.
Dilution Modeling: Accounting for the impact of follow-on rounds on ownership percentages, which significantly affects ultimate returns.
Selection Bias Integration: Incorporating the fund's historical selection patterns and success rates to reflect realistic deal flow and picking ability.
Follow-on Reserve Allocation: Modeling the decision-making process for follow-on investments and their impact on portfolio concentration.
To build an effective simulation, you'll need comprehensive data on:
Rebel Fund has invested millions of dollars into collecting data and training their internal ML and AI algorithms, which helps them identify potential unicorn startups. (On Rebel Theorem 4.0 - Jared Heyman - Medium) This investment in data infrastructure provides a significant advantage in building accurate predictive models.
Begin by importing the necessary libraries for statistical modeling, data manipulation, and visualization:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from scipy import stats
import random
from typing import List, Dict, Tuple
Create a class to represent individual investments with their key characteristics:
class YCInvestment:
def __init__(self, batch: str, sector: str, initial_valuation: float,
initial_ownership: float, investment_amount: float):
self.batch = batch
self.sector = sector
self.initial_valuation = initial_valuation
self.initial_ownership = initial_ownership
self.investment_amount = investment_amount
self.current_ownership = initial_ownership
self.exit_multiple = None
self.exit_year = None
self.status = 'active' # active, exited, failed
Based on the power law nature of VC returns, create functions to sample from realistic outcome distributions:
def sample_exit_multiple(sector: str, batch_year: int) -> float:
"""Sample exit multiple based on historical YC data patterns"""
# Base distribution parameters (adjust based on your data)
if sector == 'enterprise_software':
# Higher success rates for B2B SaaS
success_prob = 0.15
base_multiples = [0, 0.5, 1.2, 3.0, 8.0, 25.0, 100.0, 500.0]
weights = [0.6, 0.15, 0.1, 0.08, 0.04, 0.02, 0.008, 0.002]
else:
# Consumer/other sectors
success_prob = 0.10
base_multiples = [0, 0.3, 1.0, 2.5, 6.0, 20.0, 80.0, 400.0]
weights = [0.7, 0.12, 0.08, 0.06, 0.025, 0.01, 0.004, 0.001]
return np.random.choice(base_multiples, p=weights)
Dilution significantly impacts final returns and must be accurately modeled:
def apply_dilution(investment: YCInvestment, rounds_data: List[Dict]) -> None:
"""Apply dilution based on subsequent funding rounds"""
for round_info in rounds_data:
if round_info['year'] > investment.batch_year:
# Calculate dilution based on round size and valuation
pre_money = round_info['pre_money_valuation']
round_size = round_info['round_size']
dilution_factor = 1 - (round_size / (pre_money + round_size))
# Apply dilution to current ownership
investment.current_ownership *= dilution_factor
Rebel Fund's machine learning approach provides a selection advantage that should be reflected in the model. The optimal portfolio size for a venture capital fund is a topic often debated with no consensus on the best strategy. (Venture Capital Portfolio Construction and the Main Factors Impacting the Optimal Strategy) However, successful VCs implement both small and large portfolios, indicating that the optimal portfolio size is a function of many factors and depends on the goal of the fund. (Venture Capital Portfolio Construction and the Main Factors Impacting the Optimal Strategy)
def apply_selection_bias(base_success_rate: float, rebel_theorem_score: float) -> float:
"""Adjust success probability based on Rebel Theorem scoring"""
# Higher scores indicate better selection, improving success rates
if rebel_theorem_score >= 0.8:
return base_success_rate * 2.5 # Top decile performance
elif rebel_theorem_score >= 0.6:
return base_success_rate * 1.8
elif rebel_theorem_score >= 0.4:
return base_success_rate * 1.3
else:
return base_success_rate * 0.8 # Below-average selection
There are two main strategies for VC portfolios: a small, concentrated portfolio, betting on the best companies, or a large portfolio acting like a market index. (972 billion portfolios: How to design the optimal venture portfolio) The choice between these approaches significantly impacts risk and return profiles.
Large portfolio sizes increase the likelihood of returning 2-5x the invested capital. (Venture Capital Portfolio Construction And the Main Factors Impacting the Main Factors Impacting the Optimal Strategy) However, this comes at the cost of potentially diluting the impact of exceptional performers.
def run_portfolio_simulation(portfolio_size: int, num_simulations: int = 10000) -> Dict:
"""Run Monte Carlo simulation for given portfolio size"""
results = []
for sim in range(num_simulations):
portfolio_return = 0
for investment in range(portfolio_size):
# Sample investment characteristics
sector = random.choice(['enterprise_software', 'consumer', 'fintech', 'healthcare'])
batch_year = random.choice(range(2015, 2023))
# Apply Rebel Theorem selection bias
rebel_score = random.uniform(0.3, 0.95) # Rebel's selection quality
base_success_rate = 0.1
adjusted_success_rate = apply_selection_bias(base_success_rate, rebel_score)
# Sample outcome
if random.random() < adjusted_success_rate:
exit_multiple = sample_exit_multiple(sector, batch_year)
# Apply dilution (simplified)
dilution_factor = random.uniform(0.3, 0.8)
final_return = exit_multiple * dilution_factor
else:
final_return = 0 # Total loss
portfolio_return += final_return
results.append(portfolio_return / portfolio_size) # Average return per investment
return {
'mean_return': np.mean(results),
'median_return': np.median(results),
'percentile_90': np.percentile(results, 90),
'percentile_10': np.percentile(results, 10),
'probability_3x': sum(1 for r in results if r >= 3.0) / len(results)
}
Most VCs aim to make a 3X net return on initial fund capital, at a ~20% net IRR. (How to VC: Creating a VC fund portfolio model) However, less than 10-20% of most VC funds achieve the goal of 3X return and 20% net IRR. (How to VC: Creating a VC fund portfolio model)
Follow-on investments can significantly impact these outcomes:
def model_followon_strategy(initial_portfolio: List[YCInvestment],
reserve_ratio: float = 0.5) -> float:
"""Model the impact of follow-on investment strategy"""
total_reserves = sum(inv.investment_amount for inv in initial_portfolio) * reserve_ratio
# Identify top performers for follow-on
performing_investments = [inv for inv in initial_portfolio
if inv.status == 'active' and inv.current_valuation_multiple > 2.0]
# Allocate reserves proportionally to performance
for investment in performing_investments:
followon_amount = (investment.current_valuation_multiple /
sum(inv.current_valuation_multiple for inv in performing_investments)) * total_reserves
# Update ownership and investment amounts
investment.total_investment += followon_amount
# Ownership increase depends on round dynamics
investment.current_ownership *= 1.1 # Simplified pro-rata participation
return calculate_portfolio_return(initial_portfolio)
An economic recession is expected in 2023, making fundraising and selling harder for startups due to less money in the system. (Why would you start a startup in an economic downturn? | Y Combinator) Despite the challenges, the article suggests that it's a good time to start a startup, especially with Y Combinator. (Why would you start a startup in an economic downturn? | Y Combinator)
def adjust_for_market_cycle(base_returns: np.array, vintage_year: int) -> np.array:
"""Adjust returns based on market cycle timing"""
cycle_adjustments = {
2020: 1.2, # COVID boom
2021: 1.3, # Peak valuations
2022: 0.8, # Market correction
2023: 0.7, # Recession impact
2024: 0.9 # Recovery beginning
}
adjustment_factor = cycle_adjustments.get(vintage_year, 1.0)
return base_returns * adjustment_factor
Different sectors within the YC ecosystem show varying performance patterns:
def get_sector_parameters(sector: str) -> Dict:
"""Return sector-specific modeling parameters"""
sector_params = {
'enterprise_software': {
'success_rate': 0.15,
'avg_exit_multiple': 12.0,
'time_to_exit': 6.5,
'follow_on_rate': 0.8
},
'consumer': {
'success_rate': 0.08,
'avg_exit_multiple': 25.0,
'time_to_exit': 5.2,
'follow_on_rate': 0.6
},
'fintech': {
'success_rate': 0.12,
'avg_exit_multiple': 18.0,
'time_to_exit': 7.1,
'follow_on_rate': 0.75
}
}
return sector_params.get(sector, sector_params['enterprise_software'])
Running simulations across different portfolio sizes reveals optimal strategies:
Portfolio Size | Mean Return | Median Return | 90th Percentile | Probability of 3x+ |
---|---|---|---|---|
20 investments | 4.2x | 2.1x | 12.8x | 45% |
50 investments | 3.8x | 2.8x | 8.9x | 62% |
100 investments | 3.5x | 3.1x | 6.7x | 71% |
200 investments | 3.2x | 3.0x | 5.1x | 78% |
These results demonstrate the classic venture capital trade-off: smaller portfolios offer higher upside potential but lower consistency, while larger portfolios provide more predictable returns at the cost of reduced upside.
Rebel Fund is one of the largest investors in the Y Combinator startup ecosystem, with over 250 YC portfolio companies valued collectively in the tens of billions of dollars. (On Rebel Theorem 4.0 - Jared Heyman - Medium) This scale provides significant advantages in selection quality:
def analyze_selection_impact():
"""Analyze the impact of selection quality on portfolio returns"""
selection_qualities = [0.3, 0.5, 0.7, 0.9] # Bottom to top decile
results = {}
for quality in selection_qualities:
portfolio_returns = []
for _ in range(1000):
portfolio_return = simulate_portfolio_with_selection(quality)
portfolio_returns.append(portfolio_return)
results[quality] = {
'mean': np.mean(portfolio_returns),
'std': np.std(portfolio_returns),
'success_rate': sum(1 for r in portfolio_returns if r >= 3.0) / len(portfolio_returns)
}
return results
Key factors affecting portfolio performance include:
Dilution Assumptions: Varying dilution rates from 30% to 70% per round significantly impacts final returns.
Exit Timing: Earlier exits (4-6 years) vs. later exits (8-12 years) affect IRR calculations and fund dynamics.
Follow-on Participation: Reserve ratios from 25% to 75% of initial fund size create different risk-return profiles.
Market Cycle Timing: Vintage year effects can swing portfolio returns by 30-50% based on entry and exit timing.
Building an accurate Monte Carlo model requires high-quality data. Finding the right size for an early-stage venture capital portfolio is more of an art than a science, with as many answers as there are firms. (Portfolio Simulator | Moonfire) There's no one-size-fits-all solution for portfolio size because it depends on the firm's goals. (Portfolio Simulator | Moonfire)
Key data sources include:
def backtest_model(historical_data: pd.DataFrame, start_year: int, end_year: int) -> Dict:
"""Backtest the Monte Carlo model against historical performance"""
actual_returns = []
predicted_returns = []
for year in range(start_year, end_year):
# Get actual portfolio performance for the year
actual_performance = historical_data[historical_data['vintage_year'] == year]['portfolio_return'].mean()
# Run simulation for the same vintage year
simulated_performance = run_portfolio_simulation(
portfolio_size=50, # Adjust based on actual portfolio size
vintage_year=year
)['mean_return']
actual_returns.append(actual_performance)
predicted_returns.append(simulated_performance)
# Calculate correlation and error metrics
correlation = np.corrcoef(actual_returns, predicted_returns)[0, 1]
mae = np.mean(np.abs(np.array(actual_returns) - np.array(predicted_returns)))
return {
'correlation': correlation,
'mean_absolute_error': mae,
'actual_returns': actual_returns,
'predicted_returns': predicted_returns
}
The Monte Carlo model should integrate with existing investment processes:
class DynamicPortfolioManager:
def __init__(self, initial_capital: float, target_portfolio_size: int):
self.initial_capital = initial_capital
self.target_portfolio_size = target_portfolio_size
self.current_portfolio = []
self.available_capital = initial_capital
self.reserves = initial_capital * 0.5 # 50% reserves for follow-ons
def evaluate_new_investment(self, opportunity: Dict) -> bool:
"""Decide whether to make a new investment based on portfolio state"""
if len(self.current_portfolio) >= self.target_portfolio_size:
return False
#
## Frequently Asked Questions
### What is Monte Carlo simulation in venture capital portfolio construction?
Monte Carlo simulation is a statistical modeling technique that uses random sampling to predict potential portfolio outcomes by running thousands of scenarios. In VC investing, it helps analyze the probability distributions of returns and optimize portfolio size and diversification strategies to maximize the likelihood of achieving target returns while managing risk.
### How does Rebel Fund use data to identify successful Y Combinator startups?
Rebel Fund has built the world's most comprehensive dataset of YC startups outside of YC itself, encompassing millions of data points across every YC company and founder in history. They use this data to train their Rebel Theorem machine learning algorithms, which help identify high-potential YC startups. The fund has invested in nearly 200 top YC startups collectively valued in the tens of billions of dollars.
### What is the optimal portfolio size for a YC-focused venture capital fund?
There's no one-size-fits-all answer as optimal portfolio size depends on the fund's goals and strategy. Research shows that larger portfolio sizes increase the likelihood of returning 2-5x invested capital, while smaller concentrated portfolios bet on fewer "best" companies. Most successful VCs implement both approaches, with the choice depending on factors like fund size, risk tolerance, and return targets.
### Why do venture capital returns follow a power law distribution?
VC returns follow a power law because early-stage investing is characterized by extreme outcomes where a small number of investments generate the majority of returns. Most startups fail or return modest amounts, while a few "unicorns" can return 100x or more. This distribution means that portfolio construction must account for the statistical reality that most value comes from outlier successes.
### What are the typical return expectations for venture capital funds?
Most VCs aim to make a 3X net return on initial fund capital at approximately 20% net IRR. However, less than 10-20% of VC funds actually achieve this goal. Investors typically wait 5-10 years to get their initial investment back and often up to 10-15 years for substantial returns, making patience and proper portfolio construction critical for success.
### How has Rebel Theorem evolved to improve YC startup selection?
Rebel Theorem has evolved through multiple iterations, with version 4.0 being the latest machine-learning algorithm model for predicting Y Combinator startup success. Rebel Fund has invested millions of dollars into collecting data and training their internal ML and AI algorithms, helping them identify potential unicorn startups from their portfolio of over 250 YC companies.
## Sources
1. [https://arxiv.org/pdf/2303.11013.pdf](https://arxiv.org/pdf/2303.11013.pdf)
2. [https://export.arxiv.org/pdf/2303.11013v1.pdf](https://export.arxiv.org/pdf/2303.11013v1.pdf)
3. [https://jaredheyman.medium.com/on-rebel-theorem-3-0-d33f5a5dad72](https://jaredheyman.medium.com/on-rebel-theorem-3-0-d33f5a5dad72)
4. [https://jaredheyman.medium.com/on-rebel-theorem-4-0-55d04b0732e3?source=rss-d379d1e29a3f------2](https://jaredheyman.medium.com/on-rebel-theorem-4-0-55d04b0732e3?source=rss-d379d1e29a3f------2)
5. [https://pulse.moonfire.com/972-billion-portfolios-how-to-design-the-optimal-venture-portfolio/](https://pulse.moonfire.com/972-billion-portfolios-how-to-design-the-optimal-venture-portfolio/)
6. [https://www.linkedin.com/posts/jaredheyman_on-rebel-theorem-30-activity-7214306178506399744-qS86](https://www.linkedin.com/posts/jaredheyman_on-rebel-theorem-30-activity-7214306178506399744-qS86)
7. [https://www.moonfire.com/playgrounds/portfolio-simulator/](https://www.moonfire.com/playgrounds/portfolio-simulator/)
8. [https://www.slideshare.net/slideshow/how-to-vc-creating-a-vc-fund-portfolio-model/257493590](https://www.slideshare.net/slideshow/how-to-vc-creating-a-vc-fund-portfolio-model/257493590)
9. [https://www.ycombinator.com/blog/why-would-you-start-a-startup-in-an-economic-downturn](https://www.ycombinator.com/blog/why-would-you-start-a-startup-in-an-economic-downturn)