Does ADC Gold Lead Determine Victory? A League of Legends Data Analysis

Authors: Kyle Zhao, Philip Chen
Course: DSC 80 - The Practice and Application of Data Science, UCSD

Introduction

League of Legends (LoL) is a team-based multiplayer online battle arena (MOBA) game where two teams of five players compete to destroy the opposing team’s nexus. Each player assumes one of five roles: Top lane, Jungle, Mid lane, Bot lane (ADC - Attack Damage Carry), and Support.

This project analyzes professional League of Legends esports match data from 2022, sourced from Oracle’s Elixir, containing detailed statistics from over 10,000 competitive matches. Our analysis focuses on understanding how early-game advantages, particularly for the ADC role, impact match outcomes.

Research Question

Does having an ADC (Bot lane) with a gold lead at 15 minutes significantly impact the likelihood of winning the match?

This question is important because:

The ADC role is considered a “carry” position that scales with gold
The 15-minute mark is a key game state checkpoint in professional play
Understanding early-game advantages can inform strategic decisions

Dataset Description

The dataset contains approximately 150,000 rows (12 rows per game: 10 player rows + 2 team summary rows).

Key columns for our analysis:

Column	Description
`gameid`	Unique identifier for each match
`result`	Binary outcome (1 = win, 0 = loss)
`position`	Player’s role (top, jng, mid, bot, sup)
`kills`, `deaths`, `assists`	Combat statistics
`golddiffat15`	Gold difference at 15 minutes
`xpdiffat15`	Experience difference at 15 minutes
`csdiffat15`	Creep score difference at 15 minutes
`damagetochampions`	Total damage dealt to enemy champions
`monsterkills`	Neutral monsters killed
`minionkills`	Minions killed (CS - creep score)

Data Cleaning and Exploratory Data Analysis

Data Cleaning

Our data cleaning process involved several key steps tied to the data generating process:

Selected relevant columns – Focused on gameplay metrics needed for our questions to reduce noise.
Filtered for data completeness – Removed rows marked incomplete and games ending before 15 minutes; early surrenders don’t generate 15-minute stats, and including them would bias toward short games.
Separated player and team data – Team summary rows have `position` set to `team`; player rows have one of the five roles (top, jng, mid, bot, sup). Splitting them prevents team summaries from being aggregated together with individual stats.
Handled missing values – Kept only rows with 15-minute data for analyses that require it, so metrics are comparable across games.
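
The four steps above can be sketched in pandas as follows. This is an illustration rather than the project's actual code: the `datacompleteness` and `gamelength` (seconds) column names are assumptions about the raw export, and `clean` is a hypothetical helper name.

```python
import pandas as pd

# Columns named in this write-up.
KEEP = ["gameid", "result", "position", "side", "kills", "deaths", "assists",
        "golddiffat15", "xpdiffat15", "csdiffat15"]
AT15 = ["golddiffat15", "xpdiffat15", "csdiffat15"]


def clean(raw: pd.DataFrame, min_seconds: int = 15 * 60):
    """Return (players, teams) after the cleaning steps described above."""
    # Assumed columns: `datacompleteness` marks complete rows and
    # `gamelength` holds the game duration in seconds.
    complete = raw["datacompleteness"].eq("complete")
    long_enough = raw["gamelength"].ge(min_seconds)
    df = raw.loc[complete & long_enough, KEEP]

    # Team summary rows have position == "team"; all other rows are players.
    is_team = df["position"].eq("team")
    teams = df[is_team].reset_index(drop=True)
    players = df[~is_team]

    # Keep only player rows that actually have 15-minute statistics.
    players = players.dropna(subset=AT15).reset_index(drop=True)
    return players, teams
```

Returning the two tables separately keeps later group-bys from mixing team totals with per-player values.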

Here’s the head of our cleaned dataset:

gameid	position	side	kills	deaths	assists	golddiffat15
ESPORTSTMNT01_2690210	top	Blue	2	3	2	-1240
ESPORTSTMNT01_2690210	jng	Blue	2	5	6	321
ESPORTSTMNT01_2690210	mid	Blue	2	2	3	-543
ESPORTSTMNT01_2690210	bot	Blue	2	4	2	892
ESPORTSTMNT01_2690210	sup	Blue	1	5	6	-124

Univariate Analysis

The distribution of kills is right-skewed, with most players having between 0 and 5 kills per game.

Different positions show varying gold difference patterns at 15 minutes.

Bivariate Analysis

Teams whose ADC has a gold lead at 15 minutes win approximately 65-70% of games.

Interesting Aggregates

Position	Result	Avg Kills	Avg Deaths	Avg Assists
bot	Loss	3.2	4.8	5.1
bot	Win	5.8	2.3	7.2

This table shows winning ADCs (bot) have much stronger KDA averages than losing ADCs, highlighting how ADC performance correlates with team success.
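
A table like this one can be produced with a single group-by. The sketch below assumes a cleaned player-level DataFrame with the `position`, `result`, `kills`, `deaths`, and `assists` columns; `kda_by_position_result` is a hypothetical helper name.

```python
import pandas as pd


def kda_by_position_result(players: pd.DataFrame) -> pd.DataFrame:
    """Mean kills, deaths, and assists for each (position, result) pair."""
    labelled = players.assign(result=players["result"].map({1: "Win", 0: "Loss"}))
    table = (
        labelled.groupby(["position", "result"])[["kills", "deaths", "assists"]]
        .mean()
        .round(1)
    )
    return table.rename(columns={
        "kills": "Avg Kills",
        "deaths": "Avg Deaths",
        "assists": "Avg Assists",
    })
```

Selecting `table.loc["bot"]` would then give the two ADC rows shown above.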

Assessment of Missingness

NMAR Analysis

We believe that golddiffat15, xpdiffat15, and csdiffat15 are likely NMAR because they’re missing when games end before 15 minutes. The missingness depends on game duration, which is not directly observed.

Additional data like match duration or surrender flags would help explain the missingness, potentially making it MAR instead.

Missingness Dependency

Results:

League: p-value < 0.005 — Missingness DEPENDS on league
Result: p-value ≈ 1.0 — Missingness does NOT depend on result
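
One way to run this kind of dependency check is a permutation test that shuffles the missingness indicator. The sketch below uses total variation distance (TVD) as the test statistic, which is an assumption on our part since the statistic is not stated above. The function names and the `league` column name are likewise illustrative.

```python
import numpy as np
import pandas as pd


def tvd(labels: pd.Series, missing: np.ndarray) -> float:
    """TVD between the category distribution of rows where the value is
    missing and the distribution of rows where it is present."""
    p = labels[missing].value_counts(normalize=True)
    q = labels[~missing].value_counts(normalize=True)
    return float(p.sub(q, fill_value=0).abs().sum() / 2)


def missingness_permutation_test(df, col, by, n_perm=1000, seed=0):
    """Return (observed TVD, p-value) for 'missingness of `col` depends on `by`'."""
    rng = np.random.default_rng(seed)
    labels = df[by].reset_index(drop=True)
    missing = df[col].isna().to_numpy()
    observed = tvd(labels, missing)
    # Shuffling the indicator breaks any link between missingness and `by`.
    sims = np.array([tvd(labels, rng.permutation(missing)) for _ in range(n_perm)])
    return observed, float((sims >= observed).mean())
```

Calling it once with `by="league"` and once with `by="result"` would reproduce the two comparisons listed above.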

Hypothesis Testing

Hypotheses:

H₀: Teams with ADC gold lead win at the same rate as teams without
H₁: Teams with ADC gold lead win more often

Significance level: α = 0.05
Test statistic: Difference in win rates (ADC gold lead vs no lead), which directly measures the effect size we care about.

Result: p-value < 0.001

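We reject the null hypothesis. There is strong evidence that teams whose ADC has a gold lead at 15 minutes win more often than teams whose ADC does not. Because the data are observational, this shows an association rather than a causal effect.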
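
A one-sided permutation test on the difference in win rates can be sketched as below. We assume here that a "gold lead" means `golddiffat15 > 0` for the bot-lane row, and that the test was run by permutation; neither detail is spelled out above, and `win_rate_gap_test` is a hypothetical name.

```python
import numpy as np


def win_rate_gap_test(has_lead, won, n_perm=5000, seed=0):
    """One-sided permutation test for H1: win rate is higher with a lead.

    has_lead : boolean flags, e.g. bot_rows["golddiffat15"] > 0 (assumed rule)
    won      : 0/1 match results for the same rows
    Returns (observed difference in win rates, p-value).
    """
    rng = np.random.default_rng(seed)
    has_lead = np.asarray(has_lead, dtype=bool)
    won = np.asarray(won, dtype=float)

    def gap(flags):
        return won[flags].mean() - won[~flags].mean()

    observed = gap(has_lead)
    # Under H0 the lead label carries no information, so we shuffle it.
    sims = np.array([gap(rng.permutation(has_lead)) for _ in range(n_perm)])
    return float(observed), float((sims >= observed).mean())
```

The p-value is the share of shuffled differences at least as large as the observed one, which matches the one-sided alternative H₁.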

Framing a Prediction Problem

Problem: Predict whether a team will win based on 15-minute statistics

Type: Binary Classification
Response Variable: result (1 = win, 0 = loss)
Metrics: Accuracy and F1-Score

Baseline Model

Model: Logistic Regression with 2 features

xpdiffat15
csdiffat15

All features are quantitative; no categorical encodings are needed. Implemented as an sklearn Pipeline with StandardScaler + LogisticRegression.

Performance (two-feature baseline):

Test Accuracy: ~72%
Test F1-Score: ~72%

Is it good? Reasonable for a minimal resource-only baseline; leaves room to improve by adding gold-based and engineered features.
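
A minimal sketch of such a baseline is shown below. The 25% test split, the random seed, and the `fit_baseline` helper name are assumptions for illustration; only the two features and the StandardScaler + LogisticRegression pipeline come from the description above.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

BASELINE_FEATURES = ["xpdiffat15", "csdiffat15"]


def fit_baseline(df: pd.DataFrame, test_size: float = 0.25, seed: int = 42):
    """Fit the two-feature baseline and return (model, test metrics)."""
    X, y = df[BASELINE_FEATURES], df["result"]
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=seed, stratify=y
    )
    # Scaling lives inside the pipeline, so it is fit on training data only.
    model = make_pipeline(StandardScaler(), LogisticRegression())
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    metrics = {
        "accuracy": accuracy_score(y_test, pred),
        "f1": f1_score(y_test, pred),
    }
    return model, metrics
```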

Final Model

Model: Random Forest Classifier

New Features:

gold_xp_ratio - Captures gold efficiency
total_resource_lead - Combined advantage metric
golddiffat10 - Earlier game state (if available)

Why these features?

Gold/XP ratio captures efficiency of resource conversion.
Total resource lead aggregates normalized gold/XP/CS to summarize early strength.
10-minute gold diff captures trajectory/tempo before 15 minutes.
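
The exact formulas are not given above, so the sketch below shows one plausible reading: the ratio of gold difference to XP difference, and a sum of z-scored gold, XP, and CS differences. Both definitions, the zero-fill rule, and the `add_engineered_features` name are assumptions.

```python
import numpy as np
import pandas as pd

DIFF_COLS = ["golddiffat15", "xpdiffat15", "csdiffat15"]


def add_engineered_features(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with gold_xp_ratio and total_resource_lead added."""
    out = df.copy()

    # Assumed definition: gold difference per unit of XP difference.
    # A zero XP difference would divide by zero, so those rows get 0.
    xp = out["xpdiffat15"].replace(0, np.nan)
    out["gold_xp_ratio"] = (out["golddiffat15"] / xp).fillna(0.0)

    # Assumed definition: put each difference on a common scale (z-score),
    # then add them up into one combined advantage number.
    z = (out[DIFF_COLS] - out[DIFF_COLS].mean()) / out[DIFF_COLS].std()
    out["total_resource_lead"] = z.sum(axis=1)
    return out
```

If these features feed a model, the means and standard deviations would ideally be computed on the training split only, so that test rows do not influence the scaling.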

Best Hyperparameters:

n_estimators: 100
max_depth: 15
min_samples_split: 2

Performance (with engineered features):

Test Accuracy: ~75% (+~3 percentage points over the two-feature baseline)
Test F1-Score: ~75% (+~3 percentage points over the two-feature baseline)

Tuning: GridSearchCV over depth/estimators/min_samples_split.
Why RF? Handles nonlinear interactions without heavy preprocessing and is robust to mixed-scale quantitative features.
Why it improved: Added gold-based signals and aggregated resource measures capture early-game advantage better than XP/CS alone.
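
The tuning step can be sketched with scikit-learn's `GridSearchCV`. The candidate values in `PARAM_GRID`, the 5-fold setting, and the `tune_random_forest` name are hypothetical; only the three tuned hyperparameters and the reported best values come from this write-up.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Hypothetical search grid. It includes the reported best values
# (n_estimators=100, max_depth=15, min_samples_split=2).
PARAM_GRID = {
    "n_estimators": [50, 100, 200],
    "max_depth": [5, 15, None],
    "min_samples_split": [2, 5, 10],
}


def tune_random_forest(X_train, y_train, param_grid=None, cv=5, seed=42):
    """Grid-search a random forest; return (best estimator, best params)."""
    search = GridSearchCV(
        RandomForestClassifier(random_state=seed),
        param_grid=PARAM_GRID if param_grid is None else param_grid,
        scoring="accuracy",
        cv=cv,
    )
    # Cross-validation uses training data only; the test set stays untouched.
    search.fit(X_train, y_train)
    return search.best_estimator_, search.best_params_
```

Searching on the training split alone keeps the reported test accuracy an honest estimate of performance on unseen games.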

Fairness Analysis

Question: Does our model perform differently for close vs stomp games?

Groups:

Close games: golddiffat15 ≤ 2000
Stomp games: golddiffat15 > 2000

Hypotheses (α = 0.05):

H₀: Accuracy is the same for close and stomp games.
H₁: Accuracy differs between close and stomp games.

Result: p-value = 0.156

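We fail to reject the null hypothesis. We found no statistically significant difference in accuracy between close and stomp games, so there is no evidence that the model treats the two groups unequally.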
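
A two-sided permutation test on the accuracy gap might look like the sketch below. It applies the grouping rule exactly as written above (`golddiffat15 > 2000` on the signed value); taking the absolute value first would be a different grouping. The use of a permutation test and the `accuracy_gap_test` name are assumptions.

```python
import numpy as np


def accuracy_gap_test(y_true, y_pred, golddiff, threshold=2000,
                      n_perm=5000, seed=0):
    """Two-sided permutation test for a close-vs-stomp accuracy gap.

    Returns (observed absolute accuracy difference, p-value).
    """
    rng = np.random.default_rng(seed)
    # 1.0 where the model's prediction was right, 0.0 where it was wrong.
    correct = (np.asarray(y_true) == np.asarray(y_pred)).astype(float)
    is_stomp = np.asarray(golddiff) > threshold

    def gap(flags):
        return abs(correct[flags].mean() - correct[~flags].mean())

    observed = gap(is_stomp)
    # Under H0 the group label is unrelated to correctness, so we shuffle it.
    sims = np.array([gap(rng.permutation(is_stomp)) for _ in range(n_perm)])
    return float(observed), float((sims >= observed).mean())
```

Using the absolute difference makes the test two-sided, in line with H₁ stating only that accuracy differs.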

Conclusion

Key findings:

ADC gold leads at 15 min predict wins (~31 percentage-point difference in win rate)
Early-game prediction achieves ~75% accuracy
Feature engineering improves performance by about 3 percentage points over the baseline
No significant difference in model accuracy between close and stomp games

Project Repository: github.com/philip-chen6/LOL-analysis