Introduction
Baseball prediction can start simple. If you know a hitter's recent hit rate and expected at-bats, you can estimate next-game production.
In this project, you will build a lightweight prediction model and flag breakout-watch performances.
What we're building
By the end of this project, you will:
- Store player data in dictionaries
- Detect trends in recent production
- Predict next-game hits from recent rates
- Classify breakout potential with clear rules
Why this matters
If you can build and explain a baseline model, you are learning the same workflow used in larger analytics systems.
Part 1: Quick Start
Set up your project in Cursor
For this project, you will create your files from scratch (no starter pack download).
Before you start: quick vocabulary
- A folder is a container that holds files.
- A file is one item inside a folder.
- A file extension is the ending in the file name:
.pymeans Python code..csvmeans table-style data.
- Pick an easy location for your project folder:
- Mac:
Desktopin Finder - Windows:
Desktopin File Explorer - Chromebook:
My filesin the Files app
- Mac:
- Create a folder called
baseball-performance-predictor. - In Cursor, go to
File > Open Folderand openbaseball-performance-predictor. - In Cursor's left file explorer:
- click New Folder and create
data - click New File and create
performance_predictor.py - open
data, click New File, and createrecent_games.csv
- click New Folder and create
- If you do not see file extensions on your computer, that is okay. Type the full names exactly, including
.pyand.csv.
Writing the code
Type this into performance_predictor.py:
recent_hits = [1, 2, 3, 1, 2]
recent_at_bats = [4, 5, 5, 4, 5]
projected_at_bats = 4
hits_per_at_bat = sum(recent_hits) / sum(recent_at_bats)
predicted_hits = hits_per_at_bat * projected_at_bats
print(f"Predicted hits next game: {predicted_hits:.2f}")
You just built a simple model:
prediction = hits per at-bat x projected at-bats
Running the code
Save the file (Cmd + S on Mac, Ctrl + S on Windows/Chromebook).
Run the file with Cursor's built-in Run Python File play button in the top-right.
If the play button is missing, install/enable the Python extension in Cursor and reopen performance_predictor.py.
Expected output:
Predicted hits next game: 1.60
If your output matches, your setup is working.
Part 2: Project Milestones
Milestone 1: Store player data in a dictionary
Dictionaries let you name each value so your code stays readable.
Add this below your existing code:
player = {
"name": "Diego Alvarez",
"team": "Greenville Hawks",
"season_average_hits": 1.35,
"recent_hits": [1, 2, 3, 1, 2],
"recent_at_bats": [4, 5, 5, 4, 5]
}
print(player["name"])
print(player["recent_hits"])
Expected output:
Diego Alvarez
[1, 2, 3, 1, 2]
Milestone 2: Build reusable prediction functions
Now wrap your model logic into reusable functions.
Add this below Milestone 1:
def average(values):
return sum(values) / len(values)
def predict_hits(recent_hits, recent_at_bats, projected_at_bats):
hits_per_at_bat = sum(recent_hits) / sum(recent_at_bats)
return hits_per_at_bat * projected_at_bats
projected_at_bats = 4
prediction = predict_hits(
player["recent_hits"],
player["recent_at_bats"],
projected_at_bats
)
print(f"{player['name']} predicted hits: {prediction:.2f}")
Expected output:
Diego Alvarez predicted hits: 1.60
Milestone 3: Detect trend direction
Prediction is not just one number. You also want to know if recent games are trending up or down.
Add this below your current code:
recent = player["recent_hits"]
first_half_avg = average(recent[:2])
second_half_avg = average(recent[-2:])
if second_half_avg > first_half_avg:
trend = "up"
elif second_half_avg < first_half_avg:
trend = "down"
else:
trend = "flat"
print(f"Trend: {trend}")
Expected output:
Trend: up
Milestone 4: Add breakout watch logic
Now convert prediction + season average into a label.
Add this below your current code:
def breakout_label(predicted_value, season_average, breakout_margin):
if predicted_value >= season_average + breakout_margin:
return "Breakout watch"
if predicted_value >= season_average:
return "On pace"
return "Below season pace"
tag = breakout_label(prediction, player["season_average_hits"], 0.35)
print(f"Tag: {tag}")
Expected output:
Tag: On pace
Milestone 5: Load recent game logs from CSV
Now keep game data in a CSV so you can update predictions without editing Python logic.
- Open
data/recent_games.csv. - Paste the data below.
- Save the file.
- Make sure the file name is exactly
recent_games.csv(notrecent_games.csv.txt).
Paste this into data/recent_games.csv:
date,hits,at_bats
2026-02-01,1,4
2026-02-03,2,5
2026-02-05,3,5
2026-02-07,1,4
2026-02-10,2,5
Now load that CSV in Python and re-run the prediction from file data. Add this below your existing code:
import csv
recent_hits = []
recent_at_bats = []
with open("data/recent_games.csv", "r") as file:
reader = csv.DictReader(file)
for row in reader:
recent_hits.append(float(row["hits"]))
recent_at_bats.append(float(row["at_bats"]))
player["recent_hits"] = recent_hits
player["recent_at_bats"] = recent_at_bats
prediction = predict_hits(recent_hits, recent_at_bats, projected_at_bats)
first_half_avg = average(recent_hits[:2])
second_half_avg = average(recent_hits[-2:])
if second_half_avg > first_half_avg:
trend = "up"
elif second_half_avg < first_half_avg:
trend = "down"
else:
trend = "flat"
print(f"CSV-based prediction: {prediction:.2f}")
Click Run Python File again and verify your CSV-based prediction prints.
Expected output (your exact decimal may vary slightly):
CSV-based prediction: 1.60
Milestone 6: Build a final prediction card
Now combine prediction, trend, and breakout tag in one clean report block.
Add this below Milestone 5:
tag = breakout_label(prediction, player["season_average_hits"], 0.35)
print("=" * 50)
print(f"PREDICTION CARD: {player['name']}")
print("=" * 50)
print(f"Projected at-bats: {projected_at_bats}")
print(f"Predicted hits: {prediction:.2f}")
print(f"Season average hits: {player['season_average_hits']:.2f}")
print(f"Trend: {trend}")
print(f"Tag: {tag}")
print("=" * 50)
Expected result:
- A final card with projected at-bats, prediction, season average, trend, and breakout tag.
If you made it this far, you built a complete baseball prediction workflow from scratch.
Common Fixes
FileNotFoundError: confirm path is exactlydata/recent_games.csv.ValueError: checkhitsandat_batscolumns are numeric.- No play button: install/enable Python extension in Cursor and reopen
performance_predictor.py.
Bonus Exercises: Push It Further with Your Agent
Use this short prompt structure for better AI help:
- Goal: what you want to build
- Context: file name + what code already exists
- Constraints: keep beginner-friendly, minimal changes
- Output format: ask for exact code + where to paste + expected output
Bonus 1: Predict Multiple Players
Try this prompt:
You are my Python tutor and coding assistant.
Goal:
Predict hits for 3 hitters and print one prediction card per player.
Context:
I am editing performance_predictor.py.
I already have predict_hits(...), breakout_label(...), and a final prediction card format.
Constraints:
- Keep my current single-player flow working.
- Add only what is needed for multi-player support.
- Use beginner-friendly Python.
Output format:
1) Plan (3 bullets)
2) CSV format I should use
3) Exact code changes
4) Where to paste each code block
5) Example output for 2 players
Bonus 2: Add a Prediction Range
Try this prompt:
You are my Python tutor and coding assistant.
Goal:
Show low / medium / high hit predictions instead of one value.
Context:
Current code calculates one prediction from recent hits and at-bats.
Constraints:
- Keep formulas simple (no advanced libraries).
- Explain the math in one short paragraph.
- Keep my current prediction card and add range lines.
Output format:
1) Range formula and reason
2) Code changes only
3) Expected output block
4) One way to tune the range width
Bonus 3: Compare Prediction vs Actual
Try this prompt:
You are my Python tutor and coding assistant.
Goal:
Compare predicted hits to actual hits and calculate error.
Context:
I already compute prediction in performance_predictor.py.
I want to add this after the final prediction card.
Constraints:
- Keep this beginner-level.
- Show both signed error and absolute error.
- Do not remove existing outputs.
Output format:
1) Code block to add
2) Where to paste it
3) Expected output with sample numbers
4) One sentence explaining how to interpret error