RL Foundations
CartPole — Deep Q-Network from Scratch
A clean-room DQN implementation on the classic CartPole control task — experience replay, target networks, and epsilon-greedy exploration built from first principles.
Overview
A from-scratch implementation of Deep Q-Networks on OpenAI Gym's CartPole-v1. No Stable-Baselines3, no RLlib — the Q-network, replay buffer, target-network sync, and epsilon-greedy exploration schedule are all written directly in PyTorch. The goal was to internalize the moving parts of DQN and build a reference implementation small enough to read end-to-end.
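To make the pieces concrete, here is a minimal sketch of the kind of replay buffer the overview describes — a fixed-size deque with uniform random sampling. The `Transition` field names and the default capacity are illustrative assumptions, not the project's actual code.

```python
import random
from collections import deque, namedtuple

# Hypothetical transition container; field names are assumptions for illustration.
Transition = namedtuple("Transition", ["state", "action", "reward", "next_state", "done"])

class ReplayBuffer:
    """Fixed-size buffer with uniform random sampling over stored transitions."""

    def __init__(self, capacity: int = 50_000):
        # deque with maxlen silently evicts the oldest transition when full
        self.buffer = deque(maxlen=capacity)

    def push(self, *args) -> None:
        self.buffer.append(Transition(*args))

    def sample(self, batch_size: int):
        # Uniform sampling breaks the temporal correlation between consecutive steps
        return random.sample(self.buffer, batch_size)

    def __len__(self) -> int:
        return len(self.buffer)
```

Sampling uniformly from a large buffer is what decorrelates the gradient updates; the deque's `maxlen` keeps memory bounded without any manual eviction logic.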
The Problem
DQN is easy to pip-install but hard to understand unless you build it yourself. The subtle parts — why experience replay matters, why a separate target network is needed, how the epsilon schedule shapes exploration, how to diagnose a non-learning agent — only click once you have debugged each one. This project deliberately reinvents the wheel as a learning exercise.
My Role & Contribution
- Implemented the full DQN algorithm — network, replay buffer, target network, training loop
- Tuned hyperparameters (learning rate, buffer size, target sync frequency, epsilon schedule) to reach CartPole's solved threshold
- Documented the implementation so it reads as a reference for others learning DQN
Approach
- Small MLP Q-network in PyTorch — two hidden layers, ReLU activations, linear output over the action space
- Replay buffer implemented as a fixed-size deque with uniform random sampling
- Separate target network, soft- or hard-synced from the online network at a configured interval
- Epsilon-greedy exploration with a decaying schedule from full exploration to near-greedy
- Smooth-L1 (Huber) loss between predicted Q-values and Bellman targets
- Matplotlib training curves showing reward, loss, and epsilon over episodes
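The steps above can be sketched in a few dozen lines of PyTorch. This is a hedged reconstruction of the described design, not the project's actual source: hyperparameters (hidden width, gamma, decay horizon) and function names are illustrative assumptions.

```python
import torch
import torch.nn as nn

def make_q_net(obs_dim: int = 4, n_actions: int = 2, hidden: int = 128) -> nn.Module:
    # Two hidden layers, ReLU activations, linear output over the action space
    # (CartPole-v1: 4-dim observation, 2 discrete actions)
    return nn.Sequential(
        nn.Linear(obs_dim, hidden), nn.ReLU(),
        nn.Linear(hidden, hidden), nn.ReLU(),
        nn.Linear(hidden, n_actions),
    )

def td_loss(online: nn.Module, target: nn.Module, batch, gamma: float = 0.99):
    states, actions, rewards, next_states, dones = batch
    # Q(s, a) from the online network for the actions actually taken
    q_sa = online(states).gather(1, actions.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        # Bellman target uses the frozen target network for stability
        next_q = target(next_states).max(1).values
        y = rewards + gamma * next_q * (1.0 - dones)
    # Huber (smooth-L1) loss is less sensitive to outlier TD errors than MSE
    return nn.functional.smooth_l1_loss(q_sa, y)

def epsilon(step: int, eps_start: float = 1.0, eps_end: float = 0.05,
            decay_steps: int = 10_000) -> float:
    # Linear decay from full exploration to near-greedy (decay shape is an assumption;
    # an exponential schedule is equally common)
    frac = min(step / decay_steps, 1.0)
    return eps_start + frac * (eps_end - eps_start)

online = make_q_net()
target = make_q_net()
target.load_state_dict(online.state_dict())  # hard sync at a configured interval
```

In the training loop, `td_loss` would be computed on a batch drawn from the replay buffer and backpropagated through `online` only; `target` is updated solely via the periodic `load_state_dict` sync (or a soft Polyak average), which is what keeps the Bellman targets from chasing a moving network.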
Tech Stack
Python
PyTorch
OpenAI Gym / Gymnasium
NumPy
Matplotlib
Results & Impact
- Agent reliably solves CartPole-v1 — average return of at least 475 over 100 consecutive episodes, with sustained full-length 500-step episodes — within a small training budget
- Reference implementation short enough to read end-to-end