M.Tech Research
PPO in Matrix Games — Iterated Prisoner's Dilemma
Applying Proximal Policy Optimization to classical matrix games to study emergent cooperation, policy convergence, and game-theoretic stability between learning agents.
Overview
A study of how Proximal Policy Optimization behaves in classical matrix games — the Iterated Prisoner's Dilemma in particular. Two PPO agents learn concurrently in the same environment, and the interesting question is what they converge to: mutual defection, tit-for-tat-like reciprocity, or something stranger. The project implements a custom Gymnasium environment, trains the agents, and analyzes the resulting policies through a game-theoretic lens.
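The exact payoff values the project uses are configurable and not stated here; the canonical Prisoner's Dilemma values (T=5, R=3, P=1, S=0) make the incentive structure concrete. This is a sketch with illustrative names, not the project's code:

```python
# Canonical one-shot Prisoner's Dilemma payoffs (T=5, R=3, P=1, S=0).
# Row: this agent's action, column: opponent's action; 0 = cooperate, 1 = defect.
PAYOFF = [
    [(3, 3), (0, 5)],  # I cooperate: (R, R) or (S, T)
    [(5, 0), (1, 1)],  # I defect:    (T, S) or (P, P)
]

def round_payoffs(a1, a2):
    """Return (reward_1, reward_2) for a single round of the matrix game."""
    return PAYOFF[a1][a2]
```

Defection strictly dominates in the one-shot game (5 > 3 and 1 > 0), yet mutual defection yields less than mutual cooperation, which is exactly the tension the iterated setting lets learning agents negotiate.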
The Problem
Classical game theory gives closed-form equilibrium predictions for matrix games, but modern deep RL agents don't necessarily converge to those equilibria, especially when both agents learn at once. Understanding the gap between theoretical equilibria and empirically learned policies matters for any multi-agent system deployed in mixed-motive settings.
My Role & Contribution
- Built the custom Gymnasium matrix-game environment supporting arbitrary payoff matrices
- Ran training sweeps across hyperparameters and analyzed convergence behavior
- Compared learned policies against game-theoretic baselines (tit-for-tat, always-defect, always-cooperate)
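The game-theoretic baselines above are standard strategies over the joint-action history; minimal versions (function names and the history encoding are illustrative assumptions) look like:

```python
def always_cooperate(history):
    # history: list of (my_action, opponent_action); 0 = cooperate, 1 = defect
    return 0

def always_defect(history):
    return 1

def tit_for_tat(history):
    # Cooperate on the first round, then mirror the opponent's last action.
    return 0 if not history else history[-1][1]
```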
Approach
- Custom Gymnasium environment wrapping the iterated matrix game with configurable payoff matrix and history length
- Stable-Baselines3 PPO as the learning algorithm, with recurrent and MLP policies for comparison
- Self-play and fixed-opponent training regimes to isolate the effect of co-adaptation
- Analysis of learned action distributions, cooperation rates over training, and stability under perturbation
- Visualization of training dynamics and equilibrium regions with Matplotlib
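The real environment subclasses `gymnasium.Env`; the dependency-free sketch below (class and parameter names are illustrative, not the project's) shows the core mechanics from one agent's perspective: a configurable payoff matrix, an observation built from a bounded window of joint-action history, and an opponent supplied as a callable.

```python
class IteratedMatrixGameEnv:
    """Minimal sketch of the iterated matrix game from one agent's
    perspective. The real environment subclasses gymnasium.Env with
    proper observation/action spaces; this version keeps only the logic."""

    def __init__(self, payoff, opponent, history_len=3, episode_len=100):
        self.payoff = payoff            # payoff[my_a][opp_a] -> (my_r, opp_r)
        self.opponent = opponent        # callable: history -> action
        self.history_len = history_len  # rounds of joint history in the obs
        self.episode_len = episode_len
        self.history = []
        self.t = 0

    def _obs(self):
        # Last `history_len` joint actions, padded with a "no move yet" token (2).
        pad = [(2, 2)] * max(0, self.history_len - len(self.history))
        recent = pad + self.history[-self.history_len:]
        return [a for pair in recent for a in pair]

    def reset(self):
        self.history, self.t = [], 0
        return self._obs(), {}

    def step(self, action):
        # The opponent sees the history from its own perspective (pairs swapped).
        opp_action = self.opponent([(o, m) for m, o in self.history])
        my_r, _ = self.payoff[action][opp_action]
        self.history.append((action, opp_action))
        self.t += 1
        truncated = self.t >= self.episode_len
        return self._obs(), my_r, False, truncated, {}


# Canonical PD payoffs: payoff[my_action][opp_action] -> (my_reward, opp_reward)
PD_PAYOFF = [[(3, 3), (0, 5)],
             [(5, 0), (1, 1)]]
```

Passing the opponent in as a callable is what lets the same environment serve both regimes in the bullet list above: a frozen policy gives fixed-opponent training, while a periodically refreshed snapshot of the learner gives self-play.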
Tech Stack
Python
PyTorch
Stable-Baselines3
NumPy
Matplotlib
Gymnasium
Results & Impact
- Empirical characterization of when PPO's learned policies match game-theoretic equilibria and when they diverge
- Open-source code and environment others can build on for further matrix-game RL study
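One summary statistic behind the convergence characterization is the cooperation rate tracked over training; a minimal version (the project's exact metric definition is an assumption) might be:

```python
def cooperation_rate(history):
    """Fraction of cooperative moves (action 0) across both agents in a
    joint-action history [(a1, a2), ...]; 0 = cooperate, 1 = defect."""
    if not history:
        return 0.0
    moves = [a for pair in history for a in pair]
    return moves.count(0) / len(moves)
```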