UAV Multi-Agent Reinforcement Learning (Formation Control)

Decentralized formation control for quadrotors using a shared-policy MAPPO setup, implemented in PyBullet with a focus on stability, reward design, and repeatable experimentation.

Highlights

  • Shared-policy MAPPO for cooperative flight with collision-avoidance shaping.
  • Formation stability benchmarks (spacing tolerance, drift, oscillation checks).
  • Structured logging and seed-controlled runs for reproducibility.
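A minimal sketch of the seed-controlled setup: `set_global_seeds` is a hypothetical helper name, and the torch/PyBullet seeding calls the real runs would also need are elided to keep the example dependency-free.

```python
import random
import numpy as np

def set_global_seeds(seed: int) -> np.random.Generator:
    """Seed the Python and NumPy RNGs so runs are repeatable.

    The full training setup would also seed torch and the PyBullet
    environment; those lines are omitted in this sketch.
    """
    random.seed(seed)
    np.random.seed(seed)
    return np.random.default_rng(seed)

# Two runs with the same seed draw identical spawn-spacing noise.
rng_a = set_global_seeds(42)
rng_b = set_global_seeds(42)
noise_a = rng_a.normal(scale=0.1, size=4)
noise_b = rng_b.normal(scale=0.1, size=4)
```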

Approach

  • Environment: PyBullet swarm scenario with configurable spawn spacing and noise; curriculum to increase difficulty across episodes.
  • Reward shaping: distance to the formation center, heading alignment, and penalties for collisions and excessive thrust.
  • Training: MAPPO with experiments on an attention-based critic; batch normalization for sensor inputs; entropy-coefficient tuning for safer exploration.
  • Evaluation: Success thresholds on formation error, collision counts, and episode stability windows; saved rollouts for visual review.
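The reward-shaping terms above can be sketched as a single per-agent function. The weights, the `thrust_limit`, and the function name are illustrative assumptions, not the project's actual values.

```python
import numpy as np

def formation_reward(pos, heading, center, goal_heading,
                     collided, thrust,
                     w_dist=1.0, w_head=0.5, w_thrust=0.2,
                     collision_penalty=10.0, thrust_limit=0.8):
    """Shaped reward for one agent (hypothetical weights).

    - distance term: negative distance to the formation center
    - heading term: cosine alignment with the desired heading
    - penalties: fixed cost on collision, cost on thrust above a limit
    """
    dist = np.linalg.norm(pos - center)
    align = float(np.dot(heading, goal_heading) /
                  (np.linalg.norm(heading) * np.linalg.norm(goal_heading)))
    reward = -w_dist * dist + w_head * align
    if collided:
        reward -= collision_penalty
    reward -= w_thrust * max(0.0, thrust - thrust_limit)
    return reward

# An agent near the center and aligned scores higher than a distant one,
# and a collision dominates the shaped terms.
r_near = formation_reward(np.array([0.1, 0.0, 0.0]), np.array([1.0, 0.0, 0.0]),
                          np.zeros(3), np.array([1.0, 0.0, 0.0]),
                          collided=False, thrust=0.5)
r_far = formation_reward(np.array([2.0, 0.0, 0.0]), np.array([1.0, 0.0, 0.0]),
                         np.zeros(3), np.array([1.0, 0.0, 0.0]),
                         collided=False, thrust=0.5)
r_hit = formation_reward(np.array([0.1, 0.0, 0.0]), np.array([1.0, 0.0, 0.0]),
                         np.zeros(3), np.array([1.0, 0.0, 0.0]),
                         collided=True, thrust=0.5)
```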
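The formation-error metric used for success thresholds could look like the following sketch; `formation_error` and the offset convention are assumptions for illustration.

```python
import numpy as np

def formation_error(positions, target_offsets):
    """Mean deviation of agents from their formation slots.

    positions: (N, 3) current agent positions.
    target_offsets: (N, 3) desired offsets from the swarm centroid.
    Returns the mean per-agent distance to its slot; an episode would
    count as a success if this stays below the spacing tolerance over
    a stability window.
    """
    centroid = positions.mean(axis=0)
    targets = centroid + target_offsets
    return float(np.linalg.norm(positions - targets, axis=1).mean())

# A perfectly held two-drone line has zero error; a 0.2 m drift on one
# drone raises the mean error.
offsets = np.array([[1.0, 0.0, 0.0], [-1.0, 0.0, 0.0]])
err_perfect = formation_error(offsets.copy(), offsets)
drifted = offsets.copy()
drifted[0, 0] += 0.2
err_drifted = formation_error(drifted, offsets)
```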

Artifacts