Reinforcement Learning Experiments (CarRacing-v2)

PPO and SAC experiments on CarRacing-v2, emphasizing clean training pipelines, structured logging, and reproducible evaluation for continuous-control tasks.

Highlights

  • PPO and SAC baselines with consistent seeding and environment wrappers.
  • Training diagnostics: reward curves, episode length tracking, and rollout stats.
  • Evaluation harness with saved checkpoints and deterministic test runs.
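The seeding and deterministic-evaluation ideas above can be sketched as follows. This is a minimal illustration, not the repo's actual harness: `seed_everything`, `StubEnv` (a stand-in for CarRacing-v2), and `evaluate` are hypothetical names, and the env follows a simplified Gym-style reset/step API.

```python
# Sketch of consistent seeding plus a deterministic evaluation loop.
# StubEnv is a toy stand-in for CarRacing-v2; all names are illustrative.
import random

import numpy as np


def seed_everything(seed: int) -> None:
    """Seed every RNG the pipeline touches so runs are repeatable."""
    random.seed(seed)
    np.random.seed(seed)


class StubEnv:
    """Minimal Gym-style environment (reset/step) with its own seeded RNG."""

    def __init__(self, seed: int):
        self.rng = np.random.default_rng(seed)
        self.t = 0

    def reset(self):
        self.t = 0
        return self.rng.normal(size=4)

    def step(self, action):
        self.t += 1
        reward = float(self.rng.normal())
        done = self.t >= 10  # fixed-length toy episodes
        return self.rng.normal(size=4), reward, done


def evaluate(policy, env, episodes: int = 3) -> float:
    """Run deterministic test episodes and return the mean episode return."""
    returns = []
    for _ in range(episodes):
        obs, done, total = env.reset(), False, 0.0
        while not done:
            obs, reward, done = env.step(policy(obs))
            total += reward
        returns.append(total)
    return sum(returns) / len(returns)


seed_everything(0)
# Two evaluations with identical seeds and a deterministic policy must agree.
mean_a = evaluate(lambda obs: 0, StubEnv(seed=0))
mean_b = evaluate(lambda obs: 0, StubEnv(seed=0))
assert mean_a == mean_b
```

The point of routing all randomness through explicit seeds is that a checkpoint's evaluation score can be reproduced bit-for-bit later, which makes cross-run comparisons meaningful.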

Approach

  • Environment setup: Frame stacking, action smoothing, and observation normalization; wrappers for reproducibility and cleanup.
  • Training loop: Config-driven runs, periodic evaluation episodes, and artifact saving for later comparison.
  • Analysis: Simple notebooks for metric inspection and hyperparameter sweeps; attention to reset conditions and failure modes.
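Two of the environment wrappers mentioned above, frame stacking and action smoothing, can be sketched roughly like this. The classes (`FrameStack`, `ActionSmooth`) and the exponential-smoothing choice are assumptions for illustration, not the repo's actual wrapper implementations.

```python
# Hypothetical sketches of two wrappers: frame stacking (so the policy sees
# short-term motion) and exponential action smoothing (to damp jerky control).
from collections import deque

import numpy as np


class FrameStack:
    """Keep the last k observations and expose them as one stacked array."""

    def __init__(self, k: int):
        self.frames = deque(maxlen=k)

    def push(self, obs: np.ndarray) -> np.ndarray:
        if not self.frames:
            # On reset, fill the whole stack with the first frame.
            self.frames.extend([obs] * self.frames.maxlen)
        else:
            self.frames.append(obs)
        return np.stack(list(self.frames))


class ActionSmooth:
    """Exponentially smooth successive actions: a_t = alpha*a + (1-alpha)*a_{t-1}."""

    def __init__(self, alpha: float = 0.5):
        self.alpha = alpha
        self.prev = None

    def __call__(self, action: np.ndarray) -> np.ndarray:
        if self.prev is None:
            self.prev = action
        else:
            self.prev = self.alpha * action + (1 - self.alpha) * self.prev
        return self.prev
```

A lower `alpha` smooths more aggressively; for steering-heavy tasks like CarRacing this trades responsiveness for stability.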
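The config-driven training loop with periodic evaluation and artifact saving can be illustrated with a small skeleton. Everything here (`RunConfig`, `train`, the field names, the directory layout) is a hypothetical sketch of the pattern, with the optimization step itself stubbed out.

```python
# Illustrative config-driven run skeleton: a dataclass config is serialized
# alongside the run's artifacts, and evaluation fires on a fixed step cadence.
import json
from dataclasses import asdict, dataclass
from pathlib import Path


@dataclass
class RunConfig:
    algo: str = "ppo"       # which baseline to train ("ppo" or "sac")
    seed: int = 0
    total_steps: int = 1000
    eval_every: int = 250   # periodic evaluation / checkpoint cadence


def train(cfg: RunConfig, out_dir: Path) -> list[int]:
    """Dummy loop recording the steps at which evaluation would run."""
    out_dir.mkdir(parents=True, exist_ok=True)
    # Saving the exact config with the artifacts makes later comparison easy.
    (out_dir / "config.json").write_text(json.dumps(asdict(cfg)))
    eval_steps = []
    for step in range(1, cfg.total_steps + 1):
        # ...one PPO/SAC optimization step would go here...
        if step % cfg.eval_every == 0:
            eval_steps.append(step)  # run eval episodes, save a checkpoint
    return eval_steps
```

Serializing the config next to the checkpoints is what makes runs comparable after the fact: any metric in the analysis notebooks can be joined back to the exact hyperparameters that produced it.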

Artifacts