PPO Agent Playing LunarLander-v3

This is a trained model of a PPO agent playing LunarLander-v3.

Hyperparameters

{'exp_name': 'lunar-lander-hf-V2',
'model_version': '0',
'use_checkpoint': False,
'convolutional': False,
'keep_last_k': 2,
'seed': 42,
'record_video': False,
'remove_old_video': False,
'fps': 30,
'n_eval_steps': 20000,
'env_id': 'LunarLander-v3',
'n_envs': 4,
'eps': 1e-05,
'learning_rate': 0.00025,
'anneal_lr': True,
'gae': True,
'gae_lambda': 0.98,
'gamma': 0.999,
'total_timesteps': 1000000,
'n_steps': 128,
'batch_size': 512,
'n_mini_batches': 4,
'mini_batch_size': 128,
'update_epochs': 4,
'norm_adv': True,
'clip_coef': 0.2,
'clip_vloss': True,
'entropy_coef': 0.01,
'vf_coef': 0.5,
'max_grad_norm': 0.5,
'target_kl': 0.015,
'torch_deterministic': True,
'cuda': True,
'device': device(type='cuda'),
'track_run': False,
'wandb_project_name': 'RL',
'wandb_entity': None,
'TEMP': '../temp',
'VIDEOS': '../videos',
'MODELS': '../../rl-module/models',
'runs_path': '../temp/runs',
'wandb_path': '../temp/wandb',
'videos_path': '../videos/lunar-lander-hf-V2',
'models_path': '../../rl-module/models/lunar-lander-hf-V2',
'checkpoint_path': '../temp/lunar-lander-hf-V2'}
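
Several of the values listed above are related by simple arithmetic. The sketch below assumes the conventions common in CleanRL-style PPO scripts: the rollout batch is n_envs * n_steps, each mini-batch is an equal slice of it, and anneal_lr means a linear decay of the learning rate over the policy updates. This model card does not confirm how the training script derives these values, and the helper annealed_lr is a hypothetical name used only for illustration.

```python
# Values copied from the hyperparameter listing above.
n_envs = 4
n_steps = 128
n_mini_batches = 4
total_timesteps = 1_000_000
learning_rate = 0.00025

# Assumed CleanRL-style relationships (not confirmed by this model card).
batch_size = n_envs * n_steps                   # transitions collected per rollout
mini_batch_size = batch_size // n_mini_batches  # transitions per gradient step
num_updates = total_timesteps // batch_size     # number of policy update iterations


def annealed_lr(update: int) -> float:
    """Hypothetical linear schedule; `update` counts from 1 to num_updates."""
    frac = 1.0 - (update - 1) / num_updates
    return frac * learning_rate


print("batch_size:", batch_size)
print("mini_batch_size:", mini_batch_size)
print("num_updates:", num_updates)
print("lr at first update:", annealed_lr(1))
print("lr at last update:", annealed_lr(num_updates))
```

Under these assumptions the computed batch_size and mini_batch_size agree with the listed values of 512 and 128, and 1,000,000 timesteps correspond to 1953 full policy updates.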

Evaluation results

mean_reward on LunarLander-v3 (self-reported): 267.15 +/- 24.68
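
A figure of the form "mean +/- spread" is usually the mean and standard deviation of the total reward over a set of evaluation episodes. The sketch below shows one way such a summary could be computed. It assumes a population standard deviation, which this model card does not confirm. The function summarize_returns and the sample returns are hypothetical and are not this model's evaluation data.

```python
import statistics


def summarize_returns(episode_returns):
    """Format per-episode returns as 'mean +/- std' (population std assumed)."""
    mean = statistics.fmean(episode_returns)
    std = statistics.pstdev(episode_returns)
    return f"{mean:.2f} +/- {std:.2f}"


# Hypothetical episode returns, for illustration only; not this model's data.
example_returns = [250.0, 270.0, 290.0]
print(summarize_returns(example_returns))
```

For context, the Gymnasium documentation treats a LunarLander episode as solved at a score of 200 points or more.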