General structure
Model discrete
Model
PPO
MlpPolicy
Curriculum Learning
Architecture
Policy network (π): [800, 100, 100]
Value function (V): [800, 100, 100]
Inputs
channel 1 : agent
channel 2 : visibility
channel 3 : target (2) / distractors (1)
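The three input channels could be laid out as a stacked grid, sketched below in plain Python. The grid size, the coordinates, and the helper names `build_observation` and `flatten` are hypothetical; only the channel meanings and the target/distractor codes (2 and 1) come from the list above. The visibility channel is assumed to be a binary map, which the source does not specify.

```python
def build_observation(size, agent, visible, target, distractors):
    """Return a 3 x size x size nested list (channels first).

    All arguments are illustrative assumptions: `agent` and `target` are
    (row, col) tuples; `visible` and `distractors` are iterables of (row, col).
    """
    def empty():
        return [[0] * size for _ in range(size)]

    agent_ch, visibility_ch, object_ch = empty(), empty(), empty()

    # Channel 1: agent position.
    agent_ch[agent[0]][agent[1]] = 1

    # Channel 2: visibility (assumed binary: 1 = visible cell).
    for r, c in visible:
        visibility_ch[r][c] = 1

    # Channel 3: distractors are coded 1, the target is coded 2.
    for r, c in distractors:
        object_ch[r][c] = 1
    object_ch[target[0]][target[1]] = 2

    return [agent_ch, visibility_ch, object_ch]


def flatten(obs):
    """Flatten the channels into one vector, as an MLP policy would need."""
    return [v for channel in obs for row in channel for v in row]


obs = build_observation(
    size=4,
    agent=(0, 0),
    visible=[(0, 0), (0, 1), (1, 0), (1, 1)],
    target=(1, 1),
    distractors=[(3, 3)],
)
vector = flatten(obs)
```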
Outputs
Discrete actions (agent + mask)
Hyperparameters
Gamma : 0.9
Entropy : 0.02
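If the model is trained with Stable-Baselines3 (suggested by the `MlpPolicy` name, but not stated outright), the settings above might map onto the `PPO` constructor arguments as follows. Mapping "Entropy" to `ent_coef` is an assumption, and `discrete_ppo_kwargs` is a hypothetical helper. The environment and the curriculum schedule are not shown because the source does not describe them.

```python
def discrete_ppo_kwargs():
    """Collect the listed settings as keyword arguments for PPO (assumed SB3)."""
    return {
        "policy": "MlpPolicy",
        "gamma": 0.9,       # discount factor from the list above
        "ent_coef": 0.02,   # "Entropy", assumed to be the entropy coefficient
        "policy_kwargs": {
            # Separate hidden layers for the policy (pi) and value (vf) networks,
            # in the dict form accepted by recent SB3 versions.
            "net_arch": {"pi": [800, 100, 100], "vf": [800, 100, 100]},
        },
    }


kwargs = discrete_ppo_kwargs()

# Hypothetical usage, assuming stable_baselines3 is installed and `env` exists:
#   from stable_baselines3 import PPO
#   model = PPO(env=env, **kwargs)
```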
Model continuous
Model
CNN
PPO
MlpPolicy
Architecture
CNN
2 convolutional layers
Max pooling
PPO
Policy network (π): [128, 128, 128]
Value function (V): [128, 128, 128]
Inputs
CNN
Environment state
PPO
Perceived target position (CNN)
Continuous agent position
Previous velocity
Mask position
Previous mask action
Outputs
CNN
Target position
PPO
Acceleration (x, y)
Mask shift
Hyperparameters
Gamma : 0.95
Lambda : 0.95
Entropy : 0.05
Clip_range : 0.3
Learning rate : 0.0004
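To make the roles of Gamma and Lambda concrete, here is a minimal sketch of generalized advantage estimation (GAE), which is how PPO implementations commonly use these two values. Reading "Lambda" as the GAE parameter is an assumption, `gae_advantages` is a hypothetical helper, and the rewards and value estimates in the usage lines are made-up numbers for illustration only.

```python
def gae_advantages(rewards, values, last_value, gamma=0.95, lam=0.95):
    """Generalized advantage estimation over a single rollout.

    rewards[t] and values[t] are per-step; last_value bootstraps the step
    after the rollout ends. Episode terminations are ignored for brevity.
    """
    advantages = [0.0] * len(rewards)
    next_value = last_value
    running = 0.0
    for t in reversed(range(len(rewards))):
        # One-step temporal-difference error.
        delta = rewards[t] + gamma * next_value - values[t]
        # Exponentially weighted sum of future TD errors.
        running = delta + gamma * lam * running
        advantages[t] = running
        next_value = values[t]
    return advantages


# Made-up rollout: three steps, reward only at the last step.
adv = gae_advantages(
    rewards=[0.0, 0.0, 1.0],
    values=[0.0, 0.0, 0.0],
    last_value=0.0,
)
```

In this toy rollout with zero value estimates, each step further from the reward has its credit scaled by Gamma times Lambda, so higher values of either spread credit further back in time.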
Model gaze control
Model
CNN
PPO
MlpPolicy
Architecture
CNN
2 convolutional layers
Max pooling
PPO
Policy network (π): [256, 64, 32]
Value function (V): [256, 64, 32]
Inputs
CNN
Environment state
PPO
Perceived target position (CNN)
Discrete gaze position
Perceived mask position
Outputs
CNN
Target position
PPO
Gaze position (x, y)
Perceived mask position
Hyperparameters
Gamma : 0.99
Lambda : 0.99
Entropy : 0.01
Clip_range : 0.2 to 0.1
Learning rate : 1e-4 to 1e-6
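The ranges given for Clip_range and Learning rate suggest values that decay over training. One way to express this, assuming Stable-Baselines3, is a schedule function of the remaining training progress (1.0 at the start, 0.0 at the end), which SB3 accepts for both `learning_rate` and `clip_range`. The linear shape and the `linear_schedule` helper are assumptions; the source gives only the endpoints.

```python
def linear_schedule(start, end):
    """Return f(progress_remaining) that moves linearly from start to end."""
    def schedule(progress_remaining):
        # progress_remaining is 1.0 at the start of training and 0.0 at the end.
        return end + (start - end) * progress_remaining
    return schedule


clip_range = linear_schedule(0.2, 0.1)
learning_rate = linear_schedule(1e-4, 1e-6)

# Hypothetical usage, assuming stable_baselines3 is installed and `env` exists:
#   from stable_baselines3 import PPO
#   model = PPO("MlpPolicy", env, gamma=0.99, gae_lambda=0.99, ent_coef=0.01,
#               clip_range=clip_range, learning_rate=learning_rate)
```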