Command-Line Interface

The command line interface is the central point to train agents and run evaluation rollouts. Just make sure that the training and configuration is properly set in the config-file before running the commands.

Training

python3 cli.py train [OPTIONS]

--algorithm: Specify the algorithm to train. Options are PPO, PPO-MASKED, TD3, DQN, DQN-MASKED, MPS-TD3, and PAM.
--environment: Specify the environment to use. Options are obstacle_avoidance, oil_extraction, and fuel_saving. Add one of -random, -masking, -euclidean, -discretization, -p_discretization for random replacement, continuous masking, euclidean projection, discretization, and parameterized discretization
--hpo: Flag to indicate whether that hyperparameter optimization should be performed.

For example, run the following command with the standard configuration to train PPO with random replacement without dynamic obstacles:

python3 cli.py train --algorithm PPO --environment obstacle_avoidance-random

Keep in mind that the training can take some time (up to 72 hours). PPO is generally faster than the off-policy algorithms.

Alternatively, the already converged models, which were used in the experiments of this thesis, can be used in the evaluation.

Evaluation

python3 cli.py evaluate

For example, run the above command with the standard configuration to reproduce the results with the pretrained MPS-TD3 agent and complex restrictions (14 obstacles):

Afterward, you can load the step- and episode-results files in the evaluation notebook to visualize the outcomes.