Dynamic interval restrictions on action spaces
in deep reinforcement learning for obstacle avoidance

A full documentation can be found here.

Videos of the evaluation rollouts are uploaded to YouTube

Abstract

Deep reinforcement learning algorithms typically act on the same set of actions. However, this is not sufficient for a wide range of real-world applications where different subsets are available at each step. In this thesis, we consider the problem of interval restrictions as they occur in pathfinding with dynamic obstacles. When actions that lead to collisions are avoided, the continuous action space is split into variable parts. Recent research learns with strong assumptions on the number of intervals, is limited to convex subsets, and the available actions are learned from the observations. Therefore, we propose two approaches that are independent of the state of the environment by extending parameterized reinforcement learning and ConstraintNet to handle an arbitrary number of intervals. We demonstrate their performance in an obstacle avoidance task and compare the methods to penalties, projection, replacement, as well as discrete and continuous masking from the literature. The results suggest that discrete masking of action-values is the only effective method when constraints did not emerge during training. When restrictions are learned, the decision between projection, masking, and our ConstraintNet modification seems to depend on the task at hand. We compare the results with varying complexity and give directions for future work.

Quickstart

Adhere to the following steps:

Start by cloning this GitHub repository

$  git clone https://github.com/timg339/master_thesis.git
$  cd master_thesis

Install the dependencies

$  pip install -r requirements.txt

Optionally train an agent or skip this step and use the provided converged algorithms

$ python3 cli.py train --environment obstacle_avoidance-random --algorithm PPO

Evaluate the approach without exploration

$ python3 cli.py evaluate

Explore the results in the corresponding notebook

$ jupyter notebook

License

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

Contents:

Indices and tables