Learnable Reward Functions for Reinforcement Learning
Aug 9, 2022
This project started with the goal of improving learning speed for long-horizon, sparse-reward tasks in Reinforcement Learning (RL). The approach consists of learning a smooth reward function from a sparse one, effectively increasing the amount of feedback the RL agent receives during learning. The goal was to improve learning stability and convergence speed while keeping the reward structure coherent with the original task.

The learnable reward function is modeled by a neural network. The training loop trains a REINFORCE agent and the reward function simultaneously: gradients from the agent's loss backpropagate through the reward network, providing the feedback needed for learning. Since feedback is sparse at first, we use curriculum learning to gradually increase task difficulty as the agent improves.

After testing on the CartPole environment, the agent was able to learn a meaningful reward function, as shown in the image below.

*Heatmap of learnt reward on CartPole environment.*
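To make the training loop concrete, here is a minimal sketch of the idea in NumPy. It assumes a toy 1D chain environment with a single sparse goal reward and linear models standing in for the neural networks; the chain environment, the linear reward and policy models, and all hyperparameters are illustrative assumptions, not the project's actual setup (which used a neural network reward model and CartPole). The reward model is fit to densify the sparse signal, and the REINFORCE update then uses the learned reward instead of the sparse one:

```python
import numpy as np

# Toy sketch (assumptions: 1D chain env, linear models; the project itself
# used a neural network reward function and the CartPole environment).
rng = np.random.default_rng(0)

N = 6             # chain length; sparse reward only upon reaching state N-1
GAMMA = 0.99

def sparse_reward(s):
    return 1.0 if s == N - 1 else 0.0

def features(s):
    x = np.zeros(N)
    x[s] = 1.0    # one-hot state features
    return x

# Learned reward model: linear in state features (stands in for the NN).
w_reward = np.zeros(N)
def learned_reward(s):
    return w_reward @ features(s)

# Policy: softmax over 2 actions (left/right) with linear logits.
theta = np.zeros((2, N))
def policy(s):
    logits = theta @ features(s)
    p = np.exp(logits - logits.max())
    return p / p.sum()

alpha_pi, alpha_r = 0.1, 0.05
for episode in range(200):
    s, traj = 0, []
    for t in range(2 * N):
        p = policy(s)
        a = int(rng.choice(2, p=p))
        s2 = min(max(s + (1 if a == 1 else -1), 0), N - 1)
        traj.append((s, a, sparse_reward(s2)))
        s = s2
        if sparse_reward(s) > 0:
            break  # episode ends once the goal is reached

    # 1) Fit the reward model: regress learned_reward(s) toward the
    #    discounted sparse return-to-go, densifying the feedback.
    G = 0.0
    for (st, at, r) in reversed(traj):
        G = r + GAMMA * G
        w_reward += alpha_r * (G - learned_reward(st)) * features(st)

    # 2) REINFORCE update using the *learned* (dense) reward as the signal.
    G = 0.0
    for (st, at, r) in reversed(traj):
        G = learned_reward(st) + GAMMA * G
        p = policy(st)
        grad_logp = -p[:, None] * features(st)[None, :]
        grad_logp[at] += features(st)
        theta += alpha_pi * G * grad_logp

# The learned reward should increase toward the goal end of the chain,
# and the policy should come to prefer moving right at the start state.
print("learned reward per state:", np.round(w_reward, 3))
print("P(right | start):", policy(0)[1])
```

In this sketch the reward model is fit by regression on sparse returns rather than end-to-end through the policy loss; the end-to-end variant the post describes would instead let the policy-gradient update flow through the reward network's parameters.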