Current Setup:
DQN and Q-learning are already implemented with discrete control actions
To be introduced:
Policy Gradient Algorithm with Continuous Control Actions
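As a rough illustration of what a policy gradient update over continuous actions can look like, here is a minimal REINFORCE-style sketch with a linear-Gaussian policy. The class name `GaussianPolicy`, its parameters (`sigma`, `lr`), and the one-dimensional state and action are all assumptions made for the example; none of this is taken from the f1rl code.

```python
import math
import random


class GaussianPolicy:
    """Hypothetical linear-Gaussian policy: action ~ N(w * state + b, sigma^2)."""

    def __init__(self, sigma=0.5, lr=0.05):
        self.w = 0.0          # weight on the (scalar) state
        self.b = 0.0          # bias of the action mean
        self.sigma = sigma    # fixed exploration noise (assumption)
        self.lr = lr          # learning rate (assumption)

    def mean(self, state):
        return self.w * state + self.b

    def sample(self, state, rng=random):
        # Draw a continuous action instead of picking a discrete index.
        return rng.gauss(self.mean(state), self.sigma)

    def log_prob(self, state, action):
        # Log density of a normal distribution at `action`.
        z = (action - self.mean(state)) / self.sigma
        return -0.5 * z * z - math.log(self.sigma * math.sqrt(2.0 * math.pi))

    def grad_log_prob(self, state, action):
        # d/dw log pi = (a - mu) / sigma^2 * state
        # d/db log pi = (a - mu) / sigma^2
        scale = (action - self.mean(state)) / self.sigma ** 2
        return scale * state, scale

    def update(self, episode, baseline=0.0):
        # episode: list of (state, action, return) tuples.
        # Gradient ascent on (return - baseline) * grad log pi(action | state).
        for state, action, ret in episode:
            grad_w, grad_b = self.grad_log_prob(state, action)
            advantage = ret - baseline
            self.w += self.lr * advantage * grad_w
            self.b += self.lr * advantage * grad_b
```

The key difference from DQN in this sketch is that the policy outputs a distribution over real-valued actions and is updated directly from sampled returns, rather than taking an argmax over a fixed set of Q-values.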
Major Changes:
- Add a continuous control environment, derived from the DQN environment by changing the action space
- Add a brain to the f1rl module to execute the actions
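
One possible way to derive a continuous control environment from a discrete one is sketched below. `DiscreteSteeringEnv` is only a stand-in for the existing DQN environment, and `ContinuousSteeringEnv`, `_apply`, and the steering bounds are hypothetical names and values chosen for the example; the real f1rl environment will differ.

```python
class DiscreteSteeringEnv:
    """Stand-in for the existing DQN environment (hypothetical, not f1rl's API)."""

    ACTIONS = (-1.0, 0.0, 1.0)  # steer left, go straight, steer right

    def __init__(self):
        self.position = 0.0

    def reset(self):
        self.position = 0.0
        return self.position

    def _apply(self, steering):
        # Shared dynamics: both action spaces end up as one steering value.
        self.position += 0.1 * steering
        reward = -abs(self.position)      # toy reward: stay near the centre
        done = abs(self.position) > 1.0   # toy termination condition
        return self.position, reward, done

    def step(self, action_index):
        # Discrete control: the agent passes an index into ACTIONS.
        return self._apply(self.ACTIONS[action_index])


class ContinuousSteeringEnv(DiscreteSteeringEnv):
    """Same dynamics, but the action is a real-valued steering command."""

    ACTION_LOW, ACTION_HIGH = -1.0, 1.0  # assumed bounds

    def step(self, steering):
        # Continuous control: clip the command to the valid range, then reuse
        # the parent's dynamics unchanged.
        steering = max(self.ACTION_LOW, min(self.ACTION_HIGH, float(steering)))
        return self._apply(steering)
```

In this sketch only `step` is overridden, so observations, rewards, and termination stay identical between the discrete and continuous variants.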