A reinforcement learning project to learn the fundamentals and underlying algorithms behind my favourite form of AI.
- refactor the Q-Learning script into a MonoBehaviour to enable simultaneous agents.
- update the README.
- add an arrow visualisation to show the preferred direction at each state.
This took me a long time to understand, and it still might be wrong, but here is my understanding:
Q = Q-value, the quality score.
Q(s,a) = the Q-value of a state-action pair.
s = current state. In this project, position on the grid.
a = action. One of the possible actions an agent can take in a state.
α = alpha. The learning rate: how strongly each new experience updates the Q-value, from 0 (no learning) to 1 (new information fully replaces the old estimate).
r = reward. The reward for an action in the current state.
γ = gamma. The discount factor. How much does the agent value future rewards (near 1) over immediate rewards (near 0)? Or, the battle between watching YT shorts about RL vs. actually finishing this project.
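Putting those symbols together, the Q-learning update rule is: Q(s,a) ← Q(s,a) + α·[r + γ·max Q(s',a') − Q(s,a)], where s' is the state the action leads to. Below is a minimal sketch in Python (the project itself is Unity/C#); the 1-D grid, reward placement, and hyperparameter values are all made up for illustration:

```python
import random

random.seed(0)

# Hypothetical 1-D grid world: states 0..4, reward only at the rightmost state.
# Actions: 0 = move left, 1 = move right.
N_STATES, N_ACTIONS = 5, 2
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.2  # learning rate, discount factor, exploration rate

# Q-table: one row per state, one Q-value per action.
Q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]

def step(s, a):
    """Move on the grid, clamped to the edges; reward 1 for reaching the last state."""
    s2 = max(0, min(N_STATES - 1, s + (1 if a == 1 else -1)))
    return s2, (1.0 if s2 == N_STATES - 1 else 0.0)

for _ in range(500):  # episodes
    s = 0
    while s != N_STATES - 1:
        # Epsilon-greedy: explore sometimes, otherwise pick the best-known action.
        if random.random() < EPSILON:
            a = random.randrange(N_ACTIONS)
        else:
            a = max(range(N_ACTIONS), key=lambda x: Q[s][x])
        s2, r = step(s, a)
        # The Q-learning update: nudge Q(s,a) toward r + gamma * max Q(s', a').
        Q[s][a] += ALPHA * (r + GAMMA * max(Q[s2]) - Q[s][a])
        s = s2

# After training, "right" should be the preferred action in every non-terminal state,
# which is exactly what the planned arrow visualisation would show.
print(all(Q[s][1] > Q[s][0] for s in range(N_STATES - 1)))
```

The per-state preferred action (the argmax over each row of the Q-table) is what the arrow visualisation in the TODO list would render: one arrow per grid cell pointing toward the action with the highest Q-value.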