Reinforcement Learning
Learning from interaction with the environment
- Value function
- Q-learning: learns Q-values associated with state-action pairs
Q(s,a) = Q(s,a) + α [r + γ max_a’ Q(s’,a’) – Q(s,a)]
where
- α: learning rate
- r: reward achieved by executing a in s
- γ: discount factor
- max_a’ Q(s’,a’): maximum Q over the actions a’ available in the next state s’
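
The update rule above can be sketched in a few lines of Python. This is only a minimal illustration, assuming a tabular setting where Q is stored in a dictionary keyed by (state, action) and unseen pairs default to 0. The function name q_update, the state and action labels, and the values chosen for alpha (learning rate) and gamma (discount factor) are hypothetical and do not come from the notes.

```python
from collections import defaultdict


def q_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.9):
    """Apply one Q-learning update to Q[(s, a)] in place and return the new value."""
    # Maximum Q over all actions in the next state (unseen pairs count as 0.0).
    best_next = max(Q[(s_next, a_next)] for a_next in actions)
    # Bracketed term of the update rule: target minus current estimate.
    td_error = r + gamma * best_next - Q[(s, a)]
    # Move the current estimate a fraction alpha toward the target.
    Q[(s, a)] += alpha * td_error
    return Q[(s, a)]


# Hypothetical two-action example: one transition s0 --right--> s1 with reward 1.
Q = defaultdict(float)
actions = ["left", "right"]
Q[("s1", "right")] = 2.0  # pretend this value was learned earlier

new_value = q_update(Q, "s0", "right", r=1.0, s_next="s1",
                     actions=actions, alpha=0.5, gamma=0.9)
print(new_value)
```

Here the target is r + gamma * 2.0 and the old estimate is 0, so the new Q(s0, right) ends up at half of that target because alpha is 0.5. The sketch does not treat terminal states specially; in an episodic task one would typically use r alone as the target when s_next is terminal.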