Reinforcement Learning

Learning from interaction with the environment

Value function
- Q-learning: Q-values are associated with state-action pairs

Q(s,a) ← Q(s,a) + α [r + γ max_a' Q(s',a') - Q(s,a)]

α: learning rate
r: reward achieved by executing a in s
γ: discount factor
max_a' Q(s',a'): maximum Q over actions a' in the next state s'
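The update rule above can be sketched in a few lines of Python. This is a minimal tabular illustration, not a full agent. The function name q_update, the dictionary-based table, the state and action labels, and the default values for the learning rate (alpha) and discount factor (gamma) are all assumptions made for the example.

```python
from collections import defaultdict


def q_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.9):
    """Apply one Q-learning update to the table Q in place; return the new Q(s, a)."""
    # Maximum Q over the actions available in the next state.
    best_next = max(Q[(s_next, a_next)] for a_next in actions)
    # Move Q(s, a) a fraction alpha toward the target r + gamma * best_next.
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
    return Q[(s, a)]


# Hypothetical two-action problem; unseen state-action pairs default to 0.0.
Q = defaultdict(float)
actions = ["left", "right"]
Q[("s1", "right")] = 1.0

new_value = q_update(Q, "s0", "right", r=0.5, s_next="s1",
                     actions=actions, alpha=0.5, gamma=0.9)
```

With these illustrative numbers the target is 0.5 + 0.9 * 1.0 = 1.4, so Q(s0, right) moves halfway from 0 toward 1.4, to roughly 0.7.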
