PSet 13: Q-Learning

(videos: Q-Learning Example)

Q-learning update rule:

$$
Q^{\pi}(s,a) \leftarrow Q^{\pi}(s,a) + \alpha \left[ R(s,a,s') + \gamma \max_{a'} Q^{\pi}(s',a') - Q^{\pi}(s,a) \right]
$$

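
To check hand calculations, the update rule can be sketched as a small Python function. This is a minimal sketch and not the method used in the video: the function name `q_update`, the dictionary-based Q-table, the action labels, and the values of `alpha`, `gamma`, and the reward are all assumptions chosen for illustration. Substitute the values from the video when working the problems.

```python
from collections import defaultdict

def q_update(Q, s, a, r, s_next, actions, alpha, gamma, terminal=False):
    """Apply one Q-learning update to Q[(s, a)] in place and return the new value."""
    # Best Q-value available from the state actually reached (0 if the trial ended there).
    best_next = 0.0 if terminal else max(Q[(s_next, a2)] for a2 in actions)
    # Bracketed term of the update rule: target minus current estimate.
    td_error = r + gamma * best_next - Q[(s, a)]
    Q[(s, a)] += alpha * td_error
    return Q[(s, a)]

ACTIONS = ["N", "E", "S", "W"]   # assumed action labels
Q = defaultdict(float)           # every Q-value starts at 0

# Hypothetical transition: from <1,1>, action N, reward -0.04, landing in <1,2>.
# alpha=0.5 and gamma=0.9 are illustrative assumptions, not values from the video.
first = q_update(Q, s=(1, 1), a="N", r=-0.04, s_next=(1, 2),
                 actions=ACTIONS, alpha=0.5, gamma=0.9)

# Suppose a later trial has made one action in <1,2> look valuable.
Q[((1, 2), "E")] = 1.0
second = q_update(Q, s=(1, 1), a="N", r=0.0, s_next=(1, 2),
                  actions=ACTIONS, alpha=0.5, gamma=0.9)
```

Note that the entry updated is always the one for the state you were in and the action you selected, while `s_next` is the state you actually ended up in.
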
  1. Pick up where I left off in the Q-learning video. Do two more trials, one where you go N,N,E,E,S,E, and one where you go N,N,E,E,E. Show your work.

  2. Given your results above, suppose you are in state <3,2> and choose the action East, but accidentally move North. Which Q-value would be affected, and how would it change?