PSet 11: Value Iteration

(videos: up to Iteration Example)

  1. In the video, we ran value iteration on the grid world for two iterations. Repeat for two more iterations. Draw a new copy of the grid for each iteration so I can see what's going on. (Note: Dr. Crabbe said something untrue in the video: he said the value of the square to the left of the gems (0.8) never changes. It does change in the next iteration.) Use the same constants as in the video: $\gamma = 0.9$, and the action outcome distribution is $\langle 0.1, 0.8, 0.1 \rangle$.

    Use the $Q_*(s,a)$ equation to update the values:

    $$V_*(s) = \max_a Q_*(s,a)$$

    $$Q_*(s,a) = \sum_{s'} T(s,a,s')\,\left[R(s,a,s') + \gamma V_*(s')\right]$$
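
If you want to check your hand calculations, the two equations above can be sketched in Python roughly as follows. This is a minimal sketch, not the grid from the video: the 1x3 corridor, the names `sweep`, `q_value`, `move`, and `entry_reward`, and the convention that the reward is paid on entering a terminal cell are all assumptions made for illustration. It also assumes that $\langle 0.1, 0.8, 0.1 \rangle$ means "slip 90 degrees left, go as intended, slip 90 degrees right" and that bumping into an edge leaves the agent where it is.

```python
GAMMA = 0.9
# Assumed reading of <0.1, 0.8, 0.1>: slip left, go as intended, slip right.
OUTCOME_PROBS = (0.1, 0.8, 0.1)
# Actions as (row, col) offsets in clockwise order: up, right, down, left.
ACTIONS = [(-1, 0), (0, 1), (1, 0), (0, -1)]


def move(state, delta, n_rows, n_cols):
    """Return the cell reached by one step; stay put when the step leaves the grid."""
    r, c = state[0] + delta[0], state[1] + delta[1]
    return (r, c) if 0 <= r < n_rows and 0 <= c < n_cols else state


def q_value(V, state, a, entry_reward, n_rows, n_cols):
    """Q(s,a) = sum over s' of T(s,a,s') * [R(s,a,s') + gamma * V(s')]."""
    total = 0.0
    for offset, prob in zip((-1, 0, 1), OUTCOME_PROBS):
        actual = ACTIONS[(a + offset) % 4]      # direction actually taken
        nxt = move(state, actual, n_rows, n_cols)
        reward = entry_reward.get(nxt, 0.0)     # assumed: reward paid on entry
        total += prob * (reward + GAMMA * V[nxt])
    return total


def sweep(V, entry_reward, n_rows, n_cols):
    """One synchronous sweep: V_new(s) = max_a Q(s,a); terminal cells stay at 0."""
    new_V = {}
    for s in V:
        if s in entry_reward:
            new_V[s] = 0.0
        else:
            new_V[s] = max(
                q_value(V, s, a, entry_reward, n_rows, n_cols) for a in range(4)
            )
    return new_V


# Hypothetical 1x3 corridor: entering the rightmost cell pays +1 and ends the episode.
n_rows, n_cols = 1, 3
entry_reward = {(0, 2): 1.0}
V0 = {(0, c): 0.0 for c in range(n_cols)}
V1 = sweep(V0, entry_reward, n_rows, n_cols)
V2 = sweep(V1, entry_reward, n_rows, n_cols)
print(V1)
print(V2)
```

Each sweep reads only the previous iteration's values (`V`) and writes into a fresh dictionary, which matches drawing a new copy of the grid per iteration. In this toy corridor the cell next to the reward is 0.8 after the first sweep and rises on the second, because the two slip outcomes bounce off the edges and return to a cell that now has a nonzero value.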