PSet 11: Value Iteration
(videos: up to Iteration Example)
In the video, we ran value iteration on the grid world for two iterations. Repeat for two more iterations. Draw a new copy of the grid for each iteration so I can see what's going on. (Note: Dr. Crabbe said something untrue in the video: he said the value of the square to the left of the gems (0.8) never changes. It does change in the next iteration.) Use the same values for the constants as in the video; the action outcome distribution is ⟨0.1, 0.8, 0.1⟩.
Use the equation to update the values:
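As a rough sketch of what one update sweep might look like in code, assume the common backup form V_{k+1}(s) = max_a Σ_{s'} P(s' | s, a) [R(s') + γ V_k(s')]. The grid layout, the rewards, and GAMMA = 0.9 below are placeholders chosen for illustration, not the values from the video, and the video's equation may be written differently. The only value taken from this problem set is the outcome distribution ⟨0.1, 0.8, 0.1⟩, read here as veer left, go as intended, veer right.

```python
# Hypothetical setup: layout, rewards, and GAMMA are assumptions, not the video's values.
GAMMA = 0.9                                # assumed discount factor
OUTCOMES = (0.1, 0.8, 0.1)                 # P(veer left), P(intended), P(veer right)
ROWS, COLS = 3, 4                          # assumed grid size
REWARDS = {(0, 3): 1.0, (1, 3): -1.0}      # assumed reward for entering gems / pit
TERMINALS = set(REWARDS)                   # terminal squares keep value 0
WALLS = {(1, 1)}                           # assumed blocked square

# Clockwise order, so index -1 / +1 gives the veer-left / veer-right direction.
DIRS = [(-1, 0), (0, 1), (1, 0), (0, -1)]  # up, right, down, left


def move(state, d):
    """Square reached by moving in direction d; stay put if blocked."""
    r, c = state[0] + d[0], state[1] + d[1]
    if 0 <= r < ROWS and 0 <= c < COLS and (r, c) not in WALLS:
        return (r, c)
    return state


def backup(values, state):
    """One Bellman backup: best expected value over the four actions."""
    best = float("-inf")
    for i in range(4):
        directions = (DIRS[(i - 1) % 4], DIRS[i], DIRS[(i + 1) % 4])
        expected = 0.0
        for p, d in zip(OUTCOMES, directions):
            nxt = move(state, d)
            expected += p * (REWARDS.get(nxt, 0.0) + GAMMA * values[nxt])
        best = max(best, expected)
    return best


def iterate(values):
    """Synchronous sweep: every update reads the previous iteration's values."""
    new = dict(values)
    for s in values:
        if s not in TERMINALS:
            new[s] = backup(values, s)
    return new


def initial_values():
    """All squares start at 0 (walls are left out of the table)."""
    return {(r, c): 0.0
            for r in range(ROWS) for c in range(COLS)
            if (r, c) not in WALLS}
```

Calling `iterate(initial_values())` repeatedly gives one table of values per iteration, which is the same bookkeeping as drawing a fresh grid each time. Under these placeholder values, the square to the left of the gems goes from 0.8 after the first sweep to 0.872 after the second, because the veer that bumps the agent into the boundary leaves it on a square that now has nonzero value.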