Related to Value Based Methods.

It’s a Model-Free Policy Evaluation method together with TD Learning.

Monte Carlo uses an entire episode of experience before learning.

  • this means we can only apply it to episodic MDPs.

So basically it takes averages of actual returns over episodes?