Related to Value Based Methods.
It’s a Model-Free Policy Evaluation method together with TD Learning.
Monte Carlo uses an entire episode of experience before learning.
- this means we can only apply it to episodic MDPs.
So basically it takes averages of actual returns over episodes?