🚀 Costin Chitic

Monte Carlo Learning

Jun 27, 2026

Related to Value Based Methods.

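Like TD Learning, it is a model-free policy evaluation method: it estimates the value function of a given policy from sampled experience, without a model of the environment's transitions or rewards.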

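Monte Carlo needs an entire episode of experience before it can learn: the update target for each visited state is the complete return observed from that state until the episode ends.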

This means it can only be applied to episodic MDPs, i.e. tasks in which every episode terminates.
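To see why the episode has to finish first, here is a minimal sketch that computes the discounted return $G_t = R_{t+1} + \gamma G_{t+1}$ for every step of one completed episode. The function name `discounted_returns` and the convention that `rewards[t]` holds $R_{t+1}$ are assumptions made for illustration. The recursion starts from the terminal step, where the return is 0, so no $G_t$ is available until the last reward is known.

```python
from typing import List

def discounted_returns(rewards: List[float], gamma: float) -> List[float]:
    """Return G_t for every time step t of one finished episode.

    Assumes rewards[t] holds R_{t+1}, the reward received after step t.
    G_t = R_{t+1} + gamma * G_{t+1}, and the return after the terminal
    step is 0, so the loop has to run backwards from the end.
    """
    returns = [0.0] * len(rewards)
    g = 0.0
    for t in reversed(range(len(rewards))):
        g = rewards[t] + gamma * g
        returns[t] = g
    return returns

print(discounted_returns([1.0, 0.0, 2.0], gamma=0.5))  # [1.5, 1.0, 2.0]
```

An online method cannot compute these values mid-episode, because every $G_t$ depends on the final reward.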

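$$
V(S_t) \leftarrow V(S_t) + \alpha \left[ G_t - V(S_t) \right]
$$

where $G_t = R_{t+1} + \gamma R_{t+2} + \dots + \gamma^{T-t-1} R_T$ is the actual discounted return from time $t$ until the episode terminates at time $T$, and $\alpha$ is the step size.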
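The update rule can be sketched as every-visit, constant-$\alpha$ Monte Carlo prediction. This is a minimal illustration, not a definitive implementation. The note does not say whether it means first-visit or every-visit MC; every-visit is an assumption here. The environment is also an assumption: the 5-state random walk from Sutton & Barto (Example 6.2), evaluated under a uniform random policy. Under that policy the true values of states 1 to 5 are $1/6, 2/6, \dots, 5/6$. The names `run_episode` and `mc_prediction` are hypothetical.

```python
import random
from typing import Dict, List, Tuple

# Hypothetical example environment: the 5-state random walk (Sutton & Barto,
# Example 6.2). Non-terminal states are 1..5, states 0 and 6 are terminal,
# every episode starts in state 3, and the only nonzero reward is +1 for
# stepping off the right end.
N_STATES = 5
START = 3

def run_episode(rng: random.Random) -> List[Tuple[int, float]]:
    """Generate one complete episode under a uniform random policy.

    Returns (S_t, R_{t+1}) pairs: each state paired with the reward
    received on leaving it.
    """
    s = START
    trajectory: List[Tuple[int, float]] = []
    while 1 <= s <= N_STATES:
        s_next = s + rng.choice((-1, 1))
        r = 1.0 if s_next == N_STATES + 1 else 0.0
        trajectory.append((s, r))
        s = s_next
    return trajectory

def mc_prediction(num_episodes: int, alpha: float, gamma: float = 1.0,
                  seed: int = 0) -> Dict[int, float]:
    """Every-visit, constant-alpha Monte Carlo policy evaluation."""
    rng = random.Random(seed)
    V = {s: 0.5 for s in range(1, N_STATES + 1)}
    for _ in range(num_episodes):
        episode = run_episode(rng)  # the episode must finish before any update
        g = 0.0
        # Sweep backwards so that G_t = R_{t+1} + gamma * G_{t+1} is known at each step.
        for s, r in reversed(episode):
            g = r + gamma * g
            V[s] += alpha * (g - V[s])  # V(S_t) <- V(S_t) + alpha [G_t - V(S_t)]
    return V

V = mc_prediction(num_episodes=20_000, alpha=0.002)
print({s: round(v, 2) for s, v in V.items()})
```

With a small constant $\alpha$ the estimates should settle near the true values but keep fluctuating around them. A larger $\alpha$ speeds up learning at the cost of noisier estimates.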

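In other words, it estimates $V(s)$ by averaging the actual returns observed after visits to $s$ across many episodes. With a step size of $\alpha = 1/N(S_t)$, where $N(S_t)$ counts the visits to $S_t$ so far, the update computes exactly the sample mean of those returns. A constant $\alpha$ instead gives a recency-weighted average, which can track a changing policy or environment.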
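The link between the update rule and plain averaging can be checked numerically. In the sketch below, the helper names `incremental_mean` and `constant_alpha` and the sample returns are hypothetical. The first helper applies the update with $\alpha = 1/N$ and reproduces the ordinary mean. The second applies it with a fixed $\alpha$, starting from an initial estimate $v_0$, so recent returns get more weight.

```python
from typing import List

def incremental_mean(returns: List[float]) -> float:
    """Apply V <- V + (1/N)(G - V) once per observed return G."""
    v = 0.0
    for n, g in enumerate(returns, start=1):
        v += (g - v) / n
    return v

def constant_alpha(returns: List[float], alpha: float, v0: float = 0.0) -> float:
    """Apply V <- V + alpha (G - V) with a fixed step size."""
    v = v0
    for g in returns:
        v += alpha * (g - v)
    return v

observed = [4.0, 2.0, 6.0, 0.0]
print(incremental_mean(observed), sum(observed) / len(observed))  # 3.0 3.0
print(constant_alpha(observed, alpha=0.5))  # 2.0
```

The constant-$\alpha$ result (2.0) is pulled toward the most recent return (0.0) and still depends on $v_0$, whereas $\alpha = 1/N$ weights every return equally.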

Created with Quartz v4.2.3 © 2026