This course covered online learning and reinforcement learning algorithms. I implemented several of them: Hedge for the full-information experts setting, EXP3 for adversarial multi-armed bandits, and, on the reinforcement learning side, value iteration, policy evaluation, and temporal difference methods.
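
To give a flavor of the online learning side, here is a minimal sketch of Hedge (exponential weights) in the full-information setting, where the loss of every expert is revealed each round. It is an illustration rather than the course's actual code: the function name `hedge`, the learning rate parameter `eta`, and the list-of-loss-vectors input format are all assumptions, and losses are assumed to lie in [0, 1].

```python
import math


def hedge(loss_rounds, eta):
    """Run Hedge (exponential weights) over a sequence of loss vectors.

    loss_rounds: list of rounds; each round is a list with one loss in [0, 1]
                 per expert (hypothetical input format, for illustration).
    eta:         learning rate, assumed > 0.

    Returns (final distribution over experts, cumulative expected loss).
    """
    n_experts = len(loss_rounds[0])
    weights = [1.0] * n_experts  # start from uniform weights
    total_expected_loss = 0.0

    for losses in loss_rounds:
        # Play the distribution proportional to the current weights.
        z = sum(weights)
        probs = [w / z for w in weights]
        total_expected_loss += sum(p * l for p, l in zip(probs, losses))

        # Full information: every expert's loss is observed, so every
        # weight is updated multiplicatively.
        weights = [w * math.exp(-eta * l) for w, l in zip(weights, losses)]

    z = sum(weights)
    return [w / z for w in weights], total_expected_loss


# Usage: expert 0 never incurs loss, expert 1 always does.
final_probs, expected_loss = hedge([[0.0, 1.0]] * 10, eta=0.5)
```

In this toy run the distribution concentrates on the expert that never incurs loss. EXP3 follows the same exponential-weights template, but because only the chosen arm's loss is observed, it updates with an importance-weighted loss estimate instead.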