
    Mathematical Foundations of Curiosity in Reinforcement Learning

reinforcement learning · k-armed bandit · thompson sampling · bayesian inference · exploration vs exploitation
    May 6, 2026

    This document explores the mathematical foundation of curiosity in artificial intelligence using the K-armed bandit problem. The author argues that curiosity is not a separate exploration heuristic but rather the consequence of optimal decision-making under uncertainty.

Key Concepts:

    • The K-armed bandit problem involves balancing exploitation (choosing the arm currently believed best) with exploration (sampling uncertain arms).
    • Bayesian inference maintains a posterior distribution over each arm's expected mean; the distribution narrows as more data is collected.
    • The author defines curiosity as the behavior that arises when agents act optimally on uncertain beliefs.
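The posterior-narrowing idea above can be sketched concretely. The document does not specify the reward model, so this assumes Bernoulli (0/1) rewards with a conjugate Beta prior, where the update is a simple parameter increment and the posterior's spread shrinks as pulls accumulate:

```python
import numpy as np

# Conjugate Beta-Bernoulli update for a single arm: the posterior
# Beta(alpha, beta) over the arm's mean narrows as 0/1 rewards arrive.
def update_posterior(alpha, beta, reward):
    """Return updated Beta parameters after observing a 0/1 reward."""
    return alpha + reward, beta + (1 - reward)

def posterior_std(alpha, beta):
    """Standard deviation of Beta(alpha, beta): shrinks with more data."""
    n = alpha + beta
    return np.sqrt(alpha * beta / (n ** 2 * (n + 1)))

rng = np.random.default_rng(0)
alpha, beta = 1.0, 1.0              # uniform prior over the arm's mean
wide = posterior_std(alpha, beta)   # spread before any data
for _ in range(100):                # 100 pulls of an arm with true p = 0.7
    alpha, beta = update_posterior(alpha, beta, rng.binomial(1, 0.7))
narrow = posterior_std(alpha, beta) # spread after 100 observations
```

After 100 pulls, `narrow` is far smaller than `wide`: the belief about the arm's mean has tightened, which is exactly the narrowing the summary describes.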

    Algorithms and Frameworks:

    1. Thompson Sampling: A decision rule where the agent draws one sample from each arm's posterior and plays the arm whose sampled value is highest.
    2. Bayesian Bandit: Iteratively updates posterior beliefs from observed rewards, so arms with wider posteriors (greater uncertainty) are sampled more often.
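The two ideas above combine into one loop. This is a minimal sketch, not the author's implementation: it assumes Bernoulli arms with Beta(1, 1) priors, samples each posterior, plays the argmax, and updates the chosen arm's belief:

```python
import numpy as np

def thompson_sampling(true_means, n_steps=2000, seed=0):
    """Thompson Sampling on a Bernoulli K-armed bandit with Beta(1,1) priors.
    Returns how often each arm was pulled."""
    rng = np.random.default_rng(seed)
    k = len(true_means)
    alpha = np.ones(k)              # prior successes + 1, per arm
    beta = np.ones(k)               # prior failures + 1, per arm
    pulls = np.zeros(k, dtype=int)
    for _ in range(n_steps):
        # Draw one sample from each arm's posterior; play the argmax.
        samples = rng.beta(alpha, beta)
        arm = int(np.argmax(samples))
        reward = rng.binomial(1, true_means[arm])
        # Conjugate update of the played arm's posterior.
        alpha[arm] += reward
        beta[arm] += 1 - reward
        pulls[arm] += 1
    return pulls

pulls = thompson_sampling([0.2, 0.5, 0.8])
```

Because uncertain arms occasionally produce large posterior samples, they still get played; as beliefs narrow, play concentrates on the genuinely best arm, with no explicit exploration parameter needed.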

    Comparison: The author benchmarks the Bayesian approach against standard methods from Sutton & Barto (2018), including UCB (Upper Confidence Bound), Optimistic Greedy, Gradient Bandit, and Epsilon-Greedy. UCB performed best at 1.5490, with the Bayesian method following closely at 1.5372.
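For contrast with the sampling approach, the UCB baseline can be sketched as well. This follows the Sutton & Barto (2018) form of the selection rule, Q(a) + c·sqrt(ln t / N(a)); the constant c = 2 and the example numbers are illustrative choices, not taken from the author's benchmark:

```python
import numpy as np

def ucb_select(q_values, counts, t, c=2.0):
    """UCB action selection: argmax of Q(a) + c * sqrt(ln t / N(a)).
    Arms never tried are selected first (infinite bonus)."""
    q_values = np.asarray(q_values, dtype=float)
    counts = np.asarray(counts, dtype=float)
    untried = np.where(counts == 0)[0]
    if untried.size > 0:
        return int(untried[0])
    bonus = c * np.sqrt(np.log(t) / counts)  # uncertainty bonus per arm
    return int(np.argmax(q_values + bonus))

# An arm with a slightly lower estimate but far fewer pulls can win,
# because its uncertainty bonus outweighs the small gap in estimates:
arm = ucb_select(q_values=[0.60, 0.55], counts=[100, 2], t=102)
# → 1
```

Where Thompson Sampling explores via random draws from the posterior, UCB explores via a deterministic bonus for under-sampled arms; the benchmark numbers above show the two behave very similarly on this problem.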

Entities and Tools:

    • Author: Francesco Sacco.
    • Advisor/Reviewer: Matteo Peluso.
    • Referenced Works: W.R. Thompson (1933), Sutton & Barto (Reinforcement Learning textbook), and Russo et al. (tutorial on Thompson Sampling).
    • Technologies: Python, NumPy, GitHub, and interactive web-based data visualization.

    Core Conclusion: Curiosity is what optimal action looks like when beliefs are uncertain. By framing curiosity as probability matching, the author suggests this approach extends beyond bandits to complex environments like chess engines, where curiosity determines which branches of a game tree require further analysis.
