This document explores the mathematical foundation of curiosity in artificial intelligence using the K-armed bandit problem. The author argues that curiosity is not a separate exploration heuristic but rather the consequence of optimal decision-making under uncertainty.
Key Concepts:
• The K-armed bandit problem involves balancing exploitation (choosing the arm currently believed best) with exploration (sampling uncertain arms).
• Bayesian inference maintains a posterior distribution over each arm's expected reward; the posterior narrows as more data is collected.
• The author defines curiosity as the behavior that emerges when an agent acts optimally on uncertain beliefs.
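The Bayesian updating described above can be sketched with a conjugate Normal model. This is a minimal illustration, not the author's actual implementation: it assumes Gaussian rewards with a known noise variance, and all parameter values are hypothetical.

```python
import numpy as np

def posterior_update(mu, var, reward, noise_var=1.0):
    """Conjugate Normal update of one arm's belief.

    mu, var: current posterior mean and variance of the arm's expected reward.
    reward: a newly observed reward; noise_var: assumed observation noise.
    """
    new_var = 1.0 / (1.0 / var + 1.0 / noise_var)
    new_mu = new_var * (mu / var + reward / noise_var)
    return new_mu, new_var

rng = np.random.default_rng(0)
mu, var = 0.0, 100.0          # broad prior: the agent knows little at first
for _ in range(50):
    r = rng.normal(1.0, 1.0)  # rewards drawn around a true mean of 1.0
    mu, var = posterior_update(mu, var, r)
# The posterior variance shrinks with each observation, so the
# incentive to keep exploring this arm fades as data accumulates.
```

Running the loop leaves `mu` close to the true mean and `var` far below the prior's 100, which is the narrowing the summary describes.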
Algorithms and Frameworks: • Thompson Sampling (probability matching: acting on a draw from the posterior), traced back to W.R. Thompson (1933). • The author's Bayesian bandit agent, implemented in Python with NumPy.
Comparison: The author benchmarks the Bayesian approach against standard methods from Sutton & Barto (2018), including UCB (Upper Confidence Bound), Optimistic Greedy, Gradient Bandit, and Epsilon-Greedy. UCB performed best at 1.5490, with the Bayesian method following closely at 1.5372.
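The baselines named above select arms in characteristically different ways. As a hedged sketch (standard textbook forms, not the author's benchmark code, with the exploration parameters `c` and `epsilon` chosen arbitrarily):

```python
import numpy as np

def ucb_action(Q, N, t, c=2.0):
    """UCB: exploit high estimates, but add a bonus to rarely tried arms.

    Q: estimated values per arm; N: pull counts per arm; t: current step.
    """
    with np.errstate(divide="ignore"):
        bonus = c * np.sqrt(np.log(t) / N)  # untried arms get an infinite bonus
    return int(np.argmax(Q + bonus))

def epsilon_greedy_action(Q, epsilon, rng):
    """Epsilon-greedy: explore uniformly with probability epsilon, else exploit."""
    if rng.random() < epsilon:
        return int(rng.integers(len(Q)))
    return int(np.argmax(Q))
```

UCB's bonus shrinks deterministically as an arm's count grows, while epsilon-greedy explores blindly at a fixed rate; the Bayesian approach instead lets the posterior itself decide how much exploration is warranted.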
Entities and Tools:
• Author: Francesco Sacco.
• Advisor/Reviewer: Matteo Peluso.
• Referenced works: W.R. Thompson (1933), Sutton & Barto's Reinforcement Learning: An Introduction, and Russo et al.'s tutorial on Thompson Sampling.
• Technologies: Python, NumPy, GitHub, and interactive web-based data visualization.
Core Conclusion: Curiosity is what optimal action looks like when beliefs are uncertain. By framing curiosity as probability matching, the author suggests this approach extends beyond bandits to complex environments like chess engines, where curiosity determines which branches of a game tree require further analysis.
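The probability-matching idea in the conclusion can be sketched as Thompson Sampling over Gaussian posteriors. This is an illustrative toy, not the author's code; the two-arm setup and its parameters are invented for the example.

```python
import numpy as np

def thompson_action(mus, variances, rng):
    """Probability matching: draw one plausible mean per arm from its
    posterior, then act greedily on the draws. Each arm is chosen with
    the probability that it is actually the best arm."""
    draws = rng.normal(mus, np.sqrt(variances))
    return int(np.argmax(draws))

rng = np.random.default_rng(1)
mus = np.array([1.0, 0.9])        # arm 0 looks slightly better...
variances = np.array([0.01, 1.0])  # ...but arm 1 is far more uncertain
picks = [thompson_action(mus, variances, rng) for _ in range(1000)]
# The uncertain arm is still chosen a substantial fraction of the time:
# "curiosity" emerges from uncertain beliefs, not from an explicit bonus.
```

No exploration heuristic appears anywhere in the selection rule, which is precisely the paper's point: acting optimally on uncertain beliefs is what produces the exploratory behavior.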