Imagine a child with two new toys: one that predictably lights up, and another that flickers randomly. Which one holds their attention longer? This simple scenario encapsulates a profound question about human and artificial intelligence: how do intelligent agents balance the innate drive to seek new knowledge (curiosity) with the desire to master and control their environment (competence)? This paper delves into this fundamental interplay, bridging cognitive theories of intrinsic motivation with the practical realm of reinforcement learning (RL).
Humans are naturally drawn to both the unknown, driven by a need to reduce uncertainty, and the predictable, motivated by a desire to exert influence. Curiosity, often linked to novelty and information gain, pushes us to explore and refine our mental models of the world. Conversely, competence, associated with empowerment and skill learning, compels us to predict and control outcomes. While these drives might seem sequential—first learn, then act—they are, in reality, deeply recursive. A child learns to walk (competence) to access new areas (curiosity), and the desire to explore new places (curiosity) fuels the mastery of locomotion (competence). This bidirectional relationship is crucial for adaptive exploration.
However, existing RL agents often struggle with this delicate balance. Curiosity-driven agents can get stuck in the "noisy TV problem," endlessly distracted by uncontrollable stimuli that offer no real opportunity for mastery. Competence-focused agents, on the other hand, frequently assume fixed world models, failing to account for how exploration itself might reshape their understanding of the environment. This paper aims to unravel how agents dynamically adjust their focus between these two powerful intrinsic motivations.
Problem/Goal
The core problem addressed by this research is understanding how agents effectively balance curiosity and competence during learning and exploration, particularly as their internal representations (world models) evolve. The paper asks: under what conditions should an agent prioritize curiosity over competence, and how do these priorities shift as its understanding of the world deepens? The ultimate goal is to shed light on the mechanisms that enable both humans and artificial agents to navigate the complex trade-off between learning and doing, uncertainty and mastery.
Specifically, the authors investigate three intrinsic motivations in environments with sparse rewards (one common formalization of each is sketched after this list):
Novelty: The drive to seek out unfamiliar states and experiences.
Information Gain: The motivation to reduce epistemic uncertainty by exploring actions whose outcomes are less predictable, thereby improving the world model.
Empowerment: The desire to maximize perceived control over future states, prioritizing situations where actions reliably lead to diverse and predictable outcomes.
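These labels can be made precise in several ways. One common formalization from the intrinsic-motivation literature — a sketch of standard definitions, not necessarily the exact objectives used in the paper — is a count-based bonus for novelty, a Bayesian-surprise term for information gain, and a mutual-information objective for empowerment:

$$
r_{\text{novelty}}(s') \propto \frac{1}{\sqrt{N(s')}}, \qquad
r_{\text{IG}}(s,a,s') = D_{\mathrm{KL}}\!\big[\,p(\theta \mid \mathcal{D}, (s,a,s')) \,\big\|\, p(\theta \mid \mathcal{D})\,\big], \qquad
\mathcal{E}(s) = \max_{\pi(a \mid s)} I(A;\, S' \mid S = s)
$$

Here $N(s')$ counts visits to a state, $\theta$ parameterizes the agent's transition model and $\mathcal{D}$ its experience so far, and the empowerment term rewards states from which different actions lead to reliably distinguishable futures.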
The research aims to determine if these motivations are functionally redundant or if each offers unique advantages depending on the environmental context. It also seeks to understand how environmental structures — deterministic or randomly changing, safe or dangerous — and the agent's evolving world model systematically influence the effectiveness of different exploration strategies.
Solution
To explore this intricate balance, the researchers employed two distinct types of model-based RL agents in grid-world environments:
Tabular Q-learning Agent: This agent uses predefined, handcrafted state representations. This setup provides a controlled baseline, allowing the researchers to isolate how different intrinsic motivations influence exploratory behavior when the agent's understanding of "states" is fixed. It focuses on refining its transition model of the environment; a minimal code sketch of such a count-based setup follows these two agent descriptions.
Dreamer World-Model Agent: This agent autonomously learns latent state representations from raw pixel observations. This more advanced agent allows for a crucial bi-directional interaction: the intrinsic motivations are derived from dynamically evolving latent states, and exploration, in turn, reshapes these learned representations. This setup helps investigate whether agents can bypass perceptual distractions (like a "noisy TV" producing irreducible randomness) to focus on controllable uncertainties.
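To make the tabular setup concrete, here is an illustrative sketch under assumed details (a Dirichlet count model over next states, with information gain computed as the KL divergence between the model's beliefs after and before a transition). It is not the paper's implementation, only one standard way such a bonus can be computed:

```python
import numpy as np
from scipy.special import gammaln, digamma

def dirichlet_kl(alpha, beta):
    """Closed-form KL( Dir(alpha) || Dir(beta) )."""
    a0, b0 = alpha.sum(), beta.sum()
    return (gammaln(a0) - gammaln(alpha).sum()
            - gammaln(b0) + gammaln(beta).sum()
            + ((alpha - beta) * (digamma(alpha) - digamma(a0))).sum())

class TabularTransitionModel:
    """Hypothetical count-based model of p(s' | s, a) for a grid world."""
    def __init__(self, n_states, n_actions, prior=1.0):
        self.counts = np.full((n_states, n_actions, n_states), prior)

    def info_gain(self, s, a, s_next):
        """How much observing (s, a, s') would change the model's beliefs."""
        before = self.counts[s, a]
        after = before.copy()
        after[s_next] += 1.0
        return dirichlet_kl(after, before)

    def update(self, s, a, s_next):
        self.counts[s, a, s_next] += 1.0
```

In this sketch the bonus shrinks as the counts for a given state-action pair grow, so a tabular agent eventually loses interest in transitions it has already seen many times.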
The environment itself was a heterogeneous "playground" designed to mimic real-world challenges, featuring lava (irreversible penalties), ice (random transitions), and walls (constraining mobility). This design introduced fundamental trade-offs between risk, uncertainty, and control.
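For intuition, an illustrative toy layout in the spirit of that playground might look like the map below (an invented example, not the paper's actual environment): lava ends the episode, ice randomizes movement, and walls block it.

```python
# Illustrative legend: '#' wall, '.' floor, 'S' start, 'I' ice (random slips), 'L' lava (terminal)
PLAYGROUND = [
    "#########",
    "#S..I..L#",
    "#..###..#",
    "#.I...L.#",
    "#.......#",
    "#########",
]
```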
The agents were evaluated based on their ability to discover new states and the number of "deaths" (falling into lava). The researchers also tested combinations of intrinsic motivations, particularly information gain and empowerment, to see if a synergistic effect could lead to more robust exploration.
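The simplest reading of such a combination, consistent with the "simple sum" described in the results below, is an additive bonus. A minimal sketch with hypothetical names and weights (not the paper's code) might look like this:

```python
def hybrid_bonus(info_gain, empowerment, w_ig=1.0, w_emp=1.0):
    """Additive intrinsic reward; equal weights correspond to a plain sum."""
    return w_ig * info_gain + w_emp * empowerment

class ExplorationStats:
    """Track the two evaluation quantities mentioned above: discoveries and deaths."""
    def __init__(self):
        self.visited = set()
        self.deaths = 0

    def record(self, state, died):
        self.visited.add(state)
        self.deaths += int(died)

    @property
    def discovery_to_death_ratio(self):
        return len(self.visited) / max(self.deaths, 1)
```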
Conclusion/Learning
The findings reveal a nuanced interplay between curiosity and competence, demonstrating that these intrinsic motivations play complementary roles in guiding exploration and enhancing generalization.
For the Tabular agent, information gain proved most effective at thorough exploration but came at a cost: the agent often risked death by trying to perfectly predict stochastic ice cells. Empowerment, conversely, made the agent overly cautious: it often stayed in the starting state and avoided exploration entirely. Crucially, a simple sum of information gain and empowerment led to a more balanced approach, achieving a higher discovery-to-death ratio. This hybrid agent explored most of the environment while intelligently avoiding uncontrollable areas as soon as they were recognized.
The Dreamer agent, which learned its own world representations, mirrored some of these findings but with less pronounced synergistic effects. Empowerment alone still led to poor exploration, but the Dreamer agent could explore its immediate starting area more effectively than the Tabular agent. When tested on unary-state grid worlds (environments with isolated features like only harmful or only randomly changing elements), the results highlighted the context-specific nature of each motivation. Novelty and information gain excelled in deterministic environments, while empowerment was more effective in stochastic settings. However, hybrid approaches (information gain + empowerment) generally performed as well as or better than individual motivations, suggesting a promising compromise.
A key learning from the Dreamer agent's behavior was how it handled different types of uncertainty. Novelty-driven exploration could get stuck in trivial forms of novelty, where the world model's internal representations changed slightly, giving a subjective sense of novelty without leading to meaningful exploration. Information gain, while good in deterministic contexts, struggled in random ones, often conflating reducible (epistemic) uncertainty with irreducible (aleatoric) uncertainty. This led to the agent fixating on inherently unpredictable transitions (like randomly moving walls) rather than truly exploring. Empowerment, by prioritizing control, sometimes hindered exploration in deterministic environments by keeping the agent in a "comfort zone" (regions with predictable dynamics). However, in unpredictable environments, this "safety first" approach became adaptive, allowing the agent to roam continuously to maintain influence over outcomes, contrasting sharply with information gain's tendency to get trapped by irreducible noise.
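As background for why that distinction matters mechanically (this is a standard decomposition, not necessarily how the paper's Dreamer agent computes it), an ensemble of dynamics predictors keeps the two kinds of uncertainty apart: disagreement between members reflects reducible, epistemic uncertainty, while the noise each member predicts on its own reflects irreducible, aleatoric uncertainty.

```python
import numpy as np

def split_uncertainty(ensemble_means, ensemble_vars):
    """Decompose predictive uncertainty for one (state, action) pair.

    ensemble_means: (K, D) array of each member's predicted next-state mean.
    ensemble_vars:  (K, D) array of each member's predicted per-dimension variance.
    """
    epistemic = ensemble_means.var(axis=0).sum()   # members disagree -> something left to learn
    aleatoric = ensemble_vars.mean(axis=0).sum()   # members agree it is noisy -> nothing to learn
    return epistemic, aleatoric
```

An information-gain bonus built only on the epistemic part ignores a "noisy TV", whereas a bonus built on raw prediction error keeps chasing it.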
In essence, the research formalizes adaptive exploration as a dynamic balance between pursuing the unknown (curiosity) and the controllable (competence). While each intrinsic motivation has context-specific advantages, combining them often yields synergistic benefits, illustrating their complementarity.
Key Takeaways
Curiosity and Competence are Complementary: Neither curiosity (novelty, information gain) nor competence (empowerment) is universally optimal for exploration; they offer distinct advantages depending on the environment.
Hybrid Strategies Excel: Combining information gain and empowerment leads to more balanced and effective exploration, especially in environments with both reducible and irreducible uncertainties.
World Models Matter: The agent's internal representation of the world (its "world model") significantly mediates the trade-off between curiosity and competence, influencing exploration patterns.
Distinguishing Uncertainty: Effective agents learn to differentiate between reducible (epistemic) uncertainty, which can be resolved through learning, and irreducible (aleatoric) uncertainty, which cannot. Empowerment helps avoid fixation on irreducible noise.
Adaptive Prioritization: Humans and sophisticated AI agents need to dynamically adjust their focus between seeking new knowledge and asserting control, a process mirrored by the co-evolution of world models and exploration strategies.