The Brain's Predictive Hierarchy
Learning, Perceiving, and Acting Through Hierarchical Prediction
In robotics, cognitive systems are typically framed as a pipeline: from sensory input to perception, planning, control, and action. In biological brains, these boundaries are far less distinct: the stages are so tightly coupled that treating them separately misses the bigger picture. Hierarchy, not modular stages, defines the meaningful boundaries, enabling composable capabilities with remarkable efficiency. In this post, I’ll argue that perception and action are not separate processes but two facets of a single predictive mechanism reconciling sensory input with internal goals or preferences, with hierarchy serving as the organizing principle.
From Chaos to Control: Learning in Layers
In their first days, babies move chaotically. Soon, they begin tracking faces with their eyes and grasping objects, while their limbs still move aimlessly. Gradually, tracking extends to the head, then the whole body. Before long, they’re running and talking about everything they see and think.
Learning is hierarchical: each layer builds on the stability of the one below. Gaze control is a clear example. Visual saccades become more precise as we learn to identify objects and understand spatial structure. Only once this structure is internalized can head movements become purposeful: to track targets beyond the eyes' range, locate sounds, or explore the environment.
Action can only be controlled when there’s an expected outcome to measure against. The earliest sensorimotor learning happens in reflex arcs: tight loops between sensory receptors and motor effectors. These loops form localized models of the physical world, learning the dynamics of muscle stretch and feedback. Once these models stabilize, higher loops build more comprehensive skills and reflexes. This layered stabilization is efficient, as higher levels don't need to constantly micro-manage or relearn the foundational dynamics handled reliably below.
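To make the layering concrete, here is a minimal Python sketch of a two-level control hierarchy. Everything in it is invented for illustration (the class names, the proportional gain, the toy dynamics); it is not a model of real neural circuitry, just the structural point that the higher loop works in setpoints and leaves micro-corrections to the loop below.

```python
# A toy two-level control hierarchy. Names, gains, and dynamics are invented.

class ReflexLoop:
    """Fast inner loop: holds a local state (think muscle length)
    at a requested setpoint, without any help from above."""
    def __init__(self, gain=0.5):
        self.gain = gain
        self.state = 0.0

    def step(self, setpoint):
        error = setpoint - self.state     # local prediction error
        self.state += self.gain * error   # simple proportional correction
        return error

class HigherLoop:
    """Slow outer loop: plans in terms of setpoints, never raw actuation.
    It only works because the reflex loop below is already stable."""
    def __init__(self, reflex):
        self.reflex = reflex

    def reach(self, target, steps=20):
        for _ in range(steps):
            self.reflex.step(setpoint=target)  # delegate micro-corrections
        return self.reflex.state

arm = HigherLoop(ReflexLoop())
print(round(arm.reach(target=1.0), 3))  # converges near 1.0
```

The outer loop converges on the target without ever touching raw actuation, which is exactly why it never needs to relearn what the inner loop already handles.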
Hierarchical learning also governs memory and abstract thought. Episodic memory depends on prior concepts — which is why we remember so little from early childhood: we lacked the building blocks to form full memories. As our conceptual toolbox grows, we can encode increasingly complex ideas. Language reflects this: once a concept is well understood, we give it a name to compress it. Scientific jargon epitomizes this compression.
Predictive Brains: A Hierarchy of Expectations
We wouldn’t invest so much energy into learning unless it paid off biologically. Living systems have internal needs: to maintain temperature and nutrition, to preserve social standing, to reproduce. A model of the world allows us to estimate where we stand relative to these needs and, more importantly, how to act to satisfy them.
A mature brain discards most sensory input. This challenges the idea that perception is passive processing, as many robotic systems assume. Instead of constantly reconstructing reality, the brain uses its internal model to act and predict – a concept central to theories like predictive coding (or predictive processing). If incoming input matches expectations, it’s discarded early. Only surprising signals propagate upward to update the model, which is where most cognitive effort is spent.
Consider running through a forest. The motor system continually updates foot placement — high uncertainty, high processing. But if the trail is familiar, the broader path doesn’t need updating. A fallen tree might trigger a local update; a storm might call for a full reroute. Or an emergency might trigger an abrupt context switch. These are different scales of "prediction failure", requiring updates at different model depths.
This example illustrates a few big advantages of hierarchical predictive models (a code sketch follows the list):
- Predictions can fail progressively
- Computational cost scales with surprise, since only surprising signals require costly updates
- Updates and context switches are efficient
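Here is a bare-bones sketch of that error-gating idea, with made-up tolerances and a one-number "model" per level. Real predictive-coding schemes are far richer, but the control flow is the point: expected input dies early, surprise climbs only as far as it must.

```python
# Toy predictive hierarchy: each level holds a one-number prediction and
# only passes its error upward when it exceeds a tolerance. All values
# are invented for illustration.

class Level:
    def __init__(self, tolerance, lr):
        self.tolerance = tolerance   # how much surprise this level absorbs
        self.lr = lr                 # plasticity: how fast it updates
        self.prediction = 0.0

    def observe(self, signal):
        error = signal - self.prediction
        self.prediction += self.lr * error   # update the local model
        if abs(error) <= self.tolerance:
            return None                      # expected: discard early
        return error                         # surprising: propagate upward

# Deeper levels tolerate more before updating, and learn more slowly.
hierarchy = [Level(tolerance=0.1, lr=0.5),    # foot placement
             Level(tolerance=0.5, lr=0.1),    # trail layout
             Level(tolerance=2.0, lr=0.01)]   # overall route

def perceive(signal):
    for depth, level in enumerate(hierarchy):
        signal = level.observe(signal)
        if signal is None:
            return depth         # absorbed here; levels above untouched
    return len(hierarchy)        # total surprise: everything updates

print(perceive(0.05))   # small bump: handled at depth 0
print(perceive(1.0))    # fallen tree: climbs up to depth 2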
Hierarchies also span timescales and degrees of synaptic plasticity. Early layers predict short-term features, change quickly, and remain highly plastic; deeper layers operate across longer timescales and adapt more slowly.
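That timescale separation can be caricatured as a stack of leaky integrators, each smoothing the layer below over a longer window. The time constants below are arbitrary, and nothing here is meant as a circuit model.

```python
import numpy as np

# Timescale separation as a stack of leaky integrators: each layer smooths
# the layer below with a longer time constant. The taus are arbitrary.

rng = np.random.default_rng(0)
signal = 1.0 + 0.3 * rng.normal(size=500)     # noisy input around a regularity

taus = [2.0, 20.0, 200.0]                     # shallow = fast, deep = slow
layers = [0.0, 0.0, 0.0]

for x in signal:
    inp = x
    for i, tau in enumerate(taus):
        layers[i] += (inp - layers[i]) / tau  # deeper layers update slowly
        inp = layers[i]                       # each layer sees a smoothed view

# The shallow layer jitters with every sample; the deep layer holds a
# stable estimate of the underlying regularity.
print([round(v, 2) for v in layers])
```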
Perception as Action: The Enactive Mind
Perception and action are intertwined. Living systems perceive the world in terms of what can happen to or through them; they hold enactive models. Prediction inherently involves choosing an action, a sequence of actions, or even a long-term plan. Neuropsychoanalyst Mark Solms captures this active, goal-oriented nature: "We perceive hierarchical prioritized deviations from our expectations".
We don’t passively predict what might happen—we prioritize futures based on our goals. Our preferences shape both prediction and perception, because our actions influence outcomes.
Here is an interesting perspective shift. If the world (including our body) perfectly matched our internal goals, no prediction errors would occur. Sensory input would align with expectations and be discarded before reaching awareness. Without prediction errors, we would not perceive anything.
Yet the world is ever-changing, and even our internal biology is subject to randomness. Managing change is the nature of living things, and action is the means by which we keep things in order. Action is the outcome of a mental negotiation between what we expect and what we prefer: we select the actions most likely to steer outcomes towards our goals.
If a hungry cat sees food, a plan is set in motion. If there's a gap in its path, it will quickly assess whether it can jump, and how. The high-level goal cascades down the predictive hierarchy, eventually activating the relevant sensors and muscles while filtering out distractions.
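In code, the cat's decision might look something like the toy below. The forward model, the candidate actions, and every number are hypothetical; the point is the single currency: actions are scored by how far their predicted outcomes deviate from what the animal prefers, a bare-bones version of what active inference formalizes.

```python
# Toy action selection: pick the action whose predicted outcome best matches
# the preferred outcome. The candidate actions and forward model are made up.

def predict_outcome(state, action):
    """Hypothetical forward model: where does this action leave the cat?"""
    outcomes = {
        "walk_around": state + 1.0,   # slow but safe progress toward the food
        "jump_gap":    state + 3.0,   # fast progress, reaches the food
        "sit_still":   state,         # no progress
    }
    return outcomes[action]

def select_action(state, preferred, actions):
    # Score = squared deviation of the predicted outcome from the preference.
    # Lower deviation wins; prediction and preference share one currency.
    return min(actions, key=lambda a: (predict_outcome(state, a) - preferred) ** 2)

actions = ["walk_around", "jump_gap", "sit_still"]
print(select_action(state=0.0, preferred=3.0, actions=actions))  # -> "jump_gap"
```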
Passive observation has marginal biological value. We tend to perceive what we can act on, and ignore the rest. Our knowledge grows from interactions, while non-actionable information tends to be neglected. Humans are somewhat exceptional in this regard. Enabled by the depth and abstraction of our hierarchical models, we can attend to information beyond immediate survival needs, defining what matters within complex cultural or imagined realities. But for most other animals, reality goes as far as the eyes can see.
Towards Machine Intelligence
Hierarchical prediction is one of the key features that allow human intelligence to run inference on a palm-sized, 20-watt computer 🧠. On the training side, hierarchical action makes sampling so efficient that nearly all learning happens online, reusing the very same inference compute. Contrast that with modern AI models: a state-of-the-art chatbot like GPT-4 requires months of offline training on thousands of GPUs, consuming megawatts of power. Yet the resulting system, however powerful at pattern matching, still lacks the real-world adaptability, embodiment, and integrated online learning of biological systems running continuously on ~20 watts.
This difference isn’t just about energy efficiency—it’s about the nature of intelligence itself.
Biological systems don’t passively process data; they act, predict, and refine their models dynamically, updating only what needs to change. Their perception isn’t just a feedforward pipeline but an active, goal-driven process shaped by hierarchical expectations.
For AI and robotics, the lesson is clear: intelligence isn’t just about scaling computation or accumulating vast datasets. True autonomy will require architectures that integrate perception and action into a predictive, hierarchical framework—learning efficiently, acting purposefully, and minimizing unnecessary computation. If we want machines that can think and adapt like humans, we must move beyond brute-force pattern recognition and embrace the principles that make brains work so effortlessly.
Some of the top books that shaped my understanding and ideas in this post:
- Active Inference: The Free Energy Principle in Mind, Brain, and Behavior by Thomas Parr, Giovanni Pezzulo, Karl J. Friston
- The Brain from Inside Out by György Buzsáki