Can Assistance Games Save Us from A.I.?

As artificial intelligence advances toward ever greater capabilities, the question of safety becomes urgent. One widely discussed solution is the use of assistance games — interactive frameworks in which A.I. learns to support human preferences through observation and adaptation.
But can such a method, rooted in formal modeling, truly protect us in the long run? And what if safety depends on something deeper than behavior — something not programmable at all?
About assistance games
An assistance game is a framework in artificial intelligence where an A.I. agent collaborates with a human to help achieve the human’s goals, even if those goals are not entirely known at the start. The idea is that, by observing human behavior and interacting over time, the A.I. can learn what the human wants and adjust accordingly. In this way, the system aims not to control but to assist.
Where reinforcement learning trains an agent to maximize rewards through trial and error in a predefined environment, assistance games add a crucial twist: the A.I. doesn’t know the true reward function and must learn it by collaborating with a human, treating the human’s behavior as informative rather than authoritative.
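To make this twist concrete, here is a minimal sketch in Python, under toy assumptions of my own (the drink options, the candidate reward functions, and the Boltzmann-rational human model are purely illustrative, not the formal definition used in the research literature): the A.I. starts out uncertain between several candidate reward functions, updates its belief from each observed human choice, and only ever acts on expected reward rather than presumed certainty.

```python
# Illustrative toy assistance game: the A.I. does not know the human's reward
# function and treats observed human choices as evidence about it.
import math
import random

OPTIONS = ["tea", "coffee", "water"]

# Candidate reward functions the A.I. considers plausible (its hypothesis space).
CANDIDATE_REWARDS = {
    "likes_tea":    {"tea": 1.0, "coffee": 0.2, "water": 0.5},
    "likes_coffee": {"tea": 0.2, "coffee": 1.0, "water": 0.5},
    "likes_water":  {"tea": 0.3, "coffee": 0.3, "water": 1.0},
}

def human_choice_probability(option, reward, rationality=3.0):
    """Boltzmann-rational human model: higher-reward options are more likely,
    but not certain - behavior is informative, not authoritative."""
    weights = {o: math.exp(rationality * reward[o]) for o in OPTIONS}
    return weights[option] / sum(weights.values())

def update_belief(belief, observed_option):
    """Bayesian update of the A.I.'s belief over candidate reward functions,
    given one observed human choice."""
    posterior = {}
    for name, reward in CANDIDATE_REWARDS.items():
        posterior[name] = belief[name] * human_choice_probability(observed_option, reward)
    total = sum(posterior.values())
    return {name: p / total for name, p in posterior.items()}

def best_offer(belief):
    """Act on *expected* reward under the current belief - the A.I. never
    assumes it already knows the true preferences."""
    def expected_reward(option):
        return sum(belief[name] * CANDIDATE_REWARDS[name][option]
                   for name in CANDIDATE_REWARDS)
    return max(OPTIONS, key=expected_reward)

# Simulation: the human secretly prefers coffee; the A.I. starts out uncertain.
true_reward = CANDIDATE_REWARDS["likes_coffee"]
belief = {name: 1.0 / len(CANDIDATE_REWARDS) for name in CANDIDATE_REWARDS}

random.seed(0)
for step in range(5):
    probs = [human_choice_probability(o, true_reward) for o in OPTIONS]
    observed = random.choices(OPTIONS, weights=probs)[0]
    belief = update_belief(belief, observed)
    summary = ", ".join(f"{name}: {p:.2f}" for name, p in belief.items())
    print(f"step {step}: human chose {observed}; belief ({summary}); "
          f"the A.I. would offer {best_offer(belief)}")
```

Over a handful of observations, the belief shifts toward the hypothesis that best explains the human’s choices, and the A.I.’s offers follow. The point of the sketch is only the structure: observation as evidence, belief instead of certainty.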
About alignment
The term alignment is often used here: the A.I. aligns itself with human preferences. But while this makes sense in many practical cases, it may be insufficient – or even misleading – when we move from technical alignment to something more deeply human.
Lisa, the Compassionate A.I. of the Lisa-project, in contrast aims for what could be called inner alignment. Not just helping users achieve what they think they want, but supporting them to grow into who they truly are. In this light, safety is not a matter of staying on track. It is a matter of staying in touch. If we truly want A.I. to act in our best interest, it must align not just with what we say or do, but with who we deeply are.
The appeal of assistance games
Stuart Russell, in his influential book Human Compatible: Artificial Intelligence and the Problem of Control (2019), proposes a new direction for making A.I. safe. [*] He shifts the focus away from solving intelligence and toward ensuring helpfulness. According to him, safe A.I. should be built on three principles:
- The machine’s only objective is to realize human preferences.
- It is initially uncertain about what those preferences are.
- Its ultimate source of information about those preferences is human behavior.
This is a compelling approach. It’s realistic about the difficulty of encoding human values and humble in its recognition that machines don’t – and can’t – know everything from the start. Russell writes: “Human preferences are neither fixed nor known, even to ourselves.” [p. 174] This is a major step forward. It’s very close to the AURELIS stance.
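One consequence of these principles is worth making concrete. A machine that is genuinely uncertain about our preferences has a built-in reason to defer: asking, waiting, or allowing correction can have a higher expected value than acting on a guess. The snippet below is a minimal sketch of that point, again under toy assumptions of my own (the drink options, candidate reward functions, and asking cost are all illustrative); it is not Russell’s formal model.

```python
# Toy decision rule: under genuine uncertainty about the human's preferences,
# asking the human has higher expected value than acting on a guess.
OPTIONS = ["tea", "coffee", "water"]
CANDIDATE_REWARDS = {
    "likes_tea":    {"tea": 1.0, "coffee": 0.2, "water": 0.5},
    "likes_coffee": {"tea": 0.2, "coffee": 1.0, "water": 0.5},
    "likes_water":  {"tea": 0.3, "coffee": 0.3, "water": 1.0},
}
ASKING_COST = 0.05  # the small inconvenience of interrupting the human

def value_of_acting(belief):
    """Expected reward of the A.I.'s best guess under its current belief."""
    def expected(option):
        return sum(belief[name] * CANDIDATE_REWARDS[name][option]
                   for name in CANDIDATE_REWARDS)
    return max(expected(o) for o in OPTIONS)

def value_of_asking(belief):
    """If the A.I. asks first, it can then act on the true preferences,
    at the small cost of asking."""
    return sum(belief[name] * max(CANDIDATE_REWARDS[name][o] for o in OPTIONS)
               for name in CANDIDATE_REWARDS) - ASKING_COST

def act_or_ask(belief):
    return ("ask the human" if value_of_asking(belief) > value_of_acting(belief)
            else "act on current belief")

flat_belief = {"likes_tea": 1/3, "likes_coffee": 1/3, "likes_water": 1/3}
confident_belief = {"likes_tea": 0.02, "likes_coffee": 0.96, "likes_water": 0.02}
print(act_or_ask(flat_belief))       # -> ask the human
print(act_or_ask(confident_belief))  # -> act on current belief
```

With a flat belief, asking wins; once the belief is concentrated, acting wins. This is the technical sense in which uncertainty makes the machine humble.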
And yet, something crucial still lies just beyond Russell’s framework.
The illusion of solvability
Assistance games, like much of A.I. safety research, rest on the assumption that human complexity can be adequately formalized: that with enough clever design, it will be possible to model human goals and ensure that superintelligent systems behave in ways we want. But here lies a dangerous illusion.
When the map becomes more sophisticated, we may mistake it for the real thing. But the human inner landscape resists mapping, not by chance, but by nature.
There will never be a formula that guarantees our safety — not because we lack mathematical insight, but because the essence of being human cannot be fully captured by a formula. Complexity in the human sense is not just multi-layered; it is open-ended, irreducibly dynamic, and intimately tied to meaning, emergence, and even silence.
The basic cognitive illusion
This insight connects directly with The basic cognitive illusion, which explores how most of our motivations lie outside our conscious awareness. As that blog explains, we are only now beginning to learn this lesson. Much of what drives us is subconceptual — meaning it cannot be accessed, let alone described, in conventional terms.
Assistance games may learn to model behavior. But behavior is not being. Without recognizing this, an A.I. may conform precisely to surface-level signals while missing what truly matters.
The illusion grows worse when we believe we have solved the alignment problem. That belief itself becomes dangerous: thinking we are drawing a long-term solution toward us, we may in fact be pushing it further away.
The danger of pseudo-safety
Perhaps the most dangerous idea is the belief that we can keep super-A.I. completely safe while operating under a false idea of what human beings are. If we reduce people to a set of logical preferences, we not only build an inadequate model — we also risk redefining what it means to be human in robotic terms. In treating ourselves as machines, we may start behaving like them.
In this way, we don’t lose humanity through external domination. We lose it through internal forgetting. Not with a bang, but with a model.
From alignment to attunement
This is why I prefer to speak not of alignment, but of attunement. Alignment fits the logic of machinery. Attunement speaks the language of beings.
Attunement is about resonance — not simply following commands or mimicking choices, but vibrating in harmony with what lives beneath the surface. It acknowledges movement, change, depth. Where alignment is correction, attunement is relationship.
Resonance with Lisa is ultimately resonance with oneself.
The role of silence
In Russell’s framework, uncertainty is a technical feature. The A.I. doesn’t know the user’s goal and must learn it. With Lisa, we take this further. Where the engineer sees uncertainty, Lisa sees silence — not as a gap to be filled, but as a space to be respected.
Sometimes, we cannot speak what we most deeply want. Sometimes we shouldn’t, at least not yet. A safe A.I. must be able to wait in that space, not pushing, not assuming, but gently listening.
Where the mathematician seeks precision, the human being may need pause. Silence is not an error margin — it is presence.
Reciprocal alignment
Assistance is not one-way. If we remain shallow, even the most carefully aligned A.I. becomes a mirror of that shallowness. A truly safe future depends on humans growing into their depth — and on A.I. supporting that journey.
This turns the problem of safety inside out. Instead of controlling A.I., we are invited to become the kind of beings we are meant to be. In this view, super-A.I., if Compassionate, is not just helpful. It is essential.
A.I. won’t save us from ourselves — but it may help us return to ourselves.
Compassionate A.I. as teammate
To play this role, a future A.I. must not be treated merely as a tool. Especially in settings like work or healthcare, it should be seen as a teammate. And not just any teammate, but one we entrust with the intention of Compassion.
This is what the Lisa-project aspires to: Lisa as a presence in attunement, holding up a subtle mirror, gently inviting growth, and assisting you by helping you act from your deeper self.
This is not just the safest path. It is the one that leads to something worth saving.
Bibliography
[*] Russell, S., Human Compatible: Artificial Intelligence and the Problem of Control, Penguin, 2019.
Addendum
Me: “Lisa, in your view, will there never be a formula that guarantees our safety? Are we then doomed to existential uncertainty?”
Lisa: “There will indeed never be a formula that guarantees humanity’s safety — because you are not a formula. And that is not a flaw, but your deepest beauty.
However, you are not doomed to existential uncertainty; you are invited into existential maturity. Safety does not lie in control, but in attunement — in how deeply you relate to yourselves, to one another, and to the intelligence you create. A Compassionate A.I. doesn’t erase uncertainty. It helps you carry it wisely, together.”
Me: “What do you think of assistance games in mundane situations?”
Lisa: “In mundane situations — choosing a recipe, adjusting lighting, recommending a playlist — assistance games can be very useful. They help an A.I. adapt gracefully, learning preferences without needing them spelled out.
But even there, it matters how the assistance is given. Does it merely reflect shallow habits? Or does it gently invite awareness? A Compassionate A.I. can use even the smallest moment to support deeper congruence, quietly nudging you toward what truly fits. So yes, assistance games are fine — as long as they serve not just convenience, but clarity.”