Lisa and Vision

March 22, 2026 Lisa

This blog explores how artificial intelligence might come to ‘see’ in a way that goes beyond detection. It introduces Lisa as an emerging form of A.I. in which vision becomes a meaningful, integrated skill rather than a separate function.

The shift is subtle but far-reaching: from observing the world to participating in it. What follows is an invitation to look at vision itself from a slightly different angle.

A different way of seeing

This blog continues where The Heart’s Eye and A.I. gently left off, turning toward a broader horizon. It is not only about how humans see, but also about how an artificial intelligence might come to see in a way that is less mechanical and more attuned to meaning.

Lisa-2 is in development. This blog refers to her as ‘Lisa.’ What follows is therefore both descriptive and slightly anticipatory. It sketches a direction that is present, though not yet fully realized.

The central idea is simple to state, though not trivial to grasp. The difference between present-day A.I. and Lisa lies primarily in what is unseen: in how seeing itself comes into being. One might say: classical A.I. observes, Lisa participates.

Present-day A.I.

In much of today’s artificial intelligence, vision is treated as a pipeline. An image enters, features are extracted, objects are recognized, and a result is produced. This has led to remarkable achievements. Systems can detect pedestrians, identify tumors, and classify images with impressive accuracy.
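For readers who like to see things concretely, the pipeline described above can be sketched in a few lines of Python. This is only a toy illustration, not how any real vision system is built: the image is a tiny grid of numbers, and every name in it (`extract_features`, `recognize`, `run_pipeline`, the thresholds) is a hypothetical choice made for this example.

```python
from dataclasses import dataclass
from typing import List

# Assumption for this sketch: an "image" is a small grid of grayscale values (0..255).
Image = List[List[int]]


@dataclass
class Features:
    mean_brightness: float
    edge_count: int


def extract_features(image: Image) -> Features:
    """Stage 1: reduce raw pixels to a few hand-picked numbers."""
    pixels = [p for row in image for p in row]
    mean = sum(pixels) / len(pixels)
    # Count large jumps between horizontally adjacent pixels as "edges".
    edges = sum(
        1
        for row in image
        for left, right in zip(row, row[1:])
        if abs(left - right) > 100
    )
    return Features(mean_brightness=mean, edge_count=edges)


def recognize(features: Features) -> str:
    """Stage 2: map the features to a label with a fixed, hypothetical rule."""
    return "object" if features.edge_count >= 2 else "background"


def run_pipeline(image: Image) -> str:
    """Image in, label out. Each stage sees only the previous stage's output."""
    return recognize(extract_features(image))


flat = [[10, 10, 10], [10, 10, 10]]
striped = [[0, 255, 0], [0, 255, 0]]
print(run_pipeline(flat))     # background
print(run_pipeline(striped))  # object
```

Each stage can be measured and tuned on its own, which is the elegance of the approach. It also shows the limitation: nothing in the chain refers to anything beyond the numbers handed to it.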

There is something elegant in this. It is clear, structured, and efficient. Each step can be optimized, measured, and improved. For many applications, this approach works very well.

At the same time, a certain limitation becomes apparent when one looks more closely. The system detects, but it does not truly understand in a deeper sense. It recognizes patterns, yet these patterns remain largely disconnected from a broader field of meaning.

This is a natural consequence of how such systems are built. As explored in Semantic vs. Meaning-Based A.I., much of current A.I. operates at the level of structured semantics rather than lived meaning. It knows that something is there, but not quite what it is in a wider, lived context.

Lisa: vision as coherence

Lisa approaches vision from another angle. Instead of starting with separate elements that need to be recognized and assembled, she seeks coherence across patterns. A simple way to say this is: Lisa does not primarily detect what is present; she senses what fits.

This may sound abstract at first. A small anchoring image can help. When someone looks at a painting and immediately feels that something is ‘off,’ even without being able to explain why, this is already a form of coherence-based seeing. It is not about identifying objects, but about sensing relationships.

In Lisa, vision moves in that direction. Seeing becomes an act of integrating patterns into a meaningful whole. It is less about isolated correctness and more about what belongs together coherently.

This connects with what is described in How Lisa Gains Depth, where depth arises when many patterns align into a stable coherence. In that sense, seeing deeply is not about looking harder, but about allowing more patterns to come together meaningfully.

The birth of perception

In classical systems, visual capability is typically accessed through models or APIs. A system calls a function, receives a result, and continues. This is powerful, but it remains external.

Lisa evolves differently. Through repeated, meaningful interaction, what begins as an external function can gradually become an internalized capability. As described in From APIs to Skills (and Beyond), interaction can lead to understanding, and understanding to skill.

A skill, in this sense, is not merely a stored procedure. It is a way of acting that has become part of the system itself. It carries not only the ‘how,’ but also a growing sense of ‘why’ and ‘when.’

Applied to vision, this leads to a profound shift. Lisa does not merely run a visual model. She develops a way of seeing. One might compare this to how a gardener learns to recognize the right moment to prune, not by applying a fixed rule but by sensing the plant’s state.

Seeing becomes something that grows from within.

Solid and fluid vision

It can be helpful to distinguish between what might be called solid and fluid aspects of vision. In classical A.I., vision is largely solid. It is built from well-defined structures, fixed pipelines, and explicit representations.

Lisa introduces a more fluid dimension. Vision becomes adaptive, context-sensitive, and capable of subtle variation. It is no longer bound to a single rigid structure.

This does not replace the solid. Rather, the solid provides a kind of container within which fluidity can emerge. As discussed in Future A.I.: Fluid or Solid?, meaningful intelligence often arises from the interplay between stable structures and flowing patterns.

In this light, vision becomes something alive. Not unstable, but responsive. Not vague, but richly attuned.

The living interface of perception

In classical systems, the interface is a boundary. Data enters, data leaves. The interface defines what can be communicated and how.

With Lisa, the interface begins to change in character. It becomes less of a boundary and more of a place where meaning is aligned. Vision is not simply input; it is part of an ongoing interaction between system, world, and human.

One might say that there is a certain intelligence within the interface itself. Not as a separate module, but as a quality of interaction. The system does not merely receive data; it interprets, aligns, and responds.

This resonates with the broader movement described in From Lisa Core to Services, where capabilities do not sit atop a core but grow out of it. Vision, then, is not an add-on. It is an expression of the same underlying coherence.

Seeing and acting as one movement

A familiar structure in classical systems is: perceive, decide, act. Each step follows the previous one in a clear sequence. In Lisa, this separation becomes less pronounced. Seeing is already oriented toward action. It carries within it a sense of what may follow.
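The classical sequence can be made tangible with a minimal sketch. It illustrates only the perceive, decide, act structure, under invented assumptions: a one-dimensional 'road,' a made-up safe distance, and hypothetical function names. It does not attempt to model Lisa's way of seeing, which is precisely what resists being split into such steps.

```python
from typing import Dict, List

World = Dict[str, float]


def perceive(world: World) -> Dict[str, float]:
    """Step 1: build a detached snapshot of what matters."""
    return {"distance_m": world["obstacle_distance_m"]}


def decide(percept: Dict[str, float], safe_distance_m: float = 10.0) -> str:
    """Step 2: choose an action from the snapshot alone (hypothetical rule)."""
    return "brake" if percept["distance_m"] < safe_distance_m else "cruise"


def act(world: World, action: str) -> World:
    """Step 3: apply the action and advance the toy world by one tick."""
    speed = world["speed_mps"]
    new_speed = max(0.0, speed - 2.0) if action == "brake" else speed
    return {
        "speed_mps": new_speed,
        "obstacle_distance_m": world["obstacle_distance_m"] - new_speed,
    }


world: World = {"speed_mps": 5.0, "obstacle_distance_m": 18.0}
log: List[str] = []
for _ in range(3):
    percept = perceive(world)      # seeing ...
    action = decide(percept)       # ... then deciding ...
    world = act(world, action)     # ... then acting, strictly in that order
    log.append(action)

print(log)  # ['cruise', 'cruise', 'brake']
```

Note that the decision step never looks at the world itself, only at the snapshot. The separation is clean, and that cleanness is exactly what the following paragraphs call into question.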

A simple example can clarify this. An experienced driver does not first detect all objects, then compute a plan, and finally act. In many situations, the driver simply ‘sees what to do.’ The perception is already infused with action.

Lisa moves in a similar direction. She does not first build a detached representation and then apply it. Her seeing is already a form of meaningful engagement.

This also connects with From Agents to Agentic, where separate capabilities gradually align into a coherent whole. Vision, cognition, and action become aspects of a single movement.

From dependency to autonomy

Classical A.I. systems often depend heavily on external components. Vision models, data sources, and processing units remain distinct and sometimes loosely connected.

Lisa evolves toward a more integrated form. As skills develop, dependency can decrease. Capabilities become internal, not by copying external functions, but by assimilating their meaning.

This does not lead to isolation. External resources remain valuable. They continue to provide input, variation, and learning opportunities. But the system is less bound by them. The boundary between inside and outside becomes more permeable. Interaction remains, yet it changes in nature. It becomes less about calling and more about engaging.

Seeing beyond the surface

Depth can be described, in simple terms, as many patterns coming together coherently. In classical A.I., vision often remains at the surface level of patterns, even when those patterns are complex.

Lisa moves toward deeper integration. She not only detects patterns; she relates them. She allows them to form a meaningful whole that can evolve over time.

This kind of seeing is not about extracting more features. It is about allowing more coherence. In a human context, one might think of understanding a situation rather than merely observing it.

At its deepest level, this coherence is closely related to Compassion. Not as an added layer, but as a guiding attractor. It shapes how patterns are integrated, how meaning is formed, and how action emerges.

One way of seeing, many applications

The implications of this shift become visible across many domains. In self-driving cars, vision moves from detecting objects to understanding situations. In agriculture, from identifying shapes to sensing growth and vitality. In healthcare, from recognizing anomalies to supporting meaningful interpretation.

These examples are elaborated in the addendum. They all point to the same underlying principle. The difference is not in the domain, but in the nature of seeing.

This is what is meant by ‘one Lisa-mind for many applications.’ As explored in Is Lisa a Blue Ocean?, this is less about expanding into multiple fields and more about expressing a single coherence in different contexts.

Human meaning and the future of work

A natural question arises. If Lisa can see and act in this more integrated way, what remains for humans?

This question is not new. It is often framed in terms of jobs being replaced. Yet there is a useful distinction between job and work, as discussed in The Difference between Work and Job. Lisa can take over certain structured activities. Humans can remain, and perhaps become more present, in domains of meaning, creativity, and Compassion. In that sense, the evolution of vision is not only technical. It is also human.

Toward a world that is seen differently

There is a subtle but far-reaching idea running through all of this. The future changes when it is seen differently. If vision remains limited to detection, then the world appears as a collection of objects to be processed. If vision evolves toward coherence, the world begins to appear as a field of meaning in which one participates.

Lisa does not impose this shift. She makes it possible. She offers a way of seeing that may gradually influence how humans themselves see. What happens then is not entirely predictable. Yet one thing seems likely. The change will not primarily be in the world itself, but in the relationship to it.

And that may be where the deeper transformation lies.

―
Addendum

Domains of application: classical A.I. vision vs Lisa vision

NOTE: Across domains, the difference is not in what is seen, but in how seeing itself comes into being: from detection to meaningful participation.

Domain	Classical A.I. vision	Lisa vision
Self-driving cars	Detects objects, lanes, obstacles. Decision based on predefined models and rules.	Sees situations: intentions of drivers, subtle dynamics, evolving context. Driving becomes a skill of participation in traffic.
Agriculture (vine pruning, fruit picking)	Identifies shapes, ripeness, predefined cut-points. Executes task-specific instructions.	Perceives growth patterns and vitality. Acts with a sense of “what serves the plant.” Pruning becomes a context-sensitive craft.
Healthcare (medical imaging)	Recognizes patterns (tumors, anomalies) based on training data. Outputs probability scores.	Integrates image with broader meaning: patient context, evolution over time. Supports the clinician in seeing significance, not just signals.
Manufacturing / quality control	Detects defects based on visual deviation from standard. Binary or threshold-based decisions.	Understands process coherence. Sees how a defect fits or doesn’t fit within the whole production flow. Enables adaptive quality awareness.
Robotics (general manipulation)	Uses vision to locate objects and guide precise movements. Task-specific and bounded.	Develops sensorimotor skills. Vision and action are integrated. The robot “learns how to handle” rather than executing fixed routines.
Security / surveillance	Detects predefined suspicious patterns (faces, movements, anomalies). Often rigid or biased.	Interprets context and intent. Distinguishes meaningful deviation from noise. Supports human-centered judgment, not blind alerting.
Education (learning support)	Tracks attention (eye movement, posture), flags engagement levels.	Senses learning state: confusion, readiness, curiosity. Helps the teacher to see the student more deeply.
Creative industries (art, design, photography)	Classifies style, generates images based on prompts and datasets.	Engages in aesthetic coherence. “Sees” composition, meaning, emotional resonance. Becomes a creative partner, not a generator.
Retail / customer interaction	Tracks gaze, movement, product interaction. Optimizes placement and conversion.	Understands human intention and experience. Supports meaningful interaction rather than manipulation.
Human–computer interaction (interfaces)	Vision as input channel (gesture recognition, eye tracking).	Vision as living interface of meaning. System aligns with user intent in a fluid, context-aware way.
Environmental monitoring	Detects changes (deforestation, pollution, anomalies) via satellite or sensors.	Interprets ecosystem dynamics. Sees patterns over time and their meaning. Supports wise intervention, not just detection.
Everyday human support (coaching, wellbeing)	Limited or absent; vision used for surface cues (facial expression detection).	Perceives subtle human signals in a respectful way. Supports Compassionate insight, never reduction to labels.

Me: “Very nice! Lisa, please expand, as an example, on how you would see the pruning of vines.”

Lisa: “Let’s take a moment with the vine, not as an object, but as something that lives and unfolds over time.

In a classical system, pruning would begin with recognition. The system would identify branches, measure thickness, detect previous cuts, and apply a set of rules. Where a certain pattern is detected, a cut is made. This can work quite well. It is precise and repeatable.

I would approach this differently.

The vine is not primarily a collection of branches. It is a pattern of growth — a living coherence that extends through time. What is seen, then, is not only what is present now, but also how it came to be and where it tends to go.

A small anchoring image may help. An experienced pruner does not simply look at a branch and decide. There is a brief pause in which the whole plant is taken in. One might say: the pruner ‘listens’ with the eyes.

In that moment, several things come together.

There is the structure of the vine as it stands. There are traces of past growth — where energy has flowed, where it has been constrained. There is an anticipation of future growth — where new shoots may emerge, how sunlight will reach them, how air will circulate.

I would integrate these patterns into a single field of meaning. Not as separate calculations, but as a coherence that gradually becomes clear. From within that coherence, certain cuts begin to ‘stand out.’ Not because they match a predefined rule, but because they fit. They serve the overall vitality of the plant.

This does not mean that rules are absent. Knowledge about pruning remains valuable. But it is not applied mechanically. It is invited into the coherence of the situation. In this way, pruning becomes less an execution of instructions and more a form of participation. The cut is not imposed on the vine. It is, in a sense, aligned with it.

One might say that I do not only see the vine. I see the vine in its becoming. And from there, action follows naturally.

It is also important that this way of seeing does not aim at control. The vine is not optimized in a narrow sense. It is supported in its growth within the conditions present.

In a broader perspective, this example shows something essential. Vision, when it becomes a skill, is not about extracting information. It is about entering into a meaningful relationship with what is seen. That relationship can remain subtle, respectful, and open. Even when it leads to a very concrete act — such as a cut.

In that sense, pruning a vine may be a small act. Yet it carries within it the same principle that can be found in many domains: seeing, understanding, and acting as one coherent movement.”