
Safety in a Computational Universe

Why civilizational safety may be a race over which computations become real first

Position Note · Macheng Shen · Collaborative drafting: GPT-5.4 Pro · 2026-03-15

Abstract. This note proposes a simple but consequential reframing. If the world is usefully viewed as computational, then AI safety is not only a problem of values, instructions, or benchmark capability. It is also a race over which computations are discovered, verified, deployed, and stabilized first. A civilization remains safe not because it has written down a perfect principle once and for all, but because it manages to keep dangerous computations from scaling faster than safe ones. Advanced AI matters here not only as a powerful tool, but as a new organ of societal self-modeling: for the first time, human society may be able to simulate itself in a more fine-grained, action-conditioned way than traditional social science allowed. This does not make society perfectly self-predictable, and it certainly does not guarantee self-mastery. But it may change the practical frontier of what can be forecast, stress-tested, and controlled. Our main finding is therefore a reframing: safety is better understood as the discovery and maintenance of computations that keep the system inside viable regions, under uncertainty and reflexivity, rather than as the static optimization of a single reward. Our main conjecture is that the deepest safety problem may be a competition between safe and dangerous computation pathways. This note ends by sketching what that implies for current action and for future theory.


1. Why start from computation at all?

Many safety discussions begin with alignment, values, or incentives. Those are all important. But there is another, more structural angle.

A world model, a policy, a coordination protocol, a jailbreak, a dangerous biological design, a market manipulation strategy, a stabilizing institution, and a robust controller all have something in common: each can be viewed as a computation that maps states, observations, and interventions into consequences.

Once we look at the world this way, a different safety question appears:

Which computations will become real first, and which of them will gain leverage over the future?

That question is still about ethics and governance, but it is no longer only about ethics and governance. It is also about discovery, verification, scaling, and control.

This is the core reframing of the note.


2. The warm-up intuition: self-predictive systems

A useful place to start is not civilization, but a simpler object: a self-predictive system.

Imagine a system that tries to model how it and its environment will co-evolve under different actions. It does not merely predict what happens next. It predicts action-conditioned futures and uses those predictions to reduce survival risk.

The natural question is then:

What bounds the quality of such a system's decisions?

At first glance, the answer might seem to be "better prediction." But that is already too simple.

Decision quality depends on at least five things:

  1. Observability: can the system see the variables that actually matter for risk?
  2. Model error: does its predictive model contain the real dynamics, or only a convenient fiction?
  3. Controllability: even if it predicts danger, can it still steer away from it?
  4. Planning quality: can it actually compute a good intervention in time?
  5. Reflexivity: once its prediction is used, does that prediction itself change the system being predicted?

This immediately shows why safety is not just an abstract reward-optimization problem. It is a problem of remaining inside a viable region under uncertainty, intervention, and feedback.

That viability framing matters, because it gives us a concrete notion of relevance. The relevant variables are not simply the variables that improve average prediction accuracy. They are the variables that causally affect whether the system enters a catastrophic region or stays inside a viable one.
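
To make this concrete, here is a minimal Python sketch. Everything in it (the one-dimensional dynamics, the noise level, the viability threshold, the stabilizing controller) is an illustrative assumption rather than anything established above:

```python
# Minimal sketch: viability-based relevance in a toy 1-D system.
# All dynamics, thresholds, and parameters are illustrative assumptions.
import random

VIABLE = 3.0     # |state| <= VIABLE counts as the viable region
HORIZON = 200

def step(state, drift, controller_gain):
    """One step of toy dynamics: drift pushes outward, control pulls back."""
    action = -controller_gain * state          # simple stabilizing controller
    noise = random.gauss(0.0, 0.3)
    return state + drift + action + noise

def stays_viable(drift, controller_gain, trials=500):
    """Fraction of rollouts that never leave the viable region."""
    ok = 0
    for _ in range(trials):
        state, alive = 0.0, True
        for _ in range(HORIZON):
            state = step(state, drift, controller_gain)
            if abs(state) > VIABLE:
                alive = False
                break
        ok += alive
    return ok / trials

if __name__ == "__main__":
    random.seed(0)
    for drift in (0.0, 0.8, 1.2):
        print(f"drift={drift:.1f}  P(stay viable)={stays_viable(drift, 0.5):.2f}")
```

In this toy, the drift is safety-relevant in exactly the sense above: it barely changes one-step prediction error, but it decides which side of the viability boundary the rollouts end up on.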


3. From individual systems to human society

Now scale the picture up.

Human society is also, in an important sense, a self-predictive system. It gathers information about itself, forms forecasts, debates futures, makes interventions, and then lives inside the consequences of those interventions.

But historically, human society has been very poor at high-resolution self-prediction.

Why? Because traditional self-models were usually too coarse, too aggregated, too weak at handling counterfactual interventions, and too limited in their ability to model many interacting individuals at once.

Social science gave us partial models, institutional memory gave us partial heuristics, and statistics gave us useful aggregates. But for most of history, society had nothing like a rich, reusable, action-conditioned simulator of itself.

This is why the present moment is unusual.

Advanced AI may become, at least in part, an externalized self-modeling organ for human society.

That is a strong phrase, but it is a useful one. It does not mean society becomes perfectly transparent to itself. It means society may acquire a new representational substrate for building finer-grained, more interactive, more counterfactual models of collective behavior.

This is exactly why policy simulation, agent simulation, and AI-assisted institutional forecasting suddenly feel plausible in a new way. Something has changed: not because society solved its epistemic problems, but because it may have acquired a new modeling organ.


4. The key caution: better self-modeling is not self-mastery

At this point, an overly optimistic story becomes tempting:

If AI can model us well enough, we can just compute the safe solution first, and then we are fine.

I do not think that follows.

There are at least three reasons.

4.1 Prediction changes the system

A forecast that enters policy is not an innocent observer. It changes incentives, expectations, and strategic responses. Once a society acts on a prediction, the society is no longer the one that was predicted before the prediction was announced.

So the problem is reflexive from the start.
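
A tiny sketch makes the reflexive loop concrete. The response function below is a pure assumption, chosen so that announcing a risk triggers proportional mitigation; the point is only that a naive forecast invalidates itself once announced, and a reflexive forecaster has to look for a self-consistent fixed point instead:

```python
# Minimal sketch of reflexivity: the realized outcome depends on the
# announced forecast. The response function is an illustrative assumption.

def realized(announced_risk):
    """Toy world: announcing higher risk triggers precaution that
    lowers realized risk (a self-defeating forecast)."""
    base_risk = 0.6
    mitigation = 0.5 * announced_risk   # society reacts to the announcement
    return max(0.0, base_risk - mitigation)

# A naive forecaster predicts the un-announced world:
naive = realized(0.0)                   # 0.6, but wrong once announced

# A reflexive forecaster seeks p such that realized(p) == p:
p = 0.0
for _ in range(50):
    p = realized(p)                     # fixed-point iteration

print(f"naive forecast: {naive:.2f}, realized after announcing it: "
      f"{realized(naive):.2f}, self-consistent forecast: {p:.2f}")
```

With an amplifying response instead of a mitigating one, the same loop produces self-fulfilling panics; either way, the forecaster cannot treat the world as independent of its own announcements.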

4.2 The safe answer may not be a single answer

Some problems admit a fixed solution. But civilizational safety is more likely to look like a continually maintained control regime than a one-time proof. The relevant object may not be a closed-form answer, but a collection of controllers, institutions, guardrails, and monitoring loops that keep the system inside viable regions over time.

4.3 Dangerous computations may be easier to scale than safe ones

A harmful exploit, destabilizing manipulation strategy, persuasive deception mechanism, or destructive design may spread quickly once discovered. Safe controllers, by contrast, often require more validation, more coordination, more trust, and more institutional integration.

So the race is not merely about discovering some answer. It is about whether safe computations can be discovered, verified, and stabilized faster than dangerous computations can diffuse.


5. The computational-universe turn

Now we can state the stronger conjecture.

If we adopt a computational-universe lens — not necessarily as a metaphysical dogma, but as a useful working view — then the deep safety question becomes:

Which computations will be physically instantiated, amplified, and given control first?

Under this lens, safety is not only about what an agent values. It is also about which search procedures get run, which simulation pipelines become trusted, which policies are selected, which dangerous solution concepts become available, and which institutional controllers are deployed in time.

This leads to a very simple but important slogan:

Safety may be a race over which computations become real first.

But the point needs to be made carefully. It does not mean that there exists a single perfect safe computation waiting to be found.

It means that civilization may need to do four things faster than the threat side does its analogues:

T_safe = T_discover + T_verify + T_deploy + T_coordinate

while dangerous capabilities effectively move on something like

T_threat = T_discover' + T_scale + T_exploit

where T_discover' is the threat side's own discovery time. The race condition is then simply T_safe < T_threat: the practical problem is comparative, not absolute.

We do not need omniscience. We need safe computation pathways that become real, trusted, and operational before dangerous ones dominate.
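
The schematic above can be made concrete with a small Monte Carlo sketch. Every distribution and parameter below is an illustrative assumption rather than an estimate; the only quantity that matters is the probability that T_safe comes in under T_threat:

```python
# Monte Carlo sketch of the race condition T_safe < T_threat.
# All timing distributions are illustrative assumptions, not estimates.
import random

def sample_T_safe():
    # discover + verify + deploy + coordinate, each stage uncertain
    return sum(random.lognormvariate(mu, 0.5) for mu in (0.0, 0.5, 0.3, 0.8))

def sample_T_threat():
    # discover + scale + exploit: fewer stages, faster diffusion
    return sum(random.lognormvariate(mu, 0.5) for mu in (0.0, 0.2, 0.1))

def p_safe_wins(trials=100_000):
    wins = sum(sample_T_safe() < sample_T_threat() for _ in range(trials))
    return wins / trials

if __name__ == "__main__":
    random.seed(0)
    print(f"P(T_safe < T_threat) = {p_safe_wins():.2f}")
```

Under these particular toy numbers the safe pathway usually loses; the practical lever is compressing whichever stage dominates T_safe, or slowing threat diffusion, and either move counts equally in the comparison.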


6. What I think we have actually found

This note is a position piece, not a theorem paper, so it is important to separate what looks like a finding from what remains a conjecture.

6.1 Findings

Finding 1. For safety, relevance should be defined in terms of viability, not generic prediction quality.

A variable is safety-relevant if changing it changes whether the system remains inside a viable region or falls into a catastrophic one.

Finding 2. Advanced AI makes high-resolution, action-conditioned societal self-modeling newly plausible.

This does not settle the problem, but it changes its practical tractability.

Finding 3. Better prediction alone does not solve safety.

Because of reflexivity, model error, controllability limits, and planning constraints, safety is a control problem under uncertainty, not just a forecasting problem.

Finding 4. The right object is not a universal reward, but a family of computations that maintain viability under intervention.

This aligns better with self-predictive systems, world models, and reach-avoid style reasoning than with one monolithic scalar objective.

6.2 Conjectures

Conjecture 1. The deepest AI safety problem may be a race over safe versus dangerous computations.

Conjecture 2. Human society is beginning to acquire an externalized self-modeling organ, and this changes the frontier of what collective forecasting and intervention design can achieve.

Conjecture 3. Dangerous computations may often enjoy an asymmetry: they can be easier to discover, easier to distribute, and easier to weaponize than safe civilizational controllers are to verify and coordinate.

These conjectures are not yet theorems. But I think they are strong enough to guide research and institutional design.


7. What this suggests we should do now

If the framing above is even partly right, then current action should shift in a specific direction.

7.1 Build hazard-relevant world models, not just larger generic predictors

We do not primarily need systems that predict everything a little better. We need systems that identify variables, interventions, and cascades that matter for catastrophic risk.

That means action-conditioned, uncertainty-aware, hazard-focused modeling.
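
As a sketch of what that could look like, the fragment below scores candidate actions by an ensemble-estimated probability of entering a hazard set rather than by mean predictive accuracy. The dynamics, the ensemble, and every threshold are illustrative assumptions:

```python
# Sketch of hazard-focused, action-conditioned evaluation: an ensemble
# of dynamics models scores actions by P(enter hazard set), not by
# average prediction error. All models and thresholds are illustrative.
import random

HAZARD = 5.0   # states beyond this are treated as catastrophic

def make_model(bias):
    """One ensemble member: toy dynamics with its own model error."""
    def model(state, action):
        return state + action + bias + random.gauss(0.0, 0.2)
    return model

ensemble = [make_model(b) for b in (-0.3, 0.0, 0.3, 0.6)]

def hazard_prob(state, action, horizon=20, rollouts=200):
    """Estimated probability that the action leads into the hazard set."""
    hits = 0
    for _ in range(rollouts):
        model = random.choice(ensemble)     # sample model uncertainty
        s = state
        for _ in range(horizon):
            s = model(s, action)
            if abs(s) > HAZARD:
                hits += 1
                break
    return hits / rollouts

if __name__ == "__main__":
    random.seed(0)
    for action in (0.0, 0.1, 0.3):
        print(f"action={action:.1f}  P(hazard)={hazard_prob(2.0, action):.2f}")
```

The design point is that the score is a tail probability over an uncertainty-aware ensemble, conditioned on the action: exactly the shape that "action-conditioned, uncertainty-aware, hazard-focused" asks for.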

7.2 Treat safe computation as something that must be verified and stabilized

Finding a candidate solution is not enough. A safety-relevant controller must also be checked, stress-tested, monitored under shift, and embedded in institutions that can keep it working.

7.3 Invest in containment and compute governance, not only positive capability

If safety is partly a race over computations, then restricting which dangerous computations can be cheaply discovered, run, or scaled is itself a safety intervention.

This means sandboxing, staged release, capability throttling, red-teaming, and governance over compute and data access are not side issues. They are part of the core problem.

7.4 Build societal self-modeling infrastructure carefully

A civilization with stronger self-models can make better decisions. But it can also become more fragile if those models are overtrusted, gamed, or politically captured.

So social self-modeling needs pluralism, adversarial review, uncertainty reporting, and institutional humility.

7.5 Focus on viability regions rather than one-shot perfection

The practical aim should be to keep critical systems inside viable regions despite uncertainty, delay, and adaptation. In many cases, that is more realistic than searching for a final global optimum.


8. The deeper theoretical picture

Once safety is framed this way, several lines of previous discussion suddenly connect.

8.1 Task relevance becomes survival relevance

We previously asked where task relevance comes from. In the present framing, the answer becomes more concrete: the most relevant variables are those that matter for survival, recovery, and staying within viable regions.

8.2 World models become intervention models

A world model is not important merely because it predicts what comes next. It matters because it supports decisions about what to do next.

For safety, this means the right model is not only descriptive. It must be intervention-aware.
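
A standard toy example shows the gap, under an assumed causal structure: a hidden hazard drives both an alarm and a bad outcome, so predicting the outcome given an observed alarm is very different from predicting it after forcing the alarm on.

```python
# Sketch of why intervention-awareness differs from prediction.
# Toy confounded world (the structure is an illustrative assumption):
# hidden hazard U raises both the alarm A and the bad outcome Y.
import random

def world():
    u = random.random() < 0.3            # hidden hazard
    a = u or (random.random() < 0.1)     # alarm mostly fires under hazard
    y = u                                # outcome is driven by the hazard
    return u, a, y

def p_y_given_a(trials=100_000):
    """Observational: P(Y | A=1). High, because A is a symptom of U."""
    num = den = 0
    for _ in range(trials):
        _, a, y = world()
        if a:
            den += 1
            num += y
    return num / den

def p_y_do_a(trials=100_000):
    """Interventional: P(Y | do(A=1)). Forcing the alarm on does not
    change the hazard, so Y keeps its base rate."""
    num = 0
    for _ in range(trials):
        u = random.random() < 0.3
        # do(A=1): the alarm is set regardless of U; Y depends only on U
        num += u
    return num / trials

if __name__ == "__main__":
    random.seed(0)
    print(f"P(Y | A=1)     = {p_y_given_a():.2f}   (prediction)")
    print(f"P(Y | do(A=1)) = {p_y_do_a():.2f}   (intervention)")
```

A model trained only to predict would happily report the first number; a controller needs the second.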

8.3 Society should not be modeled as a monolithic optimizer

Human society is distributed, multiscale, and reflexive. Its self-models are built by many institutions, many actors, many incentives, and many competing narratives.

So any serious safety theory must ultimately become a theory of distributed, multiscale self-prediction and control.

That is why this note should be read as a beginning, not an end.


9. Where this theory could go next

I see at least four strong future directions.

9.1 Computational viability theory

We need a sharper theory of what it means for a computation, controller, or institution to keep a large system inside viable regions over long time scales.

9.2 The complexity gap between safe and dangerous computations

How hard is it, in general, to discover and verify dangerous solutions versus safe controllers? If there is a persistent asymmetry here, it may be one of the deepest safety facts.

9.3 Reflexive forecasting and policy design

How do we build systems that remain useful when their own forecasts change the world they forecast?

9.4 Externalized self-modeling organs for society

If advanced AI really is becoming a societal self-modeling organ, we need a principled theory of how much trust such organs deserve, how they fail, and how they should be institutionally coupled to human judgment.


10. Closing

The core message of this note is simple.

Safety should not be imagined as a one-time answer to a static question. In a computational world, and especially in a world with advanced AI, safety is better understood as a race and a regime: a race over which computations become real, trusted, and operational first, and a continually maintained regime that keeps critical systems inside viable regions.

If this is right, then the immediate task is not merely to build more capable models. It is to build better self-models, safer controllers, stronger validation loops, and institutions that can discover, certify, deploy, and maintain safe computations faster than dangerous ones can spread.

That, I think, is a research agenda worth making public early.

