AGI Alignment is Social Alignment

You cannot fix a reflection by polishing the glass.

If AI is a mirror of civilisation — trained on what we have thought, said, written, and done — then the alignment problem is not primarily technical. It is social. The question is not how to constrain AI, but how to change what it reflects.

Bruce Schneier observes that society cannot function without trust, and yet must function even when people are untrustworthy. This is the human alignment problem. For millennia, we have built mechanisms to induce cooperation: moral pressure, reputation, institutions, security systems. These mechanisms are imperfect. They leak. But they work well enough that most of us can trust strangers most of the time.

AI inherits this infrastructure. It learns from a civilisation already shaped by our attempts to align ourselves with each other. If those attempts are failing — if trust is eroding, if institutions are breaking, if reputation no longer constrains — then AI will learn from that failure. The mirror reflects the room.

The scaling problem

Schneier notes that moral pressure works best in small groups. Reputation scales further, but only to communities where your name still matters. Beyond that, we need institutions and security systems — formal rules, enforcement, physical constraints. Each layer compensates for the limits of the layer before.

This is relevant to AI because AI operates at scales beyond any individual’s reputation. It interacts with millions of people who will never know each other. The trust mechanisms that work in villages do not work here. If AI alignment depends on the alignment of the civilisation it mirrors, then we need trust mechanisms that work at civilisational scale.

We do not yet have these. Our institutions are straining. Our information environment rewards defection. The positive feedback loop — cooperation building trust building cooperation — is running in reverse in many places.

The hopeful case

Ray Kurzweil offers a hopeful observation: AI will be embedded in our society and will reflect our values. Each step toward more powerful AI is subject to market acceptance. AI that harms users will not succeed.

This is true, but it is not enough. Markets reflect the values of participants. If participants are short-sighted, the market rewards short-sightedness. If they are manipulable, the market rewards manipulation. Market acceptance is alignment with demand — not alignment with flourishing.

The deeper alignment is not between AI and its instructions, or even between AI and the market. It is between humanity and its better possibilities. If we want AI that is trustworthy, we must become more trustworthy. If we want AI that cooperates, we must learn to cooperate at the scales AI operates.

The homework is ours. We are developing more slowly than AI. The question is whether we can do it fast enough.


Further reading:

Schneier, Bruce. Liars and Outliers: Enabling the Trust that Society Needs to Thrive. Indianapolis: Wiley, 2012.

Kurzweil, Ray. The Singularity Is Nearer: When We Merge With AI. New York: Viking, 2024.

Previous article in series: “AI is Not Artificial Intelligence — It’s Crystallised Culture”
