
Proof of Humanity, Proof of You

The Convergence Nobody Asked For

No, your honor, it wasn't me. Really. It was my AI agent. I told it not to misbehave, but it talked to other bots on the dark web and learned some bad habits. I'm just so, so disappointed in it... — Poor Schmuck

This courtroom scene isn't science fiction. It's the logical endpoint of where we're headed, and it arrives the moment AI agents can take consequential actions in the world—signing contracts, moving money, sending messages that can't be unsent. That moment was yesterday.

The defense sounds absurd. But here's the problem: how do you prove it's false?

To establish that the defendant did authorize the action, you need infrastructure that records when humans authorize things. Cryptographic proof of intent. Well, technically, cryptographic proof of the human having pressed a sequence of keys on a keyboard, or having spoken a sequence of words into a microphone, or having looked at a sequence of pixels on a screen. We can't quite get to intent. That'd be mind-reading. Well, mind-signing and mind-hashing. Hardware-attested timestamps showing that yes, this person, at this moment, issued this command. The defense against AI-mediated plausible deniability requires (in fact, is) the elimination of plausible deniability entirely.
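To make the shape of that infrastructure concrete, here is a minimal sketch in Python of what such an authorization record might look like, using the third-party `cryptography` package. Everything in it is an assumption for illustration: the field names, the record format, and the idea of a per-device signing key standing in for a real hardware attestation module.

```python
import hashlib
import json
import time

from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

# Hypothetical per-device signing key. In a real system this would live inside
# a secure enclave or hardware security module, not in Python memory.
device_key = Ed25519PrivateKey.generate()

def authorize(user_id: str, command: str) -> dict:
    """Produce a signed record asserting: this user issued this command, now."""
    record = {
        "user": user_id,
        "command_sha256": hashlib.sha256(command.encode()).hexdigest(),
        "timestamp": time.time(),
    }
    payload = json.dumps(record, sort_keys=True).encode()
    record["signature"] = device_key.sign(payload).hex()
    return record

def verify(record: dict) -> bool:
    """Anyone holding the device's public key can check the record, forever."""
    payload = json.dumps(
        {k: v for k, v in record.items() if k != "signature"}, sort_keys=True
    ).encode()
    try:
        device_key.public_key().verify(bytes.fromhex(record["signature"]), payload)
        return True
    except InvalidSignature:
        return False

rec = authorize("alice", "transfer $500 to bob")
print(verify(rec))  # True -- and not just to Bob, but to any third party, later
```

The uncomfortable property is in the last line: a record like this doesn't just convince the counterparty in the moment; it convinces everyone, indefinitely.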

Welcome to the convergence nobody asked for.

Two Paths to the Same Destination

Two groups who normally attend different conferences, speak different languages, and would never admit having friends in the other group, are arriving at the same conclusion from opposite directions.

The AI safety researchers are worried about autonomous agents. As AI systems become capable of taking real-world actions (booking flights, executing trades, sending emails on your behalf), accountability becomes critical. Who's responsible when an agent goes off-script? The user who deployed it? The company that built it? The agent itself? Without verified chains of authorization, the question is unanswerable.

The law enforcement and national security types are worried about attribution. Deepfakes, synthetic identities, AI-generated phishing at scale--all the tools for "manufactured" reality--are getting very good very fast. When anyone can generate convincing evidence of anything, how do you know what actually happened? The traditional answer (follow the money, trace the communication) falls apart when the money moves through autonomous agents and the communication was generated by a model.

Both groups need verified humanity. Both need cryptographic proof of intent. Both need the answer to the question: "Did a human being actually do this?"

And neither is comfortable with what the other wants to do with that infrastructure once it exists.

The Road Not Taken

It didn't have to be this way. Or at least, some people tried very hard to point us in a different direction.

In 2004, cryptographers Nikita Borisov, Ian Goldberg, and Eric Brewer published a paper with a provocative title: "Off-the-Record Communication, or, Why Not To Use PGP." Their argument was that existing cryptographic tools had the wrong properties for human conversation.

Pretty Good Privacy (PGP) was designed for a world of contracts and legal documents. It gave you encryption (so only the recipient could read your message) and digital signatures (so anyone could verify you wrote it). The signature part was considered a feature: non-repudiation. You couldn't deny sending what you sent. In a world of business agreements and legal proceedings, this was exactly what you wanted.
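That non-repudiation property is easy to see in code. A minimal sketch (Python, using the third-party `cryptography` package; the names and message are made up): anyone holding Alice's public key, including a judge she has never met, can verify the signature, and only Alice's private key could have produced it.

```python
from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

message = b"I agree to the terms."

# Alice signs with a private key that only she holds.
alice_key = Ed25519PrivateKey.generate()
signature = alice_key.sign(message)

# Bob -- or a judge, or anyone else -- verifies with Alice's *public* key.
try:
    alice_key.public_key().verify(signature, message)
    print("Only Alice could have signed this. She cannot later deny it.")
except InvalidSignature:
    print("Forged or altered.")
```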

But Borisov and his colleagues pointed out that most human communication isn't like that. When you talk to a friend at a bar, your words don't come with a notarized transcript. You can be heard, you can be understood, but afterward there's no permanent record that proves you said what you said. This isn't a bug in face-to-face conversation--it's a feature. It's one of the things that makes honest conversation possible. The other is having a friend you can trust.

Their solution was OTR (Off-the-Record) Messaging. The protocol was designed with deniability as a core feature. Not just privacy (keeping the contents secret) but something more radical: the inability to prove the conversation happened at all.

"We want only Bob to be able to read the message, and Bob should be assured that Alice was the author; however, no one else should be able to do either. Further, after Alice and Bob have exchanged their message, it should be impossible for anyone (including Alice and Bob themselves) to subsequently read or verify the authenticity of the encrypted message, even if they kept a copy of it."

This will break your brain if you think about it too hard (which is why OTR failed). It's designed to make it easy for anyone to put words in your mouth after the fact. In OTR, after a conversation ends, the MAC keys get published. This means anyone can now generate messages that would be cryptographically indistinguishable from "real" ones. Someone could forge a transcript showing you said anything.

That's the feature, not a bug. Because if anyone could have forged it, then a transcript proves nothing. You can always say "someone fabricated that." The very ease of forgery is what makes authentic records unverifiable.

It's the cryptographic equivalent of the legal principle that a confession obtained without witnesses is worthless—not because it's necessarily false, but because there's no way to distinguish a true confession from a coerced one.

The logic:

If only Alice can produce proof of a message → the proof is meaningful, Alice can't deny it

If anyone can produce proof of a message → the proof is meaningless, Alice can deny it

OTR deliberately chooses the second world. The tradeoff is that Bob can't prove to a judge that Alice said something, even if she really did. But that's also why Alice can speak freely to Bob—she knows the conversation can't be weaponized against her later. It's designed for a world where you trust the person you're talking to in the moment, but you don't want that trust to extend indefinitely into an unknown future where your words might be taken out of context, subpoenaed, or used against you by people you never anticipated.

The technical mechanism was elegant. OTR used ephemeral keys that were destroyed after use, so even if someone obtained your long-term key later, they couldn't decrypt past messages (perfect forward secrecy). More importantly, it used message authentication codes (MACs) instead of digital signatures. Bob could verify that Alice sent a message, but he couldn't prove it to anyone else—because the same MAC key that Alice used was also known to Bob, so Bob could have generated the same proof himself.
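Here is the same idea sketched with a MAC instead of a signature, using Python's standard `hmac` module (keys and messages are illustrative, not the OTR wire protocol). Bob can check that a message came from someone holding the shared key; but because he holds that same key, any "evidence" he shows a third party is something he could have manufactured himself.

```python
import hashlib
import hmac
import os

# Alice and Bob share an ephemeral MAC key for this conversation only.
mac_key = os.urandom(32)

message = b"meet me at the usual place"
alice_tag = hmac.new(mac_key, message, hashlib.sha256).digest()

# Bob verifies: the tag matches, so it came from someone holding the key.
bob_tag = hmac.new(mac_key, message, hashlib.sha256).digest()
assert hmac.compare_digest(alice_tag, bob_tag)

# But Bob holds the same key, so he can mint an identical tag for ANY message.
forged = b"I confess to everything"
forged_tag = hmac.new(mac_key, forged, hashlib.sha256).digest()
# To a judge, (forged, forged_tag) looks exactly like a real message from Alice.
```

Authentication for Bob in the moment; evidence for no one afterward.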

After each conversation, OTR would publish the MAC keys. This sounds insane from a traditional security perspective (another reason OTR failed), but it was the point: once the keys were public, anyone could forge messages that would look authentic. The cryptographic evidence self-destructed.
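Publishing the key widens the forgery circle from Bob to everyone. A sketch of the end state (again illustrative; real OTR reveals retired MAC keys as the conversation progresses, and the key below is just a placeholder value):

```python
import hashlib
import hmac

# Once a MAC key is retired, OTR reveals it -- here, a placeholder value.
published_mac_key = bytes.fromhex(
    "5f2b1c9e4d3a8b7c6e5d4f3a2b1c0d9e8f7a6b5c4d3e2f1a0b9c8d7e6f5a4b3c"
)

def forge(message: bytes) -> tuple[bytes, bytes]:
    """Anyone with the published key can mint a 'transcript' that verifies."""
    tag = hmac.new(published_mac_key, message, hashlib.sha256).digest()
    return message, tag

msg, tag = forge(b"Alice: yes, I authorized the transfer")
# The tag checks out perfectly -- and therefore proves nothing. That's the point.
```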

The philosophical stance was explicit: privacy requires the ability to deny.

Why OTR Lost

Signal inherited some of OTR's DNA. The Double Ratchet protocol that powers Signal, WhatsApp, and other modern encrypted messengers descends from OTR's key management approach. But something important was lost in translation.

Signal provides excellent privacy: end-to-end encryption, forward secrecy, disappearing messages. But the focus shifted from deniability to confidentiality. The question became "can third parties read this?" rather than "can third parties prove this happened?"

And the pressure has consistently run the other direction. Every few years, some government proposes "responsible encryption" or "lawful access" mechanisms for authorized parties to decrypt communications when presented with a warrant. The cryptography community fights these proposals, correctly pointing out that there's no such thing as a backdoor that only good guys can use. But the political pressure never really goes away, because the underlying problem never goes away: bad things happen in encrypted channels, and investigators need to know what happened.

OTR was swimming upstream in Colorado-grade rapids. The vision of a world where conversations are fundamentally unprovable never came to pass. The current runs toward accountability, attribution, proof. The question is whether we survive that current.

The Trap

The uncertainty that makes moral choice meaningful also makes privacy possible. They aren't two separate things that happen to be connected--they're two aspects of the same underlying structure.

Anonymity isn't just about hiding. It's about the freedom to act without pre (or post) judgment. To be evaluated on what you do rather than who you are. To experiment and fail privately, to change without your past following you forever. A teenager can become a different person than they were at fifteen. An addict can recover without wearing their addiction badge forever. A dissident can speak truth to power without their family being targeted.

Perfect attribution collapses that space. If every action is permanently linked to a verified identity, the space for reinvention vanishes. The cost of being wrong goes up. People become more cautious, more conformist, more afraid to deviate from the safe path--because deviation leaves a permanent record.

But here's the real trap: perfect unattributability collapses a different space. When anyone can disclaim anything, trust becomes impossible. If your AI agent can do things in your name that you can later deny, the entire concept of commitment dissolves. We retreat into smaller and smaller circles of verified relationships, unable to extend trust to strangers, unable to make binding agreements with people we don't know personally.

Under uncertainty, decisions reveal character. Guaranteed outcomes dissolve decisions into mere mechanistic calculation. Certainty turns ethics into engineering. Or worse, accounting.

The cypherpunks imagined a world where cryptography would protect the little guy from the powerful. What we got instead is a world where cryptography protects the powerful from accountability while the little guy's entire life is tracked through their smartphone. The surveillance infrastructure is already here. The privacy infrastructure isn't. And now we're being asked to add attribution infrastructure on top of the surveillance infrastructure, without having ever built the privacy middle layer that was supposed to protect us.

Meet the panopticon.

The Non-Solution

We got distracted. I almost forgot this article was supposed to be about AI and proof of humanity.

I don't have a clean answer. I'm suspicious of anyone who does.

The AI safety people (of which I am one) are right that autonomous agents create an accountability crisis. When an agent can take actions in the world, and you can't prove whether a human authorized those actions, the entire framework of responsibility breaks down. "My AI did it" becomes the universal excuse.

The law enforcement people (of which I am one) are right that synthetic media and AI-generated content create an attribution crisis. When anything can be faked, and fakes are indistinguishable from reality, the concept of evidence becomes problematic. Seeing is no longer believing.

The privacy advocates (of which I am one) are right that verification infrastructure gets captured and abused. Every surveillance power granted to governments for counterterrorism ends up used for drug enforcement. Every exceptional access mechanism gets exploited by hackers, hostile governments, or simply corrupt insiders.

History is unambiguous: if the infrastructure gets built to verify identity, it will be used for purposes beyond its original scope.

All three groups are correct about the risks. All three are wrong about the solutions.

Because there is no solution, only tradeoffs.

Verification infrastructure will be built. The pressure is too strong, from too many directions, for too many reasons. Legitimate reasons. The question isn't whether we get proof-of-humanity systems. The question is what they will look like. The question is how much we will have to give up to have them. The question is who controls them. The question is how we prevent the worst abuses.

The OTR vision of conversations that are cryptographically unprovable is probably not viable at scale. Not because it's technically impossible, but because the social and political pressure against it is too strong. Too many powerful actors have too many reasons to want attribution. The moment AI agents can take consequential actions, the demand for verified authorization becomes overwhelming.

But we should be clear about what we're losing. The ability to speak without record. The ability to change. The ability to be someone different than who you were. The freedom that comes from uncertainty—from the fact that the future isn't written, the past isn't perfectly remembered, and there's always room to become something new.

What I said above bears repeating: choice only matters when uncertainty exists. Without uncertainty, action becomes calculation. Under uncertainty, decisions reveal character. Guaranteed outcomes dissolve decisions into mere mechanism.

Architecture is Ethics

OTR failed not because the cryptography was wrong, but because it tried to solve a structural problem with a protocol (I swear I've seen that somewhere before...). It required users to understand the threat model, to actively maintain the security properties, to opt in. The platforms solved it by just... not offering the option. Structure won.

The same pattern plays out everywhere. You can't bolt deniability onto a system designed for attribution. You can't bolt alignment onto a system designed for optimization. You can't patch prompt injection in an architecture that fundamentally conflates data and instruction. The guardrails are always fighting the gravity of the design itself.

This is why I'm building what I'm building. Not because distributed architectures are a silver bullet--they aren't--but because the alternative is trying to constrain systems whose structure pulls relentlessly toward the very outcomes we're trying to prevent. Centralized prediction engines want to become oracles. Verification infrastructure wants to become a panopticon. The only way to get different outcomes is to start with different structures.

I don't know if it's enough. In fact, I know it's not. But it's the only approach that makes sense to me.

Panopticon

The panopticon arrives not as tyranny, but as a safety feature. The thing that protects us from rogue lobsters and super AI villains is the same thing that records our every action, forever, attributed and undeniable.

The only cure for plausible deniability is the elimination of deniability. All deniability.

Eyes open here: we're choosing between dystopias. The best we can do, the only thing we can do, is be clear about the tradeoffs, build in whatever constraints we can, and try not to pretend that the choice is anything other than what it is.

The only thing to save us from the panopticon is... the panopticon.

Oh, and please don't ask "why can't we legally charge and prosecute AI agents?" That's not funny.

Michael Bilca
AI Security Architect | Neuromorphic Computing & Distributed Intelligence
temper.ai