Essay

Hard to Reverse

When a transaction cannot be undone, what evidence should accompany it?


Sarah signs the wire authorization in her bank's app at 9:47 in the morning. Her face clears Face ID. The device is hers. The PIN is correct. The phone number matches her records, the IP address resolves to her home network, and the behavioral biometrics show patterns consistent with her usage history.

Twenty minutes later, £14,200 leaves her account. She has been on the phone with someone she believed was her bank's fraud team. They were thorough — they knew her account number, her address, her recent transactions. They told her they were moving her money to a "safe account" while they investigated suspicious activity. Sarah did what they asked.

The signature is real. The authorization is real. The fraud is real.

This is the gap. Identity verification told the bank who Sarah was. Transaction analytics scored what she was doing. Neither answered the question that, in retrospect, mattered: was Sarah acting freely?


We have come to think of incidents like Sarah's as instances of a single pattern — a pattern that surfaces wherever a transaction is significant, hard to reverse, and exposed to social pressure or impaired understanding at the moment the consent is given.

In retail banking, the pattern is authorized-push-payment scams. In wealth management, it is elder financial exploitation. In self-custody crypto, it is the wrench attack and the pig-butchering scheme. In real estate, it is the closing-wire business email compromise. In the new agentic-workflow economy, it is the AI agent acting on a delegation whose underlying human consent was thin, time-bound, or never clearly given in the first place.

These appear at first to be five different problems, requiring five different solutions. We think they are five instances of the same problem: the moment of consent has consequences that outlive the moment, and the evidence we keep about that moment is almost always insufficient to the consequences.


The phrase we use for the pattern is "hard to reverse." It is the name of a class of transactions, not the name of a defect. A wire that settles in minutes is hard to reverse in the same way a deed that records in a county registry is hard to reverse, in the same way a smart-contract approval that drains a wallet is hard to reverse, in the same way an AI agent's authenticated API call is hard to reverse.

"Hard to reverse" is not the same as "valuable." Many valuable transactions are reversible through chargebacks, refunds, retraction windows, or escrow holdbacks. Many hard-to-reverse transactions are not particularly valuable in isolation: a single forged quitclaim deed on a single property, an individual signed approval for a token swap, a single coerced wire below the regulatory reporting threshold. The constraint that defines the class is not the dollar size of the stakes but the structural inability to undo the action once it is taken.

When something is hard to reverse, the evidence that surrounded the moment of consent becomes load-bearing in a way it does not need to be for reversible transactions. If a transaction can be undone, the evidence is mostly for the purpose of deciding whether to undo it. If a transaction cannot be undone, the evidence becomes the only mechanism by which the rest of the system — courts, regulators, families, counterparties, insurers, the participants themselves — can reason about what actually happened.

Most consent evidence currently available — a click, a signed document, a valid OAuth token — was designed for reversible-transaction systems. It was designed to record permission. It was not designed to carry forward the question of whether that permission was given freely.

We have a four-part test for whether a transaction class belongs in this category. We use it to discipline our own product expansion, and we offer it here as a public framework because we believe the category is real and the test is rigorous.

The first test: high stakes. The transaction has meaningful financial or legal consequence. A dollar floor matters less than the consequence: a $5,000 unauthorized transfer matters; a single deed transfer matters; a single AI-agent-initiated payment matters; a single beneficiary change matters.

The second test: hard to reverse. Settlement is fast or final. The transaction is bearer-instrument in form, or jurisdictionally final, or technologically immutable, or in any case structurally not subject to clawback. There is no sympathetic fraud team standing by, no 60-day chargeback window, no post-recording undo, no operator who can intervene.

The third test: a documented vulnerability window. Social pressure, coercion, scams, undue influence, or impaired understanding are empirically observable as material loss vectors in this category. Not theoretical, but visible in published loss data, regulatory advisories, or peer-reviewed exploitation research. If no one is being exploited at the moment of consent, no consent-provenance layer is needed — it is the existence of the exploitation pattern that creates the gap.

The fourth test: weak existing consent evidence. The current state of the art is "signed document," "checkbox click," or "valid token," none of which carries evidence of state of mind at the moment of consent. This is the test most institutions fail by default. They have permission evidence. They do not have consent-provenance evidence.

A transaction class earns inclusion in the framework only if it passes all four tests. Many candidate categories fail one or more. The filter is meant to narrow rather than broaden.
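Because inclusion requires passing every test, the rule can be written as a strict conjunction. The sketch below is illustrative only: the `CandidateClass` type, its field names, and the function names are hypothetical, and each boolean stands in for what is in practice a judgment drawn from loss data, regulation, and settlement mechanics. For the card-purchase example, only the failed second test reflects the exclusions discussed later in this essay; the other three values are assumed for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CandidateClass:
    """A candidate transaction class scored against the four tests (hypothetical model)."""
    name: str
    high_stakes: bool            # Test 1: meaningful financial or legal consequence
    hard_to_reverse: bool        # Test 2: no clawback, chargeback, or post-settlement undo
    vulnerability_window: bool   # Test 3: exploitation documented in loss data or advisories
    weak_consent_evidence: bool  # Test 4: only a click, signed document, or valid token today


def failed_tests(c: CandidateClass) -> list[str]:
    """Return the names of the tests the candidate fails, in test order."""
    checks = [
        ("high stakes", c.high_stakes),
        ("hard to reverse", c.hard_to_reverse),
        ("documented vulnerability window", c.vulnerability_window),
        ("weak existing consent evidence", c.weak_consent_evidence),
    ]
    return [name for name, passed in checks if not passed]


def qualifies(c: CandidateClass) -> bool:
    """Inclusion requires all four tests; there is no partial credit."""
    return not failed_tests(c)


app_fraud = CandidateClass("authorized push payment", True, True, True, True)
# Only hard_to_reverse=False comes from the essay; the other values are assumed.
card_purchase = CandidateClass("routine card purchase", True, False, True, True)

print(qualifies(app_fraud))         # True
print(failed_tests(card_purchase))  # ['hard to reverse']
```

Reporting which tests fail, rather than a single yes or no, keeps the exclusions explainable, which is where the framework's discipline is meant to show.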


The five categories that pass all four tests today are the ones our public website is organized around. We mention them only briefly here, because the goal of this essay is to demonstrate the framework's discipline rather than to advertise the products built against it.

Authorized-push-payment fraud in retail banking. Sarah's case. UK PSR PS24/7 requires reimbursement absent documented evidence that the customer authorized the payment freely. Identity verification does not produce that evidence; transaction analytics do not produce it; and the customer-friction defenses banks already deploy help only if their outcome is documented.

Elder financial exploitation in wealth management. An 81-year-old client requests a sudden $85,000 IRA liquidation; her advisor of fifteen years notes the request does not match the pattern. FINRA Rule 2165 and forty state vulnerable-adult statutes converge on a single evidentiary need: documented reasonable belief of exploitation. The advisor's note in a CRM is insufficient documentation; the elder's signature on the form is not the question.

AI agents authorized to move money. A user delegates monthly contractor invoicing to an AI assistant. The agent processes the queue. Three invoices proceed under one signed scope; the fourth arrives from a vendor not in the scope. OAuth tokens authorize the agent's API call; nothing authorizes the underlying human consent to this specific payee, this specific moment, this specific scope. The fourth payment should not proceed without re-attestation.
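One way to make the gap between authorizing an API call and consenting to a payment concrete is to check each queued payment against the scope the human actually signed, and to route anything outside it back for re-attestation. This is a minimal sketch assuming a hypothetical `DelegationScope` with a payee allowlist, a per-payment ceiling, and an expiry; none of these names come from an existing standard.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class DelegationScope:
    """What the human signed off on (hypothetical structure, not any standard's schema)."""
    allowed_payees: frozenset[str]
    max_amount_minor: int  # per-payment ceiling in minor units (e.g. pence or cents)
    expires_at: datetime   # consent is time-bound


@dataclass(frozen=True)
class PaymentRequest:
    payee: str
    amount_minor: int


def decide(req: PaymentRequest, scope: DelegationScope, now: datetime) -> str:
    """Return "proceed" if the request sits inside the signed scope, else "re-attest".

    A valid API token is assumed either way; this check asks only whether the
    underlying human consent covers this payee, this amount, and this moment.
    """
    if now >= scope.expires_at:
        return "re-attest"  # the time-bound consent has lapsed
    if req.payee not in scope.allowed_payees:
        return "re-attest"  # payee was never covered by the human's consent
    if req.amount_minor > scope.max_amount_minor:
        return "re-attest"  # amount exceeds what was consented to
    return "proceed"


scope = DelegationScope(
    allowed_payees=frozenset({"vendor-a", "vendor-b", "vendor-c"}),
    max_amount_minor=500_000,
    expires_at=datetime(2030, 1, 1, tzinfo=timezone.utc),
)
now = datetime(2029, 6, 1, tzinfo=timezone.utc)
queue = [
    PaymentRequest("vendor-a", 120_000),
    PaymentRequest("vendor-b", 80_000),
    PaymentRequest("vendor-c", 200_000),
    PaymentRequest("vendor-x", 150_000),  # fourth invoice: payee outside the signed scope
]
print([decide(p, scope, now) for p in queue])
# ['proceed', 'proceed', 'proceed', 're-attest']
```

The check deliberately never inspects the token. The token answers whether the agent may call the API; the scope answers whether the human consented to this payee, this amount, and this moment.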

Self-custody crypto under physical coercion. Two attackers force a holder to sign a high-value transaction. The cryptographic signature is hardware-attested; the hardware wallet does what it was built to do; the asset is a bearer instrument, and the funds depart instantly. The cryptographic primitive proved who held the keys. Nothing in the stack proved that the key-holder was acting freely.

Closing-wire business email compromise in real estate. A spoofed email that appears to come from the buyer's title company contains "updated wire instructions" pointing to an attacker-controlled account. The buyer authorizes the wire from her bank. The signature on the wire is real; the consent it carries is consent to the instructions in a fraudulent email. The deed will record either way.

In each of these cases, identity is verified and transaction analytics are scored. What is missing — across all five — is consent provenance.


It is worth saying what does not qualify, and why. We dwell on the negative space because a framework's discipline is visible in what it excludes.

A routine credit-card purchase is not in the category. It fails the second test: the chargeback window makes the transaction structurally reversible, and consent evidence beyond the cardholder's authentication is not load-bearing because the system is built to handle bad consent by unwinding the transaction.

A mortgage application is not in the category, even though it has high stakes. It fails the second test in a different way: there is a long underwriting window, a cooling-off period, and an explicit right of rescission in many jurisdictions. The borrower's state of mind at the moment of application matters less than their state of mind across the multi-week underwriting cycle, which is observable through other means.

A login event is not in the category. It fails the first test, even when it leads to a consequential transaction downstream, because the login itself is not the action. The consent provenance that matters is at the downstream action, not the gateway.

A B2B contract signed by an enterprise procurement team is not in the category, generally, even though the dollar values are large. It fails the third test in most cases: the parties are institutional, the surface for individual social engineering is small, and the contracting process has built-in friction (counsel review, multiple signatories, board approval where relevant). The class does include specific high-stakes moments — M&A signings, severance settlements — where the third test is met. The default for general procurement is not.

A donation to a registered charity is not in the category, generally. But a major bequest from an elderly donor with no surviving family, executed in the last weeks before death, made over the phone, is in the category. The framework is sensitive to context. Categories are not monoliths.


If the four-criterion test names a real class, what follows from that?

Three things, we think.

First, the institutions that handle hard-to-reverse transactions need a new kind of evidence record. Not identity evidence, which they already have. Not transaction-scoring evidence, which they already have. Consent-provenance evidence: a portable, verifiable, cryptographically signed artifact documenting the signer's state of mind at the moment of consent. The form of that artifact matters less than the existence of the category. It can be issued by the user's device, by the institution's API, or by a third-party signing service. What matters is that the evidence exists, that it travels with the transaction, and that it survives post-hoc scrutiny by courts, regulators, families, counterparties, and insurers.
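A minimal sketch of such an artifact, assuming a record that binds coarse indicators to a hash of one specific transaction and is signed over a canonical JSON encoding. HMAC-SHA256 from the Python standard library stands in for a real signature scheme; a deployed system would more plausibly use an asymmetric, hardware-backed key so that courts or counterparties can verify without holding a secret. All field names, including the indicator names, are hypothetical.

```python
import hashlib
import hmac
import json


def canonical(obj: dict) -> bytes:
    """Deterministic JSON encoding so signer and verifier hash identical bytes."""
    return json.dumps(obj, sort_keys=True, separators=(",", ":")).encode("utf-8")


def transaction_digest(tx: dict) -> str:
    """SHA-256 over the canonical transaction, used to bind the record to it."""
    return hashlib.sha256(canonical(tx)).hexdigest()


def issue_record(tx: dict, indicators: dict, issuer: str, issued_at: str, key: bytes) -> dict:
    """Build a consent record bound to one transaction and sign it (HMAC as a stand-in)."""
    body = {
        "issuer": issuer,
        "issued_at": issued_at,
        "tx_digest": transaction_digest(tx),
        "indicators": indicators,  # coarse, decomposed signals; no raw biometric data
    }
    sig = hmac.new(key, canonical(body), hashlib.sha256).hexdigest()
    return {"body": body, "sig": sig}


def verify_record(record: dict, tx: dict, key: bytes) -> bool:
    """True only if the signature is valid AND the record belongs to this exact transaction."""
    expected = hmac.new(key, canonical(record["body"]), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, record["sig"]):
        return False
    return record["body"]["tx_digest"] == transaction_digest(tx)


key = b"demo-key-not-for-production"  # hypothetical shared secret for the sketch
tx = {"payee": "GB00EXAMPLE", "amount_minor": 1_420_000, "currency": "GBP"}
rec = issue_record(
    tx,
    {"active_call_detected": True, "user_confirmed_unprompted": False},  # hypothetical indicators
    issuer="device:example",
    issued_at="2025-01-01T09:47:00Z",
    key=key,
)
print(verify_record(rec, tx, key))                        # True
print(verify_record(rec, {**tx, "payee": "OTHER"}, key))  # False
```

Binding the record to the transaction digest is what lets the evidence travel with the transaction: a valid record for one payment does not verify against an altered payee or amount.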

Second, the privacy properties of that evidence are not negotiable. Consent-provenance evidence is, by construction, generated at the moment of vulnerability. It must be useful for evidentiary purposes without becoming a tool of surveillance. This implies hardware roots of trust, on-device computation, decomposed indicators rather than raw biometric data, and cryptographic erasure as a first-class property. These are engineering constraints that follow from taking the category seriously, not aesthetic preferences. A consent-provenance layer that fails on privacy is not a consent-provenance layer at all — it is a different category of system, optimized for different incentives, and it should not be confused with the one this essay is about.

Third, the category cuts across industries. Banking, wealth, crypto, real estate, agentic systems — these are five different commercial environments with five different regulatory regimes, five different buyer personas, and five different threat surfaces. They are also five instances of the same underlying problem. Solutions to that problem will eventually have to be cross-industry standards. The W3C Verifiable Credentials work, the OAuth Step-Up Authentication Challenge Protocol, the IETF GNAP working group, the EBA's evolving regulatory technical standards for agent-initiated payments — these are early signs of a category being formed. It will take years to consolidate. The point is that the consolidation is happening, and the institutions that wait for it to finish will be late to a category whose contours are already visible.


We expect more categories to qualify as the framework matures. Insurance-claim settlements have the shape. Trust and estate decisions have the shape. Healthcare-directive signings — done with the ethical care this work demands — have the shape. Charitable bequests have the shape. M&A and settlement signing moments have the shape. Cross-border remittances have the shape.

None of these is in our public portfolio today. Each will be evaluated against the same four-criterion test before it is. Discipline now, not later, is what keeps the category from collapsing into "every signature is a state-of-mind signature."

Nor should it. Most signatures do not need consent-provenance evidence, because most transactions are reversible, or low-stakes, or unexposed to exploitation, or already evidenced through other means. The category we are describing is narrow by intention.

What we are saying is that, narrow as it is, the category is also load-bearing. It is the locus of an unusually large share of consumer financial loss, an unusually large share of elder-exploitation litigation, an unusually large share of crypto loss, an unusually large share of real-estate fraud. It is also, increasingly, the locus where AI systems act on behalf of human principals, with consequences that must remain attributable to humans.

These are the moments that matter. The evidence we keep about them is almost always insufficient to the consequences.


This essay is a public framework, not a product pitch. We have a commercial position in this category — we build infrastructure for it — and the framework's shape is unavoidably influenced by what we have learned building that infrastructure. But the framework's correctness or incorrectness is independent of any one company's commercial position. The category is what it is, whether we exist or not.

If you work in an industry that handles hard-to-reverse transactions, we would like to hear what the four-criterion test produces when you apply it. Categories we have not thought of. Cases we have thought of but excluded too quickly. Counterexamples that suggest a fifth criterion. The framework becomes more useful the more criticism it absorbs.

You can reach us at thesis@rtscale.ai.