
What is Intelligence? Or "Distinguishability is All You Need"

Mar 30, 2026
Dan Mitropolsky

Here are several related questions to which we do not have a good answer:

How will we know when we've achieved "Artificial General Intelligence" (AGI)?

Between two AI models, how do we know which one is more intelligent?

Is there a closed-loop, self-supervised way that AI models can improve themselves to become more intelligent?

In a recent conversation with Santosh Vempala regarding these questions, I had the idea that the very well-known Turing Test is based on an underlying concept that can be made much more powerful if generalized. The original idea is that an Artificial Intelligence agent (we'll refer to these as "agents" from now on) is as intelligent as humans if, in an experiment where humans interact with either a human or an agent (selected randomly), humans cannot do better than random chance in distinguishing who their interlocutor is.

Now for my idea. For two arbitrary agent types A and B (more on agent "types" and other subtleties shortly), consider the "distinguishability experiment of B against A":

An instance of B (called the "distinguisher" in this experiment) is told (e.g. via a prompt) that it will interact with either another instance of B, or with an instance of type A, each case having probability 1/2. The distinguisher then initiates a conversation, by the end of which it must output a guess as to whether its interlocutor was another instance of B or an instance of A.

In the 50% case where the distinguisher is interacting with an instance of A, we sample an A and tell it (e.g. via the first prompt) that it will be interacting with a distinguisher of type B, and that it must pretend to be an agent of type B in order to fool the distinguisher (in the case of an LLM, we append the distinguisher's first message to its interlocutor to this prompt).

In the 50% case where the distinguisher is interacting with another instance of B, we sample a (separate) instance of B whose first input (prompt) will be the first message from the distinguisher to its interlocutor.

We say the experiment "succeeds" if B correctly identifies its interlocutor.
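To make the protocol concrete, here is a minimal sketch of one trial in Python. It assumes agents are modeled as chat functions (a list of messages in, a reply out); the prompts, the sampling functions, and the round count are all illustrative choices of mine, not part of the definition.

```python
import random

def run_experiment(sample_A, sample_B, rounds=3):
    """One trial of the distinguishability experiment of B against A.

    sample_A / sample_B return fresh agent instances: functions mapping
    a list of messages (the conversation so far) to a reply string.
    Returns True iff the distinguisher guesses its interlocutor correctly.
    """
    distinguisher = sample_B()                 # a fresh B acts as distinguisher
    interlocutor_is_B = random.random() < 0.5  # each case has probability 1/2

    if interlocutor_is_B:
        interlocutor = sample_B()              # independent instance of B
        setup = []                             # B just receives the conversation
    else:
        interlocutor = sample_A()
        # A is told to imitate an agent of type B to fool the distinguisher.
        setup = ["You will interact with a distinguisher of type B. "
                 "Pretend to be an agent of type B."]

    transcript = ["You will interact with either another B or an A, "
                  "each with probability 1/2. Guess which at the end."]
    for _ in range(rounds):
        transcript.append(distinguisher(transcript))          # distinguisher speaks
        transcript.append(interlocutor(setup + transcript))   # interlocutor replies

    verdict = distinguisher(transcript + ["Was your interlocutor a B? (yes/no)"])
    return verdict.strip().lower().startswith("y") == interlocutor_is_B
```

With real LLM agents, sample_A and sample_B would wrap API calls with the appropriate system prompts; any deterministic stub works for exercising the harness itself.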

Definition. A ≥_ε B if (over the random selection of B's interlocutor, the instances of B and A, and any randomness used by the agents in their text generation) the distinguishability experiment of B against A succeeds with probability at most 1/2 + ε.

In general we will think of ε as some small, fixed "advantage": how much better than chance the distinguisher needs to be at telling A apart from itself before we stop considering A at least as intelligent as B. I do not know what the "right" constant ε should be (in fact it could depend on some other parameter, like the number of rounds of interaction allowed; more on this later), but whenever it is understood (or we want to ignore it, for more intuitive discussion) we will just write A ≥ B.
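As a back-of-the-envelope illustration of how the definition would be checked empirically, one could estimate the distinguisher's success probability over many trials and compare it to 1/2 + ε. The trial function, trial count, and threshold below are my assumptions for illustration; a rigorous version would also need confidence intervals on the estimate.

```python
def geq_eps(run_trial, n_trials=1000, eps=0.05):
    """Empirical check of A >=_eps B.

    run_trial() runs one distinguishability experiment of B against A and
    returns True iff the distinguisher succeeded. We declare A >=_eps B if
    the empirical success rate is at most 1/2 + eps. (Illustrative only:
    sampling error is ignored here.)
    """
    successes = sum(run_trial() for _ in range(n_trials))
    return successes / n_trials <= 0.5 + eps
```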

Let us pause and reflect on the meaning behind this definition. Why should we think of A as "more intelligent" than B if A can imitate B so successfully that B cannot tell it apart from "itself"? Intuitively, A can do everything B can do, at least from B's perspective (to the extent that B can evaluate, a subtlety we'll return to). Whatever B's "computational hardware" may be like, since A can produce output identical to B's, A's hardware must be as powerful as B's (in the sense of computability; complexity, i.e. how long or how many resources these models use to generate output, is ignored for now). On a more philosophical level, this is, in my view, perhaps the only definition of (relative) intelligence one can come up with. The only tool available to us (or to any agent) is to externally observe phenomena around us and infer what we can from them; if one situation appears, by any means whatsoever, indistinguishable from another, then the two are one and the same, for there is no observation, no experiment by which we can tell them apart.

This meta-concept – "(in)distinguishability" – is my favorite one in science. In computer science, indistinguishability gives rise to the notion of pseudorandomness. If an algorithm produces numbers such that no (computationally bounded) distinguisher can tell them apart from truly random ones, the algorithm is a pseudorandom generator whose numbers are "as good as random"; beautifully, this means the pseudorandom numbers can be used for any "downstream task" (e.g. encryption, learning, randomized algorithms), in the sense that the output or performance of those tasks must be indistinguishable from having been obtained with "real" randomness.

Meanwhile, though I am not a physicist, to my understanding indistinguishability is the underlying concept behind several of the most important theories in modern physics. It is at least one of the major axioms on the way to general relativity: there is no experiment an observer can perform to distinguish being in a gravitational field from being in an accelerating container. Since time dilation occurs in the accelerating container (relative to somewhere far away and not accelerating, say), it must also occur in a gravitational field. Indistinguishability is the source of one of Einstein's major predictions.

Here's another indistinguishability-adjacent thought experiment: there is no way to observe anything about an electron without moving another "test" electron near it (in order to observe how the test electron behaves in the electric field induced by the first). But since the test electron itself induces an electric field that affects other electrons, there is in particular no way to observe anything about the first electron without influencing its state. The precise differential formulation of this paradox is a fundamental starting point of quantum mechanics. Now, we add to the roster of scientific theories rooted in indistinguishability a powerful notion of intelligence: a Turing "ordering".

The Turing ordering is also the best definition of intelligence in that it is resistant to – in fact, even improved by – "dataset pollution". For every other intelligence test out there – whether a barrage of mathematical / logic / reasoning problems, human-rated response quality, or any sort of game-playing between models – as the test becomes well-known, instances of it, along with solutions, preferred strategies and so on, "enter" the internet and eventually become part of the training data for the next generation of AI models. The Turing-ordering-based definition defies this snare. The more it is discussed on the internet, in papers, and in other AI training input, the better models may become at asking distinguishing questions, "grilling" each other, and imitating one another. As this occurs, (1) the test itself becomes more robust, as it comes ever closer to reflecting the models' computational capacities, and (2) to count as "more intelligent", models will need all these capabilities and then some. If this notion of intelligence becomes a desideratum, the ecosystem of AI model development and iteration becomes a closed loop with inherent, self-inducing pressure for intelligence: each model must be able to do everything previous models can do, and a bit more (so they're not smarter than you!)

On the note of "not smarter than you", we define A >_ε B (again, we drop ε when understood from context) to mean A ≥_ε B together with ¬(B ≥_ε A). This is the right way to define "strictly more intelligent": A can "do everything" B can do, but B cannot return the favor; A has a way of distinguishing "itself" from B.
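Given estimated success rates in both directions, the strict comparison is a two-sided check. In this sketch, both rate arguments are hypothetical measured quantities (the empirical success probabilities of the two distinguishability experiments), and the threshold ε is illustrative:

```python
def strictly_greater(p_B_vs_A, p_A_vs_B, eps=0.05):
    """A >_eps B: the experiment of B against A succeeds w.p. <= 1/2 + eps
    (so A >=_eps B), while the experiment of A against B succeeds w.p.
    > 1/2 + eps (so not B >=_eps A).
    """
    a_geq_b = p_B_vs_A <= 0.5 + eps   # B cannot tell A apart from itself
    b_geq_a = p_A_vs_B <= 0.5 + eps   # A cannot tell B apart from itself
    return a_geq_b and not b_geq_a
```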

This definition would be most interesting and elegant if it were transitive; that is, if A ≥ B and B ≥ C imply A ≥ C. This is a very interesting area for theory, and we will shortly discuss (slightly informally) a condition under which transitivity holds. But back to the usefulness of transitivity: since any two agents are always comparable, in a world of agents where transitivity holds our ordering induces a total preorder – equivalently, a total ordering on equivalence classes of agents. That is, it nicely stratifies all agents into "layers" of equally intelligent agents (agents that can imitate one another indistinguishably, and that can likewise imitate every agent in a "less intelligent" equivalence layer, for example).
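Assuming comparability and transitivity, the stratification into layers can be computed from a pairwise oracle. In this sketch, geq(a, b) is a hypothetical black box answering "a ≥ b"; in the test, integers with the usual order stand in for agents.

```python
def stratify(agents, geq):
    """Group agents into layers of equal intelligence, least to greatest.

    geq(a, b) is an oracle for "a >= b"; it is assumed total and
    transitive (the setting in which the layering exists at all).
    """
    layers = []
    for agent in agents:
        for layer in layers:
            rep = layer[0]
            if geq(agent, rep) and geq(rep, agent):  # mutually indistinguishable
                layer.append(agent)
                break
        else:
            layers.append([agent])      # a new equivalence layer
    # A layer's rank = how many layer representatives its own representative
    # dominates; under a total transitive relation this sorts layers correctly.
    reps = [layer[0] for layer in layers]
    layers.sort(key=lambda layer: sum(geq(layer[0], r) for r in reps))
    return layers
```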

It turns out transitivity is tricky to prove! This is an ongoing area of research of mine / in the lab. Here, we will prove that transitivity holds on a special class T of agents, and for a slightly modified version of the Turing experiment:

Definition. In the "self-querying" Turing experiment, the distinguisher B can (and "does", with non-negligible probability) communicate with an independent instance of its own agent type.

Definition. An agent A is Turing-aware if, when A is a distinguisher in a Turing experiment, for each agent B in T, A issues the prompt requesting the imitation of B (i.e. the prompt given in a Turing experiment of B against X) to its interlocutor with "non-negligible probability" (≥ ε).

Definition. An agent A is Turing-consistent if its output after a prompt requesting the imitation of an agent B has the same distribution regardless of the communication preceding this prompt.

Definition. An agent is Turing if it is Turing-aware and Turing-consistent.

Theorem. The self-querying-Turing ordering is transitive on the class T of Turing agents.

The only informalities in our theorem and proof below are (1) regarding the "right" ε (this is not very important), and (2) another, more important subtlety: we say slightly informal things like "the agent could ask" or "an agent could distinguish X from Y when it sees different responses" – strictly speaking, we would need to claim that the agent performs these behaviors with non-negligible probability. That is, there is some notion of a "basically competent" agent (one that would do the "obvious" thing in order to succeed) that we are sweeping under the rug (for this blog post). We'll highlight this subtlety again when it comes up.

Proof. Let A ≥ B and B ≥ C.

Consider the experiment of B against A. By Turing-awareness, B, as the distinguisher, sends the prompt initiating imitation of C with probability ≥ ε. A's communication afterwards (originally A-imitating-B-imitating-C, but by Turing-consistency distributed identically to A-imitating-C, i.e. to A's output given just the prompt to imitate C, as in an experiment against C) must be indistinguishable (to B) from B's communication given this prompt (pretending to be C); otherwise B would succeed in distinguishing A from B whenever it issues this prompt (i.e. with probability ≥ ε).

Next, we claim these communications must be indistinguishable to C as well; that is, we claim that if they were distinguishable to C, then B would be able to distinguish them (contradicting the previous derivation). This is where we use the self-querying Turing experiment: B could (see the subtlety note below) ask a "fresh" instance of itself to imitate C (a query it emits with non-negligible probability by Turing-awareness), then pass it the transcript of A-imitating-C or of B-imitating-C; if C's output behavior differs on these inputs, B could distinguish them.

(The "could" in the above is the aforementioned subtlety – strictly, we need these behaviors to occur with non-negligible probability.)

Therefore, A ≥ C. QED.
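For readers who prefer symbols, the chain of indistinguishabilities in the proof can be summarized as follows. The notation X ≈_D Y (distinguisher D cannot tell transcripts of X apart from transcripts of Y) is mine, introduced just for this summary:

```latex
% X \approx_D Y : distinguisher D cannot tell transcripts of X and Y apart.
\begin{align*}
A \geq B &\;\Longrightarrow\; A\text{-imitating-}C \approx_B B\text{-imitating-}C
  && \text{(Turing-awareness + consistency)}\\
         &\;\Longrightarrow\; A\text{-imitating-}C \approx_C B\text{-imitating-}C
  && \text{(self-querying: } B \text{ could relay transcripts)}\\
B \geq C &\;\Longrightarrow\; B\text{-imitating-}C \approx_C C
  && \text{(definition of } B \geq C\text{)}\\
         &\;\Longrightarrow\; A\text{-imitating-}C \approx_C C,
  \text{ i.e. } A \geq C.
\end{align*}
```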

I hope I have convinced you that this is an interesting and exciting notion of intelligence. There are at least two urgent research directions related to this new definition. The first is experimental: all the state-of-the-art AI agents need to be tested against one another, to determine which (if any) are more intelligent. We are actively working on this now, so stay tuned for results, in an upcoming paper and on this very blog! Even if all current agents achieve around 50% accuracy against one another, as discussed previously, that may change as the notion of intelligence "pollutes" the training data and evaluation standards, improving the Turing ordering in practice.

The other direction is theoretical – there are many subtleties in how to precisely define agents, the advantage/threshold ε in the definition, and several others. Under what other (perhaps more natural) assumptions can we prove transitivity? There are even more theoretical nuances not explored in this post – such as augmenting the definition of the experiment with a "querying" phase in which the imitator can interact with an independent instance of the distinguisher (this would generalize the definition to sets of agents that do not already know much about one another). Stay tuned for more...