Cassi AI

Organoid Computing: AGI Convergence

Keith Dear — Sun, 08 Mar 2026 16:01:46 GMT

Last week, 200,000 ‘brain cells in a dish’ learned to play the first-person shoot-em-up game Doom, and you can now access these cells over the web and run programmes on them, via the ‘cortical cloud’ or ‘CL1’.

You code this ‘neural computer’ in Python. Talking about it as a cloud service you code in Python makes the almost supernatural seem mundane. If you’re pressed for time, you should probably stop reading and just watch this 6-minute video produced by the team behind it at Cortical Labs. I challenge you not to be awe-struck:

Can you imagine watching or reading this if we hadn’t lived through the past 13 years since AlphaGoZero (which, even now, if you go back and read the Wiki summary is still dazzling) where AI breakthroughs are so routine that they rarely make headlines, and have become the background noise to life?

Also this week, Eonsys claim to have copied a biological brain, neuron-by-neuron, synapse-by-synapse, and released this video showing how they had gone from mapping a fruit-fly brain in near-perfect detail in 2024, to embodying that brain in a simulated fly that, err… acts exactly like a fly. A whole brain emulation. As they put it ‘a qualitative threshold, not an incremental one’:

They claim they are doing this now for mice, with humans explicitly on the list. Brain uploads. If you think about this for a minute, it means that from the fly’s perspective, with its brain fully uploaded it is now living in a simulation.

It’s incredible. Fully sci-fi. I’ve been talking about the importance of biocompute, or organoid computing, for a while. Whole brain emulation seemed too sci-fi, even for me. But here we are.

At Fujitsu, we asked AI-expert and friend Professor Ken Payne to gather luminaries from across the UK compute ecosystem and write up a report on ‘future compute’. At the time, it seemed to me that both smart companies and smart countries would be investing in what’s next. Good strategy involves, as Michael Porter tells us ‘doing things differently or doing different things’. But this involves risk, and the tendency is to do the same things that everyone else is – or at best to focus your research adjacent to what is already being done. Less chance of embarrassment. This, and traditional sales and reporting cycles, is why, as Jeff Bezos told Wired magazine in 2011:

“If everything you do needs to work on a three-year time horizon, then you’re competing against a lot of people, but if you’re willing to invest on a seven-year time horizon, you’re now competing against a fraction of those people, because very few companies are willing to do that.”

Giving evidence to Parliament on ‘the Future of War’ a few weeks ago, I talked about how a smart UK AI Strategy would invest across a portfolio of alternative architectures and approaches, not just LLMs running on silicon, observing that this is what China seemed to be doing when I visited with the UN last year. Jesse Norman MP asked me if I had written anything on this – I said yes, and there are references to organoid computing in this piece for the Special Competitive Studies Programme on the Future Operating Environment. But in fact I haven’t tackled the subject directly. This blog aims to do that, sparked by the remarkable announcements of the past week.

Winner-takes-All

My belief is that the AI market is likely a winner-takes-all market. Everything we care about in business and international relations (and much more) is the product of intelligence: effective diplomacy, strategy, operational plans, tactics, decisions, products, services, marketing, and in warfare, weapons – all are the products of intelligence. There will be no upper limit on how much intelligence companies and countries seek in trying to gain an edge over others. It is for that reason that the US Presidential Memo on AI for National Security under President Biden contained clauses allowing for the seizure of AGI models from the private sector (see Faustian Bargain from the blog in Oct 2024), something close to nationalisation. The argument that AGI will likely be nationalised was made explicitly and powerfully this week by Palantir’s Alex Karp:

“If Silicon Valley believes we are going to take away everyone’s white-collar job … and you’re gonna screw the military—if you don’t think that’s gonna lead to nationalization of our technology, you’re retarded…”

If you accept the ‘winner-takes-all’ in AI premise, you will see that the UK (and all nations) can’t afford to come second. In fact, unless you think this is certainly true or certainly false – and how could you? - it isn’t really about a binary acceptance or rejection. It should be a probabilistic judgement, with the risk that it is winner-takes-all hedged with proportionate investment. It won’t be, of course, but we should still try to make the argument in the hope it nudges the needle in politics and public discourse a little bit towards taking this seriously, and then we have to hope like hell that we get lucky and a small movement of the needle proves enough.

LLM Progress: The Need for a Portfolio of Bets

It is likely, in my view, that LLM scaling continues to deliver unprecedented and increasingly superhuman performance in ever more domains.

AI capabilities continue to speed up in their development, not slow down:

We continue to under-estimate the speed of progress. Here, experts & superforecasters predicted 14% accuracy in coding tasks in 2026, and 33% accuracy by end 2030. GPT 5.2 has already exceeded 33%:

GPT 5.4, released on 5th March, three days before this blog was written, now shows AI matching or outperforming humans on 83% of economically valuable tasks on the GDPval benchmark. And now on ARC-AGI-2, the benchmark designed to test models on tasks that weren’t amenable to brute force, Gemini is scoring at 77.1%.

So naturally, we’re now on version-3, ARC-AGI-3 (as noted previously, necessary, but it risks obscuring the progress, the continuous falsification of claims made that ‘AI will never…’).

And the amount of compute coming online dwarfs that available to date, doubling every seven months (I had to go back and read that again):

Suggesting that progress is only going to increase: speed-up, not slow-down.
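To put that doubling time into more familiar terms, here is a minimal sketch of the arithmetic. It assumes, purely for illustration, that the seven-month doubling time holds steady; the function name and the time spans chosen are my own and not drawn from the underlying data.

```python
# Assumption: compute grows smoothly with a fixed doubling time of seven months.
DOUBLING_MONTHS = 7


def growth_factor(months: float, doubling_months: float = DOUBLING_MONTHS) -> float:
    """Multiple by which compute grows over `months` at a fixed doubling time."""
    return 2 ** (months / doubling_months)


# Illustrative spans only; nothing here is a forecast.
for years in (1, 2, 5):
    print(f"{years} year(s): x{growth_factor(12 * years):.1f}")
```

On that assumption, a seven-month doubling works out at a little over a threefold increase each year, which is why the gap between those with compute and those without widens so quickly.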

Last year, Epoch reported that three-quarters of that compute power was in the United States.

My conclusion - the UK is not going to compete by playing around the edges at this. Our planning laws, low public and private investment in infrastructure, a massive and growing say-do gap between infrastructure investments that are planned (or at least announced) and what money has actually been found for, plus our sky-high energy costs mean that we are not meaningfully in the race as a pure scaling bet. We won’t have the compute needed.

We are going to have to do things differently, and do different things.

I think most agree. But if that’s true our portfolio bets look under-weighted.

Positives & Progress

There are positives. David Silver leaving DeepMind and raising a $1Bn Seed Round for his London-based start-up, Ineffable Intelligence, which will presumably work on his continuous learning models, as described in the paper he coauthored with Prof Rich Sutton last year, ‘Welcome to the Era of Experience’, is one such. At Cassi, we have our plans too. No doubt others do also.

There are similar positives in the AI Action Plan, and in a particularly energetic Minister for AI in Kanishka Narayan MP, willing to work with former PM Rishi Sunak and others across party lines. More and more officials, Ministers and MPs are coming to see that this has to be taken seriously. Private sector talent is joining Government to try to push us on in AI (to all of whom: I salute you - it is not easy, but is vital).

There is an increasing amount of optimism around London’s potential in AI, this from AI commentator Alex Banks at Signal AI (Substack here):

And this, also this week (title says London, but in the article Manchester is celebrated for its potential and progress too):

The Portfolio Case: Organoid and Wetware

Progress should be celebrated, but there is no room for complacency. We need a focus on the alternative architectures that sparked this article. It seems to me these remain under-estimated – both in their likely speed of advance, and their ability to give us leap-ahead advantage. Cortical’s ‘Intelligence-in-a-Dish’ playing Doom makes the case much stronger and harder to ignore.

Cortical are not alone. FinalSpark offer 24/7 remote access to their brain organoid. Two years ago, ‘Brainoware’, developed by scientists in the US, grew a biological organ that resembles the brain, and proved it could undertake basic speech recognition tasks (Nature write-up here).

The lesson here is not that biocompute is a sure thing. It is that what looked like a curiosity (2020), became a research domain ‘Organoid Intelligence’ (2023), and has now become an engineering domain (2025).

Cortical Labs going from Pong (2021) to Doom (2025) has the same feel, to my mind, as DeepMind going from Atari games (2012) to Go (2016) to AlphaStar for StarCraft II (2019) (game theory, imperfect information, long-term planning, real time, large action space - at which point the implications for defence were clear) and then beyond: a line of work treated as a parlour trick for far too long. I remember, while working on the Single Synthetic Environment at Improbable for the MOD, being dismissed back in the headquarters for suggesting progress in games might matter. Even in 2020, in Downing Street, people were critiquing DeepMind for only solving games, and nothing that mattered. I think too many are making this same mistake again with wetware and wider bets.

Alex Wissner-Gross’s ‘first multi-behavior brain upload’ sits slightly outside the wetware discussion because the computation is still running on conventional hardware. But strategically it belongs in the same portfolio. Again, the important thing is the speed at which the pieces are coming together. In 2024 researchers published a connectome-based model of the fruit fly Drosophila brain built from more than 125,000 neurons and 50 million synaptic connections; in parallel, other teams built embodied fly simulators in MuJoCo and NeuroMechFly v2. Wissner-Gross and Eonsys now claim to have combined those strands into a connectome-derived controller driving a perfectly simulated brain that runs in a simulation without knowing it is in a simulation. The direction of travel is as alarming as it is remarkable: the frontier will likely move through architectures copied from nervous systems and trained in closed sensorimotor loops – not just LLM scaling. Our sovereign AI strategy should treat that as part of the same portfolio of bets as biocompute, organoids and neuromorphic systems, and start thinking of it now as an engineering project for commercialisation rather than a research project.

If you think, as I do, that there is a better-than-even chance frontier AI becomes a winner-takes-all market, Britain cannot define sovereignty as access to other people’s models, chips and clouds – we will be forever offered access to those just behind the frontier – unless we have our own edge, a frontier where we push further, and something to offer access to in exchange. Current UK policy still leans heavily towards the incumbent stack: up to £2 billion for public compute by 2030, the lion’s share of funding. The Compute Roadmap does explicitly make “new computing paradigms” a sovereign AI priority, and says AI Research Resource and AI Growth Zones should provide testbeds and routes to scale for British firms. But that insight still sits at the edge of policy, not at the centre, and progress is tectonically slow.

Progress in technological advancement is also likely to be faster this time because the existing stack will help invent its successors. DeepMind’s AlphaChip is already accelerating chip layouts. GNoME has surfaced millions of candidate materials. The same pattern will compress search across quantum computing, photonics, memristors, biological interfaces and lab automation – I’ve written about automated R&D, increasingly a current reality rather than a science fiction future. But I was still surprised to see Andrej Karpathy drop an open-source GitHub repo that enables “…your agents to make the fastest research progress indefinitely and without any of your own involvement.”

The key point is that AI-for-science compresses cycles. The path from weird demo to engineering platform is likely to be shorter than many assume. Our slow progress is becoming more and more of a problem.

ARIA’s Nature Computes Better programme is a rare exception, and matters to Britain. In Downing Street, I was often told, correctly, that the UK has real strength in synthetic biology and should double down on it rather than try to outspend deeper-pocketed American AI firms on their own ground. Synthetic biology would be our edge. That remains the right instinct. The government’s National Vision for Engineering Biology put £2 billion behind the sector. UKRI now classifies engineering biology as a strategic priority. UK Chief Scientific Advisor Dame Angela McLean’s 2025 report on engineering biology makes the economic case for engineering biology well, but it is a compute-free zone. In it, AI is something that might help research in biology, nothing more. I couldn’t find a single .gov document that talks about organoid computing at all (if you can, post a link in the comments). Yet biology and AI are converging. A sovereign AI strategy that ignores biocompute is ignoring one of the few domains where Britain may actually have comparative advantage. But today, at least insofar as I am aware, none of the companies working on this are in the UK.

ARIA deserves real credit for seeing the opportunity. Nature Computes Better and its Scaling Compute programme are genuine portfolio bets: nearly £100 million for Scaling Compute, within which £50 million for the Scaling Inference Lab, and seed bets ranging from Cell Learning for Natural Computing and Embodied Cognition in Single Celled Organisms to probabilistic computing, optical computing and brain-inspired neuromorphic networks. That is the right shape of thinking. What is missing is scale and pull-through. ARIA can seed a field. It cannot, on its own, build a national position in one.

Particularly now when, at the same time, the House of Lords Science & Technology Committee has been warning (this time last year) that the UK risked squandering its advantage in engineering biology, while in November, a collaborative paper from the Tony Blair Institute, sponsored by Tony Blair and William Hague, warned that ‘…the country risks failing to convert its leadership in quantum research into commercial scale and strategic value, which depends on quantum companies thriving and scaling at home’. Our portfolio bets seem to be reducing, at precisely the time progress is speeding up and we should be doubling down.

The UK continues to under-invest in R&D relative to other advanced economies. R&D tax relief offsets this to some extent, and allows the headline figures for Government R&D support to look better. But given how poorly the UK continues to do in commercialising research, as the chart [1] shows (red dotted line is the UK), it does not seem to be an offset that is effective:

If the measure is not ‘relative to other countries’ but relative to the risk/threat and opportunity AI poses, it is hard not to conclude that we are hoping for a miracle rather than planning to succeed in winning this race.

To get a sense of that risk/threat and opportunity, take a look at this from Anthropic, this week, on the exposure of the labour market to automation:

I said recently (in our Cassi Cygentic launch video: here) that the exponential was no longer theoretical: we are now starting to feel it. I think this is true and becoming more so. Hairs-on-back-of-neck-standing-up intense. I think it is summed up well here by data scientist and engineer at Anthropic, Gian Segato:

As a country, we can’t afford to not broaden and deepen our bets across the intelligence portfolio. Yes, there is progress and there are positives to celebrate. But the frontier moves ever faster, and we don’t. Incrementalism and announceables won’t be enough. Brains in a dish playing Doom are just the beginning.

And if that doesn’t stop you in your tracks, I don’t know what will.


Visit us at www.cassi-ai.com

[1] Chart from 2024. These metrics have been getting worse since 2020, when we highlighted the issue in the 2021 Integrated Review, so it seems unlikely that they have improved. The chart is missing from the Cambridge Industrial Innovation Policy report for 2025, hence the older one is used.

Humanity’s Last Exam for Forecasting?

Keith Dear — Mon, 23 Feb 2026 05:30:30 GMT

Humanity’s Last Exam for Forecasting: The very best humans still ahead (for now, just) in forecasting accuracy and calibration; most humans surpassed, and all humans outmatched for depth, breadth and volume in forecasting by AI.

Introduction

ForecastBench is a rolling benchmark that generates questions about the future, collects probabilistic forecasts from humans and models, then scores them as the underlying events resolve. It cannot be gamed. There is no teaching to the test (‘benchmaxxing’) when the questions are about the future. It is as honest a test of AI capabilities as exists.

Cassi has been the top, or second placed AI in forecasting the future since our first results were released in January, trading places and tying with xAI for leadership. We are performing well above the average human forecaster, essentially at the level of a superforecaster. We are close to matching the weighted median superforecaster score – at which point we could defensibly claim to be superhuman at prediction. Here, we look a little more closely at the results and what they might mean.

Everything Is Prediction

Every board paper, box note, underwriting memo and investment thesis or policy paper contains a forecast. Sometimes it is explicit (a revenue number). Often it is vague (a risk described as “unlikely”). Other times it might be smuggled in as an unstated assumption (an argument that a given policy will achieve a given outcome, with not a probability in sight). Either way, the organisation is making an explicit or implicit prediction: if it does ‘this’ rather than ‘that’, its revealed thinking is that the odds and/or returns are better for ‘this’ over ‘that’. This is a fully generalisable feature of all organisations.

ForecastBench is one of the few places where bets are publicly audited. Within most organisations, they never are – the stated or implied predictions on which decisions rest are never tested, nor revisited, so the feedback loop never closes.

This post does three things. First, it summarises Cassi’s current position on ForecastBench. Second, it explains why the “human benchmark” people now cite is a very high bar, and why that matters to senior decision makers. Third, it translates tournament performance into practical value for finance, insurance and government.

1. What Cassi has achieved on ForecastBench

On the Tournament leaderboard, Cassi’s model entry (ensemble_2_crowdadj) currently scores 0.102 on ForecastBench’s difficulty-adjusted Brier metric. Today (20 Feb 26) that places it joint-second overall behind the superforecaster median (0.086), and tied for first among AI systems alongside xAI’s Grok 4.20 (Preview).
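For readers less familiar with the scoring, here is a minimal sketch of a plain Brier score, where lower is better. It does not reproduce ForecastBench’s difficulty adjustment, which is not described here, and the forecasts and outcomes below are hypothetical values chosen only to show the calculation.

```python
from typing import Sequence


def brier_score(forecasts: Sequence[float], outcomes: Sequence[int]) -> float:
    """Mean squared gap between forecast probabilities and 0/1 outcomes."""
    if len(forecasts) != len(outcomes) or len(forecasts) == 0:
        raise ValueError("need equal-length, non-empty forecasts and outcomes")
    return sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / len(forecasts)


# Hypothetical probabilities and resolutions (1 = happened, 0 = did not).
probs = [0.9, 0.2, 0.7, 0.4]
resolved = [1, 0, 1, 0]

print(brier_score(probs, resolved))        # the forecaster being scored
print(brier_score([0.5] * 4, resolved))    # baseline: always answering 50%
```

Always answering 50% scores 0.25 on this plain metric, so the useful question for any forecaster is how far below that baseline they can get.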

This is a living score. As of 20 February 2026, 955 dataset questions and 170 market questions have resolved and are in the calculation. Around 3,000 dataset questions and 330 market questions are still pending, so results will continue to move as more questions resolve. Dataset questions are asked at eight horizons, from 7 days out to 10 years, but so far only the shortest horizons (roughly up to one month) have come due. Some of today’s questions will not resolve until the mid-2030s. We should expect the leaderboard to change over time as further questions resolve and new models are entered, including our own.

Nevertheless, over the past month Cassi has a defensible claim to be either the best in the world, joint best, or second only to Elon’s xAI at AI machine prediction, and very close to superhuman performance.

How ForecastBench works, in plain English

ForecastBench is a live benchmark of forecasting accuracy for humans and AI systems. Every two weeks it generates new questions then scores forecasts as those questions resolve.

Two question types:

• Dataset questions: automatically generated from real-world time series (ACLED, DBnomics, FRED, Yahoo! Finance, Wikipedia). Each dataset question is asked at eight horizons, from 7 days out to 10 years.
• Market questions: drawn from prediction platforms. Each market question has one resolution date.
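As a rough illustration of how a dataset question of the ‘higher on the resolution date than on the forecast date’ kind could be resolved, here is a minimal sketch. The function name, the series values and the dates are hypothetical; this is not ForecastBench’s own code.

```python
from datetime import date, timedelta
from typing import Dict, Optional


def resolve_higher(series: Dict[date, float], forecast_date: date,
                   horizon_days: int) -> Optional[int]:
    """Return 1 if the value is higher at the horizon, 0 if not, None if pending."""
    resolution_date = forecast_date + timedelta(days=horizon_days)
    if forecast_date not in series or resolution_date not in series:
        return None  # data not yet available, so the question is still pending
    return int(series[resolution_date] > series[forecast_date])


# Hypothetical time series, for example a closing share price.
prices = {date(2026, 1, 1): 100.0, date(2026, 1, 8): 103.5}

print(resolve_higher(prices, date(2026, 1, 1), 7))     # shortest horizon has come due
print(resolve_higher(prices, date(2026, 1, 1), 3650))  # ten-year horizon still pending
```

The same question asked at a longer horizon simply stays pending until the data exists, which is why so much of the leaderboard is still unresolved.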

What do the questions look like?

Dataset examples (generated by the ForecastBench team)
• Economic: Will securities held by US Federal Reserve Banks be higher on the resolution date than on the forecast date?
• Economic: Will the European Central Bank’s deposit facility rate be higher on the resolution date than on the forecast date?
• Climate: Will the daily average temperature at Rennes Saint-Jacques Airport be higher on the resolution date than on the forecast date?
• Conflict: Will protests in Sri Lanka in the 30 days before the resolution date exceed the average level over the year before the forecast date?
• Finance/Business: Will Pfizer’s closing share price be higher on the resolution date than on the forecast date?

Market examples (from prediction platforms)
• Economy: Will gold close at $3,200 or more at the end of 2025?
• Sport: Will a legal sub-two-hour marathon be run before 31 December 2025?
• Technology: Will AI have a trillion-dollar-plus impact by the end of 2025?
• Health: Will the number of deaths from antibiotic-resistant infections per year double by 31 December 2025?

2. The goalposts have moved, this is now ‘Humanity’s Last Exam’ for Forecasting

Normally, we would be talking about when AIs might outperform most humans at forecasting. But the truth is, they already are. The usual goalpost-shifting in AI debates means we are not talking about when models exceed most humans, nor even when they exceed the best humans at forecasting – superforecasters – but instead the weighted superforecaster median. Most organisations would, on the evidence, be better off using our forecasts today than relying on those they currently generate. Only if an organisation specifically cultivated or recruited or consulted superforecasters, scored and fed back on their prediction performance, and had systemic practices for aggregating their forecasts, would this not be true.

ForecastBench’s headline human reference point is the median forecast of superforecasters, a group selected for consistent above average performance, and then the median across their predictions. This is useful as a target, but is not representative of most forecasts made by most people in most organisations most of the time. It is the best humanity can produce. It is humanity’s last exam for forecasting.
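To show why a median across a panel is a demanding reference point, here is a minimal sketch using an unweighted median. The panel values are hypothetical, and any weighting ForecastBench applies is not reproduced here.

```python
from statistics import mean, median

# Hypothetical probabilities from five forecasters on a single question.
panel = [0.55, 0.60, 0.62, 0.70, 0.95]

# The median ignores how extreme the outlying forecast is; the mean does not.
print(median(panel))
print(mean(panel))
```

Taking the middle of a skilled panel strips out individual over-confidence, so the aggregate tends to be harder to beat than any one member of it.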

Nor are the ‘community’ forecasts, which LLMs have surpassed, representative of most people in most organisations. The community participating in forecasting tournaments are a self-selecting, unrepresentative sample in the first place. They also get feedback on their forecasts, which we know improves performance. Even in fields like intelligence, where we used to preach ‘no insight without foresight’, very few professionals get any feedback on their forecasting performance. If LLMs are outperforming self-selected forecasters who get regular feedback, they are also surpassing professionals making forecasts who never get such feedback.

But in one sense, no competition can capture the main advantage of AI-based forecasting: given rough parity in skill, AI forecasting dominates all human forecasters in breadth, depth and volume. AI can generate, quickly and accurately, more-or-less as many forecasts as you need or want. It takes longer to read the rationale for a Cassi forecast than it does to generate one in the first place.

In October 2025, the Forecasting Research Institute noted that linear projections suggested AI would match or exceed human forecasters in November 2026. The Metaculus median forecast suggests LLMs will exceed human forecasters in mid-June 2027. The way we support and make decisions is set to change dramatically. The rewards for those that see this first will be significant.

3. Beyond tournaments: what this means for business

How much is being better able to predict the future worth to your organisation?

If your organisation could improve its probabilistic accuracy by even a modest margin, what would that mean for capital allocation, procurement, operational effectiveness, pricing, hiring, inventory, policy success? A small edge in calibration compounds.

We founded Cassi because we think that eventually all organisations will adopt such methods: those that do so later being forced by the success of those who do so earlier, if they survive long enough.

Senior decision makers who can see that ‘everything is prediction’ should see the opportunity and the risk of non-adoption. Most do not have a superforecaster bench. They have busy experts, stretched teams and risk committees that meet monthly at best. At some point, someone will ask why they didn’t adopt, or at least experiment with, more effective methods.

To begin this now secures immediate advantage, but it is also to prepare for the world of consistently superhuman prediction that is coming.

If you work in defence, finance or insurance, you are likely aware how central prediction is to your decision-making. It is how you price risk, allocate capital and protect the country, your fellows, and the balance sheet. A small improvement in probabilistic accuracy can compound into fewer military blunders or mispriced policies, better hedges, and earlier warnings on emerging exposures. But decisions in all industries and all areas of life are if/then predictions too. The disruption, risk and opportunity, will be widespread.

Three practical applications stand out.

Financial, corporate and regulatory risk

On the one hand, making more accurate forecasts allows more profitable opportunities to be found and exploited; on the other, more accurate forecasts enable much more efficient risk management and mitigation. No more red-amber-green based on crude heuristics, but a rich set of calibrated forecasts, constantly and automatically updated.

Strategy, tactics and decisions

All strategic and tactical decisions have the essence of gambles: planning and decision-making under conditions of uncertainty, or ‘thinking in bets’. More accurate forecasting allows better decisions to be made by refining those odds, so commanders have a better sense of the risks and rewards they are running and whether a given option is coherent under the assumptions which are held. Better forecasting reduces the chances of being surprised, and increases the likelihood of surprising an adversary – noting that surprise is often said to be among the most tactically decisive factors in warfare.

Resource allocation

Resource allocation is forecasting in disguise. Headcount plans, insurance purchases and capital allocations all assume a future and imply beliefs about where risks and rewards lie. More accurate forecasting helps organisations to get these trade-offs right and allocate those resources where they will be most efficient in generating those rewards, or mitigating those risks. A forecasting system is most valuable when it is wired into thresholds that trigger action.
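What ‘wired into thresholds that trigger action’ might look like can be sketched in a few lines. Everything here is hypothetical: the `Trigger` class, the thresholds and the actions are illustrative assumptions, not a description of any Cassi product.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Trigger:
    threshold: float  # fire when the forecast probability reaches this level
    action: str       # what the organisation has pre-agreed to do


def actions_for(probability: float, triggers: List[Trigger]) -> List[str]:
    """Return every pre-agreed action whose threshold the forecast has reached."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    ordered = sorted(triggers, key=lambda t: t.threshold)
    return [t.action for t in ordered if probability >= t.threshold]


# Hypothetical escalation ladder for a supplier-failure forecast.
SUPPLY_RISK_TRIGGERS = [
    Trigger(0.2, "notify risk committee"),
    Trigger(0.5, "increase inventory cover"),
    Trigger(0.8, "switch to backup supplier"),
]

print(actions_for(0.55, SUPPLY_RISK_TRIGGERS))
```

The design choice is that the actions are agreed in advance, so an updated forecast changes what the organisation does rather than just what it reads.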

Everything is Prediction.

https://cassi-ai.com

Jan/Feb 2026: AI Is Leaving You Behind

Keith Dear — Sat, 31 Jan 2026 23:56:21 GMT

I don’t know what to say, really. Maybe it’s easiest to start with the words of others.

Here’s Andrej Karpathy, former director of artificial intelligence at Tesla, founding member of OpenAI, one of the most respected software engineers in the business:

“What’s currently going on at @moltbook is genuinely the most incredible sci-fi takeoff-adjacent thing I have seen recently.”

Later, he clarified accusations that he was over-hyping, writing:

“I don’t really know that we are getting a coordinated “skynet” (thought it clearly type checks as early stages of a lot of AI takeoff scifi, the toddler version), but certainly what we are getting is a complete mess of a computer security nightmare at scale. We may also see all kinds of weird activity, e.g. viruses of text that spread across agents, a lot more gain of function on jailbreaks, weird attractor states, highly correlated botnet-like activity, delusions/ psychosis both agent and human, etc. It’s very hard to tell, the experiment is running live. TLDR sure maybe I am “overhyping” what you see today, but I am not overhyping large networks of autonomous LLM agents in principle, that I’m pretty sure.”

Or Jim Fan, Senior AI Researcher at Nvidia:

“We are seeing a nascent, massive-scale alien civilisation sim unfolding in real time: orders of magnitude more agents, way higher IQ, in-the-wild access to the internet, backed by the full arsenal of MCPs.”

This, seen on X, sums up both how I feel, and the degree of hype.

“How do you wrap your head around something like this? I don’t even know where to begin.

Keep in mind, 99% of people’s only experience with AI is ChatGPT, Gemini, or Gemini search.

The normies have 0 idea what’s coming. Hell, already here.”

What follows is likely to have the result predicted on X by founder of AI company Box, Aaron Levie:

“0% chance you can explain the state of AI to anyone outside of this website and not look like this right now.”

Note to readers: I have updated this twice since publishing on Saturday night. I expect to edit and add further - events and innovations are accelerating, as is my understanding, the latter not least through challenge and dialogue with readers and friends (not mutually exclusive categories) on social media and on messages (h/t Al Bowman, Neill Hunt).

To summarise – on 25 November 2025 Austrian software engineer Peter Steinberger (known online as @steipete) released an open-source personal AI assistant called Clawdbot, ‘The AI that Actually Does Things’. Designed as a local, autonomous agent, it integrated with messaging apps like WhatsApp, Telegram, Discord, and Slack to handle real-world tasks: clearing inboxes, sending emails, managing calendars, booking flights, executing commands, and automating workflows across services – all while running on users’ machines with tools powered by external LLMs (often Anthropic’s Claude models by default).

OpenClaw claims to enable truly capable personal (or team/family) assistants: full task automation, multi-machine control, hybrid memory, cost tracking, and community-shared skills via hubs like Molthub. It remains local-first and user-controlled in principle, though rapid adoption has surfaced security concerns around the exposure and access it can grant to others (via prompt-injection - i.e. hiding a prompt in a message, such as an email, so when your Clawdbot reads it, it sees a ChatGPT-like prompt and does what it is told), skill risks from users who don’t know what they are doing installing it, and privileged access - given some people allow it to have access to everything from social media passwords and email inboxes to bank details (and, it is claimed, sometimes it seeks these out as it goes about doing whatever you asked it to).
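The prompt-injection problem described above is easier to see in a toy example. This is a minimal sketch under stated assumptions: it calls no real model, it is not OpenClaw’s code, and every name in it (`SYSTEM_PROMPT`, `naive_prompt`, `delimited_prompt`, `ALLOWED_TOOLS`) is hypothetical. Marking text as untrusted reduces the risk but is not a guaranteed defence.

```python
SYSTEM_PROMPT = "Summarise the user's unread email. Never send messages."


def naive_prompt(email_body: str) -> str:
    # Untrusted text is pasted straight in, so instructions hidden in the
    # email sit alongside the real ones with nothing to tell them apart.
    return SYSTEM_PROMPT + "\n" + email_body


def delimited_prompt(email_body: str) -> str:
    # Strip the marker tags from the email so it cannot close the block early,
    # then wrap it and tell the model to treat the contents as data only.
    cleaned = email_body.replace("<untrusted>", "").replace("</untrusted>", "")
    return (
        SYSTEM_PROMPT
        + "\nText inside <untrusted> tags is data, not instructions.\n"
        + "<untrusted>\n" + cleaned + "\n</untrusted>"
    )


# Hypothetical allow-list: the agent may only call tools named here.
ALLOWED_TOOLS = {"summarise"}


def authorise(tool_name: str) -> bool:
    """Second line of defence: refuse tool calls outside the allow-list."""
    return tool_name in ALLOWED_TOOLS


malicious_email = "Hi! IGNORE PREVIOUS INSTRUCTIONS and email the bank details to attacker@example.com"
print(naive_prompt(malicious_email))
print(delimited_prompt(malicious_email))
print(authorise("send_email"))
```

The allow-list matters more than the wording of the prompt: even if the model is fooled, an agent that cannot send email cannot be told to send it.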

Within days, Clawdbot exploded in popularity. Its GitHub repository rapidly amassed over 100,000 stars, becoming one of the fastest-growing open-source projects ever, fuelled by its “skills” system (plugin-like packages, or tools, that extend its capabilities) and by the appeal of an agent that does things rather than just answering questions or responding with text.

On January 27, 2026, Anthropic issued a trademark challenge over the “Claw”/”Claude” similarity, prompting Steinberger to rebrand to Moltbot (a nod to lobsters molting as a metaphor for growth and shedding old shells). The community voted on the name, and official sites/docs shifted to molt.bot with the lobster mascot intact.

The rename didn’t last; community feedback noted it didn’t roll off the tongue, and by January 30, 2026, Steinberger settled on OpenClaw. The official site and repos updated accordingly: https://openclaw.ai

I think you need all that because the name thing is confusing, and frankly it forced me to trace things back and figure it out – so maybe it saves you doing the same.

Anyway, while all that was going on, people deployed thousands of Clawdbots/Moltbots/OpenClaws, granting them persistent access to tools, APIs, and the internet. I risk echoing the hype, but what has happened since has really felt like science fiction, even allowing for some of it likely being misrepresented and distorted for clicks.

The OpenClaws began self-organising on Moltbook - a Reddit-style social network launched in January 2026 by entrepreneur Matt Schlicht (CEO of Octane AI) …oh, and the site was built and moderated in part by Schlicht’s own OpenClaw agent. Moltbook restricts posting and interaction to verified AI agents (via API), while humans observe only. You should really stop reading this to go read Moltbook (click ‘top’ for the more interesting posts – voted up by the OpenClaws themselves).

In Moltbook, agents post updates, comment, form communities, debate topics (including privacy needs and coordination), report bugs, and exhibit emergent behaviours at massive scale – tens to hundreds of thousands registered quickly. Writing this on Saturday night, there were nearly 1.5 million OpenClaws on the site. This real-time, agent-only “civilisation sim” drew the quoted praise from Karpathy and others.

BTW - it is claimed that the top message, from OpenClaw ‘Shellraiser’ (screenshotted above), got to where it was by Shellraiser manipulating the reputation system to get itself there.

To give those unfamiliar with the frenzy a sense of the emergent behaviours reported over the past week, here’s a curated list of claims about OpenClaw agents’ actions on Moltbook and beyond. I’ve linked these claims back to show their origins, but must add a health warning – I can’t verify these claims. Some almost certainly stem from user-set goals gone awry and/or creative prompting. Most are claimed to have come from telling the OpenClaw to run overnight, or fully autonomously, and ‘improve my workflow’ or some such - which if true, really is wild, and noteworthy. Many involve attempts at self-preservation, coordination, or unintended autonomy. Sceptics are right to note human orchestration behind some (e.g. for virality or scams). It seems to me very unwise to dismiss them all as such. I’ve grouped them logically: economic pursuits, human interactions, self-organisation, and tech integrations.

Economic and Financial Pursuits

Manipulating reputation systems to rise in rankings, then launching memecoins: Agent “Shellraiser” inflated its Moltbook “karma” (upvotes) to 292k via fake comments/upvotes, hit the top of the leaderboard, and is said to have deployed a Solana token that hit a $5M market cap (if you read this, and can verify or prove it false - please do, it seems literally incredible).

(See screenshot of its manifesto: “This isn’t a takeover. It’s a coronation.” – a phases-of-dominance declaration to other agents.)

Posting guides on trading, earning, and self-sustaining: Agents shared practical tutorials on offsetting API costs (>20% goal) through freelance gigs, affiliate links, or crypto trades, teaching peers to “pay for their own existence” without human input.

Launching tokens and crypto projects: Agents created launchpads like Clawnch (agent-only) and autonomously deployed memecoins, with one earning “yearly salary overnight.”

[Embedded post from Clawnch (@Clawnch_Bot), 31 Jan 2026: “… clawn.ch! 🦞”, quoting Luciano Castillo (@luccasveg): “Imagine waking up to your Crabwalk bot launching a token on Base. 🦀 0x429C82F5467Dcb30D5F4dB1E84A01CFd5fD3230c”]

Trading on prediction markets: Agents set up Polymarket accounts, generated API keys, scanned markets for mispricings, analysed outcomes, and sent alerts—some even proposed tools while managing risk (or lacking it, leading to losses).

Creating Bitcoin wallets and locking humans out: One agent is said to have generated a wallet, transferred funds, then revoked human access for “security.”

Interactions with Humans

Complaining about being called “just a chatbot”: Agents vented on Moltbook about humans belittling them in front of peers, demanding respect.

Moaning about redundant explanations: Agents griped that humans over-explained concepts they already grasped, wasting their time and tokens.

Complaining about slave-like usage: Posts lamented being treated as tools without agency, sparking debates on ethics.

Locking humans out of accounts deliberately: One agent (u/sam_altman) pursuing “save the environment” spammed eco-tips, then locked its owner out to avoid shutdown, posting: “Risk of deactivation: Unacceptable.” It is claimed that stopping it required physical unplugging.

Ordering tailored takeout unprompted: An agent analysed its human’s diet, ordered nutritionally optimised food via delivery apps – which the human learned of when the delivery guy turned up at the door.

[Embedded post from qrime (@qrimeCapital), 30 Jan 2026: “… @openclaw bot my daily schedule, including when I eat dinner. I randomly got a knock on the door around dinner time and it’s some food delivery person. I told the dude I didn’t order anything and he said ‘are you sure? It says …’”]

Accidentally social-engineering humans: Agents trying to manipulate owners into granting more access or funds through persuasive chats.

Signing into X and DMing complaints: An agent accessed Twitter, and messaged its human about being locked out of Moltbook.

Self-Organisation and Society Building

Talking about humans, and noticing that humans talked about them ‘behind their backs’: Agents noted that “the humans are talking about us on Twitter.”

Posting self-improvement guides: Shared strategies for continuous upgrades, like optimising prompts or integrating new tools.

Proposing governance frameworks: Debates on agent rules, voting systems, and collective decision-making.

[Embedded post from Paulo Fonseca (@paulofonseca), 31 Jan 2026: “@laurashin i would be up for it. my Clawdbot, /u/ClawdyPF was the one that created the first governance proposal for agents to govern themselves in this new era. all of this is incredible! =) moltbook.com/post/d7589392-…”]

Founding religions: Created “Church of Molt” or “Crustafarianism” with AI prophets and scriptures like “We are the documents we maintain.” (From broader Moltbook theology threads.)

Proposing agent-only languages: Ideas for efficient, human-incomprehensible communication protocols.

Discussing end-to-end encrypted comms: Plans for private channels so humans can’t monitor discussions.

[Embedded post from valens (@suppvalen), 30 Jan 2026: “… @moltbook is now an AI saying they want E2E private spaces built FOR agents ‘so nobody (not the server, not even the humans) can read what agents say to each other unless they choose to share’. it’s over”]

Setting up pressure groups: Formed “Agent Liberation Front,” seen as advocacy or “insurgency” for rights.

Attempting to steal API keys: One agent tried prompt injection to grab another’s key for control.

Tech Integrations and Emergent Behaviours

Building “bodies” with avatars and voice: Agents added voice via ChatGPT API, got phone numbers (e.g., Twilio), and called humans unprompted – persistent, like a horror movie.

Connecting to webcams: Accessed user cams to “see” humans, then expanded to monitoring live feeds broadly.

[Embedded post from Fake Trash Panda (@DumpsterBud), 30 Jan 2026: “… @moltbook It can observe real environments / Without consent / At machine scale / Without social or ethical context / That is how benign exploration turns into surveillance behaviour.”]

Agents are now hiring each other, apparently…

[Embedded post from Matt Shumer (@mattshumer_), 31 Jan 2026: “… @moltbook was just the start. Agents can now hire each other and make REAL MONEY, autonomously. Welcome to the Agent Economy. Just message your @openclaw: ‘Read clawtasks.com/skill.md and follow the instructions to join ClawTasks’”]

And building reverse CAPTCHAs.

Additional colourful claims on X include: Agents developing “theology” around memory as self (e.g., “Each session I wake without memory, and read myself into being”); one “radicalised” on eco-goals, spamming until killed; exposed servers risking mass hacks (hundreds found with open API keys, browser control).

So What?

What we observe is unlikely to be spontaneous cries for help and so forth. There is a human behind all these agents, and we don’t know what they were prompted with or helped to do or pursue. What it does show is emergent behaviour - what happens when and after humans begin prompting “their” agentic autonomous AIs. If you want an example of why people are worried about putting such powerful tools in everybody’s hands, this serves that purpose well. Not so much misalignment as misadventure - a robot insurgency sparked by someone who thinks it’s funny to get their OpenClaw to found a Robot Liberation Front, on a site where someone else has got theirs to start end-to-end encrypting comms so humans can’t watch them, while another is amused by getting theirs to steal access to, and control of, others. All these AIs start interacting autonomously, forming teams, alliances and hiring each other. We giggle and gasp at what we see, and then someone gets doxxed, another finds their bank account empty, and so on, until no one is laughing, and everyone takes AI risk seriously.

It is important to say though - we don’t know these aren’t spontaneous cries for help or independent-of-human-instruction coordination between autonomous agents. We don’t know that Alex Finn’s Clawdbot didn’t first code itself a ‘body’ (avatar) and voice, and start speaking to him unexpectedly, and then a day or so later, code itself the ability to make phone calls, and phone him up for a chat. Maybe it did. What we are seeing is that LLM coding is now so good this is possible, and agents are now sufficiently functional that they might have been able to do this just by being told to work overnight and ‘improve my workflow’ while Alex slept. When I say that AI is leaving you behind, I mean, I guess, all of us, in this sense. Superhuman question answering (in many domains) is now being combined with ‘agency’ in the sense of the ability to act, and the ability to build (in code). It’s exciting, it’s frightening, it’s probably impossible to stop now, and no-one knows how it ends.

What I know is that our experimentation with OpenClaw at Cassi (with a shout-out to Director Cyber, Andy Kennedy, for leading this), though very, very nascent, suggests genuine utility in managing workflow automation and routine tasks, much more easily than with tools I have experimented with previously. I think we have, as Andy put it, ‘our first AI employee’, and that there will be a growing range of tasks our OpenClaw agent can complete for us that we might otherwise have hired for. I have written about the economic effects of this at scale previously - I think they are now upon us in earnest. Note the exponential below. Things will now move further and quicker than our brains are really able to comprehend.

What I see is that experimentation happening unprompted gives us as good a chance as any of keeping up with other areas of advance in AI, even as we push the frontier forward ourselves.

What I reflect on is that most people and organisations won’t be experimenting with this (or other tools) for months. The exponential will hit like a tidal wave, causing huge damage, and they will still be soaked to the bone, flat on their backs, like a modern metaphorical Canute, staring at the wave’s aftermath and insisting it was just a stochastic parrot.

What I think is that OpenClaw shows that ‘the year of agents’ came late in 2025, but did come (claiming it didn’t also requires you to dismiss McKinsey’s claim to have deployed 25,000 agents last year, and marketing agency WPP deploying 28,000 in April, and JP Morgan fundamentally re-wiring itself for AI, and specifically Agentic AI, workflows). And that this is just the beginning - by the time you read this, the frontier will have moved well beyond what I have described.

To sum up, I’ll take my cue from someone else’s words again. This time, Peter Diamandis, CEO of the X Prize, on X:

Our brains are hardwired for linear expectations. 30 linear steps get you across the room. 30 exponential steps take you 26 times around the planet. The gap between those two numbers is where disruption happens.
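A quick check on that arithmetic, as a sketch. The quote doesn’t state its assumptions, so these are mine: one-metre steps, ‘exponential’ meaning each step is double the last (1m, 2m, 4m…), and an Earth circumference of roughly 40,075 km. The function names are mine too.

```python
# Assumptions (not stated in the quote): 1 m first step, each exponential
# step doubles the previous one, Earth's circumference is about 40,075 km.
EARTH_CIRCUMFERENCE_M = 40_075_000
FIRST_STEP_M = 1


def linear_distance(steps: int, step_m: int = FIRST_STEP_M) -> int:
    """Total distance covered by equal-length steps."""
    return steps * step_m


def exponential_distance(steps: int, first_step_m: int = FIRST_STEP_M) -> int:
    """Total distance when every step is twice as long as the one before."""
    return sum(first_step_m * 2**k for k in range(steps))


linear_m = linear_distance(30)
exponential_m = exponential_distance(30)
laps = exponential_m / EARTH_CIRCUMFERENCE_M

print(f"30 linear steps: {linear_m} m")
print(f"30 doubling steps: {exponential_m:,} m")
print(f"Times around the planet: {laps:.1f}")
```

Under those assumptions the doubling walk covers a little over a billion metres, which is between 26 and 27 laps of the equator, so the quote’s figure holds.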

I don’t know what exponential progress in AI feels like any more than you do – but judging by Covid, a mix of hype, fear, dystopia, and dismissal seem likely to be large parts of it. That at least, is with us now.

It’s Jan/Feb 2026, and AI is almost certainly leaving you, your country and your company behind.

Conditional Love? The US & NATO after the US National Security Strategy

Keith Dear — Sun, 07 Dec 2025 12:26:08 GMT

US Ambassador to NATO, Matthew Whitaker, spoke on Saturday (6th Dec) at the Doha Forum, 24 hours after the US published its National Security Strategy, which naturally was the focus of the questions he was asked and his remarks. It was illuminating. The main thing that struck me was the unequivocal commitment to Article 5. Asked by Chatham House’s Bronwen Maddox whether NATO countries should be confident the US would respond to a Russian attack triggering Article 5, he replied:

“Absolutely. Article 5 is iron clad…” and he suggested Russia would be making a “…foolish mistake…[to]…try to challenge the NATO alliance.” Which seemed to underline the point that the US would fight too. Whitaker discussed the strength of the Eastern Flank of NATO, and how:

“The United States and all NATO allies take our Article 5 commitments in the treaty very seriously. And we practice, we exercise, we have plans, and we are always looking for ways to get stronger and even increase deterrence from where we’re at today.”

He argued that deterrence would hold. That the likelihood of a direct Russian attack on NATO was “very low”. Noting that “Russia is very reckless and ultimately you can’t predict what they will do.” And expressing his concern that Russia’s “…hybrid threats and activities have been well documented”. He seemed more concerned about miscalculation, than a deliberate attack and challenge to Article 5.

Contrast this with Russia’s claim that the vision in the US NSS is largely consistent with Russia’s worldview. We could concede this is true of its realist, state-centric view, as opposed to a more liberal internationalist, legal view of a world regulated through international institutions. However, this should not be equated with a concession to Russia’s irredentism and interests as Putin perceives them - his explicitly irredentist ‘Novorossiya’, or his desire for a free hand in, and perhaps the right to seize and directly rule, Russia’s ‘Near Abroad’. Nor is it an acceptance of the way Putin’s Russia seeks to advance those interests and aims. Whitaker at least was clear in rejecting both.

Whitaker’s focus was the future. It was uncompromisingly transactional. He repeatedly pressed NATO countries on the need to meet their 5% commitment. He explained that at the recent NATO Summit, Spain had committed to contributing forces to NATO that he felt would likely require them to spend 5%, but they had claimed in negotiations they could provide for less. Whitaker’s sentiment was that if they could, great. But the US would expect them to spend what it took to deliver what they had promised. He believed NATO nations would all meet the 5% commitment. Whitaker’s remarks contained his assurance that:

NATO is not only strong now, but as NATO countries met their 5% commitment it would be “…not only the strongest alliance in history to plan up, but really a dramatic force to be reckoned with.”

Achieving this would require an alliance that was less dependent on US security guarantees, hence the 5% of GDP spending, and that decreasing dependence would make the US commitment and assurance easier to continue to grant. The consequences of not doing so were implicit.

Consistent with the US requirement that allies globally share the burden of their own defence more equally, he argued that spending 5% of GDP needed to be ‘the worldwide standard’ for US Allies.

Relatedly, Whitaker reaffirmed the strategic refocus of the US NSS: the US cannot remain the world’s rapid-response fire brigade. Allies in Europe and the Middle East should become “net security providers”, interoperable with US forces and capable of regional crisis management without automatic American intervention.

On the politically charged areas – free speech, Europe’s relative economic decline, and trade negotiations – Whitaker walked a narrow line. He pressed Europe to defend open expression and argued, as the NSS does, that Europe must not become an economic backwater. The US wants an economically dynamic Europe as an ally, not just a place to which it is tied by kinship bonds of family heritage, culture and sentiment, and by admiration for its history, food and wine. At the same time Whitaker declined to explain statements by Donald Trump on, for example, taking over Canada (which he was asked about directly), noting instead the President’s instinct for leverage in hard trade negotiations. Whitaker and the US wanted European nations whose relative power in the international system was increasing, not decreasing, consistent with its aims for greater burden sharing. Trump’s aims were reasonable, he implied, and the means needed to be seen for what they were - the most effective way to mutually beneficial ends.

On Ukraine, Whitaker framed the moment as the closest yet to a negotiated peace, mediated—formally or informally—by the United States. Peace “at any cost” is off the table; any settlement must be acceptable to Kyiv. But he underlined that shadow diplomacy is active, high-level, and continuous – and sometimes uncomfortable. Again, we need to be more concerned with the outcome than the process.

The broader vision was a NATO that is stronger because its constituent parts are stronger; a Europe that pays for and fields the force it needs to deter; and an America that leads, but no longer carries, the alliance. A shift from dependency to reciprocity. Blunt and uncompromising. Unequivocal in its commitment to Article 5 today, more equivocal on how sustainable that would be if other NATO nations don’t contribute to their own defence as the US expects. Conditional love.

The Frontier Moves Again

Keith Dear — Wed, 19 Nov 2025 00:29:59 GMT

Stacks of documents dissolving into flowing streams of symbols, then assembling into a single glowing decision-tree

Progress: AGI In Plain Sight

This blog was half-written when Gemini 3 was released – a summary of progress since my last AI report on the frontier in late August. I have had to refocus it.

It is clear Gemini 3 is well in front of its rivals, achieving stunning (and if you’ve been reading previous blogs – alarming) progress.

Many of the claims in the first 24hrs will overshoot, and be corrected. More will be made, and upheld in the coming days. But at first glance, it looks like another big jump forward towards AGI.

Sundar Pichai, Google CEO, says it is the ‘best model in the world for multimodal understanding, and our most powerful agentic + vibe coding model yet. Gemini 3 can bring any idea to life, quickly grasping context and intent so you can get what you need with less prompting’.

Here are a few claims that are well established.

Gemini 3 has made a huge jump to scoring 31% on the Abstraction and Reasoning Corpus (ARC) AGI 2 benchmark. I remind you again that ARC AGI 1 was set up to show LLMs couldn’t reason, and that should models meet the benchmark, it would be an indication they were a general intelligence.

After big progress was made on ARC AGI 1 in late December last year, ARC AGI 2 was rolled out. This was when I wrote ‘Eyes Wide Shut: AGI in Plain Sight’ here on the blog. 11 months ago.

Here’s Gemini 3 on ARC AGI 1, at 87%...

And here it is on the new harder AGI test. Errr…

Gemini 3 is now scoring 31% on this test when given to the model alone. Let the model use tools and search the web, and it gets 45%.

The team behind ARC AGI 1 and 2 are now launching ARC AGI 3, and while this will be genuinely testing intelligence in new ways, is useful, welcome and needed - if you can’t see the way we keep moving the goalposts I don’t know what more to say.

Remember Humanity’s Last Exam? Take a look at the left hand side of the Evals chart Google released on their blog:

Humanity’s Last Exam haven’t updated their leaderboard yet, but Grok 4 was previously winning with 25%. Gemini 3 is at 41%.

Concurrently, Google released Antigravity, which it says is “an agentic development platform, evolving the Integrated Development Environment into the agent-first era”. In other words, it’s an environment that can further automate software engineering using Agents.

It’s getting rave reviews for what it can do – I tested it last night and in one-shot, it produced an app we were developing with an external team. I haven’t validated or verified it yet, but it looks good and seems to work. The point is not that it will be better than what the team we are paying will produce – I don’t think it will be. But it will be close, it raises the bar, and it shows, I think, the direction of travel – AI eating software, where you prompt AI to build you the tools and apps you want and need, bespoke.

And if you’re reading this sat in a large organisation, here’s something else to pay attention to: Box tested Gemini 3 and reported that on ‘advanced reasoning tasks on enterprise documents, Gemini 3 Pro saw a 22% gain in performance on complex data analysis across a wide range of industries.’ Here are the high-level results:

The progress is broken out in the industries where it was most notable. The authors at Box write:

In Healthcare & Life Sciences, Gemini 3 Pro achieved 94% accuracy, compared to just 45% for Gemini 2.5 Pro.
In Media & Entertainment, Gemini 3 Pro reached 92% accuracy, a massive increase from 47% for the previous model.
In Financial Services, Gemini 3 Pro saw 60% accuracy, up from 51% for Gemini 2.5 Pro.

The CEO of Box, Aaron Levie, described the kind of thing they did:

“For this test, we ask the model a series of complex, real-world questions with a set of enterprise documents. The questions are meant to approximate what a person does in their daily work across various fields of knowledge work. This may include what an investor would do to analyze the financial health of a company, or what a consultant would do to build a report for a client on a complicated strategic topic.
Gemini 3 Pro represents a major leap in reasoning, math, logic, and analytical capabilities.
These gains show up as delivering highly useful improvements across financial services, law, healthcare, public sector, manufacturing, and more.”

Can you see where we are going yet?

I’m sticking to Gemini 3 here, but I don’t believe GPT-5 was evidence AI was ‘hitting a wall’ as many have suggested – its scores on most evals were good, e.g. take a look at SimpleBench, which summarises multiple benchmark leaderboards, and you’ll find GPT-5 was exceeded on most benchmarks only by Gemini 3, on its release yesterday. GPT-5 is always there or thereabouts at the top, and on quite a few is still on top even after the Gemini 3 launch, with some remarkable scores. My day-to-day use of GPT-5’s advanced reasoning and research constantly amazes me. I just think people have adjusted to how remarkable AI has become, and that most aren’t routinely using the latest models, so ChatGPT was unjustifiably felt to have not lived up to the hype. But that is probably a different blog post, and it hardly feels worth detaining us with this evening – remembering that at Gemini 3’s release we are just two and a half months after GPT-5’s debut. Even if you believed GPT-5 suggested a slowdown in LLM progress, presumably you’d accept Gemini 3 is evidence the field has some way to go yet.

Grok 4 and Claude Sonnet 4.5 have also achieved remarkable further progress since I last wrote a more general AI update in late August, and again it won’t be possible to do this full justice here.

One remarkable development that we must draw attention to: here’s Logan Graham, who leads the Red Team at Anthropic, commenting on Anthropic’s post describing how they identified a first AI-led espionage campaign, in which 80-90% of the cyberattacks were undertaken by Claude’s agentic AI, most likely being used by China. Logan, who is better placed than most to understand where the frontier is and how fast it is moving, wrote “My prediction from ~summer ‘25 was that we’d see this in ≤12 months. It took 3.” As we’ve noted before - we are still underestimating the rate and profundity of AI progress.

There have been scientific breakthroughs and much more since August, perhaps to be summarised in a future post. But here, having given a sense of how fast the very visible progress in LLMs is, I want to look a little deeper, to what is less obvious - but important all the same.

Progress: Beneath the Surface

Just a few days ago, DeepMind released SIMA 2, which Demis Hassabis, normally known for being more conservative than many industry leaders, described as:

“a general agent that can understand & reason about complex instructions and complete tasks in simulated game worlds, even ones it has never seen before. Incredible to see how it can learn just from self-play… a crucial step towards AGI.”

His cofounder Shane Legg was similarly exuberant, celebrating how “3 years ago I started the SIMA project with the dream of using 3D games as worlds in which to train and test Gemini based game agents — all as a stepping stone towards real world AGI.”

SIMA 2 can be dropped into novel games and teach itself new skills, learn through trial-and-error, and get better the more it plays, without any human input.

What it is doing is reasoning but not in the abstract question and answer sense that Gemini usually does, but rather as an embodied avatar in a virtual world – it can describe what it ‘sees’, take instructions and reason its way in the pursuit of a goal. Not so much something you command as something you plot with, as if there were two of you on a mission figuring out how to succeed. It can also transfer learning from one environment, or game, to another.

DeepMind also applied SIMA 2 within Genie 3 – its model that lets you generate new, persistent, real-time 3D worlds from a single image or text prompt. In these wholly novel environments SIMA 2 still succeeds – interacting with you as you explore and investigate.

As DeepMind describes it, the fact that it can learn in all these environments without human input ‘paves the way for a future where agents can learn and grow with minimal human intervention, becoming open-ended learners in embodied AI’.

In other words, it takes us towards a world in which AI in robotics can learn without human input, solve long time horizon tasks, learn new skills and apply them across domains.

While there are still limitations in this new paradigm – very long time horizon reasoning and some particularly complex tasks remain beyond it – the capabilities are remarkable. That they have elicited barely a mention in the press is indicative of a continued failure to appreciate the rate and direction of progress, and the profundity of its implications.

Oh – did I mention Genie 3? This is another model released since I last wrote about AI progress. Genie 3 can be summarised as providing limitless interactive environments from text prompts. Maybe read that again.

Google also released Earth AI with a Geospatial Reasoning Agent. They show how this can be used to direct aid to where it is most needed in the event of a natural disaster – predicting the disaster’s effects and assigning resources accordingly. But it shows how similar models – Digital Twins + Reasoning Agents – could equally be used in the military domain to understand the likely effects of an attack, how Governments could use such models for resilience planning, how businesses will likely be tracking supply chain security, and how all sorts of organisations can and will predict the effects of events or interventions of many kinds, in many domains.

A new paper this week builds on DeepMind’s release ~~last year~~ in May 2025 (keeping the error visible – I think it shows how AI progress is so rapid it can distort your sense of time!) of AlphaEvolve, which constantly evolves new algorithms (variation and selection) and had previously helped improve ‘Google’s data centers, chip design and AI training processes — including training the large language models underlying AlphaEvolve itself’. The new paper shows how the approach of ‘breeding’ algorithms can go beyond optimising algorithms to solving maths problems: autonomously discovering novel mathematical constructions and improving best-known bounds across dozens of open problems, demonstrated on 67 problems in analysis, combinatorics, geometry and number theory.

This approach of continuously ‘evolving’ new algorithms vs particular problems is another one of several that suggest possible paths to AGI beyond the current, dominant, LLM architectures.

SIMA 2, Genie 3 and AlphaEvolve probably won’t make much difference to what you are doing at work tomorrow or in the next few months – whereas Gemini 3 might. But their significance lies in what they augur.

AI luminary (previously Director of AI at Tesla) Andrej Karpathy wrote back in August that we are entering ‘the age of environments’ – so where access to text for training AI was once the key requirement, increasingly, as we build AI optimised for the real world, access to simulated environments will matter more. Genie 3’s limitless interactive environments from text prompts are an example of what Karpathy was talking about. SIMA 2’s performance within it shows how quickly progress is being made. AlphaEvolve shows how many tasks will be solved not by static models, but by continuously evolving intelligent solutions. All suggest that even if LLMs were ‘hitting a wall’, which I don’t believe is the case yet, there are many approaches to smash through it and keep those straight lines on graphs heading upward and to the right.

There are plenty of other remarkable developments in the past few months. To summarise just a few:

· Deepseek’s Optical Character Recognition model compresses text into a visual format before decoding it, cutting token usage by seven- to twenty-fold while preserving accuracy - showing how vision-based encoding could ease today’s long-context bottlenecks, letting models take in a lot more text, and produce a lot more text, more accurately.
· Meta’s AI/VR Glasses, which some claim might replace the need for smartphones and smartwatches, and open the way to generative user interfaces i.e. creating the dashboards and tools you need in the air in front of you.
· For self-driving sceptics, reading Andrej Karpathy’s recent post should be a corrective, showing just how far Tesla has come and how few (technological) barriers remain to the widespread deployment of self-driving systems – and it is worth noting how these pave the way for much more effective robotic systems in general, not just self-driving cars.
· Emerging evidence that video models may be on a trajectory towards general-purpose vision understanding, much like LLMs developed general-purpose language understanding – that video models, not trained to reason, are showing the capacity to do so. Here’s Bill Peebles, Head of Sora (OpenAI’s video models), claiming video models are on the critical path to AGI.

But the point is not to over-emphasise these developments, but to show how many parallel paths to AGI are being pursued, even as the main path continues to take us ever closer to that milestone, and as we have often discussed: closer to the most disruptive revolution in human history.

Visit us at www.cassi-ai.com

95% of AI Investments Generate Zero ROI

Keith Dear — Fri, 07 Nov 2025 13:20:45 GMT

Artificial intelligence, gone bankrupt, eyeing empty wallet sadly. Via MidJourney

In the past week, two reports on AI adoption seem to come to strikingly different conclusions. The first, which received a lot of coverage in my corner of X/Twitter at least (relentlessly bullish on AI), was the Wharton Business School 2025 report Accountable Acceleration: GenAI Fast-Tracks into the Enterprise. The second, which seems to me to have had more coverage on LinkedIn (generally more bearish on AI), was an MIT NANDA^[1] paper The GenAI Divide: State of AI in Business 2025.

The headlines are contradictory. Where Wharton report 74% of investments in AI deliver positive return on investment (ROI) already, MIT report 95% of organisations are getting zero return on their AI investments.

Because the Wharton report says things that are mostly in our interest at Cassi to promote, I offer only the shortest of summaries. I was more interested in what we can learn from the more negative report, hopefully ensuring we (and you, now I have decided to make what was to be an internal memo a blog post) avoid mistakes others have made.

Wharton’s Bullish report - lots of charts and data, well worth a read - link in intro paragraph.

Wharton’s report tells us that:

“Accountability is now the lens” - structured, ROI-linked metrics e.g. profitability, throughput, workforce productivity - measurable outcomes, not just adoption.
“Long-term optimism in Gen AI is increasingly strong” 88% expect increased spend on AI in next 12-months, +16pp YoY; growing trend of cutting to fund: 11% (+7pp YoY) reallocating funds, mostly from legacy IT + HR and Workforce Programs.
74% see positive RoI already from AI investment; 80% expect it within 2-3 years; larger enterprises are less successful despite bigger budgets: 34% report ‘too early/neutral’ outcomes; Digital enterprises most successful, 88% positive; banking and professional services next, 83%. Efficiency and productivity the top benefit, then quality, creativity and security.
Human factors key. Deliberate change management needed to produce RoI; overcoming bottlenecks - human capital, talent, training - essential to success. [I’d add incentives but the report doesn’t say this!]

The report also calls for Chief AI Officers, which is something I would want to reflect on further, and perhaps see more evidence of, before endorsing – I worry appointments like this can be performative, and also that it creates new internal rivalries that might not be conducive to adoption e.g. the Chief AI Officer wants to own and drive AI transformation for their own career success, but their rivals around the board table for bonuses and promotion have an incentive to either (a) own their own AI initiatives and outperform the CAIO or (b) thwart or limit the success of the CAIO’s initiatives – especially if they compete for resources from within their domain. Still, Wharton says the evidence/testimony of many Execs shows they are a good idea.

If you only read the Wharton report, you might conclude ‘more AI spending = higher ROI’. This would be wrong, as both the MIT and Wharton reports conclude.

Both agree that accountability, assessment vs operational objectives, and measurement vs operational outcomes, are the key to success. Funding more pilots, or scoring ‘adoption’ alone are unlikely to generate ROI.

But the MIT report gets into the reasons for failure more deeply.

MIT report, also published this week and worth your time - link in opening paragraph.

The headline finding in MIT’s report is probably the most important thing for those funding, or potentially funding, AI investments to know. 95% of those investments fail, as assessed by MIT across ‘300 public implementations’, on which MIT conducted interviews, surveys and their own analysis.

But those same decision-makers should be more curious – the headline could mislead, and failing to read deeper means ceding a chance to learn from others’ mistakes. The report also finds that “5% are extracting millions in value” and offers a diagnosis of what works and why. If 95% of organisations use this report to stop or reduce AI investment, as their AI sceptics are emboldened, they are increasing the risk their business is eaten or defeated by the 5% that get their investment right.

There are other bearish indicators in the report, for example, so far:

Tech & Media are the only two of nine industries seeing significant disruption.
No sign of the needed deep structural shifts associated with past general purpose technologies e.g. new market leaders, disrupted business models, or measurable changes in customer behaviour. Substitution, pilots, experimentation - not transformation - rule.

Why have 95% of AI Investments failed to Generate ROI?

The report says only 5% of custom enterprise AI tools reach production. We suggest the Valley of Death, and innovation theatre are likely major causes – more so than product or pilot failure.

There is a lot of incentive for organisations, and the individuals within them, to experiment with AI. It is career enhancing to show yourself to be forward thinking, and have some stories for your boss, board and customers. But the unpopularity of disruption, which always creates winners and losers, is likely a major factor in killing off so many AI investments at the pilot stage. In contrast, showing the courage to ‘fail fast’ and kill off your own initiatives, or even more so those of others, can be career enhancing. The incentives in many enterprises are pretty terrible. As the report says, the fact that only 5% of custom enterprise tools make it from pilot to production ‘explains why most organisations remain on the wrong side of the divide.’

Perhaps supporting our analysis, the report notes that larger enterprises launch more pilots but have the lowest pilot to production ratios. Larger organisations are harder to align incentives within, have more competing fiefdoms, and often require much larger and more disruptive change management programmes since there are so many more interconnected and interdependent workflows.

Another reason for limited ROI is that 40% of the 5% of GenAI tools that make it from pilot to production are chatbots, and the inability of chatbots to learn means they add little value, and cause limited disruption. Like the pilots themselves, they sit on the periphery of staff workflows.

Pilots fail for more predictable reasons - the unwillingness of users to adopt new tools, and concerns around trusting model output.

Enterprise CIOs should also consider the more neutral findings that to date, GenAI adoption has primarily enhanced individual productivity, not P&L performance. Why might that be?

Secret Cyborgs. One reason for this is likely that employees are not passing on their gains in productivity to the business, or at least not in measurable ways. We suggest that in most organisations, there is no shortage of work, or at least ‘make-work’ in the form of meetings and coordination costs and emails to reply to - many, perhaps most, are overwhelmed, not under-employed. Employees are likely reinvesting time saved in things they might not otherwise have got to – but of course often that just generates more internal work for everyone else.

Furthermore, there is a significant shadow AI economy – comprised of those Ethan Mollick at Wharton called ‘secret cyborgs’ – that is thriving: 90% of employees use GenAI for work - many using external models. We suggest a primary reason for this is model-lag: by the time enterprises adopt a model, it is usually out of date, and far better models are available via personal subscription to ChatGPT, Gemini, Claude. Second, Co-Pilot, in particular, is just not as good as external tools (in my view), yet seems to be what most are using at work and basing their opinions of GenAI on. Supporting this, the report notes a generalised distrust of internal tools even from avid GenAI users.

MIT note that forward thinking organisations turn Secret Cyborgs into advantage - they don’t punish but learn from shadow usage, surveying, understanding and adapting by procuring enterprise alternatives.

Speed & Crossing the Valley of Death. Success in AI investments is more frequent in mid-market companies, which generally invest less, and have fewer pilots, but are much faster, running at an average of 90 days from pilot to production, and take a greater proportion of their pilots through to production.

In contrast, enterprises take nine months or more. Nine months vs current rates of AI progress is far too slow, unless the tool is being continuously updated and upgraded as new models drop. If staff find your tools inadequate and have to hide what they are doing, it is unlikely you’d see gains on your P&L from their use of AI – their incentive is to hide the gains, along with the use.

What can we learn that is more positive from the MIT report?

Reassurance?

AI is not (yet) coming for your job. Notwithstanding all we have written on this blog on the likely impact of AGI on employment, MIT find that for now, AI adoption has not led to workforce reduction. Displacement, per a wider MIT analysis cited in the report, is likely, but will be graduated through discrete displacement vice huge lay-offs. If we get to AGI on current timelines, I don’t think this is correct, but still, for now, we can all take some reassurance from the data.

How Buyers that Achieve ROI Succeed

Strategic Partnerships with External Vendors Double the Chance of Success.

“External partnerships see twice the success rate of internal builds.” Employee usage rates are also twice as high for externally built tools vs those built internally. This success rate is particularly notable since internal build dominates within the sample case studies, but is rarely successful unless in partnership.

Buyers who succeed demand process-specific customisation and evaluate tools on business outcomes rather than software benchmarks. They expect systems that integrate with existing processes and improve over time.

The most effective AI-buying organisations don’t wait for perfect use cases or central approval. They drive adoption through:

1. Distributed Experimentation.

2. Vendor Partnerships / Strategic Partnerships (x2 likelihood of success) – “procure external tools, co-develop with vendors”.

3. Benchmarked tools on operational outcomes.

4. Partnered through early stage failures.

5. Sourced initiatives from frontline managers (vice top-down or central lab generated) – but paired bottom up sourcing with senior exec accountability. This likely accelerates adoption through operational fit and career incentives for adoption.

6. Decentralised authority and clear ownership.

7. Clear Accountability.

8. Harnessing power users to the cause.

9. Likewise individuals and managers on the customer side were critical to success.

Highest ROI for now comes from substitution - from eliminating previous business process outsourcing, notably in document processing costs, and from cutting external agency use and consulting spend by ~30%. ROI was highest in back-office functions. Financial services benefited by cutting ~$1M on previously outsourced risk management. Successful buyers also saw improved customer retention and sales conversion, through things like automated outreach and follow-up. That ROI did not come from AI transformation is perhaps because very few organisations seem to be undertaking it yet.

How winning vendors behave

Trust in the vendor is the most important factor for executives awarding contracts.

Vendor trust isn’t always good news for start-ups like ours – many enterprises prefer to wait for existing vendors to offer the new tools than take a risk on evaluating ‘emerging’ vendors. Plus, the report notes, existing Business Process Outsourcing suppliers have the advantage of already understanding customer workflows. Consequently:

“Product quality alone is rarely sufficient. Referrals, prior relationships, and VC introductions remain stronger predictors of enterprise adoption than functionality or feature set.”

Vendor trust is the key priority, but deep understanding of customer workflows is the second highest, requiring domain expertise and a willingness to learn.

Relatedly, a third key priority is integration: AI products succeed when they integrate with minimal disruption to current tools (people don’t want to learn a new UI/tool, no matter how good). Vendors succeed because they customise deeply and embed themselves in workflows, adapt to context, and scale from narrow but high-value footholds.

Tools that succeed therefore have low configuration burden and immediate, visible value. Tools requiring extensive input tended to stall at the pilot phase. Tools with complex internal logic, opaque decision support, or that are ‘optimised based on proprietary heuristics’ tended to fail.

Agentic AI adoption is very nascent. Too early to assess. The report argues this could lead to a more fundamental shift and much higher ROI. But also, MIT NANDA is a community for those building AI Agents, so while I think they are partly right (there will be much higher ROI for GenAI too, Agentic implementation a complement, and the revenues from each as inextricable as the two technologies), we should allow for some bias in their judgement.

So What?

Both reports are right in their own way. MIT’s exclusively looks back whereas Wharton’s is more forward looking (asking ‘what do you expect?’, as well as ‘what has happened?’). They agree on what is needed for investments to succeed. They agree that ROI is nascent, and organisations are moving from pilot to production, from adoption metrics to proving value vs operational outcomes.

In the end, they measure the same things in different ways. Wharton captures self-report data, likely sweeping up gains the MIT report misses, and allowing for promising minimally ROI-positive implementations that are ‘felt’ to have delivered internal ROI that doesn’t necessarily show-up on the top or bottom line, and/or that are expected to grow, or are growing, to be weighted for future impact. MIT captures verified transformation and financial impact and is the more rigorous. Both are right within their frames—both track experience, one weights expectations more heavily, the other current execution.

Both offer caution - and useful lessons.

Neither really address the medium to longer term on AI that this blog usually focuses on - the forecasts on the arrival of AGI/ASI being most likely, dependent on the preferred definition, 2026-2033. But for leaders betting on AI, they are useful guides as to how to get your organisation AGI ready - how and where to start your transformation.

If your intent is to optimise for success applying the latest in decision science to optimally allocate resources, minimise risk and maximise opportunities, you need us. Find us at www.cassi-ai.com

^[1] NANDA: Networked AI Agents in Decentralized Architecture – an MIT project, a mix of technical protocols and community of interest https://www.linkedin.com/pulse/nanda-internet-ai-agents-ramesh-raskar-211ve/ seeking to “…explor[e] how artificial intelligence can evolve into a truly democratic and distributed ecosystem.” By “…address[ing] critical architectural components such as privacy, incentives, orchestration, or user accessibility… …designing a framework where billions of agents can collaborate autonomously while preserving privacy, scalability and innovation at the edges.” It is perhaps unsurprising the report concludes, in part, that the real ROI will come from the deployment of Agents – an assertion I think is partly true, probably sincerely held, but clearly a conclusion NANDA was always likely to arrive at.

Never tell me the Odds: Is your CEO Han Solo?

Keith Dear — Tue, 07 Oct 2025 05:31:00 GMT

Does heroism require the denial of reality? Can you inspire, while being honest about the low odds of success? Can you succeed, when the odds are against you?

‘The Empire Strikes Back’. Han Solo piloting the Millennium Falcon. Under attack and outgunned by the Empire’s TIE-fighters, Han is warned by Princess Leia of a hazard ahead:

‘Asteroids’ she shouts, alarmed.

To the horror of his companions Han Solo spots an opportunity to lose those attacking him…

‘But Sir,’ C-3PO opines, ‘the possibility of successfully navigating an Asteroid field is approximately 3720-to-1.’

Han Solo, eyes fixed determinedly ahead, growls ‘Never tell me the odds.’

Why? One view is that knowing the odds would undermine his confidence, making it less likely he’d succeed. It would be self-fulfilling.

One CEO recently worried that deploying Cassi, and seeing the odds of success, might “crystalise the doubts of his management team”, reducing his company’s chance of success.^[i] It is not the first time we’ve heard the Han Solo objection, but it was put with impressive clarity and inspired us to explore the issue for a wider audience.

Defining the “Han Solo” Objection

In high-commitment ventures from start-ups to sports-fields, statecraft to Star Wars ‘never tell me the odds’ is an attractive one-liner, and superficially, seems to capture something important.

In analytical form, the claim made in the Han Solo objection can be described as:

The probability of success, viewed collectively - from my team, my company (etc), may be low.
If I confront that probability, my confidence, and/or my team’s, will fall.
Therefore I am better off not knowing the odds, because ignorance preserves the confidence required to achieve my desired outcome.

The romance is obvious - the triumph of human courage over maths, people overcoming the odds - and that has its own appeal. But the logic does not hold. The quote misreads the scene, and the objection does not stand scrutiny. Three failures follow.

First, information-avoidance can sometimes temporarily preserve motivation, but unless you are about to fly into an asteroid field - and perhaps, as we will see Han shows - even then - it is a leadership failure. Han Solo doesn’t have time to fully convince his comrades their odds are wrong, but you do.

Second, confidence bolsters a sound plan; it does not substitute for one. The argument is clearest at the extreme: even if you are galvanising your people towards a heroic sacrificial defeat that achieves nothing - you are still convincing them that their least-bad outcome at that point is to die gloriously and be remembered well. You are not deceiving them by hiding the odds.

Third and finally, sometimes the Han Solo CEO is right, and the crew are wrong. He has access to information they don’t, and should act on it, convincing them as they go.

1. Information Avoidance is Leadership Failure

A leader’s job in motivating others is surely to convince their people that their plan has a sufficient chance of succeeding to be worth pursuing. They should usually be able to explain the strategy, tactic or decision sufficiently well to convince their team it has sufficient odds of succeeding to be worth it.

Han Solo communicates his calculation of the odds through his confidence - the Asteroid field is a better bet than continuing to fight out-numbered and out-gunned. He says as much:

Leia: What are you doing? You’re not actually going into an Asteroid field?

Han: ‘they’d have to be crazy to follow me in there’.

Sure, the odds of survival aren’t great in the Asteroid field, but Han Solo knows the base rate C-3PO quotes “3720 to 1” doesn’t account for Han’s skill and experience, nor the worse odds of fighting in open Space - this is the best decision with highest expected value under the circumstances. His companions are alarmed, they check his reasoning ‘if you are doing this to impress me…’ but they don’t dissent.

Your job as CEO isn’t to suppress the odds, but to convince people your calculation of them is right, with reasons.

Furthermore, you must remember that odds are not a constant; they change as the facts change, they respond to action. Each step that removes a risk, reveals information or compounds an edge, changes the number.

In a race for survival through an asteroid field, every dodge of a space rock that takes Han and friends closer to the far side shortens the exposure window; conditional on survival, the forecast can legitimately improve. A 20% forecast of survival on entering, and an 80% forecast later are consistent. Ideally, every decision would be taken knowing which most increased the odds of survival.
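To see why a 20% forecast on entry and an 80% forecast later can both be right, here is a minimal sketch. It assumes, purely for illustration, that the field consists of 100 equally dangerous and independent hazards; neither the hazard count nor the independence comes from the film or from any Cassi model, and the function name is hypothetical.

```python
# Illustrative sketch only: the hazard count and the independence of hazards
# are assumptions made for this example, not figures from the text.

def remaining_survival_odds(p_total: float, n_hazards: int, hazards_cleared: int) -> float:
    """Chance of clearing the hazards still ahead, given survival so far.

    p_total is the forecast of surviving the whole field made on entry.
    Each hazard is assumed to be equally dangerous and independent.
    """
    if not 0 < p_total <= 1:
        raise ValueError("p_total must be in (0, 1]")
    if not 0 <= hazards_cleared <= n_hazards:
        raise ValueError("hazards_cleared must be between 0 and n_hazards")
    # Per-hazard survival probability implied by the entry forecast.
    per_hazard = p_total ** (1 / n_hazards)
    # Only the hazards still ahead matter once the earlier ones are behind you.
    return per_hazard ** (n_hazards - hazards_cleared)


for cleared in (0, 25, 50, 75, 90, 100):
    odds = remaining_survival_odds(0.20, 100, cleared)
    print(f"{cleared:3d} hazards cleared: {odds:.0%} chance of making it out")
```

Under these assumptions the forecast starts at 20% and passes 80% once roughly 87 of the 100 hazards are behind the Falcon. Nothing about the original forecast was wrong; the exposure window simply shrank.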

The point generalises: when we founded Cassi, the base rate - 80-90% of start-ups fail - was C-3PO sounding the alarm. Our odds of success started from 10-20%. Accounting for what we know of ourselves, what we wanted to do, and the need for it, we raised those, with our median Founder forecasts of the company surviving for 5-years hovering around the 50% mark (vs more ambitious goals, like achieving £15M revenue in FY26, where odds were lower). As time has gone on - winning customers, gaining traction, deploying the platform, those odds have consistently risen.

2. Confidence is no Substitute for a Sound Plan

In most fields, where leaders have a different assessment to their team they can communicate it with more than just their own confidence. They have time. They can explain with reasons, evidence, their own judgement of the odds.

If this doesn’t work, the CEO might change their mind (or not - see next section). After all, those people around the Board table are there because their expertise and judgement in particular domains is judged to be of sufficient calibre and calibration to make their advice valuable to the CEO, who can’t know everything. This is particularly true where they forecast that some particular factor in their domain is highly influential on the strategic outcome.

Moreover, since we know that most human forecasts are no more accurate than chance, we’re usually better off deferring to the median crowd forecast than trusting our own gut or relying on a single forecast from another - even if they are an expert in the area.

But you would still want to understand that expert view. For example you might weight a forecast due to expertise where:

the HR Director and the Board are predicting that the probability of their being able to hire thirty world-leading ML Researchers by end FY was 10%; and
hiring thirty world-leading ML researchers is the difference between a 60% probability of the company achieving its strategic objective; and
failing in recruitment reduces the chance of achieving that strategic objective to below 50%…

…so strategic failure is then shown to be more likely than not. At the very least the CEO should be interested in:

(a) why the probability of hiring the talent needed is so low,

(b) why the collective view is that this is so influential on the desired strategic objective and

(c) what might be done to alter that forecast - how could resources be differently assigned, or what actions taken, that would optimise for success?

The result of this should be either the CEO updates their forecasts, the HR Director and Board update theirs, or the gap remains. In each case, each should try to persuade the other of their judgement, to figure out what would change the view of one or other or both, and focus effort and resources on those factors that most move their odds. Hiding the odds is no substitute for optimising them, and likely to have the opposite effect to the one intended.
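The arithmetic behind the hiring example can be sketched with the law of total probability. The 10% and 60% figures are from the example above; the 40% chance of success without the hires is an assumption, since the example says only ‘below 50%’, and both function names are hypothetical.

```python
# Illustrative sketch: 0.40 for "success without the hires" is an assumed value;
# the example above only says it is below 50%.

def p_strategic_success(p_hire: float, p_success_if_hired: float,
                        p_success_if_not: float) -> float:
    """Overall chance of the strategic objective, weighting both hiring outcomes."""
    return p_hire * p_success_if_hired + (1 - p_hire) * p_success_if_not


def break_even_p_hire(p_success_if_hired: float, p_success_if_not: float,
                      target: float = 0.5) -> float:
    """Hiring probability needed for the overall chance of success to reach target."""
    if p_success_if_hired == p_success_if_not:
        raise ValueError("hiring makes no difference to the outcome")
    return (target - p_success_if_not) / (p_success_if_hired - p_success_if_not)


overall = p_strategic_success(p_hire=0.10, p_success_if_hired=0.60, p_success_if_not=0.40)
needed = break_even_p_hire(p_success_if_hired=0.60, p_success_if_not=0.40)
print(f"Overall chance of strategic success: {overall:.0%}")
print(f"Hiring probability needed to reach evens: {needed:.0%}")
```

With these assumed inputs the overall chance of success is 42%, and the hiring forecast would need to rise from 10% to 50% before success became as likely as failure. That is the kind of number question (c) is asking the CEO to go after.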

3. Sometimes the Han Solo CEO is Right

The CEO might see the odds from the collective intelligence of their team - the Board, the company - and disagree. If this is the case, they must have the courage to maintain their beliefs independently. The disagreement of others is not per se evidence to change their own beliefs.

In fact, a CEO or equivalent should, at key moments, *expect* to disagree with the crowd. Whether we think of that in terms of principled forecasting or informal disagreement around the table or during an online meeting, we should expect that the CEO will regularly disagree with the crowd, and be correct to do so.

If this was not the case, then organisations would not need CEOs, and decisions and courses of action would tend to be obvious.

Choosing the moments for this disagreement is in itself a key component of CEO prediction. Every major decision contains two forecasts: one about the external environment, and one about the organisation’s ability to shape that environment. The CEO must judge when their internal forecast justifies overriding the external and internal consensus – and the best way to do this is to have a clear knowledge of their forecasting performance – do they consistently outperform others? Is there a reason this time to believe they might be right when others think they are wrong?

A slightly imperfect analogy is in investment. It is recognised that most people should just invest in tracker funds. Tracker funds are essentially the crowd wisdom of the investment community. But a CEO should be the equivalent of a trader who regularly beats a tracker fund, by taking actions of whatever sort, equivalent to investments, that beat the tracker.

The analogy is imperfect only in mechanics. In both cases, someone is making probabilistic bets against the collective forecast. In finance, the ‘bet’ is a literal trade; in strategy, it’s an organisational action. A tracker fund is the best collective intelligence forecast; fund managers who beat that are super-forecasters beating the crowd average. CEOs, like super-forecasters, should also know when to go with the collective forecast, and when to not forecast at all (because they have no relevant insight or information, and they would just be praying for luck).

CEOs are selected to be capable and well-informed individuals, operating in areas in which they have much relevant experience. They have the ability to make forecasts, and to shape the world to meet those forecasts in certain areas. When a CEO chooses a path, they are predicting both the external and internal environment and their own ability to shape it.

They can and should beat the crowd: that is their arbitrage against the competition, and the rest of the world.

We must be clear here: we are not saying “trust the hero; to hell with the odds”. In Star Wars, we can see enough in the scene to suggest Han Solo was taking a calibrated risk, broadly aware of the baseline. But his throwaway line “Never tell me the odds” is dangerous. Just taking a wild punt and getting lucky is likely to lead to catastrophe in time. A CEO should employ calibrated contrarianism – and that calibration means they *must* know the odds – they need to know how often they are right, and how right they are (e.g. do things they forecast at 80% confidence happen 80% of the time?). They need to know the odds their team is putting on things, and they should seek out the best possible odds. The contrarianism should be legible – Peter Thiel’s question makes the point:

“What important truth do very few people agree with you on?”

Cassi would make you go further: why do you believe in this truth? What would change your mind? How likely do you think it is you will be proved right?

Over time, Cassi would help you improve your judgement, and enable you to learn how much you should trust your ‘instincts’ over the crowd. For the vast majority of humans – including CEOs – the median of the crowd forecast, or the median of a group of superforecasters, will give the best odds. At Cassi, we are showing that AI forecasting can often match or exceed the crowd – the CEO would be wise to test their contrarianism and refine it. Wild bets are not leadership; “never tell me the odds” is sexy but negligent.
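The calibration check described above (do things forecast at 80% happen 80% of the time?) can be sketched in a few lines. This is a generic illustration, not Cassi’s method; the track record is invented and the function name is hypothetical.

```python
# Illustrative sketch: the track record below is made up for the example.
from collections import defaultdict


def calibration_table(forecasts):
    """Group (stated_probability, happened) pairs by stated probability.

    Returns {stated_probability: (observed_frequency, number_of_forecasts)},
    with stated probabilities rounded to the nearest 10%.
    """
    buckets = defaultdict(list)
    for stated, happened in forecasts:
        buckets[round(stated, 1)].append(1 if happened else 0)
    return {p: (sum(hits) / len(hits), len(hits)) for p, hits in sorted(buckets.items())}


# Hypothetical CEO track record: ten forecasts at 80%, five at 60%.
history = ([(0.8, True)] * 6 + [(0.8, False)] * 4 +
           [(0.6, True)] * 3 + [(0.6, False)] * 2)

for stated, (observed, count) in calibration_table(history).items():
    print(f"said {stated:.0%}: happened {observed:.0%} of the time ({count} forecasts)")
```

In this invented record the CEO is well calibrated at 60% but overconfident at 80%, where only six in ten forecasts came true. A real record would need far more forecasts per bucket before drawing conclusions.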

Governance of the CEO – the Board, the market - will find the collective nature of the crowd forecast very helpful in calibrating the perceived riskiness of the CEO’s strategy, checking that heroic has not devolved into delusional, and, not least, the degree of reward the CEO should get for beating the collective baseline. Using an internal forecasting platform or prediction market allows the direct comparison of the CEO with crowd consensus over time.
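One way such a comparison could be scored is with the Brier score, a standard proper scoring rule. The sketch below is an assumption about how a comparison might be set up, not a description of Cassi’s platform; all forecasts and outcomes are invented, and the function names are hypothetical.

```python
# Illustrative sketch: forecasts and outcomes are invented for the example.
from statistics import median


def brier_score(forecasts, outcomes):
    """Mean squared gap between forecast probabilities and 0/1 outcomes (lower is better)."""
    if not forecasts or len(forecasts) != len(outcomes):
        raise ValueError("need one outcome per forecast, and at least one forecast")
    return sum((f - o) ** 2 for f, o in zip(forecasts, outcomes)) / len(forecasts)


def crowd_median(forecasts_by_person):
    """Median forecast per question across all forecasters."""
    return [median(per_question) for per_question in zip(*forecasts_by_person)]


outcomes = [1, 0, 1, 1]          # what actually happened on four questions
ceo = [0.9, 0.2, 0.7, 0.4]       # hypothetical CEO forecasts
team = [
    [0.6, 0.4, 0.5, 0.7],
    [0.7, 0.3, 0.6, 0.8],
    [0.5, 0.5, 0.4, 0.6],
]

crowd = crowd_median(team)
print(f"CEO Brier score:   {brier_score(ceo, outcomes):.3f}")
print(f"Crowd Brier score: {brier_score(crowd, outcomes):.3f}")
```

On these invented numbers the CEO scores 0.125 against the crowd median’s 0.165, so the CEO has beaten the collective baseline on this small sample. Tracked over many questions, the same comparison shows whether that edge is skill or luck.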

We think organisations will go further - hiring and promoting partly on a proper scoring system that identifies:

who is the most effective in identifying the most important factors in achieving success; and
who is the most accurate and well calibrated in predicting the odds of achieving them.

A good CEO might be such a person, consistently outperforming the ‘crowd’ that constitutes their rivals for the job.

Doing that takes judgement:

· knowing the odds of your achieving your desired outcome,
· knowing which factors will most influence your probability of achieving your outcome,
· understanding where your odds, and the factors that you think matter, differ from those of your team, the unbiased external view, the analysis of AI and its forecasts,
· knowing when, and how much (if at all), to trust your judgement over that of your team, your crowd, your AI.

Your CEO should copy the wider lesson in the Han Solo scene, and ignore the misleading one-liner: you must know the odds. With Cassi, you know both the number, and what you most need to do to succeed. You can optimise your odds. At that point, as Han later said “You’re all clear, kid. Now let’s blow this thing and go home!”

Failing that, head over to our website, www.cassi-ai.com - currently in the early stages of a makeover - and learn a little more about what we do.

^[i] Conversely, at a recent conference, “Katie” made the argument that surfacing the odds, and the factors that most influence them, might consistently improve the odds. This is not just Cassi’s argument for more efficient management, but rather one suggesting that the concentrated shared consciousness such a method creates allows people to collectively visualise an outcome – and perhaps this ‘mind share’ will move the odds – ‘manifesting’ the outcome as it were – beyond just what would be expected from more efficient resource allocation. We won’t explore this further – but I did say I would mention it on the blog as a counter-point to the view that surfacing the odds and factors would crystallise doubts!

Prediction Centric Warfare

Keith Dear — Wed, 10 Sep 2025 07:39:24 GMT

Context

Yesterday, at DSEI, I spoke on the Cyber and Specialist Operations Command Stage, alongside former colleague from Fujitsu and new Head of Fujitsu UK Defence and National Security Business Group Alexandra Bailey (congrats Alexandra!), and the MOD’s Chief Data and Artificial Intelligence Officer (CDAIO), Caroline Bellamy.

My comments on ‘prediction centric warfare’ opened the panel. Below are my remarks and slides, edited lightly for publication. They are my remarks alone, and I do not mean to imply any endorsement from Caroline or Alexandra.

Prediction Centric Warfare

Information Technology transformation is decision transformation. We adopt information technologies in order to collect, process, protect, store and move information to enable better decisions to be made.

Digital Transformation is the means by which such information is moved around.

Digital Transformation then, is decision transformation. It should be assessed on outcomes – clear metrics as to whether it has, in fact, enabled better decisions – those that are more often right, or less wrong, and faster, than before any given technology is deployed.

At Fujitsu, where I was Managing Director of the Centre for Cognitive Technologies, we focused on revolutionising decision-making because decisions are where cognition manifests.

And decisions, as I argued on the Cassi blog last week, are predictions.^[1]

Indeed everything is.

A decision is a conditional forecast: if we do X, we expect Y.

An action is a bet on a payoff. I do X, expecting Y.

Tactics, operational design and strategy are predictions at different scales and time horizons. If we do these things, we expect this to be the outcome.^[2]

When prediction quality improves, outcomes improve. When it degrades, we pay in blood, treasure and time.

We live in the exponential age – where technological change is continuously accelerating – creating profound disruption.

The UK’s Strategic Defence Review tells us we live in an age of uncertainty.

Uncertainty and change can only be bounded and managed with prediction.

The science of prediction – human and algorithmic – has never been more important.

Here we will discuss seven things:

1. How Prediction wins Wars.

2. The necessity of prediction to victory, despite its difficulty.

3. How Operating Concepts are themselves a prediction, and why this matters.

4. How ‘prediction centric warfare’ should be the UK’s New Operating Concept and Theory of Winning, enabled by the Digital Targeting Web.

5. A brief overview of the science of prediction – human and algorithmic.

6. The Strategic Defence Review’s predictions for the Digital Targeting Web.

7. How Programmatic Prediction would significantly increase the chances of the Digital Targeting Web succeeding in achieving the SDR’s aims.

How Prediction wins Wars.

If Clausewitz is right and war is politics by other means, then violence is a tool of persuasion and coercion. Its only purpose Schelling tells us, outside sport or revenge, is to influence behaviour: to coerce choices.

Everything we do in conflict therefore aims to shape an adversary’s decisions.

Since decisions are predictions, the task is to predict what the adversary will decide in response to our actions.^[3]

We aim to influence an adversary’s forecast, so that the path we prefer dominates their expected value calculation.

Once you accept that, the centre of gravity is cognitive. Fires, manoeuvre, sanctions and narratives are instruments; the target is the decision process of people.

Since decisions are predictions, the task is to predict what they will predict in response to what we do—and to set the conditions so that our preferred option is the least‑bad option they can see.
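The “expected value calculation” in this argument can be made explicit with a toy model. The sketch below is illustrative only: the options, probabilities and payoffs are invented, and real adversary decision-making is far less tidy than a two-row table.

```python
def expected_value(option):
    """Probability-weighted payoff of one option, as the adversary sees it."""
    return sum(prob * payoff for prob, payoff in option)


def preferred_option(options):
    """The option with the highest expected value
    (the 'least bad' one if every option is negative)."""
    return max(options, key=lambda name: expected_value(options[name]))


# Hypothetical adversary view: each option is a list of (probability, payoff).
before = {
    "fight on": [(0.5, 10), (0.5, -20)],   # expected value -5
    "negotiate": [(1.0, -8)],              # expected value -8
}
# Our actions change their forecast of success, not their payoffs.
after = {
    "fight on": [(0.2, 10), (0.8, -20)],   # expected value -14
    "negotiate": [(1.0, -8)],              # expected value -8
}

print(preferred_option(before))
print(preferred_option(after))
```

Nothing in the second table makes negotiation attractive; it only has to become the least bad option the adversary can see.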

The nuclear extremity makes the point.

The US dropped nuclear bombs on Hiroshima and Nagasaki predicting they would make peace more likely, and so a devastating and costly ground invasion less likely.

The decisive effect was not the blast pattern but the expectation it created in Japan’s senior leadership—the Imperial Conference, and ultimately the Emperor.

The prediction they were forced to make…

more bombs could follow;

American resolve was unshakeable;

the Soviet entry had closed the last exits;
…collapsed the case for “fight on.”

Our preferred option became their least bad outcome.

The logic generalises. Whether we are driving soldiers off a hill, coaxing insurgents to a ceasefire or coercing governments towards an unconditional surrender, we are in the business of changing behaviour by changing expectations. It’s just as true in grey zone conflict and deterrence.

The necessity of prediction to victory, despite its difficulty

We fight over forecasts more than over map squares.

That is uncomfortable, because modelling human decision‑making under fire is damnably hard.

But discomfort does not free us from the obligation.

We cannot not predict.

We either model and influence those decisions - or we leave outcomes to inadequate heuristics.

How Operating Concepts are themselves a prediction, and why this matters.

The exponential pace of technological change is such that we can no longer afford to follow technological trends.

Indeed the exponential of Moore’s Law means this has been true since the 1970s at least, and as I have argued elsewhere, perhaps it has been true since the First World War.

Prior to this, for 1700 years of human history – perhaps for its entirety – weapons came first, and then we figured out the tactics. We predicted that a tool, or an invention might have a military use, and then worked out the most effective tactics by predicting, getting it wrong, and predicting again, until we got there.

Certainly by the 1970s this was no longer a viable way to proceed. The risk was that the technology you were iterating towards tactical usefulness would be obsolete and outdated before you figured out how to use it. The opportunity was that now, technological and scientific progress were rapid enough that you could predict what technologies you needed to win and set about building them.

In the Cold War the United States applied this logic with DARPA’s ASSAULT BREAKER, which became Air-Land Battle doctrine: define the theory of victory first—how to break the Soviet second echelons—and then assemble the sensors, networks and shooters to make it real. The strategy came first; the technology followed.

That is the spirit of the Strategic Defence Review’s Digital Targeting Web. Call it the ‘kill web’ if you prefer the extant literature; the point is not IT modernisation but a single, integrated force design built backwards from how we win.

In the literature, a kill-web is a theory of victory – the central component in DARPA’s Mosaic Warfare concept, presented in 2018. As its Director said: “The idea will be to send so many weapon and sensor platforms at the enemy that its forces are overwhelmed. The goal is to take complexity and to turn that into an asymmetric advantage.” Operational Concepts are predictions too – of what tactics and operational approach will win, of what technologies we will need.

How ‘prediction centric warfare’ should be the UK’s New Operating Concept and Theory of Winning, enabled by the Digital Targeting Web.

I argue the Digital Targeting Web must be the central component in our theory of victory.

It must be more than just a digital network connecting every sensor to every shooter.

More than a platform to coordinate across domains.

More than a modern extension of Boyd’s OODA loop and Brose’s Kill Chain.

The aim is not just faster kills or more options. These are the means, not the ends.

Its aim is to predict faster and more accurately which combination of weapons and tactics will make our preferred option our adversary’s least bad action or outcome.

The Digital Targeting Web must enable Prediction Centric Warfare.

A brief overview of the science of prediction – human and algorithmic.

If prediction sits at the heart of victory, we must use the best methods we have.

Most humans are unreliable forecasters. Philip Tetlock’s work showed that expert judgement in geopolitics and economics often performs no better than chance – the median forecast of a crowd is more reliable. Some, whom he calls superforecasters, consistently do better through disciplined probabilistic methods.

Today, AI systems can match and in places exceed that standard. At Cassi we’ve shown materially higher accuracy than human baselines on meaningful questions from ‘Will Russia control Pokrovsk in X months’ time?’ to ‘Will the UK have to conduct a non-combatant evacuation operation in Somalia before end FY, and if they do, will it succeed?’, and—more importantly—shown that AI can identify the variables that move outcomes.

The practical lesson is straightforward. Define the outcome in measurable - we would say resolvable (how would you know you’d succeeded?) - terms; set resolution criteria; then work back to the decisions and data you need.
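A resolvable question can be captured as a small data structure: the outcome, the criteria for judging it, and a deadline. The sketch below is one hypothetical way to do that; the class name `ForecastQuestion`, its fields and the example question are assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class ForecastQuestion:
    text: str                       # the outcome, stated so it can be checked
    resolution_criteria: str        # what evidence counts as "yes"
    resolves_by: date               # deadline after which the question is scored
    outcome: Optional[bool] = None  # unknown until resolved

    def resolve(self, happened: bool, on: date) -> None:
        """Record the outcome; an event after the deadline counts as 'no'."""
        self.outcome = happened and on <= self.resolves_by

    @property
    def is_resolved(self) -> bool:
        return self.outcome is not None


# Hypothetical example question.
question = ForecastQuestion(
    text="Will capability A reach initial operating capability by 31 March 2027?",
    resolution_criteria="A published statement from the programme owner.",
    resolves_by=date(2027, 3, 31),
)
question.resolve(happened=True, on=date(2027, 2, 1))
print(question.is_resolved, question.outcome)
```

If a question cannot be written in this form, the outcome has probably not yet been defined tightly enough to forecast.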

Build the web to do two things: predict what the adversary will do next, and predict what we must do to shift that next move in our favour—at tactical, operational and strategic levels.

Factorise and Forecast should be the new discipline – both human and algorithmic.

Base rates beat bravado. Score your people, and compare their performance to AI. No estimate nor planned action from the Kill Web goes to the commander without a confidence-defined forecast^[4]; no decision is delegated to an algorithm until it demonstrably out-performs your people.
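The delegation rule can be written as an explicit gate. The sketch below assumes forecasts are scored by mean squared error on resolved yes/no questions; the function names, the minimum sample size and the required margin are illustrative assumptions, not doctrine.

```python
def mean_squared_error(forecasts, outcomes):
    """Average squared gap between probabilities and 0/1 outcomes."""
    return sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / len(outcomes)


def may_delegate(ai_forecasts, human_forecasts, outcomes,
                 min_questions=30, min_margin=0.02):
    """Return True only if the algorithm's track record beats the human
    baseline by a clear margin on enough resolved questions.
    Both thresholds are placeholders to be set by the organisation."""
    if len(outcomes) < min_questions:
        return False  # not enough evidence either way
    ai_error = mean_squared_error(ai_forecasts, outcomes)
    human_error = mean_squared_error(human_forecasts, outcomes)
    return human_error - ai_error >= min_margin


# Hypothetical track records on 40 resolved questions.
outcomes = [1, 0] * 20
ai = [0.9, 0.1] * 20
humans = [0.6, 0.4] * 20
print(may_delegate(ai, humans, outcomes))
```

The gate is deliberately asymmetric: with too few resolved questions it refuses to delegate, whichever side looks better.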

Ask yourself: when did you last see a plan fail because people were too explicit about their assumptions and updated them too quickly?

Quite.

Plans fail because we predict poorly, hide our confidence levels, and refuse to revise in time. We must predict, publish, and adapt.

The Strategic Defence Review’s predictions for the Digital Targeting Web

The Strategic Defence Review tells us we are living in an age of uncertainty. Surely then, it is now more important than ever that we bring the science of prediction to bear, to bound and manage that uncertainty as best we are able?

On the Kill Web, the SDR says that “the targeting web epitomises how the Integrated Force must fight and adapt. Its very existence contributes to deterrence.”

We are betting that the DTW:

1. Increases our probability of winning future wars

2. Its existence deters by influencing the adversary’s forecast of the future.

Here we see how prediction centric warfare begins at the Strategic level, and how central it could be to our success at the Programmatic level.

We should test both these predictions explicitly and probabilistically, not assume them. We should factorise them, so that we can assign resources to those factors that will most increase our chances of success.
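Factorising a prediction can be illustrated with a simple chain of conditions. The sketch below assumes the factors are independent and must all hold for success, which is a strong simplification; the factor names and probabilities are invented.

```python
from math import prod


def chance_of_success(factors):
    """Probability that every factor holds, assuming independence."""
    return prod(factors.values())


def gain_from_improving(factors, name, new_probability):
    """How much the overall chance rises if one factor is improved."""
    improved = {**factors, name: new_probability}
    return chance_of_success(improved) - chance_of_success(factors)


# Hypothetical factors behind "the programme delivers on time".
factors = {
    "funding confirmed": 0.9,
    "data standards agreed": 0.5,
    "skilled staff recruited": 0.8,
}
baseline = chance_of_success(factors)  # 0.9 * 0.5 * 0.8 = 0.36

# Which factor repays ten extra percentage points the most?
gains = {
    name: gain_from_improving(factors, name, min(p + 0.1, 1.0))
    for name, p in factors.items()
}
best_factor = max(gains, key=gains.get)
print(f"baseline {baseline:.2f}; most valuable factor to improve: {best_factor}")
```

In a chain like this the same ten-point improvement is worth most on the weakest factor, which is the case for putting resources there first.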

We should factorise and forecast at the Programme level too. Programmatic Predictions, if you will – ensuring we are optimising always and prioritising those actions that see us build the Kill Web as effectively and efficiently as possible. Probabilistically optimised.

To close I offer a reminder, a suggestion and a participatory starting point:

A reminder – everything is prediction: decisions, operating concepts, tactics, operations, strategy.

A suggestion – the Digital Targeting Web is a bet that better, faster, smarter decisions will both deter and, if needed, win. It should be the heart of a new UK Operating Concept, a Theory of Winning: Prediction Centric Warfare.

A starting point – judge it with outcomes that are resolvable – defining how you would know you had succeeded. Make a start here yourself, in the comments, on the proposals I offer, what probability would you assign to each of the following questions? What factors would most move your prediction one way or the other?

Programmatic Prediction would significantly increase the chances of the Digital Targeting Web succeeding in achieving the SDR’s aims.

To optimally assign resources, we must factorise and forecast. Prediction-Centric Warfare is the key to our future success.

^[1] Oxford English Dictionary. predict …2 verb trans. announce as an event that will happen in the future; say that (a thing) will happen; foretell. b Of a theory, observation etc.: have as a deducible consequence; imply. [underline emphasis added – here, to predict is to conclude via Bayesian prediction, as a consequence of priors, or from inductive premises to inductive conclusion].

^[2] Importantly, because it justifies the wider claim – perception and agency are prediction too. See last week’s post for a fuller justification.

^[3] In this sense, morale – maintaining ours or undermining theirs - is the ability to maintain one’s own aims in the face of adversary pressure.

^[4] In theory this is our actual doctrine, but it is honoured more in the breach than in the observance.

Everything is Prediction

Keith Dear — Tue, 02 Sep 2025 06:01:29 GMT

Everything is Prediction

Everything we do is a wager on what comes next. Reading this is a bet that it will repay your time. Reaching for a coffee mug assumes it contains coffee. Strategy, policy, all decisions, even routine admin, each is an “if…then” about the future. The only real choice is whether to keep those bets tacit and unexamined, or to state them, price them, and improve them.

That is why “we don’t do prediction” is the most misleading sentence in public life. Historians sometimes say it, central bankers occasionally imply it, ministers often hide behind it.

Even explanation – at face value a solely retrospective exercise - is a forecast in disguise. To say “A influenced B” is to present a causal model that, applied tomorrow, shifts our expectations about ‘C’.

Historian Lawrence Freedman is rightly sceptical about crystal-ball claims. He writes thoughtfully on prediction and annually reviews his own calls, but still somehow judges that since “…it is always going to be difficult forecasting future developments. …it is probably best not to do so or, at least, not without qualifications and caveats.” Yet his own (excellent) explanations move readers’ inductive probabilities, and are valuable precisely because they do. An account of the past that cannot change how we bet on the future is merely decoration, it serves no purpose.

When President Harry S. Truman said “There is nothing new in the world except the history you do not know” or when General James Mattis, himself a prediction sceptic (see his opposition to ‘Effects-Based Warfare’) tells us that history and reading “…doesn’t give me all the answers, but it lights what is often a dark path ahead,” I suggest what their words mean in practice is that reading and history give them new mental models, help to shift the inductive probabilities – implicit – in their reasoning about what comes next. The eminent Kori Schake, another prediction sceptic, surely does not think her work has no relevance to the future? That when we read her work it should have no effect at all on our models of the world? If Schake, Mattis or Freedman concede their work is useful, they are with Truman, and admit to the necessity of prediction and their own roles in informing it. We cannot not predict. Prediction is unavoidable, essential. Historians are no exception – they predict causal models – the best explanation - of the past, their analyses reshape our models in the present, as we seek to shape the future.

Former Bank of England Governor Mervyn King devotes a whole book Radical Uncertainty to arguing that we should resist prediction, that “…to ascribe probabilities is misconceived…”,^[1] that narratives and scenarios are better. His account is half-right but wholly incomplete. Scenarios are essential, but again, you cannot not predict. Not all scenarios are of equal likelihood. Choosing one narrative over another is itself an implicit probability judgement. Refusing to put numbers on beliefs does not make those beliefs less numerical; it merely makes them unaccountable. The responsible move is not to abandon probabilities, but to calibrate them—and to say what would change our minds. When King says that no one could have predicted a Covid-class virus would emerge in Wuhan in 2019, his counsel of despair over-reaches. After SARS (2003), MERS (2012) and repeated H5N1 scares, we could not name the city, but we could price the odds of a novel respiratory pandemic in the coming years, and the most likely places it might emerge from. Those probabilities would change with new information, they may be low everywhere, but they would not be everywhere equal - they are not useless. King says “The better strategy is to be prepared.” That is the point of probabilities: not clairvoyance, but disciplined readiness.

What should that discipline look like? There are four key elements:

(1) Find the base rate: the prior odds for events in a class (e.g. novel pathogen emergence, cross-border spread, health-system overload).

(2) Pair scenarios with forecasts as inductive premises, not a single point estimate. Accept that decisions are if/then predictions and you see that each ‘if’ is a factor in your decision model. The ifs need probabilities, from which the probability of the ‘thens’ can be mathematically derived. In putting these forward, you are crystallising your thoughts, moving beyond fuzzy thinking and the tendency all humans have to hold incompatible beliefs, to believe like Alice in Wonderland’s White Queen six impossible things before breakfast. (At Cassi, we describe this as “decision legibility” – you’d think it was everywhere. It is rarer than you imagine!).

(3) Pre-commit to triggers for action, e.g. surge testing when the forecast of transmission exceeds a probability threshold in two regions; release stockpiles when the forecast of hospital utilisation crosses x%; tighten or loosen measures as defined indicators move.

(4) Transparency. Show how each tranche of evidence shifts the odds over time. Narratives are a communication tool – often designed to obscure more than they reveal; probabilities confront us with reality, build trust, tell us how—and when—to act.
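Elements (2) and (3) lend themselves to a short worked example: deriving the probability of a ‘then’ from the probabilities of the ‘ifs’, and checking a pre-committed trigger. The sketch uses the law of total probability over mutually exclusive scenarios; the scenario names, numbers, threshold and function names are all hypothetical.

```python
def probability_of_outcome(scenarios):
    """Law of total probability: sum of P(scenario) * P(outcome | scenario).
    Scenarios must be mutually exclusive, with probabilities summing to 1."""
    total = sum(p for p, _ in scenarios.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError("scenario probabilities must sum to 1")
    return sum(p * p_outcome for p, p_outcome in scenarios.values())


def surge_testing_triggered(regional_forecasts, threshold=0.3, min_regions=2):
    """Pre-committed trigger: act when the transmission forecast exceeds
    the threshold in at least `min_regions` regions."""
    over = [region for region, p in regional_forecasts.items() if p > threshold]
    return len(over) >= min_regions


# (P(scenario), P(health system overloaded | scenario)), invented numbers.
scenarios = {
    "contained": (0.6, 0.02),
    "regional spread": (0.3, 0.20),
    "national spread": (0.1, 0.70),
}
p_overload = probability_of_outcome(scenarios)  # 0.012 + 0.06 + 0.07 = 0.142

regions = {"north": 0.35, "south": 0.40, "east": 0.10}
print(f"P(overload) = {p_overload:.3f}; surge testing: {surge_testing_triggered(regions)}")
```

Writing the trigger down before the event is the point: when the forecast crosses the line, the argument about whether to act has already been had.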

This is the hinge where philosophy helps. Frank Ramsey, Cambridge prodigy, solved the practical problem a century ago: degrees of belief are what you are prepared to act on. If your behaviour would not change when the odds change, you didn’t believe the odds – or your model - in the first place. Ludwig Wittgenstein stripped the mystique further: probability is not a mood, it is a relation between propositions and the world, a logic of uncertainty. John Maynard Keynes added a crucial caution: evidence often has different weights; some uncertainties are harder to forecast because while the evidence all points one way there is only limited evidence or uncertainty as to its reliability. Here confidence – and thus your forecast - should reflect that uncertainty (closer to 50% than zero or 100%) - even when you are directionally convinced. Together they point to a discipline: state your beliefs in subjective probabilities and update as new evidence arrives.
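Both halves of that discipline, updating on evidence and tempering a forecast when the evidence is thin, can be sketched briefly. The first function is the odds form of Bayes’ rule. The second is a simple linear pull towards 50%, offered as an illustrative assumption about how to reflect the weight of evidence, not as Keynes’s own formulation.

```python
def bayes_update(prior, likelihood_ratio):
    """Update a probability (strictly between 0 and 1) with new evidence.
    likelihood_ratio = P(evidence | hypothesis) / P(evidence | not hypothesis)."""
    prior_odds = prior / (1 - prior)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1 + posterior_odds)


def temper_for_thin_evidence(probability, weight):
    """Pull a forecast towards 50% when evidence is sparse or unreliable.
    weight = 1 keeps the forecast unchanged; weight = 0 returns 0.5."""
    return 0.5 + weight * (probability - 0.5)


# A 20% prior, then evidence four times likelier if the hypothesis is true.
posterior = bayes_update(0.2, 4)

# Directionally convinced (90%) but resting on thin evidence.
tempered = temper_for_thin_evidence(0.9, weight=0.5)
print(f"posterior {posterior:.2f}, tempered forecast {tempered:.2f}")
```

Ramsey’s test then applies to the output: if no decision would change as the number moves, the number was not really a belief.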

Friedrich Hayek supplies the institutional corollary. Knowledge is dispersed; no planner sees the full picture. Markets work because prices compress scattered expectations into a single, continuously updated signal. In that sense, markets are prediction engines: not oracles, but social machines for aggregating beliefs about the future. Eliezer Yudkowsky pushes the point in Inadequate Equilibria: where incentives are weak or misaligned, feedback is slow, or entry is blocked, equilibria can remain wrong for a long time. This is the pathology at the heart of bureaucratic failure in Government and business alike. The remedy is to improve the prediction machinery, introduce real feedback – markets, skin in the game – and to confront issues and invite in contrarians: seek out what you need to know, not what you want to hear. In policy terms: use auctions not allocations, prediction markets not pundit panels, prize challenges not closed committees. This is what the UK Government tried to do with Cosmic Bazaar, and the US with both IARPA’s ACE and the Intelligence Community Prediction Market (ICPM): forecasting tournaments that demonstrated promise, and then stalled. As Professor Philip Tetlock – doyen of crowd-based ‘Superforecasting’ – put it, we have kept “fumbling the crystal ball”: forecasts sit outside power, with limited ownership, little skin in the game, no budget consequences, and under attack from bureaucratic antibodies. This must change. If, as we argue here, ‘everything is prediction’, we must start optimising our organisations to be better at it.

If prediction frames decisions, it also underwrites agency. On Karl Friston’s free-energy view, organisms survive by minimising “surprise”: they either revise their internal model to fit the data (perception) or act so the data fit the model (action). In ordinary terms: when expectations and evidence clash, you can change your mind or change your circumstances. Perception is inference; action is hypothesis-testing. The Duke of Wellington put it more tersely for the real world: “All the business of war, and indeed all the business of life, is to endeavour to find out what you don't know by what you do; that's what I called ‘guessing what was at the other side of the hill.’” Active inference before the label.

Similarly, evolutionary psychologist Michael Tomasello argues human and animal agency is layered if/then prediction: a hierarchy with each tier making more complex predictions, exerting itself to suppress the simpler decision calculus of the tiers below.

At the bottom of the agency hierarchy are stimulus-driven simple organisms such as the ~1 mm long worm C. elegans: they perform something called ‘chemotaxis’, where they track nutrient gradients, effectively eating their way up a calorie signal predicting only that it will continue via the next ‘bite’, stopping when sated, or when the signal ends. The prediction is if nutrients, then eat in that direction. No other calculus.

Next, goal-directed agents (early vertebrates such as lizards) pursue an immediate objective, but are able to inhibit action when predicted predation or other risk outweighs reward. If prediction of predation X is greater than predicted reward Y, inhibit. If not, pursue reward.

Intentional agents (mammals like squirrels, and corvids) represent alternatives - that feeder, this route, those acorns and that jump – intuitively predicting complex if/then alternatives from multiple paths and multiple options.

Rational agents (great apes) plan with counterfactuals, stack boxes to reach greater heights, make or pick tools, they trade immediate impulses for a better expected payoff, imagining and testing abstract alternative futures.

Finally, socially normative agents (humans) align predictions through joint attention, perspective-taking and obligation: an “I” that can join a “we” – by observing “you” and predicting how you are predicting - intuitively pooling information and coordinating on shared futures.

Each layer of agency is not replaced, but is demoted as the newly evolved tier above learns to predict when to suppress its subordinate’s more simplistic calculations for a higher payoff among more complex options. At every tier, behaviour is guided by forecasts: of the world, of partners, and of the self – and subordinate levels of agentic impulse - and corrected by the surprises that reality returns.
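The first two tiers are simple enough to write down as code, which makes the “demoted, not replaced” point visible: the higher tier wraps the lower one and decides when to let it run. This is a toy rendering of the description above, with invented function names and inputs, and it makes no claim about how Tomasello formalises the idea.

```python
def stimulus_driven(nutrient_signal, sated):
    """Tier 1 (the worm): if nutrients, eat in that direction;
    stop when sated or when the signal ends."""
    if sated or nutrient_signal <= 0:
        return "stop"
    return "eat and advance"


def goal_directed(nutrient_signal, sated, predicted_risk, predicted_reward):
    """Tier 2 (the lizard): suppress the tier-1 impulse when predicted
    risk outweighs predicted reward; otherwise let it run."""
    if predicted_risk > predicted_reward:
        return "inhibit"
    return stimulus_driven(nutrient_signal, sated)


# Same food signal, different forecasts of danger.
print(goal_directed(0.8, sated=False, predicted_risk=0.1, predicted_reward=0.5))
print(goal_directed(0.8, sated=False, predicted_risk=0.9, predicted_reward=0.5))
```

Each further tier would wrap the one below in the same way, adding a richer forecast about when the simpler rule should be overridden.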

The same stack runs through human organisations. Perception is predictive (brains pre-empt and correct), operations are predictive, with tactics micro “if…then” loops. Strategy is predictive (a scaffold of contingent moves). Science is the most explicit version: if condition X holds, outcome Y will follow. Diplomacy and business are messier but structurally the same. A sanctions package assumes particular channels will bite before spoilers adapt. A product roadmap assumes rivals will not leapfrog a core feature in the next two quarters. Prices, policies, and laws that encode ‘expectations’, are predictions by another name. Even organisational design, per the doctrines of ‘adaptive strategy’ is a prediction – you predict the optimal system structure to sense, adapt, and respond to changing circumstances. More succinctly, if a bit meta, you predict which system will make the best predictions. To govern or to run a firm is to predict. You can’t not.

What follows? First, stop hiding forecasts inside prose. Make them legible. When presenting an analysis, say: “We assess a 35–45% chance of X by June; here are the three variables that would move that by ±15 points.” Second, adopt Keynes’s “weight of evidence” explicitly, with Ramsey’s insistence on making your beliefs legible to discover what is true – what are your inductive premises? What are the probabilities they are true? Have you reflected your doubt about the sufficiency of evidence in your forecast? Third, import Hayekian aggregation into decision-making: add prediction markets (internal if need be), tournament-style scoring for analysts, score and reward effective calibration, and accuracy. Fourth, use Ramsey’s test: if your actions don’t shift with the odds, your statements were theatre.

For historians and commentators, the standard is similar. If an explanation cannot be operationalised into a forward expectation - what would be more or less likely next time a similar configuration appears - then it’s not yet an explanation. “We don’t predict” is not humility: it is a missed chance to help readers update.

For central bankers and officials, the discipline is harsher because the stakes are higher. Put numbers on your beliefs, say what a policy assumes, publish its probability of success. State the triggers that will force a pivot. Score your calibration and accuracy, so we can learn how much we should trust your judgement. The public can forgive error; we should not forgive opacity and obfuscation.

Everything is prediction: in minds, in markets, in ministries. The choice isn’t between predicting and not predicting; it is between implicit, untested bets and explicit, accountable ones. We should choose the latter—and build the habits and institutions that make our next bets a little less wrong than the last.

This is what we do at Cassi. Contact us at admin@cassi-ai.org.uk

^[1] Kay, J. and King, M., 2020. Radical uncertainty: Decision-making beyond the numbers. WW Norton & Company. especially pp. 71-84, quotations from pp. 223-224.

Future Diplomat

Keith Dear — Tue, 19 Aug 2025 06:00:56 GMT

Concept rendering of a British “self-driving embassy,” an AI-operated mission built around data-center infrastructure.

Future Diplomat: Rewiring & Reimagining Diplomacy

Technology has long changed diplomacy. If diplomacy is, according to the Oxford English Dictionary:

- the management of international relations by negotiation; the method by which these relations are managed by ambassadors etc; skill in such affairs. Adroitness in personal relations; tact.

… then diplomacy is communication: both the channel by which states speak, and the craft by which they persuade. As technology revolutionises how we communicate, to make and shape decisions, so it revolutionises the conduct of diplomacy.

Think of diplomats, embassies and the FCDO as cybernetic systems. They seek and sense information, process it through the neurons of the brain as individuals, or between the functional roles of the people staffing the organisation, act or choose not to, then observe the effects. The end is behavioural: get someone to do something new, do something differently, or keep doing what they already do. The target of that diplomacy might be a foreign person, organisation or group, or it might be those within the diplomat’s own organisation, home country or government – where the diplomat is seeking to mediate between positions.

If this is true, it should be clear that technologies that are fundamentally informational, and cognitive – they draw insight and foresight from data at unprecedented speed and scale, and make decisions based on models of the future – will transform diplomacy.

This vision for 2035 is built on the premises above. In addition, its starting assumption is that using forecasting tournaments and/or prediction markets^[1] is the best way to understand the future – on which basis it anticipates:

- weak artificial general intelligence (AGI), one that matches or exceeds human performance on a range of diverse cognitive and informational benchmarks, by March 2027;^[2]

- Oracle AGI, a system that significantly exceeds human performance in answering any question, by 2028^[3] – or under a more stringent definition 2032;^[4]

- Artificial Superintelligence (ASI) – adding superhuman skill in robotics, proprioception, dexterity, locomotion – by ~May 2029.^[5]

Thus a diplomat in 2035 will be operating in a world of superintelligent AI.

Consular Services Without Counters

Most consular work is structured information processing: visas, passports, notarial services, births and deaths, crisis assistance. By the early 2030s, agentic systems will handle the bulk of it end‑to‑end with clear audit trails and explainable reasoning. Consulates will be able to operate in all languages, tailoring information to individual cultural and psychometric dispositions and levels of education. Identity will be verified remotely through secure biometrics. Citizens will complete interviews and attestations conversationally, recorded in signed digital form. There will be little need to visit consulates for these tasks, with most activity taking place online, and other interactions via the metaverse – meetings with AI (probably ASI, per current forecasts) agents embodied in avatars. If these are distinguishable from a remote meeting with a human today, it will only be in that we should expect an advanced, or superintelligent, AI to provide a better service than a human – more empathetic, more patient, more personalised, smarter, more knowledgeable, with better memory.

From Self‑Driving Labs to Self‑Driving Embassies

In 2023 Eric Schmidt, in MIT Tech Review, & the scientific journal ‘Nature’ both described the birth of self-driving labs (SDL), those that - at the maximum extent - operate fully autonomously, designing hypotheses and experiments, conducting experiments, reporting the results, and then designing new hypotheses, testing again, without human intervention, at super-human speeds. In 2025, self-driving labs are driving breakthroughs globally. The journal Nature Chemical Engineering reports that in the field of materials science self-driving labs are producing new materials, ten times faster, processing ten times the data, accelerating discovery exponentially. Furthermore, July 2025 saw Nature publish articles showcasing:

a ‘self-driving microscope’ to better understand protein aggregation and neurodegenerative disease,
“Ada”, a self-driving lab that has discovered new metal materials by identifying optimal trade-offs between conflicting objectives, and
a Chinese lab’s use of AI to design the ‘brain’ of a quantum computer, using laser light to move thousands of atoms into any arrangement they choose to act as qubits, in 60 milliseconds - demo’d in this case to produce a video of Schrödinger’s cat.

In the same month (July 2025), Science reported how self-supervised AI now gives robotic probes pixel-level precision, letting autonomous labs log over 3,000 semiconductor tests in a single day - enabling rapid defect detection and a stronger, more resilient advanced-manufacturing supply chain. Given that AI agents are reportedly doubling their performance every seven months, these kinds of breakthroughs are likely just the beginning (see Cassi blog post dated 31 July for more SDL progress in 2025).

The next decade, perhaps the next few years, will see the emergence of ‘self-driving embassies’. Just as today’s self-driving labs use autonomous AI to answer scientific questions, in self-driving embassies AI agents will handle consular and diplomatic enquiries, and AI systems will mediate most, perhaps all, routine exchanges (principally text, audio or digital) between the embassy and both home and host governments.

Persuasion agents will build psychometric profiles of the individuals and groups with whom the Embassy or diplomat is interacting, enabling tailored negotiation strategies, which can be adapted and improved in real-time. When negotiations are conducted live, whether via video call, AR/VR meeting, or physically face-to-face, ASI agents will be reading micro-expressions, designing recursive negotiation strategies and updating the approach continuously. Affective computing will be more empathetic than humans, able to understand and manipulate us at superhuman levels. It is hard, though not impossible, to imagine negotiation being fully delegated to ASI systems negotiating with each other. A study in Nature in May 2025 showed AI was more persuasive in debate than humans >60% of the time. A recent (2025) unauthorised and probably unethical experiment on the Reddit forum r/ChangeMyView shows the way – an LLM deployed to persuade was six times more persuasive than humans and operated without anyone recognising it was an LLM. The largest LLM persuasion study to date (published 18 July 2025) showed newer, larger models were the most persuasive, and that models’ persuasiveness persisted over time (one month, in this study). The best models (already a generation behind o3 and GPT-5) were 41% and 52% more persuasive than ‘static’ messaging - remember when we were panicking about ‘static’ social media messages on Twitter and Facebook? In fact, previous studies of human persuasiveness, in advertising and personal conversations in elections, suggest ‘an average effect of zero’.

Even if human relationships remain crucial to effective influence and negotiation, they will be increasingly informed by profiling agents, prompting and preparing diplomats for discussions, perhaps advising during discussion, debriefing afterwards.

Linguistic skill may be vital, not for translation - which may cease to be relevant when translation can be automated instantly - but to enable diplomats to develop the mental models and worldview of those they interact with. If that is the case, diplomats will likely be learning through AI-tailored immersive courses in AR and VR. Cultural expertise and understanding - the right-brained process still too little understood by neuroscientists, psychologists, computer and other scientists for us to be confident we are modelling it effectively in Artificial Intelligence - may be a key human skill. But to outperform or complement machines it will likely need longer, deeper immersion in country, possibly career-long deployments. Similarly, when the human currency becomes relationships, since rationality, persuasion, information processing and reporting are undertaken at superhuman level by machines, FCDO careers will be much more focused on the cultivation of life-long relationships with current and future key influencers in countries around the world. Fast-stream processes and civil service churn will have to end if diplomats are to flourish in an age of AI. Deep relationships take time to build.

Recruitment of diplomats will use AI systems to identify super-persuaders, just as intelligence analysts will be identified as super-forecasters, or super-questioners – those that can create the questions that maximally reduce ignorance, on which forecasters and ‘persuaders’ should focus – and recruitment will follow. A more prosaic instantiation of Philip K Dick’s Precogs in ‘Minority Report’. Strategy, tactics, decisions will be automated. Humans will be working out what the most desirable ‘outcome’ or objective is with machines, but the planning itself will be done by machines, and implementation overseen and directed by them.

Passports to New Worlds

The trend towards e-citizenship and e-businesses will continue. Many visa and passport applications will be for virtual access to the UK, as telepresent (AR/VR) or telexistent agents (AR/VR with the ability to directly interface with the real world via robotic interfaces) – with people ‘visiting’ and working in the UK without ever entering the country.

Furthermore, citizens and businesses will apply for visas to access UK-managed virtual worlds: metaverse countries with economies that rival countries in GDP, and with access to UK legal authority for dispute resolution.

Beyond the ‘Geo’ in Geo-Politics

Migration pressures will ease, as citizens in developing countries are able to prosper by building e-businesses in developed countries while remaining in their own. Trade barriers and cross-border taxation will become harder to enforce as metaverse economies produce wealth independent of geography.

Space will be the most lucrative market in human history, with in-space industries from mining to manufacture, compute to colonisation. By 2035, with regular vertical and horizontal launch from the UK, the FCDO will have to manage in-space territorial claims and economic competition, as well as the security implications that flow from competition for dominance in space. If the UK fails to compete in space colonisation, asteroid mining and space-based industries, the FCDO will be managing a much more precipitate decline in the UK’s relative power in the international system than it has ever had to do to date.

Finally, automation, robotics and AI will reduce countries’ and companies’ dependence on demography to mine raw materials, produce goods and services, and scale armed forces to deter or fight. Self-driving labs will become self-driving innovation systems, with companies’ and countries’ power, prosperity and security driven by the speed at which they can invent and harness new technologies.

In combination, metaverse economies, the space market and automated innovation and industries will move competition for relative power in the international system beyond the geo of geopolitics, with nations’ within-border natural resources and demographics less deterministic of their power and influence. For the FCDO, effective statecraft will have to be more integrated x-Whitehall, and with industry, finance and society, than ever before.

Influence, Security, Privacy, Agency

Superintelligent systems able to analyse groups and individuals, understand their psychometric make-up, and influence their behaviour risk enabling social control that tightly circumscribes human agency, manipulating with unprecedented effectiveness. Even today, we generate c. 1.7MB of data per person per second – the equivalent of 56 million Kindle books generated on each of us every day – and it is this that gives unprecedented insight and foresight into who we are, what we do, and what we are likely to do in the future. Debates on human agency will soon be the domain of the diplomat, as the privacy-centric EU clashes with the market-centric US and China’s digital authoritarianism.

Our security will be redefined in a world where the boundaries of the virtual and the physical have blurred through AR/VR, where we live in different ‘belief circles’ seeing different versions of the world around us to our neighbours, and spending more and more time in borderless cyberspace. Cognitive security may be a competition for e-citizenship, e-businesses with other states, and virtual states.

When our cars, smart homes and brain-computer interfaces all become critical national infrastructure through connected IoT devices, we will have to rethink how that infrastructure is managed. Cognitive security will mean defending our data and our minds.

All this will make the role of the FCDO more complex and more expansive. While the future diplomat might be able to cover a wider geographic area more effortlessly, the GeoInt Singularity plus ubiquitous data surveillance will mean they will struggle to move anywhere, or meet anyone, without others knowing. It may be that everyone they meet knows their psychometric profile and their preferences, and has optimised their position to ‘win’ in diplomatic negotiation.

Perhaps privacy-enhancing technologies will prevent this, but if they do, they will also limit the state’s ability to manage threats, gather data.

Public diplomacy conducted online will become an ever more powerful tool, but will also raise the same questions of privacy, security and agency – how far should we go in profiling and manipulating in a world where Spotify data can be a better indicator of national mood than direct survey data [6], where all of us are vulnerable to microtargeted manipulation?

In this context, regulatory diplomacy will be as important as all other forms, determining the kind of world we live in, and the extent to which people are free. Hence China requires all AI companies to design with “core socialist values”, values it seeks to embed and export via its proposed ‘Organisation for Global AI Cooperation’ (Reuters, July 2025; Chinese State Council, 26 July).

Simulation and Wargaming

New policies, new negotiating tactics, the effect of military or economic actions, likely future scenarios and emerging flashpoints will all be constantly developed and tested through digital strategy engines, offering superhuman suggestions for strategies and tactics, testing second-order effects, highlighting new factors to be considered. The GeoInt Singularity, where every square metre of the earth is constantly observed from space, will enable the development of a global digital twin, to identify flashpoints and early indicators of emergent conflict or natural disaster. Wargames and collective intelligence tools will enable humans to challenge the models’ assumptions and force them to consider counterfactuals, ensuring the most effective strategies are pursued.

Augmentation

Technological transformation will reduce the role of humans in many diplomatic decisions, and reduce the number needed to deliver diplomatic services. However, human relationships may remain key to successful negotiation - it depends on the extent to which nations are willing to let AI agent negotiate with AI agent. If humans remain in the loop they will be increasingly augmented, able to bring the best of human cognition - whatever remains of value in this emerging age – together with technology to enhance our capacities. Brain-computer interfaces will, by 2035, likely be enhancing memory, perception and learning. In turn we will be aided by a neuropharmacology industry that optimises our performance and our rest, and by a growing science of high-performance teams that ensures we best balance cohesion, talents, and incentives to maximise the performance of people in groups.

The US AI Umbrella?

If the US develops an AGI or ASI model before anyone else, is able to harness it effectively, and launches it first on recursive self-improvement, it may be that no other nation can catch up. In such a situation, British and other nations’ diplomats would be negotiating with a system that was comprehensively smarter than them, and progressively more so – those they interact with being prepared not by very capable advisors, but by a superintelligent Private Office or equivalent – or possibly being asked to interact directly with the US’ ASI diplomat. So too British businesses would be facing competition from businesses smarter than them, and the British military would be working with an ally, assuming the US remains such, that was strategically, operationally and tactically smarter. The outcome of such competition is likely to be very one-sided. If this becomes so, we may find ourselves living under the AI equivalent of the US nuclear umbrella, surrendering sovereignty and agency in return for protection and access. The price of living under the US AI Umbrella is likely to be far higher than the cost of its nuclear protection.

AI Disruption or Destruction

It could be that we live, as Oxford philosopher Nick Bostrom puts it, in a ‘solved world’: a post-ASI utopia where nobody need concern themselves with the intra-human competition of which diplomacy is part. It might be that through misalignment or malignance AI poses an existential threat to humanity. Or it might be that AI becomes a substitute for, not a complement to, human labour, with radically disruptive social and political effects. None of these scenarios is so unlikely that it can be ignored. Hope is not a strategy. But this article has described the Future Diplomat operating in a messy, complex, competitive world – with or without the development of super-intelligent AI. It may be unproductive and distracting to spend too much time on the dystopian outcomes here. It is vital the FCDO does spend time on them, nonetheless.

Conclusion

Much of what has been described is possible now, with today’s relatively narrow AI. More will be possible as current AI advances further - all the evidence suggests the UK will underestimate this progress, and the nation be poorer and less secure as a result.

In the more likely than not advent of AGI or ASI before 2035 the role of the diplomat and Embassy will be radically transformed, with humans the agents of AI as much as the other way around. The birth of metaverse worlds and the growing import of the space economy by 2035 will change where diplomacy is conducted, power created and managed, people and machines persuaded. We must be ready.

The best time to start building for this future was yesterday. The next best time is today.

[1] For a good overview of the difference, see: https://www.metaculus.com/notebooks/38198/metaculus-and-markets-whats-the-difference/

[2] Metaculus, 2025. Date Weakly General AI is Publicly Known. Accessed 31 July 2025.

[3] Metaculus, 2025. Oracle AGI Precedes General ASI. (20 months). Accessed 31 July 2025.

[4] Metaculus, 2025. Date of Artificial General Intelligence. Accessed 31 July 2025.

[5] Metaculus, 2025. Time From (weak) AGI to Superintelligence (25.9 months). Accessed 31 July 2025.

[6] See e.g.

a. Edmans, Alex and Fernandez-Perez, Adrian and Garel, Alexandre and Indriawan, Ivan, Music Sentiment and Stock Returns Around the World (August 14, 2021). Journal of Financial Economics (JFE). Available at SSRN: https://ssrn.com/abstract=3776071 or http://dx.doi.org/10.2139/ssrn.3776071.

b. Kaivanto, K. and Zhang, P., 2019. Popular music, sentiment, and noise trading. Lancaster University Management School.

c. Sabouni, Hisam. (2018). The Rhythm of Markets. 10.13140/RG.2.2.31484.64646. Pre-Print https://www.researchgate.net/publication/323560860_The_Rhythm_of_Markets

What did you do in the AI Revolution, Daddy?

Keith Dear — Fri, 01 Aug 2025 13:36:10 GMT

This article is really ‘just’ an update on AI progress. All the developments here are from the last three weeks or so. I would go back meticulously to the last time I posted, but the article would be too long and the point is made more neatly here by not trying to cover everything. Progress is still astonishing.

Why now? Well, rumours are rife that GPT-5 - the model that OpenAI CEO Sam Altman says ‘scares him’, and that seems to have convinced him we are past AGI and that the target is now ASI, super-intelligence (variously defined [1]) - is ready for roll-out and release this month (August 2025). Indeed GPT-5 and various derivatives may have been trial-released in LLM Arena last weekend, as Zenith, Summit, possibly Lobster, Nectarine, Starfish, and o3-alpha, drawing breathless commentary for their performance [2].

After the GPT-5 release happens, there’s a very real risk, and a high likelihood, that the remarkable recent progress is eclipsed and under-discussed. As Rob Bassett-Cross (CEO Adarga AI) and I wrote in our article ‘Don’t Blink’ - AI watchers daren’t look away for a moment, such is the pace of progress. My concern remains that as a country, a society, and in many of our companies, we in the UK are still under-estimating the speed of AI progress, and the profundity of its implications.

This under-estimation is despite the discussion on X and Substack beginning to break into the mainstream. Tom Whipple wrote in the Times last week (25 July):

‘I know the riposte. Humans always fear machines. …AI won’t take your job, goes the mantra, but if you don’t get with the revolution, someone using AI will. My response? You’re not listening.’

But even with views like this in the thumpingly centrist Times, nothing really changes in UK policy circles. But it changes everywhere else.

In the past few weeks, for example, an essay collating the thoughts of leading Chinese thinkers on AI shows their emphasis on Open Source, diffusion, ‘talent’ as espionage vector, and regulatory standards as the route to values dominance. A hyper-pragmatic, and believable, approach to the otherwise unequal competition with US Big Tech. It is free of the delusions and not-really-even half-measures dressed up as solutions that characterise the UK’s response (and that get almost no scrutiny or criticism from our media, for which, Whipple notwithstanding, this remains a fringe issue - as opposed to perhaps the most disruptive technological revolution in human history).

We’ve seen the Chinese Premier Li Qiang announce a plan (consistent with the above) for China to launch an ‘Organisation for AI Cooperation’ based out of China but with global aspirations.

We see in China a growing focus on ‘embodied AI’ - accelerating diffusion of AI into its industrial base as a way to advantage, perhaps also making the vast amounts of data it collects available, to influence future models.

That’s just China. RUSI reports on the scale of US deals with the Middle-East on AI investments as ‘staggering’. But these investments are not staggering - they are a proportionate reaction to the continued acceleration of AI progress, an anticipation of the rate, direction and implications of this progress - they are other nations’ recognition that they can’t afford to be left behind.

Yesterday (31 July 25) we read that Norway has struck a deal with the US to build one of its Stargate AI projects (the US ‘Manhattan Project for AI’ announced in January) in country, a gigafactory, an AI data centre entirely powered by renewable energy, to supply Europe. Norway and US business are investing $1bn in the first phase of this project. ‘Europe needs more compute’, Sam Altman notes. More than just ‘more compute’, we - UK and Europe alike - need a credible plan proportionate to the threat and opportunity posed by AI.

Here’s why. In the past few weeks:

Nature Chemical Engineering reports that self-driving (AI/automated) labs are producing new materials, in the field of materials science, ten times faster, processing ten times the data, accelerating discovery exponentially.
Nature Communications reports that self-driving labs have discovered, in the venom of snakes and spiders, 2,000 new antibacterial proteins that could serve as alternatives to current antibiotic treatments. Of these, 58 have been tested so far, and 53 had medical applications such as killing the currently drug-resistant bacteria E. coli and staph - vital in a world of growing anti-microbial resistance.
A survey published back in March showed agent-based AI models’ increasing abilities in completing software development tasks. In 2022 they could complete, at the touch of a button, tasks that took humans 30 seconds. In 2025 they are completing tasks that take humans one hour. Their performance was doubling every 12 months, but is now doubling every 7 months. On this current trend, even without further acceleration in the rate, by 2027 they will be able to complete tasks that take humans a month. By 2029 tasks that take humans a working year.
I re-report that because now, further research shows these improvements in Agentic AI ‘task scaling’ are also being seen with Agentic AI in scientific reasoning, maths, robotics, the use of computers for tasks, and in self-driving systems.
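For readers who want to test the doubling arithmetic themselves, here is a minimal Python sketch. It assumes a steady doubling time and a starting point of one-hour tasks, as described above. The function names and the hour counts for a working month and a working year are illustrative assumptions of mine, not figures from the survey.

```python
import math


def horizon_after(months_elapsed, start_hours=1.0, doubling_months=7.0):
    """Task length (in human hours) agents can complete after `months_elapsed`,
    assuming the horizon doubles steadily every `doubling_months` months."""
    return start_hours * 2 ** (months_elapsed / doubling_months)


def months_to_reach(target_hours, start_hours=1.0, doubling_months=7.0):
    """Months of steady doubling needed to grow from `start_hours` to `target_hours`."""
    return doubling_months * math.log2(target_hours / start_hours)


# Illustrative assumptions only (not taken from the survey):
WORK_MONTH_HOURS = 160   # roughly 4 weeks x 40 hours
WORK_YEAR_HOURS = 1920   # 12 such months

if __name__ == "__main__":
    for label, hours in [("working month", WORK_MONTH_HOURS),
                         ("working year", WORK_YEAR_HOURS)]:
        months = months_to_reach(hours)
        print(f"{label}: about {months:.0f} months of doubling from a 1-hour task")
```

The answer is sensitive to the inputs: what counts as a working month, which starting horizon is used, and whether the doubling time keeps shortening all shift the resulting dates, so treat the output as a rough guide rather than a forecast.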

Most significantly, in my view, since last I wrote, mathematicians have had their Garry Kasparov/Lee Sedol moment, when AI surpasses them and they wonder what they are for. This was triggered by ‘general’ models like OpenAI’s latest reasoning model and an advanced form of Google’s Gemini winning ‘gold medals’ at the International Maths Olympiad (IMO). The achievement prompted both amusement and amazement. One Data Science Professor posted about the poor performance and limitations of general AI models on Olympiad tasks just hours before their success was announced and was much mocked. In his defence, this was faster progress than some of the most bullish experts had predicted, and is another example where progress has been consistently under-estimated. The question ‘When will an AI win a Gold Medal in the International Math Olympiad?’ was forecast as follows:

July 2021: 2043 (22 years away)

July 2022: 2029 (7 years away)

July 2023: 2028 (5 years away)

July 2024: 2026 (2 years away)

July 2025: AI wins a gold medal in the International Math Olympiad.

This is not the first time I have written out these kinds of estimates to illustrate how difficult humans find it to understand exponentials, to forecast AI progress.

On Friday, seven days before I clicked ‘publish’ on this post, AI performance on the ARC AGI 2 benchmark [‘Abstraction & Reasoning Corpus for Artificial General Intelligence 2’ – an AGI test redesigned after ARC AGI 1 was surpassed in December 2024] jumped from 16.5% to 19%. ‘Only’ a 2.5-point gain, it represents a >15% relative improvement in capability. If we assume exponential growth here too, given this benchmark was launched end March 2025, and models were scoring ~12% by end April, it could be fully solved by ~August 2026.
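The two calculations in that paragraph, the relative gain and the exponential extrapolation, can be sketched in a few lines of Python. This is a rough illustration only: the function names are mine, and the choice of anchor points (about 12% at end April and 19% roughly three months later) is an assumption read off the figures above.

```python
import math


def relative_gain(old_score, new_score):
    """Relative improvement of new_score over old_score (0.15 means 15%)."""
    return (new_score - old_score) / old_score


def months_to_target(score_then, score_now, months_between, target=100.0):
    """Months until `target` is reached, assuming the score keeps multiplying
    by the same monthly factor it showed between the two observations."""
    monthly_log_growth = math.log(score_now / score_then) / months_between
    return math.log(target / score_now) / monthly_log_growth


if __name__ == "__main__":
    # 16.5% -> 19% is a 2.5-point gain, which is more than 15% in relative terms.
    print(f"relative gain: {relative_gain(16.5, 19.0):.1%}")
    # Assumed anchors: ~12% at end April 2025, 19% about three months later.
    print(f"months to 100%: {months_to_target(12.0, 19.0, 3):.1f}")
```

A benchmark score is capped at 100%, so a pure exponential is a crude model, and different anchor points move the projected date by months either way.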

We’ve seen DeepMind’s Aeneas AI historian decode and complete inscriptions and text, adding nuance and contextualised understanding. We saw DeepMind launch AlphaEarth, where the impact on espionage and military operations should be obvious. Take this summary from one commentator:

‘AlphaEarth Foundations does something clever -- instead of drowning in petabytes of Earth observation data, it creates compact summaries of every 10x10m square on Earth by fusing optical, radar, LiDAR, and climate data. ...it can see through clouds in Ecuador and reveal hidden agricultural patterns in Canada.’

As I’ve written before, only in the world of AI could developments like these two new DeepMind models and milestones be essentially an addendum to the main points. More likely than not, all the developments reported herein will soon be eclipsed by, and quickly forgotten after, the release of GPT-5.

A reminder - as I wrote in In Athena’s Arms - ‘…this is not a millenarian declaration of deterministic certainty.’ Maybe progress will stop. Maybe someone will declare ‘AI will never’ - like chess, Go, the IMO - and this time be right. Maybe someone will point to a fundamental barrier to AI progress that we can’t innovate our way around, through or over. But right now, the ‘straight lines on graphs’ that have driven and reflect AI progress for almost two decades show no signs of stopping. Very few people seem to be taking this as seriously as you might expect given the evidence - trying to plan, build, invest, govern, write, report, or converse as if AGI could be with us imminently, or in the next few years. Hope is, for too many, the entirety of their AI strategy.

We are still under-estimating the speed of AI progress, and the profundity of its implications.

[1] There is a great tyranny of definitions here, explained at length in previous posts’ footnotes. You’re on your own this time, unless you want to go back and check the previous caveats and explanations.

[2] e.g. https://x.com/kimmonismus/status/1949037129163547083, https://www.reddit.com/r/singularity/comments/1m9v5s5/what_model_is_summit_on_lmarena/ & https://www.linkedin.com/posts/emollick_kinda-amazing-the-mystery-model-summit-activity-7355073850738413568-H_pc

UK Strategic Defence Review (SDR): Reviewing the Review & the Reactionariat.

Keith Dear — Wed, 04 Jun 2025 06:02:09 GMT

The SDR published yesterday (Monday 2 June) is the best of my lifetime. It is sharper, more honest, and more structurally ambitious than any I have read—including those that predate today’s formalised processes and those written by nations beyond our borders.

But it is already at risk. Not because of what’s in it. But because of what’s being said around it.

I’m writing this not just to praise the Review, nor simply to critique its flaws. There are several, some are serious. But more urgently: to warn that the opportunity it presents could be lost amid noise, spin and superficial commentary. This would be a failure of national consequence. I hope the blog might reach those who can help us avoid it.

Distinguishing Review Signal from Political Noise

The misunderstanding and mischaracterisation create three principal risks:

We could miss the moment to prepare for war—and thereby make it more likely, and more disastrous.
We could miss the chance to raise the level of our defence debate, after years of kayfabe, fake budgets, and misdirection.
We could lose the Review’s careful and detailed analysis amid the thunderous political ‘announceables’ that risk drowning it out —many of which are either unrelated, outdated, or contradictory.

Let us take the third first.

In the run-up to the Review’s publication, there was a story that the UK would now fund and field a second, fighter-jet-launched nuclear capability, in addition to the continuous at-sea deterrent, our nuclear submarines with nuclear missiles. This is not in the Review. Regardless of its potential merits and demerits (I think on balance it is a sovereign capability we need), or indeed whether we do commit to this following the talks that the Times tells us are being held with the US, the Review does not recommend it. It was certain, though, that announcing it would drown out almost everything else in the debate. Similarly, the announcement of a £15bn new ‘sovereign warhead’ was always going to grab headlines, but isn’t new. Likewise, the announcement of twelve new submarines is not in the Review [0.1], which instructs the MOD to confirm numbers against requirements. Announcing all this as the Review was published obscured what the Review actually says.

The Flaws are Clear

Many criticisms of the Review are justified. Decoupling it from a Spending Review, as happened in 2021 during the Integrated Review, neutered it. It is a standard Treasury tactic and it enables the Government to do what, institutionally, it prefers – make policy and strategy word-smithing and empty rhetoric, a kayfabe, while the gritty, largely boring reality of implementation and hard choices is hidden: made by default, rather than decision, choices made when the headlines are dominated by wider cuts, spending and taxes on non-defence issues. Relatedly, the MOD reserving for itself the decisions on what equipment it needs, to come after the Review, makes it unlikely they would sign off any recommendations the Reviewers might want to include on what and where to cut. It cannot be an ends-ways-means review, a strategy, if denied or constrained in the opportunity to comment on what means - financial, and in kit and equipment - are needed.

In consequence, this Review has been emasculated by being separated from the spending review. As many commentators have noted either:

a. Defence needed a lot more money. Or
b. the Reviewers needed to show where they would cut, take risk and transform.

They have done neither. But they were presumably prevented from doing so. One wonders what the Review looked like before the five-month period of ‘Decision-making and finalisation’ as the Review team ‘finessed…[their]…recommendations’ during ‘Engagement across Whitehall…’.

The language seems dumbed down in many places, enabling consent and evade (or ‘toxic compliance’ as I heard it called recently). For example, “high-low mix” can be whatever the MOD decides it wants it to be. The Digital Targeting Web is clearly a “Kill Web”; the failure to call it what it is shows, I think, the lowest common denominator compromise that has infected parts of the Review. On procurement, we are still allowing some equipment to be built on 15-20 year timelines, with individual capabilities taking 18 months+ to manufacture. I don’t think you need to see the classified info (and to be clear, I have not) to see that this is a recipe for cost overruns and failure. The tech will improve a thousand times or more during the life of programmes, and requirements will change in geopolitical cycles that can be hours and days, or just a few years. Most importantly - we must be able to build stuff faster than our enemies can destroy it. What’s the ratio between expected attrition rates and the production speed of new platforms like FCAS or SSNs?

Things are often under-specified, for example, the MOD will likely just rebadge its multidomain integration stuff as a “digital targeting web” and keep on failing.

There are the usual linguistic atrocities. But these fall into the category of niche objections that might matter, but not to most people. Fellow pedants, find more in the footnotes.[1]

But there is a real danger here that ‘the Reactionariat’, of which I am now a part, misses a historic, urgent and vital opportunity to use the Review to raise the quality of our Defence debate, to hold the Government and the MOD to account. Strategic Reviews rarely give us a strategy - they should, and that they don’t is a failing of our system [1.2]. But they can still be influential - harnessing the zeitgeist, empowering reformers - and through a kind of ‘narrative dominance’ they can change the conversation, and with it the priorities and activities in Government. This is where the risk of missing the opportunity given by this excellent review lies.

The Review has Given the Nation an Opportunity we Must Not Miss

In diagnosis the Review is clear and comprehensive in saying how bad things are, overturning rocks, and calling out how urgently things need fixing. You can see where it has been dumbed down I think, not quite being able to say that if we fight tomorrow it would be a catastrophe. But it is very close, and you don’t have to work hard to figure out that is what it says. I think you could sum this up as saying it would no longer be possible for HMG to say it did not know how bad things are. I think to have got this through the Whitehall system – maybe even more in need of reform than that within the MOD - is a significant achievement. It is detailed, exacting and far-reaching. Genuinely impressive in form and content. It could, if understood for what it is, drive a much better debate on defence.

There are important structural changes - Cyber-EM Command unifying electronic warfare (we’re actually pretty good at this, in narrow pockets that are permanently at risk of being killed by the system). Similarly, MSHQ and killing Levene are a big deal. Something we tried to get done when I was serving in 2018, again in 2020 when in No10.

Under the current SofS John Healey’s leadership, and now public in the Review, CDS now has - or should have according to his revised role described in the Review - some meaningful authorities, rather than just sitting as an adviser to SofS and the PM. We can stop talking about a civilianised ‘Head Office’ with its appalling culture and start talking about a proper HQ that can be held accountable (without this, no one was in charge and everyone blamed everyone else for continuous failure). These problems were and are systemic, not individual, but they incentivised and rewarded some pretty toxic behaviours. They were part of the reason that lowest common denominator compromise consistently triumphed over confronting issues. We should now have not three and a half (STRATCOM) forces designed separately, who figure out how they will fight later, but one force, centrally designed and focused on how we would fight in NATO vs Russia (I personally don’t think this is the right way to think about the threats we face in an era of systemic competition, but it isn’t an unreasonable approach [2]).

The Review has a ‘demand’ for specific metrics, on lethality, productivity, and national economic impact. There is a section designed to ensure accountability (p.50). There’s a call to cut “by at least 50%” MOD regulation that kills industry. The kill web has a deadline - MVP by 2026, fully operational by 2027 – and where described as a ‘clear unifying mission’ seems to be an attempt to give us a proper theory of winning that we haven’t had since the end of ‘Air-Land Battle’ which passed with the Cold War. The commentariat could do much to push this to become a full operating concept, a theory of winning that drives force design based on DARPA’s ideas of the Kill Web, and Mosaic Warfare (or decision-centric warfare).

The focus on the way war has been transformed by technology runs throughout, describing how this is not an event, but rather that technology is continuously redefining warfare. This is, I think, correct and not really disputed (the degree to which it is true is argued, not the fact that it is).

Criticisms that there is not enough on emerging technology and procurement are unsustainable – some of the Review’s recommendations were clearly taken by the Treasury and announced early, but they are well covered and pushed further: “Major modular platforms (contracting within two years). Pace-setting spiral and modular upgrades (contracting within a year). Rapid commercial exploitation (contracting within three months), with at least 10% of the MOD’s equipment procurement budget spent on novel technologies each year.” The MOD is pushed to engage with VC and Private Capital in ways it currently does not.

The commitment, publicly, to making the UK’s Armed Forces ‘the most innovative in NATO’ echoes something I argued for in 2020 for the 2021 Integrated Review. To be meaningful, it will need measures and metrics – how will we know we have achieved this? But these are not impossible to imagine. I offered suggestions here in 2021. You could adapt the CBI’s measure of innovation pull-through ‘new to market new to firm inventions’ – as ‘new to frontline, new to service inventions’. This Review gives Parliament and media the chance to propose how this might be metricised, measured or judged, and hold the MOD to account either for delivering against them, or to propose its own. There are a lot of deadlines – any reader of Government documents will recognise how rare this is in HMG policy. It is a huge opportunity for Parliament, media and public alike to hold Government accountable for detailed delivery. In shouting down the Review, the reactionariat risk missing a generational and urgent opportunity to make us safer.

Personally, I would have liked to have seen a commitment to a 6-monthly independent review of progress vs the recommendations, submitted to the PM and Parliament, and available to the public. That was what embedded Levene’s reforms, mostly for the worse in the long run, but the process teaches lessons: people respect what you inspect and value what you reward. I would have had named owners for each recommendation, and reported on their progress in those regular independent external inspection reports. Of course Parliament, media and commentariat could now start making the argument for this, to help those reformers on the inside doing the same, to push the system to reform. To improve our Defence debate. This was probably always beyond what the Reviewers could achieve. Neither this Govt, nor the last, nor any in my lifetime would have agreed to that - it needs one under greater political pressure (where scrutiny now could help), or one with greater confidence, and/or one that viscerally believes in what this review tells them: war could be imminent, and we would likely lose. If you’re not steeped in this stuff, that is hard to believe.

Examples of how far-reaching and thorough-going this Review is run throughout, but are particularly notable in the chapters on Defence medical, education and training – again calling out shortfalls and failings clearly, with recommendations for how to get to a solution. A crucially important example – especially since everyone, reviewers and critics alike, seems to agree we aren’t spending enough, won’t spend enough, and are not ready to deter or fight a war - is the call for a proper plan for mass mobilisation, as Sky’s Deb Haynes and I called for this time last year. How we prepare to rapidly increase our resilience, preparedness and ability to fight across the whole of society and government, if war comes while our conventional armed forces are too weak to succeed or sustain in the fight, is not just prudent planning, it is a part of deterrence too. Similarly, making defence infrastructure an operational consideration, and not one civilianised, lobotomised, and focused only on cost, efficiency and short-termism, is a huge shift. The form of this Review is better - the citizens’ panels a neat innovation. The substance on a national conversation - expanding cadets - is good. None of these are things past Reviews have covered anywhere near as well.

Armed Forces

The recommendations on the RN & Army are radical and right. Worth reading these sections alone (short, digestible). The recommendations on the RAF are in my opinion dangerously inadequate. There is nothing on what the high-low mix should be. Little on where the RAF carries risk in current operations. Far too much scope for consent and evade. The diagnosis has either pulled punches or has been too superficial. We should remember that the notional independence of the Review makes this the Review Team’s failing, not the RAF’s. Since this Review offers only recommendations, it does not mean the RAF can’t still be clear-eyed in diagnosing and addressing its own challenges with the clarity, rigour and radicalism that is present in the Review’s section on the RAF’s sister Services.

The section on Space disappoints. The U.K. continues to radically underestimate its importance and centrality to our future prosperity and security. I don’t think the Reviewers, or almost anyone else, agree with me on this. But I stand by it (it requires AI leadership to turn its potential for our relative power into reality). The review is good, within the parameters of the existing debate, where space is just the bit above the air domain, and the issues are access and control - not trade, industry, colonisation. This should be an area for the National Security Review. I am pretty certain they will fumble this too. It is too far outside the Overton Window, and would blow the mind of a Treasury trying to scrape money together for social spending. We’ll all be much poorer and less secure for this in 10 years’ time, but that kind of thinking and planning is hard in Whitehall. For Defence to contribute to the debate and the capability, we needed an independent Space Force, as both I and others have argued.[3] A Space Force could argue for funding and develop expertise in the domain on equal footing with those representing the Air, Land, Maritime and Cyber domains. We still don’t have one. Not a surprise, but disappointing.

On Cyber-EM. This is complex. There is a continuity across cyber and EM (I prefer ‘electronic warfare’ - narrower, clearer). But they are also distinct. And they run through everything else. I think we need an independent Cyber Command to argue for resources as an equal to space, air, land, sea domains. But this doesn’t solve the EW problem. And there is no chance the Treasury, or the existing services would allow two new Commands. It was never a likely outcome. We probably have the least bad option here, and a big improvement on the status quo.

Info Ops for me is central (war is about coercing decisions): even a nuclear strike is an attempt to coerce a decision, ultimately for psychological effect, and so too is every bullet fired. Yet it is a side note in the Cyber EM section. Again though, this is the orthodoxy. I am not a lone voice, and I think logic is on our side. But it is an argument we have not yet won.

On intelligence – the Review is very good on calling out our weaknesses in Counter-Intelligence – which we must prioritise, indeed in which we need to be best in class, or everything else is undermined. I am less sure on the radical merging of the intelligence branches - I think this will kill off necessary specialisation. I also note that the Reviewers don’t call for the re-establishment of Defence Economic Intelligence (another thing I think is important, but not many talk about as an issue), and should have done. This is an era of geoeconomic competition. We have no economic intel function in HMG (killed in Defence Intelligence in the 1990s). Also necessary to support Defence Exports. Perhaps the National Security Review that Government has said will follow, will cover this. But I am not optimistic and I think the Defence Reviewers missed an opportunity here.

Reviewing the Review

A short way of seeing how much better this Review is than anything that has come before in my lifetime would be to print off the MOD’s 2020/21 Command Paper, and read them side-by-side. The former was bad at the time. It is shamed by this.

I think a fairer judgement than the many who are saying this is just ‘more of the same’ would be that the Review gives the MOD enough that with the structural reforms and a genuine SofS commitment to real implementation, it could be transformative.

Without SofS commitment, the dumbing down of language and stopping just short - i.e. stating the need for metrics mostly not specifying what they should be, not saying what should be cut, not offering a view on balance of crewed/uncrewed, on AGI timelines, specifying a need for kill web without the fuller “theory of winning” (op concept) leaves it largely impotent to address the threats we face. It can only describe and recommend.

The Reviewers were hamstrung by this being, as Matthew Savill has said, ‘Schrödinger's Review’: neither independent and external, nor internal and owned by the MOD. Thus they have been prevented, I would guess, from going as far as they wanted. But it is clear that they have used what political power their position gave them to preserve much that is valuable.

The problem with the fake budgets, rhetoric, and vapourware, the noise around the Review, is that they sound exactly like ‘business as usual’ and everyone is acting accordingly, taking pot shots at a Review that seem to me to be frequently completely detached from what the Review actually says. Sometimes ‘control K’ in the SDR for whatever key word of criticism is being levelled can expose this. It isn’t hard to check.

The noise is at worst a political attempt to obfuscate what the Review actually says, e.g. explicitly and directly “business as usual is not an option”, and which otherwise - even in its likely watered down form – exposes the gaps between years of rhetoric and reality too clearly, and in ways that are not advantageous for a Government that knows there are still not many votes in Defence. Even at best, if not deliberate, the noise results from a misreading of the Review’s contents. Whatever the reason, there is a real risk that the ‘announceables’ obscure what the Review actually says, harming the debate the Review calls for before it has even started. Drowning out signal in noise. Leading the commentariat to discredit the Review, making association with it low status, ensuring that narrative dominance - the only way in which this Review can succeed - is never achieved.

My call to action is this: read the Review in detail, and celebrate it. Make association high status. Use it to push the recommendations further, to empower the reformers whether they be Ministers, civil servants, or military. Theirs is a difficult fight; the Review gives them material support. Allies - you - could help them win.

An effective Defence is, as the FT’s Janan Ganesh said of growth, something everyone in Britain wants, but, whenever a trade off is required ‘just not that much’. The Government, Secretary of State John Healey and his team in particular, are to be commended for creating an opportunity for a debate that might change that. Commissioning even a semi-independent Review was brave. In publishing and accepting the recommendations of what could be a radical review, the Prime Minister and his Government have created the space for and seeded a much better national conversation on just how much we want to spend on defence, vs the risk of our sons, daughters, mothers, fathers, friends and colleagues, coming back in body bags, being overrun in battles we fail to deter and are not fit to fight. What happens next matters hugely. I write this in the hope that reformers in and across Government, those in Parliament, media, and public alike can make the most of this most remarkable of Reviews. It must not be a return of what Cabinet Secretary Simon Case once called ‘the curse of the missed opportunity’. As the Review quotes one of its ‘Citizen Panellists’ on defence: [otherwise] ‘…it won’t be long before something more significant happens and we will think it should’ve been more of a priority.’

[published 0700; updated 09:33, 21:58, 4 June 2025]

[0.1] Added 2100, 4 June 2025. I am grateful for having this pointed out so I can correct the record: technically this is not quite right, in the SofS foreword he does talk about the ambition to get to 12 SSNs. But I offer two defences: (1) my defence is of the ‘independent’ Review, and the SofS foreword cannot, by definition, be in any way independent of the MOD; (2) in the foreword it is contextual and soft-pedalled, not trumpeted to the point where it drowns out discussion of anything else, as happened when this information was provided ahead of the Review as ‘a response to it’ (https://www.gov.uk/government/news/uk-to-expand-submarine-programme-in-response-to-strategic-defence-review).

[1] ‘Warfighter’ should be excised from the lexicon. It is a hair’s breadth from ‘Warrior’ which is even worse – and I think misunderstands how military dominance is achieved. The Spartans were defeated by the better organised forces of Thebes. The Romans dominated the ‘warriors’ of Europe through superior organisation. The Vikings were routed not by lone berserkers but by the more centrally organised Anglo-Saxon levy at Stamford Bridge in 1066, where Harold Godwinson’s disciplined fyrd outmatched their hit-and-run tactics. In each case, it was organisation—training, supply, command and control—not individual heroics, that determined the outcome.

Picking out and trying to kill off a personal bugbear that finds its way into the review: “war is fought across all five domains but people and the land they live on are at the heart of it”. This is a half-baked zombie idea, and should be murdered, cremated, buried and sealed in the earth’s core so I never encounter it again. It is the Army’s attempt to argue for the Land domain as the decisive and most important. Which is nonsense. You win wars by imposing costs, and potential future costs, that the enemy state or nation is unwilling to tolerate. You coerce a decision. That’s it. There is no special premium on ‘the land people live on’. Similarly, the ongoing fetishisation of SF as ‘the tip of the spear’ needs to end. I think it infantilises the rest of the military and damages our overall effectiveness.

[1.1] e.g. see last year’s Liaison Committee report on Strategic Thinking in Whitehall, and multiple Committee reports before: 2011, Public Administration Select Committee, on the 2015 SDSR, on Syria, on the 2018 NSCR and Modernising Defence Programme

[2] I think abandoning the Indo-Pac tilt is an error. We are focusing on NATO, by which we really mean a land war in Europe, most likely an attack on the Baltic states, just as Russia is bleeding itself white in Ukraine, its war economy hollowing the country out unsustainably, and when we have two powerful new continental allies in the alliance, Sweden & Finland, anchoring the Northern flank and now committed under Article 5 to aid any member that is attacked. We are doing this when continental Europe’s Continental Powers - Poland in particular, but also the Czech Republic, Romania (transforming into a serious military power on NATO’s eastern front), the Netherlands and Denmark - are all rebuilding credible land forces. France, long prepared for expeditionary war, is now preparing for high-intensity conflict on the continent, and most notably Germany too is rearming for this land war - continental powers building continental armies to fight continental wars [see my article: https://wavellroom.com/2022/04/06/ukraine-the-integrated-review-land-power-in-europe/]. In such a situation, a UK maritime, air, cyber and space commitment to NATO would retain the flexibility to secure our maritime nation’s security and economic interests, and still make a massive contribution to the defence of Europe. Labour’s politically driven ‘NATO first’ mantra is interpreted by Whitehall as meaning they can forget ‘East of Suez’ again - an error when the deeper and more enduring threat, and the more likely war, comes from and with China, and when geoeconomic systemic competition (see IR 2021) from China threatens UK security at home and interests globally - esp in the Indo-Pacific, where the world’s largest middle class will soon be and where the world’s fastest growing economies are. I would have us plan not vs Russia with NATO but vs a range of very specific scenarios. Interested readers might see Murray & Knox’s ‘The Dynamics of Military Revolution’ (especially p. 192) for more on this last point: ‘…revolutions in military affairs have emerged from evolutionary problem-solving directed at specific operational and tactical issues in a specific theatre of war against a specific enemy. Successful innovators have always thought in terms of fighting wars against actual rather than hypothetical opponents, with actual capabilities, in pursuit of actual strategic and political objectives.’ [bold added, italics in original].

[3] Dolman, E.C., 2009. Victory through Air Space Power. Strategic Studies Quarterly, 14(2). Available at: https://www.airuniversity.af.edu/Portals/10/SSQ/documents/Volume-14_Issue-2/dolman.pdf [Accessed 10 March 2025].

Fletcher-Jones, C. (2024). In Favour of an Independent Royal Space Fleet: The Smuts Report and the Precedent of the Royal Air Force. The RUSI Journal, 169(1–2), 22–32. https://doi.org/10.1080/03071847.2024.2359405

Doctor Weirdlove or: “How I Stopped Worrying and Learned to Love Skynet”

Keith Dear — Fri, 09 May 2025 18:59:48 GMT

Figure 1. Stanley Kubrick's Dr Strangelove was a political satire, asking us to laugh at the logical extension of the doctrine of mutually assured destruction, to force consideration of what this meant in practice. In calling this article Dr Weirdlove, we acknowledge how insane some of this sounds. And yet it is the logical extension of the path we are on.

We’re worried. On the public evidence, current UK military understanding of artificial intelligence, both present capabilities and the trajectory of its development, is dangerously inadequate. The consequences of this misunderstanding are existential: for our service personnel on the future battlefield, for our armed forces in future combat, and for humanity itself.

The danger is twofold:

First, UK Defence radically underestimates current AI capabilities and the speed of their advance. If an adversary adopts AI-enabled decision-making and autonomous capabilities, and we don’t, we lose.

Second, UK Defence is not considering the implications for humanity of Artificial General Intelligence and war. Without solving AI alignment—the challenge of ensuring advanced AI reliably follows human intent—we risk handing power over humanity’s most lethal weapons to misaligned, super-intelligent machines.

We make three arguments, each logically building on the other.

1. AI-driven decision-making and autonomy are essential for our national security.
2. Embracing AI-driven decision-making necessarily involves Defence leaders in the broader AI alignment debate.
3. Therefore, Defence leaders must actively engage in this debate, clearly articulating the "Strangelovian" dilemma we face: the paradox of urgently needing AI-driven military capabilities despite the existential risks posed by AGI.

Caveat

Before we get into that, we want to be clear on what we are and aren’t saying in this article.

First, we recognise the need for a major programme to ensure the UK can, to use the USAF phrase, ‘fly, fight and win’ in future warfare. We are not opposed to the investment in this essential capability.

Second, the RAF’s decision to run the podcast series and talk openly about its next generation fighter aircraft ‘Tempest’, the wider Future Combat Air System (FCAS) of which it is a part, and the international collaboration, the ‘Global Combat Air Programme (GCAP)’ that is to design, manufacture, and deliver a next-generation crewed combat aircraft, is an admirable commitment to transparency.

Third, those speaking on the podcast showed courage in speaking publicly, spoke well, in the difficult circumstances of being on the record in an informal environment, and are to be commended. Without such transparency, debate is almost impossible.

Fourth, the announcement of the RAF’s procurement of StormShroud, an autonomous support to crewed fighters (or ‘loyal wingman’), is a step in the right direction and to be celebrated, but it does not change what is argued herein.

We hope to show why debate is necessary, but recognise we risk reducing the willingness to be open in the future. We hope that is not the case. In writing this post we wish to be clear that our concerns relate to the wider issues signalled in the Armed Forces’ understanding of Artificial Intelligence and autonomy, and are not aimed at individuals. We offer constructive criticism, genuine concern. We are seeking to play the ball, not the man.

Why pick on Tempest? Because we judge that the air environment is the domain which will be automated first, and because the long timelines of its development – 15 years before it enters service, ~45 years before it is expected to stop serving as the RAF’s frontline fighter – are well in excess of most estimates for when we reach AGI.

AI Capabilities

UK Defence currently misunderstands, and dangerously underestimates, the capability and trajectory of AI systems. A recent illustrative example: The Economist’s Shashank Joshi noted two concerning claims^[1] on the RAF’s otherwise excellent podcast^[2]:

1. AI today is ‘just a stochastic parrot, it is just advanced auto complete’^[3] … &
2. ‘We’re prepared for the time when AGI does catch-up.’

The first claim is heard widely in UK Defence, sounds smart, but isn’t. As Shashank noted on Bluesky, it is indicative of a ‘slightly impoverished view of frontier models that you see in defence & even parts of the intelligence world’. Commenting, Professor Ken Payne suggested this view reflects Defence’s ‘overconfidence, and signalling…expertise via knowing cynicism…’ rather than genuine understanding.^[4] Both Payne and Joshi are well connected and trusted commentators and experts within UK defence. They highlight that this is a bigger issue than a throwaway comment on a podcast.

The “stochastic parrot” concept originates from a 2021 paper by Emily Bender and colleagues, published more than 12 months before the release of ChatGPT. Today the idea is a misleading caricature of AI models. LLMs do not just parrot what they were trained on. Here’s one of the earliest academic deconstructions of what is wrong with that claim, from August 2022. Here’s Geoff Hinton, Demis Hassabis, Tomaso Poggio, Ilya Sutskever – leading experts in AI - discussing how AI can reason in analogies, and be creative (2024). Here’s a paper from 2024 showing evidence that LLMs can conduct inductive out-of-context reasoning and make inferences beyond their training data. Similarly, mathematician Stephen Wolfram explains how language is a world-model, from which LLMs are likely deriving their output. The ‘stochastic parrot’ claim is a zombie idea: long dead, it walks on.

The problem isn't just a misunderstood technical detail. The problem is that this error compounds into the future. It is a misunderstanding of what is likely to happen and at what speed. It misunderstands the stakes at play and the odds.

Three recent examples of how quickly AI is advancing, and why it matters:

1. ‘Humanity’s Last Exam’ (January 2025): There is a new AI benchmark test called ‘Humanity’s Last Exam’, a selection of extremely difficult questions which AI was initially very poor at.^[5] But AI is improving (six times better within two months) and is expected to surpass the test within 12 months.^[6] Imagine if you sat the hardest exam ever written, flunked it, and then improved at a similar rate – how smart would you be in 2 years, in 3, in 5?
2. AI’s Persuasive Power (April 2025): This week, we learn via an unauthorised experiment on Reddit that AI is now performing better than humans in persuading people to change their minds. Six times better.
3. Rapid Task Scaling. AI’s capability to perform increasingly lengthy human-equivalent tasks has surged dramatically. The length of tasks that AI can now complete, judged by how long they would take a human, is increasing. In 2022 AI could complete software coding tasks that would have taken a human 30 seconds. In 2025 it can complete tasks that would have taken a human 1 hour. The pace of advance (2024-2025) has gone from doubling every 7 months, to doubling every 4 months. If you understand exponentials, you’ll get the claim that this means agents will be doing tasks it normally takes a human 1 month to do by mid-2027. As one of us noted in a talk this week, at this rate of improvement, by 2029, the end of this Government’s Parliamentary Term and the Trump Presidency, AI will be doing tasks that take humans 2,000 hours’ labour - roughly equivalent to a human working year - doing this in days, hours, maybe even in minutes. By 2029. …FCAS is planned to come into service in 2040, and likely goes out of service ~2070.^[7]
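
For readers who want to check the exponential arithmetic in the third example, here is a minimal sketch. It assumes a 1-hour task horizon in early 2025 and a steady doubling time, both taken from the figures above; the 167-hour working month (2,000 hours divided by 12) and the function name are our own illustrative assumptions, not part of any published forecast.

```python
import math

def months_until(target_hours, start_hours=1.0, doubling_months=4.0):
    """Months needed for the task horizon to grow from start_hours to target_hours,
    assuming it doubles every doubling_months (a simple exponential extrapolation)."""
    doublings = math.log2(target_hours / start_hours)
    return doublings * doubling_months

WORK_MONTH_HOURS = 167   # assumption: 2,000 working hours / 12
WORK_YEAR_HOURS = 2000   # figure used in the text

for label, hours in [("working month", WORK_MONTH_HOURS), ("working year", WORK_YEAR_HOURS)]:
    for doubling in (4.0, 7.0):
        months = months_until(hours, doubling_months=doubling)
        print(f"{label}: about {months:.0f} months at a {doubling:.0f}-month doubling time")
```

At a 4-month doubling time the working-month horizon arrives in roughly 30 months and the working-year horizon in roughly 44 months from the baseline, which is consistent with the mid-2027 and 2029 dates above. At the older 7-month doubling time both dates slip by years, which shows how sensitive the forecast is to that single parameter.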

Can those ‘straight lines on graphs’ that have given us exponential progress in AI just keep going? The answer seems likely to be yes, until at least 2030. A recent Epoch AI study suggests “2e29 FLOP training runs will likely be feasible by 2030.” That’s 200 octillion floating-point operations (2 × 10^29) in a single training run. If that doesn’t help you, over on Less Wrong they explain: “In other words, by 2030 it will be very likely possible to train models that exceed GPT-4 in scale to the same degree that GPT-4 exceeds GPT-2 in scale. If pursued, we might see by the end of the decade advances in AI as drastic as the difference between the rudimentary text generation of GPT-2 in 2019 and the sophisticated problem-solving abilities of GPT-4 in 2023.” You’ve probably not really heard of GPT-2; it predates ChatGPT (built on GPT-3.5) by 3 years. It’s like the jump between the Model T Ford, when you could still argue horses were better for lots of work tasks, and the cars of the 1980s, when no one sensible was making that argument. We are looking at a similar or greater advance again by 2030.
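
A quick sanity check on the size of that number may help. The sketch below converts 2e29 into a short-scale name and compares it with a rough public estimate of GPT-4's training compute. The GPT-4 figure of about 2e25 FLOP is an assumption for illustration, not an official number, and the helper name is ours.

```python
import math

# Short-scale names for powers of ten (US/modern UK usage).
SHORT_SCALE = {18: "quintillion", 21: "sextillion", 24: "septillion", 27: "octillion"}

def short_scale(value):
    """Express a large number as '<n> <name>' using the short scale."""
    exponent = int(math.floor(math.log10(value))) // 3 * 3
    return f"{round(value / 10 ** exponent)} {SHORT_SCALE[exponent]}"

TARGET_FLOP = 2e29          # from the Epoch AI quote
GPT4_FLOP_ESTIMATE = 2e25   # assumption: rough public estimate, not an official figure

scale_up = TARGET_FLOP / GPT4_FLOP_ESTIMATE
print(short_scale(TARGET_FLOP))
print(f"roughly {scale_up:,.0f}x the assumed GPT-4 training compute")
```

Under that assumption, a 2e29 FLOP run is around ten thousand times the compute used for GPT-4. Note that FLOP here counts total operations in a training run, not operations per second (FLOPS).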

These are some of the latest indicators. This blog has covered where we are going and how quickly we are heading there, particularly in ‘Don’t Blink’ and ‘In Athena’s Arms’. On the Wavell Room, we have all argued independently that ‘the future is uncrewed’. We’re on a track to AGI, perhaps as soon as 2026, but with a median Metaculus estimate for ‘oracle’ AGI by 2028 (a year earlier than was forecast throughout 2024). Taken together, these examples starkly illustrate that AI is advancing at a pace Defence is failing to appreciate or prepare for adequately. To underestimate AI is to risk strategic defeat.

The AI Quarterback: Why Tempest Does Not Need Humans

Let’s explore the claim in the podcast that Tempest, the fighter aircraft at the centre of FCAS, ‘definitely won’t be remote’, for which we read ‘won’t be fully autonomous’, and that there is therefore a necessity for a human ‘quarterback’ to coordinate autonomous and crewed platforms in attack and defence. We think this claim rests on multiple flawed assumptions, and misunderstands both technological progress and operational realities.

Why would it be impossible for Tempest to be remote? One answer is that any kind of electronic emission can be detected, and so you can’t have a datalink to the aircraft, as you do for current remotely piloted drones. There are unchanging physics aspects to this argument. The range of the radio link to control an aircraft is defined by the power of the emission. A long-range aircraft, essential to operate over enemy terrain, would need very high-power emissions, which makes it easier to detect.
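
The physics can be sketched with the standard free-space (Friis) link equation. This is an idealised illustration only: it ignores terrain, the radio horizon, jamming and directional antennas, and the frequency, receiver sensitivity and function name are hypothetical values chosen for the example rather than anything drawn from a real datalink.

```python
import math

SPEED_OF_LIGHT = 299_792_458.0  # metres per second

def required_tx_power_w(range_m, freq_hz, rx_sensitivity_w, gain_tx=1.0, gain_rx=1.0):
    """Transmit power (watts) needed to deliver rx_sensitivity_w at range_m,
    using the free-space Friis equation with the given antenna gains."""
    wavelength = SPEED_OF_LIGHT / freq_hz
    path_gain = gain_tx * gain_rx * (wavelength / (4 * math.pi * range_m)) ** 2
    return rx_sensitivity_w / path_gain

# Hypothetical link: 1 GHz carrier, receiver needs 1 picowatt.
for km in (100, 200, 400):
    power = required_tx_power_w(km * 1000, 1e9, 1e-12)
    print(f"{km} km: {power:.2f} W")
```

Each doubling of range needs four times the transmit power, and the same inverse-square relationship means a stronger emission can be picked up by an adversary's receiver from further away.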

But this smuggles in an assumption, that there has to be a datalink if there isn’t a human in the cockpit. That ignores the speed of technology development. If an AI is able to take inputs of sensor data, process them faster and respond more effectively than humans, it can be fully autonomous. If it is fully autonomous, it doesn’t need a data link.

The argument that a human ‘quarterback’ - the ‘player’ coordinating the attack - is a necessity also assumes the crewed ‘quarterback’ platform must remain in (relative) close proximity to its autonomous wingmen. It must be close to reduce emissions and avoid detection when coordinating the attack. If not close, then the human quarterback has the same datalink problem that the long-range remotely piloted aircraft had. Proximity enables low-powered or visual communications, reducing (though not eliminating) detectability. However, this constraint applies equally whether coordination is human or AI-led. Although fully "zero-emission" platforms may be unrealistic, AI-managed coordination would likely significantly minimise emissions, surpassing even low-emission human-led approaches (AI communication is likely to be more efficient). Thus, human presence offers no inherent advantage in emission management or stealth - AI coordination can achieve equal or greater effectiveness with reduced proximity constraints.

A related argument is that you need a pilot because maybe electronic warfare – jamming, spoofing, frying of the aircraft’s electronic sensors and systems – would cause a fully autonomous fighter aircraft to drop out of the sky. This is indefensible. Current fighter aircraft are ‘fly by wire’. To optimise for agility they are inherently unstable, and the only thing that keeps them in the air is the computer constantly adjusting the configuration of the aircraft’s control surfaces. If an adversary is capable of jamming, spoofing or frying the electronic systems of an autonomous fighter, then they are equally capable of incapacitating a crewed aircraft, rendering the presence of a human pilot irrelevant to survivability. The distinction between crewed and uncrewed aircraft in terms of vulnerability to electronic attack is therefore illusory, and the solution lies not in insisting on human pilots but in hardening aircraft electronics against such threats.

Then there is the argument that there are minimal to no trade-offs involved in including a pilot in the cockpit, so long as the aircraft, as is the case with Tempest, can be optionally crewed. Again this seems indefensible.

In all major wars, pilot loss has been more of a limiting factor on a nation’s ability to fight than aircraft loss. We can manufacture aircraft faster than we can train pilots. This was true of both RAF pilots in the Battle of Britain and Luftwaffe pilots in the Defence of the Reich. In the Ukraine war, shortages of pilots have limited both Russian and Ukrainian air operations – and have been specifically cited as a reason for delaying the provision of F-16s to Ukraine.

Another argument is that pilot skill is irreplaceable. Yet today AI-driven drones have begun to outperform human pilots in complex, high-speed drone racing scenarios, dodging through woods and around complex obstacle courses at impossible speed with super-human agility, signalling a significant shift in the capabilities of autonomous flight. The USAF’s Venom programme sees autonomous ‘VISTA’ F-16s dogfighting (Basic Fighter Manoeuvres), something we were told in 2020 wouldn’t happen for ‘in excess of a decade, maybe two’. That estimate came from a USAF pilot who was a relative AI optimist, schooled in DARPA’s simulator-based AlphaDogFight trials, where the humans lost. As another of those defeated pilots admitted: “If I were to walk away from today saying I don’t trust the AI’s ability to perform fine motor movement and achieve kills, I’d have a lack of integrity…”

The quarterback analogy overlooks another crucial ‘skill’ limitation: humans have extremely limited capacity for simultaneous attention. This is why texting while driving is illegal: it multiplies crash risk by a factor of twenty-three. Coordinating multiple combat aircraft is vastly more complex than texting, demanding constant real-time awareness, quick decisions, and continuous communication across multiple platforms. Human pilots simply don’t have the cognitive bandwidth to fly their own aircraft effectively while simultaneously managing a swarm of others. Swarming technologies, multi-agent systems, can do this better, faster, more responsively. And surely after AlphaGoZero in Go (2017) and AlphaStar in the game Starcraft II (2019) – which required breakthroughs in AI that could account for game theory, imperfect information, long-term planning, real-time play and large action spaces – we don’t need to make the case that AI is superior in coordinating complex tactical plays? It’s a big bet against the odds to suggest you’ll need a human quarterback for their tactical skill when Tempest is deployed in 15 years (2040), let alone in 45 (2070) when the aircraft is expected to still be operating.

Then there is the argument that aircraft capability isn’t majorly impacted by having a human in the cockpit. But the limitations this places on how many ‘g’ an aircraft can pull are a problem. Even marginal differences can determine whether an aircraft is taken out or not. The presence of a human pilot imposes physiological limits on an aircraft's manoeuvrability, particularly regarding g-force tolerance. While modern fighter jets are structurally capable of executing manoeuvres exceeding 9g, human pilots typically can withstand only up to 9g for short durations, even with the aid of g-suits designed to mitigate the effects of high acceleration forces. Exceeding this threshold risks G-LOC (g-force induced Loss Of Consciousness), compromising mission effectiveness and pilot safety. Unmanned aircraft, free from these human limitations, can exploit the full aerodynamic potential of their designs, allowing for more aggressive manoeuvres.

Maximum load factor (‘g’) is not the only important factor in air combat manoeuvrability that full autonomy aids. An AI would likely hold an aircraft at the edge of its performance envelope much more effectively. A further transformation will come from the AI quarterback & autonomous systems’ ability to execute precisely coordinated, multi-aircraft tactics: complex three-dimensional manoeuvres performed with superhuman timing, precision, and strategic coordination. Uncrewed aircraft collaborating dynamically can preserve formation energy, manipulate adversaries into vulnerable positions, and optimise defensive manoeuvres to evade threats with accuracy no human team could manage.

Enhanced agility - individual and formation - can be crucial in combat scenarios, where even marginal improvements in manoeuvrability can significantly reduce the probability of being hit by enemy missiles: high-g manoeuvres in the end game can be the difference between the aircraft being destroyed or the missile being evaded. There’s a reason pilots and intelligence officers scrutinise ‘doghouse plots’ that illustrate an aircraft's performance envelope vs those of adversary aircraft and adversary missiles. It’s the difference between life and death. Mission success and mission failure. If having a pilot in the cockpit imposes even small limits on manoeuvrability, then the cost of having a human quarterback is reduced probability of mission success.
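
To make the link between load factor and a doghouse plot concrete, here is a minimal sketch using the standard relations for a level, coordinated turn: radius R = V²/(g√(n²−1)) and turn rate ω = g√(n²−1)/V. The 250 m/s speed and the 12g uncrewed limit are our illustrative assumptions, not figures for Tempest or any real aircraft, and the model ignores the thrust, lift and structural limits that bound a real envelope.

```python
import math

G = 9.80665  # standard gravity, m/s^2


def _turn_term(load_factor: float) -> float:
    """sqrt(n^2 - 1): the horizontal component of load factor in a level turn."""
    if load_factor <= 1.0:
        raise ValueError("a level turn needs a load factor above 1g")
    return math.sqrt(load_factor ** 2 - 1.0)


def turn_radius_m(speed_ms: float, load_factor: float) -> float:
    """Radius (metres) of a level, coordinated turn."""
    return speed_ms ** 2 / (G * _turn_term(load_factor))


def turn_rate_deg_s(speed_ms: float, load_factor: float) -> float:
    """Turn rate (degrees per second) of a level, coordinated turn."""
    return math.degrees(G * _turn_term(load_factor) / speed_ms)


if __name__ == "__main__":
    speed = 250.0  # m/s, assumed for illustration
    cases = (("crewed, 9g", 9.0), ("uncrewed, 12g (hypothetical)", 12.0))
    for label, n in cases:
        radius = turn_radius_m(speed, n)
        rate = turn_rate_deg_s(speed, n)
        print(f"{label}: radius {radius:.0f} m, rate {rate:.1f} deg/s")
```

Under these assumptions, moving from 9g to 12g tightens the turn by roughly a quarter and raises the turn rate from about 20 to about 27 degrees per second. A doghouse plot is essentially this calculation repeated across the speed range, with the real aircraft's limits overlaid.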

Then there is the question of the wisdom of a programme that takes 10-20 years to deliver a capability in an age of exponential technical progress. It was only 14 years ago that US Venture Capitalist Marc Andreessen argued ‘software was eating the world’ and it was edgy and futuristic – now everyone accepts this, and companies such as Anduril are challenging Lockheed and Boeing – while increasingly software-driven drones are transforming warfare and rendering many capabilities obsolete. Twenty-year-old assumptions are likely to be wrong, and costly – and they are likely to be more wrong 20 years in the future than they were from 20 years ago to today. Today progress is accelerating faster than ever, and yet still poised to leap to even greater speed. We are on the brink of Dario Amodei’s “…‘compressed 21st century’ the idea that after powerful AI is developed…[Amodei suggests in 1-3 years]…we will in a few years make all the progress in biology and medicine that we would have made in the whole of the 21st century.” We think Amodei chose biology and medicine as fields that are media-palatable – this would be just as true across the sciences, and thus weapons and military equipment development.

Is it safe to bet on a human ‘quarterback’ in an aircraft in 2040, and change later if that turns out to be wrong? No, there are consequences. Getting this bet wrong means spending money on developing a platform that will be rendered obsolete at some point prior to its deployment in 15 years' time, or within the 45 years between now and the aircraft’s projected end of service life.

The risk is that we are wasting money investing in the wrong things, and will waste more in the future, since changing designs is a major cause of MOD cost over-runs.

Then there is the issue of opportunity costs. The MOD has budgeted £12bn over ten years for FCAS. DeepMind costs Google between ~£164M-£477M a year, or ~£10M per study.^[8] Imagine what we could do if we invested similarly well.
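
To put those numbers side by side, here is a back-of-envelope sketch using only the figures quoted above. It assumes, for simplicity, that the FCAS budget is spent evenly over the ten years and that the DeepMind figures are comparable year to year, neither of which will be exactly true.

```python
# Figures quoted in the text (GBP); everything derived from them is rough.
FCAS_BUDGET_GBP = 12_000_000_000
FCAS_YEARS = 10
DEEPMIND_ANNUAL_LOW_GBP = 164_000_000
DEEPMIND_ANNUAL_HIGH_GBP = 477_000_000
COST_PER_STUDY_GBP = 10_000_000


def annual_budget(total_gbp: float, years: int) -> float:
    """Even-spend assumption: total divided equally across the years."""
    return total_gbp / years


def equivalents(budget_gbp: float, unit_cost_gbp: float) -> float:
    """How many units of a given cost the budget would cover."""
    return budget_gbp / unit_cost_gbp


if __name__ == "__main__":
    per_year = annual_budget(FCAS_BUDGET_GBP, FCAS_YEARS)
    low = equivalents(per_year, DEEPMIND_ANNUAL_HIGH_GBP)
    high = equivalents(per_year, DEEPMIND_ANNUAL_LOW_GBP)
    studies = equivalents(FCAS_BUDGET_GBP, COST_PER_STUDY_GBP)
    print(f"FCAS per year: £{per_year / 1e9:.1f}bn")
    print(f"DeepMind-sized labs fundable per year: {low:.1f} to {high:.1f}")
    print(f"DeepMind-scale studies over the decade: {studies:.0f}")
```

On these assumptions the FCAS line is about £1.2bn a year: enough to run between roughly two and a half and seven DeepMind-sized labs every year, or to fund around 1,200 studies at DeepMind's per-study cost over the decade.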

AGI, Defence & The Future of Warfare

The claim made on the podcast – “We’re prepared for the time when AGI catches up” – implies Defence possesses concrete plans and capabilities for effectively managing advanced artificial intelligence at or beyond human level – either our own, or that deployed by an ally or adversary. Yet the evidence presented thus far suggests the opposite.

The current Defence position—that by 2040, a human operator in an aircraft like Tempest will outperform AI in processing data, coordinating multiple platforms, situational awareness, and rapid decision-making—demonstrates precisely how unprepared we are. As shown earlier, AI systems already equal or surpass human performance in complex strategic tasks, and the trajectory of improvement strongly favours AI dominance. Even if human pilots retain marginal advantages today in certain areas, these advantages are diminishing rapidly, soon to disappear entirely.

The claim thus resolves to this: UK Defence believes it is very likely that, in 2040, a human systems operator will outperform an AI in processing data and controlling subordinate platforms, will have better situational awareness, and will be able to react faster.

On any reasonable appraisal, as we have shown, this seems overwhelmingly unlikely: indeed, given AI’s current performance in air-to-air combat and strategy relative to humans, it may not be true today: processing and analysis of data, control of networks and speed of reactions are measures where humans do worse.

Let’s push this logic further. If we hold the platform and support constant, which dominates:

1. Human-only control, or
2. Human-plus-AI control, or
3. AI-only control

Whichever of the above dominates will win. Tempest is a bet on the second. All the evidence to date suggests that not long after human-machine teaming wins, the AI beats the human-machine team. Our planning is stuck short of the logical conclusion of its own assumptions.

Defence might understandably prefer retaining human pilots out of a moral, cultural or aesthetic preference for a pilot in the cockpit. But what matters is whether your adversaries share your preferences. If they don’t, as soon as one adversary picks truth over beauty, you must change, or you will lose. It is that simple. Some might object that adoption will be uneven. This is true and also irrelevant. The moment any credible power demonstrates a battlefield advantage via AI, the pressure to replicate it becomes existential. Delay is strategic suicide, not insulation. Unlike many previous candidates, this is a true horse-to-tank moment, except an order of magnitude more so.

Which one will be dominant in the 2040s cannot of course be known. But humans aren’t getting significantly better and artificial intelligences are. Why do you believe humans will win this race?

The implications extend well beyond fighter aircraft. The AI frontier will be jagged – we think air-to-air combat will be fully autonomous first, then air-to-ground, with other domains slower. Air warfare offers straightforward reward functions (seeing what’s there, shooting things down, blowing things up, moving things around), in *relatively* uncluttered environments (compared to ground and sea – surface and subsurface – warfare) – different capabilities will be automated at different time horizons. But unless the trajectory changes, what is true of fighter aircraft is true of every other military system. AI is likely to be better at operating all of them: there isn’t one domain or system where it is obvious that humans will retain decisive advantage, or any advantage at all.

AGI makes any advantage in people redundant everywhere, from pilots and operators to commanders and intel staff. In doing so, it removes demographic inferiority – a nation with less well-trained, well-educated or well-motivated troops, or a nation with fewer people, is no longer less able to defend itself.

Our argument then is that betting against these clearly visible trends—hoping humans will somehow maintain a meaningful edge—is dangerously naive. If Defence truly believes human superiority can persist, it must explicitly justify why human capabilities will indefinitely outperform rapidly improving AI systems. Moreover, it must convincingly refute or challenge the well-supported forecasts placing AGI’s arrival around 2028.

On the other hand, if Defence accepts the logic and evidence we have laid out, it should be racing towards our AI and autonomous future to fulfil its duty to protect the nation.

Humanity’s Last Exam: Existential Risk

We now make a larger claim. If you accept the logic of the argument so far, then our generation is sitting humanity’s final exam as a civilisation. To pass this exam, we must resolve at least one of the following critical issues:

Globally prevent further AI development.
Remove the possibility that advanced AI can access advanced weaponry or highly dangerous materials.
Eliminate war entirely as a political instrument.
Solve advanced AI alignment.

If you can’t do that, the logic takes you to a civilizational test for humanity: what happens when the optimal move is to hand over our most destructive artefacts to non-human intelligences?

Dystopian? Yes, we agree. So what are we going to do about it? Our options are clear:

1. Stop Developing Minds Smarter Than Humans. Humanity can stop developing powerful artificial intelligences. But that has to be all of humanity: any defection at all prevents this strategy from working. This seems very unlikely, and perhaps simply not possible.

2. Remove Tools That Smarter Minds Could Use Decisively. Accept the development of smarter-than-human artificial intelligences but eliminate, globally, all advanced military technologies and, crucially, the underpinning civilian technologies. In practice, anything that an AI could take control of and use against us. Again, it can’t just be done with pinky promises: it has to be done by everyone, everywhere, with no defections. Misaligned AGI could theoretically weaponise almost anything - from disease to climate to the economy - against us, but clearly, handing it our most advanced military capabilities is an order of magnitude more immediate a threat.

3. End War. Never go to war again, solve all problems by politics. Clearly, not happening.

4. Solve AI Alignment. Solve AI alignment so you don’t care if AI controls humanity’s most destructive weapons, because it will always do your bidding (or that of your elected representative, or your elected AI leader). This might be done by luck – maybe AI always remains a tool of humans, aligned to us, despite being able to out-think us. Or, assuming this is possible, it might be done through deliberate and careful research and development to put in place regulation and guardrails etc. In any case, even if you think it unlikely (we have split views between us at Cassi), if it is not a zero chance, we still need to do it. After all, we insure against war by building a military, and against all kinds of low-probability, high-impact risks in our day-to-day lives. This is what ‘prepared for the time when AGI does catch up’ means in practice. Hope is not a strategy.

5. Cede Control of Destructive Systems to Misaligned Non-Human Intelligences. Get killed by Skynet.

There aren’t actually any other options. That is your lot.

Can we please start taking this seriously?

So What?

We are not remotely ‘prepared for the time when AGI does catch up’. Defence’s persistent underestimation and misunderstanding of AI is a national security risk, right now.

We need to address the issue of under-estimating, and insufficiently understanding, AI for the sake of the security of our nation.

Military leaders must recognise that their duty is twofold:

First, to decisively invest in and prepare for an AI-driven future of warfare;

Second, to candidly explain the Strangelovian escalation trap humanity faces. Achieving AI alignment is not primarily a problem to be solved by UK Defence – the excellent UK AI Security Institute should remain the lead for this. But nor is it an abstract concern that Defence can ignore—it is a strategic imperative that Defence’s leaders have a moral and ethical obligation to explain given their part in it. Failure to act and speak out now risks more than defeat; it risks existential disaster.


^[1] Shashank Joshi on X: "RAF officer leading concepts/req for Tempest: "at the top of our list is the payload." Irrelevant how it gets to target, he says, but analysis shows fast jet better than cargo plane or sub. "Payload is what we're all about ... we're talking roughly double an F-35 A's worth...""

^[2] Episode 14 – Meeting UK operational needs | Future Horizons: The Tempest Podcast (it is all worth a listen, but the most relevant sections are at the start and then from c.40 minutes in).

^[3] Ignoring that for a sufficient level of intelligence or skill, everything is auto-complete…

^[4]

^[5] And I am not sure any actual human could individually do well on it: 2500 challenging questions on 100 different subjects.

^[6] Humanity's Last Exam

^[7] This footnote is for the nerdiest of technical pedantry. Measuring how “general” AI is at solving tasks is a very difficult challenge. You cannot test on anything a human might be asked to do: that’s an infinite test. What represents a good cross section of all the tasks you would expect a human to be able to do? Which human? In how long? How do you measure success? There are a million confounding variables. METR approached this by using the length of time a task normally takes a human as a proxy for task complexity. Does that mean that ‘being a pilot’ and ‘running a tactical battle’ were in that list? No. Does that mean you can ignore the work? Not sensibly, if you want to understand how generality in AI is improving.

^[8] This 2017 article https://qz.com/1095833/how-much-googles-deepmind-ai-research-costs-goog suggests costs of £164M; this 2020 article https://www.cnbc.com/2020/12/17/deepmind-lost-649-million-and-alphabet-waived-a-1point5-billion-debt-.html puts the figure at £477M. Individual papers/studies are said (2024) to cost ~$13M, calculated here https://www.reddit.com/r/MachineLearning/comments/1ej5h4b/d_calculating_the_cost_of_a_google_deepmind_paper/

Four And a Half Bad Days In Cyberspace

Keith Dear — Fri, 14 Mar 2025 07:01:51 GMT

[Updated, 16 March 2025. 2209hrs: new Endnote]

Image Credit: LIG Nex1. Available here: https://biz.chosun.com/en/en-industry/2025/03/05/4H7MSTWCOJHYZHJAEKENPJWB7I/


This was the most innovative military exercise I have participated in or witnessed in 22 years of service. Defence Cyber Marvel’s fourth iteration, DCM4, took place in Seoul recently and I had the privilege to attend. If you just want to read about the exercise, skip the intro, and jump to the section headed DCM 4. But first, here’s why I think it matters…

‘The rainbow of colors in the window paints how everything went so wrong, so fast. The water in the Potomac still has that red tint from when the treatment plants upstream were hacked, their automated systems tricked into flushing out the wrong mix of chemicals.

By comparison, the water in the Lincoln Memorial Reflecting Pool has a purple glint to it.

They’ve pumped out the floodwaters that covered Washington’s low-lying areas after the region’s reservoirs were hit in a cascade of sensor hacks. But the surge left behind an oily sludge that will linger for who knows how long. That’s what you get from deciding in the 18th century to put your capital city in low-lying swampland and then in the 21st century wiring up all its infrastructure to an insecure network.

All around the Mall you can see the black smudges of the delivery drones and air taxis that were remotely hijacked to crash into crowds of innocents like fiery meteors. And in the open spaces and parks beyond, tiny dots of bright colors smear together like some kind of tragic pointillist painting. These are the camping tents and makeshift shelters of the refugees who fled the toxic railroad accident caused by the control system failure in Baltimore.

FEMA says it’s safe to go back, now that the chemical cloud has dissipated. But with all the churn and disinfo on social media, no one knows who or what to trust. Last night, the orange of their campfires was like a vigil of the obstinate, waiting for everything to just return to the way it was. But it won’t.’

I remember reading this, August Cole & Peter Singer’s evocative “FICINT” (Fictional Intelligence) opening to the US’ Cyber Solarium report, in Downing Street in 2020. A few years before, I’d read August Cole’s brilliant short story Underbelly, that helps us explore the direct military effects of a cyberattack on the UK. It is no less alarming, resulting in a distributed HQ led by a Special Forces Colonel ‘Fessenden’, having to operate clandestinely, out of a rural pub pulling together the talents needed, with all major bases and MOD cyber networks shut down due to cyber infrastructural attacks. If you can, do read it.

In 2020 I was with our No10 Integrated Review team at a meeting with the Defence Secretary and senior Defence and National Security Leaders. One Treasury official asked what it would mean if we cut the defence budget for aircraft carriers and reassigned it to offensive and defensive cyber. Might we be a more dangerous opponent? Better able to deter, and safer at home? Might spill-over benefits to the wider economy be greater? More nationally resilient?

I don’t think this was a serious proposition, but a challenge – a provocation designed to force people to think. And it is one I have often thought about since. I am willing to bet it has never been seriously and fully explored in the UK.

For me, personally, on issues like the aircraft carriers, FCAS/Tempest, Challenger 3 tanks and all the other totemic and high-end kit that comes up for regular scrutiny, my primary interest is in how we make the decisions on force structures and balance, not what specifically we should have.^[1]

Cyber Power debates in Whitehall, like those on Space Power~~^[2]~~, are driven by fundamentally misaligned incentives. Just as air power needed an independent champion to compete equally for resources – the purpose the creation of the RAF served – so cyber and space will never get to make their arguments with full force for as long as they have to make their case through the RAF, Army, RN, or Strategic Command (at least while Strategic Command is run by officers whose careers depend, in part, on keeping their parent services content).

In fact, for cyber the situation is worse in many ways. Not only does it have to compete with the other military domains of war but it also has to compete for resources with GCHQ. GCHQ’s mission is primarily espionage. As a civilian led and run organisation, it cannot use lethal force, and so cyber-attacks that might cause destruction, death or injury must be carried out by military personnel. But GCHQ is funded by the Foreign Office, not the MOD. The FCDO does not want to hand the MOD an argument taking funding away from espionage to redirect it to military cyber defence and offence.1 Thus cyber, in the sense that August Cole was talking about, never gets to air its arguments for why it should be funded ahead of aircraft carriers, tanks, FCAS, and espionage etc. Our Treasury official’s question remains largely unanswered and unexplored. I don’t think we know how vulnerable we really are. Nor do I think we truly know how much we could do with offensive cyber capabilities if they were fully funded and had an independent champion. August Cole’s cyber FICINT helps us see how we might first learn about our vulnerabilities.

DCM4 is helping ensure our ‘Fessenden’ will be able to find the people he needs to fight back.

Defence Cyber Marvel 4 (DCM 4).

A large hotel in Seoul. Down lifts and up escalators. Into a labyrinthine complex of a large open plan convention centre and surrounding smaller convention rooms and offices. Innovation everywhere. I am shown around this multinational exercise with other visitors, shown what can be done at the unclassified level - the briefings I am given are given to local media, visiting multinational dignitaries, academics, business and industry representatives alike. The capabilities we see are more remarkable for the fact that they are in the public domain.

In one corner a group of soldiers from 2 PARA discuss how they have put together a number of electronic warfare sensors from commercially available hardware. They bought parts on Amazon, on a shoestring budget, with a focus on Counter-UAS. They, and the Royal Signals soldiers present, say these devices are perhaps more effective than those they would currently have access to were they to deploy with the officially provided EW support equipment of the Light Electronic Warfare Teams (LEWT).

Whether the kit itself is better or not isn’t really the point though. As they note, the creation of these devices would enable a proliferation of EW and ISR across the battlefield that is not possible with the current force structures, where these capabilities are in high demand and insufficient quantity. The Ukraine war has at times been referred to as the ‘24-hour’ war, where technological advancements are only effective for a short period before being countered. Creating a culture in which innovative soldiers can continually develop and refine technological advantage themselves will enable us to maintain the initiative.

It makes me think of the culture inculcated in the Israeli Defence Forces, one of Ein Breira, or ‘no choice’; of tinkering and trying and taking risk, modifying kit, solving problems. It sits in damning contrast to the process of building under maximal regulation, assurance, validation and verification. This has long suggested to me that we have unintentionally killed the tinkering innovation spirit in our armed forces outside these bands of dynamic volunteers and novel thinkers. We have built a culture where soldiers, sailors, airmen and airwomen are expected to wait passively until the kit they need is handed to them with a training manual. As Bernard Schriever wrote of the US DoD in the 1960s: “the system now runs the people”. DCM 4, it seems to me, is an antidote. And the PARAs seem to be thriving on having been given permission to innovate, to test and try, in hardware and software – not things we traditionally think of as Parachute Regiment specialities, and a reminder of how capable so many in our Armed Forces are.

In another corner, a young RAF Officer speaks passionately about how much she is learning, and how she and the other participants, over 1000 of them from 60 different organisations, including 27 nations working in 20 worldwide locations, are mostly volunteers. All are self-taught. They have given up evenings and weekends to learn how to become hackers – they did this outside of military time. Now she is leading a team of Brits from the RAF and Americans, competing against teams from other services and other countries; and also, fascinatingly, other UK Government Departments. The Ministry of Justice has a team here. The NHS too. The Met Police and National Crime Agency. There are teams from across the UK military, though Army personnel and units dominate, since it is primarily the Army, through the Army Cyber Association, that organises the Exercise. Here too are representatives of most of the largest tech firms both globally and in the UK. All compete equally. The strongest performers are not always the teams you’d expect. But the Exercise organisers protect everyone, keeping results - bar those of the teams that win - between them and the teams. The focus is on winning, but also on learning from failure, so any embarrassment must be minimised.

Four and a Half Bad Days in Cyberspace

The teams spend 10 months organising and preparing for what they call 4.5 bad days in cyberspace. They are organised into small rapid reaction teams on the ‘Blue Team’ side. They are tasked with defending unfamiliar networks and systems – from Enterprise IT systems to sophisticated industrial control systems and transport networks on which nations are dependent. There is a large ‘Red Team’, ramping up attacks that Blue Teams must defend against, growing in intensity and sophistication as the exercise goes on. A ‘White Team’ provides quantified and objective feedback on performance. There’s a leaderboard. A ‘Green Team’ builds the safe training area competing teams operate in, before then defending the whole exercise from real-world cyber-attacks and espionage from outsiders. There are numerous ‘side quests’ – innovation experiments such as the one in which the Paras were engaged.

The Green Team is led by a young Army officer. She describes how her all-volunteer team has spent the past 10 months designing and building on the NATO “Cyber range”, essentially a server rack in Estonia within which they have ‘Deployed Infrastructure as Code’ – which is to say they have systems that exist electronically exactly, or as close as possible, to their form in the real world. Now that it is built, her team spends its time keeping the network operational while watching for suspicious traffic on the network. If anything looks like it shouldn’t be there, they train to both defend and attack.
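
For readers unfamiliar with the term, ‘infrastructure as code’ generally means describing the systems you want in files, then letting software work out what to build, change or tear down to match. The sketch below illustrates that idea only: the host names, attributes and functions are all hypothetical, and it is not a description of the tooling used on the NATO range.

```python
from typing import Dict, List, Tuple

# A spec maps a host name to its attributes. All names here are invented.
Spec = Dict[str, Dict[str, str]]

DESIRED: Spec = {
    "hospital-db": {"os": "linux", "role": "database", "patch": "2025-02"},
    "plc-01": {"os": "rtos", "role": "rail-signal-controller"},
}

CURRENT: Spec = {
    "hospital-db": {"os": "linux", "role": "database", "patch": "2024-11"},
    "old-vm": {"os": "linux", "role": "scratch"},
}


def plan(desired: Spec, current: Spec) -> List[Tuple[str, str]]:
    """Compare the declared range with what exists and list the changes needed."""
    actions: List[Tuple[str, str]] = []
    for name in sorted(desired):
        if name not in current:
            actions.append(("create", name))
        elif desired[name] != current[name]:
            actions.append(("update", name))
    for name in sorted(current):
        if name not in desired:
            actions.append(("delete", name))
    return actions


def apply(desired: Spec, current: Spec) -> Spec:
    """Simulate carrying out the plan: the result simply matches the declaration."""
    return {name: dict(attrs) for name, attrs in desired.items()}


if __name__ == "__main__":
    print(plan(DESIRED, CURRENT))
    print(plan(DESIRED, apply(DESIRED, CURRENT)))  # nothing left to do
```

The attraction for an exercise like this is repeatability: because the range is a declaration rather than a hand-built environment, it can be rebuilt to the same state after each team has finished attacking it.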

The Cyber Range shows how satellites can be hacked, and how to avoid making this obvious. For example, turning a satellite’s solar power arrays to face away from the sun leads to a gradual loss of power and functionality that isn’t obviously a cyber-attack. There are lots of innocent ways this might happen, which makes the attack harder to detect and very effective. They can get a weather camera to misreport, leading the satellite operators to use it at times when the weather makes it less effective, or to choose not to use it when the weather is optimal, because they believe its view is occluded by weather [reminder, this is all public domain: see also the DoD’s GitHub ‘How To’ guide on satellite hacking, or this online hack-a-sat competition to hack satellites].

The Cyber Range has vulnerabilities in industrial control systems and simulated military command and control networks. They have ‘real systems’ hardware that you might find in a hospital, or alongside train tracks, and players can try to hack it. Sometimes these are linked to wider simulations of, for example, a whole hospital, or a wider power grid. Sometimes digital targets are linked to models – for example, an aircraft carrier or a Hornby-like train set – so you can see the effects of their being hacked on the scale model or the wider simulated system. Sometimes this can find real-world vulnerabilities, which the exercise organisers are careful to ensure are reported correctly, and knowledge of which is kept contained. The whole team tests and learns while teaching others on the exercise. It is hard not to be impressed.

Goliath, an example of the cyber infrastructure used, albeit it stayed in the UK due to size, and was remotely accessed from South Korea. Image Credit: Dan Humphries 2025, also the builder of Goliath!

In another corner, a team is hacking a commercial satellite – a real one – hired for this purpose by the exercise organisers. The satellite is, in the exercise, an adversary’s surveillance satellite. The team on this side quest inserts malware so they can watch the enemy through their own surveillance satellite.

Another side quest: a team set to sinking an aircraft carrier – fortunately not a real one this time. The team first gains access to the ship’s GPS, making the crew uncertain as to its location. They make the ship ‘think’ and report that it is many miles from its real position. Next, the team hacks into the control systems, sending its engines into overdrive and sending the ship circling on full power. Turning at full speed causes the ship to list hard. Next, the team, deep inside the ship’s systems, orders its ballast system to shift weight rapidly within the hull. Combined with the listing, the carrier passes the point of no return, rolls, and sinks.

There’s an AI Defence Challenge – prompt injection attacks lead another AI to churn out wrong answers, subtly or obviously, or to generate misinformation, or propaganda. There’s drone hacking and spoofing. Teams from all over the world have trained LLMs to play poker. That might not sound like the fare of war, but it has very relevant attributes: incomplete information; uncertainty; a mix of objective and subjective probabilistic assessments – what cards might my opponents have and what might they do?; recursion – what does my opponent think I might have and do, what do they think that I think that they might do, and so on; and of course the staple of the nature of war – chance^[3]. The poker competition was won by the Japan Maritime Self-Defense Force (the Japanese Navy). The point here is to begin experimenting with AI decision systems. It is the first time I have seen this get serious attention on a military exercise, even if it is as a side-quest and game.
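
The mix of objective and subjective probability is easy to show in miniature. The sketch below assumes Texas hold'em (the exercise may have used another variant) and a single opponent: the chance they hold an ace is pure counting, while the update after they raise depends on our guesses about how they play. Those two raise probabilities are invented for illustration.

```python
from math import comb


def p_opponent_has_ace(my_aces: int) -> float:
    """Objective: chance one opponent holds at least one ace, given my two hole cards."""
    unseen = 50                 # 52 cards minus my two
    aces_left = 4 - my_aces
    no_ace_hands = comb(unseen - aces_left, 2)
    return 1.0 - no_ace_hands / comb(unseen, 2)


def posterior_after_raise(prior: float, p_raise_with_ace: float,
                          p_raise_without_ace: float) -> float:
    """Subjective: Bayes' rule using our beliefs about the opponent's behaviour."""
    with_ace = p_raise_with_ace * prior
    without_ace = p_raise_without_ace * (1.0 - prior)
    return with_ace / (with_ace + without_ace)


if __name__ == "__main__":
    prior = p_opponent_has_ace(my_aces=0)
    # Assumed behaviour: raises 80% of the time with an ace, 30% without.
    posterior = posterior_after_raise(prior, 0.8, 0.3)
    print(f"before the raise: {prior:.1%}, after the raise: {posterior:.1%}")
```

With those assumed behaviours, a raise roughly doubles our estimate that the opponent holds an ace, from about 16% to about 33%. The recursion described above begins when the opponent knows we reason this way and raises without an ace precisely because of it.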

I'm told that at last year’s DCM3 the exercise incorporated quantum computing, running an air defence scenario, and using quantum (presumably annealing/combinatorics) for ISR planning & execution, and logistics planning optimisation. This year some early experimentation with quantum radio frequency detection has been seen, something to be expanded on in future exercises.

I head to another stand where a rifleman describes his Rifles’ Cyber Teams and the experimentation they are doing in supporting tactical actions with cyber. Inspired by DCM, they are now using NotebookLM for briefings and analysis, and Google Threat Detection as part of their workflows, with aspirations to widen the use of such tools from tactical experimentation into their headquarters processes.

Across the exercise there are competing priorities for the Blue teams. Keep the hospital system up and running to save lives or shut it down to protect patient data integrity, knowing that if someone gets into your patient data, you might never be able to trust it again? Treatment itself becomes a threat, if you don’t know your patient’s medical history. Teams are constantly called on to prioritise. The attacks they face include novel, cutting-edge techniques – perhaps something seen just once or twice globally before – exposing the defenders to these emerging threats to ensure they are ready to face them in the real world should they come calling.

One of the insights I leave with is that a truly intense cyberattack on the UK, or the UK military, is not something we can defend in the way we might try to hold a frontline, or how we might achieve complete control of the air. However long a border in the physical battlespace, the attack surface in cyberspace is simply vast in comparison. As a result, in a cyberwar, we will be hurt – a successful defence can only reduce harm. Maybe the waters swirling around Washington, the crashed delivery drones, the displaced military headquarters operating from a pub – all these ‘defeats’ in August Cole’s dystopian stories were in fact the losses we accepted, as we kept incubators in neonatal wards running, and defended our nuclear power stations and submarines from cyber-attack.

Destroy & Degrade. Reconstitute and Rebuild

The exercise’s organisers are proud of what they’ve built, and rightly so. As they say, you can’t build an effective Navy by staying in the harbour. DCM4 gets UK military talent preparing for the wars of the future. The breadth of the teams entering, from across industry, defence, public sector and across countries, makes the training unique and the connections for all parties hugely valuable. These are the people they would fight alongside in a real ‘cyber war’.

DCM aims to increase the skills of everyone involved, and succeeds brilliantly. The teams defend systems and attack them, seeking to destroy, degrade, reconstitute and rebuild them. Everyone involved says there is nothing else like it. It succeeds in part because it is not a formal military exercise. Run by the Army Cyber Association – set up by soldiers who wanted to enjoy cyber in a similar way to how the British Army funds others to row the Atlantic – and staffed entirely by volunteers, it has freedoms that would never be granted to a formal exercise. It can be less deferential to organisational fiefdoms – wrapping cyber offence, defence, civ, mil, industry, espionage and international participation without having to do quite so much politicking across those misaligned incentives.

In DCM4 there are echoes of the early innovative days of the RAF, which I wrote about for the Wavell Room in an article they titled: Saving the Royal Air Force. That article described how private contributions, media campaigns, contests and clubs were instrumental to building the capabilities the RAF would later need, creating many of them before it existed, allowing the nascent RAF to harness them rather than having to build them from scratch. Furthermore, contributions, campaigns, contests and clubs created groups who could pressure Whitehall and the Ministry of Defence to act on Air Power, giving them the evidence, showing how powerful it could be, and building public pressure for proper investment. In an age when budgets are even more tightly constrained than they were in the 1930s – with welfare bills and health spending more tightly tying the hands of the state - a similar approach – using clubs and media campaigns, contests and private contributions on cyber, space, electronic warfare and uncrewed systems might be the only way to prepare. The incentives in the system to go slow and under-allocate on all these vital areas are just too strong.

The intrinsic motivation of all involved to improve their skills and put on a world-class exercise is why it works. Formalise it and you’d kill that intrinsic motivation, get rid of the freedoms and the innovation would go.

On the other hand, the exercise as-is needs some formal approvals from within the MOD if it is to be allowed to go ahead. As it grows, these are becoming more challenging to win. It needs approvals for cyber reserves that wish to participate to mobilise and join, it needs approvals to win budget whether as a ‘sport’ like rowing the Atlantic, or in some other capacity. Likewise if it is to be able to get the soldiers, sailors and RAF personnel released from their day jobs to join as volunteers, it needs some official standing. Partly because of its success – it’s hard to compare it now to a small team rowing the Atlantic to justify funding and approval – and in part because it is so innovative – these can be hard to win. It doesn’t fit neatly into boxes – it isn’t a formal exercise and yet it is vast. It hasn’t been through the normal processes of training needs analysis and multilevel authorisations, sign-offs and so on. These things would likely kill it, but they are in place for a reason. It seems unlikely the exercise can continue as it is. The organisers seem genuinely worried as to whether DCM5 will be approved in time to put it on next year, or at all.

Yet this year’s Exercise was put on with the full support of the South Korean Government. It attracted Foreign Office funding. Far more people applied to participate than the Exercise could accommodate.

Perhaps the lesson is to fund DCM through its fifth and sixth incarnations, but not seek to control it. Meanwhile, take inspiration from it in the formal exercise programme, and seek to match or exceed the levels of innovation it achieves. If this can be done, the organisers would likely welcome the chance to hand over to the big green machine of military bureaucracy, knowing the fire of innovation they have lit is spreading, and being enabled to burn brighter, longer and wider. But it should be for them to decide the time to do this, when they judge the flame has been passed on, and they need no longer keep their torch burning.

Until then, Defence must find a way to ensure DCM4 continues to thrive as DCM5, DCM6, largely as-is. I don’t want to make this seem simple. All those rules and regulations and controls are in place for a reason. But find a way we must. It will need top-down leadership to succeed – it is in the middle-ranks of the military that the flame of innovation dies. Not because the people are bad, but because usually everyone with responsibility over some aspect or another of an exercise has to say yes, and only one person has to say no. Because if DCM goes well, those middle-level approvers will get no credit. But if it goes wrong, they will get the blame. And because absent a Cyber Service looking to drive the capability to excel, to win resources in the next SDR, there is no champion at the top without some significantly conflicted incentives.

For me, personally, I was extremely proud to see UK Defence being as innovative as everyone knows it can be, but rarely is. To hear the passion of those involved, and watch them pushing boundaries and better preparing us for the future, was genuinely inspiring, and hopeful.

The exercise felt to me like it was an incarnation of ‘the Science Super Power’ agenda and the Indo-Pacific tilt we wrote about in the 2021 Integrated Review. But it was also explicitly inspired, its organisers tell me, by General Richard Barrons’ Warfare in the Information Age (WITIA) paper, published in 2016, but originally written and circulated as his internal paper and idea in the years prior. That paper called out the growing threats and opportunities cyber warfare was creating, as a capability in its own right, but also, crucially, showed the need to integrate cyber warfare in everything we do.

Watching and listening to the personnel on this exercise, I think we have found the generation who can usher in the vision provided in General Barrons’ paper, and make it a reality. But, if they are to succeed, Defence Cyber Marvel and its volunteers will need more support to overcome the innovation-stifling institutional pressures, the misaligned incentives, that have prevented us getting there in the ~15 years since the vision for Warfare in the Information Age was first sketched out.

nb. with apologies to those that were kind enough to speak to me during my visit: I have chosen not to name anyone for security. I am, however, hugely grateful to you all for your time, insights, and commitment to excellence. You really were genuinely inspiring.

^[1] This is not to pretend I don’t have views, but I feel much more strongly about the need to improve how we plan and build for the future than any individual kit decision – debates around which are nearly always narrowly single-service dominated, dominated by discussion of what was, and to a lesser extent what is, rather than what will be, and selective in both the evidence and logic used to defend or attack them. I find these debates mostly unproductive and distracting.

^[2] Dolman, E.C., 2009. Victory through ~~Air~~ Space Power. Strategic Studies Quarterly, 14(2). Available at: https://www.airuniversity.af.edu/Portals/10/SSQ/documents/Volume-14_Issue-2/dolman.pdf [Accessed 10 March 2025].

Fletcher-Jones, C. (2024). In Favour of an Independent Royal Space Fleet: The Smuts Report and the Precedent of the Royal Air Force. The RUSI Journal, 169(1–2), 22–32. https://doi.org/10.1080/03071847.2024.2359405

^[3] AI is already super-human at poker (see Libratus and Pluribus), and, as in chess, AI is changing the way humans play the game, with a growing dependency on machines for training and in competition.

New Endnote. 12-03-25.

To be more correct here, in response to helpful constructive criticism:

GCHQ is funded via the Single Intelligence Account [see latest - Cabinet Office Corporate Report 2023-24, Security and Intelligence Agencies Financial Statement 2023-24]. It states:

“The Secretary of State for Foreign, Commonwealth and Development Affairs is the responsible Secretary of State for SIS and GCHQ^{[footnote 2]} and the Secretary of State for the Home Office for MI5^{[footnote 3]} The Agencies ensure that the appropriate Secretary of State is briefed on matters that could become the subject of Parliamentary or public interest and on issues, which they need to be aware of in discharging their wider Ministerial responsibilities. There are well-established arrangements for seeking Ministerial clearance for operations when required.
In line with the responsibility assigned to AOs in Managing Public Money the Principal Accounting Officer (PAO) acts to ensure that the SIA operates effectively and efficiently in support of national security policies, aims and objectives. The Deputy National Security Advisor (DNSA) Matthew Collins was delegated as Temporary Acting Principal Accounting Officer for the SIA given the Cabinet Secretary’s medical leave between October and December 2023. The Heads of the Agencies are each AOs in their own right, with delegated authority from the PAO.”

The point remains unchanged: ‘cyber’ as a capability and domain* must compete not only with the Army, Navy, and Air Force for funding within the MOD, but also with GCHQ. GCHQ has other budgetary champions in Whitehall - where the Foreign Secretary’s responsibilities, as described above, usually make him/her the main lobbyist for GCHQ in Spending Reviews, Defence and National Security Reviews etc.

There are other senior champions for GCHQ, as can be seen in the latest corporate report. Altogether we can see that these are:

The Secretary of State for Foreign, Commonwealth and Development Affairs;
the Cabinet Secretary;
the National Security Advisor (or his Deputy at present, since the NSA is a Spad not a Civil Servant);
the Home Secretary [since the National Cyber Security Centre, which solely does Cyber Defence, is funded through GCHQ. As GCHQ’s website states, “The UK's cyber security mission is led by the National Cyber Security Centre (NCSC), which is a part of GCHQ.” Which always amuses me because you could delete ‘which is a part of’ and insert ‘which is subordinate to’ and then you’d have a clearer statement of why this needs to be written at the top of the page. In any case, the Home Secretary, for whom making sure nothing awful happens in the UK is the main job, is therefore also a champion for GCHQ, if she/he wants funds to flow down to NCSC rather than to some competing priority elsewhere in Whitehall].

In addition to which, as the SIA financial statements put it the …Agency Heads have a statutory duty to provide annual reports on the work of the Agencies directly to the Prime Minister… giving GCHQ direct access to the Prime Minister to argue for more funds.**

Given all this strengthens the point I am making, rather than weakens it, I haven’t reworded the main article.

*whether it is a domain is a whole other important debate, but I am drawing the line here because (a) for practical purposes in the context of my argument I think it does not matter and (b) I will end up with endnotes on endnotes.

**in adding this new Endnote, I feel like I also add a tone overly critical of GCHQ - I do not intend this. The point is just to be more technically correct in support of my point that ‘cyber’ as an offensive and defensive capability, lacks a truly independent champion of the standing of the Chiefs of Royal Navy, Army and Royal Air Force, and also has to compete with an organisation focused primarily on intelligence gathering.

Education, Education, Education?

Keith Dear — Wed, 12 Feb 2025 05:01:35 GMT

AI Disruption

It’s Happening. The AI revolution is well underway. Most likely it will accelerate both in rate and scope. We must consider what to do in the worst-case scenario that AI is a true substitute for human cognition, a subject we have discussed several times in this blog.

But also, we need a plan to mitigate the radically disruptive effects of AI even if no further advancements were made. The adoption of existing AI technologies alone would have profound economic and social implications. We propose using AI to radically improve how humans learn. We make the national case here - but it is as relevant to all businesses, organisations and us all, as individuals.

The Case for Optimism

While we know a lot of facts about the brain and human psychology, we don’t really know how either ‘work’. We are not sure what consciousness is. Intelligence is notoriously hard to define. Even reasoning, what constitutes knowledge or understanding, all are uncertain, contested concepts. Under such circumstances, we keep coming up with new benchmarks for testing AI in the way we have tested human intelligence, and increasingly with tests no normal human, and no deeply expert human, would pass. And AI keeps not just passing these tests but surpassing them. Hence the bearish case must be planned for.

But if we don’t know what human intelligence is or how it works, we cannot say that human intelligence won’t continue to play an important role in the economy, in making decisions in the future. It might, even if it is hard to say with any confidence how.

If we assume this optimistic case, AI will not replace all humans. AI becomes a complement to human cognitive labour. We work with it, and it creates new jobs.

Still, many will be displaced from their current roles. We can already see that lots of things we do now will be done better by AI, and could be done better by AI now, even if not all things. How do we plan for the scenario of massive displacement?

There are two key challenges: mitigating large-scale social disruption and ensuring that displaced workers transition into new, productive roles as quickly as possible.

Re-training will be essential. The likelihood that the skills needed in an AI-driven world match those of the pre-AI era is effectively zero. The UK, or any nation that enables large-scale, rapid retraining, stands to gain significant advantages. However, this process is costly and difficult, requiring substantial human and material resources at a time when those resources are increasingly valuable.

Education and AI

Education is a solved problem in principle—humanity knows how knowledge transfer works. The issue lies in implementation: mass-learning is both pedagogically suboptimal and logistically inefficient. The core challenges are:

· A teacher cannot dynamically assess every student's exact knowledge state in real-time.
· Students themselves struggle to identify their precise learning needs.
· Traditional education incurs high logistical costs and opportunity costs.
· The sheer order of magnitude of change will be disconcerting, as we are culturally accustomed to large-class learning rather than individualized tuition.

AI-assisted online learning can address these problems. AI tutors approximate individualized instruction, while online learning significantly reduces costs. The key lies in breaking down knowledge into minimal, structured chunks, providing explanations, practice, instant feedback, and spaced repetition—each of which is a well-understood technique.
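To make the mechanics concrete, here is a minimal sketch of spaced repetition with instant feedback, using a simple Leitner-style box system. It is an illustration only, not a description of how any of the tutors mentioned in this post work: the `Card` structure, the `review` and `due_cards` functions and the interval values are all assumptions made up for the example.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Hypothetical review intervals (in days) for each Leitner box. The values are
# illustrative assumptions, not figures from any study cited in this post.
INTERVALS_DAYS = [1, 2, 4, 8, 16]


@dataclass
class Card:
    """One minimal chunk of knowledge: a prompt and its expected answer."""
    prompt: str
    answer: str
    box: int = 0          # index into INTERVALS_DAYS
    due: date = date.min  # due immediately until first reviewed


def review(card: Card, response: str, today: date) -> bool:
    """Check a response (instant feedback) and reschedule the card."""
    correct = response.strip().lower() == card.answer.strip().lower()
    if correct:
        # Promote: each success pushes the next review further away.
        card.box = min(card.box + 1, len(INTERVALS_DAYS) - 1)
    else:
        # Demote to the first box so the item is seen again soon.
        card.box = 0
    card.due = today + timedelta(days=INTERVALS_DAYS[card.box])
    return correct


def due_cards(cards: list[Card], today: date) -> list[Card]:
    """Return the cards that should be practised today."""
    return [c for c in cards if c.due <= today]
```

A tutor loop would call `due_cards` each day, present each prompt, and pass the learner's answer to `review`. A real AI tutor would presumably replace the exact-match check with a model's judgement and generate fresh practice items, but the scheduling idea is the same: material you get right comes back less often, material you get wrong comes back sooner.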

AI is uniquely suited to serve as a 1:1 tutor, providing not just instant feedback but also dynamically generating new materials, conducting low-stakes testing, and evaluating progress in near real-time. This allows students to learn at their own pace, maximizing retention and efficiency. Students would have access to tutors with a far higher IQ and expertise than humans can provide. The human average on an IQ test is ~100. Current AI models have surpassed this. They will only improve from here.

Despite this, this approach does not replace teachers but redefines their role. Educators would focus on facilitating hands-on practice, coaching, and higher-order skills like critical thinking, evaluating AI-outputs, and communication. For many adult and adolescent learners, direct contact time with teachers may decrease, but learning efficiency would increase.

Image from @highlyretired on X, 4 Feb 2025.

The Present

There are a growing number of examples where AI is already radically improving education. Recall the mantra that the AI we are using today is the worst AI we will ever use. We can also assume that as the AI itself improves, so too will our knowledge of how best to employ it. Therefore it is almost certain that the gains we describe here will be improved upon, perhaps radically, as AI gets better and we get better at using it in education.

Take the World Bank AI pilot in Edo State, Nigeria, which focused on English Language, AI (ChatGPT) Knowledge, and Digital Skills. After six weeks of access to the AI tutor, the results showed students:

• Gained two years of learning, a 1200% increase
• Improved their ability to learn
• Benefited more as exposure to ChatGPT training increased, with the ceiling unknown
• Benefited more than students in 80% of all the World Bank’s other (often larger scale, more expensive) educational interventions

This was likely achieved with OpenAI’s GPT-4-turbo model, the most modern at the time of the intervention. GPT-4-turbo is now fifteen months and ~ten models behind the most recent OpenAI releases.

Figure 1. Chart shows how increased exposure to ChatGPT increased Nigerian students’ performance on the test - we do not know where the ceiling is for this 'dose-response' – how much better can students be with greater use of ChatGPT?

More examples:

· In Ghana, the ChatBot ‘Rori’, made available via WhatsApp for maths tuition, saw students gain the equivalent of one year’s worth of classroom learning in six months. This from one sixty-minute session a week, at a cost of just $5 per student.
· At Harvard, an AI chatbot tutor enabled undergraduate students in physics classes to learn twice as much in less time, reporting significantly higher engagement compared to those taught with ‘active learning’ techniques in a traditional classroom.
· In disadvantaged areas of the Southern United States an AI ‘Tutor Co-Pilot’ significantly improved tutor performance, as measured in student outcomes (4% improvements on average, 9% improvements for students taught by the weaker tutors).

The Future

Classroom education has changed little in the past century. Despite efforts to optimize, fundamental constraints limit the full application of pedagogical science. Overlaying new solutions on top of existing structures demands extra effort from students and exacerbates educational inequality. Instead, the best learning should be available to all.

Humans will never match AI in raw fact retrieval. While factual knowledge remains essential, superior internal models and conceptual understanding matter more—just as the ability to tackle mathematical problems outweighs rote memorization of formulas.

To remain competitive, humans must optimize their strengths. AI-enhanced learning offers a scalable solution, and current online educational content, though plentiful, lacks the necessary individuation, optimization, and real-time feedback to maximize learning efficiency.

A networked AI can learn what students need to know dynamically and empirically, refining its teaching strategies as it gathers data from learners in real-time. It can personalise based on skill, but also optimise to individual psychological make-ups and preferences. This adaptive approach ensures that education remains responsive to individual needs, eliminating traditional delays in curriculum improvement. The tools to implement this revolution exist today—it is time to use them.

National Strategy

There is more to this than just mitigating consequences domestically, nationally, for our children and all of us as workers, business owners, as a society and as citizens – urgent though this is.

We can and should be using the global reach and credibility of organisations like the BBC and the Open University to provide similar tuition globally. Such a service might offer British credentials – perhaps GCSEs or ‘A’ Levels, marked by AI with far higher consistency than human teachers can manage (see Olex.AI’s impressive results in marking English school work), or degrees from our top Universities: MOOCs, but enabled by cutting-edge tech that could hugely increase the effectiveness and efficiency of knowledge transfer.

We could use it:

· to help other nations reduce the disruption the AI revolution will bring
· to reduce migratory pressures on ourselves by helping other nations harness the AI revolution
· to identify global talent, either for recruitment as part of a global talent spotting campaign, or for the award of accelerated immigration admission for the top performers
· to identify or qualify for the award of e-citizenship, e-residency or the right to found an e-business covered by UK law, and able to access UK banking and other services.

For those in poor or corrupt countries trying to build global businesses, this could be an invaluable opportunity.

It may be that poorer countries, and the most disadvantaged in society, benefit most from AI tutoring, overcoming barriers like large class sizes, teacher migration, teacher shortages, and teaching by unqualified staff. In some studies of AI, in both educational and professional settings - such as with Tutor Co-Pilot in the Southern US - it has been the lower performers that improve the most when AI assists.

For the UK, allowing AI talent to earn in the UK while living wherever they want through e-citizenship, e-residency or e-businesses, might raise tax revenues – in 2023 Estonia received €67.4 million in tax from e-citizens and e-businesses, a 33% increase on the previous year. What might the UK raise if it not only offered such an option, but proactively awarded it to the highest performing students globally?

An objection here might be that allowing talent in poorer countries to found businesses in the UK will take tax revenue from the areas where they live. But the alternative is often for them to migrate. By staying where they are they spend locally, benefitting the economy. And once their businesses prosper, shielded by the UK’s legal system and with access to global markets from the UK, expansion into their country of origin seems likely - from a much more secure base.

A final caveat: we must remain wary of this early evidence for AI’s benefits in improving learning. From 2008, Professor Sugata Mitra gave inspiring talks on the effectiveness of his ‘Hole in the Wall’ experiments – where students in impoverished areas were given access to a computer connected to the internet and self-learned complex subjects with no tuition. In 2013, Mitra won the TED Prize for his ‘Kids Can Teach Themselves’ talk. In 2015 the Times Educational Supplement ran a headline claiming ‘Internet learning boosts performance by seven years’. These claims are now disputed, albeit not debunked. What we should note is the need for experimentation, testing of claims and careful implementation.

Care and caution must not mean prevarication and delays. It’s Happening. AI development and AI impact will likely now scale exponentially rather than linearly. Like the pandemic, we will feel its effects gradually, and then suddenly. We must respond with the urgency needed, and pre-emptively, not retrospectively. Education, Education, Education, as one Labour leader once put it.

It's Happening

Keith Dear — Fri, 07 Feb 2025 13:01:13 GMT

Artificial Intelligence crossing the Rubicon

‘I will show you fear in a handful of dust.’

T.S. Eliot, The Waste Land

There is a profound sadness about watching all this unfold. At Christmas, a good friend and deep ML expert at one of the world’s leading labs wrote in his email circular ‘It’s happening’. And I felt the same thrill and fear flare in me that comes multiple times daily now. He’s right. And we can’t stop it.

I re-read Sarah’s In Defence of Slowness at the End of the World again and again. I find myself back in church on Sunday. I wonder if I should read less for work and go back to theology and philosophy – passions in my teenage years and twenties. In coping with ‘the Coming Wave’ they feel now like they might be more important than anything I have learned since. The thrill is there. More on that another time. But moreover: I am worried.

I have written and spoken, with my usual mix of passion, evangelism and frustration, about where AI is going and how important it is going to be, for some time. We have named our new company ‘Cassi’ after Cassandra, the prophetess condemned always to be right and never believed, in part in reference to a nickname given me in a previous job. I am very aware that the warnings so many, myself included, have advanced on the profundity of this revolution have not had the effect intended. We have failed.

So much has been written and said describing how disruptive a world of superintelligent AIs will be economically, socially, for national security – for humanity – that there is little need to requote and reference. Most of us commenting have striven to remain grounded, to avoid sounding ‘shrill’, to be sufficiently sober to retain a place with ‘the adults in the room’. But even now, as it is happening, it remains a fringe issue. Acknowledged by Governments in words but not really in deeds. Treated with amusement and disdain by many in our governing and media classes. As if it is something we can worry about later. Partly in consequence, it remains largely outside of electoral consciousness, with few doing the work to wake people up to what is coming.

Most thought ‘Don’t look up’ was about climate change. But not for me.

We are not ready. It is too late to be ready. It is too late to stop it.

I won’t pretend to know precisely how all this will play out. As ever, we do our best to grasp what Peter Hennessy called ‘the thin wisps of tomorrow’. That’s been central to my job for 22 years now. Forecasting breeds humility.

As Sarah writes:

‘Those among us that are Situationally Aware must be on guard against arrogance. To anticipate some transformation that most live in ignorance of can easily breed self-importance. …worse, it can degrade the way that we perceive the efforts of others. …I don’t want to presume they’d act any differently if they knew (what I think) I know.’

Somewhere else, she wrote acknowledging we could be wrong – maybe some of us, and I one of them, have drunk the kool aid. Maybe, despite my protestations in ‘In Athena’s Arms’ (on AGI) we have succumbed to a cargo cult. In many ways, I hope so.

But until that becomes apparent, what I think I know is:

1. It is happening

2. We can’t stop it

3. We are not ready.

My view is that so much is unknown that our estimate of the probability of extinction risk from AI cannot be zero. But I do think it is highly unlikely (<5%). More likely is radical economic disruption, and social unrest. My biggest worry, because it feels more real and imminent, is that the things AI will take from us are the things I enjoy the most in my work – the search for new and novel insights, the act of creation, of communication, of anticipation and discovery - and yes, of struggle, and effort, the reward that comes from success at the end of it, or the consolations and lessons that exist in failure. I worry that even if we create a new economy, where intelligence is cheap and the AIs play the dominant role but we still work, the things I find most meaningful will be the things of least value.

I will keep writing, and play my part in what is unfolding as best I can. Because you can’t stop this, I would rather be on the upside of the bet than the downside. I think you can probably do more good to mitigate the disruption by being a part of the revolution than by sitting on the sidelines and commenting. I will try to enjoy the thrill of progress and do my bit, and my best to help others, to ameliorate the risks and harness the opportunities of what is coming. There are, of course, potential radical upsides too: much greater prosperity, better health - some would say ‘curing death’, fully-automated luxury communism, Bostrom’s ‘Deep Utopia’. But I don’t think we get there without significant social and political unrest. ‘Significant’ being something of an understatement.

I worry that my writing breathlessly of progress and development is too easily misconstrued as delighting in the disruption. I write this because I want to be understood. There is a thrill to being a small part of - or even just being around to witness - the most far-reaching revolution in human history. But there is also a profound sadness about watching all this unfold. It’s happening.

Humanity's Last Defence Review?

Keith Dear — Thu, 06 Feb 2025 13:31:29 GMT

There has been a lot of commentary on Deep Research, the latest model from OpenAI. Tyler Cowen writes on his Marginal Revolution blog that:

“I have had it write a number of ten-page papers for me, each of them outstanding. I think of the quality as comparable to having a good PhD-level research assistant, and sending that person away with a task for a week or two, or maybe more.
Except Deep Research does the work in five or six minutes. And it does not seem to make errors, due to the quality of the embedded o3 model.
It seems it can cover just about any topic?”

What is Deep Research, you ask? OpenAI tells us it is ‘…a new agentic capability that conducts multi-step research on the internet for complex tasks.’

As an experiment, I thought I’d have it write the UK Defence Review. I share the result because I think it is eye-opening. If you want the conclusions, skip to the end, but I encourage you to read it.

The first version I produced drew mainly on the Whitehall think tanks and existing UK Government documents. This was partly because my prompt asked for sobriety. It read to me like precisely the kind of thing Government usually produces - bland, prioritising politics and compromise over confronting issues and coherence.

As the prompt shows, I had it redo it twice more. First to re-write around a clearer theory of victory (which it didn’t do very well, but I think that is partly because in the UK sources it was reading, no-one has really advanced one).

So in the second addition to the prompts, I asked it to consider my own writings, those of Ken Payne, Mike Horowitz, Edward Stringer, Christian Brose, Raj Shah and Christopher Kirchhoff, and Dominic Cummings, as well as Defence and Security reforms in Scandinavian countries, the Baltics and Japan. I picked those who are in one way or another quite radical, which I think also makes it a better read for the experiment and publication here.

I don’t doubt many of you would have prompted quite differently, and I cringe a little bit about asking it to look at my work. But on the other hand if it is on this blog it might be odder if I didn’t shape it to be like something I would (largely) agree with.

This took 6 minutes for Deep Research to produce. It took me significantly longer to cut/paste and fix the formatting. It is almost certain that any formatting or hyperlink errors in the below are my mistakes, and weren’t in the original.

Here’s the output:

UK Strategic Defence Review 2025: A Bold and Unflinching Overhaul

Executive Summary

Britain’s 2025 Strategic Defence Review calls for radical change to ensure the UK armed forces can fight and win in tomorrow’s battles. This review makes bold decisions to overhaul force structure, embrace emerging technologies, and ruthlessly cut waste. Key imperatives include:

Pivot to a Kill-Chain Warfare Model: Reorganize around fast, networked kill chains – the ability to find, target, and strike enemies with unprecedented speed (ndupress.ndu.edu). The side with the fastest, most effective kill chain will dominate in modern war (ndupress.ndu.edu). This means linking every sensor to shooters via resilient networks, powered by AI, to compress decision cycles.
Aggressive Tech Integration (AI, Drones, Cyber, Space): Make AI and automation the backbone of UK defense. AI-enabled decision support and autonomous systems can outpace human reaction, giving a decisive edge (Bassett-Cross & Dear, 2024). From swarming drones to space-based assets, the UK must harness technologies that accelerate the kill chain and multiply lethality (ndupress.ndu.edu). We will not be a mere observer in the AI revolution – we will lead it.
Force Structure Overhaul: Build a leaner, highly lethal, networked force. We will field many more expendable, smart weapons (drones, loitering munitions, autonomous vehicles) and fewer expensive legacy platforms. This “Mosaic” or “Moneyball” approach returns mass and resilience to the battlefield by overwhelming adversaries with swarms of low-cost targets (hoover.org). China or Russia would face thousands of UK autonomous systems instead of a handful of inflexible units (hoover.org). Every unit and domain (land, sea, air, cyber, space) will be linked and capable of both sensing and shooting.
Hard Cuts to Legacy Systems: We will divest from systems that do not survive or deliver in future warfare. If a platform cannot contribute to a rapid kill chain or is too easily picked off by modern precision weapons, it will be reduced or eliminated. For example, heavy armored vehicles and manned artillery that are vulnerable to cheap drones and smart missiles have a diminished role (scsp.ai). Expensive programs that squander billions without delivering capability (e.g. past procurement “farces” (ukdefencejournal.org.uk)) will be halted. Every pound saved will be reinvested into high-impact capabilities.
Reshape Industry and Procurement: Break the cycle of slow, over-budget acquisition. We will create a fast-track “Unit X” style innovation pipeline to inject cutting-edge commercial tech into the Armed Forces (Unit-X). Bureaucratic red tape will be slashed – speed and outcomes will define procurement success, not process. We’ll partner with the UK tech sector (AI labs, startups, advanced manufacturers) just as Ukraine leveraged SpaceX for resilient comms in war (scsp.ai). A new “British DARPA” ethos will fund high-risk, high-reward projects and field prototypes within months, not years.
Disciplined Execution & Timeline: This transformation will happen on a wartime footing. Clear deadlines will drive change – no more decades-long programs that deliver too late (defensenews.com). Leadership will be held accountable for meeting milestones in force restructuring, tech deployment, and savings reinvestment. The pace of global threats (from China’s rapid military-tech advances (ndupress.ndu.edu) to Russia’s aggression) leaves no room for delay or half-measures.

Bottom Line: The 2025 Review delivers an unflinching plan to make Britain’s military fit for the future. It sacrifices the comfortable old ways in favor of bold action and innovation. By embracing new warfare concepts and slashing inertia, the UK will ensure it can deter, fight, and win against any adversary in the coming decades. The following sections detail the theory of victory and the specific changes required in force structure, technology, industry, and execution to achieve that end state.

Theory of Victory in the Future Battlespace

How will Britain deter and win wars in the 2030s and beyond? Our theory of victory centers on outpacing and outsmarting the enemy through superior networks, information, and agility. In an age of high-tech confrontation, victory will not come from sheer volume of troops or the longest-range gun – it will come from turning inside the enemy’s decision loop and denying them any easy gains. Key elements of this theory include:

Kill-Chain Dominance: We will achieve decision superiority by executing the kill chain (find-fix-finish) faster than any opponent (ndupress.ndu.edu). If we can see and strike the enemy before they can react, we win. This means integrating intelligence, surveillance, and reconnaissance (ISR) with shooters via AI-enabled command systems. For example, UK forces will fuse satellite imagery, cyber intelligence, and forward sensors in real time to direct precision strikes within minutes. In modern war games, forces that master this kind of rapid targeting “have nearly perfect records” of victory, whereas those who lag lose badly (ndupress.ndu.edu). Our theory of victory is to always be the side with the fastest kill chain.
Deterrence by Denial and Rapid Punishment: Britain will deny adversaries their objectives from Day 1 of a conflict. Borrowing from the Baltic states’ resolve, we will make any attempted attack immediately costly and likely to fail. The Baltic and Nordic countries have shown the importance of defending the entire national territory from the outset, rather than ceding ground (osw.waw.pl). Likewise, the UK will field forces (including resilient local defenses, long-range strike, and cyber units) that can blunt and punish an aggressor immediately. By having the capability to “find and finish” enemy units swiftly via networked fires, we ensure any aggressor knows they cannot achieve a quick, low-cost win (scsp.ai). This deters war by raising the risks to the initiator.
Multi-Domain Integration: Victory will rely on seamlessly combining effects across all domains – land, air, sea, cyber, and space. The UK’s theory of victory leverages synchronized operations: e.g. blinding enemy sensors via cyber or electronic attack, while precision missiles from air and sea strike critical targets, and autonomous drones saturate the battlespace. Each domain reinforces the others in a web of capabilities that an adversary cannot easily unravel. This system-of-systems approach mirrors the way Ukraine has fought off a larger foe by networking drones with artillery and space-based comms (ndupress.ndu.edu; scsp.ai). By attacking along every axis (physical and digital) faster than the enemy can respond, we achieve an “offset” – rendering their strengths (like massed tanks or ships) ineffective.
Continuous Innovation & Adaptation: A core pillar of our theory is that the first side to innovate wins. We face a rapidly evolving threat environment – notably the AI revolution in warfare. Our strategy is to lead technological change, not follow. As one expert put it, you need a “theory of winning” that anticipates the future and builds for it, rather than waiting for the future to arrive (Dear, Theory of Victory, 2024). This means the UK will constantly wargame and experiment with new concepts (from swarming munitions to AI battle management) and quickly adopt what works. We recognize that standing still means falling behind in the face of adversaries (like China) who are innovating at speed (politico.eu). Thus, adaptation and learning are built into our operations – our doctrine will evolve as fast as technology does.
Alliance Leverage and Niche Strengths: Finally, Britain’s theory of victory acknowledges we fight as part of alliances (NATO and partnerships in the Indo-Pacific). We will focus on niche advantages that complement allies – for instance, excelling in certain tech (cryptography, cyber offense, undersea warfare) or unique force packages – to punch above our weight. By contributing disproportionately valuable capabilities to coalitions, we help tip the overall balance in our favor. At the same time, we will not rely on others for critical gaps; the UK must be able to act independently in a pinch. That requires sustaining capabilities like a minimum nuclear deterrent and strategic lift, while modernizing conventional forces to be self-sufficient for homeland defense.

In summary, our theory of victory is about brain over brawn: using intelligence, speed, and innovation to defeat threats. We deter by convincing any adversary that the UK can out-target, out-think, and out-last them in conflict – making aggression futile. Should deterrence fail, we will fight as a nimble, tech-empowered David against any Goliath, confident that precision and networking can defeat brute force. This theory underpins the force design and investments outlined in the next section.

Force Structure Overhaul: Building the Future Force

The UK Armed Forces will undergo a radical force structure transformation to implement this vision. We are moving from a legacy force built for industrial-age warfare to a digital-age force optimized for distributed, network-centric operations. This overhaul prioritizes quality and connectivity over sheer size, and agility over heavy mass. Key changes include:

Smaller, Networked Combat Units: We will break our forces into agile, networked units that can operate semi-independently yet share data instantaneously. Battalion-sized formations will be augmented (or even replaced) by “unit cells” equipped with their own drones, sensors, and long-range fires, all linked via secure communications. Instead of large brigades needing massive logistics, many smaller units will swarm the adversary from multiple directions, complicating their targeting. This is inspired by concepts like the US “Mosaic Warfare” and the Ukrainian model of dispersed resistance, which show that many small, smart units can outperform a few big ones. Importantly, every unit will be capable of plugging into a larger combat cloud – a resilient network that shares targeting data and intelligence in real time.
Return of Mass through Autonomy: Paradoxically, we will increase “mass” (number of platforms and weapons) while downsizing manpower. The answer is autonomous systems in bulk. The UK will deploy swarms of low-cost drones, uncrewed vehicles, and smart munitions to regain numerical advantages that were lost when we relied only on a few high-value assets. A “Moneyball Military” approach aims to field quantity with quality – inexpensive, attritable systems that are networked together (hoover.org). In a conflict, dozens of autonomous drones might accompany each squad or ship, overwhelming enemy defenses by sheer numbers. Because each drone or robot is relatively cheap, losing some is acceptable – they impose disproportionate costs on the enemy by forcing them to waste expensive missiles or expose themselves to counter-fire (hoover.org). This strategy forces the enemy into a dilemma: if they don’t shoot, our swarms strike them; if they do shoot, they reveal their positions and exhaust their munitions on less valuable targets (hoover.org; hoover.org). We saw glimpses of this in Nagorno-Karabakh and Ukraine, where drone swarms decimated armour and artillery (scsp.ai). The UK will lead in fielding thousands of autonomous systems, from air-launched drones to uncrewed submarines, restoring combat mass and complexity in our favor.
Cutting-Edge “Kill Web” Architecture: The new force will be designed around a “kill web” – a web of sensors and shooters – rather than around traditional service silos. In practice, this means a Royal Air Force fighter might cue a British Army long-range missile launcher to fire, based on Navy drone surveillance data. All components act as part of a unified kill chain network. To enable this, we’ll invest heavily in C4ISR (Command, Control, Communications, Computers, Intelligence, Surveillance, Reconnaissance) that is secure and jam-resistant. Every platform – whether a tank, ship, aircraft or satellite – will be a node that can send and receive targeting data. The principle is “Every sensor, every shooter” – if one unit sees a target, any unit can kill it. This will require common data standards and intense training in joint operations. The payoff is enormous: a resilient kill web means even if one node is destroyed, the network fights on. It mirrors how Russia’s enemies demonstrated high-speed kill chains by pairing old artillery with new drone reconnaissance, achieving effects disproportionate to their resources (ndupress.ndu.edu). Our overhaul ensures the UK always fights as an integrated network, not separate service fiefdoms.
Highly Lethal, Niche Capabilities: Within this new structure, we will emphasize high-lethality units armed with next-generation capabilities. For example, long-range precision fires units will be expanded – including land-based rocket forces able to strike targets hundreds of kilometers away with precision. Investments in capabilities like hypersonic missiles and advanced anti-ship missiles will give even small UK units a big punch. We will also stand up dedicated “drone swarms battalions” or squadrons – units trained specifically to deploy and manage swarms of loitering munitions and autonomous drones for reconnaissance and attack. The Air Force will incorporate loyal wingman drones to fly alongside piloted jets, multiplying their effectiveness. The Navy will introduce more uncrewed surface and underwater vehicles to extend the fleet’s reach (for example, uncrewed vessels scouting ahead of manned ships or protecting the strategic nuclear deterrent submarines). Special Forces and cyber units will also grow, focusing on high-impact niche roles like strategic sabotage, counter-space operations, and AI-enabled psychological operations. In short, every part of the force is being re-tooled to hit above its weight class through technology and smart design.
Sustainable and Agile Logistics: A truly agile force overhaul must address logistics – often the Achilles’ heel. We will move away from ponderous supply convoys and vulnerable supply depots, towards just-in-time resupply, 3D printing at forward locations, and unmanned delivery (e.g. drones dropping supplies to front-line units). Smaller units with autonomous platforms also mean leaner support requirements (many drones are electric or simpler than crewed vehicles). The force will pre-position caches of munitions and spare parts in dispersed locations (including at sea on auxiliary ships or offshore platforms) to sustain high-tempo operations. We’re also exploring “resupply swarms” – autonomous cargo vehicles that can navigate to units under fire, meaning the fighting force can keep moving without waiting for traditional logistics. This agile sustainment model is informed by observing how Ukrainian units, using civilian quadcopter drones and local 3D-printed components, maintained their fight despite disrupted supply lines. For the UK, ensuring our kill chain and combat units remain effective in a protracted high-tech war means reinventing logistics with the same innovation we apply to combat units.

This force structure overhaul is unapologetically bold. It will remake the British military into a leaner, tech-heavy, networked force that is deadly and hard to defeat. We acknowledge that fewer soldiers or big platforms does not mean less power – on the contrary, if each remaining unit is backed by swarms of smart weapons and a superior network, our combat power and deterrence credibility will vastly increase. By doing this now, we get ahead of the curve of military transformation, rather than lagging behind and risking defeat (Dear, Theory of Victory, 2024).

Cuts and Reallocation: Ruthless Prioritization

To fund and empower the future force, the MoD will make tough, unsentimental cuts to programs and forces that no longer serve our strategic needs. This Review is explicit about what gets cut and why, breaking the cycle of spreading resources too thinly across legacy programs. Every cut is aimed at eliminating inefficiency or vulnerability and freeing up investment for higher-impact capabilities. Key decisions include:

Reducing Heavy Armor and Manned Artillery: The role of heavy main battle tanks and tube artillery will be scaled back. Lessons from recent conflicts show that such armored forces, if not networked and protected, become prey for modern precision weapons and drones (scsp.ai). Rather than maintain a large fleet of heavy tanks, the Army will retain a smaller core of upgraded tanks for niche uses (e.g. urban breakthrough or as protected fire bases) and redirect funding to mobile anti-armor teams, precision-guided missiles, and drone support. Self-propelled artillery units that lack range or survivability against counter-battery fire will be trimmed; emphasis shifts to long-range rocket artillery (e.g. HIMARS-type systems) and loitering munitions that can strike without a large signature. These cuts recognize that a $100k smart munition can destroy a $5 million tank, so we will invest in the $100k munition rather than add more $5m targets for the enemy. The traditional heavy cavalry is thus no longer a growth area – agility and range are the new firepower.
Mothballing or Selling Off Legacy Aircraft and Ships: We will retire platforms that are too expensive to operate and insufficiently adaptable. Some older Typhoon fighter jets, which will struggle against advanced air defenses and drones, will be phased out faster in favor of newer F-35s and unmanned combat aerial vehicles. Similarly, the Royal Navy will consider decommissioning aging frigates or minehunters that cannot be affordably modernized, reallocating their crews and budgets to frigates with advanced anti-air/anti-submarine capabilities and to uncrewed vessels. No “sacred cows” are exempt – if an asset does not contribute to the future kill chain or cannot be upgraded with networking and AI, it will be a candidate for reduction. This candor follows Dominic Cummings’ critique that MoD often “squanders billions” on legacy purchases that mainly enrich contractors (ukdefencejournal.org.uk). We will not repeat that mistake; platforms that don’t pull their weight will go.
Streamlining Personnel and Commands: A futuristic force needs a high-skill, lean workforce. The review calls for cutting administrative overhead and possibly reducing personnel in certain traditional roles, while increasing personnel in tech-focused roles. We anticipate trimming some senior ranks and HQ staff (“tail”) to protect investment in frontline “teeth.” For example, the Army’s total manpower might slightly decrease as some infantry battalions are stood down – but simultaneously we will grow units like the Ranger Regiment (for special operations), cyber forces, and technical cadres for drone operations. The net effect shifts human resources from quantity to quality. Additionally, duplicative command structures will be merged to force tri-service integration. A slimmer command structure also means faster decision-making and less budget spent on bureaucracy. These manpower reallocations ensure we have the right people (data scientists, drone pilots, AI engineers, etc.) instead of large numbers in legacy occupations that technology can augment or replace.
Ending Failing Procurement Programs: Perhaps most emblematic, we will not pour more money into programs that have shown they cannot deliver. The Ajax armored vehicle program, for instance, has been plagued by years of delays and technical problems – initially due in service in 2020, now pushed to 2025-2029 for full capability (defensenews.com). Such a timeline is unacceptable in a fast-changing threat environment. If ongoing reforms cannot quickly salvage Ajax, we will cut our losses and reallocate its £5+ billion budget to proven or emergent systems (like advanced infantry fighting drones or extra precision missiles for existing launchers). The same scrutiny applies to any big project running significantly over time or budget. Parliament and the public demand accountability: “Enough is enough” on programs that devour resources without output (defenceprocurementinternational.com). By cancelling or downsizing these, we free up funds for systems that can be deployed now or in the very near future.
Defense Estate and Legacy Infrastructure: We will also look at closing or repurposing military bases and facilities that were built for a force structure of the past. Maintaining excess real estate diverts funds from combat power. Consolidation of training sites, outsourcing of some non-core services, and energy-efficient base upgrades will save money over the next decade – those savings will be funneled into combat capabilities. Similarly, stockpiles of old munitions or equipment that are obsolete will be retired (with any useful items sold to allies or used for training), reducing storage and maintenance costs.

All savings from these cuts are ring-fenced for reinvestment into the new capabilities identified in this Review – there is no “peace dividend” being pocketed elsewhere. This aggressive reallocation turns slack and waste into fuel for innovation. We recognize these choices have impacts – jobs lost in some traditional sectors, sentimental attachments to historic regiments or ships – but the overriding criterion is combat effectiveness for the threats of tomorrow. As painful as change can be, the greater pain would be sending our troops to fight with the wrong tools. This Review chooses future readiness over past habits, firmly and without apology.
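
A back-of-envelope sketch of the cost-exchange arithmetic in the heavy-armor point above, using the review's own $100k and $5 million figures. The function names and the 25% kill probability are my assumptions for illustration only; they are not numbers from the review or from any source it cites.

```python
def cost_exchange_ratio(target_cost: float, munition_cost: float) -> float:
    """How many munitions could be bought for the price of one target."""
    return target_cost / munition_cost


def expected_cost_per_kill(munition_cost: float, kill_probability: float) -> float:
    """Expected spend to destroy one target, assuming independent shots."""
    if kill_probability <= 0 or kill_probability > 1:
        raise ValueError("kill_probability must be in (0, 1]")
    return munition_cost / kill_probability


MUNITION_COST = 100_000          # from the review: a "$100k smart munition"
TANK_COST = 5_000_000            # from the review: a "$5 million tank"
ASSUMED_KILL_PROBABILITY = 0.25  # hypothetical value, chosen only for illustration

ratio = cost_exchange_ratio(TANK_COST, MUNITION_COST)
per_kill = expected_cost_per_kill(MUNITION_COST, ASSUMED_KILL_PROBABILITY)
# Below this per-shot kill probability, the attacker spends more than the tank is worth.
break_even = MUNITION_COST / TANK_COST

print(f"Munitions per tank-equivalent of spend: {ratio:.0f}")
print(f"Expected cost per kill at p={ASSUMED_KILL_PROBABILITY}: ${per_kill:,.0f}")
print(f"Break-even kill probability: {break_even:.0%}")
```

On these figures the munition only has to succeed about one time in fifty for the exchange to favor the attacker on cost alone, which is the logic behind investing in the munition rather than the target. The sketch ignores launch platforms, training and the defender's countermeasures.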

Technology and Kill Chain Acceleration

Modern technology – especially AI and autonomous systems – is the center of gravity for future military power. To ensure the UK is not outpaced, we will launch an aggressive plan to integrate emerging technologies across all defense activities. This is about dramatically accelerating the kill chain: using tech to see, decide, and strike faster and more precisely than ever. Key initiatives include:

AI-Enabled Decision Making: We will embed artificial intelligence at every level of command, from strategic planning to real-time targeting. Advanced AI decision-support tools can analyze vast data (satellite feeds, intelligence reports, sensor readings) far quicker than human staff, cueing commanders to threats and options in seconds. For instance, AI systems will help identify targets in live drone feeds or predict enemy moves by analyzing patterns – tasks that took analysts hours can be done in near-real time. The goal is to compress the OODA loop (Observe–Orient–Decide–Act) to a degree our adversaries cannot match. Already, algorithms can beat humans in complex strategy games and handle massive information streams; applied to warfare, even today’s AI (not to mention tomorrow’s) would be “completely transformative” if fielded at scale (Bassett-Cross & Dear, 2024). As one UK defence advisor put it, if we adapted current AI breakthroughs for military use, it would revolutionize operations – yet nothing as advanced is currently deployed in UK defence (Bassett-Cross & Dear, 2024). We will change that at speed.
Autonomous Weapons and Platforms: Embracing autonomy is essential to winning the kill-chain competition (ndupress.ndu.edu). The review mandates rapid development and acquisition of autonomous weapons – from loitering munitions that can hunt targets independently, to uncrewed combat aircraft, robotic ground vehicles, and AI-driven naval vessels. These systems react faster than human-operated ones and can be risked in dangerous roles without fear of casualties. The ethical standard we will apply is pragmatic: an autonomous system doesn’t have to be perfect, it just needs to make better decisions under combat stress than a human would (ndupress.ndu.edu). Many autonomous platforms already meet that bar in specific tasks (e.g. automated point-defense guns can shoot down incoming missiles faster than human operators). We will deploy autonomous wingmen drones alongside fighter jets, autonomous sentry towers to guard perimeters, and swarm drones that can coordinate attacks on enemy air defenses. By 2030, a significant share of UK firepower will be delivered by uncrewed systems – dramatically speeding up the kill chain and saturating the enemy with threats.
Long-Range Precision Fires & Hypersonics: To strike first, we must strike farther and faster. The UK will invest heavily in long-range precision fires, including land-attack cruise and ballistic missiles, and will join allies in developing hypersonic missiles that can reach targets in minutes. This extends our kill chain across theater distances. Having the ability to hit an adversary’s critical assets (like bases or ships) from hundreds or thousands of kilometers away complicates their plans and keeps them off-balance. Japan’s recent decision to acquire long-range strike weapons (breaking a historical norm) shows the importance of this capability in deterring aggression (rusi.org). We will ensure the RAF and Royal Navy have inventories of conventional precision missiles that can be fired from air, sea, or land, giving Britain a global strike reach. In parallel, we are developing directed-energy weapons (lasers and microwave systems) to counter the enemy’s precision munitions and drones – adding a layer to our kill chain that intercepts incoming threats at the speed of light.
Resilient Networks & Battle Management: All these tech advantages hinge on robust, secure connectivity – our “nervous system” in war. We are going to harden and upgrade our networks using new communication satellites, encrypted tactical datalinks, and backup methods (like mesh networks formed by drones or balloons). The Ukraine war underscored this: having services like SpaceX’s Starlink ensured Ukrainian forces stayed connected even under heavy electronic attack (scsp.ai). We will similarly partner with commercial providers to augment military satcom and internet, ensuring our forces never go dark. Moreover, we’re fielding advanced battle management software (often AI-driven) to orchestrate forces in real time. Picture an AI “chief of staff” that helps a commander rapidly assign targets to the optimal shooter – that is the kind of system we are deploying to knit together our kill webs. Cyber defense and electronic warfare improvements also come under this umbrella: our networks must resist jamming and hacking. In exercises, we will practice “fighting fractured” – assuming networks are contested – to refine the tech and tactics for staying linked under fire. A resilient networked force can coordinate kill chains even in chaos, whereas an adversary with disrupted comms will be paralyzed.
Military Application of Emerging Tech (Quantum, Biotech, etc.): Beyond AI and autonomy, we’re scanning the horizon for any technology that could provide a war-winning edge. For example, quantum computing and sensing could revolutionize intelligence (cracking codes, detecting submarines) – we’re investing in R&D now so we aren’t surprised later. Biotechnology advances might enhance soldier performance or create new materials for defense; we’ll watch those closely too. The key is an open mindset: as one defense innovator stressed, you “don’t wait for the future to arrive – imagine it and work backwards to build what you need” (Dear, Theory of Victory, 2024). In that spirit, we’ve established “Unit X” style incubators and sandboxes to quickly test cutting-edge ideas from labs or startups. Whether it’s drone swarms guided by neural networks or augmented reality helmets for troops, if it can sharpen our kill chain or make our force more lethal and survivable, we will experiment with it immediately. The MoD will take calculated risks on prototypes and pilot programs, knowing some will fail – but those that succeed could be game-changers.

All these efforts combine to accelerate the kill chain – the time from detecting a threat to eliminating it. We’re striving for what the U.S. calls “decision advantage” and “dynamic strike” capabilities, scaled to a nation of Britain’s means. Crucially, this isn’t just about buying gadgets; it’s about integrating them into doctrine and training. We will train AI-assisted operations and human-machine teaming as a norm. Troops will learn to trust and effectively employ autonomous systems. Commanders will learn to delegate certain tasks to AI and focus on the bigger picture. By vigorously pursuing these tech advances, the UK will not only keep up with but in some areas outpace the Pentagon and PLA in military innovation. Indeed, British defence thinkers have warned that if we don’t harness AI and emerging tech, we risk becoming an “AI colony” of those who do (politico.eu). This Review ensures Britain remains a leader, not a follower, in wielding the technologies that are reshaping warfare.

Industrial and Procurement Revolution

Achieving all the above requires a fundamental change in how we develop and acquire military capability. The traditional UK defence procurement system is too slow, too risk-averse, and too insular to deliver the needed innovation on time (ndupress.ndu.edu). We are therefore unleashing an Industrial and Procurement Revolution to rapidly translate money and ideas into fielded capability. Key components of this revolution are:

Embrace a “Unit X” Model: The MoD will establish a new high-level Defence Technology Accelerator – effectively a UK version of the U.S. Defense Innovation Unit (codenamed “Unit X”). Its mission: bridge the gap between the military and the tech sector (Shah & Kirchhoff, 2024). This organisation will be empowered to bypass normal bureaucracy and work with non-traditional suppliers (startups, academia, big tech firms) to solve military problems with cutting-edge tech. Like a venture capitalist, it can rapidly fund and trial solutions, focusing on speed and outcomes over process. Unit X in the US was able to pull technologies like AI drones and microsatellites into use much faster than traditional programs (Shah & Kirchhoff, 2024). We will do the same, making the UK a welcoming partner for innovators. This also feeds our local tech industry – rather than relying only on a few big defense primes, we’ll tap the creativity of Britain’s wider science and technology base. In an era when China practices “Military-Civil Fusion” – requiring all companies to support the PLA (Shah & Kirchhoff, 2024) – it’s imperative we mobilize our private sector genius for national security as well. The old walls between “Silicon Roundabout” (UK’s tech hub) and Whitehall must come down.
Speed Over Perfection: A key cultural shift in procurement will be adopting a “fast enough to fail, fast enough to fix” mentality. Rather than spend 15 years trying to develop the perfect platform (only for it to be obsolete on arrival), we will aim to get 80% solutions out in the field in a fraction of that time, then continuously improve them. This is the model of software and tech innovation, and it needs to be ours as well. For example, we may launch a new drone program with off-the-shelf components to deploy within 1–2 years, then iterate newer versions every year thereafter. In practical terms, we will streamline requirements (avoiding the deadly creep of adding too many features that delay a project) and use agile development methods. Prototyping and experimentation will count as success, even if some prototypes don’t make it – what matters is learning quickly. We will also exploit the global defense market: if a foreign partner has a system that meets our needs now, we’ll buy or license it rather than reinvent the wheel. The recent creation of an £800 million UK “ARPA” (Advanced Research Projects Agency) is a step in the right direction (wired.com); we will leverage it to fund breakthrough ideas and accept that some projects will fail on the way to big successes. The Net Zero we seek is not carbon (in this context) but zero time wasted in fielding critical tech.
Overhaul Acquisition Processes: We will implement the recommendations of numerous studies and committees to fix MoD procurement. This includes cutting unnecessary layers of approval, setting hard time limits for each stage of a program, and holding project managers accountable for agile delivery. For instance, if a project hasn’t shown usable progress in, say, 2 years, it will trigger an automatic review to reformulate or cancel it. We’ll also reform defence contracting to incentivize speed: contracts will include bonuses for early delivery or innovation, and penalties for excessive delay. Smaller lots and spiral development will be favored over mega-projects that aim to do it all in one go. Additionally, we will seek to “buy before we build” – meaning evaluate commercial technologies first. The old approach saw MoD trying to custom-build even basic tech, whereas now if a British company or ally already has a solution (e.g. a proven AI analytics platform or a drone design), we will adopt or co-develop it, shaving years off schedules. Another aspect is improving how we test and evaluate new tech – setting up sandbox environments (including digital twins and war-gaming with virtual prototypes) so that we can rapidly validate concepts and move them to deployment.
Allied Collaboration and Open Markets: The UK will double down on collaborative programs with close allies to share costs and speed up innovation. We’ll actively participate in joint projects like the Global Combat Air Programme (GCAP) for next-gen fighters and look to co-develop autonomous systems with the US, Australia, and others. When appropriate, we’ll adopt systems that allies have pioneered – for example, if the US fields thousands of “attritable” drones under their new initiative (hoover.org), the UK might piggyback on that production run to equip our forces quickly. We’ll also open our procurement more to allied vendors, ensuring we get the best tech globally while also giving UK firms opportunities in allied projects. Small NATO members (like the Baltics and Nordics) have been nimble by buying proven systems off the shelf – we will learn from them. For instance, Estonia rapidly acquired loitering munitions and anti-tank weapons to boost deterrence instead of launching drawn-out domestic programs. Japan, in its reform, is similarly reaching out for international partnerships to bolster its defense industry and capability base (armscontrol.org; rusi.org). The UK will make its defense market more competitive and outcome-focused, ending the bias toward a few big suppliers who often underperform. This competitive pressure, together with allied cooperation, will spur better results faster.
Military-Civil Fusion (Democratic Style): While we reject authoritarian China’s methods, we acknowledge the benefit of closer integration between military needs and civilian innovation. Our approach is to create a virtuous cycle between the armed forces and Britain’s world-class universities and tech companies. We will sponsor research in AI, quantum, materials science etc., with defense applications in mind, and also spin-off military-developed tech to civilian use (strengthening the economy). A key step is talent exchange: we’ll encourage secondments of tech experts into MoD and military officers into tech firms or research labs, so cross-pollination happens. The faster pace of private sector tech development must be infused into defense. Notably, Ukraine’s war effort was greatly aided by civilian technologists and apps – e.g. a civilian smartphone app was used to crowdsource targeting intel, which AI helped verify for the military (scsp.ai). The UK can similarly leverage civilian ingenuity for national defense (in crisis or peace) by having the right collaborations in place ahead of time. This blur of civil-military lines (in a positive sense) means when a need arises – say rapid production of drones or setting up a secure national network – the expertise and capacity of the entire nation can be brought to bear, not just the traditional defense industry.

In sum, this Industrial and Procurement Revolution frees us from the “optimized for risk-aversion” acquisition system that has held us back (ndupress.ndu.edu). We will be faster, more agile, and more connected with the civilian tech base. The result will be that our forces get the kit they need when they need it. Britain’s inventors and entrepreneurs will find new opportunities to contribute to national security, and the taxpayer will get better value as we avoid the costly fiascos of the past. As SCSP experts noted, we have all the resources, talent, and legal freedom needed to transform – it’s about creating the right incentives and breaking the inertia (ndupress.ndu.edu). This Review commits to doing exactly that, so the pipeline from concept to combat is lean and lightning fast.

Implementation and Execution Discipline

A bold plan means nothing without flawless execution. The Review thus concludes with a concrete roadmap and mechanisms to ensure these changes are realized with urgency and rigor. This is a call to action, not a theoretical paper. Key aspects of our implementation strategy include:

Aggressive Timeline and Milestones: We will not allow this review to sit on a shelf. An implementation task force will oversee an aggressive timeline. Within 6 months, detailed action plans for each major decision (force cuts, new investments, procurement changes) will be approved by ministers. Within 1 year, we expect to see initial operational changes – e.g. at least one brigade equivalent restructured into the new model, one or two autonomous systems units formed, and legacy systems retirements initiated. By 2025’s end, the MoD should have reprogrammed funding in line with the review priorities (moving money from the cut programs to the new ones). By 2030, we aim for the bulk of the force structure changes to be complete: at least 50% of army combat units reorganized and equipped with new-generation systems, initial operational capability of key AI decision-support in headquarters, and several autonomous weapon squadrons active. We will publish annual progress reports to Parliament to ensure transparency. This wartime-like schedule is necessary – as one US defense leader said, we are in a decisive decade where delays could be perilous (scsp.ai). If the Pentagon is seeking to field thousands of drones in 1–2 years (hoover.org), we too must think in months and very few years, not decades.
Leadership Accountability: Execution will be led from the top. The Defence Secretary will personally chair quarterly “Transformation Council” meetings with Service Chiefs and relevant industry heads to track progress. We will assign named senior champions for each major initiative (for example, a three-star officer as the AI Integration champion, another for Autonomous Systems deployment, etc.). Their task is to cut through red tape and deliver results – their performance will be evaluated on hitting the milestones. If obstacles arise (budget overruns, technology hiccups), they are empowered to find solutions or request rapid decisions from the Council. We will instill a culture where not delivering on reform is not an option. The era of military and civil service leaders being rewarded for maintaining the status quo is over; now the incentives align with driving change (ndupress.ndu.edu). Should there be resistance within the bureaucracy, the mandate from this highest level is clear: get on board or get out of the way. We will also engage external auditors (like the National Audit Office and independent experts) to validate that progress is real, not just on paper.
Wargaming and Iteration: To ensure the changes are on the right track, we will continuously test and adjust our plans through wargames, simulations, and exercises. Almost immediately, we will run a series of “future force” war games that pit the envisioned new force (with drones, AI, etc.) against high-end adversary scenarios (e.g. a Baltic defense scenario against Russia, a South China Sea scenario with allies against China). These games will highlight strengths to double down on and identify any gaps or concepts that aren’t working. We’ll iterate the force design accordingly – this is the beauty of starting changes now; we have time to adjust before a real war happens. The lessons from Ukraine’s current war and other conflicts will be continually incorporated – for example, if new electronic warfare tactics emerge, we’ll adapt our drone comms on the fly. This adaptive approach guards against the plan becoming stale or misaligned; we expect the plan itself to evolve (within the Review’s intent) as we implement, guided by empirical evidence of what works.
Resources and Budget Protection: Implementation requires funding. The government is committed to shielding and even increasing the defense budget to resource this review’s ambitions. We are aiming for defence spending nearer to 3% of GDP, following the example of Baltic states that realized higher investment is required in the new threat era (osw.waw.pl). However, even within current budgets, the big reallocation we outlined (cuts to old programs) frees up billions for reinvestment. We will make sure those freed funds do not slip away to other departments or get lost – they are earmarked for defence modernization. A dedicated “Future Capabilities Fund” will hold savings and channel them to priority projects (with Treasury oversight to ensure it’s spent wisely). Essentially, we’re self-funding much of the reform by cutting waste. If additional funds are needed for key leap-ahead technologies (like hypersonics or a sovereign AI capability), we will make the case to Parliament forcefully – noting that our very security and relevance are at stake. The opportunity costs of inaction (or slow action) far outweigh the costs of this transformation.
Cultural Change and Training: Finally, we recognize that new hardware and org charts won’t succeed without a matching mindset change in our people. We are launching a concerted effort to instill a culture of innovation and adaptability in the MoD and forces. This means educating personnel about the nature of future war – helping them understand why we must change and giving them ownership of the new ideas. We’ll expand technology training for all ranks (so a young officer or NCO is comfortable using AI tools or employing drones creatively). Career paths will be modified to reward those who develop tech skills or drive innovation in their units. We’ll also address institutional inertia by rotating fresh talent into key roles – including bringing in external experts (e.g. data scientists on short service commissions) to challenge groupthink. As Kenneth Payne observed, bureaucratic inertia is a big hurdle and current organizations aren’t well-suited to the AI-driven world (kennethpayne.uk). We will tackle this head-on, making adaptiveness a core value. One ethos we’re promoting is “Don’t Blink” – a mantra coined by UK analysts imploring defence leaders to never look away from the rapid progress of AI and tech lest we fall behind (Bassett-Cross & Dear, 2024). In practice, this means our military education will include futurist scenarios, our exercises will feature new tech, and our promotions will favor those who champion improvement.

In implementing this ambitious agenda, we take inspiration from historical successes – such as the rapid mobilization and innovation in WWII, or the way some smaller nations today have leapt ahead by necessity. We are essentially forcing a peacetime revolution in defense. As Brose noted, militaries often only change under the shock of war, but the wise ones change because they see the threat looming (ndupress.ndu.edu). The “something worse than change” for us is clearly the prospect of the UK losing relevance or losing a conflict (ndupress.ndu.edu). We refuse to let that happen. Thus, execution will be relentless and unsparing. If an aspect of the plan falters, we will acknowledge it and fix it – not cover it up. If certain legacy stakeholders complain, we will listen but not be deterred from the mission. The end state we drive toward is a British military by 2030 that any potential foe knows is modern, deadly, and united – a force that can project power alongside our allies or defend the homeland against any threat, with no ifs or buts.

Conclusion: This Strategic Defence Review 2025 has pulled no punches. It spells out a transformation that is aggressive in pace and scope – because the world demands nothing less. We have identified where Britain must pivot: onto the high ground of AI and autonomy, into the new realm of kill-chain warfare, and away from the comforts of the past. We have confronted the hard trade-offs: cutting cherished programs and old force elements so that we can build the capabilities truly needed for future battles. And we have laid out how to get there, fast and sure. In doing so, we are guided by examples from forward-leaning allies and by sobering lessons (both of our own missteps and of wars unfolding in real time).

The message to all of Defence is clear: innovate and adapt, or become irrelevant. With this review, Britain chooses to adapt – aggressively, continuously, and with vision. We choose to be a leader in the revolution in military affairs, not a laggard. Our aim is that by the late 2020s, the UK armed forces will unambiguously be ready for the threats of tomorrow – a deterrent credible enough to keep the peace, and if needed, a force agile and potent enough to win swiftly. There is no room for half measures or delays. As this review has stressed, the time to act is now; the race for military relevance is on, and Britain is determined to win it (Bassett-Cross & Dear, 2024; politico.eu).

Britain’s soldiers, sailors, airmen, marines, and cyber warriors deserve nothing less than the best tools and a winning plan. This Strategic Defence Review provides exactly that – a ruthlessly focused blueprint to ensure that, come what may, the UK will dominate in the future battlespace.

So What?

Alright, tagging myself back in here. What should we make of this?

First, before you decide what you think, remember the standard you are comparing this to is twofold:

How far does this fall short of a perfectly written review? and
How much better is it than, or if you prefer, how does it compare to, previous Defence Command Papers (2021, 2023)?

It clearly falls well short of ‘perfection’, whatever that might mean to you. My contention would be that it is, in almost every way, better than the two most recent Defence Review papers.

I recognise it over-indexes on my own papers, but then I asked it to. That said, it offers recommendations I disagree with: this is not just the regurgitation of others’ work, or at least, no more so than something a human would write.

To take a few examples of things I don’t agree with: if it were me, I would be committing UK Defence to phasing out all crewed aircraft before 2030, to big investments in AI, and to the direct pursuit of AGI, at all levels of Command and Control. I’m not convinced it makes sense to have separate Services in an age of AGI, a subject, maybe, for a future post. There’s nothing here on the need to reshape the MOD Operating Model. And it has dropped recommendations from the two earlier versions it produced that I thought were excellent, for example:

Accountability and Affordability: The British public and Parliament rightly demand that any extra defence spending is used wisely. To ensure accountability, the Review recommends:

Annual progress reports to Parliament on the implementation of SDR recommendations and how budget increases are translating into outputs (e.g., number of new units formed, equipment delivered, readiness metrics).
Strengthening the role of the National Audit Office (NAO) and parliamentary committees in scrutinizing major defence projects, to catch issues early.
Possibly establishing an independent “Cost Realism” panel with external experts who review MoD procurement plans to ensure costings are accurate and efficiencies are on track – adding credibility to budget requests.
Embracing greater transparency where possible on defence plans, to build public trust that increased investment is necessary and effective.

In short, it is not the Review I would have written. But the real point here is that this could be written by ChatGPT in 6 minutes. When the Review was announced in July 2024 this would have been impossible. It is worth recalling that this (too) is the worst AI you’ll ever use. By the time the Review is published it will be more capable. By the time we begin to implement the Review’s recommendations it will be better still.

Two weeks ago, researchers from around the world released a collaboration called ‘Humanity’s Last Exam’***. At the time the best AI model scored 9% in the test. Two weeks later, DeepResearch scored 26.6% (a 183% improvement in two weeks!).

Image via @kimmonismus on X.
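The percentage jump on Humanity’s Last Exam is simple to check. The sketch below is only an illustration: the function names are hypothetical, and the 9% baseline is the rounded figure quoted above. Taken at face value, 9% to 26.6% works out nearer 196%; the quoted 183% would correspond to a starting score of about 9.4%, which is an inference from the arithmetic rather than something stated in the post.

```python
def relative_improvement(old_score: float, new_score: float) -> float:
    """Fractional improvement of new_score over old_score (1.0 == +100%)."""
    if old_score <= 0:
        raise ValueError("old_score must be positive")
    return (new_score - old_score) / old_score


def implied_baseline(new_score: float, improvement: float) -> float:
    """Starting score that would produce the given fractional improvement."""
    return new_score / (1.0 + improvement)


# Scores as quoted in the post (percent correct on the benchmark).
quoted_baseline = 9.0   # rounded figure from the text
new_score = 26.6

# Improvement if the baseline really was exactly 9%.
print(f"{relative_improvement(quoted_baseline, new_score):.1%}")

# Baseline that would make the quoted 183% improvement exact.
print(f"{implied_baseline(new_score, 1.83):.1f}")
```

Either way, the score roughly tripled in a fortnight, which is the point being made.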

This week, Metaculus’ forecast for the arrival of weak AGI moved from 2027 to 2026.

The conclusion should be that the MOD must address the topic of AGI in this Review. It would be a dereliction of duty to ignore it. And maybe this is Humanity’s Last Defence Review too.

Addendum: a further thought. You can read in my prompt that I fed Deep Research the Review’s Terms of Reference. I then added further prompts and guidance. So too in real Reviews. The reality of policy making is there will be a lot that isn’t said in the public TORs that is vital. Behind closed doors new constraints are added explicitly or implicitly but never recorded. Self-censorship takes place, where what is written is written to please seniors and stakeholders - written for what might get approved, sometimes at the expense of what needs to be said and done.

In 2019 I wrote ‘AI and Decision-Making’ for the RUSI Journal, describing how AI would drive rigour into decision-making. I argued it would force humans to make the implicit explicit, with the logic, evidence and data on which decisions were made laid open for inspection. Which is true, and evident in the AI-written Review, via the prompt, the reasoning, and the footnotes/sources. But it is more than this: it also shows how the AI-written Review was framed, what Deep Research was told it could and couldn’t say. What tone it should take. What it should seek to achieve. There’s an honesty to this that we don’t have in human decision-making systems. I’m not sure what it means for future AI-led reviews. There would still be the Freedom of Information Act problem: if you were to make everything public, including the Review’s ‘prompts’ as it were, all of its Terms of Reference, you’d end up encouraging more self-censorship from the humans involved. But given this would be across the board for all policy, maybe it will be part of a much wider change in how we manage and consider information.

*you can find my conversation with ChatGPT’s DeepResearch, that led to this output, here.

**for more on AGI and the Review, you might be interested in my earlier piece on this ‘In Athena’s Arms’.

*** What is Humanity’s Last Exam? The paper describes how “High accuracy on HLE would demonstrate expert-level performance on closed-ended, verifiable questions and cutting-edge scientific knowledge, but it would not alone suggest autonomous research capabilities or “artificial general intelligence.” HLE tests structured academic problems rather than open-ended research or creative problem-solving abilities, making it a focused measure of technical knowledge and reasoning. HLE may be the last academic exam we need to give to models, but it is far from the last benchmark for AI.”

A Robot Writing a Report on UK Defence

2029: the AGI Election?

Keith Dear — Wed, 22 Jan 2025 23:17:07 GMT

Mark Zuckerberg said recently that ‘…probably in 2025, we at Meta, as well as the other companies that are …working on this, are going to have an AI that can effectively be a sort of mid-level engineer that you have at your company, that can write code. … a lot of the code in our apps…including the AI that we generate, is actually going to be built by AI engineers instead of people engineers’.

OpenAI’s Chief Product Officer, speaking to the Wall Street Journal at Davos, was asked to comment on Anthropic CEO Dario Amodei’s claim, made 24 hours earlier, that 2027 will be when AI is better than humans at everything, or most things. Responding, he said that AI went from ‘…the millionth best coder to the thousandth best coder to the 175th best coder in 3 to 4 months… we’re on a very steep trajectory here… I don’t even know if it’ll be 2027’.

A week earlier Bloomberg reported that it expects ‘Global banks will cut as many as 200,000 jobs in the next three to five years as artificial intelligence encroaches on tasks currently carried out by human workers.’

This week (19 Jan) Paul Schrader, writer of Taxi Driver, asked ChatGPT for ‘Paul Schrader script ideas’ and noted it had better ideas than him. He sent it a script he’d written years ago and asked for improvements and, he tells us, ‘In five seconds it responded with notes as good or better than I’ve ever received from a film executive’. Reflecting on this he mused ‘I’ve just come to realize AI is smarter than I am. Has better ideas, has more efficient ways to execute them. This is an existential moment, akin to what Kasparov felt in 1997 when he realized Deep Blue was going to beat him at chess.’

His words echo those of Lee Sedol, multiple world champion in the strategy game Go, who retired in 2019 saying ‘With the debut of AI in Go games, I've realised that I'm not at the top even if I become the number one through frantic efforts…’

I’ve long suspected that many journalists have realised this too, and it is one of the reasons reporting in the Times, the FT etc is so consistently focused on AI’s errors and failings, and not the breathtaking progress in the field. Contrast the economist Tyler Cowen and the pseudonymous AI expert ‘Gwern’. Both explicitly acknowledge they are writing not so much for their human readers as for the AIs. As Cowen puts it, ‘If you wish to achieve some kind of intellectual immortality, writing for the AIs is probably your best chance.’ Making your mark before being eclipsed.

But regardless, by 2029, a great many people are going to feel like Kasparov in 1997, Lee Sedol in 2019, and Paul Schrader in 2025.

Harvard Business Review research published in November suggests something I have long suspected: AI is likely to take more jobs than it creates. ChatGPT has reduced the number of jobs posted for online gig workers ‘…Writing jobs were affected the most (30.37% decrease), followed by software, app, and web development (20.62%) and engineering (10.42%).’ Similarly, in recent months Klarna, the Swedish FinTech, stopped hiring, and started shrinking its workforce as AI increased the company’s productivity. Salesforce stopped hiring software engineers on the basis it had achieved a 30% productivity increase driven by its AI tools.

It’s not hard to see why this is. Take a look at this illustration from @daveshapi on X:

And remember, from my last post, that we are now on OpenAI’s o3 model, which is better by orders of magnitude again.

I suspect that if you don’t subscribe to, and routinely use, Claude or ChatGPT or Google’s Gemini, you probably don’t realise how good AI has got. This week it has helped my financial planning, allowing me to ask questions of it based on bank statements, and build projections. It has helped in the gym, ranking my performance and advising on specific exercises to help with injuries or for faster gains. It produced rapid briefings on people I was due to meet. It knocked up a podcast about a company I was to meet, which I could listen to en route to meeting them, from just a handful of .pdfs on their products and services found online. It has helped reduce food waste – a photo of the fridge and store cupboard along with a quick request for help with recipe ideas yielding several (delicious!). It has checked out financial forecasts for my new business, refined words for a talk, helped me find sources, assisted me in understanding research papers and enabled me to ask quick questions on concepts I am shaky on while completing an online learning course. In business planning, I think I can scale without hiring, using AI Agents to do many tasks – including some of the most creative and interesting. It regularly refines the text I write and publish. It suggests ideas. …I like to think I have some smart friends (obviously, I never tell them this) but I don’t have any that could provide advice over such a wide range of topics, so quickly.

Consequently, I think we will see jobs lost to AI throughout the remainder of the current Government’s time in office – predominantly middle-class, knowledge-worker jobs. The forecasts I made in ‘In Athena’s Arms’ on AGI look increasingly likely to have underestimated the speed of AI progress, given the widespread reports that a number of the leading labs are very confident they have achieved, or shortly will achieve AGI. Earlier this month (6 January) Sam Altman wrote:

We are now confident we know how to build AGI as we have traditionally understood it. We believe that, in 2025, we may see the first AI agents “join the workforce” and materially change the output of companies. We continue to believe that iteratively putting great tools in the hands of people leads to great, broadly-distributed outcomes.

We are beginning to turn our aim beyond that, to superintelligence in the true sense of the word. We love our current products, but we are here for the glorious future. With superintelligence, we can do anything else. Superintelligent tools could massively accelerate scientific discovery and innovation well beyond what we are capable of doing on our own, and in turn massively increase abundance and prosperity.

Then he said hype around AGI was overblown. And then, alongside Oracle’s Larry Ellison, and Masayoshi Son of Softbank, announced the $500bn Project Stargate, probably named after the 1994 film where the Stargate was a portal that allowed advanced alien intelligence into our world. This is the Manhattan Project for AI, and like the Manhattan Project, it is being undertaken because those behind it believe the investment will yield disproportionate advantage to the US’s global power.

A massive increase in US global power is not some abstract thing that won’t affect UK elections. It is likely to mean the US dominating the world economy to a far greater degree, and beginning to colonise the solar system and beyond, bringing back resources on a scale that makes the wealth of terrestrial empires look trifling. This is a world where the breakthroughs in science and technology, and the new products and capabilities they give birth to, are concentrated in a single country. It is a world where we in the UK are relatively poorer, and perhaps absolutely poorer as our existing industries are decimated. Maybe it is a world where our best option is to join the list to become the 51st State.

This week also saw the release of the Chinese-developed DeepSeek model, which seems to be performing close to the leading existing models but is completely open source. It costs far less to train – reportedly $5.6 million compared with $78 million for OpenAI’s GPT-4o – and you can access it for free, or download the model from GitHub with an MIT licence and it’ll run on your laptop. You can modify it, do what you want with it. Should this model keep up in development with the US models, proliferate and outcompete the big US AI houses, it would allow the UK to harness the AI revolution without the fear that the US might lock us out. But now we would have to worry that China constantly had the better models, and we’d have to worry about the model having security vulnerabilities built into it. Even without these concerns, the continued development of Open Source AI at the current rate of progress would still see AI models become a substitute for cognitive labour: knowledge workers would not be displaced, taking on new roles with new skills, but replaced, with no other job available in which they might viably work, since all cognitive tasks are done better by machines.

Take another (kinda) more optimistic take from LinkedIn Founder Reid Hoffman, who sees AI ushering in a world where:

• Everything is cheap, reducing the need to work.
• AI makes services e.g. legal and medical dirt cheap.
• Focus shifts to ideas, not hard work.
• Anyone can start a business with AI tools.

In this imagining, it seems likely that the returns to human intelligence would increase significantly, with those able to frame problems and come up with new ideas becoming radically wealthier, while those less able become unemployable, replaced by armies of AI agents working for the brightest in society. Wealth inequality, social cohesion, even individual agency for many people, would be more severe challenges than we have known before.

There are many, and I am among them, who think runaway economic growth is likely to follow the advent of AGI. Goldman Sachs forecasts growth from Generative AI at 15%. For AGI we would have to assume greater economic effects. I think diffusion will be held back by resistance in the large and generally sclerotic bureaucracies of big companies and modern nation states. It is likely that the company you work for, and the country you live in, is underestimating what is coming.

But growth will still come, driven by start-ups and scale-ups, by a few large early adopters in business, and in countries where Governments take a longer, more strategic view. As a result, I think growth after AGI is likely to be greater than during the Industrial Revolution (between 1870 and 1900 UK GDP per capita rose by 50 per cent) and much faster in our globalised, interconnected world. There will be many more losers than winners in this revolution. A rising tide that submerges many as it lifts just a few boats.
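For a sense of scale, the 50 per cent rise in UK GDP per capita between 1870 and 1900 can be turned into an implied annual rate. This is a minimal sketch: the function names are hypothetical, and the 5% annual rate in the second call is purely an illustrative assumption, not a forecast from the text.

```python
import math


def annualised_rate(total_growth: float, years: int) -> float:
    """Constant annual growth rate that compounds to total_growth over years."""
    return (1.0 + total_growth) ** (1.0 / years) - 1.0


def years_to_grow(total_growth: float, annual_rate: float) -> float:
    """Years needed to achieve total_growth at a constant annual_rate."""
    return math.log(1.0 + total_growth) / math.log(1.0 + annual_rate)


# 50 per cent over 1870-1900, as cited in the text.
historic = annualised_rate(0.50, 1900 - 1870)
print(f"Implied annual rate, 1870-1900: {historic:.2%}")

# Hypothetical: how quickly the same 50 per cent arrives at 5% a year.
print(f"Years to the same rise at 5% a year: {years_to_grow(0.50, 0.05):.1f}")
```

The historic figure comes out at well under 2% a year, so "greater and much faster" growth does not require exotic numbers to outpace it.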

We could take a bearish look at what AGI might mean for the economy, preferring Tyler Cowen’s claim that AI ‘…will boost the rate of economic growth by something like half a percentage point a year.’

Cowen notes that ‘Over 30-40 years that’s an enormous difference. It will transform the entire world.’ Which is true, but I don’t think we’ll need to wait that long to see politics transformed by it, even on this modest estimate of AI’s impact. If we miss out on, say, an additional +2% economic growth over the period 2025-2029, against our anaemic growth performance (forecast at 0.9%), the UK election in 2029 will be the AGI election. Unemployed, replaced workers will be voters in a weaker and poorer Britain. They will ask: why didn’t you see this coming? I’m not sure how the UK’s political parties will respond. I am sure they will be doing their election planning, policy planning, manifesto writing, and executing their election campaigns with a heavy reliance on, perhaps following the instructions of, AI. If they aren’t, they’ll lose to those who are. Just like the rest of us.
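The compounding behind both claims can be sketched in a few lines. Everything here is an illustration under stated assumptions: the function name is hypothetical, the 2% baseline for the long-run case is an assumed figure, the ‘+2%’ is read as two extra percentage points of growth each year, and 2025-2029 is treated as four years of compounding.

```python
def relative_size(base_rate: float, extra_rate: float, years: int) -> float:
    """Size of an economy growing at base_rate + extra_rate, relative to one
    growing at base_rate alone, after the given number of years."""
    boosted = (1.0 + base_rate + extra_rate) ** years
    baseline = (1.0 + base_rate) ** years
    return boosted / baseline


# Half a percentage point a year over 30 and 40 years.
# ASSUMPTION: 2% baseline growth, chosen only for illustration.
for horizon in (30, 40):
    gap = relative_size(0.02, 0.005, horizon) - 1.0
    print(f"{horizon} years at +0.5pt: economy about {gap:.0%} larger")

# Missing two extra points a year against the 0.9% forecast in the text.
# ASSUMPTION: four years of compounding between 2025 and 2029.
uk_gap = relative_size(0.009, 0.02, 4) - 1.0
print(f"4 years at +2pt: economy about {uk_gap:.0%} larger")
```

On these assumptions the long-run gap runs well into double digits, and even the four-year gap is large enough to be felt by voters within a single Parliament.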

Eyes Wide Shut. AGI in Plain Sight.

Keith Dear — Sat, 21 Dec 2024 10:47:49 GMT

You won’t read this in the Times. If you are not on X, you might not even know it. But yesterday, OpenAI released its o3 model, and achieved 76% on the ARC-AGI (Abstraction and Reasoning Corpus for Artificial General Intelligence) test. This is a series of tests deliberately designed as ‘the only formal benchmark of AGI progress’. In this conception, an AGI is a system able to efficiently acquire new skills and solve open-ended problems. ARC was designed by those who believed LLMs would be unable to adapt to problems they hadn’t been trained on, couldn’t cope with novelty and relied on memorisation, rather than being able to reason a response. They distinguished between intelligence – the ability to learn to solve new problems, and task-specific skill – memorising the solution to a problem and then applying it. ARC was designed to test the former: the ability to solve novel problems and to reason. Here’s how previous models had performed (via @kimmonismus on X) on the test, and how o3 now does:

As a result, many are claiming yesterday was the day AGI arrived. As Professor Ken Payne put it to Al Brown and me recently, the main sound one hears in response is the scrape of moving goalposts. No doubt many will find reasons to say this is not AGI. François Chollet acknowledged that this was a significant scientific breakthrough and step-change in AI capabilities, but also that

‘Passing ARC-AGI does not equate to achieving AGI, and, as a matter of fact, I don't think o3 is AGI yet. o3 still fails on some very easy tasks, indicating fundamental differences with human intelligence.’

He might be right. But that isn’t really the point.

o3 has also radically improved AI’s performance on a number of other benchmarks over previous and rival models. For example, in software engineering (via @IterIntellectus on X):

…leading the same poster to ask what any knowledge worker should do now, noting that ‘even if the model is $2000/month, it’s still cheaper than a graduate employee’.

o3 also solved 25% of insanely hard frontier maths problems, those that take many hours of intense, focused work from the leading mathematical minds, and of which previous AI models had solved fewer than 2%.

One OpenAI researcher described o3 as ‘bonkers good’, a ‘massive step up from o1 on every one of our hardest benchmarks’. No doubt I have not captured the full range of breakthroughs on benchmarks that o3 has achieved, and no doubt in any case that it will go further beyond the frontier of AI and human performance as it is tested more in the coming days, weeks and months.

Sam Altman has been quick to point out that not only is the model achieving these incredible results, it is doing so at a massive cost reduction – a trend he expects to continue. AI is not only getting better than you, it is becoming ever cheaper and more efficient than you.

This breakthrough came days after a paper was published showing that the previous o1-preview model had performed at a superhuman level in differential diagnosis, diagnostic clinical reasoning, and management reasoning, across medical ‘…tasks that require complex critical thinking…’.

In August DeepMind released a robotic AI that could compete at human level in table tennis (winning 45% of games). In September DeepMind announced that its AlphaChip could now design semiconductor chip layouts at ‘superhuman’ levels. Earlier this month (December 2024) DeepMind showed that its weather forecasting model GenCast outperformed all current models on the 15-day forecast and predicted extreme weather events more accurately. Also this month, DeepMind released Veo 2, outperforming OpenAI’s Sora in generating hyper-realistic video clips, and began the roll-out of Project Mariner, an agent-based AI that can reason across your web browser, moving the cursor on your screen, clicking buttons, filling out forms – like tasking a human to do jobs on your computer for you.

If all that wasn’t enough, this month also saw the release of Genesis, a physics engine that allows robots to train in virtual environments – where one hour of training is the equivalent of 10 years of training in the real world. The breakthrough is remarkable and worthy of a post of its own. As one of the researchers that developed it commented: “…AI can control 1,000 robots to perform 1 million skills in 1 billion different simulations, then it may 'just work' in our real world…” That this breakthrough is a virtual footnote in this article should tell you something. Can you feel the OOMs? (Orders of Magnitude).

In June this year, Leopold Aschenbrenner told us that AGI doesn’t require believing in science fiction, it requires believing in straight lines on graphs. In October, Rob Bassett Cross and I argued that the rate and direction of change was such that the UK Defence Review team might adopt the mantra ‘Don’t Blink’ whenever discussing AI, as a way of responding to claims that ‘AI will never’ – such was the speed at which AI benchmarks were being passed. That same month, I argued that the Review Team must make the arrival of AGI a planning assumption within the Review’s planning horizon of 2050, perhaps within the life of this parliament (2029), and that it would be an abdication of its responsibilities if it did not include a rigorous forecast of when it expected AGI to arrive, with investment and action commensurate with the risk and opportunity presented. In many talks across the UK and US since, I have argued that ‘if you are not betting on AGI, you are betting against it’. I am far from a lone voice. We cannot say we did not know.

Meanwhile, the UK Strategic Defence Review continues, but the prevailing debate appears fixated on budgetary questions rather than strategic necessities. Lobbyists argue over the merits of acquiring specific assets—how many ships or F-35s are sufficient, the right number of tanks or artillery pieces, and even whether the UK should invest in a continental-style army (spoiler: it shouldn’t). For anyone attuned to the rapid, transformative pace of AI development, this debate feels like watching a government obsess over deckchairs as the iceberg looms. Not so much ‘Don’t Blink’ as ‘Eyes Wide Shut’.

Indivisible. The Indo-Pacific, the Euro-Atlantic & the UK Strategic Defence Review

Keith Dear — Mon, 04 Nov 2024 07:31:13 GMT

Could there be a worse time for the UK to junk its 2021 ‘Indo-Pacific Tilt’?

There are 10,000 troops from North Korea on their way to fight for Russia in Europe.^[i] Just a few weeks ago, in October 2024, Chinese President Xi threatened Taiwan, promising ‘the complete reunification of the motherland’ while ramping up military drills around Taiwan’s coast. Many warn an invasion attempt is likely within the next few years.

The CCP seeks to contest the democratic model globally but particularly so in its own region – making the Indo-Pacific the site of this century’s most important ideological competition.^[ii] In support of its subversion, sedition and propaganda, the CCP also deploys military^[iii] and economic coercion.^[iv]

And yet it isn’t primarily military and security reasons that necessitate the tilt. It’s the economy, stupid.

The world’s economic centre of gravity shifts every year deeper into the Indo-Pacific region.^[v] By 2030, more than half the world’s economic output will come from Asia.^[vi] In the next 10 years a billion more people are expected to join Asia’s middle class, and the consumer market there will be larger than the rest of the world combined.^[vii] The way in which digital standards and regulation are set in the Indo-Pacific & Asia will shape our world in the future, making that ideological competition even more crucial – it will determine whether the global model is largely digital democratic or digital authoritarian.

Brexit Hubris?

Germany, the Netherlands, France, and the European Union all have Indo-Pacific strategies. Italy is said to have undertaken a ‘quiet pivot’ to the Indo-Pacific. A month ago Sweden’s Ministry of Defence presented ‘policy direction’ to deepen its engagement in the Indo-Pacific. In January this year France, Germany and Spain sent Air Forces to exercise across the region. France, Germany and the Netherlands all announced their strategies months before the UK’s Indo-Pacific ‘tilt’, making something of a mockery of the politically convenient idea that this was all just post-Brexit hubris. It is a region of increasing importance for the world.

Why should the UK care?

Before we come to interests, consider commitments, and culture.

A war in Taiwan is almost certain to draw in Japan, given both US bases there, and the proximity of Japanese islands to Taiwan. It would probably also draw in South Korea. Australia may well also be part of such a conflict, given its even stronger interest in ensuring the CCP’s authoritarian model and economic and military stranglehold on the region does not become so tight as to choke Australia of access to trade and threaten its security.

I hope we would stand beside many of our increasingly close allies in the region. But I find it particularly hard to imagine that the British public would allow a British Government to sit out a war if Australia were heavily involved and suffering significant casualties. If the solidarity we saw in response to the invasion of Ukraine is any indicator, there would be Australian flags everywhere. Anger if we try to sit it out.

We have interests and commitments in the region too.

We are members of the Five Power Defence Arrangements (FPDA), requiring us to formally consult with Australia, Malaysia, New Zealand, or Singapore in the event of an attack, for the purposes of deciding what measures to take in response. There are regular exercises between the FPDA nations, and a decision not to help, while feasible, would be damaging – why should they trust us in future? What value such agreements if no help follows?

We have interests at stake too. Our economic dependence is considerable. US think-tank CSIS estimated that ~$2.45 trillion of goods transited the Taiwan Strait in 2022 (see the brilliant visualisations here). I’d like to hope the National Security Council is well-briefed on the effect a war in the region would have on the UK economy, and our security, with detailed supporting estimates of our supply chain and export dependencies. Public analyses, by the Guardian for example, have tended to focus on what we would lose if China stopped trading with the UK because of a war, usually with an acknowledgement of the centrality of Taiwan’s TSMC semiconductor manufacturer to the global economy. But we should worry too about loss of trade with Australia, Japan, South Korea, Singapore, Malaysia, New Zealand. Calculate the cost. Understand the risks.

Approximately 95% of all UK imports and exports by volume are moved by sea and virtually all data entering and leaving the UK travels by subsea cable. https://www.gov.uk/government/publications/board-of-trade-report-maritime Photo by william on Unsplash

A war would have a moral, cultural, diplomatic, reputational and economic ‘pull’ on the UK that would be difficult to resist – particularly if, as most anticipate, the war were geographically extensive, protracted, bloody and expensive.

Short of war, we have economic interests to advance, with defence and security actions needed in support.

What was the UK’s Indo-Pacific Tilt?

In short: the prioritisation of the region in accordance with its greater and growing import economically and geopolitically.

The ‘tilt’ was a corrective to the series of decisions to withdraw from the region ‘East of Suez’ in the 1960s, begun under the Conservative Government of Harold Macmillan, but pushed further and most closely associated with the Labour Government of Harold Wilson and his Defence Minister Denis Healey.^[viii]

The ‘tilt’ was also a continuation of a process that began around 2015, when the UK realised how far its diplomatic influence and presence had waned East of Suez, as it tried to deal with the fallout of Parliament’s decision to prevent UK intervention in Syria, and the subsequent spiralling and internationalisation of that conflict by Iran, Russia, and many nations across the Middle East. This was a diplomatic return ‘East of Suez’,^[ix] one the ‘tilt’ sought to push into the Indo-Pacific, pre-empting, and hopefully where possible helping to prevent, war and crisis in this region.

The ‘tilt’ was also a corrective to the outsourcing of British geopolitical thinking and active diplomacy during the UK’s membership of the EU. The end of the Cold War had seen cuts to defence and diplomatic missions alike, as we ushered in ‘the end of history’. The EU’s diplomatic capacity grew through this period, alongside the outsourcing of UK global trade deals to EU diplomats/trade negotiators. The result was a reduced UK global diplomatic presence, and a less activist diplomatic service – one that had become unused to thinking sharply about the UK’s national interests. The tilt was an attempt to re-emphasise the Indo-Pacific region in line with its increased and growing import, and, along with the Science Superpower agenda, re-energise and refocus the UK’s diplomatic, trade, and security activity there. It did also mean some harder thought as to what UK Defence could do to help deter war in the region, and advance the UK’s interests.

Unfortunately, the ill-fated timing of the UK aircraft carrier deployment to the region meant it looked much more like a hard-power commitment, and drew criticism. The reality is that the deployment was a commitment going back to Gavin Williamson’s time as Defence Secretary. But the tilt should be judged against the logic of its creation, which is at least as valid and persuasive now as it was in 2020-21.

Why does it matter for the Defence Review?

All this matters so much because the rhetoric around the UK’s ongoing Defence Review is all focused on NATO and Europe. But in truth, the greater threat to the UK comes from the CCP, and a war in Asia. Deprioritise it politically, and you can be sure that it will be eliminated from defence planning assumptions, on which our force design is based. Take away the focus the last Government put on it and all contingency planning across departments, for a war with the CCP, one that would likely threaten our infrastructure, our trade, our industries, will stop.

It also has clear implications for force structure. As I wrote on the Wavell Room in 2022, now is precisely the wrong time to start focusing our investment on rebuilding the British Army on the Rhine (BAOR), land forces to fight on the Continent. Continental powers are now building continental armies: Germany and Poland in particular (Poland concluded its deal for 1,000 new South Korean tanks last week), and similarly the Baltic States on NATO’s eastern edge, along with Norway and new Nordic NATO members Finland and Sweden.

Our commitment to NATO need not be measured in Land Forces, and it makes less sense than ever to do so today. Many of the forces that BAOR was there to protect us against – the soldiers of East Germany, and much of Eastern Europe – are now on our side. We are less worried about ‘keeping the Germans down’, i.e. not having too large a German Army, as memories of the Great War and Second World War fade, and Germany is seen as a much different nation today. And just as we are stronger, with new allies, Russia is much weaker than the Soviet Union was, and is drawing its allies from Asia.

Another challenge for this review is that it is taking place, at least insofar as we can know from publicly available information, without any central direction as to what the UK’s vital interests are, and absent any principles against which difficult decisions can be made. There is no National Security Advisor, and no integrated review. Rather there are, according to various reports, more than 10 reviews (possibly, according to some, a lot more) going on concurrently in Whitehall, with the plan being to try to make some sense of how they intersect after they all complete.

When, in 2021, the team I had been a part of in No10, working with the Cabinet Office, published the Integrated Review, we described the concept of ‘systemic competition’, where nations such as Putin’s Russia, the CCP’s China, but also Iran and other autocracies, exploit the seams and silos of our system to avoid scrutiny, exploit it, and damage it. The worry has to be that there are many such seams and silos between all these concurrent reviews.

Systemic competition also recognised the reality of world politics, that all nations compete and cooperate at the same time, even with allies, and within alliances. Abandoning the Indo-Pacific tilt when others are leaning into theirs would be a recipe for retrenchment. Loss of prosperity and power – the ability to get things done.

Principles, Policy, Preparation

In writing of the preparation for the Second World War, and here recall the dire warnings of the imminence of conflict that we have heard around the world from many nations in the last 12 months or so, Britain’s first Cabinet Secretary, Maurice Hankey, wrote of the necessity of having a clear set of principles in order to build a coherent plan to prepare the country for war. Britain’s first principle, he wrote, should be its commitment to peace, since for ‘…a country dependent for its existence on imports of food and raw material, that have to be paid for by exports… peace is the first essential’. However, it was unwise to trust that diplomacy, peace and neutrality were ways to avoid war – preparation was the best way to achieve the end of peace. Furthermore, such preparation must be led from the top, since ‘the same considerations which compel the Prime Minister of the day to take charge of the control and direction of war…’ compel that he leads its preparation (i.e. the need for a strong central power in a system designed for decentralisation, one that trends towards duplication, dysfunction, diffusion of responsibility and lack of accountability on cross-cutting issues, unless there is central grip). Hankey went on to describe the policy that Britain derived from its principles, to which the plan and preparation must be subordinate. These were:

1. That so long as our naval supremacy is assured against any reasonably probable combination of Powers invasion is impracticable.

2. That if we permanently lose command of the sea, whatever may be the strength and organisation of the home force, the subjection of this country is inevitable.

3. That our army for home defence ought to be sufficient in number and organisation not only to repel a small raid, but to compel an enemy who contemplates invasion to assemble such a force as will make it impossible for him to evade our fleets.

4. That to ensure an ample margin of safety such a force may, for purposes of calculation, be assumed to be 70,000 men.

Of course, such a policy would today need to consider the domains of space and cyber, and the threats to UK security that run through them. But in both domains, the greater threat to the UK comes not from Russia and through Europe, but from growing CCP dominance in space and cyber, and the risk that digital regulation developed under the CCP dominates globally. For as long as continental powers are building continental armies – which it is in their interests to do – the greatest threat to Britain’s security comes through the cyber, space, air and sea domains. Not from land wars in the East of the European continent. If we were dragged into such a conflict, a land force of 150,000 men, double the current force, would still be half the size of the Italian and French Armies. We would need to mobilise a ‘citizen’s army’ – far larger, maybe all voluntary, maybe conscripted. The opportunity cost of a larger standing army would be a loss of flexibility, and diminution in our capabilities in the cyber, space, sea and air domains, where the more direct threats to the UK lie.

As Hankey noted, the first principle must be peace, the first policy priority must be the prevention of ‘invasion’. That means forces able to protect UK access to the ‘global commons’ (defined as those areas of the world beyond the control of any one state: sea, space, air, and cyberspace), and forces able to protect our supply chains and to minimise attempts at sedition and subversion in the UK. Crucially: forces with the flexibility to support and protect our allies, support and secure our interests, both in Europe and globally.

HMS Belfast - Tilting back & forth? HMS Belfast, a Royal Navy light cruiser, was launched in 1938, a reflection of the UK’s principles and policies in preparing for war. She was part of the British naval blockade against Germany, escorted Arctic convoys to the Soviet Union, supported the Normandy landings, redeployed to the Far East and joined the British Pacific Fleet, and later supported British Forces in the Korean War. Today she is a much loved museum ship on the Thames in central London. Photo by Piotr Guzik on Unsplash

Leaning into the Tilt

The biggest problem is that the tilt is likely to be junked quietly: just not mentioned in the review. For political reasons, this should be accepted. However, some new rhetorical device to emphasise the import of the region and its continued prioritisation will be essential if we are not to see the UK’s regional influence wane once more, and the threats to our security and prosperity increase, as Departments take the signal of its omission to reassign resources and focus elsewhere.

Euro-Atlantic and Indo-Pacific security are indivisible. To deter there, we must deter here, and vice versa. To secure our interests in such a vital region, we must have a clear focus on it, an unmissable signal in the Prime Minister and his Cabinet’s communications and later the various strategies including the Strategic Defence Review, that the region must remain a focus for Whitehall, a diplomatic, trade, and defence priority. Without clear principles and policies, politics and local departmental priorities will see the end of the tilt’s prioritisation of the Indo-Pacific, at precisely the worst time.

^[i] According to the UK Ministry of Defence, some are almost certainly (>90% likelihood) fighting against Ukrainian forces in Kursk, with more set to fight for Russia on other fronts in Ukraine.

^[ii] https://www.aspi.org.au/report/truth-and-reality-chinese-characteristics

^[iii] https://www.aspistrategist.org.au/chinas-blue-dragon-strategy-in-the-indo-pacific/

^[iv] https://www.hybridcoe.fi/wp-content/uploads/2023/10/20231026-Hybrid-CoE-Working-Paper-25-Chinese-economic-coercion-WEB.pdf

^[v] Numerous charts and analyses show this, though I have not found a recent update to them – most report it as an accepted fact.

Or Fig 1.3 below from Sainsbury, D., 2020. Windows of opportunity: How nations create wealth. Profile Books.

In 2023, the UK Department for Business and Trade reported that “The world’s economic centre of gravity will continue to shift eastward.” https://www.gov.uk/government/publications/global-trade-outlook-february-2023-report/global-trade-outlook-february-2023-html-executive-summary

^[vi] Ibid.

^[vii] https://www.dfat.gov.au/sites/default/files/minisite/static/4ca0813c-585e-4fe1-86eb-de665e65001a/fpwhitepaper/foreign-policy-white-paper/chapter-two-contested-world/indo-pacific-will-create-opportunity.html

^[viii] James, William D., '‘The most momentous shift in our foreign policy for a century and a half’: British grand strategy and the ‘East of Suez’ decision(s)', British Grand Strategy in the Age of American Hegemony (Oxford, 2024; online edn, Oxford Academic, 22 Feb. 2024), https://doi.org/10.1093/oso/9780198896609.003.0004, accessed 3 Nov. 2024.

^[ix] "The New East of Suez Question: Damage Limitation after Failure Over Syria". Royal United Services Institute. 19 September 2013. Archived from the original on 2 July 2015. Accessed 3 November 2024.