Automate the MOD’s R&D
With AI nationalism heating up and the AI frontier accelerating, how do the UK and the MOD remain relevant and keep up?
AI Nationalism Heats Up
On Friday (12th June), the US Government banned foreign countries and nationals from accessing Anthropic’s Mythos, an action that has likely only accelerated a process that was coming anyway (see Faustian Bargain 2.0). This follows calls for AI to be nationalised: Palantir’s Alex Karp has warned of it, US Senator Bernie Sanders has called for it, and Republican Senators have called for the same. Meanwhile, a Chinese court has ruled that companies can’t lay off workers on AI grounds. President Trump has used the Defense Production Act to accelerate power generation and the roll-out of the electrical grid, in part to meet expected AI demand. This adds to the US’ aggressive, private-equity-meets-mercantilism pursuit of the rare earths and other critical minerals on which chip production and grid roll-out depend (copper is now on the US critical minerals list) (here and here).
The US actions in Venezuela and Iran benefit the US in AI too. We do not claim these actions are solely, or even primarily, about AI. They do, however, benefit the US, which gains from a domestic supply of oil and high prices for export, higher revenue for oil companies and higher tax receipts for the US Government, while disproportionately affecting much of the rest of the world. Russian Foreign Minister Sergei Lavrov has called this a ‘doctrine of dominance’, and while we have no admiration for Russia’s foreign policy or its leaders, the phrase is apt for how the US is positioning itself for AI leadership. For the UK, as a primary ally of the US, this is to be welcomed. But no nation has permanent allies, only interests, and all nations both compete and cooperate with their allies, so it is still alarming to see us falling further and further behind as we prevaricate and the US acts.
Automating R&D
What is to be done? Here we argue that automating R&D itself must now be the priority for the UK, and the MOD in particular, perhaps second only to building resilient power (small modular reactors (SMRs)) and data infrastructure underground across the MOD estate.
A couple of weeks ago Anthropic told the world it is worried by how close it is to recursive self-improvement, which it describes in the post’s title as when ‘AI builds itself’ and in the report as when “…an AI system ..[is].. capable of fully autonomously designing and developing its own successor.” AI building itself like this is sometimes known as Recursive Self-Improvement or RSI.
Anthropic’s warning followed that of Jack Clark, one of the company’s founders. Jack’s 4 May blog on automating AI research did a great job of assembling the evidence for the progress being made towards RSI, something that is an explicit goal of all the large labs and would likely lay the foundation for what I. J. Good described in 1965 as an ‘intelligence explosion’. Good wrote:
“Let an ultraintelligent machine be defined as a machine that can far surpass all the intellectual activities of any man however clever. Since the design of machines is one of these intellectual activities, an ultraintelligent machine could design even better machines; there would then unquestionably be an “intelligence explosion,” and the intelligence of man would be left far behind”
This is the world we seem likely to be approaching. Forecasts on Metaculus[1] suggest recursive self-improvement will be achieved around 2029 (Anthropic: 2026-2028; Google: 2029; OpenAI: 2029-2032). Against a much more stringent definition of recursive self-improvement, Cassi’s AI forecast puts the likelihood of RSI at 13% before 31 Dec 2027 and 54% by 1 Jan 2036.
In the world Good describes, and that Jack Clark and Anthropic suggest is nearing, war and warfare will be won or lost as AI decision systems do battle to predict, probabilistically, their ‘theory of winning’: the right strategic objectives to attain the desired outcome, the best combination of capabilities and tactics to win the battles needed within the war, and the right doctrine to deploy forces optimally.
But in fact, I think war will be won or lost before this. Prior to a war we must predict which industrial capabilities to build. We must predict what capabilities we will need. We must predict how to deploy them, and where, and to do what. For example, we must predict whether we are better off, as Anduril’s founder Palmer Luckey has said, building for a future of subterranean warfare, in response to a battlefield that resembles the one in Ukraine: like something from The Terminator, with a kill zone perhaps 50km wide in which movement is almost impossible.
As with all prediction, the aim is to be less wrong. As Sir Michael Howard put it:
“It is the task of military science in an age of peace to prevent doctrine from being too badly wrong. …What does matter is the ability to get it right quickly when the moment arrives.”
Today, we must be less wrong in our forecasts before the war – of what we must do, what we will need, and how we will need it. And we must build a system that is able to update those predictions where they are wrong, as quickly as possible.
Military science is a subset of science, and science is being increasingly automated.
The U.S. Genesis Mission sets out explicitly to accelerate the application of AI for transformative scientific discovery focused on pressing national challenges. Launched by executive order on 24 November 2025, it puts the Department of Energy at the centre of a secure American science and national security platform, combining federal scientific datasets, supercomputing, AI agents, foundation models and automated laboratories. Its purpose is strategic acceleration: to drive breakthroughs in biotechnology, critical materials, nuclear fission and fusion, quantum, semiconductors and advanced manufacturing, while strengthening national security and US technological dominance. A few days ago, Japan became the first international partner in this mission. In light of the recently imposed US Government ban on foreign countries and citizens accessing Anthropic’s Mythos, we can only hope the MOD and our Embassy in Washington are hammering on the door to make Britain the next nation to join.
I have been pushing the urgent need to start automating R&D for some time. In a previous role in industry, I met several times with Professor Faez Ahmed at MIT, who runs their DeCoDE lab dedicated to AI-driven design. We explored options to show how we might autogenerate component parts for drones using generative AI for design. But I could not find a defence industrial customer, nor an MOD agency, department or official, interested in working with them.
Now, DeCoDE is funded by the US Navy to work on ShipGen, building a ‘diffusion model for parametric ship hull generation with multiple objectives and constraints’: in other words, ship hull design at the touch of a button with generative AI. Where is the project with even a tenth of this ambition from the MOD?
It is now several years since DeepMind’s Graph Networks for Materials Exploration (GNoME) predicted 2.2 million new crystal structures, including 380,000 stable candidate materials, which DeepMind described as equivalent to nearly 800 years of knowledge. Materials science, you’ll recall from other blogs, was an interdisciplinary science born in the Second World War to produce new materials for war-winning weapons. Today, self-driving labs for materials and other sciences are proliferating, automating more and more of the process.
In a further indication of where we are headed, the former Director of AI at Tesla, Andrej Karpathy, has released his autoresearch repository publicly. Karpathy recently joined Anthropic, with the mission to speed up AI-assisted research.
Engineering’s digitisation has been progressing rapidly, notably hitting the headlines when the then USAF acquisitions chief Will Roper argued for a ‘Digital Century Series’ of aircraft, produced rapidly and iteratively using digitisation, synthetic environments and simulations, and memorably described in his 2020 Matrix-referencing paper ‘There is no Spoon’. Roper is now CEO at Istari Digital, and has been working to make this vision a reality. While this article was being drafted, Jeff Bezos’ AI start-up, Prometheus, announced it has raised £12bn, following up on a £6.2bn raise last year. This is to build an ‘artificial general engineer’ to, as TechCrunch puts it, ‘replace large swaths of engineering work with AI’. In a New York Times interview Bezos said “what Prometheus seeks to do is to offer a set of tools that dramatically accelerates..[the]..invention loop.” Bezos and his cofounder do not believe this will lead to total automation (indeed they argue the demand for engineering skills will increase), but automating R&D as far and as fast as possible is still the explicit goal.
It is not that these systems are perfect and will just work off the shelf on purchase. If that were the point we had reached it would likely be too late: others would have recursively self-improving AI, fully automated science and automated R&D, and the UK would be facing ongoing diminishment as a second-tier AI nation. Today, end-to-end “autonomous discovery” has major limitations. Current systems can generate plausible experiments and papers, but independent evaluations show serious weaknesses in literature review, novelty assessment, coding reliability and substantiation. One evaluation of Sakana’s AI Scientist last year (2025) found that 5 of 12 experiments failed because of coding errors, and described the outputs as fast and cheap but often shallow.
This is precisely why the time to act is now, while the frontier is within reach. A sensible starting point would be for the MOD to seek to work with the UK-based start-ups Recursive and Ineffable (which recently received large seed rounds of ~$500M and £1.1bn respectively), and with other UK companies at the frontier: those building the next wave of AI, beyond the current transformer-based LLM architectures.
A reminder of the cliché – today’s AI is the worst it is ever going to be. The limitations in the current tech are the opportunity. This is just the beginning.
So What for Defence?
Why does this matter to the MOD? In a world where scientific breakthroughs, new materials and new designs with those materials can be automated, and where ultraintelligent systems are self-improving, directing battles and constantly seeking to out-predict the adversary as to the next optimal action against the optimal strategic objective, with the optimal mix of capabilities in the optimal tactical deployment, Howard’s ‘…ability to get it right quickly when the moment arrives’ is going to depend on heavily automating our innovation process.
We will need, in effect, a ‘Dowding System’: rather than an air-defence network for spotting incoming aircraft, an innovation-defence network for detecting emerging threats before they arrive, whether in the lab, the factory, field trials or the air, and running from ‘frontline to factory’ at machine speed.
A 21st Century Dowding System would pick up faint signals early, interpret them quickly, and task self-driving labs, automated factories, doctrine engines, tactics generators and decision-action systems to respond at machine speed. The aim is to compress innovation cycles from the six weeks we see in Ukraine to days, minutes or less: predicting how capabilities must adapt, what technologies are needed, and how they should be employed and deployed before an adversary can exploit the advantage. As we move into an age of digital fabrication and edge intelligence, we could literally be redesigning aircraft in flight.
This won’t be simple. Our current “analogue” and human-centric processes cannot simply be replicated in digital form. Automated R&D, instantiated in this 21st Century Dowding System, would need to consider from scratch what is required, and build it. Digitising bits of the system won’t be enough. Examples of such partial digitisation include the impressive Whittle Lab in Cambridge and the laudable efforts to showcase digitalisation in manufacturing within Innovate UK’s Catapult network. These are useful, but not the systemic approach that is needed. The 21st Century Dowding System, like its Battle of Britain predecessor, would give us a national advantage from its system design and its integration of such capabilities.
This might seem far-fetched, but if we are on the brink of RSI and automated science, an intelligence explosion, this is the world we should be preparing for.
When?
For all the furore that will follow in the coming days about the US’ restricting access to Anthropic’s Mythos, a model that:
· according to the UK AI Security Institute, can carry out multi-stage attacks on vulnerable networks and discover and exploit vulnerabilities autonomously, tasks that would take human professionals days of work.
· according to Senator Mark Warner, broke into almost all of the US’ classified systems not in weeks, but in hours (likely exaggerated, but still).
· was used by Mozilla, makers of the Firefox browser, which fixed 423 Firefox security bugs in April 2026, including 271 found with Claude Mythos Preview, of which 180 were rated high severity, dwarfing the number they normally find.
· in Cloudflare’s AI exploit research, run across more than 50 repositories, combined multiple low-severity bugs to create higher-impact exploits and generated working proofs of concept, forcing Cloudflare to industrialise validation with 50 concurrent “hunter” agents in order to patch and defend closer to the pace at which the vulnerabilities were being found.
· is now confirmed as having been deployed by the US’ NSA, its much larger equivalent of GCHQ.
· (highly speculatively) may have been involved in the massive hack of China’s supercomputing infrastructure earlier this year.
· represents a security threat that is superhuman on some measures and is now employed by a digital espionage agency.
The US Government’s action banning foreign countries and nationals from accessing the latest models has likely only accelerated a process that was coming anyway (see Faustian Bargain 2.0).
And in the end, Mythos is not about Mythos. It is about the trend and the rate.
Given current rates of progress in AI, we should expect Mythos’ capabilities to be eclipsed in four months or less, by July/August. The successor frontier model should be able to complete coding tasks that would take a human more than ~32 hours, with a similar doubling in its ability to complete agentic tasks in scientific reasoning, mathematics and computer use, and rapid, if slower, improvements in agentic self-driving systems. We should expect the length of tasks the models can complete to double again four months later, and again four months after that, and so on. We should expect benchmarks to be saturated and AI capabilities to get stronger and stronger.
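To make the arithmetic of that doubling concrete, here is a minimal sketch of the compounding. It is illustrative only: the function names are mine, and the 16-hour starting point is an assumption, inferred from a doubling that arrives at ~32 hours rather than a figure stated anywhere above. The four-month doubling period is the one used in this post.

```python
import math


def projected_horizon_hours(base_hours: float, months_ahead: float,
                            doubling_months: float = 4.0) -> float:
    """Length of task (in human-hours) a frontier model could complete after
    `months_ahead`, if that length doubles every `doubling_months`."""
    if base_hours <= 0 or doubling_months <= 0:
        raise ValueError("base_hours and doubling_months must be positive")
    return base_hours * 2 ** (months_ahead / doubling_months)


def months_until(base_hours: float, target_hours: float,
                 doubling_months: float = 4.0) -> float:
    """Months needed for the task length to grow from base_hours to target_hours."""
    if base_hours <= 0 or target_hours <= 0 or doubling_months <= 0:
        raise ValueError("all arguments must be positive")
    return doubling_months * math.log2(target_hours / base_hours)


if __name__ == "__main__":
    # ASSUMPTION: today's frontier handles ~16-hour tasks (inferred, not sourced).
    BASE_HOURS = 16.0
    for months in (0, 4, 8, 12, 24):
        hours = projected_horizon_hours(BASE_HOURS, months)
        print(f"+{months:>2} months: ~{hours:,.0f} human-hours per task")
```

The point of the sketch is the shape of the curve, not the specific numbers: if the doubling held for two years, that would be six doublings, a 64-fold increase in task length, whatever the starting point.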
Given current rates of progress in open-source models, as reported by Epoch AI’s research, we should expect to see Chinese and open-source models that are as good as Mythos deployed by hostile states against the UK in approximately four months. We have until August to find out how much, if at all, Senator Warner was exaggerating, and for companies and Government alike to learn whether Mythos can find vulnerabilities in our software systems at the scale and severity it did with Cloudflare and Mozilla.
But again, it’s the direction and the rate that matter.
Automating science and R&D is the endgame of the intelligence explosion. The MOD and UK should begin investing in and building what is needed now.
Addendum
I considered writing this blog to look at what we might do in Defence and national security if we anticipate AGI is imminent, something increasingly urgent, and for which I think there is literally zero planning. But the post on this from 2024, “In Athena’s Arms”, holds up pretty well, especially the recommendation on spending 1.14% of the Defence budget on AI, equivalent to what Google spends on DeepMind. I note, looking back at it, that neither the MOD nor the UK has done anything that resembles the level of seriousness in those recommendations.
Ken Payne over on his blog described Mythos as a ‘code red’ for the MOD a few weeks back, and followed it up more recently with a superb and short blog observing that next year, the US military will spend more on AI and autonomy than the entire US Marine Corps budget. As Ken wrote: ‘We are smaller than the US. I hear you. But we certainly aren’t spending as much as any one of our services on this.’ My 1.14% recommendation in 2024 looks modest by comparison.
More important, because it should precede such capital allocation and inform it, is the recommendation I have made often for many years: the MOD (and HMG) must publish a forecast, or at least have a forecast, with probabilities, for when it believes advanced AI capabilities (AGI, recursive self-improvement, superhuman robotics etc) will be with us, and for what it believes machines will never do in warfare. It cannot prepare in proportion to the risk, threat and opportunity if it does not state these. This is no different to the way that it needs some assessment of the likelihood of war, and who it might be fought against, to plan effectively and proportionately. And yet as far as I know no such forecast exists.
A further recommendation would be that the MOD should state specifically what would change these assessments, set up intelligence requirements, and create benchmarks to track and test AI capabilities.
There are no defence benchmarks for AI. There are for almost every other industry, which is how we get a sense of the progress being made towards human-matching or superhuman capabilities.
Moreover, the forecasts, intelligence collection and benchmarks go together. You must be clear about what it is you believe AI and robotics will never be able to do in defence and security, so you can design your benchmarks to challenge your assumptions.
Collecting intelligence, analysing it and using it to forecast ensures you are not complacent, and you continuously test your assumptions.
Benchmarks enable you to define what would change your mind. While AI capabilities continue to improve at the rate they are, being prepared to do this quickly and objectively must surely be essential.
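As a rough illustration of how forecasts, benchmarks and “what would change your mind” could fit together, here is a minimal sketch. Everything in it is hypothetical: the class, the field names, the benchmark name and the numbers are invented for illustration and do not describe any existing MOD or HMG system. The one standard ingredient is the Brier score, a common way of scoring probability forecasts once outcomes are known (lower is better).

```python
from dataclasses import dataclass
from typing import Iterable, Mapping, Tuple


@dataclass(frozen=True)
class Assumption:
    """A stated belief about what AI cannot yet do, with its own tripwire."""
    claim: str          # the belief being tracked
    probability: float  # stated probability that the claim still holds
    benchmark: str      # hypothetical benchmark designed to challenge the claim
    tripwire: float     # benchmark score at which the claim must be reassessed


def needs_review(assumption: Assumption, scores: Mapping[str, float]) -> bool:
    """True when the latest benchmark score has reached the tripwire."""
    score = scores.get(assumption.benchmark)
    return score is not None and score >= assumption.tripwire


def brier_score(forecasts: Iterable[Tuple[float, bool]]) -> float:
    """Mean squared error between stated probabilities and what happened."""
    pairs = list(forecasts)
    if not pairs:
        raise ValueError("need at least one resolved forecast")
    return sum((p - float(outcome)) ** 2 for p, outcome in pairs) / len(pairs)


if __name__ == "__main__":
    # Hypothetical example values throughout.
    a = Assumption(
        claim="AI cannot plan a brigade-level logistics operation unaided",
        probability=0.8,
        benchmark="defence-logistics-planning",
        tripwire=0.6,
    )
    latest_scores = {"defence-logistics-planning": 0.65}
    print("Reassess assumption:", needs_review(a, latest_scores))
    print("Brier score:", brier_score([(1.0, True), (0.5, False)]))
```

The design choice worth noting is that each assumption carries its own tripwire, so the evidence that would change the assessment is written down before the evidence arrives, not argued about afterwards.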
Summary of Recommendations
Automate MOD R&D as a core defence mission: build a “Dowding System” for innovation, linking AI agents, self-driving labs, automated design, automated factories, doctrine generation and rapid capability adaptation.
Create a live MOD/HMG AI forecasting function: publish or internally maintain probability-based forecasts for AGI, recursive self-improvement, superhuman robotics and AI-enabled warfare; state what machines are assumed never to do; define what evidence would change those assumptions.
Fund, benchmark and partner at frontier scale: spend materially on AI and autonomy, at least at the proposed 1.14% of the defence budget; create defence-specific AI benchmarks and intelligence requirements; join or replicate efforts such as the US Genesis Mission; work with frontier UK and allied labs and companies before the frontier moves out of reach.
Infrastructure & Targeting Review: build small modular reactors and data centres underground, and plan to target adversary AI infrastructure and protect our own. These are two of the more urgent recommendations lifted from In Athena’s Arms.
But the single biggest recommendation is for urgency, and action proportionate to the risk and threat. The UK must build sufficient capability, economically, and in defence, to have leverage in the world that is coming. Or, after the Mythos restrictions on Friday reduced us to a Tier-2 nation, the world that is already here.
[1] These predictions are limited by the questions being forecast, which ask only when OpenAI, Google and Anthropic will report reaching given risk levels on ‘AI R&D self-improvement’. That does mean RSI, but forecasting risk levels and when they will be reported may, or may not, be the same thing as forecasting when RSI will be achieved.




