The Blindspot in Sequoia’s Future Tech Stack Vision: Judgement

Two recent essays from Sequoia's Jack Dorsey and Julien Bek paint a vision of future enterprises in a world of ever more capable AIs. We offer a review and challenge.

May 31, 2026

Gemini-generated image of the future tech stack. NB: not a perfect illustration of the essay’s content, and it still has some typos, such as my favourite: ‘autonomed snrdes’ (!). But close enough to prettify the essay.

Sequoia’s recent AI theses share a blindspot. Jack Dorsey argues that AI can replace hierarchy inside the firm. Julien Bek argues that the biggest AI companies will stop selling tools and start selling the work itself. Both may be right. But both visions miss the key element of the stack: judgement.

You can automate heuristics. You can automate routing, triage and much of the routine work that fills calendars and part-justifies organisational roles and structure. But the hard part of management is deciding when not to act, how to evaluate trade-offs, how to identify context shifts from weak signals that invalidate previous heuristics, and how to predict and pre-empt: making the call before the signal shows up in the data flows, by which point it is sometimes too late. That is judgement. Failing to exercise it is the stuff of disciplinary and improvement-plan action for middle managers. Without judgement, speed is just a faster way to leave the road.

In his co-authored article From Hierarchy to Intelligence Jack Dorsey argues that ‘…speed is the best predictor of start-up success’, and that a company “world model” can replace much of the information flow once carried by middle-management. He is right that hierarchy evolved partly as an information-routing system. He is also right that remote-first, machine-readable companies give AI far richer raw material than older organisations ever did. But middle-management does not just move information. It makes judgement calls, constantly.

Middle managers decide which customer complaint is justified despite not fitting the checklist of valid grievances, which delay can be tolerated, which opportunity is real, which risks are systemic and which temporary noise. They often take anticipatory action before the data is clear. Remove those layers and the decisions do not disappear. They rise. Soon the executive team is swamped by exceptions.

To be fair, Dorsey does concede part of this. He says people at the edge will still handle intuition, cultural context, trust dynamics, ethical decisions, novel situations and high-stakes moments. But that concession is larger than he admits. In many organisations, that is not the residue of management. That is the job. Unless the top executives are going to manage all the exceptions and uncertainties (the judgement calls) themselves, they will be forced to reinvent middle management.

Dorsey is also right that money is often an honest signal. It is not always a sufficient one. Take my own situation as a founder, weighing a large but uncertain opening in insurance, a slowing renewal with a US defence prime that carries significant lifetime value, a safer but smaller renewal, and a bank opportunity with unclear timing. I cannot decide the prioritisation from revenue alone. The decision turns on expected value under uncertainty. Monetary value is not signal enough to make the call.
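To make the point concrete, here is a minimal sketch of why a revenue ranking and an expected-value ranking can disagree. Every figure below is invented purely for illustration (none is a real Cassi number), and the `Opportunity` class and `expected_value` helper are hypothetical names. Probability-weighted revenue is also only the simplest version of expected value: it ignores timing, lifetime value and risk appetite.

```python
from dataclasses import dataclass


@dataclass
class Opportunity:
    name: str
    revenue: float  # headline value if the deal closes (made-up units)
    p_close: float  # forecast probability that it closes in the planning window


def expected_value(opp: Opportunity) -> float:
    """Probability-weighted revenue: the simplest form of expected value."""
    return opp.p_close * opp.revenue


# ASSUMPTION: all numbers are hypothetical and chosen only to show the effect.
pipeline = [
    Opportunity("insurance opening", revenue=1000.0, p_close=0.10),
    Opportunity("defence prime renewal", revenue=600.0, p_close=0.50),
    Opportunity("smaller safe renewal", revenue=200.0, p_close=0.95),
    Opportunity("bank opportunity", revenue=500.0, p_close=0.30),
]

# Rank the same pipeline two ways: by headline revenue and by expected value.
by_revenue = [o.name for o in sorted(pipeline, key=lambda o: o.revenue, reverse=True)]
by_expected_value = [o.name for o in sorted(pipeline, key=expected_value, reverse=True)]

print("By revenue:       ", by_revenue)
print("By expected value:", by_expected_value)
```

With these made-up inputs the biggest headline deal drops to the bottom once probabilities are attached. The ranking is only as good as the probabilities, which is where judgement enters.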

Furthermore, as some of Cassi’s dilemmas above highlight, internal speed is rarely the issue in business. We can accept that speed could well be the best predictor of success, and that creating greater speed through superior information flow and automated work coordination is generally a good thing. We can even grant, for the sake of argument, the slightly problematic assumption that formally recorded data and the accounts are sufficient to do this. Even so, our own experience in government and the commercial sector is that slowness in external coordination is always the bigger issue. Speed-running your own work only makes a difference on rare occasions. The main task is coordination with external actors, all running to timetables that are, from your point of view, asynchronous.

If internal speed mattered that much, it would be better tracked. Almost all companies track sales cycles and forecast when customers, investors and others will make decisions that affect revenue, cashflow and so on. Yet few if any organisations track internal decision-speed as a key operational metric, nor do investors of any size or shape show particular interest in it. If it were that predictive of success, you’d expect McKinsey to be turning up at every large corporate with a tried and tested formula to measure decision-speed and improve it, and private equity would likewise make increasing decision-speed the central goal in turnarounds and post-takeover transformation. They don’t. Most organisations prioritise sound judgement over speed.

Bek gets closer to the heart of the matter in Services: the New Software. He separates intelligence from judgement, defining judgement as “knowing what to build next”. His copilot-autopilot distinction is sharp: copilots sell the tool; autopilots sell the work. But “knowing what to build next” is an insufficient definition, missing its prerequisite, the fundamental inescapable element of judgement: forecasting.

The Cambridge Dictionary definition helps: judgement is the ability to form valuable opinions and make good decisions. Both halves of that definition point to forecasting. An opinion is valuable only if it improves your model of the world, and you can only know how good that model is by testing how well it predicts. Similarly, for the second half of the definition, a decision is a forecast with expected consequences attached: if we do X, Y becomes more likely. Strategy, tactics, prioritisation and planning are all, at their heart, conditional forecasts.

Accept the argument above, and you are compelled to recognise that automating judgement is the frontier challenge. For Dorsey, the missing layer is a system that can decide when to act, when to wait, when to escalate and which trade-off to accept. For Bek, autopilots only capture the full labour budget once they can handle the judgement-heavy parts of the work, not just the intelligence-heavy ones.

Current LLMs are not optimised for judgement. They are tuned, through RLHF and RLAIF, to produce answers that are coherent and acceptable to humans – optimised to seek human approval. As The New York Times reported in January, “judgment is the faculty we rely on when trade-offs are unavoidable and the right answer is not waiting to be computed… a uniquely human skill.” That view is echoed by the London and Harvard Business Schools. It is also familiar. Each time a capability is labelled uniquely human, the label tends to be temporary. Planning fell to chess engines, intuition to Go systems, bluffing to poker AIs. Creativity, discovery and reasoning have followed. Judgement will too.

That is not to say this is easy. Today’s models can be shaped, constrained and fine-tuned into narrower behaviours, but their default orientation does not change. A system optimised for human approval (“you’re absolutely right”^[1]) is ill suited to reliably “knowing what comes next”. That requires forecasting: reducing uncertainty about future states of the world. It is a different objective, it is hard, and optimising for it would mean telling your consumer base what they need to know, not what they want to hear. Given (a) that the dominant use case of LLMs is said to be therapy and (b) the timeless mantra that the customer is always right, delivering the unvarnished truth is likely to be value-destructive for the currently dominant LLM companies. Moreover, building systems that do judgement well requires scaffolding, structure and empirical feedback that most current deployments lack. Dorsey’s model therefore assumes an optimisation function that does not yet exist in his proposed stack. It might be the most important frontier challenge. As Elon Musk has said, “The right metric for intelligence is probably the ability to predict the future. You’re as intelligent as you can predict the future well.”

There are capabilities that allow you to do this. At Cassi our work reframes judgement as calibrated forecasting: explicit probabilities, tracked over time, tied to outcomes, predicting also the subobjectives and indicators that matter most. On ForecastBench, this approach has taken our forecasting to the level of the top individual human forecasters and close to the weighted superforecaster mean. Preliminary results on ‘benchmark’ questions (those questions where there are no forecasts on prediction markets or similar to baseline against) show our latest model surpassing that superforecaster mean. ‘The weighted superforecaster mean’, as the site reports it, is a wonkish measure of the best judgemental forecasts humanity can make under uncertainty. Once judgement is treated as something that can be scored, it can be improved.
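For readers who want to see what “scored” can mean, here is a minimal sketch using the Brier score, a standard scoring rule for probability forecasts, plus a simple calibration table. This illustrates the general idea only. It is not a description of Cassi’s scoring pipeline or of ForecastBench’s exact methodology, and the forecasts, outcomes and function names below are all made up.

```python
from collections import defaultdict


def brier_score(forecasts, outcomes):
    """Mean squared error between probabilities and 0/1 outcomes (lower is better)."""
    if len(forecasts) != len(outcomes) or not forecasts:
        raise ValueError("need equal-length, non-empty lists")
    return sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / len(forecasts)


def calibration_table(forecasts, outcomes, n_bins=5):
    """Per probability bin: (mean forecast, observed frequency, number of forecasts)."""
    bins = defaultdict(list)
    for p, o in zip(forecasts, outcomes):
        index = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the top bin
        bins[index].append((p, o))
    table = {}
    for index in sorted(bins):
        pairs = bins[index]
        mean_forecast = sum(p for p, _ in pairs) / len(pairs)
        observed_rate = sum(o for _, o in pairs) / len(pairs)
        table[index] = (mean_forecast, observed_rate, len(pairs))
    return table


# ASSUMPTION: hypothetical forecasts on four questions that resolved yes, yes, no, no.
resolved = [1, 1, 0, 0]
hedger = [0.5, 0.5, 0.5, 0.5]  # never commits
sharp = [0.9, 0.8, 0.2, 0.1]   # commits, and is right

print("Hedger Brier score:", brier_score(hedger, resolved))
print("Sharp Brier score: ", brier_score(sharp, resolved))
print("Sharp calibration: ", calibration_table(sharp, resolved))
```

The forecaster who always says 50% is never badly wrong but scores worse than one who commits and turns out right. Tracked over many questions, the calibration table shows whether events forecast at 80% actually happen about 80% of the time.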

This is the missing layer in both Sequoia theses. If corporations are, as Norbert Wiener described, “machines of flesh and blood”, then much of their function is amenable to automation. But judgement does not disappear when you remove hierarchy or sell outcomes instead of tools. Selling outcomes is what we do at Cassi – a fully generalisable solution – and it cannot be done without judgemental forecasting. Without a system to support it, executives inherit thousands of unresolved decisions. To repeat: speed helps, but only if the direction is sound. If corporations are, as Dorsey, Bek and I seem to agree, decision-making systems, competition will be in both speed and judgement.

This is also partly why management consulting remains valuable. It is judgement under uncertainty: you hire a smart, motivated, hard-working group of people to enhance your judgement with theirs (which market to enter, which risk to take, which trade-off to accept), and you gain the halo effect of having decisions blessed by brands known (or at least expected) to be ‘less wrong’ than others. Consulting is, as Bek puts it, “mostly judgement”. But that exposes the opportunity. If judgement can be decomposed into forecasts and assumptions, then the core of consulting becomes legible and, crucially, improvable. Not a one-off deck, but a continuously updated model with feedback on forecasting accuracy and calibration, and a record of which recommendations actually move outcomes and whose judgement to trust the most. Instead of giving McKinsey your watch so they can tell you the time, you have a proper scoring system and a truth engine that surfaces the analysis and forecasts of those in your organisation with the best judgement, free of human psychological biases, the distorting influence of misaligned incentives and the stultifying effect of suboptimal processes. Do this and you save the cost of outsourcing to consultants, get better, more defensible results continuously rather than in set-piece briefings, and save the time you and your staff would otherwise spend in interviews with a consultant’s team, a more laborious and less effective form of elicitation. Cassi’s objectively scored forecasts and assumptions, in our arena for human and machine judgement, are a much bigger threat to McKinsey than Dorsey’s company and customer world models.

Bek says that “knowing what to build next” sits at the centre of value. But this can’t be done optimally without rigorous forecasting. His article needs to push further and articulate what judgement, what ‘knowing what to build next’, means in practice. We argue that judgemental forecasting requires a structured, deliberate approach to uncertainty: carefully defining the desired outcome (what you would actually see in the world if you achieved what you want); forecasting how likely it is to be achieved; identifying indicators, by predicting what would most increase and what would most decrease the probability of that outcome; and tracking the changing probabilities and identifying new indicators as conditions change. The aim is to keep minimising uncertainty and so map the optimal path to success. As our tagline puts it: Everything is Prediction. The firms that solve this will automate work and capture the margin pool currently reserved for advisory. Copilot and autopilot.
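The indicator step in that approach can be sketched in a few lines. This is a toy illustration under stated assumptions, not our production method. The `Indicator` class, the three indicators and every probability are hypothetical, and the conditional probabilities are chosen so that each indicator implies the same 40% baseline chance of success.

```python
from dataclasses import dataclass


@dataclass
class Indicator:
    name: str
    p_occurs: float          # forecast probability that the indicator event happens
    p_success_if_yes: float  # P(desired outcome | indicator happens)
    p_success_if_no: float   # P(desired outcome | indicator does not happen)


def p_success(ind: Indicator) -> float:
    """Law of total probability: the baseline chance of success implied by the indicator."""
    return ind.p_occurs * ind.p_success_if_yes + (1 - ind.p_occurs) * ind.p_success_if_no


def expected_shift(ind: Indicator) -> float:
    """Expected absolute change in P(success) once the indicator resolves either way."""
    base = p_success(ind)
    return (ind.p_occurs * abs(ind.p_success_if_yes - base)
            + (1 - ind.p_occurs) * abs(ind.p_success_if_no - base))


def moves(ind: Indicator):
    """Signed change in P(success) for each way the indicator can resolve."""
    base = p_success(ind)
    return [(f"{ind.name}: yes", ind.p_success_if_yes - base),
            (f"{ind.name}: no", ind.p_success_if_no - base)]


# ASSUMPTION: hypothetical indicators; each is consistent with a 0.40 baseline.
indicators = [
    Indicator("pilot signed", p_occurs=0.50, p_success_if_yes=0.60, p_success_if_no=0.20),
    Indicator("budget approved", p_occurs=0.80, p_success_if_yes=0.45, p_success_if_no=0.20),
    Indicator("competitor launches", p_occurs=0.25, p_success_if_yes=0.10, p_success_if_no=0.50),
]

# Which indicators are most worth tracking, and which single result would move us most?
ranked = sorted(indicators, key=expected_shift, reverse=True)
all_moves = [move for ind in indicators for move in moves(ind)]
biggest_upside = max(all_moves, key=lambda move: move[1])
biggest_downside = min(all_moves, key=lambda move: move[1])

print("Track first:", [ind.name for ind in ranked])
print("Biggest upside:", biggest_upside)
print("Biggest downside:", biggest_downside)
```

Ranking by expected shift is one simple way to decide which indicators deserve attention; other measures, such as expected reduction in uncertainty, would serve the same purpose. As indicators resolve, the baseline is updated and the list is rebuilt.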

Automating judgement is therefore the gap in the stack. It is what allows Dorsey’s organisations to move quickly without drifting, and Bek’s autopilots to take on the hardest parts of the job. Without it, speed amplifies error and software remains confined to heuristics, never fully replacing the service it imitates.

Addendum

Bek’s essay also has implications for business and the wider economy that he does not fully explore. We couldn’t make that discussion fit in the above argument, so we try our readers’ patience with this addendum. The previous argument is self-contained and standalone – this rabbit-hole can be safely ignored, unless you are interested in a wider exploration of the future of work.

Generally, in business, we teach that a focus on core offerings and specific customer needs is the key to success. Vertical integration (owning all functions from, e.g., farm to fork) and conglomeration (diversification into unrelated industries) come and go in business strategy, but few would dispute that maintaining a narrow focus on who you serve and what you do is what matters most.

In contrast, Bek’s article implies a much more generalised approach. In the extreme reading of his argument, a small number of AI companies:

a. become 100% vertically integrated, disintermediating everywhere, taking over the functions of their customers, suppliers, service providers, distribution, sales partners etc.;
b. become maximally horizontally integrated, merging, acquiring or outcompeting all companies in any given sector except those companies that also switch to the fully-automated AI-enabled model Bek describes;
c. become conglomerated at a cross-industry breadth that would make Japan’s trading companies, or those of the mercantilist, colonial era like the East India Company or the Hudson’s Bay Company, blush.

In some ways this is not a new idea. Fully-automated luxury communism, and monopolistic or oligopolistic scenarios in which a single company or a small number of companies do all economic work, are frequently discussed as possible outcomes of the AI revolution. What is interesting about Bek’s essay is that he begins to describe how this might happen; indeed, he recommends that AI companies start thinking along these lines.

However, there seem to us some gaps in Bek’s logic.

Bek says that you should start selling outcomes. He cites AI-law firm Crosby as one example, noting they sell legal documents direct to companies that need them, rather than to the lawyers that once would have written them. In another, WithCoverage disintermediates insurance brokers by selling policies direct to companies needing insurance.

Bek says that as companies like Crosby learn more and more about the customer, they will replace the current service providers – in this case large law firms – entirely. In doing so Bek assumes that AI systems will accumulate proprietary data about the domains they operate in (depth) and that this will create long-term narrow advantage. But this requires Bek to ignore the fact that the success of LLMs has largely been based on overall scaling dominating narrow specialisation. Why should you buy an application that writes NDAs, instead of one that writes documents, including NDAs? If an agent can exercise judgement, it should not need the focus and scaffolding that businesses currently provide when they pick a specialised, targeted application and seek to outcompete in a given niche.

Even if Bek is right and companies like Crosby do dominate narrowly via their access to industry- or company-specific data, the dominant AI companies should, on Bek’s logic, just buy a single competitive player in any of the fields Bek lists and then incorporate its data into the training of the next generation of AI models.

Bek ignores the ‘general’ in AGI. AGI’s definition is often contested, but however one defines it, it should include the general ability to complete legal and insurance documents, the two primary examples Bek cites.

In contrast, our contention would be that current models don’t optimise for judgement. If AGI means anything, then performing specific intelligence-based tasks is what it has to mean. So there must only be the judgement layer(s) left to solve. We disagree with Bek on the ability of current models to provide this, but are firmly in agreement that only judgement-based systems will be commercially relevant after AGI. Hence Cassi.

Visit us at www.cassi-ai.com

^[1] See March 2026 Stanford study, Sycophantic AI decreases prosocial intentions and promotes dependence: “Cheng et al. measured the prevalence of social sycophancy across 11 leading large language models …. The model’s responses were nearly 50% more sycophantic than humans’, even when users engaged in unethical, illegal, or harmful behaviors. Users preferred and trusted sycophantic AI responses, incentivizing AI developers to preserve sycophancy despite the risks.”
