AI is shifting workloads. Is it making less work?

2026-04-26

By Alexander · 11 min read

Earlier in the week, I was sent a video from a YouTube channel in which an influencer was demonstrating a use case for agentic AI. You’ve likely seen similar videos, where a person demonstrates how they’re using an LLM-based tool in agentic mode to conduct a legal document review or handle a bulk accountancy task.

Understandably, I get asked my opinion on AI tools quite a bit, and in fact, like most people working in technology, I was an early adopter of many tools now considered AI, like Google Vision (for OCR) and Cursor (for coding). Nevertheless, I caution (and I am not the only one) against taking these sorts of demos at face value.

As I will explain below, many of the use cases AI could be applied to simply don’t meet the bar of what it should be applied to. Worse than that, I feel as though misapplication of AI is, in many cases, expanding workloads and creating new types of problems that many people don’t adequately anticipate.

Lots of these tools are revolutionary when scoped correctly, and they are being adopted dramatically faster than many similarly hyped technologies of the last decade (here's looking at you, Blockchain). Nevertheless, it's important to understand how enthusiasm for a new technology can overstretch its actual abilities.

Demos, as I mentioned above, look compelling, but they rarely show what happens next. Usually, at some point in the video, the host says something like, “The AI made a mistake here, that company already exists, I will go in and fix that later…” and waves the error away as innocuous. Misclassifying an entire company can be quite costly. The critical questions these demos don't ask are: What does it cost when the AI gets it wrong? How long will it take to get it right? Would a human make these same types of mistakes?

I do have some key insights from the world of technology, specifically coding, that I feel apply broadly to AI workflows, so please bear with me as I explore them.

Insight from application of AI to Coding #1: The perception gap is real and documented.

The effect of AI on human coding workflows has been studied scientifically. Researchers undertook a randomised controlled trial, published on arXiv, in which 16 experienced developers were given 246 real tasks, with and without AI, randomly assigned. AI made them 19% slower. Importantly, however, the developers predicted it would make them 24% faster and, even afterwards, still believed they'd been faster by ~20%.
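
To make that gap concrete, here is a small illustrative calculation using the study's headline percentages; the 10-hour baseline task is a made-up figure, not something from the study.

```python
# Illustrative only: perceived vs measured effect of AI assistance,
# using the headline percentages from the trial described above.
baseline_hours = 10.0  # hypothetical time to finish a task without AI

predicted = baseline_hours * (1 - 0.24)  # devs predicted 24% faster
perceived = baseline_hours * (1 - 0.20)  # afterwards, still felt ~20% faster
measured = baseline_hours * (1 + 0.19)   # the trial measured 19% slower

print(f"Predicted with AI: {predicted:.1f}h")  # 7.6h
print(f"Perceived with AI: {perceived:.1f}h")  # 8.0h
print(f"Measured with AI:  {measured:.1f}h")   # 11.9h
print(f"Felt-vs-real gap:  {measured - perceived:.1f}h on a {baseline_hours:.0f}h task")
```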

The gap between belief and reality is the most important number in the study, because users of AI feel productive even when they aren't. Prompting an AI, and getting the prompt properly understood by the LLM, can be, and often is, slower than a human who, with context awareness and experience, intrinsically understands the task steps without explanation. Consider the recent release of AI memory systems, which are really just a new layer of software designed to do something that is instantaneous for a human: remembering details. Consider the cost of implementing and securing systems that contain the entire context of an organisation to improve the accuracy of an LLM's output: is time saved, or is it shifted elsewhere? Can you even legally do it? Is it wise to do it? How does adding context affect spend? Will it be cheaper?

There’s something about delegating to AI that makes people a little overconfident, too. They believe the technology should be improving things but do not necessarily examine later whether it actually is.

Insight from application of AI to Coding #2: The bug rate is worse, and evidence of that fact is converging across independent sources.

Multiple studies show that, in real-world use, AI tools are introducing new errors that humans were not making before. CodeRabbit looked at 470 pull requests and found that AI produces 1.7x more issues overall. Critically, logic errors were 75% more common, and security issues were up to 2.74x more frequent. Both are areas where failure can be extremely costly.

AI developers have treated this as a data problem: simply provide more and more training data covering the situations where logic issues arise, and the errors will be eliminated. However, humans often act in ways that defy logic or the business's aims, and these behaviours have resisted more basic forms of automation in the past. What do you do when you accept that the customer simply isn't logical, or at least is operating on a bounded rationality not aligned with the business? How do you explain that to a machine when it's in your specific domain?

Moving on, Faros AI conducted a large-scale telemetry scan and found that AI adoption increased bug counts by 9% while increasing pull request sizes by 154%. There was no significant correlation between AI adoption and company-level improvements in throughput or KPIs. AI was writing more code, but with more bugs, and review time went up 91%. The workload shifted, and organisations had to solve new problems.

GitClear found in December 2025 that “Code churn, defined as the percentage of code that gets discarded less than two weeks after being written, is increasing dramatically.” Copy-pasted code that hit the git repo unedited also increased, suggesting that much of the work consisted of people accepting the generated code as suitable. Several open-source projects have been overwhelmed by AI-generated "spam code" and forced to close contributions entirely. The problem is real, present and growing.
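
To make that metric concrete, here is a minimal sketch of how a churn rate under that definition could be computed. The per-line records are hypothetical (in practice you would extract them from the repository's git history), and this is a sketch of the definition, not GitClear's actual methodology.

```python
from datetime import date, timedelta

# Hypothetical per-line records: (date_added, date_deleted or None if still alive).
# In practice these would be extracted from the repository's git history.
lines = [
    (date(2025, 12, 1), date(2025, 12, 9)),  # discarded after 8 days -> churn
    (date(2025, 12, 1), None),               # still in the codebase  -> not churn
    (date(2025, 12, 2), date(2026, 1, 15)),  # survived > 2 weeks     -> not churn
    (date(2025, 12, 3), date(2025, 12, 5)),  # discarded after 2 days -> churn
]

CHURN_WINDOW = timedelta(days=14)  # "less than two weeks", per the definition

churned = sum(
    1 for added, deleted in lines
    if deleted is not None and (deleted - added) < CHURN_WINDOW
)
print(f"Churn rate: {churned / len(lines):.0%}")  # 2 of 4 lines -> 50%
```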

Finally, Stack Overflow's 2025 Developer Survey confirms that these findings are felt among active code maintainers: trust in AI amongst technologists has actually been falling, not rising, with adoption, as more people encounter the issues outlined above.

Insight from application of AI to Coding #3: The workload shift problem. Validation isn't free, and is often more expensive.

The difference in pay scale between a junior developer and a senior developer isn't trivial, and shifting work up the ladder in many cases costs more than getting it right in the first place. For example, if you are paying $40 an hour for 10 hours of human coding, you will spend $400; but if you are paying $150 an hour for 3 hours of code review and testing, or worse, $250 an hour for emergency repairs and cleanup, you will spend $450 or $750, respectively. This is before you factor in API call costs, subscription fees and the other costs associated with AI tools. Absurdly, many organisations implementing AI have begun treating developers' AI token usage as a positive KPI, assuming more tokens mean more AI use and therefore more efficient adoption. As TechCrunch recently reported, that's created a perverse incentive in some development teams to both spend more money and slow down.
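
The arithmetic is worth spelling out; the hourly rates below are the illustrative figures from the paragraph above, not industry benchmarks.

```python
# Illustrative cost comparison using the example rates from the paragraph above.
human_coding = 40 * 10   # $40/h x 10h of junior human coding -> $400
senior_review = 150 * 3  # $150/h x 3h of review and testing  -> $450
emergency_fix = 250 * 3  # $250/h x 3h of repairs and cleanup -> $750

for label, cost in [("Human coding", human_coding),
                    ("Shifted to senior review", senior_review),
                    ("Shifted to emergency cleanup", emergency_fix)]:
    print(f"{label}: ${cost} ({cost - human_coding:+d} vs doing it right first)")
# And none of this includes API call costs or subscription fees for the tools.
```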

Like many, we use generative tools for wireframing, conceptualising and basic development. In the hands of already skilled users, applied with adequate caution, AI tools are an accelerator. AI code review is also a thing, and in capable hands it too may be an accelerant; whether it will reduce costs or increase them remains to be seen over time. Caution is advised.

This problem is not unique to generative or agentic AI; The Economist mused back in 2014 that the economics of driverless cars don't necessarily work out in their favour. More than a decade later, at this point in 2026, it seems autonomous vehicles are coming, but slowly, and we may see broader adoption in the 2030s, assuming human legislators decide they are even in the public interest. Even then, I suspect they will compare unfavourably with the economics and health benefits of simple bicycles for inner-city commuting, though people will still use them, and I also suspect car enthusiasts will find the driverless experience soulless.

Summarising Insights from AI in Coding

So, to summarise: the evidence as of 2026 shows that AI coding tools, although very powerful in the hands of experienced, prudent, responsible coders, and now used ubiquitously, are shifting work more than eliminating it. They also appear to have licensed (probably for human reasons) a whole category of new error types, ones that require more, not less, human attention, and perhaps at a higher skill level.

AI may be automating the easy jobs, leaving us with more hard ones.

So, why are some tasks a particularly bad fit…

Gartner predicts that by 2027, half of the companies that attributed headcount reduction to AI will rehire staff to perform similar functions.

Extrapolating the findings from the field of coding, where AI adoption is quite advanced: error tolerance in some work should be near zero, because mistakes compound rather than just sitting there. Consider a legal or bookkeeping review of 1,000 entries: a 2% error rate would mean 20 errors to find manually, in a domain where finding them requires expertise. You haven't removed the expert from the loop; you have changed what they're doing, and possibly made their job harder, as they now have to review the output against the source data and then correct it.

The irony in all this is that the larger the volume of AI-processed work, the more validation work you create.
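
A back-of-the-envelope sketch of that scaling, using the 1,000-entry review above; the minutes-per-check and minutes-per-fix figures are hypothetical.

```python
# Back-of-the-envelope: expert validation burden grows linearly with volume.
entries = 1_000
error_rate = 0.02   # the 2% error rate from the example above
check_minutes = 2   # hypothetical: quick expert verification per entry
fix_minutes = 15    # hypothetical: expert correction per error found

errors = entries * error_rate                # 20 errors hiding in the output
review_hours = entries * check_minutes / 60  # ~33h just to find them
fix_hours = errors * fix_minutes / 60        # +5h to correct them

print(f"{errors:.0f} errors; {review_hours:.0f}h of review + {fix_hours:.0f}h of fixes")
# Double the volume and you double the validation work: the expert never leaves.
```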

Where Agentic AI actually makes sense, today…

There are three clear categories of tasks you can apply AI to today without much consequence.

First, if your task is one in which errors are low-stakes or self-correcting (drafts, summaries, ideation), then AI can certainly be an accelerant. It is fantastic for bouncing ideas off (if somewhat sycophantic at times) and surfacing information from large datasets. The defining characteristic here is that the human is the final step anyway, so AI errors get caught as a matter of course rather than requiring a separate validation pass. For example:

Meeting notes and summaries: if the AI misattributes a comment or misses a nuance, the people in the room know, and it costs nothing to fix (usually).

Drafting marketing copy and social posts: the edit pass was always going to happen; AI compresses drafting time, not judgment time.

Research synthesis or reviews for specifics: surfacing and summarising sources is genuinely useful, provided you never cite the output directly without checking (and you should not).

Second, high-volume, genuinely low-precision tasks where 90% accuracy is sufficient and, importantly, the 10% doesn't matter too much. For example:

Sentiment analysis at scale: if a few miscategorisations don't change the aggregate signal and you're looking for the trend, not the individual data point, you may decide it's fine. Though in many cases, you may be better off with conventional code, tools like Puppeteer, and a word dictionary (see the sketch after this list).

Document classification: routing invoices, contracts, or support tickets to the right folder or team. Misroutes get caught; the alternative is someone doing it manually all day.

Third, your task has a tight feedback loop and is easily reversible. Think of code in a well-tested environment, where the tests catch errors immediately and rollback is trivial. This is why AI coding tools work better for isolated modules than for complex systems.
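
As flagged in the sentiment example above, the conventional-code alternative can be very simple. Here is a minimal dictionary-based sketch; the word lists are illustrative only, and a real deployment would use a curated lexicon and proper tokenisation.

```python
# Minimal dictionary-based sentiment scorer: the conventional-code
# alternative mentioned above. The word lists here are illustrative only.
POSITIVE = {"great", "love", "excellent", "fast", "helpful"}
NEGATIVE = {"broken", "slow", "terrible", "refund", "bug"}

def sentiment(text: str) -> int:
    """Crude score: count of positive words minus count of negative words."""
    words = text.lower().split()
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

reviews = [
    "Love it, fast and helpful support",
    "Terrible update, app is broken and slow",
    "Excellent value, would buy again",
]

# For trend analysis, only the aggregate signal matters, not any single review.
scores = [sentiment(r) for r in reviews]
print(f"Aggregate sentiment: {sum(scores):+d} across {len(reviews)} reviews")
```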

The thread connecting these is that the output feeds a human decision or gets aggregated into a signal, and individual errors are either caught downstream, reversible, or statistically irrelevant. If that does not apply, then you need to examine whether it's a work shift or genuine time saved.

Key questions to ask before adopting an AI workflow.

The marketing and the influencers show the demo and almost never the validation pass or the aftermath. Developers themselves cannot accurately perceive whether AI is even helping them, so what hope does a non-technical buyer have?

So, as we navigate the future of technology, we need to ask ourselves four questions before adopting AI in our workflows.

1. What is the actual cost of an error in this domain?

2. Who catches the errors, and how long will that take?

3. Does validating the output require the same expertise as producing it?

4. Am I saving work time or just moving it?

"Agentic AI" describes a capability, not a quality guarantee, so the right question isn’t “What can it do?” The question is “Should AI be doing it?”
