An AI industry of more than $1 trillion that can’t ship any functional production code

If you’re worried about the so-called AI apocalypse everyone keeps bemoaning, consider this: after burning hundreds of billions, we still can’t ship reliable code with the AI toys we were sold. (See here for further details on the burst of the AI bubble.)

Count the “breakthrough” announcements. Count the CEO promises. Now count the mess: an exponential surge of garbage code, driven by the same four errors (no, not “hallucinations”, that’s marketing, not science) that nobody has actually fixed.

You’ll see those four in the table below. Then we’ll walk through real, checkable examples, so you can spot them instantly in your own AI-generated code.

Are they really that decisive? Yes. OpenAI ships GPT-5 — same four errors. Google rolls out Gemini 2.5 — same four errors. Anthropic unveils Claude Sonnet 4.5 — same four errors. Microsoft wires Copilot into everything — no surprise there, right?

Every launch arrives with AGI fanfare. And within days, the stories pile up: tangled spaghetti code, fresh security holes and the same sad punchline — AI code looks smart, behaves dumb, and still needs humans to clean up the mess.

The numbers are painfully staggering:

- Short-term churn rose by roughly 100% compared with 2021.
- Developers using AI write 40% more security vulnerabilities.
- Experienced developers are 19% SLOWER with AI tools.
- Engineers in a selected group of significant companies reject about 70% of AI code suggestions.

Many companies were betting everything on the promise of “10x productivity”, a mathematical impossibility that’s now about to create a crisis unlike anything the software industry has ever seen.

Here’s what’s really happening: we’re not building the future faster, we’re building tomorrow’s minus two trillion dollars of code. At unprecedented scale. With unprecedented confidence. And unprecedented blindness to the catastrophe we’re creating.

Let me show you exactly how these Four Horsemen of Bad Code are turning the global codebase into an unmaintainable disaster. And if you’re a software engineer, get ready: soon your inbox will be overflowing with frantic emails from companies begging for full engineering teams to rescue their broken codebases.

Yes, you’ll see why companies will soon be desperately rehiring the same human programmers they replaced — after taking the bait of the “we won’t need programmers in a few months” fantasy.
The Four Horsemen Riding on Tons of Garbage Code

As mentioned, the four errors you see in the chart above now plague most AI-generated code. Companies are patching them at full speed, but the disaster is only being delayed. Sooner or later, they’ll have to admit the truth — AI code will have to be treated for what it really is: autocomplete on steroids, not automation.

To understand the scale of the havoc these errors are unleashing, we need to classify them into two categories. The first two are the classic statistical errors everyone knows; the next two are new cognitive errors, born from the mechanics of AI itself, the ones nobody is talking about. Both sets are still being lazily labeled as “hallucinations”, a buzzword invented for marketing, not engineering.

THE CLASSICAL TWO ERRORS everybody knows

Type I Errors: The “Obvious” Mistakes

What it means (in code): the simple stuff went wrong. The linter’s rolling its eyes.

```python
# AI generates (wrong parameter use; forgot to apply discount)
def calculate_discount(price, customer_id, discount_rate):
    return price * discount_rate  # Oops—no discount applied

# Should have been
def calculate_discount(price, customer_id, discount_rate):
    return price * (1 - discount_rate)  # Apply the discount properly
```

Other examples frequently seen in AI-generated code include (see the sketch after this list):

- Missing error handling (try/catch blocks)
- Incorrect operators (using = instead of == in conditions)
- Off-by-one errors in loops
- Wrong method names (string.contains() in Python instead of the in operator)
- and a long etc.
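To make those slips easy to spot in a review, here is a minimal, hypothetical sketch (the function name and data shape are invented for illustration) that packs three of them into one function, next to the corrected version:

```python
# AI-generated style (hypothetical example): several classic Type I slips at once
def find_admin_emails(users):
    emails = []
    for i in range(len(users) + 1):              # Off-by-one: IndexError on the last pass
        if users[i]["role"].contains("admin"):   # Wrong method: Python strings have no .contains()
            emails.append(users[i]["email"])
    return emails                                # And no handling at all if users is None

# Should have been
def find_admin_emails(users):
    if not users:                                # Guard the None / empty case
        return []
    return [u["email"] for u in users if "admin" in u["role"]]  # 'in' operator, no index arithmetic
```

Every one of these is caught by a linter or a single unit test, which is exactly why this error class barely dents productivity, as discussed further below.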
Type II Errors (Logic/Causal): Wrong Business Logic, Perfect Syntax

What it means (in code): runs perfectly; does the wrong thing. Or, to put it more vividly, it arrives right on time… at the wrong station. Charming.

Case in point — and you can scale this to your own business: the AI noticed that 80% of premium users in its training data lived in major cities, so it wrongly concluded that city location determines shipping rates.

```python
# AI generates (correlation != causation)
def determine_shipping_cost(user):
    # AI saw premium users often live in cities with free shipping
    if user.is_premium:
        return 0  # Free shipping for premium
    elif user.city in ['New York', 'Los Angeles']:
        return 0  # AI incorrectly learned cities get free shipping
    else:
        return 9.99

# Should have been based on actual business rules
def determine_shipping_cost(user, order_total):
    if order_total > 50 or user.is_premium:
        return 0
    return 9.99
```
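Since nothing here trips a linter, the only guardrail is a test written by someone who knows the actual business rule. A minimal pytest-style sketch, assuming the corrected determine_shipping_cost(user, order_total) above is in scope and using a hypothetical User stub:

```python
from dataclasses import dataclass

@dataclass
class User:            # Invented stand-in for the real user model
    is_premium: bool
    city: str

def test_city_alone_does_not_grant_free_shipping():
    # A non-premium New Yorker with a small order must still pay shipping
    assert determine_shipping_cost(User(False, "New York"), order_total=20) == 9.99

def test_big_orders_and_premium_users_ship_free():
    assert determine_shipping_cost(User(False, "Des Moines"), order_total=75) == 0
    assert determine_shipping_cost(User(True, "Des Moines"), order_total=10) == 0
```

The first test fails against the AI version immediately, which is the whole point: the human who knows the rule still has to write it down.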
return {"success": True, "recommendations": recommendations} # AI generates - everything tangled together def process_user_action(user, action_type, data): # User validation, logging, auth, payment, and email ALL mixed if not user.is_authenticated: log_event("auth_failed", user.id) send_email(user.email, "Login attempt failed") return False if action_type == "purchase": # Purchase logic mixed with user verification if user.email_verified and user.age > 18: payment = process_payment(data['amount']) log_event("purchase", user.id, payment.id) update_inventory(data['item_id']) send_email(user.email, "Purchase confirmed") update_user_points(user, data['amount'] * 10) check_fraud(payment) # Why is fraud check AFTER payment? return payment elif action_type == "login": # Login mixed with analytics and recommendations update_last_login(user) log_event("login", user.id) recommendations = get_recommendations(user) # Why here? send_email(user.email, "Welcome back!") # Email on EVERY login? return {"success": True, "recommendations": recommendations} Summing up, you can’t: Summing up, you can’t: Change email logic without affecting payments Modify authentication without breaking logging Update the purchase flow without touching user verification Test any component in isolation Change email logic without affecting payments Modify authentication without breaking logging Update the purchase flow without touching user verification Test any component in isolation This error explains the 100% increase in code that has to be completely rewritten within 2 weeks because it’s simply unmaintainable. 100% increase in code rewritten within 2 weeks rewritten within 2 weeks Type IV Errors (Memory / Collision Cascades): Unrelated behaviors share the same “address” Type IV Errors (Memory / Collision Cascades): Unrelated behaviors share the same “address” This is the second and more fatal cognitive error. As mentioned, you will not find it in probability textbooks, but it is, in fact, lurking in your AI-generated code. The simple image of this blunder: two coats on one peg; one falls when you take the other. The simple image of this blunder: two coats on one peg; one falls when you take the other. It shows up when compressed representations put different ideas too close together. At runtime, recalling one quietly wakes the other. You reset a password, and account cleanup tags along. You export data, and a stray payment runs. Nothing looks broken in isolation, yet side effects keep resurfacing because the model stored neighbors in the same mental drawer. # AI generates — password reset collides with account cleanup def reset_password(user_email): user = find_user(email=user_email) if user: token = generate_reset_token() send_reset_email(user.email, token) # From a different flow, yet here we are if user.last_login < 30_days_ago: user.status = "inactive" cleanup_user_data(user) return token return None # Another function unexpectedly handles payments def export_user_data(user_id): data = get_user_data(user_id) # Payment processing in a data export—what could go wrong? 
Type IV Errors (Memory/Collision Cascades): Unrelated behaviors share the same “address”

This is the second and more fatal cognitive error. As mentioned, you will not find it in probability textbooks, but it is, in fact, lurking in your AI-generated code.

The simple image of this blunder: two coats on one peg; one falls when you take the other.

It shows up when compressed representations put different ideas too close together. At runtime, recalling one quietly wakes the other. You reset a password, and account cleanup tags along. You export data, and a stray payment runs. Nothing looks broken in isolation, yet side effects keep resurfacing because the model stored neighbors in the same mental drawer.

```python
# AI generates — password reset collides with account cleanup
def reset_password(user_email):
    user = find_user(email=user_email)
    if user:
        token = generate_reset_token()
        send_reset_email(user.email, token)
        # From a different flow, yet here we are
        if user.last_login < thirty_days_ago:
            user.status = "inactive"
            cleanup_user_data(user)
        return token
    return None

# Another function unexpectedly handles payments
def export_user_data(user_id):
    data = get_user_data(user_id)
    # Payment processing in a data export—what could go wrong?
    if data.outstanding_balance > 0:
        process_payment(data.outstanding_balance)
    return create_export_file(data)
```

Real-world echoes (reported in research):

- Search functions that modify DB records
- Auth flows that trigger analytics
- “View” endpoints that export to logs
- APIs that mix multiple unrelated operations

The goal of the fix is clear: give behaviors separate namespaces and explicit boundaries, then route anything stateful through a dedicated path that you can test on its own. Use retrieval grounding, key-value memory, and “fail-closed” side effects to keep things from colliding.
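One minimal way to draw those boundaries in code, assuming the same helper functions as the example above (the run_side_effect gate and ALLOWED_EFFECTS registry are names I invented for this sketch, not a canonical pattern):

```python
# Registry of side effects that are allowed to run in explicitly-owned flows
ALLOWED_EFFECTS = {"deactivate_inactive_user", "charge_outstanding_balance"}

def run_side_effect(name, action):
    if name not in ALLOWED_EFFECTS:          # Fail closed: unknown effects never run
        raise PermissionError(f"side effect '{name}' is not allowed in this flow")
    log_event("side_effect", name)           # Every stateful action leaves an audit trail
    return action()

# Password reset does exactly one thing; account cleanup cannot "tag along"
def reset_password(user_email):
    user = find_user(email=user_email)
    if not user:
        return None
    token = generate_reset_token()
    send_reset_email(user.email, token)
    return token

# Export is read-only by construction; billing lives in the flow that owns payments
def export_user_data(user_id):
    data = get_user_data(user_id)
    return create_export_file(data)
```

The retrieval-grounding and key-value-memory parts of the fix act earlier, at generation time; the gate above is just the runtime, fail-closed half of the same idea.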
Yes, I know, and I also feel the pain of having reworked something we had done manually earlier and better.

How Are These Errors Affecting Production Software?

Here’s the part everyone was waiting for — what really happens in the code AI ships. Chart 3 above shows the most common bugs reported by developers worldwide and how they align with the four error types (three of which matter most in practice).

Chart 2 above and Chart 4 below show what really happens when developers bring AI assistants into the workflow. When used strictly as autocomplete on steroids, Types I–II errors are mostly caught by humans or tooling. You get a modest productivity bump — typos squashed, imports fixed, the usual housekeeping. Nice, but hardly a revolution.

Then come Types III–IV — spaghettization and semantic collisions. That’s when the wheels come off. Concepts tangle across modules, responsibilities blur, and “harmless” helpers start misaddressing meaning. The result? Brittle code that fails integration, resists testing, and inflates maintenance costs. Many teams eventually learn it’s faster — and safer — to rip out the AI-generated parts than to patch them forever.

And of course, you can clearly see how the “10× automation” dream gets buried: it completely flips the ROI and reminds us that, as of 2025, true code-generation automation, and any significant productivity gains for human developers, remain nothing more than a distant, epic fantasy.

So, to sum it up: as a software dev, you don’t need to panic every time some doom-poster waves a new snake-oil “AI code wand.” As Chart 4 above shows, the only real ROI bump is tiny, and it appears only when you use AI for the boring, low-risk stuff that’s easy to diff, lint, and roll back. Think of it as power-autocomplete, not a robot engineer:

- Boilerplate / scaffolding: CRUD handlers, DTOs, wiring
- Testing glue: stubs, table-driven tests, fixture data
- Mechanical edits: repetitive refactors; rename/move ops
- Lightweight content: docs/comments, regexes, simple SQL, migration templates
- Housekeeping: lint/format hints; config & YAML snippets

Here’s the good news for every developer feeling a bit hopeless: the whole “10× with AI” fantasy is already crashing in production. Companies are about to need real programmers — lots of them — to untangle spaghetti, fix logic, and rebuild systems that actually work. So keep leveling up your skills. The cleanup crews are forming, and guess what, yup, you’re on the call list.

Post-Note

Readers of our past stories already know the antidote exists — a real mathematical cure for the four-error plague haunting today’s AI. But of course, that would mean slowing down, rewiring the system, and admitting that hype isn’t research. It will take years, no doubt, if anyone puts on the brakes and starts doing more science in silence and less marketing.