AI Isn't Making You Worse. Not Reviewing It Is.

The take going around is that AI is quietly rotting your skills. Use it long enough and the muscle wastes away. The model writes the email, builds the deck, runs the numbers, and one day you wake up unable to do the work yourself. They're right that something's rotting. They've got the cause wrong. It isn't the tool. It's the part where you stopped reading what it handed you.

For the last eighteen months, reviewing AI output hasn't been part of my job building LLM-powered tools; it's been the whole thing. The work I've done without a model in the loop adds up to minutes. My day is reading what models produce: feature output, generated copy, research, the emails, the Slack messages, all of it. So I'm not guessing about what review does or doesn't do to you. I do it all day, and I know exactly how it goes wrong, because I've gotten it wrong.

I build and maintain LLM-powered tools used by thousands of real users at Enrollment Resources and VirtuOHS.

You've taken this exam before

Think of every piece of work you produce as a test. Your judgment gets graded by a client, by a compiler, by reality, and for your whole career you took that test the slow way, working each problem out by hand before you turned it in. The work was the exam. You can't do good work without using the judgment it takes to do it, so doing the work and being tested were the same thing. That's why it sharpened you. Every problem was a rep whether you thought of it that way or not.

Then a stranger started sliding you their finished answer sheet.

That's the real change. Not that AI does your work, but that it hands you finished answers right when you'd normally start working. And here's the one thing the atrophy crowd has right: the writing muscle only gets worked by writing. If the skill you're protecting is starting from a blank page, nothing replaces starting from a blank page. Give them that one. But everything after it they get wrong, because the test didn't go away when the stranger's answers showed up. Every answer the model hands you is still a question aimed at your judgment: is this right, is it the best version, would you have done it this way. You're being tested as often as before. More often, actually, because the model cranks out ten answers in the time you used to produce one.

Which should make you sharper. More reps on the exact skill they say is wasting away. Take that test honestly and there's no way you come out worse. So why does anyone come out worse? Because of how the answer shows up, and what it tempts you to do with it.

The model hands you a finished answer with the work erased. You used to see the problem and the path to the answer as one thing; you couldn't have the answer without walking the path, and walking it was the test. Now the answer shows up clean and confident, with no trace of what else it could have been or whether it's even right. The work is gone, and what's left doesn't read like a question. It reads like a fact. You don't grade facts, you accept them. So you write your name at the top and turn it in. Because the answer was usually close enough, you got away with it, and you called that working fast.

Taking the test now means doing something the by-hand version never asked of you: refusing to accept an answer just because it shows up looking done. That's the same judgment you used to build the work yourself, pointed at a finished answer instead of a blank page. Do it and the skill gets worked as hard as the slow days ever worked it. Skip it and nothing gets worked at all. That's the only thing that rots you.

The question where I learned this

The clearest place I've watched myself fail is data analysis. Hand a model a CSV (a few thousand rows of lead data, conversion numbers, whatever) and ask what it shows. It comes back with a clean breakdown. Totals by source, a conversion rate, a confident note about which segment is winning. Every number looks right. None of them carry any hint of I might be wrong, because the model never says that.

Here's the part that took me too long to learn.

Checking the work IS the work.

The only way to know whether that breakdown is right is to do the breakdown myself. To check the total, I have to add up the column. To check the rate, I have to work out the rate. There's no quick glance that confirms it, not like skimming a paragraph and feeling whether it reads right. Grading the answer takes the same effort as producing it, which means the model didn't save me any labor. It moved the labor from "doing" to "checking" and dressed the checking up as optional.

And that's exactly where the damage happens. When checking honestly costs as much as doing the thing, the pull to skip the check is huge. The number looks right. I trust myself to have noticed if something were off. So I accept it. I can't even point you to my worst screwup, because that's not how it fails. It fails quietly: a total off by a little, a rate close enough to feel right but still wrong, mistakes too reasonable to stick in your memory. Which is worse than failing loudly. Nothing ever breaks hard enough to teach you to stop trusting it.

It gets worse. When the model answers straight off the cuff, it isn't even doing math. It can't hold a few thousand rows in its head and add up a column reliably; it's producing something that looks like a right answer. A number shaped like a sum, not one that is. Yes, the chat apps increasingly catch this and run real code under the hood. Good. But that's the app doing the right thing for you, and it tells you nothing about the API call in your own pipeline or the tool you built yourself. What matters is where the math actually happens, and when it doesn't happen at all, the model sounds exactly as confident as when it does. The polish is identical either way. So a clean, confident table on a question that actually matters is exactly where I should be most suspicious, and exactly where I'm most tempted not to be.

So make the check cheap

Here's what the "just review more carefully" advice misses. The problem isn't a lack of discipline. It's math. You're now grading ten answers for every one you used to produce, and checking each one as carefully as it deserves is more total work than building one by hand ever was. Telling people to check harder is telling them to eat that cost on willpower. They won't, and they shouldn't have to.

Take an answer you can check at a glance over one you have to take on faith.

So don't fix the willpower. Fix the cost. There's one rule under every way of doing that: take an answer you can check at a glance over one you have to take on faith. When the answer carries the proof of its own correctness, checking it stops being "redo everything" and becomes "look it over." The whole game is building that into the answer up front, instead of hoping you'll find the discipline to check one that doesn't have it.

The cleanest version is when part of the answer is just math you can pull out of the model entirely. Don't ask for the answer. Ask for a script that produces it, and run it.

import pandas as pd

df = pd.read_csv("leads.csv")
by_source = df.groupby("source")["converted"].agg(["sum", "count"])
by_source["rate"] = by_source["sum"] / by_source["count"]
print(by_source.sort_values("rate", ascending=False))

Two things change at once. The math runs on a real computer instead of inside the model's head, so the number is the actual number, every single time. And checking drops from "redo the entire analysis" to "read ten lines of obvious logic." Is it grouping on the right column. Adding up the right field. Handling the blanks. That's a fifteen-second check instead of redoing the whole job. You're checking the method, not rerunning the answer. The model didn't get smarter; the work moved somewhere you can both trust it and check it cheap.

When there's no script, make the work visible

Most of what I check in a day has no math to pull out. Copy, research, an email, a Slack message. You can't run prose through a computer, because the thing you're judging is the feel of it. So the lazy read of the script trick is that it only saves you on the odd numbers question and leaves you hanging on everything else.

But the script was never the rule. Easy checking was. And you can make prose easy to check without a computer anywhere in sight, by making the model show the work it normally erases, so you check the work instead of redoing the answer.

A bare conclusion is exactly the thing that invites a blind hand-in. So don't take it bare. Ask for the claim and the source for it, so checking is a click instead of a memory test. Ask for the assumptions listed at the top, so a bad one sits right on the surface instead of buried three steps deep. Ask for the reasoning, not just the verdict, so you're checking steps you can see. None of that makes the answer true, but all of it does the same job the script did: it drops the price of checking until you'll actually pay it. A claim with its source attached is a quick check. The same claim bare is a research project, which means in practice it's a coin you didn't flip.

Same move, both cases. Where part of the answer is math, pull it into code and let the computer make it solid. Where it isn't, make the model show its work and let that make it easy to check.

Check harder when being wrong costs more

"Work every problem, always, forever" isn't how anyone takes a test, and you know it. You don't sweat the practice questions that don't count. You sweat the ones on the final.

That's the difference the atrophy argument flattens. The question was never "did AI touch this." It's "am I turning this in, unchecked, somewhere it counts." Plenty of answers can go in unread: a throwaway script, scaffolding you'll rewrite anyway, a list you'll eyeball in context. No skill is being tested there, because you're not pretending the answer is yours. You wanted it to exist and you don't care if it's perfect. Fine, as long as you're honest about the deal: an unchecked answer could be wrong and you'll never know which one was.

So the failure is narrow and specific: handing in a finished-looking answer as if you'd worked the problem, on a question where being wrong costs something. A clean, confident answer on a question that matters should raise your suspicion, not settle it, because clean and confident is exactly what a wrong answer looks like too. The polish isn't proof the work is right. Treat it as no proof at all and check accordingly.

This also kills the obvious dodge: "I'll write a rough version myself, then let the model polish it." Drafting first genuinely works the writing muscle; that's the rep the atrophy crowd is right about, and if that's the skill you're keeping, draft away. But it buys you nothing on the checking test. What comes back is still a finished answer you have to judge, now with the extra pull of seeing your own work in it and waving it through. Two different skills, two different tests. Drafting first passes the first one and gets you out of neither.

The ones who'd ace it are the ones who skip it

Checking an answer takes two things: knowing enough to catch the mistake, and actually working the problem. Miss the knowledge and your review is just for show: a question outside your field, an "A" you can't judge and can't even tell you can't judge. But that failure shows itself eventually. The answer turns out wrong, the thing breaks, someone traces it back.

The expert's failure doesn't show itself. Have the knowledge and skip the work, and you've produced the one wrong answer nobody catches, including you, because the skill was right there and everyone assumes a skilled person used it. This is the failure most often mistaken for decay: a sharp person turns in a wrong answer and the room calls it rot. It isn't. The skill never dulled. The polish invited a hand-in instead of a check, and a hand-in is exactly what the polish is built to earn.

And here's the cruel part: the better you are, the worse the trap, because skill breeds trust in your own glance. The rookie doesn't trust themselves yet, so they work the problem. The expert trusts themselves, and on a clean, confident answer that trust turns straight into a hand-in. The test ends up skipped by exactly the people who would have aced it.

The disease was never atrophy

Your skill doesn't waste away from using the tool. It wastes away from one reflex: reading a finished-looking answer as a correct one and turning it in without working the problem. And you don't beat a reflex that strong with willpower. You beat it by making the right move the cheap one. Pull the math into a script where there's math to pull. Make the model show its sources and assumptions where there isn't. Check harder when being wrong costs more, and treat polish as no proof at all.

The test is still running. You only fail it by handing it in blank.

SuedePritch