How the Machine Learns

There is no rulebook inside ChatGPT.

This surprises almost everyone the first time they hear it. Our intuition about software comes from forty years of programs that do exactly what someone told them to do: if the invoice is thirty days past due, send the reminder. Somewhere, a person wrote that rule, and the software follows it forever.

Nobody wrote the rules of English into a language model. Nobody taught it the difference between a polite decline and a rude one, or what a purchase order looks like. It absorbed all of that the way an apprentice absorbs a trade: by exposure, repetition, and correction, at a scale no human apprenticeship could survive.

The Three Stages of an Education

The first and longest stage is called pretraining, and it is the brute-force reading program described in the previous essay. The model works through a corpus measured in trillions of words, drawn from the public internet, digitized books, articles, and code, guessing the next word over and over. Each guess is scored against the word that actually came next, and each miss triggers a tiny adjustment to its billions of internal dials, a process engineers call gradient descent, which is a precise mathematical version of “warmer, colder.” Run that loop long enough, on enough computers, and the dials settle into a configuration that has compressed an astonishing share of human written knowledge. This stage is why training a frontier model can cost hundreds of millions of dollars. The reading list is the internet, and the tuition is paid in electricity and chips.
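For readers who want to see “warmer, colder” in motion, here is a minimal sketch in Python. It assumes a single dial and a made-up target value, where a real model adjusts billions of dials against trillions of words; the function name and the numbers are illustrative only.

```python
def train_dial(target, dial=0.0, learning_rate=0.1, steps=200):
    """Nudge one dial toward `target` by gradient descent on squared error."""
    for _ in range(steps):
        error = dial - target              # how far off the current guess is
        gradient = 2 * error               # slope of error**2 with respect to the dial
        dial -= learning_rate * gradient   # small step in the "warmer" direction
    return dial


final = train_dial(target=3.0)
print(round(final, 4))
```

Each pass closes a fixed fraction of the remaining gap, so the dial drifts toward the target without anyone writing a rule that says what the answer is. The loop only ever knows which direction is warmer.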

What emerges from pretraining is knowledgeable and useless, like a brilliant recluse who has read everything and never held a conversation. It can continue text. It has no idea it is supposed to be helpful. The second stage, fine-tuning, fixes that. The model is shown many thousands of examples of instructions paired with good responses, written and curated by people, until it learns the format of being useful: answer the question, follow the instruction, stop when you are done.
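A sketch of what those instruction-and-response pairs might look like as data, in Python. The two examples, the “Instruction / Response” layout, and the `<end>` marker are all assumptions for illustration; each lab uses its own format and far more examples.

```python
# Hypothetical examples; a real fine-tuning set holds many thousands of these.
examples = [
    {"instruction": "Summarize this email in one sentence.",
     "response": "The vendor is asking to move the delivery to Friday."},
    {"instruction": "Decline this meeting politely.",
     "response": "Thank you for the invitation, but I am unable to attend."},
]

END_MARKER = "<end>"  # assumed stop marker, not any real model's token


def to_training_text(example):
    """Join one pair into the single text the model learns to continue."""
    return (f"Instruction: {example['instruction']}\n"
            f"Response: {example['response']} {END_MARKER}")


training_texts = [to_training_text(e) for e in examples]
print(training_texts[0])
```

The training itself is still next-word prediction. What changes is the reading material: every text now has the shape of a request, a good answer, and a clear stopping point.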

The third stage is the one with the clunky name: reinforcement learning from human feedback. People compare pairs of the model’s answers and mark which one is better: more helpful, more honest, less likely to walk someone through building a weapon. The model is then adjusted toward the preferred answers. This stage is where a model’s manners come from, and it is why different companies’ models have noticeably different personalities. The base capability comes from the reading. The behavior comes from the raising.
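The essay does not go into the arithmetic behind “adjusted toward the preferred answers,” but one common formulation turns each human comparison into a penalty that shrinks when the preferred answer scores higher than the rejected one. The Python sketch below assumes each answer has already been given a numeric score; the scores and the function name are hypothetical.

```python
import math


def preference_loss(score_preferred, score_rejected):
    """Pairwise penalty: small when the preferred answer outscores the rejected one."""
    margin = score_preferred - score_rejected
    # Equivalent to -log(sigmoid(margin)).
    return math.log1p(math.exp(-margin))


agrees = preference_loss(2.0, 0.5)      # scoring matches the human's choice
disagrees = preference_loss(0.5, 2.0)   # scoring contradicts the human's choice
print(agrees < disagrees)
```

Training pushes the scores in whatever direction lowers this penalty, using the same warmer-colder loop as pretraining, only now the signal is human preference and not the next word on the page.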

The Apprentice’s Limits

An education like this produces predictable blind spots, and three of them matter for anyone running a business.

The first is the cutoff. Training ends on a date, and the model’s knowledge of the world ends with it. Many tools now bolt on live web search to compensate, but the model itself is a snapshot. Ask it about last quarter and it is guessing unless something feeds it last quarter.

The second is that the model learned from the public record, and your business is private. Your pricing logic, your customer history, the reason you stopped working with that one vendor in 2019: none of it is in the dials. The machine arrives knowing the world and ignorant of you, a gap that a later essay in this series addresses directly.

The third is subtler. The model learned what answers look like, which includes learning that answers sound confident. An apprentice who was never allowed to say “I don’t know” becomes a journeyman who bluffs. The newer models are better about admitting uncertainty, but the tendency is baked into the education, and you should plan for it.

Now, a fair objection: if the model trained on the public internet, did it train on my corner of it? On my website, my reviews, maybe my data? On the website, quite possibly. As for what you type into it today, that depends entirely on which product you use and on which terms, and the question deserves its own essay, which it gets later in this series. The short version is that business-grade accounts generally commit in writing to keep your inputs out of training, and free consumer products deserve a closer read.

What the Owner Does With This

Understanding the education tells you how to manage the graduate. You would never hand a gifted new apprentice the keys to client relationships on day one; you would give them context, supervision, and work matched to their strengths. The same management instinct, the one you already have, transfers directly. Give the machine your documents when the task involves your business. Check its confident claims when the stakes are real. Use it hardest where its education was deepest: language, summary, structure, code, the written patterns of the world.
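In practice, “give the machine your documents” often means placing them in the request itself, ahead of the task. The Python sketch below shows one way that might look. The function name, the document labels, the instruction wording, and the sample pricing text are all assumptions, and no particular product’s interface is implied.

```python
def build_prompt(task, documents):
    """Put the business's own documents in front of the task."""
    sections = [f"[Document {i}]\n{text}"
                for i, text in enumerate(documents, start=1)]
    context = "\n\n".join(sections)
    return (
        "Use only the documents below. "
        "If they do not contain the answer, say so.\n\n"
        f"{context}\n\n"
        f"Task: {task}"
    )


prompt = build_prompt(
    "Draft a reply to the customer asking about bulk pricing.",
    ["Bulk pricing: 10 percent off orders above 500 units.",
     "Standard lead time is three weeks."],
)
print(prompt)
```

The resulting text is what gets sent to whichever model you use. The line asking the model to say when the documents fall short is a small hedge against the bluffing habit described earlier, not a guarantee, so the advice to check confident claims still applies.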

There is one more thing the apprenticeship frame gives you, and it may be the most practical. The education is general, which means the differentiation is local. Every competitor can rent the same graduate. What they cannot rent is your workflows, your data, and your judgment about where the work actually hurts. The model is becoming a commodity. What you wrap around it is the business.
