Origin Context
"The Scriptorium" separates form from function. In "Inside the Cathedral," the author describes this phase as the acquisition of "Amoral Mastery." The model learns that a legal argument looks like this and a love poem looks like that. It masters the syntax of hate crime just as fluently as the syntax of a sonnet, because to the model, both are just statistical patterns of token co-occurrence.
This phase is technically known as "next-token prediction," but the simplicity of the term belies the complexity of the result. To perfectly predict the next word in a mystery novel, the model must implicitly model the plot, the characters' motivations, and the physical setting. Thus, shallow prediction forces deep understanding—or at least, a high-fidelity simulation of it.
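The prediction mechanic itself can be sketched with a toy bigram counter. This is a deliberately crude stand-in for a neural language model (the corpus and names below are illustrative, not from any real training pipeline), but it shows the core move: tally which token follows which, then emit the most probable successor.

```python
from collections import Counter, defaultdict

# Toy corpus; a real Scriptorium run ingests trillions of tokens.
corpus = "the monk copies the text and the monk reads the scroll".split()

# Count successors: for each token, how often does each next token follow it?
successors = defaultdict(Counter)
for current, nxt in zip(corpus, corpus[1:]):
    successors[current][nxt] += 1

def predict_next(token):
    """Return the most frequently observed next token, or None if unseen."""
    counts = successors[token]
    return counts.most_common(1)[0][0] if counts else None

print(predict_next("the"))   # "monk" follows "the" twice, beating "text" and "scroll"
```

A neural model replaces the count table with a learned probability distribution over the whole vocabulary, conditioned on a long context window rather than a single preceding token, which is precisely why it is forced to model plot, motivation, and setting to keep its predictions sharp.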
The Stochastic Parrot
This phase produces what researchers Bender et al. famously termed a "Stochastic Parrot"—an entity that stitches together linguistic forms according to probabilistic information about how they combine, creating the illusion of meaning without the presence of communicative intent. The "Scriptorium" metaphor is chosen to emphasize this: the monk copies the ancient text perfectly, perhaps without reading the language he is transcribing.
Field Notes & Ephemera
Field Standard: The Scriptorium produces a "Base Model." Base models are wild, hallucination-prone, and incredibly creative. They are often "lobotomized" by later safety training (The Gymnasium) to become the boring chatbots we use in corporate products.
Trivia: A modern Scriptorium run (training a foundation model on the scale of GPT-4) consumes more electricity than a small city and is estimated to cost upwards of $100 million in compute. It is the most expensive "spelling test" in human history.