{"collection":"posts","slug":"1780416704000-toy-gpts","cid":"bafkreihv7kvbbfsjwq6yr5amsldmjeex2czuqy7mf23wqkygkihhdhhucu","title":"Toy GPTs","excerpt":"Two tiny GPTs out of Karpathy's microGPT. One keeps its weights on Ethereum. The other holds a séance with dead presidents.","body":"Back in February, Karpathy released [microGPT](https://karpathy.github.io/2026/02/12/microgpt/), 200 lines of dependency-free Python that trains a GPT: a scalar autograd engine, a small transformer, an optimizer, and a train loop, with nothing to import. It rattled around in my head for months. Two side projects came out of it.\r\n\r\nMeet Bard and Dead Presidents.\r\n\r\nBard is a tiny GPT trained on [Tiny Shakespeare](https://raw.githubusercontent.com/karpathy/char-rnn/master/data/tinyshakespeare/input.txt) whose weights live on Ethereum. Dead Presidents is trained on the inaugural and State of the Union addresses of presidents who have died.\r\n\r\n## What a GPT is, roughly\r\n\r\nA GPT predicts the next token from the tokens before it. Show it enough text and it learns which token tends to follow which. To generate, you hand it a few tokens, ask what comes next, append the answer, and ask again.\r\n\r\nBoth of these work one character at a time. The token is a single letter, space, or punctuation mark. Bard has a 66-character vocabulary, Dead Presidents has 34. They read a run of characters and guess the next one.\r\n\r\nThese models continue text. Hand Dead Presidents \"the state of the\" and it runs with it, in the cadence it learned.\r\n\r\n## Bard\r\n\r\n[Bard](https://bard.farfield.systems/) trains offline in numpy, then puts its weights on chain. It is 811,520 parameters: 4 layers, 128 wide, a 64-character context. After training, an export step quantizes the weights to int8, gzips them to 684 KB, and hashes the result with SHA-256. 
That artifact goes on Ethereum.\r\n\r\nNormal contract storage would make 684 KB cost a fortune, so the weights ride on SSTORE2, which stores each chunk as the bytecode of a throwaway contract, far cheaper to write and to read back. The artifact splits into 29 chunks of up to 24,000 bytes each, so the model lands on chain across 29 transactions inside a `WeightManifest`.\r\n\r\nA CREATE2 salt grind lands the contract at a vanity address starting `0xBA2D`. ba2d is bard in hex. The deploy commits that address first, empty, then sets the config (including the artifact hash) afterward, which keeps the address independent of the model. Once the uploader pushes and verifies every chunk, the owner calls `seal()` and freezes the manifest for good.\r\n\r\nInference happens in the browser. The frontend reads the 29 chunks straight off chain, reassembles them, gunzips, and checks the SHA-256 against the contract's hash. It runs only on a match, which proves the weights in the browser are the ones the owner sealed. Then it generates Shakespeare-shaped text, a character at a time. This is a Sepolia testnet proof of concept, and at this size the output reads as Shakespeare without quite cohering.\r\n\r\n## Dead Presidents\r\n\r\n[Dead Presidents](https://dead-presidents.farfield.systems/) points the same architecture at American political speech. The corpus is every inaugural and State of the Union address from presidents who are no longer with us, George Washington in 1789 through George H.W. Bush. The corpus leaves out the five living presidents, which means the builder keeps `george_bush` and drops `george_w_bush`. That comes to 255 addresses, normalized into 30,725 lowercase sentences, around 3 million characters, a 34-token vocabulary. The State of the Union speeches are the bulk of it, and they pull the diction toward the administrative: secretary of the treasury, expenditures, appropriations.\r\n\r\nThere are two engines for one model. 
The scalar engine is pure Python with no NumPy, running at about 8 seconds per training step. Every multiply and every gradient is a visible Python object, which makes it readable and unusably slow. The fast engine is the same math in NumPy at roughly 5,400 examples a second. They agree to machine precision, logits matching to about 1e-16, so you train on the fast one and sample through the slow legible one and get the same numbers.\r\n\r\nSlow training makes good weights expensive, so finding them became its own search. The search copies Karpathy's autoresearch loop: one metric, a fixed step budget per run, keep the result if it improved. The metric is val_bpc, bits per character on a held-out split. Seven researcher agents each took one region (width, depth, learning-rate schedule, batch size, context length, attention heads, a wildcard combo) across 59 experiments, then re-ran the top configs on three seeds to reject luck.\r\n\r\nWarmup mattered most. Without a 150-to-250-step warmup, every bigger model looked worse, an init-time instability that masqueraded as a capacity ceiling. Context length came next: raising block size from 24 to 112 lowered the metric the whole way. Depth beat width at matched parameter counts.\r\n\r\nThe numbers in one line: 5.09 to 2.04 to 1.667 to 1.530 to 1.475 bits per character. Untrained, then inaugural-only, then adding the State of the Union speeches, then scaling the model up, then training straight across sentence boundaries instead of one sentence at a time. 
The winner is 464,928 parameters (96 wide, 4 layers, 192 context), about 1.8 MB as float32.\r\n\r\nThe samples come out unmistakably presidential and not quite grammatical:\r\n\r\n```\r\nthe constitution of the country and personal agreement has been made in the state of the union.\r\n\r\nour efforts is the soviet union and the prosperity of the secretary of the united states.\r\n```\r\n\r\nOne phrase reaches back to the 1800s, another to the Cold War, both from 34 characters and 465k parameters. It runs client-side too. A Web Worker holds the weights behind an OpenAI-shaped API, and the chat UI seeds from your message and continues in dead-president voice, a séance with live token counts.\r\n\r\n## Farfield\r\n\r\nBoth ship their weights to the browser and run them in a Web Worker, a character at a time. They land on [Farfield](1779064713644-farfield.md), the catch-all domain for my side projects and self-hosted services. Bard is live there now; Dead Presidents goes up today.","tags":["machine-learning","ai","onchain"],"published":true,"createdAt":"2026-06-02T16:11:44Z","updatedAt":"2026-06-03T17:25:12Z"}