23 Jun 2026 · engineering

The headline writes itself: a 295K model in your browser

The giant words at the top of the home page are not typed by me. They are generated, right now, in your browser, by a Smedjan model.

It is tiny — about 295,000 parameters — trained as a title generator and compiled to WebAssembly. When the page loads, the same inference path the engine uses on a GPU runs on your CPU, in a sandbox, with no network call. There is no server deciding what it says. The model, the tokenizer, and the sampling loop are all shipped to you and run on your machine.
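For a sense of what a sampling loop like that involves, here is a minimal sketch in TypeScript. It is not Smedjan's actual code: the `Forward` type, `generate`, `SampleOptions`, and the toy model are names assumed for illustration, and it uses plain temperature sampling, which may differ from what the engine does.

```typescript
// Assumed shape of a model call: tokens so far in, one logit per vocabulary entry out.
type Forward = (tokens: number[]) => Float32Array;

// Hypothetical options bag for this sketch.
interface SampleOptions {
  maxTokens: number;
  temperature: number;
  eosId: number;        // token id that ends generation
  rand: () => number;   // uniform random source in [0, 1)
}

// Softmax with temperature; subtracting the max keeps exp() from overflowing.
function softmax(logits: Float32Array, temperature: number): number[] {
  let max = -Infinity;
  for (let i = 0; i < logits.length; i++) {
    if (logits[i] > max) max = logits[i];
  }
  const exps: number[] = [];
  let sum = 0;
  for (let i = 0; i < logits.length; i++) {
    const e = Math.exp((logits[i] - max) / temperature);
    exps.push(e);
    sum += e;
  }
  return exps.map((e) => e / sum);
}

// Draw one index from a probability distribution by walking the cumulative sum.
function sampleIndex(probs: number[], rand: () => number): number {
  const r = rand();
  let acc = 0;
  for (let i = 0; i < probs.length; i++) {
    acc += probs[i];
    if (r < acc) return i;
  }
  return probs.length - 1; // guard against floating-point rounding
}

// The loop itself: run the model, sample a token, append it, repeat until EOS or the limit.
function generate(forward: Forward, prompt: number[], opts: SampleOptions): number[] {
  const tokens = prompt.slice();
  for (let step = 0; step < opts.maxTokens; step++) {
    const probs = softmax(forward(tokens), opts.temperature);
    const next = sampleIndex(probs, opts.rand);
    if (next === opts.eosId) break;
    tokens.push(next);
  }
  return tokens;
}

// Stand-in for a real model: strongly prefers (last token + 1) mod 4.
const VOCAB = 4;
function toyForward(tokens: number[]): Float32Array {
  const logits = new Float32Array(VOCAB); // initialised to zeros
  const last = tokens.length > 0 ? tokens[tokens.length - 1] : 0;
  logits[(last + 1) % VOCAB] = 10;
  return logits;
}

console.log(generate(toyForward, [0], { maxTokens: 10, temperature: 1, eosId: 3, rand: Math.random }));
```

The random source is passed in rather than hard-coded so the loop can be made deterministic for testing. In a real page, `toyForward` would be replaced by a call into the WebAssembly module.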

That is the whole argument for Smedjan in one gesture. A language model does not have to be a remote service you rent. It can be a file you own, small enough to fit in a web page, that does its work where you are. The 295K model is a toy on purpose: the point is not that it is good but that the entire loop, from weights to sampled tokens, is yours and runs anywhere.
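To put a rough number on "small enough to fit in a web page", here is a back-of-the-envelope calculation. Only the parameter count comes from this post; the storage precision is not stated here, so the three formats below are assumptions, and the figures cover weights only, not the tokenizer or the WebAssembly runtime.

```typescript
// Parameter count from the post; the precisions below are assumed, not confirmed.
const PARAM_COUNT = 295000;
const BYTES_PER_PARAM: Record<string, number> = { fp32: 4, fp16: 2, int8: 1 };

// Raw weight size in bytes for a given storage precision.
function weightBytes(paramCount: number, bytesPerParam: number): number {
  return paramCount * bytesPerParam;
}

Object.keys(BYTES_PER_PARAM).forEach((name) => {
  const kb = weightBytes(PARAM_COUNT, BYTES_PER_PARAM[name]) / 1000;
  console.log(`${name}: ${kb} kB`);
});
// fp32: 1180 kB
// fp16: 590 kB
// int8: 295 kB
```

Even at full 32-bit precision, the weights come to a little over a megabyte, which is about the size of a large image.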

The full engine scales the same loop up: the same checkpoint format, the same sampler, the same code, just a bigger model on a real GPU. If a 295K model can write its own headline in a browser tab, a larger one can run a private agent on your own hardware. Same idea, more parameters. If you want to see how the pieces fit, the inference docs walk through the sampling path.

— Andrei

Watch the repository for releases, or come help forge it.