Quantize & export
Shrink a trained checkpoint and move it into other ecosystems — GGUF for llama.cpp, safetensors for HuggingFace — plus checkpoint merging and progressive growth.
Quantize
Post-training quantization shrinks a checkpoint with little quality loss. Relative to 32-bit float weights, Q8 is about 4× smaller and Q4 about 8× smaller.

    # Q8 ≈ 4× smaller, Q4 ≈ 8× smaller
    smedjan quantize --checkpoint final.bin --output model.q4.qbin --bits 4
    smedjan quantize --checkpoint final.bin --output model.q8.qbin --bits 8

| Flag | Default | What it does |
|---|---|---|
| --bits | 4 | Quantization bits: 4 or 8. |
| --output | model.qbin | Quantized output path. |
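
To make the "about 8×" figure concrete, here is a minimal sketch of generic symmetric 4-bit quantization with two codes packed per byte. It illustrates the idea only: the function names are hypothetical, and nothing here is claimed to match the actual .qbin layout that smedjan quantize writes.

```python
def quantize_4bit(values):
    """Symmetric absmax quantization to signed 4-bit codes, two per byte.

    Generic illustration only; NOT the Smedjan .qbin format (assumption).
    """
    amax = max((abs(v) for v in values), default=0.0)
    scale = amax / 7.0 if amax else 1.0          # map [-amax, amax] onto [-7, 7]
    codes = [max(-7, min(7, round(v / scale))) for v in values]
    if len(codes) % 2:                           # pad to an even count for packing
        codes.append(0)
    packed = bytes(
        (lo & 0xF) | ((hi & 0xF) << 4)           # low nibble first, then high nibble
        for lo, hi in zip(codes[0::2], codes[1::2])
    )
    return scale, packed


def dequantize_4bit(scale, packed, n):
    """Invert quantize_4bit, returning the first n reconstructed values."""
    out = []
    for byte in packed:
        for nib in (byte & 0xF, byte >> 4):
            signed = nib - 16 if nib >= 8 else nib   # undo two's-complement nibble
            out.append(signed * scale)
    return out[:n]


if __name__ == "__main__":
    weights = [0.7, -0.7, 0.1, 0.0]
    scale, packed = quantize_4bit(weights)
    f32_bytes = 4 * len(weights)
    print(f"{f32_bytes} bytes as f32 vs {len(packed)} bytes packed (plus one scale)")
    print(dequantize_4bit(scale, packed, len(weights)))
```

Four bits per weight against 32 gives the 8× ratio before overhead. Any real format also stores scales, so the effective ratio is slightly lower.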
Export to GGUF
Emit a GGUF file with standard GGML quantized weights for the GGUF tooling ecosystem.

    # GGUF: standard GGML quantized weights for the GGUF ecosystem
    # --quant f32 | q8_0 (≈4× smaller) | q4_0 (≈8× smaller); real GGML blocks
    smedjan export-gguf --checkpoint final.bin --output model.gguf --quant q4_0

Real state: GGUF export covers f32, q8_0, and q4_0 as standard GGML blocks (f16 block scales, 32-byte tensor alignment, 1-D norms kept f32) — validated against the reference GGUF dequantizer. The file is a valid GGML weight container, but a Smedjan checkpoint is not yet a turnkey llama.cpp inference model: the tokenizer isn't embedded and Smedjan's RoPE/QK-norm conventions differ from the llama architecture. Running it directly under llama.cpp is on the roadmap alongside HuggingFace inference parity.
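
The block format mentioned above can be sketched in a few lines. In the standard GGML layout, a q8_0 block holds 32 weights as one f16 scale followed by 32 signed bytes (34 bytes in total), and a q4_0 block holds one f16 scale plus 16 bytes of packed nibbles (18 bytes). The following is an illustrative Python rendering of q8_0, not Smedjan's exporter; the function names are hypothetical, and Python's round() breaks ties to even where the C reference rounds half away from zero.

```python
import struct

QK = 32                      # weights per GGML block
Q8_0_BLOCK_BYTES = 2 + QK    # f16 scale + 32 int8 codes = 34 bytes
Q4_0_BLOCK_BYTES = 2 + QK // 2   # f16 scale + 16 bytes of nibbles = 18 bytes


def quantize_q8_0(values):
    """Encode floats as q8_0 blocks: little-endian f16 scale, then 32 int8 codes."""
    if len(values) % QK:
        raise ValueError("length must be a multiple of 32")
    out = bytearray()
    for i in range(0, len(values), QK):
        block = values[i:i + QK]
        d = max(abs(v) for v in block) / 127.0       # per-block scale
        inv = 1.0 / d if d else 0.0
        codes = [round(v * inv) for v in block]      # each code lies in [-127, 127]
        out += struct.pack("<e32b", d, *codes)       # 'e' = IEEE half precision
    return bytes(out)


def dequantize_q8_0(blob):
    """Decode q8_0 blocks back to floats using the stored f16 scale."""
    out = []
    for off in range(0, len(blob), Q8_0_BLOCK_BYTES):
        d, *codes = struct.unpack_from("<e32b", blob, off)
        out.extend(d * q for q in codes)
    return out


def bits_per_weight(block_bytes, weights_per_block=QK):
    """Effective storage cost, including the per-block scale."""
    return block_bytes * 8 / weights_per_block


def padding_for(offset, alignment=32):
    """Bytes of padding needed to reach the next aligned offset."""
    return (-offset) % alignment


if __name__ == "__main__":
    weights = [i / 16 - 1 for i in range(32)]
    blob = quantize_q8_0(weights)
    err = max(abs(a - b) for a, b in zip(weights, dequantize_q8_0(blob)))
    print(len(blob), "bytes, max abs error", err)
    print("q8_0 bits/weight:", bits_per_weight(Q8_0_BLOCK_BYTES))
    print("q4_0 bits/weight:", bits_per_weight(Q4_0_BLOCK_BYTES))
```

Because each block carries its own scale, the effective cost is 8.5 bits per weight for q8_0 and 4.5 for q4_0, which is why the ratios against f32 are quoted as approximate.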
safetensors & HF import
Zero-dependency safetensors export and import, including an HF-Llama weight remap for continued-training retrofits. import-hf reads a HuggingFace model directory (config.json → model geometry, F32/BF16/F16 weights) and writes a Smedjan checkpoint you can keep training.

    # export a checkpoint to the HuggingFace ecosystem
    smedjan export-safetensors --checkpoint final.bin --output model.safetensors
    # import a HuggingFace Llama model (config.json + F32/BF16/F16 weights) to keep training
    smedjan import-hf --model-dir ./llama-hf --output imported.bin

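
The safetensors container is simple enough to inspect without any dependencies: an 8-byte little-endian header length, a JSON header mapping each tensor name to its dtype, shape, and byte offsets, then the raw tensor bytes. The sketch below writes and reads such a file with the Python standard library to show the layout. It is an illustration of the file format, not Smedjan's implementation, and all function names are hypothetical.

```python
import json
import struct


def write_safetensors_f32(path, tensors):
    """Write {name: (shape, flat_values)} as F32 tensors in safetensors layout."""
    header, payload = {}, bytearray()
    for name, (shape, values) in tensors.items():
        start = len(payload)
        payload += struct.pack(f"<{len(values)}f", *values)
        header[name] = {
            "dtype": "F32",
            "shape": list(shape),
            "data_offsets": [start, len(payload)],   # relative to end of header
        }
    blob = json.dumps(header, separators=(",", ":")).encode("utf-8")
    with open(path, "wb") as f:
        f.write(struct.pack("<Q", len(blob)))        # 8-byte little-endian length
        f.write(blob)
        f.write(payload)


def read_safetensors_header(path):
    """Return the tensor table (name -> dtype/shape/data_offsets)."""
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len))
    header.pop("__metadata__", None)                 # optional free-form metadata
    return header


def bf16_to_f32(bits):
    """Widen one BF16 value (given as a 16-bit int) to a Python float.

    BF16 is the upper half of an IEEE f32, so widening is a 16-bit shift.
    """
    return struct.unpack("<f", struct.pack("<I", (bits & 0xFFFF) << 16))[0]


if __name__ == "__main__":
    write_safetensors_f32("demo.safetensors", {"w": ((2, 2), [1.0, 2.0, 3.0, 4.0])})
    for name, info in read_safetensors_header("demo.safetensors").items():
        print(name, info["dtype"], info["shape"], info["data_offsets"])
```

Reading the header this way is a quick check on which dtypes a model directory actually contains before running import-hf.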
Real state: safetensors import now reads F32, BF16, and F16, and config.json maps straight to a Smedjan model. This is the import/retrofit path: bit-exact HuggingFace inference parity is still on the roadmap. The convention differences (HF half-split vs Smedjan interleaved RoPE, the fixed QK-norm) are not reproduced on import; continued training adapts them away.
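
To show why the two RoPE conventions are not interchangeable, the sketch below applies both to the same vector. The interleaved form rotates adjacent pairs (0,1), (2,3), ...; the half-split form rotates element i with element i + d/2. The base of 10000 is the common default and an assumption here, as are all function names. This only illustrates the convention difference; it is not what import-hf does.

```python
import math


def _angles(d, pos, base):
    """Rotation angle for each of the d/2 pairs at a given position."""
    return [pos * base ** (-2.0 * i / d) for i in range(d // 2)]


def rope_interleaved(x, pos, base=10000.0):
    """Rotate adjacent pairs (x[2i], x[2i+1])."""
    out = list(x)
    for i, theta in enumerate(_angles(len(x), pos, base)):
        c, s = math.cos(theta), math.sin(theta)
        a, b = x[2 * i], x[2 * i + 1]
        out[2 * i], out[2 * i + 1] = a * c - b * s, a * s + b * c
    return out


def rope_half_split(x, pos, base=10000.0):
    """Rotate pairs (x[i], x[i + d/2]), as in the HF Llama rotate_half form."""
    h = len(x) // 2
    out = list(x)
    for i, theta in enumerate(_angles(len(x), pos, base)):
        c, s = math.cos(theta), math.sin(theta)
        a, b = x[i], x[i + h]
        out[i], out[i + h] = a * c - b * s, a * s + b * c
    return out


def interleaved_to_half_split(x):
    """Reorder so even-indexed elements come first, then odd-indexed ones."""
    return x[0::2] + x[1::2]


if __name__ == "__main__":
    x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
    print("interleaved:", rope_interleaved(x, pos=3))
    print("half-split: ", rope_half_split(x, pos=3))
    # The conventions agree only up to a fixed reordering of the head dimension.
    a = rope_half_split(interleaved_to_half_split(x), pos=3)
    b = interleaved_to_half_split(rope_interleaved(x, pos=3))
    print("equal after reordering:", all(math.isclose(p, q) for p, q in zip(a, b)))
```

Weights trained under one convention therefore produce different attention scores under the other unless the head dimension is reordered or the model is trained further.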
Merge & grow
merge averages two or more checkpoints in weight space — a cheap way to squeeze out a little extra benchmark performance. grow expands a small trained model into a bigger architecture so you can keep training rather than starting over.

    # weight-space averaging of several checkpoints (a small benchmark gain)
    smedjan merge --checkpoints a.bin b.bin c.bin --output merged.bin
    # grow a small trained model into a larger architecture, then keep training it
    smedjan grow --checkpoint small.bin --output big_init.bin \
        --dim 1024 --layers 24 --heads 16
    smedjan train --dataset train.bin --tokenizer tokenizer.bin \
        --size custom --dim 1024 --layers 24 --heads 16 \
        --pretrained big_init.bin

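
Weight-space averaging itself is a small operation: every tensor in the result is the element-wise mean of the same tensor across the inputs. The sketch below shows it on checkpoints modelled as dictionaries of flat float lists. That representation, the function name, and the optional per-checkpoint weights are assumptions for illustration; the source does not say how smedjan merge stores tensors or whether it supports non-uniform weights.

```python
def average_checkpoints(checkpoints, weights=None):
    """Element-wise weighted mean of checkpoints with identical tensor names.

    checkpoints: list of {tensor_name: flat list of floats} (hypothetical layout).
    weights: optional per-checkpoint weights; defaults to a uniform mean.
    """
    if not checkpoints:
        raise ValueError("need at least one checkpoint")
    if weights is None:
        weights = [1.0 / len(checkpoints)] * len(checkpoints)
    if len(weights) != len(checkpoints):
        raise ValueError("one weight per checkpoint is required")

    names = set(checkpoints[0])
    for ckpt in checkpoints[1:]:
        if set(ckpt) != names:
            raise ValueError("checkpoints must contain the same tensors")

    merged = {}
    for name in checkpoints[0]:
        tensors = [ckpt[name] for ckpt in checkpoints]
        size = len(tensors[0])
        if any(len(t) != size for t in tensors):
            raise ValueError(f"shape mismatch for tensor {name!r}")
        merged[name] = [
            sum(w * t[i] for w, t in zip(weights, tensors)) for i in range(size)
        ]
    return merged


if __name__ == "__main__":
    a = {"w": [1.0, 2.0], "b": [0.0]}
    b = {"w": [3.0, 4.0], "b": [1.0]}
    print(average_checkpoints([a, b]))
```

Averaging is only meaningful when the checkpoints share an architecture, which is why the sketch rejects mismatched tensor names or sizes.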