// guide

Quantize & export

Shrink a trained checkpoint and move it into other ecosystems — GGUF for llama.cpp, safetensors for HuggingFace — plus checkpoint merging and progressive growth.

// 01

Quantize

Q4 / Q8 post-training

Post-training quantization shrinks a trained checkpoint with little quality loss and no retraining. 8-bit output (Q8) is about 4× smaller than the original checkpoint, 4-bit output (Q4) about 8× smaller.

# Q8 ≈ 4× smaller, Q4 ≈ 8× smaller
smedjan quantize --checkpoint final.bin --output model.q4.qbin --bits 4
smedjan quantize --checkpoint final.bin --output model.q8.qbin --bits 8
Flag      Default     What it does
--bits    4           Quantization bits: 4 or 8.
--output  model.qbin  Quantized output path.

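To make the 4-bit versus 8-bit trade-off concrete, here is a minimal sketch of symmetric absmax quantization, a common post-training scheme. It is an illustration under assumptions, not Smedjan's implementation: the .qbin layout and the exact rounding and scaling that smedjan quantize uses are not described in this guide, and the function names and sample weights are hypothetical.

```python
def quantize(values, bits):
    """Symmetric absmax quantization of floats to signed `bits`-bit integers."""
    qmax = 2 ** (bits - 1) - 1              # 127 for 8 bits, 7 for 4 bits
    amax = max(abs(v) for v in values)
    scale = amax / qmax if amax > 0 else 1.0
    # Round to the nearest integer level and clamp to the representable range.
    q = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return q, scale


def dequantize(q, scale):
    """Map integer levels back to approximate float weights."""
    return [x * scale for x in q]


# Hypothetical sample weights, for illustration only.
weights = [0.12, -0.50, 0.33, 0.07, -0.91, 0.64, -0.02, 0.25]

for bits in (8, 4):
    q, scale = quantize(weights, bits)
    restored = dequantize(q, scale)
    max_err = max(abs(a - b) for a, b in zip(weights, restored))
    print(f"{bits}-bit: scale={scale:.4f} max_err={max_err:.4f}")
```

With round-to-nearest, the worst-case error per weight is half the scale, so the coarser 4-bit grid trades a larger error for the smaller file.
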
// 02

Export to GGUF

GGML quantized weights

Emit a GGUF file with standard GGML quantized weights for the GGUF tooling ecosystem.

# GGUF: standard GGML quantized weights for the GGUF ecosystem
# --quant f32 | q8_0 (≈4× smaller) | q4_0 (≈8× smaller) — real GGML blocks
smedjan export-gguf --checkpoint final.bin --output model.gguf --quant q4_0

Real state: GGUF export writes f32, q8_0, and q4_0 as standard GGML blocks (f16 block scales, 32-byte tensor alignment, 1-D norms kept in f32), validated against the reference GGUF dequantizer. The file is a valid GGML weight container, but a Smedjan checkpoint is not yet a turnkey llama.cpp inference model: the tokenizer is not embedded, and Smedjan's RoPE and QK-norm conventions differ from the llama architecture. Running it directly under llama.cpp is on the roadmap, alongside HuggingFace inference parity.
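As a rough illustration of what a "standard GGML block" looks like, the sketch below packs one q8_0 block: 32 weights stored as an f16 scale followed by 32 signed bytes, 34 bytes in total. The block sizes (34 bytes for q8_0, 18 bytes for q4_0, both covering 32 weights) follow the GGML layout. The helper names and sample data are hypothetical, and this is not Smedjan's exporter.

```python
import math
import struct

QK = 32  # weights per GGML block

# Bytes used to store one block of 32 weights in each format.
BLOCK_BYTES = {"f32": 4 * QK, "q8_0": 2 + QK, "q4_0": 2 + QK // 2}


def round_half_away(x):
    """Round to nearest, ties away from zero (the behaviour of C's roundf)."""
    return int(math.copysign(math.floor(abs(x) + 0.5), x))


def pack_q8_0(block):
    """Pack 32 floats as one q8_0 block: f16 scale d, then 32 int8 values."""
    assert len(block) == QK
    amax = max(abs(v) for v in block)
    d = amax / 127.0
    inv = 1.0 / d if d else 0.0
    qs = [round_half_away(v * inv) for v in block]
    return struct.pack("<e32b", d, *qs)   # little-endian: half float + 32 int8


def unpack_q8_0(raw):
    """Dequantize one q8_0 block back to 32 approximate floats."""
    d, *qs = struct.unpack("<e32b", raw)
    return [d * q for q in qs]


# Hypothetical block of weights, for illustration only.
block = [(i - 16) / 20.0 for i in range(QK)]
raw = pack_q8_0(block)
restored = unpack_q8_0(raw)

for name, nbytes in BLOCK_BYTES.items():
    bits_per_weight = nbytes * 8 / QK
    print(f"{name}: {bits_per_weight} bits/weight, {32 / bits_per_weight:.2f}x smaller than f32")
```

Because each block carries its own 2-byte scale, q8_0 costs 8.5 bits per weight and q4_0 costs 4.5, which is why the "≈4×" and "≈8×" figures are approximations rather than exact ratios.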

// 03

safetensors & HF import

HuggingFace interop

Zero-dependency safetensors export and import, including an HF-Llama weight remap for continued-training retrofits. import-hf reads a HuggingFace model directory (config.json → model geometry, F32/BF16/F16 weights) and writes a Smedjan checkpoint you can keep training.

# export a checkpoint to the HuggingFace ecosystem
smedjan export-safetensors --checkpoint final.bin --output model.safetensors

# import a HuggingFace Llama model (config.json + F32/BF16/F16 weights) to keep training
smedjan import-hf --model-dir ./llama-hf --output imported.bin
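
The safetensors container is simple enough to read and write without third-party libraries: an 8-byte little-endian header length, a JSON header giving each tensor's dtype, shape, and byte offsets, then the raw tensor bytes. The sketch below round-trips F32 tensors using only the Python standard library. It illustrates the file format, not Smedjan's exporter: the tensor names are hypothetical, and BF16/F16 handling is left out.

```python
import json
import os
import struct
import tempfile


def write_safetensors(path, tensors):
    """Write {name: (shape, flat_values)} as F32 tensors in safetensors layout."""
    header, chunks, offset = {}, [], 0
    for name, (shape, values) in tensors.items():
        raw = struct.pack(f"<{len(values)}f", *values)
        header[name] = {
            "dtype": "F32",
            "shape": list(shape),
            # Offsets are relative to the start of the data section.
            "data_offsets": [offset, offset + len(raw)],
        }
        chunks.append(raw)
        offset += len(raw)
    head = json.dumps(header, separators=(",", ":")).encode("utf-8")
    with open(path, "wb") as f:
        f.write(struct.pack("<Q", len(head)))   # u64 header size
        f.write(head)
        for raw in chunks:
            f.write(raw)


def read_safetensors(path):
    """Read F32 tensors back as {name: (shape, flat_values)}."""
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len))
        data = f.read()
    out = {}
    for name, info in header.items():
        if name == "__metadata__":              # optional free-form metadata
            continue
        if info["dtype"] != "F32":
            raise ValueError(f"sketch only handles F32, got {info['dtype']}")
        begin, end = info["data_offsets"]
        count = (end - begin) // 4
        out[name] = (info["shape"], list(struct.unpack(f"<{count}f", data[begin:end])))
    return out


# Hypothetical tensor names and values, for illustration only.
example = {
    "embed.weight": ([2, 2], [0.5, -1.25, 2.0, 3.0]),
    "norm.weight": ([2], [1.0, 1.0]),
}
path = os.path.join(tempfile.mkdtemp(), "model.safetensors")
write_safetensors(path, example)
loaded = read_safetensors(path)
print(loaded["embed.weight"])
```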

Real state: safetensors import reads F32, BF16, and F16, and config.json maps straight to a Smedjan model. This is the import/retrofit path. Bit-exact HuggingFace inference parity is still on the roadmap: the convention differences (HF half-split vs Smedjan interleaved RoPE, the fixed QK-norm) are not reproduced on import, and continued training adapts them away instead.
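
The RoPE difference mentioned above is a matter of which element pairs get rotated together. The sketch below implements both pairings for a single head vector and checks that they agree once the elements are reordered. It is a generic illustration: the frequency base of 10000 and all function names are assumptions, and it does not describe anything import-hf does, since the guide says the difference is currently adapted away by training rather than remapped.

```python
import math


def rope_angles(dim, pos, base=10000.0):
    """Rotation angle for each of the dim/2 pairs (base is an assumed default)."""
    return [pos * base ** (-2.0 * i / dim) for i in range(dim // 2)]


def rope_interleaved(x, pos):
    """Rotate adjacent pairs: (x[0], x[1]), (x[2], x[3]), ..."""
    out = list(x)
    for i, a in enumerate(rope_angles(len(x), pos)):
        c, s = math.cos(a), math.sin(a)
        out[2 * i] = x[2 * i] * c - x[2 * i + 1] * s
        out[2 * i + 1] = x[2 * i] * s + x[2 * i + 1] * c
    return out


def rope_half_split(x, pos):
    """Rotate pairs split across halves: (x[i], x[i + dim/2])."""
    half = len(x) // 2
    out = list(x)
    for i, a in enumerate(rope_angles(len(x), pos)):
        c, s = math.cos(a), math.sin(a)
        out[i] = x[i] * c - x[i + half] * s
        out[i + half] = x[i] * s + x[i + half] * c
    return out


def to_half_split(x):
    """Reorder an interleaved-layout vector into half-split layout."""
    return x[0::2] + x[1::2]


# Hypothetical head vector, for illustration only.
x = [0.1, -0.2, 0.3, 0.4, -0.5, 0.6, 0.7, -0.8]
pos = 5

a = rope_half_split(to_half_split(x), pos)
b = to_half_split(rope_interleaved(x, pos))
print(max(abs(p - q) for p, q in zip(a, b)))
```

The two conventions give the same result up to this fixed reordering, so weights trained under one produce wrong attention scores under the other unless the query and key projections are permuted to match or the model is trained further.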

// 04

Merge & grow

reuse trained weights

merge averages two or more checkpoints in weight space — a cheap way to squeeze out a little extra benchmark performance. grow expands a small trained model into a bigger architecture so you can keep training rather than starting over.

# weight-space averaging of several checkpoints (a small benchmark gain)
smedjan merge --checkpoints a.bin b.bin c.bin --output merged.bin

# grow a small trained model into a larger architecture, then keep training it
smedjan grow --checkpoint small.bin --output big_init.bin \
  --dim 1024 --layers 24 --heads 16
smedjan train --dataset train.bin --tokenizer tokenizer.bin \
  --size custom --dim 1024 --layers 24 --heads 16 \
  --pretrained big_init.bin
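
Weight-space averaging, as described for merge above, can be sketched in a few lines: every tensor in the output is the element-wise mean of the same tensor across the input checkpoints. This is a minimal sketch that assumes a uniform average and checkpoints already loaded as dictionaries of flat float lists. Whether smedjan merge weights its inputs uniformly is not stated in this guide, and the tensor names are hypothetical.

```python
def average_checkpoints(checkpoints):
    """Uniform element-wise mean of checkpoints shaped {tensor_name: [floats]}."""
    if len(checkpoints) < 2:
        raise ValueError("need at least two checkpoints to merge")
    names = checkpoints[0].keys()
    for ckpt in checkpoints[1:]:
        if ckpt.keys() != names:
            raise ValueError("checkpoints must contain the same tensors")
    merged = {}
    for name in names:
        tensors = [ckpt[name] for ckpt in checkpoints]
        if len({len(t) for t in tensors}) != 1:
            raise ValueError(f"shape mismatch for tensor {name!r}")
        # zip(*tensors) walks the same element position across all checkpoints.
        merged[name] = [sum(vals) / len(vals) for vals in zip(*tensors)]
    return merged


# Hypothetical checkpoints with one tiny tensor each, for illustration only.
a = {"layer0.weight": [1.0, 2.0]}
b = {"layer0.weight": [3.0, 4.0]}
c = {"layer0.weight": [5.0, 6.0]}

merged = average_checkpoints([a, b, c])
print(merged)  # {'layer0.weight': [3.0, 4.0]}
```

Averaging only makes sense when the checkpoints share one architecture, which is why the sketch rejects mismatched tensor names or sizes.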