Backend comparison

Backend          Startup   PyTorch required   Custom HF models
onnx (default)   ~100ms    No                 ONNX-exported models only
torch            ~2–3s     Yes                Any HuggingFace model

ONNX backend (default)

Uses fastembed with ONNX Runtime. Recommended for most users.
  • Fast startup (~100ms model load on first embed() call)
  • No PyTorch installation required
  • Works with ONNX-exported HuggingFace models
# Default — no configuration needed
vecgrep

PyTorch backend

Uses sentence-transformers with PyTorch. Use this when you need a model that isn’t available in ONNX format.
  • Slower startup (~2–3s)
  • Supports any HuggingFace sentence-transformer model
  • Automatically uses Metal (Apple Silicon), CUDA (NVIDIA), or CPU
VECGREP_BACKEND=torch vecgrep
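
The backend switch can be sketched as a small dispatcher that reads the environment variable. This is an illustrative sketch, not vecgrep's actual source — the `select_backend` function name and the validation logic are assumptions; only the `VECGREP_BACKEND` variable and its `onnx`/`torch` values come from the docs above.

```python
import os

VALID_BACKENDS = ("onnx", "torch")

def select_backend() -> str:
    """Pick the embedding backend from the environment.

    Hypothetical dispatch logic -- vecgrep's real code may differ.
    """
    # onnx is the documented default when the variable is unset
    backend = os.environ.get("VECGREP_BACKEND", "onnx")
    if backend not in VALID_BACKENDS:
        raise ValueError(
            f"VECGREP_BACKEND must be one of {VALID_BACKENDS}, got {backend!r}"
        )
    return backend

# Unset -> default ONNX backend
os.environ.pop("VECGREP_BACKEND", None)
print(select_backend())  # onnx

# VECGREP_BACKEND=torch -> PyTorch backend
os.environ["VECGREP_BACKEND"] = "torch"
print(select_backend())  # torch
```

Validating the value up front (rather than silently falling back) makes a typo like `VECGREP_BACKEND=pytorch` fail loudly instead of quietly loading the wrong backend.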