VL Embeddings - Qwen3-VL-Embedding-8B

Multimodal embeddings using Qwen3-VL-Embedding-8B. Supports text-only, image-only, or combined text+image inputs.

  • Text + Image: one unified embedding capturing both modalities
  • Image only: visual embedding for image search/similarity
  • Text only: high-quality text embedding with visual grounding

Best for: documents with figures/tables, mixed content, visual RAG pipelines. For pure text at scale, use the Text-Embeddings-8B space instead.

Embedding dimension (Matryoshka): selectable. Smaller dimensions trade some accuracy for lower storage and faster search.
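
Matryoshka-style embeddings are trained so that a leading prefix of the vector remains usable on its own. The sketch below shows one common way to reduce the dimension on the client side: keep the first `dim` components, then L2-renormalize so cosine similarity still behaves as expected. It assumes the full embedding is already available as a list of floats. `truncate_embedding` and `cosine` are hypothetical helpers written for illustration, not part of this space's API, and the sample vector is made up.

```python
import math


def truncate_embedding(vec, dim):
    """Keep the first `dim` components of `vec` and L2-renormalize.

    Hypothetical helper: assumes `vec` is a full-size Matryoshka embedding.
    """
    if not 1 <= dim <= len(vec):
        raise ValueError("dim must be between 1 and len(vec)")
    head = list(vec[:dim])
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        # A zero prefix cannot be normalized; return it unchanged.
        return head
    return [x / norm for x in head]


def cosine(a, b):
    """Cosine similarity of two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)


if __name__ == "__main__":
    full = [3.0, 4.0, 12.0, 0.5]  # made-up stand-in for a real embedding
    small = truncate_embedding(full, 2)
    print(len(small), small)
```

Query and document vectors should be truncated to the same dimension before comparison, since vectors of different lengths cannot be compared directly.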

Routing guide:

  • Documents with charts/images → use this space
  • Pure text articles → use Text-Embeddings-8B
  • Content type unsure → run a classifier first
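
The routing rules above can be sketched as a small dispatch function. This is only an illustration under stated assumptions: the `Route` enum, the `choose_route` function, and the `has_visuals` flag are hypothetical names, and the classifier is passed in as a plain callable because the guide does not say which classifier to use.

```python
from enum import Enum
from typing import Callable, Optional


class Route(Enum):
    # Labels are illustrative; only "Text-Embeddings-8B" is named in the guide.
    VL_EMBEDDINGS = "this space (VL embeddings)"
    TEXT_EMBEDDINGS = "Text-Embeddings-8B"


def choose_route(
    has_visuals: Optional[bool],
    classifier: Optional[Callable[[], bool]] = None,
) -> Route:
    """Pick an embedding space following the routing guide.

    has_visuals: True for documents with charts/images, False for pure
        text, None when the content type is unknown.
    classifier: hypothetical callable returning True if the content has
        visuals; it is consulted only when has_visuals is None.
    """
    if has_visuals is None:
        if classifier is None:
            raise ValueError("content type unknown: supply a classifier")
        has_visuals = classifier()
    return Route.VL_EMBEDDINGS if has_visuals else Route.TEXT_EMBEDDINGS


if __name__ == "__main__":
    print(choose_route(True).value)
    print(choose_route(False).value)
    print(choose_route(None, classifier=lambda: True).value)
```

Raising an error when the type is unknown and no classifier is given keeps unclassified content from being silently sent to the wrong space.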