VL Embeddings - Qwen3-VL-Embedding-8B

Text + Image: one unified embedding capturing both modalities
Image only: visual embedding for image search/similarity
Text only: high-quality text embedding with visual grounding

Generate multimodal embeddings with Qwen3-VL-Embedding-8B. Inputs can be text only, image only, or text and image combined.

Best for: documents with figures or tables, mixed content, and visual RAG pipelines. For pure-text workloads at scale, use the Text-Embeddings-8B Space instead.
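In a retrieval or visual RAG setting, embeddings like these are usually compared by cosine similarity. The following is a minimal sketch of that ranking step in plain Python, assuming the embeddings are already available as lists of floats. The function names `cosine_similarity` and `rank_by_similarity` are hypothetical and not part of this Space.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def rank_by_similarity(query, corpus):
    """Return corpus indices ordered from most to least similar to the query."""
    scores = [(i, cosine_similarity(query, vec)) for i, vec in enumerate(corpus)]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return [i for i, _ in scores]

# Toy 2-dimensional vectors standing in for real embeddings.
corpus = [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
print(rank_by_similarity([1.0, 0.0], corpus))  # [1, 2, 0]
```

Because text, image, and combined inputs are described as sharing one embedding model, the same comparison would apply whether the query is a caption and the corpus is images, or the reverse.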

Text input (one per line for batch, or leave empty for image-only)
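As a rough illustration of the one-per-line convention, the sketch below splits a text box value into a batch of inputs. It assumes that blank lines are skipped and surrounding whitespace is trimmed. The Space's actual handling is not documented here, and `parse_batch_input` is a hypothetical helper.

```python
def parse_batch_input(raw_text: str) -> list[str]:
    """Split a multi-line text box value into one input per non-empty line.

    Assumption: blank lines are ignored and whitespace is trimmed.
    """
    return [line.strip() for line in raw_text.splitlines() if line.strip()]

texts = parse_batch_input("a red bicycle\n\n  quarterly revenue chart  \n")
print(texts)  # ['a red bicycle', 'quarterly revenue chart']
```

An empty result would correspond to the image-only case, where the text field is left blank.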

Image input (optional — enables multimodal embedding)

Embedding dimension (Matryoshka)
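With Matryoshka-style embeddings, a shorter vector is normally obtained by keeping the leading components of the full embedding and re-normalizing to unit length. The sketch below shows that general technique in plain Python. Whether this Space truncates server-side in exactly this way is an assumption, and `truncate_embedding` is a hypothetical helper.

```python
import math

def truncate_embedding(vec, dim):
    """Keep the first `dim` components of `vec` and L2-normalize the result."""
    if not 0 < dim <= len(vec):
        raise ValueError(f"dim must be in 1..{len(vec)}, got {dim}")
    head = vec[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        # An all-zero prefix cannot be normalized; return it unchanged.
        return list(head)
    return [x / norm for x in head]

# Toy 3-dimensional vector standing in for a full-size embedding.
print(truncate_embedding([3.0, 4.0, 12.0], 2))  # [0.6, 0.8]
```

Smaller dimensions reduce storage and speed up similarity search, usually at some cost in retrieval quality, so the dimension is best chosen by measuring on the target data.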

Routing guide:

Embeddings JSON Preview (truncated)

Stats

Download full embeddings JSON