Model Catalog
All open-source. All running on our GPU clusters. One API key for everything.
Image (4)
FLUX.1 Schnell
Ultra-fast image generation optimized for speed. High-quality images in under a second.
FLUX.1 Dev
Development-grade image model with higher fidelity and more creative control.
FLUX.1 Pro
Professional image generation with maximum quality and prompt adherence.
Stable Diffusion XL
Industry-standard text-to-image model with extensive ecosystem and LoRA support.
Vision (3)
Qwen2.5-VL-7B Instruct
Multimodal vision-language model that accepts text and images for understanding visual content.
Llama 3.2 11B Vision Instruct
Multimodal model supporting interleaved text and image inputs for visual reasoning.
Qwen2.5-VL-72B Instruct
Large-scale vision-language model with state-of-the-art visual understanding.
STT (2)
Whisper Large v3
Industry-standard speech-to-text supporting 90+ languages via multipart form upload.
Whisper Large v3 Turbo
Faster variant of Whisper with near-identical accuracy at lower latency.
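The Whisper entries mention multipart form upload. As a rough sketch of what that involves on the client side, the helper below encodes text fields plus one audio file as a multipart/form-data body using only the Python standard library. The field names `model` and `file` and the model ID `whisper-large-v3` are assumptions for illustration; this catalog does not document the actual request schema or endpoint.

```python
import uuid


def build_multipart(fields, file_field, filename, file_bytes, content_type="audio/wav"):
    """Encode text fields plus one file as a multipart/form-data body.

    Returns (body_bytes, headers). Field names are chosen by the caller;
    the ones used in the example below are hypothetical.
    """
    boundary = uuid.uuid4().hex
    parts = []

    # One part per plain text field.
    for name, value in fields.items():
        text_part = (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f"{value}\r\n"
        )
        parts.append(text_part.encode("utf-8"))

    # The file part carries a filename and its own Content-Type.
    file_header = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{file_field}"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    )
    parts.append(file_header.encode("utf-8") + file_bytes + b"\r\n")

    # Closing boundary marks the end of the form.
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))

    headers = {"Content-Type": f"multipart/form-data; boundary={boundary}"}
    return b"".join(parts), headers


# Hypothetical field names and model ID, for illustration only.
body, headers = build_multipart(
    {"model": "whisper-large-v3"}, "file", "clip.wav", b"RIFFdata"
)
```

The resulting `body` and `headers` can be passed to any HTTP client (for example `urllib.request.Request(url, data=body, headers=headers)`) together with the API key header the service expects.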
TTS (1)
Kokoro-82M
Lightweight text-to-speech with multiple voices. Outputs mp3, wav, or pcm audio.
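Since Kokoro-82M is listed with three output formats, a client may want to validate the requested format before sending anything. The sketch below builds a request payload and rejects unsupported formats. The payload keys (`model`, `input`, `voice`, `response_format`), the model ID `kokoro-82m`, and the voice name are all assumptions; only the three format names come from the catalog entry.

```python
# Output formats named in the catalog entry.
SUPPORTED_FORMATS = ("mp3", "wav", "pcm")


def build_speech_request(text, voice, response_format="mp3"):
    """Return a JSON-serializable payload for a text-to-speech request.

    The key names and model ID are hypothetical; adjust them to the
    service's documented schema.
    """
    if response_format not in SUPPORTED_FORMATS:
        raise ValueError(
            f"unsupported format {response_format!r}; "
            f"expected one of {SUPPORTED_FORMATS}"
        )
    return {
        "model": "kokoro-82m",  # hypothetical model ID
        "input": text,
        "voice": voice,
        "response_format": response_format,
    }


# "example_voice" is a placeholder, not a documented voice name.
payload = build_speech_request("Hello, world.", voice="example_voice", response_format="wav")
```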
Video (2)
CogVideoX-5B
Text-to-video generation. Submit a prompt and poll for the completed MP4 video.
Wan 2.1 T2V 14B
High-quality text-to-video model with consistent motion and scene understanding.
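The CogVideoX-5B entry describes a submit-then-poll workflow. A minimal polling loop might look like the sketch below. It takes a caller-supplied `fetch_status` function so it makes no assumption about the HTTP endpoint. The status values (`completed`, `failed`) and the `video_url` and `error` keys are hypothetical, since the catalog does not specify the job response format.

```python
import time


def poll_until_done(fetch_status, interval=2.0, timeout=300.0, sleep=time.sleep):
    """Call fetch_status() until the job finishes, then return the video URL.

    fetch_status: zero-argument callable returning a dict for the job.
    The "status", "video_url", and "error" keys are assumed names.
    """
    waited = 0.0
    while True:
        job = fetch_status()
        status = job.get("status")
        if status == "completed":
            return job["video_url"]
        if status == "failed":
            raise RuntimeError(job.get("error", "video generation failed"))
        if waited >= timeout:
            raise TimeoutError(f"job not finished after {timeout} seconds")
        # Any other status (queued, processing, ...) means keep waiting.
        sleep(interval)
        waited += interval


# Simulated job states standing in for real API responses.
_responses = iter([
    {"status": "queued"},
    {"status": "processing"},
    {"status": "completed", "video_url": "https://example.invalid/out.mp4"},
])
url = poll_until_done(lambda: next(_responses), interval=0.5, sleep=lambda s: None)
```

Passing `sleep` as a parameter keeps the loop testable without real delays; in production the default `time.sleep` applies, and a fixed interval can be swapped for backoff if the service rate-limits status checks.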
Chat (9)
Llama 3.1 8B Instruct
High-quality open-source LLM for chat, instruction following, and general-purpose text generation.
Llama 3.1 70B Instruct
Large-scale LLM with superior reasoning, coding, and multilingual capabilities.
Llama 3.3 70B Instruct
Latest Llama generation with improved instruction following and safety.
Mistral 7B Instruct v0.3
Efficient instruction-tuned model excelling at text generation and summarization.
Mixtral 8x7B Instruct
Sparse mixture-of-experts model delivering excellent quality with efficient inference.
Qwen2.5 7B Instruct
Multilingual model with strong performance in Chinese and English tasks.
Qwen2.5 72B Instruct
Large-scale multilingual model rivaling frontier closed-source models.
DeepSeek V3
Efficient MoE model with excellent coding and math capabilities at lower cost.
Gemma 2 9B Instruct
Compact yet powerful model from Google's Gemma family, optimized for helpful responses.