B200 PRE-ORDER AVAILABLE. H200 AVAILABLE BARE METAL.
Inference
Quickly deploy a public or custom model to a dedicated inference endpoint.
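Once a model is deployed to a dedicated endpoint, it is typically called over an OpenAI-compatible chat-completions API. The sketch below only builds the request body; the endpoint URL, API key, and exact API shape are assumptions for illustration, not details confirmed by this page.

```python
import json

# Hypothetical placeholders -- substitute your actual endpoint and key.
ENDPOINT = "https://example.com/v1/chat/completions"
API_KEY = "YOUR_API_KEY"

def build_chat_request(model: str, user_message: str, max_tokens: int = 256) -> dict:
    """Build an OpenAI-style chat-completions request body."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
        "max_tokens": max_tokens,
    }

payload = build_chat_request("Llama-3.1-8B-Instruct", "Hello!")
print(json.dumps(payload, indent=2))
# To send it, POST this JSON to ENDPOINT with an
# "Authorization: Bearer <API_KEY>" header (e.g. via the requests library).
```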
Public models (filter by modality: Text, ASR, Multimodal, Image)
| Model | Parameters | Size | Context Window |
| --- | --- | --- | --- |
| Gemma3-1B | 1 B | 1 GB | 128 K |
| Mistral-7B-Instruct-v0.3 | 7 B | 4.1 GB | 32 K |
| Mistral-Nemo-Instruct-2407 | 12.2 B | 7.1 GB | 128 K |
| Mistral-Small-24B-Instruct-2501 | 24 B | 14 GB | 128 K |
| Mistral-Small-Instruct-2409 | 7 B | 4.7 GB | 128 K |
| DeepSeek-R1-Distill-Llama-70B | 70 B | 4.7 GB | 128 K |
| DeepSeek-R1-Distill-Qwen-14B | 14 B | 9 GB | 130 K |
| DeepSeek-R1-Distill-Qwen-32B | 32 B | 20 GB | 128 K |
| Llama-3.1-8B-Instruct | 8 B | 4.7 GB | 128 K |
| Llama-3.2-1B-Instruct | 1 B | 1.3 GB | 128 K |
| Llama-3.2-3B-Instruct | 3.2 B | 2 GB | 128 K |
| Llama-3.3-70B-Instruct | 70 B | 43 GB | 128 K |
| Qwen2.5-14B-Instruct-GPTQ-Int8 | 14 B | 4.7 GB | 128 K |
| Qwen2.5-7B-Instruct | 7 B | 4.7 GB | 128 K |
| Qwen2.5-Coder-32B-Instruct | 32 B | 20 GB | 128 K |
| QwQ-32B | 32.5 B | 65.5 GB | 32 K |
| QwQ-32B-Preview | 32.5 B | 65.5 GB | 32 K |
| Phi-3.5-MoE-instruct | 32 B | 8.4 GB | 128 K |
| phi-4 | 14.7 B | 8.4 GB | 16 K |
| Marco-o1 | 7.6 B | 4.7 GB | 128 K |
| Llama-3.1-Nemotron-70B-Instruct | 70 B | 43 GB | 128 K |
| aya-expanse-32b | 32 B | 20 GB | 130 K |
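For most entries, the Size column is well below what full-precision weights would occupy, which suggests quantized checkpoints. As a rough rule of thumb (a generic estimate, not a formula from this page), weight size is parameter count times bits per weight:

```python
def approx_weight_size_gb(params_billion: float, bits_per_weight: float) -> float:
    """Rough on-disk size of a model's weights in GB.

    1 B parameters at 16 bits/weight is ~2 GB; at 4 bits/weight, ~0.5 GB.
    This ignores metadata, tokenizer files, and activation memory.
    """
    return params_billion * bits_per_weight / 8

# Full bf16 weights for a 70 B model would be roughly 140 GB, so the
# 43 GB listed for Llama-3.3-70B-Instruct implies roughly
# 43 * 8 / 70 ~= 4.9 bits per weight, i.e. a ~4-5 bit quantization.
print(approx_weight_size_gb(70, 16))
print(round(43 * 8 / 70, 1))
```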