To run the model, the system deploys a Kubernetes pod and allocates the specified amount of memory (in MiB) and the selected CPU- or GPU-optimized compute resources.
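For readers familiar with Kubernetes, this allocation conceptually corresponds to the resource requests and limits in a pod spec. Below is a minimal sketch, assuming a plain Deployment manifest; the name `model-server`, the image, and the exact values are hypothetical, and the platform generates the real manifest for you:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: model-server            # hypothetical name
spec:
  replicas: 1
  selector:
    matchLabels:
      app: model-server
  template:
    metadata:
      labels:
        app: model-server
    spec:
      containers:
        - name: model
          image: registry.example.com/model:latest   # hypothetical image
          resources:
            requests:
              memory: "4096Mi"    # memory is specified in MiB
              cpu: "2"
            limits:
              memory: "4096Mi"
              cpu: "2"
              # for GPU-optimized configurations, a GPU is reserved instead:
              # nvidia.com/gpu: 1
```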
If a single pod cannot handle all incoming user requests, the system deploys additional pods; when the load decreases, the extra pods are deleted.
When the monitored metric values exceed the configured trigger thresholds, the number of pods increases up to the defined limit; when the values drop back below the thresholds, it returns to the initial level, ensuring stable operation and high performance.
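This scale-out/scale-in behavior matches what a HorizontalPodAutoscaler does in standard Kubernetes. A minimal sketch, assuming a CPU-utilization trigger; the threshold, replica bounds, and target name are hypothetical:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: model-server-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: model-server          # the Deployment sketched above
  minReplicas: 1                # the initial level the pod count returns to
  maxReplicas: 5                # the upper limit for scale-out
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 80   # trigger threshold: scale out above 80% CPU
```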
Environment variables created in your container are available only inside that container.
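In Kubernetes terms, such variables live in the container's `env` section and are not visible to other containers or pods. A minimal sketch; the variable name and value are hypothetical:

```yaml
containers:
  - name: model
    image: registry.example.com/model:latest
    env:
      - name: MODEL_PRECISION   # hypothetical variable, visible only in this container
        value: "fp16"
```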
This setting defines how long autoscaling waits before deleting a pod that receives no requests. The countdown starts from the last request the pod handled.
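The platform exposes this as a single setting; request-driven autoscalers such as KEDA implement the same idea with a cooldown period that starts counting after the last active trigger. A minimal sketch, assuming KEDA with a Prometheus request-rate trigger; the query, threshold, and names are hypothetical, and the platform may use a different mechanism internally:

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: model-server-scaler
spec:
  scaleTargetRef:
    name: model-server
  minReplicaCount: 0
  maxReplicaCount: 5
  cooldownPeriod: 300           # seconds after the last active trigger (last request) before scaling to zero
  triggers:
    - type: prometheus
      metadata:
        serverAddress: http://prometheus.monitoring:9090   # hypothetical address
        query: sum(rate(http_requests_total{app="model-server"}[1m]))
        threshold: "10"          # requests/s that counts as "under load"
```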