| GPU | Price ($/h) | VRAM | Min. price ($/h) |
|---|---|---|---|
| B300 | 5.67 | 288 GB | 0.23 |
| B200 | 4.61 | 192 GB | 0.34 |
| H200 | 2.69 | 141 GB | 0.59 |
| GH200 | 1.64 | 96 GB | 0.00 |
| RTX Pro 6000 | 1.49 | 96 GB | 0.27 |
| A100 80GB | 1.32 | 80 GB | 0.04 |
| H100 | 2.09 | 80 GB | 0.26 |
| A6000 | 0.54 | 48 GB | 0.01 |
| A40 | 0.50 | 48 GB | 0.00 |
| L40S | 0.81 | 48 GB | 0.00 |
| RTX 6000 Ada | 0.85 | 48 GB | 0.32 |
| L40 | 1.09 | 48 GB | 0.00 |
| A100 | 1.38 | 40 GB | 0.00 |
| RTX 5090 | 0.71 | 32 GB | 0.06 |
| V100 32GB | 2.57 | 32 GB | 0.00 |
| A5000 | 0.50 | 24 GB | 0.00 |
| RTX 6000 | 0.55 | 24 GB | 0.00 |
| A10 | 0.82 | 24 GB | 0.00 |
| L4 | 0.92 | 24 GB | 0.00 |
| RTX 4090 | 0.44 | 24 GB | 0.10 |
| A4000 | 0.17 | 16 GB | 0.00 |
| V100 | 0.43 | 16 GB | 0.06 |
| A16 | 0.56 | 16 GB | 0.00 |
| CPU | 0.39 | 0 GB | 0.00 |
Inference
Quickly deploy a public or custom model to a dedicated inference endpoint (a usage sketch follows the table below).
| Model | Parameters | Size | Context Window |
|---|---|---|---|
| gte-Qwen2-1.5B-instruct | 1.5 B | 2 GB | 128 K |
| Qwen2.5-7B-Instruct | 7 B | 4.7 GB | 128 K |
| Qwen2.5-Coder-32B-Instruct | 32 B | 20 GB | 128 K |
| Qwen2.5-Instruct-GPTQ-Int8 | 14 B | 4.7 GB | 128 K |
| Qwen3-14B | 14.2 B | 9 GB | 32 K |
| Qwen3-30B-A3B | 30 B | 17 GB | 128 K |
| Qwen3-32B | 32.8 B | 33 GB | 128 K |
| QwQ-32B | 32.5 B | 65.5 GB | 32 K |
| QwQ-32B-Preview | 32.5 B | 65.5 GB | 32 K |
| Mistral-7B-Instruct-v0.3 | 7 B | 4.1 GB | 32 K |
| Mistral-Nemo-Instruct-2407 | 12.2 B | 7.1 GB | 128 K |
| Mistral-Small-Instruct-2409 | 7 B | 4.7 GB | 128 K |
| Mistral-Small-Instruct-2501 | 24 B | 14 GB | 128 K |
| DeepSeek-R1-Distill-Llama | 70 B | 4.7 GB | 128 K |
| DeepSeek-R1-Distill-Qwen | 32 B | 20 GB | 128 K |
| DeepSeek-R1-Distill-Qwen | 14 B | 9 GB | 130 K |