Discover our new AI Inference feature!
Model image
SDXL-Lightning Gradio: Image Generation
Llama-3.2-1B-Instruct: Text Generation
Llama-3.2-3B-Instruct: Text Generation
Llama-3.1-8B-Instruct: Text Generation
Mistral-7B-Instruct-v0.3: Text Generation
Mistral-Nemo-Instruct-2407: Text Generation
Pixtral-12B-2409: Multimodal
whisper-large-v3-turbo: ASR
whisper-large-v3: ASR
Qwen2-VL-7B-Instruct: Multimodal
Qwen2.5-7B-Instruct: Text Generation
ByteDance/SDXL-Lightning: Image Generation
stable-cascade: Image Generation
FLUX.1-schnell: Image Generation
FLUX.1-dev: Image Generation
stable-diffusion-xl: Image Generation
stable-diffusion-3.5-large-turbo: Image Generation
stable-diffusion-3.5-large: Image Generation
QwQ-32B-Preview: Text Generation
Qwen2.5-Coder-32B-Instruct: Text Generation
Qwen2.5-14B-Instruct-GPTQ-Int8: Text Generation
Marco-o1: Text Generation
Mistral-Small-Instruct-2409: Text Generation
Llama-3.3-70B-Instruct: Text Generation
Set startup command
Port
Pod configurations

To run the model, the system will deploy a Kubernetes pod and allocate the specified amount of memory (MiB) and CPU or GPU-optimized resources.

Routing placement
Please select a flavor
Autoscaling limits

If a single pod cannot handle all user requests, the system deploys additional pods, up to the configured limit. When the load drops, the additional pods are deleted.


Autoscaling triggers

When a trigger exceeds its threshold, the number of pods increases within the configured limits. When the values drop, the number of pods returns to the initial level. This keeps operation stable and performance high. Thresholds are set as percentages (%).
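
The trigger behavior described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the platform's actual algorithm: the function name `desired_pods`, the trigger names `cpu` and `memory`, the one-pod-at-a-time scale-up step, and the immediate return to the minimum are all hypothetical choices made for the example.

```python
def desired_pods(current_pods, usage, thresholds, min_pods, max_pods):
    """Return the pod count a simple threshold-based autoscaler would aim for.

    usage and thresholds map a trigger name to a percentage (0-100).
    All names here are hypothetical; the real service may behave differently.
    """
    # A trigger fires when its measured usage is above its threshold.
    exceeded = any(usage[name] > limit for name, limit in thresholds.items())

    if exceeded:
        # Assumption: scale up by one pod per evaluation.
        target = current_pods + 1
    else:
        # Assumption: drop straight back to the initial (minimum) level.
        target = min_pods

    # Never leave the configured autoscaling limits.
    return max(min_pods, min(max_pods, target))


thresholds = {"cpu": 80, "memory": 80}
print(desired_pods(1, {"cpu": 90, "memory": 40}, thresholds, 1, 3))
print(desired_pods(3, {"cpu": 10, "memory": 10}, thresholds, 1, 3))
```

With a CPU reading above its threshold the sketch asks for one more pod, and once every reading is below its threshold it falls back to the minimum. A production autoscaler would normally also wait for a cooldown period before scaling down.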
Environment variables

Environment variables created in your container will be available only here.

API Keys
Enable API Key authentication
Pod lifetime

This is how long, in seconds, the autoscaler waits before deleting a pod that receives no requests. The countdown starts from the last request.
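
As a rough illustration of the pod lifetime countdown, the check below compares the time since the last request with the configured lifetime. The function name `should_delete_pod` and its parameters are hypothetical, and the sketch assumes that a pod becomes eligible for deletion as soon as the idle time reaches the lifetime.

```python
def should_delete_pod(last_request_at, now, pod_lifetime_sec):
    """Return True once a pod has been idle for at least pod_lifetime_sec.

    Both timestamps are in seconds (for example, values from time.monotonic()).
    Hypothetical helper; it is not part of any real API.
    """
    idle_seconds = now - last_request_at
    return idle_seconds >= pod_lifetime_sec


# A pod whose last request arrived at t=100 s, with a 60-second lifetime:
print(should_delete_pod(100.0, 159.0, 60))  # still inside the countdown
print(should_delete_pod(100.0, 160.0, 60))  # countdown has elapsed
```

Because the countdown starts from the last request, every new request effectively resets `last_request_at` and keeps the pod alive for another full lifetime.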
Deployment details
Plan
$ / per month
Approximately: $ per hour
Type
Regions