To run the model, the system deploys a Kubernetes pod and allocates the specified amount of memory (in MiB) and the selected CPU- or GPU-optimized compute resources.
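For readers familiar with Kubernetes, this allocation conceptually corresponds to the resource requests and limits in a pod spec. Below is a minimal sketch, assuming a plain Deployment manifest; the name `model-server`, the image, and the exact values are hypothetical, and the platform generates the real manifest for you:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: model-server            # hypothetical name
spec:
  replicas: 1
  selector:
    matchLabels:
      app: model-server
  template:
    metadata:
      labels:
        app: model-server
    spec:
      containers:
        - name: model
          image: registry.example.com/model:latest   # hypothetical image
          resources:
            requests:
              memory: "4096Mi"    # memory is specified in MiB
              cpu: "2"
            limits:
              memory: "4096Mi"
              cpu: "2"
              # for GPU-optimized configurations, a GPU is reserved instead:
              # nvidia.com/gpu: 1
```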
If a single pod cannot handle all incoming user requests, the system deploys additional pods; when the load decreases, the extra pods are deleted.
When the monitored metric values exceed the configured trigger thresholds, the number of pods increases up to the defined limit; when the values drop back below the thresholds, it returns to the initial level, ensuring stable operation and high performance.
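This scale-out/scale-in behavior matches what a HorizontalPodAutoscaler does in standard Kubernetes. A minimal sketch, assuming a CPU-utilization trigger; the threshold, replica bounds, and target name are hypothetical:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: model-server-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: model-server          # the Deployment sketched above
  minReplicas: 1                # the initial level the pod count returns to
  maxReplicas: 5                # the upper limit for scale-out
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 80   # trigger threshold: scale out above 80% CPU
```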
Environment variables created in your container are available only inside that container.
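In Kubernetes terms, such variables live in the container's `env` section and are not visible to other containers or pods. A minimal sketch; the variable name and value are hypothetical:

```yaml
containers:
  - name: model
    image: registry.example.com/model:latest
    env:
      - name: MODEL_PRECISION   # hypothetical variable, visible only in this container
        value: "fp16"
```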
This setting defines how long autoscaling waits before deleting a pod that receives no requests. The countdown starts from the last request the pod handled.
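The platform exposes this as a single setting; request-driven autoscalers such as KEDA implement the same idea with a cooldown period that starts counting after the last active trigger. A minimal sketch, assuming KEDA with a Prometheus request-rate trigger; the query, threshold, and names are hypothetical, and the platform may use a different mechanism internally:

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: model-server-scaler
spec:
  scaleTargetRef:
    name: model-server
  minReplicaCount: 0
  maxReplicaCount: 5
  cooldownPeriod: 300           # seconds after the last active trigger (last request) before scaling to zero
  triggers:
    - type: prometheus
      metadata:
        serverAddress: http://prometheus.monitoring:9090   # hypothetical address
        query: sum(rate(http_requests_total{app="model-server"}[1m]))
        threshold: "10"          # requests/s that counts as "under load"
```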