Concurrency Rules
What is Kling API concurrency?
Kling API concurrency is the maximum number of generation tasks that an account can have in progress in parallel at any given time. This limit is determined by the account's resource package. A higher concurrency level lets you keep more generation tasks running simultaneously (each call to the task creation interface initiates a new generation task).
Notes
- Concurrency applies only to the task creation interface; query interfaces do not consume concurrency.
- The limit concerns the number of concurrent tasks and is unrelated to Queries Per Second (QPS); the system imposes no QPS limit.
Core Rules
| Dimension | Rule Description |
|---|---|
| Application Scope | Applied at the account level and calculated independently for each resource package type (video, image, virtual try-on). All API keys under the same account share the same concurrency quota. |
| Occupancy Logic | A task occupies concurrency from the moment it enters the submitted status until it completes, whether it succeeds or fails. The concurrency is released immediately when the task ends. |
| Quota Calculation | Determined by the highest concurrency value among all active resource packages of the same type. Example: if a 5-concurrency video package and a 10-concurrency video package are both active, the video concurrency capacity is 10. |
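
The quota calculation rule can be illustrated with a minimal Python sketch. The `ResourcePackage` class, its field names, and the type labels are hypothetical and exist only for illustration; they are not part of the Kling API. Returning 0 when no package of a type is active is also an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ResourcePackage:
    # Hypothetical model of a resource package; not an API object.
    kind: str          # "video", "image", or "virtual_try_on" (labels assumed)
    concurrency: int   # concurrency value granted by this package
    active: bool = True


def concurrency_capacity(packages, kind):
    """Return the highest concurrency among active packages of one type.

    Values are not summed: only the largest active package counts.
    Returns 0 if no active package of that type exists (assumption).
    """
    return max(
        (p.concurrency for p in packages if p.active and p.kind == kind),
        default=0,
    )


packages = [
    ResourcePackage("video", 5),
    ResourcePackage("video", 10),
    ResourcePackage("video", 20, active=False),  # inactive, so ignored
    ResourcePackage("image", 3),
]

print(concurrency_capacity(packages, "video"))  # 10
print(concurrency_capacity(packages, "image"))  # 3
```

Because each type is calculated independently, the image package in this example has no effect on the video capacity.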
Special Notes
- Video / Virtual Try-on tasks: Each task occupies 1 concurrency.
- Image generation tasks: Concurrency used equals the value of the n parameter in the API request. (Example: n = 9 occupies 9 concurrency.)
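
As a rough sketch of the accounting described above, the following Python helper tallies how much concurrency a set of running tasks occupies per type. The function names and the task type labels (`"video"`, `"image"`, `"virtual_try_on"`) are assumptions made for this example, not API identifiers.

```python
from collections import Counter


def slots_required(task_type, n=1):
    """Concurrency occupied by a single task.

    Video and virtual try-on tasks occupy 1; image tasks occupy n,
    where n is the value of the request's n parameter.
    """
    if task_type in ("video", "virtual_try_on"):
        return 1
    if task_type == "image":
        if n < 1:
            raise ValueError("n must be at least 1")
        return n
    raise ValueError(f"unknown task type: {task_type!r}")


def slots_in_use(running_tasks):
    """Tally occupied concurrency per type for (task_type, n) pairs."""
    usage = Counter()
    for task_type, n in running_tasks:
        usage[task_type] += slots_required(task_type, n)
    return usage


running = [
    ("video", 1),
    ("video", 1),
    ("image", 9),   # one request with n = 9 occupies 9
    ("image", 2),
    ("virtual_try_on", 1),
]
print(dict(slots_in_use(running)))
# {'video': 2, 'image': 11, 'virtual_try_on': 1}
```

The tallies are kept per type because each resource package type has its own concurrency capacity.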
Over-limit Error Mechanism
When the number of running tasks reaches the concurrency limit, submitting a new task creation request returns an error.
Recommended Approach
Since this error is triggered by system load (not by parameter issues), it is recommended to:
- Backoff Retry Strategy: Use an exponential backoff algorithm to delay retries (recommended initial delay ≥ 1 second).
- Queue Management: Control the submission rate through a task queue and dynamically adapt to available concurrency.
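
Both recommendations can be sketched in Python under a few stated assumptions. `ConcurrencyLimitError` is a hypothetical exception standing in for the over-limit error response, `submit` is any caller-supplied function that creates a task, and `SlotGate` is an illustrative client-side helper; none of these names come from the Kling API. The doubling factor, the 60-second cap, and the small jitter are choices made for this sketch.

```python
import random
import threading
import time


class ConcurrencyLimitError(Exception):
    """Hypothetical stand-in for the over-limit error response."""


def submit_with_backoff(submit, max_attempts=6, initial_delay=1.0,
                        max_delay=60.0, sleep=time.sleep):
    """Call submit(), retrying with exponential backoff on over-limit errors."""
    delay = initial_delay
    for attempt in range(1, max_attempts + 1):
        try:
            return submit()
        except ConcurrencyLimitError:
            if attempt == max_attempts:
                raise
            # Wait the full delay plus up to 10% jitter, then double it.
            sleep(delay + random.uniform(0, delay * 0.1))
            delay = min(delay * 2, max_delay)


class SlotGate:
    """Client-side gate that blocks submissions beyond a known capacity."""

    def __init__(self, capacity):
        self._capacity = capacity
        self._in_use = 0
        self._cond = threading.Condition()

    def acquire(self, slots=1):
        """Block until `slots` units of concurrency are free locally."""
        if slots > self._capacity:
            raise ValueError("task needs more slots than the capacity")
        with self._cond:
            while self._in_use + slots > self._capacity:
                self._cond.wait()
            self._in_use += slots

    def release(self, slots=1):
        """Free slots once a task has finished, whether it succeeded or failed."""
        with self._cond:
            self._in_use -= slots
            self._cond.notify_all()


if __name__ == "__main__":
    attempts = {"count": 0}

    def fake_submit():
        # Simulated task creation: rejected twice, then accepted.
        attempts["count"] += 1
        if attempts["count"] < 3:
            raise ConcurrencyLimitError()
        return "task-accepted"

    waits = []
    result = submit_with_backoff(fake_submit, sleep=waits.append)
    print(result, "after", len(waits), "retries")
```

In a real worker, you would call `gate.acquire(n)` before submitting a task and `gate.release(n)` once polling shows the task has ended. The gate only tracks tasks submitted by the local process. All API keys under the same account share the quota, so the backoff wrapper remains useful even with a gate in place.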