Video Models

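kling-video-o1

| Task | Feature | std (3s~10s) | pro (3s~10s) |
| --- | --- | --- | --- |
| text to video | single-shot-video generation | ✅ (only 5s, 10s) | ✅ (only 5s, 10s) |
| text to video | voice control | ❌ | ❌ |
| text to video | others | - | - |
| image to video | single-shot-video generation (only start frame) | ✅ (only 5s, 10s) | ✅ (only 5s, 10s) |
| image to video | start & end frame | ✅ | ✅ |
| image to video | element control (only multi-image elements) | ✅ | ✅ |
| image to video | video reference (including multi-image elements) | ✅ | ✅ |
| image to video | voice control | ❌ | ❌ |
| image to video | others | - | - |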
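
The duration limits in the kling-video-o1 table can be checked on the client before a request is submitted. The sketch below is a minimal illustration only: the task labels and the helper name `o1_duration_ok` are hypothetical and are not Kling API fields, and the rules simply restate the table, which lists the same limits for std and pro.

```python
# Hypothetical client-side check that restates the kling-video-o1 table.
# Task labels and the function name are illustrative, not Kling API fields.

# Tasks the table limits to exactly 5s or 10s.
FIXED_DURATION_TASKS = {"text_to_video", "start_frame_only"}
# Tasks the table marks as supported with no narrower limit (3s to 10s).
RANGE_DURATION_TASKS = {"start_end_frame", "element_control", "video_reference"}


def o1_duration_ok(task: str, seconds: int) -> bool:
    """Return True if the table lists this task/duration pair as supported."""
    if task in FIXED_DURATION_TASKS:
        return seconds in (5, 10)
    if task in RANGE_DURATION_TASKS:
        return 3 <= seconds <= 10
    # Voice control and anything unlisted is treated as unsupported.
    return False
```

For example, `o1_duration_ok("text_to_video", 7)` is rejected because the table restricts single-shot text to video to 5s or 10s, while `o1_duration_ok("start_end_frame", 7)` passes.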

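kling-v3-omni

| Task | Feature | std (3s~15s) | pro (3s~15s) |
| --- | --- | --- | --- |
| text to video | single-shot-video generation | ✅ | ✅ |
| text to video | multi-shot-video generation | ✅ | ✅ |
| text to video | voice control | ❌ | ❌ |
| text to video | others | - | - |
| image to video | single-shot-video generation | ✅ | ✅ |
| image to video | multi-shot-video generation | ✅ | ✅ |
| image to video | start & end frame | ✅ | ✅ |
| image to video | element control (video character elements & multi-image elements) | ✅ | ✅ |
| image to video | reference video | ✅ (only 3s~10s) | ✅ (only 3s~10s) |
| image to video | voice control | ❌ | ❌ |
| image to video | others | - | - |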

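kling-v1

| Task | Feature | std 5s | std 10s | pro 5s | pro 10s |
| --- | --- | --- | --- | --- | --- |
| text to video | video generation | ✅ | ✅ | ✅ | ✅ |
| text to video | camera control | ✅ | - | - | - |
| image to video | video generation | ✅ | ✅ | ✅ | ✅ |
| image to video | start/end frame | ✅ | - | ✅ | - |
| image to video | motion brush | ✅ | - | ✅ | - |
| image to video | others | - | - | - | - |
| video extension | (negative_prompt and cfg_scale are not supported) | ✅ | ✅ | ✅ | ✅ |
| video effects | Dual-character: Hug, Kiss, heart_gesture | ✅ | ✅ | ✅ | ✅ |
| others | | - | - | - | - |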
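
A support matrix like the one for kling-v1 maps naturally onto a lookup structure. The following is a minimal sketch under stated assumptions: the category and feature keys are hypothetical labels invented for illustration (they are not Kling API parameter names), and the data only restates the ✅ cells of the kling-v1 table.

```python
# Illustrative encoding of the kling-v1 table as a lookup structure.
# Category and feature keys are hypothetical labels, not API parameter names.

# Every (mode, seconds) column the table has.
ALL_VARIANTS = {("std", 5), ("std", 10), ("pro", 5), ("pro", 10)}

# Each feature maps to the (mode, seconds) pairs marked as supported.
KLING_V1_SUPPORT = {
    ("text_to_video", "video_generation"): ALL_VARIANTS,
    ("text_to_video", "camera_control"): {("std", 5)},
    ("image_to_video", "video_generation"): ALL_VARIANTS,
    ("image_to_video", "start_end_frame"): {("std", 5), ("pro", 5)},
    ("image_to_video", "motion_brush"): {("std", 5), ("pro", 5)},
    ("video_extension", None): ALL_VARIANTS,
    ("video_effects", "dual_character"): ALL_VARIANTS,
}


def v1_supports(category, feature, mode, seconds):
    """Look up whether kling-v1 lists the feature for this mode and duration."""
    return (mode, seconds) in KLING_V1_SUPPORT.get((category, feature), set())
```

Keeping the matrix as data rather than as branching logic means that a correction to the table only requires editing one dictionary entry.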

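kling-v1-5

| Task | Feature | std 5s | std 10s | pro 5s | pro 10s |
| --- | --- | --- | --- | --- | --- |
| text to video | video generation | - | - | - | - |
| text to video | others | - | - | - | - |
| image to video | video generation | ✅ | ✅ | ✅ | ✅ |
| image to video | start/end frame | - | - | ✅ | ✅ |
| image to video | end frame | - | - | ✅ | ✅ |
| image to video | motion brush | - | - | ✅ | - |
| image to video | camera control (simple only) | - | - | ✅ | - |
| image to video | others | - | - | - | - |
| video extension | | ✅ | ✅ | ✅ | ✅ |
| video effects | Dual-character: Hug, Kiss, heart_gesture | ✅ | ✅ | ✅ | ✅ |
| others | | - | - | - | - |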

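kling-v1-6

| Task | Feature | std 5s | std 10s | pro 5s | pro 10s |
| --- | --- | --- | --- | --- | --- |
| text to video | video generation | ✅ | ✅ | ✅ | ✅ |
| text to video | others | - | - | - | - |
| image to video | video generation | ✅ | ✅ | ✅ | ✅ |
| image to video | start/end frame | - | - | ✅ | ✅ |
| image to video | end frame | - | - | ✅ | ✅ |
| image to video | others | - | - | - | - |
| multi-image2video | | ✅ | ✅ | ✅ | ✅ |
| multi-elements | | ✅ | ✅ | ✅ | ✅ |
| video extension | | ✅ | ✅ | ✅ | ✅ |
| video effects | Dual-character: Hug, Kiss, heart_gesture | ✅ | ✅ | ✅ | ✅ |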

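kling-v2-master

| Task | Feature | 5s | 10s |
| --- | --- | --- | --- |
| text to video | video generation | ✅ | ✅ |
| text to video | others | - | - |
| image to video | video generation | ✅ | ✅ |
| image to video | others | - | - |
| others | | - | - |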

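kling-v2-1

| Task | Feature | std 5s | std 10s | pro 5s | pro 10s |
| --- | --- | --- | --- | --- | --- |
| text to video | all | - | - | - | - |
| image to video | video generation | ✅ | ✅ | ✅ | ✅ |
| image to video | start/end frame | - | - | ✅ | ✅ |
| image to video | others | - | - | - | - |
| others | | - | - | - | - |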

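kling-v2-1-master

| Task | Feature | 5s | 10s |
| --- | --- | --- | --- |
| text to video | video generation | ✅ | ✅ |
| text to video | others | - | - |
| image to video | video generation | ✅ | ✅ |
| image to video | others | - | - |
| others | | - | - |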

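kling-v2-5-turbo

| Task | Feature | std 5s | std 10s | pro 5s | pro 10s |
| --- | --- | --- | --- | --- | --- |
| text to video | video generation | ✅ | ✅ | ✅ | ✅ |
| text to video | others | - | - | - | - |
| image to video | video generation | ✅ | ✅ | ✅ | ✅ |
| image to video | start/end frame | - | - | ✅ | ✅ |
| image to video | others | - | - | - | - |
| others | | - | - | - | - |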

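kling-v2-6

| Task | Feature | std 5s | std 10s | std, other durations | pro 5s | pro 10s | pro, other durations |
| --- | --- | --- | --- | --- | --- | --- | --- |
| text to video | video generation | ✅ (without audio only) | ✅ (without audio only) | - | ✅ | ✅ | - |
| text to video | others | - | - | - | - | - | - |
| image to video | video generation | ✅ (without audio only) | ✅ (without audio only) | - | ✅ | ✅ | - |
| image to video | start/end frame | - | - | - | ✅ (without audio only) | ✅ (without audio only) | - |
| image to video | voice control | - | - | - | ✅ | ✅ | - |
| image to video | motion control | - | - | ✅ | - | - | ✅ |
| image to video | others | - | - | - | - | - | - |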
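
The kling-v2-6 table combines three constraints: mode, duration, and whether audio is requested. The sketch below restates those rules as a single pre-flight check. It is an illustration under assumptions: the feature labels, the `with_audio` flag, and the function name are hypothetical rather than Kling API names, and the table does not say which values count as "other durations", so the sketch treats anything other than 5s or 10s as such.

```python
# Hypothetical pre-flight check restating the kling-v2-6 table.
# Feature labels and the with_audio flag are illustrative names only.

def v2_6_allows(feature: str, mode: str, seconds: int, with_audio: bool) -> bool:
    """Return True if the table marks the combination as supported."""
    standard_length = seconds in (5, 10)
    if feature == "video_generation":  # text to video and image to video
        # std is listed as available only without audio; pro has no such limit.
        return standard_length and (
            mode == "pro" or (mode == "std" and not with_audio)
        )
    if feature == "start_end_frame":
        # Listed for pro 5s and 10s only, and only without audio.
        return standard_length and mode == "pro" and not with_audio
    if feature == "voice_control":
        return standard_length and mode == "pro"
    if feature == "motion_control":
        # Listed only under the "other durations" columns for std and pro.
        return mode in ("std", "pro") and not standard_length
    return False
```

For instance, `v2_6_allows("video_generation", "std", 5, True)` is rejected because std generation is listed without audio only, whereas the same request in pro mode passes.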

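kling-v3

| Task | Feature | std (3s~15s) | pro (3s~15s) |
| --- | --- | --- | --- |
| text to video | single-shot-video generation | ✅ | ✅ |
| text to video | multi-shot-video generation | ✅ | ✅ |
| text to video | voice control | ❌ | ❌ |
| text to video | others | - | - |
| image to video | single-shot-video generation (only start frame) | ✅ | ✅ |
| image to video | multi-shot-video generation | ✅ | ✅ |
| image to video | start & end frame | ✅ | ✅ |
| image to video | element control (video character elements & multi-image elements) | ✅ | ✅ |
| image to video | motion control | ✅ | ✅ |
| image to video | voice control | ❌ | ❌ |
| image to video | others | - | - |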

Model-independent features

| Feature | Supported | Description |
| --- | --- | --- |
| avatar | ✅ | Generates digital-human, broadcast-style videos from a single photo |
| lip sync | ✅ | Drives the mouth movements of characters in a video from text or audio |
| video to audio | ✅ | Adds audio to any video generated by Kling models and to user-uploaded videos |
| text to audio | - | Generates audio from text prompts |
| others | - | - |

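| Model | Mode | Resolution | Frame Rate |
| --- | --- | --- | --- |
| kling-v1 | STD | 720p | 30fps |
| kling-v1 | PRO | 720p | 30fps |
| kling-v1-5 | STD | 720p | 30fps |
| kling-v1-5 | PRO | 1080p | 30fps |
| kling-v1-6 (Image to Video) | STD | 720p | 30fps |
| kling-v1-6 (Image to Video) | PRO | 1080p | 30fps |
| kling-v1-6 (Text to Video) | STD | 720p | 24fps |
| kling-v1-6 (Text to Video) | PRO | 1080p | 24fps |
| kling-v2 Master | - | 720p | 24fps |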

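| Model | Mode | Resolution | Frame Rate |
| --- | --- | --- | --- |
| kling-v2-1 (Image to Video) | STD | 720p | 24fps |
| kling-v2-1 (Image to Video) | PRO | 1080p | 24fps |
| kling-v2-1 Master | - | 1080p | 24fps |
| kling-v2-5 (Image to Video) | PRO | 1080p | 24fps |
| kling-v2-5 (Text to Video) | PRO | 1080p | 24fps |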
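
The resolution and frame-rate tables are enough to estimate how many frames a clip contains, which is useful when sizing downstream processing. The sketch below is a rough illustration: the dictionary keys are hypothetical labels rather than API values, only a few rows are included, the pairing of each mode with its resolution and frame rate is a reading of the tables above, and a constant frame rate is assumed.

```python
# Illustrative lookup built from the resolution and frame-rate tables.
# The keys are hypothetical labels chosen for this sketch, not API values.

OUTPUT_SPECS = {
    # (model label, mode): (resolution, frames per second)
    ("kling-v1-5", "std"): ("720p", 30),
    ("kling-v1-5", "pro"): ("1080p", 30),
    ("kling-v1-6 text to video", "std"): ("720p", 24),
    ("kling-v1-6 text to video", "pro"): ("1080p", 24),
    ("kling-v2-1 image to video", "std"): ("720p", 24),
    ("kling-v2-1 image to video", "pro"): ("1080p", 24),
}


def nominal_frame_count(model: str, mode: str, seconds: int) -> int:
    """Frames in a clip of the given length, assuming a constant frame rate."""
    _resolution, fps = OUTPUT_SPECS[(model, mode)]
    return fps * seconds
```

Under these assumptions a 5s kling-v1-5 PRO clip has 30 × 5 = 150 frames, and a 10s kling-v2-1 image-to-video STD clip has 24 × 10 = 240 frames.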

The Kling 3.0 Series Models API is Now Fully Available
All in One, One for All!

Models Available in This Release

Kling 3.0 Motion Control, Kling Video 3.0, Kling Video 3.0 Omni, Kling Image 3.0, Kling Image 3.0 Omni

Refer to the Kling AI Series 3.0 Model API Specification.

Key Highlights of the Models

3.0 All-in-One: A unified model for multi-modal input and output.

Strong consistency: subject consistency (supports cameo, subject with voice control, and I2V + subject) and text consistency.
Narrative control at your fingertips: more freedom, precision, and control, with videos up to 15 seconds long, scene cuts, ultra-high-definition storyboards and images, and custom durations.
Upgraded native audio-visual output: Supports multiple speakers and languages (with accents).

Kling 3.0 Motion Control

Consistent facial identity from any angle
Complex emotions faithfully reproduced
High-fidelity restoration, even with face occlusions
Consistent facial clarity across dynamic framing

User Guide ->

Kling Video 3.0

Expected improvements over 2.6:

Supports subject upload in I2V scenarios for enhanced consistency
Significant improvement in multi-character referencing, especially for three-person scenarios
Supports Japanese, Korean, and Spanish in addition to Chinese and English
Capable of generating certain dialects and accents
Better distinction and control over different types of audio (speech, sound effects, BGM)
Improved text retention in I2V scenarios
Supports scene transitions, with up to 6 shots and customizable storyboarding

User Guide ->
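
The limits stated for Kling Video 3.0 (up to 6 shots, and 3s to 15s overall per the kling-v3 table) can be checked before a multi-shot request is built. The following is a minimal sketch, not the service's validation logic: the function name and the list-of-durations input are hypothetical, and it assumes that the 3s to 15s range applies to the sum of the shot durations.

```python
# Hypothetical storyboard check for a multi-shot request.
# The limits (at most 6 shots, 3s to 15s overall) come from this page; applying
# the range to the sum of shot durations is an assumption made for this sketch.

MAX_SHOTS = 6
MIN_TOTAL_SECONDS = 3
MAX_TOTAL_SECONDS = 15


def storyboard_problems(shot_seconds):
    """Return a list of problems found; an empty list means the check passes."""
    problems = []
    if not shot_seconds:
        problems.append("storyboard has no shots")
        return problems
    if len(shot_seconds) > MAX_SHOTS:
        problems.append(
            f"{len(shot_seconds)} shots exceeds the limit of {MAX_SHOTS}"
        )
    if any(s <= 0 for s in shot_seconds):
        problems.append("every shot needs a positive duration")
    total = sum(shot_seconds)
    if not MIN_TOTAL_SECONDS <= total <= MAX_TOTAL_SECONDS:
        problems.append(
            f"total duration {total}s is outside "
            f"{MIN_TOTAL_SECONDS}s to {MAX_TOTAL_SECONDS}s"
        )
    return problems
```

Returning a list of problems rather than raising on the first one lets a caller report every issue with a storyboard at once.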

Kling Video 3.0 Omni

Expected improvements over O1:

Native audio-visual synchronization
Supports video subject creation
Further improved consistency in reference-based tasks, especially for characters and products
Combined capabilities of reference + storyboarding + audio-visual sync significantly enhance usability
Supports scene transitions, with up to 6 shots
Extended generation duration up to 15 seconds

User Guide ->

Kling Image 3.0

Highly consistent feature retention
Precise response to detail modifications
Accurate control over style and tone
Rich imaginative capabilities

User Guide ->

Kling Image 3.0 Omni

Enhanced narrative sense
New storyboard image set generation, retaining reference image features with scene relevance
Direct output of 2K/4K ultra-high-definition images
Further improved detail consistency

User Guide ->

Thank you for your support and understanding!
