Image to Video
Create Task
Please note that in order to maintain naming consistency, the original model field has been changed to model_name. Please use this field to specify the model version in the future.
Backward compatibility is maintained: requests that still use the original model field continue to work and are handled the same way as requests with an empty model_name (i.e., the V1 model is called).
Request Header
Data Exchange Format
Authentication information, refer to API authentication
Request Body
Model Name
Reference Image
- Supports image Base64 encoding or image URL (ensure accessibility)
- Important: When using Base64, do NOT add a prefix such as data:image/png;base64,. Submit only the raw Base64 string.
- Correct Base64 format:
- Incorrect Base64 format (with data: prefix):
- Supported image formats: .jpg / .jpeg / .png
- File size: ≤10MB, dimensions: min 300px, aspect ratio: 1:2.5 ~ 2.5:1
- At least one of image or image_tail must be provided; both cannot be empty
Support varies by model version and video mode. See Capability Map for details.
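The Base64 rule above is easy to break when an image arrives as a full data URI, for example from a browser canvas. The following is a minimal Python sketch of normalizing such input before it is placed in the image field. The helper names `to_raw_base64` and `encode_image_bytes` are illustrative and not part of the API, and reading "10MB" as 10 × 1024 × 1024 bytes is an assumption.

```python
import base64
import re

# Matches a data-URI header such as "data:image/png;base64,"
_DATA_URI_PREFIX = re.compile(r"^data:[^;,]+;base64,", re.IGNORECASE)

# Assumption: "10MB" is interpreted as 10 * 1024 * 1024 bytes.
MAX_IMAGE_BYTES = 10 * 1024 * 1024


def to_raw_base64(value: str) -> str:
    """Return the Base64 payload with any data: prefix removed."""
    raw = _DATA_URI_PREFIX.sub("", value.strip())
    # validate=True raises if the string contains non-Base64 characters.
    base64.b64decode(raw, validate=True)
    return raw


def encode_image_bytes(data: bytes) -> str:
    """Encode image bytes as a raw Base64 string, enforcing the size limit."""
    if len(data) > MAX_IMAGE_BYTES:
        raise ValueError("image exceeds the 10MB limit")
    return base64.b64encode(data).decode("ascii")
```

Either function returns a string that can be submitted directly as the image (or image_tail) value; image URLs do not need this step.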
Reference Image - End frame control
- Supports image Base64 encoding or image URL (ensure accessibility)
- Important: When using Base64, do NOT add a prefix such as data:image/png;base64,. Submit only the raw Base64 string.
- Supported image formats: .jpg / .jpeg / .png
- File size: ≤10MB, dimensions: min 300px
- At least one of image or image_tail must be provided; both cannot be empty
image_tail, dynamic_masks/static_mask, and camera_control are mutually exclusive; only one can be used at a time
Support varies by model version and video mode. See Capability Map for details.
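The two constraints on the frame fields (at least one of image or image_tail, and mutual exclusion between image_tail, the motion brush masks, and camera_control) can be checked on the client before a request is sent. The sketch below is one possible pre-flight check. The function name `check_frame_controls` is hypothetical, and the server remains the authority on what is accepted.

```python
def check_frame_controls(body: dict) -> None:
    """Raise ValueError if the request body breaks the frame-control rules."""
    # Rule 1: image and image_tail cannot both be empty.
    if not body.get("image") and not body.get("image_tail"):
        raise ValueError("provide image, image_tail, or both")

    # Rule 2: only one of these three control groups may be present.
    groups = {
        "image_tail": bool(body.get("image_tail")),
        "dynamic_masks/static_mask": bool(
            body.get("dynamic_masks") or body.get("static_mask")
        ),
        "camera_control": bool(body.get("camera_control")),
    }
    used = [name for name, present in groups.items() if present]
    if len(used) > 1:
        raise ValueError("mutually exclusive fields used together: " + ", ".join(used))
```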
Whether to generate multi-shot video
When true: the prompt parameter has no effect, and first/end frame generation is not supported.
When false: the shot_type and multi_prompt parameters have no effect.
Storyboard method
When multi_shot is true, this parameter is required
Positive text prompt
The Omni model supports a range of capabilities through prompts that reference elements, images, videos, and other content:
- Specify elements/images/videos using <<<>>> format, e.g.: <<<element_1>>>, <<<image_1>>>, <<<video_1>>>
- For detailed capabilities, see: KLING Omni Model User Guide, Kling VIDEO 3.0 Omni Model User Guide
- Cannot exceed 2500 characters
- When multi_shot is false or shot_type is intelligence, this parameter must not be empty.
- Use <<<voice_1>>> to specify a voice; the sequence number matches the order of the voice_list parameter
- A video generation task can reference up to 2 voices; when specifying a voice, the sound parameter must be "on"
- The simpler the syntax structure, the better. Example: The man<<<voice_1>>> said: "Hello"
- When voice_list is not empty and prompt references voice ID, the task will be billed as "with specified voice"
Support varies by model version and video mode. See Capability Map for details.
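The prompt rules for voice references can be sketched as a small client-side check: length limit, at most 2 voices, sound set to "on" when a voice is referenced, and each <<<voice_N>>> pointing at an existing voice_list entry. The function name `check_voice_prompt` and the `{"voice_id": ...}` item shape are assumptions made for illustration.

```python
import re

# Matches voice references such as <<<voice_1>>> and captures the number.
VOICE_REF = re.compile(r"<<<voice_(\d+)>>>")


def check_voice_prompt(prompt: str, voice_list: list, sound: str) -> bool:
    """Validate a prompt against voice_list.

    Returns True when the task would count as "with specified voice",
    i.e. voice_list is not empty and the prompt references a voice.
    """
    if len(prompt) > 2500:
        raise ValueError("prompt cannot exceed 2500 characters")
    if len(voice_list) > 2:
        raise ValueError("a task can reference up to 2 voices")

    refs = {int(n) for n in VOICE_REF.findall(prompt)}
    for n in refs:
        # <<<voice_N>>> refers to the N-th entry of voice_list (1-based).
        if not 1 <= n <= len(voice_list):
            raise ValueError(f"<<<voice_{n}>>> has no matching voice_list entry")
    if refs and sound != "on":
        raise ValueError('sound must be "on" when a voice is specified')
    return bool(refs and voice_list)
```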
Information about each storyboard, such as prompts and duration
Define the shot sequence number, corresponding prompt word, and duration through the index, prompt, and duration parameters, where:
- Supports up to 6 storyboards, with a minimum of 1 storyboard.
- The maximum length of the prompt for each storyboard is 512 characters.
- The duration of each storyboard must be at least 1 and must not exceed the total duration.
- The sum of the durations of all storyboards equals the total duration of the current task.
Load with key:value format as follows:
When multi_shot is true and shot_type is customize, this parameter is required.
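The storyboard constraints above can be checked before submission. The sketch below assumes each storyboard is an object with the index, prompt, and duration keys named in the text, and that durations may arrive as integers or numeric strings. The function name `validate_multi_prompt` is hypothetical, and `total_duration` stands for the task's overall video duration in seconds.

```python
def validate_multi_prompt(multi_prompt: list, total_duration: int) -> None:
    """Raise ValueError if the storyboard list breaks the documented limits."""
    if not 1 <= len(multi_prompt) <= 6:
        raise ValueError("between 1 and 6 storyboards are supported")

    durations = []
    for shot in multi_prompt:
        if len(shot["prompt"]) > 512:
            raise ValueError(f"storyboard {shot['index']}: prompt exceeds 512 characters")
        duration = int(shot["duration"])  # accepts 3 or "3"
        if not 1 <= duration <= total_duration:
            raise ValueError(f"storyboard {shot['index']}: duration out of range")
        durations.append(duration)

    if sum(durations) != total_duration:
        raise ValueError("storyboard durations must add up to the total duration")


# Illustrative data only: two storyboards that fill a 5-second task.
shots = [
    {"index": 1, "prompt": "A cat wakes up on the sofa", "duration": "2"},
    {"index": 2, "prompt": "The cat jumps down and walks away", "duration": "3"},
]
validate_multi_prompt(shots, total_duration=5)
```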
Negative text prompt
- Cannot exceed 2500 characters
- It is recommended to express negative requirements as negative sentences within the positive prompt
Reference Element List, based on element ID from element library
- Supports up to 3 reference elements
Elements fall into two categories: video customization elements (named Video Character Elements) and image customization elements (named Multi-Image Elements). Each has a distinct scope of application, so take care to distinguish them. See Kling Element Library User Guide.
- Load with key:value format as follows:
Support varies by model version and video mode. See Capability Map for details.
Element ID from element library
List of voices referenced when generating videos
- A video generation task can reference up to 2 voices
- When voice_list is not empty and prompt references voice ID, the task will be billed as "with specified voice"
- voice_id is returned by the voice customization API; system preset voices can also be used. See Custom Voices API. This is NOT the voice_id of the Lip-Sync API
- element_list and voice_list are mutually exclusive and cannot coexist
Example:
Support varies by model version and video mode. See Capability Map for details.
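The limits on the two reference lists (up to 3 elements, up to 2 voices, and never both at once) can be expressed as a short check. This is a sketch under the assumption that both lists are sent as JSON arrays in the request body; the function name `check_reference_lists` is hypothetical.

```python
def check_reference_lists(body: dict) -> None:
    """Raise ValueError if element_list and voice_list break the stated limits."""
    elements = body.get("element_list") or []
    voices = body.get("voice_list") or []

    if elements and voices:
        raise ValueError("element_list and voice_list are mutually exclusive")
    if len(elements) > 3:
        raise ValueError("up to 3 reference elements are supported")
    if len(voices) > 2:
        raise ValueError("a task can reference up to 2 voices")
```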
Whether to generate sound when generating video
Support varies by model version and video mode. See Capability Map for details.
Flexibility of video generation; a higher value means less model flexibility and stronger adherence to the user prompt
- Value range: [0, 1]
kling-v2.x models do not support this parameter
Video generation mode
- std: Standard Mode - basic mode, cost-effective
- pro: Professional Mode (High Quality) - high performance mode, better video quality
Support varies by model version and video mode. See Capability Map for details.
Static brush mask area (mask image created by user using motion brush)
The "Motion Brush" feature includes Dynamic Brush (dynamic_masks) and Static Brush (static_mask)
- Supports image Base64 encoding or image URL (same format requirements as image field)
- Supported image formats:
.jpg / .jpeg / .png
- Aspect ratio must match the input image (image field), otherwise task will fail
- Resolution of static_mask and dynamic_masks.mask must be identical, otherwise task will fail
Support varies by model version and video mode. See Capability Map for details.
Dynamic brush configuration list
- Up to 6 groups can be configured, each containing a "mask area" and a "motion trajectory" sequence
Support varies by model version and video mode. See Capability Map for details.
Dynamic brush mask area (mask image created by user using motion brush)
- Supports image Base64 encoding or image URL (same format requirements as image field)
- Supported image formats: .jpg / .jpeg / .png
- Aspect ratio must match the input image (image field), otherwise task will fail
- Resolution of static_mask and dynamic_masks.mask must be identical, otherwise task will fail
Motion trajectory coordinate sequence
- For 5s video, trajectory length ≤77, coordinate count range: [2, 77]
- Coordinate system uses bottom-left corner of image as origin
Note 1: More coordinate points = more accurate trajectory. 2 points = straight line between them
Note 2: Trajectory direction follows input order. First coordinate is start point, subsequent coordinates are connected sequentially
X coordinate of trajectory point (pixel coordinate with image bottom-left as origin)
Y coordinate of trajectory point (pixel coordinate with image bottom-left as origin)
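Most image tools report pixel positions with the origin at the top-left corner, while the trajectory uses the bottom-left corner. The sketch below converts top-left points into trajectory points and applies the [2, 77] count limit stated for 5s videos. The function name `to_trajectory`, the `x`/`y` key names, and flipping the vertical axis as height minus y are assumptions for illustration.

```python
def to_trajectory(points_top_left: list, image_height: int) -> list:
    """Convert (x, y) points with a top-left origin to bottom-left-origin points."""
    if not 2 <= len(points_top_left) <= 77:
        raise ValueError("a trajectory needs between 2 and 77 coordinates")
    # Order is preserved: the first point is the start of the motion.
    return [{"x": x, "y": image_height - y} for x, y in points_top_left]
```

For example, on an image 100 pixels tall, the top-left points (10, 20) and (30, 40) become (10, 80) and (30, 60), describing a straight line between those two positions.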
Camera movement control protocol (if not specified, model will intelligently match based on input text/images)
Support varies by model version and video mode. See Capability Map for details.
Predefined camera movement type
- simple: Simple camera movement, can choose one of six options in "config"
- down_back: Camera descends and moves backward ➡️ Pan down and zoom out. config parameter not required
- forward_up: Camera moves forward and tilts up ➡️ Zoom in and pan up. config parameter not required
- right_turn_forward: Rotate right then move forward ➡️ Right rotation advance. config parameter not required
- left_turn_forward: Rotate left then move forward ➡️ Left rotation advance. config parameter not required
Contains 6 fields to specify camera movement in different directions
- Required when type is "simple"; leave empty for other types
- Choose only one parameter to be non-zero; rest must be 0
Horizontal movement - camera translation along x-axis
- Value range: [-10, 10]. Negative = left, Positive = right
Vertical movement - camera translation along y-axis
- Value range: [-10, 10]. Negative = down, Positive = up
Horizontal pan - camera rotation around y-axis
- Value range: [-10, 10]. Negative = rotate left, Positive = rotate right
Vertical tilt - camera rotation around x-axis
- Value range: [-10, 10]. Negative = tilt down, Positive = tilt up
Roll - camera rotation around z-axis
- Value range: [-10, 10]. Negative = counterclockwise, Positive = clockwise
Zoom - controls camera focal length change, affects field of view
- Value range: [-10, 10]. Negative = longer focal length (narrower FOV), Positive = shorter focal length (wider FOV)
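The camera control rules (config required only for the simple type, exactly one non-zero field, each value within [-10, 10]) can be sketched as a small builder. The six config key names used here (`horizontal`, `vertical`, `pan`, `tilt`, `roll`, `zoom`) are inferred from the field descriptions above and should be treated as assumptions, as should the function name `build_camera_control`.

```python
# Assumed key names, inferred from the six field descriptions.
CONFIG_KEYS = ("horizontal", "vertical", "pan", "tilt", "roll", "zoom")
# Predefined types that do not take a config.
PRESET_TYPES = {"down_back", "forward_up", "right_turn_forward", "left_turn_forward"}


def build_camera_control(movement_type: str, **config) -> dict:
    """Build a camera_control object, enforcing the documented config rules."""
    if movement_type in PRESET_TYPES:
        if config:
            raise ValueError("config must be left empty for predefined types")
        return {"type": movement_type}
    if movement_type != "simple":
        raise ValueError(f"unknown camera movement type: {movement_type}")

    unknown = set(config) - set(CONFIG_KEYS)
    if unknown:
        raise ValueError("unknown config fields: " + ", ".join(sorted(unknown)))

    # Unspecified fields default to 0 so that only one field is non-zero.
    full = {key: config.get(key, 0) for key in CONFIG_KEYS}
    non_zero = [key for key, value in full.items() if value != 0]
    if len(non_zero) != 1:
        raise ValueError("exactly one config field must be non-zero")
    if not -10 <= full[non_zero[0]] <= 10:
        raise ValueError("config values must be within [-10, 10]")
    return {"type": "simple", "config": full}
```

Calling `build_camera_control("simple", zoom=5)` yields a config with zoom set to 5 and the other five fields set to 0.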
Video duration in seconds
Support varies by model version and video mode. See Capability Map for details.
Whether to generate watermarked results simultaneously
- Defined by the enabled parameter, format:
- true: generate watermarked result, false: do not generate
- Custom watermarks are not currently supported
Callback notification URL for task result. If configured, server will notify when task status changes.
- For specific message schema, see Callback Protocol
Customized Task ID
- Will not overwrite system-generated task ID, but supports querying task by this ID
- Must be unique within a single user account
Scenario invocation examples
Image to video with multi-shot
Image to video with element
Generate video with voice control
Query Task (Single)
Request Header
Data Exchange Format
Authentication information, refer to API authentication
Path Parameters
Task ID for image to video
- Request path parameter, fill value directly in request path
- Choose one between task_id and external_task_id for querying
Customized Task ID for image to video
- The external_task_id provided when creating the task
- Choose one between task_id and external_task_id for querying
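Because the identifier is filled directly into the request path and only one of task_id or external_task_id is used per query, a client can build the path as sketched below. The endpoint path is not given in this section, so `base_path` is a placeholder the caller must supply; the function name `task_query_path` is hypothetical.

```python
from typing import Optional
from urllib.parse import quote


def task_query_path(
    base_path: str,
    task_id: Optional[str] = None,
    external_task_id: Optional[str] = None,
) -> str:
    """Return the request path for querying a single task by one identifier."""
    # Exactly one of the two identifiers must be supplied.
    if (task_id is None) == (external_task_id is None):
        raise ValueError("provide exactly one of task_id or external_task_id")
    identifier = task_id if task_id is not None else external_task_id
    # Percent-encode the identifier so it is safe inside a URL path segment.
    return f"{base_path.rstrip('/')}/{quote(str(identifier), safe='')}"
```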
Query Task (List)
Request Header
Data Exchange Format
Authentication information, refer to API authentication
Query Parameters
Page number
- Value range: [1, 1000]
Data volume per page
- Value range: [1, 500]
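The two value ranges above can be enforced before a list query is issued. The query parameter names are not shown in this section, so the keys `pageNum` and `pageSize` in the sketch below are assumptions, as is the function name `page_params`.

```python
def page_params(page_num: int, page_size: int) -> dict:
    """Return list-query parameters after checking the documented ranges."""
    if not 1 <= page_num <= 1000:
        raise ValueError("page number must be within [1, 1000]")
    if not 1 <= page_size <= 500:
        raise ValueError("data volume per page must be within [1, 500]")
    # Assumed parameter names; adjust to match the actual query parameters.
    return {"pageNum": page_num, "pageSize": page_size}
```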