The Kling 3.0 series models API is now fully available.

Video Models

kling-video-o1

| Task | Feature | std (3s~10s) | pro (3s~10s) |
| --- | --- | --- | --- |
| text to video | single-shot video generation | ✅ (only 5s or 10s) | ✅ (only 5s or 10s) |
| text to video | voice control | ❌ | ❌ |
| text to video | others | - | - |
| image to video | single-shot video generation (only start frame) | ✅ (only 5s or 10s) | ✅ (only 5s or 10s) |
| image to video | start & end frame | ✅ | ✅ |
| image to video | element control (only multi-image elements) | ✅ | ✅ |
| image to video | video reference (including multi-image elements) | ✅ | ✅ |
| image to video | voice control | ❌ | ❌ |
| image to video | others | - | - |

kling-v3-omni

| Task | Feature | std (3s~15s) | pro (3s~15s) |
| --- | --- | --- | --- |
| text to video | single-shot video generation | ✅ | ✅ |
| text to video | multi-shot video generation | ✅ | ✅ |
| text to video | voice control | ❌ | ❌ |
| text to video | others | - | - |
| image to video | single-shot video generation | ✅ | ✅ |
| image to video | multi-shot video generation | ✅ | ✅ |
| image to video | start & end frame | ✅ | ✅ |
| image to video | element control (video character elements & multi-image elements) | ✅ | ✅ |
| image to video | reference video | ✅ (only 3s~10s) | ✅ (only 3s~10s) |
| image to video | voice control | ❌ | ❌ |
| image to video | others | - | - |

kling-v1

| Task | Feature | std 5s | std 10s | pro 5s | pro 10s |
| --- | --- | --- | --- | --- | --- |
| text to video | video generation | ✅ | ✅ | ✅ | ✅ |
| text to video | camera control | ✅ | - | - | - |
| image to video | video generation | ✅ | ✅ | ✅ | ✅ |
| image to video | start/end frame | ✅ | - | ✅ | - |
| image to video | motion brush | ✅ | - | ✅ | - |
| image to video | others | - | - | - | - |
| video extension (negative_prompt and cfg_scale are not supported) | | ✅ | ✅ | ✅ | ✅ |
| video effects | Dual-character: Hug, Kiss, heart_gesture | ✅ | ✅ | ✅ | ✅ |
| video effects | others | - | - | - | - |

kling-v1-5

| Task | Feature | std 5s | std 10s | pro 5s | pro 10s |
| --- | --- | --- | --- | --- | --- |
| text to video | video generation | - | - | - | - |
| text to video | others | - | - | - | - |
| image to video | video generation | ✅ | ✅ | ✅ | ✅ |
| image to video | start/end frame | - | - | ✅ | ✅ |
| image to video | end frame | - | - | ✅ | ✅ |
| image to video | motion brush | - | - | ✅ | - |
| image to video | camera control (simple only) | - | - | ✅ | - |
| image to video | others | - | - | - | - |
| video extension | | ✅ | ✅ | ✅ | ✅ |
| video effects | Dual-character: Hug, Kiss, heart_gesture | ✅ | ✅ | ✅ | ✅ |
| video effects | others | - | - | - | - |

kling-v1-6

| Task | Feature | std 5s | std 10s | pro 5s | pro 10s |
| --- | --- | --- | --- | --- | --- |
| text to video | video generation | ✅ | ✅ | ✅ | ✅ |
| text to video | others | - | - | - | - |
| image to video | video generation | ✅ | ✅ | ✅ | ✅ |
| image to video | start/end frame | - | - | ✅ | ✅ |
| image to video | end frame | - | - | ✅ | ✅ |
| image to video | others | - | - | - | - |
| multi-image2video | | ✅ | ✅ | ✅ | ✅ |
| multi-elements | | ✅ | ✅ | ✅ | ✅ |
| video extension | | ✅ | ✅ | ✅ | ✅ |
| video effects | Dual-character: Hug, Kiss, heart_gesture | ✅ | ✅ | ✅ | ✅ |

kling-v2-master

| Task | Feature | 5s | 10s |
| --- | --- | --- | --- |
| text to video | video generation | ✅ | ✅ |
| text to video | others | - | - |
| image to video | video generation | ✅ | ✅ |
| image to video | others | - | - |
| others | | - | - |

kling-v2-1

| Task | Feature | std 5s | std 10s | pro 5s | pro 10s |
| --- | --- | --- | --- | --- | --- |
| text to video | all | - | - | - | - |
| image to video | video generation | ✅ | ✅ | ✅ | ✅ |
| image to video | start/end frame | - | - | ✅ | ✅ |
| image to video | others | - | - | - | - |
| others | | - | - | - | - |

kling-v2-1-master

| Task | Feature | 5s | 10s |
| --- | --- | --- | --- |
| text to video | video generation | ✅ | ✅ |
| text to video | others | - | - |
| image to video | video generation | ✅ | ✅ |
| image to video | others | - | - |
| others | | - | - |

kling-v2-5-turbo

| Task | Feature | std 5s | std 10s | pro 5s | pro 10s |
| --- | --- | --- | --- | --- | --- |
| text to video | video generation | ✅ | ✅ | ✅ | ✅ |
| text to video | others | - | - | - | - |
| image to video | video generation | ✅ | ✅ | ✅ | ✅ |
| image to video | start/end frame | - | - | ✅ | ✅ |
| image to video | others | - | - | - | - |
| others | | - | - | - | - |

kling-v2-6

| Task | Feature | std 5s | std 10s | std (other duration) | pro 5s | pro 10s | pro (other duration) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| text to video | video generation | ✅ (only no audio) | ✅ (only no audio) | - | ✅ | ✅ | - |
| text to video | others | - | - | - | - | - | - |
| image to video | video generation | ✅ (only no audio) | ✅ (only no audio) | - | ✅ | ✅ | - |
| image to video | start/end frame | - | - | - | ✅ (only no audio) | ✅ (only no audio) | - |
| image to video | voice control | - | - | - | ✅ | ✅ | - |
| image to video | motion control | - | - | ✅ | - | - | ✅ |
| image to video | others | - | - | - | - | - | - |

kling-v3

| Task | Feature | std (3s~15s) | pro (3s~15s) |
| --- | --- | --- | --- |
| text to video | single-shot video generation | ✅ | ✅ |
| text to video | multi-shot video generation | ✅ | ✅ |
| text to video | voice control | ❌ | ❌ |
| text to video | others | - | - |
| image to video | single-shot video generation (only start frame) | ✅ | ✅ |
| image to video | multi-shot video generation | ✅ | ✅ |
| image to video | start & end frame | ✅ | ✅ |
| image to video | element control (video character elements & multi-image elements) | ✅ | ✅ |
| image to video | motion control | ✅ | ✅ |
| image to video | voice control | ❌ | ❌ |
| image to video | others | - | - |

Model-independent features

| Feature | Supported | Description |
| --- | --- | --- |
| avatar | ✅ | Generates digital-human, broadcast-style videos from a single photo |
| lip sync | ✅ | Can be combined with text or audio to drive the mouth shape of characters in the video |
| video to audio | ✅ | Supports adding audio to all videos generated by Kling models and to user-uploaded videos |
| text to audio | - | Supports generating audio from text prompts |
| others | - | - |

| Model | Mode | Resolution | Frame Rate |
| --- | --- | --- | --- |
| kling-v1 | STD | 720p | 30fps |
| kling-v1 | PRO | 720p | 30fps |
| kling-v1-5 | STD | 720p | 30fps |
| kling-v1-5 | PRO | 1080p | 30fps |
| kling-v1-6 Image to Video | STD | 720p | 30fps |
| kling-v1-6 Image to Video | PRO | 1080p | 30fps |
| kling-v1-6 Text to Video | STD | 720p | 24fps |
| kling-v1-6 Text to Video | PRO | 1080p | 24fps |
| kling-v2 Master | - | 720p | 24fps |
| kling-v2-1 Image to Video | STD | 720p | 24fps |
| kling-v2-1 Image to Video | PRO | 1080p | 24fps |
| kling-v2-1 Master | - | 1080p | 24fps |
| kling-v2-5 Image to Video | PRO | 1080p | 24fps |
| kling-v2-5 Text to Video | PRO | 1080p | 24fps |

The Kling 3.0 Series Models API is Now Fully Available: All in One, One for All!

Models Available in This Release

Kling 3.0 Motion Control, Kling Video 3.0, Kling Video 3.0 Omni, Kling Image 3.0, Kling Image 3.0 Omni

Key Highlights of the Models

- 3.0 All-in-One: A unified model for multi-modal input and output.
- Most powerful consistency across the universe: Subject consistency (supports cameo, subject with voice control, i2v + subject) and text consistency.
- Narrative control at your fingertips: More freedom, precision, and control, with videos up to 15 seconds long, video scene cuts, ultra-high-definition storyboards/images, and custom durations in seconds.
- Upgraded native audio-visual output: Supports multiple speakers and languages (with accents).

Kling 3.0 Motion Control

- Consistent facial identity from any angle
- Complex emotions faithfully reproduced
- High-fidelity restoration, even with face occlusions
- Consistent facial clarity across dynamic framing

Kling Video 3.0

Compared to 2.6, expected improvements:

- Supports subject upload in I2V scenarios for enhanced consistency
- Significant improvement in multi-character referencing, especially for three-person scenarios
- Supports Japanese, Korean, and Spanish in addition to Chinese and English
- Capable of generating certain dialects and accents
- Better distinction and control over different types of audio (speech, sound effects, BGM)
- Improved text retention in I2V scenarios
- Supports scene transitions, with up to 6 shots and customizable storyboarding

Kling Video 3.0 Omni

Compared to O1, expected improvements:

- Native audio-visual synchronization
- Supports video subject creation
- Further improved consistency in reference-based tasks, especially for characters and products
- Combined capabilities of reference + storyboarding + audio-visual sync significantly enhance usability
- Supports scene transitions, with up to 6 shots
- Extended generation duration up to 15 seconds

Kling Image 3.0

- Highly consistent feature retention
- Precise response to detail modifications
- Accurate control over style and tone
- Rich imaginative capabilities

Kling Image 3.0 Omni

- Enhanced narrative sense
- New storyboard image set generation, retaining reference image features with scene relevance
- Direct output of 2K/4K ultra-high-definition images
- Further improved detail consistency

Thank you for your support and understanding!
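
The capability tables on this page can be encoded as a small lookup, for example to sanity-check a request on the client before submitting it. The following is a minimal sketch for the kling-v2-6 image-to-video rows only. The names `KLING_V2_6_IMAGE_TO_VIDEO`, `column_for`, and `lookup` are hypothetical helpers and not part of the Kling API, and reading a "-" cell as "not listed as supported" is an assumption.

```python
from typing import Optional

# Column order copied from the kling-v2-6 table on this page.
COLUMNS = ("std 5s", "std 10s", "std other", "pro 5s", "pro 10s", "pro other")

YES = "supported"
NO_AUDIO = "supported (only no audio)"

# One tuple per feature row, aligned with COLUMNS.
# None stands for a "-" cell (assumed to mean "not listed as supported").
KLING_V2_6_IMAGE_TO_VIDEO = {
    "video generation": (NO_AUDIO, NO_AUDIO, None, YES, YES, None),
    "start/end frame": (None, None, None, NO_AUDIO, NO_AUDIO, None),
    "voice control": (None, None, None, YES, YES, None),
    "motion control": (None, None, YES, None, None, YES),
}


def column_for(mode: str, duration_s: int) -> str:
    """Map a mode and a duration in seconds to a table column name."""
    if mode not in ("std", "pro"):
        raise ValueError(f"unknown mode: {mode!r}")
    # 5s and 10s have their own columns; everything else is "other duration".
    suffix = {5: "5s", 10: "10s"}.get(duration_s, "other")
    return f"{mode} {suffix}"


def lookup(feature: str, mode: str, duration_s: int) -> Optional[str]:
    """Return the table cell for a feature, or None for a "-" cell."""
    row = KLING_V2_6_IMAGE_TO_VIDEO[feature]  # KeyError for unknown features
    return row[COLUMNS.index(column_for(mode, duration_s))]


if __name__ == "__main__":
    print(lookup("video generation", "std", 5))
    print(lookup("voice control", "pro", 10))
    print(lookup("start/end frame", "std", 5))
```

A lookup like this only mirrors the table as published here; the API itself remains the authority on which combinations are accepted.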