# Kling API field status register

Status: active cross-reference register
Captured: 2026-03-29

## Purpose
Provide a compact cross-document register of the most important field clusters and their current confidence state so future implementation work can avoid mixing:
- doc-derived truth
- live-confirmed truth
- live-rejected paths
- unverified hypotheses

---

## Status legend
- **doc-derived**: seen in Qingque-derived capture or preserved source notes
- **live-confirmed**: observed working in live API runs
- **live-rejected**: observed failing in live API runs
- **hypothesis-only**: plausible but not yet anchored in doc or live confirmation

---

# A. Core video-generation fields

## `model_name`
- status: doc-derived + live-confirmed
- current pipeline policy: only 3.0 / 3.0 Omni models allowed

## `prompt`
- status: doc-derived + live-confirmed

## `duration`
- status: doc-derived + live-confirmed
- current pipeline policy: single-scene ceiling = 15s

## `mode`
- status: doc-derived + live-confirmed
- `std` -> live-confirmed
- `standard` -> live-rejected
- `pro` -> doc-derived quality signal, not yet sufficiently live-profiled in current session

## `aspect_ratio`
- status: doc-derived + live-confirmed
- confirmed example: `16:9`
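
The statuses above can be turned into a small pre-flight check before any billable create call. The following is a minimal sketch, not a validated client: the function name `check_core_fields`, the placeholder model name, and the choice to accept `duration` as either a number or a numeric string are all assumptions made for illustration.

```python
from __future__ import annotations

ALLOWED_MODES = {"std", "pro"}     # `standard` was live-rejected
MAX_SINGLE_SCENE_SECONDS = 15      # pipeline policy ceiling, not a documented API limit


def check_core_fields(payload: dict) -> list[str]:
    """Return human-readable problems found in the core generation fields."""
    problems = []
    for name in ("model_name", "prompt"):
        if not payload.get(name):
            problems.append(f"{name} is missing or empty")

    mode = payload.get("mode")
    if mode is not None and mode not in ALLOWED_MODES:
        problems.append(f"mode={mode!r} is not one of {sorted(ALLOWED_MODES)}")

    duration = payload.get("duration")
    if duration is not None:
        try:
            seconds = float(duration)  # accepts 10 or "10"; the wire type is an assumption
        except (TypeError, ValueError):
            problems.append(f"duration={duration!r} is not numeric")
        else:
            if seconds > MAX_SINGLE_SCENE_SECONDS:
                problems.append(
                    f"duration={duration!r} exceeds the "
                    f"{MAX_SINGLE_SCENE_SECONDS}s single-scene ceiling"
                )
    return problems


candidate = {
    "model_name": "example-model",   # placeholder, not a real model identifier
    "prompt": "a lighthouse at dusk",
    "mode": "standard",              # the live-rejected spelling
    "duration": "20",
    "aspect_ratio": "16:9",
}
for problem in check_core_fields(candidate):
    print(problem)
```

The check does not enforce the 3.0 / 3.0 Omni model policy, because this register does not record the exact model identifiers.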

---

# B. Multi-shot cluster

## `multi_shot`
- status: doc-derived + live-confirmed

## `shot_type`
- status: doc-derived + live-confirmed
- known valid value: `customize`

## `multi_prompt[]`
- status: doc-derived + live-confirmed

### `multi_prompt[].index`
- status: doc-derived + live-confirmed

### `multi_prompt[].prompt`
- status: doc-derived

### `multi_prompt[].duration`
- status: doc-derived + live-confirmed

### known rule
- if `multi_shot=false`, `shot_type` and `multi_prompt` are invalid
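
The known rule above is easy to check mechanically. The sketch below is illustrative only: `check_multi_shot` is a hypothetical helper, and the per-shot key check assumes that every `multi_prompt[]` entry carries the three child fields listed in this cluster, which the register does not state as a requirement.

```python
from __future__ import annotations

SHOT_KEYS = ("index", "prompt", "duration")  # child fields listed in this cluster


def check_multi_shot(payload: dict) -> list[str]:
    """Apply the known multi-shot rule and a basic per-shot shape check."""
    problems = []
    if not payload.get("multi_shot", False):
        # Known rule: these two fields are invalid without multi_shot.
        for name in ("shot_type", "multi_prompt"):
            if name in payload:
                problems.append(f"{name} is invalid when multi_shot is false")
        return problems

    # Assumption: each shot should carry index, prompt, and duration.
    for position, shot in enumerate(payload.get("multi_prompt", [])):
        for key in SHOT_KEYS:
            if key not in shot:
                problems.append(f"multi_prompt[{position}].{key} is missing")
    return problems


print(check_multi_shot({"multi_shot": False, "shot_type": "customize"}))
```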

---

# C. Image reference cluster (Omni)

## `image_list[]`
- status: doc-derived + live-confirmed

### `image_list[].image_url`
- status: doc-derived + live-confirmed
- note: supports base64 or URL according to SOT-derived text

### `image_list[].type`
- status: doc-derived + live-confirmed
- known values: `first_frame`, `end_frame`

## live-rejected legacy path
### `image_list[].image`
- status: live-rejected
- should be treated as invalid legacy assumption for Omni

### `image_list[].image + index`
- status: live-rejected

## separate weaker path
### top-level `image=<base64>` on Omni
- status: live-confirmed (task creation and completion can succeed)
- caution: weak or insufficient reference adherence in observed tests
- use with care; not the preferred strong-reference path
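
To keep the live-rejected legacy shape out of new payloads, entries can be built through one helper and scanned for legacy keys. This is a hedged sketch: `image_entry` and `find_legacy_keys` are hypothetical names, and `type` is treated as optional only because the register does not say whether it is required.

```python
from __future__ import annotations

from typing import Optional

KNOWN_IMAGE_TYPES = {"first_frame", "end_frame"}
LEGACY_KEYS = {"image", "index"}  # live-rejected inside image_list[] on Omni


def image_entry(image_url: str, image_type: Optional[str] = None) -> dict:
    """Build one image_list[] entry in the live-confirmed shape."""
    entry = {"image_url": image_url}
    if image_type is not None:
        if image_type not in KNOWN_IMAGE_TYPES:
            raise ValueError(f"unknown image type: {image_type!r}")
        entry["type"] = image_type
    return entry


def find_legacy_keys(image_list: list[dict]) -> list[str]:
    """List every live-rejected key still present in an image_list."""
    found = []
    for position, entry in enumerate(image_list):
        for key in sorted(entry.keys() & LEGACY_KEYS):
            found.append(f"image_list[{position}].{key}")
    return found


print(image_entry("https://example.com/first.png", "first_frame"))
print(find_legacy_keys([{"image": "<base64>", "index": 0}]))
```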

---

# D. Video reference cluster (Omni)

## `video_list[]`
- status: doc-derived + partially live-confirmed

### `video_list[].video_url`
- status: doc-derived + partially live-confirmed
- base64 video -> live-rejected (`Video URL is invalid`)
- remote URL -> live-confirmed create/query success

### `video_list[].refer_type`
- status: doc-derived
- observed value: `base`

### `video_list[].keep_original_sound`
- status: doc-derived
- observed value: `yes`
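
A builder that refuses anything other than a remote URL keeps the live-rejected base64 path from reaching the API. The sketch below is an assumption-laden illustration: `video_entry` is a hypothetical helper, and its defaults are simply the observed values `base` and `yes`, not documented defaults.

```python
from urllib.parse import urlparse


def video_entry(video_url: str,
                refer_type: str = "base",
                keep_original_sound: str = "yes") -> dict:
    """Build one video_list[] entry, accepting only remote http(s) URLs."""
    parsed = urlparse(video_url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        # base64 payloads were live-rejected with "Video URL is invalid"
        raise ValueError("video_url must be a remote http(s) URL")
    return {
        "video_url": video_url,
        "refer_type": refer_type,                    # only `base` has been observed
        "keep_original_sound": keep_original_sound,  # only `yes` has been observed
    }


print(video_entry("https://example.com/clip.mp4"))
```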

## old invalid method
### top-level improvised `video_url`
- status: methodology rejected
- not SOT-grounded and should not be used as validated contract proof

---

# E. Element cluster

## `element_list[]`
- status: doc-derived

### `element_list[].element_id`
- status: doc-derived
- downstream attachment structure observed directly in captured material
- upstream element-create/query contract is now materially transcribed from preserved artifact extraction
- preserved deep HTML now also exposes child-attribute blocks confirming:
  - Omni-Image: `element_id` = `long`, required, `Element ID from element library`
  - Image Generation: `element_id` = `long`, required, `Element ID`
- remaining weakness: no evidence of any richer nested element child object beyond this single child field

## General / Element API surface
- Create Element -> doc-derived section confirmed
  - endpoint: `POST /v1/general/advanced-custom-elements`
  - exact create fields now preserved from artifact extraction:
    - `element_name`
    - `element_description`
    - `reference_type`
    - `element_image_list`
    - `element_video_list`
    - `element_voice_id`
    - `tag_list`
    - `callback_url`
    - `external_task_id`
- Create Multi-Image Elements -> doc-derived invocation example confirmed
  - `reference_type=image_refer`
  - `element_image_list.frontal_image`
  - `element_image_list.refer_images[].image_url`
- Create Video Character Elements -> doc-derived invocation example confirmed
  - `reference_type=video_refer`
  - `element_video_list.refer_videos[].video_url`
- Query Custom Element (Single) -> doc-derived section confirmed
  - endpoint: `GET /v1/general/advanced-custom-elements/{id}`
  - preserved response includes `task_result.elements[]` with:
    - `element_id`
    - `element_name`
    - `element_description`
    - `reference_type`
    - `element_image_list`
    - `element_video_list`
    - `element_voice_info`
    - `tag_list`
    - `owned_by`
    - `status`
- Query Custom Element (List) -> doc-derived section confirmed
  - endpoint: `GET /v1/general/advanced-custom-elements`
  - query params: `pageNum`, `pageSize`
- Query Presets Element (List) -> doc-derived section confirmed
  - endpoint: `GET /v1/general/advanced-presets-elements`
  - query params: `pageNum`, `pageSize`
- Delete Custom Element -> doc-derived section confirmed
  - endpoint: `POST /v1/general/delete-elements`
  - delete body field: `element_id`
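
The endpoint list above can be kept as data so implementation code does not retype paths. The following sketch only assembles request descriptions and a Multi-Image create body; it sends nothing. The names `ELEMENT_ROUTES` and `multi_image_element_body` are hypothetical, and which create fields are required is not established here, so the body includes only the fields shown in the preserved invocation example plus name and description.

```python
from __future__ import annotations

# (method, path) pairs as transcribed in this register; all are doc-derived.
ELEMENT_ROUTES = {
    "create": ("POST", "/v1/general/advanced-custom-elements"),
    "query_single": ("GET", "/v1/general/advanced-custom-elements/{id}"),
    "query_list": ("GET", "/v1/general/advanced-custom-elements"),
    "query_presets": ("GET", "/v1/general/advanced-presets-elements"),
    "delete": ("POST", "/v1/general/delete-elements"),
}


def multi_image_element_body(name: str, description: str,
                             frontal_image: str,
                             refer_image_urls: list[str]) -> dict:
    """Assemble a Create Multi-Image Elements body (shape is doc-derived only)."""
    return {
        "element_name": name,
        "element_description": description,
        "reference_type": "image_refer",
        "element_image_list": {
            "frontal_image": frontal_image,
            "refer_images": [{"image_url": url} for url in refer_image_urls],
        },
    }


method, path = ELEMENT_ROUTES["create"]
body = multi_image_element_body(
    "example element", "illustrative description",   # placeholder values
    "https://example.com/front.png",
    ["https://example.com/side.png"],
)
print(method, path, body)
```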

## current gap
- nested child-field coverage is now stronger for:
  - `element_image_list.frontal_image`
  - `element_image_list.refer_images[].image_url`
  - `element_video_list.refer_videos[].video_url`
  - `tag_list[].tag_id`
  - `element_voice_info.{voice_id,voice_name,trial_url,owned_by}`
- these child fields are still recovered from structure/example/response-shape evidence rather than from preserved standalone child-row tables everywhere
- single-query path semantics remain doc-derived but slightly internally inconsistent (`{id}` route vs listed `task_id` / `external_task_id` names)

---

# F. Audio / voice cluster

## standalone audio endpoints
- `POST /v1/audio/tts` -> doc-derived
- `POST /v1/audio/text-to-audio` -> doc-derived
- `POST /v1/audio/video-to-audio` -> doc-derived
- `POST /v1/general/custom-voices` -> doc-derived
- `GET /v1/general/presets-voices` -> doc-derived

## `voice_id`
- status: doc-derived
- confirmed on TTS and custom-voice result/list surfaces

## `voice_language`
- status: doc-derived
- confirmed on TTS create surface
- observed enum: `zh|en`

## `voice_speed`
- status: doc-derived
- confirmed on TTS create surface
- range: `[0.8, 2.0]`
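
The `voice_language` enum and `voice_speed` range recorded for the TTS create surface can be checked locally before a request is built. This is a small sketch; `check_tts_voice_params` is a hypothetical helper, and both limits are doc-derived rather than live-confirmed.

```python
from __future__ import annotations

TTS_LANGUAGES = {"zh", "en"}         # observed enum, doc-derived
TTS_SPEED_RANGE = (0.8, 2.0)         # inclusive range, doc-derived


def check_tts_voice_params(voice_language: str, voice_speed: float) -> list[str]:
    """Check TTS voice parameters against the doc-derived limits."""
    problems = []
    if voice_language not in TTS_LANGUAGES:
        problems.append(f"voice_language={voice_language!r} is not in {sorted(TTS_LANGUAGES)}")
    low, high = TTS_SPEED_RANGE
    if not low <= voice_speed <= high:
        problems.append(f"voice_speed={voice_speed} is outside [{low}, {high}]")
    return problems


print(check_tts_voice_params("en", 1.0))
```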

## `sound_effect_prompt`
- status: doc-derived
- confirmed on Video-to-Audio create surface

## `bgm_prompt`
- status: doc-derived
- confirmed on Video-to-Audio create surface

## `asmr_mode`
- status: doc-derived
- confirmed on Video-to-Audio create surface
- default `false`

## `voice_name`
- status: doc-derived
- confirmed on Custom Voice create/result surfaces

## `voice_url`
- status: doc-derived
- confirmed on Custom Voice create surface

## `voice_list`
- status: doc-derived
- exact request-body row preserved on Image-to-Video
- shape preserved as:
  ```json
  "voice_list": [
    {"voice_id":"voice_id_1"},
    {"voice_id":"voice_id_2"}
  ]
  ```
- constraints preserved:
  - up to 2 voices
  - billed as specified-voice generation when prompt references the voice ID
  - `voice_id` must come from Custom Voices or Presets Voices, not Lip-Sync voice IDs
  - `element_list` and `voice_list` are mutually exclusive on Image-to-Video
- Text-to-Video parity is prompt-note-confirmed but row-level extraction is still weaker
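
The preserved `voice_list` constraints translate directly into a payload check. The sketch below covers only the constraints that can be verified locally (count, shape, mutual exclusion); it cannot tell whether a `voice_id` comes from Custom Voices, Presets Voices, or Lip-Sync. `check_voice_list` is a hypothetical helper name.

```python
from __future__ import annotations

MAX_VOICES = 2  # "up to 2 voices"


def check_voice_list(payload: dict) -> list[str]:
    """Check an Image-to-Video payload against the preserved voice_list constraints."""
    problems = []
    voice_list = payload.get("voice_list")
    if voice_list is None:
        return problems
    if len(voice_list) > MAX_VOICES:
        problems.append(f"voice_list has {len(voice_list)} entries; at most {MAX_VOICES} allowed")
    for position, entry in enumerate(voice_list):
        if not entry.get("voice_id"):
            problems.append(f"voice_list[{position}].voice_id is missing")
    if payload.get("element_list"):
        problems.append("element_list and voice_list are mutually exclusive on Image-to-Video")
    return problems


print(check_voice_list({"voice_list": [{"voice_id": "voice_id_1"}, {"voice_id": "voice_id_2"}]}))
```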

## `<<<voice_1>>>`-style voice tags
- status: doc-derived prompt-level guidance
- preserved on Text-to-Video prompt notes and Image-to-Video prompt notes/examples
- current strongest request-body evidence:
  - speaker tag is embedded inside `prompt`
  - Image-to-Video invocation example shows `prompt` + sibling `voice_list` + `sound: "on"`
- exact dedicated standalone field does not exist in preserved evidence; current evidence points to prompt-embedded placement
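
The prompt-embedded placement described above can be sketched as a payload builder. Two points in this sketch are assumptions, not preserved facts: that `<<<voice_N>>>` refers to the N-th entry of `voice_list`, and the wording of the example prompt. `build_voiced_payload` is a hypothetical helper.

```python
from __future__ import annotations

import re

VOICE_TAG = re.compile(r"<<<voice_(\d+)>>>")


def build_voiced_payload(prompt: str, voice_ids: list[str]) -> dict:
    """Pair a tagged prompt with a sibling voice_list and sound switched on."""
    if len(voice_ids) > 2:
        raise ValueError("voice_list supports up to 2 voices")
    # Assumption: tag number N points at the N-th voice_list entry (1-based).
    referenced = {int(number) for number in VOICE_TAG.findall(prompt)}
    out_of_range = sorted(n for n in referenced if n < 1 or n > len(voice_ids))
    if out_of_range:
        raise ValueError(f"prompt references voices with no voice_list entry: {out_of_range}")
    return {
        "prompt": prompt,                                    # tags stay inside the prompt
        "voice_list": [{"voice_id": v} for v in voice_ids],  # sibling of prompt
        "sound": "on",                                       # required when specifying voice
    }


# Illustrative prompt wording only; tag placement rules are not fully extracted.
print(build_voiced_payload("The narrator <<<voice_1>>> says: welcome back", ["voice_id_1"]))
```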

## native audio / lip sync
- status: split
- stronger than blog-only because preserved Text-to-Video/Image-to-Video prompt notes explicitly require `sound=on` when specifying voice, and preserved Image-to-Video rows expose `voice_list`
- still incomplete because no separate fully closed field cluster for lip-sync/native-audio behavior has been extracted beyond prompt tags, `voice_list`, and `sound`

## `keep_original_sound`
- status: doc-derived inside `video_list[]`
- exact runtime semantics not yet live-confirmed

---

# G. Quality / mode cluster

## Professional Mode / 1080p higher-quality output
- status: doc-derived hint from Qingque capture
- exact runtime mapping still needs more systematic live profiling

## current practical interpretation
- `pro` is likely the premium-quality path
- should be prioritized later for high-quality one-scene testing once the core contract is locked tightly enough

---

# H. Legacy assumptions to avoid
- `image_list[].image` for Omni
- `image_list[].image + index` for Omni
- top-level guessed `video_url=<base64 video>` as if it were validated video-reference support
- multi-clip chaining as the default answer for one scene

---

# I. Most important remaining unknowns
1. exact Text-to-Video `voice_list` row parity and broader audio-binding parity across generation surfaces
2. exact interaction rules between element-bound voice and generation-time `voice_list`, if supported
3. exact `pro`-mode runtime behavior and cost profile in intended tasks
4. capability-map-specific support ranges that are referenced but not fully exposed in preserved field tables
5. deeper nested child-row tables for camera / mask / element subobjects where preserved artifacts only expose structure/examples

---

## Use of this register
When implementing or testing any new payload:
1. check this register first
2. if a field is only doc-derived, treat it as SOT-grounded but not yet live-verified, and use it with care
3. if a field is live-rejected, do not reuse it casually
4. if a field is hypothesis-only, do not send it to a billable create endpoint without explicit approval
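
The four steps above amount to a status lookup followed by a gate. A minimal sketch follows, assuming a hand-maintained `REGISTER` mapping and a hypothetical `decide` function; treating live-rejected fields as a hard block is a stricter reading of step 3 than the text requires, and fields absent from the mapping are assumed to be hypothesis-only.

```python
from __future__ import annotations

from enum import Enum


class Status(Enum):
    DOC_DERIVED = "doc-derived"
    LIVE_CONFIRMED = "live-confirmed"
    LIVE_REJECTED = "live-rejected"
    HYPOTHESIS_ONLY = "hypothesis-only"


# A few entries copied from this register; extend as the register grows.
REGISTER: dict[str, set[Status]] = {
    "model_name": {Status.DOC_DERIVED, Status.LIVE_CONFIRMED},
    "image_list[].image": {Status.LIVE_REJECTED},
    "element_list[]": {Status.DOC_DERIVED},
}


def decide(field: str, approved: bool = False) -> str:
    """Map a field's recorded statuses to a send decision."""
    # Assumption: a field missing from the register is hypothesis-only.
    statuses = REGISTER.get(field, {Status.HYPOTHESIS_ONLY})
    if Status.LIVE_REJECTED in statuses:
        return "block"
    if Status.LIVE_CONFIRMED in statuses:
        return "send"
    if Status.DOC_DERIVED in statuses:
        return "send-with-care"
    return "send-with-care" if approved else "needs-approval"


for name in ("model_name", "image_list[].image", "element_list[]", "some_guessed_field"):
    print(name, "->", decide(name))
```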


## 2026-03-29 newly extracted Omni row constraints
- `sound` is doc-derived with enum `on|off`; when reference video exists, SOT says `sound` can only be `off`
- `mode` is doc-derived with default `pro`, enum `std|pro`, and explicit 720P/1080P meanings
- `video_list[].video_url` is doc-derived as a single uploaded video with strict constraints:
  - 1 video only
  - `.mp4/.mov`
  - duration 3~10s
  - resolution 720px~2160px inclusive on both axes
  - frame rates 24~60 fps, output 24 fps
  - max 200MB
- `image_list[].image_url` has doc-derived constraints:
  - formats `.jpg/.jpeg/.png`
  - max 10MB
  - width/height >= 300px
  - aspect ratio between 1:2.5 and 2.5:1
- `aspect_ratio` becomes required when first-frame reference or video-editing features are not used
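
The row constraints above can be checked against media metadata before upload. This sketch assumes the metadata has already been probed by some other tool and is passed in as a `MediaInfo` record (a hypothetical type); it also assumes decimal megabytes, which is the stricter reading of the size limits. The `sound` check only inspects an explicit value, since the default is not recorded here.

```python
from __future__ import annotations

from dataclasses import dataclass

MB = 1_000_000  # assumption: decimal megabytes, the stricter reading


@dataclass
class MediaInfo:
    extension: str           # lowercase, with leading dot
    width: int
    height: int
    size_bytes: int
    duration_s: float = 0.0  # videos only
    fps: float = 0.0         # videos only


def check_reference_video(v: MediaInfo) -> list[str]:
    problems = []
    if v.extension not in (".mp4", ".mov"):
        problems.append("video must be .mp4 or .mov")
    if not 3 <= v.duration_s <= 10:
        problems.append("video duration must be 3 to 10 s")
    if not all(720 <= side <= 2160 for side in (v.width, v.height)):
        problems.append("video width and height must each be 720 to 2160 px")
    if not 24 <= v.fps <= 60:
        problems.append("video frame rate must be 24 to 60 fps")
    if v.size_bytes > 200 * MB:
        problems.append("video must be at most 200MB")
    return problems


def check_reference_image(i: MediaInfo) -> list[str]:
    problems = []
    if i.extension not in (".jpg", ".jpeg", ".png"):
        problems.append("image must be .jpg, .jpeg, or .png")
    if i.size_bytes > 10 * MB:
        problems.append("image must be at most 10MB")
    if min(i.width, i.height) < 300:
        problems.append("image width and height must each be at least 300 px")
    # Aspect ratio between 1:2.5 and 2.5:1, written without division.
    if i.width > 2.5 * i.height or i.height > 2.5 * i.width:
        problems.append("image aspect ratio must be between 1:2.5 and 2.5:1")
    return problems


def check_sound(payload: dict) -> list[str]:
    # Only an explicit non-off value is flagged; the default is not recorded.
    if payload.get("video_list") and payload.get("sound") not in (None, "off"):
        return ["sound must be off when a reference video is present"]
    return []


print(check_reference_video(MediaInfo(".mp4", 1920, 1080, 50 * MB, duration_s=5, fps=30)))
```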

## 2026-03-29 extraction-confidence refinement
- Omni-Video Create row extraction is now supported by both preserved `page.txt` and preserved rendered `page.html`
- Text-to-Video Create exact rows are recoverable from preserved deep-page text extraction
- Image-to-Video Create exact rows are recoverable from preserved deep-page text extraction
- Text-to-Video Query Single/List exact path/query/response rows are recoverable from preserved deep-page text extraction
- Image-to-Video Query Single/List exact path/query/response rows are recoverable from preserved deep-page text extraction
- remaining uncertainty for Text/Image is now mostly about nested child-row polish and the slight single-query placeholder inconsistency (`{id}` section title vs `{task_id}` cURL example)
