# Kling field extraction gap log

Status: refinement-only residuals after documentation closeout
Captured: 2026-03-29
Updated: 2026-03-31

## Purpose
Track exactly which field-level details are still missing from the current Qingque-derived documentation, so that remaining extraction work can proceed field by field rather than ad hoc.

---

# 1. Tier 1 gaps

## Omni-Video
### Already strong
- endpoint existence
- create path
- query single/list section existence
- `multi_prompt[]`
- `image_list[]`
- `video_list[]`
- `element_list[]`
- `mode`, `duration`, `aspect_ratio` working knowledge

### Still missing or not fully explicit
- any additional optional create fields beyond those already identified
- only minor nested child-row polish beyond the now-recovered core query/create coverage
- single-query placeholder naming is slightly inconsistent in the preserved docs (`{id}` in the section title vs `{task_id}` in the cURL example); the canonical interpretation is fixed: treat `{id}` as a generic path placeholder and `task_id` / `external_task_id` as the practical selectors
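
A minimal sketch of that canonical interpretation, assuming the selector is substituted directly into the path. The route template is hypothetical (this log does not record the real single-query path), the rule that only one selector may be passed is an assumption, and both function names are illustrative.

```python
from typing import Optional
from urllib.parse import quote


def resolve_selector(task_id: Optional[str] = None,
                     external_task_id: Optional[str] = None) -> str:
    """Pick the value that fills the path placeholder.

    Assumption: exactly one selector is supplied per query.
    """
    if task_id and external_task_id:
        raise ValueError("pass only one of task_id / external_task_id")
    selector = task_id or external_task_id
    if not selector:
        raise ValueError("one of task_id / external_task_id is required")
    return selector


def single_query_path(template: str,
                      task_id: Optional[str] = None,
                      external_task_id: Optional[str] = None) -> str:
    """Fill either placeholder spelling found in the preserved docs."""
    value = quote(resolve_selector(task_id, external_task_id), safe="")
    # The preserved docs use both spellings for the same path slot.
    for placeholder in ("{id}", "{task_id}"):
        if placeholder in template:
            return template.replace(placeholder, value)
    raise ValueError("template has no {id} or {task_id} placeholder")
```

Both `"/hypothetical/tasks/{id}"` and `"/hypothetical/tasks/{task_id}"` resolve to the same path here, which is the point of treating the placeholder name as cosmetic.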

## Text to Video
### Already strong
- create/query section existence
- text-to-video multi-shot section existence
- exact create-body field table rows
- core create-side enums / defaults / conditions for:
  - `model_name`
  - `multi_shot`
  - `shot_type`
  - `prompt`
  - `multi_prompt`
  - `negative_prompt`
  - `sound`
  - `cfg_scale`
  - `mode`
  - `camera_control`
  - `aspect_ratio`
  - `duration`
  - `watermark_info`
  - `callback_url`
  - `external_task_id`
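
The recovered field names above can be held as an allowlist for checking draft payloads during extraction work. This is a sketch only: the set reflects what this log lists as recovered, not a confirmed exhaustive schema, and `unrecognized_fields` is an illustrative helper name.

```python
# Create-side field names recovered for Text to Video, as listed in this log.
TEXT_TO_VIDEO_CREATE_FIELDS = frozenset({
    "model_name",
    "multi_shot",
    "shot_type",
    "prompt",
    "multi_prompt",
    "negative_prompt",
    "sound",
    "cfg_scale",
    "mode",
    "camera_control",
    "aspect_ratio",
    "duration",
    "watermark_info",
    "callback_url",
    "external_task_id",
})


def unrecognized_fields(payload: dict) -> list[str]:
    """Return top-level keys that are not in the recovered field set.

    A flagged key is not necessarily invalid; it only means this log
    has no recovered row for it yet.
    """
    return sorted(key for key in payload if key not in TEXT_TO_VIDEO_CREATE_FIELDS)
```

For example, a draft body carrying `image` would be flagged, since that row belongs to the Image to Video family in this log.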

### Still missing
- only minor nested child-row polish beyond the current create/query summary where needed
- path-placeholder naming is slightly inconsistent in the preserved docs (`{id}` in the section title vs `{task_id}` in the cURL example); the canonical interpretation is fixed: treat `{id}` as a generic path placeholder and `task_id` / `external_task_id` as the practical selectors

## Image to Video
### Already strong
- create/query section existence
- image-to-video with multi-shot section existence
- image-to-video with voice of element section existence
- exact create-body field table rows
- exact create-side voice-related surface existence and key rules for `voice_list`
- exact create-side rows for:
  - `image`
  - `image_tail`
  - `multi_shot`
  - `shot_type`
  - `prompt`
  - `multi_prompt`
  - `negative_prompt`
  - `element_list`
  - `voice_list`
  - `sound`
  - `cfg_scale`
  - `mode`
  - `static_mask`
  - `dynamic_masks`
  - `camera_control`
  - `duration`
  - `watermark_info`
  - `callback_url`
  - `external_task_id`

### Still missing
- only minor child-row polish still needed for nested mask / camera / voice structures
- path-placeholder naming is slightly inconsistent in the preserved docs (`{id}` in the section title vs `{task_id}` in the cURL example); the canonical interpretation is fixed: treat `{id}` as a generic path placeholder and `task_id` / `external_task_id` as the practical selectors

---

# 2. Tier 2 gaps

## General - Create Element
### Already strong
- exact endpoint path: `POST /v1/general/advanced-custom-elements`
- exact top-level request-body rows
- exact create response-body shape
- exact differentiator field: `reference_type`
- exact branch objects:
  - `element_image_list` for `image_refer`
  - `element_video_list` for `video_refer`
- optional voice/tag/callback/task-id fields
- downstream `element_list[].element_id`

### Still missing / still weaker
- dedicated child-row tables for `element_image_list` / `element_video_list` / `tag_list` are not preserved as standalone tables; however, the child fields are already source-grounded strongly enough for implementation use through structure blocks, examples, and preserved child-attribute snippets
- runtime-confirmed semantics for the create->query handoff to obtain the final `element_id`
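
The `reference_type` branching can be sketched as a pre-flight check on a draft Create Element body. Assumptions, none confirmed by the preserved docs: that `image_refer` and `video_refer` are the only accepted values, and that the two branch objects are mutually exclusive. Child contents are treated as opaque, and `check_reference_branch` is an illustrative name.

```python
# reference_type value -> branch object expected alongside it (from this log).
BRANCH_FIELD = {
    "image_refer": "element_image_list",
    "video_refer": "element_video_list",
}


def check_reference_branch(body: dict) -> list[str]:
    """Return a list of problems; an empty list means the branch looks consistent."""
    reference_type = body.get("reference_type")
    if reference_type not in BRANCH_FIELD:
        return [f"reference_type should be one of {sorted(BRANCH_FIELD)}, got {reference_type!r}"]

    problems = []
    expected = BRANCH_FIELD[reference_type]
    if expected not in body:
        problems.append(f"{expected} is missing for reference_type={reference_type}")
    # Assumption: the branch object for the other reference_type should be absent.
    for other_type, other_field in BRANCH_FIELD.items():
        if other_field != expected and other_field in body:
            problems.append(f"{other_field} belongs to {other_type}, not {reference_type}")
    return problems
```

The check deliberately stops at the top level, since dedicated child-row tables are not preserved for the branch objects.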

## Query / List / Delete element sections
### Already strong
- exact single/list/presets/delete endpoint paths
- exact list query params: `pageNum`, `pageSize`
- exact delete body field: `element_id`
- exact response metadata family:
  - `element_id`
  - `element_name`
  - `element_description`
  - `reference_type`
  - `element_image_list`
  - `element_video_list`
  - `element_voice_info`
  - `tag_list`
  - `owned_by`
  - `status`

### Still missing / still weaker
- only minor child-field table polish remains beyond the currently recovered `element_voice_info` child fields and top-level element metadata cluster
- path-parameter naming is slightly inconsistent in the preserved docs for single query (`{id}` in the route vs `task_id` / `external_task_id` in the listed parameter names); the canonical interpretation is fixed: treat `{id}` as a generic path placeholder and `task_id` / `external_task_id` as the practical selectors
- no live non-billable query example against real created elements in this pass

---

# 3. Tier 3 gaps

## Omni-Image / Image Generation
### Already strong
- create/query section existence
- image generation with element example existence
- exact create-body rows recovered for both families
- exact query single/list response rows recovered for both families
- quality/resolution/result-shape rows recovered

### Still missing
- full confirmation of model-version-specific support ranges referenced by “Capability Map” notes
- no additional element-specific nested create fields were found beyond `element_list[].element_id`; current blocker is now narrowed to absence of further preserved child rows rather than unsearched material
- live validation of which image models truly support `4k`, `series`, and element attachment in practice

---

# 4. Audio / voice gaps

### Already strong
- standalone audio endpoints now field-level visible from preserved artifacts:
  - TTS (`/v1/audio/tts`)
  - Text to Audio (`/v1/audio/text-to-audio`)
  - Video to Audio (`/v1/audio/video-to-audio`)
  - Custom Voice / Presets Voice / Delete Voice (`/v1/general/custom-voices`, `/v1/general/presets-voices`, `/v1/general/delete-voices`)
- exact request/result rows now recoverable for:
  - `voice_id`, `voice_language`, `voice_speed`
  - `prompt`, `duration`
  - `video_id`, `video_url`, `sound_effect_prompt`, `bgm_prompt`, `asmr_mode`
  - `voice_name`, `voice_url`
- preserved workflow + field evidence now supports:
  - Elements 3.0 voice binding via `element_voice_id` on Create Element surfaces
  - voice tags
  - `voice_list`
  - native lip sync
  - multilingual dialogue
  - ambient sound control

### Newly closed or partially closed
- `voice_list` exact shape is now preserved from Image-to-Video:
  ```json
  "voice_list": [
    {"voice_id":"voice_id_1"},
    {"voice_id":"voice_id_2"}
  ]
  ```
- exact request-body location for `<<<voice_1>>>`-style speaker binding is now partially closed:
  - placement is inside the generation `prompt`
  - direct preserved Image-to-Video invocation example shows `prompt` + sibling `voice_list` + `sound: "on"`
- preserved docs now tie Elements 3.0 to audio binding at the element layer through `element_voice_id`

### Still missing from field-level reference
- exact audio input/binding field names for native Series 3 video-generation payloads beyond the currently preserved `voice_list` + prompt-tag pattern
- exact Text-to-Video `voice_list` request-body row with the same extraction strength as Image-to-Video
- exact native-audio generation contract boundaries inside video/image generation requests beyond current `sound`/prompt/voice references
- any preserved API-table row showing whether `element_voice_id`-bound elements can coexist with generation-time `voice_list` in Series 3 create payloads
- exact dedicated request fields, if any, for speech vs ambient/BGM separation inside Series 3 native generation payloads

### Current interpretation boundary
- the strongest field-level grounded native speaker-binding pattern is now:
  - speaker tag inside `prompt`
  - sibling `voice_list`
  - `sound: "on"`
- this pattern is strongest on Image-to-Video and only partially parity-closed on Text-to-Video
- the remaining audio gaps are no longer just extraction gaps; they are partly runtime-contract questions that require live verification
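
A minimal sketch of the speaker-binding pattern above, building only the three grounded pieces (`prompt`, `voice_list`, `sound: "on"`). The rule that `<<<voice_N>>>` maps to the N-th `voice_list` entry (1-based) is an assumption inferred from the preserved example, not a documented contract, and `speaker_binding_fragment` is an illustrative name.

```python
import re

# Matches prompt tags of the form <<<voice_1>>>, <<<voice_2>>>, ...
VOICE_TAG = re.compile(r"<<<voice_(\d+)>>>")


def speaker_binding_fragment(prompt: str, voice_ids: list[str]) -> dict:
    """Build the prompt / voice_list / sound portion of a create body.

    Assumption: tag index N refers to the N-th voice_list entry (1-based).
    """
    referenced = {int(n) for n in VOICE_TAG.findall(prompt)}
    out_of_range = sorted(n for n in referenced if not 1 <= n <= len(voice_ids))
    if out_of_range:
        raise ValueError(f"prompt references voice tags with no voice_list entry: {out_of_range}")
    return {
        "prompt": prompt,
        "voice_list": [{"voice_id": voice_id} for voice_id in voice_ids],
        "sound": "on",
    }
```

The fragment would be merged into a full Image-to-Video create body; Text-to-Video parity for `voice_list` is still only partially closed, so the same fragment should not be assumed valid there.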

---

# 5. Quality / mode gaps

### Already strong
- Qingque-derived Professional Mode / 1080p / higher-quality hint
- `std` live-confirmed as a `mode` value

### Still missing
- exact documented enum/value rows for `mode`
- exact resolution field names, if any
- exact cost/quality controls at API-body level

---

# 6. Why this log matters
This log now exists to track refinement-only residuals after the current documentation closeout.
It should not be read as evidence that implementation-blocking API structure is still missing.
A field only needs to be escalated out of this log when it materially affects implementation safety, verification safety, or current-production guidance.


## 2026-03-29 extraction-method note
A direct raw pass over the stored Qingque `page.html` reconfirmed the existence of Tier 1 Text/Image sections and invocation examples, but did not cleanly expose the exact create-body field rows through naive string search alone.
This means the remaining Tier 1 gaps are not merely “unread”; they are tied to extraction-method limitations in the currently preserved raw HTML view.
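
One common way naive string search misses rendered rows is that a field name is split across inline tags in the markup. Whether that is the cause in the stored `page.html` is not confirmed here. The sketch below, using only the standard library, searches the visible text instead of the raw markup; `TextCollector` and `visible_text` are illustrative names.

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Collect text nodes, skipping <script> and <style> content."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.parts: list[str] = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def visible_text(html: str) -> str:
    """Return the concatenated text nodes of an HTML string."""
    collector = TextCollector()
    collector.feed(html)
    collector.close()
    # Joined without separators so names split across tags are reassembled.
    return "".join(collector.parts)
```

With `'<td><span>negative_</span><span>prompt</span></td>'`, a raw search for `negative_prompt` fails while the same search over `visible_text(...)` succeeds. Joining without separators also runs adjacent cells together, so this suits presence checks better than table reconstruction.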


## 2026-03-29 network-artifact note
Inspection of `samples/kling-api/qingque-deep/requests.json` revealed that the original Qingque session included meaningful XHR/document-loading endpoints such as:
- `https://docs.qingque.cn/word/api/load/...`
- `https://docs.qingque.cn/merlot/api/docs/cosmo/meta/...`

Interpretation:
- the preserved `page.html` does contain fully rendered Omni field rows, so rendered-HTML extraction is a valid method in principle
- however, Text/Image create-body rows still were not cleanly recoverable from the currently preserved artifacts during this pass
- a better future extraction path may still depend on reproducing or re-fetching the underlying content-loading data rather than only scanning the current preserved page bundle
- this is now an extraction-method insight, not merely a content gap
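
A sketch for pulling the content-loading URLs back out of the captured request log. The internal structure of `requests.json` is not recorded in this log, so the walk below makes no assumption about it and simply visits every string in the parsed JSON; the two prefixes are the ones listed above, and the function names are illustrative.

```python
import json
from typing import Any, Iterator

# Prefixes of the content-loading endpoints noted above (tails are elided in the log).
LOADER_PREFIXES = (
    "https://docs.qingque.cn/word/api/load/",
    "https://docs.qingque.cn/merlot/api/docs/cosmo/meta/",
)


def iter_strings(node: Any) -> Iterator[str]:
    """Yield every string value found anywhere in a parsed JSON structure."""
    if isinstance(node, str):
        yield node
    elif isinstance(node, dict):
        for value in node.values():
            yield from iter_strings(value)
    elif isinstance(node, list):
        for value in node:
            yield from iter_strings(value)


def loader_urls(captured: Any) -> list[str]:
    """Return the distinct content-loading URLs, sorted."""
    return sorted({s for s in iter_strings(captured) if s.startswith(LOADER_PREFIXES)})


def loader_urls_from_file(path: str) -> list[str]:
    """Convenience wrapper: parse a JSON file and extract the loader URLs."""
    with open(path, encoding="utf-8") as handle:
        return loader_urls(json.load(handle))
```

Calling `loader_urls_from_file("samples/kling-api/qingque-deep/requests.json")` would give the candidate endpoints to re-fetch, if that path is pursued.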

## 2026-03-29 closeout correction after preserved-artifact comparison
The blocker described in the extraction-method and network-artifact notes above is now obsolete.
A fuller preserved-artifact comparison shows:
- deep-page text extracts do provide exact Text-to-Video and Image-to-Video create/query rows at the current field-level target
- deep-extra text extracts do provide the Tier 2 element top-level create/query/list/delete rows and Tier 3 image-generation create/query rows needed for current scope
- the remaining gaps are refinement-only: minor nested child-row polish, capability-map support ranges, and later live validation