# Kling priority TODO

Status: strict-closeout pass active
Captured: 2026-03-29
Updated: 2026-03-30

## Priority order
1. **Documentation first**
2. **Code sync second**
3. **Expanded Phase 3 verification only after docs/code closeout is understood correctly**

This ordering is intentional and should not be reversed casually.

---

# Phase 1 — Documentation first (top priority)

## Goal
Finish the Qingque-derived API reference at field level so implementation no longer depends on partial memory, vague summaries, or guessed payload shapes.

## Phase 1 closeout status
Status: **top-level field-level documentation closeout achieved; strict residual pass still open**

Closed during the exact-gap rounds:
- [x] Extract exact Text to Video create-body field rows
- [x] Extract exact Image to Video create-body field rows
- [x] Recover exact Text to Video query single/list path/query/response rows
- [x] Recover exact Image to Video query single/list path/query/response rows
- [x] Extract Omni-Video query response envelopes needed for current use
- [x] Extract exact General - Create Element request-body rows
- [x] Extract exact Create Multi-Image Elements example body
- [x] Extract exact Create Video Character Elements example body
- [x] Extract exact Query Custom Element (Single) parameter/response rows at top-level field depth
- [x] Extract exact Query Custom Element (List) query/response rows at top-level field depth
- [x] Extract exact Query Presets Element (List) rows at top-level field depth
- [x] Extract exact Delete Custom Element rows
- [x] Extract exact Omni-Image create/query rows needed for current field-level reference
- [x] Extract exact Image Generation create/query rows needed for current field-level reference
- [x] Extract image-generation quality/resolution-related fields visible in preserved artifacts
- [x] Extract exact `voice_list` field structure where preserved artifacts expose it
- [x] Keep README current enough to reflect the documentation-first state
- [x] Keep status tags and field names materially aligned with preserved evidence

## Strict residuals after top-level Phase 1 closeout
These are no longer top-level field-shape blockers, but they are still open under the stricter 100%-closeout standard:
- [ ] Close deeper nested child-row parity for camera / mask / voice / element subobjects where possible from preserved evidence; otherwise mark exact evidence limits explicitly
- [ ] Reclassify capability-map-specific support ranges beyond the preserved field tables into answered / not-evidenced / out-of-scope states
- [ ] Close Text-to-Video audio/voice parity wording as far as preserved evidence allows, and mark remaining runtime-semantics questions explicitly
- [ ] Reclassify each remaining runtime-semantic question into `live-confirmed`, `doc-confirmed only`, `not evidenced`, or `approval-gated future verification`

---

# Phase 2 — Code sync second

## Goal
Align builders, validators, comments, and defaults with the now-expanded field-level docs.

## Phase 2 TODOs
- [x] Review builders against approved payload patterns (Omni-first narrow scope)
- [x] Review validator allowlists against doc-derived field names (Omni-first narrow scope)
- [~] Add stronger code-side notes for doc-derived vs live-confirmed fields where needed
- [~] Remove or quarantine helpers that encourage deprecated payload patterns
- [x] Align defaults with current single-scene <=15s policy (Omni-first narrow scope)

## Phase 2 closeout note
- For the **current Omni-first narrow sync scope**, Phase 2 closeout has been achieved.
- What qualifies as done enough here:
  - approved Omni-first payload directions are represented in builders / validators
  - live-confirmed `multi_shot` contract checks are enforced strongly enough to block obvious drift
  - README / status docs no longer imply Phase 2 is wholly untouched
- What is explicitly **not** included in this narrow closeout:
  - broad non-Omni cleanup
  - aggressive helper deletion / API redesign
  - full comment/docstring provenance tagging for every field
  - new live verification work

---

# Phase 3 — Expanded verification (approval-gated)

## Goal
Run the smallest necessary verification pass after Phase 1/2 closeout, covering both deeper Omni checks and explicit non-Omni re-verification without over-reading historical evidence as current default policy.

## Phase 3 scope
### Omni verification track
- [x] `pro` mode quality comparison
- [ ] stronger `video_list(remote url)` continuation validation
  - current preferred interpretation: `video_list` is the scene-continuity surface
  - current practical implementation candidate: short-window chaining using the freshly returned Kling result URL from the previous clip
  - treat this as continuity fallback validation, not exact-extension validation
  - prioritize `refer_type='feature'` first for next/previous-shot continuity semantics; compare against `base` only if needed
- [x] element-backed generation validation at the basic recurring-identity level
  - achieved: element create/query live success, `element_list` generation success, `element_list + sound='on'` success
  - still open separately: stronger cross-scene and continuity-complementarity semantics
- [ ] native audio / voice workflow validation
- [x] establish a practical 2-person Omni audiovisual baseline
  - two raw-base64 reference images
  - `kling-v3-omni`
  - no `type='first_frame'` for portrait-like references when transformed-scene generation is the goal
  - explicit wardrobe adaptation away from portrait clothing
  - `sound='on'`
  - direct prompt-level dialogue lines
  - current result: audio stream present, dialogue-like output present, visually coherent 2-person scene grounding present
- [x] compare the new Omni 2-person baseline against element-backed identity reuse at the first-pass practical level
  - current conclusion: no clearly meaningful quality gap proven yet; the more important observed lever was framing/shot objective rather than a decisive baseline-vs-element superiority claim
  - still open separately: broader cross-scene recurring-identity comparison
- [ ] preserve and later revisit the 2026-03-30 Omni lingerie stress scenario as a **risk-blocked content case**, not a shape-error case
  - preserved attempt facts: `kling-v3-omni`, `duration='10'`, three user-provided raw-base64 references in `image_list[].image_url` (host1 / host2 / lingerie object)
  - preserved outcome: create accepted, final query failed with `Failure to pass the risk control system`, `final_unit_deduction='0'`
  - interpretation rule: do not cite that run as evidence against the Omni raw-base64 image-list contract; cite it only as a moderation/risk-control boundary example
- [ ] Phase 3 next priority: verify `video_list` as the current-production scene-continuity reference path
- [x] Elements / `element_list` have now been verified for recurring identity reuse at the basic practical level
- [ ] verify `video_list` + `element_list` together as the complementary scene-continuity + recurring-identity path
- [ ] keep historical `video-extend` out of the current-production core; it is legacy/non-current for this repo
- [x] keep `std` as the default production mode; escalate to `pro` only when higher output quality/resolution is the explicit goal or when `std` shows meaningful weakness
- [x] document multi-shot prompt-writing guidance for cast lock / shot differentiation / anti-freeze endings
- [x] verify that balanced group-motion wording improves the earlier shot-2 foreground-dominance issue in 4-person prompts
- [ ] fold the newest multi-shot findings into a reusable guide/example layer without over-centering templates as the primary artifact
- [ ] decide whether the next quality-focused pass should prioritize stronger cast-purity wording or richer background rendering under the balanced-group constraint
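
The short-window chaining candidate from the `video_list` items above can be sketched as a small payload builder. This is a hedged illustration, not the repo's builder: `build_continuation_payload` is a hypothetical helper, and the `prompt`, `mode`, and `video_url` key names are assumptions to re-check against the field-level reference before any create. Only `video_list`, `refer_type` (`feature` / `base`), `kling-v3-omni`, and the `std` default come from this document.

```python
from typing import Any, Dict

# refer_type values named in this TODO; 'feature' is the one to try first.
ALLOWED_REFER_TYPES = ("feature", "base")


def build_continuation_payload(
    prompt: str,
    previous_result_url: str,
    refer_type: str = "feature",
) -> Dict[str, Any]:
    """Build an Omni create body that references the previous clip via video_list.

    This models continuity fallback chaining, not exact extension.
    """
    if refer_type not in ALLOWED_REFER_TYPES:
        raise ValueError(f"unsupported refer_type: {refer_type!r}")
    if not previous_result_url.startswith(("http://", "https://")):
        # video_list is described here as taking a remote URL.
        raise ValueError("video_list expects the remote URL of the previous result")
    return {
        "model_name": "kling-v3-omni",
        "prompt": prompt,  # assumed key name
        "mode": "std",  # assumed key name; std is the repo default mode
        "video_list": [
            {
                "video_url": previous_result_url,  # assumed key name
                "refer_type": refer_type,
            }
        ],
    }
```

Building the body does not send anything; the approval gate still applies before the payload reaches a billable create endpoint.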

### non-Omni verification track
- [x] align repo production pair framing to `kling-v3` for non-Omni + `kling-v3-omni` for Omni
- [x] keep image input default policy on our side as base64 attachment, not remote image URL by default
- [x] keep endpoint-specific image payload shapes distinct in repo docs/code (`image2video`=`image`/`image_tail`, Omni=`image_list[].image_url`)
- [x] verify `text2video` with `kling-v3`
- [x] verify `image2video` with `kling-v3`
  - successful production-facing image input method: top-level `image=<raw base64>`
- [x] verify `omni` with `kling-v3-omni`
  - successful production-facing image input method: `image_list[].image_url=<raw base64>` with `type='first_frame'`
- [ ] re-check any remaining endpoint-level support/range questions that are still broader than the repo's conservative production posture, then close them by evidence or explicit out-of-scope classification
- [ ] document which non-Omni paths remain provisional vs safely reusable
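
The endpoint-specific image shapes listed above are easy to blur in code, so a minimal sketch of keeping them separate may help. The helper names are hypothetical, the `prompt` key is an assumption, and "raw base64" is taken here to mean a base64 string without a `data:` URI prefix, which is also an assumption. The `image`, `image_tail`, `image_list[].image_url`, and `type='first_frame'` names come from this document.

```python
import base64
from typing import Any, Dict, List, Optional


def to_raw_base64(image_bytes: bytes) -> str:
    """Encode image bytes as a bare base64 string (no data: URI prefix)."""
    return base64.b64encode(image_bytes).decode("ascii")


def image2video_body(
    prompt: str, image_b64: str, image_tail_b64: Optional[str] = None
) -> Dict[str, Any]:
    """image2video shape: top-level `image`, plus optional `image_tail`."""
    body: Dict[str, Any] = {
        "model_name": "kling-v3",
        "prompt": prompt,  # assumed key name
        "image": image_b64,
    }
    if image_tail_b64 is not None:
        body["image_tail"] = image_tail_b64
    return body


def omni_body(
    prompt: str, images_b64: List[str], first_frame: bool = False
) -> Dict[str, Any]:
    """Omni shape: `image_list[].image_url`; only the first entry may be a first frame."""
    image_list = []
    for index, image_b64 in enumerate(images_b64):
        entry = {"image_url": image_b64}
        if first_frame and index == 0:
            entry["type"] = "first_frame"
        image_list.append(entry)
    return {
        "model_name": "kling-v3-omni",
        "prompt": prompt,  # assumed key name
        "image_list": image_list,
    }
```

Leaving `first_frame` off by default mirrors the 2-person baseline note, where portrait-like references are not pinned as the first frame when a transformed scene is the goal.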

## Approval gate for starting Phase 3
Do not run the first live Phase 3 create until all of the following are true:
- [ ] the intended payload is classified as `doc-derived`, `live-confirmed`, or `hypothesis-only`
- [ ] the exact endpoint(s) to be hit are named in advance
- [ ] expected create count is stated in advance
- [ ] the user has explicitly approved the create run
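
The gate above can also be expressed as a pre-flight check that lists whatever is still missing. This is a sketch under stated assumptions: `CreateRunPlan` and `unmet_gate_conditions` are hypothetical names, not existing repo code, and only the three classification labels are taken from this document.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Classification labels named in the approval gate.
ALLOWED_CLASSIFICATIONS = ("doc-derived", "live-confirmed", "hypothesis-only")


@dataclass(frozen=True)
class CreateRunPlan:
    """What must be written down before the first live Phase 3 create."""

    payload_classification: Optional[str] = None
    endpoints: Tuple[str, ...] = ()
    expected_create_count: Optional[int] = None
    user_approved: bool = False


def unmet_gate_conditions(plan: CreateRunPlan) -> List[str]:
    """Return every gate condition that is still unmet; an empty list means go."""
    unmet = []
    if plan.payload_classification not in ALLOWED_CLASSIFICATIONS:
        unmet.append("payload is not classified")
    if not plan.endpoints:
        unmet.append("endpoints are not named in advance")
    if plan.expected_create_count is None or plan.expected_create_count < 1:
        unmet.append("expected create count is not stated")
    if not plan.user_approved:
        unmet.append("user has not approved the create run")
    return unmet
```

Returning the full list rather than the first failure keeps the check usable as a checklist: one call shows everything that blocks the run.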

## Smallest safe non-Omni verification order
If Phase 3 is expanded to include non-Omni, keep it serialized and minimal:
1. `text2video`
   - one 5s `std` create only
   - no multi-shot
   - purpose: verify the lowest-input-complexity non-Omni create path first
2. `image2video`
   - only after `text2video` succeeds and continuing is still worthwhile
   - one 5s `std` create only
   - use one pre-prepared known-good local asset only (prefer 512x512 JPG/PNG)
3. do **not** keep `reference2video` in the default current-production verification order
   - treat it as a separately classified legacy/non-current route if it remains documented at all

## Highest-risk combinations to avoid first
- any non-Omni endpoint tested with multiple candidate `model_name` values in the same round
- any non-Omni endpoint tested with guessed 3.0 / Omni model bindings just because Omni is the current stable default
- `image2video` with tiny throwaway images after the known `Image pixel is invalid` failures
- pulling legacy/non-current routes such as `reference2video` back into the current-production path casually
- mixing endpoint verification with quality comparison, prompt comparison, or multi-shot goals in the same round
- any hypothesis-only payload on a billable create endpoint without explicit approval

## Preconditions before billable non-Omni tests
- [ ] re-read `kling-billable-create-guardrails.md`
- [ ] re-read `kling-api-field-status-register.md`
- [ ] re-check the exact endpoint path and payload shape against the current SOT docs
- [ ] confirm callback/query/download recovery is ready enough that a result will not be casually re-created for the same purpose
- [ ] confirm required image assets are already prepared locally before `image2video`
- [ ] if any legacy/non-current route is ever revisited later, classify it separately before spending on it
- [ ] confirm maximum unit exposure and hard stop conditions before starting

## Hard stop rules for spending
- stop on the **first successful create** for the immediate verification purpose
- stop if the endpoint returns a clear schema/validation answer that already resolves the question
- stop if continuing would require a second billable create for the same endpoint in the same round
- stop if continuing would require trying a second model binding for the same endpoint
- stop if the artifact can already be recovered by query/download instead of another create
- stop and re-approve before any quality-comparison rerun
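
As a sketch, the stop rules above can be evaluated before each prospective create. `RoundState` and `stop_reason` are hypothetical names introduced for illustration, and how a round tracks its own state is an assumption; the rules themselves are the six listed here.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class RoundState:
    """What has already happened in the current verification round."""

    billable_creates_by_endpoint: Dict[str, int] = field(default_factory=dict)
    models_tried_by_endpoint: Dict[str, Set[str]] = field(default_factory=dict)
    had_successful_create: bool = False
    schema_answer_received: bool = False
    artifact_recoverable: bool = False


def stop_reason(
    state: RoundState,
    endpoint: str,
    model_name: str,
    is_quality_rerun: bool = False,
) -> Optional[str]:
    """Return why the next create must not run, or None if no stop rule fires."""
    if state.had_successful_create:
        return "first successful create already achieved"
    if state.schema_answer_received:
        return "schema/validation answer already resolves the question"
    if state.artifact_recoverable:
        return "artifact can be recovered by query/download"
    if state.billable_creates_by_endpoint.get(endpoint, 0) >= 1:
        return "would be a second billable create for this endpoint"
    tried = state.models_tried_by_endpoint.get(endpoint, set())
    if tried and model_name not in tried:
        return "would be a second model binding for this endpoint"
    if is_quality_rerun:
        return "quality-comparison rerun needs re-approval"
    return None
```

A `None` result only means no stop rule fired; it does not replace the approval gate.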

## Rule
Phase 3 is **not** a free-form probing bucket.
It is an approval-gated verification phase. Historical PASS notes and preserved artifacts inform it, but they do not waive approval for new billable creates.

## Strict-closeout note
Under the stricter 100%-closeout target, an item is only considered fully closed when it is explicitly classified into one of these terminal states:
- fully closed by preserved documentation
- fully closed by live verification
- closed by explicit negative finding / not evidenced
- closed as intentionally out-of-scope for current production posture

The objective is to drive residual ambiguity down to zero, not to pretend every residual requires a billable create.
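
One way to make the four terminal states checkable is a small enum plus a helper that lists anything still unclassified. This is an illustrative sketch only: `TerminalState` and `open_residuals` are hypothetical names, and the item keys in the usage example are made up.

```python
from enum import Enum
from typing import Dict, List, Optional


class TerminalState(Enum):
    """The four terminal states from the strict-closeout note."""

    CLOSED_BY_DOCS = "fully closed by preserved documentation"
    CLOSED_BY_LIVE = "fully closed by live verification"
    CLOSED_NEGATIVE = "closed by explicit negative finding / not evidenced"
    CLOSED_OUT_OF_SCOPE = (
        "closed as intentionally out-of-scope for current production posture"
    )


def open_residuals(items: Dict[str, Optional[TerminalState]]) -> List[str]:
    """Return the names of items that have no terminal state yet, sorted."""
    return sorted(name for name, state in items.items() if state is None)


if __name__ == "__main__":
    register = {
        "camera child rows": TerminalState.CLOSED_BY_DOCS,  # made-up example entry
        "voice runtime semantics": None,  # made-up example entry
    }
    print(open_residuals(register))
```

Three of the four states require no billable create, which is the point of the note: an item can be closed by classification alone.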

## Post-Phase-3 implementation follow-up
- [~] Implement minimal non-LLM Kling watcher **after Phase 3 closeout**, not before.
  - initial MVP scaffold now exists in `scripts/watcher.py`
  - current implemented scope: file-backed jobs, cron-friendly `tick`, `kling_render`, 2-step `kling_chain`, per-job lease lock, `submit_pending`, `needs_attention`, report state, parent->child `video_list` injection
  - current intentional limits: no generic DAG, no automatic ambiguous-submit retry, no messaging transport integration yet, no TTS/assembly watcher support yet
- [ ] Keep watcher v1 small: task registry, pending-task polling, download on success, manifest write, then LLM summary after completion.
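
For orientation, a file-backed per-job lease lock of the kind listed in the watcher scope could look like the sketch below. It is not the code in `scripts/watcher.py`: `try_acquire_lease`, the lock-file layout, and the takeover behaviour are all assumptions made for illustration.

```python
import json
import os
import time
from pathlib import Path
from typing import Optional


def try_acquire_lease(
    lock_path: Path,
    owner: str,
    ttl_seconds: float,
    now: Optional[float] = None,
) -> bool:
    """Try to take a time-limited lease on one job; return True on success."""
    now = time.time() if now is None else now
    if lock_path.exists():
        try:
            lease = json.loads(lock_path.read_text(encoding="utf-8"))
            expires_at = float(lease["expires_at"])
        except (OSError, ValueError, KeyError, TypeError):
            return False  # unreadable lease: do not guess, leave it for a human
        if expires_at > now:
            return False  # another tick still holds this job
        # Expired lease: best-effort takeover. Two processes racing through
        # this branch could both proceed, which is a known limit of the sketch.
        lock_path.unlink(missing_ok=True)
    try:
        # O_EXCL makes creation fail if the file appeared in the meantime.
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        json.dump({"owner": owner, "expires_at": now + ttl_seconds}, handle)
    return True
```

A cron-driven `tick` would call this once per job and skip any job whose lease it cannot take, which keeps overlapping ticks from submitting the same create twice.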

---

## Working principle
If there is any tension between moving fast and documenting exact fields, choose documentation first.