# YouTube Automation v1 — Kling Integration Spec

> **Doc maintenance:** This file is indexed in `docs/current/README.md`. If you change this file's role, scope, status, or filename, update `docs/current/README.md` in the same edit.


Written: 2026-03-28  
Status: **implementation draft**

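## 1. Purpose
This document builds on `kling-api-reference.md` and specifies, at an implementable level of detail, the Kling integration interface and execution flow used by YouTube Automation v1.

Core principles:
- Kling direct API first
- AK/SK → JWT Bearer authentication
- `mode` uses `std` / `pro` (confirmed by live smoke test)
- callback-first asynchronous operation
- targets faceless long-form content
- keep the number of endpoints minimal in v1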
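
The AK/SK → JWT Bearer principle could be sketched roughly as follows, using only the standard library. The HS256 algorithm, the `iss`/`exp`/`nbf` claims, and the 30-minute lifetime are assumptions not stated in this spec; check them against `kling-api-reference.md` before relying on them. The helper names `make_jwt` and `auth_headers` are hypothetical.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional


def _b64url(data: bytes) -> str:
    """Base64url-encode without padding, as the JWT compact form requires."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def make_jwt(access_key: str, secret_key: str,
             now: Optional[int] = None, ttl_s: int = 1800) -> str:
    """Build an HS256-signed JWT (claim set and lifetime are assumptions)."""
    issued = int(time.time()) if now is None else now
    header = {"alg": "HS256", "typ": "JWT"}
    claims = {"iss": access_key, "exp": issued + ttl_s, "nbf": issued - 5}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, claims)
    )
    signature = hmac.new(
        secret_key.encode("utf-8"), signing_input.encode("ascii"), hashlib.sha256
    ).digest()
    return f"{signing_input}.{_b64url(signature)}"


def auth_headers(access_key: str, secret_key: str) -> dict:
    """Headers for a Kling request, with a freshly minted Bearer token."""
    return {
        "Authorization": f"Bearer {make_jwt(access_key, secret_key)}",
        "Content-Type": "application/json",
    }
```

Minting a token per request keeps the sketch simple; a real wrapper may prefer to cache the token until shortly before `exp`.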

---

## 2. Capabilities adopted for v1

### 2.1 Contract confidence
- **Confirmed**: `omni-video`, AK/SK → JWT auth direction, callback payload core fields, Query(Single/List) structure existence in Series 3.0 doc
- **Provisional until live smoke test**: exact query paths, some model allowlists outside directly observed enums
- **Confirmed by live smoke test**:
  - Omni Video create path, JWT auth flow, `kling-v3-omni`, `mode` enum=`std|pro`
  - `text2video` with `kling-v3` = verified
  - `image2video` with `kling-v3` = verified
  - successful production-facing image input methods recorded for current use:
    - `image2video` -> top-level `image=<raw base64>`
    - Omni -> `image_list[].image_url=<raw base64>` with `type='first_frame'`
- **Out of scope for v1**: lip-sync, avatar, audio, advanced element workflows


### Required
- `POST /v1/videos/omni-video`
- `POST /v1/videos/video-extend`
- callback protocol
- concurrency/rate-limit rules

### Tracked in the docs but not synced into the current default policy
- `POST /v1/videos/text2video`, or whatever path the official Text to Video endpoint documentation specifies
- `POST /v1/videos/image2video`, or whatever path the official Image to Video endpoint documentation specifies
- broader docs-corpus routes may still exist, but legacy/non-current families such as `POST /v1/videos/multi-image2video` should not remain in the current production-default interpretation
- the items above may be documented families, but they must not be read as a default routing or support promise of the current scaffold

### Optional (disabled by default)
- `POST /v1/videos/motion-control`
- `POST /v1/videos/multi-elements/init-selection`
- `POST /v1/videos/effects`
- image-family endpoints

### Excluded (v1)
- lip sync
- avatar
- tts / voice clone
- virtual try-on

---

## 3. Scene routing rules

### 3.1 Scene taxonomy
- `text_single`: an ordinary scene driven by a single text prompt
- `text_multi_shot`: a text scene that needs storyboard/multi-shot handling within one scene
- `image_single`: a scene that starts from one reference image
- `image_multi_shot`: a reference-based multi-shot scene
- `continuity`: a scene where recurring character/object continuity is the top priority
- `extend`: a scene that extends an existing video result

### 3.2 Deterministic routing heuristics
- **Current stable scaffold policy**
  - `text_single` → default `omni`
  - `text_multi_shot` → default `omni` + `multi_shot=true`
  - `image_single` → default `omni`, preferring the image-anchor path
  - `image_multi_shot` → default `omni`
  - `continuity` → default `omni`, preferring the stronger reference path
  - `extend` → `extend`
- **Deferred / caller-explicit only**
  - `text2video`
  - `image2video`
- Reason: current production-facing routing is intentionally narrower than the full docs corpus.
- Legacy/non-current routes such as `reference2video` should not remain in the active routing policy of the current scaffold.
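
The routing policy above can be expressed as a small lookup table. This is a minimal sketch: `STABLE_ROUTES`, `CALLER_EXPLICIT_ONLY`, and `route_scene` are hypothetical names, and the "flags" dict simply carries the `multi_shot=true` marker from the list without asserting how it is spelled in the final request payload.

```python
# Scene type -> (endpoint_type, extra flags), mirroring the stable scaffold policy.
STABLE_ROUTES = {
    "text_single": ("omni", {}),
    "text_multi_shot": ("omni", {"multi_shot": True}),
    "image_single": ("omni", {}),
    "image_multi_shot": ("omni", {}),
    "continuity": ("omni", {}),
    "extend": ("extend", {}),
}

# Endpoints that are never chosen by default; the caller must ask for them.
CALLER_EXPLICIT_ONLY = {"text2video", "image2video"}


def route_scene(scene_type, explicit_endpoint=None):
    """Return (endpoint_type, flags) for a scene, deterministically."""
    if explicit_endpoint is not None:
        if explicit_endpoint not in CALLER_EXPLICIT_ONLY:
            raise ValueError(f"not a caller-explicit endpoint: {explicit_endpoint!r}")
        return explicit_endpoint, {}
    if scene_type not in STABLE_ROUTES:
        raise ValueError(f"unknown scene type: {scene_type!r}")
    endpoint, flags = STABLE_ROUTES[scene_type]
    return endpoint, dict(flags)  # copy so callers cannot mutate the table
```

Legacy routes such as `reference2video` are deliberately absent from both tables, so requesting them fails loudly instead of silently routing.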


### scene.type = `text`
- current stable default: OmniVideo
- Text to Video is caller-explicit only, pending later verification

### scene.type = `image_ref`
- the current production default splits into two paths by intent:
  - when the Omni reference-first path is needed: `kling-v3-omni` + `image_list[].image_url`
  - when the non-Omni image2video path is needed: `kling-v3` + top-level `image` / optional `image_tail`
- common input principle: keep the source-of-truth asset on our side and send the actual image attachment we hold (raw base64) in the request
- a remote image URL may be a value form the upstream contract accepts, but it is not this repo's default production asset strategy
- verified input forms:
  - Omni -> `image_list[].image_url=<raw base64>` with `type='first_frame'`
  - image2video -> top-level `image=<raw base64>`
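
The two verified input forms differ only in where the base64 string goes. The sketch below builds both payload shapes; it assumes "raw base64" means a plain base64 string without a `data:` URI prefix, and it omits fields such as `duration`, `mode`, and `aspect_ratio` that a real request would likely need. The function names are hypothetical.

```python
import base64


def to_raw_base64(image_bytes: bytes) -> str:
    """Encode image bytes as plain base64 (assumed: no `data:` URI prefix)."""
    return base64.b64encode(image_bytes).decode("ascii")


def omni_first_frame_payload(image_bytes: bytes, prompt: str) -> dict:
    """Omni shape: the image rides in image_list[].image_url with a frame type."""
    return {
        "model_name": "kling-v3-omni",
        "prompt": prompt,
        "image_list": [
            {"image_url": to_raw_base64(image_bytes), "type": "first_frame"}
        ],
    }


def image2video_payload(image_bytes: bytes, prompt: str, image_tail=None) -> dict:
    """Non-Omni shape: top-level `image`, plus `image_tail` when provided."""
    payload = {
        "model_name": "kling-v3",
        "prompt": prompt,
        "image": to_raw_base64(image_bytes),
    }
    if image_tail is not None:
        payload["image_tail"] = to_raw_base64(image_tail)
    return payload
```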

### scene.type = `continuity`
- current stable default: Omni stronger reference path
- Reference to Video is caller-explicit only, pending later verification
- purpose: recurring character/object consistency

### scene.extend = true
- use Extend Video
- apply only when the source clip has passed quality checks

---

## 4. Internal data model

### Project
```json
{
  "project_id": "proj_001",
  "title": "string",
  "topic": "string",
  "script_version": "v1",
  "created_at": 0
}
```

### Scene
```json
{
  "scene_id": "scene_001",
  "project_id": "proj_001",
  "order": 1,
  "narration": "string",
  "visual_prompt": "string",
  "target_duration": 5,
  "continuity_group": "host_a",
  "scene_type": "text|image_ref|continuity",
  "reference_assets": []
}
```

### KlingTask
```json
{
  "task_id_internal": "kt_001",
  "scene_id": "scene_001",
  "endpoint_type": "omni|text2video|image2video|extend",
  "request_payload": {},
  "kling_task_id": "string",
  "request_id": "string",
  "task_status": "submitted|processing|succeed|failed",
  "callback_received_at": null,
  "final_unit_deduction": null,
  "parent_video_id": null,
  "parent_video_url": null,
  "parent_video_duration": null,
  "created_at": 0,
  "updated_at": 0
}
```

### Asset
```json
{
  "asset_id": "asset_001",
  "scene_id": "scene_001",
  "source_task_id": "kt_001",
  "asset_type": "video|image|thumbnail",
  "original_url": "string",
  "watermark_url": "string",
  "local_path": "string",
  "duration": 5,
  "created_at": 0
}
```

---

## 5. Request wrapper interface

### 5.1 Common create_task interface
```python
def create_kling_task(
    endpoint_type: str,
    payload: dict,
    callback_url: str,
    external_task_id: str,
) -> dict:
    ...
```

Common handling:
- inject the Authorization header
- inject callback_url
- inject external_task_id
- store the raw request/response JSON
- on failure, convert to the standard error format
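
The injection steps above might look like the following before any network call is made. This is a sketch under assumptions: it treats `callback_url` and `external_task_id` as top-level body fields, and it takes the "standard error format" to be the `code` / `message` / `request_id` triple seen in the extend probe error, since this spec does not define that format. `prepare_request` and `to_standard_error` are hypothetical helpers; sending the request and persisting the raw JSON are left out.

```python
import copy


def prepare_request(endpoint_type: str, payload: dict, callback_url: str,
                    external_task_id: str, bearer_token: str) -> dict:
    """Return headers and body for a create-task call without mutating `payload`."""
    body = copy.deepcopy(payload)
    body["callback_url"] = callback_url          # assumed top-level body field
    body["external_task_id"] = external_task_id  # assumed top-level body field
    headers = {
        "Authorization": f"Bearer {bearer_token}",
        "Content-Type": "application/json",
    }
    return {"endpoint_type": endpoint_type, "headers": headers, "body": body}


def to_standard_error(raw_response: dict) -> dict:
    """Reduce a raw error response to the fields the pipeline acts on."""
    return {
        "code": raw_response.get("code"),
        "message": raw_response.get("message"),
        "request_id": raw_response.get("request_id"),
    }
```

Copying the payload keeps the stored `request_payload` identical to what the caller built, which makes later replay and debugging easier.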

### 5.2 Allowed endpoint_type values
- `omni`
- `text2video`
- `image2video`
- `extend`

### 5.3 Endpoint mapping table
```json
{
  "omni": "/v1/videos/omni-video",
  "extend": "/v1/videos/video-extend",
  "motion_control": "/v1/videos/motion-control",
  "multi_elements": "/v1/videos/multi-elements/init-selection",
  "video_effects": "/v1/videos/effects"
}
```

Notes:
- The actual path strings for Text to Video / Image to Video will be finalized right before implementation, based on the corresponding sections of `kling-api-reference.md`.
- The current design fixes the capability slots first and keeps the paths separate as wrapper constants.
- However, non-Omni endpoints remain documented/provisional slots, not a stable default promise of the current scaffold.
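
One way to keep capability slots fixed while paths stay provisional is to leave the unfinalized slots unset and fail loudly on use. This is only a sketch; `ENDPOINT_PATHS` and `resolve_path` are hypothetical names, and the `None` placeholders stand for paths this spec has not yet fixed.

```python
ENDPOINT_PATHS = {
    "omni": "/v1/videos/omni-video",
    "extend": "/v1/videos/video-extend",
    # Provisional slots: the capability exists, the path constant is not final.
    "text2video": None,
    "image2video": None,
}


def resolve_path(endpoint_type: str) -> str:
    """Return the path for a slot, refusing slots whose path is not finalized."""
    if endpoint_type not in ENDPOINT_PATHS:
        raise KeyError(f"unknown endpoint_type: {endpoint_type!r}")
    path = ENDPOINT_PATHS[endpoint_type]
    if path is None:
        raise NotImplementedError(f"path for {endpoint_type!r} is still provisional")
    return path
```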

---

## 6. Callback handling rules

### 6.1 Basic principles
- normal operation is callback-first
- polling is fallback only

### 6.2 Storing the callback payload
Fields to store:
- raw payload
- task_status
- task_status_msg
- final_unit_deduction
- task_result.images[] / task_result.videos[]
- parent_video info (`id`, `url`, `duration`)
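
A tolerant extractor for these fields might look like the sketch below. The exact nesting of the callback payload is an assumption here: the code looks for `parent_video` at the top level and then under a `task_info` object, and it treats every field as optional. `extract_callback_fields` is a hypothetical helper whose output keys follow the `KlingTask` model in section 4.

```python
def extract_callback_fields(payload: dict) -> dict:
    """Pull the fields we persist out of a callback payload, tolerating gaps."""
    result = payload.get("task_result") or {}
    # Assumed locations for parent_video; adjust once a live callback is observed.
    parent = (
        payload.get("parent_video")
        or (payload.get("task_info") or {}).get("parent_video")
        or {}
    )
    return {
        "raw_payload": payload,  # always keep the untouched original
        "task_status": payload.get("task_status"),
        "task_status_msg": payload.get("task_status_msg"),
        "final_unit_deduction": payload.get("final_unit_deduction"),
        "images": list(result.get("images") or []),
        "videos": list(result.get("videos") or []),
        "parent_video_id": parent.get("id"),
        "parent_video_url": parent.get("url"),
        "parent_video_duration": parent.get("duration"),
    }
```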

### 6.3 State transitions
- `submitted` → `processing` → `succeed|failed`
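
The transition chain can be enforced with a small table. This sketch follows the chain exactly as written; treating a repeated status as a harmless no-op (for duplicate callback deliveries) is an assumption, and `ALLOWED_TRANSITIONS` and `advance` are hypothetical names.

```python
ALLOWED_TRANSITIONS = {
    "submitted": {"processing"},
    "processing": {"succeed", "failed"},
    "succeed": set(),  # terminal
    "failed": set(),   # terminal
}


def advance(current: str, new: str) -> str:
    """Return the new status if the move is legal, else raise ValueError."""
    if current not in ALLOWED_TRANSITIONS:
        raise ValueError(f"unknown status: {current!r}")
    if new == current:
        return current  # assumed: a duplicate delivery is not an error
    if new not in ALLOWED_TRANSITIONS[current]:
        raise ValueError(f"illegal transition: {current} -> {new}")
    return new
```

If live callbacks turn out to skip `processing`, the table would need a direct `submitted` → terminal edge.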

### 6.4 On success
- download the URL immediately
- create the asset row
- record the local path
- re-evaluate whether the scene is complete

### 6.5 On failure
- store task_status_msg
- classify the failure category:
  - moderation
  - transient
  - malformed request
  - quota/concurrency
- decide the retry policy

---

## 7. Scheduler / concurrency rules

### 7.1 Operating principles
- active concurrency matters more than QPS
- only task creation occupies a concurrency slot
- queries are comparatively unconstrained

### 7.2 Scheduler behavior
```text
if active_tasks < concurrency_limit:
    dispatch next scene
else:
    wait for callback completion
```

### 7.3 Recommended implementation
- in-memory counter + durable DB state
- scene batch dispatch
- release the slot when the callback reports completion
- run the timeout-watcher worker separately
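
The in-memory half of this design could be sketched as below. `SceneScheduler` is a hypothetical class; it only tracks slots and ordering, and it leaves the durable DB state, the actual create-task call, and the timeout watcher out of scope.

```python
from collections import deque


class SceneScheduler:
    """Dispatch scenes while active tasks stay under the concurrency limit."""

    def __init__(self, concurrency_limit: int):
        if concurrency_limit < 1:
            raise ValueError("concurrency_limit must be at least 1")
        self.concurrency_limit = concurrency_limit
        self.pending = deque()  # scene ids waiting for a slot, in order
        self.active = set()     # scene ids currently holding a slot

    def enqueue(self, scene_id: str) -> None:
        self.pending.append(scene_id)

    def dispatch_ready(self) -> list:
        """Move as many pending scenes into active slots as the limit allows."""
        dispatched = []
        while self.pending and len(self.active) < self.concurrency_limit:
            scene_id = self.pending.popleft()
            self.active.add(scene_id)
            dispatched.append(scene_id)
        return dispatched

    def on_callback_complete(self, scene_id: str) -> list:
        """Release the slot held by `scene_id`, then fill any freed capacity."""
        self.active.discard(scene_id)
        return self.dispatch_ready()
```

A caller would submit create-task requests for whatever `dispatch_ready()` returns and invoke `on_callback_complete()` from the callback receiver once a task reaches `succeed` or `failed`.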

---

## 8. QC / retry rules

### QC checks
- the result clip exists
- the duration is as expected
- the video can be decoded
- no severe prompt mismatch
- continuity is not broken

### Retry rules
1. transient failure → retry on the same endpoint
2. quality failure → adjust the prompt, then retry
3. continuity failure → escalate to the Omni stronger reference strategy, better anchors, or later element workflows
4. insufficient length → use extend
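
The four retry rules map naturally onto a lookup plus an attempt cap. In this sketch the failure-kind labels, the action strings, and the default cap of three attempts are all assumptions chosen for illustration; the spec itself does not set a retry limit.

```python
RETRY_ACTIONS = {
    "transient": "retry_same_endpoint",
    "quality": "retry_with_adjusted_prompt",
    "continuity": "escalate_reference_strategy",
    "too_short": "extend",
}


def next_action(failure_kind: str, attempts: int, max_attempts: int = 3) -> str:
    """Pick the follow-up action for a failed scene, or give up at the cap."""
    if failure_kind not in RETRY_ACTIONS:
        raise ValueError(f"unknown failure kind: {failure_kind!r}")
    if attempts >= max_attempts:
        return "give_up"  # assumed cap; not defined by this spec
    return RETRY_ACTIONS[failure_kind]
```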

---

## 9. Proposed v1 directory layout
```text
youtube-automation/
  docs/current/
    kling-api-reference.md
    youtube-automation-v1-kling-spec.md
  data/
    projects/
    scenes/
    tasks/
    assets/
  outputs/
    clips/
    renders/
  logs/
    callbacks/
    requests/
```

---

## 10. Implementation order
1. define endpoint constants
2. write the common create_task wrapper
3. write the callback receiver
4. write task/asset persistence
5. write the scene router
6. write the scheduler
7. write QC/retry
8. wire up final stitching

---

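## 11. Open items
- finalize the official path constants for Text to Video / Image to Video
- verify the actual auth/token flow
- verify the actual allowed model_name values
- verify the actual rate-limit numbers
- measure billing deduction in practice

These items are to be confirmed right before implementation or during early integration testing.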


## 5.4 Query strategy
- keep callback-first
- treat Query Single/List as a formal deliverable for fallback recovery
- however, the exact paths remain provisional until the live smoke test
- recovery, timeout handling, and backfill are wired in once the query layer is settled
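
Until the query paths are settled, the part that can be written now is the selection of tasks that would need a fallback query. The sketch below works on `KlingTask`-shaped dicts; the 600-second default and the assumption that `updated_at` and `now` are both epoch seconds are illustrative, and `tasks_needing_query` is a hypothetical helper.

```python
NON_TERMINAL = {"submitted", "processing"}


def tasks_needing_query(tasks: list, now: float, timeout_s: float = 600) -> list:
    """Return internal ids of unfinished tasks with no update for `timeout_s`."""
    return [
        task["task_id_internal"]
        for task in tasks
        if task["task_status"] in NON_TERMINAL
        and now - task["updated_at"] > timeout_s
    ]
```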

## 5.5 Callback receiver modes
- **Permissive debug mode**: for observing the first live callback
- **Shared-secret local mode**: for self-proxy / local hardening
- Kling's official callback signature/header will be fixed after live observation
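
The two receiver modes could be gated by a check like the one below. This is a sketch of local hardening only: the `X-Callback-Secret` header name is a hypothetical choice for our own proxy, not a header Kling is known to send, and `is_callback_authorized` is a hypothetical helper.

```python
import hmac


def is_callback_authorized(headers: dict, shared_secret: str,
                           mode: str = "shared_secret",
                           header_name: str = "X-Callback-Secret") -> bool:
    """Accept everything in permissive mode; otherwise require the shared secret."""
    if mode == "permissive":
        return True  # debug only: used to observe the first live callback
    if mode != "shared_secret":
        raise ValueError(f"unknown receiver mode: {mode!r}")
    # Header names are case-insensitive, so normalize before the lookup.
    lowered = {key.lower(): value for key, value in headers.items()}
    presented = lowered.get(header_name.lower(), "")
    # Constant-time comparison avoids leaking the secret through timing.
    return hmac.compare_digest(
        presented.encode("utf-8"), shared_secret.encode("utf-8")
    )
```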


## text2video/image2video/extend + historical legacy-route notes (2026-03-29)
- text2video: FAIL
- image2video: FAIL
- reference2video: FAIL
- query_lists: PASS
- extend_video: FAIL

> Interpretation rule: this section is evidence history, not the routing policy. Historical PASS/FAIL logs must not be over-read as current synchronized support promises for non-Omni defaults.


## Deep contract probe results (2026-03-29)
- text2video_probe PASS via payload: {'prompt': 'A calm sunrise over the ocean, cinematic, realistic', 'model_name': 'kling-v1', 'duration': '5', 'mode': 'std', 'aspect_ratio': '16:9'}
- image2video_probe PASS via payload: {'image': 'iVBORw0KGgoAAAANSUhEUgAAAAEAAAABCAQAAAC1HAwCAAAAC0lEQVR42mP8/x8AAwMCAO+y3ioAAAAASUVORK5CYII=', 'prompt': 'Subtle cinematic camera motion, realistic', 'model_name': 'kling-v1-6', 'duration': '5', 'mode': 'std', 'aspect_ratio': '16:9'}
- reference2video_probe PASS via payload: {'image_list': [{'image': 'iVBORw0KGgoAAAANSUhEUgAAAAEAAAABCAQAAAC1HAwCAAAAC0lEQVR42mP8/x8AAwMCAO+y3ioAAAAASUVORK5CYII='}], 'prompt': 'Keep subject consistent with subtle camera motion', 'model_name': 'kling-v1-6', 'duration': '5', 'mode': 'std', 'aspect_ratio': '16:9'}
- extend_video_probe fail: {"code":1201,"message":"This video not supported extend-video","request_id":"dcdc71ba-ffee-4586-8b00-9bdb5a797e4d"}


## 2026-03-29 final verification round
- text2video_5s: PASS
- image2video_5s: PASS
- reference2video_5s: PASS
- omni_multishot_15s: FAIL
- callback files observed: 0

> Phase 2 implementation note: these historical non-Omni passes do not by themselves settle current endpoint-specific model policy. The safe code-sync default remains Omni-first; non-Omni model selection and non-Omni default routing should stay explicit/provisional until later verification narrows them further.


## 2026-03-29 retry round after valid image + multishot validator fix
- image2video_valid_image: PASS
- reference2video_valid_image: PASS
- omni_multishot_15s_retry: FAIL
- callback externally reachable configured: NO


## 2026-03-29 contract refinements from deeper live probing
- image2video_img256_png: failed (Image pixel is invalid)
- image2video_img512_png: submitted ()
- image2video_img512_jpg: submitted ()
- reference2video_img256_png: processing ()
- reference2video_img512_png: processing ()
- reference2video_img512_jpg: processing ()
- omni_multishot_probe_3: create/query succeeded after adding index fields to multi_prompt

## 2026-03-29 update — Omni first-frame structure confirmed
- Confirmed by live probing:
  - `POST /v1/videos/omni-video` accepts `image_list[].image_url + type='first_frame'`
  - task completed successfully and video artifact was downloaded
- Rejected live:
  - `image_list[].image`
  - `image_list[].image + index`
- Therefore the current scaffold should treat `image_list[].image_url + type='first_frame'|'end_frame'` as the leading Omni image-reference contract.

## 2026-03-30 production policy refinement
- production model pair for this repo:
  - non-Omni paths: `kling-v3`
  - Omni paths: `kling-v3-omni`
- `kling-video-o1` is not the current 3.0 base production model in this repo anymore
- production input rule for image assets:
  - keep source-of-truth assets on our side
  - send actual image attachment payloads we control (raw base64) in requests by default
  - do not make remote image URLs the default production dependency for image inputs
- preserve endpoint-specific image field shapes:
  - `image2video` -> top-level `image` / optional `image_tail`
  - Omni -> `image_list[].image_url`
