# Kling continuity test matrix

Status: planned experiment matrix
Captured: 2026-03-29
Purpose: systematically test which workflow patterns improve continuity, subject stability, and shot-boundary smoothness.

## Why this exists
Our live tests confirmed that multi-shot generation works, but they have not yet reproduced the continuity quality implied by Kling's strongest reference materials.
This matrix is designed to answer one practical question:

**What combination of shot design + reference strength gives the best continuity for our pipeline?**

---

## Evaluation rubric
Each test should be scored 1–5 on these dimensions:

### 1. Identity continuity
- Does the same subject remain recognizably the same across all shots?

### 2. Scene continuity
- Does the environment feel like the same place/time/lighting context?

### 3. Camera continuity
- Do camera transitions feel like a natural progression rather than a reset?

### 4. Boundary visibility
- How noticeable is the shot cut/jump at the transition points?
- Higher score = less visible / less disruptive

### 5. Overall production usability
- Could this output be shipped or edited into a real pipeline without embarrassment?

### Suggested summary metric
`continuity_score = (identity + scene + camera + boundary + usability) / 5`
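The summary metric can be sketched as a small scoring helper; the dimension names mirror the rubric above, and the validation behavior is an assumption:

```python
# Rubric dimensions as defined above, each scored 1-5.
DIMENSIONS = ("identity", "scene", "camera", "boundary", "usability")

def continuity_score(scores: dict[str, int]) -> float:
    """Average the five rubric scores into the summary metric."""
    missing = set(DIMENSIONS) - scores.keys()
    if missing:
        raise ValueError(f"missing rubric scores: {sorted(missing)}")
    if not all(1 <= scores[d] <= 5 for d in DIMENSIONS):
        raise ValueError("each rubric score must be between 1 and 5")
    return sum(scores[d] for d in DIMENSIONS) / len(DIMENSIONS)

# Example: strong everywhere except boundary visibility.
print(continuity_score(
    {"identity": 5, "scene": 4, "camera": 4, "boundary": 3, "usability": 4}
))  # → 4.0
```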

---

## Shared test principles
These apply unless a row explicitly tests the opposite.

- same environment
- same subject
- same time of day
- same direction of motion
- same overall emotional tone
- adjacent shots should have high semantic overlap
- no unnecessary angle reversals

---

## Phase 1 — shot-delta sensitivity
Goal: find how much shot-to-shot change the model can tolerate before continuity breaks.

### Test C1 — minimal shot delta
- Workflow: text-only multi-shot
- Scene: same beach, same subject, same side-follow motion
- Shot progression:
  - shot 1 = medium side-follow
  - shot 2 = slightly closer side-follow
  - shot 3 = slightly closer side-follow
- Purpose: baseline best-case continuity attempt

### Test C2 — moderate shot delta
- Workflow: text-only multi-shot
- Scene: same beach, same subject
- Shot progression:
  - shot 1 = medium-wide
  - shot 2 = medium
  - shot 3 = medium-close
- Purpose: find whether moderate progression is acceptable

### Test C3 — aggressive shot delta
- Workflow: text-only multi-shot
- Scene: same beach, same subject
- Shot progression:
  - shot 1 = wide
  - shot 2 = medium
  - shot 3 = close-up / angle shift
- Purpose: measure where boundary disruption becomes obvious

---

## Phase 2 — anchoring strength comparison
Goal: test whether stronger references materially improve continuity.

### Test R1 — text-only baseline
- Workflow: text-only multi-shot
- Scene concept held constant
- Purpose: comparison baseline

### Test R2 — single-image anchored
- Workflow: image-to-video or image-anchored omni workflow
- Input: one strong reference still
- Purpose: test whether one anchor reduces subject/scene drift

### Test R3 — multi-image reference
- Workflow: reference-to-video / multi-image reference
- Input: 2–4 consistent images of same subject
- Purpose: test whether richer still references improve continuity over R2

### Test R4 — video-reference / element-driven
- Workflow: Omni with strongest available subject binding path
- Input: short subject video clip if accessible
- Purpose: test whether marketed Omni strength appears only with video-reference-level anchoring

---

## Phase 3 — subject type comparison
Goal: identify whether continuity is easier for some subject classes than others.

### Test S1 — human subject
- One person walking in stable environment
- Purpose: baseline human continuity

### Test S2 — product subject
- One branded object / product in stable environment
- Purpose: test whether product identity locks better than humans

### Test S3 — no-character scenic shot
- Pure environment / no person
- Purpose: isolate camera continuity from character identity drift

---

## Phase 4 — environment complexity comparison
Goal: see whether simpler scenes stabilize better.

### Test E1 — simple environment
- Beach / desert road / empty studio
- Purpose: easiest continuity conditions

### Test E2 — medium environment
- Cafe / library / room interior
- Purpose: moderate visual complexity

### Test E3 — complex environment
- Busy street / crowd / many moving background elements
- Purpose: determine failure boundary under production-like complexity

---

## Phase 5 — motion-vector continuity
Goal: see whether directional movement continuity is a strong determinant.

### Test M1 — forward movement only
- Subject keeps moving in same direction across all shots

### Test M2 — lateral tracking only
- Camera tracks from same side all the way through

### Test M3 — directional conflict
- Introduce subtle direction/angle contradiction on purpose
- Purpose: measure how fragile continuity is to motion inconsistency

---

## Next critical experiments
These are designated as the next core quality-reproduction experiments, but are deferred for later execution.

- C1 — minimal shot delta continuity baseline
- C2 — moderate shot delta continuity test
- R2 — single-image anchored continuity test
- R3 — multi-image reference continuity test
- R4 — video-reference / element-driven continuity test

Current status: deferred intentionally; keep as priority experiments for the next quality-focused phase.

## Priority order
If budget is limited, run in this order:

1. C1 — minimal shot delta baseline
2. C2 — moderate shot delta
3. R2 — single-image anchored
4. R3 — multi-image reference
5. R4 — video-reference / element-driven
6. S3 — scenic no-character control
7. C3 — aggressive shot delta stress test

---

## Recommended data capture for every run
Store all of the following:
- exact payload
- model name
- duration
- mode
- query response
- final video URL
- local downloaded result
- user judgment notes
- continuity rubric scores
- cost / final unit deduction
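A minimal run-record sketch for capturing these fields per billable create; all field names and the example values are assumptions, not a fixed schema:

```python
from dataclasses import dataclass, asdict

@dataclass
class RunRecord:
    """One captured record per run. Field names are illustrative only."""
    run_name: str        # experiment-readable name (see naming convention)
    payload: dict        # exact request payload sent
    model_name: str
    duration_s: float
    mode: str
    query_response: dict # raw task-query response
    video_url: str       # final video URL
    local_path: str      # locally downloaded result
    judgment_notes: str  # user judgment notes
    rubric_scores: dict  # identity/scene/camera/boundary/usability, each 1-5
    cost_units: float    # cost / final unit deduction

record = RunRecord(
    run_name="cont_c1_text_beach_sidefollow_min-delta_v1",
    payload={"prompt": "..."},
    model_name="example-model",
    duration_s=15.0,
    mode="std",
    query_response={},
    video_url="https://example.com/out.mp4",
    local_path="runs/cont_c1/out.mp4",
    judgment_notes="",
    rubric_scores={"identity": 4, "scene": 4, "camera": 4,
                   "boundary": 4, "usability": 4},
    cost_units=1.0,
)
print(asdict(record)["run_name"])
```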

---

## Naming convention for runs
Use names that make the experiment readable later.

Examples:
- `cont_c1_text_beach_sidefollow_min-delta_v1`
- `cont_r2_single-image_beach_walk_v1`
- `cont_r3_multi-image_beach_walk_v1`
- `cont_s3_scenic_ocean_min-delta_v1`
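The examples above can be generated with a small name builder; the slug arguments and the helper itself are assumptions following the `cont_<test>_<workflow>_<scene>_<motion>_v<N>` pattern shown:

```python
def run_name(test_id: str, workflow: str, scene: str, motion: str,
             variant: int = 1) -> str:
    """Build a readable run name following the convention above.
    Arguments are free-form slugs; empty segments are dropped so
    rows without a motion tag still format cleanly."""
    parts = ["cont", test_id.lower(), workflow, scene, motion, f"v{variant}"]
    return "_".join(p for p in parts if p)

print(run_name("C1", "text", "beach", "sidefollow_min-delta"))
# → cont_c1_text_beach_sidefollow_min-delta_v1
```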

---

## Decision threshold
Treat a workflow as pipeline-worthy only if:
- continuity_score >= 4.0
- no major identity break
- no major scene reset
- shot boundaries are at most mildly noticeable

Anything below that stays in experimental status.
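The threshold can be sketched as a single gating check. Mapping "at most mildly noticeable" to a boundary rubric score of 4 or higher is an assumption:

```python
def pipeline_worthy(continuity_score: float,
                    major_identity_break: bool,
                    major_scene_reset: bool,
                    boundary_score: int) -> bool:
    """Apply the decision threshold above. boundary_score is the 1-5
    rubric dimension; >= 4 is assumed to mean 'at most mildly noticeable'."""
    return (continuity_score >= 4.0
            and not major_identity_break
            and not major_scene_reset
            and boundary_score >= 4)

print(pipeline_worthy(4.2, False, False, 4))  # → True
print(pipeline_worthy(4.2, False, False, 3))  # → False
```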

---

## Current working hypothesis
Most likely route to materially better results:
1. small shot deltas
2. strong subject anchoring
3. structured chronological prompting
4. simple environments first
5. video-reference / element workflows for premium continuity goals

## Spend-control execution rule
Before running any matrix row:
- assume each create is billable
- do not run multiple candidate variants for the same row without explicit approval
- if one candidate already succeeds, stop and evaluate that output first
- separate cheap documentation narrowing from live billable experimentation


## 2026-03-29 policy update
For production design:
- one scene should fit inside one clip (up to 15 seconds)
- multi-clip chaining should no longer be treated as the primary single-scene continuity strategy
- continuity testing across multiple clips is still useful for scene-to-scene transitions, not for pretending several clips are one scene
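The one-scene-per-clip rule can be enforced at planning time with a simple validator; the 15-second cap comes from the policy above, while the function itself is a hypothetical sketch:

```python
MAX_CLIP_SECONDS = 15.0  # per-clip cap from the policy above

def validate_scene_plan(scene_durations: list[float]) -> None:
    """Reject scenes that would require multi-clip chaining.
    Under this policy a scene must fit in one clip; overlong scenes
    should be redesigned, not silently split across clips."""
    for i, duration in enumerate(scene_durations):
        if duration > MAX_CLIP_SECONDS:
            raise ValueError(
                f"scene {i} is {duration}s; exceeds the "
                f"{MAX_CLIP_SECONDS}s single-clip cap — redesign the scene "
                "instead of chaining clips")

validate_scene_plan([8.0, 12.5, 15.0])  # passes silently
```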
