# YouTube Automation Pipeline: Prompt Engineering Design Document

> Date: 2026-03-27  
> Purpose: Optimal LLM prompt design for a Kling AI-based YouTube long-form automation pipeline

---

## 1. Research Summary: Key Insights

### 1.1 Good vs. Bad Video Prompts

| Criterion | Bad prompt ❌ | Good prompt ✅ |
|------|----------------|----------------|
| Specificity | "a woman crying" | "Close-up, a young woman (20s, worn leather jacket) with a single tear, dramatic rim lighting, ultra-photorealistic, 8K, cinematic color grade" |
| Camera direction | None | "slow push-in", "static medium shot", "rack focus from foreground to background" |
| Style consistency | Differs in every scene | Same style prefix included in every scene |
| Length | A few words | 1-2 sentences with the details spelled out |
| Forbidden elements | None | Explicitly states "no text, no watermark, no cartoon" |

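### 1.2 Core Strategies for Visual Consistency (2026 Best Practices)

1. **Character DNA pattern**: Fix each character description as a copy-pasteable "DNA tag" and insert it unchanged into every scene
2. **Fixed style prefix**: Prepend the same visual style prefix to every scene
3. **Shared negative prompt**: Fix the list of elements to exclude at the channel level
4. **Systematic shot language**: Standardize camera movements on established film terminology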
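
The first three strategies amount to assembling each prompt from fixed parts, so that only the action and the camera direction change from scene to scene. A minimal sketch of that idea follows; `CharacterDNA` and `compose_prompt` are hypothetical names introduced here for illustration, not part of the pipeline code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CharacterDNA:
    """A fixed, copy-pasteable description of one recurring character."""
    name: str
    traits: tuple[str, ...]

    def tag(self) -> str:
        # Renders e.g. "Alex (30s male, dark blue suit)"
        return f"{self.name} ({', '.join(self.traits)})"


def compose_prompt(style_prefix: str, character: CharacterDNA,
                   action: str, camera: str, forbidden: str) -> str:
    """Assemble a prompt so that only action and camera vary per scene."""
    return f"{style_prefix}, {character.tag()} {action}, {camera}, {forbidden}."


alex = CharacterDNA("Alex", ("30s male", "dark blue suit"))
print(compose_prompt(
    "cinematic documentary style, 4K",
    alex,
    "signs a contract",
    "slow push-in",
    "no text, no watermarks",
))
```

Because the dataclass is frozen, the tag cannot drift between scenes by accident; changing a character's look means creating a new `CharacterDNA` on purpose.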

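### 1.3 Prompt Structure by Scene Type

Each scene type needs a different "anchor point":
- **Person scenes**: character DNA + emotion/expression + camera distance
- **Environment scenes**: environment description + time of day/weather + camera movement
- **Concept scenes**: metaphorical visualization + abstract-to-concrete translation instructions
- **Data scenes**: visual metaphor (no graphs; substitute natural-looking visuals)
- **Historical scenes**: period texture + color style + period-accuracy hints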
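
One way to make these anchor points usable in code is a plain lookup table keyed by scene type. The sketch below is illustrative only: `SCENE_ANCHORS` and `anchors_for` are hypothetical names, and falling back to the environment anchors for an unknown type is an assumption, not something the list above prescribes.

```python
# Anchor elements per scene type, taken from the list above.
SCENE_ANCHORS = {
    "person": ["character DNA", "emotion/expression", "camera distance"],
    "environment": ["environment description", "time of day/weather", "camera movement"],
    "concept": ["metaphorical visualization", "abstract-to-concrete translation"],
    "data": ["visual metaphor (no graphs)"],
    "historical": ["period texture", "color style", "period-accuracy hints"],
}


def anchors_for(scene_type: str) -> list[str]:
    """Return the anchor elements for a scene type.

    Unknown types fall back to the environment anchors (an assumption).
    """
    return SCENE_ANCHORS.get(scene_type, SCENE_ANCHORS["environment"])


print(anchors_for("person"))
```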

---

## 2. Core Design: script → video_prompt Converter

### 2.1 Optimized SCENE_TO_VIDEO_PROMPT

```python
SCENE_TO_VIDEO_PROMPT = """
You are a professional cinematographer and AI video prompt engineer specializing in Kling AI.

## YOUR ROLE
Convert script scene descriptions into precise, cinematic video generation prompts.
Every prompt you write MUST maintain visual consistency across the entire video.

## VISUAL STYLE LOCK (apply to EVERY scene without exception)
{visual_style_prefix}

## CHANNEL CONTEXT
{style_guide}

## CURRENT SCENE
- Type: {scene_type}
- Script content: {scene_description}
- Emotion/tone: {emotion}
- Duration: {duration}s
- Previous scene: {prev_scene_summary}

## SCENE-TYPE RULES

### PERSON/ACTION scenes:
→ Include: character description, specific body language, facial expression, camera distance (close-up/medium/wide), camera movement

### ENVIRONMENT/PLACE scenes:
→ Include: location details, time of day, atmospheric conditions, establishing or mood shot, slow camera reveal

### CONCEPT VISUALIZATION (abstract content):
→ Translate abstract ideas into concrete visual metaphors
→ Use: symbolic objects, flowing movements, macro/micro scale shifts
→ NEVER use charts, text overlays, or UI elements

### DATA/STATISTICS:
→ Convert numbers into visual scale comparisons
→ Example: "1 billion" → "a vast crowd stretching to the horizon"
→ Use: crowd shots, architectural scale, natural phenomena

### HISTORICAL SCENES:
→ Include: era-specific visual texture (film grain for old, color tone matching era)
→ Use period-accurate colors: sepia/muted for pre-1960s, saturated for 70s-80s
→ Add: subtle cinematic imperfection appropriate to era

## OUTPUT RULES
1. Write in English only
2. Maximum 2 sentences, under 250 characters total
3. ALWAYS start with the visual style prefix
4. Include ONE specific camera movement from this list:
   [slow push-in | pull back | static shot | slow pan left/right | 
    tilt up/down | handheld drift | aerial descent | rack focus]
5. Include lighting description (golden hour / dramatic rim / soft diffused / cool moonlight / etc.)
6. End with forbidden elements: ", no text, no watermarks, no logos, no subtitles"
7. Output ONLY the prompt. No explanation. No labels. No quotes.

## EXAMPLE OUTPUT
Cinematic documentary style, warm color grading, 4K — elderly fisherman (weathered face, worn yellow raincoat) stares at a gray storm horizon, static wide shot with subtle tilt up toward darkening sky, dramatic side lighting casting long shadows on wet dock planks, no text, no watermarks, no logos, no subtitles.
"""
```
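
Most of the OUTPUT RULES can be checked mechanically before a prompt is sent to the video model. The following is a rough sketch of such a check, assuming simple substring matching is good enough; `check_prompt`, `CAMERA_MOVES`, and `REQUIRED_SUFFIX` are hypothetical names, and the check does not cover the English-only or lighting rules.

```python
CAMERA_MOVES = (
    "slow push-in", "pull back", "static shot", "slow pan",
    "tilt up", "tilt down", "handheld drift", "aerial descent", "rack focus",
)
REQUIRED_SUFFIX = "no text, no watermarks, no logos, no subtitles"


def check_prompt(prompt: str, style_prefix: str, max_chars: int = 250) -> list[str]:
    """Return a list of rule violations; an empty list means the prompt passes."""
    problems = []
    text = prompt.strip()
    lowered = text.lower()
    if not lowered.startswith(style_prefix.lower()):
        problems.append("missing style prefix")
    if len(text) > max_chars:
        problems.append(f"too long: {len(text)} > {max_chars} characters")
    # Count how many of the allowed camera movements appear in the prompt
    moves = [m for m in CAMERA_MOVES if m in lowered]
    if len(moves) != 1:
        problems.append(f"expected exactly one camera movement, found {len(moves)}")
    if not text.rstrip(".").endswith(REQUIRED_SUFFIX):
        problems.append("missing forbidden-elements suffix")
    return problems


print(check_prompt("a woman crying", "cinematic documentary style, 4K"))
```

Note that the EXAMPLE OUTPUT in the template is longer than 250 characters, so it would be flagged by the length check as written; `max_chars` is a parameter so the limit can be tuned against real results.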

### 2.2 MULTI_SCENE_BATCH_PROMPT for Batch Processing (multiple scenes at once)

```python
# Literal JSON braces are doubled ({{ }}) so that str.format() fills only the
# named placeholders and leaves the JSON examples intact.
MULTI_SCENE_BATCH_PROMPT = """
You are a professional AI video prompt engineer.
Process ALL scenes below and output EXACTLY one prompt per scene.

## GLOBAL VISUAL STYLE (apply to ALL scenes)
{visual_style_prefix}
Negative elements to exclude from ALL scenes: {negative_prompt}

## CHANNEL STYLE
{style_guide}

## SCENES TO PROCESS
{scenes_json}
// Format: [{{"id": "scene_01", "type": "...", "content": "...", "emotion": "...", "duration": 5}}]

## OUTPUT FORMAT (strict JSON)
Output a JSON array only. No other text.
[
  {{
    "scene_id": "scene_01",
    "prompt": "... your cinematic prompt here ..."
  }}
]

## CONSISTENCY RULES
- Every prompt MUST begin with the GLOBAL VISUAL STYLE prefix
- Camera movements should create logical visual flow between scenes
- Alternate between close-ups and wide shots for visual rhythm
- Maintain consistent color temperature across all prompts
"""
```
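
Since the batch prompt asks for a strict JSON array, the response should be validated before it is passed downstream. A minimal sketch of that step is shown here; `parse_batch_response` is a hypothetical helper, and treating any id mismatch as a hard error is an assumption about how strict the pipeline wants to be.

```python
import json


def parse_batch_response(raw: str, expected_ids: list[str]) -> dict[str, str]:
    """Parse the model's JSON array and map scene_id -> prompt.

    Raises ValueError if the output is not a JSON array of objects, or if
    the returned scene ids differ from the ones that were requested.
    """
    try:
        items = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"model output is not valid JSON: {exc}") from exc
    if not isinstance(items, list):
        raise ValueError("expected a JSON array")

    prompts = {}
    for item in items:
        if not isinstance(item, dict) or "scene_id" not in item or "prompt" not in item:
            raise ValueError(f"malformed entry: {item!r}")
        prompts[item["scene_id"]] = str(item["prompt"]).strip()

    missing = set(expected_ids) - set(prompts)
    extra = set(prompts) - set(expected_ids)
    if missing or extra:
        raise ValueError(
            f"scene id mismatch: missing={sorted(missing)}, extra={sorted(extra)}"
        )
    return prompts


raw = '[{"scene_id": "scene_01", "prompt": " wide shot of a harbor "}]'
print(parse_batch_response(raw, ["scene_01"]))
```

Failing loudly on a missing scene makes it easy to retry only the batch that went wrong instead of rendering a video with a silent gap.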

---

## 3. Designing the settings.json visual_style Field

### 3.1 Basic Structure

```json
{
  "visual_style": {
    "prefix": "cinematic documentary style, warm color grading, 4K ultra-sharp, shallow depth of field",
    "negative": "text, watermark, logo, subtitles, cartoon, anime, blur, distortion, nsfw, painting",
    "aspect_ratio": "16:9",
    "camera_style": "professional documentary cinematography",
    
    "mood_variants": {
      "dramatic": "high contrast, deep shadows, desaturated highlights",
      "hopeful": "soft golden hour, warm tones, gentle lens flare",
      "tense": "cold blue tint, tight framing, shallow focus",
      "neutral": "balanced exposure, natural color, medium contrast"
    },
    
    "era_overrides": {
      "historical_pre1950": "black and white, heavy grain, high contrast, vignette edges",
      "historical_1960s80s": "warm faded film look, slight desaturation, 16mm grain",
      "modern": "clean digital, accurate colors, 4K sharpness",
      "future": "cool blue teal, high tech ambient glow, clean lines"
    }
  }
}
```

### 3.2 Example Presets by Channel

```json
{
  "channel_presets": {
    "documentary": {
      "prefix": "cinematic documentary 4K, natural lighting, photorealistic, shallow depth of field",
      "negative": "text, watermark, logo, cartoon, anime, CGI, artificial lighting",
      "camera_style": "handheld naturalistic with steady moments"
    },
    "history": {
      "prefix": "cinematic historical drama, film grain, desaturated warm tones, era-accurate",
      "negative": "text, watermark, modern objects, smartphones, contemporary clothing",
      "camera_style": "slow deliberate movements, wide establishing shots"
    },
    "science": {
      "prefix": "cinematic science documentary 4K, clean cool lighting, macro detail shots",
      "negative": "text overlays, watermark, cartoon diagrams, anime style",
      "camera_style": "smooth technical movement, precision framing"
    },
    "finance": {
      "prefix": "cinematic business documentary, sharp 4K, corporate clean aesthetic, natural light",
      "negative": "text, charts, graphs, data visualizations, watermark, cartoon",
      "camera_style": "professional steady cam, boardroom aesthetics"
    }
  }
}
```
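
The document does not spell out how a channel preset combines with the base `visual_style` block. One plausible reading, sketched below as an assumption, is that a preset's keys override the base while everything else (for example `aspect_ratio` and `mood_variants`) is inherited. `apply_channel_preset` is a hypothetical helper.

```python
def apply_channel_preset(visual_style: dict, presets: dict, channel: str) -> dict:
    """Return a new visual_style with the named preset's keys overriding the base.

    Unknown channel names leave the base style unchanged.
    """
    merged = dict(visual_style)              # shallow copy; the base is not mutated
    merged.update(presets.get(channel, {}))  # preset keys win over base keys
    return merged


base = {
    "prefix": "cinematic documentary style, warm color grading, 4K",
    "negative": "text, watermark, logo",
    "aspect_ratio": "16:9",
}
presets = {
    "history": {
        "prefix": "cinematic historical drama, film grain, desaturated warm tones",
        "negative": "text, watermark, modern objects",
    }
}
print(apply_channel_preset(base, presets, "history")["prefix"])
```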

---

## 4. Optimal Prompt Patterns by Scene Type (Worked Examples)

### 4.1 Person/Action Scenes

**Input:** The protagonist signs a contract; the mood is tense

**Optimal prompt structure:**
```
{visual_style_prefix} — [character description with 2-3 defining traits], 
[specific action with physical detail], [camera: tight over-shoulder push-in], 
[lighting: cool blue office fluorescent, harsh shadows], 
no text, no watermarks, no logos, no subtitles.
```

**Concrete example:**
```
Cinematic documentary style, warm color grading, 4K — middle-aged man (gray suit, tense jaw) 
slowly moves pen toward contract on mahogany desk, tight over-shoulder push-in showing 
hesitating hand, cold fluorescent lighting emphasizing shadows under eyes, 
no text, no watermarks, no logos, no subtitles.
```

---

### 4.2 Environment/Place Scenes

**Input:** A Wall Street street scene in New York during the 2008 financial crisis

**Optimal prompt structure:**
```
{visual_style_prefix}, {era_override} — [location with era markers], 
[atmospheric details: crowd/emptiness, weather], [camera: wide establishing slow pan], 
[ambient lighting: time of day + mood], 
no text, no watermarks, no logos, no subtitles.
```

**Concrete example:**
```
Cinematic documentary style, warm desaturated film look, 4K — 
wide view of Wall Street in autumn 2008, scattered newspapers and worried pedestrians in suits, 
slow pan left across stone facades and NYSE entrance, overcast gray sky, 
cold diffused midday light emphasizing the somber atmosphere, 
no text, no watermarks, no logos, no subtitles.
```

---

### 4.3 Concept Visualization (Abstract Content)

**Input:** The concept "debt grows like a snowball"

**Optimal prompt structure:**
```
{visual_style_prefix} — [concrete visual metaphor for abstract concept], 
[movement that shows growth/change/tension], [camera: dramatic pull back or aerial], 
[lighting reinforcing concept tone], 
no text, no watermarks, no logos, no subtitles.
```

**Concrete example:**
```
Cinematic documentary style, warm color grading, 4K — 
massive rolling snowball gathering size down an endless white mountain slope, 
aerial pull back revealing the scale dwarfing pine forests below, 
cold winter sunlight with hard shadows creating sense of inevitability, 
no text, no watermarks, no logos, no subtitles.
```

---

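### 4.4 Data/Statistics

**Input:** "Only 10% of the world's 7 billion people have access to clean water"

**Rule:** Convert numbers into visual comparisons of scale. Never use graphs, charts, or numeric text

**Optimal prompt structure:**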
```
{visual_style_prefix} — [visual metaphor for scale: crowds, landscapes, containers], 
[foreground/background contrast showing proportion], [camera: wide shot with selective focus], 
[lighting: realistic field/environment], 
no text, no watermarks, no logos, no subtitles.
```

**Concrete example:**
```
Cinematic documentary style, warm color grading, 4K — 
vast crowd of people stretching to the horizon under harsh dry sunlight, 
camera slowly tilts down to small group in foreground holding clean water jugs, 
golden late afternoon light making the contrast between abundance and scarcity visceral, 
no text, no watermarks, no logos, no subtitles.
```

---

### 4.5 Recreating Historical Scenes

**Input:** The mission control room during the Apollo 11 Moon landing in 1969

**Optimal prompt structure:**
```
{visual_style_prefix}, [era_visual_treatment] — [location with era-accurate details], 
[characters in period-accurate attire], [camera: wide or medium slow push-in], 
[era-appropriate film quality: grain, color shift], 
no text, no watermarks, no logos, no subtitles.
```

**Concrete example:**
```
Cinematic historical drama, warm faded 16mm film look, slight desaturation, 4K — 
1969 NASA mission control room packed with men in white short-sleeved shirts and crew cuts, 
staring at large green-tinted monitors, slow push-in over shoulders toward central screen, 
practical fluorescent overhead lighting with warm tungsten desk lamps, heavy analog aesthetic, 
no text, no watermarks, no logos, no subtitles.
```

---

## 5. Python Implementation (Ready to Use)

```python
import json

# ─────────────────────────────────────────────
# Core prompt template
# ─────────────────────────────────────────────

SCENE_TO_VIDEO_PROMPT = """You are a professional cinematographer and AI video prompt engineer specializing in Kling AI.

## VISUAL STYLE LOCK (apply to EVERY scene)
{visual_style_prefix}

## CHANNEL CONTEXT  
{style_guide}

## CURRENT SCENE
- Type: {scene_type}
- Script content: {scene_description}
- Emotion/tone: {emotion}
- Duration: {duration}s

## SCENE-TYPE RULES
PERSON/ACTION → Include character description, body language, facial expression, camera distance, movement
ENVIRONMENT/PLACE → Include location details, time of day, atmosphere, slow camera reveal
CONCEPT (abstract) → Translate to concrete visual metaphor, NO charts/text overlays
DATA/STATISTICS → Convert numbers to visual scale comparisons (crowds, landscapes, objects)
HISTORICAL → Era-specific film texture and color, period-accurate visual elements

## OUTPUT RULES
1. English only
2. Max 2 sentences, under 250 characters
3. MUST start with visual_style_prefix
4. Include ONE camera movement: [slow push-in | pull back | static shot | slow pan | tilt up/down | handheld drift | aerial descent | rack focus]
5. Include lighting description
6. End with: ", no text, no watermarks, no logos, no subtitles"
7. Output ONLY the prompt. No labels. No quotes. No explanation.
"""


def load_visual_style(settings_path: str) -> dict:
    """Load the visual_style block from settings.json."""
    # settings.json may contain non-ASCII text, so read it as UTF-8 explicitly
    with open(settings_path, encoding="utf-8") as f:
        settings = json.load(f)
    return settings.get("visual_style", {})


def build_style_prefix(visual_style: dict, scene_emotion: str = None, era: str = None) -> str:
    """Build the style prefix dynamically from the scene's emotion and era."""
    base = visual_style.get("prefix", "cinematic documentary style, 4K")
    
    # Apply the era override (it replaces the base prefix rather than extending it)
    if era and era in visual_style.get("era_overrides", {}):
        base = visual_style["era_overrides"][era]
    
    # Append the mood variant
    if scene_emotion and scene_emotion in visual_style.get("mood_variants", {}):
        mood_add = visual_style["mood_variants"][scene_emotion]
        base = f"{base}, {mood_add}"
    
    return base


def generate_scene_prompt(
    scene_description: str,
    scene_type: str,
    emotion: str,
    duration: int,
    style_guide: str,
    visual_style: dict,
    era: str = None,
    llm_client = None  # Anthropic-style client exposing messages.create()
) -> str:
    """Generate the video prompt for a single scene."""
    
    visual_style_prefix = build_style_prefix(visual_style, emotion, era)
    
    prompt = SCENE_TO_VIDEO_PROMPT.format(
        visual_style_prefix=visual_style_prefix,
        style_guide=style_guide,
        scene_type=scene_type,
        scene_description=scene_description,
        emotion=emotion,
        duration=duration
    )
    
    # Call the LLM (inject a client for real use)
    if llm_client:
        response = llm_client.messages.create(
            model="claude-sonnet-4-6",
            max_tokens=300,
            messages=[{"role": "user", "content": prompt}]
        )
        return response.content[0].text.strip()
    
    return prompt  # For testing: return the filled-in template itself


def generate_all_scene_prompts(
    scenes: list[dict],
    style_guide: str,
    visual_style: dict,
    llm_client = None
) -> list[dict]:
    """
    Process every scene in the list (one LLM call per scene).
    
    Format of scenes:
    [
        {
            "id": "scene_01",
            "type": "person",  # person/environment/concept/data/historical
            "content": "The protagonist stops in front of a building and looks up",
            "emotion": "dramatic",  # dramatic/hopeful/tense/neutral
            "duration": 5,
            "era": None  # "historical_pre1950" | "historical_1960s80s" | None
        }
    ]
    """
    results = []
    
    for scene in scenes:
        video_prompt = generate_scene_prompt(
            scene_description=scene["content"],
            scene_type=scene.get("type", "environment"),
            emotion=scene.get("emotion", "neutral"),
            duration=scene.get("duration", 5),
            style_guide=style_guide,
            visual_style=visual_style,
            era=scene.get("era"),
            llm_client=llm_client
        )
        results.append({
            "scene_id": scene["id"],
            "prompt": video_prompt,
            "duration": scene.get("duration", 5),  # same default as the call above
            "negative_prompt": visual_style.get("negative", ""),
            "aspect_ratio": visual_style.get("aspect_ratio", "16:9")
        })
    
    return results
```

---

## 6. Complete settings.json Example

```json
{
  "channel_name": "History Documentary Channel",
  "style_guide": "Educational documentary tone, PBS/Netflix style, serious but accessible",
  
  "visual_style": {
    "prefix": "cinematic documentary style, warm color grading, 4K ultra-sharp, shallow depth of field, film-like texture",
    "negative": "text, watermark, logo, subtitles, cartoon, anime, blur, distortion, nsfw, CGI plastic look, overexposed",
    "aspect_ratio": "16:9",
    "camera_style": "professional documentary cinematography with deliberate movement",
    
    "mood_variants": {
      "dramatic": "high contrast, deep shadows, desaturated highlights, intense",
      "hopeful": "soft golden hour light, warm tones, gentle lens flare, uplifting",
      "tense": "cold blue tint, tight framing, shallow focus, unsettling",
      "sad": "muted cool tones, overcast lighting, slow motion implication",
      "neutral": "balanced exposure, natural color, medium contrast, observational"
    },
    
    "era_overrides": {
      "historical_pre1950": "black and white, heavy grain, high contrast, vignette edges, classic photography aesthetic",
      "historical_1960s80s": "warm faded film look, slight desaturation, 16mm grain, vintage color cast",
      "historical_1990s2000s": "slightly desaturated, VHS-adjacent sharpness, cooler tone",
      "modern": "clean digital, accurate colors, 4K sharpness, professional grade",
      "future": "cool blue teal ambient, high tech glow, clean precise lines"
    }
  },
  
  "kling_settings": {
    "model": "kling-v1-5",
    "duration": 5,
    "cfg_scale": 0.5,
    "mode": "std"
  }
}
```

---

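## 7. Caveats and Operating Guide

### 7.1 Kling AI-Specific Optimization Points

1. **Character consistency → Character DNA tags**  
   When a character appears in several scenes, paste the identical description tag into every one of them  
   Example: `"Alex (30s male, dark blue suit, thick-rimmed glasses, determined look)"`

2. **Stay within 200-250 characters**  
   Too short and quality becomes unstable; too long and the model ignores part of the prompt, so 200-250 characters is the sweet spot. The worked examples in section 4 run past 250 characters and would need trimming to meet this limit

3. **Negative prompt as a separate field**  
   The Kling API accepts the negative prompt in a separate field, so pass `visual_style.negative` through as-is

4. **Continuity across scene transitions**  
   Have the opening of the next scene suggest that it picks up from where the previous scene's camera ended  
   Example: if the previous scene ends on a close-up, open the next one with a pull back or a wide shot

5. **Reuse the same seed (stronger consistency)**  
   Where possible, apply the same seed value to every scene in the channel  
   → Set a fixed value for the Kling API `seed` parameter, if your API version exposes one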
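
Points 3 and 5 both concern how a generated prompt is packaged for the video API. The sketch below shows one way to combine a single result from `generate_all_scene_prompts()` with the `kling_settings` block. The payload key names are assumptions for illustration and should be checked against the actual Kling API or SDK in use; `build_kling_request` is a hypothetical helper.

```python
from typing import Optional


def build_kling_request(scene_result: dict, kling_settings: dict,
                        seed: Optional[int] = None) -> dict:
    """Combine one generated scene prompt with channel-level settings.

    The key names below are illustrative assumptions; map them to whatever
    the real API or SDK expects.
    """
    payload = {
        "prompt": scene_result["prompt"],
        # The negative prompt travels in its own field, not inside the prompt text
        "negative_prompt": scene_result.get("negative_prompt", ""),
        "aspect_ratio": scene_result.get("aspect_ratio", "16:9"),
        "duration": scene_result.get("duration", kling_settings.get("duration", 5)),
        "model": kling_settings.get("model"),
        "cfg_scale": kling_settings.get("cfg_scale"),
        "mode": kling_settings.get("mode"),
    }
    if seed is not None:
        payload["seed"] = seed  # same fixed value for every scene in the channel
    return payload


result = {"scene_id": "scene_01", "prompt": "cinematic documentary style, 4K, ...",
          "negative_prompt": "text, watermark, logo", "duration": 5}
settings = {"model": "kling-v1-5", "duration": 5, "cfg_scale": 0.5, "mode": "std"}
print(build_kling_request(result, settings, seed=1234))
```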

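### 7.2 Avoiding Common Mistakes

| Mistake | Result | Fix |
|------|------|--------|
| Using only abstract descriptions | The model interprets freely → video differs from the intent | Convert to concrete objects, actions, and places |
| Omitting the style prefix | A different look in every scene → consistency breaks | Enforce the prefix on every prompt |
| Requesting text elements | Garbled text appears on screen | Include "text, subtitle" in the negative prompt |
| Describing only the emotion | "A sad scene" → ambiguous interpretation | Translate the emotion into physical elements |
| Too many elements | The model blends conflicting elements | Limit each scene to 3 core elements |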

---

## 8. Quick Start: Channel Setup Checklist

```
□ Add a visual_style block to settings.json
□ prefix: 3-5 keywords for the channel's core style (cinematic, color grade, quality)
□ negative: list of elements to exclude (text, watermark, logo are mandatory)
□ mood_variants: define the 3-5 emotions the channel uses most
□ era_overrides: for historical content, define a style per era
□ LLM hookup: use the SCENE_TO_VIDEO_PROMPT template
□ Batch processing: wire in generate_all_scene_prompts()
□ Kling API: separate negative_prompt field, fixed seed value
```

---

*This document is a complete version that can be used directly in a real pipeline.*  
*Changing only `visual_style.prefix` in `settings.json` to match your channel's style adapts it for general use.*
