Practical Prompt Engineering: Getting Structured Output
1. Prologue — "Why did it return something broken when I asked for JSON?"
The first time you wire an LLM into production, you hit this quickly.
Prompt: "Analyze this review and return JSON.
sentiment must be one of: positive/negative/neutral."
LLM Response:
"Of course! Here is the analysis in JSON format:
```json
{
"sentiment": "POSITIVE", ← should be 'positive', not 'POSITIVE'
"score": "high", ← should be a number, not a string
"issues": null ← should be [] when empty
}
```
The overall sentiment is quite positive!" ← extra text after the JSON
JSON.parse() blows up in production. Field names come back different sometimes. Arrays arrive as null. This non-determinism is what makes devs hesitant to use LLMs for real production features (vs. quick demos).
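A common first-aid measure is a defensive parser that salvages JSON from a chatty response. The `extractJson` helper below is a hypothetical sketch, not a library function: it prefers a fenced block if one exists, otherwise grabs the outermost braces.

```typescript
// Hypothetical helper: salvage a JSON object from a chatty LLM response
// by stripping markdown fences and any surrounding prose.
function extractJson(raw: string): unknown {
  // Prefer the contents of a ```json ... ``` fence if one exists
  const fenced = raw.match(/```(?:json)?\s*([\s\S]*?)```/);
  const candidate = fenced ? fenced[1] : raw;
  // Otherwise take the outermost {...} span
  const start = candidate.indexOf('{');
  const end = candidate.lastIndexOf('}');
  if (start === -1 || end <= start) {
    throw new Error('No JSON object found in response');
  }
  return JSON.parse(candidate.slice(start, end + 1));
}

const messy =
  'Of course! Here is the analysis:\n```json\n{"sentiment": "positive"}\n```\nHope that helps!';
const parsed = extractJson(messy) as { sentiment: string };
console.log(parsed.sentiment); // "positive"
```

It rescues some responses, but it's triage, not a cure.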
This post is about fixing that.
2. Understanding the Role Structure
Modern LLM APIs organize conversations into three roles:
| Role | Purpose | Written by |
|---|---|---|
| system | Sets model behavior and persona | Developer |
| user | User input, questions, requests | User or developer |
| assistant | Previous model responses (used in few-shot) | Model or developer |
The Importance of System Prompts
The system prompt is the operating manual you give the LLM — "here's who you are, here's how you behave." Enforcing output format here dramatically improves consistency.
const systemPrompt = `
You are a sentiment analysis specialist for user reviews.
## Output Rules (MANDATORY)
- Output ONLY valid JSON. Include no other text whatsoever.
- Do NOT use JSON code blocks (\`\`\`json ... \`\`\`)
- Do NOT use markdown
- Do NOT add explanatory text
## JSON Schema
{
"sentiment": "positive" | "negative" | "neutral",
"confidence": number between 0.0 and 1.0,
"key_phrases": string[] (max 3 items),
"issues": string[] (empty array [] if none)
}
`;
This improves consistency, but still can't guarantee 100% compliance. The stronger methods are below.
3. Few-shot Prompting
Showing the model examples of what you want is far more effective than describing it.
const messages = [
{
role: "system" as const,
content: "You are a review sentiment analyst. Output JSON only."
},
// Example 1: positive case
{
role: "user" as const,
content: "Review: This product is amazing! Fast shipping and great quality."
},
{
role: "assistant" as const,
content: JSON.stringify({
sentiment: "positive",
confidence: 0.95,
key_phrases: ["amazing product", "fast shipping", "great quality"],
issues: []
})
},
// Example 2: negative case
{
role: "user" as const,
content: "Review: The packaging was terrible and the product arrived scratched."
},
{
role: "assistant" as const,
content: JSON.stringify({
sentiment: "negative",
confidence: 0.88,
key_phrases: ["terrible packaging", "product scratched"],
issues: ["poor packaging quality", "product damage"]
})
},
// Example 3: edge case — neutral
{
role: "user" as const,
content: "Review: It's okay I guess. Nothing special, nothing bad."
},
{
role: "assistant" as const,
content: JSON.stringify({
sentiment: "neutral",
confidence: 0.72,
key_phrases: ["okay", "nothing special"],
issues: []
})
},
// Actual input
{
role: "user" as const,
content: `Review: ${userReview}`
}
];
The key to good few-shot: diversity of examples. Only showing happy paths fails at edge cases. Include negative, neutral, and cases with issues.
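One way to keep that diversity honest is to store the examples as data and generate the message array from them; coverage of edge cases then becomes easy to audit. The `Example` type and `buildMessages` below are illustrative names, not SDK APIs.

```typescript
// Few-shot examples as data: add an edge case by adding a row, not by
// hand-editing a message array.
type Sentiment = 'positive' | 'negative' | 'neutral';

interface Example {
  review: string;
  label: { sentiment: Sentiment; confidence: number; key_phrases: string[]; issues: string[] };
}

const examples: Example[] = [
  { review: 'Amazing! Fast shipping.', label: { sentiment: 'positive', confidence: 0.95, key_phrases: ['amazing', 'fast shipping'], issues: [] } },
  { review: 'Arrived scratched.', label: { sentiment: 'negative', confidence: 0.88, key_phrases: ['scratched'], issues: ['product damage'] } },
  { review: "It's okay I guess.", label: { sentiment: 'neutral', confidence: 0.7, key_phrases: ['okay'], issues: [] } },
];

function buildMessages(review: string) {
  return [
    { role: 'system' as const, content: 'You are a review sentiment analyst. Output JSON only.' },
    // Each example becomes a user/assistant pair
    ...examples.flatMap((ex) => [
      { role: 'user' as const, content: `Review: ${ex.review}` },
      { role: 'assistant' as const, content: JSON.stringify(ex.label) },
    ]),
    { role: 'user' as const, content: `Review: ${review}` },
  ];
}

const messages = buildMessages('Great value for money.');
console.log(messages.length); // 8 (1 system + 3 user/assistant pairs + 1 final user)
```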
4. Chain-of-Thought — Give the Model Space to Think
CoT prompts the model to reason step-by-step before producing a final answer. For complex analysis or judgment tasks, accuracy improves substantially.
// Without CoT (simple classification)
const withoutCoT = `
Determine if this contract is valid.
Contract: "${contractText}"
Output: {"valid": boolean, "reason": string}
`;
// With CoT (step-by-step reasoning)
const withCoT = `
Analyze this contract's validity through these steps:
1. Parties: Are both parties clearly identified?
2. Purpose: Is the contract's purpose clear?
3. Obligations: Are rights and duties specified?
4. Legal requirements: Are signatures, dates, and legal formalities present?
5. Final judgment: Synthesize the above analysis.
Document your reasoning in the "analysis" field, then provide "valid" and "reason".
{
"analysis": {
"parties": "string",
"purpose": "string",
"obligations": "string",
"legal_requirements": "string"
},
"valid": boolean,
"reason": "string"
}
Contract: "${contractText}"
`;
The key: include the reasoning trace in the JSON output. This forces the model to actually work through the steps before concluding — and the trace stays in the response for debugging.
When Is CoT Needed?
| Task Type | CoT Necessity | Examples |
|---|---|---|
| Simple classification | Low | Sentiment labeling |
| Information extraction | Low | Entity extraction |
| Complex reasoning | High | Contract analysis, code review |
| Numerical calculation | High | Cost estimation, formula application |
| Multi-step judgment | High | Medical triage, legal review |
5. JSON Mode and Function Calling
Prompt engineering alone can't guarantee structured output 100% of the time. We need stronger mechanisms.
JSON Mode (OpenAI)
Pass response_format: { type: "json_object" } and the model is guaranteed to return syntactically valid JSON. (OpenAI also requires that the word "JSON" appear somewhere in your messages when this mode is on.)
import OpenAI from 'openai';
const openai = new OpenAI();
const response = await openai.chat.completions.create({
model: 'gpt-4o-mini',
response_format: { type: 'json_object' }, // JSON mode
messages: [
{
role: 'system',
content: 'Analyze sentiment and return JSON. sentiment must be one of: positive/negative/neutral.'
},
{
role: 'user',
content: `Review: ${userReview}`
}
]
});
// JSON.parse() won't throw on malformed JSON (though a response cut off by max_tokens can still break)
const result = JSON.parse(response.choices[0].message.content!);
Caveat: valid JSON is guaranteed, but the schema (field names, types, value constraints) is not.
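If you want a runtime check on the schema without pulling in a dependency, a hand-rolled type guard works; section 6 shows the more ergonomic Zod version. The `SentimentResult` shape and `isSentimentResult` below are a sketch matching the schema used throughout this post.

```typescript
// Dependency-free runtime guard for the shape JSON mode alone won't enforce.
interface SentimentResult {
  sentiment: 'positive' | 'negative' | 'neutral';
  confidence: number;
  key_phrases: string[];
  issues: string[];
}

function isSentimentResult(value: unknown): value is SentimentResult {
  if (typeof value !== 'object' || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    typeof v.sentiment === 'string' &&
    ['positive', 'negative', 'neutral'].includes(v.sentiment) &&
    typeof v.confidence === 'number' &&
    v.confidence >= 0 && v.confidence <= 1 &&
    Array.isArray(v.key_phrases) && v.key_phrases.every((p) => typeof p === 'string') &&
    Array.isArray(v.issues) && v.issues.every((i) => typeof i === 'string')
  );
}

const fromJsonMode: unknown = JSON.parse(
  '{"sentiment":"positive","confidence":0.9,"key_phrases":["great"],"issues":[]}'
);
if (!isSentimentResult(fromJsonMode)) throw new Error('Schema violation: retry or fall back');
console.log(fromJsonMode.sentiment); // safely typed from here on
```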
Structured Outputs (OpenAI)
Specify a JSON Schema explicitly and the model returns output matching that schema exactly.
const response = await openai.chat.completions.create({
model: 'gpt-4o-2024-08-06', // Structured Outputs supported model
response_format: {
type: 'json_schema',
json_schema: {
name: 'sentiment_analysis',
strict: true,
schema: {
type: 'object',
properties: {
sentiment: {
type: 'string',
enum: ['positive', 'negative', 'neutral'] // constrained values!
},
confidence: {
type: 'number',
minimum: 0,
maximum: 1
},
key_phrases: {
type: 'array',
items: { type: 'string' },
maxItems: 3
},
issues: {
type: 'array',
items: { type: 'string' }
}
},
required: ['sentiment', 'confidence', 'key_phrases', 'issues'],
additionalProperties: false
}
}
},
messages: [...]
});
Function Calling
Function Calling is marketed as "the model can call functions" — but the real power is that it's the strongest way to get structured output.
const response = await openai.chat.completions.create({
model: 'gpt-4o-mini',
tools: [
{
type: 'function',
function: {
name: 'analyze_sentiment',
description: 'Analyzes the sentiment of a review text',
parameters: {
type: 'object',
properties: {
sentiment: {
type: 'string',
enum: ['positive', 'negative', 'neutral'],
},
confidence: {
type: 'number',
description: 'Classification confidence (0.0 to 1.0)'
},
key_phrases: {
type: 'array',
items: { type: 'string' },
description: 'Key phrases (max 3)'
},
issues: {
type: 'array',
items: { type: 'string' },
description: 'List of identified issues'
}
},
required: ['sentiment', 'confidence', 'key_phrases', 'issues']
}
}
}
],
tool_choice: { type: 'function', function: { name: 'analyze_sentiment' } },
messages: [
{ role: 'system', content: 'You are a review sentiment analyst.' },
{ role: 'user', content: `Review: ${userReview}` }
]
});
const toolCall = response.choices[0].message.tool_calls?.[0];
if (toolCall) {
const result = JSON.parse(toolCall.function.arguments);
console.log(result.sentiment); // 'positive' | 'negative' | 'neutral' (a hard guarantee only with strict: true on the function definition)
}
6. Type-Safe Output with Zod + AI SDK
Combine the Vercel AI SDK (ai package) with Zod to get full TypeScript type safety.
import { openai } from '@ai-sdk/openai';
import { generateObject } from 'ai';
import { z } from 'zod';
// Define Zod schema
const SentimentSchema = z.object({
sentiment: z.enum(['positive', 'negative', 'neutral']),
confidence: z.number().min(0).max(1),
key_phrases: z.array(z.string()).max(3),
issues: z.array(z.string()),
});
// Type inference
type SentimentResult = z.infer<typeof SentimentSchema>;
// {
// sentiment: "positive" | "negative" | "neutral";
// confidence: number;
// key_phrases: string[];
// issues: string[];
// }
async function analyzeSentiment(review: string): Promise<SentimentResult> {
const { object } = await generateObject({
model: openai('gpt-4o-mini'),
schema: SentimentSchema,
prompt: `Analyze the sentiment of this review: "${review}"`,
system: 'You are a review sentiment analysis specialist.'
});
// object is auto-inferred as SentimentResult
// Zod validation runs automatically
return object;
}
const result = await analyzeSentiment("Great product, fast shipping!");
console.log(result.sentiment); // TypeScript knows this is 'positive' | 'negative' | 'neutral'
generateObject uses Function Calling or Structured Outputs internally, then runs Zod validation on top. Type errors get caught at compile time, not runtime.
Complex Nested Schemas
const ProductExtractionSchema = z.object({
products: z.array(z.object({
name: z.string(),
category: z.enum(['electronics', 'clothing', 'food', 'other']),
price: z.number().positive().optional(),
attributes: z.record(z.string()), // dynamic key-value
})),
total_count: z.number().int().nonnegative(),
confidence: z.number().min(0).max(1),
});
const { object } = await generateObject({
model: openai('gpt-4o'),
schema: ProductExtractionSchema,
prompt: `Extract product info from this shopping list: "${shoppingList}"`,
});
// object.products is Array<{name: string; category: ...}> — fully typed
7. Common Failure Patterns and Fixes
Failure 1: Model ignores format instructions
Symptom: Asked for JSON only, got explanatory text attached
Cause: Weak system prompt, ambiguous instructions
Fix:
1. Explicitly enumerate prohibited behaviors in system prompt
2. Show desired format with few-shot examples
3. Use JSON mode or Function Calling
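These fixes compose into a retry loop: call, validate, and on failure re-prompt with the error fed back. The sketch below is generic; `callModel` and `validate` are placeholders for your own client wrapper and parser, not real SDK functions.

```typescript
// Generic retry wrapper: call the model, validate the output, and re-prompt
// on failure so the next attempt can self-correct.
async function withRetry<T>(
  callModel: (extraInstruction: string) => Promise<string>,
  validate: (raw: string) => T | null,
  maxAttempts = 3,
): Promise<T> {
  let instruction = '';
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const raw = await callModel(instruction);
    const result = validate(raw);
    if (result !== null) return result;
    // Feed the failure back into the next attempt
    instruction = 'Previous response was not valid JSON matching the schema. Output ONLY the JSON object.';
  }
  throw new Error(`No valid response after ${maxAttempts} attempts`);
}

// Demo with a fake model that fails once, then complies:
let calls = 0;
const fakeModel = async (_extra: string) =>
  ++calls === 1 ? 'Sure! Here you go: {"ok": true}' : '{"ok": true}';
const parseJson = (raw: string) => {
  try { return JSON.parse(raw) as { ok: boolean }; } catch { return null; }
};
withRetry(fakeModel, parseJson).then((r) => console.log(r.ok, calls)); // true 2
```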
Failure 2: Inconsistent enum values
Symptom: "positive", "POSITIVE", "Positive", "pos" all appear
Cause: Possible values not clearly constrained in prompt
Fix:
- Use enum in JSON schema
- Use exactly the same values in few-shot examples
- Post-process to normalize (toLowerCase, etc.)
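The post-processing fix can be as simple as lowercasing plus an alias table. `normalizeSentiment` and its `ALIASES` map are illustrative; extend the aliases with whatever variants you actually see in your logs.

```typescript
// Normalizer for drifting enum values: fold case, map known aliases,
// and fail loudly on anything unrecognized.
const SENTIMENTS = ['positive', 'negative', 'neutral'] as const;
type Sentiment = (typeof SENTIMENTS)[number];

const ALIASES: Record<string, Sentiment> = { pos: 'positive', neg: 'negative', neu: 'neutral' };

function normalizeSentiment(raw: string): Sentiment {
  const lowered = raw.trim().toLowerCase();
  if ((SENTIMENTS as readonly string[]).includes(lowered)) return lowered as Sentiment;
  if (lowered in ALIASES) return ALIASES[lowered];
  throw new Error(`Unrecognized sentiment value: "${raw}"`);
}

console.log(normalizeSentiment('POSITIVE')); // "positive"
console.log(normalizeSentiment(' pos '));    // "positive"
```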
Failure 3: null vs empty array for optional fields
Symptom: issues is sometimes null, sometimes []
Cause: Model decides how to handle empty case on its own
Fix:
- Zod schema: z.array(z.string()).default([])
- Prompt: "Use empty array [] when there are no issues"
- Show empty array case explicitly in few-shot examples
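A dependency-free version of the same belt-and-braces idea: coerce nullish array fields to `[]` right after parsing, whatever the model emitted. `coerceArray` is a hypothetical helper.

```typescript
// Coerce null/undefined array fields to [] so downstream code can always
// iterate without null checks.
function coerceArray<T>(value: T[] | null | undefined): T[] {
  return value ?? [];
}

const fromModel = JSON.parse('{"sentiment":"positive","issues":null}');
const issues = coerceArray<string>(fromModel.issues);
console.log(issues.length); // 0
```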
Failure 4: Missing fields in nested objects
Symptom: Some fields are missing or renamed in nested objects
Cause: Deep nesting is hard for models to follow perfectly
Fix:
- Flatten the schema wherever possible
- Use nesting only when necessary
- Enumerate required fields explicitly
- Use Function Calling strict mode
Failure 5: Mixed number types
Symptom: Price comes as "50000" (string) or "50,000" (with comma)
Cause: Models tend to represent numbers as formatted text
Fix:
- Prompt: "Numbers must be integers or floats with no comma separators"
- Zod: z.number() (auto-validates type)
- Post-process: parseFloat(String(value).replace(/,/g, ''))
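The post-processing one-liner above, wrapped into a defensive helper (`parseNumeric` is our own name): accept real numbers and comma-formatted numeric strings, reject everything else.

```typescript
// Accepts a number, a plain numeric string, or a comma-grouped numeric
// string; throws on anything that isn't a usable number.
function parseNumeric(value: unknown): number {
  if (typeof value === 'number' && Number.isFinite(value)) return value;
  if (typeof value === 'string') {
    const cleaned = value.replace(/,/g, '').trim();
    const parsed = Number(cleaned); // Number() rejects trailing garbage, unlike parseFloat()
    if (cleaned !== '' && Number.isFinite(parsed)) return parsed;
  }
  throw new Error(`Not a usable number: ${JSON.stringify(value)}`);
}

console.log(parseNumeric(50000));    // 50000
console.log(parseNumeric('50,000')); // 50000
console.log(parseNumeric('49.99'));  // 49.99
```

Using `Number()` rather than `parseFloat()` is deliberate: `parseFloat('49.99abc')` silently returns 49.99, while `Number('49.99abc')` is NaN.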
8. Temperature and Top-p Tuning
For structured output, consistency matters more than creativity.
Temperature
Controls the "randomness" of model output.
| Temperature | Behavior | Best Use Case |
|---|---|---|
| 0.0 | Always picks most probable token | Structured output, classification, extraction |
| 0.3–0.5 | Slight variation | Summarization, Q&A |
| 0.7–1.0 | Creative | Writing, brainstorming |
| 1.0+ | Very creative / unstable | Experimental |
For structured output: use temperature = 0, or at most 0.1.
const { object } = await generateObject({
model: openai('gpt-4o-mini'),
schema: SentimentSchema,
temperature: 0, // maximum consistency
prompt: `Analyze: "${review}"`,
});
Top-p (Nucleus Sampling)
Top-p samples only from tokens whose cumulative probability reaches p. Similar effect to temperature but different mechanism. For structured output: leave top-p alone, just lower temperature. Changing both simultaneously is rarely necessary and makes behavior harder to predict.
9. Prompt Version Control
Prompts are code. Version control them.
// prompts/sentiment-analysis/v1.ts
export const SENTIMENT_ANALYSIS_PROMPT = {
version: '1.0.0',
system: `You are a review sentiment analyst...`,
description: 'Initial version — basic sentiment classification',
createdAt: '2026-01-01',
};
// prompts/sentiment-analysis/v2.ts
export const SENTIMENT_ANALYSIS_PROMPT_V2 = {
version: '2.0.0',
system: `You are a review sentiment analyst.
You also analyze sentiment intensity...`,
description: 'v2 — added intensity field',
createdAt: '2026-03-01',
};
Maintain a test suite for prompt performance:
const testCases = [
{
input: "This is an amazing product!",
expected: { sentiment: "positive" },
},
{
input: "The packaging was terrible.",
expected: { sentiment: "negative" },
},
{
input: "It's just okay, nothing special.",
expected: { sentiment: "neutral" },
},
];
describe('Sentiment Analysis Prompt', () => {
it.each(testCases)('correctly classifies "$input"', async ({ input, expected }) => {
const result = await analyzeSentiment(input);
expect(result.sentiment).toBe(expected.sentiment);
});
});
Run this test suite every time you modify a prompt to catch regressions.
10. Conclusion
Tiered strategy for getting structured output:
- Prompt level (quick start): Format specification in system prompt + few-shot examples
- JSON mode (guarantee valid JSON):
response_format: { type: "json_object" }
- Function Calling / Structured Outputs (guarantee schema): Recommended for production
- Zod + AI SDK (TypeScript type safety): Final form for TypeScript codebases
One principle to internalize: the more ambiguous the prompt, the more freely the model interprets it. When you need structured output, eliminate ambiguity ruthlessly and enforce the contract with technical mechanisms.
Prompt engineering isn't casting magic incantations; it's specification writing. Like a good product spec, the more precisely you document edge cases, expected formats, and prohibited behaviors, the more reliably the LLM behaves.