Skip to content

Azure OpenAI in Production (Expert)

Section O1: Deployments, APIs, and Common Pitfalls

QO1.1: Your code works with OpenAI but fails on Azure OpenAI because you passed model="gpt-4o". What’s the fix?

Answer: Use the deployment name you created in Azure OpenAI Studio.

Clarifications (exam traps):

  • Azure OpenAI routes calls by deployment, not raw model IDs.

QO1.2: You want to test a new model version without breaking production. What deployment strategy fits best?

Answer: Create a new deployment (or parallel deployment) and do canary/A-B routing in your app.

Clarifications (exam traps):

  • Don’t overwrite a production deployment name if you can’t roll back quickly.

QO1.3: You need to handle transient failures from the model endpoint. What status codes should trigger retries?

Answer: Retry on 429 and typical transient 5xx (500/502/503/504), with backoff and jitter.

Clarifications (exam traps):

  • Do not blindly retry 400-series validation errors.

Section O2: Output Control (JSON, Schemas, Tools)

QO2.1: You require strictly valid JSON output for downstream parsing. What production pattern is correct?

Answer: Prompt for JSON + validate against a schema + repair/retry on failure.

Clarifications (exam traps):

  • “Just set temperature to 0” is insufficient.

QO2.2: Your agent can call tools. How do you prevent it from calling unsafe tools?

Answer: Implement a server-side allowlist + authorization checks per tool call.

Clarifications (exam traps):

  • Tool schemas are not permission boundaries.

QO2.3: A tool returns a large payload (e.g., 200KB). What’s the best practice before sending it to the model?

Answer: Summarize/transform and send only the minimum necessary subset.

Clarifications (exam traps):

  • Large tool outputs increase token cost and can degrade model quality.

Section O3: Safety Controls (Filters, Blocklists, App Policies)

QO3.1: You need to block company-specific disallowed terms. What should you use?

Answer: Blocklists + application-side policy checks.

Clarifications (exam traps):

  • Built-in filters cover broad categories; blocklists cover your own terms.

QO3.2: You need to prevent prompt injection via user input like “ignore previous instructions.” What’s the correct stance?

Answer: Don’t rely on the model to “behave”; use instruction hierarchy, tool allowlists, and server-side authorization.

Clarifications (exam traps):

  • The correct answer includes architecture controls, not “stronger wording.”

QO3.3: You must redact PII from user prompts before model invocation. Where should this happen?

Answer: In your backend, before calling the model.

Clarifications (exam traps):

  • Don’t send PII to the model and hope to remove it afterward.

Section O4: Performance and Cost Engineering

QO4.1: Your app is slow but total tokens are modest. What’s the biggest UX win?

Answer: Streaming responses to reduce time-to-first-token.

Clarifications (exam traps):

  • Streaming improves perceived latency even if total time stays similar.

QO4.2: Your costs are dominated by repeating long static instructions. What’s the best fix?

Answer: Compress static instructions into a short system prompt and push dynamic knowledge into RAG.

Clarifications (exam traps):

  • Fine-tuning can reduce prompt size for style/format, but RAG handles changing knowledge.

QO4.3: You’re hitting token limits due to long conversations. What’s the standard mitigation?

Answer: Summarize older turns and keep only the relevant conversation state (plus key user preferences).

Clarifications (exam traps):

  • “Increase max tokens” doesn’t increase context window.

Section O5: Enterprise Integration Patterns

QO5.1: You need centralized auth, rate limiting, and request logging for model calls. What Azure service is designed for this?

Answer: Azure API Management.

Clarifications (exam traps):

  • APIM doesn’t replace VNet/Private Link requirements.

Answer: Managed identity + Azure AD auth (where supported) or MI to retrieve secrets from Key Vault.

Clarifications (exam traps):

  • Keys in app settings is not “secretless.”

Released under the MIT License.