Azure OpenAI in Production (Expert)

Section O1: Deployments, APIs, and Common Pitfalls

QO1.1: Your code works with OpenAI but fails on Azure OpenAI because you passed `model="gpt-4o"`. What’s the fix?

Answer: Use the deployment name you created in Azure OpenAI Studio.

Clarifications (exam traps):

Azure OpenAI routes calls by deployment, not raw model IDs.

QO1.2: You want to test a new model version without breaking production. What deployment strategy fits best?

Answer: Create a new deployment (or parallel deployment) and do canary/A-B routing in your app.

Clarifications (exam traps):

Don’t overwrite a production deployment name if you can’t roll back quickly.

QO1.3: You need to handle transient failures from the model endpoint. What status codes should trigger retries?

Answer: Retry on 429 and typical transient 5xx (500/502/503/504), with backoff and jitter.

Clarifications (exam traps):

Do not blindly retry 400-series validation errors.

Section O2: Output Control (JSON, Schemas, Tools)

QO2.1: You require strictly valid JSON output for downstream parsing. What production pattern is correct?

Answer: Prompt for JSON + validate against a schema + repair/retry on failure.

Clarifications (exam traps):

“Just set temperature to 0” is insufficient.

QO2.2: Your agent can call tools. How do you prevent it from calling unsafe tools?

Answer: Implement a server-side allowlist + authorization checks per tool call.

Clarifications (exam traps):

Tool schemas are not permission boundaries.

QO2.3: A tool returns a large payload (e.g., 200KB). What’s the best practice before sending it to the model?

Answer: Summarize/transform and send only the minimum necessary subset.

Clarifications (exam traps):

Large tool outputs increase token cost and can degrade model quality.

Section O3: Safety Controls (Filters, Blocklists, App Policies)

QO3.1: You need to block company-specific disallowed terms. What should you use?

Answer: Blocklists + application-side policy checks.

Clarifications (exam traps):

Built-in filters cover broad categories; blocklists cover your own terms.

QO3.2: You need to prevent prompt injection via user input like “ignore previous instructions.” What’s the correct stance?

Answer: Don’t rely on the model to “behave”; use instruction hierarchy, tool allowlists, and server-side authorization.

Clarifications (exam traps):

The correct answer includes architecture controls, not “stronger wording.”

QO3.3: You must redact PII from user prompts before model invocation. Where should this happen?

Answer: In your backend, before calling the model.

Clarifications (exam traps):

Don’t send PII to the model and hope to remove it afterward.

Section O4: Performance and Cost Engineering

QO4.1: Your app is slow but total tokens are modest. What’s the biggest UX win?

Answer: Streaming responses to reduce time-to-first-token.

Clarifications (exam traps):

Streaming improves perceived latency even if total time stays similar.

QO4.2: Your costs are dominated by repeating long static instructions. What’s the best fix?

Answer: Compress static instructions into a short system prompt and push dynamic knowledge into RAG.

Clarifications (exam traps):

Fine-tuning can reduce prompt size for style/format, but RAG handles changing knowledge.

QO4.3: You’re hitting token limits due to long conversations. What’s the standard mitigation?

Answer: Summarize older turns and keep only the relevant conversation state (plus key user preferences).

Clarifications (exam traps):

“Increase max tokens” doesn’t increase context window.

Section O5: Enterprise Integration Patterns

QO5.1: You need centralized auth, rate limiting, and request logging for model calls. What Azure service is designed for this?

Answer: Azure API Management.

Clarifications (exam traps):

APIM doesn’t replace VNet/Private Link requirements.

QO5.2: You need secretless access from Azure Functions to Azure OpenAI. What’s the recommended approach?

Answer: Managed identity + Azure AD auth (where supported) or MI to retrieve secrets from Key Vault.

Clarifications (exam traps):

Keys in app settings is not “secretless.”

Azure OpenAI in Production (Expert) ​

Section O1: Deployments, APIs, and Common Pitfalls ​

QO1.1: Your code works with OpenAI but fails on Azure OpenAI because you passed model="gpt-4o". What’s the fix? ​

QO1.2: You want to test a new model version without breaking production. What deployment strategy fits best? ​

QO1.3: You need to handle transient failures from the model endpoint. What status codes should trigger retries? ​

Section O2: Output Control (JSON, Schemas, Tools) ​

QO2.1: You require strictly valid JSON output for downstream parsing. What production pattern is correct? ​

QO2.2: Your agent can call tools. How do you prevent it from calling unsafe tools? ​

QO2.3: A tool returns a large payload (e.g., 200KB). What’s the best practice before sending it to the model? ​

Section O3: Safety Controls (Filters, Blocklists, App Policies) ​

QO3.1: You need to block company-specific disallowed terms. What should you use? ​

QO3.2: You need to prevent prompt injection via user input like “ignore previous instructions.” What’s the correct stance? ​

QO3.3: You must redact PII from user prompts before model invocation. Where should this happen? ​

Section O4: Performance and Cost Engineering ​

QO4.1: Your app is slow but total tokens are modest. What’s the biggest UX win? ​

QO4.2: Your costs are dominated by repeating long static instructions. What’s the best fix? ​

QO4.3: You’re hitting token limits due to long conversations. What’s the standard mitigation? ​

Section O5: Enterprise Integration Patterns ​

QO5.1: You need centralized auth, rate limiting, and request logging for model calls. What Azure service is designed for this? ​

QO5.2: You need secretless access from Azure Functions to Azure OpenAI. What’s the recommended approach? ​

Azure OpenAI in Production (Expert)

Section O1: Deployments, APIs, and Common Pitfalls

QO1.1: Your code works with OpenAI but fails on Azure OpenAI because you passed `model="gpt-4o"`. What’s the fix?

QO1.2: You want to test a new model version without breaking production. What deployment strategy fits best?

QO1.3: You need to handle transient failures from the model endpoint. What status codes should trigger retries?

Section O2: Output Control (JSON, Schemas, Tools)

QO2.1: You require strictly valid JSON output for downstream parsing. What production pattern is correct?

QO2.2: Your agent can call tools. How do you prevent it from calling unsafe tools?

QO2.3: A tool returns a large payload (e.g., 200KB). What’s the best practice before sending it to the model?

Section O3: Safety Controls (Filters, Blocklists, App Policies)

QO3.1: You need to block company-specific disallowed terms. What should you use?

QO3.2: You need to prevent prompt injection via user input like “ignore previous instructions.” What’s the correct stance?

QO3.3: You must redact PII from user prompts before model invocation. Where should this happen?

Section O4: Performance and Cost Engineering

QO4.1: Your app is slow but total tokens are modest. What’s the biggest UX win?

QO4.2: Your costs are dominated by repeating long static instructions. What’s the best fix?

QO4.3: You’re hitting token limits due to long conversations. What’s the standard mitigation?

Section O5: Enterprise Integration Patterns

QO5.1: You need centralized auth, rate limiting, and request logging for model calls. What Azure service is designed for this?

QO5.2: You need secretless access from Azure Functions to Azure OpenAI. What’s the recommended approach?