Azure OpenAI in Production (Expert)
Section O1: Deployments, APIs, and Common Pitfalls
QO1.1: Your code works with OpenAI but fails on Azure OpenAI because you passed model="gpt-4o". What’s the fix?
Answer: Use the deployment name you created in Azure OpenAI Studio.
Clarifications (exam traps):
- Azure OpenAI routes calls by deployment, not raw model IDs.
QO1.2: You want to test a new model version without breaking production. What deployment strategy fits best?
Answer: Create a new deployment (or parallel deployment) and do canary/A-B routing in your app.
Clarifications (exam traps):
- Don’t overwrite a production deployment name if you can’t roll back quickly.
QO1.3: You need to handle transient failures from the model endpoint. What status codes should trigger retries?
Answer: Retry on 429 and typical transient 5xx (500/502/503/504), with backoff and jitter.
Clarifications (exam traps):
- Do not blindly retry 400-series validation errors.
Section O2: Output Control (JSON, Schemas, Tools)
QO2.1: You require strictly valid JSON output for downstream parsing. What production pattern is correct?
Answer: Prompt for JSON + validate against a schema + repair/retry on failure.
Clarifications (exam traps):
- “Just set temperature to 0” is insufficient.
QO2.2: Your agent can call tools. How do you prevent it from calling unsafe tools?
Answer: Implement a server-side allowlist + authorization checks per tool call.
Clarifications (exam traps):
- Tool schemas are not permission boundaries.
QO2.3: A tool returns a large payload (e.g., 200KB). What’s the best practice before sending it to the model?
Answer: Summarize/transform and send only the minimum necessary subset.
Clarifications (exam traps):
- Large tool outputs increase token cost and can degrade model quality.
Section O3: Safety Controls (Filters, Blocklists, App Policies)
QO3.1: You need to block company-specific disallowed terms. What should you use?
Answer: Blocklists + application-side policy checks.
Clarifications (exam traps):
- Built-in filters cover broad categories; blocklists cover your own terms.
QO3.2: You need to prevent prompt injection via user input like “ignore previous instructions.” What’s the correct stance?
Answer: Don’t rely on the model to “behave”; use instruction hierarchy, tool allowlists, and server-side authorization.
Clarifications (exam traps):
- The correct answer includes architecture controls, not “stronger wording.”
QO3.3: You must redact PII from user prompts before model invocation. Where should this happen?
Answer: In your backend, before calling the model.
Clarifications (exam traps):
- Don’t send PII to the model and hope to remove it afterward.
Section O4: Performance and Cost Engineering
QO4.1: Your app is slow but total tokens are modest. What’s the biggest UX win?
Answer: Streaming responses to reduce time-to-first-token.
Clarifications (exam traps):
- Streaming improves perceived latency even if total time stays similar.
QO4.2: Your costs are dominated by repeating long static instructions. What’s the best fix?
Answer: Compress static instructions into a short system prompt and push dynamic knowledge into RAG.
Clarifications (exam traps):
- Fine-tuning can reduce prompt size for style/format, but RAG handles changing knowledge.
QO4.3: You’re hitting token limits due to long conversations. What’s the standard mitigation?
Answer: Summarize older turns and keep only the relevant conversation state (plus key user preferences).
Clarifications (exam traps):
- “Increase max tokens” doesn’t increase context window.
Section O5: Enterprise Integration Patterns
QO5.1: You need centralized auth, rate limiting, and request logging for model calls. What Azure service is designed for this?
Answer: Azure API Management.
Clarifications (exam traps):
- APIM doesn’t replace VNet/Private Link requirements.
QO5.2: You need secretless access from Azure Functions to Azure OpenAI. What’s the recommended approach?
Answer: Managed identity + Azure AD auth (where supported) or MI to retrieve secrets from Key Vault.
Clarifications (exam traps):
- Keys in app settings is not “secretless.”