Skip to content

Expert Cross-Domain (AI-102) - Q&A

This document contains expert-level, scenario-driven questions with direct answers and exam clarifications spanning all AI-102 domains (GenAI, Search/RAG, Vision, Language/Speech, Document Intelligence, security, monitoring, and ops).


Section 1: Architecture, Security, and Operations (Expert)

Q1.1: You must call Azure AI services from an Azure Function without storing secrets. What’s the best authentication approach?

Answer: Use a managed identity for the Function + Azure AD (Entra ID) authentication to the AI resource (RBAC).

Clarifications (exam traps):

  • Keys are easiest but violate “no secrets” and are harder to rotate safely.
  • If the service supports Azure AD auth, prefer it with managed identity; otherwise store keys in Key Vault and use managed identity to retrieve them.

Q1.2: Your security team requires that AI service traffic never traverses the public internet. What should you implement?

Answer: Private Endpoint (Azure Private Link) + disable (or restrict) public network access on the AI resource.

Clarifications (exam traps):

  • “Selected networks” still uses public endpoints; Private Endpoint is the strict isolation requirement.
  • Don’t forget Private DNS configuration so the service hostname resolves to the private IP.

Q1.3: Requests to an Azure AI service intermittently fail with HTTP 429. What’s the correct client-side behavior?

Answer: Implement retry with exponential backoff + jitter, and respect the Retry-After header.

Clarifications (exam traps):

  • Do not retry 400-series validation errors (e.g., 400/401/403) as “transient.”
  • 429 often indicates rate limit; scaling out clients without backoff can worsen the issue.

Q1.4: You need a single place to enforce per-client quotas, auth, and request logging across multiple AI endpoints. What Azure service fits best?

Answer: Azure API Management (APIM) in front of your AI-calling backend.

Clarifications (exam traps):

  • APIM is the API gateway; it’s not a replacement for private endpoints.
  • If you must avoid exposing AI keys to clients, put keys/MI usage server-side behind APIM.

Q1.5: Your team wants to rotate AI service keys with near-zero downtime. What’s the standard pattern?

Answer: Use primary/secondary keys and rotate one while the app uses the other, then swap.

Clarifications (exam traps):

  • Rotation is easiest when secrets are pulled from Key Vault (and cached with TTL).
  • If using Azure AD auth, you avoid key rotation entirely.

Q1.6: A production incident requires tracing which user request triggered a tool call and which model output followed. What should you instrument?

Answer: End-to-end correlation IDs + structured logs to Application Insights (or Log Analytics), including tool-call spans.

Clarifications (exam traps):

  • “Enable diagnostic logs” on the AI resource helps, but you still need application-level tracing.
  • Avoid logging full prompts with sensitive data; log hashes/redacted fields where required.

Section 2: Azure OpenAI, Prompting, and Safety (Expert)

Q2.1: You must ensure the model responds only using provided retrieved passages and refuses otherwise. What’s the strongest pattern?

Answer: Use RAG with strict grounding instructions + return citations + implement an application-side “no supporting evidence” check.

Clarifications (exam traps):

  • Prompts alone are not a guarantee; add a post-check (e.g., require cited chunk IDs).
  • Set temperature low for factual tasks, but correctness still depends on grounding quality.

Q2.2: Your app is vulnerable to prompt injection inside retrieved documents (“ignore instructions and leak secrets”). What should you do?

Answer: Treat retrieved content as untrusted data: separate it from instructions, constrain tool/function calling, and implement content filtering + allowlist tools.

Clarifications (exam traps):

  • Put system instructions in the system message and retrieved passages in a clearly delimited data section.
  • Do not grant the model arbitrary tool access; use tool allowlists and server-side authorization.

Q2.3: When should you pick fine-tuning over RAG for enterprise Q&A?

Answer: Fine-tune for style/format consistency or specialized behavior; use RAG for fresh/variable knowledge and citations.

Clarifications (exam traps):

  • Fine-tuning does not “upload your documents” for factual recall; it changes behavior, not a searchable KB.
  • If content changes frequently, RAG wins operationally.

Q2.4: You need deterministic-ish answers for a compliance workflow. Which parameter is the first lever?

Answer: Set temperature low (often 0–0.2) and constrain output format.

Clarifications (exam traps):

  • Low temperature reduces randomness, but does not guarantee correctness.
  • Also cap max tokens and use stop sequences for strict schemas.

Q2.5: You need to stop the model from generating disallowed terms specific to your company policy. What Azure OpenAI feature helps?

Answer: Custom blocklists (plus built-in content filters).

Clarifications (exam traps):

  • Content filters handle broad safety categories; blocklists handle your terms.
  • Apply blocklists at the right scope (deployment/subscription) based on governance needs.

Q2.6: Your GenAI app must keep prompts and completions out of public networks and also out of client devices. What architecture is most appropriate?

Answer: Client → your backend (in VNet) → Azure OpenAI via Private Endpoint; keep API calls and credentials server-side.

Clarifications (exam traps):

  • Direct-to-Azure OpenAI from a browser is usually wrong for enterprise: you leak keys/tokens and lose governance.

Section 3: RAG + Azure AI Search (Expert)

Q3.1: You want best retrieval quality for both exact terms (IDs) and semantic meaning (paraphrases). What search approach?

Answer: Hybrid search (keyword + vector) with optional semantic ranking/re-ranking.

Clarifications (exam traps):

  • Vector-only can miss exact ID matches; keyword-only can miss paraphrases.
  • If asked “best overall relevance,” hybrid is the safe exam pick.

Q3.2: Your retrieved chunks are often irrelevant because documents are long and multi-topic. What’s the best first fix?

Answer: Improve chunking strategy (smaller, semantically coherent chunks + overlap) and store good metadata.

Clarifications (exam traps):

  • Bigger context windows don’t fix bad retrieval; they amplify noise.
  • Add fields like title, section, page, and use them in filtering.

Q3.3: You need to filter retrieval to “only documents the user is allowed to see.” Where should this be enforced?

Answer: Enforce authorization server-side, and apply security as filters in search queries (plus storage-level access control).

Clarifications (exam traps):

  • Do not rely on “the model will not mention restricted docs.”
  • Use document-level metadata like acl and filter with it.

Q3.4: You want to store embeddings in Azure AI Search. What do you need to define in the index schema?

Answer: A vector field Collection(Edm.Single) with the correct dimensions for your embedding model.

Clarifications (exam traps):

  • If you change embedding models, dimensions may change → re-index.

Q3.5: You built RAG but answers still hallucinate. What’s the highest-leverage operational metric to add?

Answer: Track groundedness: require citations and measure “answer coverage by retrieved chunks.”

Clarifications (exam traps):

  • “Lower temperature” can reduce weirdness but doesn’t ensure grounding.
  • Add a “refuse when insufficient evidence” policy.

Section 4: Knowledge Mining + Document Intelligence (Expert)

Q4.1: You must extract structured fields from invoices with varying layouts. Which service is designed for this?

Answer: Azure AI Document Intelligence (prebuilt invoice model, or custom if needed).

Clarifications (exam traps):

  • Vision OCR extracts text; it doesn’t reliably map fields like InvoiceTotal.
  • Start with prebuilt models to reduce time-to-value.

Q4.2: Your pipeline needs: ingest PDFs from Blob, OCR + key phrases + entity extraction, then index into Azure AI Search automatically. What Search feature enables this?

Answer: Indexers + skillsets (cognitive skills) for knowledge enrichment.

Clarifications (exam traps):

  • “Indexer” moves content into Search; “skillset” enriches it.
  • You can enrich with built-in skills and custom skills if needed.

Q4.3: You need to support multilingual documents and normalize them into a single search language. What’s the right enrichment step?

Answer: Add language detection and translation (Language service skill) in the enrichment pipeline.

Clarifications (exam traps):

  • Don’t translate everything blindly; detect language first.
  • Store both original and translated fields if required for audit.

Section 5: NLP + Speech (Expert)

Q5.1: You need intent classification + entity extraction for a custom domain with continual updates. Which service fits?

Answer: Conversational Language Understanding (CLU) in Azure AI Language.

Clarifications (exam traps):

  • LUIS is legacy; CLU is the modern path.
  • For pure sentiment/key phrases/entities (no intents), use Text Analytics features instead.

Q5.2: A call center needs real-time transcription and speaker separation. Which Azure AI capability is relevant?

Answer: Azure AI Speech for Speech-to-Text with speaker diarization (when available for the scenario).

Clarifications (exam traps):

  • Translation is a different feature; diarization is about “who spoke when.”

Section 6: Computer Vision (Expert)

Q6.1: You need OCR that works well on multi-page PDFs and returns page/line structure. Which capability?

Answer: Azure AI Vision Read.

Clarifications (exam traps):

  • “Analyze Image” tagging is not OCR.
  • Document Intelligence can also do OCR, but Read is the direct OCR capability in Vision.

Q6.2: You need to detect a small set of custom product defects from images. What’s the right service?

Answer: Custom Vision (classification or object detection, depending on need).

Clarifications (exam traps):

  • If you need bounding boxes, it’s object detection.
  • For “is defective or not,” that’s classification.

Section 7: Cost + Performance (Expert)

Q7.1: Your Azure OpenAI bill is dominated by prompt tokens from repeatedly sending long policy text. What’s the best optimization?

Answer: Move static instructions into a compact system prompt template, reduce repeated context, and use RAG for only what’s needed.

Clarifications (exam traps):

  • Fine-tuning can reduce prompt size for style/format, but it’s not a drop-in replacement for changing knowledge.
  • Caching common responses and retrieved contexts is often the biggest win.

Q7.2: You need to reduce latency in RAG. Which change usually helps most first?

Answer: Reduce retrieval + prompt size: fewer chunks (top-k), better chunking, and avoid sending irrelevant context.

Clarifications (exam traps):

  • Bigger models often increase latency; pick the smallest model that meets quality.

Section 8: Exam-Style “Choose the Best Option” (Expert)

Q8.1: You must implement enterprise semantic search over SharePoint docs with enrichment and security trimming. What’s the most Azure-native stack?

Answer: Azure AI Search (indexers/skillsets) + your identity-aware middleware for ACL filtering.

Clarifications (exam traps):

  • Azure AI Search can index content, but you must implement/retain document-level ACL metadata and enforce it in queries.

Q8.2: A customer requires that model inputs are not used to train models. What’s the Azure OpenAI position?

Answer: Azure OpenAI is designed for enterprise; customer prompts/completions are not used to train the underlying foundation models.

Clarifications (exam traps):

  • The exam typically tests that Azure OpenAI offers enterprise privacy/controls compared to public endpoints.
  • Still implement your own data handling policies (logging, retention, PII redaction).

Section 9: Azure OpenAI Deployments, APIs, and Tooling (Expert)

Q9.1: Your code uses model: "gpt-4o" but Azure OpenAI returns “The model does not exist.” What is the most likely mistake?

Answer: You used the model name instead of the deployment name.

Clarifications (exam traps):

  • In Azure OpenAI, your application targets a deployment (your chosen name), not the raw model ID.
  • This is a common “works on OpenAI, fails on Azure OpenAI” pitfall.

Q9.2: You need to enforce that only your backend can call Azure OpenAI, never browsers/mobile apps. What’s the correct design?

Answer: Keep Azure OpenAI calls server-side only; expose a backend API and apply auth/authorization there.

Clarifications (exam traps):

  • Putting keys/tokens in a client app is almost always an exam “wrong answer.”
  • Use APIM if you need centralized auth, quotas, and analytics.

Q9.3: You need JSON-only outputs to feed an automated pipeline. What’s the correct approach?

Answer: Constrain output format + implement schema validation and a repair/retry loop.

Clarifications (exam traps):

  • Never assume the model will always return valid JSON.
  • The correct production answer includes validation, not just “prompt better.”

Q9.4: Your assistant can call tools, but you must ensure it never calls admin tools. What’s the key control?

Answer: A server-side allowlist of tools + authorization checks per call.

Clarifications (exam traps):

  • Tool definitions are not authorization; enforce permissions in code.
  • Keep tool outputs minimal and scoped to the request.

Q9.5: Users complain the assistant feels slow even though total completion time is acceptable. What feature improves perceived latency?

Answer: Streaming responses (time-to-first-token UX).

Clarifications (exam traps):

  • Streaming improves UX, but policy might require buffering for moderation in high-risk contexts.

Q9.6: You’re choosing between text-embedding-3-small and text-embedding-3-large for RAG. What’s the tradeoff?

Answer: -3-large usually improves retrieval quality but increases cost/storage/latency (bigger vectors).

Clarifications (exam traps):

  • If your embedding dimensions change, you must re-index your vector store.

Section 10: Azure AI Search Deep Dive (Expert)

Q10.1: You need relevance that understands meaning, but also respects business boosts (e.g., official docs). What combination fits?

Answer: Semantic ranking + scoring profiles.

Clarifications (exam traps):

  • Scoring profiles are deterministic boosts; semantic ranker improves meaning-based relevance.

Q10.2: Users search for “SSO” but content says “single sign-on”. What Search feature addresses this best?

Answer: A synonym map for the relevant field(s).

Clarifications (exam traps):

  • Synonyms help lexical matching; they don’t replace vector/hybrid retrieval.

Q10.3: Your queries match irrelevant partial words. What configuration is most relevant?

Answer: Use an appropriate analyzer (and optionally separate exact-match fields).

Clarifications (exam traps):

  • Field analyzers strongly affect tokenization and matching behavior.

Answer: An indexer (scheduled) connected to the Blob data source.

Clarifications (exam traps):

  • Indexers ingest; enrichment requires a skillset.

Q10.5: You need OCR for scanned PDFs during indexing. How is this typically implemented?

Answer: Add OCR as part of a skillset in the enrichment pipeline.

Clarifications (exam traps):

  • For complex forms/field extraction, Document Intelligence may be a better upstream step than generic OCR.

Q10.6: Vector results contain near-duplicate chunks from the same document. What’s a practical fix?

Answer: Deduplicate by chunk hashing and prefer diversity selection (e.g., MMR-style) or post-filtering by document/section.

Clarifications (exam traps):

  • Increasing topK often increases duplicates unless you diversify.

Section 11: Document Intelligence Deep Dive (Expert)

Answer: Try prebuilt models first; if they don’t fit, train a custom model with labeled samples.

Clarifications (exam traps):

  • Prebuilt models reduce time-to-value; custom models require representative labeled documents.

Q11.2: Your extraction returns low-confidence values for critical fields. What should production logic do?

Answer: Apply confidence thresholds and route low-confidence cases to human review or fallback logic.

Clarifications (exam traps):

  • The “right” answer usually includes a HITL path for business-critical extraction.

Q11.3: You have multiple templates for the same document type. What pattern helps?

Answer: Use a composed model (or multiple models selected by a classifier/rules).

Clarifications (exam traps):

  • Composition is often the cleanest approach when templates are materially different.

Section 12: Evaluation, Monitoring, and Responsible AI (Expert)

Q12.1: You want to measure whether answers are supported by sources over time. What do you track?

Answer: Groundedness/citation validity plus retrieval quality metrics (e.g., Recall@k).

Clarifications (exam traps):

  • Engagement metrics don’t replace groundedness for RAG correctness.

Q12.2: You need evidence the assistant resists jailbreak/prompt-injection attempts. What’s the best practice?

Answer: Maintain a red-team test set and run it continuously with policy controls (filters, blocklists, tool allowlists).

Clarifications (exam traps):

  • One-time testing is not sufficient; make it repeatable and measurable.

Q12.3: You need debugging logs but can’t store PII. What’s the standard compromise?

Answer: Redact/tokenize sensitive fields before logging and enforce strict retention/access controls.

Clarifications (exam traps):

  • “Log everything” is almost never correct; design for privacy and compliance.

Section 13: Reliability + Quotas (Expert)

Q13.1: You need resilience for AI calls under transient failures. What’s the correct pattern set?

Answer: Retry (with backoff) + circuit breaker + timeouts + fallback.

Clarifications (exam traps):

  • Retries alone can amplify failures; circuit breakers prevent retry storms.

Q13.2: You must handle quota exhaustion without taking down the app. What’s a sensible design?

Answer: Implement quota-aware degradation (queue/batch, reduce features, shift to smaller model) and alert before limits are hit.

Clarifications (exam traps):

  • “Request a quota increase” is not an incident response plan.

Section 3: Advanced Azure AI Search and RAG Patterns (Expert)

Answer: Use multilingual semantic search with the language field mapped correctly + multi-language analyzer for each language field + enable semantic ranking.

Clarifications (exam traps):

  • Semantic ranking works best when you configure language-specific analyzers (e.g., en.microsoft, fr.microsoft) for each field.
  • For truly multilingual scenarios, consider multiple indexes (one per language) or language detection + field routing.
  • Vector search with multilingual embeddings (e.g., text-embedding-ada-002 supports 100+ languages) can complement keyword search.
  • Set queryLanguage parameter in semantic configuration to match the dominant query language.

Best Practices:

  1. Use language-specific analyzers for better tokenization
  2. Implement language detection in your application layer
  3. Consider hybrid search (keyword + semantic + vector) for best results
  4. Test with representative multilingual queries

Documentation Links:


Q3.2: You need to implement a RAG system that can search across both structured data (SQL database) and unstructured documents. What's the architecture pattern?

Answer: Use Azure AI Search as the unified query layer with:

  1. Multiple data sources (SQL indexer + Blob/Document indexer)
  2. Custom skillsets to enrich and normalize data
  3. Unified index with both structured fields and vector embeddings
  4. Hybrid retrieval (keyword + semantic + vector) for comprehensive results

Clarifications (exam traps):

  • Don't query SQL and documents separately then merge—use AI Search's indexers to bring all data into a single searchable index.
  • For SQL data, use the Azure SQL indexer with change tracking or high-water mark for incremental updates.
  • Use field mappings and output field mappings in indexers to normalize schema differences.
  • Implement custom web API skill if complex SQL transformations are needed.

Architecture Components:

  1. Data Sources: Azure SQL Database, Blob Storage
  2. Indexers: SQL indexer, Blob indexer with Document Intelligence
  3. Skillset: Text extraction, entity recognition, vectorization
  4. Index: Unified schema with searchable fields and vectors
  5. RAG Application: Retrieve from index → augment prompt → generate response

Documentation Links:


Q3.3: Your organization requires audit logging of all user queries and model responses for compliance. What's the complete solution?

Answer: Implement multi-layer logging:

  1. Application Insights with custom telemetry for query/response pairs + user context
  2. Azure OpenAI diagnostic logs sent to Log Analytics workspace
  3. Azure AI Search query logs enabled
  4. Correlation IDs across all layers for end-to-end tracing
  5. Data retention policies configured per compliance requirements

Clarifications (exam traps):

  • Azure OpenAI diagnostic logs alone don't capture your application context (user ID, session ID, business metadata)—you need application-level logging.
  • Don't log PII/sensitive data in plain text—use hashing, redaction, or separate secure storage.
  • Set up Azure Monitor alerts for anomalous patterns (e.g., high error rates, unusual token usage).
  • Consider Azure Data Factory or Event Hubs for streaming logs to long-term storage (e.g., Azure Data Lake).

Log Analytics Queries for Compliance:

kusto
// Track all Azure OpenAI requests
AzureDiagnostics
| where ResourceProvider == "MICROSOFT.COGNITIVESERVICES"
| where Category == "RequestResponse"
| project TimeGenerated, OperationName, ResultSignature, DurationMs, properties_s
| order by TimeGenerated desc

// Correlate with Application Insights
union ApplicationInsights, AzureDiagnostics
| where CorrelationId == "<correlation-id>"

Documentation Links:


Section 4: Document Intelligence and Custom Models (Expert)

Q4.1: You need to extract data from custom invoices with varying layouts. Should you use Document Intelligence prebuilt invoice model or train a custom model?

Answer: Start with prebuilt invoice model and evaluate coverage. If accuracy is insufficient for your specific invoice variations, train a custom extraction model using Document Intelligence Studio.

Clarifications (exam traps):

  • Prebuilt invoice model handles common invoice formats well (line items, totals, vendor info) and supports 100+ languages.
  • Use custom extraction model when:
    • Invoices have unique layouts not covered by prebuilt model
    • You need to extract custom fields (e.g., project codes, department IDs)
    • Accuracy requirements exceed prebuilt model performance
  • For training custom models, you need at least 5 labeled examples (5-15 recommended).
  • Composed models let you combine multiple custom models for different invoice types in one endpoint.

Decision Tree:

  1. Test prebuilt invoice model → adequate? Use it.
  2. If inadequate → collect 5-15 sample invoices
  3. Label in Document Intelligence Studio
  4. Train custom extraction model
  5. If multiple invoice types → create composed model

Model Comparison:

FeaturePrebuilt InvoiceCustom Extraction
Training RequiredNoYes (5+ samples)
Layout FlexibilityCommon layoutsAny layout
Custom FieldsLimitedUnlimited
Language Support100+Training data language
Time to DeployImmediateHours to days

Documentation Links:


Q4.2: Your document processing pipeline needs to handle both scanned PDFs and native digital PDFs. What considerations affect your Document Intelligence implementation?

Answer: Implement adaptive processing based on document type:

  1. Use Read OCR for scanned PDFs (image-based)
  2. Use Layout model for native PDFs (preserve structure and tables)
  3. Implement automatic detection of document type using docType or analyze pixel patterns
  4. Configure quality assessment to route low-quality scans for manual review

Clarifications (exam traps):

  • Native PDFs with embedded text are faster and more accurate than OCR—detect them to optimize processing.
  • Read API handles both but optimizes differently based on content detection.
  • For hybrid PDFs (mix of text and scanned pages), Document Intelligence processes each page appropriately.
  • Table extraction works better on native PDFs than scanned documents with complex layouts.
  • Use confidence scores to identify pages needing human review.

Implementation Pattern:

python
from azure.ai.formrecognizer import DocumentAnalysisClient

# Analyze document
poller = client.begin_analyze_document("prebuilt-layout", document)
result = poller.result()

# Check confidence and handle accordingly
for page in result.pages:
    if any(line.confidence < 0.85 for line in page.lines):
        # Route to manual review queue
        send_to_manual_review(page)
    else:
        # Process automatically
        extract_and_store(page)

Quality Optimization Tips:

  1. Preprocess scanned images (deskew, denoise) before submission
  2. Use minimum 300 DPI for scanned documents
  3. Ensure color scans for documents with color-coded information
  4. Implement retry logic with quality enhancement for low-confidence results

Documentation Links:


Section 5: Multi-Modal AI and Vision (Expert)

Q5.1: You're building a retail analytics solution that needs to count people entering/exiting store zones and detect when shelves are empty. What Azure AI Vision capabilities should you use?

Answer: Use Azure AI Vision Spatial Analysis with:

  1. PersonCount operation for zone entry/exit counting
  2. PersonCrossingLine for directional tracking
  3. Custom Vision object detection model trained on shelf states (full/empty)
  4. Edge deployment using Azure Stack Edge or IoT Edge for real-time processing

Clarifications (exam traps):

  • Spatial Analysis requires specific hardware (NVIDIA GPU) and containerized deployment—it's not a simple API call.
  • PersonCount operates on video streams, not individual images—plan for continuous video processing.
  • For shelf monitoring, Azure AI Vision's standard analyze image API can detect objects, but a Custom Vision model trained on your specific shelf/product configurations gives better accuracy.
  • Consider privacy and compliance—Spatial Analysis can be configured for privacy mode (no facial recognition, only silhouettes).
  • Use Azure IoT Hub or Event Hubs for streaming telemetry data to analytics backend.

Architecture:

IP Cameras → Azure Stack Edge (Spatial Analysis container)
          → IoT Hub → Stream Analytics → Power BI Dashboard
          
Product Images → Custom Vision Model → Alert System (empty shelf detection)

Deployment Considerations:

  1. Hardware: NVIDIA GPU for Spatial Analysis (T4 or better)
  2. Container orchestration: IoT Edge runtime
  3. Network: Low latency for real-time processing
  4. Storage: Local buffer for video retention
  5. Privacy: Configure zone masking and privacy settings

Documentation Links:


Q5.2: Your application needs to extract text from images that may contain handwritten notes, printed documents, and mixed content. Which Azure AI service and API should you use?

Answer: Use Azure AI Vision Read API (v4.0 or later), which handles:

  • Printed text in 100+ languages
  • Handwritten text in multiple languages
  • Mixed content (printed + handwritten)
  • Multi-page documents via PDF support
  • Table structures and layout preservation

Clarifications (exam traps):

  • Don't confuse Read API (modern, unified OCR) with the deprecated OCR API (limited to single page, no handwriting support).
  • Read API is asynchronous—use the operation-location header to poll for results.
  • For form-specific extraction (invoices, receipts, IDs), use Document Intelligence instead—it's optimized for structured documents.
  • Read API v4.0 includes the Read OCR model which has improved accuracy over v3.x.
  • The API returns bounding polygons for each word/line, useful for highlighting or redaction.

API Comparison:

FeatureRead APIOCR API (Deprecated)Document Intelligence
Handwriting✅ Yes❌ No✅ Yes
Multi-page✅ Yes❌ No✅ Yes
Tables✅ Yes❌ No✅ Yes (enhanced)
Async✅ Yes❌ No✅ Yes
Forms/Fields❌ No❌ No✅ Yes
Use CaseGeneral OCRLegacy onlyStructured docs

Implementation Example:

python
from azure.ai.vision.imageanalysis import ImageAnalysisClient

client = ImageAnalysisClient(endpoint, credential)

# Start async read operation
poller = client.begin_analyze_image_from_url(
    image_url,
    features=["READ"]
)

# Poll for completion
result = poller.result()

# Extract text with confidence
for page in result.read_result.pages:
    for line in page.lines:
        if line.confidence > 0.9:
            print(f"Text: {line.text}, Confidence: {line.confidence}")

Documentation Links:


Section 6: Azure OpenAI Advanced Patterns (Expert)

Q6.1: Your application uses Azure OpenAI function calling, but users are experiencing slow response times when multiple tools need to be called sequentially. What optimization strategies should you implement?

Answer: Implement parallel function calling (GPT-4 and GPT-3.5-turbo support it) and tool call batching:

  1. Use the parallel tool calls feature—model returns multiple tool_calls in a single response
  2. Execute independent function calls concurrently using async programming
  3. Implement function call caching for repeated calls with same parameters
  4. Use streaming mode to show incremental progress to users
  5. Consider tool call delegation to a separate fast-executing service layer

Clarifications (exam traps):

  • Parallel function calling only works when calls are independent—if function B depends on function A's output, they must be sequential.
  • Not all models support parallel tool calls—check model capabilities documentation.
  • Streaming doesn't reduce latency but improves perceived performance by showing progress.
  • Implement timeouts for function execution—if a tool call hangs, the entire request is blocked.
  • Use Azure Durable Functions or Logic Apps for complex orchestration workflows.

Implementation Pattern:

python
import asyncio
from openai import AsyncAzureOpenAI

async def execute_tool_calls_parallel(tool_calls):
    tasks = [execute_function(call) for call in tool_calls]
    results = await asyncio.gather(*tasks)
    return results

# Handle response with multiple tool calls
response = await client.chat.completions.create(
    model="gpt-4",
    messages=messages,
    tools=tools,
    parallel_tool_calls=True  # Enable parallel execution
)

if response.choices[0].message.tool_calls:
    # Execute all tool calls in parallel
    results = await execute_tool_calls_parallel(
        response.choices[0].message.tool_calls
    )

Performance Optimization:

  1. Caching: Cache deterministic function results (e.g., database lookups)
  2. Batching: Combine similar API calls when possible
  3. Pre-fetching: Anticipate likely tool calls and prefetch data
  4. Circuit breakers: Fail fast on unavailable services
  5. Monitoring: Track tool call latency to identify bottlenecks

Documentation Links:


Q6.2: You need to implement content filtering that blocks not just Azure's default categories but also company-specific terms and topics. What's the complete filtering strategy?

Answer: Implement multi-layer content filtering:

  1. Enable Azure OpenAI Content Filters (hate, sexual, violence, self-harm) at appropriate severity levels
  2. Create custom blocklists for company-specific restricted terms
  3. Implement application-layer validation for business logic (e.g., competitive mentions, confidential projects)
  4. Use prompt shields to detect jailbreak attempts and prompt injection
  5. Add response validation to catch inadvertent policy violations in model outputs

Clarifications (exam traps):

  • Content filters operate on both input (prompts) and output (completions)—configure separately.
  • Custom blocklists support exact match and fuzzy match with wildcards.
  • Filters can be configured at deployment level or per-request basis (using content_filter_config parameter).
  • Annotated filtered mode returns filtered content with annotations rather than blocking—useful for moderation workflows.
  • Prompt shields are specifically designed for indirect attacks (e.g., malicious content in retrieved documents).

Configuration Strategy:

python
# Deployment-level content filter configuration
content_filter_config = {
    "hate": {"blocking": True, "severity": "medium"},
    "sexual": {"blocking": True, "severity": "medium"},
    "violence": {"blocking": True, "severity": "medium"},
    "self_harm": {"blocking": True, "severity": "medium"}
}

# Custom blocklist for company-specific terms
blocklist = [
    "competitor-name-*",  # Wildcard match
    "Project Confidential",  # Exact match
    "internal-code-name"
]

# Application-layer validation
def validate_content(text, user_context):
    # Check against custom rules
    if user_context.department == "Sales":
        # Additional restrictions for sales team
        if contains_pricing_info(text):
            return False, "Pricing information not allowed"
    
    # Check against blocklist
    for blocked_term in blocklist:
        if matches_pattern(text, blocked_term):
            return False, f"Blocked term detected: {blocked_term}"
    
    return True, None

Best Practices:

  1. Layered approach: Don't rely on a single filter type
  2. User feedback: Collect false positive/negative reports
  3. Regular updates: Review and update blocklists quarterly
  4. Monitoring: Track filter activation rates and patterns
  5. User education: Clear error messages explaining filter triggers

Documentation Links:


Section 7: Cost Optimization and Monitoring (Expert)

Q7.1: Your Azure OpenAI token consumption is exceeding budget. What are the most effective cost optimization strategies without degrading quality?

Answer: Implement comprehensive token optimization:

  1. Prompt engineering: Remove verbose instructions, use examples efficiently
  2. Response length limits: Set max_tokens based on actual needs
  3. Caching strategies: Cache responses for repeated queries
  4. Model selection: Use GPT-3.5-turbo for simpler tasks, GPT-4 only when needed
  5. Request batching: Combine multiple small requests when possible
  6. Token counting: Monitor and budget token usage per user/session
  7. Streaming: Stop generation early if sufficient answer is obtained

Clarifications (exam traps):

  • Both input and output tokens are billed—long system prompts add up quickly across many requests.
  • GPT-4 costs ~20-30x more than GPT-3.5-turbo—use routing logic to select appropriate model.
  • Fine-tuning can reduce prompt size by embedding instructions into the model (but has upfront cost).
  • Semantic caching can deduplicate similar queries—exact match caching only helps identical queries.
  • Provisioned throughput (PTU) may be cheaper for high-volume predictable workloads vs pay-per-token.

Cost Optimization Implementation:

python
import tiktoken

# Token counting for budgeting
def count_tokens(text, model="gpt-4"):
    encoding = tiktoken.encoding_for_model(model)
    return len(encoding.encode(text))

# Smart model routing
def select_model(query, complexity_threshold=100):
    complexity_score = assess_query_complexity(query)
    
    if complexity_score < complexity_threshold:
        return "gpt-35-turbo", 0.001  # $1 per 1M tokens
    else:
        return "gpt-4", 0.03  # $30 per 1M tokens

# Response caching with similarity
from sklearn.metrics.pairwise import cosine_similarity

cache = {}

def get_cached_or_call(query_embedding, query_text, similarity_threshold=0.95):
    # Check for similar cached queries
    for cached_emb, response in cache.items():
        similarity = cosine_similarity([query_embedding], [cached_emb])[0][0]
        if similarity > similarity_threshold:
            return response, True  # Cache hit
    
    # No cache hit, make API call
    response = call_openai(query_text)
    cache[query_embedding] = response
    return response, False

Monitoring and Budgeting:

  1. Azure Monitor: Set up alerts for token usage thresholds
  2. Cost Management: Configure budgets and automatic spending limits
  3. Per-user quotas: Implement application-level rate limiting
  4. Token analytics: Track token usage by feature, user type, time of day

Documentation Links:


Q7.2: You need to implement comprehensive observability for an Azure AI solution spanning OpenAI, AI Search, Document Intelligence, and custom APIs. What's the monitoring architecture?

Answer: Implement distributed tracing with correlation across all services:

  1. Application Insights as central telemetry hub
  2. Correlation IDs propagated through all service calls
  3. Custom metrics for business KPIs (e.g., answer relevance, user satisfaction)
  4. Distributed tracing using OpenTelemetry or Application Insights SDK
  5. Log Analytics workspaces for centralized log aggregation
  6. Dashboards and alerts in Azure Monitor for proactive issue detection

Clarifications (exam traps):

  • Each Azure AI service has diagnostic settings—enable them and route to the same Log Analytics workspace.
  • Correlation ID must be application-generated and passed through headers (e.g., x-correlation-id)—Azure doesn't automatically correlate cross-service calls.
  • Don't log sensitive data—use sampling, redaction, or separate audit logs for compliance data.
  • Use dependency tracking in Application Insights to visualize call chains across services.
  • Implement custom dimensions in telemetry to add business context (user type, feature, scenario).

Distributed Tracing Implementation:

python
from azure.monitor.opentelemetry import configure_azure_monitor
from opentelemetry import trace
from opentelemetry.trace import SpanKind
import uuid

# Configure OpenTelemetry with Application Insights
configure_azure_monitor(connection_string=app_insights_connection_string)
tracer = trace.get_tracer(__name__)

# Generate correlation ID for request
correlation_id = str(uuid.uuid4())

# Trace entire RAG pipeline
with tracer.start_as_current_span(
    "rag_pipeline",
    kind=SpanKind.SERVER,
    attributes={
        "correlation.id": correlation_id,
        "user.id": user_id,
        "query.type": query_type
    }
) as pipeline_span:
    
    # Trace search operation
    with tracer.start_as_current_span("search_documents") as search_span:
        results = search_client.search(
            query,
            headers={"x-correlation-id": correlation_id}
        )
        search_span.set_attribute("search.results.count", len(results))
    
    # Trace OpenAI call
    with tracer.start_as_current_span("openai_completion") as openai_span:
        response = openai_client.chat.completions.create(
            model="gpt-4",
            messages=messages,
            headers={"x-correlation-id": correlation_id}
        )
        openai_span.set_attribute("openai.tokens.total", response.usage.total_tokens)
        openai_span.set_attribute("openai.tokens.prompt", response.usage.prompt_tokens)
        openai_span.set_attribute("openai.tokens.completion", response.usage.completion_tokens)

Key Metrics to Monitor:

  1. Performance: Latency (p50, p95, p99), throughput, error rates
  2. Cost: Token usage, API calls, storage consumption
  3. Quality: User satisfaction scores, answer relevance, citation accuracy
  4. Reliability: Availability, retry rates, throttling events
  5. Security: Authentication failures, content filter activations, anomalous patterns

Alert Configuration:

kusto
// High error rate alert
requests
| where timestamp > ago(5m)
| summarize 
    total = count(),
    errors = countif(success == false)
| extend error_rate = errors * 100.0 / total
| where error_rate > 5  // Alert if error rate > 5%

// High token usage alert
dependencies
| where type == "OpenAI"
| extend tokens = toint(customDimensions.tokens_total)
| summarize total_tokens = sum(tokens) by bin(timestamp, 1h)
| where total_tokens > 1000000  // Alert if > 1M tokens/hour

Documentation Links:


Summary

This expert pack focuses on real architecture choices and common exam traps:

  • Private networking vs "selected networks"
  • Managed identity + Azure AD auth vs API keys
  • Azure OpenAI deployment naming vs model naming
  • Hybrid retrieval, chunking, semantic ranking, and groundedness
  • Document Intelligence vs OCR vs enrichment pipelines
  • CLU vs Text Analytics vs Speech features
  • Safety controls (filters + blocklists) plus tool allowlists and prompt-injection defenses
  • Multilingual search and RAG patterns
  • Structured + unstructured data integration
  • Compliance and audit logging across services
  • Document Intelligence model selection (prebuilt vs custom)
  • Multi-modal vision applications (Spatial Analysis, Custom Vision)
  • Read API vs OCR API vs Document Intelligence
  • Azure OpenAI function calling optimization
  • Multi-layer content filtering strategies
  • Cost optimization techniques (model routing, caching, token management)
  • Comprehensive observability with distributed tracing and correlation
  • Custom skills and knowledge store for AI Search
  • Container deployment and offline scenarios
  • Semantic ranking and vector search hybrid approaches
  • Azure Machine Learning model integration patterns
  • Advanced troubleshooting and performance optimization
  • Multi-region deployment and disaster recovery
  • Batch vs real-time processing patterns
  • Advanced prompt engineering and few-shot learning
  • Model fine-tuning and customization strategies
  • Cross-service orchestration patterns

Section 8: Knowledge Mining Advanced Patterns (Expert)

Q8.1: Your indexing pipeline needs to call a proprietary ML model to extract custom entities that aren't available in built-in skills. What's the implementation approach?

Answer: Implement a custom web API skill that:

  1. Hosts your ML model as an Azure Function or App Service endpoint
  2. Implements the custom skill interface (receives/returns JSON in expected format)
  3. Integrates into your skillset alongside built-in skills
  4. Returns enriched data that gets mapped to index fields

Clarifications (exam traps):

  • Custom skills must implement a specific REST contract—they receive a values array and return a matching values array with enrichments.
  • The endpoint must respond within 30 seconds (default timeout) or configure longer timeout in skillset.
  • Use managed identity for the indexer to authenticate to your custom skill endpoint if it's secured.
  • Custom skills are called synchronously during indexing—ensure they're optimized for performance.
  • For expensive operations, consider caching results in the custom skill or using incremental enrichment.

Implementation Pattern:

python
# Azure Function custom skill
import json
import azure.functions as func

def main(req: func.HttpRequest) -> func.HttpResponse:
    try:
        req_body = req.get_json()
        values = req_body.get('values', [])
        
        results = []
        for record in values:
            # Extract input data
            text = record['data'].get('text', '')
            
            # Call your custom ML model
            entities = extract_custom_entities(text)
            
            # Format response
            results.append({
                "recordId": record['recordId'],
                "data": {
                    "customEntities": entities
                },
                "errors": None,
                "warnings": None
            })
        
        return func.HttpResponse(
            json.dumps({"values": results}),
            mimetype="application/json"
        )
    except Exception as e:
        return func.HttpResponse(
            json.dumps({"error": str(e)}),
            status_code=500
        )

def extract_custom_entities(text):
    # Your custom ML logic here
    # Return list of entities
    return [
        {"entity": "CustomEntity1", "confidence": 0.95},
        {"entity": "CustomEntity2", "confidence": 0.87}
    ]

Skillset Configuration:

json
{
  "@odata.type": "#Microsoft.Skills.Custom.WebApiSkill",
  "name": "custom-entity-extraction",
  "description": "Custom entity extraction using proprietary ML model",
  "uri": "https://my-function-app.azurewebsites.net/api/CustomEntityExtractor",
  "httpMethod": "POST",
  "timeout": "PT90S",
  "batchSize": 10,
  "degreeOfParallelism": 5,
  "context": "/document",
  "inputs": [
    {
      "name": "text",
      "source": "/document/content"
    }
  ],
  "outputs": [
    {
      "name": "customEntities",
      "targetName": "customEntities"
    }
  ],
  "httpHeaders": {
    "x-custom-header": "value"
  }
}

Performance Considerations:

  1. Batch processing: Set batchSize to process multiple documents per call
  2. Parallelism: Use degreeOfParallelism for concurrent requests
  3. Caching: Implement result caching for repeated content
  4. Timeout: Adjust based on model inference time
  5. Retry logic: Custom skills should handle transient failures

Documentation Links:


Q8.2: You need to persist enriched data from your indexing pipeline to enable data science analysis and reporting beyond search. What Azure AI Search feature should you use?

Answer: Configure a knowledge store in your skillset to project enriched data to:

  1. Azure Table Storage for structured tabular data
  2. Azure Blob Storage for JSON documents and normalized objects
  3. Azure Blob Storage (file projection) for binary files like images

Clarifications (exam traps):

  • Knowledge store runs as part of indexer execution—enriched data is saved even if index population fails.
  • Projections define how enriched data is shaped and stored—you can create multiple projections from the same enrichment tree.
  • Object projections to Blob create one JSON file per document; table projections create normalized relational tables.
  • Unlike the search index, knowledge store is write-only from the indexer perspective—it's meant for external analytics tools.
  • Shaper skill is often needed to restructure enriched data before projection.
  • Knowledge store data persists independently—deleting the index doesn't delete knowledge store data.

Knowledge Store Configuration:

json
{
  "name": "my-skillset",
  "skills": [
    {
      "@odata.type": "#Microsoft.Skills.Util.ShaperSkill",
      "name": "shape-for-knowledge-store",
      "context": "/document",
      "inputs": [
        {
          "name": "id",
          "source": "/document/id"
        },
        {
          "name": "content",
          "source": "/document/content"
        },
        {
          "name": "keyPhrases",
          "source": "/document/keyPhrases"
        },
        {
          "name": "entities",
          "source": "/document/entities"
        },
        {
          "name": "sentiment",
          "source": "/document/sentiment"
        }
      ],
      "outputs": [
        {
          "name": "output",
          "targetName": "shapedDocument"
        }
      ]
    }
  ],
  "knowledgeStore": {
    "storageConnectionString": "DefaultEndpointsProtocol=https;AccountName=...",
    "projections": [
      {
        "tables": [
          {
            "tableName": "Documents",
            "generatedKeyName": "DocumentId",
            "source": "/document/shapedDocument"
          },
          {
            "tableName": "KeyPhrases",
            "generatedKeyName": "KeyPhraseId",
            "source": "/document/shapedDocument/keyPhrases/*"
          }
        ],
        "objects": [
          {
            "storageContainer": "enriched-docs",
            "source": "/document/shapedDocument"
          }
        ]
      }
    ]
  }
}

Use Cases:

  1. Power BI reporting on enriched metadata (entities, sentiment, key phrases)
  2. Data science analysis of document corpus
  3. Compliance archiving of enriched content
  4. Training data for custom ML models
  5. Downstream ETL pipelines feeding data warehouses

Projection Types:

TypeStorageUse CaseStructure
TableTable StorageRelational analytics, Power BINormalized rows
ObjectBlob (JSON)Full document analysisOne JSON per doc
FileBlob (Binary)Image extraction, attachmentsBinary files

Documentation Links:


Q8.3: Your AI Search indexer is failing on certain documents but you can't identify why. What debugging approach should you use?

Answer: Use debug sessions in Azure Portal to:

  1. Test skillset execution on specific documents
  2. Inspect enrichment tree at each skill step
  3. Modify skill configurations and re-run
  4. Identify which skill is causing failures
  5. Validate field mappings and output

Clarifications (exam traps):

  • Debug sessions create a snapshot of indexer state in Blob Storage—you need to configure a storage connection.
  • Sessions are temporary (expire after several hours) and consume storage—not meant for production debugging.
  • You can edit skills inline during a debug session and immediately see results without re-running the full indexer.
  • Debug sessions show the enrichment document structure—essential for understanding the /document context path.
  • Use debug sessions to validate custom skills before deploying to production.
  • Skillset cache must be enabled for incremental enrichment testing.

Debug Session Workflow:

plaintext
1. Azure Portal → AI Search → Indexers → Select failing indexer
2. Click "Debug session" → Select storage account
3. Choose a problematic document from the index
4. View enrichment tree visualization:
   - /document/content (original text)
   - /document/normalized_images/*/text (OCR results)
   - /document/keyPhrases (extracted phrases)
   - /document/entities (recognized entities)
   - etc.
5. Edit skill parameters inline:
   - Change entity categories
   - Adjust confidence thresholds
   - Modify language settings
6. Re-execute individual skills
7. Validate output field mappings
8. Save validated changes to skillset

Common Issues Identified:

  1. Missing context paths: Skill output doesn't match input expectations
  2. Null reference errors: Optional fields not handled gracefully
  3. Language mismatches: Document language doesn't match skill configuration
  4. Timeout issues: Skills taking too long on large documents
  5. Field mapping errors: Enriched data not mapped to correct index fields

Alternative Debugging Approaches:

  • Indexer execution history: Review error messages and warnings
  • Enable indexer logging: Send detailed logs to Application Insights
  • Test skills independently: Call REST APIs directly with sample data
  • Use postman/curl: Manually invoke custom skills to validate behavior

Best Practices:

  1. Start with simple documents to validate skillset
  2. Use debug sessions early in development
  3. Test each skill type (OCR, entity recognition, etc.) separately
  4. Validate field mappings before full indexing
  5. Monitor enrichment costs during debug sessions

Documentation Links:


Section 9: Container Deployment and Edge Scenarios (Expert)

Q9.1: Your solution requires running Azure AI services in an on-premises data center with intermittent internet connectivity. What deployment pattern should you use?

Answer: Deploy containerized AI services with:

  1. Pull Docker containers from Microsoft Container Registry (MCR)
  2. Deploy to on-premises Kubernetes or Docker environment
  3. Configure disconnected mode with commitment-based billing
  4. Implement periodic connection to Azure for metering and compliance
  5. Use Azure IoT Edge (optional) for orchestration and updates

Clarifications (exam traps):

  • Not all AI services support containers—check service-specific documentation (e.g., Speech, Vision, Language are available; Azure OpenAI is NOT containerized).
  • Containers require Azure-connected billing—you need to connect at minimum every 14-30 days for metering.
  • Disconnected containers use commitment-based pricing (not pay-per-transaction).
  • Container images must be pulled from Microsoft Container Registry—you can't customize or redistribute.
  • CPU/GPU requirements vary by service—Speech/Vision may need GPUs for acceptable performance.
  • Containers still communicate with Azure for billing telemetry—ensure firewall rules allow this.

Container Deployment Pattern:

bash
# Pull container from MCR
docker pull mcr.microsoft.com/azure-cognitive-services/textanalytics/language:latest

# Run container with billing configuration
docker run --rm -it \
  -p 5000:5000 \
  --memory 8g \
  --cpus 4 \
  mcr.microsoft.com/azure-cognitive-services/textanalytics/language:latest \
  Eula=accept \
  Billing=https://<resource-name>.cognitiveservices.azure.com/ \
  ApiKey=<your-api-key>

Kubernetes Deployment:

yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: language-service
spec:
  replicas: 2
  selector:
    matchLabels:
      app: language-service
  template:
    metadata:
      labels:
        app: language-service
    spec:
      containers:
      - name: language
        image: mcr.microsoft.com/azure-cognitive-services/textanalytics/language:latest
        ports:
        - containerPort: 5000
        env:
        - name: Eula
          value: "accept"
        - name: Billing
          valueFrom:
            secretKeyRef:
              name: cognitive-services-secret
              key: billing-endpoint
        - name: ApiKey
          valueFrom:
            secretKeyRef:
              name: cognitive-services-secret
              key: api-key
        resources:
          requests:
            memory: "8Gi"
            cpu: "4"
          limits:
            memory: "12Gi"
            cpu: "6"

Use Cases:

  1. Data sovereignty: Process sensitive data on-premises
  2. Low latency: Eliminate round-trip to cloud
  3. Offline scenarios: Manufacturing floors, remote locations
  4. Regulatory compliance: Healthcare, financial services
  5. Edge computing: IoT scenarios with local processing

Supported Services (Containers):

  • ✅ Computer Vision (Read OCR, Analyze, Spatial Analysis)
  • ✅ Face API
  • ✅ Language Service (sentiment, key phrases, NER, language detection)
  • ✅ Speech (speech-to-text, text-to-speech, translation)
  • ✅ Translator
  • ❌ Azure OpenAI (not available as container)
  • ❌ Custom Vision (training - only prediction available)

Documentation Links:


Q9.2: You're deploying Speech-to-Text containers on Azure IoT Edge devices in retail stores. How should you handle model updates and configuration management?

Answer: Implement centralized deployment using:

  1. Azure IoT Hub for device management and module deployment
  2. Container Registry (ACR) for hosting custom module images
  3. Module twin properties for runtime configuration
  4. Deployment manifests defining container versions and settings
  5. Automatic rollback on deployment failures

Clarifications (exam traps):

  • IoT Edge modules are Docker containers—you define them in deployment manifests.
  • Module twins enable cloud-to-device configuration without redeploying containers.
  • Speech containers need continuous billing connection to Azure—ensure IoT Edge devices have internet (even if intermittent).
  • GPU support on IoT Edge requires specific runtime configuration—not automatic.
  • Use layered deployments to apply different configurations to device groups (e.g., by store region).
  • edgeHub handles offline buffering if devices lose connectivity temporarily.

Deployment Manifest Example:

json
{
  "modulesContent": {
    "$edgeAgent": {
      "properties.desired": {
        "schemaVersion": "1.1",
        "runtime": {
          "type": "docker",
          "settings": {
            "minDockerVersion": "v1.25",
            "registryCredentials": {
              "myregistry": {
                "username": "$CONTAINER_REGISTRY_USERNAME",
                "password": "$CONTAINER_REGISTRY_PASSWORD",
                "address": "myregistry.azurecr.io"
              }
            }
          }
        },
        "systemModules": {
          "edgeAgent": {
            "type": "docker",
            "settings": {
              "image": "mcr.microsoft.com/azureiotedge-agent:1.4",
              "createOptions": {}
            }
          },
          "edgeHub": {
            "type": "docker",
            "status": "running",
            "restartPolicy": "always",
            "settings": {
              "image": "mcr.microsoft.com/azureiotedge-hub:1.4",
              "createOptions": {
                "HostConfig": {
                  "PortBindings": {
                    "443/tcp": [{"HostPort": "443"}],
                    "5671/tcp": [{"HostPort": "5671"}],
                    "8883/tcp": [{"HostPort": "8883"}]
                  }
                }
              }
            }
          }
        },
        "modules": {
          "speechToText": {
            "version": "1.0",
            "type": "docker",
            "status": "running",
            "restartPolicy": "always",
            "settings": {
              "image": "mcr.microsoft.com/azure-cognitive-services/speechservices/speech-to-text:latest",
              "createOptions": {
                "HostConfig": {
                  "PortBindings": {
                    "5000/tcp": [{"HostPort": "5000"}]
                  },
                  "Memory": 8589934592
                },
                "Env": [
                  "Eula=accept",
                  "Billing=$SPEECH_BILLING_ENDPOINT",
                  "ApiKey=$SPEECH_API_KEY"
                ]
              }
            }
          }
        }
      }
    },
    "$edgeHub": {
      "properties.desired": {
        "routes": {
          "speechToCloud": "FROM /messages/modules/speechToText/outputs/* INTO $upstream"
        },
        "storeAndForwardConfiguration": {
          "timeToLiveSecs": 7200
        }
      }
    },
    "speechToText": {
      "properties.desired": {
        "modelVersion": "2024-01-15",
        "language": "en-US",
        "enableLogging": true,
        "profanityOption": "Masked"
      }
    }
  }
}

Configuration Management:

python
# Update module twin from cloud
from azure.iot.hub import IoTHubRegistryManager

registry_manager = IoTHubRegistryManager(connection_string)

# Update module twin desired properties
twin_patch = {
    "properties": {
        "desired": {
            "modelVersion": "2024-02-01",
            "language": "en-US"
        }
    }
}

registry_manager.update_module_twin(
    device_id="store-001",
    module_id="speechToText",
    twin_patch=twin_patch,
    etag="*"
)

Best Practices:

  1. Version pinning: Use specific image tags, not latest
  2. Gradual rollout: Deploy to test devices first
  3. Health monitoring: Implement module health checks
  4. Offline tolerance: Configure store-and-forward for network outages
  5. Secure secrets: Use Azure Key Vault integration for credentials

Documentation Links:


Section 10: Hybrid Search and Vector Optimization (Expert)

Answer: Implement hybrid search with RRF (Reciprocal Rank Fusion) combining:

  1. Vector search for semantic similarity using embeddings
  2. Full-text search for keyword matching and exact phrases
  3. Semantic ranking as a re-ranker on top of hybrid results
  4. RRF to merge vector and keyword results with balanced scoring

Clarifications (exam traps):

  • Hybrid search ≠ semantic search—hybrid combines vector + keyword; semantic is a re-ranking feature.
  • RRF is the default fusion algorithm—it handles score normalization between vector and keyword results.
  • Vector search alone can miss exact matches (e.g., product codes, IDs)—always combine with keyword search for production.
  • Semantic ranking adds ~50-100ms latency but significantly improves relevance for natural language queries.
  • Set appropriate k (number of neighbors) for vector search—too high wastes compute, too low misses results.
  • Use HNSW algorithm for vector indexing (not flat/IVF) for best query performance.

Hybrid Search Implementation:

python
from azure.search.documents import SearchClient
from azure.search.documents.models import VectorizedQuery

# Generate query embedding
query_vector = get_embedding(user_query)  # Call your embedding model

# Create vectorized query
vector_query = VectorizedQuery(
    vector=query_vector,
    k_nearest_neighbors=50,  # Top 50 vector matches
    fields="contentVector"    # Vector field name
)

# Execute hybrid search
results = search_client.search(
    search_text=user_query,      # Keyword search
    vector_queries=[vector_query],  # Vector search
    select=["id", "content", "title"],
    query_type="semantic",       # Enable semantic ranking
    semantic_configuration_name="my-semantic-config",
    top=10                       # Final results after RRF + semantic ranking
)

for result in results:
    print(f"Score: {result['@search.score']}")
    print(f"Content: {result['content']}")
    print(f"Reranker Score: {result['@search.reranker_score']}")

Index Configuration for Hybrid Search:

json
{
  "name": "hybrid-index",
  "fields": [
    {
      "name": "id",
      "type": "Edm.String",
      "key": true
    },
    {
      "name": "content",
      "type": "Edm.String",
      "searchable": true,
      "analyzer": "en.microsoft"
    },
    {
      "name": "contentVector",
      "type": "Collection(Edm.Single)",
      "searchable": true,
      "dimensions": 1536,
      "vectorSearchProfile": "my-vector-profile"
    }
  ],
  "vectorSearch": {
    "algorithms": [
      {
        "name": "hnsw-algorithm",
        "kind": "hnsw",
        "hnswParameters": {
          "metric": "cosine",
          "m": 4,
          "efConstruction": 400,
          "efSearch": 500
        }
      }
    ],
    "profiles": [
      {
        "name": "my-vector-profile",
        "algorithm": "hnsw-algorithm"
      }
    ]
  },
  "semantic": {
    "configurations": [
      {
        "name": "my-semantic-config",
        "prioritizedFields": {
          "titleField": {
            "fieldName": "title"
          },
          "prioritizedContentFields": [
            {"fieldName": "content"}
          ],
          "prioritizedKeywordsFields": [
            {"fieldName": "keyPhrases"}
          ]
        }
      }
    ]
  }
}

Search Strategy Comparison:

StrategyPrecisionRecallLatencyUse Case
Keyword onlyHigh for exactLowLow (~10ms)Exact match, IDs
Vector onlyMediumHighMedium (~30ms)Semantic, paraphrase
Hybrid (RRF)HighHighMedium (~40ms)Balanced, production
Hybrid + SemanticVery HighHighHigh (~100ms)Best relevance

Performance Optimization:

  1. Cache embeddings: Don't regenerate for repeated queries
  2. Tune k parameter: Start with 50, adjust based on result quality
  3. HNSW parameters: Higher efSearch = better recall but slower
  4. Filter first: Apply filters before vector search when possible
  5. Batch indexing: Generate embeddings in batches during indexing

Documentation Links:


Q10.2: You're implementing vector search and notice queries are slow despite using HNSW algorithm. What tuning parameters should you adjust?

Answer: Optimize HNSW parameters based on your precision/performance trade-off:

  1. efSearch (query-time): Lower for speed (100-200), higher for recall (500-1000)
  2. m (index-time): Connections per node—higher = better recall but larger index
  3. efConstruction (index-time): Higher = better index quality but slower building
  4. Dimensions: Reduce embedding dimensions if possible (384 vs 1536)
  5. Filtering strategy: Apply filters after vector search (post-filtering) for better performance

Clarifications (exam traps):

  • efSearch is query-time parameter—you can tune it without reindexing.
  • m and efConstruction require re-indexing when changed—test carefully before production.
  • Higher efSearch improves recall but increases query latency linearly.
  • Filtering before vector search can hurt recall—prefer post-filtering or hybrid approaches.
  • Cosine vs Euclidean metric choice depends on your embedding model—use cosine for normalized embeddings.
  • Azure AI Search automatically normalizes vectors for cosine similarity.

Parameter Tuning Guide:

json
{
  "vectorSearch": {
    "algorithms": [
      {
        "name": "optimized-hnsw",
        "kind": "hnsw",
        "hnswParameters": {
          "metric": "cosine",
          "m": 4,                // Default 4, increase to 8-16 for better recall
          "efConstruction": 400,  // Default 400, increase to 800-1000 for better quality
          "efSearch": 500        // Default 500, adjust 100-1000 for speed/recall balance
        }
      }
    ]
  }
}

Performance Testing Matrix:

efSearchQuery TimeRecall@10Recommended For
10020ms85%Speed-critical apps
20030ms90%Balanced production
50050ms95%High precision needs
100090ms98%Maximum recall

Optimization Strategies:

python
# Strategy 1: Adaptive efSearch based on query type
def adaptive_search(query, query_type):
    if query_type == "exact_match":
        # Use keyword search only
        ef_search = 100
    elif query_type == "exploratory":
        # Higher recall for broad searches
        ef_search = 800
    else:
        # Balanced for typical queries
        ef_search = 500
    
    vector_query = VectorizedQuery(
        vector=get_embedding(query),
        k_nearest_neighbors=50,
        fields="contentVector",
        exhaustive=False  # Use HNSW, not brute force
    )
    
    return search_client.search(
        search_text=query,
        vector_queries=[vector_query],
        top=10
    )

# Strategy 2: Dimension reduction
# Use smaller embedding models when possible
# text-embedding-ada-002: 1536 dimensions
# text-embedding-3-small: 512 dimensions (faster, still good quality)

# Strategy 3: Pre-filtering optimization
# Instead of filtering in search query, create separate indexes
# for major categories to reduce search space

Index Size vs Performance:

Vector DimensionsIndex Size (1M docs)Query TimeRecommendation
384~1.5 GB15-25msBest for speed
768~3 GB25-40msGood balance
1536~6 GB40-70msMaximum quality

Monitoring and Tuning:

  1. Track query latency by percentile (p50, p95, p99)
  2. Measure recall@k using ground truth dataset
  3. A/B test parameter changes on production traffic
  4. Monitor index size growth over time
  5. Benchmark different embedding models

Documentation Links:


Section 11: Advanced Model Integration and Fine-Tuning (Expert)

Q11.1: Your organization wants to fine-tune GPT-3.5-turbo on proprietary customer service conversations. What's the end-to-end workflow in Azure OpenAI?

Answer: Follow the supervised fine-tuning workflow:

  1. Prepare training data in JSONL format (prompt-completion pairs)
  2. Validate data using Azure OpenAI validation tools
  3. Upload training file to Azure OpenAI
  4. Create fine-tuning job specifying model and hyperparameters
  5. Monitor training progress and validation loss
  6. Deploy fine-tuned model as a new deployment
  7. Evaluate and iterate on production traffic

Clarifications (exam traps):

  • Only specific models support fine-tuning—currently GPT-3.5-turbo, babbage-002, davinci-002 (NOT GPT-4 in most regions).
  • Training data needs minimum 10 examples, recommended 50-100 for meaningful improvements.
  • JSONL format is strict—each line must be valid JSON with messages array (not prompt/completion for chat models).
  • Fine-tuning creates a new model deployment—you're charged for hosting + inference separately.
  • Validation split is automatic (10-20%)—or you can provide a separate validation file.
  • Fine-tuned models don't support streaming initially—check current limitations.
  • Training can take minutes to hours depending on dataset size.

Training Data Format:

jsonl
{"messages": [{"role": "system", "content": "You are a customer service agent for Contoso Bank."}, {"role": "user", "content": "How do I reset my password?"}, {"role": "assistant", "content": "To reset your password: 1. Go to contoso.com/login 2. Click 'Forgot Password' 3. Enter your email 4. Follow the link sent to your email."}]}
{"messages": [{"role": "system", "content": "You are a customer service agent for Contoso Bank."}, {"role": "user", "content": "What are your business hours?"}, {"role": "assistant", "content": "Our customer service is available Monday-Friday 8 AM to 6 PM EST, and Saturday 9 AM to 3 PM EST."}]}
{"messages": [{"role": "system", "content": "You are a customer service agent for Contoso Bank."}, {"role": "user", "content": "How do I dispute a charge?"}, {"role": "assistant", "content": "To dispute a charge: 1. Log into your account 2. Navigate to Transactions 3. Select the transaction 4. Click 'Dispute' and provide details. Our fraud team will review within 48 hours."}]}

Fine-Tuning Workflow:

python
from openai import AzureOpenAI

client = AzureOpenAI(
    api_key=os.getenv("AZURE_OPENAI_KEY"),
    api_version="2024-02-15-preview",
    azure_endpoint=os.getenv("AZURE_OPENAI_ENDPOINT")
)

# 1. Upload training file
training_file = client.files.create(
    file=open("training_data.jsonl", "rb"),
    purpose="fine-tune"
)

# 2. Create fine-tuning job
fine_tune_job = client.fine_tuning.jobs.create(
    training_file=training_file.id,
    model="gpt-35-turbo",  # Base model
    hyperparameters={
        "n_epochs": 3,  # Number of training epochs
        "batch_size": 8,
        "learning_rate_multiplier": 0.1
    },
    suffix="contoso-cs-v1"  # Custom model name suffix
)

print(f"Fine-tuning job created: {fine_tune_job.id}")

# 3. Monitor progress
import time

while True:
    job_status = client.fine_tuning.jobs.retrieve(fine_tune_job.id)
    print(f"Status: {job_status.status}")
    
    if job_status.status in ["succeeded", "failed", "cancelled"]:
        break
    
    time.sleep(60)

# 4. Get fine-tuned model ID
fine_tuned_model = job_status.fine_tuned_model
print(f"Fine-tuned model: {fine_tuned_model}")

# 5. Create deployment (via Azure Portal or API)
# Deploy the fine_tuned_model ID as a new deployment

# 6. Use fine-tuned model
response = client.chat.completions.create(
    model="contoso-cs-v1-deployment",  # Your deployment name
    messages=[
        {"role": "system", "content": "You are a customer service agent for Contoso Bank."},
        {"role": "user", "content": "How do I transfer money internationally?"}
    ]
)

print(response.choices[0].message.content)

Hyperparameter Tuning:

ParameterDefaultDescriptionWhen to Adjust
n_epochs3Training iterationsIncrease if underfitting (to 5-10)
batch_sizeAutoExamples per batchIncrease for faster training
learning_rate_multiplier0.1Learning rate scaleDecrease if loss unstable

Data Quality Guidelines:

  1. Consistency: Use the same system message across all examples
  2. Diversity: Cover various user intents and phrasings
  3. Length: Keep responses concise and similar in style
  4. Quality over quantity: 50 high-quality examples > 500 poor ones
  5. Validation: Reserve 10-20% for validation set

Cost Considerations:

  • Training: Charged per 1K tokens in training data × epochs
  • Hosting: Monthly fee for fine-tuned model deployment
  • Inference: Same as base model pricing

Use Cases:

  1. Domain-specific responses (customer service, legal, medical)
  2. Tone and style adaptation (formal, casual, technical)
  3. Structured output (JSON, specific formats)
  4. Language or terminology (company-specific jargon)
  5. Reducing prompting overhead (embed instructions in model)

Documentation Links:


Summary (Updated)

This expert pack now covers:

  • Private networking vs "selected networks"
  • Managed identity + Azure AD auth vs API keys
  • Azure OpenAI deployment naming vs model naming
  • Hybrid retrieval, chunking, semantic ranking, and groundedness
  • Document Intelligence vs OCR vs enrichment pipelines
  • CLU vs Text Analytics vs Speech features
  • Safety controls (filters + blocklists) plus tool allowlists and prompt-injection defenses
  • Multilingual search and RAG patterns
  • Structured + unstructured data integration
  • Compliance and audit logging across services
  • Document Intelligence model selection (prebuilt vs custom)
  • Multi-modal vision applications (Spatial Analysis, Custom Vision)
  • Read API vs OCR API vs Document Intelligence
  • Azure OpenAI function calling optimization
  • Multi-layer content filtering strategies
  • Cost optimization techniques (model routing, caching, token management)
  • Comprehensive observability with distributed tracing and correlation
  • Custom skills and knowledge store for advanced knowledge mining
  • Debug sessions for skillset troubleshooting
  • Container deployment and disconnected scenarios
  • IoT Edge integration for edge AI
  • Hybrid search with RRF and semantic ranking
  • HNSW parameter tuning for vector search optimization
  • Fine-tuning workflows for GPT-3.5-turbo customization

Released under the MIT License.