Advanced RAG + Azure AI Search (Expert)
Section R1: Retrieval Quality (Hybrid, Rerank, Chunking)
QR1.1: For enterprise docs, why is hybrid retrieval usually the best baseline?
Answer: It combines keyword precision (BM25 on IDs and exact terms) with vector semantic recall (paraphrases); Azure AI Search fuses the two result sets with Reciprocal Rank Fusion (RRF).
Clarifications (exam traps):
- Vector-only often fails on exact identifiers.
- Keyword-only misses paraphrases and conceptual matches.
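A minimal sketch of a hybrid query with the azure-search-documents Python SDK (v11.4+); the endpoint, index name, contentVector field, and embed() helper are assumptions:

```python
from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient
from azure.search.documents.models import VectorizedQuery

# Assumed: an index "enterprise-docs" with a "contentVector" field and an
# embed() helper that returns the query embedding (hypothetical).
client = SearchClient("https://<service>.search.windows.net",
                      "enterprise-docs", AzureKeyCredential("<api-key>"))

def hybrid_search(query: str, embed, top: int = 10):
    vector_query = VectorizedQuery(
        vector=embed(query),          # semantic recall (paraphrases)
        k_nearest_neighbors=50,
        fields="contentVector",
    )
    # search_text drives BM25 keyword matching (exact IDs, error codes);
    # the keyword and vector result sets are fused server-side via RRF.
    return client.search(search_text=query,
                         vector_queries=[vector_query],
                         top=top)
```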
QR1.2: Your RAG answers are weak because retrieval returns broad, multi-topic chunks. What’s the first fix?
Answer: Improve chunking (smaller, semantically coherent chunks + overlap) and store useful metadata (title, section, sourceType) with each chunk.
Clarifications (exam traps):
- Bigger context windows don’t solve bad retrieval; they amplify noise.
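A minimal chunking sketch in plain Python (character-based, paragraph-aware, with overlap); the sizes are assumptions to tune per corpus:

```python
def chunk_text(text: str, max_chars: int = 1500, overlap: int = 200):
    """Split text into overlapping chunks, preferring paragraph boundaries."""
    chunks, start = [], 0
    while start < len(text):
        end = min(start + max_chars, len(text))
        if end < len(text):
            # Back up to the nearest paragraph break so chunks stay coherent.
            cut = text.rfind("\n\n", start, end)
            if cut > start:
                end = cut
        chunks.append(text[start:end].strip())
        if end >= len(text):
            break
        start = max(end - overlap, start + 1)  # overlap preserves context across boundaries
    return [c for c in chunks if c]

# At indexing time, store metadata alongside each chunk, e.g.:
# {"id": f"{doc_id}-{i}", "content": chunk, "title": title, "sourceType": "official"}
```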
QR1.3: When does a reranker/semantic ranker help the most?
Answer: When initial retrieval has decent recall but poor ordering; a reranker (e.g., the Azure AI Search semantic ranker, which re-scores the top results of the initial query) improves top-k relevance.
Clarifications (exam traps):
- Don’t rerank garbage; ensure basic retrieval is correct first.
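A sketch of enabling the semantic ranker on top of the hybrid query above; the semantic configuration name is an assumption and must already exist on the index:

```python
# Continues the hybrid sketch (same client, embed, and contentVector field).
def semantic_rerank_search(query: str, embed, top: int = 10):
    results = client.search(
        search_text=query,
        vector_queries=[VectorizedQuery(vector=embed(query),
                                        k_nearest_neighbors=50,
                                        fields="contentVector")],
        query_type="semantic",
        semantic_configuration_name="default-semantic",  # assumed config name
        top=top,
    )
    # Each hit carries "@search.reranker_score" from the semantic reranking pass.
    return list(results)
```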
QR1.4: You need “official docs only” for some questions and “all docs” for others. What’s the right mechanism?
Answer: Use metadata fields (e.g., sourceType) and apply filters based on the scenario.
Clarifications (exam traps):
- This is not a prompt problem; it’s a retrieval policy problem.
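A sketch of scenario-driven filtering via an OData filter; the sourceType field and the client/embed helpers from the earlier sketch are assumptions:

```python
def retrieve(query: str, official_only: bool, embed, top: int = 10):
    # The retrieval policy decides the filter; excluded docs never reach the prompt.
    flt = "sourceType eq 'official'" if official_only else None
    return client.search(
        search_text=query,
        vector_queries=[VectorizedQuery(vector=embed(query),
                                        k_nearest_neighbors=50,
                                        fields="contentVector")],
        filter=flt,
        top=top,
    )
```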
Section R2: Index Design (Vectors, Analyzers, Synonyms)
QR2.1: You changed embedding model dimensions after indexing. What must you do?
Answer: Rebuild the index (re-embed + re-index) with the new dimensions.
Clarifications (exam traps):
- Vector field dimensions are part of the schema contract.
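A sketch of the schema side: vector_search_dimensions is fixed on the field, so a new embedding size means a rebuilt (or new) index. Index, field, and profile names here are assumptions:

```python
from azure.search.documents.indexes.models import (
    HnswAlgorithmConfiguration, SearchField, SearchFieldDataType, SearchIndex,
    SearchableField, SimpleField, VectorSearch, VectorSearchProfile,
)

index = SearchIndex(
    name="enterprise-docs-v2",  # new index for the new embedding model
    fields=[
        SimpleField(name="id", type=SearchFieldDataType.String, key=True),
        SearchableField(name="content", type=SearchFieldDataType.String),
        SearchField(
            name="contentVector",
            type=SearchFieldDataType.Collection(SearchFieldDataType.Single),
            searchable=True,
            vector_search_dimensions=3072,  # must match the new model exactly
            vector_search_profile_name="hnsw-profile",
        ),
    ],
    vector_search=VectorSearch(
        algorithms=[HnswAlgorithmConfiguration(name="hnsw")],
        profiles=[VectorSearchProfile(name="hnsw-profile",
                                      algorithm_configuration_name="hnsw")],
    ),
)
# SearchIndexClient(...).create_or_update_index(index),
# then re-embed every chunk and re-index all documents.
```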
QR2.2: Users search “SSO” but docs say “single sign-on.” What improves keyword retrieval with minimal complexity?
Answer: A synonym map on relevant fields.
Clarifications (exam traps):
- Synonyms complement hybrid retrieval; they don’t replace embeddings.
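A sketch of creating a synonym map and attaching it to a field; the map name and rules are assumptions:

```python
from azure.core.credentials import AzureKeyCredential
from azure.search.documents.indexes import SearchIndexClient
from azure.search.documents.indexes.models import SynonymMap

index_client = SearchIndexClient("https://<service>.search.windows.net",
                                 AzureKeyCredential("<admin-key>"))

# Solr-format rules: comma-separated terms are treated as equivalent.
syn_map = SynonymMap(
    name="auth-synonyms",
    synonyms=[
        "SSO, single sign-on, single sign on",
        "MFA, multi-factor authentication",
    ],
)
index_client.create_or_update_synonym_map(syn_map)

# Then reference it from the searchable field(s) in the index definition, e.g.:
# SearchableField(name="content", type=SearchFieldDataType.String,
#                 synonym_map_names=["auth-synonyms"])
```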
QR2.3: Why use separate fields for exact-match identifiers (like product IDs)?
Answer: It prevents analyzers/tokenization from breaking exact matches and enables targeted boosts/filters.
Clarifications (exam traps):
- A single “content” field is often too blunt for enterprise search.
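A sketch of splitting full-text content from exact-match identifiers; the field names and the sample ID "AB-1234-X" are hypothetical:

```python
from azure.search.documents.indexes.models import (
    SearchFieldDataType, SearchableField, SimpleField,
)

fields = [
    SimpleField(name="id", type=SearchFieldDataType.String, key=True),
    # Full-text field: standard analyzer, synonyms, semantic ranking, etc.
    SearchableField(name="content", type=SearchFieldDataType.String),
    # Exact-match field: the keyword analyzer keeps "AB-1234-X" as one token;
    # filterable/facetable enables precise filters and targeted boosts.
    SearchableField(name="productId", type=SearchFieldDataType.String,
                    analyzer_name="keyword", filterable=True, facetable=True),
]
# Query-time: client.search(search_text='"AB-1234-X"', search_fields=["productId"])
# or filter="productId eq 'AB-1234-X'" for strict matching.
```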
Section R3: Grounding, Citations, and Refusal Logic
QR3.1: What’s the strongest way to reduce hallucinations in RAG?
Answer: Require answers to include citations, and refuse when evidence is insufficient.
Clarifications (exam traps):
- Temperature changes don’t enforce grounding.
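A hedged sketch of grounding plus refusal logic; the prompt wording, the [chunk_id] citation format, and the reranker-score threshold are assumptions to tune:

```python
GROUNDING_PROMPT = (
    "Answer ONLY from the sources below. Cite every claim as [chunk_id]. "
    "If the sources do not contain the answer, reply exactly: "
    "\"I can't answer that from the available documents.\""
)

def build_request(question: str, retrieved: list[dict], min_reranker_score: float = 1.5):
    # Refuse before calling the model if nothing relevant was retrieved.
    relevant = [d for d in retrieved
                if d.get("@search.reranker_score", 0) >= min_reranker_score]
    if not relevant:
        return None  # caller returns the refusal message directly
    sources = "\n\n".join(f"[{d['id']}] {d['content']}" for d in relevant)
    return [
        {"role": "system", "content": GROUNDING_PROMPT},
        {"role": "user", "content": f"Sources:\n{sources}\n\nQuestion: {question}"},
    ]
```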
QR3.2: Your model cites sources but sometimes cites irrelevant chunks. What’s the fix?
Answer: Tighten retrieval (better chunking, lower top-k, rerank) and validate citations against retrieved chunk IDs.
Clarifications (exam traps):
- “Cite something” isn’t enough; enforce citation validity.
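A minimal citation-validity check, assuming the [chunk_id] citation format from the sketch above:

```python
import re

def validate_citations(answer: str, retrieved_ids: set[str]) -> tuple[bool, set[str]]:
    """Check that every [chunk_id] cited in the answer was actually retrieved."""
    cited = set(re.findall(r"\[([^\]]+)\]", answer))
    invalid = cited - retrieved_ids
    # Fail closed: no citations at all, or citations to unknown chunks, both fail.
    return (bool(cited) and not invalid), invalid

# ok, bad = validate_citations(answer, {d["id"] for d in relevant})
# if not ok: regenerate with tighter retrieval, or fall back to the refusal message.
```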
QR3.3: You need document-level access control (ACLs) for RAG. Where must it be enforced?
Answer: Enforce server-side via search filters (ACL metadata) and storage security.
Clarifications (exam traps):
- Never rely on the model to not reveal restricted content.
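A sketch of server-side ACL trimming with an OData filter; the allowedGroups field (populated at ingestion from source-system ACLs) and the client/embed helpers are assumptions:

```python
def secure_search(query: str, user_group_ids: list[str], embed, top: int = 10):
    # Only documents whose allowedGroups intersect the caller's groups are searched.
    groups = ",".join(user_group_ids)
    acl_filter = f"allowedGroups/any(g: search.in(g, '{groups}', ','))"
    return client.search(
        search_text=query,
        vector_queries=[VectorizedQuery(vector=embed(query),
                                        k_nearest_neighbors=50,
                                        fields="contentVector")],
        filter=acl_filter,
        top=top,
    )
```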
Section R4: Ingestion + Enrichment
QR4.1: You want to ingest PDFs from Blob Storage and enrich them into the search index automatically. What Azure AI Search concepts enable this?
Answer: Data sources + indexers + skillsets.
Clarifications (exam traps):
- Indexers ingest; skillsets enrich.
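A sketch wiring the three pieces together with the SDK; the connection string, container, index, and skillset names are assumptions:

```python
from azure.core.credentials import AzureKeyCredential
from azure.search.documents.indexes import SearchIndexerClient
from azure.search.documents.indexes.models import (
    SearchIndexer, SearchIndexerDataContainer, SearchIndexerDataSourceConnection,
)

indexer_client = SearchIndexerClient("https://<service>.search.windows.net",
                                     AzureKeyCredential("<admin-key>"))

# 1. Data source: where the PDFs live.
data_source = SearchIndexerDataSourceConnection(
    name="docs-blob-ds", type="azureblob",
    connection_string="<blob-connection-string>",
    container=SearchIndexerDataContainer(name="docs"),
)
indexer_client.create_or_update_data_source_connection(data_source)

# 2. Indexer: pulls from the data source, runs the skillset, writes to the index.
indexer = SearchIndexer(
    name="docs-indexer",
    data_source_name="docs-blob-ds",
    target_index_name="enterprise-docs",
    skillset_name="docs-skillset",  # 3. Skillset: enrichment (see OCR sketch below)
)
indexer_client.create_or_update_indexer(indexer)
```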
QR4.2: Scanned PDFs need OCR at ingestion time. What’s the right enrichment approach?
Answer: Add OCR in the skillset (or preprocess with Vision/Document Intelligence and index results).
Clarifications (exam traps):
- If you need structured fields (invoices/receipts), Document Intelligence is often the better preprocessor.
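A sketch of an OCR skill in the skillset; names are assumptions, and the indexer must be configured to extract images (imageAction="generateNormalizedImages") for the skill to receive input:

```python
from azure.search.documents.indexes.models import (
    InputFieldMappingEntry, OcrSkill, OutputFieldMappingEntry, SearchIndexerSkillset,
)

# OCR runs over the images the indexer extracts from each document.
ocr = OcrSkill(
    name="ocr",
    context="/document/normalized_images/*",
    inputs=[InputFieldMappingEntry(name="image", source="/document/normalized_images/*")],
    outputs=[OutputFieldMappingEntry(name="text", target_name="ocrText")],
)

skillset = SearchIndexerSkillset(
    name="docs-skillset",
    description="OCR for scanned PDFs",
    skills=[ocr],
)
indexer_client.create_or_update_skillset(skillset)
# Map "/document/normalized_images/*/ocrText" (commonly merged with page text via a
# MergeSkill) into a searchable index field through the indexer's output field mappings.
```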
QR4.3: Your index is bloated with duplicate content across versions of documents. What’s the practical strategy?
Answer: Track document IDs/versions in metadata and enforce upserts (replace) rather than append.
Clarifications (exam traps):
- Dedup is an ingestion design decision, not an LLM prompt fix.
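A sketch of dedup-friendly ingestion: a stable key per (document, chunk) so re-ingestion upserts instead of appending; the field names and client are assumptions:

```python
import hashlib

def to_search_doc(doc_path: str, version: str, chunk_index: int, content: str) -> dict:
    # Stable key per (document, chunk): re-ingesting a new version overwrites
    # the old chunks instead of adding duplicates.
    raw_key = f"{doc_path}#{chunk_index}"
    return {
        "id": hashlib.sha256(raw_key.encode()).hexdigest(),
        "docPath": doc_path,
        "docVersion": version,
        "chunkIndex": chunk_index,
        "content": content,
    }

# merge_or_upload_documents upserts by key. Chunks left over from a shrunken
# document still need an explicit cleanup pass (query stale IDs, then delete_documents).
client.merge_or_upload_documents(
    documents=[to_search_doc("contracts/msa.pdf", "2024-06", 0, "...")]
)
```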