LLM · RAG · Silicon · Knowledge Management
Using LLMs to Query a Decade of Silicon Failure Reports
We pointed a retrieval-augmented LLM at ten years of root-cause analysis reports and turned tribal knowledge into a searchable, queryable system.
Every semiconductor company has the same problem: tribal knowledge trapped in documents. Thousands of root-cause analysis reports, failure investigation summaries, and characterization results sitting in shared drives and wikis. When a new failure mode appears, engineers spend hours searching through old reports hoping to find something similar.
I built a system to fix this using Large Language Models and retrieval-augmented generation.
The Corpus
We had roughly 4,000 post-silicon investigation reports spanning ten years. Each report described a failure mode, the investigation steps, the root cause, and the corrective action. Some were well-structured; many were free-form notes with embedded screenshots and data tables.
Step one was ingestion. I wrote Python parsers that extracted structured text from each report format, preserved the metadata (product, date, author, failure category), and chunked the content at section boundaries rather than arbitrary token counts. Section-aware chunking was critical because splitting a report mid-analysis destroys the logical flow that makes the root cause understandable.
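Section-aware chunking can be sketched in a few lines. This is a minimal illustration, not the production parsers: it assumes reports use markdown-style `## ` headings and carries the report metadata onto every chunk so retrieval results can be filtered by product, date, or failure category.

```python
import re

def chunk_by_sections(report_text, metadata):
    """Split a report at section headings rather than fixed token counts.

    Assumed heading style: lines starting with "## " (illustrative only;
    the real system used per-format parsers). Each chunk keeps the
    report-level metadata alongside its section heading.
    """
    # Zero-width split: cut just before each heading, keep the heading.
    parts = re.split(r"(?m)^(?=## )", report_text)
    chunks = []
    for part in parts:
        part = part.strip()
        if not part:
            continue
        heading = part.splitlines()[0].lstrip("# ").strip()
        chunks.append({"section": heading, "text": part, **metadata})
    return chunks

report = """## Failure Mode
Timing margin collapse at 85C.
## Root Cause
Clock tree skew from metal variation.
"""
chunks = chunk_by_sections(report, {"product": "X", "year": 2021})
```

Splitting on headings keeps each root-cause narrative intact inside a single chunk, which is exactly what an arbitrary 512-token window destroys.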
The Retrieval Layer
Pure vector search was not enough. When an engineer asks "What caused the timing margin failure on Product X at 85C?", they need both semantic understanding (timing margin concepts) and exact matching (specific product name, temperature condition). We implemented hybrid retrieval: dense embeddings for conceptual similarity combined with keyword search for specific identifiers.
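One common way to combine the two rankings is reciprocal rank fusion; the sketch below assumes that approach (the post does not name the fusion method) and uses placeholder report IDs:

```python
def reciprocal_rank_fusion(ranked_lists, k=60):
    """Fuse several ranked lists of doc IDs into one ranking.

    Each document's score is the sum of 1/(k + rank) across the lists
    it appears in; k=60 is the conventional damping constant.
    """
    scores = {}
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

dense = ["r17", "r42", "r08"]    # embedding-similarity order
keyword = ["r42", "r99", "r17"]  # exact-match / BM25 order
fused = reciprocal_rank_fusion([dense, keyword])
```

A report that ranks well in both lists (here `r42`) floats to the top, while a report that only one retriever likes still survives into the candidate set for the reranker.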
A cross-encoder reranker on top of the initial retrieval results dramatically improved precision. Without it, the system would surface tangentially related reports. With it, the top 5 results were almost always directly relevant.
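The rerank step itself is just "re-score every (query, passage) pair and keep the best few." In production the scorer would be a cross-encoder model (e.g. a sentence-transformers `CrossEncoder`); to keep this sketch self-contained it uses a token-overlap stand-in, clearly not the real model:

```python
def rerank(query, candidates, score_fn, top_k=5):
    """Re-score candidate passages against the query and keep top_k.

    score_fn stands in for a cross-encoder; swap in a real model's
    predict call for actual use.
    """
    return sorted(candidates, key=lambda c: score_fn(query, c), reverse=True)[:top_k]

def overlap_score(query, passage):
    # Toy scorer: fraction of query tokens present in the passage.
    q = set(query.lower().split())
    p = set(passage.lower().split())
    return len(q & p) / max(len(q), 1)

hits = [
    "thermal throttling on Product Z",
    "timing margin failure at 85C on Product X",
    "package warpage analysis",
]
top = rerank("timing margin failure Product X", hits, overlap_score, top_k=2)
```

The key design point is that the cross-encoder sees the query and passage *together*, so it can judge direct relevance in a way that independent embeddings cannot; that is what pushes tangential hits out of the top five.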
How Engineers Use It
The interface is simple. Engineers type natural-language questions into a web UI. The system retrieves the most relevant report sections, synthesizes an answer, and cites the specific reports it drew from. Engineers can click through to read the full original reports for additional context.
Common queries look like:
- "What are the known root causes for yield drops on high-speed SerDes products?"
- "Has this leakage signature been seen before across any product family?"
- "What was the corrective action when Fab Y had oxide thickness variation in 2023?"
What used to take hours of manual searching now takes seconds. More importantly, it surfaces knowledge from engineers who may have left the company years ago. The tribal knowledge is no longer tribal.
We also use the LLM to auto-generate first-draft investigation reports from structured test data, cutting report preparation time by roughly 60%. Engineers review and edit rather than write from scratch.
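Report drafting works the same way in reverse: structured test data goes into a templated prompt, and the model emits a first draft for engineers to edit. The field names and section list below are illustrative assumptions, not the actual report schema:

```python
def draft_report_prompt(test_data):
    """Build a prompt that asks an LLM for a first-draft investigation
    report from structured test results. Field names are illustrative."""
    lines = "\n".join(f"- {k}: {v}" for k, v in test_data.items())
    return (
        "Draft a failure-investigation report with sections for "
        "Failure Mode, Investigation Steps, and Preliminary Root Cause, "
        "based only on the following data:\n" + lines
    )

prompt = draft_report_prompt({
    "product": "X",
    "test": "Vmin screen",
    "observation": "Vmin shift of +40mV at 85C on lot 7",
})
```

Constraining the draft to the supplied data keeps hallucinated specifics out; the engineer's review pass then adds judgment rather than boilerplate.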
Updated Feb 28, 2026