AI / LLM Research
RAG-Based LLM Assistant for Operational Users
An offline-capable Retrieval-Augmented Generation system for domain-specific Q&A and decision support — engineered for reliability, data isolation, and bandwidth-constrained deployment where cloud LLMs aren't an option.
Overview
A Retrieval-Augmented Generation assistant that lets operational users query a curated corpus of domain documents in natural language. The entire stack — embeddings, vector store, and LLM inference — runs locally, with no external API calls, so it operates in air-gapped or low-bandwidth environments.
The Problem
Operational users often need to consult large volumes of technical documentation, SOPs, and manuals. Cloud-based LLMs are unreliable in connectivity-constrained environments and unsuitable for sensitive documents, and general-purpose LLMs hallucinate when the answer isn't in their training data. A locally hosted RAG system addresses all three problems: it grounds answers in an approved corpus, and no data ever leaves the network.
My Role & Contribution
- Architected the end-to-end RAG pipeline from ingestion through retrieval to generation
- Evaluated embedding models and rerankers for the domain corpus
- Built the offline-deployable packaging and operator UI
Approach
- Document ingestion pipeline that cleans, chunks, and embeds the domain corpus
- Vector store for semantic retrieval plus a lightweight reranker for precision
- Local LLM inference (quantized open-weights model) tuned for the deployment hardware
- Prompt engineering for grounded, cited answers with explicit "I don't know" handling
- Air-gapped packaging — all models, weights, and dependencies bundled for offline install
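The retrieval steps above can be sketched end to end. This is a minimal, self-contained illustration, not the deployed code: a hashed bag-of-words vector stands in for the real sentence-transformer embeddings, a brute-force cosine search stands in for FAISS/Chroma, and the fixed-size word chunker ignores the sentence-boundary handling a production pipeline would need. All function names and parameters here are illustrative.

```python
import re
import math
import zlib

DIM = 256  # toy embedding width; the real system uses a sentence-transformer model


def embed(text: str) -> list[float]:
    """Toy stand-in for a sentence embedding: hashed bag-of-words, L2-normalized."""
    vec = [0.0] * DIM
    for token in re.findall(r"[a-z0-9]+", text.lower()):
        vec[zlib.crc32(token.encode()) % DIM] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def chunk(doc: str, size: int = 40) -> list[str]:
    """Fixed-size word chunks; real ingestion would respect sentence boundaries."""
    words = doc.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def build_index(docs: list[str]) -> list[tuple[str, list[float]]]:
    """Ingest: chunk every document and store (chunk_text, embedding) pairs."""
    return [(c, embed(c)) for d in docs for c in chunk(d)]


def retrieve(index: list[tuple[str, list[float]]], query: str, k: int = 3) -> list[str]:
    """Rank chunks by cosine similarity to the query (vectors are unit-norm,
    so the dot product is the cosine); a reranker would refine this top-k."""
    q = embed(query)
    ranked = sorted(index, key=lambda item: -sum(a * b for a, b in zip(q, item[1])))
    return [text for text, _ in ranked[:k]]


# Usage with a tiny made-up corpus:
index = build_index([
    "The pump must be primed before startup. See SOP 4.2.",
    "Filter replacement interval is 500 operating hours.",
])
top = retrieve(index, "filter replacement interval", k=1)
```

The retrieved chunks are then passed to the local LLM as context; swapping the toy `embed` for a real model and the list scan for a FAISS index changes nothing about the overall flow.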
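The grounded-answer step can also be sketched. The template wording, the `[n]` citation tags, and the exact refusal phrase below are assumptions for illustration; the deployed prompt is not public. The idea is twofold: instruct the model to answer only from numbered context and to refuse otherwise, then validate the response so that ungrounded answers are rejected before reaching the user.

```python
import re

# Assumed refusal phrase; the real system's wording may differ.
REFUSAL = "I don't know based on the provided documents."


def build_prompt(question: str, chunks: list[tuple[str, str]]) -> str:
    """Assemble a grounded prompt from (source_id, text) pairs retrieved upstream."""
    context = "\n".join(
        f"[{i + 1}] ({src}) {text}" for i, (src, text) in enumerate(chunks)
    )
    return (
        "Answer ONLY from the numbered context below. Cite sources as [n].\n"
        f"If the context does not contain the answer, reply exactly: {REFUSAL}\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )


def is_grounded(answer: str, n_sources: int) -> bool:
    """Accept an answer only if it refuses, or cites at least one in-range source."""
    if answer.strip() == REFUSAL:
        return True
    cited = {int(m) for m in re.findall(r"\[(\d+)\]", answer)}
    return bool(cited) and cited <= set(range(1, n_sources + 1))


# Usage with a hypothetical retrieved chunk:
prompt = build_prompt(
    "What is the filter replacement interval?",
    [("manual_7", "Filter replacement interval is 500 operating hours.")],
)
```

Validating citations after generation, rather than trusting the prompt alone, is what makes the "I don't know" handling enforceable rather than merely suggested.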
Tech Stack
Python
PyTorch
Hugging Face Transformers
LangChain
FAISS / Chroma
Sentence-Transformers
llama.cpp / Ollama
FastAPI
Docker
Results & Impact
- Fully offline, air-gap-capable — no data ever leaves the deployment environment
- Grounded answers with citations back to source documents
Note: Deployment details and the specific domain corpus are confidential. The case study describes the technical approach at a level appropriate for public sharing.
// TODO: add architecture diagram / screenshots