AI / LLM Research

RAG-Based LLM Assistant for Operational Users

An offline-capable Retrieval-Augmented Generation system for domain-specific Q&A and decision support — engineered for reliability, data isolation, and bandwidth-constrained deployment where cloud LLMs aren't an option.

Role AI Engineer
Domain LLMs / NLP
Status Deployed

Overview

A Retrieval-Augmented Generation assistant that lets operational users query a curated corpus of domain documents in natural language. The entire stack — embeddings, vector store, and LLM inference — runs locally, with no external API calls, so it operates in air-gapped or low-bandwidth environments.

The Problem

Operational users often need to consult large volumes of technical documentation, SOPs, and manuals. Cloud-based LLMs are unreliable in bandwidth-constrained environments and unsuitable for sensitive documents, and general-purpose LLMs hallucinate when the answer isn't in their training data. A locally hosted RAG system addresses all of these issues: it grounds answers in an approved corpus, and no data ever leaves the network.

My Role & Contribution

  • Architected the end-to-end RAG pipeline from ingestion through retrieval to generation
  • Evaluated embedding models and rerankers for the domain corpus
  • Built the offline-deployable packaging and operator UI

Approach

  • Document ingestion pipeline that chunks, cleans, and embeds the domain corpus
  • Vector store for semantic retrieval plus a lightweight reranker for precision
  • Local LLM inference (quantized open-weights model) tuned for the deployment hardware
  • Prompt engineering for grounded, cited answers with explicit "I don't know" handling
  • Air-gapped packaging — all models, weights, and dependencies bundled for offline install
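The ingestion and retrieval steps above can be sketched in a few lines of plain Python. This is a toy illustration, not the production pipeline: the bag-of-words `embed` stands in for a Sentence-Transformers model, and the brute-force cosine scan stands in for a FAISS index; chunk sizes and function names are illustrative.

```python
import math
import re
from collections import Counter

def chunk(text, max_words=50, overlap=10):
    """Split a document into overlapping word-window chunks."""
    words = text.split()
    step = max_words - overlap
    return [" ".join(words[i:i + max_words])
            for i in range(0, max(len(words) - overlap, 1), step)]

def embed(text):
    """Toy bag-of-words 'embedding'; a real system would use a
    sentence-embedding model here."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, chunks, k=2):
    """Rank chunks by similarity to the query; a vector index (FAISS/Chroma)
    replaces this linear scan at scale, and a reranker would rescore the top hits."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]
```

The retrieved chunks are then passed to the local LLM as grounding context.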
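The grounded-answer prompt with explicit "I don't know" handling might look like the following. This is a hedged sketch of the pattern, not the deployed template; the wording and the numbered-citation convention are assumptions.

```python
def build_prompt(question, passages):
    """Assemble a grounded prompt: numbered source passages for citation,
    plus an explicit instruction to abstain when the answer is absent."""
    context = "\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    return (
        "Answer using ONLY the sources below. Cite sources as [n]. "
        "If the answer is not in the sources, reply exactly: I don't know.\n\n"
        f"Sources:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )
```

Keeping the abstention instruction adjacent to the sources, rather than in a distant system message, tends to make smaller quantized models follow it more reliably.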

Tech Stack

Python · PyTorch · Hugging Face Transformers · LangChain · FAISS / Chroma · Sentence-Transformers · llama.cpp / Ollama · FastAPI · Docker
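The air-gapped packaging could be approached with a Docker image that bakes in pre-downloaded wheels and model weights, so installation needs no network access. This is a hypothetical sketch; all paths and file names are illustrative.

```dockerfile
# Hypothetical air-gapped image: wheels and model weights are fetched on a
# connected machine beforehand, then copied in so the build is fully offline.
FROM python:3.11-slim

# Install Python dependencies from a local wheel cache, never from PyPI
COPY wheels/ /opt/wheels/
COPY requirements.txt /opt/
RUN pip install --no-index --find-links=/opt/wheels -r /opt/requirements.txt

# Bundle the quantized model weights into the image itself
COPY models/ /opt/models/

COPY app/ /opt/app/
WORKDIR /opt/app
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]
```

The resulting image can be exported with `docker save` and loaded on the target network with `docker load`, with no registry required.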

Results & Impact

  • Fully offline, air-gap-capable — no data ever leaves the deployment environment
  • Grounded answers with citations back to source documents
Note: Deployment details and the specific domain corpus are confidential. The case study describes the technical approach at a level appropriate for public sharing.