AI / LLM Research
RAG-Based LLM Assistant for Operational Users
An offline-capable Retrieval-Augmented Generation system for domain-specific Q&A and decision support — engineered for reliability, data isolation, and bandwidth-constrained deployment where cloud LLMs aren't an option.
Overview
A Retrieval-Augmented Generation assistant that lets operational users query a curated corpus of domain documents in natural language. The entire stack — embeddings, vector store, and LLM inference — runs locally, with no external API calls, so it operates in air-gapped or low-bandwidth environments.
The Problem
Operational users often need to consult large volumes of technical documentation, SOPs, and manuals. Cloud-based LLMs are unreliable in connectivity-constrained environments and unsuitable for sensitive documents, and general-purpose LLMs hallucinate when the answer isn't in their training data. A locally hosted RAG system addresses all three problems: it grounds answers in an approved corpus, and no data ever leaves the network.
My Role & Contribution
- Architected the end-to-end RAG pipeline from ingestion through retrieval to generation
- Evaluated embedding models and rerankers for the domain corpus
- Built the offline-deployable packaging and operator UI
Approach
- Document ingestion pipeline that cleans, chunks, and embeds the domain corpus
- Vector store for semantic retrieval plus a lightweight reranker for precision
- Local LLM inference (quantized open-weights model) tuned for the deployment hardware
- Prompt engineering for grounded, cited answers with explicit "I don't know" handling
- Air-gapped packaging — all models, weights, and dependencies bundled for offline install
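The retrieval steps above can be sketched end to end. This is a minimal, self-contained illustration, not the deployed code: a hashed bag-of-words vector stands in for the real sentence-transformer embeddings, a brute-force cosine search stands in for FAISS/Chroma, and the fixed-size word chunker ignores the sentence-boundary handling a production pipeline would need. All function names and parameters here are illustrative.

```python
import re
import math
import zlib

DIM = 256  # toy embedding width; the real system uses a sentence-transformer model


def embed(text: str) -> list[float]:
    """Toy stand-in for a sentence embedding: hashed bag-of-words, L2-normalized."""
    vec = [0.0] * DIM
    for token in re.findall(r"[a-z0-9]+", text.lower()):
        vec[zlib.crc32(token.encode()) % DIM] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def chunk(doc: str, size: int = 40) -> list[str]:
    """Fixed-size word chunks; real ingestion would respect sentence boundaries."""
    words = doc.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def build_index(docs: list[str]) -> list[tuple[str, list[float]]]:
    """Ingest: chunk every document and store (chunk_text, embedding) pairs."""
    return [(c, embed(c)) for d in docs for c in chunk(d)]


def retrieve(index: list[tuple[str, list[float]]], query: str, k: int = 3) -> list[str]:
    """Rank chunks by cosine similarity to the query (vectors are unit-norm,
    so the dot product is the cosine); a reranker would refine this top-k."""
    q = embed(query)
    ranked = sorted(index, key=lambda item: -sum(a * b for a, b in zip(q, item[1])))
    return [text for text, _ in ranked[:k]]


# Usage with a tiny made-up corpus:
index = build_index([
    "The pump must be primed before startup. See SOP 4.2.",
    "Filter replacement interval is 500 operating hours.",
])
top = retrieve(index, "filter replacement interval", k=1)
```

The retrieved chunks are then passed to the local LLM as context; swapping the toy `embed` for a real model and the list scan for a FAISS index changes nothing about the overall flow.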
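The grounded-answer step can also be sketched. The template wording, the `[n]` citation tags, and the exact refusal phrase below are assumptions for illustration; the deployed prompt is not public. The idea is twofold: instruct the model to answer only from numbered context and to refuse otherwise, then validate the response so that ungrounded answers are rejected before reaching the user.

```python
import re

# Assumed refusal phrase; the real system's wording may differ.
REFUSAL = "I don't know based on the provided documents."


def build_prompt(question: str, chunks: list[tuple[str, str]]) -> str:
    """Assemble a grounded prompt from (source_id, text) pairs retrieved upstream."""
    context = "\n".join(
        f"[{i + 1}] ({src}) {text}" for i, (src, text) in enumerate(chunks)
    )
    return (
        "Answer ONLY from the numbered context below. Cite sources as [n].\n"
        f"If the context does not contain the answer, reply exactly: {REFUSAL}\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )


def is_grounded(answer: str, n_sources: int) -> bool:
    """Accept an answer only if it refuses, or cites at least one in-range source."""
    if answer.strip() == REFUSAL:
        return True
    cited = {int(m) for m in re.findall(r"\[(\d+)\]", answer)}
    return bool(cited) and cited <= set(range(1, n_sources + 1))


# Usage with a hypothetical retrieved chunk:
prompt = build_prompt(
    "What is the filter replacement interval?",
    [("manual_7", "Filter replacement interval is 500 operating hours.")],
)
```

Validating citations after generation, rather than trusting the prompt alone, is what makes the "I don't know" handling enforceable rather than merely suggested.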
Tech Stack
Python
PyTorch
Hugging Face Transformers
LangChain
FAISS / Chroma
Sentence-Transformers
llama.cpp / Ollama
FastAPI
Docker
Results & Impact
- Fully offline, air-gap-capable — no data ever leaves the deployment environment
- Grounded answers with citations back to source documents
Note: Deployment details and the specific domain corpus are confidential. The case study describes the technical approach at a level appropriate for public sharing.
// TODO: add architecture diagram / screenshots