AI needs to work in the real world—when connectivity drops, power is limited, or networks are under attack. Federal agencies must operationalize artificial intelligence (AI) in environments where the stakes are high, the data is sensitive, and the infrastructure is constrained. MetroStar tackled this challenge head-on, deploying an advanced Retrieval-Augmented Generation (RAG) system for a classified U.S. government mission.
MetroStar’s Innovation Lab is redefining the way federal organizations operate in disconnected environments. Designed for air-gapped, low-power infrastructure, our RAG solution uses Llama3.1-8B-Instruct, a compact yet powerful Large Language Model (LLM) optimized for mission-driven tasks, running entirely on NVIDIA GPUs.
The system is tailored for federal agencies operating in secure enclaves with strict compute and security constraints, where public cloud services aren’t an option and transparency is non-negotiable. The result: a modular, rapidly adaptable solution that can integrate new models and workflows in under an hour, giving agencies a tactical advantage in dynamic environments.
We worked closely with our customers to understand their goals and to select the right technologies to make AI mission-ready for their organizational needs.
Model selection: After evaluating models like Mistral-7B, Falcon3-10B, and Qwen2.5-7B, MetroStar selected Llama3.1-8B-Instruct for its 128k token context window, enabling it to handle lengthy, complex documents far more effectively than shorter-context alternatives.
Flexible service-based architecture: Built for both on-premises and cloud environments, the solution allows teams to independently deploy, update, and scale components. Services like document ingestion, vector storage, and inference can be containerized and reconfigured based on mission needs.
Optimized for RAG: The system converts unstructured, inconsistent documents into structured vector embeddings. This enables fast, accurate retrieval and cross-referencing, even when answering compliance-heavy or entity-specific queries that traditional methods can’t handle.
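The ingest-embed-retrieve flow described above can be sketched in a few lines of Python. The hash-based "embedding" below is a toy stand-in for the real model-generated vectors, and the class and document text are illustrative, not part of the deployed system:

```python
import hashlib
import math

def embed(text: str, dim: int = 64) -> list[float]:
    """Toy stand-in for a real embedding model: hash words into a fixed-size vector."""
    vec = [0.0] * dim
    for word in text.lower().split():
        word = word.strip(".,?!")
        h = int(hashlib.md5(word.encode()).hexdigest(), 16)
        vec[h % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def cosine(a: list[float], b: list[float]) -> float:
    # Vectors are already unit-normalized, so the dot product is the cosine.
    return sum(x * y for x, y in zip(a, b))

class VectorStore:
    """Minimal in-memory vector store: ingest documents, retrieve by similarity."""
    def __init__(self) -> None:
        self.docs: list[tuple[str, list[float]]] = []

    def ingest(self, text: str) -> None:
        self.docs.append((text, embed(text)))

    def query(self, question: str, k: int = 1) -> list[str]:
        q = embed(question)
        ranked = sorted(self.docs, key=lambda d: cosine(q, d[1]), reverse=True)
        return [text for text, _ in ranked[:k]]

store = VectorStore()
store.ingest("Directive 41-B covers export compliance for dual-use hardware.")
store.ingest("The cafeteria menu rotates weekly.")
print(store.query("Which directive governs export compliance?")[0])
```

A production system would swap the toy `embed` for an LLM embedding model and the in-memory list for a dedicated vector database, but the retrieval logic is the same shape.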
The following graphic provides a high-level view of our Innovation Lab’s air-gapped RAG solution.
The solution components evolve with federal agency needs, including:
A central model store governs all deployed models, ensuring reproducibility and governance
Vector operations are handled by a dedicated service layer optimized for speed and precision
Developers can interact through a lightweight Python library or web APIs—lowering the technical barrier to innovation
A React-based UI supports intuitive mission-user workflows, while MLflow and Ray support scalable model orchestration behind the scenes
Built on Kubernetes, the system runs seamlessly across classified on-prem systems or cloud enclaves
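The central model store's governance role can be illustrated with a minimal sketch. In the deployed system this responsibility sits with MLflow; the class, method names, and metadata below are hypothetical stand-ins showing the pattern of versioned, hash-tracked model registration:

```python
import hashlib

class ModelStore:
    """Minimal sketch of a central model registry: every deployed model is
    tracked by name, version, and a content hash for reproducibility."""

    def __init__(self) -> None:
        self._registry: dict[str, list[dict]] = {}

    def register(self, name: str, weights: bytes, metadata: dict) -> int:
        """Record a new version of a model; returns the assigned version number."""
        versions = self._registry.setdefault(name, [])
        record = {
            "version": len(versions) + 1,
            "sha256": hashlib.sha256(weights).hexdigest(),
            "metadata": metadata,
        }
        versions.append(record)
        return record["version"]

    def latest(self, name: str) -> dict:
        """Fetch the most recently registered version of a model."""
        return self._registry[name][-1]

store = ModelStore()
v = store.register("llama3.1-8b-instruct", b"<model-weights>", {"context_window": 131072})
print(v, store.latest("llama3.1-8b-instruct")["sha256"][:8])
```

Hashing the artifact at registration time is what makes an air-gapped deployment auditable: the exact bytes running in the enclave can be traced back to a specific registered version.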
What sets this deployment apart is our Innovation Lab’s strategic use of NVIDIA GPUs across the entire AI lifecycle. Models are fine-tuned using NVIDIA H100 GPUs in secure enclaves, then deployed on 4x NVIDIA RTX A5000 GPUs with NVLink in low-power, air-gapped environments—delivering mission-ready inference without reliance on external networks.
While the A5000 is a mid-tier GPU by market standards, in this mission context, it’s a breakthrough: delivering enterprise-grade AI performance in a secure, air-gapped environment without requiring data center infrastructure or cloud dependencies. NVLink enhances the system’s ability to perform parallelized inference tasks with low latency and shared memory optimization—critical for real-time LLM inference in secure, multi-GPU deployments. This design enables scalable, low-power, and cost-efficient LLM operations at the tactical edge, something few government-focused AI teams have achieved.
This hybrid approach enables:
Secure, sovereign LLM training and adaptation
Efficient, scalable inference at the edge
A fully air-gapped AI lifecycle—trained, deployed, and executed within government-controlled infrastructure
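Back-of-envelope arithmetic shows why an 8B-parameter model fits comfortably on the 4x A5000 configuration. The figures below are rough assumptions (fp16/bf16 weights, an even tensor-parallel split, 24 GB of VRAM per A5000) rather than measured numbers:

```python
# Rough memory budget for Llama3.1-8B-Instruct on 4x RTX A5000 with NVLink.
params = 8e9                 # 8B parameters
bytes_per_param = 2          # fp16/bf16 weights
weights_gb = params * bytes_per_param / 1e9   # total weight memory
per_gpu_gb = weights_gb / 4                   # tensor-parallel split across 4 GPUs
a5000_vram_gb = 24                            # VRAM per RTX A5000
headroom_gb = a5000_vram_gb - per_gpu_gb      # left for KV cache and activations
print(f"{weights_gb:.0f} GB weights, {per_gpu_gb:.0f} GB/GPU, {headroom_gb:.0f} GB headroom/GPU")
```

The generous per-GPU headroom is what makes the 128k-token context window practical: long contexts inflate the KV cache, and that cache has to fit alongside the model shards.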
The following graphic describes our solution’s hardware specifications, cost-performance benefits, and outcomes.
Wherever they operate, federal agencies depend on massive archives of unstructured data, such as policy documents, compliance records, and historical directives. Integrating our air-gapped solution helped agencies access and validate insights from vast data sources in low-connectivity environments to improve operational resilience. Our solution:
Applies context-aware retrieval and plain English Q&A to complex datasets
Uses the LLM to extract key fields from unstructured text, even across noisy formats and incomplete metadata
Maps the extracted data into a high-dimensional vector space to enable intelligent retrieval, matching, and validation in real time
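The LLM-based field extraction above typically pairs a structured prompt with defensive parsing, since model output can include chatter around the payload. The prompt wording, field names, and sample response below are illustrative, not the deployed configuration:

```python
import json
import re

EXTRACTION_PROMPT = """Extract the following fields from the document as JSON:
entity_name, directive_id, effective_date. Use null for missing fields.

Document:
{document}
"""

def parse_extraction(llm_output: str) -> dict:
    """Pull the first JSON object out of a possibly noisy LLM response."""
    match = re.search(r"\{.*\}", llm_output, re.DOTALL)
    if match is None:
        return {}
    try:
        return json.loads(match.group(0))
    except json.JSONDecodeError:
        return {}

# Simulated model response with conversational text around the JSON payload:
raw = ('Sure, here are the fields:\n'
       '{"entity_name": "ACME Corp", "directive_id": "41-B", "effective_date": null}\n'
       'Let me know if you need anything else.')
fields = parse_extraction(raw)
print(fields["directive_id"])
```

Returning an empty dict on malformed output lets the pipeline flag a document for re-extraction instead of silently ingesting bad metadata.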
Our solution aligns with Zero Trust principles, enabling AI to operate in the most sensitive environments without compromising visibility or control:
Air-gapped deployment: Ensures no external connectivity or data leakage
Local artifact management: Models, dependencies, and logs are fully self-contained, ensuring transparency, traceability, and reproducibility
RBAC + Active Directory integration: Supports role-based access using existing enterprise identity infrastructure
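The RBAC + Active Directory integration boils down to mapping a user's AD group memberships onto application roles and permissions. The group names, roles, and permission sets below are illustrative placeholders, not the deployed policy:

```python
# Hypothetical mapping from Active Directory groups to application roles.
AD_GROUP_TO_ROLE = {
    "GS-RAG-Admins": "admin",
    "GS-RAG-Analysts": "analyst",
    "GS-RAG-Readonly": "viewer",
}

# Permissions granted by each role.
ROLE_PERMISSIONS = {
    "admin": {"query", "ingest", "manage_models"},
    "analyst": {"query", "ingest"},
    "viewer": {"query"},
}

def permissions_for(ad_groups: list[str]) -> set[str]:
    """Union the permissions of every role the user's AD groups map to."""
    perms: set[str] = set()
    for group in ad_groups:
        role = AD_GROUP_TO_ROLE.get(group)
        if role:
            perms |= ROLE_PERMISSIONS[role]
    return perms

print(permissions_for(["GS-RAG-Analysts", "UnrelatedGroup"]))
```

Because the mapping keys off groups the enterprise directory already manages, onboarding a user requires no new identity infrastructure, which is the point of reusing AD in a Zero Trust posture.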
AI solutions in low-connectivity environments must be secure, modular, and resilient to withstand shifting requirements and high-pressure operational demands. MetroStar’s framework is not a point solution – it’s a blueprint for sustainable AI modernization. Our Innovation Lab embraces a design approach built on the following principles:
Rapid Prototyping: New AI capabilities are built, validated, and deployed in days—not months
Open Architecture: Full transparency and flexibility allow agencies to integrate mission-specific tools and models
Community-Driven Improvements: By using open-source foundations, MetroStar ensures agencies evolve alongside the AI ecosystem without lock-in
“We’re not just delivering AI that runs – we’re delivering AI that survives contact with the mission,” said Jason Stoner, MetroStar’s Sr. Director of Transformation. “By combining H100s for secure model tuning and RTX A5000s with NVLink for tactical inference, we’ve built an AI deployment model that works in the real world of national security – air-gapped, sovereign, and ready.”
As federal organizations navigate the future, those who adopt innovative and proven solutions will remain mission-ready, even in unpredictable environments.
This deployment is part of MetroStar’s ongoing mission to bring operational, explainable, and secure AI to the highest levels of government. Learn more about how our Innovation Lab is advancing next-gen AI for Defense, Intelligence, and National Security.
Written By:
Jesse Scearce
Sr. Machine Learning Engineer