Project Overview

NFC-RAG is an embedded system that uses standard NFC tags to add Retrieval-Augmented Generation (RAG) to a local Large Language Model. By storing compact identifiers and metadata on low-memory NFC tags (e.g. 888 bytes on NTAG216), each tag acts as a contextual pointer into an external knowledge base. The result is personalized, privacy-preserving AI without retraining the model.

The system is designed for edge deployment on consumer hardware—for example an NVIDIA RTX 3090 running Ollama or llama.cpp—and fits scenarios that need offline operation and strict data locality.

Project Motivation

Classic RAG retrieves relevant documents from large vector stores using embeddings, which requires substantial storage and compute, often infeasible in very constrained environments. NFC tags offer a physical API: a single tap provides precise context hints that drive retrieval from a local knowledge store.

Typical use case: industrial maintenance. A technician scans an NFC tag on a machine; the system sends the tag payload to a local LLM, which retrieves only the relevant manual sections and answers troubleshooting questions—no cloud required.

Example: A tag on a robotic arm stores {"id": "robot-arm-XYZ", "role": "maintenance"}. Tapping it with a smartphone sends this to a local Qwen2.5-7B model, which retrieves arm-specific diagrams and produces step-by-step fixes. This scales to thousands of assets with minimal per-tag data.

Technical Feasibility and Constraints

NFC tag capacity is limited:

Payloads must stay small: UUIDs (16 bytes), flags (1–4 bytes), short strings (e.g. up to ~200 chars). A single 384-dimensional embedding at 1 byte per dimension would use 384 bytes, so at most one or two vectors fit on an NTAG216 and on-tag vector search is not practical. Instead, tags act as routing keys to a backend index.

Heavy work is offloaded: tags hold <1 KB, while the RAG engine (e.g. FAISS or LanceDB) runs on-device with embeddings built offline. A base LLM like Gemma-2-9B (Q4_K_M, ~6 GB VRAM) can process retrieved chunks in under 2 seconds on an RTX 3090. Limitations include tags that are locked read-only (rewritable NTAG21x tags avoid this) and the short read range (~5 cm); both suit asset-tagging use cases well.

Constraint example: Ten documents with 512-dimensional embeddings need about 5 KB even at 1 byte per dimension, so an NTAG216 cannot store them; a 32-byte ID, however, can index a 10 GB local database partitioned by asset.
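As a sanity check on these numbers, a short sketch of the budget arithmetic (pure Python; the 888-byte figure is the NTAG216 user memory mentioned above):

```python
# Byte-budget sketch: what fits on an NTAG216 (888 bytes of user memory)?
TAG_CAPACITY = 888          # NTAG216 user memory, bytes
EMB_DIM = 512               # dimensions per embedding
BYTES_PER_DIM = 1           # aggressive 1-byte quantization
N_DOCS = 10

embeddings_size = N_DOCS * EMB_DIM * BYTES_PER_DIM   # 5120 bytes
routing_key_size = 32                                # e.g. a 32-byte asset ID

print(embeddings_size, embeddings_size <= TAG_CAPACITY)    # 5120 False
print(routing_key_size, routing_key_size <= TAG_CAPACITY)  # 32 True
```

The embeddings overflow the tag several times over, while a routing key uses under 4% of it: this is why tags carry identifiers, not vectors.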

System Architecture

Three layers, all runnable locally:

Architecture diagram

NFC Frontend → RAG Middleware → LLM Backend

The middleware assembles a prompt of the form:
<system>Role: {tag.role}. Respond in {tag.lang}, style: {tag.style}.</system>
<context>{retrieved_chunks}</context>
<user>{query}</user>
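A minimal sketch of how the middleware might fill this template; the field names follow the tag payloads shown elsewhere in this document, and the function name is illustrative:

```python
def build_prompt(tag: dict, retrieved_chunks: list, query: str) -> str:
    # Fill the system/context/user template from the tag payload,
    # the retrieved chunks, and the user's question.
    system = (f"Role: {tag.get('role', 'expert')}. "
              f"Respond in {tag.get('lang', 'en')}, style: {tag.get('style', 'concise')}.")
    return (f"<system>{system}</system>\n"
            f"<context>{' '.join(retrieved_chunks)}</context>\n"
            f"<user>{query}</user>")

tag = {"role": "maintenance", "lang": "en", "style": "step-by-step"}
print(build_prompt(tag, ["Check valve #3."], "Why is pressure low?"))
```

Missing tag fields fall back to defaults, so even a bare {"doc_set": ...} payload yields a usable prompt.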

Data flow

Scan → Parse (≈10 ms) → Retrieve (≈100 ms) → Generate (1–3 s)

End-to-end latency is under 5 seconds on mid-range hardware.

Implementation Guide

Hardware requirements

- NTAG21x tags (e.g. NTAG216, 888 bytes of user memory)
- An NFC-capable smartphone or USB NFC reader for scanning and writing tags
- An edge machine with a GPU such as an NVIDIA RTX 3090 (a quantized 7–9B model needs roughly 6 GB of VRAM)

Software stack

- Python 3 with faiss, sentence-transformers, pydantic, and the ollama client
- Ollama or llama.cpp serving a local model (e.g. Qwen2.5-7B or Gemma-2-9B)
- NFC Tools app or nfcpy for writing tag payloads

Sample Code: RAG Middleware (Python)

import json

import faiss
import numpy as np
from sentence_transformers import SentenceTransformer
from pydantic import BaseModel
import ollama

class TagPayload(BaseModel):
    # Data read from the NFC tag
    doc_set: str
    role: str = "expert"
    lang: str = "en"

class NFC_RAG:
    def __init__(self, index_path='knowledge.faiss', docstore_path='docstore.json'):
        # Embedding model: all-MiniLM-L6-v2, 384 dimensions, runs on CPU
        self.encoder = SentenceTransformer('all-MiniLM-L6-v2')
        # Prebuilt FAISS index, one per doc_set partition
        self.index = faiss.read_index(index_path)
        # id -> chunk text, built offline alongside the index
        with open(docstore_path) as f:
            self.docstore = {int(k): v for k, v in json.load(f).items()}

    def process(self, tag: TagPayload, query: str) -> str:
        # 1. Embed the query (shape: 1 x 384, float32)
        query_emb = np.asarray(self.encoder.encode([query]), dtype='float32')
        # 2. Retrieve the 3 nearest chunks; FAISS returns -1 for missing hits
        scores, idxs = self.index.search(query_emb, k=3)
        chunks = [self.docstore[int(i)] for i in idxs[0] if int(i) != -1]
        # 3. Build the prompt from tag metadata, context, and query
        prompt = (f"Role: {tag.role}. Lang: {tag.lang}.\n"
                  f"Context: {' '.join(chunks)}\nQ: {query}")
        # 4. Generate locally via Ollama
        resp = ollama.generate(model='qwen2.5:7b', prompt=prompt)
        return resp['response']

# Usage: rag = NFC_RAG(); print(rag.process(TagPayload(doc_set="robot-XYZ"), "fix vibration"))

What the code does — step by step

TagPayload (Pydantic model) — Represents the data read from the NFC tag. doc_set is required and identifies which knowledge partition to use (e.g. "robot-XYZ" for a specific machine manual). role and lang default to "expert" and "en"; they are injected into the system prompt so the LLM answers in the right tone and language.

NFC_RAG.__init__ — Loads the embedding model (all-MiniLM-L6-v2, 384 dimensions, runs on CPU), reads the prebuilt FAISS index from disk (knowledge.faiss), and loads the in-memory docstore (id → text) from your JSON file or database. In production you would select the index and docstore partition that match the tag's doc_set.

NFC_RAG.process(tag, query) — Runs the full RAG pipeline:

  1. Encode: encoder.encode([query]) turns the user question (e.g. “fix vibration”) into a 384D vector.
  2. Search: index.search(query_emb, k=3) finds the 3 nearest document chunks in the FAISS index; idxs[0] are the chunk indices.
  3. Fetch chunks: chunks = [self.docstore[i] for i in idxs[0]] retrieves the actual text for those indices from the docstore.
  4. Build prompt: The prompt combines the tag’s role and lang with the retrieved context and the user query, so the LLM sees a clear system instruction plus the relevant manual excerpts.
  5. Generate: ollama.generate(...) sends the prompt to the local Ollama server (e.g. Qwen2.5 7B) and returns the model’s reply (e.g. step-by-step troubleshooting).
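To make step 2 concrete without FAISS installed, here is a dependency-free stand-in for index.search: brute-force cosine similarity over toy 2-D vectors (all names and vectors are illustrative; a real index holds 384-D embeddings):

```python
import math

def search(index_vecs, query_vec, k=3):
    # Brute-force cosine similarity, mimicking FAISS search(query, k):
    # returns (scores, indices) of the k most similar stored vectors.
    def cos(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        return dot / (math.sqrt(sum(x * x for x in a)) *
                      math.sqrt(sum(x * x for x in b)))
    scored = sorted(((cos(v, query_vec), i) for i, v in enumerate(index_vecs)),
                    reverse=True)[:k]
    return [s for s, _ in scored], [i for _, i in scored]

vecs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.5, 0.5]]
scores, idxs = search(vecs, [1.0, 0.05], k=3)
print(idxs)  # [0, 1, 3] — the three vectors most aligned with the query
```

FAISS does the same ranking over millions of vectors with optimized index structures; the semantics of the returned (scores, indices) pair are what matter here.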

Prebuild the index by embedding all documents for each doc_set, saving the FAISS index and the id→text mapping; at runtime you load the partition that matches the tag’s doc_set.
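A dependency-free sketch of the offline step for the id → text side; the matching embedding + faiss.write_index pass would run over the same chunks in the same order. File naming and the fixed-size chunking are assumptions:

```python
import json
import textwrap

def build_docstore(doc_set, documents, chunk_chars=400):
    # Split each document into fixed-size chunks and save the id -> text
    # mapping for one doc_set partition. The FAISS index for this partition
    # would be built by embedding the same chunks in the same order.
    chunks = []
    for doc in documents:
        chunks.extend(textwrap.wrap(doc, chunk_chars))
    docstore = {i: chunk for i, chunk in enumerate(chunks)}
    path = f"{doc_set}.docstore.json"
    with open(path, "w") as f:
        json.dump(docstore, f)
    return path

path = build_docstore("robot-XYZ", ["Section 5.2: seal replacement. " * 30])
print(path)
```

At runtime the middleware opens robot-XYZ.docstore.json (and the sibling index) when a tag with doc_set "robot-XYZ" is scanned.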

Tag Encoding

Write payloads via NFC Tools app or nfcpy. Example JSON:

{"doc_set": "robot-XYZ", "role": "maintenance", "lang": "it", "v": 1}

For maximum density use TLV (Tag-Length-Value): e.g. ID=0x01, length=16, value=UUID.
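A sketch of the TLV idea in pure Python; the type codes (0x01 for an ID, 0x02 for a version flag) are illustrative, not a standard registry:

```python
import uuid

def tlv_encode(entries):
    # Each entry is (type_byte, value_bytes); emit T (1 byte), L (1 byte), V.
    out = bytearray()
    for t, v in entries:
        assert len(v) < 256, "single-byte length field"
        out += bytes([t, len(v)]) + v
    return bytes(out)

def tlv_decode(data):
    # Inverse: walk the buffer and recover (type, value) pairs.
    entries, i = [], 0
    while i < len(data):
        t, ln = data[i], data[i + 1]
        entries.append((t, data[i + 2:i + 2 + ln]))
        i += 2 + ln
    return entries

payload = tlv_encode([(0x01, uuid.uuid4().bytes),   # ID: 16-byte UUID
                      (0x02, bytes([1]))])          # version flag
print(len(payload))  # 21 bytes: (1+1+16) + (1+1+1), well under 888
```

Compared with the JSON payload above, TLV trades readability for density, which matters most on smaller NTAG213 tags.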

Setup Steps

Follow these steps to deploy NFC-RAG from scratch.

Step 1: Prepare the knowledge base and build the index

Step 2: Write the NFC tags
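One way to sketch Step 2 without hardware: build the NDEF record bytes that nfcpy (or the NFC Tools app) would write. This follows the NDEF short-record layout (MB/ME/SR flags plus TNF=0x02 for a MIME type); treat it as an illustration of the wire format rather than a drop-in writer:

```python
import json

def ndef_mime_record(mime_type, payload):
    # Single short record: MB (0x80) | ME (0x40) | SR (0x10) | TNF=0x02 (MIME)
    header = bytes([0x80 | 0x40 | 0x10 | 0x02])
    return (header
            + bytes([len(mime_type)])   # TYPE LENGTH
            + bytes([len(payload)])     # PAYLOAD LENGTH (1 byte in short records)
            + mime_type.encode("ascii")
            + payload)

tag_json = json.dumps({"doc_set": "robot-XYZ", "role": "maintenance",
                       "lang": "it", "v": 1}).encode("utf-8")
record = ndef_mime_record("application/json", tag_json)
print(len(record), len(record) <= 888)  # fits easily on an NTAG216
```

On a real Type 2 tag, the writer additionally wraps this message in the tag's own NDEF Message TLV (0x03, length, message, 0xFE terminator); nfcpy handles that wrapper for you.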

Step 3: Run the RAG middleware and LLM

Step 4: Test end-to-end

Use Cases and Examples

Industrial IoT maintenance

Attach an NTAG216 to a pump, conveyor, or robotic arm. The tag stores e.g. {"doc_set": "pump-model-A", "role": "maintenance", "lang": "en", "style": "step-by-step"}. When a technician taps the tag with a phone and asks “Why is pressure low?” or “How do I replace the seal?”, the app sends the tag payload and the question to the local NFC-RAG API. The middleware loads the FAISS index for pump-model-A, embeds the query, retrieves the top-3 chunks from the pump manual (e.g. troubleshooting section, parts list), and injects them into the LLM prompt. The model (e.g. Qwen2.5-7B) answers with concrete steps: “Check valve #3; per manual p.42 this is a common failure. If the seal is worn, order part XYZ and follow section 5.2 for replacement.” No cloud and no generic chatbot—only that machine’s documentation in context.

Scale: A factory can deploy hundreds of tags (one per asset or per asset type). Each tag points to a doc set of 20–50 chunks; the total knowledge base can be hundreds of MB, all on a single edge server with a single GPU.

Personalized tutoring

Stick a tag on a textbook chapter or a printed exercise set. Payload example: {"doc_set": "calculus-ch3", "role": "tutor", "lang": "en", "style": "educational"}. A student taps the tag and asks “Explain integrals” or “Work through example 3.2”. The system retrieves the relevant theorems, definitions, and worked examples from the chapter’s index, and the LLM produces an explanation tailored to that material—avoiding drift into other chapters or generic web content. The same setup works for language learning (e.g. doc_set: "spanish-lesson-5"), safety training (e.g. doc_set: "forklift-safety"), or certification prep.

Home automation and recipes

Put a tag on the fridge or a recipe binder. Example payload: {"doc_set": "recipes-vegan", "role": "assistant", "lang": "en"}. The user asks “Suggest dinner with what I have” or “Something quick with chickpeas”. The RAG layer can pull from a small, curated recipe set (and optionally from a grocery list if you store it in the same doc set or a linked one). The LLM suggests a concrete recipe and steps. Because the knowledge base is local and fixed, answers stay on-topic and private; you are not sending grocery or eating habits to the cloud.

Warehouse and inventory

Use one tag per shelf or product family. For example {"doc_set": "warehouse-zone-A3", "role": "logistics"}. Staff scan the tag and ask “Where is item SKU-789?” or “Restocking procedure for this zone”. The backend retrieves zone-specific procedures, layout notes, or inventory hints and the LLM answers in one place. At scale: 1k tags, each linked to a 50-document set (e.g. 50 chunks per zone), with a total DB size of a few hundred MB—easily hosted on a single server with FAISS and a 7B model.

Summary

In every case, the NFC tag is a contextual pointer: it tells the system which knowledge partition to use and how to shape the prompt (role, language, style). The actual retrieval and generation stay local, so latency stays low and data never leaves the premises.

Advantages Over Alternatives

Aspect        NFC-RAG                       Full Retraining    Cloud RAG
Privacy       Local-only                    Local              Cloud exposure
Cost          ~$0.50/tag + free LLM         High compute       API fees
Latency       <5 s at edge                  N/A                200 ms+ network
Scalability   1000s of tags, partitioned    Rigid              Vendor-locked
Update        Rewrite DB, not tags          Full retrain       Live sync

NFC-RAG fits hybrid setups: a base LLM for fluency plus tag-triggered precision.

Future Extensions

Conclusions

NFC-RAG shows that physical context (what you tap) can drive retrieval and prompt shaping for local LLMs without cloud or heavy retraining. Small NFC tags become cheap, writable “context switches” for RAG, suitable for maintenance, education, home automation, and inventory. With standard tags (NTAG216), existing embedding models, and tools like FAISS and Ollama, you can deploy offline, low-latency, privacy-preserving RAG at the edge and scale by adding more tags and partitioned indexes.

Need More Information?

For questions about NFC-RAG or to discuss this project, please send an email.