NodeMind compresses float32 RAG indexes 48× online (32× offline, up to 100× on image embeddings) using our proprietary binary codec, and searches them 75× faster: no GPU, no vector database, no cloud bills.
1 GB of text documents becomes a ~10 GB float32 RAG index: that is the real cost of vector search at scale. NodeMind's binary codec crushes that 10 GB down to just 210 MB online (32× smaller offline). Same documents. Same BGE-M3 embeddings. Dramatically different storage.
Why does RAG expand 10×? Chunking 1 KB of text produces a 1024-dim float32 vector = 4,096 bytes (a 4× expansion on the raw text), and HNSW graph index structures add another 2–3× on top. Result: every 1 GB of documents becomes ~10 GB in a vector database, a figure confirmed by Elasticsearch, Pure Storage, and Milvus benchmarks. NodeMind then compresses that 10 GB RAG index a further 48× on text (up to 100× on image embeddings) using our patent-pending binary codec.
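The arithmetic behind that ~10× figure, as a quick sanity check (assuming 1 KB text chunks and a 2.5× HNSW overhead, the midpoint of the 2–3× range above):

```python
# Back-of-envelope check of the ~10x RAG expansion described above.
# Assumptions: 1 KB of text per chunk, 1024-dim float32 embeddings,
# and a 2.5x HNSW graph overhead (midpoint of the 2-3x range).
chunk_bytes = 1024               # 1 KB of raw text per chunk
embedding_bytes = 1024 * 4       # 1024 dims x 4 bytes (float32) = 4 KB
hnsw_overhead = 2.5              # graph structures on top of the raw vectors

expansion = embedding_bytes * hnsw_overhead / chunk_bytes
print(f"{expansion:.0f}x per chunk")                   # -> 10x
print(f"1 GB of text -> ~{expansion:.0f} GB of index")
```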
**Storage Comparison**

| Original Documents | RAG Index (float32 · ~10× expansion) | NodeMind Index (binary · 48× smaller online) | vs RAG | RAG Storage/mo (S3 Standard) | NodeMind Storage/mo (S3 Standard) | Managed Vector DB/mo (Pinecone pricing) | Annual Savings |
|---|---|---|---|---|---|---|---|
| 1 GB documents (~250K chunks) | 10 GB | 210 MB | 48× | $0.23 | $0.0048 | $25.00 | $300 |
| 10 GB documents (~2.5M chunks) | 100 GB | 2.1 GB | 48× | $2.30 | $0.048 | $250.00 | $3,000 |
| 100 GB documents (~25M chunks) | 1 TB | 21 GB | 48× | $23.00 | $0.48 | $2,500 | $30,000 |
| 1 TB documents (~250M chunks) | 10 TB | 210 GB | 48× | $230 | $4.80 | $25,000 | $300,000 |

**Search Performance** (same 1024-dim BGE-M3 embeddings on both sides)

| | RAG (float32) | NodeMind (binary) | vs RAG |
|---|---|---|---|
| Search method | Cosine similarity on float32 — O(N·D) multiply-accumulate | Hamming distance on 1024-bit integers — POPCNT only | 75× faster |
| GPU required | Yes — needed for fast cosine at scale | No — pure CPU, any machine | |
| RAM for 250M chunks | ~1 TB | ~10 GB | |
| Offline / portable | No — requires a live vector DB connection | Yes — download the zip, run anywhere, no cloud needed | |
Codec: NodeMind's compression is not standard binary quantization (which gives 32× at ~5% quality loss). Our patent-pending algorithm applies a spectral transform before binarization, achieving 48× online compression on text (32× on the offline downloadable bundle) and up to 100× on image embeddings, all with higher recall than vanilla quantization. No formula is disclosed. Costs use S3 Standard at $0.023/GB/mo and Pinecone managed vector DB at $2.50/GB/mo.
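A quick check of the cost columns from those stated rates (a minimal sketch; it assumes annual savings are the managed vector DB bill minus NodeMind's S3 bill, which the table does not define explicitly):

```python
# Reproduce the cost columns above from the stated rates.
S3_RATE = 0.023        # $/GB/mo, S3 Standard
PINECONE_RATE = 2.50   # $/GB/mo, managed vector DB

def costs(rag_gb: float, nodemind_gb: float):
    rag_s3 = rag_gb * S3_RATE            # RAG index stored on S3
    nm_s3 = nodemind_gb * S3_RATE        # NodeMind index stored on S3
    pinecone = rag_gb * PINECONE_RATE    # RAG index in a managed vector DB
    annual = (pinecone - nm_s3) * 12     # assumed savings definition
    return round(rag_s3, 4), round(nm_s3, 4), round(pinecone, 2), round(annual)

print(costs(10, 0.21))   # 1 GB of documents -> (0.23, 0.0048, 25.0, 300)
```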
Note on scale and overhead. Compression ratios scale with corpus size. On small datasets (< 10,000 chunks) compression measures closer to 31× online due to the fixed structural overhead of the 64 MIH sub-tables; at production scale (> 100,000 chunks) this overhead is amortised, recovering the full 48×. Likewise the 75× search speedup is observable above ~100,000 chunks — small documents hit the 1 ms latency floor on both indexes. The 32× offline figure refers to the portable index file; if the raw corpus text is optionally bundled in the same zip, the total bundle footprint is roughly 5× smaller than standard RAG. Image / audio / video ratios (up to 100×) are projections — not yet measured in production.
The NodeMind binary codec applies to any embedding — text today, with audio, image, and video compression coming next. The same algorithm, dramatically higher ratios for richer media.
* Text results (48× online / 32× offline) are measured on the live platform. Image / audio / video estimates are projected from the same algorithm applied to those modalities' float32 embeddings; the higher 100× image ratio reflects the larger native embedding dimensions of vision models.
Three stages: embedding, binary encoding with our proprietary codec, and Multi-Index Hashing search. No gradients. No GPU at search time. Pure integer arithmetic in the search path.
Each BGE-M3 float32 embedding (4,096 bytes) is transformed into a 1024-bit binary fingerprint (128 bytes) using our patent-pending algorithm. The codec is not standard quantization — it applies a spectral transform that preserves semantic neighborhood relationships far better than direct sign-binarization.
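For intuition, here is what that baseline looks like. NodeMind's spectral transform is not disclosed, so this sketch shows only vanilla sign-binarization, the 32× baseline the codec is compared against; the byte counts match the figures above:

```python
import numpy as np

def sign_binarize(embedding: np.ndarray) -> np.ndarray:
    """Vanilla sign-binarization: NOT NodeMind's spectral codec, just the
    standard baseline it improves on. Maps a 1024-dim float32 vector
    (4,096 bytes) to a 1024-bit fingerprint packed into 128 bytes, a 32x
    per-vector reduction."""
    bits = embedding > 0           # one bit per dimension: sign of each float
    return np.packbits(bits)       # 1024 bits -> 128 uint8 bytes

rng = np.random.default_rng(0)
vec = rng.standard_normal(1024).astype(np.float32)
fp = sign_binarize(vec)
print(vec.nbytes, "->", fp.nbytes, "bytes")   # 4096 -> 128
```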
The 1024-bit fingerprint is split into 64 sub-strings of 16 bits each. Each sub-string is stored in a separate hash table. At query time, exact matches per sub-table are merged — giving sub-linear Hamming nearest-neighbor search without any approximate structures.
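A minimal sketch of that lookup scheme (hypothetical code, not NodeMind's implementation): 64 hash tables keyed on 16-bit sub-strings, candidates gathered by exact sub-table matches and ranked by full Hamming distance.

```python
import numpy as np
from collections import defaultdict

M = 64  # sub-tables; 64 sub-strings x 16 bits = one 1024-bit fingerprint

class MIHIndex:
    """Toy Multi-Index Hashing over packed 128-byte fingerprints."""

    def __init__(self):
        self.tables = [defaultdict(list) for _ in range(M)]
        self.fingerprints = []            # id -> packed uint8[128] fingerprint

    def add(self, fp: np.ndarray) -> int:
        fid = len(self.fingerprints)
        self.fingerprints.append(fp)
        # View the 128 packed bytes as 64 16-bit keys, one per sub-table.
        for t, key in enumerate(fp.view(np.uint16)):
            self.tables[t][int(key)].append(fid)
        return fid

    def search(self, query: np.ndarray, k: int = 5) -> list[int]:
        # An exact match in any of the 64 sub-tables yields a candidate; by
        # the pigeonhole principle this finds every fingerprint within
        # Hamming distance 63 of the query. Candidates are then ranked by
        # full Hamming distance: XOR plus popcount, pure integer arithmetic.
        candidates = set()
        for t, key in enumerate(query.view(np.uint16)):
            candidates.update(self.tables[t].get(int(key), ()))
        def hamming(fid: int) -> int:
            return int(np.unpackbits(self.fingerprints[fid] ^ query).sum())
        return sorted(candidates, key=hamming)[:k]

# Usage: index 1,000 random fingerprints, then query with a near-duplicate.
rng = np.random.default_rng(0)
idx = MIHIndex()
fps = rng.integers(0, 256, size=(1000, 128), dtype=np.uint8)
for fp in fps:
    idx.add(fp)
noisy = fps[42].copy()
noisy[0] ^= 0b1            # flip one bit of fingerprint 42
print(idx.search(noisy))   # fingerprint 42 ranks first
```

Full MIH implementations also probe keys within a small Hamming radius per sub-table; the exact-match stage above is the one the description here relies on.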
NodeMind uses BGE-M3, the state-of-the-art multilingual embedding model with 1024 dimensions. Dense, sparse, and multi-vector representations are supported. The model is loaded once per worker — no repeated downloads.
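One way to reproduce that embedding step (a sketch using the public FlagEmbedding loader; NodeMind's worker code is not published, so only the model checkpoint name is taken from this document):

```python
# Sketch: load BGE-M3 once and encode chunks to 1024-dim dense vectors.
# "BAAI/bge-m3" is the standard public checkpoint, loaded once per worker.
from FlagEmbedding import BGEM3FlagModel

model = BGEM3FlagModel("BAAI/bge-m3", use_fp16=True)

out = model.encode(
    ["NodeMind compresses RAG indexes 48x."],
    return_dense=True,          # 1024-dim dense vectors (fed to the codec)
    return_sparse=True,         # lexical weights (sparse representation)
    return_colbert_vecs=False,  # multi-vector output, unused in this sketch
)
print(out["dense_vecs"].shape)  # (1, 1024)
```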
After indexing, users download two zip files: the NodeMind binary index and a standard RAG float32 index. Both run completely offline using the included nodemind_local.py runner. No cloud subscription needed to query.
```
User uploads PDF
        │
        ▼
[ FastAPI — nodemind.space ]  ← nginx + SSL (Google Cloud VPS, 1 TB)
        │
        ▼  submit job
[ Community hardware: RTX 3080 + 128 GB RAM ]
  1. pdfplumber → chunks
  2. BGE-M3 → float32 embeddings (1024-dim)
  3. NodeMind binary codec → 1024-bit fingerprints (32× smaller per vector)
  4. MIH index: 64 sub-tables × 16-bit keys
  5. RAG index: float32 cosine (comparison baseline)
  6. Return nm_zip + rag_zip
        │
        ▼
[ VPS stores zips ]  ← auto-deleted after 24 hours
        │
        ▼
User downloads both — runs offline
```
NodeMind's core algorithms are covered by two Australian provisional patent applications filed in 2026 by Sai Kiran Bathula, an independent researcher in Coleambally, NSW.
No installation. No API key. Upload any PDF, TXT, or Markdown file at the live demo and get a portable binary index back in under 2 minutes.
NodeMind is built by a solo independent researcher. Reach out for licensing, enterprise integration, or research collaboration.
saikiranbathula1@gmail.com