The AI Ingestion Pipeline – Intelligence vs. Automation
In the architectural evolution of Project Codex, the role of the AI is often misunderstood as the "engine" of ingestion. In reality, the 5060 Ti GPU is a precision instrument, and we have made a strategic distinction between Data Ingestion (moving and extracting files) and Data Intelligence (parsing and understanding them). While the raw heavy lifting of moving terabytes from the web to the "Vault" is handled by the CPU and the 2.5 Gb/s network, the GPU is reserved for the sophisticated "Triage" phase that follows.
1. The Separation of Concerns: CPU for Ingestion, GPU for Insights
We have deliberately avoided using the GPU for the raw ingestion and extraction of datasets like OpenAlex or Kiwix ZIM files.
- Why the CPU handles Ingestion: Decompressing JSONL files and moving data over the network are primarily integer-math and I/O-bound tasks that gain little from the GPU's parallel hardware. Using the GPU for them would waste the 5060 Ti's CUDA cores and 16GB of VRAM. Instead, the 5950X on Pear and the Ryzen 5 on Kiwi handle the "brute force" of extraction.
- Why the GPU handles Intelligence: Once the data is "landed" on the Blackberry or Tayberry virtual disks, the GPU-accelerated Quince VM takes over. Its job is not to move the data, but to "read" it. The 16GB of VRAM allows us to load Large Language Models (LLMs) that can perform semantic tagging, metadata extraction, and "hallucination checks" on the archived content.
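As an illustration of the CPU-side half of this split, the sketch below streams gzip-compressed JSON Lines shards and fans whole shards out across CPU worker processes. It assumes the dump arrives as `.gz` JSONL shards; the function names, the default of 16 workers (one per 5950X core), and the command-line usage are hypothetical, not Project Codex's actual extraction code.

```python
import gzip
import json
import sys
from concurrent.futures import ProcessPoolExecutor
from pathlib import Path
from typing import Iterator


def iter_records(shard: Path) -> Iterator[dict]:
    """Stream one gzip-compressed JSON Lines shard without loading it fully into memory."""
    with gzip.open(shard, "rt", encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:  # skip blank lines between records
                yield json.loads(line)


def count_records(shard: Path) -> int:
    """Pure CPU work: decompress and parse one shard, returning its record count."""
    return sum(1 for _ in iter_records(shard))


def count_all(shards: list[Path], workers: int = 16) -> int:
    """Fan shards out across CPU cores; each worker process handles whole shards."""
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(count_records, shards))


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python extract.py /path/to/shards/*.gz
    total = count_all([Path(p) for p in sys.argv[1:]])
    print(f"{total} records parsed on the CPU")
```

Because each worker takes a whole shard, the job stays embarrassingly parallel and never touches the GPU.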
2. The Quince Workflow: Post-Ingestion Triage
The ingestion pipeline follows a "Harvest, then Analyze" model. The GPU enters the process only after the data has been stabilized on the IronWolf arrays.
- The Staging Phase: Data is ingested into the 1TB Fastpool scratch disk via Tayberry (Kiwi) or Blackberry (Pear).
- The Intelligence Phase: The Quince VM, utilizing PCIe passthrough for the 5060 Ti, mounts the specific dataset. We run local Python scripts (often built on PyTorch, or on llama.cpp through its Python bindings) to scan the new harvest.
- Categorization: The AI assesses the "cleanliness" of the data, detecting whether a ZIM file is corrupted or whether a scholarly dataset contains the specific metadata we require for the 2025 Baseline.
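The LLM-driven tagging itself depends on models this page does not name, but part of the "cleanliness" check can run deterministically before any model is loaded. The sketch below verifies a ZIM file's magic number and its trailing MD5 checksum (the ZIM format stores an MD5 digest of all preceding bytes at the offset given by the `checksumPos` header field) and lists missing fields in a metadata record. `REQUIRED_FIELDS` is a hypothetical stand-in for whatever the 2025 Baseline actually requires.

```python
import hashlib
import struct
from pathlib import Path

ZIM_MAGIC = 72173914       # 0x044D495A, first 4 bytes of every ZIM file (little-endian)
CHECKSUM_POS_OFFSET = 72   # header offset of the 8-byte checksumPos field
HEADER_SIZE = 80

# Hypothetical: fields a record must carry to count toward the 2025 Baseline.
REQUIRED_FIELDS = ("id", "doi", "title", "publication_year")


def zim_is_intact(path: Path, chunk_size: int = 1 << 20) -> bool:
    """Check the ZIM magic number and the MD5 digest stored at checksumPos."""
    with open(path, "rb") as fh:
        header = fh.read(HEADER_SIZE)
        if len(header) < HEADER_SIZE:
            return False
        (magic,) = struct.unpack_from("<I", header, 0)
        (checksum_pos,) = struct.unpack_from("<Q", header, CHECKSUM_POS_OFFSET)
        if magic != ZIM_MAGIC:
            return False
        # The MD5 covers every byte before checksum_pos; the 16-byte digest follows.
        fh.seek(0)
        md5 = hashlib.md5()
        remaining = checksum_pos
        while remaining > 0:
            chunk = fh.read(min(chunk_size, remaining))
            if not chunk:
                return False  # file is truncated
            md5.update(chunk)
            remaining -= len(chunk)
        return fh.read(16) == md5.digest()


def missing_fields(record: dict) -> list[str]:
    """Names of required fields that are absent or empty in one metadata record."""
    return [f for f in REQUIRED_FIELDS if record.get(f) in (None, "", [])]
```

Files that fail these cheap checks can be flagged without ever waking the GPU, leaving the model's time for content that is at least structurally sound.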
3. Protecting the VRAM for High-Value Tasks
Because system RAM is our scarcest resource, we do not keep the AI models "hot" (loaded in VRAM, with the Quince VM running) during a massive 48-hour ingestion of OpenAlex.
- The "Burst" Methodology: We keep the Quince VM in a low-power state or powered down during the "noisy" ingestion phase. This ensures that every available megabyte of system RAM is accessible to the ZFS ARC and the Java-based extraction engines on Tayberry.
- Strategic Wake: Once the disks are quiet and the ingestion is complete, we spin up Quince. This sequential processing keeps the Quince VM's memory allocation (which PCIe passthrough typically pins in host RAM for as long as the VM runs) from competing with the ZFS ARC for the same physical memory, maintaining the 24/7 stability of the host.
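A "Strategic Wake" could be automated along these lines. This minimal sketch samples Linux's `/proc/diskstats`, waits until the watched disks have stayed below a throughput threshold for several consecutive intervals, then starts the VM. It assumes a Proxmox VE host (where `qm start <vmid>` boots a VM); the device names, VM ID, threshold, and interval are hypothetical placeholders.

```python
import subprocess
import time

SECTOR_BYTES = 512  # /proc/diskstats counts sectors in 512-byte units


def read_io_sectors(diskstats_text: str, devices: set[str]) -> int:
    """Sum sectors read + sectors written for the named block devices."""
    total = 0
    for line in diskstats_text.splitlines():
        fields = line.split()
        if len(fields) >= 10 and fields[2] in devices:
            total += int(fields[5]) + int(fields[9])  # sectors read, sectors written
    return total


def is_quiet(before: int, after: int, interval_s: float, threshold_mb_s: float) -> bool:
    """True when average throughput over the interval is below the threshold."""
    mb_per_s = (after - before) * SECTOR_BYTES / interval_s / 1e6
    return mb_per_s < threshold_mb_s


def wait_then_wake(devices: set[str], vmid: int, interval_s: float = 60.0,
                   threshold_mb_s: float = 5.0, quiet_checks: int = 10) -> None:
    """Block until the disks stay quiet for several consecutive checks, then start the VM."""
    def sample() -> int:
        with open("/proc/diskstats") as fh:
            return read_io_sectors(fh.read(), devices)

    streak = 0
    prev = sample()
    while streak < quiet_checks:
        time.sleep(interval_s)
        cur = sample()
        # Any busy interval resets the streak, so a brief lull is not enough.
        streak = streak + 1 if is_quiet(prev, cur, interval_s, threshold_mb_s) else 0
        prev = cur
    # Assumption: Proxmox VE host, where `qm start <vmid>` boots a VM.
    subprocess.run(["qm", "start", str(vmid)], check=True)
```

For example, `wait_then_wake({"sda", "sdb"}, vmid=101)` would hold off until those two (hypothetical) pool members had been quiet for ten minutes at the default settings.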
4. The Future 24GB/48GB Upgrade Path
While the 5060 Ti currently handles this "Post-Ingestion Triage," our hardware roadmap acknowledges its limitations. The 16GB of VRAM is sufficient for "reading" and "tagging" data, but it struggles with "Deep Synthesis," where the AI must cross-reference material drawn from multiple 100GB datasets at once. This is the primary driver for our eventual pivot to a Quadro RTX 6000 (24GB) or RTX 8000 (48GB). More VRAM won't make the ingestion faster, but it will let us run larger models with longer context windows, significantly expanding what the AI can hold in view when it begins to turn that raw data into a searchable, intelligent knowledge base.
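The link between VRAM and "context the AI can hold" is mostly arithmetic: once the model weights are loaded, the remaining memory holds the KV cache, which grows linearly with context length. The sketch below estimates that budget. The model shape (8 billion parameters, 32 layers, 8 key/value heads, head dimension 128, 4-bit weights, fp16 cache) and the 1.5 GiB runtime overhead are illustrative assumptions, not the models Quince actually runs.

```python
GIB = 2**30


def kv_cache_bytes_per_token(layers: int, kv_heads: int, head_dim: int,
                             bytes_per_elem: int = 2) -> int:
    """Keys plus values for one token across all layers (fp16 = 2 bytes per element)."""
    return 2 * layers * kv_heads * head_dim * bytes_per_elem


def max_context_tokens(vram_gib: float, params_billions: float, weight_bits: int,
                       overhead_gib: float, per_token_bytes: int) -> int:
    """Tokens of KV cache that fit after weights and a lumped runtime overhead."""
    weights = params_billions * 1e9 * weight_bits / 8
    free = vram_gib * GIB - weights - overhead_gib * GIB
    return max(0, int(free // per_token_bytes))


if __name__ == "__main__":
    # Hypothetical 8B-class model with grouped-query attention and 4-bit weights.
    per_tok = kv_cache_bytes_per_token(layers=32, kv_heads=8, head_dim=128)
    for vram in (16, 24, 48):  # 5060 Ti, Quadro RTX 6000, Quadro RTX 8000
        tokens = max_context_tokens(vram, 8, 4, 1.5, per_tok)
        print(f"{vram} GiB -> ~{tokens:,} tokens of KV cache")
```

Because the weights take a fixed share, moving from 16 to 48 GiB roughly quadruples the room left for context under these assumptions; real runtimes add activation buffers and fragmentation that the single overhead figure only approximates.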
Conclusion: A Measured Intelligence
By refusing to "throw the GPU" at tasks it wasn't designed for (like file extraction), we have preserved the lifespan of the 5060 Ti and ensured that our 750W power budget is used effectively. We treat the AI as the "Librarian" of the silo, not the "Warehouse Worker." It waits until the shelves are stocked before it begins the work of cataloging the 2025 Baseline.