Storage Wars: When Memory and Storage Collapse Into One Layer
This report is a companion to our memory report found here:
The market still tends to frame AI infrastructure demand through compute. In that view, storage is a downstream beneficiary: more accelerators ship, so more storage gets attached. We think that framing is becoming less useful as inference architectures evolve. Agentic workloads are shifting a larger share of the system burden into orchestration, memory movement, and state management, which means the pressure point increasingly sits in IO and memory coordination rather than raw compute alone. Industry estimates pointing to a fourfold increase in CPU core requirements for a fully agentic 1GW datacenter matter less as a precise forecast than as a signal of where the architecture is tightening: more of the system is being consumed by coordination. That is why storage IO is moving toward the center of the inference stack.
The clearest way to see that shift is at the level of a single session. A 128K-context interaction on a 70-billion-parameter model can require roughly 167GB of KV cache for one inference sequence. That figure is useful because it breaks the conventional intuition about where the bottleneck should sit: context memory can outrun HBM capacity before the system runs out of compute. Once that happens, overflow is no longer an edge case. It becomes an architectural requirement. At hyperscale, where thousands of these sessions are served concurrently, the industry is being pushed toward a tiered hierarchy with HBM handling active cache, DRAM absorbing overflow, and NVMe flash serving persistence and extended context. Flash is moving into the inference loop because the memory system increasingly needs it there.
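As a back-of-envelope check on that 167GB figure, the sketch below sizes a single session's KV cache and spills it across an HBM/DRAM/NVMe hierarchy. The model shape (80 layers, 8192 hidden dimension, full multi-head attention) and the 1-byte FP8 cache precision are our own illustrative assumptions that happen to reproduce the cited figure, as are the per-accelerator tier budgets; grouped-query attention or lower context lengths would shrink the footprint considerably.

```python
# Back-of-envelope KV cache sizing for one inference session.
# All model-shape and capacity numbers below are illustrative assumptions.

def kv_cache_bytes(context_tokens: int,
                   num_layers: int = 80,     # assumed 70B-class depth
                   hidden_dim: int = 8192,   # assumed model width
                   bytes_per_value: int = 1  # assumed FP8 cache precision
                   ) -> int:
    """Per token, each layer stores one key and one value vector of
    hidden_dim elements (full multi-head attention, no KV-head sharing)."""
    return 2 * num_layers * hidden_dim * bytes_per_value * context_tokens

def place_in_tiers(need_bytes: int, tiers: list[tuple[str, int]]) -> None:
    """Greedily spill the cache down a fastest-first list of tier budgets."""
    remaining = need_bytes
    for name, capacity in tiers:
        used = min(remaining, capacity)
        print(f"{name:>4}: {used / 1e9:7.1f} GB")
        remaining -= used

session = kv_cache_bytes(context_tokens=128_000)
print(f"KV cache for one 128K-token session: {session / 1e9:.1f} GB")

# Hypothetical per-accelerator budgets left over after weights/activations:
place_in_tiers(session, [("HBM", 40 * 10**9),   # remaining HBM headroom
                         ("DRAM", 96 * 10**9),  # host-memory overflow
                         ("NVMe", 10**12)])     # flash persistence tier
```

Under these assumptions, a single session already spills past HBM into DRAM and onto flash. That is the overflow-as-architecture point: multiplied across thousands of concurrent sessions, the NVMe tier sits inside the serving path rather than behind it.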
That same logic is now reshaping the demand base for NAND. For most of flash history, the defining end market was the smartphone. In 2026, datacenter applications are positioned to consume more than half of global NAND output. The significance of that crossover is not simply that demand is rising. It is that the center of gravity in flash is moving toward infrastructure workloads at the same time that supplier inventories have fallen to roughly one to two weeks, the leanest level since 2018 by our estimates, and fulfillment rates for some OEM customers have dropped as low as 20 percent. The enterprise SSD market is projected to reach $51.4 billion by 2027, even as suppliers continue reallocating cleanroom space from NAND toward DRAM and HBM. That combination points to a tighter supply environment, firmer pricing, and a market structure increasingly shaped by datacenter requirements rather than consumer replacement cycles.
The physical architecture is changing in parallel. A standard CPU rack historically operated in the 5 to 10 kilowatt range. NVIDIA's next-generation Vera Rubin NVL72 system pairs compute with an STX storage cabinet carrying 1,152TB of NAND-based SSD capacity, while rack-scale power moves toward the megawatt range. The important point is not simply that rack density is rising sharply, though it is. It is that storage is being designed directly into the compute system, co-optimized as a structural layer of the platform. NVIDIA's SCADA architecture, which gives GPUs direct PCIe connections to NVMe SSDs without routing that traffic through the CPU, reflects the same design logic. At these densities and throughput requirements, storage placement becomes part of system architecture rather than a downstream deployment choice.
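SCADA itself is not a public programming interface, but the underlying design idea, reads that DMA straight from NVMe into GPU memory without a CPU bounce buffer, can be exercised today through NVIDIA's GPUDirect Storage, exposed in Python by the RAPIDS kvikio library. The sketch below is an analogy under that substitution rather than SCADA's API; the file path is hypothetical, and on systems without GPUDirect Storage kvikio falls back to a CPU-mediated POSIX path.

```python
# Analogy to the SCADA design idea using GPUDirect Storage via kvikio:
# file IO that moves data between NVMe and GPU memory without staging
# through a host bounce buffer. Path and sizes are illustrative.
import cupy as cp
import kvikio

PATH = "/mnt/nvme/kv_cache.bin"  # hypothetical spilled-KV-cache file

# Write a GPU-resident buffer out to flash ...
block = cp.random.random(32 * 1024 * 1024 // 8)  # ~32 MB of float64
f = kvikio.CuFile(PATH, "w")
f.write(block)          # device -> NVMe, no CPU staging copy (with GDS)
f.close()

# ... and read it back straight into GPU memory.
restored = cp.empty_like(block)
f = kvikio.CuFile(PATH, "r")
nbytes = f.read(restored)  # NVMe -> device DMA when GDS is available
f.close()
print(f"read {nbytes / 1e6:.0f} MB directly into GPU memory")
```

The design point carries over from the analogy: the CPU stays on the control plane while data moves SSD-to-GPU, which is what lets flash sit inside the inference loop at rack-scale throughput.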
Where this becomes more complicated for stakeholders is in value capture. NVIDIA is pulling the hot storage tier into its own platform architecture through STX, while hyperscalers are expanding proprietary infrastructure in parallel. That leaves the storage opportunity more uneven than the usual assumption that rising AI demand lifts all boats equally. The shift itself looks increasingly clear. The harder question is who captures the economics as storage moves closer to the center of the inference stack. Our full report lays out a five-layer framework for thinking about value capture across that hierarchy, where merchant vendors remain exposed, where captive platforms gain leverage, and which signals would cause us to revise that view.
What the Full Report Covers
The KV cache mechanism in detail, including the specific memory hierarchy architecture in Vera Rubin NVL72 and how SCADA bypasses traditional I/O bottlenecks
NAND supply-demand sizing with bear/base/bull scenario analysis, including our base-case estimate of incremental ICMSP-related NAND demand in 2026, reconciled against the memory report
A five-layer value-capture framework mapping NAND bit supply, SSD controllers, system/software, platform/HCI, and captive/integrated layers with merchant exposure and captive risk assessments for each
The merchant-versus-captive analysis: our assessment of how much incremental storage spend remains available to public-market vendors versus being absorbed into NVIDIA’s STX stack or hyperscaler proprietary fabrics
Developed investment views on Pure Storage and Micron, including financial detail, technical differentiation, margin trajectories, and specific risk factors
Private company tracking on VAST Data, Weka, and Hammerspace as potential public-market or acquisition candidates
A quarterly tracking scoreboard with six indicators and explicit bullish/bearish signals for monitoring the thesis over the next two to three earnings cycles
Four specific falsifiers that would cause us to materially revise the thesis, including CXL displacement risk, hyperscaler storage internalization, NAND supply overshoot, and AI monetization slowdown
Timing analysis with explicit uncertainty bands on ICMSP adoption rates, STX commercialization, and the transition from current to next-generation deployment configurations