The Diligence Stack - By Creative Strategies


Memory Infrastructure in the AI Era: An Expert Perspective

An Interview with Steven Woo, Fellow, Rambus Labs

Ben Bajarin
Feb 11, 2026

We have had the opportunity to chat with Steven Woo, a Fellow at Rambus Labs, multiple times over the past few years, and we have always found his technical and market insights into what is happening in memory enlightening. Rambus kindly allowed us to record and share this transcript on the record.

We have a full memory deep-dive report coming next week, and this conversation is a good table-setter for it.


Discussion Highlights


The AI infrastructure buildout has fundamentally altered memory market dynamics in ways that distinguish this cycle from historical patterns. In a recent conversation with Steven Woo, a Fellow at Rambus Labs with nearly 30 years of experience in memory technology, we explored the structural shifts reshaping the semiconductor memory landscape and their implications for the investment thesis around high-performance memory.

Unprecedented Demand Intensity

Woo characterized the current environment as unlike any period in his career, something nearly all executives in this space have consistently articulated to us. The acceleration in HBM development cycles—now roughly every two years versus the traditional five-year cadence for DDR and GDDR—reflects the urgency AI customers bring to the table. When Rambus meets with these customers, the conversation has shifted from “what bandwidth can you deliver?” to “tell us what you can give us economically, because you can’t possibly give us what we actually need.” This inversion of the typical vendor-customer dynamic underscores the severity of the bandwidth bottleneck.

What stood out in our conversation was how Rambus has been pushing to validate controllers well above published spec limits—when HBM4 specs called for 6.4 Gbps, Rambus had validated closer to 10 Gbps—anticipating that customers would immediately push beyond standard specifications. This illustrates the appetite for any available increase in data throughput. As we have been articulating to the industry on memory: compute determines how far AI can go, but memory now determines how fast it can scale.
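
To put those per-pin data rates in context, here is a minimal back-of-envelope sketch of per-stack bandwidth. It is our own illustration, not a Rambus figure, and it assumes HBM4's 2,048-bit per-stack interface width:

```python
# Rough per-stack HBM bandwidth: interface width (bits) x per-pin data rate (Gbps).
# The 2,048-bit interface width is the HBM4 baseline; the two per-pin rates
# below are the figures mentioned in the conversation.
INTERFACE_BITS = 2048

def stack_bandwidth_tbps(gbps_per_pin: float) -> float:
    """Return per-stack bandwidth in TB/s for a given per-pin rate in Gbps."""
    return INTERFACE_BITS * gbps_per_pin / 8 / 1000  # bits -> bytes, GB -> TB

print(f"At 6.4 Gbps/pin: {stack_bandwidth_tbps(6.4):.2f} TB/s per stack")
print(f"At 10 Gbps/pin:  {stack_bandwidth_tbps(10.0):.2f} TB/s per stack")
# ~1.6 TB/s versus ~2.6 TB/s: running the same interface faster is the cheapest
# bandwidth lever available, which is why customers push past the spec.
```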

Why This Cycle Is Different

Three structural factors differentiate the current memory market from prior cycles. First, end customers now drive specifications directly. Unlike previous generations, where Intel set memory standards for the broader industry, companies like NVIDIA are dictating requirements based on actual box shipments measured in millions of racks, backed by deep capital reserves.

Second, the economic model has shifted. Hyperscalers monetize through services rather than hardware margins, enabling them to absorb premium memory costs that pure hardware vendors could not justify. Third, nation-states now treat AI infrastructure as a strategic priority, with governments committing to power plant construction to support datacenter expansion—a level of sovereign commitment unprecedented in semiconductor history.

These points and more are the basis for not just one but multiple inflection points that we believe are still coming and that will structurally change the dynamics of the memory category.

Technical Constraints and the Packaging Imperative

While demand appears insatiable, physical constraints impose hard limits. Woo provided valuable technical insight about HBM stacking: while current technology supports 16-high stacks, pushing to 32-high is technically possible but economically impractical given yield implications. The better path forward involves two 16-high stacks rather than one 32-high configuration.
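
A simple compound-yield sketch makes the economics concrete. The per-layer yield below is an assumption of ours for illustration, not a figure from Woo or Rambus:

```python
# Illustrative stacked-die yield model: if each die (plus its bond step) has
# yield y and there is no repair, an N-high stack yields roughly y**N.
# The 99% per-layer yield is an assumed number, chosen only to show the shape.
PER_LAYER_YIELD = 0.99

def stack_yield(layers: int, per_layer: float = PER_LAYER_YIELD) -> float:
    return per_layer ** layers

print(f"16-high stack yield: {stack_yield(16):.1%}")  # ~85%
print(f"32-high stack yield: {stack_yield(32):.1%}")  # ~72%
# Every failed 32-high stack also scraps twice the silicon of a failed 16-high
# stack, so two 16-high stacks are the more economical path to the same capacity.
```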

In Woo’s assessment, and we agree, advanced packaging has emerged as the critical enabler for AI system innovation—more important than process node advances. Signal integrity, power delivery, and thermal management all converge at the packaging level, making this the dominant technology vector for performance improvements.

We detail the full advanced packaging bottlenecks in this report here.

Inference Economics and System Disaggregation

The conversation around NVIDIA’s Vera Rubin architecture—with GDDR for prefill and HBM for decode—signals a broader recognition that inference workloads require purpose-built silicon. The current GPU architecture, optimized across both training and inference, will likely bifurcate as the industry optimizes for inference economics.

NVIDIA’s inference context storage announcement points to an emerging memory tiering hierarchy: HBM for hot data, potentially augmented by CXL-attached memory appliances for extended context windows, with NVMe providing cold storage. The KV cache challenge—where context windows exceed available DRAM—will drive architectural innovation in memory (and storage) systems over the coming years.
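
A rough KV-cache sizing sketch shows why long context windows outrun available DRAM. The model shape below is an assumed, large open-weight configuration with grouped-query attention, used only for illustration:

```python
# Per-token KV-cache footprint: 2 tensors (K and V) x layers x KV heads x head dim x bytes.
# Model dimensions are assumed (roughly 70B-class with grouped-query attention).
LAYERS, KV_HEADS, HEAD_DIM, BYTES = 80, 8, 128, 2  # 2 bytes = fp16/bf16

bytes_per_token = 2 * LAYERS * KV_HEADS * HEAD_DIM * BYTES
context_tokens = 128_000                      # one long-context request
cache_gb = bytes_per_token * context_tokens / 1e9

print(f"KV cache per token: {bytes_per_token/1e3:.0f} KB")   # ~328 KB
print(f"KV cache at 128K context: {cache_gb:.0f} GB")        # ~42 GB per request
# A handful of concurrent long-context users exhausts a GPU's HBM, which is why
# tiering toward CXL-attached memory and NVMe cold storage becomes attractive.
```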

Investment Implications

The designed-in nature of HBM and high-performance memory, versus the commodity spot market dynamics of traditional DRAM, fundamentally changes the risk profile for memory suppliers. When packaging determines allocation and multi-year capacity commitments govern supply relationships, the historical boom-bust cycles may be structurally dampened for this segment of the market.

As Woo noted, LLM inference is increasingly bandwidth-limited rather than compute-limited on roofline curves—a dynamic that ensures sustained demand for memory bandwidth improvements regardless of algorithmic developments.
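
The roofline point can be shown with simple arithmetic. The accelerator figures below are assumed, roughly current-generation numbers used only to illustrate the gap:

```python
# Roofline check for single-batch LLM decode: each generated token reads every
# weight once and does ~2 FLOPs per weight, so arithmetic intensity is very low.
# Accelerator peak numbers are assumptions for illustration.
PEAK_TFLOPS = 1000          # dense fp16/bf16 peak, TFLOPS
HBM_TBPS = 3.35             # HBM bandwidth, TB/s

machine_balance = PEAK_TFLOPS * 1e12 / (HBM_TBPS * 1e12)   # FLOPs per byte
decode_intensity = 2 / 2    # ~2 FLOPs per fp16 weight / 2 bytes per weight

print(f"Machine balance:  ~{machine_balance:.0f} FLOPs/byte")  # ~300
print(f"Decode intensity: ~{decode_intensity:.0f} FLOP/byte")  # ~1
# Decode sits far below the machine balance point, so generation speed is set
# by memory bandwidth, not compute, regardless of how fast the FLOPs get.
```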

The full interview transcript is available to Diligence Stack subscribers.
