Breaking down the “Memory Wall”!
In computer systems, data moves back and forth between memory and the processor. This exchange is inefficient: caches and software optimizations can only partially hide it, and it incurs significant latency and power consumption. Often referred to as the memory wall, this bottleneck is a challenge for all high performance computing workloads, in applications such as big data and artificial intelligence.
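A roofline-style back-of-the-envelope calculation makes the bottleneck concrete. The peak compute rate and memory bandwidth below are illustrative assumptions, not figures for any specific processor:

```python
# Roofline-style sketch of the memory wall. The peak compute rate and
# DRAM bandwidth are illustrative assumptions only.
PEAK_FLOPS = 1e12   # assumed: 1 TFLOP/s of peak compute
DRAM_BW = 100e9     # assumed: 100 GB/s of memory bandwidth

def bottleneck(flops, bytes_moved):
    """Return (runtime in seconds, limiting resource) for a kernel."""
    compute_t = flops / PEAK_FLOPS
    memory_t = bytes_moved / DRAM_BW
    return max(compute_t, memory_t), ("memory" if memory_t > compute_t else "compute")

# Element-wise vector add: 1 FLOP per element, 12 bytes moved
# (two 4-byte reads plus one 4-byte write).
n = 100_000_000
runtime, limit = bottleneck(flops=n, bytes_moved=12 * n)
print(f"{runtime * 1e3:.0f} ms, {limit}-bound")  # prints "12 ms, memory-bound"
```

Even though the arithmetic alone would finish in 0.1 ms, the kernel is stuck waiting on data for two orders of magnitude longer.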
To address this, we fuse computing elements into memory, nearly eliminating the physical distance between processing and memory that exists in today’s computer architectures. Sounds simple, right? It’s most definitely not.
Our solution involves building processors directly into DRAM. The difficulty lies in the manufacturing process: silicon fabrication is traditionally specialized to produce either fast logic elements or dense storage elements, not both. Because of our foundry’s unique DRAM manufacturing capabilities, we have pioneered a process that incorporates both fast logic elements and dense storage elements in a single chip. Since far less data needs to be moved around, there is immense potential for significant energy and time savings. The compute units now physically reside in the memory, so they have direct access to the data they need, effectively widening the I/O bandwidth to unprecedented levels.
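A rough energy comparison shows why co-locating compute and memory pays off. The per-operation figures below are approximate, widely cited estimates for a ~45 nm process (Horowitz, ISSCC 2014); actual values vary by node and design:

```python
# Approximate per-operation energies (order of magnitude only,
# from published ~45 nm estimates).
ENERGY_FP32_ADD = 0.9e-12     # ~0.9 pJ for a 32-bit float add
ENERGY_DRAM_ACCESS = 640e-12  # ~640 pJ to fetch 64 bits from off-chip DRAM

# For streaming workloads, each operand fetched from off-chip DRAM costs
# hundreds of times the energy of the arithmetic performed on it.
ratio = ENERGY_DRAM_ACCESS / ENERGY_FP32_ADD
print(f"off-chip fetch ~ {ratio:.0f}x the cost of the add")
```

Eliminating most of that off-chip movement is where processing-in-memory finds its energy headroom.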
What about stacking 2.5D/3D or High Bandwidth Memory (HBM)?
In 2.5D, dies are placed on top of an interposer, which incorporates through-silicon vias (TSVs). The interposer acts as the bridge between the chips and a board, providing more I/Os and bandwidth.
HBM stacks DRAM dies on top of each other, enabling even more I/Os. For example, Samsung’s HBM2 technology consists of eight 8Gbit DRAM dies, which are stacked and connected using 5,000 TSVs.
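The stack’s capacity follows directly from those figures, and the wide TSV-enabled interface is where HBM’s bandwidth comes from. The per-pin data rate below is the 2 Gbit/s of the baseline JEDEC HBM2 specification:

```python
# Capacity: eight 8 Gbit dies per stack, as stated in the text.
dies = 8
gbit_per_die = 8
stack_gbyte = dies * gbit_per_die / 8   # bits -> bytes
print(f"{stack_gbyte:.0f} GB per HBM2 stack")

# Bandwidth: the TSVs enable a 1024-bit interface per stack; at the
# baseline HBM2 rate of 2 Gbit/s per pin, that is 256 GB/s.
bus_bits = 1024
gbit_per_pin = 2
bandwidth_gbs = bus_bits * gbit_per_pin / 8
print(f"{bandwidth_gbs:.0f} GB/s per stack")
```

For comparison, a conventional 64-bit DDR4 DIMM interface is sixteen times narrower, which is exactly the gap stacking is meant to close.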
Besides 2.5D, the industry is working on 3D integrated circuits. The idea is to stack memory dies on a logic die, or logic dies on each other, connecting them with TSVs or an active interposer. The problem with these technologies is the high cost and complexity of manufacturing, making them impractical for most applications.
Our cost-effective, 2x-nm class process is unique and currently does not involve stacking. Adding the processing cores consumes a negligible amount of die area and can be implemented within 3-5 metal layers, since the logic is an order of magnitude less dense than a leading-edge logic implementation. The goal is to improve total energy efficiency by 10x, with scalability, compared to leading CPU implementations, while boosting performance at least 20x!