In the following diagram, we classify computing approaches by “working set location”: where the data is actually processed. Prior systems were CPU-centric, moving data to the core for processing (a)-(c), whereas near-memory processing (d) brings the processing cores to the place where the data resides. Computation-in-memory (e) further reduces data movement by using memories with compute capability (e.g., memristors, phase-change memory).
Visit the following link to read the source publication: “Near-Memory Computing: Past, Present, and Future”
From the perspective of a DRAM chip, our solution could be considered in-memory computing. But from the perspective of the memory array, it’s near-memory computing.
Our idea is this: instead of using tons of power to grab DRAM contents and somehow muscle them into the CPU or some other computing structure, what if we did the computing on the DRAM die itself?
We are building simple processors on the DRAM die using the DRAM process – something that process was never intended for. Some compromises have to be made in the architecture, so this isn’t going to compete with a Xeon chip, but it really doesn’t need to for AI processing. We call the architecture and chip A.I. Memory, or AIM.
Instead of bringing the data to the computing, we’re bringing the computing to the data. A routine is offloaded by the main CPU and executes locally in the DRAM chip itself. Then, with no need to move that data anywhere outside the DRAM chip, only the result of the computation needs to travel back to the host system. And, given that AI computation often involves a lot of reduction, that result should be less data than what was used to compute it.
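A toy back-of-envelope calculation makes the reduction point concrete. The numbers below are purely illustrative (not AIM specifications): summing a vector is a typical reduction, and if it runs inside the DRAM chip, only the single result crosses the memory bus instead of every element.

```python
# Illustrative comparison of memory-bus traffic for a reduction (vector sum)
# run on the host CPU versus offloaded to compute on the DRAM die.
# All sizes are hypothetical examples, not measurements of any real chip.

N = 1_000_000      # elements in the working set
ELEM_BYTES = 4     # 32-bit values

# CPU-centric: every element must travel over the bus to the core.
cpu_centric_traffic = N * ELEM_BYTES

# In-DRAM offload: only the reduced scalar result crosses the bus.
in_dram_traffic = ELEM_BYTES

ratio = cpu_centric_traffic // in_dram_traffic
print(f"Bus traffic reduced by {ratio:,}x")  # prints: Bus traffic reduced by 1,000,000x
```

The same logic applies to common AI kernels such as dot products and pooling layers: inputs are large, outputs are small, so keeping the inputs on the DRAM die eliminates most of the data movement.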
The beauty of our solution is that we made no changes to the process, only some small modifications to the DRAM design itself.