Processor-in-memory

Computational RAM or C-RAM is random-access memory with processing elements integrated on the same chip. This enables C-RAM to be used as a SIMD computer. It also can be used to more efficiently use memory bandwidth within a memory chip.

Perhaps the most influential implementations of computational RAM came from The Berkeley IRAM Project. Vector IRAM (V-IRAM) combines DRAM with a vector processor integrated on the same chip.

Reconfigurable Architecture DRAM (RADram) is DRAM with reconfigurable computing FPGA logic elements integrated on the same chip. SimpleScalar simulations show that RADram (in a system with a conventional processor) can give orders of magnitude better performance on some problems than traditional DRAM (in a system with the same processor).

Some embarrassingly parallel computational problems are already limited by the von Neumann bottleneck between the CPU and the DRAM. Some researchers expect that, for the same total cost, a machine built from computational RAM will run orders of magnitude faster than a traditional general-purpose computer on these kinds of problems.

As of 2011, the "DRAM process" (few layers; optimized for high capacitance) and the "CPU process" (optimized for high frequency; typically twice as many BEOL layers as DRAM; since each additional layer reduces yield and increases manufacturing cost, such chips are relatively expensive per square millimeter compared to DRAM) is distinct enough that there are three approaches to computational RAM:

Some CPUs designed to be built on a DRAM process technology (rather than a "CPU" or "logic" process technology specifically optimized for CPUs) include The Berkeley IRAM Project, TOMI Technology and the AT&T DSP1.

...
Wikipedia