Smart Memories is a multiprocessor system with coarse-grain reconfiguration capabilities. The processing units in this system take the form of "Tiles", which are grouped in fours to form "Quads". These elements are interconnected hierarchically: a set of intra-Quad connections provides communication between the Tiles inside a Quad, while a mesh interconnection network connects the Quads to each other. The Tiles inside a Quad share a network interface that connects them to the outside world (Figure 1).

Figure 1 - Smart Memories system, high level view
Each Tile in the Smart Memories system consists of four major parts (Figure 2): two processor cores, a set of configurable memory mats, a crossbar interconnect and a Load/Store unit. Either or both of the processors inside a Tile can easily be turned off when the extra processing power is not required, which saves power and lets the Tile act purely as a memory resource.

Figure 2 - Smart Memories Tile architecture
Smart Memories uses Xtensa LX commercial configurable processor cores from Tensilica. The cores are 32-bit RISC machines with a flexible 16/24-bit instruction length. We have configured the cores to be 3-way issue VLIW with flexible instruction formats. Xtensa LX has a seven-stage pipeline, with two stages for memory access. It has 64 general-purpose registers, a 32-bit floating point unit and 32 floating point registers.
The processors are both configured and extended using the TIE (Tensilica Instruction Extension) language. We have defined new memory interfaces, state registers and custom instructions to support different programming models.
Figure 3 shows the block diagram of a reconfigurable memory mat. Each memory mat has 1024 32-bit words in its main data array. Each word is associated with six control bits, held in a separate control array. A programmable PLA performs a read-modify-write operation on the control bits after each access to the memory word. A mat can perform read, write and compare operations on the 32-bit data word.

Figure 3 - Reconfigurable Memory Mat
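The behaviour of a mat can be illustrated with a small software model. The following is only a sketch under stated assumptions: the class name `MemoryMat`, the `control_logic` callback that stands in for the programmable PLA, and the meaning given to the control bits (`VALID`, `DIRTY`) are all hypothetical and chosen for illustration. Only the array size, the word width and the number of control bits come from the description above.

```python
from typing import Callable

WORDS = 1024            # words in the main data array (from the text)
DATA_MASK = 0xFFFFFFFF  # 32-bit data word (from the text)
CTRL_MASK = 0x3F        # six control bits per word (from the text)

# Hypothetical stand-in for the programmable PLA:
# (operation name, old control bits) -> new control bits.
ControlLogic = Callable[[str, int], int]


class MemoryMat:
    """Illustrative model of one mat: a data array plus a control array."""

    def __init__(self, control_logic: ControlLogic) -> None:
        self.data = [0] * WORDS
        self.ctrl = [0] * WORDS
        self.control_logic = control_logic

    def _update_ctrl(self, op: str, index: int) -> None:
        # Read-modify-write of the control bits after every access.
        new_bits = self.control_logic(op, self.ctrl[index])
        self.ctrl[index] = new_bits & CTRL_MASK

    def read(self, index: int) -> int:
        value = self.data[index]
        self._update_ctrl("read", index)
        return value

    def write(self, index: int, value: int) -> None:
        self.data[index] = value & DATA_MASK
        self._update_ctrl("write", index)

    def compare(self, index: int, value: int) -> bool:
        match = self.data[index] == (value & DATA_MASK)
        self._update_ctrl("compare", index)
        return match


# Assumed bit assignment, for illustration only.
VALID, DIRTY = 0x1, 0x2


def cache_like_logic(op: str, bits: int) -> int:
    """Example control function: a write marks the word valid and dirty."""
    if op == "write":
        return bits | VALID | DIRTY
    return bits


mat = MemoryMat(cache_like_logic)
mat.write(5, 0x123456789)   # value is truncated to 32 bits
```

Swapping in a different `control_logic` function is the software analogue of reprogramming the PLA: the data path stays the same while the interpretation of the control bits changes.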
Each memory mat also has two pairs of Pointer/Stride registers, which allow it to hold two separate hardware FIFOs. The mats are connected by a two-bit inter-mat communication network, which allows them to exchange control information. Mats can be configured to act as caches, FIFOs or scratchpads.
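To make the FIFO configuration concrete, here is a minimal sketch of two FIFOs sharing one 1024-word data array. The text does not spell out how the Pointer/Stride registers are used, so this model assumes that each FIFO keeps a head and a tail position, that consecutive entries are `stride` words apart, and that positions wrap at a fixed depth. The class name `MatFifo` and its parameters are hypothetical.

```python
class MatFifo:
    """Illustrative FIFO occupying a strided region of a mat's data array."""

    def __init__(self, storage, base, depth, stride=1):
        self.storage = storage  # shared data array of the mat
        self.base = base        # first word used by this FIFO (assumed)
        self.depth = depth      # number of entries before wrapping (assumed)
        self.stride = stride    # distance in words between entries
        self.head = 0           # slot of the next entry to pop
        self.tail = 0           # slot of the next entry to push
        self.count = 0

    def _addr(self, slot):
        return self.base + slot * self.stride

    def push(self, value):
        if self.count == self.depth:
            raise OverflowError("FIFO full")
        self.storage[self._addr(self.tail)] = value & 0xFFFFFFFF
        self.tail = (self.tail + 1) % self.depth
        self.count += 1

    def pop(self):
        if self.count == 0:
            raise IndexError("FIFO empty")
        value = self.storage[self._addr(self.head)]
        self.head = (self.head + 1) % self.depth
        self.count -= 1
        return value


# Two independent FIFOs inside one 1024-word array.
storage = [0] * 1024
fifo_a = MatFifo(storage, base=0, depth=512)
fifo_b = MatFifo(storage, base=512, depth=256, stride=2)

for v in (1, 2, 3):
    fifo_a.push(v)
fifo_b.push(10)
fifo_b.push(20)
```

The two regions do not overlap (words 0 to 511 and words 512 to 1022), so pushes to one FIFO never disturb the other.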
A crossbar inside the Tile gives both processor cores in the Tile, as well as the Quad interface, access to the memory mats. It has four ports on the processor (LSU) side, two ports to the Quad interface and 16 ports to the memory mats.
A Load/Store unit (LSU) interfaces the Tensilica cores to the rest of the Smart Memories memory system. It provides basic interfacing as well as support for custom memory operations defined in the TIE language. The LSU also communicates with the Quad's cache controller to request cache refills, to access off-Tile memory and to report other special events, such as synchronization misses.
Tiles are grouped in fours to form Quads. Each Quad has a shared cache controller, also referred to as the protocol controller, which supports the processors inside the Quad. Each Quad also has a network interface, which sends, receives and routes packets on the mesh-like network and provides an interface to the outside world.
The protocol controller is the heart of the Quad. It performs a variety of actions to support the processors' memory accesses under different programming models. In summary, the protocol controller services cache evictions and refills, gives a processor in one Tile access to memory mats in another Tile (off-Tile accesses), enforces cache coherence invariants (MESI protocol), acts as a DMA engine to move data into and out of the Quad, and provides support for transactions.
The network interface is a simple router that connects each Quad to its neighbors via a set of wires. It receives packets from the protocol controller or from neighboring Quads and routes them to the appropriate destinations.
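As a reminder of what the MESI protocol mentioned above involves, the sketch below encodes the standard textbook state transitions for a single cache line as seen by one cache. It is not a description of how the protocol controller is actually implemented. The event names and the `others_have_copy` flag are assumptions made for this illustration.

```python
from enum import Enum


class Line(Enum):
    MODIFIED = "M"
    EXCLUSIVE = "E"
    SHARED = "S"
    INVALID = "I"


def next_state(state, event, others_have_copy=False):
    """Textbook MESI transition for one line in one cache.

    event is one of: "local_read", "local_write",
    "remote_read", "remote_write" (names chosen for this sketch).
    """
    if event == "local_read":
        if state is Line.INVALID:
            # A refill lands in SHARED if another cache holds the line.
            return Line.SHARED if others_have_copy else Line.EXCLUSIVE
        return state
    if event == "local_write":
        # Other copies, if any, must be invalidated first.
        return Line.MODIFIED
    if event == "remote_read":
        if state in (Line.MODIFIED, Line.EXCLUSIVE):
            # A MODIFIED line is written back before being shared.
            return Line.SHARED
        return state
    if event == "remote_write":
        return Line.INVALID
    raise ValueError(f"unknown event: {event}")


# Walk one line through a short sequence of events.
state = Line.INVALID
state = next_state(state, "local_read")    # no other sharers
state = next_state(state, "local_write")
state = next_state(state, "remote_read")
```

The invariant the transitions preserve is that a line in the MODIFIED or EXCLUSIVE state in one cache is INVALID in every other cache.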
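The text does not say which routing algorithm the network interface uses. Purely as an illustration of routing between Quads on a mesh, the sketch below uses dimension-order (XY) routing, a common choice for mesh networks: a packet first travels along the X dimension until its column matches the destination, then along Y. The coordinate scheme, the direction names and the function names are assumptions.

```python
def next_hop(current, dest):
    """Return the output direction for a packet at `current` headed to `dest`.

    Coordinates are (x, y) Quad positions on the mesh (assumed scheme).
    """
    cx, cy = current
    dx, dy = dest
    if cx != dx:
        return "east" if dx > cx else "west"
    if cy != dy:
        return "north" if dy > cy else "south"
    return "local"  # deliver to this Quad's protocol controller


# Coordinate change associated with each direction.
STEP = {"east": (1, 0), "west": (-1, 0), "north": (0, 1), "south": (0, -1)}


def route(src, dest):
    """List every Quad a packet visits from src to dest, inclusive."""
    path = [src]
    pos = src
    while pos != dest:
        sx, sy = STEP[next_hop(pos, dest)]
        pos = (pos[0] + sx, pos[1] + sy)
        path.append(pos)
    return path


path = route((0, 0), (2, 1))
```

Each router only needs its own coordinates and the destination carried in the packet to pick an output, which is consistent with the "simple router" described above, although the real design may differ.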