Programming Models / Software

Smart Memories is designed to efficiently support different programming models, so that each application can be programmed and run in the model that gives the best performance, the greatest programming ease, or both. The system supports each model by reconfiguring the memory system to meet that model's memory access requirements. Three major programming models (execution modes) are supported:

Shared Memory, Multi-thread mode

This programming model presents the programmer with a cache-coherent shared memory environment. Multi-threaded programs are supported through different APIs, such as pthreads or the ANL macros. There are ongoing efforts to map different application classes to the Smart Memories architecture using this programming model:

Probabilistic Reasoning Applications

Probabilistic reasoning is an influential approach in A.I. and has been shown to successfully tackle difficult problems in growing fields such as data mining, image analysis, robotics, and genetics. Given the increasingly complex models and large data sets used in these emerging applications, the performance of reasoning algorithms is likely to become important for future computing systems. These algorithms tend to be inherently parallel, but are demanding in compute, memory, and bandwidth resources. By mapping these algorithms onto the Smart Memories architecture, we can evaluate the effectiveness of various reconfigurable components in our design.

Global Illumination on Parallel Architectures

Monte-Carlo ray tracing to generate scenes with global illumination is an application that places heavy demands on the memory system. The application has been coded using pthreads and is simulated on the Smart Memories simulator. Although real-time performance on a single Smart Memories chip is achieved, higher performance over current processors is possible.

Related publications: C. Burns, "Global Illumination on Parallel Architectures," Senior Thesis, University of Texas Department of Computer Sciences, Dec. 2004.
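
The way Monte-Carlo sampling divides across pthreads can be illustrated with a much simpler stand-in than a ray tracer. The sketch below estimates pi by sampling points in the unit square; it is not the ray tracer described above, and the names `mc_task`, `mc_worker`, and `estimate_pi` are hypothetical. The point it tries to show is that each thread keeps its own random-number state and its own result slot, so threads share no writable data until the final reduction.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

#define MAX_THREADS 64

/* Per-thread work descriptor (hypothetical): private RNG state and result. */
typedef struct {
    uint64_t rng_state;  /* private generator state, never shared */
    long samples;        /* number of samples this thread draws */
    long hits;           /* samples that landed inside the quarter circle */
} mc_task;

/* One xorshift64 step; returns a double in [0, 1). State must be nonzero. */
static double next_uniform(uint64_t *state) {
    uint64_t x = *state;
    x ^= x << 13;
    x ^= x >> 7;
    x ^= x << 17;
    *state = x;
    return (double)(x >> 11) / 9007199254740992.0; /* divide by 2^53 */
}

/* Thread body: draw samples independently, write only to this task's slot. */
static void *mc_worker(void *arg) {
    mc_task *task = arg;
    long hits = 0;
    for (long i = 0; i < task->samples; i++) {
        double x = next_uniform(&task->rng_state);
        double y = next_uniform(&task->rng_state);
        if (x * x + y * y < 1.0)
            hits++;
    }
    task->hits = hits;
    return NULL;
}

/* Fork one thread per task, join them, and reduce the per-thread counts. */
double estimate_pi(int nthreads, long samples_per_thread) {
    pthread_t tid[MAX_THREADS];
    mc_task task[MAX_THREADS];
    assert(nthreads > 0 && nthreads <= MAX_THREADS);

    for (int i = 0; i < nthreads; i++) {
        /* Distinct nonzero seed per thread (odd constant times i + 1). */
        task[i].rng_state = 0x9E3779B97F4A7C15ULL * (uint64_t)(i + 1);
        task[i].samples = samples_per_thread;
        task[i].hits = 0;
        int rc = pthread_create(&tid[i], NULL, mc_worker, &task[i]);
        assert(rc == 0);
        (void)rc;
    }

    long total = 0;
    for (int i = 0; i < nthreads; i++) {
        pthread_join(tid[i], NULL);
        total += task[i].hits;
    }
    return 4.0 * (double)total / ((double)nthreads * (double)samples_per_thread);
}
```

A call such as `estimate_pi(4, 200000)` runs four workers and returns an estimate close to pi. In a ray tracer the per-sample work would be a traced path through the scene rather than a point test, which is where the pressure on the memory system comes from.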

Data Structure Pre-fetching

Hardware-based or compiler-assisted pre-fetching techniques work well for array-based programs but are less effective at hiding memory latency for pointer-intensive programs. By taking a data-structure-centric approach to pre-fetching (as opposed to control-flow-centric approaches), we exploit libraries of data structures to help pre-fetch the data stored in those structures. We take advantage of the recent success of chip multiprocessors and use an idle or under-utilized processor to pre-fetch data with a pre-fetch thread. The library is modified by adding code for the pre-fetch thread, as well as a few lines that communicate information from the library code to the pre-fetch thread. The pre-fetch thread uses its knowledge of the data structures in the library to identify memory traversal patterns and issues pre-fetches accordingly. This contrasts with issuing pre-fetches for individual load instructions independently. Using this approach, we obtain performance improvements without the assistance of a profiling compiler or costly hardware, while staying within the paradigm of sequential programming languages. Furthermore, this approach makes pre-fetching transparent to the programmer using the library, because the application code is not modified at all.
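
A minimal sketch of this idea for a singly linked list follows. It is an illustration under stated assumptions, not the project's library: the names `prefetch_channel`, `prefetch_thread`, `list_sum`, and `PREFETCH_DISTANCE` are hypothetical, `__builtin_prefetch` is a GCC/Clang builtin, and the list is assumed to stay unmodified while the helper runs. The library routine publishes the node it has reached, and the helper thread walks a few nodes ahead of that hint.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

#define PREFETCH_DISTANCE 8  /* how many nodes the helper runs ahead (assumed) */

typedef struct node {
    struct node *next;
    long value;
} node;

/* Channel from library code to the pre-fetch thread (hypothetical). */
typedef struct {
    _Atomic(node *) hint;  /* latest node reached by the library traversal */
    atomic_int done;       /* set once the traversal has finished */
} prefetch_channel;

/* Helper thread: knows the structure is a linked list, so it chases
 * next pointers ahead of the hint instead of reacting to single loads. */
static void *prefetch_thread(void *arg) {
    prefetch_channel *ch = arg;
    node *last = NULL;
    while (!atomic_load_explicit(&ch->done, memory_order_acquire)) {
        node *n = atomic_load_explicit(&ch->hint, memory_order_acquire);
        if (n == last)
            continue;  /* no new hint yet; a real helper would back off */
        last = n;
        for (int i = 0; n != NULL && i < PREFETCH_DISTANCE; i++) {
            __builtin_prefetch(n->next);  /* a prefetch never faults, even on NULL */
            n = n->next;                  /* the helper absorbs the miss itself */
        }
    }
    return NULL;
}

/* Library routine: the application just calls list_sum(head). */
long list_sum(node *head) {
    prefetch_channel ch;
    atomic_init(&ch.hint, (node *)NULL);
    atomic_init(&ch.done, 0);

    pthread_t helper;
    int rc = pthread_create(&helper, NULL, prefetch_thread, &ch);
    assert(rc == 0);
    (void)rc;

    long sum = 0;
    for (node *n = head; n != NULL; n = n->next) {
        /* The "few lines" that tell the helper where the traversal is. */
        atomic_store_explicit(&ch.hint, n, memory_order_release);
        sum += n->value;
    }

    atomic_store_explicit(&ch.done, 1, memory_order_release);
    pthread_join(helper, NULL);
    return sum;
}
```

The application only calls `list_sum`, so it needs no changes. A practical library would presumably keep one long-lived helper instead of creating a thread per call, and would publish hints less often than once per node; both are simplified here to keep the sketch short.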

Streaming

Streaming is the second programming model supported in the Smart Memories system. For data-parallel applications, such as those in the multimedia and DSP domains, the stream programming model gives high performance. Because a stream program separates computation and communication into kernels and streams of data, a compiler can perform many static optimizations. A high-level compiler such as Reservoir Labs' R-Stream maps compute kernels to stream co-processors and manages the transfer of data to software-managed local memories. It generates Stream Virtual Machine (SVM) code, that is, C with SVM API calls, which is then compiled by our Tensilica XCC compiler. The SVM runtime implements the SVM API calls so that a stream program can run on Smart Memories.
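
The separation of kernels from streams can be sketched in plain C. The code below does not use the SVM API; `stream_load`, `stream_store`, `saxpy_kernel`, `saxpy_stream`, and `LOCAL_WORDS` are hypothetical names, and `memcpy` stands in for whatever transfer mechanism moves data into a software-managed local memory. What it tries to show is the shape of a stream program: the kernel touches only local buffers, and all traffic to global memory happens in explicit block transfers around it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define LOCAL_WORDS 256  /* assumed capacity of one local buffer, in floats */

/* Communication: explicit block transfers (memcpy as a stand-in). */
static void stream_load(float *local, const float *global, size_t n) {
    memcpy(local, global, n * sizeof *local);
}

static void stream_store(float *global, const float *local, size_t n) {
    memcpy(global, local, n * sizeof *global);
}

/* Computation: the kernel reads and writes local buffers only. */
static void saxpy_kernel(float a, const float *x, const float *y,
                         float *out, size_t n) {
    assert(n <= LOCAL_WORDS);
    for (size_t i = 0; i < n; i++)
        out[i] = a * x[i] + y[i];
}

/* Stream program: strip-mine the input into blocks that fit in local
 * memory, then load, compute, and store each block in turn. */
void saxpy_stream(float a, const float *x, const float *y,
                  float *out, size_t n) {
    float local_x[LOCAL_WORDS];
    float local_y[LOCAL_WORDS];
    float local_out[LOCAL_WORDS];

    for (size_t base = 0; base < n; base += LOCAL_WORDS) {
        size_t len = (n - base < LOCAL_WORDS) ? n - base : LOCAL_WORDS;
        stream_load(local_x, x + base, len);
        stream_load(local_y, y + base, len);
        saxpy_kernel(a, local_x, local_y, local_out, len);
        stream_store(out + base, local_out, len);
    }
}
```

Because the transfers are explicit and their sizes are known, a compiler or runtime could overlap the load of one block with the kernel running on the previous one. That overlap is omitted here for brevity.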

Smart Memories is an active participant in the Morphware Forum, which develops standards such as the Stream Virtual Machine.

Related publications: F. Labonte, P. Mattson, I. Buck, C. Kozyrakis, and M. Horowitz, "The Stream Virtual Machine," PACT, September 2004.

Transactional Coherence and Consistency (TCC)

The last major programming model in the Smart Memories system is transactions. By executing all code as transactions on the memory system, TCC offers a simpler way to parallelize applications than programming with explicit threads. For more details about TCC, please refer to the Stanford TCC website.
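
As a loose software analogy only, the sketch below shows the optimistic pattern that transactional execution relies on: do the work speculatively on a private copy, try to commit atomically, and retry if another thread committed first. It uses C11 atomics on a single shared word and is not how TCC is implemented, since TCC applies this at the level of the memory system. The names `transact_add`, `tx_task`, `tx_worker`, and `run_transactions` are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

#define MAX_TX_THREADS 16

/* A tiny "transaction" on one word: speculate, then commit or retry. */
static void transact_add(atomic_long *cell, long delta) {
    long snapshot = atomic_load(cell);
    long result;
    do {
        result = snapshot + delta;  /* speculative work on a private copy */
        /* Commit succeeds only if no other thread changed the cell since
         * the snapshot; on failure, snapshot is refreshed and we retry. */
    } while (!atomic_compare_exchange_weak(cell, &snapshot, result));
}

typedef struct {
    atomic_long *cell;  /* shared state updated by every thread */
    long count;         /* number of transactions this thread runs */
} tx_task;

static void *tx_worker(void *arg) {
    tx_task *task = arg;
    for (long i = 0; i < task->count; i++)
        transact_add(task->cell, 1);
    return NULL;
}

/* Run per_thread transactions on each of nthreads threads; no locks used. */
long run_transactions(int nthreads, long per_thread) {
    atomic_long cell;
    pthread_t tid[MAX_TX_THREADS];
    tx_task task[MAX_TX_THREADS];
    assert(nthreads > 0 && nthreads <= MAX_TX_THREADS);

    atomic_init(&cell, 0);
    for (int i = 0; i < nthreads; i++) {
        task[i].cell = &cell;
        task[i].count = per_thread;
        int rc = pthread_create(&tid[i], NULL, tx_worker, &task[i]);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < nthreads; i++)
        pthread_join(tid[i], NULL);
    return atomic_load(&cell);
}
```

With four threads each running 10000 increments, `run_transactions(4, 10000)` returns 40000, because every conflicting update is retried and none is lost. The programmer writes no lock or unlock calls, which is the sense in which a transactional model can be simpler than coordinating threads by hand.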