Programming Models / Software
Smart Memories is designed to support different programming models
efficiently, so that applications can be programmed and run in the model that
gives the best performance, the greatest programming ease, or both. The Smart
Memories system supports each programming model by reconfiguring the memory
system to meet that model's memory access requirements. Three major
programming models / execution modes are supported:
Shared Memory, Multi-thread mode
This programming model presents the programmer with a cache-coherent shared
memory environment. Multi-threaded programs are supported through different
APIs, such as pthreads or ANL macros. There are ongoing efforts to map
different application classes to the Smart Memories architecture using this
programming model:
Probabilistic Reasoning Applications
Probabilistic reasoning is an influential approach in A.I. and has been shown
to successfully tackle difficult problems in growing fields such as data
mining, image analysis, robotics, and genetics. Given the increasingly complex
models and large data sets used in these emerging applications, the
performance of reasoning algorithms is likely to
become important for future computing systems. These algorithms tend to be
inherently parallel, but are demanding in compute, memory and bandwidth
resources. By mapping these algorithms onto the Smart Memories architecture,
we can evaluate the effectiveness of various reconfigurable components
in our design.
Global Illumination on Parallel Architectures
Monte-Carlo ray tracing to generate scenes with global illumination is an
application that places heavy demands on the memory system. The application
has been coded using pthreads and is simulated on the Smart Memories simulator.
Although real-time performance on a single Smart Memories chip is achieved,
higher performance over current processors is possible.
Related publications: C. Burns, "Global Illumination on Parallel
Architectures," Senior Thesis, University of Texas Department of Computer
Sciences, Dec. 2004.
Data Structure Pre-fetching
Hardware-based or compiler-assisted pre-fetching techniques work well for
array-based programs but are less effective in hiding memory latency for
pointer-intensive programs. By taking a data-structure-centric approach to
pre-fetching (as opposed to control-flow-centric approaches), we exploit
data structure libraries to help pre-fetch the data stored in those
structures. We take advantage of the recent success of chip multiprocessors
and use an idle or under-utilized processor to pre-fetch data with a
pre-fetch thread. The library is modified by adding code for the pre-fetch
thread, as well as a few lines that communicate information from the library
code to the pre-fetch thread. The pre-fetch thread uses its knowledge of the
data structures in the library to identify memory traversal patterns and
issues pre-fetches accordingly. This contrasts with issuing pre-fetches for
individual load instructions independently. Using this approach, we are able
to obtain performance improvements without the assistance of a profiling
compiler or costly hardware, while staying within the paradigm of sequential
programming languages. Furthermore, this approach makes pre-fetching
transparent to the programmer using the library, as we do not modify the
application code at all.
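The idea above can be illustrated with a minimal sketch in C with pthreads.
It is not the project's library: the names node_t, prefetch_ctx_t, list_push,
list_sum and PREFETCH_DEPTH are hypothetical, and __builtin_prefetch is a
GCC/Clang builtin assumed to be available. The library's traversal routine
publishes its current position, and a helper thread walks a few nodes ahead
of that position and touches them so they are more likely to be cached.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical linked-list "library"; names are illustrative only. */
typedef struct node {
    struct node *next;
    long value;
} node_t;

/* State shared between the library code and its pre-fetch thread. */
typedef struct {
    _Atomic(node_t *) hint;  /* node the library traversal is currently at */
    atomic_bool done;        /* set when the traversal has finished */
} prefetch_ctx_t;

enum { PREFETCH_DEPTH = 8 }; /* assumed look-ahead distance, in nodes */

/* Pre-fetch thread: it knows the structure is a singly linked list, so it
 * follows next pointers ahead of the published position. */
static void *prefetch_thread(void *arg)
{
    prefetch_ctx_t *ctx = arg;
    while (!atomic_load(&ctx->done)) {
        node_t *n = atomic_load(&ctx->hint);
        for (int i = 0; n != NULL && i < PREFETCH_DEPTH; i++) {
            __builtin_prefetch(n->next, 0, 1); /* read access, low reuse */
            n = n->next;
        }
    }
    return NULL;
}

/* Prepend a value and return the new head. */
node_t *list_push(node_t *head, long value)
{
    node_t *n = malloc(sizeof *n);
    assert(n != NULL);
    n->value = value;
    n->next = head;
    return n;
}

/* Library traversal. The caller's code is unchanged; only this routine
 * knows about the helper thread. The list is not modified while the
 * helper runs, so both threads only read the nodes. */
long list_sum(node_t *head)
{
    prefetch_ctx_t ctx;
    atomic_init(&ctx.hint, head);
    atomic_init(&ctx.done, false);

    pthread_t helper;
    bool started = pthread_create(&helper, NULL, prefetch_thread, &ctx) == 0;

    long sum = 0;
    for (node_t *n = head; n != NULL; n = n->next) {
        atomic_store(&ctx.hint, n); /* tell the helper where we are */
        sum += n->value;
    }

    atomic_store(&ctx.done, true);
    if (started)
        pthread_join(helper, NULL);
    return sum;
}

/* Release every node of the list. */
void list_free(node_t *head)
{
    while (head != NULL) {
        node_t *next = head->next;
        free(head);
        head = next;
    }
}
```

An application simply calls list_sum(head) as it would without pre-fetching.
Creating a thread per traversal keeps the sketch short; a real library would
more plausibly keep one long-lived helper on the idle processor.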
Streaming
Streaming is the second programming model supported in the Smart Memories
system. For data-parallel applications, such as those in the multimedia and
DSP domains, the stream programming model gives high performance. By
separating the computation and communication in a program into kernels and
streams of data, a compiler can perform many static optimizations. A
high-level compiler such as Reservoir Labs' R-Stream maps compute kernels to
stream co-processors and manages the transfer of data to software-managed
local memories. It generates SVM (Stream Virtual Machine) code, that is, C
with SVM API calls, which is then compiled by our Tensilica XCC compiler. The
SVM runtime implements the SVM API calls to allow a stream program to run on
Smart Memories.
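The separation of kernels from streams can be sketched in plain C as follows.
This is only an illustration of the idea and does not use the SVM API: the
names scale_kernel, run_scale_stream and LOCAL_WORDS are hypothetical, and
memcpy stands in for whatever transfer mechanism moves stream data into a
software-managed local memory.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { LOCAL_WORDS = 64 }; /* assumed capacity of one local buffer */

/* Computation: the kernel reads and writes only local buffers. */
static void scale_kernel(const float *in, float *out, size_t n, float gain)
{
    for (size_t i = 0; i < n; i++)
        out[i] = in[i] * gain;
}

/* Communication: move the stream through local memory one strip at a
 * time, invoking the kernel on each strip. */
void run_scale_stream(const float *src, float *dst, size_t len, float gain)
{
    assert(len == 0 || (src != NULL && dst != NULL));

    float local_in[LOCAL_WORDS];
    float local_out[LOCAL_WORDS];

    for (size_t base = 0; base < len; base += LOCAL_WORDS) {
        size_t n = len - base < LOCAL_WORDS ? len - base : LOCAL_WORDS;
        memcpy(local_in, src + base, n * sizeof *src);  /* stream load */
        scale_kernel(local_in, local_out, n, gain);     /* compute */
        memcpy(dst + base, local_out, n * sizeof *dst); /* stream store */
    }
}
```

Because the kernel never touches main memory directly, the transfer schedule
and the strip size are visible at compile time, which is what makes static
optimization of the data movement possible.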
Smart Memories is an active participant in the Morphware Forum, which
develops standards such as the Stream Virtual Machine.
Related publications: F. Labonte, P. Mattson, I. Buck, C. Kozyrakis and M.
Horowitz, "The Stream Virtual Machine," PACT, September 2004.
Transactional Coherence and Consistency (TCC)
The last major programming model in the Smart Memories system is
transactions. By executing all code as transactions on the memory system, TCC
offers a simpler way to parallelize applications than using threads. For
more details about TCC, please refer to the Stanford TCC website.
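As a rough software analogy only, the sketch below shows the general
optimistic pattern of working on a snapshot, attempting an atomic commit, and
redoing the work if another thread committed first. It uses a C11
compare-and-swap on a single counter and is not how TCC itself is implemented
on Smart Memories; the names worker, worker_arg_t and run_transactions are
hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

static atomic_long shared_total; /* the only shared state in this sketch */

typedef struct {
    long iterations; /* number of "transactions" this thread runs */
    long retries;    /* how many times a commit failed and was redone */
} worker_arg_t;

static void *worker(void *arg)
{
    worker_arg_t *w = arg;
    for (long i = 0; i < w->iterations; i++) {
        long snapshot = atomic_load(&shared_total);
        for (;;) {
            long result = snapshot + 1; /* speculative work on the snapshot */
            /* Commit only if nobody else committed since the snapshot. */
            if (atomic_compare_exchange_weak(&shared_total, &snapshot, result))
                break;
            w->retries++; /* conflict: snapshot was refreshed, redo the work */
        }
    }
    return NULL;
}

/* Run nthreads workers and return the final committed total. */
long run_transactions(int nthreads, long iterations)
{
    enum { MAX_THREADS = 16 };
    assert(nthreads > 0 && nthreads <= MAX_THREADS);

    pthread_t tid[MAX_THREADS];
    worker_arg_t args[MAX_THREADS];

    atomic_store(&shared_total, 0);
    for (int t = 0; t < nthreads; t++) {
        args[t] = (worker_arg_t){ iterations, 0 };
        int rc = pthread_create(&tid[t], NULL, worker, &args[t]);
        assert(rc == 0);
        (void)rc;
    }
    for (int t = 0; t < nthreads; t++)
        pthread_join(tid[t], NULL);

    return atomic_load(&shared_total);
}
```

The worker contains no locks; correctness comes from the commit step
succeeding only when the snapshot is still current.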