Investigating Programming Extensions for Non-Uniform Memory Architectures


Virtually all large shared memory parallel processors exhibit non-uniform memory access (NUMA). That is, the speed and rate at which a memory location can be accessed depend on its location relative to the requesting CPU. Typically, memory that is physically close to the requesting CPU can be accessed faster than more remote memory. OpenMP is a widely used programming paradigm for shared memory parallel computers. The current OpenMP model, however, assumes uniform memory access. The question arises as to whether OpenMP should be extended to support NUMA architectures. To this end several vendors have proposed extensions, but there is no general agreement on whether such extensions are needed or what form they should take. The goal of this project is to investigate the use of OpenMP on NUMA architectures.


Principal Investigator

Alistair Rendell
Computer Science, Faculty of Science
Australian National University

Project

x38

Co-Investigators

Ilya Sharapov
Sun Microsystems
USA


Lindsay Hood
Institute for Molecular Bioscience
University of Queensland


Joseph Antony
Stephen Titmuss
Engineering, FEIT
Australian National University


Wilga Hawkins
Nathan Robertson
Computer Science, Faculty of Science
Australian National University

RFCD Codes

280301


Significant Achievements, Anticipated Outcomes and Future Work

Work to date has focused on the SGI Origin at the University of Queensland, using pthreads and various SGI-specific routines rather than OpenMP. The goal was to bind specific threads to specific physical CPUs while also allocating specific blocks of memory on specific nodes, so that the effect of different thread/memory layouts could be explored in a controlled fashion. This was done for a range of computational science algorithms. In some cases, different layouts produced performance differences close to 50%. Furthermore, the performance observed when no attention was given to memory or thread placement was in general far from optimal. In summary, the results indicate that some attention to memory placement within the OpenMP programming model is highly desirable.
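
As an illustration of why placement matters, the sketch below shows one common way of influencing memory placement from within an unmodified OpenMP program. It is not the pthreads and SGI-specific approach used in this project. It assumes an operating system with a "first touch" page placement policy, under which a page is placed on the node of the thread that first writes to it. The function names alloc_first_touch and axpy are hypothetical, chosen for this example only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Allocate n doubles and initialise them in parallel.
 * Assumption: under a first-touch policy, each page ends up on the
 * node of the thread that writes it here. Without OpenMP enabled the
 * pragma is ignored and the loop simply runs serially. */
double *alloc_first_touch(size_t n, double value)
{
    double *a = malloc(n * sizeof *a);
    if (a == NULL)
        return NULL;
    #pragma omp parallel for schedule(static)
    for (long i = 0; i < (long)n; i++)
        a[i] = value;
    return a;
}

/* y = y + alpha * x, a level 1 BLAS style kernel.
 * The same static schedule is used as in the initialisation, so each
 * thread works on the index range it touched first. */
void axpy(size_t n, double alpha, const double *x, double *y)
{
    assert(x != NULL && y != NULL);
    #pragma omp parallel for schedule(static)
    for (long i = 0; i < (long)n; i++)
        y[i] += alpha * x[i];
}
```

Whether the threads stay on the CPUs where they first ran is outside the control of this code. That is one reason explicit thread binding, as used in the experiments above, gives a more controlled comparison.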

We are now extending this work to the Compaq GS1280 located at the APAC National Facility, and are also examining the OMNI OpenMP environment as implemented on a cluster of PCs located within the Department of Computer Science.

 

Computational Techniques Used

In the project to date, small programs have been written to model the kernels of larger applications. Examples include a pointer chasing algorithm to measure memory latency, a level 1 BLAS application to assess memory bandwidth, and a 2-D finite difference heat diffusion code. All codes are written in C and use multiple processors via either pthreads or OpenMP.
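
The pointer chasing idea can be sketched as follows. This is a minimal single-threaded illustration, not the project's benchmark code. The function names build_chain, chase and chase_ns_per_load are hypothetical. The chain is a random permutation with a single cycle (built with Sattolo's algorithm), so every load depends on the previous one and hardware prefetching gives little help.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <time.h>

/* Fill next[0..n-1] with a random permutation that forms one cycle
 * of length n (Sattolo's algorithm), so a chase visits every slot. */
void build_chain(size_t *next, size_t n, unsigned seed)
{
    assert(n > 0);
    for (size_t i = 0; i < n; i++)
        next[i] = i;
    srand(seed);
    for (size_t i = n - 1; i > 0; i--) {
        size_t j = (size_t)rand() % i;   /* j in [0, i-1] */
        size_t tmp = next[i];
        next[i] = next[j];
        next[j] = tmp;
    }
}

/* Follow the chain for the given number of steps; each load depends
 * on the result of the previous one. Returns the final index. */
size_t chase(const size_t *next, size_t start, size_t steps)
{
    size_t p = start;
    for (size_t s = 0; s < steps; s++)
        p = next[p];
    return p;
}

/* Rough average time per dependent load in nanoseconds, measured with
 * the standard clock() CPU timer. Assumes steps > 0.
 * Returns -1.0 if the allocation fails. */
double chase_ns_per_load(size_t n, size_t steps, unsigned seed)
{
    size_t *next = malloc(n * sizeof *next);
    if (next == NULL)
        return -1.0;
    build_chain(next, n, seed);
    clock_t t0 = clock();
    volatile size_t sink = chase(next, 0, steps);  /* keep the result live */
    clock_t t1 = clock();
    (void)sink;
    free(next);
    return 1e9 * (double)(t1 - t0) / CLOCKS_PER_SEC / (double)steps;
}
```

To expose memory latency rather than cache latency, the chain would need to be much larger than the last-level cache. On a NUMA machine the measurement becomes interesting when the chain is placed on one node and chased from a CPU on another.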

 

Publications, Awards and External Funding

External Funding and Awards

ARC Linkage Grant LP0347178 "Programming Paradigms, Tools and Algorithms for the Spectral Solution of the Electronic Schroedinger Equations on Non-Uniform Memory Parallel Processors"

Publications

N. Robertson and A.P. Rendell, "OpenMP and NUMA Architectures I: Investigating Memory Placement on the SGI Origin 3000", 3rd International Conference on Computational Science, Lecture Notes in Computer Science, Springer Verlag, 2660, 648-656 (2003)