| New Machines |
APAC has recently signed a contract with Dell Computers to provide a
Linux cluster to be used as an additional computing resource for the
National Facility. Users with APAC grants will be able to use their
time on either the existing SC or the new cluster, to be known as
lc.apac.edu.au.
The compute nodes on the LC cluster will consist of 100 Pentium processors (2.66GHz) with 1 Gbyte of memory. The LC will be used to run single processor jobs and parallel jobs which do not require the high performance network of the SC. This will free up the SC to focus on tasks which require its special features. It is expected that the cluster will be available within a few weeks, by which time there will be a web-based user guide. Assistance will be provided to help users port code to the LC and investigate its suitability.
The user environment will be much the same on both systems. The batch queues will be similar and much of the applications software on the SC will also be installed on the LC. The main differences will be the compilers and the amount of memory available. Large memory users will need to continue using the SC. Project accounting will be spread across both machines, so the total time used is the sum of the time used on both machines.
In addition, an HP GS1280 - the Marvel - is expected to be installed in March. It will have 16 new generation Alpha processors with 16 Gbytes of shared memory. The GS1280 will share file systems and batch queues with the SC. It will provide a platform for jobs such as OpenMP code requiring more than 4 processors. Other jobs from the queues will also be scheduled on the GS where appropriate. |
| Scheduled Downtime - advance warning | There will be a substantial SC downtime in the first week of April to upgrade the operating system and increase hardware reliability. |
| APAC NF Review | APAC is developing a proposal to the Federal Government for the continuation of APAC beyond the end of this year, when the current service is scheduled to end. A review of the National Facility will be conducted by an independent panel over the next 5 weeks. As part of that process, an independent company will be engaged to survey users on their experiences with the services provided. In addition, all users will be invited by e-mail to make submissions to the panel. |
| MDSS News |
In the past few months there have been some upgrades to the Mass Data
Storage System. With the addition of new disks and fast, high-capacity tape technology, the capacity
of the system is now 1.2 Petabytes. The SAM-FS file system has also been
upgraded.
We apologize for recent interruptions to the service on the store. There have been some software problems with new releases and a major problem last week arising from a flat battery! |
| New PBS Commands |
To allow users to monitor their jobs on the SC
and, in particular, to make jobfs a
more user-friendly tool, a number of commands have been added to PBS
on the APAC NF systems. The command qsub has also been enhanced to make jobfs
files more secure. Summaries of the additions are as follows,
|
| APAC NF Courses |
Staff of the APAC National Facility have prepared a course on
MPI
Applications and Optimization. This is a one-day introductory course on
performance issues related to
parallelising applications particularly using message passing and the
MPI library. The course involves numerous hands-on exercises to
illustrate points.
Attendees should have already attended the APAC course, Programming with MPI, and should either be embarking on writing their own MPI code or already have MPI code and be interested in improving its performance. Required skills include familiarity with the Unix environment, a text editor (vi, emacs or an equivalent) and programming in C, C++ or Fortran.
Users who would be interested in having this course run at their institution should contact help@nf.apac.edu.au to organise a suitable time. |
| GrangeNet and Data-Grid Activities |
The National Facility is now connected to the GrangeNet network and
participating sites are using that link. Other sites
are still using the AARNet2 link.
Staff located at the National Facility are working on GrangeNet projects involving large data sets. They are currently adapting the MACHO astronomical data repository to exploit the VOTable standard, an emerging international Virtual Observatory metadata standard. They are also working with the PARADISEC group to host their multi-terabyte spoken language archive on the massdata store. Another project, with physics groups including the ACIGA gravity wave consortium and the Belle HEP project, exploits the massdata store and GrangeNet network resources.
APAC is a founding member of the Australian Grid Forum, inaugurated in Melbourne in December last year. One of the first cooperative ventures this forum will undertake will be the construction of a national Grid testbed, to which APAC will contribute computing and storage resources. |
| Job Scheduling Issues |
We are often asked by users why their job has been suspended.
The decision to suspend some jobs actually allows jobs to complete
more quickly overall, and even benefits the suspended
job by maximising the overall efficiency of the system. For some
details as to how this works, please read on.
The SC is heavily subscribed and the queueing scheduler aims to keep the machine as busy as possible by minimising the number of idle CPUs whilst, at the same time, treating all jobs as equitably as possible. This is a difficult task because of the huge variation in job parameters such as walltime, memory and number of CPUs requested.
Say a large parallel job requiring 64 processors is submitted to the queue. An approach used by some computer centres is to let processors go idle until 64 processors become available. This could take some time and could result in many processors being idle for a substantial period. Even with some refinements, this style of approach can still waste a substantial number of cycles.
Our approach is to utilise this waiting time by starting up smaller jobs as processors become available and then suspending these smaller jobs when the 64 processor job is started. Once that job finishes, the smaller jobs are able to complete. So a one processor job may start relatively soon after submission and run for a while, but then be suspended to allow a large parallel job to run. The latter may have been queued for some time before this happens. (Unfortunately it is not possible to move suspended jobs from one node of the SC to another, so a suspended job must wait until the running job finishes.)
From the user's point of view, a good rule of thumb is that, on average, any job submitted to the queue will take roughly 60% more than the requested walltime to complete, allowing for queueing, running and suspended time. This is simply a reflection of the number of jobs submitted to the system.
The decisions that the scheduler makes when deciding which jobs to suspend depend on many things, such as the resources requested by a job, the time remaining for a job to use its requested walltime, the number of jobs that the user has in the queue and so on.
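The benefit of suspension described above can be illustrated with a small calculation. The following is a minimal sketch in Python, not the scheduler's actual algorithm: the function name `drain_cost` and the example release times are hypothetical, and it assumes a toy model in which busy CPUs become free at known times and a large job starts as soon as enough of them are free. It computes how many CPU-hours would sit idle if freed processors simply waited for the large job. Under the suspend-and-resume approach, those same hours are available to smaller jobs.

```python
def drain_cost(release_times, cpus_needed):
    """Return (start_time, idle_cpu_hours) for a large job in a toy model.

    release_times: hours from now at which each busy CPU becomes free.
    cpus_needed:   number of CPUs the large parallel job requires.
    """
    if cpus_needed < 1:
        raise ValueError("cpus_needed must be at least 1")
    if cpus_needed > len(release_times):
        raise ValueError("not enough CPUs in the pool for this job")

    # The large job can start once the cpus_needed-th CPU has been released.
    earliest = sorted(release_times)[:cpus_needed]
    start_time = earliest[-1]

    # If freed CPUs are left idle, each one waits from its release until
    # start_time. With suspension, small jobs can use these hours instead.
    idle_cpu_hours = sum(start_time - t for t in earliest)
    return start_time, idle_cpu_hours


# Hypothetical example: 64 CPUs come free in batches of 8, one batch per hour.
release_times = [hour for hour in range(1, 9) for _ in range(8)]
start, idle = drain_cost(release_times, cpus_needed=64)
print(f"large job starts after {start} h; {idle} CPU-hours would otherwise idle")
```

In this made-up example the large job starts after 8 hours, and 224 CPU-hours would go unused if the freed processors were simply held idle.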
Users can get the best possible deal from the scheduler by making sure that the resources they request at qsub are as close as possible to what is actually required. Jobs that require specific resources, such as a large memory node or large amounts of jobfs, obviously have to compete for fewer nodes and so may have to wait longer to start.
To use up a large grant it is essential to keep a sufficient number of jobs queued to utilise the times when the machine is less heavily loaded, such as overnight and on weekends. In practice a project with N thousand hours in a quarter needs to keep N CPUs' worth of jobs in the queue at all times to ensure that it uses the whole grant.
Compared with queuing schedulers that use a first-in-first-out approach, our combination of job suspension and resumption gives us 25-30% better utilization of the system, and all users are ultimately winners. However, the price of this while the machine is heavily used is occasional extended suspensions of single processor jobs to allow a large parallel job to complete, and longer queuing times for parallel jobs before they can start running.
Note that with the addition of the new computers in the next few weeks, some of the pressure may be taken off the SC and users may see better turnaround of jobs. |
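The rule of thumb for using up a grant can be checked with some rough arithmetic. The sketch below is an illustration under stated assumptions, not APAC's accounting method: it assumes a quarter of 91 days, applies the 60% turnaround rule of thumb quoted in this article as if each CPU's worth of queued work accumulated CPU time at 1/1.6 of wall-clock time, and uses hypothetical function names.

```python
HOURS_PER_QUARTER = 91 * 24   # assumption: a 13-week quarter (2184 wall-clock hours)
TURNAROUND_FACTOR = 1.6       # rule of thumb: jobs take ~60% longer than requested walltime


def cpu_hours_per_quarter(cpus_queued, turnaround_factor=TURNAROUND_FACTOR):
    """Rough CPU-hours used in a quarter when cpus_queued CPUs' worth of jobs
    is kept in the queue at all times (toy estimate, not an accounting rule)."""
    return cpus_queued * HOURS_PER_QUARTER / turnaround_factor


def covers_grant(grant_thousand_hours, cpus_queued):
    """True if the estimated usage reaches a grant of N thousand CPU-hours."""
    return cpu_hours_per_quarter(cpus_queued) >= grant_thousand_hours * 1000


# Hypothetical project with a grant of 20 thousand hours for the quarter.
grant = 20
print(cpu_hours_per_quarter(grant))      # estimate with N = 20 CPUs' worth queued
print(covers_grant(grant, grant))        # N CPUs' worth of jobs
print(covers_grant(grant, grant // 2))   # only half of that
```

Under these assumptions each CPU's worth of queued work yields roughly 1365 hours per quarter, so N CPUs' worth covers N thousand hours with some margin, while half that amount falls short.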