This wiki is intended to serve as an interim user guide for raijin. Note that some changes are still being made to the system, and the information on this page will be updated as they occur. This wiki should also be read in conjunction with the vayu user guide at http://nf.nci.org.au/facilities/userguide. Please let us know if you notice inconsistencies or would like to suggest additional information that would be helpful; we will address these either in this wiki or in the final user guide.
As usual, please let us know if you experience problems with your jobs on the system, providing as much information as possible (especially jobids for batch jobs). We investigate each case that is raised.
All NCI users should be able to login to raijin now. If you get a message about a missing home directory please email firstname.lastname@example.org.
To login from your local desktop or other NCI computer do the usual
ssh -l userid raijin.nci.org.au
If you use X windows, you might like to add -Y to your command line options to pass through the DISPLAY variable.
Your ssh connection will be to one of six possible login nodes (selected by round-robin through the DNS). As usual, for security reasons we ask that you avoid setting up passwordless ssh to raijin. Entering your password every time you login is more secure; alternatively, use specialised ssh agents, which we will describe in more detail in the future. Windows users should follow the instructions given in our FAQ at http://nf.nci.org.au/facilities/faq.
For users who had accounts on vayu prior to mid-June, if you want to carry across changes to your login environment that you made on vayu you will need to copy these across from the directory from_vayu e.g.
cp from_vayu/.profile .profile
cp from_vayu/.login .login
cp from_vayu/.cshrc .cshrc
cp from_vayu/.bashrc .bashrc
All active (as of mid-June 2013) users' /home directories from vayu have been copied into a subdirectory of their /home directory on raijin called from_vayu. If you do not see the from_vayu directory and need files from it, please contact the NCI helpdesk at email@example.com.
/short directories have been set up but are not populated with files from vayu; you should see /short/$PROJECT/$USER.
Users with new accounts set up since raijin became generally available will have the correct dot files installed in their /home directories.
Choosing and switching projects
At login you will not be asked which project to use. A default project will be chosen by the login shell if one is not already set in ~/.rashrc. You can change your default project by editing .rashrc in your home directory. To switch to a different project for interactive use once you have already logged in you can use the following helpful command:
Note that this is just for interactive sessions. For PBS jobs, use the -P option to specify a project.
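As a sketch of the two mechanisms above: the default project is taken from ~/.rashrc, while batch jobs should name the project explicitly. (The setenv line below is an assumption about the .rashrc format - check your file's existing contents; z00 is the placeholder project code used elsewhere in this guide.)

```shell
# Change the default project chosen at login by editing ~/.rashrc,
# e.g. replacing or adding a line such as:
#   setenv PROJECT z00
# A one-liner to append it (only if no PROJECT line is present already):
grep -q '^setenv PROJECT' ~/.rashrc || echo 'setenv PROJECT z00' >> ~/.rashrc
```

For batch jobs this setting is only a fallback; prefer an explicit #PBS -P line in the job script.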
As well as the 6 login nodes there are 3592 compute nodes with the following configurations:
All nodes run CentOS 6.4. Note that the Linux OS requires some physical memory to be reserved for system functions, leaving the following memory available to user applications:
Memory available to user jobs:
- 31GB: r0001..r2395
- 62GB: r2396..r3520
- 126GB: r3521..r3592
All nodes have 16 CPU cores, meaning that OpenMP shared-memory jobs that were restricted to 8 CPU cores on vayu can now run on up to 16 CPU cores. The architecture of each node is 2 sockets with 8 CPU cores each. As in the past, please check that your code can scale to this greater number of cores - many codes don't.
MPI jobs that request more than 16 CPU cores will need to request full nodes, that is, a multiple of 16 in #PBS -l ncpus.
Raijin uses Intel Xeon E5-2670 CPUs on the compute nodes with the following parameters:
Standard frequency: 2.60GHz (26 x 100MHz BClk). Turbo Boost additional multipliers: 7/7/6/6/5/5/4/4 for 1 through 8 active cores per socket.
This effectively gives raijin's compute nodes an all-core frequency of 3.0GHz, rising to 3.3GHz when only 1 or 2 cores are fully utilised.
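The arithmetic behind those figures can be illustrated as follows (a sketch only, reading the multiplier list as the number of extra 100MHz steps available when 1 through 8 cores of a socket are active):

```shell
# Effective Turbo Boost frequency per number of active cores on one socket.
# Base clock: 26 x 100MHz = 2600MHz; each multiplier adds one 100MHz step.
base=2600
n=1
for m in 7 7 6 6 5 5 4 4; do
    echo "${n} active: $((base + m * 100))MHz"
    n=$((n + 1))
done
# 1-2 active cores give 3300MHz; all 8 active give 3000MHz,
# matching the 3.3GHz and 3.0GHz figures quoted above.
```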
At login users will have modules loaded for pbs, openmpi and the Intel Fortran and C compilers.
See module avail for the full list of software installed. Not all packages have been ported from vayu but please email firstname.lastname@example.org if there is something that you require urgently.
Transferring Files to raijin
If you are transferring data to raijin from off-site please scp/rsync/sftp to r-dm.nci.org.au rather than to raijin.nci.org.au. The login nodes should be used for normal interactive load rather than data transfers.
Binary Compatibility and Recompiling
We recommend that codes be recompiled on raijin for maximum performance. However, many binaries from vayu will work without recompilation.
We prefer to make available only recent versions of the Intel compilers. So far we have not ported version 11.1.046, which was the default on vayu; this may mean that some codes need to be recompiled with a more recent compiler version. The current recommendation is to use the option -xHost for maximum performance on the Intel processors. Version 188.8.131.523 has been set as the default, as some problems have been reported with the version 13 compilers. Type module avail intel-fc to see which versions are currently installed.
We also recommend that users build MPI executables with openmpi/1.6.3. MPI binaries from vayu should be rebuilt to use the most recent version of OpenMPI, but openmpi/1.4.3 will be available to support some pre-built packages. The mpirun command produces warning output which can be ignored at this stage.
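A rebuild under the recommended modules might look like the following (a sketch: the module names and versions are those given in this guide, and model.f90 is a placeholder source file):

```shell
# Load the default Intel compilers and the recommended OpenMPI, then rebuild.
module load intel-fc intel-cc     # default Intel compiler modules
module load openmpi/1.6.3         # recommended OpenMPI version
mpif90 -xHost -O2 -o model model.f90   # -xHost as recommended above
```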
We are using PBSPro for job submission and scheduling. Many batch scripts from vayu and xe should work without major changes. The known differences are as follows:
Users should use the -lmem request in their batch jobs rather than -lvmem as on vayu. You may need to experiment to find the best value for a mem request for your jobs.
#PBS -wd becomes #PBS -l wd to start the batch job in the working directory from which it was submitted.
PBSPro is not yet integrated into our software license management system. Jobs that use the Intel compiler licenses should be able to run without the usual #PBS -l software=intel-fc option as there are sufficient licenses, but please avoid submitting jobs on raijin that use other software licenses (see http://nf.nci.org.au/license-status.html). Jobs on raijin that "steal" software licenses may cause other jobs to crash due to license unavailability. If you are a heavy user of licensed packages you should delay your move to raijin for the moment. We are working to resolve this issue, but ask that you be patient.
Make sure that the $PROJECT variable is set before submitting a job, or ensure that your script includes a line such as
#PBS -P z00
qsub -v PROJECT will also mean that the batch job runs under the correct project.
The standard PBS formats #PBS -l nodes and #PBS -l select are not allowed at the moment. We are looking at relaxing this to allow that syntax; however, our intention is to have jobs allocated to full nodes rather than partial nodes. In the meantime, please use the previously allowed syntax #PBS -l ncpus.
- Batch jobs do not start as though they were a fresh login, as was the practice on vayu. This means that modules loaded in .login or .profile will not be loaded in batch jobs. You need to edit the .cshrc (for tcsh) or .bashrc (for bash) file in your /home directory to load modules automatically in batch jobs; otherwise you need to explicitly load all modules and set all environment variables needed by the batch job in the script.
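Putting the differences above together, a minimal raijin batch script might look like this (a sketch only: the project code, resource values, and program name are placeholders):

```shell
#!/bin/bash
#PBS -P z00                 # project (or rely on $PROJECT / qsub -v PROJECT)
#PBS -q normal
#PBS -l ncpus=32            # a multiple of 16 for jobs over 16 cores
#PBS -l mem=62GB            # -lmem, not -lvmem as on vayu
#PBS -l walltime=02:00:00
#PBS -l wd                  # start in the submission directory (was -wd)

# Batch jobs are not fresh logins: load modules explicitly here
# (openmpi/1.6.3 is the version recommended above).
module load openmpi/1.6.3

mpirun -np $PBS_NCPUS ./my_program
```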
You may find the command qstat useful e.g. qstat -a to list all running jobs and qstat -f jobid to show the resources being used by a running job. The command nqstat is also available and has options such as -a for all jobs, -u for a particular user and -P for a project.
The vayu NCI commands such as qps, qcat, qls, etc. are being ported and will appear in due course (qps and qcat are available now). Please contact email@example.com if you need to run longer jobs; this can be arranged.
- The current default walltime limits for the queues are as follows:
- copyq: 10 hours.
- express: 24 hours for up to 511 cores; 10 hours for 512-1023 cores; 5 hours for larger jobs.
- normal: 48 hours for 1-255 cores; 24 hours for 256-511 cores; 10 hours for 512-1024 cores; 5 hours for larger jobs.
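The table above can be summarised as a small lookup (an illustration only, not an official tool; the nf_limits command is the authoritative source for your actual limits):

```shell
# Return the walltime limit in hours for a queue and core count,
# following the limits table above.
walltime_limit() {
    queue=$1; ncpus=$2
    case $queue in
        copyq)   echo 10 ;;
        express) if   [ "$ncpus" -le 511 ];  then echo 24
                 elif [ "$ncpus" -le 1023 ]; then echo 10
                 else echo 5; fi ;;
        normal)  if   [ "$ncpus" -le 255 ];  then echo 48
                 elif [ "$ncpus" -le 511 ];  then echo 24
                 elif [ "$ncpus" -le 1024 ]; then echo 10
                 else echo 5; fi ;;
    esac
}

walltime_limit normal 256    # prints 24
```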
The command nf_limits -P <project> -n <ncpus> -q <queue> will show your current limits.
If you require exemptions to these limits please contact firstname.lastname@example.org.
- The full nf_limits functionality from vayu, showing batch job limits for memory, JOBFS scratch space, etc., will soon be available on raijin.
- The man pages for qsub and pbs_resources are not yet correct; they need to be rewritten to include the changes that have been made to standard PBSPro.
- If you need to use pbsdsh with the -N option, please use pbsdsh_anu for the moment. There is a significant difference between pbsdsh on vayu and pbsdsh (pbsdsh_anu) on raijin: on vayu the pbsdsh working directory is set to $PBS_O_WORKDIR, whereas on raijin it is set to the user's home directory. Therefore, if your script uses commands like pbsdsh -N cp file $PBS_JOBFS, it needs to be modified to pbsdsh_anu -N cp $PBS_O_WORKDIR/file $PBS_JOBFS. Furthermore, a double dash needs to be placed between the pbsdsh options and the command for pbsdsh to run, e.g.
pbsdsh -n 16 -- cp -a $PBS_O_WORKDIR/directory $PBS_JOBFS
Job scratch space, jobfs, can be requested as in the past. Currently small JOBFS requests are supplied by local disk, while larger ones are placed into larger scratch spaces on Lustre; this means that large JOBFS requests use /short. PBSPro are working on a fix so that the /short quota for the project will be increased by the JOBFS request for the duration of the job, so that you do not run out of quota on /short. This is not yet in place, so if you experience I/O errors or messages such as Disk quota exceeded, please contact us to have your /short quota increased so that you can run jobs with large JOBFS requests.
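A jobfs request and its use from a batch script might look like the following sketch (resource values, the project code, and file names are placeholders):

```shell
#!/bin/bash
#PBS -P z00
#PBS -l ncpus=16
#PBS -l mem=31GB
#PBS -l jobfs=100GB        # large requests currently come from /short
#PBS -l walltime=04:00:00
#PBS -l wd

# Stage input to the job-local scratch area, work there, copy results back.
cp input.dat $PBS_JOBFS/
cd $PBS_JOBFS
./run_model input.dat
cp results.dat $PBS_O_WORKDIR/
```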
Copyq and mdss command
The copyq queue has a maximum walltime request of 40 hours. The default is 10 hours if you do not specifically request a walltime.
Quotas and Reporting
We have moved to a new accounting and reporting system for raijin, which is not integrated with the old accounting database used on vayu. This means that commands such as quotasu, quota -v, etc. will not work. The new replacement for quotasu is called nci_account. It provides more information than the old command and allows reporting of usage funded by multiple partners, as well as giving information on grants for storage. To use it, do e.g.
nci_account -P z00 -p 2013.q2
Note that we are still modifying the format and updating the information presented. If you notice issues, please contact us.
Note that the units used for SU/CPU in the grant and queue tables may differ: we use dynamic units in the budget table (KSU, MSU) and SUs in the per-queue table.
File system quotas are currently being implemented using the raw Lustre file system quotas. Initially these will be fairly restrictive, but please contact email@example.com if you believe you have a case for your project's quota to be changed. As with our systems in the past, the /home quota is set quite small, but there is much more space available in /short/$PROJECT. The command /opt/bin/lquota will give you details of your project's quota limits and usage.
Note that as these are hard Lustre filesystem limits, if you exceed them you will not be able to write files and will receive the error Disk quota exceeded. We will be introducing soft quotas (which will stop the running of PBS jobs) in the near future.
The /g/data filesystem is not yet mounted on raijin and will not be until the hardware is physically shifted to the new machine room. This is expected to be completed by late August.
Using mpirun in batch jobs
Our PBSPro does not currently support cpusets, so it is possible for two small (i.e. fewer than 16 cpu) OpenMPI jobs to be scheduled to run on the same cpus. Experience suggests that using
mpirun -np $PBS_NCPUS -bind-to-none run.exe
will avoid this problem. We will be investigating this more and may modify the mpirun wrapper to automate this process but expect that future releases of PBSPro will handle cpusets.
IPM profiling is now available for openmpi/1.6.x. Add module load ipm to your batch script to generate a profile; ipm_view is also now available.
- If a node fails while running a job, the job may disappear from the queue. There should be an explanation in the job's .o output file. If you notice a job disappear without any explanation, please let us know the jobids.
- We are still making some changes to the Lustre filesystem setup as the performance of reads was slightly lower than expected. This temporary arrangement will be addressed in the next couple of weeks when some additional hardware is installed.
- Some users have reported that IPM produces incorrect % communication figures for MPI jobs using more than about 720 cpus. It is correct up to 540 cpus but may be incorrect for larger jobs. We are investigating a fix; in the meantime, you can extract the information from the final tables produced by ipm_view.