Useful links and batch job examples
Batch job examples for the latest versions of commonly used software on various clusters. See the MIGRATION table at the end; note that Cedar and Plato have the same setup except for the allocation. The name has changed from computecanada to alliancecan, so please use the following for support:
Links to Compute Canada login: https://ccdb.computecanada.ca/security/login
The main storage for the acl-jerzy group is on Cedar; you can access it by issuing, from your home directory:
cd /project/rrg-jerzy-ab/jerzygroup
(It is recommended to create a symbolic link:
ln -s /project/rrg-jerzy-ab/jerzygroup jerzygroup
or, if the above does not work:
ln -s /project/6004094/jerzygroup/ jerzygroup )
Information on permissions is in the file /jerzygroup/backup_records/permisionsetup and under ACL at: https://docs.alliancecan.ca/wiki/Sharing_data
Use the quota command to check the used space and number of files.
Do not store the wfc files produced by QE. Either delete them (rm *.wfc*) or exclude them when backing up:
rsync -rv --exclude '*.wfc*' * szpunarb@cedar.computecanada.ca:/home/szpunarb/dirname
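To preview what such a backup would transfer, rsync can do a dry run first (a quick sketch; the destination directory is a placeholder as above):
rsync -rvn --exclude '*.wfc*' * szpunarb@cedar.computecanada.ca:/home/szpunarb/dirname
# -n (--dry-run) only lists the files that would be copied, without transferring anything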
Directories that are not used much can be archived, but please add a README file describing the content. To "zip" a directory, the correct command is:
tar -zcvf directory_name.tar.gz directory_name/
This tells tar to compress with gzip (z), create an archive (c), be verbose (v), and write it to the file (f) directory_name.tar.gz.
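For reference, a quick sketch (using the same placeholder name) of how to inspect or restore such an archive later:
tar -tzvf directory_name.tar.gz    # list the contents without extracting
tar -xzvf directory_name.tar.gz    # extract the archive again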
The RAC allocation is limited, and the priority of the default allocation may become higher than that of the RAC. Therefore, if your jobs do not start on the currently used allocation, check your usage level for all accessible allocations and compare them with the commands below (here the default and RAC allocations are compared):
sshare -l -A def-szpunarb_cpu -a --format=Account,User,EffectvUsage,LevelFS
Account              User  EffectvUsage  LevelFS
def-szpunarb_cpu           0.000239      4.174797
sshare -l -A rrg-szpunarb-ad_cpu -a --format=Account,User,EffectvUsage,LevelFS
Account              User  EffectvUsage  LevelFS
rrg-szpunarb-ad_cpu        0.000063      1.508949
As you can see, the LevelFS for the default allocation is 4.174797, while it is only 1.508949 for the rrg-* accounting group because of the heavier usage of rrg-*.
See more (per Ali) at:
https://docs.alliancecan.ca/wiki/Frequently_Asked_Questions#Why_are_my_jobs_taking_so_long_to_start.3F
Resources:
Compute Canada / Alliance documentation: https://docs.alliancecan.ca/wiki/Technical_documentation
Good training: https://computecanada.github.io/2019-01-21-sfu/
WestGrid site, including training materials: https://westgrid.ca
Getting-started videos: http://bit.ly/2sxGO33
Compute Canada YouTube channel: http://bit.ly/2ws0JDC
Software | Cedar | Graham | Beluga (40 cores/node) / Narval (64 cores/node)
QE | | |
QE run array (with %1 only) | | sbatch --array=1-108%1 jobname |
QE (pw, ph), verified 2020 | jobcomplete_cedar_intel_QE6.2.2; jobcomplete_cedar_gcc_QE6.2.2 (not working); for primitive unit cells | |
EPW | | |
ShengBTE | Beluga_BTE complete_current.job (tested with QE6.2.2) | |
almaBTE | | |
LAMMPS* | | |
GULP | | |
WIEN2K | | |
Phono3py / Phonopy (phonons) | | |
VASP (licensed; Jaya) | | |
Priority/share | [jaya@cedar5 ~]$ sshare -U | |
Notes: | | |
*Note that this web page is managed on a PC running Windows; therefore, if a batch_job example downloaded to a Unix system is in DOS format, reformat it to Unix format, for example with the command: dos2unix batch_job
*The name of the executable may differ from one version to another.
If you take a whole node, you can request all of its memory by using --mem=0 (see the examples prepared by Jaya for Graham).
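As a minimal sketch of a whole-node request (the walltime is illustrative; def-szpunarb and the 48-core node size match the Cedar examples elsewhere on this page, so adjust them to your own allocation and cluster):
#!/bin/bash
#SBATCH --account=def-szpunarb     # your allocation
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=48       # a full Cedar node has 48 cores
#SBATCH --mem=0                    # request all of the node's memory
#SBATCH --time=24:00:00            # illustrative walltime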
# To check the current QE versions, do:
module -r spider 'quantumespresso*'
Versions (expanded info on the current one below):
quantumespresso/6.1
quantumespresso/6.2
quantumespresso/6.3
quantumespresso/6.4
module spider quantumespresso/6.4.1
You will need to load one of the following module combinations before "quantumespresso/6.4.1" is available to load:
nixpkgs/16.09  gcc/7.3.0    openmpi/3.1.2   (example in the table)
nixpkgs/16.09  gcc/8.3.0    openmpi/4.0.1
nixpkgs/16.09  intel/2018.3 openmpi/3.1.2
nixpkgs/16.09  intel/2019.3 openmpi/4.0.1
To use the GCC build (the one for which EPW works OK), do: module load gcc/5.4.0 (although the more recent Intel build on Beluga also works; see the notes above), then module load quantumespresso/6.3 (version 6.4.1 also lists resistivity).
To use the more common Intel build: module load quantumespresso/6.3. Version 6.4 works OK on Beluga with the new Intel build on one node.
For detailed information about a specific "quantumespresso" module (including how to load the modules) use the module's full name.
For example:
$ module spider quantumespresso/6.4.1
Then use:
$ module load quantumespresso/6.4.1
to set up the runtime environment.
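As a sketch of a complete QE batch job (the account, node size, walltime, and file names are placeholders; the module combination follows the gcc/7.3.0 + openmpi/3.1.2 line listed above):
#!/bin/bash
#SBATCH --account=def-szpunarb     # your allocation
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=40       # e.g. one Beluga node (40 cores)
#SBATCH --mem=0                    # whole-node memory
#SBATCH --time=12:00:00
module load gcc/7.3.0 openmpi/3.1.2
module load quantumespresso/6.4.1
srun pw.x < input.in > output.out  # input.in/output.out are placeholder file names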
Example of finding the path of an executable:
which thirdorder_espresso.py
/cvmfs/soft.computecanada.ca/easybuild/software/2017/avx/MPI/intel2016.4/openmpi2.1/shengbte/1.1.1/bin/thirdorder_espresso.py
Fix for an error in diagonalization (Olivier):
1) Davidson (the default) uses serial diagonalization, as checked on Beluga with this change in the batch job:
srun pw.x -ndiag 1 < DISP.un_sc.in > DISP.un_sc.in.out
2) Switch to the conjugate-gradient method and play with the mixing (in the ELECTRONS section of the input file):
diagonalization = 'cg'
mixing_ndim = 8 (the default; it could be lowered, e.g. to 4)
The most recent LAMMPS (August 2019) is installed on Cedar, Graham, and Beluga.
Unfortunately, the LATTE package did not work; I will have to find a way to install it separately.
To load the module, use:
module load nixpkgs/16.09 intel/2018.3 openmpi/3.1.2 lammps-omp/20190807
As for the scripts, the same ones from the documentation should work:
https://docs.computecanada.ca/wiki/LAMMPS
Note that the name of the executable is lmp_icc_openmpi.
The following packages are included in this version:
$ cat ${EBROOTLAMMPS}/list-packages.txt | grep -a "Installed YES:"
Installed YES: package ASPHERE
Installed YES: package BODY
...
The following packages are not supported in this version:
$ cat ${EBROOTLAMMPS}/list-packages.txt | grep -a "Installed NO:"
Installed NO: package GPU
Installed NO: package KOKKOS
...
To figure out the name of the executable for LAMMPS, do the following (example for lammps-omp/20170331):
module load lammps-omp/20170811
ls ${EBROOTLAMMPS}/bin/
lmp lmp_icc_openmpi
From this output, the executable is: lmp_icc_openmpi and lmp is a symbolic link to the executable. We have done the same for each version installed: there is a symbolic link to each executable and it is called lmp. It means that no matter which module you pick, lmp will work as executable for that module.
For detailed information about a specific "lammps" module (including how to load the modules) use the module's full name. For example: $ module spider lammps/20170331
Check available modules:
module avail
To find other possible module matches execute:
$ module -r spider '.*lammps.*'
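As a sketch of a LAMMPS batch job on these clusters (the account, core count, memory, walltime, and input file name are placeholders; the module line is the one given above, and lmp is the symbolic link that works for any installed version):
#!/bin/bash
#SBATCH --account=def-szpunarb    # your allocation
#SBATCH --ntasks=32               # illustrative core count
#SBATCH --mem-per-cpu=2G          # illustrative memory request
#SBATCH --time=24:00:00
module load nixpkgs/16.09 intel/2018.3 openmpi/3.1.2 lammps-omp/20190807
srun lmp -in in.lammps > log.lammps   # in.lammps is a placeholder input file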
See also: SLURM directives on Plato and Cedar.
The details on the performance of a running job can be found via the portal after logging in, e.g. for Narval:
https://portail.narval.calculquebec.ca/
Info on a running job:
scontrol show jobid 13448796
scontrol update jobid=446153 timelimit=10-00:00:00
To cancel a job:
scancel 13448796
The details about the efficiency of a completed job can be found via seff:
seff 40220208
Job ID: 40220208
Cluster: cedar
User/Group: szpunarb/szpunarb
State: TIMEOUT (exit code 0)
Nodes: 6
Cores per node: 48
CPU Utilized: 186-22:22:53
CPU Efficiency: 16.23% of 1152-01:12:00 core-walltime
Job Wall-clock time: 4-00:00:15
Memory Utilized: 5.54 GB
Memory Efficiency: 0.49% of 1.10 TB
A less useful command for info on a job:
sacct -j 39677871
The output includes the fields Account, User, JobID, Start, End, AllocCPUS, Elapsed, AllocTRES, CPUTime, AveRSS, MaxRSS, MaxRSSTask, MaxRSSNode, NodeList, ExitCode and State; for this job:
Account: rrg-szpun+   User: szpunarb   JobID: 39677871
Start: 2022-07-18T04:15:16   End: 2022-07-20T04:15:24
AllocCPUS: 48   Elapsed: 2-00:00:08   AllocTRES: billing=48,cpu=48,mem=187.50G+
CPUTime: 96-00:06:24   NodeList: cdr2183   ExitCode: 0:0   State: TIMEOUT
https://www.rc.fas.harvard.edu/resources/documentation/convenient-slurm-commands/
https://researchcomputing.princeton.edu/support/knowledge-base/memory
https://westgrid.github.io/manitobaSummerSchool2018/4-materials.html
https://www.westgrid.ca/support/training
Info on new servers:
https://docs.computecanada.ca/wiki/Available_software
https://docs.computecanada.ca/wiki/Project_layout
https://docs.computecanada.ca/wiki/Sharing_data
https://docs.computecanada.ca/wiki/Utiliser_des_modules/en
Note: Ali's presentation and note about LAMMPS:
In case you are interested in how the program scales: I gave a presentation this week (an online webinar) with a few slides about how the time spent computing interactions between the particles (in LAMMPS) scales with the number of processors and the number of particles. For a given system, the more processors you add, the more the communication between the processors increases, which kills the performance of the program. I used one type of potential, but the idea is the same, since for almost all potentials the program spends more than 80% of the time computing the pair interactions. Even for a system with 1,000,000 particles, the efficiency drops beyond 16 or 32 cores. The results may differ a little if the shape of the simulation box is different.
You can see the slides here:
https://www.westgrid.ca/events/introduction_classical_molecular_dynamics_simulations
Resources on local servers:
Globus and other: https://www.usask.ca/ict/services/research-technologies/advanced-computing/plato/running-jobs.php
WestGrid link: https://www.westgrid.ca/support/quickstart
Calcul Québec: https://wiki.calculquebec.ca/w/Accueil?setlang=en
https://wiki.calculquebec.ca/w/Ex%C3%A9cuter_une_t%C3%A2che/en
Plato:
https://wiki.usask.ca/display/ARC/Plato+technical+specifications
https://wiki.usask.ca/display/ARC/Quantum+ESPRESSO+on+Plato
https://wiki.usask.ca/display/ARC/LAMMPS+on+Plato
To search:
https://wiki.usask.ca/display/ARC/Advanced+Research+Computing
Note:
avx1 nodes have 16 processors, with 310000M of memory per node available for use
avx2 nodes have 40 processors, with 190000M of memory per node available for use
See: https://wiki.usask.ca/pages/viewpage.action?pageId=1620607096
Memory limit on Plato: the physical memory on each node is 16 x 1.94 GB = 31.04 GB.
Increase the available memory by using more nodes.
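A hedged sketch of spreading a job over two avx2 nodes to get more total memory (the core and per-node memory figures follow the note above; adjust to the nodes you actually use):
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=40    # avx2 nodes have 40 cores each
#SBATCH --mem=190000M           # per-node memory request, so two nodes give 2 x 190000M in total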
More info on Plato, according to Juan:
Time limits (and priorities) are as follows:
long: 504 hours (lowest priority)
short: 84 hours (normal priority)
rush: 14 hours (highest priority)
Jobs are simply sorted according to walltime: if you request less than 14 hours, the job goes to rush and gets a boost in priority.
The 'R' and 'S' prefixes stand for Researchers and Students. Students' jobs also have higher priority; the time limits are the same, but Srush has higher priority than Rrush.
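For example (a sketch based on the limits above), requesting just under 14 hours puts the job into the rush queue:
#SBATCH --time=13:59:00    # under 14 h, so the job is sorted into rush (highest priority)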
Note: QE from Compute Canada is only fully compatible with the avx2 nodes.
Batch jobs*: Note that users who run VASP on Plato have a separate access account to all software and are required to add the following as the second line (see the VASP batch job): #SBATCH --account=hpc_p_szpunar
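A minimal sketch of such a header (only the account line comes from the note above; the remaining directives are illustrative):
#!/bin/bash
#SBATCH --account=hpc_p_szpunar    # required second line for VASP users on Plato
#SBATCH --ntasks=40                # illustrative resource request
#SBATCH --time=12:00:00            # illustrative walltime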
Software | Plato | Guillimin | Grex | Bugaboo | Jasper
Third on avx2 | | | | |
QE | | | | |
QE (pw, ph) | | | | |
EPW | | | | |
BTE | | | | |
almaBTE | | | | |
LAMMPS | | | | |
GULP | | | | |
VASP | | | | |
WIEN2K | | | | |
STATUS | qstat -u UN | showq -u UN | showq -u UN | showq -u UN | showq -u UN
*Note that this web page is managed on a PC running Windows; therefore, if a batch_job example downloaded to a Unix system is in DOS format, reformat it to Unix format, for example with the command: dos2unix batch_job
# versions of QE on plato:
Versions:
quantumespresso/6.0
quantumespresso/6.1
quantumespresso/6.2.2
quantumespresso/6.3
quantumespresso/6.4.1
[...]
$ module spider quantumespresso/6.4.1
[...]
You will need to load all module(s) on any one of the lines below before the "quantumespresso/6.4.1" module is available to load.
nixpkgs/16.09  gcc/7.3.0     openmpi/3.1.2
nixpkgs/16.09  intel/2019.3  openmpi/4.0.1
[...]
$ module load gcc/7.3.0
$ module load openmpi/3.1.2
$ module load quantumespresso/6.4.1
MIGRATION
UNIVERSITY          | CONSORTIA       | CLUSTER NAME | DEFUNDING
U. Guelph           | Compute Ontario | Mako         | 7/31/2017
U. Waterloo         | Compute Ontario | Saw          | 7/31/2017
U. Toronto          | Compute Ontario | TCS          | 9/27/2017
U. Alberta          | WestGrid        | Jasper       | 9/30/2017
U. Alberta          | WestGrid        | Hungabee     | 9/30/2017
U. Toronto          | Compute Ontario | GPC          | 12/31/2017
Dalhousie U.        | ACENET          | Glooscap     | 4/18/2018
Memorial U.         | ACENET          | Placentia    | 4/18/2018
St. Mary's U.       | ACENET          | Mahone       | 4/18/2018
U. New Brunswick    | ACENET          | Fundy        | 4/18/2018
McMaster U.         | Compute Ontario | Requin       | 4/18/2018
Queen's U.          | Compute Ontario | CAC          | 4/18/2018
U. Guelph           | Compute Ontario | Global_b     | 4/18/2018
U. Guelph           | Compute Ontario | Redfin       | 4/18/2018
U. Waterloo         | Compute Ontario | Orca         | 4/18/2018
Western U.          | Compute Ontario | Monk         | 4/18/2018
Western U.          | Compute Ontario | Global_a     | 4/18/2018
Western U.          | Compute Ontario | Global_c     | 4/18/2018
Western U.          | Compute Ontario | Kraken       | 4/18/2018
Simon Fraser U.     | WestGrid        | Bugaboo      | 4/18/2018
U. British Columbia | WestGrid        | Orcinus      | 4/18/2018
U. Calgary          | WestGrid        | Parallel     | 4/18/2018
U. Manitoba         | WestGrid        | Grex         | 4/18/2018
Concordia U.        | Calcul Quebec   | Psi          | 12/31/2018
McGill U.           | Calcul Quebec   | Guillimin    | 12/31/2018
U. Laval            | Calcul Quebec   | Colosse      | 12/31/2018
U. Montreal         | Calcul Quebec   | Cottos       | 12/31/2018
U. Montreal         | Calcul Quebec   | Briarée      | 12/31/2018
U. Montreal         | Calcul Quebec   | Hadès        | 12/31/2018
U. Sherbrooke       | Calcul Quebec   | MS2          | 12/31/2018
U. Sherbrooke       | Calcul Quebec   | MP2          | 12/31/2018