Introduction

User documentation for HPC resources at the University of Manitoba

Since you have found this website, you are probably interested in Grex documentation. Grex is the University of Manitoba's High-Performance Computing (HPC) system.

For experienced Grex users

Grex received a major update in the ninth year of its lifetime. The update strongly affects the ways users interact with the system: the OS was updated to CentOS 7, the resource management software changed from Torque/Moab to SLURM, and the communication libraries switched from MLNX OFED to RDMA-Core and UCX.

Thus, if you are experienced with the previous “version” of Grex, you might benefit from reading this document: Description of Grex changes.

The old Westgrid documentation, hosted on the Westgrid website, became irrelevant after the Grex upgrade, so please visit Grex's New Documentation.
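
As a rough orientation for the switch from Torque/Moab to SLURM, the sketch below maps a few everyday Torque commands to their usual SLURM counterparts. The function name slurm_equivalent is made up for this illustration, and the mapping reflects general SLURM usage rather than anything specific to Grex; the description of Grex changes remains the place to check the details.

```shell
#!/bin/bash
# Illustrative only: slurm_equivalent is a hypothetical helper, not a Grex tool.
# It prints the usual SLURM counterpart of a common Torque command.
slurm_equivalent() {
    case "$1" in
        qsub)  echo "sbatch"  ;;   # submit a batch job
        qstat) echo "squeue"  ;;   # list queued and running jobs
        qdel)  echo "scancel" ;;   # cancel a job
        *)     echo "unknown" ; return 1 ;;
    esac
}

# Print a small translation table.
for cmd in qsub qstat qdel; do
    printf '%-6s -> %s\n' "$cmd" "$(slurm_equivalent "$cmd")"
done
```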

For new Grex users

If you are a new Grex user, proceed to the quick start guide.

A Very Quick Start guide

  1. Create an account on CCDB. You will need an institutional email address. If you are a sponsored user, ask your PI for their CCRI code.
  2. After the CCDB account is approved, log in to CCDB and apply for a Westgrid Consortium account. Follow the directions on portal.westgrid.ca to create a Grex account.
  3. Wait for half a day. Install an SSH client and an SFTP client for your operating system.
  4. Connect to grex.westgrid.ca with SSH using the username and password from step 2.
  5. Make a sample job script and call it sleep.job. The job script is a text file with a special syntax that SLURM recognizes. You can use the nano editor, or any other editor (vim, emacs, pico, …), right at the Grex SSH prompt; you can also create the script file on your own machine and upload it to Grex using your SFTP client.
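       #!/bin/bash
       # Request a single task with one CPU core
       #SBATCH --ntasks=1 --cpus-per-task=1
       # Short wall-time limit and 100 MB of memory per CPU core
       #SBATCH --time=00:01 --mem-per-cpu=100mb
       echo "Hello world! will sleep for 10 seconds"
       time sleep 10
       echo "all done"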
      
  6. Submit the script to the compute partition using the sbatch command:

sbatch --partition=compute sleep.job

  7. Wait until the job finishes; you can monitor the queue's state with the ‘sq’ command. When the job finishes, the file slurm-NNNN.out should be in the job directory.
  8. Download the output file slurm-NNNN.out from grex.westgrid.ca to your local machine using your SFTP client.
  9. Congratulations, you have just run your first HPC-style batch job. This is the general workflow, more or less; you would just replace the sleep command with something useful, like your-code.x your-input.dat.
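
The job-script part of this workflow can also be done without an editor. The sketch below writes the same sleep.job with a here-document, checks it for shell syntax errors, and shows how the output file name follows from the job number. The helper job_output_file is hypothetical (it is not a SLURM or Grex command), and it assumes sbatch prints its usual "Submitted batch job NNNN" message; the commands that need SLURM are left as comments so that the rest runs on any machine with bash.

```shell
#!/bin/bash
# Write the job script from step 5 with a here-document instead of an editor.
cat > sleep.job <<'EOF'
#!/bin/bash
#SBATCH --ntasks=1 --cpus-per-task=1
#SBATCH --time=00:01 --mem-per-cpu=100mb
echo "Hello world! will sleep for 10 seconds"
time sleep 10
echo "all done"
EOF

# Check the script for shell syntax errors without running it.
bash -n sleep.job && echo "sleep.job: syntax OK"

# Hypothetical helper: turn the message "Submitted batch job NNNN"
# into the default output file name slurm-NNNN.out.
job_output_file() {
    local job_id="${1##* }"   # keep only the text after the last space
    echo "slurm-${job_id}.out"
}

# On Grex itself (requires SLURM, so commented out here):
#   msg=$(sbatch --partition=compute sleep.job)
#   sq                                 # monitor the queue
#   cat "$(job_output_file "$msg")"    # read the output once the job is done
```

Keeping the submission message in a variable avoids having to copy the job number by hand when you look for the output file later.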