A typical HPC cluster layout
The architecture of a cluster is fairly straightforward. The master node has one Ethernet port connected to the outside world and one Ethernet port connected to the management network of the cluster. Normally a fast, low-latency network (InfiniBand or Omni-Path) is available for parallel calculations, along with one or more storage nodes (in smaller setups, the storage can also be integrated into the master node). Compute nodes are connected to the master node via the management network and via the fast network. A third network for out-of-band (OoB) management is also commonly used.
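As a rough illustration, the layout described above can be written down as a small data model. This is only a sketch: the node names, the network labels and the assumptions that storage nodes sit on both internal networks and that every node is attached to the OoB network are ours, not fixed properties of every cluster.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    role: str                                   # "master", "storage" or "compute"
    networks: set = field(default_factory=set)  # network labels (hypothetical names)


def build_cluster(num_compute, num_storage=1, oob=True):
    """Return the nodes of a small cluster with their network attachments."""
    # The master node bridges the outside world and the internal networks.
    nodes = [Node("master", "master", {"external", "management", "fast"})]
    # Assumption: storage nodes are attached to both internal networks.
    for i in range(1, num_storage + 1):
        nodes.append(Node(f"storage{i:02d}", "storage", {"management", "fast"}))
    # Compute nodes reach the master via the management and the fast network.
    for i in range(1, num_compute + 1):
        nodes.append(Node(f"node{i:02d}", "compute", {"management", "fast"}))
    if oob:
        # Assumption: every node is also reachable on the out-of-band network.
        for node in nodes:
            node.networks.add("oob")
    return nodes


def shared_networks(a, b):
    """Networks over which two nodes can reach each other directly."""
    return sorted(a.networks & b.networks)


cluster = build_cluster(num_compute=4)
master = cluster[0]
compute = [n for n in cluster if n.role == "compute"]
print(shared_networks(master, compute[0]))
```

Only the master node carries the "external" label here, which mirrors the idea that compute nodes are not directly exposed to the outside world.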
Custom cluster installation
Together with the customer, we draw up a profile of the desired and suitable hardware and create an optimal cluster configuration. Where one type of simulation software runs best with fewer cores and a high clock frequency, other simulation software scales very well and is more efficient with many cores per node. Memory usage also varies per program and per calculation model. Sometimes scaling benefits more from AMD EPYC™ CPUs than from Intel® Xeon® CPUs, and vice versa.
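The trade-off between a few fast cores and many slower cores can be illustrated with Amdahl's law. The sketch below is a simplified model, not a sizing tool: the two node types and their clock speeds are hypothetical, and it assumes that performance scales linearly with clock frequency while ignoring memory bandwidth, cache and interconnect effects.

```python
def amdahl_speedup(parallel_fraction, cores):
    """Amdahl's law: speedup over one core for a given parallel fraction."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


def relative_throughput(parallel_fraction, cores, clock_ghz):
    """Crude figure of merit: clock frequency times Amdahl speedup.

    Assumption: single-core performance is proportional to clock frequency.
    """
    return clock_ghz * amdahl_speedup(parallel_fraction, cores)


# Two hypothetical node types (illustrative numbers only).
few_fast_cores = {"cores": 16, "clock_ghz": 3.6}
many_cores = {"cores": 64, "clock_ghz": 2.4}

for p in (0.50, 0.99):
    a = relative_throughput(p, **few_fast_cores)
    b = relative_throughput(p, **many_cores)
    winner = "few fast cores" if a > b else "many cores"
    print(f"parallel fraction {p:.2f}: {a:.1f} vs {b:.1f} -> {winner}")
```

In this model, software that is only half parallel favours the node with fewer, faster cores, while software that is 99% parallel favours the node with many cores.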
Other issues that need to be taken into account are:
- The amount of storage
- The degree of redundancy of head node, storage nodes and/or login nodes.
You can even choose to bundle the head node, storage node and login node into one machine. In that case, the redundancy needs to come from the hardware components, such as drives, CPUs and power supplies.
A turnkey cluster is the easiest option: you simply order a cluster with Linux preinstalled, ready to operate. Beforehand we make some choices together, such as which Linux distribution will be used (e.g. CentOS, Debian/Ubuntu), how new nodes will be deployed, and what other cluster components need to be installed to make operation as convenient as possible.
The Linux distribution is often determined by the compatibility of the calculation software that is used. If conflicts occur, we can sort them out and create a workable solution.
Cluster Access and Data Exchange
Users want access to the cluster to be as easy as possible. Depending on their Linux knowledge (especially bash shell skills), users can choose pure SSH access or a more graphical frontend. The exchange of data between the office and the HPC cluster can be handled in different ways, for example with SFTP, Samba or rsync.
Data: Most users operate the cluster from a Windows workstation. These users want their models to run on the cluster and to receive the calculated results back on their workstation or network share. During installation, we help customers connect their office network to the compute cluster.
Interact: The connection to the HPC cluster can be established via a terminal application such as PuTTY (command line) or via a remote desktop application, which gives the user a complete graphical Linux desktop. The latter can be useful when the simulation software is only available for Linux and has a GUI, and also for users who are less experienced with the Linux command line.
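As an example of the data exchange described above, the following sketch builds rsync-over-SSH commands for uploading models and fetching results. The user name, host name and directory names are hypothetical placeholders, and it assumes that rsync and an SSH client are installed on the machine that runs it (on Windows, for instance, inside WSL).

```python
import subprocess


def remote(user, host, path):
    """Format a remote rsync location such as alice@host:path."""
    return f"{user}@{host}:{path}"


def rsync_command(source, destination, dry_run=False):
    """Build an rsync command that copies source to destination over SSH."""
    # -a: archive mode, -z: compress in transit, --partial: keep partial files
    cmd = ["rsync", "-az", "--partial", "-e", "ssh"]
    if dry_run:
        cmd.append("--dry-run")  # show what would be copied without copying
    return cmd + [source, destination]


def transfer(cmd):
    """Run the command; raises CalledProcessError if rsync fails."""
    subprocess.run(cmd, check=True)


# Hypothetical user, host and directories.
USER, HOST = "alice", "hpc-master.example.com"

upload = rsync_command("models/", remote(USER, HOST, "jobs/models/"))
download = rsync_command(remote(USER, HOST, "jobs/results/"), "results/")

print(" ".join(upload))
print(" ".join(download))
```

Calling `transfer(upload)` would start the actual copy. The sketch only prints the commands, so they can be checked first.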
Batch job scheduler
A job scheduler is useful if there is more than one user, if there are many jobs to run one after another (a job may only run once the previous one has finished), or if jobs or users have different priorities.
Schedulers that can be installed are Torque/Maui and Slurm (open source) or LSF (commercial). Of these schedulers, LSF has the best Windows support for submitting jobs, and it also supports Windows compute nodes.
Slurm is most commonly used in large supercomputing centers, and Torque/Maui is a simple alternative. Which scheduler is best depends on user experience and software compatibility. Any software with a command-line interface can be run through a scheduler. Only when the software relies on scheduler-specific plugins does the plugin support determine which scheduler suits the application best.
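The two ideas mentioned here, job dependencies and priorities, can be sketched with a toy scheduler. This is a deliberately simplified model that runs one job at a time; it is not how any of the real schedulers are implemented, and the job names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Job:
    name: str
    priority: int = 0            # higher priority runs first
    after: Optional[str] = None  # name of the job that must finish first


def run_order(jobs):
    """Return the order in which a one-job-at-a-time scheduler runs the jobs."""
    pending = list(jobs)
    finished = set()
    order = []
    while pending:
        # A job is ready when it has no dependency or its dependency is done.
        ready = [j for j in pending if j.after is None or j.after in finished]
        if not ready:
            raise ValueError("remaining jobs wait on a job that never finishes")
        # Highest priority wins; on a tie, max() keeps submission order.
        next_job = max(ready, key=lambda j: j.priority)
        pending.remove(next_job)
        finished.add(next_job.name)
        order.append(next_job.name)
    return order


queue = [
    Job("mesh"),
    Job("solve", after="mesh"),
    Job("urgent", priority=10),
    Job("post", after="solve"),
]
print(run_order(queue))
```

The high-priority job jumps ahead of the queue, while "solve" and "post" still wait for the jobs they depend on.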
Torque (Terascale Open Source Resource and Queue Manager)
Torque, the open source resource manager, manages hardware resources and controls batch jobs across distributed compute nodes. Batch operation ensures efficient use of existing cluster hardware. Leading HPC organizations around the world have actively promoted the development of the PBS-based community project, and their contributions have increased scalability and fault tolerance. The individual resources (compute cores, memory, etc.) of the various cluster nodes are mapped to a configuration file that, in combination with a scheduler, allows jobs to be distributed efficiently within the cluster. Torque is often combined with the non-commercial cluster scheduler Maui (or the commercial variant Moab), which improves the administration, scheduling and overall utilization of the cluster.
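To give an idea of how node resources are mapped in a configuration file, the sketch below parses a small text in the style of a Torque nodes file, where each line names a node followed by attributes such as `np=16` (the number of cores). The node names and values are hypothetical, the handling of `#` comment lines and the default of one core per node are assumptions of this sketch, and it covers only a small part of what a real file can contain.

```python
# Hypothetical example entries, one node per line.
NODES_FILE = """\
# name   attributes
node01 np=16
node02 np=16 bigmem
node03 np=64 gpus=2
"""


def parse_nodes(text):
    """Parse node lines into {name: {"np": int, "properties": [...], ...}}."""
    nodes = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments (assumed syntax)
        name, *tokens = line.split()
        attrs = {"np": 1, "properties": []}  # assumption: one core by default
        for token in tokens:
            key, sep, value = token.partition("=")
            if sep and value.isdigit():
                attrs[key] = int(value)           # numeric attribute, e.g. np=16
            elif sep:
                attrs[key] = value                # other key=value attribute
            else:
                attrs["properties"].append(token)  # free-form label, e.g. bigmem
        nodes[name] = attrs
    return nodes


nodes = parse_nodes(NODES_FILE)
total_cores = sum(attrs["np"] for attrs in nodes.values())
print(f"{len(nodes)} nodes, {total_cores} cores in total")
```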
Maui Cluster Scheduler
Maui is an open-source job scheduler and workload manager for HPC clusters and supercomputers, optimized for key scheduling strategies. For example, dynamic priorities, reservations and policies based on the “fair share” principle can be implemented. The Maui Cluster Scheduler is currently in use in public, academic and commercial environments worldwide. Both Torque and Maui are open source and thus freely available. The license to use these programs is free, as are all software updates. We are happy to assist you with advice and other services when upgrading your software to the latest version.
SLURM (Simple Linux Utility for Resource Management)
SLURM is an open source, fault-tolerant, and highly scalable cluster management and job scheduling system for large and small Linux clusters. Slurm requires no kernel modifications for its operation and is relatively self-contained. As a cluster workload manager, Slurm has three key functions. First, it allocates exclusive and/or non-exclusive access to resources (compute nodes) to users for some duration of time so they can perform work. Second, it provides a framework for starting, executing, and monitoring work (normally a parallel job) on the set of allocated nodes. Finally, it arbitrates contention for resources by managing a queue of pending work.
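In practice, work is handed to Slurm as a batch script with `#SBATCH` directives, submitted with `sbatch`. The sketch below generates such a script and the matching submit command. The job name, resource numbers and the solver command are hypothetical, and a real cluster will usually need site-specific options (partition, account, modules) that are left out here.

```python
def render_batch_script(job_name, nodes, tasks_per_node, time_limit, command):
    """Return the text of a minimal Slurm batch script."""
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",
        f"#SBATCH --nodes={nodes}",                    # number of compute nodes
        f"#SBATCH --ntasks-per-node={tasks_per_node}",  # processes per node
        f"#SBATCH --time={time_limit}",                 # wall-clock limit
        "",
        f"srun {command}",                              # launch the parallel job
    ]
    return "\n".join(lines) + "\n"


def sbatch_command(script_path, after_ok=None):
    """Build the submit command; after_ok is the ID of a job to wait for."""
    cmd = ["sbatch", "--parsable"]  # --parsable prints just the job ID
    if after_ok is not None:
        # Start only once the given job has finished successfully.
        cmd.append(f"--dependency=afterok:{after_ok}")
    return cmd + [script_path]


# Hypothetical job: 2 nodes with 32 tasks each, limited to 4 hours.
script = render_batch_script("demo", 2, 32, "04:00:00", "./my_solver input.dat")
print(script)
print(" ".join(sbatch_command("demo.sh")))
print(" ".join(sbatch_command("post.sh", after_ok=12345)))
```

The second submit command shows how a follow-up job can be queued so that it starts only after the first has completed successfully.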
Platform Load Sharing Facility (LSF)
Platform LSF is positioned as a powerful, comprehensive, policy-driven workload management solution for engineering and scientific distributed computing environments, scheduling workloads intelligently according to policy. LSF is a workload management platform and job scheduler for distributed high-performance computing. It can be used to execute batch jobs on networked Unix and Windows systems on many different architectures. Since its acquisition by IBM, the product has been renamed IBM Spectrum LSF.
After the cluster is installed and ready for operation, it can be managed remotely (by us) and/or locally by the customer.
- By us: the cluster is delivered, installed and (remotely) managed under a support contract or a prepaid service contract. No hassle and no worries for the customer.
- By us in combination with the customer: some (small) management tasks are done by the customer and other (more complicated) tasks by us, under a support contract or prepaid contract.
- By the customer: the cluster is managed by the customer. This sometimes comes with the requirement that an easy-to-use cluster management system be installed.
Generally, we see that management by us is the most appreciated option. It gives the customer peace of mind and ensures that the most highly skilled Linux users do not have to act as cluster system administrators.