Two new GPU nodes, each with 8× NVIDIA H200 GPUs, 192 cores, 1.5 TB RAM and 3.8 TB NVMe storage, are now available in NEMO2 for testing via the “h200” partition. No software modules are installed yet.
NEMO2 has officially launched and has moved from its testing phase into full production with expanded hardware, including AMD Instinct MI300A and NVIDIA L40A nodes. NEMO1 is being phased out, with limited resources remaining available until May 31st. Users are encouraged to move to NEMO2 and to consult the wiki for details.
NEMO2 uses Miniforge for conda environments, which provides a streamlined setup with conda-forge as the default channel. The Miniforge module initializes conda automatically, so environments can be activated without modifying shell profiles.
The AMD Genoa, Machine Learning and AI partitions for NEMO2 were delivered on December 4th. Because acceptance of the storage has been delayed, NEMO2 could not start this year. However, jobs can still be run on the Milan nodes in NEMO1.
The Genoa partition for NEMO2 will be delivered on December 4th; at the same time, all old NEMO1 nodes will be removed. To ease the transition, some new Milan nodes will be booted into the NEMO1 environment and will remain available until at least January 31st. Users are encouraged to switch to the new Milan nodes and to submit their jobs to the ‘milan’ queue (-q milan). If demand increases, additional nodes will be added next week. The launch of NEMO2 is delayed because the storage is not yet available; further updates on testing and data transfer will follow once it is.
The Weka storage and the Milan partition for the new NEMO2 cluster have been successfully delivered. Testing, benchmarking and system configuration will take place in the coming weeks. We expect to start with limited functionality and expand it gradually. Part of the old NEMO cluster had to be shut down to make room for the installation of the Milan partition.
The 10th bwHPC Symposium will take place on September 25th and 26th, 2024, hosted by the University of Freiburg. Registration and the call for participation are now open.
A second factor is increasingly becoming mandatory for logins to services. For SSH logins, bwHPC currently accepts time-based one-time passwords (TOTP) or Yubico OTP as the second factor. We have reviewed several hardware security tokens for bwIDM/bwHPC that can be used instead of a mobile phone.
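For readers curious what a TOTP second factor actually computes, here is a minimal sketch of the standard RFC 6238 algorithm that authenticator apps (and hardware tokens running in TOTP mode) implement. It is not bwHPC's server-side code; it assumes the common defaults of HMAC-SHA1, 30-second time steps and 6-digit codes. The helper names `totp` and `secret_from_base32` are illustrative only. Yubico OTP works differently and is not covered here.

```python
import base64
import hashlib
import hmac
import struct
import time
from typing import Optional


def totp(secret: bytes, unix_time: Optional[float] = None,
         step: int = 30, digits: int = 6) -> str:
    """Compute an RFC 6238 TOTP code (HMAC-SHA1 variant, T0 = 0)."""
    if unix_time is None:
        unix_time = time.time()
    # Number of whole time steps elapsed since the Unix epoch.
    counter = int(unix_time // step)
    # HOTP (RFC 4226): HMAC-SHA1 over the 8-byte big-endian counter.
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    # Dynamic truncation: the low 4 bits of the last byte pick an offset.
    offset = mac[-1] & 0x0F
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    # Keep the requested number of decimal digits, left-padded with zeros.
    return str(code % 10 ** digits).zfill(digits)


def secret_from_base32(b32: str) -> bytes:
    """Decode a Base32 secret, as typically shown next to an enrollment QR code."""
    padded = b32.upper() + "=" * (-len(b32) % 8)
    return base64.b32decode(padded)
```

Both the token and the server derive the same code from the shared secret and the current time, so the server only has to recompute and compare. Servers typically also accept codes from a small window of adjacent time steps to tolerate clock drift.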
The initial partition of NEMO2, consisting of around 140 Milan nodes and 1000 terabytes (one petabyte) of high-speed storage space, has been ordered. A tender for a GPU partition and a second CPU partition will be opened in early 2024.