Hall C Compute Cluster
Systems and (nominal) Functions
The Hall C compute cluster is composed of roughly four 'classes' of machines. Hosts within each class are intended to be largely interchangeable, allowing for easier upgrades and failover.
CODA / DAQ nodes (rackmount)
- cdaql5, cdaql6
- These are 'modestly' provisioned rackmount servers dedicated to running CODA. Each has ~5.5 TB of local disk intended as a local buffer in front of the large NFS fileserver nodes if needed. That has never been necessary; to date we have been fine just pushing data over an NFS mount to cdaql1/2/3.
Compute / Fileserver nodes (rackmount)
- cdaql1, cdaql2, cdaql3
- These are generally pretty beefy machines with a lot of CPU and disk. They are intended for data storage and online replays (when feasible).
Right now, cdaql1:/data1 is the primary (NFS) destination volume for CODA data from cdaql5 and cdaql6. This is fine under present DAQ loads, but this will need to change for NPS/LAD and other data-heavy experiments.
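The data path above amounts to an NFS mount on the DAQ nodes. A minimal fstab sketch follows; the mount options are illustrative assumptions, not the production settings:

```
# /etc/fstab on cdaql5/cdaql6 (options assumed for illustration)
cdaql1:/data1   /data1   nfs   rw,hard,vers=3,rsize=1048576,wsize=1048576   0 0
```

CODA's event recorder then writes to /data1 as if it were local disk, with the network and cdaql1's storage absorbing the throughput.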
'hcdesk' / User nodes (desktop)
- hcdesk1, 2, 3, ... : 'User' consoles in the Hall C counting house used by shift crew
- shmshut, hmshut : 'User' consoles in the respective spectrometer huts
These are relatively low-powered computers that primarily serve as consoles to hang monitors and a keyboard off of. All the real work is done on other hosts.
- cvideo1 -- rackmount machine that runs the Munin monitoring service and the Motion software that handles the cameras. See also Video Capture Systems
- cvideo2 -- desktop host that handles the 2 left-most large wall display screens
- cvideo3 -- desktop host that handles the 4 newest display screens
- cmagnets (VM) -- A Win10 VM hosted on cdaqfs1 that handles the 'go_magnets' spectrometer magnet controls
- skylla10 -- a rackmount Win10 host that hosts the Rockwell HMI software used to interact with the SHMS/HMS spectrometer PLCs
- cdaqbackup1 -- a rackmount host used to provide backups of (linux) Hall C systems. See #System Backups below.
- CNAMES (DNS 'aliases' allowing systems to be pointed at a new physical host with a single DNS change)
- hcpxeboot -> cdaqfs1
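In BIND zone-file terms each alias is a one-line record; the fragment below is illustrative only (the actual records live in the central DNS):

```
; illustrative zone fragment -- actual records are managed centrally (JNET)
hcpxeboot   IN   CNAME   cdaqfs1
```

Repointing an alias at a new physical host is then a single-record change, with no client-side reconfiguration.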
Puppet Configuration Management
- The Hall C cluster machines are configured and maintained using the open-source [Puppet] system.
- Main repo is hosted at: firstname.lastname@example.org:brads/hallc-puppet.git
- Updates/Upgrades are handled manually to minimize any surprises during Production
- Brad uses 'cssh' to periodically run global updates and/or push out configuration changes -- bug him for support.
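A minimal sketch of the kind of cssh invocation used for a global update; the host list below is an assumption for illustration, not the real cluster roster:

```shell
#!/bin/sh
# Hypothetical host list -- the real set is whatever the cssh config names.
HOSTS="cdaql1 cdaql2 cdaql3 cdaql5 cdaql6"

# cssh opens one terminal per host plus a master input window; keystrokes
# typed in the master are mirrored to every session, so a single
# 'sudo yum update -y' typed once runs on every host at the same time.
echo "cssh $HOSTS"
```

Driving updates interactively (rather than unattended) keeps a human watching for surprises, which matches the manual-update policy above.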
System Backups
- cdaqbackup1 is an older rackmount host repurposed to provide backups of some important systems
- All cdaqfs1 NFS exports are backed up nightly (rsync images; no snapshotting)
- cdaqfs1:home/ is backed up nightly with snapshots
- The backup software is [Borg Backup]
- This is handled by the script: cdaqbackup1:/data1/cdaqfs-backup/BACKUP-borg/borg-backup-cdaqfs-home.sh running on cdaqbackup1
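A hedged sketch of what a Borg-based nightly job of this kind might look like; the repository path, source mount point, and retention flags are assumptions for illustration, not the actual contents of the script on cdaqbackup1:

```shell
#!/bin/sh
# Assumed paths -- the real ones live in the script on cdaqbackup1.
REPO=/data1/cdaqfs-backup/BACKUP-borg/cdaqfs-home.borg
SRC=/mnt/cdaqfs1-home                # assumed NFS mount of cdaqfs1:home/
ARCHIVE="home-$(date +%Y-%m-%d)"     # one named archive per night

# Guarded so this sketch is harmless on hosts without borg or the repo.
if command -v borg >/dev/null 2>&1 && [ -d "$REPO" ]; then
    borg create --stats --compression lz4 "$REPO::$ARCHIVE" "$SRC"
    borg prune --keep-daily 7 --keep-weekly 4 --keep-monthly 6 "$REPO"
fi
echo "$ARCHIVE"
```

Borg deduplicates across archives, so nightly 'snapshots' cost little beyond the changed blocks, and prune enforces the retention window.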
Network Configuration / Management
- All systems on the Hall C network should be registered with the central systems. Talk to Brad Sawatzky and he will set you up quickly.
- Do not throw something on the network with a hardcoded IP address. That was fine 15 years ago; it is not a good plan in a modern network.
- The network layout is roughly described on the Hall_C_Network page, but that is deprecated and may be out of date. JNET should be considered canonical.
- vxWorks hosts presently boot off cdaql1 (184.108.40.206)
PXE boot (intel/Linux ROCs)
- Intel/Linux ROCs boot using the PXE mechanism. The PXE stanza is delivered by the CNI DHCP service to hosts on the 168 subnet:
tftp-host: hcpxeboot                    # TFTP server
tftp-path: linux-diskless/pxelinux.0    # Bootloader program
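The stanza above maps onto the standard DHCP boot options; in ISC dhcpd syntax the equivalent would be roughly as follows (illustrative only -- the real stanza lives in the CNI DHCP service):

```
next-server hcpxeboot;                 # TFTP server (CNAME -> cdaqfs1)
filename "linux-diskless/pxelinux.0";  # bootloader program fetched over TFTP
```

The ROC's firmware fetches pxelinux.0 from the TFTP server named by next-server, and that bootloader in turn loads the diskless Linux image.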