Hall C Compute Cluster

From HallCWiki
Revision as of 18:23, 26 April 2022 by Brads

Systems and (nominal) Functions

The Hall C compute cluster is composed of roughly four 'classes' of machines. Hosts within a class are intended to be largely interchangeable, which makes upgrades and failover easier.

CODA / DAQ nodes (rackmount)

  • cdaql5, cdaql6

Compute / Fileserver nodes (rackmount)

'hcdesk' / User nodes (desktop)

Miscellaneous

  • cvideo1 -- rackmount host that runs the Munin monitoring service and the Motion software that handles the cameras
  • cvideo2 -- desktop host that handles the 2 left-most large wall display screens
  • cvideo3 -- desktop host that handles the 4 newest display screens
  • cmagnets (VM) -- a Win10 VM hosted on cdaqfs1 that handles the 'go_magnets' spectrometer magnet controls
  • skylla10 -- a rackmount Win10 host running the Rockwell HMI software used to interact with the SHMS/HMS spectrometer PLCs
  • cdaqbackup1 -- a rackmount host used to back up the (Linux) Hall C systems. See System Backups below.
  • CNAMEs (DNS 'aliases' that let a service be moved to a new physical host with a single DNS change)
    • hcpxeboot -> cdaqfs1
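Because clients refer to the alias rather than to the physical host, moving a service only requires updating the CNAME record. The Python sketch below shows one way a client could check which host an alias currently points at. It relies on the standard-library `socket.gethostbyname_ex`, whose first return value is normally the canonical name when the queried name is a CNAME. The resolver is injectable so the logic can be exercised without access to the Hall C DNS, and the helper name `current_target` is hypothetical.

```python
import socket
from typing import Callable, List, Tuple

# A gethostbyname_ex-style resolver: (primary name, alias list, IPv4 addresses).
Resolver = Callable[[str], Tuple[str, List[str], List[str]]]


def current_target(alias: str, resolver: Resolver = socket.gethostbyname_ex) -> str:
    """Return the short name of the host that a DNS alias currently resolves to.

    Hypothetical helper: for a CNAME, the first element returned by
    gethostbyname_ex is normally the canonical (physical) host name.
    """
    canonical, _aliases, _addresses = resolver(alias)
    # Drop the domain part so 'cdaqfs1.example.org' compares as 'cdaqfs1'.
    return canonical.split(".", 1)[0]
```

On a machine inside the Hall C network, `current_target("hcpxeboot")` would be expected to return `cdaqfs1` as long as the alias above is in place. If the alias were later repointed, anything configured to use `hcpxeboot` would keep working without changes.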

Puppet Configuration Management

  • The Hall C cluster machines are configured and maintained using the open-source Puppet system.
    • Main repo is hosted at: git@hallcgit.jlab.org:brads/hallc-puppet.git
    • Updates/Upgrades are handled manually to minimize any surprises during Production
      • Brad uses 'cssh' to periodically run global updates and/or push out configuration changes -- bug him for support.
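For readers unfamiliar with Puppet, the sketch below shows what a manual per-host agent run could look like if it were scripted in Python instead of typed into cssh. It is not how the cluster is actually managed. It assumes passwordless ssh and sudo on each host, which may not match the real setup, and it only builds the commands without running them. `puppet agent --test` performs a one-off foreground run and implies `--detailed-exitcodes`. `--noop` reports what would change without changing anything. The helper names are hypothetical.

```python
from typing import Dict, List

# Detailed exit codes from a normal (non-noop) `puppet agent --test` run.
EXIT_MEANINGS: Dict[int, str] = {
    0: "no changes",
    2: "changes applied",
    4: "failures",
    6: "changes applied, with failures",
}


def agent_cmd(host: str, noop: bool = True) -> List[str]:
    """Build an ssh command that runs one Puppet agent pass on `host`.

    Assumes passwordless ssh and sudo; defaults to a --noop dry run so that
    nothing is changed unless a real run is explicitly requested.
    """
    cmd = ["ssh", host, "sudo", "puppet", "agent", "--test"]
    if noop:
        cmd.append("--noop")
    return cmd


def describe_exit(code: int) -> str:
    """Translate a detailed exit code into a short human-readable summary."""
    return EXIT_MEANINGS.get(code, f"unrecognized exit code {code}")
```

For example, `agent_cmd("cdaql5")` yields `['ssh', 'cdaql5', 'sudo', 'puppet', 'agent', '--test', '--noop']`, which could be passed to `subprocess.run`. Defaulting to `--noop` matches the page's preference for avoiding surprises during production.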

System Backups

  • cdaqbackup1 is an older rackmount host repurposed to provide backups of some important systems
    • All cdaqfs1 NFS exports are backed up nightly (rsync images; no snapshotting)
    • cdaqfs1:home/ is backed up nightly with snapshots
      • The backup software is Borg Backup
      • This is handled by the script cdaqbackup1:/data1/cdaqfs-backup/BACKUP-borg/borg-backup-cdaqfs-home.sh
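The two schemes differ in what they keep. An rsync image holds only the latest copy, while Borg keeps a series of deduplicated archives that can be pruned by age. The Python sketch below builds, but does not run, one command of each kind. It is not the contents of `borg-backup-cdaqfs-home.sh`. The archive prefix `home-`, the retention counts, and any paths passed in are hypothetical.

```python
from typing import List


def rsync_mirror_cmd(src: str, dest: str) -> List[str]:
    """Single-image mirror: dest ends up matching src, with no history kept."""
    # -a keeps permissions, times and symlinks; -H keeps hard links;
    # --delete removes files from dest that no longer exist in src.
    # The trailing slash copies the *contents* of src into dest.
    return ["rsync", "-aH", "--delete", src.rstrip("/") + "/", dest]


def borg_snapshot_cmds(repo: str, path: str, daily: int = 7,
                       weekly: int = 4, monthly: int = 6) -> List[List[str]]:
    """Create a new dated archive, then prune old ones (retention is hypothetical)."""
    # Borg expands the {now} placeholder in the archive name to a timestamp.
    create = ["borg", "create", "--stats", f"{repo}::home-{{now}}", path]
    # Without a name filter this prunes every archive in the repo, so it
    # assumes the repo holds only these home-directory backups.
    prune = ["borg", "prune",
             "--keep-daily", str(daily),
             "--keep-weekly", str(weekly),
             "--keep-monthly", str(monthly),
             repo]
    return [create, prune]
```

Passing each list to `shlex.join` shows the equivalent shell command line. Because pruning keeps a bounded set of daily, weekly and monthly archives, a file deleted by mistake can still be recovered from an earlier snapshot, which the rsync image cannot offer.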

Network Configuration / Management

  • All systems on the Hall C network should be registered with the central systems. Talk to Brad Sawatzky and he will set you up quickly.
    • Do not throw something on the network with a hardcoded IP address. That was fine 15 years ago; it is not a good plan on a modern network.
  • The network layout is roughly described on the Hall_C_Network page, but that page is deprecated and may be out of date. JNET should be considered canonical.
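One quick, non-authoritative way to spot a device that was never registered is to check that its name has a forward (name to address) DNS record and a matching reverse (address to name) record. The Python sketch below does this with the standard-library `socket.gethostbyname` and `socket.gethostbyaddr`. The helper name `dns_registered` is hypothetical, and JNET remains the real source of truth.

```python
import socket
from typing import Callable, List, Tuple

Forward = Callable[[str], str]
Reverse = Callable[[str], Tuple[str, List[str], List[str]]]


def dns_registered(name: str,
                   forward: Forward = socket.gethostbyname,
                   reverse: Reverse = socket.gethostbyaddr) -> bool:
    """True if `name` resolves to an address whose reverse record points back to it."""
    try:
        address = forward(name)
        reverse_name, reverse_aliases, _ = reverse(address)
    except OSError:
        # socket.gaierror / socket.herror (both OSError subclasses):
        # the forward or the reverse record is missing.
        return False
    short = name.split(".", 1)[0]
    return any(n.split(".", 1)[0] == short for n in [reverse_name, *reverse_aliases])
```

Passing this check does not prove a host uses DHCP. It only shows that the name has consistent DNS records. Run it against a host's real name: a CNAME alias such as `hcpxeboot` will generally fail, because its address reverse-resolves to the physical host.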

vxWorks boot

  • vxWorks hosts presently boot off cdaql1 (129.57.168.41)
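When a vxWorks ROC fails to boot, one possible early check is whether its address sits on the same subnet as the boot host. The Python sketch below does this with the standard-library `ipaddress` module. The /24 netmask for the 168 subnet is an assumption, so confirm the real mask in JNET. The helper name `on_boot_subnet` is hypothetical.

```python
import ipaddress

# Boot host address from the text above (cdaql1).
BOOT_HOST = ipaddress.ip_address("129.57.168.41")
# ASSUMPTION: the 168 subnet is a /24; check the real netmask in JNET.
BOOT_SUBNET = ipaddress.ip_network("129.57.168.0/24")


def on_boot_subnet(addr: str) -> bool:
    """True if `addr` falls inside the (assumed) boot subnet."""
    return ipaddress.ip_address(addr) in BOOT_SUBNET
```

A ROC outside this range would need a route through a gateway to reach cdaql1, which is worth ruling out before digging into the boot parameters themselves.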

PXE boot (intel/Linux ROCs)

  • Intel/Linux ROCs boot using the PXE mechanism. The PXE stanza is delivered by the CNI DHCP service to hosts on the 168 subnet:
  tftp-host:    hcpxeboot                        # TFTP server
  tftp-path:    linux-diskless/pxelinux.0        # Bootloader program
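After DHCP hands out this stanza, the PXE firmware fetches the bootloader over TFTP by sending a read request (RRQ) to the server on UDP port 69. The Python sketch below builds that packet following the RFC 1350 layout. Real PXE clients may also append option extensions (RFC 2347, e.g. `tsize` or `blksize`), so treat it as illustrative rather than a byte-exact capture. The helper name `tftp_rrq` is hypothetical.

```python
import struct

TFTP_PORT = 69    # well-known TFTP server port
OPCODE_RRQ = 1    # read request opcode (RFC 1350)


def tftp_rrq(filename: str, mode: str = "octet") -> bytes:
    """Build a TFTP read-request packet.

    Layout: 2-byte big-endian opcode | filename | 0x00 | mode | 0x00
    """
    return (struct.pack("!H", OPCODE_RRQ)
            + filename.encode("ascii") + b"\x00"
            + mode.encode("ascii") + b"\x00")
```

`tftp_rrq("linux-diskless/pxelinux.0")` produces the request for the file named in `tftp-path`. Sent over a UDP socket to `("hcpxeboot", TFTP_PORT)`, it would ask the server for the bootloader, which then arrives as a series of DATA packets from a new server port.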