Hall C CODA/DAQ Layout

Hall C CODA Layout

  • Detailed 'User' instructions are on the Hall C DAQ page. Read and understand that page first.
    • It covers the ROC layout, updating F250 pedestals, switching DAQ modes, standard recovery procedures, etc.

CODA process and file locations

See also: Hall C Compute Cluster

There are two primary hosts dedicated to running the SHMS and HMS DAQs:

  • HMS: coda@cdaql5
  • SHMS: coda@cdaql6

When running in 'coincidence' mode, all ROCs (SHMS+HMS) are picked up by the 'SHMS' configuration running under coda@cdaql6.

There is nothing 'special' about those machines, however. If needed, fail over to another host by replacing 'cdaql6' with the new host in:

  • coda:bin/coda_user_setup
  • coda:bin/run-vncserver -- Starts a new CODA VNC server session (should only be run on cdaql5 or cdaql6). It is started at boot through an @reboot crontab entry.
  • CODA msqld server
    • See coda:bin/run-msqld -- It is presently started through crontab @reboot under coda@cdaql6.
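
Before editing, it can help to list every place the old host name still appears. The following is a minimal sketch, not part of the Hall C tooling: 'find_host_refs' is a hypothetical helper name, and the paths in the usage comment assume that 'coda:bin/' refers to ~coda/bin/. It only reports matches and modifies nothing.

```shell
# List every line that still mentions the old DAQ host so each one can be
# reviewed and edited by hand. Nothing is modified.
# Usage: find_host_refs OLD_HOST FILE...
find_host_refs() {
    old_host=$1
    shift
    # -n: print line numbers
    # -w: match whole words only, so 'cdaql6' does not match 'cdaql60'
    # -F: treat the host name as a fixed string, not a regular expression
    # -H: always print the file name, even for a single file
    grep -nwFH -- "$old_host" "$@"
}

# Example (assumed paths):
#   find_host_refs cdaql6 ~coda/bin/coda_user_setup ~coda/bin/run-vncserver ~coda/bin/run-msqld
# The @reboot crontab entries can be checked the same way:
#   crontab -l | grep -nwF cdaql6
```

Each reported line is printed as 'file:line:text'. An exit status of 1 means the host name was not found in any of the files given.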

cdaql5 and cdaql6 share a common NFS-mounted 'coda' directory. The /home/coda mount is 'special'. It is hosted on cdaqfs1 along with the rest of the filesystems, but is handled differently to avoid filesystem size limitations with binary components in the CODA 2.6 environment (and perhaps the vxworks cross-compiler toolchain). That limitation should be removed when Hall C migrates to CODA 3.0.

CODA support software

  • The start-/end-of-run scripts, EPICS logger scripts, RunStart GUI, and Prescale GUI are located in coda:coda/scripts/
  • There are multiple 'README' files in that directory and its children that describe the intended execution flow and 'best practices'
  • Log files in coda:debug_logs/ may be useful in understanding problems.

ROC code

  • Run 'golinuxroc' as coda@cdaql1 to be moved into the ROC software directories for the linux/intel ROCs
    • Follow the instructions and type 'go_hcvme0X' (X:1, 2, 4, 5, 7, 8) after running the above script
    • The files are stored on cdaqfs1 and NFS mounted at /net/cdaqfs1/cdaqfs-coda-home/pxeboot/
    • The PXE boot options are delivered by the JLab central DHCP server to all hosts (non-PXE systems ignore them). At present they are:
    filename "linux-diskless/pxelinux.0";     # Bootloader program
    next-server hcpxeboot.jlab.org;           # TFTP server (hcpxeboot is a CNAME for cdaqfs1 at present)

  • Run 'govxworksroc' as coda@cdaql1 to be moved into the vxworks directory and establish the PPC cross-compiler, etc
    • The files are physically located on the NFS mount (cdaqfs1): /home/coda/coda/{crl,boot}/

Experiment Changeover Tasks

  • Ensure all CODA files from the prior run are pushed to the old tape destination and removed from the data/raw/ and data/raw.copiedtotape/ directories
    • Move any 'orphan' files from ~coda/data/raw/ to ~coda/data/raw.copiedtotape/ (orphans can happen if there is a problem with CODA end-of-run or with the tape system).
    • Run jmirror-sync-raw.copiedtotape push as coda@cdaql1
      • Note that files will only be removed from the local system by jmirror if both the original copy and duplicate copy are on tape. Either repeat the jmirror-sync-raw.copiedtotape push command after the tape system has made the duplicates, or wait and the cron job will clear things up for you within a few days.
  • Update the '~coda/coda/scripts/DATAFILE-Locations.sh' to point at the new 'raw/' tape destination
    • NOTE: Only do this after all prior files have been moved or you can get files mixed up on tape.
    • This file is used by CODA and the data mover scripts to move raw CODA files at end of run (and watch for and correct file transfer interruptions, etc)
  • Update the 'T1, T2, ... T6' cables into the Trigger Master module(s) to match the Experimental requirements.
    • Note: The EDTM system is designed to trigger all detector pretriggers (3/4, PbGl, Cerenkovs, etc) with timing similar to what the physics will generate (including SHMS+HMS coincidences) so that can get you quite close pre-beam. However, the timing will need to be checked/tweaked when beam arrives (of course).
  • Confirm the trigger table mapping is consistent with the Experimental requirements.
    • This table sets the 'trigger bits' that the Trigger Master adds to its data header to flag whether a particular trigger involved the SHMS, HMS, etc.
    • See hcvme01.c (SHMS-single and COIN DAQ configurations), and hcvme02.c (HMS single-arm DAQ configuration). Helper scripts to generate the table are in the respective ROC code directories under 'hallC_triggerTable/'
  • Update the target ladder list in '~coda/coda/scripts/runstart_gui.tcl' to match the actual target ladder (allows the prescale GUI to auto-select the in-use target).
  • See also DAQ/Trigger Run Check Lists (NOTE: this is getting a little dated)
  • Trigger timing log entries / snapshots are in the logbook and are also recorded here: Trigger History
  • Ensure the go_analysis script is updated and pointing at the right replay scripts.
    • Update the cdaq:hallc-online/ symlink to point at the new experiment directory
    • Ensure symlinks in the new experiment directory are pointing at the right Hall C cluster /net/cdaq/cdaqlXdata/cdaq/<experiment> directories and are not writing files into the /home/cdaq/ directory.
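
The symlink check in the last item can be done by eye with 'ls -l', but a small helper makes it harder to miss one. This is a sketch under stated assumptions: 'check_link_targets' is a hypothetical name, it relies on GNU 'readlink -f', and the usage comment assumes the cdaq:hallc-online/ symlink lives at ~cdaq/hallc-online.

```shell
# Report symlinks in DIR whose resolved target is NOT under PREFIX.
# Prints one "link -> target" line per offender; returns 1 if any were found.
# Usage: check_link_targets DIR PREFIX
check_link_targets() {
    dir=$1
    prefix=${2%/}        # drop one trailing slash so the comparison is uniform
    bad=0
    for link in "$dir"/*; do
        [ -L "$link" ] || continue          # only look at symlinks
        # Fully resolved path (GNU readlink). If the link cannot be resolved,
        # the target is empty and the link is reported as an offender.
        target=$(readlink -f -- "$link")
        case "$target" in
            "$prefix"/*) ;;                 # resolves under the expected tree
            *) printf '%s -> %s\n' "$link" "$target"; bad=1 ;;
        esac
    done
    return "$bad"
}

# Example (assumed location of the experiment directory):
#   check_link_targets ~cdaq/hallc-online /net/cdaq
```

Any link that resolves into /home/cdaq/ would show up in the output, since it does not fall under the /net/cdaq prefix.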

Data Movers

The 'data mover' algorithm takes care of copying CODA data files from the Hall C system(s) to tape.

The initial copy is done using 'coda:coda/scripts/copy_file_to_tape.sh', triggered when each CODA run is stopped. It initiates a 'jput' from the system that has the data drive mounted to minimize unnecessary network traffic. See the script for details.

  • Log files in coda:debug_logs/ may be useful in understanding problems.

Clean-up of the local files is managed by the 'jmirror' tool through the following cron entry and associated script running under coda@cdaql1 (again, this should run on the system with the data file system physically attached to avoid unneeded network traffic).

There are other crontab entries under coda@cdaql1 that monitor the file transfers for 'stuck' files or other issues and will email responsible parties.

  • See 'coda:bin/jmirror-sync-raw.copiedtotape -h' for a list of options (runs jmirror in a few different "modes")

crontab on coda@cdaql1

## 'jmirror' verifies (via crc/md5sum) that all files in the raw.copiedtotape/ directory
## are in fact on tape.  (If they are not, it will copy them now).   Once they are
## verified to be on tape, it will remove them from the Hall C file system.
## Leave files on local disk for a nominal 48 hours before removing.
##   Files will only be removed from local disk if both the original and 'dup'
##   copies have been written and verified to be on tape.
@daily /$HOME/bin/jmirror-sync-raw.copiedtotape 2>&1 | egrep -v 'Found 0 files, comprising 0 bytes.|^WARN  Ignoring zero-length file:|already exists|^WARN  Unable to load previously calculated MD5'
## Sanity check of file count in ~/data/raw
## (Should be small unless there's an issue with CODA crashing before end of run, or file transport to tape)
## - Should probably convert this crontab entry to munin plugin at some point
@daily if [ `ls /$HOME/data/raw/ | wc -l` -gt 6 ]; then ( date; echo "$USER@$HOST"; echo "Warning: Extra files in $HOME/data/raw.  Verify things are working and manually move (non-active!) files to ../raw.copiedtotape.  A cronjob will ensure they are pushed to tape later." ); fi; 
## clean up $HOME/debug_logs/
@daily /usr/sbin/tmpwatch -qs --ctime 7d /$HOME/debug_logs/
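
The comment above suggests converting the data/raw file-count check into a munin plugin. A minimal sketch of what that could look like follows. It is not an existing Hall C script: the function names, the 'rawfiles' field name, the 'daq' graph category, and the RAW_DIR override are all assumptions made for illustration. The warning threshold of 6 mirrors the '-gt 6' test in the crontab entry. Unlike the 'ls | wc -l' check, this version counts regular files only.

```shell
# Count regular files directly inside a directory (not recursive).
# A missing directory is reported as 0 files.
count_raw_files() {
    find "$1" -maxdepth 1 -type f 2>/dev/null | wc -l | tr -d ' '
}

# Munin plugin entry point: "config" describes the graph; any other call
# (munin fetches values with no argument) reports the current count.
raw_files_plugin() {
    # RAW_DIR is an assumed override, mainly useful for testing.
    raw_dir=${RAW_DIR:-$HOME/data/raw}
    case "${1:-}" in
        config)
            echo "graph_title Files waiting in data/raw"
            echo "graph_vlabel files"
            echo "graph_category daq"          # hypothetical category name
            echo "rawfiles.label files in raw/"
            echo "rawfiles.warning 6"          # warn when the count exceeds 6
            ;;
        *)
            echo "rawfiles.value $(count_raw_files "$raw_dir")"
            ;;
    esac
}

# As an installed plugin, the script would end with:
#   raw_files_plugin "$@"
```

With this approach the alert would come through munin's warning threshold rather than through cron's mail output.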

Git repos

All of the CODA configurations, coda:bin/, and other directories are maintained with git.

The remote repos are stored on the 'hallcgit.jlab.org' server.