Hall C CODA Layout

* Detailed 'User' instructions are on the Hall C DAQ page.  That includes the ROC layout, standard recovery procedures, etc.  Read and understand that first.
** Includes instructions on updating F250 pedestals, switching DAQ modes, recovery procedures, etc.

== CODA process and file locations ==

See also: [[Hall C Compute Cluster]]

The primary host for the "coinc" configuration is cdaql6:

* "coinc" or "NPS" configurations run on coda@cdaql6.
* HMS-standalone: coda@cdaql5
* The SHMS DAQ is disabled at the moment.

When running in 'coincidence' mode, all ROCs (NPS+HMS) are picked up.
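
For concreteness, reaching the two DAQ accounts looks like this (assuming you already have a shell on the Hall C counting-house network):

<pre>
ssh coda@cdaql6    # "coinc" / "NPS" configurations
ssh coda@cdaql5    # HMS-standalone configuration
</pre>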

There is nothing 'special' about those machines, however.  If needed, fail over to another host by replacing 'cdaql6' with a new/different host in the files below (a shell sketch follows this list):

* coda:bin/coda_user_setup
* '''coda:bin/run-vncserver'''  -- This will start a new CODA VNC server session (should only be run on cdaql5 or cdaql6).  This runs @reboot via crontab.
* Note: the CODA msqld server is no longer used; it was only needed by the older version of CODA.
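
A minimal sketch of the failover edit and of restoring the VNC session on the replacement host; the layout inside coda_user_setup, the placeholder host 'cdaql3', and the exact crontab wording are illustrative assumptions, not copied from the live files:

<pre>
# Run as the coda user on the machine taking over the DAQ.
grep -n 'cdaql6' ~/bin/coda_user_setup                 # find every reference to the old host
sed -i.bak 's/cdaql6/cdaql3/g' ~/bin/coda_user_setup   # 'cdaql3' is a placeholder replacement host

# Start the CODA VNC server session once by hand ...
~/bin/run-vncserver
# ... and make sure it also comes back after a reboot (illustrative crontab line):
( crontab -l ; echo '@reboot $HOME/bin/run-vncserver' ) | crontab -
</pre>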

== CODA support software ==

* The start-/end-of-run scripts, EPICS logger scripts, RunStart GUI, and Prescale GUI are located in coda:coda/scripts/
* There are multiple 'README' files in that directory and its children that describe the intended execution flow and 'best practices'.
* Log files in coda:debug_logs/ may be useful in understanding problems.
* [[CODA support files detailed notes]]

=== ROC code ===

* Only the experts should be changing the ROC code.

=== Experiment Changeover Tasks ===

This is probably a little obsolete but I'll leave it for now (noted Aug 2023).

* Ensure all CODA files from the prior run are pushed to the old tape destination and removed from the data/raw/ directory (see the shell sketch after this list)
** Move any 'orphan' files from ~coda/data/raw/ to ~coda/data/raw.copiedtotape/ so they are picked up by the tape sync
** Run <code>jmirror-sync-raw.copiedtotape push</code> as ''coda@cdaql1''
*** Note that files will only be removed from the local system by jmirror if both the original copy and duplicate copy are on tape.  Either repeat the <code>jmirror-sync-raw.copiedtotape push</code> command after the tape system has made the duplicates, or wait and the cron job will clear things up for you within a few days.
* Update '~coda/coda/scripts/DATAFILE-Locations.sh' to point at the new 'raw/' tape destination
** NOTE: Only do this after all prior files have been moved, or you can get files mixed up on tape.
** This file is used by CODA and the data mover scripts to move raw CODA files at end of run (and to watch for and correct file-transfer interruptions, etc.)
* Update the 'T1, T2, ... T6' cables into the Trigger Master module(s) to match the experimental requirements.
** Note: The EDTM system is designed to trigger all detector pretriggers (3/4, PbGl, Cerenkovs, etc.) with timing similar to what the physics will generate (including SHMS+HMS coincidences), so it can get you quite close pre-beam.  However, the timing will need to be checked/tweaked when beam arrives (of course).
* Confirm the trigger table mapping is consistent with the experimental requirements.
** This table sets the 'trigger bits' that the Trigger Master adds to its data header to flag whether a particular trigger involved the SHMS, HMS, etc.
* Update the target ladder list in '~coda/coda/scripts/runstart_gui.tcl' to match the actual target ladder (allows the prescale GUI to auto-select the in-use target).
* Trigger timing log entries / snapshots are in the logbook and are also recorded here: [[Trigger History]]
* Ensure the <code>go_analysis</code> script is updated and pointing at the right replay scripts.
** Update the cdaq:hallc-online/ symlink to point at the new experiment directory (also shown in the sketch below)
** Ensure symlinks in the new experiment directory point at the right Hall C cluster /net/cdaq/cdaqlXdata/cdaq/<experiment> directories and are not writing files into the /home/cdaq/ directory.
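
A minimal shell sketch of the file-handling steps above; the run number, experiment name, and data-disk host in the paths are placeholders and should be checked against the live system:

<pre>
# As coda@cdaql1: park any orphaned raw files (from runs that are finished,
# not the currently active one) so the tape sync will pick them up.
mv ~coda/data/raw/shms_all_12345.dat ~coda/data/raw.copiedtotape/   # '12345' is a placeholder run

# Push everything in raw.copiedtotape/ to tape and verify it.
jmirror-sync-raw.copiedtotape push

# As the cdaq user: re-point the online-analysis symlink at the new experiment
# directory ('cdaql2data' and 'new_experiment' are placeholders).
ln -sfn /net/cdaq/cdaql2data/cdaq/new_experiment ~cdaq/hallc-online
</pre>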

== Data Movers ==

The 'data mover' algorithm takes care of copying CODA data files from the Hall C system(s) to tape. Please don't interfere with this.

The mover relies on the 'jmirror' tool, which also manages clean-up of the local files.

There are crontab entries under coda@cdaql5 that monitor the file transfers for 'stuck' files or other issues and will email responsible parties.
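
If transfers look stuck, a minimal sketch for inspecting the mover by hand is below; the assumption that the monitoring entries live in the coda crontab on cdaql5 and that the sync script sits in ~/bin/ should be verified before acting on anything:

<pre>
# As the coda user on cdaql5: list the cron entries that watch the transfers.
crontab -l | grep -i -e jmirror -e raw

# As coda@cdaql1: run a sync pass by hand (same command used at experiment changeover).
~/bin/jmirror-sync-raw.copiedtotape push
</pre>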

== Git repos ==

All of the CODA configurations, coda:bin/, and other directories are maintained with git.

The remote repos are stored on the 'hallcgit.jlab.org' server.
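
A quick sketch for checking what a given working copy tracks; the directory shown is an example, and any clone URL would need to be read off 'hallcgit.jlab.org' itself:

<pre>
cd ~/bin                      # e.g. the coda account's bin/ directory (coda:bin/)
git remote -v                 # the remote should point at hallcgit.jlab.org
git status --short            # any uncommitted local changes
git log --oneline -5          # recent history
</pre>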