Gen01 syncfilter preprocessor

Overview

The Big Picture

What syncfilter Corrects (or flags)

syncfilter's Algorithm

Special Considerations

Usage Details

Limitations

Sample syncfilter Report Summary (annotated)

Latest Version of syncfilter Source

Executive Summary

During Gen01, the DAQ operated independently of the scalers. Since there was no provision for any direct correlation, we cannot simply match the results from the replay with the scaler values. Further, the beam current monitors had a minimum reading well above zero, so the scalers were accumulating charge even without beam.

But even the DAQ itself had some issues that the replay engine is ill-equipped to handle: if any part of the DAQ system crashed during a run, the raw data stream lacks an end-of-run marker, which causes the replay engine to crash without outputting the analysis results. And there is a potential synchronization problem between the different FastBus crates.

These are the primary reasons why we need syncfilter -- it takes care of all of them. All you have to do is pipe the data to be analyzed by the replay engine through syncfilter. Pipes are frequently used in the UNIX world; they are simply a chain of commands that the data "flow" through. Our replay engine is able to use this same approach, including the identical syntax. If we want the data modified by syncfilter before the replay engine gets hold of them, all we need to do is modify the source file specification (most likely, this is in REPLAY.PARM). Instead of

g_data_source_filename = 'e93026_%d.log'
we simply use "|syncfilter" -- note the vertical bar -- to indicate that we want the data to be "piped" through syncfilter first:
g_data_source_filename = '|syncfilter e93026_%d.log'

syncfilter passes the prepped data on to the replay engine and also generates two output files. The first, needed because syncfilter is now responsible for reading the raw data files and therefore has to handle the switching from one file segment to the next, is a short status file, comparable to the file statsrunno.txt, which indicates the number of the raw data file segment currently being processed. This file is called syncfilter.stat.

Further, a log file called syncfilter.log is created. This file contains occasional progress indicators, notes on (fixed) problems, and a summary output. This summary is an important aspect of syncfilter: it contains the proper scaler counts to use for the analyzed data, namely the total charge by helicity state and the corresponding time, allowing for proper event rate calculation and current determination. A sample log file summary is explained below.

 

Overview
The replay engine commonly used to analyze Jlab's Hall C experiments has no event memory. This means that each event is processed as if it were the only event in the entire run. Only histograms and scalers accumulate information from event to event, but without any correlation. This means that the data stream can only be processed sequentially; no out-of-order information can be used in the analysis process.

The data stream, however, includes numerous special events which contain information about the chunk of data that was just processed -- too late to change the analysis parameters for those events. To work around this limitation, a pre-processor was created. The raw data are passed through this filter and flags are inserted to indicate the validity of the subsequent data.

Historically, the filter was used to detect synchronization problems between the different FastBus branches (see below), thus the name syncfilter. Since Gen01, this functionality has been significantly extended. All of its functions reduce to the same principle, though: syncfilter inserts fake events into the raw data stream which indicate the conditions under which the subsequent data were acquired; this information is otherwise available only AFTER these data.

Since this is a filter, the data enter and exit as streams thus allowing redirection. This enables us to use syncfilter in a practically transparent fashion in the analysis engine's raw data file source specification.

 

The Big Picture
The following detailed discussion of syncfilter necessarily involves particulars of the experimental setup, especially DAQ related issues, and also some software items. To avoid confusion, I will briefly describe the relevant items here.

The electronic signals from the various detectors and other apparatus in the experiment enter the data stream recorded on tape via two alternate avenues: the FastBus DAQ electronics and the scalers. A few other systems also exist, but their handling parallels one of these paths.

Prompted by a trigger, the FastBus path processes distinct events, each of which is completely independent of the others. The event data are read from the electronics by the DAQ system CODA and written to file as they occur.

The scaler system works differently. By intent, scalers continually respond to the signals fed into them and keep count. Once in a while, they are read out (also by CODA) and, possibly, reset. For Gen01, there were actually three different sets of scalers, distinct in the frequency with which they were read out. The traditional set, usually referred to as the asynchronous scalers, are read by CODA approximately every 2 seconds and their current value is then recorded into the data stream. This occurs together with the sync event which re-establishes synchronization between the different parts of the DAQ. We will refer to this two second period as a sync interval.

The helicity scalers, on the other hand, are read at the end of every helicity interval, the period during which the beam helicity is fixed, ~1/30 seconds. At the boundary between these "helicity buckets" the beam's helicity may change. Again, CODA inserts their values into the data file as a helicity event. It is important to realize that the two periodic events (scaler event and helicity event) do not have a fixed phase relationship, i.e. they occur completely independently of each other.

The third type of scalers used in Gen01 is the event scaler. These are actually a hybrid of a scaler event and a DAQ event: they occur at the same time as a triggered DAQ event and are inserted into the data together with the DAQ event.

The data-handling software CODA does a good amount of error checking as it processes the different events' data. At times, if trouble is detected, an error event is inserted into the data stream which is coded to indicate the specific error condition encountered. Some of these errors are found before the data are processed (many preclude the processing in the first place) and then the error event will be recorded in place of the data. Other cases, however, can only be detected once the data are processed and then the error code follows the data it relates to.

The raw data are recorded to a computer file which is eventually stored on tape. These data are later analyzed using a program called the replay engine (sometimes just replay). Since much of the analysis of experiments that run in Jlab's Hall C is similar, the standard code package CSOFT was developed. It provides a standard analysis engine, including the many utility routines that interface with the raw data file and decipher any hardware-specific data formats.

 

What syncfilter Corrects (or flags)
There are few data-related problems that can be repaired, but many can be bypassed, if only by discarding some amount of the data. The actual data analysis is not what syncfilter is meant to do; instead, it is intended to provide a way around some of the shortcomings of a real-world DAQ system. It therefore does not alter the recorded data but adds information which is then used to make appropriate decisions in the replay engine. Some independent accounting of certain quantities does, however, take place. The following describes the various conditions syncfilter is designed to recognize and relay to the analysis engine in a fashion that allows the engine to deal with the issue in a timely manner.

FastBus crates out-of-sync

The DAQ uses several FastBus crates to process the data, each operating fairly autonomously. The respective data need to be matched up, however, so the crates have an internal counter which tracks the events. Approximately every 2 seconds the DAQ generates a synchronization event during which these counters get matched up. This event occurs together with a scaler event during which the asynchronous scalers are read out (and their data inserted into the data stream).

If at that time it is found that the event ID numbers do not match, an error event is inserted into the data stream to flag this condition. Essentially, this means that the last 2 seconds worth of data need to be discarded since we do not know at which point the error occurred. We do know that the error occurred in this latest sync interval so only those data are affected. Note that the flag necessarily follows the data.

scaler sync problem

Similar to the FastBus sync error, this is a synchronization problem in the scalers. The effect is the same as for the FastBus sync error, except that the only data directly affected are the scaler data. But we need those, so out go the correlated data, too. Again, this flag follows the data in question.

latch sync problem

Another synchronization error. This one is with the latch -- I don't know how many latches there are. All I know is that this is where we get the beam helicity from -- kind of vital, so the data hit the bucket (the trash, not a helicity bucket!). And again, this flag follows the data in question.

missing end-of-run marker -- run ended abnormally

This one is different -- the data are ok. But our analysis engine doesn't like it and has a tendency not to properly close its files, which PAW in turn doesn't like -- especially for Ntuple files. Also, we like to have a scaler read with our data and the final one is missing. So we put a flag after the last asynchronous scaler read to mark the (improper) end of the raw data file. This makes engine happy, too.

low beam current

This one is the reason why I started looking at this at all (well, wanted to -- a need existed anyway). Our beam current measuring devices (Beam Current Monitors, or BCMs) tell us not only the instantaneous beam current but also allow us to track the accumulated charge, even separately for each beam helicity. Unfortunately, they don't drop to zero when there is no beam. Actually, they act up well before then in that they become non-linear. So without beam they indicate some beam, and at very low current they indicate a slightly higher charge flow than is really there. Not good if you are integrating the readings over time. Since we cannot actually correct the readings (scalers only keep the sums, not the details), we have to selectively discard scaler reads that indicate that the average current was too low to be reasonable.

The actual determination of the beam current is based on the accumulated charge as indicated by a scaler. Since the current is continually measured and the scaler integrates these values into an amount of charge over a certain time interval, we need to properly define the start and the end of the interval. We also want the interval to be as short as possible to limit the impact of any problems. The scalers which satisfy these requirements best are the helicity scalers: the asynchronous scalers have a much longer time base, and the event scalers lack a well-defined time interval; the random nature of the occurrence of an event precludes us from using another (prior) physics event as a reference.
      This means that for each helicity interval we extract from the corresponding helicity scaler a value for the charge accumulated in the interval and a measure of the elapsed time period. Properly normalized, the ratio provides the mean current for that ~1/30 second interval. Using the time recorded by the scaler instead of the expected time interval allows us to eliminate any inconsistencies or fluctuations but has a further advantage:

computer dead time

One set of the helicity scalers accumulates signals that are gated with a flag indicating the DAQ's readiness. If the DAQ is busy processing the previous event or otherwise occupied, physical events that might be observed by the electronics will nonetheless not be seen by the DAQ. By considering only the charge and the time when the DAQ is able to process events, we establish a direct correlation between the physical events and the beam charge.
      Otherwise, this inconsistency (the scalers' dead time, if any, is different) would require a separate correction -- one which is not easy to define without any direct correlation. Note that this does not address any rate dependence in the response of the electronics (also called electronics dead time). The boundary between these is not easily explained but in practice quite well defined.
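
As a rough sketch of the bookkeeping just described -- the constants and names here are purely illustrative, not the actual calibration values or code used in syncfilter -- the average current for a helicity bucket is simply the gated charge divided by the gated time, compared against a low-current cutoff:

#include <stdio.h>

/* Illustrative constants only -- NOT the actual calibration values used
 * by syncfilter.                                                         */
#define CHARGE_PER_COUNT  1.0e-9    /* hypothetical: Coulombs per BCM scaler count  */
#define SECONDS_PER_COUNT 1.0e-6    /* hypothetical: seconds per clock scaler count */
#define CURRENT_LIMIT     1.0e-6    /* hypothetical low-current cutoff in Amperes   */

/* Given the gated ("live") charge and clock counts accumulated during one
 * helicity bucket, return the average current and flag it good or bad.    */
static double bucket_current(unsigned long charge_counts,
                             unsigned long clock_counts, int *good)
{
    double charge  = charge_counts * CHARGE_PER_COUNT;   /* integrated charge */
    double time    = clock_counts  * SECONDS_PER_COUNT;  /* gated live time   */
    double current = (time > 0.0) ? charge / time : 0.0;

    *good = (current > CURRENT_LIMIT);
    return current;
}

int main(void)
{
    int good;
    double i = bucket_current(350, 33000, &good);        /* one ~1/30 s bucket */
    printf("average current %.3g A -> %s current\n", i, good ? "good" : "bad");
    return 0;
}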

 

syncfilter's Algorithm
After the input and output streams have been opened, using the standard CSOFT routines, the main processing loop is entered (see figure). Here, the individual events are read from the input stream and, after some basic information is extracted, the event is added to the main data buffer. Later, this buffer is dumped to the data output stream.

The buffer is meant to contain all events of a single sync interval. While the data are in the buffer, we can still change the order of the information that is output: if the input stream ends before an end-of-run event is found, we can output a fake end-of-run event before we output the last bit of data. Similarly, if at any time during this interval there is a synchronization error, we can output the corresponding flag before we output the data, thereby enabling engine to skip the questionable events.

We have, however, an independent, smaller grouping of events into the helicity buckets. The respective boundaries of the two groupings can occur at any point relative to each other and the relation can vary over the course of the run. This means that we have to hang on to the buffer until the helicity bucket we were in when the sync event occurred has ended.

In the meantime, we need to keep buffering the new events being read. We also need to hang on to the first part of the helicity bucket that is at the very end of the previous sync interval, as is illustrated in the following, and thus are stuck keeping the entire prior sync interval's data around. The reason is this: if we find that we have a sync error somewhere in the current sync interval, then we need to discard the entire interval's data.

However, we have to deal with the data in increments of helicity buckets, so we need to discard the current sync buffer and the balance of any partial helicity buckets. This is the beginning of the first bucket, located at the end of the previous sync interval, and the last bucket, at the beginning of the next sync interval. We therefore need to keep the data from the previous sync interval available until the current interval has ended (a sync error might not show up until the end). Since the other end corresponds to data not yet read in, we just need to remember to flag them in the next sync interval.

We therefore have two buffers, the one currently being filled (MAIN) and the previous one (OLD). When a sync event occurs, we dump the contents of the second, old buffer to the output, thereby emptying it. The data from the main buffer are then moved to the (now empty) old buffer, freeing the main buffer for more data (since all this is coded in C, we don't actually move the data but exchange the buffers' identifiers, which is faster and safer and the "C" thing to do -- the effect is the same). If the buffer being dumped has a sync error, a corresponding flag event is output first. At the end of the input data stream, we first output the contents of the old buffer and then the data in the main buffer, inserting sync error flags into the output stream as needed. If the main buffer does not terminate with a proper "end" event, we also insert our fake end marker before we output the final main buffer.
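
The buffer swap might be sketched like this -- a simplified illustration using assumed names and structures, not the actual syncfilter source:

#include <stddef.h>

#define BUF_WORDS 65536                  /* illustrative buffer size */

typedef struct {
    unsigned int data[BUF_WORDS];        /* raw event words                 */
    size_t       used;                   /* words currently in the buffer   */
    int          sync_error;             /* interval was flagged as bad     */
} EventBuffer;

static EventBuffer buf_a, buf_b;
static EventBuffer *main_buf = &buf_a;   /* sync interval being filled      */
static EventBuffer *old_buf  = &buf_b;   /* previous, completed interval    */

/* hypothetical output helpers, assumed to exist elsewhere */
extern void write_events(const EventBuffer *b);
extern void write_sync_error_flag(void);

/* Called when a sync/scaler event closes the current interval: flush the
 * old buffer (flag event first if its data are bad), then exchange the
 * buffer pointers so the just-finished interval becomes the "old" one.    */
void flush_and_swap(void)
{
    if (old_buf->sync_error)
        write_sync_error_flag();         /* the flag precedes the bad data  */
    write_events(old_buf);

    old_buf->used = 0;                   /* the old buffer is now empty ... */
    old_buf->sync_error = 0;

    EventBuffer *tmp = main_buf;         /* ... and becomes the new MAIN:   */
    main_buf = old_buf;                  /* exchange identifiers instead of */
    old_buf  = tmp;                      /* copying the data                */
}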

As the events are processed and added to the buffer, we keep an eye out for helicity events. If we find one, we extract time and charge information about the helicity bucket whose end is indicated by this helicity event. After the helicity event is added to the buffer, we also add a flag event to the buffer -- a fake event which is intended to contain information about the subsequent events, which have not yet been read. Since those events are not available yet, we keep track of where in the buffer the flag event is located and, when we reach the next helicity event, we modify the contents of the flag event to reflect the state of the beam current during this bucket. So every helicity bucket will be bracketed by a flag event at the beginning and the helicity event at the end. The flag event is just a precursor of the information that the helicity event will provide -- too late to change a linear analysis. This allows us to pass information to the replay engine about the data yet to come without modifying the experiment's data themselves.
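
The back-patching of the flag event could look roughly like this (again only a sketch with assumed names; the real flag event follows the CODA event format, which is not reproduced here):

#include <stddef.h>

enum { CURRENT_UNKNOWN, CURRENT_GOOD, CURRENT_BAD };

#define BUF_WORDS 65536

typedef struct {
    unsigned int data[BUF_WORDS];
    size_t       used;
} EventBuffer;

/* Buffer index of the status word of the most recent flag event, so it
 * can be rewritten once the bucket it opens has ended.                  */
static size_t pending_flag_word = (size_t)-1;

/* Minimal stand-in for inserting the fake flag event: here just a length
 * word plus a status word; the real event header differs.                */
static size_t append_flag_event(EventBuffer *b, unsigned int status)
{
    b->data[b->used++] = 2;              /* placeholder "length" word     */
    size_t pos = b->used;
    b->data[b->used++] = status;         /* status word, patched later    */
    return pos;
}

/* Right after a helicity event has been appended: insert a placeholder
 * flag for the NEXT bucket and remember where it sits in the buffer.    */
void open_next_bucket(EventBuffer *b)
{
    pending_flag_word = append_flag_event(b, CURRENT_UNKNOWN);
}

/* When the next helicity event arrives and the bucket's average current
 * has been evaluated: rewrite the placeholder in place.                 */
void close_bucket(EventBuffer *b, int current_was_good)
{
    if (pending_flag_word != (size_t)-1)
        b->data[pending_flag_word] = current_was_good ? CURRENT_GOOD
                                                      : CURRENT_BAD;
    pending_flag_word = (size_t)-1;
}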

The flag is also used to warn of another, previously discussed issue: the last helicity bucket in this sync interval most likely extends at least partially into the next interval. If that next interval has a sync error, this overlap bucket must be discarded as well. Therefore, the pointer to this flag is specially saved for that purpose. The flag event has different data words for different conditions, though the event ID is always the same.

Each helicity event contains the data from the respective helicity scalers, once as the total accumulated value and once as the most recently read data. Since the DAQ occasionally cannot read a helicity event as it occurs, successive events may be summed together into one. In that case, the last read is not useful, as it does not reflect the summed helicity events. The previous value of the accumulated scaler is therefore remembered by syncfilter, and at the next helicity event the difference between the new and the old accumulated values is used as the latest data.
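
In code, recovering the increment from the running total is just a difference of successive accumulated values (illustrative only, shown for a single scaler channel):

/* When CODA has summed several helicity buckets into one event, the "last
 * read" words are not meaningful; the increment is instead recovered from
 * the accumulated total.                                                   */
static unsigned long prev_total;   /* accumulated value at the previous helicity event */

unsigned long helicity_increment(unsigned long new_total)
{
    unsigned long delta = new_total - prev_total;   /* counts since the last helicity event */
    prev_total = new_total;
    return delta;
}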

Since syncfilter has to decode the beam current anyway, and therefore the incremental charge accumulation and the elapsed time, it also keeps track of the totals of these quantities. This is especially useful as the engine's processing of this information is not centralized and thus not quite as definite as syncfilter's. After the charge and the time information has been decoded and properly normalized, we take the ratio as the average current and, depending on the resulting value, consider this last helicity bucket (or the sum over several) to have good or bad current. If the current is good, we add the time and charge to the good subtotals; they are added to the overall subtotals in either case. At this time, syncfilter also updates the flag inserted into the buffer after the previous helicity event, thus indicating the state of the current for the helicity that just concluded.

When the buffer is swapped due to a scaler event, we also take the subtotals of time and charge of the helicity buckets and add them to the run's grand total sums. The "all" value is just the sum of the "all" subtotals, while the "good" grand total only includes "good" subtotals from sync intervals without synchronization problems. So we have one sum including everything and one sum including only "doubly good" charge and time, all separately for each beam helicity.
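
The subtotal and grand-total bookkeeping amounts to something like the following sketch (the two beam helicity states are indexed 0 and 1; the names are illustrative, not taken from the syncfilter source):

typedef struct { double time, charge; } Totals;

/* separate sums for each of the two beam helicity states */
static Totals all_sub[2],   good_sub[2];    /* subtotals for the current sync interval */
static Totals all_grand[2], good_grand[2];  /* grand totals for the run                */

/* At each helicity event: add the bucket's time and charge to the
 * subtotals; to the "good" subtotal only if the current was acceptable. */
void add_bucket(int hel, double dt, double dq, int good_current)
{
    all_sub[hel].time   += dt;
    all_sub[hel].charge += dq;
    if (good_current) {
        good_sub[hel].time   += dt;
        good_sub[hel].charge += dq;
    }
}

/* At each buffer swap: roll the subtotals into the grand totals; the
 * "good" grand total only accepts intervals without a sync error.       */
void roll_up(int sync_ok)
{
    for (int h = 0; h < 2; ++h) {
        all_grand[h].time   += all_sub[h].time;
        all_grand[h].charge += all_sub[h].charge;
        if (sync_ok) {
            good_grand[h].time   += good_sub[h].time;
            good_grand[h].charge += good_sub[h].charge;
        }
        all_sub[h]  = (Totals){0.0, 0.0};
        good_sub[h] = (Totals){0.0, 0.0};
    }
}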

 

Special Considerations
The syncfilter log is intended to be more hardware-centric than the replay log, and thus the two will report problems differently. Also, to improve readability, redundant duplicates are replaced with blanks where possible.

In addition to initialization issues, syncfilter is set up to handle a few special situations automatically. One of these is the occurrence of a sync event being bracketed by helicity events. This necessarily results in the second helicity event not having any charge accumulated and thus a bad-current flag. Since the handling of these flags in the engine was optimized for state changes, we simply identify these sequences and assign the last valid state to the redundant helicity event.

 

Usage Details
syncfilter has one input stream and three output streams: data in and out, plus the report/log and the status file. Where each stream goes is determined by two optional command line arguments: the first is the source of the data (input) and the second is the data output. Both are optional (you can't have the second without the first -- how would it tell it's the second one?!) and, if unspecified, they default to "-", which represents standard input or output, as the case may be. You can also explicitly specify "-" for the first argument so that the data source is standard input while the output goes to a file (the otherwise impossible case).
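
For illustration (the run number 12345 and the output file name data.out are hypothetical), the argument conventions above allow invocations such as:

syncfilter                              (data from standard input, data to standard output)
syncfilter e93026_12345.log             (data from the named file, data to standard output)
syncfilter e93026_12345.log data.out    (data from the named file, data to data.out)
syncfilter - data.out                   (data from standard input, data to data.out)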

The log is output either to standard error, if the data are not output to standard out, or to the local file syncfilter.log. The status file, always syncfilter.stat, merely contains the number of the input file segment currently being processed. This is needed so that our batch system can keep working: the engine's STATS report no longer knows about the file segment number, since the segment switching is now handled by syncfilter.

Since syncfilter is implemented as a filter, it has no concept of file names (aside from what it passes to the CSOFT routine to open the file) and can therefore not extract the run number therefrom. The output files will thus always have the standard names syncfilter.log and syncfilter.stat in the local replay directory. If they are to be preserved, they need to be moved or, better, renamed. There does not seem to be much point to preserving syncfilter.stat beyond the current analysis, though.

The beam current limit is actually checked against BOTH beam current monitors, and either one indicating a current above the limit is sufficient -- this is meant to preclude problems if one BCM fails. As of this writing, the BCM1 calibration constants are missing from the code and the BCM2 constants are used in their place. As a result, the current indicated by BCM1 is about 5% too low, but this is too small a deviation for the current limit to be materially affected.
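
In essence, the test reduces to an OR of the two monitors (a sketch only; the variable names and the limit value are not taken from the actual source):

/* A helicity bucket's current is accepted if EITHER beam current monitor
 * reads above the limit, so a single failed BCM cannot veto good beam.   */
int current_is_good(double bcm1_current, double bcm2_current, double limit)
{
    return (bcm1_current > limit) || (bcm2_current > limit);
}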

 

Limitations

The principal sources of problems with syncfilter, though fairly rare, are two: the buffer filling up before we encounter a scaler event, and a second scaler event arriving before we have had a helicity event, which would require a third, non-existent buffer. If need be, both cases can be accommodated by changes to the code -- the former simply by enlarging the buffer. However, neither ought to occur, and if these errors do appear there is good reason to suspect more serious problems with the data.

The latter case is partially addressed by merging the main (current) buffer onto the end of the old buffer -- provided there is sufficient room. This should account for the case of a sporadic extra sync event which occurs less than one helicity bucket after the last sync event. In that case there shouldn't be much data around; however, since we then have only one buffer holding both sync intervals' data, the entire buffer has to be declared bad if either of the merged sync intervals' data are doubtful.

The fact that CODA will at times merge multiple helicity buckets into a single event is also a potential issue. It might even be worth considering discarding the data if the number of helicity buckets summed together gets too large, as the average current becomes less meaningful. The event does include information on the number of buckets summed, and syncfilter checks this count against the accumulated time to verify that the numbers are consistent. This is important mostly because the buckets are summed separately for the two helicity states while the count of summed buckets is only the total. Also, the more buckets get summed together, the greater the chance that a sync event occurs in between. This would require the buffer not to be flushed, the situation described above.

If the engine crashes, syncfilter is automatically terminated and no summary can be output. Occasionally, for unknown reasons, engine exits too quickly for syncfilter to finish outputting its report summary as it is (again) terminated. Some information may then be extracted from the replay log as special coded events contain some of the data. This functionality could possibly be extended to obviate the syncfilter log entirely and maybe even the syncfilter status file.

 

frw June, 2002