Monitoring and Alert Systems
EPICS Alarm Handler
During production the EPICS alarm handler should always be running.
- See Hall C Alarm Handler and Hall C EPICS
Munin
MUNIN is an open-source, general-purpose monitoring system used to track a large variety of systems across Hall C. This service is always running and is unrelated to the EPICS infrastructure (other than using EPICS PVs as a data source for some systems).
Run "go_munin" on a Hall C computer to bring up the monitoring graphs, or connect to https://hallcweb.jlab.org/munin/ directly.
Monitored systems presently include:
- The majority of Hall C linux hosts
- Gas system flows, temperatures, and pressures (Gas Shed, Hall A GEM gas)
- HVAC status in G0 cage (where HV crates and other critical systems reside)
- DAQ crate power and temperature information
Deployment
- The primary MUNIN server runs on cvideo1.jlab.org, but MUNIN clients run on the majority of linux hosts in the Hall. MUNIN is a 'pluggable' system that can be broadly extended with scripts that deliver data to the software in a standardized format. See the documentation on MUNIN for details.
- The Hall C Puppet system automatically deploys the munin client on new hosts, but those hosts must be manually added to the server config under 'cvideo1:/etc/munin/conf.d/'
- Aspects of the configuration can be modified under /etc/munin/conf.d/ if you are in the (local) 'munin' unix group on cvideo1.
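As an illustration of the 'pluggable' design described above, the following is a minimal sketch of a Munin plugin written in Python. It is not one of the plugins deployed in Hall C: the field names (flow_a, flow_b), the graph title, and the hard-coded readings are hypothetical placeholders. The general convention it follows is that a plugin prints graph and field metadata when called with the argument "config", and prints one "<field>.value <number>" line per field otherwise.

```python
#!/usr/bin/env python3
"""Minimal Munin plugin sketch (hypothetical example, not a deployed Hall C plugin)."""
import sys


def read_values():
    # Hypothetical readings. A real plugin would query its data source here
    # (for example an EPICS PV or a hardware sensor).
    return {"flow_a": 1.25, "flow_b": 0.80}


def config_lines():
    # Metadata Munin requests by running the plugin with the "config" argument.
    lines = [
        "graph_title Example gas flow",
        "graph_vlabel flow (arbitrary units)",
        "graph_category sensors",
    ]
    for name in read_values():
        lines.append(f"{name}.label {name}")
    return lines


def value_lines():
    # Current readings, one "<field>.value <number>" line per field.
    return [f"{name}.value {val}" for name, val in read_values().items()]


def main(argv):
    # "config" selects the metadata output; any other call reports values.
    wants_config = len(argv) > 1 and argv[1] == "config"
    lines = config_lines() if wants_config else value_lines()
    print("\n".join(lines))


if __name__ == "__main__":
    main(sys.argv)
```

Running the script with no arguments prints the current values, and running it with "config" prints the graph description. How such a script is installed and enabled on a given host is covered in the MUNIN documentation.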
Alerts
Munin can be configured to send notifications via email and/or text message if a monitored value exceeds its configured threshold. Email notifications can recur every 5 minutes until the problem is addressed, so appropriate filtering/redirection in your mail client is recommended.
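The threshold logic can be sketched as follows. Munin expresses warning and critical limits as ranges of the form "min:max", where either side may be left empty and a bare number is treated as an upper limit. The code below is only an illustration of that comparison, not Munin's own implementation, and the example limits and readings are hypothetical.

```python
from typing import Optional, Tuple


def parse_range(spec: str) -> Tuple[Optional[float], Optional[float]]:
    """Parse a Munin-style 'min:max' range; a bare number is an upper limit."""
    if ":" in spec:
        low, high = spec.split(":", 1)
    else:
        low, high = "", spec
    return (
        float(low) if low.strip() else None,
        float(high) if high.strip() else None,
    )


def out_of_range(value: float, spec: str) -> bool:
    """Return True if value lies outside the allowed range."""
    low, high = parse_range(spec)
    if low is not None and value < low:
        return True
    if high is not None and value > high:
        return True
    return False


def classify(value: float, warning: str, critical: str) -> str:
    """Check the critical range first, then the warning range."""
    if out_of_range(value, critical):
        return "CRITICAL"
    if out_of_range(value, warning):
        return "WARNING"
    return "OK"


# Hypothetical limits for a temperature reading in degrees C.
for reading in (22.0, 32.0, 40.0):
    print(reading, classify(reading, warning="15:30", critical="10:35"))
```

A reading that stays in the WARNING or CRITICAL state is what would produce the repeated notifications described above.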
Notifications are sent to the HallC_Alarm_Notifications Mailing List. Subscribe if you wish to see them.
NOTE: This can be a very high-volume list when things go south in the Hall. It is strongly recommended to configure your mail reader/system to filter messages from that list into a dedicated folder.