Difference between revisions of "Monitoring and Alert Systems"
From HallCWiki
Revision as of 16:19, 5 May 2022
Monitoring and Alert Systems
EPICS Alarm Handler
During production the EPICS alarm handler should always be running.
- See Hall C Alarm Handler and Hall C EPICS
Munin
MUNIN is an open-source, general-purpose monitoring system used to track a large variety of systems across Hall C. This service is always running and is independent of the EPICS infrastructure, other than using EPICS PVs as a data source for some systems.
Run "go_munin" on a Hall C computer to bring up the monitoring graphs, or connect to https://hallcweb.jlab.org/munin/ directly.
Monitored systems presently include:
- The majority of Hall C Linux hosts
- Gas system flows, temperatures, and pressures (Gas Shed, Hall A GEM gas)
- HVAC status in the G0 cage (where HV crates and other critical systems reside)
- DAQ crate power and temperature information
Deployment
- The primary MUNIN server runs on cvideo1.jlab.org, but MUNIN clients run on the majority of Linux hosts in the Hall. MUNIN is a 'pluggable' system that can be broadly extended with scripts that deliver data to the software in a standardized format. See the documentation on MUNIN for details.
- The Hall C Puppet system automatically deploys the munin client on new hosts, but those hosts must be manually added to the server config under 'cvideo1:/etc/munin/conf.d/'.
- Aspects of the configuration can be modified under /etc/munin/conf.d/ on cvideo1 if you are in the (local) 'munin' Unix group.
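
As an illustration of the 'pluggable' format mentioned above, here is a minimal sketch of a MUNIN plugin written in Python. A plugin is an executable that prints graph metadata when called with the argument "config" and prints one "<field>.value <number>" line per field otherwise. The plugin name, the field name "temp", and the read_temperature_c() placeholder are hypothetical and are not part of the Hall C deployment; a real plugin would query a sensor or an EPICS PV at that point.

```python
#!/usr/bin/env python3
"""Minimal Munin plugin sketch (hypothetical example, not a Hall C plugin)."""
import sys


def read_temperature_c():
    """Placeholder data source (assumption): returns a fixed reading."""
    return 21.5


def config_lines():
    """Lines printed when the plugin is run with the 'config' argument."""
    return [
        "graph_title Example temperature",
        "graph_vlabel degrees C",
        "graph_category sensors",
        "temp.label Temperature",
    ]


def fetch_lines():
    """Lines printed on a normal run: one '<field>.value <number>' per field."""
    return [f"temp.value {read_temperature_c()}"]


def main(argv):
    """Print the config block or the current values, depending on argv."""
    wants_config = len(argv) > 1 and argv[1] == "config"
    lines = config_lines() if wants_config else fetch_lines()
    print("\n".join(lines))


if __name__ == "__main__":
    main(sys.argv)
```

Run with the "config" argument, the script prints the graph title, axis label, category, and field label; run with no arguments, it prints the current value of the "temp" field.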
Alerts
Munin can be configured to send notifications via email and/or text message if a monitored value exceeds its threshold. Email notifications can recur every 5 minutes until the problem is addressed, so setting up appropriate filtering or redirection in your mail client is recommended.
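
To make the threshold idea concrete, the following Python sketch classifies a reading against warning and critical limits. It assumes the "min:max" range notation described in the Munin documentation for the per-field "warning" and "critical" settings, where "min:", ":max", and a bare number (treated as a maximum) are also accepted. It is an illustration of the semantics only, not Munin's own implementation, and the function names are hypothetical; check the notation against the installed Munin version before relying on it.

```python
def parse_range(spec):
    """Split a 'min:max' style threshold into (low, high); None means unbounded."""
    if ":" in spec:
        low, high = spec.split(":", 1)
    else:
        low, high = "", spec  # assumption: a bare number is the maximum
    return (float(low) if low else None, float(high) if high else None)


def out_of_range(value, spec):
    """Return True if value lies below the minimum or above the maximum."""
    low, high = parse_range(spec)
    too_low = low is not None and value < low
    too_high = high is not None and value > high
    return too_low or too_high


def classify(value, warning=None, critical=None):
    """Return 'critical', 'warning', or 'ok'; critical takes precedence."""
    if critical is not None and out_of_range(value, critical):
        return "critical"
    if warning is not None and out_of_range(value, warning):
        return "warning"
    return "ok"
```

For example, with a warning limit of "30" and a critical limit of "40", a reading of 35 is classified as a warning, while a reading of 45 is classified as critical.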