Monitoring and Alert Systems

Revision as of 16:16, 5 May 2022 by Brads

Munin

MUNIN, an open-source, general-purpose monitoring system, is used to track and monitor a wide variety of systems across Hall C.

Run "go_munin" on a Hall C computer to bring up the monitoring graphs, or connect to https://hallcweb.jlab.org/munin/ directly.

Monitored systems presently include:

Deployment

  • The primary MUNIN server runs on cvideo1.jlab.org, and MUNIN clients run on most Linux hosts in the Hall. MUNIN is a 'pluggable' system that can be extended with scripts that deliver data to the software in a standardized format. See the MUNIN documentation for details.
  • The Hall C Puppet system automatically deploys the munin client on new hosts, but those hosts must be added manually to the server config under 'cvideo1:/etc/munin/conf.d/'.
  • Members of the (local) 'munin' Unix group on cvideo1 can modify aspects of the configuration under /etc/munin/conf.d/.
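
As a rough illustration of the 'standardized format' mentioned above, here is a minimal sketch of a MUNIN plugin in Python. It assumes the usual plugin convention: when the script is called with the argument "config" it prints graph metadata, and otherwise it prints the current reading as "<field>.value <number>". The field name "load", the graph title, and the choice of load average as the measured quantity are all hypothetical. This is not one of the plugins actually deployed in Hall C.

```python
#!/usr/bin/env python3
"""Hypothetical MUNIN plugin sketch: reports the 1-minute load average."""
import os
import sys


def config_lines():
    # Graph metadata, printed when the plugin is called with "config".
    # Title, label, and category are illustrative placeholders.
    return [
        "graph_title Example load average",
        "graph_vlabel load",
        "graph_category system",
        "load.label 1-minute load",
    ]


def value_lines(load=None):
    # Current reading, emitted as "<field>.value <number>".
    if load is None:
        load = os.getloadavg()[0]  # 1-minute load average (Unix only)
    return [f"load.value {load:.2f}"]


def main(argv):
    wants_config = len(argv) > 1 and argv[1] == "config"
    lines = config_lines() if wants_config else value_lines()
    print("\n".join(lines))


if __name__ == "__main__":
    main(sys.argv)
```

Running the script with no argument prints a single "load.value" line, and running it with "config" prints the graph description. Consult the MUNIN documentation for how a plugin is installed and enabled on a client.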


Alerts

Munin can be configured to send notifications via email and/or text message if a monitored value exceeds its threshold. Email notifications can repeat every 5 minutes until the problem is addressed, so setting up appropriate filtering or redirection in your mail client is recommended.
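
To make the threshold idea concrete, the sketch below checks a value against a limit written in the "min:max" style that Munin uses for a field's warning and critical settings, where either side may be left empty. It assumes that a bare number is treated as an upper limit. The function names are hypothetical, and this is an illustration of the logic only, not Munin's own implementation and not the thresholds configured in Hall C.

```python
def parse_range(spec):
    """Split a 'min:max' limit into (low, high); None means unbounded."""
    if ":" in spec:
        low, high = spec.split(":", 1)
    else:
        low, high = "", spec  # assumption: a bare number is an upper limit
    return (float(low) if low else None, float(high) if high else None)


def out_of_range(value, spec):
    """Return True if value falls outside the allowed range in spec."""
    low, high = parse_range(spec)
    too_low = low is not None and value < low
    too_high = high is not None and value > high
    return too_low or too_high


if __name__ == "__main__":
    # Illustrative readings checked against illustrative limits.
    for reading, limit in [(95, ":92"), (50, "10:90"), (5, "10:")]:
        state = "ALERT" if out_of_range(reading, limit) else "ok"
        print(f"value={reading} limit={limit!r} -> {state}")
```

A value that sits exactly on a limit counts as in range in this sketch.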