this post was submitted on 14 Jun 2023
2 points (100.0% liked)

Sysadmin


A community dedicated to the profession of IT Systems Administration

founded 5 years ago

cross-posted from: https://lemmy.cloudhub.social/post/14149

What's everyone using for status monitoring and/or status pages either in their lab or at work?

I set up a status page for my fediverse instances using Uptime Robot (I have an existing subscription), and the features are kinda lacking. I feel like they haven't really updated anything in the last 5 years, which is unfortunate.

[–] redcalcium@c.calciumlabs.com 1 points 1 year ago* (last edited 1 year ago)

I run Vigil in a separate small VM. Vigil's features really suit my needs:

  • when a service goes down, it first notifies you via email (I use Mailgun). If the service stays down for an extended period, it starts texting you (via Twilio).
  • it has the usual ping/HTTP checks to see if your services are up
  • it can even monitor services that aren't reachable from the Vigil instance (e.g. services that are only accessible from the local network) by using Agents. In addition to ping/HTTP checks, an agent can also run arbitrary commands, so it can basically be used to monitor the uptime of anything
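
For anyone curious what the "email first, text later" behaviour boils down to, here is a rough Python sketch of that kind of escalation. This is not Vigil's code or config; the names (`http_is_up`, `Escalation`, `SMS_AFTER_SECONDS`) and the 10-minute threshold are made up for illustration, and the returned strings stand in for real Mailgun/Twilio calls.

```python
from dataclasses import dataclass
from typing import Optional
import urllib.error
import urllib.request

# Hypothetical threshold: escalate to SMS after 10 minutes of downtime.
SMS_AFTER_SECONDS = 600


def http_is_up(url: str, timeout: float = 5.0) -> bool:
    """Return True if the URL answers with a 2xx/3xx status within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 400
    except (urllib.error.URLError, OSError, ValueError):
        # Covers HTTP 4xx/5xx, connection errors, timeouts and malformed URLs.
        return False


@dataclass
class Escalation:
    down_since: Optional[float] = None  # timestamp of the first failed check
    email_sent: bool = False
    sms_sent: bool = False

    def update(self, is_up: bool, now: float) -> list[str]:
        """Feed one check result; return the notifications to send right now."""
        if is_up:
            recovered = self.down_since is not None
            self.down_since, self.email_sent, self.sms_sent = None, False, False
            return ["email:HEALTHY"] if recovered else []
        actions = []
        if self.down_since is None:
            self.down_since = now
        if not self.email_sent:
            self.email_sent = True
            actions.append("email:DEAD")
        # Only text once the outage has lasted long enough.
        if not self.sms_sent and now - self.down_since >= SMS_AFTER_SECONDS:
            self.sms_sent = True
            actions.append("sms:DEAD")
        return actions
```

In a real loop you would call something like `state.update(http_is_up(url), time.time())` every minute or so and hand the returned actions to whatever sends the email or SMS.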

The drawback of running your own monitoring service is that the monitoring service itself can go down. It has happened to me several times, and each time I was spammed with DEAD alert emails, each immediately followed by a HEALTHY alert email.
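
One generic way to cut down on that kind of DEAD/HEALTHY ping-pong is to only announce a state change after several consecutive checks agree. A minimal sketch of the idea, assuming a made-up `Debounced` helper and a threshold of 3; it is not something Vigil exposes under this name, just the general technique:

```python
class Debounced:
    """Report a state change only after `threshold` consecutive disagreeing results."""

    def __init__(self, threshold: int = 3, initial_up: bool = True):
        self.threshold = threshold
        self.reported_up = initial_up  # last state that was announced
        self.streak = 0                # consecutive results disagreeing with it

    def observe(self, is_up: bool):
        """Return "DEAD" or "HEALTHY" when the announced state flips, else None."""
        if is_up == self.reported_up:
            self.streak = 0  # a single blip resets the count
            return None
        self.streak += 1
        if self.streak < self.threshold:
            return None
        self.reported_up = is_up
        self.streak = 0
        return "HEALTHY" if is_up else "DEAD"
```

The trade-off is slower alerts: with a 60-second check interval and a threshold of 3, a real outage takes about three minutes to be announced.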

Edit: now that I think about it, I'll probably need to add my monitoring service into a monitoring service to monitor the monitoring service's uptime.
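
A cheap version of "monitoring the monitor" is a heartbeat (dead man's switch): the monitoring VM checks in regularly with something running elsewhere, and that something complains when the check-ins stop. A minimal sketch of the staleness logic, where `HeartbeatWatch` and the 180-second limit are hypothetical and the transport (HTTP ping, cron job, hosted service) is left out:

```python
import time
from typing import Optional


class HeartbeatWatch:
    """Runs outside the monitoring VM and tracks when it last checked in."""

    def __init__(self, max_silence: float = 180.0):
        self.max_silence = max_silence          # seconds allowed between beats
        self.last_beat: Optional[float] = None  # no heartbeat seen yet

    def beat(self, now: Optional[float] = None) -> None:
        """Record a heartbeat from the monitoring service."""
        self.last_beat = time.time() if now is None else now

    def monitor_is_alive(self, now: Optional[float] = None) -> bool:
        """True only if a heartbeat arrived within the last `max_silence` seconds."""
        now = time.time() if now is None else now
        return self.last_beat is not None and now - self.last_beat <= self.max_silence
```

The watcher has to live on different infrastructure than the monitoring VM, otherwise both go down together and nobody gets told.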