Tuesday, May 26, 2015

Using M/Monit Safely with AWS

In my main role, I am CTO of BitMEX.com, an up-and-coming Bitcoin derivatives exchange. We run on AWS, and we have a number of isolated hosts and a very restrictive and partitioned network.

At BitMEX, we are wary of using any monitoring platforms that could cause us to lose control. This means staying away from as much closed-source software as possible, and only using tried-and-true tools.

One of our favorite system monitoring tools is the aptly named Monit. Monit is great at keeping on top of your processes in more advanced ways than simple pid-checking; it can check file permissions, send requests to the monitored process on a port (even over SSL), and check the responses. It can do full-system monitoring (CPU, disk, memory) and has an easy-to-use config format for all of it.

Sounds good, right? Well, as nice as it is, we need to get notifications when a system fails - and fast. We configured all of our servers to send mail to a reserved email address that would follow certain rules, post a support ticket to our usual system, and ping Slack. But outbound email from EC2 is notoriously slow: AWS throttles it because EC2 could so easily be used for spam.

We found our postfix queue getting 30 minutes deep or more because of AWS's throttling. We use external monitoring tools, so it's not always a problem; we generally find out about an outage within about 30 seconds. But if there is a more subtle problem (like a server running out of inodes or RAM) and the alert takes 30 minutes to arrive, that delay is a problem in itself.

I started reconfiguring our monit instances to simply curl a webhook so we'd get Slack notifications right away. That's good, but any change in the hook and I'd need to log into every single server and reconfigure. Besides, I could really use a full-system dashboard. What to do?
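The webhook script itself is not reproduced in this post, so here is a minimal stand-in sketch in Python rather than the original. It assumes the caller exports the MONIT_HOST, MONIT_SERVICE, MONIT_EVENT and MONIT_DESCRIPTION environment variables (Monit documents these for programs it runs via exec; check your version's manual). SLACK_WEBHOOK_URL is a hypothetical variable name chosen for this example.

```python
#!/usr/bin/env python3
"""Post a Monit event to a Slack incoming webhook (illustrative sketch)."""
import json
import os
import sys
import urllib.request


def build_payload(env):
    """Turn MONIT_* environment variables into a Slack message payload."""
    host = env.get("MONIT_HOST", "unknown host")
    service = env.get("MONIT_SERVICE", "unknown service")
    event = env.get("MONIT_EVENT", "unknown event")
    description = env.get("MONIT_DESCRIPTION", "")
    text = f"[{host}] {service}: {event}"
    if description:
        text += f" ({description})"
    return {"text": text}


def post_to_slack(webhook_url, payload, timeout=10):
    """POST the payload as JSON and return the HTTP status code."""
    request = urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


if __name__ == "__main__":
    # SLACK_WEBHOOK_URL is an assumed name, not something Monit provides.
    url = os.environ.get("SLACK_WEBHOOK_URL")
    if url:
        post_to_slack(url, build_payload(os.environ))
    else:
        print("SLACK_WEBHOOK_URL is not set; nothing sent", file=sys.stderr)
```

Keeping the message formatting in build_payload, separate from the HTTP call, means the wording can be checked without touching the network. The hook URL still lives on whichever machine runs the script, which is exactly the maintenance problem described above when that machine is every server.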


M/Monit solves this problem. It provides a simple (yeah, not the prettiest, but functional) dashboard for all hosts, detailed analytics, and configurable alerts. Now, I can just configure all of my hosts to alert on various criteria, and configure M/Monit's Alert Rules to take care of figuring out how severe it is, who should be notified, and how.

M/Monit supports actually starting, stopping, and restarting services via the dashboard, but that requires M/Monit to have the capability to connect to each server's Monit (httpd) server. I didn't want this; setting it up correctly is a pain (have fun generating separate credentials per-server for SSL) and it's a potential DoS vector. Thankfully, M/Monit runs just fine if you don't open the port.

Setting up M/Monit is pretty easy and I don't think I need to go through it; I recommend using SSL though, even inside your private network.

Here's how I set it up on each server (assuming existing monit configs):

set mmonit https://<user>:<pass>@<host>:<port>/collector 
  and register without credentials

That's it. Just reload the monit service and it'll start uploading data.

What's the credentials bit? Well, Monit supports automatically creating credentials at first start that M/Monit can use for controlling processes. But I didn't want to support that anyway, so I added this line to disable it. It's not a big deal if you omit it, as long as the control port is left closed; M/Monit just won't be able to connect, but it will continue monitoring properly.

With the hosts reporting in, I took the same Slack webhook script I had been calling from each host and used it as an alert action inside M/Monit (Admin -> Alerts), and it happily started sending instant notifications to Slack. You can configure just about any type of webhook, mail notification, whatever.

It took surprisingly little time to get this right, and the team behind Monit deserves a lot of credit for this.


Comments:

  1. The function is simply necessary to monitor actions in the process. For me it is simply irreplaceable and convenient.

  2. M/Monit version solve a wide range of problems that seem to be inevitable to be sorted out by other versions. Moreover, it is very convenient in usage.