How to monitor delayed_job with monit

Posted by: admin November 30, 2017

Questions:

Are there any examples on the web of how to monitor delayed_job with Monit?

Everything I can find uses God, but I refuse to use God since long running processes in Ruby generally suck. (The most current post in the God mailing list? God Memory Usage Grows Steadily.)

Update: delayed_job now comes with a sample monit config based on this question.

Answers:

Here is how I got this working.

  1. Use the collectiveidea fork of delayed_job. Besides being actively maintained, this version has a nice script/delayed_job daemon you can use with monit. Railscasts has a good episode about this version of delayed_job (ASCIICasts version). This script also has some other nice features, like the ability to run multiple workers. I don’t cover that here.
  2. Install monit. I installed from source because Ubuntu’s version is so ridiculously out of date. I followed these instructions to get the standard init.d scripts that come with the Ubuntu packages. I also needed to configure with ./configure --sysconfdir=/etc/monit so the standard Ubuntu configuration dir was picked up.
  3. Write a monit script. Here’s what I came up with:

    check process delayed_job with pidfile /var/www/app/shared/pids/delayed_job.pid
    start program = "/var/www/app/current/script/delayed_job -e production start"
    stop program = "/var/www/app/current/script/delayed_job -e production stop"

    I store this in my source control system and point monit at it with include /var/www/app/current/config/monit in the /etc/monit/monitrc file.

  4. Configure monit. These instructions are laden with ads but otherwise OK.
  5. Write a Capistrano task to stop and start delayed_job: monit stop delayed_job and monit start delayed_job are the commands to run. I also reload monit when deploying to pick up any config file changes.
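Because the monit config lives in source control (step 3) and monit is reloaded on deploy (step 5), a broken config in the repository could take monitoring down. A minimal sketch of a safer reload step, assuming a shell-based deploy hook: the function name reload_monit_config and the MONIT override variable are hypothetical, while monit -t (syntax check of the control file) and monit reload are standard monit commands.

```shell
#!/bin/sh
# Hypothetical deploy hook. MONIT can be overridden, e.g. MONIT="sudo monit"
# when the deploy user cannot read /etc/monit/monitrc.
MONIT=${MONIT:-monit}

reload_monit_config() {
  # "monit -t" only parses the control file (including its include
  # directives) and does not touch the running daemon.
  if $MONIT -t; then
    $MONIT reload
  else
    echo "monit config failed syntax check; not reloading" >&2
    return 1
  fi
}
```

A Capistrano task would call this on the app server before monit start delayed_job.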

Problems I ran into:

  1. The daemons gem must be installed for script/delayed_job to run.
  2. You must pass the Rails environment to script/delayed_job with -e production (for example). This is documented in the README file but not in the script’s help output.
  3. I use Ruby Enterprise Edition, so I needed to get monit to start with that copy of Ruby. Because of the way sudo handles the PATH in Ubuntu, I ended up symlinking /usr/bin/ruby and /usr/bin/gem to the REE versions.

When debugging monit, I found it helps to stop the init.d version and run it from the command line, so you can see error messages. Otherwise it is very difficult to figure out what is going wrong.

sudo /etc/init.d/monit stop
sudo monit start delayed_job
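Many of the failures described on this page come down to the environment monit gives its start and stop programs, which is much barer than a login shell. A minimal sketch for reproducing that from the command line, assuming the "spartan path" quoted in another answer on this page (/bin:/usr/bin:/sbin:/usr/sbin); the function name run_like_monit is hypothetical, and real monit also sets a few variables of its own that this does not reproduce.

```shell
#!/bin/sh
# Approximate monit's program environment: nothing inherited except a
# minimal PATH. (Assumption: monit's own extra variables are omitted.)
MONIT_PATH=/bin:/usr/bin:/sbin:/usr/sbin

run_like_monit() {
  # env -i clears the inherited environment before running "$@"
  env -i PATH="$MONIT_PATH" "$@"
}
```

For example, run_like_monit /var/www/app/current/script/delayed_job -e production start. If this fails while the same command works from your shell, the difference is usually PATH, HOME or RAILS_ENV.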

Hopefully this helps the next person who wants to monitor delayed_job with monit.

Answers:

For what it’s worth, you can always use /usr/bin/env with monit to set up the environment. This is especially important in the current version of delayed_job, 1.8.4, where the environment (-e) option is deprecated.

check process delayed_job with pidfile /var/app/shared/pids/delayed_job.pid
start program = "/usr/bin/env RAILS_ENV=production /var/app/current/script/delayed_job start"
stop  program = "/usr/bin/env RAILS_ENV=production /var/app/current/script/delayed_job stop"

In some cases, you may also need to set PATH with env.

Answers:

I found it easier to create an init script for delayed_job. It is available at http://gist.github.com/408929 and reproduced below:

#! /bin/sh
set_path="cd /home/rails/evatool_staging/current"

case "$1" in
  start)
    echo -n "Starting delayed_job: "
    su - rails -c "$set_path; RAILS_ENV=staging script/delayed_job start" >> /var/log/delayed_job.log 2>&1
    echo "done."
    ;;
  stop)
    echo -n "Stopping delayed_job: "
    su - rails -c "$set_path; RAILS_ENV=staging script/delayed_job stop" >> /var/log/delayed_job.log 2>&1
    echo "done."
    ;;
  *)
    N=/etc/init.d/delayed_job_staging
    echo "Usage: $N {start|stop}" >&2
    exit 1
    ;;
esac

exit 0

Then point monit at the init script in your monitrc file so it can start and restart delayed_job:

check process delayed_job with pidfile "/path_to_my_rails_app/shared/pids/delayed_job.pid"
start program = "/etc/init.d/delayed_job start"
stop program = "/etc/init.d/delayed_job stop"

and that works great!

Answers:

I found a nice way to start delayed_job with cron on boot. I’m using whenever to control cron.

My schedule.rb:

# custom job type to control delayed_job
job_type :delayed_job, 'cd :path;RAILS_ENV=:environment script/delayed_job ":task"'

# delayed job start on boot
every :reboot do
  delayed_job "start"
end

Note: I upgraded the whenever gem to version 0.5.0 to be able to use job_type.

Answers:

I don’t know about Monit, but I’ve written a couple of Munin plugins to monitor queue size and average job run time. The changes I made to delayed_job in that patch might also make it easier for you to write Monit plugins if you stick with that.

Answers:

Thanks for the script.

One gotcha: monit, by design, runs programs with a ‘spartan path’ of

/bin:/usr/bin:/sbin:/usr/sbin

Since my ruby was installed/linked in /usr/local/bin, I thrashed around for hours trying to figure out why monit was silently failing to restart delayed_job (even with -v for monit’s verbose mode).

In the end I had to do this:

check process delayed_job with pidfile /var/www/app/shared/pids/delayed_job.pid
start program = "/usr/bin/env PATH=$PATH:/usr/local/bin /var/www/app/current/script/delayed_job -e production start"
stop program = "/usr/bin/env PATH=$PATH:/usr/local/bin /var/www/app/current/script/delayed_job -e production stop"
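Before adding /usr/local/bin (or any other directory) to the start program, it is worth checking which of the commands delayed_job depends on can actually be found under the spartan path. A minimal sketch; the function name resolve_under_monit is hypothetical, and command -v is the POSIX way to ask the shell where a command resolves.

```shell
#!/bin/sh
# Print where a command resolves under monit's spartan PATH and succeed,
# or print nothing and fail if it cannot be found there.
resolve_under_monit() {
  # "$1" is passed as a positional argument so it is not re-parsed
  env -i PATH=/bin:/usr/bin:/sbin:/usr/sbin /bin/sh -c 'command -v "$1"' sh "$1"
}
```

For example, resolve_under_monit ruby || echo "ruby is not on monit's PATH" shows straight away whether the script/delayed_job shebang will find ruby when monit runs it.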

Answers:

If your monit is running as root and you want to run delayed_job as my_user then do this:

/etc/init.d/delayed_job:

#!/bin/sh
#   chmod 755 /etc/init.d/delayed_job
#   chown root:root /etc/init.d/delayed_job

case "$1" in
  start|stop|restart)
    DJ_CMD=$1
    ;;
  *)
    echo "Usage: $0 {start|stop|restart}"
    exit 1
    ;;
esac

su -c "cd /var/www/my_app/current && /usr/bin/env bin/delayed_job $DJ_CMD" - my_user

/var/www/my_app/shared/monit/delayed_job.monitrc:

check process delayed_job with pidfile /var/www/my_app/shared/tmp/pids/delayed_job.pid
start program = "/etc/init.d/delayed_job start"
stop  program = "/etc/init.d/delayed_job stop"
if 5 restarts within 5 cycles then timeout

/etc/monit/monitrc:

# add at bottom
include /var/www/my_app/shared/monit/*

Answers:

Since I didn’t want to run delayed_job as root, I ended up creating a bash init script that monit used for starting and stopping (PROGNAME would be the absolute path to script/delayed_job):

start() {
    echo "Starting $PROGNAME"
    sudo -u $USER /usr/bin/env HOME=$HOME RAILS_ENV=$RAILS_ENV $PROGNAME start
}

stop() {
    echo "Stopping $PROGNAME"
    sudo -u $USER /usr/bin/env HOME=$HOME RAILS_ENV=$RAILS_ENV $PROGNAME stop
}

Answers:

I had to combine the solutions on this page with another script made by toby to make it work with monit and start delayed_job as the right user.

So my delayed_job.monitrc looks like this:

check process delayed_job
  with pidfile /var/app/shared/pids/delayed_job.pid
  start program = "/bin/su -c '/usr/bin/env RAILS_ENV=production /var/app/current/script/delayed_job start' - rails"
  stop program = "/bin/su -c '/usr/bin/env RAILS_ENV=production /var/app/current/script/delayed_job stop' - rails"

Answers:

I have spent quite a bit of time on this topic. I was fed up with not having a good solution for it so I wrote the delayed_job_tracer plugin that specifically addresses monitoring of delayed_job and its jobs.

Here is an article I’ve written about it: http://modernagility.com/articles/5-monitoring-delayed_job-and-its-jobs

This plugin will monitor your delayed job process and send you an e-mail if delayed_job crashes or if one of its jobs fail.

Answers:

For Rails 3, you may need to set the HOME environment variable to make Compass work properly. The config below works for me:

check process delayed_job
  with pidfile /home/user/app/shared/pids/delayed_job.pid
  start program = "/bin/sh -c 'cd /home/user/app/current; HOME=/home/user RAILS_ENV=production script/delayed_job start'"
  stop program  = "/bin/sh -c 'cd /home/user/app/current; HOME=/home/user RAILS_ENV=production script/delayed_job stop'"

Answers:

I ran into an issue where, if a delayed_job worker dies while it still has a job locked, that job is never freed. I wrote a wrapper script around delayed_job that reads the pid file and frees any jobs locked by the dead worker.

The script is for rubber/capistrano.

Monit config template in roles/delayedjob/ (rendered to /etc/monit/monit.d/monit-delayedjob.conf):

<% @path = '/etc/monit/monit.d/monit-delayedjob.conf' %>
<% workers = 4 %>
<% workers.times do |i| %>
<% pidfile = "/mnt/#{rubber_env.app_name}-#{RUBBER_ENV}/shared/pids/delayed_job.#{i}.pid" %>
<%= "check process delayed_job.#{i} with pidfile #{pidfile}" %>
group delayed_job-<%= RUBBER_ENV %>
<%= " start program = \"/bin/bash /mnt/#{rubber_env.app_name}-#{RUBBER_ENV}/current/script/delayed_job_wrapper #{i} start\"" %>
<%= " stop program = \"/bin/bash /mnt/#{rubber_env.app_name}-#{RUBBER_ENV}/current/script/delayed_job_wrapper #{i} stop\"" %>
<% end %>

roles/delayedjob/delayed_job_wrapper:

#!/bin/bash
<%   @path = "/mnt/#{rubber_env.app_name}-#{RUBBER_ENV}/current/script/delayed_job_wrapper" %>

<%= "pid_file=/mnt/#{rubber_env.app_name}-#{RUBBER_ENV}/shared/pids/delayed_job.$1.pid" %>
if [ -e "$pid_file" ]; then
  pid=`cat "$pid_file"`
  if [ "$2" == "start" ]; then
    # kill -0 sends no signal; it only checks whether $pid is still running
    if kill -0 "$pid" 2>/dev/null; then
      echo "already running $pid"
      exit
    fi
    rm "$pid_file"
  fi

  locked_by="delayed_job.$1 host:`hostname` pid:$pid"

<%="   /usr/bin/mysql -e \"update delayed_jobs set locked_at = null, locked_by = null where locked_by='$locked_by'\" -u#{rubber_env.db_user} -h#{rubber_instances.for_role('db', 'primary' => true).first.full_name}  #{rubber_env.db_name} " %>

fi
<%= "cd /mnt/#{rubber_env.app_name}-#{RUBBER_ENV}/current" %>

. /etc/profile
<%= "RAILS_ENV=#{RUBBER_ENV} script/delayed_job -i $1 $2"%>
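The wrapper hinges on one question: does the pid in the pidfile still belong to a live process? A standalone sketch of that check, with the assumption that it runs as root (as monit does here) or as the worker's user, because kill -0 also fails for processes owned by another user and this sketch would then report them as dead. The function name pidfile_alive is hypothetical.

```shell
#!/bin/sh
# Succeed if the pidfile exists and names a running process.
# Caveat: kill -0 fails (EPERM) for other users' processes, so run
# this as root or as the delayed_job user.
pidfile_alive() {
  [ -r "$1" ] || return 1
  pid=$(cat "$1")
  # Reject empty or non-numeric contents before handing them to kill.
  case $pid in
    ''|*[!0-9]*) return 1 ;;
  esac
  # Signal 0 is never delivered; kill only checks that the process exists.
  kill -0 "$pid" 2>/dev/null
}
```

Like the ps-based test it replaces, this cannot tell a live worker from an unrelated process that has reused the same pid.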

Answers:

To see what is going on, run monit in the foreground in verbose mode: sudo monit -Iv

This setup uses RVM installed under user “www1” and group “www1”.

In /etc/monit/monitrc:

#delayed_job
check process delayed_job with pidfile /home/www1/your_app/current/tmp/pids/delayed_job.pid
    start program "/bin/bash -c 'PATH=$PATH:/home/www1/.rvm/bin;source /home/www1/.rvm/scripts/rvm;cd /home/www1/your_app/current;RAILS_ENV=production bundle exec script/delayed_job start'" as uid www1 and gid www1
    stop program "/bin/bash -c 'PATH=$PATH:/home/www1/.rvm/bin;source /home/www1/.rvm/scripts/rvm;cd /home/www1/your_app/current;RAILS_ENV=production bundle exec script/delayed_job stop'" as uid www1 and gid www1
    if totalmem is greater than 200 MB for 2 cycles then alert
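To choose a totalmem threshold like the 200 MB above from real numbers rather than a guess, you can look at what the worker currently uses. A minimal sketch, assuming a ps that supports -o rss= -p (procps on Linux and the BSD ps both do) and reports RSS in kilobytes. Note that monit's totalmem also counts child processes, while this reports only the process named in the pidfile. The function name pidfile_rss_mb is hypothetical.

```shell
#!/bin/sh
# Print the resident memory (MB, rounded down) of the process whose pid
# is stored in the given pidfile. Children are not counted, unlike
# monit's totalmem.
pidfile_rss_mb() {
  pid=$(cat "$1" 2>/dev/null) || return 1
  # "rss=" prints the value in kilobytes with no header; strip padding.
  kb=$(ps -o rss= -p "$pid" 2>/dev/null | tr -d ' ')
  [ -n "$kb" ] || return 1
  echo $(( kb / 1024 ))
}
```

For example, pidfile_rss_mb /home/www1/your_app/current/tmp/pids/delayed_job.pid, run a few times while jobs are processing.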