docs/scheduling.txt

   1 Advanced usage and scheduling notes
   2 ===================================
   3
   4 upsmon can call out to a helper script or program when the device changes
   5 state.  The example upsmon.conf has a full list of which state changes
   6 are available - ONLINE, ONBATT, LOWBATT, and more.
   7
   8 There are two options, each presented in detail below:
   9
  10 - the simple approach: create your own helper, and manage all events and actions
  11 yourself,
  12 - the advanced approach: use the NUT provided helper, called 'upssched'.
  13
  14
  15 The simple approach, using your own script
  16 ------------------------------------------
  17
  18 How it works relative to upsmon
  19 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  20
  21 Your command will be called with the full text of the message as one argument.
  22
  23 For the default message texts, refer to the sample upsmon.conf file.
  24
  25 The environment variable NOTIFYTYPE will contain the type string of whatever
  26 caused this event to happen - ONLINE, ONBATT, LOWBATT, ...
  27
  28 Making this some sort of shell script might be a good idea, but the helper can
  29 be in any programming or scripting language.
  30
  31 NOTE: Remember that your helper must be *executable*. If you are using a script,
  32 make sure the execution flags are set.
  33
  34 For more information, refer to linkman:upsmon[8] and
  35 linkman:upsmon.conf[5] manual pages.
  36
  37 Setting up everything
  38 ~~~~~~~~~~~~~~~~~~~~~
  39
  40 - Set the EXEC flag on the notify types you want to handle in linkman:upsmon.conf[5]:
  41 +
  42         NOTIFYFLAG ONBATT EXEC
  43         NOTIFYFLAG ONLINE EXEC
  44 +
  45 If you want other things like WALL or SYSLOG to happen, just add them:
  46 +
  47         NOTIFYFLAG ONBATT EXEC+WALL+SYSLOG
  48 +
  49 You get the idea.
  50
  51 - Tell upsmon where your script is:
  52
  53         NOTIFYCMD /path/to/my/script
  54
  55 - Make a simple script like this at that location:
  56
  57         #! /bin/bash
  58         echo "$*" | sendmail -F"ups@mybox" bofh@pager.example.com
  59
  60 - Restart upsmon, pull the plug, and see what happens.
  61
  62 That approach is bare-bones, but you should get the text content of the
  63 alert in the body of the message, since upsmon passes the alert text
  64 (from NOTIFYMSG) as an argument.
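
Pulling the plug is not always convenient.  As a rough sketch, the call that
upsmon makes can be imitated by hand: the alert text goes in as the single
argument and NOTIFYTYPE is set in the environment.  The function name
`simulate_notify` and the example message text are illustrative assumptions,
not part of NUT, and upsmon itself may set more environment variables than
this sketch does.

```shell
#!/bin/sh
# simulate_notify SCRIPT TYPE MESSAGE
# Run SCRIPT roughly the way the text describes upsmon doing it:
# MESSAGE as one argument, NOTIFYTYPE in the environment.
# (Hypothetical helper for testing; not shipped with NUT.)
simulate_notify() {
        script=$1
        ntype=$2
        message=$3

        # The helper must be executable, as noted earlier.
        if [ ! -x "$script" ]; then
                echo "simulate_notify: $script is not executable" >&2
                return 1
        fi

        # NOTIFYTYPE is set only for this one invocation.
        NOTIFYTYPE=$ntype "$script" "$message"
}

# Example (path and message text are placeholders):
#   simulate_notify /path/to/my/script ONBATT "UPS myups@localhost on battery"
```

This only exercises your helper; it does not check that upsmon.conf has the
EXEC flags set, so a real test is still worth doing afterwards.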
  65
  66 Using more advanced features
  67 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  68
  69 Your helper script will be run with a few environment variables set.
  70
  71 - UPSNAME: the name of the system that generated the change.
  72 +
  73 This will be one of your identifiers from the MONITOR lines in upsmon.conf.
  74
  75 - NOTIFYTYPE: this will be ONLINE, ONBATT, or whatever event took place which
  76 made upsmon call your script.
  77
  78 You can use these to do different things based on which system has
  79 changed state.  You could have it only send pages for an important
  80 system while totally ignoring a known trouble spot, for example.
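
As a minimal sketch of that idea, the helper below decides what to do from
UPSNAME and NOTIFYTYPE.  The UPS identifiers `important@localhost` and
`flaky@localhost`, the function name `choose_action`, and the `notify-helper`
log tag are invented for illustration; substitute the identifiers from your
own MONITOR lines.  The "page" branch only prints a placeholder line, so the
actual paging command is left to you.

```shell
#!/bin/sh
# choose_action: print what to do for the current event, based on the
# UPSNAME and NOTIFYTYPE environment variables described above.
# The UPS names below are placeholders, not real defaults.
choose_action() {
        case "$UPSNAME" in
                flaky@localhost)
                        # Known trouble spot: never page for it.
                        echo ignore
                        ;;
                important@localhost)
                        case "$NOTIFYTYPE" in
                                ONBATT|LOWBATT) echo page ;;
                                *)              echo log ;;
                        esac
                        ;;
                *)
                        echo log
                        ;;
        esac
}

# When run as NOTIFYCMD, upsmon passes the alert text as "$1".
if [ "$#" -gt 0 ]; then
        case "$(choose_action)" in
                page) echo "would page: $1" ;;
                log)  logger -t notify-helper "$UPSNAME $NOTIFYTYPE: $1" ;;
        esac
fi
```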
  81
  82 Suppressing notify storms
  83 ~~~~~~~~~~~~~~~~~~~~~~~~~
  84
  85 upsmon will call your script every time an event happens that has the EXEC flag
  86 set.  This means a quick power failure that lasts mere seconds might generate a
  87 notification storm.  To suppress this sort of annoyance, use upssched as your
  88 NOTIFYCMD program, and configure it to call your command after a timer has
  89 elapsed.
  90
  91
  92 The advanced approach, using upssched
  93 -------------------------------------
  94
  95 upssched is a helper for upsmon that will invoke commands for you at some
  96 interval relative to a UPS event.  It can be used to send pages, mail out
  97 notices about things, or even shut down the box early.
  98
  99 There will be examples scattered throughout.  Change them to suit your
 100 pathnames, UPS locations, and so forth.
 101
 102 How upssched works relative to upsmon
 103 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 104
 105 When an event occurs, upsmon will call whatever you specify as 'NOTIFYCMD'
 106 in your upsmon.conf, provided you also set the 'EXEC' flag on the matching
 107 'NOTIFYFLAG' line.  In this case, we want upsmon to call upssched as the
 108 notifier, since it will be doing all the work for us.  So, in the upsmon.conf:
 109
 110         NOTIFYCMD /usr/local/ups/bin/upssched
 111
 112 Then we want upsmon to actually _use_ it for the notify events, so again
 113 in the upsmon.conf we set the flags:
 114
 115         NOTIFYFLAG ONLINE SYSLOG+EXEC
 116         NOTIFYFLAG ONBATT SYSLOG+WALL+EXEC
 117         NOTIFYFLAG LOWBATT SYSLOG+WALL+EXEC
 118         ... and so on.
 119
 120 For the purposes of this document I will only use those three, but you can set
 121 the flags for any of the valid notify types.
 122
 123 Setting up your upssched.conf
 124 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 125
 126 Once upsmon has been configured with the NOTIFYCMD and EXEC flags, you're
 127 ready to deal with the upssched.conf details.  In this file, you specify
 128 just what will happen when a given event occurs on a particular UPS.
 129
 130 First you need to define the name of the script or program that will
 131 handle timers that trigger.  This is your CMDSCRIPT, and needs to be above
 132 any AT defines.  There's an example provided with the program, so we'll
 133 use that here:
 134
 135         CMDSCRIPT /usr/local/ups/bin/upssched-cmd
 136
 137 Then you have to define the variables PIPEFN and LOCKFN; the former
 138 sets the file name of the FIFO that will pass communications between
 139 processes to start and stop timers, while the latter sets the file name
 140 for a temporary file created by upssched in order to avoid a race condition
 141 under some circumstances. Please see the relevant comments in upssched.conf
 142 for additional information and advice about these variables.
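
The ordering and presence rules above are easy to get wrong when editing by
hand.  The following is a small sketch of a sanity check, assuming a
hypothetical helper named `check_upssched_conf` (it is not part of NUT).  It
only verifies what this section states: that CMDSCRIPT, PIPEFN and LOCKFN are
defined, and that CMDSCRIPT comes before the first AT line.  It does not
validate the values themselves.

```shell
#!/bin/sh
# check_upssched_conf FILE
# Sanity-check the rules described above:
#   - CMDSCRIPT, PIPEFN and LOCKFN are all defined
#   - CMDSCRIPT appears before the first AT line
# Prints one message per problem; returns 0 if none were found.
# (Hypothetical helper, not part of NUT.)
check_upssched_conf() {
        awk '
                # Skip blank lines and comments.
                NF == 0 || $1 ~ /^#/ { next }
                $1 == "CMDSCRIPT" { cmdscript = 1 }
                $1 == "PIPEFN"    { pipefn = 1 }
                $1 == "LOCKFN"    { lockfn = 1 }
                # Report only the first AT that precedes CMDSCRIPT.
                $1 == "AT" && !cmdscript && !reported {
                        printf "line %d: AT appears before CMDSCRIPT\n", NR
                        reported = 1; bad = 1
                }
                END {
                        if (!cmdscript) { print "CMDSCRIPT is not defined"; bad = 1 }
                        if (!pipefn)    { print "PIPEFN is not defined";    bad = 1 }
                        if (!lockfn)    { print "LOCKFN is not defined";    bad = 1 }
                        exit bad + 0
                }
        ' "$1"
}

# Example (path is a placeholder):
#   check_upssched_conf /path/to/upssched.conf && echo "looks sane"
```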
 143
 144 Now you can tell your CMDSCRIPT what to do when it is called by upssched.
 145
 146 The big picture
 147 ^^^^^^^^^^^^^^^
 148
 149 The design in a nutshell is:
 150
 151         upsmon ---> calls upssched ---> calls your CMDSCRIPT
 152
 153 Ultimately, the CMDSCRIPT does the actual useful work, whether that's
 154 initiating an early shutdown with 'upsmon -c fsd', sending a page by
 155 calling sendmail, or opening a subspace channel to V'ger.
 156
 157 Establishing timers
 158 ^^^^^^^^^^^^^^^^^^^
 159
 160 Let's say that you want to receive a page when any UPS has been running on
 161 battery for 30 seconds.  Create a handler that starts a 30 second timer
 162 for an ONBATT condition.
 163
 164         AT ONBATT * START-TIMER onbattwarn 30
 165
 166 This means "when any UPS (the *) goes on battery, start a timer called
 167 onbattwarn that will trigger in 30 seconds".  We'll come back to the
 168 onbattwarn part in a moment.  Right now we need to make sure that we
 169 don't trigger that timer if the UPS happens to come back before the
 170 time is up.  In essence, if it goes back on line, we need to cancel it.
 171 So, let's tell upssched that.
 172
 173         AT ONLINE * CANCEL-TIMER onbattwarn
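
If the start/cancel behaviour is hard to picture, the toy model below imitates
it in plain shell.  To be clear, this is not how upssched is implemented; the
function names `start_timer` and `cancel_timer` and the `TIMER_DIR` location
are made up for this sketch.  It only mirrors the observable effect: a named
timer eventually runs a command with the timer name as its argument, unless
it is cancelled first.

```shell
#!/bin/sh
# Toy model of START-TIMER / CANCEL-TIMER semantics. This is NOT how
# upssched works internally; it only imitates the observable behaviour.
TIMER_DIR=${TIMER_DIR:-/tmp/toy-timers.$$}

# start_timer NAME SECONDS COMMAND...
# After SECONDS, run COMMAND with NAME as its last argument,
# unless cancel_timer NAME is called first.
start_timer() {
        name=$1; delay=$2; shift 2
        mkdir -p "$TIMER_DIR"
        (
                sleep "$delay"
                rm -f "$TIMER_DIR/$name"
                "$@" "$name"
        ) &
        # Remember the watcher's PID so it can be cancelled by name.
        echo $! > "$TIMER_DIR/$name"
}

# cancel_timer NAME: stop a pending timer; harmless if none is pending.
cancel_timer() {
        if [ -f "$TIMER_DIR/$1" ]; then
                kill "$(cat "$TIMER_DIR/$1")" 2>/dev/null
                rm -f "$TIMER_DIR/$1"
        fi
}

# Example, mirroring the two AT lines above (the command is a placeholder):
#   start_timer onbattwarn 30 /path/to/cmdscript    # on ONBATT
#   cancel_timer onbattwarn                         # on ONLINE
```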
 174
 175 Executing commands immediately
 176 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 177
 178 As an example, consider the scenario where a UPS goes onto battery power.
 179 However, the users are not informed until 60 seconds later - using a timer as
 180 described above. Whilst this may let the *logged in* users know that the UPS
 181 is on battery power, it does not inform any users subsequently logging in. To
 182 enable this we could, at the same time, create a file which is read and
 183 displayed to any user trying to log in whilst the UPS is on battery power. If
 184 the UPS comes back onto utility power within 60 seconds, then we can cancel
 185 the timer and remove the file, as described above. However, if the UPS comes
 186 back onto utility power say 5 minutes later then we do not want to use any
 187 timers but we still want to remove the file. To do this we could use:
 188
 189         AT ONLINE * EXECUTE ups-back-on-power
 190
 191 This means that when upsmon detects that the UPS is back on utility power it
 192 will signal upssched. upssched will see the above command and simply pass
 193 'ups-back-on-power' as an argument directly to CMDSCRIPT. This occurs
 194 immediately; no timers are involved.
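
To make the scenario concrete, the sketch below pairs that line with a second,
hypothetical one, `AT ONBATT * EXECUTE ups-on-battery`, and shows matching
CMDSCRIPT branches.  The argument name `ups-on-battery`, the function name
`handle_execute`, and the `FLAG_FILE` variable with its default path are
assumptions for illustration.  How the file gets displayed at login is
system-specific and is not covered here.

```shell
#!/bin/sh
# Flag file meant to be shown to users logging in while on battery.
# Placeholder path; override FLAG_FILE to suit your system.
FLAG_FILE=${FLAG_FILE:-/some/path/ups-on-battery}

# handle_execute ARG: ARG is the word given after EXECUTE in upssched.conf.
handle_execute() {
        case "$1" in
                ups-on-battery)
                        # Assumed to come from: AT ONBATT * EXECUTE ups-on-battery
                        echo "The UPS is running on battery power." > "$FLAG_FILE"
                        ;;
                ups-back-on-power)
                        # From: AT ONLINE * EXECUTE ups-back-on-power
                        rm -f "$FLAG_FILE"
                        ;;
                *)
                        echo "unrecognized command: $1" >&2
                        return 1
                        ;;
        esac
}

# upssched passes the EXECUTE argument as "$1".
if [ "$#" -gt 0 ]; then
        handle_execute "$1"
fi
```

Because both branches run immediately, the file is removed whenever the UPS
comes back, whether that is after 5 seconds or 5 minutes.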
 195
 196 Writing the command script handler
 197 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 198
 199 OK, now that upssched knows how the timers are supposed to work, let's
 200 give it something to do when one actually triggers.  The name of the
 201 example timer is onbattwarn, so that's the argument that will be passed
 202 into your CMDSCRIPT when it triggers.  This means we need to do some
 203 shell script writing to deal with that input.
 204
 205 --------------------------------------------------------------------------------
 206
 207         #! /bin/sh
 208
 209         case "$1" in
 210                 onbattwarn)
 211                         echo "The UPS has been on battery for a while" \
 212                         | mail -s "UPS monitor" bofh@pager.example.com
 213                         ;;
 214                 ups-back-on-power)
 215                         /bin/rm -f /some/path/ups-on-battery
 216                         ;;
 217                 *)
 218                         logger -t upssched-cmd "Unrecognized command: $1"
 219                         ;;
 220         esac
 221
 222 --------------------------------------------------------------------------------
 223
 224 This is a very simple script example, but it shows how you can test for
 225 the presence of a given trigger.  With multiple ATs creating various timer
 226 names, you will need to test for each possibility and handle it according
 227 to your desires.
 228
 229 NOTE: You can invoke just about anything from inside the CMDSCRIPT.  It doesn't
 230 need to be a shell script, either - that's just an example.  If you want to
 231 write a program that will parse argv[1] and deal with the possibilities, that
 232 will work too.
 233
 234
 235 Early shutdowns
 236 ~~~~~~~~~~~~~~~
 237
 238 One thing that gets requested a lot is early shutdowns in upsmon.  With
 239 upssched, you can now have this functionality.  Set a timer at ONBATT that
 240 invokes a shutdown command when it elapses, and be sure to cancel that
 241 timer if you go back ONLINE before then.
 242
 243 The best way to do this is to use the upsmon callback feature.  You can
 244 make upsmon set the "forced shutdown" (FSD) flag on the upsd so your
 245 slave systems shut down early too.  Just do something like this in your
 246 CMDSCRIPT:
 247
 248         /usr/local/ups/sbin/upsmon -c fsd
 249
 250 It's not a good idea to call your system's shutdown routine directly
 251 from the CMDSCRIPT, since there's no synchronization with the slave
 252 systems hooked to the same UPS.  FSD is the master's way of saying
 253 "we're shutting down *now* like it or not, so you'd better get ready".
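
Putting the pieces together, an early shutdown might be wired up as in the
sketch below.  The timer name `earlyshutdown` and the 120 second delay are
illustrative assumptions, as are the function name `handle_timer` and the
`UPSMON` variable used to make the command overridable.  The upsmon path is
the one used in this document and may differ on your system.

```shell
#!/bin/sh
# Assumed upssched.conf entries (timer name and delay are examples):
#   AT ONBATT * START-TIMER earlyshutdown 120
#   AT ONLINE * CANCEL-TIMER earlyshutdown

# Path as used in this document; adjust for your installation.
UPSMON=${UPSMON:-/usr/local/ups/sbin/upsmon}

# handle_timer NAME: NAME is the timer that just triggered.
handle_timer() {
        case "$1" in
                earlyshutdown)
                        # Ask upsmon to set FSD rather than calling the
                        # system shutdown routine directly.
                        "$UPSMON" -c fsd
                        ;;
                *)
                        logger -t upssched-cmd "Unrecognized command: $1"
                        ;;
        esac
}

# upssched passes the timer name as "$1".
if [ "$#" -gt 0 ]; then
        handle_timer "$1"
fi
```

Keeping the upsmon path in a variable also makes it possible to try the
script with a harmless stand-in command before trusting it with a real
shutdown.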
 254
 255
 256 Background
 257 ~~~~~~~~~~
 258
 259 This program was written primarily to fulfill the requests of users for
 260 the early shutdown scenario.  The "outboard" design of the program
 261 (relative to upsmon) was intended to reduce the load on the average
 262 system.  Most people don't have the requirement of shutting down after n
 263 seconds on battery, since the usual OB+LB testing is sufficient.
 264
 265 This program was created separately so those people don't have to spend
 266 CPU time and RAM on something that will never be used in their
 267 environments.
 268
 269 The design of the timer handler is also geared towards minimizing impact.
 270 It will come and go from the process list as necessary.  When a new timer
 271 is started, a process will be forked to actually watch the clock and
 272 eventually start the CMDSCRIPT.  When a timer triggers, it is removed from
 273 the queue.  Cancelling a timer will also remove it from the queue.  When
 274 no timers are present in the queue, the background process exits.
 275
 276 This means that you will only see upssched running when one of two things
 277 is happening:
 278
 279  1. There's a timer of some sort currently running
 280  2. upsmon just called it, and you managed to catch the brief instant
 281
 282 The final optimization handles the possibility of trying to cancel a timer
 283 when there's none running.  If there's no process already running, there
 284 are no timers to cancel, and furthermore there is no need to start a
 285 clock-watcher.  As a result, it skips that step and exits sooner.