nut-debian/docs/scheduling.txt
2015-04-30 15:53:36 +02:00

Advanced usage and scheduling notes
===================================
upsmon can call out to a helper script or program when the device changes
state. The example upsmon.conf has a full list of which state changes
are available - ONLINE, ONBATT, LOWBATT, and more.
There are two options, both presented in detail below:
- the simple approach: create your own helper, and manage all events and actions
yourself,
- the advanced approach: use the NUT-provided helper, called 'upssched'.
The simple approach, using your own script
------------------------------------------
How it works relative to upsmon
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Your command will be called with the full text of the message as one argument.
For the default message texts, refer to the sample upsmon.conf file.
The environment variable NOTIFYTYPE will contain the type string of whatever
caused this event to happen - ONLINE, ONBATT, LOWBATT, ...
Making this some sort of shell script might be a good idea, but the helper can
be in any programming or scripting language.
NOTE: Remember that your helper must be *executable*. If you are using a script,
make sure the execution flags are set.
For more information, refer to linkman:upsmon[8] and
linkman:upsmon.conf[5] manual pages.
Setting up everything
~~~~~~~~~~~~~~~~~~~~~
- Set EXEC flags on various things in linkman:upsmon.conf[5]:
+
    NOTIFYFLAG ONBATT EXEC
    NOTIFYFLAG ONLINE EXEC
+
If you want other things like WALL or SYSLOG to happen, just add them:
+
    NOTIFYFLAG ONBATT EXEC+WALL+SYSLOG
+
You get the idea.

- Tell upsmon where your script is:
+
    NOTIFYCMD /path/to/my/script

- Make a simple script like this at that location:
+
    #! /bin/bash
    echo "$*" | sendmail -F"ups@mybox" bofh@pager.example.com

- Restart upsmon, pull the plug, and see what happens.

That approach is bare-bones, but you should get the text content of the
alert in the body of the message, since upsmon passes the alert text
(from NOTIFYMSG) as an argument.
Using more advanced features
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Your helper script will be run with a few environment variables set.
- UPSNAME: the name of the system that generated the change.
+
This will be one of your identifiers from the MONITOR lines in upsmon.conf.
- NOTIFYTYPE: this will be ONLINE, ONBATT, or whatever event took place which
made upsmon call your script.
You can use these to do different things based on which system has
changed state. You could have it only send pages for an important
system while totally ignoring a known trouble spot, for example.
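
As an illustration, the sketch below picks an action from UPSNAME and
NOTIFYTYPE. It is only one possible layout, not a helper shipped with NUT:
the UPS name `noisy@localhost`, the `choose_action` function and the
`ups-notify` syslog tag are made up for this example, and it assumes that
`mail` and `logger` are available on your system.

```shell
#!/bin/sh
# Hypothetical NOTIFYCMD helper: decides what to do from the environment
# variables set by upsmon (UPSNAME, NOTIFYTYPE) and the message text in "$1".

# Assumption: a made-up MONITOR identifier for a known trouble spot.
IGNORED_UPS="noisy@localhost"

# Print the action chosen for one event: "ignore", "page" or "log".
choose_action() {
    ups="$1"
    event="$2"
    if [ "$ups" = "$IGNORED_UPS" ]; then
        echo ignore
        return
    fi
    case "$event" in
        ONBATT|LOWBATT) echo page ;;
        *)              echo log ;;
    esac
}

# Only act when a message was passed; running the file bare does nothing.
if [ "$#" -gt 0 ]; then
    case "$(choose_action "$UPSNAME" "$NOTIFYTYPE")" in
        page)
            echo "$UPSNAME: $*" \
            | mail -s "UPS alert ($NOTIFYTYPE)" bofh@pager.example.com
            ;;
        log)
            logger -t ups-notify "$UPSNAME: $NOTIFYTYPE: $*"
            ;;
        ignore)
            : # known trouble spot, stay quiet
            ;;
    esac
fi
```

Keeping the decision in a small function makes it easy to try out by hand
before pointing NOTIFYCMD at the script.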
Suppressing notify storms
~~~~~~~~~~~~~~~~~~~~~~~~~
upsmon will call your script every time an event happens that has the EXEC flag
set. This means a quick power failure that lasts mere seconds might generate a
notification storm. To suppress this sort of annoyance, use upssched as your
NOTIFYCMD program, and configure it to call your command after a timer has
elapsed.
The advanced approach, using upssched
-------------------------------------
upssched is a helper for upsmon that will invoke commands for you at some
interval relative to a UPS event. It can be used to send pages, mail out
notices about things, or even shut down the box early.
There will be examples scattered throughout. Change them to suit your
pathnames, UPS locations, and so forth.
How upssched works relative to upsmon
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
When an event occurs, upsmon will call whatever you specify as a 'NOTIFYCMD'
in your upsmon.conf, if you also enable the 'EXEC' flag in the matching
'NOTIFYFLAG' lines. In this case, we want upsmon to call upssched as the
notifier, since it will be doing all the work for us. So, in the upsmon.conf:

    NOTIFYCMD /usr/local/ups/bin/upssched

Then we want upsmon to actually _use_ it for the notify events, so again
in the upsmon.conf we set the flags:

    NOTIFYFLAG ONLINE SYSLOG+EXEC
    NOTIFYFLAG ONBATT SYSLOG+WALL+EXEC
    NOTIFYFLAG LOWBATT SYSLOG+WALL+EXEC

... and so on.

For the purposes of this document I will only use those three, but you can set
the flags for any of the valid notify types.
Setting up your upssched.conf
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Once upsmon has been configured with the NOTIFYCMD and EXEC flags, you're
ready to deal with the upssched.conf details. In this file, you specify
just what will happen when a given event occurs on a particular UPS.
First you need to define the name of the script or program that will
handle timers that trigger. This is your CMDSCRIPT, and needs to be above
any AT defines. There's an example provided with the program, so we'll
use that here:

    CMDSCRIPT /usr/local/ups/bin/upssched-cmd

Then you have to define the variables PIPEFN and LOCKFN; the former
sets the file name of the FIFO that will pass communications between
processes to start and stop timers, while the latter sets the file name
for a temporary file created by upssched in order to avoid a race condition
under some circumstances. Please see the relevant comments in upssched.conf
for additional information and advice about these variables.
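
Put together, the top of an upssched.conf might look like the fragment below.
This is only a sketch: the `/var/state/ups/upssched` directory is an
assumption, not a value taken from this document, and whatever directory you
choose typically has to be writable by the user that upssched runs as.

```
# upssched.conf (sketch; the state directory is an assumed location)
CMDSCRIPT /usr/local/ups/bin/upssched-cmd
PIPEFN /var/state/ups/upssched/upssched.pipe
LOCKFN /var/state/ups/upssched/upssched.lock
```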
Now you can tell your CMDSCRIPT what to do when it is called by upsmon.
The big picture
^^^^^^^^^^^^^^^
The design in a nutshell is:

    upsmon ---> calls upssched ---> calls your CMDSCRIPT

Ultimately, the CMDSCRIPT does the actual useful work, whether that's
initiating an early shutdown with 'upsmon -c fsd', sending a page by
calling sendmail, or opening a subspace channel to V'ger.
Establishing timers
^^^^^^^^^^^^^^^^^^^
Let's say that you want to receive a page when any UPS has been running on
battery for 30 seconds. Create a handler that starts a 30 second timer
for an ONBATT condition.

    AT ONBATT * START-TIMER onbattwarn 30

This means "when any UPS (the *) goes on battery, start a timer called
onbattwarn that will trigger in 30 seconds". We'll come back to the
onbattwarn part in a moment. Right now we need to make sure that we
don't trigger that timer if the UPS happens to come back before the
time is up. In essence, if it goes back on line, we need to cancel it.
So, let's tell upssched that.

    AT ONLINE * CANCEL-TIMER onbattwarn

Executing commands immediately
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
As an example, consider the scenario where a UPS goes onto battery power.
However, the users are not informed until 60 seconds later - using a timer as
described above. Whilst this may let the *logged in* users know that the UPS
is on battery power, it does not inform any users subsequently logging in. To
enable this we could, at the same time, create a file which is read and
displayed to any user trying to log in whilst the UPS is on battery power. If
the UPS comes back onto utility power within 60 seconds, then we can cancel
the timer and remove the file, as described above. However, if the UPS comes
back onto utility power say 5 minutes later then we do not want to use any
timers but we still want to remove the file. To do this we could use:

    AT ONLINE * EXECUTE ups-back-on-power

This means that when upsmon detects that the UPS is back on utility power it
will signal upssched. upssched will see the above command and simply pass
'ups-back-on-power' as an argument directly to CMDSCRIPT. This occurs
immediately; no timers are involved.
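
For the scenario described above, the complete set of AT lines could look
like the following sketch. The `ups-on-battery` command name is hypothetical;
it is simply whatever string you want passed to your CMDSCRIPT so that it can
create the notice file.

```
# Create the login notice right away and warn logged-in users after 60 seconds
AT ONBATT * EXECUTE ups-on-battery
AT ONBATT * START-TIMER onbattwarn 60

# Back on utility power: drop the pending warning and remove the notice
AT ONLINE * CANCEL-TIMER onbattwarn
AT ONLINE * EXECUTE ups-back-on-power
```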
Writing the command script handler
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
OK, now that upssched knows how the timers are supposed to work, let's
give it something to do when one actually triggers. The name of the
example timer is onbattwarn, so that's the argument that will be passed
into your CMDSCRIPT when it triggers. This means we need to do some
shell script writing to deal with that input.
--------------------------------------------------------------------------------
#! /bin/sh

# $1 is the timer name or EXECUTE argument passed in by upssched
case "$1" in
    onbattwarn)
        echo "The UPS has been on battery for a while" \
        | mail -s "UPS monitor" bofh@pager.example.com
        ;;
    ups-back-on-power)
        /bin/rm -f /some/path/ups-on-battery
        ;;
    *)
        logger -t upssched-cmd "Unrecognized command: $1"
        ;;
esac
--------------------------------------------------------------------------------
This is a very simple script example, but it shows how you can test for
the presence of a given trigger. With multiple ATs creating various timer
names, you will need to test for each possibility and handle it according
to your desires.
NOTE: You can invoke just about anything from inside the CMDSCRIPT. It doesn't
need to be a shell script, either - that's just an example. If you want to
write a program that will parse argv[1] and deal with the possibilities, that
will work too.
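
The sketch below shows one way a CMDSCRIPT could handle a pair of EXECUTE
commands that create and remove a login notice file. The `ups-on-battery`
command name, the `handle_command` function and the `NOTICE_FILE` variable
are assumptions made for this example, and how the file gets displayed at
login is left to your system.

```shell
#!/bin/sh
# Sketch of a CMDSCRIPT for the login-notice scenario.
# Assumed upssched.conf lines ("ups-on-battery" is a hypothetical name):
#   AT ONBATT * EXECUTE ups-on-battery
#   AT ONLINE * EXECUTE ups-back-on-power

# Assumption: the file your login setup displays; override via the environment.
NOTICE_FILE="${NOTICE_FILE:-/some/path/ups-on-battery}"

# Handle one command name passed in by upssched.
handle_command() {
    case "$1" in
        ups-on-battery)
            echo "The UPS is running on battery power." > "$NOTICE_FILE"
            ;;
        ups-back-on-power)
            rm -f "$NOTICE_FILE"
            ;;
        *)
            echo "Unrecognized command: $1" >&2
            return 1
            ;;
    esac
}

# upssched always passes an argument; do nothing when run without one.
if [ "$#" -gt 0 ]; then
    handle_command "$@"
fi
```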
Early Shutdowns
~~~~~~~~~~~~~~~
Early shutdown is a frequently requested feature for upsmon. With
upssched, you can have this functionality: set a timer for some length of
time at ONBATT which will invoke a shutdown command if it elapses.
Be sure to cancel this timer if you go back ONLINE before then.
The best way to do this is to use the upsmon callback feature. You can
make upsmon set the "forced shutdown" (FSD) flag on the upsd so your
slave systems shut down early too. Just do something like this in your
CMDSCRIPT:

    /usr/local/ups/sbin/upsmon -c fsd

It's not a good idea to call your system's shutdown routine directly
from the CMDSCRIPT, since there's no synchronization with the slave
systems hooked to the same UPS. FSD is the master's way of saying
"we're shutting down *now* like it or not, so you'd better get ready".
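
A minimal sketch of such a CMDSCRIPT follows. The timer name `earlyshutdown`,
the 120 second delay and the `handle_timer` function are made up for this
example; only the `upsmon -c fsd` call comes from the text above.

```shell
#!/bin/sh
# Sketch of a CMDSCRIPT branch for early shutdown.
# Assumed upssched.conf lines (timer name and delay are hypothetical):
#   AT ONBATT * START-TIMER earlyshutdown 120
#   AT ONLINE * CANCEL-TIMER earlyshutdown

# Path as used in this document; adjust it to your installation.
UPSMON="${UPSMON:-/usr/local/ups/sbin/upsmon}"

# Handle one timer name passed in by upssched.
handle_timer() {
    case "$1" in
        earlyshutdown)
            # Ask the running upsmon to set the FSD flag rather than
            # calling the system shutdown routine directly.
            "$UPSMON" -c fsd
            ;;
        *)
            echo "Unrecognized command: $1" >&2
            return 1
            ;;
    esac
}

# upssched always passes the timer name; do nothing when run without one.
if [ "$#" -gt 0 ]; then
    handle_timer "$@"
fi
```

If the UPS returns to line power before the delay elapses, the CANCEL-TIMER
line keeps the script from ever being called with `earlyshutdown`.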
Background
~~~~~~~~~~
This program was written primarily to fulfill the requests of users for
the early shutdown scenario. The "outboard" design of the program
(relative to upsmon) was intended to reduce the load on the average
system. Most people don't have the requirement of shutting down after n
seconds on battery, since the usual OB+LB testing is sufficient.
This program was created separately so those people don't have to spend
CPU time and RAM on something that will never be used in their
environments.
The design of the timer handler is also geared towards minimizing impact.
It will come and go from the process list as necessary. When a new timer
is started, a process will be forked to actually watch the clock and
eventually start the CMDSCRIPT. When a timer triggers, it is removed from
the queue. Canceling a timer will also remove it from the queue. When
no timers are present in the queue, the background process exits.
This means that you will only see upssched running when one of two things
is happening:
1. There's a timer of some sort currently running
2. upsmon just called it, and you managed to catch the brief instant
The final optimization handles the possibility of trying to cancel a timer
when there's none running. If there's no process already running, there
are no timers to cancel, and furthermore there is no need to start a
clock-watcher. As a result, it skips that step and exits sooner.