UPSMON(8)
=========

NAME
----

upsmon - UPS monitor and shutdown controller

SYNOPSIS
--------

*upsmon* -h

*upsmon* -c 'command'

*upsmon* [-D] [-K] [-p] [-u 'user']

DESCRIPTION
-----------

*upsmon* is the client process that is responsible for the most important part
of UPS monitoring--shutting down the system when the power goes out. It
can call out to other helper programs for notification purposes during
power events.

upsmon can monitor multiple systems using a single process. Every UPS
that is defined in the linkman:upsmon.conf[5] configuration file is assigned
a power value and a type (*slave* or *master*).

OPTIONS
-------

*-h*::
Display the help message.

*-c* 'command'::
Send the command 'command' to the existing upsmon process. Valid
commands are:

*fsd*;; shut down all master UPSes (use with caution)

*stop*;; stop monitoring and exit

*reload*;; reread the linkman:upsmon.conf[5] configuration file. See
"reloading nuances" below if this doesn't work.

*-D*::
Raise the debugging level. upsmon will run in the foreground and print
information on stdout about the monitoring process. Use this multiple
times for more details.

*-K*::
Test for the shutdown flag. If it exists and contains the magic string
from upsmon, then upsmon will exit with `EXIT_SUCCESS`. Any other condition
will make upsmon exit with `EXIT_FAILURE`.
+
You can test for a successful exit from `upsmon -K` in your shutdown
scripts to know when to call linkman:upsdrvctl[8] to shut down the UPS.

*-p*::
Run privileged all the time. Normally upsmon will split into two
processes. The majority of the code runs as an unprivileged user, and
only a tiny stub runs as root. This switch will disable that mode, and
run the old "all root all the time" system.
+
This is not the recommended mode, and you should not use this unless you
have a very good reason.

*-u* 'user'::
Set the user for the unprivileged monitoring process. This has no effect
when using -p.
+
The default user is set at configure time with 'configure
--with-user=...'. Typically this is 'nobody', but other distributions
will probably have a specific 'nut' user for this task. If your
notification scripts need to run as a specific user, set it here.
+
You can also set this in the linkman:upsmon.conf[5] file with the
RUN_AS_USER directive.
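The *-K* test described above can be wired into a late-shutdown script
roughly as follows. This is only a sketch: the function name and the
`UPSMON` and `UPSDRVCTL` variables are made up for illustration so the
paths can be overridden, and where the call belongs in your shutdown
sequence depends on your system.

```shell
# Hypothetical helper for a late-shutdown script.
# UPSMON and UPSDRVCTL are illustrative variables, not NUT settings;
# they default to finding the programs on the PATH.
UPSMON=${UPSMON:-upsmon}
UPSDRVCTL=${UPSDRVCTL:-upsdrvctl}

kill_power_if_flagged() {
    # "upsmon -K" exits successfully only when the shutdown flag exists
    # and contains the magic string written by upsmon.
    if "$UPSMON" -K; then
        echo "Shutdown flag is set: asking the UPS to cut power"
        "$UPSDRVCTL" shutdown
    else
        echo "Shutdown flag not set: leaving the UPS alone"
    fi
}
```

A shutdown script would call `kill_power_if_flagged` as one of its last
steps, once nothing else remains to be stopped.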

UPS DEFINITIONS
---------------

In the linkman:upsmon.conf[5], you must specify at least one UPS that will
be monitored. Use the MONITOR directive.

MONITOR 'system' 'powervalue' 'username' 'password' 'type'

The 'system' refers to a linkman:upsd[8] server, in the form
+upsname[@hostname[:port]]+. The default hostname is "localhost". Some
examples follow:

- "su700@mybox" means a UPS called "su700" on a system called "mybox".
This is the normal form.

- "fenton@bigbox:5678" is a UPS called "fenton" on a system called
"bigbox" which runs linkman:upsd[8] on port "5678".
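To make the +upsname[@hostname[:port]]+ form concrete, here is a small
sketch that splits a 'system' string into its three parts. The function
name is hypothetical, and the fallback port of 3493 is an assumption (it
is the usual NUT port; this page only states the default hostname). It
does not handle IPv6 literals.

```shell
# Hypothetical helper: split "upsname[@hostname[:port]]" into its parts.
# Prints "upsname hostname port" on one line.
parse_ups_system() {
    spec=$1
    upsname=${spec%%@*}   # everything before the first "@"
    host=localhost        # default hostname, as documented above
    port=3493             # assumed default port (usual NUT port)
    case $spec in
        *@*)
            hostport=${spec#*@}       # everything after the first "@"
            host=${hostport%%:*}      # part before the first ":"
            case $hostport in
                *:*) port=${hostport##*:} ;;   # part after the last ":"
            esac
            ;;
    esac
    printf '%s %s %s\n' "$upsname" "$host" "$port"
}
```

For instance, `parse_ups_system fenton@bigbox:5678` prints
`fenton bigbox 5678`.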

The 'powervalue' refers to how many power supplies on this system are
being driven by this UPS. This is typically set to 1, but see the section
on power values below.

The 'username' is a section in your linkman:upsd.users[5] file.
Whatever password you set in that section must match the 'password'
set in this file.

The type set in that section must also match the 'type' here--
*master* or *slave*. In general, a master process is one
running on the system with the UPS actually plugged into a serial
port, and a slave is drawing power from the UPS but can't talk to it
directly. See the section on UPS types for more.

NOTIFY EVENTS
-------------

*upsmon* senses several events as it monitors each UPS. They are called
notify events as they can be used to tell the users and admins about the
change in status. See the additional NOTIFY-related sections below for
information on customizing the delivery of these messages.

*ONLINE*::
The UPS is back on line.

*ONBATT*::
The UPS is on battery.

*LOWBATT*::
The UPS battery is low (as determined by the driver).

*FSD*::
The UPS has been commanded into the "forced shutdown" mode.

*COMMOK*::
Communication with the UPS has been established.

*COMMBAD*::
Communication with the UPS was just lost.

*SHUTDOWN*::
The local system is being shut down.

*REPLBATT*::
The UPS needs to have its battery replaced.

*NOCOMM*::
The UPS can't be contacted for monitoring.

NOTIFY COMMAND
--------------

In linkman:upsmon.conf[5], you can configure a program called the NOTIFYCMD
that will handle events that occur.

+NOTIFYCMD+ "'path to program'"

Example:

+NOTIFYCMD "/usr/local/bin/notifyme"+

Remember to wrap the path in "quotes" if it contains any spaces.

The program you run as your NOTIFYCMD can use the environment variables
NOTIFYTYPE and UPSNAME to know what has happened and on which UPS. It
also receives the notification message (see below) as the first (and
only) argument, so you can deliver a preformatted message too.

Note that the NOTIFYCMD will only be called for a given event when you set
the EXEC flag for that event with the NOTIFYFLAG directive (see NOTIFY
FLAGS below).
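A NOTIFYCMD program can be as small as a shell script. The sketch below
shows one way to use the NOTIFYTYPE and UPSNAME environment variables
together with the message argument. The function names and the choice of
which events count as urgent are illustrative assumptions, not anything
upsmon prescribes.

```shell
#!/bin/sh
# Hypothetical NOTIFYCMD handler. upsmon is expected to provide
# NOTIFYTYPE and UPSNAME in the environment and the preformatted
# message as the first argument.

notify_line() {
    # Build a single log-style line from the environment and the message.
    printf '%s [%s] %s\n' "${UPSNAME:-unknown}" "${NOTIFYTYPE:-UNKNOWN}" "$1"
}

handle_notify() {
    line=$(notify_line "$1")
    case ${NOTIFYTYPE:-} in
        # Which events are urgent is a local policy choice (assumption).
        ONBATT|LOWBATT|FSD|SHUTDOWN) echo "URGENT: $line" ;;
        *) echo "$line" ;;
    esac
}
```

A real script would end with `handle_notify "$1"` and replace the `echo`
calls with whatever delivery mechanism you prefer (mail, pager, and so on).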

NOTIFY FLAGS
------------

By default, all notify events (see above) generate a global message
(wall) to all users, plus they are logged via the syslog. You can change
this with the NOTIFYFLAG directive in the configuration file:

+NOTIFYFLAG+ 'notifytype' 'flags'

Examples:

- `NOTIFYFLAG ONLINE SYSLOG`

- `NOTIFYFLAG ONBATT SYSLOG+WALL`

- `NOTIFYFLAG LOWBATT SYSLOG+WALL+EXEC`

The flags that can be set on a given notify event are:

*SYSLOG*::
Write this message to the syslog.

*WALL*::
Send this message to all users on the system via *wall*(1).

*EXEC*::
Execute the NOTIFYCMD.

*IGNORE*::
Don't do anything. If you use this, don't use any of the other flags.

You can mix these flags. "SYSLOG+WALL+EXEC" does all three for a given
event.

NOTIFY MESSAGES
---------------

upsmon comes with default messages for each of the NOTIFY events. These
can be changed with the NOTIFYMSG directive.

+NOTIFYMSG+ 'type' "'message'"

Examples:

- `NOTIFYMSG ONLINE "UPS %s is getting line power"`

- `NOTIFYMSG ONBATT "Someone pulled the plug on %s"`

The first instance of %s is replaced with the identifier of the UPS that
generated the event. These messages are used when sending walls to the
users directly from upsmon, and are also passed to the NOTIFYCMD.
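The %s substitution can be mimicked in a few lines of shell, which may
help when previewing a custom message. This is an illustration of the
rule stated above (only the first %s is replaced), not upsmon's own code.
The function name is hypothetical.

```shell
# Hypothetical preview helper: replace the first "%s" in a message
# template with a UPS identifier, leaving any later "%s" untouched.
format_notify_msg() {
    template=$1
    upsid=$2
    case $template in
        *%s*)
            before=${template%%%s*}   # text before the first "%s"
            after=${template#*%s}     # text after the first "%s"
            printf '%s%s%s\n' "$before" "$upsid" "$after"
            ;;
        *)
            # No placeholder: the message is used as-is.
            printf '%s\n' "$template"
            ;;
    esac
}
```

For example, `format_notify_msg "Someone pulled the plug on %s" su700@mybox`
prints `Someone pulled the plug on su700@mybox`.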

POWER VALUES
------------

The "current overall power value" is the sum of all UPSes that are
currently able to supply power to the system hosting upsmon. Any
UPS that is either on line or just on battery contributes to this
number. If a UPS is critical (on battery and low battery) or has been
put into "forced shutdown" mode, it no longer contributes.

A "power value" on a MONITOR line in the config file is the number of
power supplies that the UPS runs on the current system.

+MONITOR+ 'upsname' 'powervalue' 'username' 'password' 'type'

Normally, you only have one power supply, so it will be set to 1.

+MONITOR myups@myhost 1 username mypassword master+

On a large server with redundant power supplies, the power value for a UPS
may be greater than 1. You may also have more than one of them defined.

+MONITOR ups-alpha@myhost 2 username mypassword master+

+MONITOR ups-beta@myhost 2 username mypassword master+

You can also set the power value for a UPS to 0 if it does not supply any
power to that system. This is generally used when you want to use the
upsmon notification features for a UPS even though it's not actually
running the system that hosts upsmon. Don't set this to "master" unless
you really want to power this UPS off when this instance of upsmon needs
to shut down for its own reasons.

+MONITOR faraway@anotherbox 0 username mypassword slave+

The "minimum power value" is the number of power supplies that must be
receiving power in order to keep the computer running.

+MINSUPPLIES+ 'value'

Typical PCs only have 1, so most users will leave this at the default.

+MINSUPPLIES 1+

If you have a server or similar system with redundant power, then this
value will usually be set higher. One that requires three power supplies
to be running at all times would simply set it to 3.

+MINSUPPLIES 3+

When the current overall power value drops below the minimum power value,
upsmon starts the shutdown sequence. This design allows you to lose some
of your power supplies in a redundant power environment without bringing
down the entire system while still working properly for smaller systems.

UPS TYPES
---------

*upsmon* and linkman:upsd[8] don't always run on the same system. When they
do, any UPSes that are directly attached to the upsmon host should be
monitored in "master" mode. This makes upsmon take charge of that
equipment, and it will wait for slaves to disconnect before shutting
down the local system. This allows the distant systems (monitoring over
the network) to shut down cleanly before `upsdrvctl shutdown` runs
and turns them all off.

When upsmon runs as a slave, it is relying on the distant system to tell
it about the state of the UPS. When that UPS goes critical (on battery
and low battery), the slave upsmon immediately invokes the local shutdown
command. This needs to happen quickly. Once the slave disconnects from the
distant linkman:upsd[8] server, the master upsmon will start its own shutdown
process. Your slaves must all shut down before the master turns off the
power or filesystem damage may result.

upsmon deals with slaves that get wedged, hang, or otherwise fail to
disconnect from linkman:upsd[8] in a timely manner with the HOSTSYNC
timer. During a shutdown situation, the master upsmon will give up after
this interval and it will shut down anyway. This keeps the master from
sitting there forever (which would endanger that host) if a slave should
break somehow. This defaults to 15 seconds.

If your master system is shutting down too quickly, set the FINALDELAY
interval to something greater than the default 5 seconds. Don't set
this too high, or your UPS battery may run out of power before the
master upsmon process shuts down that system.

TIMED SHUTDOWNS
---------------

For those rare situations where the shutdown process can't be completed
between the time that low battery is signalled and the UPS actually powers
off the load, use the linkman:upssched[8] helper program. You can use it
along with upsmon to schedule a shutdown based on the "on battery" event.
upssched can then come back to upsmon to initiate the shutdown once it has
run on battery too long.

This can be complicated and messy, so stick to the default critical UPS
handling if you can.

REDUNDANT POWER SUPPLIES
------------------------

If you have more than one power supply for redundant power, you may also
have more than one UPS feeding your computer. upsmon can handle this. Be
sure to set the UPS power values appropriately and the MINSUPPLIES value
high enough so that it keeps running until it really does need to shut
down.

For example, the HP NetServer LH4 by default has 3 power supplies
installed, with one bay empty. It has two power cords, one per side of
the box. This means that one power cord powers two power supply bays,
and that you can only have two UPSes supplying power.

Connect UPS "alpha" to the cord feeding two power supplies, and UPS
"beta" to the cord that feeds the third and the empty slot. Define alpha
as a powervalue of 2, and beta as a powervalue of 1. Set the MINSUPPLIES
to 2.

When alpha goes on battery, your current overall power value will stay
at 3, as it's still supplying power. However, once it goes critical (on
battery and low battery), it will stop contributing to the current overall
power value. That means the value will be 1 (beta alone), which is less
than 2. That is insufficient to run the system, and upsmon will invoke
the shutdown sequence.

However, if beta goes critical, subtracting its contribution will take the
current overall value from 3 to 2. This is just high enough to satisfy
the minimum, so the system will continue running as before. If beta
returns later, it will be re-added and the current value will go back to
3. This allows you to swap out UPSes, change a power configuration, or
whatever, as long as you maintain the minimum power value at all times.
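The arithmetic in the LH4 example can be checked with a short shell
sketch. It models only the rule described in this manual (a UPS counts
while it is on line or on battery, and stops counting once it is critical
or in forced shutdown). The function names and the "powervalue:state"
argument format are invented for illustration and are not part of upsmon.

```shell
# Hypothetical model of the power value rule.
# Each argument is "powervalue:state", where state is one of
# online, onbatt, critical, or fsd.
overall_power_value() {
    total=0
    for ups in "$@"; do
        value=${ups%%:*}
        state=${ups#*:}
        case $state in
            # Only UPSes still able to supply power contribute.
            online|onbatt) total=$((total + value)) ;;
        esac
    done
    echo "$total"
}

# Succeeds (exit 0) when the overall value is below MINSUPPLIES.
# First argument is the MINSUPPLIES value, the rest are UPS entries.
needs_shutdown() {
    minsupplies=$1
    shift
    [ "$(overall_power_value "$@")" -lt "$minsupplies" ]
}
```

With alpha at 2 and beta at 1, `needs_shutdown 2 2:critical 1:online`
succeeds (value 1 is below 2), while `needs_shutdown 2 2:online 1:critical`
fails (value 2 still meets the minimum).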

MIXED OPERATIONS
----------------

Besides being able to monitor multiple UPSes, upsmon can also monitor them
as different roles. If you have a system with multiple power supplies
serviced by separate UPS batteries, it's possible to be a master on one
and a slave on the other. This usually happens when you run out of serial
ports and need to do the monitoring through another system nearby.

This is also complicated, especially when it comes time to power down a
UPS that has gone critical but doesn't supply the local system. You can
do this with some scripting magic in your notify command script, but it's
beyond the scope of this manual.

FORCED SHUTDOWNS
----------------

When upsmon is forced to bring down the local system, it sets the
"FSD" (forced shutdown) flag on any UPSes that it is running in master
mode. This is used to synchronize slaves in the event that a master UPS
that is otherwise OK needs to be brought down due to some pressing event
on the master.

You can manually invoke this mode on the master upsmon by starting another
copy with `-c fsd`. This is useful when you want to initiate a shutdown
before the critical stage through some external means, such as
linkman:upssched[8].

DEAD UPSES
----------

In the event that upsmon can't reach linkman:upsd[8], it declares that UPS
"dead" after some interval controlled by DEADTIME in the
linkman:upsmon.conf[5]. If this happens while that UPS was last known to be
on battery, it is assumed to have gone critical and no longer contributes
to the overall power value.

upsmon will alert you to a UPS that can't be contacted for monitoring
with a "NOCOMM" notifier by default every 300 seconds. This can be
changed with the NOCOMMWARNTIME setting.

RELOADING NUANCES
-----------------

upsmon usually gives up root powers for the process that does most of
the work, including handling signals like SIGHUP to reload the configuration
file. This means your linkman:upsmon.conf[5] file must be readable by
the non-root account that upsmon switches to.

If you want reloads to work, upsmon must run as some user that has
permissions to read the configuration file. I recommend making a new
user just for this purpose, as making the file readable by "nobody"
(the default user) would be a bad idea.

See the RUN_AS_USER section in linkman:upsmon.conf[5] for more on this topic.

Additionally, you can't change the SHUTDOWNCMD or POWERDOWNFLAG
definitions with a reload due to the split-process model. If you change
those values, you *must* stop upsmon and start it back up. upsmon
will warn you in the syslog if you make changes to either of those
values during a reload.

SIMULATING POWER FAILURES
-------------------------

To test a synchronized shutdown without pulling the plug on your UPS(es),
you need only set the forced shutdown (FSD) flag on them. You can do this
by calling upsmon again to set the flag, i.e.:

+upsmon -c fsd+

After that, the master and the slaves will do their usual shutdown sequence
as if the battery had gone critical. This is much easier on your UPS
equipment, and it beats crawling under a desk to find the plug.

FILES
-----

linkman:upsmon.conf[5]

SEE ALSO
--------

Server:
~~~~~~~
linkman:upsd[8]

Clients:
~~~~~~~~
linkman:upsc[8], linkman:upscmd[8],
linkman:upsrw[8], linkman:upsmon[8]

CGI programs:
~~~~~~~~~~~~~
linkman:upsset.cgi[8], linkman:upsstats.cgi[8], linkman:upsimage.cgi[8]

Internet resources:
~~~~~~~~~~~~~~~~~~~
The NUT (Network UPS Tools) home page: http://www.networkupstools.org/