UPSMON.CONF(5)
==============

NAME
----

upsmon.conf - Configuration for Network UPS Tools upsmon

DESCRIPTION
-----------

This file's primary job is to define the systems that linkman:upsmon[8]
will monitor and to tell it how to shut down the system when necessary.
It will contain passwords, so keep it secure. Ideally, only the upsmon
process should be able to read it.

Additionally, other optional configuration values can be set in this
file.

CONFIGURATION DIRECTIVES
------------------------

*DEADTIME* 'seconds'::
upsmon allows a UPS to go missing for this many seconds before declaring
it "dead". The default is 15 seconds.
+
upsmon requires a UPS to provide status information every few seconds
(see POLLFREQ and POLLFREQALERT) to keep things updated. If the status
fetch fails, the UPS is marked stale. If it stays stale for more than
DEADTIME seconds, the UPS is marked dead.
+
A dead UPS that was last known to be on battery is assumed to have
changed to a low battery condition. This may force a shutdown if it is
providing a critical amount of power to your system. This seems
disruptive, but the alternative is barreling ahead into oblivion and
crashing when you run out of power.
+
Note: DEADTIME should be a multiple of POLLFREQ and POLLFREQALERT.
Otherwise, you'll have "dead" UPSes simply because upsmon isn't polling
them quickly enough. Rule of thumb: take the larger of the two POLLFREQ
values, and multiply by 3.
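+
As an illustration of the rule of thumb, the fragment below slows normal
polling to 10 seconds and keeps 5 seconds while on battery. The values
are assumptions chosen for the example, not recommendations. The larger
of the two is 10, so DEADTIME becomes 30, which is also a multiple of
both.
+
----
POLLFREQ 10
POLLFREQALERT 5
DEADTIME 30
----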

*FINALDELAY* 'seconds'::

When running in master mode, upsmon waits this long after sending the
NOTIFY_SHUTDOWN to warn the users. After the timer elapses, it then
runs your SHUTDOWNCMD. By default this is set to 5 seconds.
+
If you need to let your users do something in between those events,
increase this number. Remember, at this point your UPS battery is
almost depleted, so don't make this too big.
+
Alternatively, you can set this very low so you don't wait around when
it's time to shut down. Some UPSes don't give much warning for low
battery and will require a value of 0 here for a safe shutdown.
+
NOTE: If FINALDELAY on the slave is greater than HOSTSYNC on the master,
the master will give up waiting for the slave to disconnect.

*HOSTSYNC* 'seconds'::

upsmon will wait up to this many seconds in master mode for the slaves
to disconnect during a shutdown situation. By default, this is 15
seconds.
+
When a UPS goes critical (on battery + low battery, or "FSD": forced
shutdown), the slaves are supposed to disconnect and shut down right
away. The HOSTSYNC timer keeps the master upsmon from sitting there
forever if one of the slaves gets stuck.
+
This value is also used to keep slave systems from getting stuck if
the master fails to respond in time. After a UPS becomes critical,
the slave will wait up to HOSTSYNC seconds for the master to set the
FSD flag. If that timer expires, the slave will assume that the master
is broken and will shut down anyway.
+
This wait keeps the slaves from shutting down during a short-lived
status change to "OB LB" that the slaves see but the master misses.
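+
A sketch of how the two timers relate, using the default values quoted
above. The split into a master file and a slave file is shown with
comments and is only an illustration. The slave's FINALDELAY stays below
the master's HOSTSYNC, so the master does not give up waiting before the
slave disconnects.
+
----
# upsmon.conf on the master
HOSTSYNC 15

# upsmon.conf on each slave
FINALDELAY 5
----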

*MINSUPPLIES* 'num'::

Set the number of power supplies that must be receiving power to keep
this system running. Normal computers have just one power supply, so
the default value of 1 is acceptable.
+
Large/expensive server type systems usually have more, and can run
with a few missing. The HP NetServer LH4 can run with 2 out of 4, for
example, so you'd set it to 2. The idea is to keep the box running
as long as possible, right?
+
Obviously you have to put the redundant supplies on different UPS
circuits for this to make sense! See big-servers.txt in the docs
subdirectory for more information and ideas on how to use this
feature.
+
Also see the section on "power values" in linkman:upsmon[8].
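+
A sketch for a server like the one described above, with four power
supplies of which two must stay powered. The UPS names and the split of
two supplies per UPS are assumptions made for illustration; the
username, password and type are the placeholder values used in the
MONITOR example in this manual. The total power value is 4. If one UPS
goes critical, the remaining power value is 2, which still meets
MINSUPPLIES; if both go critical, it drops to 0 and a shutdown follows.
+
----
# Each UPS feeds two of the four power supplies (hypothetical layout)
MONITOR ups-a@localhost 2 monmaster blah master
MONITOR ups-b@localhost 2 monmaster blah master

MINSUPPLIES 2
----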

*MONITOR* 'system' 'powervalue' 'username' 'password' 'type'::

Each UPS that you need to monitor should have a MONITOR line. Not
all of these need to supply power to the system that is running upsmon.
You may monitor other systems if you want to be able to send
notifications about status changes on them.

You must have at least one MONITOR directive in `upsmon.conf`.

'system' is a UPS identifier. It is in this form:

+<upsname>[@<hostname>[:<port>]]+

The default hostname is "localhost". Some examples:

- "su700@mybox" means a UPS called "su700" on a system called "mybox".
This is the normal form.
- "fenton@bigbox:5678" is a UPS called "fenton" on a system called
"bigbox" which runs linkman:upsd[8] on port "5678".

'powervalue' is an integer representing the number of power supplies
that the UPS feeds on this system. Most normal computers have one power
supply, and the UPS feeds it, so this value will be 1. You need a very
large or special system to have anything higher here.

You can set the 'powervalue' to 0 if you want to monitor a UPS that
doesn't actually supply power to this system. This is useful when you
want to have upsmon do notifications about status changes on a UPS
without shutting down when it goes critical.

The 'username' and 'password' on this line must match an entry
in that system's linkman:upsd.users[5]. If your username is "monmaster"
and your password is "blah", the MONITOR line might look like this:

+MONITOR myups@bigserver 1 monmaster blah master+

Meanwhile, the `upsd.users` on `bigserver` would look like this:

    [monmaster]
        password = blah
        upsmon master # (or slave)

The 'type' refers to the relationship with linkman:upsd[8]. It can
be either "master" or "slave". See linkman:upsmon[8] for more information
on the meaning of these modes. The mode you pick here also goes in
the `upsd.users` file, as seen in the example above.
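
The MONITOR lines below sketch the other two common cases, reusing the
UPS identifiers from the examples above. The credentials are the same
placeholders as before, and the choice of "slave" for both lines is an
assumption for illustration; each pair must exist in the `upsd.users`
of the host being contacted.

----
# Notifications only: this UPS does not power the local system
MONITOR su700@mybox 0 monmaster blah slave

# This system is fed by a UPS attached to bigbox, with upsd on port 5678
MONITOR fenton@bigbox:5678 1 monmaster blah slave
----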

*NOCOMMWARNTIME* 'seconds'::

upsmon will trigger a NOTIFY_NOCOMM after this many seconds if it can't
reach any of the UPS entries in this configuration file. It keeps
warning you until the situation is fixed. By default this is 300
seconds.

*NOTIFYCMD* 'command'::

upsmon calls this to send messages when things happen.
+
This command is called with the full text of the message as one
argument. The environment string NOTIFYTYPE will contain the type
string of whatever caused this event to happen.
+
If you need to use linkman:upssched[8], then you must make it your
NOTIFYCMD by listing it here.
+
Note that this is only called for NOTIFY events that have EXEC set with
NOTIFYFLAG. See NOTIFYFLAG below for more details.
+
Making this some sort of shell script might not be a bad idea. For
more information and ideas, see pager.txt in the docs directory.
+
Remember, this command also needs to be one element in the configuration file,
so if your command has spaces, then wrap it in quotes.
+
+NOTIFYCMD "/path/to/script --foo --bar"+
+
This script is run in the background--that is, upsmon forks before it
calls out to start it. This means that your NOTIFYCMD may have multiple
instances running simultaneously if a lot of stuff happens all at once.
Keep this in mind when designing complicated notifiers.

*NOTIFYMSG* 'type' 'message'::
upsmon comes with a set of stock messages for various events. You can
change them if you like.

    NOTIFYMSG ONLINE "UPS %s is getting line power"

    NOTIFYMSG ONBATT "Someone pulled the plug on %s"
+
Note that +%s+ is replaced with the identifier of the UPS in question.
+
The message must be one element in the configuration file, so if it
contains spaces, you must wrap it in quotes.

    NOTIFYMSG NOCOMM "Someone stole UPS %s"
+
Possible values for 'type':

ONLINE;; UPS is back online

ONBATT;; UPS is on battery

LOWBATT;; UPS is on battery and has a low battery (is critical)

FSD;; UPS is being shut down by the master (FSD = "Forced Shutdown")

COMMOK;; Communications established with the UPS

COMMBAD;; Communications lost to the UPS

SHUTDOWN;; The system is being shut down

REPLBATT;; The UPS battery is bad and needs to be replaced

NOCOMM;; A UPS is unavailable (can't be contacted for monitoring)

*NOTIFYFLAG* 'type' 'flag'[\+'flag'][+'flag']...::
By default, upsmon sends walls (global messages to all logged in users)
via /bin/wall and writes to the syslog when things happen. You can
change this.
+
Examples:
+
    NOTIFYFLAG ONLINE SYSLOG
    NOTIFYFLAG ONBATT SYSLOG+WALL+EXEC
+
Possible values for the flags:
+
SYSLOG;; Write the message to the syslog

WALL;; Write the message to all users with /bin/wall

EXEC;; Execute NOTIFYCMD (see above) with the message

IGNORE;; Don't do anything
+
If you use IGNORE, don't use any other flags on the same line.
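+
The notification directives work together. As a sketch, the fragment
below routes on-battery and back-online events to a notifier script and
silences battery replacement reminders. The script path is the
placeholder from the NOTIFYCMD entry; the choice of events and the
message text are assumptions for illustration.
+
----
NOTIFYCMD "/path/to/script --foo --bar"

NOTIFYMSG ONBATT "UPS %s is running on battery"

# EXEC is what makes upsmon call NOTIFYCMD for an event
NOTIFYFLAG ONBATT SYSLOG+WALL+EXEC
NOTIFYFLAG ONLINE SYSLOG+EXEC
NOTIFYFLAG REPLBATT IGNORE
----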

*POLLFREQ* 'seconds'::

Normally upsmon polls the linkman:upsd[8] server every 5 seconds. If this
is flooding your network with activity, you can make it higher. You can
also make it lower to get faster updates in some cases.
+
There are some catches. First, if you set the POLLFREQ too high, you
may miss short-lived power events entirely. You also risk triggering
the DEADTIME (see above) if you use a very large number.
+
Second, there is a point of diminishing returns if you set it too low.
While upsd normally has all of the data available to it instantly, most
drivers only refresh the UPS status once every 2 seconds. Polling more
often than that usually doesn't get you the information any faster.

*POLLFREQALERT* 'seconds'::

This is the interval that upsmon waits between polls if any of its UPSes
are on battery. You can use this along with POLLFREQ above to slow down
polls during normal behavior, but get quicker updates when something bad
happens.
+
This should always be equal to or lower than the POLLFREQ value. By
default it is also set to 5 seconds.
+
The warnings from the POLLFREQ entry about too-high and too-low values
also apply here.

*POWERDOWNFLAG* 'filename'::
When running in master mode, upsmon creates this file when the UPS needs
to be powered off. You should check for this file in your shutdown
scripts and call `upsdrvctl shutdown` if it exists.
+
This is done to forcibly reset the slaves, so they don't get stuck at
the "halted" stage even if the power returns during the shutdown
process. This usually does not work well on contact-closure UPSes that
use the genericups driver.
+
See the shutdown.txt file in the docs subdirectory for more information.

*RBWARNTIME* 'seconds'::

When a UPS says that it needs to have its battery replaced, upsmon will
generate a NOTIFY_REPLBATT event. By default, this happens every 43200
seconds (12 hours).
+
If you need another value, set it here.

*RUN_AS_USER* 'username'::
upsmon normally runs the bulk of the monitoring duties under another user
ID after dropping root privileges. On most systems this means it runs
as "nobody", since that's the default from compile-time.
+
The catch is that "nobody" can't read your upsmon.conf, since by default
it is installed so that only root can open it. This means you won't be
able to reload the configuration file, since it will be unavailable.
+
The solution is to create a new user just for upsmon, then make it run
as that user. I suggest "nutmon", but you can use anything that isn't
already taken on your system. Just create a regular user with no special
privileges and an impossible password.
+
Then, tell upsmon to run as that user, and make `upsmon.conf` readable by it.
Your reloads will work, and your config file will stay secure.
+
This file should not be writable by the upsmon user, as it would be
possible to exploit a hole, change the SHUTDOWNCMD to something
malicious, then wait for upsmon to be restarted.

*SHUTDOWNCMD* 'command'::
upsmon runs this command when the system needs to be brought down. If
it is a slave, it will do that immediately whenever the current overall
power value drops below the MINSUPPLIES value above.
+
When upsmon is a master, it will allow any slaves to log out before
starting the local shutdown procedure.
+
Note that the command needs to be one element in the config file. If
your shutdown command includes spaces, then put it in quotes to keep it
together, e.g.:

    SHUTDOWNCMD "/sbin/shutdown -h +0"
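+
Pulling the directives together, a minimal single-UPS configuration for
a master might look like the sketch below. It reuses the example values
and stated defaults from this manual. The "nutmon" user is the one
suggested under RUN_AS_USER and must exist on your system; the
POWERDOWNFLAG path is an assumed location, not a default stated here.
+
----
RUN_AS_USER nutmon

MONITOR myups@bigserver 1 monmaster blah master
MINSUPPLIES 1

SHUTDOWNCMD "/sbin/shutdown -h +0"
# Assumed path; use whatever file your shutdown scripts check for
POWERDOWNFLAG /etc/killpower

POLLFREQ 5
POLLFREQALERT 5
HOSTSYNC 15
DEADTIME 15
FINALDELAY 5
----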

SEE ALSO
--------

linkman:upsmon[8], linkman:upsd[8], linkman:nutupsdrv[8].

Internet resources:
~~~~~~~~~~~~~~~~~~~

The NUT (Network UPS Tools) home page: http://www.networkupstools.org/