Desc: Configuring automatic UPS shutdowns
File: shutdown.txt
Date: 24 August 2003
Auth: Russell Kroll <rkroll@exploits.org>

Shutdown design
===============

When your UPS batteries get low, the operating system needs to be brought
down cleanly. Also, the UPS load should be turned off so that all devices
that are attached to it are forcibly rebooted.

Here are the steps that occur when a critical power event happens:

 1. The UPS goes on battery.

 2. The UPS reaches low battery (a "critical" UPS).

 3. The upsmon master notices and sets "FSD", the "forced shutdown"
    flag, to tell all slave systems that it will soon power down the load.

    (If you have no slaves, skip to step 6.)

 4. upsmon slave systems see "FSD" and:

    - generate a NOTIFY_SHUTDOWN event
    - wait FINALDELAY seconds - typically 5
    - call their SHUTDOWNCMD
    - disconnect from upsd

 5. The upsmon master system waits up to HOSTSYNC seconds (typically 15)
    for the slaves to disconnect from upsd. If any are still connected
    after this time, upsmon stops waiting and proceeds with the shutdown
    process.

 6. The upsmon master:

    - generates a NOTIFY_SHUTDOWN event
    - waits FINALDELAY seconds - typically 5
    - creates the POWERDOWNFLAG file - usually /etc/killpower
    - calls the SHUTDOWNCMD

 7. On most systems, init takes over, kills your processes, syncs and
    unmounts some filesystems, and remounts some read-only.

 8. init then runs your shutdown script. This checks for the
    POWERDOWNFLAG, finds it, and tells the UPS driver(s) to power off
    the load.

 9. The system loses power.

10. Time passes. The power returns, and the UPS switches back on.

11. All systems reboot and go back to work.

How you set it up
=================

 1. Make sure your POWERDOWNFLAG setting in upsmon.conf points somewhere
    reasonable. Specifically, the filesystem holding that file must still
    be mounted when your shutdown script runs.

 2. Edit your shutdown scripts to check for the POWERDOWNFLAG so they know
    when to power off the UPS. You must check for this file, as you don't
    want this to happen during normal shutdowns!

    You can use upsdrvctl to start the shutdown process in your UPS
    hardware. Use this script as an example, but change the paths to
    suit your system:

        if [ -f /etc/killpower ]
        then
            echo "Killing the power, bye!"
            /usr/local/ups/bin/upsdrvctl shutdown

            sleep 120

            # uh oh... the UPS poweroff failed!
            # you probably should reboot here to avoid getting stuck
            # *** see the section on power races below ***
        fi

    Make sure the filesystem containing upsdrvctl, ups.conf and your UPS
    driver(s) is mounted when the system gets to this point. Otherwise
    upsdrvctl won't be able to figure out what to do.

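As a rough sanity check for step 1, the sketch below reads the POWERDOWNFLAG path out of upsmon.conf and tests whether the directory meant to hold it exists. The function names and the default config path are assumptions made for illustration, not part of the UPS software, and an existing directory is only a hint that the filesystem is mounted, not proof.

```shell
# Assumed location of upsmon.conf; change to suit your system.
UPSMON_CONF="${UPSMON_CONF:-/usr/local/ups/etc/upsmon.conf}"

# Print the POWERDOWNFLAG path from the given upsmon.conf.
# Sketch only: comment lines are skipped because their first field is
# not POWERDOWNFLAG; paths containing spaces are not handled.
powerdownflag_path() {
    awk '$1 == "POWERDOWNFLAG" { gsub(/"/, "", $2); print $2; exit }' "$1"
}

# Succeed only if the directory that will hold the flag file exists.
powerdownflag_dir_ok() {
    flag=$(powerdownflag_path "$1")
    [ -n "$flag" ] && [ -d "$(dirname "$flag")" ]
}
```

Calling powerdownflag_dir_ok "$UPSMON_CONF" from the shutdown script, just before the killpower test, gives an early warning if the flag location has gone away.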
RAID warning
============

NOTE: If you run any sort of RAID equipment, make sure your arrays
are either halted (if possible) or switched to "read-only" mode.
Otherwise you may suffer a long resync once the system comes back up.

The kernel may never run its final shutdown procedure, so you
must take care of all array shutdowns in userspace before upsdrvctl
runs.

If you use software RAID (md) on Linux, get mdadm and try using
'mdadm --readonly' to put your arrays in a safe state. This has to
happen after your shutdown scripts have remounted the filesystems
read-only.

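A minimal sketch of the md case, assuming the arrays appear in /proc/mdstat under their plain kernel names (md0, md1, ...) and that mdadm lives in /sbin. The function names and both paths are illustrative assumptions; named arrays under /dev/md/ would need different handling.

```shell
# Path to the kernel's md status file; overridable for testing.
MDSTAT="${MDSTAT:-/proc/mdstat}"
# Assumed mdadm location; change to suit your system.
MDADM="${MDADM:-/sbin/mdadm}"

# List md devices found in an mdstat-style file, one /dev path per line.
list_md_arrays() {
    awk '/^md[^ ]* :/ { print "/dev/" $1 }' "$1"
}

# Try to switch every listed array to read-only and report failures
# (for example, an array that still has a writable filesystem mounted).
md_arrays_readonly() {
    list_md_arrays "$MDSTAT" | while read -r dev
    do
        "$MDADM" --readonly "$dev" || echo "could not set $dev read-only"
    done
}
```

In a shutdown script, md_arrays_readonly would run after the read-only remounts and before the upsdrvctl call.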
On hardware RAID or other kernels, you have to do some detective work.
It may be necessary to contact the vendor or the author of your
driver to find out how to put the array in a state where a power loss
won't leave it "dirty".

My understanding is that 3ware devices on Linux will be fine unless
there are pending writes. Make sure your filesystems are remounted
read-only and you should be covered.

Multiple UPS shutdowns
======================

If you have multiple UPSes connected to your system, chances are that you
need to shut them down in a specific order. The goal is to first shut down
every UPS except the one keeping upsmon alive, then shut that one down
last.

To set the order in which your UPSes receive the shutdown commands, define
the "sdorder" value in your ups.conf.

    [bigone]
        driver = apcsmart
        port = /dev/ttyS0
        sdorder = 2

    [littleguy]
        driver = bestups
        port = /dev/ttyS1
        sdorder = 1

    [misc]
        driver = megatec
        port = /dev/ttyS2
        sdorder = 0

The order runs from 0 to the highest number available. So, for this
configuration, the order of shutdowns would be misc, littleguy, and then
bigone.

If you have a UPS that shouldn't be shut down when running "upsdrvctl
shutdown", set its sdorder to -1.

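To make the ordering rule concrete, here is a small sketch that reads an ups.conf-style file and prints the UPS names in the order described above. It illustrates the rule only and is not how upsdrvctl is implemented. The function name is made up, and two behaviors are assumptions: a UPS with no sdorder line is treated as 0, and any negative sdorder is treated as "skip".

```shell
# Print UPS names from an ups.conf-style file in shutdown order.
# Assumptions: a UPS with no sdorder line counts as 0, and any
# negative sdorder means "skip this UPS".
ups_shutdown_order() {
    awk '
        function emit() { if (name != "" && order >= 0) print order, name }
        /^[ \t]*\[.*\][ \t]*$/ {
            emit()
            name = $0
            sub(/^[ \t]*\[/, "", name)
            sub(/\][ \t]*$/, "", name)
            order = 0
            next
        }
        /^[ \t]*sdorder[ \t]*=/ { split($0, kv, "="); order = kv[2] + 0 }
        END { emit() }
    ' "$1" | sort -n | cut -d " " -f 2
}
```

Run against the example configuration, it prints misc, littleguy and bigone, one per line.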
Testing shutdowns
=================

To see how upsdrvctl will behave without actually turning off power, use
the -t argument, as in "upsdrvctl -t shutdown". It will display the
sequence without actually calling the drivers.

Other issues
============

You may delete the POWERDOWNFLAG in the startup scripts, but it is not
necessary. upsmon will clear that file for you when it starts.

Remember that some operating systems unmount a good number of filesystems
when going into read-only mode. If the UPS software is installed to /usr
and /usr is not mounted, your shutdowns will fail. If this happens, either
make sure it stays mounted at shutdown, or install to another partition.

Power races
===========

The power may return while the shutdown process is still under way.
This is known as a race. Here's how we handle it.

"Smart" UPSes typically handle this by using a command that forces the UPS
to power the load off and back on. This way, you are assured that the
systems will restart even if the power returns at the worst possible
moment.

Contact closure units (a la genericups), on the other hand, have the
potential for a race when feeding multiple systems. This is due to the
design of most contact closure UPSes. Typically, the "kill power" line
only functions when running on battery. As a result, if the line power
returns during the shutdown process, there is no way to power down the
load.

The workaround is to force your systems to reboot after some
interval. This way, they won't be stuck in the halted state with the UPS
running on line power.

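The workaround above can be sketched by filling in the reboot that the example shutdown script leaves as a comment. The function name, the variables, their defaults and the choice of /sbin/reboot are all assumptions for illustration; late in shutdown your system may need a forced variant of the reboot command.

```shell
# Paths and timings are assumptions; change them to suit your system.
POWERDOWNFLAG="${POWERDOWNFLAG:-/etc/killpower}"
UPSDRVCTL="${UPSDRVCTL:-/usr/local/ups/bin/upsdrvctl}"
KILL_WAIT="${KILL_WAIT:-120}"    # seconds to wait for the UPS to cut power
REBOOT_CMD="${REBOOT_CMD:-/sbin/reboot}"

# Ask the UPS to cut power. If we are still running after KILL_WAIT
# seconds, assume line power came back and reboot instead of hanging.
kill_power_or_reboot() {
    [ -f "$POWERDOWNFLAG" ] || return 0    # normal shutdown: do nothing

    echo "Killing the power, bye!"
    "$UPSDRVCTL" shutdown

    sleep "$KILL_WAIT"

    echo "UPS did not cut power; rebooting"
    "$REBOOT_CMD"
}
```

The wait has to be longer than the delay your UPS takes to drop the load, otherwise the reboot fires while a working poweroff is still pending.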
Testing power races
===================

The easiest way to see if your configuration will handle a power race
successfully is to do 'upsmon -c fsd'. This will force the UPS software
to shut down as if it had an OB+LB (on battery, low battery) situation,
and your shutdown script should call the UPS driver(s) in shutdown mode.

If everything works correctly, the computer will be forcibly powered off,
may remain off for a few seconds to a few minutes (depending on the
driver and UPS type), then will power on again.

If your UPS just sits there and never resets the load, you are vulnerable
to the above power race and should add the "reboot after timeout" hack
at the very least.

Know your hardware
==================

UPS equipment varies from manufacturer to manufacturer and even within
model lines. You should test the shutdown sequence on your systems before
leaving them unattended. A successful sequence is one where the OS halts
before the battery runs out, and the system restarts when power returns.

One more tip
============

If your UPS powers up immediately after a power failure instead of
waiting for the batteries to recharge, you can rig up a little hack to
handle it in software.

Essentially, you need to test for the POWERDOWNFLAG in your *startup*
scripts while the filesystems are still read-only. If it's there, you
know your last shutdown was caused by a power failure and the UPS
battery is probably still quite weak.

In this situation, your best bet is to sleep it off. Pausing in your
startup script to let the batteries recharge with the filesystems in a
safe state is recommended. This way, if the power goes out again, you
won't face a situation where there's not enough battery capacity left
for upsmon to do its thing.

Exactly how long to wait is a function of your UPS hardware, and will
require careful testing.

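A minimal sketch of that startup check follows. The function name and the RECHARGE_WAIT variable are hypothetical, and the 600-second default is only a placeholder, since the right pause has to come from testing your own UPS.

```shell
# Flag path is an assumption; it must match POWERDOWNFLAG in upsmon.conf.
POWERDOWNFLAG="${POWERDOWNFLAG:-/etc/killpower}"
# Hypothetical recharge pause in seconds; placeholder value only.
RECHARGE_WAIT="${RECHARGE_WAIT:-600}"

# Run early in startup, while filesystems are still read-only.
wait_for_recharge() {
    if [ -f "$POWERDOWNFLAG" ]
    then
        echo "Last shutdown was a power failure; waiting ${RECHARGE_WAIT}s for the UPS to recharge"
        sleep "$RECHARGE_WAIT"
    fi
}
```

This has to run before upsmon starts, because upsmon clears the flag file on startup and the check would then never trigger.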
If this is too evil for you, buy another kind of UPS that will wait for
a minimum amount of charge, a minimum amount of time, or both.