220 lines
7.9 KiB
Plaintext
220 lines
7.9 KiB
Plaintext
Desc: Network UPS Tools design document
|
|
File: design.txt
|
|
Date: 14 March 2004
|
|
Auth: Russell Kroll <rkroll@exploits.org>
|
|
|
|
This software is designed around a layered scheme with drivers, a
|
|
server and clients. These layers communicate with text-based
|
|
protocols for easier maintenance and diagnostics.
|
|
|
|
The layering
|
|
============
|
|
|
|
CLIENTS: upsmon, upsc, upsrw, upsstats, upsset, etc. (via upsclient)
|
|
|
|
< network: TCP sockets, typically on port 3493 >
|
|
|
|
SERVER: upsd
|
|
|
|
< Unix domain sockets with text-based messages >
|
|
|
|
DRIVERS: apcsmart, bestups, powercom, etc.
|
|
|
|
< serial communications, SNMP, USB, etc. >
|
|
|
|
EQUIPMENT: Smart-UPS 700, Fenton PowerPal 660, etc. (actual UPS hardware)
|
|
|
|
How information gets around
|
|
===========================
|
|
|
|
From the equipment
|
|
------------------
|
|
|
|
DRIVERS talk to the EQUIPMENT and receive updates. For most hardware this
|
|
is polled (DRIVER asks EQUIPMENT about a variable), but forced updates are
|
|
also possible. The exact method is not important, as it is abstracted
|
|
by the driver.
|
|
|
|
From the driver
|
|
---------------
|
|
|
|
The core of all DRIVERS maintains internal storage for every variable
|
|
that is known along with the auxiliary data for those variables. It
|
|
sends updates to this data to any process which connects to the Unix
|
|
domain socket.
|
|
|
|
The DRIVERS will also provide a full atomic copy of their internal
|
|
knowledge upon receiving the "DUMPALL" command on the socket. The dump
|
|
is in the same format as updates, and is followed by "DUMPDONE". When
|
|
"DUMPDONE" has been received, the view is complete.
|
|
|
|
The SERVER will connect to the socket of each DRIVER and will request a
|
|
dump at that time. It retains this data in local storage for later use.
|
|
It continues to listen on the socket for additional updates.
|
|
|
|
This protocol is documented in sock-protocol.txt.
|
|
|
|
From the server
|
|
---------------
|
|
|
|
The SERVER's internal storage maintains a complete copy of the data
|
|
which is in the DRIVER, so it is capable of answering any request
|
|
immediately. When a request for data arrives from a CLIENT, the SERVER
|
|
looks through the internal storage for that UPS and returns the
|
|
requested data if it is available.
|
|
|
|
The format for requests from the CLIENT is documented in protocol.txt.
|
|
|
|
Instant commands
|
|
================
|
|
|
|
Instant commands is the term given to a set of actions that result in
|
|
something happening to the UPS. Some of the common ones are
|
|
test.battery.start to initiate a battery test and test.panel.start to
|
|
test the front panel of the UPS.
|
|
|
|
They are passed to the SERVER from a CLIENT using an authenticated
|
|
network connection. The SERVER first checks to make sure that the instant
|
|
command is valid for the DRIVER. If it's supported, a message is sent
|
|
via a socket to the DRIVER containing the command and any auxiliary
|
|
information.
|
|
|
|
At this point, there is no confirmation to the SERVER of the command's
|
|
execution. This is (still) planned for a future release. This has been
|
|
delayed since returning a response involves some potentially interesting
|
|
timing issues. Remember that upsd services clients in a round-robin
|
|
fashion, so all queries must be lightweight and speedy.
|
|
|
|
Setting variables
|
|
=================
|
|
|
|
Some variables in the DRIVER or EQUIPMENT can be changed, and carry the
|
|
FLAG_RW flag. Upon receiving a SET command from the CLIENT, the SERVER
|
|
first verifies that it is valid for that DRIVER in terms of writability
|
|
and data type. If those checks pass, it then sends the SET command
|
|
through the socket, much like the instant command design.
|
|
|
|
The DRIVER is expected to commit the value to the EQUIPMENT and update
|
|
its internal representation of that variable.
|
|
|
|
Like the instant commands, there is currently no acknowledgement of the
|
|
command's completion from the DRIVER. This, too, is planned for a future
|
|
release.
|
|
|
|
Example data path
|
|
=================
|
|
|
|
Here's the path a piece of data might take through this architecture.
|
|
The event is a UPS going on battery, and the final result is a pager
|
|
delivering the alpha message to the admin.
|
|
|
|
1. EQUIPMENT reports on battery by setting flag in status register
|
|
|
|
2. DRIVER notices this flag and stores it in the ups.status variable as
|
|
OB. This update gets pushed out to any listeners via the sockets.
|
|
|
|
3. SERVER upsd sees activity on the socket, reads it, parses it, and
|
|
commits the new data to its local version of the status variable.
|
|
|
|
4. CLIENT upsmon does a routine poll of SERVER for "ups.status" and
|
|
gets "OB".
|
|
|
|
5. CLIENT upsmon then invokes its NOTIFYCMD which is upssched.
|
|
|
|
6. upssched starts up a daemon to handle a timer which will expire about
|
|
30 seconds into the future.
|
|
|
|
7. 30 seconds later, the timer expires since the UPS is still on battery,
|
|
and upssched calls the CMDSCRIPT upssched-cmd.
|
|
|
|
8. upssched-cmd parses the args and calls sendmail.
|
|
|
|
9. Avian carriers, smoke signals, SMTP, and some magic result in the
|
|
message getting from the pager company's gateway to a transmitter
|
|
and then to the admin's pager.
|
|
|
|
This scenario requires some configuration, obviously:
|
|
|
|
1. There's a UPS driver running.
|
|
(Whatever applies for the hardware)
|
|
|
|
2. upsd has a valid UPS entry in ups.conf for this UPS.
|
|
|
|
[myups]
|
|
driver = upsdriver
|
|
port = /dev/ttySx
|
|
|
|
3. upsd has a valid user for upsmon in upsd.users.
|
|
|
|
[monuser]
|
|
password = somepass
|
|
upsmon master
|
|
|
|
4. upsmon is set to monitor this UPS in upsmon.conf.
|
|
|
|
MONITOR myups@localhost 1 monuser somepass master
|
|
|
|
5. upsmon is set to EXEC the NOTIFYCMD for the ONBATT condition in
|
|
upsmon.conf.
|
|
|
|
NOTIFYFLAG ONBATT EXEC
|
|
|
|
6. upsmon calls upssched as the NOTIFYCMD in upsmon.conf.
|
|
|
|
NOTIFYCMD /path/to/upssched
|
|
|
|
7. upssched has a 30 second timer for ONBATT in upssched.conf.
|
|
|
|
AT ONBATT * START-TIMER upsonbatt 30
|
|
|
|
8. upssched calls upssched-cmd as the CMDSCRIPT in upssched.conf.
|
|
|
|
CMDSCRIPT /path/to/upssched-cmd
|
|
|
|
8. upssched-cmd knows what to do with "upsonbatt" as its first argument
|
|
(A quick case..esac construct, see the examples)
|
|
|
|
==============================================================================
|
|
|
|
History
|
|
=======
|
|
|
|
The oldest versions of this software (1998) had no separation between
|
|
the driver and the network server and only supported the latest APC
|
|
Smart-UPS hardware as a result. The network protocol used brittle
|
|
binary structs. This had numerous bad implications for compatibility
|
|
and portability.
|
|
|
|
After the driver and server were separated, data was shared through the
|
|
state file concept. Status was written into a static array (the "info
|
|
array") by drivers, and that array was stored on disk. upsd would
|
|
periodically read that file into a local copy of that array.
|
|
|
|
Shared memory mode was added a bit later, and that removed some of the
|
|
lag from the status updates. Unfortunately, it didn't have any locking
|
|
originally, and the possibility for corruption due to races existed.
|
|
|
|
mmap() support was added at some point after that, and became the
|
|
default. The drivers and upsd would mmap() the file into memory and
|
|
read or write from it. Locking was done using the state file as the
|
|
token, so contention problems were avoided. This method was relatively
|
|
quick, but it involved at least 3 copies of the data (driver, disk/mmap,
|
|
server) and a whole lot of locking and unlocking. It could occasionally
|
|
delay the driver or server when waiting for a lock.
|
|
|
|
In April 2003, the entire state management subsystem was removed and
|
|
replaced with a single local socket. The drivers listen for
|
|
connections and push updates asynchronously to any listeners. They also
|
|
recognize a few commands. Drivers also dampen updates, and only push
|
|
them out when something actually changes.
|
|
|
|
As a result, upsd no longer has to poll any files on the disk, and can
|
|
just select() all of its fds and wait for activity. When one of them is
|
|
active, it reads the fd and parses the results. Updates from the
|
|
hardware now get to upsd about as fast as they possibly can.
|
|
|
|
Drivers used to call setinfo() to change the local array, and then would
|
|
call writeinfo() to push the array onto the disk, or into the
|
|
mmap/shared memory space. This introduced a lag since many drivers poll
|
|
quite a few variables during an update.
|