NUT design document
===================

This software is designed around a layered scheme with drivers, a
server and clients. These layers communicate with text-based
protocols for easier maintenance and diagnostics.

The layering
------------

image:images/nut_layering.png[NUT layering]

How information gets around
---------------------------

From the equipment
~~~~~~~~~~~~~~~~~~

DRIVERS talk to the EQUIPMENT and receive updates. For most hardware this
is polled (DRIVER asks EQUIPMENT about a variable), but forced updates are
also possible. The exact method is not important, as it is abstracted
by the driver.

From the driver
~~~~~~~~~~~~~~~

The core of all DRIVERS maintains internal storage for every variable
that is known, along with the auxiliary data for those variables. It
sends updates to this data to any process that connects to its Unix
domain socket.

The DRIVERS will also provide a full atomic copy of their internal
knowledge upon receiving the "DUMPALL" command on the socket. The dump
is in the same format as updates, and is followed by "DUMPDONE". When
"DUMPDONE" has been received, the view is complete.
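
As an illustration of the driver side, here is a minimal Python sketch (not NUT's C implementation) of a state table that pushes changes to connected listeners and answers "DUMPALL". It assumes update lines of the form `SETINFO <name> "<value>"`, loosely modelled on link:sock-protocol.txt[]; the class and method names are hypothetical, listeners are plain callables instead of socket connections, and values are not escaped. It also skips unchanged values, in line with the update dampening described in the History section.

```python
# Hypothetical sketch of a driver's state table (not NUT's dstate.c).
# Update lines loosely follow sock-protocol.txt; the exact wire format,
# including quoting/escaping of values, is an assumption here.

class DriverState:
    def __init__(self):
        self.vars = {}       # variable name -> current value
        self.listeners = []  # one callable per connected socket client

    def connect(self, send):
        """Register a listener; it will receive every later update."""
        self.listeners.append(send)

    def setinfo(self, name, value):
        """Store a value and push it to listeners only if it changed."""
        if self.vars.get(name) == value:
            return  # dampening: an unchanged value is not re-sent
        self.vars[name] = value
        for send in self.listeners:
            send(f'SETINFO {name} "{value}"')

    def dumpall(self, send):
        """Answer DUMPALL: every variable in update format, then DUMPDONE."""
        for name, value in sorted(self.vars.items()):
            send(f'SETINFO {name} "{value}"')
        send("DUMPDONE")


received = []
drv = DriverState()
drv.setinfo("ups.status", "OL")   # no listeners yet
drv.connect(received.append)
drv.setinfo("ups.status", "OL")   # unchanged: nothing is sent
drv.setinfo("ups.status", "OB")   # changed: one SETINFO line
drv.dumpall(received.append)      # full view, then DUMPDONE
print(received)
# ['SETINFO ups.status "OB"', 'SETINFO ups.status "OB"', 'DUMPDONE']
```

Because the dump is produced in the same format as ordinary updates, a listener can use a single parser for both.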

The SERVER will connect to the socket of each DRIVER and will request a
dump at that time. It retains this data in local storage for later use.
It continues to listen on the socket for additional updates.

This protocol is documented in link:sock-protocol.txt[].
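
The SERVER's side of that exchange can be sketched in the same way. This hypothetical `DriverMirror` class sends "DUMPALL" once connected, folds each `SETINFO` line into a local dictionary, and marks the view complete on "DUMPDONE". The line format is an assumption based on link:sock-protocol.txt[], and the parser ignores escaped quotes and every other message type.

```python
# Hypothetical sketch of the SERVER's copy of one driver's data.
# Only SETINFO and DUMPDONE are handled; escaped quotes are not.

class DriverMirror:
    def __init__(self):
        self.vars = {}         # local copy: variable name -> value
        self.complete = False  # True once DUMPDONE has been received

    def on_connect(self, send):
        send("DUMPALL")        # ask for the full dump right after connecting

    def on_line(self, line):
        """Apply one line read from the driver socket."""
        if line == "DUMPDONE":
            self.complete = True
            return
        parts = line.split(" ", 2)
        if len(parts) == 3 and parts[0] == "SETINFO":
            value = parts[2]
            if len(value) >= 2 and value[0] == value[-1] == '"':
                value = value[1:-1]  # strip the surrounding quotes
            self.vars[parts[1]] = value
        # anything else is ignored in this sketch


sent = []
mirror = DriverMirror()
mirror.on_connect(sent.append)
for line in ['SETINFO ups.status "OL"', 'SETINFO battery.charge "100"',
             "DUMPDONE",                 # initial view is now complete
             'SETINFO ups.status "OB"']: # later asynchronous update
    mirror.on_line(line)
print(sent, mirror.complete, mirror.vars)
# ['DUMPALL'] True {'ups.status': 'OB', 'battery.charge': '100'}
```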

From the server
~~~~~~~~~~~~~~~

The SERVER's internal storage maintains a complete copy of the data
which is in the DRIVER, so it is capable of answering any request
immediately. When a request for data arrives from a CLIENT, the SERVER
looks through the internal storage for that UPS and returns the
requested data if it is available.

The format for requests from the CLIENT is documented in link:protocol.txt[].
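
A minimal sketch of answering a request purely from that local copy follows. The `VAR` and `ERR` reply shapes are modelled on link:protocol.txt[] but should be treated as illustrative; `storage`, `get_var()` and `handle()` are hypothetical names, and a real server would also deal with authentication, other commands and stale data.

```python
# Hypothetical sketch: answering GET VAR from the SERVER's local copy only.
# Reply shapes are modelled on protocol.txt; treat them as illustrative.

def get_var(storage, ups, var):
    """storage maps a UPS name to a {variable name: value} dict."""
    if ups not in storage:
        return "ERR UNKNOWN-UPS"
    if var not in storage[ups]:
        return "ERR VAR-NOT-SUPPORTED"
    return f'VAR {ups} {var} "{storage[ups][var]}"'


def handle(storage, request):
    """Dispatch one request line; only GET VAR is modelled here."""
    words = request.split()
    if len(words) == 4 and words[:2] == ["GET", "VAR"]:
        return get_var(storage, words[2], words[3])
    return "ERR UNKNOWN-COMMAND"


storage = {"myups": {"ups.status": "OB", "battery.charge": "97"}}
print(handle(storage, "GET VAR myups ups.status"))  # VAR myups ups.status "OB"
print(handle(storage, "GET VAR myups ups.load"))    # ERR VAR-NOT-SUPPORTED
```

No driver round-trip happens here, which is what lets the SERVER reply immediately.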

Instant commands
----------------

"Instant commands" is the term given to a set of actions that result in
something happening to the UPS. Some of the common ones are
`test.battery.start` to initiate a battery test and `test.panel.start` to
test the front panel of the UPS.

They are passed to the SERVER from a CLIENT using an authenticated
network connection. The SERVER first checks to make sure that the instant
command is valid for the DRIVER. If it's supported, a message is sent
via a socket to the DRIVER containing the command and any auxiliary
information.
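
The check-then-forward step can be sketched as below. `handle_instcmd()`, its `extra` parameter and `send_to_driver` are hypothetical; the `INSTCMD` line sent to the driver and the `ERR CMD-NOT-SUPPORTED` reply are modelled on link:sock-protocol.txt[] and link:protocol.txt[], and authentication is assumed to have happened already.

```python
# Hypothetical sketch of the SERVER's instant-command check.
# The client is assumed to be authenticated already; send_to_driver
# stands in for writing a line to the driver's socket.

def handle_instcmd(supported, cmd, send_to_driver, extra=None):
    """Forward cmd to the driver if it announced it; return the client reply."""
    if cmd not in supported:
        return "ERR CMD-NOT-SUPPORTED"
    line = f"INSTCMD {cmd}" if extra is None else f"INSTCMD {cmd} {extra}"
    send_to_driver(line)
    # No confirmation from the driver is awaited: "OK" only means the
    # command was accepted and passed on.
    return "OK"


to_driver = []
cmds = {"test.battery.start", "test.panel.start"}
print(handle_instcmd(cmds, "test.battery.start", to_driver.append))  # OK
print(handle_instcmd(cmds, "shutdown.return", to_driver.append))     # ERR CMD-NOT-SUPPORTED
print(to_driver)  # ['INSTCMD test.battery.start']
```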

At this point, there is no confirmation to the SERVER of the command's
execution. This is (still) planned for a future release. It has been
delayed because returning a response involves some potentially tricky
timing issues. Remember that `upsd` services clients in a round-robin
fashion, so all queries must be lightweight and speedy.

NOTE: FIXME: Wasn't the "TRACKING" mechanism for "INSTCMD/SET VAR" introduced
to address just this? See https://github.com/networkupstools/nut/pull/659

Setting variables
-----------------

Some variables in the DRIVER or EQUIPMENT can be changed, and carry the
`FLAG_RW` flag. Upon receiving a SET command from the CLIENT, the SERVER
first verifies that it is valid for that DRIVER in terms of writability
and data type. If those checks pass, it then sends the SET command
through the socket, much as it does for instant commands.

The DRIVER is expected to commit the value to the EQUIPMENT and update
its internal representation of that variable.
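
The SERVER-side checks can be sketched the same way. The per-variable metadata below (`rw`, `type`, `maxlen`) is a hypothetical stand-in for the flags and auxiliary data a driver announces, the `ERR` names are modelled on link:protocol.txt[], and the forwarded `SET` line format is an assumption.

```python
# Hypothetical sketch of the SERVER's checks before forwarding a SET.
# The metadata layout and the forwarded line format are assumptions.

def handle_set(meta, var, value, send_to_driver):
    """Validate writability and data type, then forward to the driver."""
    info = meta.get(var)
    if info is None:
        return "ERR VAR-NOT-SUPPORTED"
    if not info["rw"]:
        return "ERR READONLY"            # variable lacks the writable flag
    if info["type"] == "string" and len(value) > info["maxlen"]:
        return "ERR TOO-LONG"
    if info["type"] == "number":
        try:
            float(value)
        except ValueError:
            return "ERR INVALID-VALUE"
    send_to_driver(f'SET {var} "{value}"')
    return "OK"                          # accepted, not yet confirmed


meta = {
    "ups.id": {"rw": True, "type": "string", "maxlen": 8},
    "ups.delay.shutdown": {"rw": True, "type": "number"},
    "ups.model": {"rw": False, "type": "string", "maxlen": 32},
}
sent = []
print(handle_set(meta, "ups.model", "X", sent.append))              # ERR READONLY
print(handle_set(meta, "ups.id", "far-too-long", sent.append))      # ERR TOO-LONG
print(handle_set(meta, "ups.delay.shutdown", "soon", sent.append))  # ERR INVALID-VALUE
print(handle_set(meta, "ups.delay.shutdown", "30", sent.append))    # OK
print(sent)  # ['SET ups.delay.shutdown "30"']
```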

As with instant commands, there is currently no acknowledgement of the
command's completion from the DRIVER. This, too, is planned for a future
release.

NOTE: FIXME: Wasn't the "TRACKING" mechanism for "INSTCMD/SET VAR" introduced
to address just this? See https://github.com/networkupstools/nut/pull/659

Example data path
-----------------

Here's the path a piece of data might take through this architecture.
The event is a UPS going on battery, and the final result is a pager
delivering an alphanumeric message to the admin.

1. EQUIPMENT reports that it is on battery by setting a flag in its status
register.

2. DRIVER notices this flag and stores it in the `ups.status` variable as
`OB`. This update gets pushed out to any listeners via the sockets.

3. SERVER `upsd` sees activity on the socket, reads it, parses it, and
commits the new data to its local version of the status variable.

4. CLIENT `upsmon` does a routine poll of the SERVER for `ups.status` and
gets `OB`.

5. CLIENT `upsmon` then invokes its `NOTIFYCMD`, which is `upssched`.

6. `upssched` starts up a daemon to handle a timer which will expire about
30 seconds into the future.

7. 30 seconds later, the UPS is still on battery and the timer expires, so
`upssched` calls the `CMDSCRIPT`, which is `upssched-cmd`.

8. `upssched-cmd` parses the args and calls `sendmail`.

9. Avian carriers, smoke signals, SMTP, and some magic result in the
message getting from the pager company's gateway to a transmitter
and then to the admin's pager.
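
Steps 4 and 5 boil down to noticing a change in `ups.status` between two polls. The hypothetical `status_events()` function below compares the previous and current status words and reports `ONBATT` or `ONLINE`, the notification names `upsmon` uses for these conditions. It ignores every other flag and notification type, so treat it as a sketch, not as `upsmon`'s actual logic.

```python
# Hypothetical sketch of steps 4 and 5: compare two polled ups.status
# values and report the matching notification names. Only the OB and
# OL flags are considered; real upsmon tracks many more conditions.

def status_events(previous, current):
    before, now = set(previous.split()), set(current.split())
    events = []
    if "OB" in now and "OB" not in before:
        events.append("ONBATT")  # would be handed to NOTIFYCMD (upssched)
    if "OL" in now and "OL" not in before:
        events.append("ONLINE")
    return events


polls = ["OL", "OL", "OB", "OB DISCHRG", "OL CHRG"]  # successive poll results
seen = []
for previous, current in zip(polls, polls[1:]):
    seen.extend(status_events(previous, current))
print(seen)  # ['ONBATT', 'ONLINE']
```

Working on status transitions rather than raw values means a UPS that stays on battery across many polls triggers only one notification.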

This scenario requires some configuration, obviously:

1. There's a UPS driver running (whatever applies for the hardware).

2. `upsd` has a valid UPS entry in 'ups.conf' for this UPS.

    [myups]
        driver = nutupsdrv
        port = /dev/ttySx

3. `upsd` has a valid user for `upsmon` in the 'upsd.users' file.

    [monuser]
        password = somepass
        upsmon primary

4. `upsmon` is set to monitor this UPS with this user in the 'upsmon.conf' file.

    MONITOR myups@localhost 1 monuser somepass primary

5. `upsmon` is set to `EXEC` the `NOTIFYCMD` for the `ONBATT` condition in
the 'upsmon.conf' file.

    NOTIFYFLAG ONBATT EXEC

6. `upsmon` calls `upssched` as the `NOTIFYCMD` in the 'upsmon.conf' file.

    NOTIFYCMD /path/to/upssched

7. `upssched` has a 30-second timer for `ONBATT` in the 'upssched.conf' file.

    AT ONBATT * START-TIMER upsonbatt 30

8. `upssched` calls `upssched-cmd` as the `CMDSCRIPT` in 'upssched.conf'.

    CMDSCRIPT /path/to/upssched-cmd

9. `upssched-cmd` knows what to do with the `upsonbatt` keyword as its first
argument (a quick `case..esac` construct, see the examples).
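
To make the timer behaviour concrete, here is a hypothetical Python model of steps 7 to 9, using a simulated clock instead of a real daemon, plus a stand-in for the `case..esac` dispatch in `upssched-cmd`. The `ONLINE`/`CANCEL-TIMER` rule is an added assumption (it is not part of the configuration listed above) and shows why a short outage never reaches the admin; all class and function names are made up for the example.

```python
# Hypothetical model of upssched-style timers with a simulated clock.
# Rules mirror "AT <notifytype> * START-TIMER|CANCEL-TIMER <name> [secs]";
# the ONLINE/CANCEL-TIMER rule is an assumed addition to the config above.

RULES = [
    ("ONBATT", "START-TIMER", "upsonbatt", 30),
    ("ONLINE", "CANCEL-TIMER", "upsonbatt", None),
]


class TimerDaemon:
    def __init__(self, cmdscript):
        self.timers = {}            # timer name -> expiry time in seconds
        self.cmdscript = cmdscript  # called with the timer name when it fires

    def notify(self, now, notifytype):
        for ntype, action, name, secs in RULES:
            if ntype != notifytype:
                continue
            if action == "START-TIMER":
                # this sketch leaves an already-running timer unchanged
                self.timers.setdefault(name, now + secs)
            elif action == "CANCEL-TIMER":
                self.timers.pop(name, None)

    def tick(self, now):
        for name, expiry in list(self.timers.items()):
            if now >= expiry:
                del self.timers[name]
                self.cmdscript(name)


def upssched_cmd(arg, outbox):
    # Stand-in for a shell "case $1 in upsonbatt) ... ;; esac" script.
    if arg == "upsonbatt":
        outbox.append("page admin: UPS on battery for 30 seconds")
    else:
        outbox.append(f"unrecognized argument: {arg}")


pages = []
daemon = TimerDaemon(lambda name: upssched_cmd(name, pages))
daemon.notify(0, "ONBATT")
daemon.tick(29)    # not expired yet
daemon.tick(30)    # expires: upssched-cmd runs
print(pages)       # ['page admin: UPS on battery for 30 seconds']

short = []         # a short outage: power returns before the timer fires
daemon2 = TimerDaemon(lambda name: upssched_cmd(name, short))
daemon2.notify(0, "ONBATT")
daemon2.notify(10, "ONLINE")
daemon2.tick(30)
print(short)       # []
```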

History
-------

The oldest versions of this software (1998) had no separation between
the driver and the network server, and only supported the latest APC
Smart-UPS hardware as a result. The network protocol used brittle
binary structs, which caused numerous compatibility and portability
problems.

After the driver and server were separated, data was shared through the
state file concept. Status was written into a static array (the "info
array") by drivers, and that array was stored on disk. `upsd` would
periodically read that file into a local copy of that array.

Shared memory mode was added a bit later, and that removed some of the
lag from the status updates. Unfortunately, it originally had no
locking, so data could be corrupted by races.

`mmap()` support was added at some point after that, and became the
default. The drivers and `upsd` would `mmap()` the file into memory and
read or write from it. Locking was done using the state file as the
token, so contention problems were avoided. This method was relatively
quick, but it involved at least 3 copies of the data (driver, disk/mmap,
server) and a whole lot of locking and unlocking. It could occasionally
delay the driver or server when waiting for a lock.

In April 2003, the entire state management subsystem was removed and
replaced with a single local socket per driver. The drivers listen for
connections and push updates asynchronously to any listeners. They also
recognize a few commands. Drivers also dampen updates, and only push
them out when something actually changes.

As a result, `upsd` no longer has to poll any files on the disk, and can
just `select()` all of its file descriptors (fds) and wait for activity.
When one of them is active, it reads the fd and parses the results.
Updates from the hardware now get to `upsd` about as fast as they possibly
can.

Drivers used to call `setinfo()` to change the local array, and then would
call `writeinfo()` to push the array onto the disk, or into the
mmap/shared memory space. This introduced a lag, since many drivers poll
quite a few variables during an update.