Testing without hardware

A simple script starts the various required and optional drasi components with a simulated trigger bus, so drasi can be tested (or just looked at) without any special hardware. Note that the script creates a tmux session with many panes, and should therefore be run in a terminal window that is wider and taller than normal, e.g. about 250x50 characters. (If the window is too small, some panes will not be created, and the corresponding components will be missing.) To run the script:

scripts/runsim.sh

It uses the tmux terminal multiplexer to operate and view the output of several programs at the same time. (Note that this is for testing; it is not how drasi is intended to be operated in normal use.)
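Since a window that is too small silently loses panes, it can be worth checking the terminal size before starting. The following is a minimal sketch in POSIX shell; the function name term_big_enough is hypothetical (not part of drasi), and the 250x50 threshold is simply the size suggested above. It uses tput to measure the terminal.

```shell
# Hypothetical helper (not part of drasi): succeed only if the terminal is at
# least 250 columns wide and 50 lines tall, the size suggested for runsim.sh.
# Optional arguments override the measured size (handy for testing).
term_big_enough() {
    cols=${1:-$(tput cols)}
    rows=${2:-$(tput lines)}
    [ "$cols" -ge 250 ] && [ "$rows" -ge 50 ]
}
```

For example, term_big_enough && scripts/runsim.sh only starts the simulation when the window is large enough.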

To detach from the tmux session, press C-b d, i.e. Ctrl+b followed by a lone d. (On the land account, C-b has apparently been replaced by C-a.) To cycle through the panes of a session, press C-b o repeatedly. To select a specific pane, press C-b q followed by the pane number (while the numbers are shown).
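After detaching, the session and its processes keep running in the background. Whether a session still exists can be checked from a shell with tmux's has-session command. The sketch below assumes a reasonably recent tmux (the "=" prefix requests an exact session-name match); the helper name is hypothetical, and the session name to pass is whatever runsim.sh uses by default ($SESSION in the script) or was given with --session.

```shell
# Hypothetical helper (not part of drasi): succeed if a tmux session with
# exactly the given name exists. has-session sets the exit status; its error
# message for a missing session (or missing server) is discarded.
sim_session_exists() {
    tmux has-session -t "=$1" 2>/dev/null
}
```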

Options

With no options, scripts/runsim.sh starts a master readout and one event builder. If a session already exists, it will be attached instead of a new one being created. Options:

--kill

Terminate the running session.

--slaves=N[,M...]

Create N slaves together with the master.

Given several comma-separated numbers, multiple independent trigger buses are set up, each with a master (and an event builder).

--no-eb

Do not use event builders for single masters.

--timesort

Create a time-sorter process that reads the data from all event builders.

--no-timesort2

Do not add the second (verification) time-sorter stage.

--trans=N

Start a transport server (allowing N clients) from the last event builder or time sorter process.

--stream[=N|hold]

Start a stream server (allowing N clients) from the last event builder or time sorter process.

With hold, all data must be sent to one client.

--firsttrans[=N|hold]

Start a transport server from the first stage.

--firststream[=N|hold]

Start a stream server from the first stage.

--level=LVL

Display more messages from the log file in the log pane. LVL is one of info, log or debug.

--segfault=N

Cause a segfault at event N in the master.

--trigrate=N

Set the trigger rate.

--event-size=N

Set the event size (master nodes).

--timeout=LIM

Since the tmux session can continue in the background, the processes are by default started with a 1 hour timeout limit.

This option changes the timeout, e.g. 20s or 2h.

--valgrind

Run all processes under valgrind.

--regress

Use f_user_regress instead of f_user_example.

--session=NAME

Name of session (default $SESSION).

--no-attach

Do not attach to session after creation.

Overview of terminals

The left column shows the master readout at the top, followed by any slaves, with the event builder process at the bottom.

The middle column shows the raw monitoring output of the corresponding process in the left column.

The right column has the trigger bus simulator at the top, followed by a rate meter (lwrocmon --rate) and the log file writer (lwrocmon --log), which connects to each readout and event builder process. At the bottom is the log file colouriser (lwroclog), tailing the log file.

Testing failure handling

drasi is designed such that the failure of any process shall not bring any other process down. (Readout may of course come to a momentary halt.) Once the failed process is up again, the others reconnect and operation resumes.

Aborting processes

This can be tested by selecting any pane and pressing Ctrl-C, followed by the Up arrow and Enter to restart the program. Note that some processes (mainly the event builder) catch Ctrl-C to perform an orderly shutdown, but do not yet manage to complete it. Pressing Ctrl-C twice more brings them down.

The exception to this rule is the trigger bus simulator, which is not meant to be killed. Killing it is roughly equivalent to all trigger modules causing a SIGBUS, and thus requires all readouts to be restarted as well.
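The Ctrl-C, Up arrow, Enter sequence for aborting and restarting a program can also be sent from another terminal using tmux send-keys. This is a sketch with a hypothetical helper name; the pane target (session:window.pane) depends on the session name and pane layout and is an assumption. Do not target the trigger bus simulator pane, and note that the event builder may need the additional Ctrl-C presses mentioned above before it exits.

```shell
# Hypothetical helper (not part of drasi): abort the program in one tmux pane
# and restart it, like pressing Ctrl-C, Up arrow and Enter by hand.
# $1 is a tmux target pane, e.g. "SESSION:0.2" (session:window.pane).
restart_pane() {
    tmux send-keys -t "$1" C-c || return 1
    sleep 1                      # give the program a moment to shut down
    tmux send-keys -t "$1" Up Enter
}
```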

Trigger bus mismatches

Upon receipt of SIGUSR1, the trigger bus simulator injects a MISMATCH into a randomly selected trigger module. This can be provoked by running (in another terminal window):

killall -USR1 trigbussim

The mismatch should then be detected and reported, and the readout sequence should restart itself.
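To provoke several mismatches in a row, the killall command can be wrapped in a small loop. This is a sketch with a hypothetical helper name, not the implementation of any drasi script; the pause between mismatches is a free parameter and should leave the readout time to recover. It assumes the Linux (psmisc) killall, which returns non-zero when no matching process is found.

```shell
# Hypothetical helper (not part of drasi): inject COUNT mismatches, PAUSE
# seconds apart, by sending SIGUSR1 to the trigger bus simulator.
# Returns non-zero as soon as no trigbussim process could be signalled.
inject_mismatches() {
    count=$1
    pause=$2
    i=0
    while [ "$i" -lt "$count" ]; do
        killall -USR1 trigbussim || return 1
        i=$((i + 1))
        sleep "$pause"
    done
}
```

For example, inject_mismatches 5 10 sends five mismatches ten seconds apart.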

Automatic failure testing

Repeatedly giving the readout components hiccups by hand is rather tedious. Instead, an automatic script can be run in a terminal alongside the simulation:

scripts/simstress.sh [--slaves=N]

It must be given the same value of N as the simulation.

It randomly either injects mismatches or sends Ctrl-C to a process or the event builder. After a Ctrl-C, the affected process is restarted.

It then checks that the system has come up again, i.e. that events pass through the event builder at a non-zero rate. If the system is not up within 3 seconds, the stress script aborts. (It has then likely found a bug or loophole in the drasi error handling.)
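The "up within 3 seconds" check follows a generic poll-with-deadline pattern. The sketch below illustrates that pattern only and is not taken from simstress.sh; the helper name is hypothetical, and the actual check command (for instance one that extracts the event builder rate) is left to the caller. It assumes date +%s is available, as it is with GNU and BSD date.

```shell
# Hypothetical helper (not part of drasi): run a check command repeatedly until
# it succeeds, giving up after LIMIT seconds.
# Usage: wait_until_up LIMIT CHECK_COMMAND [ARGS...]
wait_until_up() {
    limit=$1
    shift
    start=$(date +%s)
    until "$@"; do
        if [ $(( $(date +%s) - start )) -ge "$limit" ]; then
            return 1             # not up in time
        fi
        sleep 1
    done
}
```

For example, wait_until_up 3 my_rate_check, where my_rate_check is a hypothetical command that succeeds when the event rate is non-zero.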

A work in progress…

It is still fairly easy to get the various readout or merge components into a state where they do not recover properly. This is being worked on.

Graphical overview

An ncurses-based graphical overview of the components of a drasi DAQ system can be obtained using the --tree option of the lwrocmon monitor:

bin/lwrocmon --tree localhost:24000

The monitor automatically discovers all connected drasi processes by walking the tree of trigger bus and data transport connections. (If several independent trigger buses are simulated, a time sorter is required for the walk to find the instances associated with the other trigger buses.)