Memory buffers

This tuning guide briefly explains the data buffers used by drasi, and how to tune them.

Remember: memory for merger (event builder and time sorter) nodes is cheap! If the need to tune is due to overall memory constraints, upgrade the memory.

Circular buffers

Figure: Circular buffers for input, main stage and filter.

Circular data buffers are used after the main stage (readout or merger), after an (optional) filter, and for inputs to the merger. The readout or the merge mode decides the data format. The filter can produce a different data format. Options are used to configure sources for the merger, as well as data destinations.

The data buffers are implemented as circular buffers, with one producer and one or more consumers. This means that the entire memory area allocated to a buffer is available for use; as soon as some part has been used by all consumers, it can be reused.
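The reuse rule can be illustrated with a small byte-count model. This is only a sketch of the idea described above, not drasi's implementation; the class and method names are hypothetical.

```python
class CircularBuffer:
    """Byte-count model of a ring with one producer and several consumers.

    Hypothetical illustration only; it tracks how much space is in use,
    not the data itself.
    """

    def __init__(self, size, n_consumers):
        self.size = size
        self.written = 0                    # total bytes ever produced
        self.consumed = [0] * n_consumers   # total bytes released per consumer

    def free_space(self):
        # Space becomes reusable only once the slowest consumer released it.
        return self.size - (self.written - min(self.consumed))

    def produce(self, nbytes):
        if nbytes > self.free_space():
            return False                    # producer has to wait
        self.written += nbytes
        return True

    def consume(self, consumer, nbytes):
        available = self.written - self.consumed[consumer]
        if nbytes > available:
            raise ValueError("consumer cannot read past the producer")
        self.consumed[consumer] += nbytes


buf = CircularBuffer(size=1000, n_consumers=2)
buf.produce(800)
buf.consume(0, 800)               # the fast consumer is done with everything
blocked = not buf.produce(400)    # the slow consumer still holds the data
buf.consume(1, 800)
resumed = buf.produce(400)        # space has now been released by all consumers
```

In this model the producer is held back by the slowest consumer only; no part of the buffer is reserved for a particular consumer.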

Readout (output) buffer

The only buffer allocated by a readout node is the readout buffer. Usually most of the memory available on the readout node is reserved for use by the readout buffer. It is generally allocated using special means, to allow hardware-assisted block-copy readout directly into the buffer.

To make effective use of delayed event-building, this buffer must be large enough to hold the data from one in-spill period.

Merger output buffer

The output buffer of a merging node plays the same role as in the readout node. It is filled by merged events.

Merger input buffers

The event builder and time sorter read from all available sources simultaneously. Each data stream fills an input buffer dedicated to that source.

Event-building or time-sorting cannot process an event unless at least one event is available in the buffer of each source. This can cause suboptimal network usage, especially in a delayed event-building scenario, if these input buffers are too small. For sources that happen to have filled their input buffer, the network data transport will be held momentarily, while other buffers are empty, or the merger has not caught up with the sudden burst of data yet. To avoid that, each of these buffers should be large enough to hold the data of at least one spill period from each source.
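The stall condition can be sketched with a toy time sorter. The function below is a hypothetical illustration, not drasi code: it assumes events are (timestamp, payload) tuples and that each input is already ordered in time.

```python
from collections import deque


def merge_by_timestamp(inputs):
    """Pop events in timestamp order while every input holds an event.

    `inputs` maps a source name to a deque of (timestamp, payload) tuples.
    Merging stops as soon as any input runs empty, because an earlier
    event could still arrive from that source.
    """
    merged = []
    while inputs and all(inputs.values()):
        # Pick the source whose oldest event has the smallest timestamp.
        source = min(inputs, key=lambda name: inputs[name][0][0])
        merged.append(inputs[source].popleft())
    return merged


inputs = {
    "a": deque([(1, "a1"), (4, "a2"), (6, "a3")]),
    "b": deque([(2, "b1"), (3, "b2")]),
}
merged = merge_by_timestamp(inputs)
left_in_a = len(inputs["a"])
```

Once source "b" runs empty, the two remaining events of source "a" stay in its input buffer. If that buffer were full, the transport from "a" would have to pause until more data from "b" arrives.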

Since it is usually difficult to upgrade the memory on readout nodes (it tends to be soldered directly onto the board), situations where this is an issue can still be somewhat improved by providing enough input buffer space for each input on the event builder node. Then only readout nodes with too little buffer space have to resume data transmission in spill.

Tuning buffer sizes

Configuring a readout buffer is simple: use as much memory as available.

For a merger, the available memory has to be shared between input and output buffers. A reasonable split is to use 50%-75% for the input buffers and the remainder for the output buffer. Note, however, that the input buffers are subtracted from the configured output buffer size, so that size can be chosen with regard to the available memory alone. The memory for the input buffers should then be shared in proportion to the amount of data produced by each source.

While it is possible to give sizes for individual input buffers explicitly, the default is to use 75% of the size given for the output buffer for input buffers. That space is then shared equally between the input sources. The size of an individual input buffer can also be given as a percentage of the input space to use for that source. The other sources will share the remaining space equally.
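The sharing rule can be written out as a short calculation. This is a sketch under assumptions: the function and the source names are hypothetical, percentages are taken as integers, and the total input space is assumed to be known already.

```python
def split_input_space(input_space, sources, percent=None):
    """Divide `input_space` bytes between `sources`.

    `percent` maps some source names to an explicit (integer) percentage
    of the input space; all other sources share what remains equally.
    Hypothetical helper for illustration only.
    """
    percent = percent or {}
    if sum(percent.values()) > 100:
        raise ValueError("explicit percentages exceed 100%")
    # Sources with an explicit share get that share of the input space.
    sizes = {name: input_space * pct // 100 for name, pct in percent.items()}
    others = [name for name in sources if name not in percent]
    if others:
        remaining = input_space - sum(sizes.values())
        for name in others:
            sizes[name] = remaining // len(others)
    return sizes


equal = split_input_space(1200, ["r1", "r2", "r3"])
weighted = split_input_space(1200, ["r1", "r2", "r3"], percent={"r1": 50})
```

With 1200 units of input space, the equal split gives each of the three sources 400 units. Giving "r1" 50% leaves 600 units, which the other two sources share as 300 each.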

Virtual buffer size and maximum event size

It should be noted that the LMD buffer size (e.g. of a transport or stream server, or an output file) is completely virtual. It does not correspond to an actual memory allocation where data is prepared. Data is written directly from the readout or merger output buffers. The LMD buffer size is still important as it implies a maximum event size, since no event can be larger than one buffer.

In the case of ‘nohold’ network data servers (i.e. where clients are allowed to lag and skip data), memory must be set aside such that a lagging client can still get a consistent old buffer. For each possible client connection of this kind, space equal to the LMD buffer size is reserved and subtracted from the output buffer.
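As a worked example of the subtraction, the sketch below computes what is left for the output buffer. The function name and the numbers are hypothetical; only the rule of one LMD buffer per possible ‘nohold’ client, and the subtraction of the input buffers, come from the text.

```python
MiB = 1024 * 1024


def remaining_output_space(total, input_space, lmd_bufsize, max_nohold_clients):
    """Return the bytes left for the output buffer after reservations.

    Hypothetical helper: `total` is the configured buffer size, from which
    the input buffers and the 'nohold' reserve are subtracted.
    """
    nohold_reserve = max_nohold_clients * lmd_bufsize  # one LMD buffer per client
    remaining = total - input_space - nohold_reserve
    if remaining <= 0:
        raise ValueError("reservations leave no room for the output buffer")
    return remaining


out = remaining_output_space(
    total=1024 * MiB,
    input_space=512 * MiB,
    lmd_bufsize=1 * MiB,
    max_nohold_clients=4,
)
```

With these example numbers, 508 MiB remain for the output buffer. For a readout node, which has no input buffers, `input_space` would be 0.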

Normally, LMD events can span multiple buffers, but never more than a fixed number of them, which together make up a so-called stream. The limitation arises because the basic allocation unit was the set of buffers of one stream. drasi does away with event spanning by always setting the number of buffers per stream to 1 in the data it produces. The buffer size then directly limits the maximum event length. drasi is, however, able to read inputs where events span multiple buffers.

Consistency checks

To avoid accidental stall situations, or ping-pong behaviour between buffer producers and consumers, drasi requires that the output buffer is at least 4 times larger than any (virtual) LMD output buffer size. It also requires that the LMD output buffer size can hold the maximum configured event size. (This in turn is strictly checked during readout/merging.)

For merging, the maximum event sizes of all inputs are summed, and this sum must not exceed the configured maximum event size. For a source where the maximum event size is not given, the worst case is assumed, i.e. the size given by buffer*streams. (Note that when using the drasi internal protocol, the maximum event size is also passed along, such that large virtual buffers can be used without implying large event sizes and without having to configure the input explicitly.)

The reason for these tests is to deal with any issues at startup, instead of having the merger choke at some unspecified later time. (Still, it is strictly verified that each input event adheres to any promises given.)
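The startup checks described above can be summarised in a short sketch. It is not drasi's actual code: the function names and messages are hypothetical, and treating a sum equal to the configured maximum event size as acceptable is an assumption.

```python
def assumed_max_ev_size(bufsize, bufs_per_stream):
    """Worst-case event size for a source that does not declare one."""
    return bufsize * bufs_per_stream


def check_sizes(output_buf, lmd_bufsize, max_ev_size, source_max_ev_sizes):
    """Return a list of violated rules (empty if the sizes are consistent).

    Hypothetical helper; all sizes are in bytes.
    """
    problems = []
    if output_buf < 4 * lmd_bufsize:
        problems.append("output buffer < 4 x LMD buffer size")
    if lmd_bufsize < max_ev_size:
        problems.append("LMD buffer cannot hold the maximum event size")
    # Assumption: equality is accepted for the summed source sizes.
    if sum(source_max_ev_sizes) > max_ev_size:
        problems.append("sum of source max event sizes > max event size")
    return problems


KiB = 1024
# The second source declares no maximum: assume 32 KiB buffers, 1 per stream.
sources = [16 * KiB, assumed_max_ev_size(32 * KiB, 1)]
ok = check_sizes(512 * KiB, 64 * KiB, 48 * KiB, sources)
bad = check_sizes(128 * KiB, 64 * KiB, 48 * KiB, sources)
```

The first configuration passes all three checks. The second fails only the first rule, since 128 KiB is less than 4 × 64 KiB.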

Summary

Figure: The different buffer sizes and their relationships. When sizes are not explicitly given, green automatic rules are applied. Input and ‘nohold’ buffers are subtracted from the given output buffer size. The red consistency checks are always enforced.

Giving --buf=size= is mandatory (its value is limited by the available memory), and --max-ev-size= is usually needed (its value is dictated by the readout hardware or the merger sources). The other buffer sizes will be calculated automatically (green rules), which is usually sufficient if the machine has a reasonable amount of memory. The red consistency checks are always performed.

Area              Option               Default           Constraint
Total memory      --buf=size
‘nohold’ reserve  --server=nohold,     Virtual LMD buf.
Input buffers     --buf=input=frac|%   25%               > 5% and < 95% of total
                  --drasi=inbufsize=                     ≥ 2 × max event size of source
Output buffer                          Remaining         ≥ 4 × virtual LMD buffer size

Other sizes       Option               Default           Constraint
Max event size    --max-ev-size=                         EB: ≥ ∑ sources’ max event size
                                                         TS: ≥ max source’s event size
Virtual LMD buf.  --server=bufsize=    2 × event size    ≥ 2 × max event size
                                       min 32 kiB

Server machine configuration

Swap space

Extending the available RAM using disk space, i.e. enabling swap space, should under no circumstances be done for a machine used for any data acquisition task. The reason is the abysmal performance that can be (randomly) obtained for an otherwise well-performing system, due to the large latencies of disk storage. Not even some small swap space for ‘just unused system tools’ is a good idea.

ECC memory

Memory with ECC error detection should be used wherever possible. (The author has seen enough memory errors to insist on this on all machines.)