Implementation Specifics¶
Eclipse Cyclone DDS runs with a default configuration for all of its settings. You can change the default
settings via an XML configuration file located at a path defined by the CYCLONEDDS_URI
environment variable (set by the user). An example XML configuration follows:
/path/to/dds/configuration.xml
<?xml version="1.0" encoding="utf-8"?>
<CycloneDDS
    xmlns="https://cdds.io/config"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="https://cdds.io/config https://raw.githubusercontent.com/eclipse-cyclonedds/cyclonedds/master/etc/cyclonedds.xsd"
>
    <Domain Id="any">
        <General>
            <Interfaces>
                <NetworkInterface autodetermine="true" priority="default"
                                  multicast="default" />
            </Interfaces>
            <AllowMulticast>default</AllowMulticast>
            <MaxMessageSize>65500B</MaxMessageSize>
        </General>
        <Tracing>
            <Verbosity>config</Verbosity>
            <OutputFile>
                ${HOME}/dds/log/cdds.log.${CYCLONEDDS_PID}
            </OutputFile>
        </Tracing>
    </Domain>
</CycloneDDS>
Discovery Behaviour¶
Proxy Participants and Endpoints¶
Eclipse Cyclone DDS is what the DDSI specification calls a stateful implementation. Writers only send data to discovered Readers, and Readers only accept data from discovered Writers. (There is one exception: the Writer may choose to multicast the data, and anyone listening will be able to receive it. If a Reader has already discovered the Writer but not vice versa, it may accept the data even though the connection is not yet fully established.
At present, such asymmetrical discovery can not only cause data to be delivered when it was perhaps not expected, but can also cause indefinite blocking if the situation persists for a long time.) Consequently, for each remote participant and Reader or Writer, Eclipse Cyclone DDS internally creates a proxy participant, proxy Reader or proxy Writer. In the discovery process, Writers are matched with proxy Readers, and Readers are matched with proxy Writers, based on the topic name, type name and QoS settings.
Proxies have the same natural hierarchy that ‘normal’ DDSI entities have: each proxy endpoint is owned by some proxy participant, and once the proxy participant is deleted, all of its proxy endpoints are deleted as well. Participants assert their liveliness periodically (called automatic liveliness in the DCPS specification, and the only mode currently supported by Eclipse Cyclone DDS). When nothing has been heard from a participant for the lease duration published by that participant in its SPDP message, the lease becomes expired, triggering a clean-up.
Under normal circumstances, deleting endpoints triggers disposes and un-registers in the SEDP protocol. Similarly, deleting a participant also creates special messages that allow the peers to immediately reclaim resources instead of waiting for the lease to expire.
Lingering Writers¶
When an application deletes a reliable DCPS Writer, there is no guarantee that all its Readers have already acknowledged the correct receipt of all samples. In such a case, Eclipse Cyclone DDS lets the Writer (and the owning participant if necessary) linger in the system for some time, controlled by the Internal/WriterLingerDuration option. The Writer is deleted when all Readers have acknowledged all samples or the linger duration has elapsed, whichever comes first.
Note
The Writer linger duration setting is currently not applied when Eclipse Cyclone DDS is requested to terminate.
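For illustration, a minimal sketch of how the linger duration might be configured inside the Domain element; the 1 s value is purely illustrative, not a recommendation:

<Internal>
    <WriterLingerDuration>1 s</WriterLingerDuration>
</Internal>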
Writer History QoS and Throttling¶
The DDSI specification heavily relies on the notion of a Writer history cache (WHC) within which a sequence number uniquely identifies each sample. This WHC integrates two different indices on the samples published by a Writer: one is on the sequence number, used for retransmitting lost samples, and one is on key value and is used for retaining the current state of each instance in the WHC.
The index on key value allows dropping samples from the index on sequence number when a new sample overwrites the state of an instance. For transient-local, it conversely (also) allows retaining the current state of each instance even when all Readers have acknowledged a sample.
The index on sequence number is required for retransmitting old data, and is therefore needed for all reliable Writers. The index on key values is always required for transient-local data, and is by default also used for other Writers with a history setting of KEEP_LAST. In that case, the advantage of an index on key value is that superseded samples can be dropped aggressively instead of being delivered to all Readers; the disadvantage is that it is somewhat more resource-intensive.
The WHC distinguishes between history to be retained for existing Readers (controlled by the Writer’s history QoS setting) and history to be retained for late-joining Readers of transient-local Writers (controlled by the topic’s durability-service history QoS setting). This makes it possible to create a Writer that never overwrites samples for live Readers while maintaining only the most recent samples for late-joining Readers. Moreover, it ensures that the data that is available for late-joining Readers is the same for transient-local and for transient data.
Writer throttling is based on the WHC size using a simple controller. Once the WHC contains at least high bytes in unacknowledged samples, it stalls the Writer until the number of bytes in unacknowledged samples drops below Internal/Watermarks/WhcLow. The value of high is dynamically adjusted between Internal/Watermarks/WhcLow and Internal/Watermarks/WhcHigh based on transmit pressure and receive retransmit requests. The initial value of high is Internal/Watermarks/WhcHighInit and the adaptive behavior can be disabled by setting Internal/Watermarks/WhcAdaptive to false.
While the adaptive behaviour generally handles a variety of fast and slow Writers and Readers quite well, the introduction of a very slow Reader with small buffers in an existing network that is transmitting data at high rates can cause a sudden stop while the new Reader tries to recover the large amount of data stored in the Writer, before things can continue at a much lower rate.
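As an illustration, the watermark settings described above might be configured as follows inside the Domain element; the byte values are illustrative, not recommendations:

<Internal>
    <Watermarks>
        <WhcLow>1 kB</WhcLow>
        <WhcHigh>500 kB</WhcHigh>
        <WhcHighInit>30 kB</WhcHighInit>
        <WhcAdaptive>true</WhcAdaptive>
    </Watermarks>
</Internal>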
Network and Discovery Configuration¶
Networking Interfaces¶
Eclipse Cyclone DDS can use multiple network interfaces simultaneously but defaults to using a single network interface. The set of enabled interfaces determines the addresses that the host advertises in the discovery information.
Default behaviour¶
To determine the default network interface, the eligible interfaces are ranked by quality and then the interface with the highest quality is selected. If multiple interfaces are of the highest quality, it will select the first enumerated one. Eligible interfaces are those that are up and have the right kind of address family (IPv4 or IPv6). Priority is then determined as follows:
interfaces with a non-link-local address are preferred over those with a link-local one;
multicast-capable is preferred (see also General/Interfaces/NetworkInterface[@multicast]), or if none is available
non-multicast capable and not point-to-point, or if none is available
point-to-point, or if none is available
loopback
If this procedure doesn’t select the desired interface automatically, it can be overridden by setting General/Interfaces, adding the interface(s) either by interface name (<NetworkInterface name='interface_name' />), by the IP address of the host on the desired interface (<NetworkInterface address='128.129.0.42' />), or by the network portion of that IP address (<NetworkInterface address='128.11.0.0' />). An exact match on the address is always preferred and is the only option that allows selecting the desired address when multiple addresses are tied to a single interface.
The default address family is IPv4; setting General/Transport to udp6 or tcp6 changes this to IPv6. Currently, Eclipse Cyclone DDS does not mix IPv4 and IPv6 addressing. Consequently, all DDSI participants in the network must use the same addressing mode. When inter-operating, the behaviour is the same: it will look at either IPv4 or IPv6 addresses in the advertised address information in the SPDP and SEDP discovery protocols.
IPv6 link-local addresses are considered undesirable because they need to be published and received via the discovery mechanism, but there is in general no way to determine to which interface a received link-local address is related.
If IPv6 is requested and the selected interface has a non-link-local address, Eclipse Cyclone DDS will operate in a global addressing mode and will only consider discovered non-link-local addresses. In this mode, one can select any set of interfaces for listening to multicasts. Note that this behaviour is essentially identical to that when using IPv4, as IPv4 does not have the formal notion of address scopes that IPv6 has. If instead only a link-local address is available, Eclipse Cyclone DDS will run in a link-local addressing mode. In this mode it will accept any address in a discovery packet, assuming that a link-local address is valid on the selected interface. To minimise the risk involved in this assumption, it only allows the selected interface for listening to multicasts.
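For example, switching to IPv6 over UDP is a matter of setting the transport inside the Domain/General element:

<General>
    <Transport>udp6</Transport>
</General>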
Multiple network interfaces¶
Multiple network interfaces can be used simultaneously by listing multiple NetworkInterface elements. In this case, the above still applies, with most things extended in the obvious manner; e.g., the SPDP packets will now advertise multiple addresses and will be sent out on all interfaces. This means the link-local addressing issue discussed above gains importance if link-local addresses are used, but they usually aren’t.
In a configuration with just a single network interface, it is obvious which one to use for sending packets to a peer. When there are multiple network interfaces, it is necessary to establish the set of interfaces via which multicasts can be sent, because these are sent on a specific interface. This in turn requires determining via which subset of interfaces a peer is reachable.
Cyclone DDS approaches this by checking which interfaces match the addresses advertised by a peer in its SPDP or SEDP messages, under the assumption that in most cases the peer will be attached to at least one of the configured networks and that checking the network parts of the addresses will result in a subset of the interfaces. The network interfaces in this subset are then the interfaces on which the peer is assumed to be reachable via multicast.
This leaves open two classes of addresses:
Loopback addresses: these are ignored unless (1) the configuration has enabled only loopback interfaces, (2) no other addresses are advertised in the discovery message, or (3) a non-loopback address matches that of the machine.
Routable addresses that do not match an interface: these are ignored if the General/DontRoute option is set, otherwise it is assumed that the network stack knows how to route them and any of the interfaces may be used.
When a message needs to be sent to a set of peers, Eclipse Cyclone DDS aims to use the set of addresses spanning the set of intended recipients with the lowest cost (number of nodes that receive it without having a use for it, unicast vs multicast, loopback vs real network interface, configured priority). This is a variant of the set cover problem, and so Eclipse Cyclone DDS uses some heuristics rather than computing the optimal solution. The address selection can be influenced in two ways:
by using the priority attribute, which is used as an offset in the cost calculation (the default configuration gives loopback interfaces a slightly higher priority than other network types);
by setting the prefer_multicast attribute, which raises the assumed cost of a unicast message.
The General/RedundantNetworking setting furthermore forces the address selection code to cover all interfaces advertised by a peer.
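A sketch of a two-interface configuration using these attributes; the interface names are illustrative, and the exact form of the priority and prefer_multicast values should be checked against the configuration schema:

<General>
    <Interfaces>
        <NetworkInterface name="eth0" priority="default" multicast="default" />
        <NetworkInterface name="eth1" priority="2" prefer_multicast="true" />
    </Interfaces>
</General>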
Overriding addresses/interfaces for readers/writers¶
The Partitioning element in the configuration allows configuring NetworkPartition elements and mapping topic/partition names to these “network partitions” using PartitionMappings elements.
Network partitions introduce alternative multicast addresses for data and/or restrict the set of unicast addresses (i.e., interfaces). In the DDSI discovery protocol, a reader can override the addresses at which it is reachable, and this feature is used to advertise alternative multicast addresses and/or a subset of the unicast addresses. The writers in the network will use the addresses advertised by the reader rather than the default addresses advertised by the reader’s participant.
Unicast and multicast addresses in a network partition play different roles:
The multicast addresses specify an alternative set of addresses to be used instead of the participant’s default. This is particularly useful to limit high-bandwidth flows to the parts of a network where the data is needed (for IP/Ethernet, this assumes switches that are configured to do IGMP snooping).
The unicast addresses not only influence the set of interfaces that will be used for unicast, but thereby also the set of interfaces that will be considered for use by multicast. Thus, specifying a unicast address matching network interface A ensures all traffic to that reader will be using interface A, whether unicast or multicast.
Because the typical use of unicast addresses is to force traffic onto certain interfaces, the configuration also allows specifying interface names (using the interface attribute).
The mapping of a data reader or writer to a network partition is indirect: first, the partition and topic are matched against a table of partition mappings (partition/topic combinations) to obtain the name of a network partition; then the network partition name is used to find the addressing information. This makes it easier to map many different partition/topic combinations to the same multicast address without having to specify the actual multicast address many times over. If no match is found, the default addresses are used.
Matching proceeds in the order in which the partition mappings are specified in the configuration. The first matching mapping is the one that will be used. The * and ? wildcards are available for the DCPS partition/topic combination in the partition mapping.
A single reader or writer is associated with a set of partitions, and each partition/topic combination can potentially map to a different network partition. In this case, the first matching network partition will be used. This does not affect what data the reader will receive; it only affects the addressing on the network.
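To illustrate the indirection, a sketch of a possible Partitioning fragment inside the Domain element; the partition name, topic pattern, and multicast address are illustrative, and the container element and attribute names (NetworkPartitions, PartitionMappings, Name, Address, DCPSPartitionTopic, NetworkPartition) are assumptions to be checked against the configuration schema:

<Partitioning>
    <NetworkPartitions>
        <NetworkPartition Name="SensorData" Address="239.255.0.100" />
    </NetworkPartitions>
    <PartitionMappings>
        <PartitionMapping DCPSPartitionTopic="telemetry.*" NetworkPartition="SensorData" />
    </PartitionMappings>
</Partitioning>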
Controlling Port Numbers¶
The port numbers used by Eclipse Cyclone DDS are determined as follows, where the first two items are given by the DDSI specification and the third is unique to Eclipse Cyclone DDS as a way of serving multiple participants by a single DDSI instance:
2 “well-known” multicast ports: B and B+1
2 unicast ports at which only this instance is listening: B+PG*PI+10 and B+PG*PI+11
1 unicast port per domain participant it serves, chosen by the kernel from the anonymous ports, i.e. >= 32768
where:
B is Discovery/Ports/Base (7400) + Discovery/Ports/DomainGain (250) * Domain[@Id]
PG is Discovery/Ports/ParticipantGain (2)
The default values, taken from the DDSI specification, are in parentheses. There are actually even more parameters, here simply turned into constants as there is absolutely no point in ever changing these values; however, they are configurable and the interested reader can refer to the DDSI 2.1 or 2.2 specification, section 9.6.1.
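As a worked example using these defaults: for domain 0 and participant index PI = 0, B = 7400 + 250 * 0 = 7400, so the two multicast ports are 7400 and 7401, and the two unicast ports are 7400 + 2*0 + 10 = 7410 and 7400 + 2*0 + 11 = 7411. A second process on the same host with PI = 1 would use unicast ports 7412 and 7413.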
PI is the most interesting, as it relates to having multiple processes in the same domain on a single node. Its configured value is either auto, none or a non-negative integer. This setting matters:
When it is auto, Eclipse Cyclone DDS probes UDP port numbers on start-up, starting with PI = 0, incrementing it by one each time until it finds a pair of available port numbers, or it hits the limit. The maximum PI it will ever choose is Discovery/MaxAutoParticipantIndex to limit the cost of unicast discovery.
When it is none (which is the default) it simply ignores the “participant index” altogether and asks the kernel to pick random ports (>= 32768). This eliminates the limit on the number of standalone deployments on a single machine and works fine with multicast discovery while complying with all other parts of the specification for interoperability. However, it is incompatible with unicast discovery.
When it is a non-negative integer, it is the value of PI in the above calculations. If multiple processes on a single machine are needed, they will need unique values for PI, and so for standalone deployments this particular alternative is of little use.
To fully control port numbers, setting Discovery/ParticipantIndex (= PI) to a hard-coded value is the only possibility. By fixing PI, the port numbers needed for unicast discovery are fixed as well. This allows listing peers as IP:PORT pairs, significantly reducing traffic, as explained in the Discovery Addresses section below.
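A sketch of such a fixed-port configuration inside the Domain element; the index and base values shown are illustrative:

<Discovery>
    <ParticipantIndex>0</ParticipantIndex>
    <Ports>
        <Base>7400</Base>
    </Ports>
</Discovery>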
The other non-fixed ports that are used are the per-domain participant ports, the third item in the list. These are used only because there exist some DDSI implementations that assume each domain participant advertises a unique port number as part of the discovery protocol, and hence that there is never any need for including an explicit destination participant id when intending to address a single domain participant by using its unicast locator. Eclipse Cyclone DDS never makes this assumption, instead opting to send a few bytes extra to ensure the contents of a message are all that is needed. With other implementations, you will need to check.
If all DDSI implementations in the network include full addressing information in the messages, as Eclipse Cyclone DDS does, then the per-domain participant ports serve no purpose at all. The default false setting of Compatibility/ManySocketsMode disables the creation of these ports.
This setting has another side benefit: multiple DCPS participants may then use the same unicast locator, which improves the chances of a single unicast sufficing even when addressing multiple participants.
Multicasting¶
Eclipse Cyclone DDS allows configuring to what extent multicast (the regular, any-source multicast as well as source-specific multicast) is to be used:
whether to use multicast for data communications,
whether to use multicast for participant discovery,
on which interfaces to listen for multicasts.
It is advised to allow multicasting to be used. However, if there are restrictions on the use of multicasting, or if the network reliability is dramatically different for multicast than for unicast, it may be attractive to disable multicast for normal communications. In this case, setting General/AllowMulticast to false will force the use of unicast communications for everything.
If at all possible, it is strongly advised to leave multicast-based participant discovery enabled, because that avoids having to specify a list of nodes to contact, and it furthermore reduces the network load considerably. Setting General/AllowMulticast to spdp will allow participant discovery via multicast while disabling multicast for everything else.
To disable incoming multicasts, or to control on which interfaces multicasts are to be accepted, use the General/MulticastRecvNetworkInterfaceAddresses setting. This allows listening on no interfaces, the preferred interface, all interfaces, or a specific set of interfaces.
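For example, restricting multicast to participant discovery only might look like the fragment below, placed inside the Domain element; whether “preferred” is the exact keyword accepted by MulticastRecvNetworkInterfaceAddresses should be verified against the configuration schema:

<General>
    <AllowMulticast>spdp</AllowMulticast>
    <MulticastRecvNetworkInterfaceAddresses>preferred</MulticastRecvNetworkInterfaceAddresses>
</General>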
TCP Support¶
The DDSI protocol is designed for a connectionless transport with unreliable datagrams. However, there are times when TCP is the only practical network transport available (for example, across a WAN). This is why Eclipse Cyclone DDS can use TCP instead of UDP if needed.
The differences in the model of operation between DDSI and TCP are quite large: DDSI is based on the notion of peers, whereas TCP communication is based on the notion of a session that is initiated by a “client” and accepted by a “server”; therefore, TCP requires knowledge of the servers to connect to before the DDSI discovery protocol can exchange that information. The configuration of this is done in the same manner as for unicast-based UDP discovery.
TCP reliability is defined in terms of these sessions, but DDSI reliability is defined in terms of DDSI discovery and liveliness management. It is therefore possible that a TCP connection is (forcibly) closed while the remote endpoint is still considered alive. Following a reconnect, the samples lost when the TCP connection was closed can be recovered via the standard DDSI reliability. This also means that the Heartbeats and AckNacks still need to be sent over a TCP connection, and consequently that DDSI flow-control occurs on top of TCP flow-control.
Another point worth noting is that connection establishment potentially takes a long time, and that giving up on a transmission to a failed or no longer reachable host can also take a long time. These long delays can be visible at the application level at present.
TLS Support¶
The TCP mode can be used together with TLS to provide mutual authentication and encryption. When TLS is enabled, plain TCP connections are no longer accepted or initiated.
Raw Ethernet Support¶
As an additional option, on Linux, Eclipse Cyclone DDS can use a raw Ethernet network interface to communicate without a configured IP stack.
Discovery Configuration¶
Discovery Addresses¶
The DDSI discovery protocols, SPDP for the domain participants and SEDP for their endpoints, usually operate well without any explicit configuration. Indeed, the SEDP protocol never requires any configuration.
The SPDP protocol periodically sends, for each domain participant, an SPDP sample to a set of addresses, which by default contains just the multicast address: this is standardised for IPv4 (239.255.0.1) but not for IPv6 (it uses ff02::ffff:239.255.0.1). The actual address can be overridden using the Discovery/SPDPMulticastAddress setting, which requires a valid multicast address.
In addition (or as an alternative) to the multicast-based discovery, any number of unicast addresses can be configured as addresses to be contacted by specifying peers in the Discovery/Peers section. Each time an SPDP message is sent, it is sent to all of these addresses.
The default behaviour is to include each IP address several times in the set (for participant indices 0 through Discovery/MaxAutoParticipantIndex), each time with a different UDP port number (corresponding to another participant index), allowing at least several applications to be present on these hosts.
Configuring several peers in this way causes a large burst of packets to be sent each time an SPDP message is sent out, and each local DDSI participant causes a burst of its own. Most participant indices will not be used, making this rather wasteful behaviour.
To avoid sending large numbers of packets to each host, differing only in port number, it is also possible to add a port number to the IP address, formatted as IP:PORT, but this requires manually calculating the port number. In practice it also requires fixing the participant index using Discovery/ParticipantIndex (see the description of “PI” in Controlling port numbers) to ensure that the configured port number indeed corresponds to the port number the remote DDSI implementation is listening on, and therefore is attractive only when it is known that there is but a single DDSI process on that node.
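A sketch of a peer list combining both forms, inside the Domain element; the addresses are illustrative, and the Peer element with an Address attribute is assumed to match the configuration schema:

<Discovery>
    <ParticipantIndex>0</ParticipantIndex>
    <Peers>
        <Peer Address="192.168.1.10" />
        <Peer Address="192.168.1.11:7410" />
    </Peers>
</Discovery>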
Asymmetrical Discovery¶
On reception of an SPDP packet, the addresses advertised in the packet are added to the set of addresses to which SPDP packets are sent periodically, allowing asymmetrical discovery. In an extreme example, if SPDP multicasting is disabled entirely, host A has the address of host B in its peer list, and host B has an empty peer list, B will eventually discover A because of an SPDP message sent by A, at which point it adds A’s address to its own set and starts sending its SPDP message to A, allowing A to discover B. This takes longer than normal multicast based discovery, though, and risks Writers being blocked by unresponsive Readers.
Timing of SPDP Packets¶
The interval with which the SPDP packets are transmitted is configurable, using the Discovery/SPDPInterval setting. A longer interval reduces the network load, but also increases the time discovery takes, especially in the face of temporary network disconnections.
Endpoint Discovery¶
Although the SEDP protocol never requires any configuration, network partitioning does interact with it. The “ignored partitions” option can be used to instruct Eclipse Cyclone DDS to completely ignore specific DCPS topics and partition combinations. Using the “ignored partitions” option prevents data for these topic/partition combinations from being forwarded to and from the network.
Combining Multiple Participants¶
If a single process creates multiple participants, these are mirrored in DDSI participants, so that a single process can appear like an extensive system with many participants. The Internal/SquashParticipants option can be used to simulate the existence of only one participant, which owns all endpoints on that node. This reduces the background messages because far fewer liveliness assertions need to be sent, but there are some downsides.
Firstly, the liveliness monitoring features related to domain participants will be affected if multiple DCPS domain participants are combined into a single DDSI domain participant. For the “automatic” liveliness setting, this is not an issue.
Secondly, this option makes it impossible for tooling to show the system topology.
Thirdly, the QoS of this sole participant is simply that of the first participant created in the process. In particular, no matter what other participants specify as their “user data”, it will not be visible on remote nodes.
An alternative that sits between squashing participants and normal operation is setting Internal/BuiltinEndpointSet to minimal. In the default setting, each DDSI participant has its own Writers for the built-in topics and publishes discovery data on its own entities; when set to minimal, only the first participant has these Writers and publishes data on behalf of all entities. This is not fully compatible with other implementations, as it means endpoint discovery data can be received for a participant that has not yet been discovered.
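For illustration, either option would be configured inside the Domain element roughly as follows (one would enable at most one of the two):

<Internal>
    <SquashParticipants>true</SquashParticipants>
    <!-- or, less drastic: -->
    <BuiltinEndpointSet>minimal</BuiltinEndpointSet>
</Internal>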
Data Path Configuration¶
Re-Transmit Merging¶
A remote Reader can request re-transmission whenever it receives a Heartbeat and detects missing samples. If a sample was lost on the network for many or all Readers, the next Heartbeat would likely trigger a storm of re-transmission requests. Thus, the Writer should attempt to merge these requests into a multicast re-transmission to avoid re-transmitting the same sample over and over again to many different Readers. Similarly, while Readers should try to avoid requesting re-transmissions too often, in an interoperable system the Writers should be robust against it.
In Eclipse Cyclone DDS, upon receiving a Heartbeat that indicates samples are missing, a Reader will schedule the second and following re-transmission requests to be sent after Internal/NackDelay or combine it with an already scheduled request if possible. Any samples received between receipt of the Heartbeat and the sending of the AckNack will not need to be re-transmitted.
Secondly, a Writer attempts to combine re-transmit requests in two different ways. The first is to change messages from unicast to multicast when another re-transmit request arrives while the re-transmit has not yet occurred. This is particularly effective when bandwidth limiting causes a backlog of samples to be re-transmitted. The second behaviour can be configured using the Internal/ReTransmitMerging setting. Based on this setting, a re-transmit request for a sample is either honoured unconditionally, or it may be suppressed (or “merged”) if it comes in shortly after a multicasted re-transmission of that very sample, on the assumption that the second reader will likely receive the re-transmit, too. The Internal/ReTransmitMergingPeriod controls the length of this time window.
Re-Transmit Backlogs¶
Another issue is that a Reader can request re-transmission of many samples at once. When the Writer queues all these samples for re-transmission, it may result in a considerable backlog of samples to be re-transmitted. As a result, the ones near the queue’s end may be delayed so much that the Reader issues another re-transmit request.
Therefore, Eclipse Cyclone DDS limits the number of samples queued for re-transmission and ignores (those parts of) re-transmission requests that would cause the re-transmit queue to contain too many samples or take too much time to process. Two settings govern the size of these queues, and the limits are applied per timed-event thread. The first is Internal/MaxQueuedRexmitMessages, which limits the number of re-transmit messages, the second Internal/MaxQueuedRexmitBytes which limits the number of bytes. The latter defaults to a setting based on the combination of the allowed transmit bandwidth and the Internal/NackDelay setting as an approximation of the likely time until the next potential re-transmit request from the Reader.
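A sketch of the retransmit-related settings discussed in these two subsections, inside the Domain element; all values are illustrative, the element names follow the option names used above, and “adaptive” as a merging mode is an assumption to be checked against the configuration schema:

<Internal>
    <NackDelay>100 ms</NackDelay>
    <ReTransmitMerging>adaptive</ReTransmitMerging>
    <ReTransmitMergingPeriod>5 ms</ReTransmitMergingPeriod>
    <MaxQueuedRexmitMessages>200</MaxQueuedRexmitMessages>
    <MaxQueuedRexmitBytes>512 kB</MaxQueuedRexmitBytes>
</Internal>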
Controlling Fragmentation¶
Samples in DDS can be arbitrarily large, and will not always fit within a single datagram. DDSI has facilities to fragment samples so they can fit in UDP datagrams, and similarly IP has facilities to fragment UDP datagrams into network packets. The DDSI specification states that one must not unnecessarily fragment at the DDSI level, but Eclipse Cyclone DDS provides a fully configurable behaviour.
If the serialised form of a sample is at least General/FragmentSize, it will be fragmented using the DDSI fragmentation. All but the last fragment will be this exact size; the last one may be smaller.
Control messages, non-fragmented samples, and sample fragments are all subject to packing into datagrams before being sent out on the network, based on various attributes such as the destination address, to reduce the number of network packets. This packing allows datagram payloads of up to General/MaxMessageSize, overshooting this size if the set maximum is too small to contain what must be sent as a single unit. In that case it no longer matters where the data is rejected: there is a real problem anyway.
UDP/IP header sizes are not taken into account in the maximum message size.
The IP layer then takes this UDP datagram, possibly fragmenting it into multiple packets to stay within the maximum size the underlying network supports. A trade-off is that while DDSI fragments can be re-transmitted individually, the processing overhead of DDSI fragmentation is larger than that of UDP fragmentation.
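For illustration, the two sizes are configured inside Domain/General; the 65500B value matches the example at the top of this section, while the FragmentSize value is illustrative:

<General>
    <MaxMessageSize>65500B</MaxMessageSize>
    <FragmentSize>4000B</FragmentSize>
</General>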
Receive Processing¶
Receiving of data is split into multiple threads:
A single receive thread responsible for retrieving network packets and running the protocol state machine;
A delivery thread dedicated to processing DDSI built-in data: participant discovery, endpoint discovery, and liveliness assertions;
One or more delivery threads dedicated to the handling of application data: deserialisation and delivery to the DCPS data Reader caches.
The receive thread is responsible for retrieving all incoming network packets, running the protocol state machine, which involves scheduling of AckNack and Heartbeat messages and queueing of samples that must be retransmitted, and for defragmenting and ordering incoming samples.
Fragmented data first enters the defragmentation stage, which is per proxy Writer. The number of samples that can be defragmented simultaneously is limited for reliable data to Internal/DefragReliableMaxSamples and for unreliable data to Internal/DefragUnreliableMaxSamples.
Samples (defragmented if necessary) received out of sequence are buffered, primarily per proxy Writer, but, secondarily, per Reader catching up on historical (transient-local) data. The size of the first is limited to Internal/PrimaryReorderMaxSamples, the size of the second to Internal/SecondaryReorderMaxSamples.
In between the receive thread and the delivery threads sit queues, of which the maximum size is controlled by the Internal/DeliveryQueueMaxSamples setting. Generally there is no need for these queues to be very large (unless one has very small samples in very large messages), their primary function is to smooth out the processing when batches of samples become available at once, for example following a retransmission.
When any of these receive buffers hit their size limit, and it concerns application data, the receive thread will wait for the queue to shrink (a compromise that is the lesser evil within the constraints of various other choices). However, discovery data will never block the receive thread.
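A sketch of the receive-side limits discussed above, inside the Domain element; all values are illustrative:

<Internal>
    <DefragReliableMaxSamples>16</DefragReliableMaxSamples>
    <DefragUnreliableMaxSamples>4</DefragUnreliableMaxSamples>
    <PrimaryReorderMaxSamples>64</PrimaryReorderMaxSamples>
    <SecondaryReorderMaxSamples>16</SecondaryReorderMaxSamples>
    <DeliveryQueueMaxSamples>256</DeliveryQueueMaxSamples>
</Internal>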
Minimising Receive Latency¶
In low-latency environments, a few microseconds can be gained by processing the application data directly in the receive thread, i.e., synchronously with respect to the incoming network traffic, instead of queueing it for asynchronous processing by a delivery thread. This happens for data transmitted with the max_latency QoS setting at most a configurable value and the transport_priority QoS setting at least a configurable value. By default, these values are inf and the maximum transport priority, effectively enabling synchronous delivery for all data.
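The text does not name the settings involved; assuming they are Internal/SynchronousDeliveryLatencyBound and Internal/SynchronousDeliveryPriorityThreshold (to be verified against the configuration schema), restricting synchronous delivery might look like this, with illustrative values:

<Internal>
    <SynchronousDeliveryLatencyBound>1 ms</SynchronousDeliveryLatencyBound>
    <SynchronousDeliveryPriorityThreshold>10</SynchronousDeliveryPriorityThreshold>
</Internal>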
Maximum Sample Size¶
Eclipse Cyclone DDS provides a setting, Internal/MaxSampleSize, to control the maximum size of samples the service is willing to process. The size is the size of the (CDR) serialised payload, and the limit holds both for built-in data and for application data. The (CDR) serialised payload is never larger than the in-memory representation of the data.
On the transmitting side, samples larger than Internal/MaxSampleSize are dropped with a warning. Eclipse Cyclone DDS behaves as if the sample never existed.
Similarly, on the receiving side, samples larger than Internal/MaxSampleSize are dropped as early as possible, immediately following the reception of a sample or a fragment of one, to prevent any resources from being claimed for longer than strictly necessary. Where the transmitting side completely ignores the sample, the receiving side pretends the sample has been correctly received and acknowledges reception to the Writer. This allows communication to continue.
When the receiving side drops a sample, Readers will receive a sample lost notification with the next delivered sample. This notification is easily missed, so ultimately the only reliable way of determining whether samples have been dropped or not is checking the logs.
While dropping samples (or fragments thereof) as early as possible is beneficial from the point of view of reducing resource usage, it can make it hard to decide whether or not dropping a particular sample has been recorded in the log already. Under normal operational circumstances, only a single message will be recorded for each sample dropped, but it may occasionally report multiple events for the same sample.
Finally, it is technically permitted to set Internal/MaxSampleSize to very small sizes, even to the point that the discovery data can no longer be communicated. The dropping of the discovery data will be reported as normal, but the utility of such a configuration seems doubtful.
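For illustration, inside the Domain element (the value shown is illustrative):

<Internal>
    <MaxSampleSize>1 MB</MaxSampleSize>
</Internal>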
Network Partition Configuration¶
Network Partition Configuration Overview¶
Network partitions introduce alternative multicast addresses for data. In the DDSI discovery protocol, a Reader can override the default address at which it is reachable, and this feature of the discovery protocol is used for advertising alternative multicast addresses. The DDSI Writers in the network will (also) multicast to such an alternative multicast address when multicasting samples or control data.
The mapping of a DCPS data Reader to a network partition is indirect: first, the DCPS partitions and topic are matched against a table of partition mappings (partition/topic combinations) to obtain the name of a network partition; then the network partition name is used to find the addressing information. This makes it easier to map many different partition/topic combinations to the same multicast address without specifying the actual multicast address many times over.
If no match is found, the default multicast address is used.
Matching Rules¶
The matching of a DCPS partition/topic combination proceeds in the order in which the partition mappings are specified in the configuration. The first matching mapping is the one that will be used. The * and ? wildcards are available for the DCPS partition/topic combination in the partition mapping.
As mentioned earlier, Eclipse Cyclone DDS can be instructed to ignore all DCPS data Readers and Writers for certain DCPS partition/topic combinations through the use of Partitioning/IgnoredPartitions. The ignored partitions use the same matching rules as normal mappings, and take precedence over the normal mappings.
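A sketch of an ignored-partitions entry, inside the Domain element; the pattern is illustrative, and the IgnoredPartition element with a DCPSPartitionTopic attribute is assumed to match the configuration schema:

<Partitioning>
    <IgnoredPartitions>
        <IgnoredPartition DCPSPartitionTopic="private.*" />
    </IgnoredPartitions>
</Partitioning>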
Multiple Matching Mappings¶
A single DCPS data Reader can be associated with a set of partitions, and each partition/topic combination can potentially map to different network partitions. In this case, the first matching network partition will be used. This does not affect what data the Reader will receive; it only affects the addressing on the network.
Thread Configuration¶
Eclipse Cyclone DDS creates several threads, each with a number of properties that can be controlled individually. These properties are:
stack size,
scheduling class, and
scheduling priority.
Each thread is uniquely named; using that name with the Threads/Thread[@name] option, the properties for that thread can be set. Any subset of threads can be given special properties; anything not specified explicitly is left at the default value.
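For illustration, a sketch giving the tev thread a larger stack and real-time priority, inside the Domain element; the sub-element names (StackSize, Scheduling/Class, Scheduling/Priority) and all values are assumptions to be checked against the configuration schema:

<Threads>
    <Thread Name="tev">
        <StackSize>512 KiB</StackSize>
        <Scheduling>
            <Class>realtime</Class>
            <Priority>20</Priority>
        </Scheduling>
    </Thread>
</Threads>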
The following threads exist:
gc
Garbage collector, which sleeps until garbage collection is requested for an entity, at which point it starts monitoring the state of Eclipse Cyclone DDS, pushing the entity through whatever state transitions are needed once it is safe to do so, ending with the freeing of the memory.
recv
Accepts incoming network packets from all sockets/ports, performs all protocol processing, queues (nearly) all protocol messages sent in response for handling by the timed-event thread, and queues samples for delivery or, in special cases, delivers them directly to the data Readers.
dq.builtins
Processes all discovery data coming in from the network.
lease
Performs internal liveliness monitoring of Eclipse Cyclone DDS.
tev
Timed-event handling, used for all kinds of things, such as periodic transmission of participant discovery and liveliness messages, transmission of control messages for reliable Writers and Readers (except those that have their own timed-event thread), retransmitting of reliable data on request (except those that have their own timed-event thread), and handling of start-up mode to normal mode transition.
and, for each defined channel:
dq.channel-name
Deserialisation and asynchronous delivery of all user data.
tev.channel-name
Channel-specific “timed-event” handling: transmission of control messages for reliable Writers and Readers and retransmission of data on request. Channel-specific threads exist only if the configuration includes an element for the channel or if an auxiliary bandwidth limit is set for it.
When no channels are explicitly defined, there is one channel named user.
Reporting and Tracing¶
Eclipse Cyclone DDS can produce highly detailed traces of all traffic and internal activities. Tracing is controlled either by enabling individual categories of information or by a simple verbosity level that enables a fixed set of categories.
The categorisation of tracing output is incomplete, hence most of the verbosity levels and categories are useless in the current release. This is an ongoing process and here we describe the target rather than the current situation.
All fatal and error messages are written both to the trace and to the cyclonedds-error.log file; similarly, all warning messages are written to the trace and the cyclonedds-info.log file.
The Tracing element has the following sub elements:
Verbosity: selects a tracing level by enabling a pre-defined set of categories. The list below gives the known tracing levels, and the categories they enable:
none: (no categories enabled)
severe: error, fatal
warning, info: severe, warning
config: info, config
fine: config, discovery
finer: fine, traffic, timing, info
finest: fine, trace
EnableCategory: a comma-separated list of keywords, each keyword enabling individual categories. The following keywords are recognised:
fatal
All fatal errors, errors causing immediate termination.
error
Failures probably impacting correctness but not necessarily causing immediate termination.
warning
Abnormal situations that will likely not impact correctness.
config
Full dump of the configuration.
info
General informational notices.
discovery
All discovery activity.
data
Include data content of samples in traces.
timing
Periodic reporting of CPU loads per thread.
traffic
Periodic reporting of total outgoing data.
tcp
Connection and connection cache management for the TCP support.
throttle
Throttling events where the Writer stalls because its WHC hit the high-water mark.
topic
Detailed information on topic interpretation (in particular topic keys).
plist
Dumping of parameter lists encountered in discovery and inline QoS.
radmin
Receive buffer administration.
whc
Very detailed tracing of WHC content management.
In addition, the keyword trace enables everything from fatal to throttle. The topic and plist ones are useful only for particular classes of discovery failures; and radmin and whc only help in analysing the detailed behaviour of those two components and produce significant amounts of output.
OutputFile: the file to write the trace to.
AppendToFile: boolean; set to true to append to the trace instead of replacing the file.
Currently, the useful verbosity settings are config, fine, and finest.
Config writes the complete configuration to the trace file, as well as any warnings or errors, which can be an effective method to verify that everything is configured and behaving as expected.
Fine adds the complete discovery information in the trace, but nothing related to application data or protocol activities. If a system has a stable topology, this will therefore typically result in a moderate size trace.
Finest provides a detailed trace of everything that occurs and is an indispensable source of information when analysing problems; however, it also requires a significant amount of time and results in huge log files.
Whether these logging levels are set using the verbosity level or by enabling the corresponding categories is immaterial.
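For illustration, a Tracing fragment combining the elements described above, placed inside the Domain element (the file path and categories are illustrative):

<Tracing>
    <Verbosity>finest</Verbosity>
    <!-- alternatively, enable individual categories: -->
    <!-- <EnableCategory>config,discovery,traffic</EnableCategory> -->
    <OutputFile>${HOME}/dds/log/cdds.log.${CYCLONEDDS_PID}</OutputFile>
    <AppendToFile>false</AppendToFile>
</Tracing>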
Compatibility and Conformance¶
Conformance Modes¶
Eclipse Cyclone DDS operates in one of three modes: pedantic, strict, and lax; the mode is configured using the Compatibility/StandardsConformance setting. The default is lax.
The first, pedantic mode, is of such limited utility that it will be removed.
The second mode, strict, attempts to follow the intent of the specification while staying close to the letter of it. Recent developments at the OMG have resolved the issues this mode was intended to address, and it is no longer of any value.
The default mode, lax, attempts to work around (most of) the deviations of other implementations, and generally provides good interoperability without any further settings. In lax mode, Eclipse Cyclone DDS not only accepts some invalid messages, but will even transmit them; the consequences for interoperability of not doing this would be too severe.
Note
If one configures two Eclipse Cyclone DDS processes with different conformance modes, the one in the stricter mode will complain about messages sent by the one in the less strict mode.
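For illustration, the mode is selected as follows inside the Domain element:

<Compatibility>
    <StandardsConformance>lax</StandardsConformance>
</Compatibility>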
RTI Compatibility Issues¶
In lax mode, most topic types should not have significant issues when working across a network. Previously, within a single host, there was an issue with how RTI DDS uses, or attempts to use, its shared memory transport to communicate with peers, even when they advertise only UDP/IP addresses. The result is an inability to reliably establish bidirectional communication between the two.
Disposing of data may also cause problems, as RTI DDS leaves out the serialised key value and expects the Reader to rely on an embedded hash of the key value. In the strict modes, Cyclone DDS requires a valid key value to be supplied; in the relaxed mode, it is willing to accept a key hash, provided it is of a form that contains the key values in an unmangled form.
If an RTI DDS DataWriter disposes of an instance with a key of which the serialised representation may be larger than 16 bytes, this problem is likely to occur. In practice, the most likely cause is using a string as key, either unbounded or with a maximum length larger than 11 bytes. See the DDSI specification for details.
In strict mode, there is interoperation with RTI DDS, but at the cost of incredibly high CPU and network load, caused by Heartbeats and AckNacks going back and forth between a reliable RTI DDS DataWriter and a reliable Cyclone DDS data Reader. The problem is that once Cyclone DDS informs the RTI Writer that it has received all data (using a valid AckNack message), the RTI Writer immediately publishes a message listing the range of available sequence numbers and requesting an acknowledgment, which becomes an endless loop.
Furthermore, there is a difference in interpretation of the meaning of the “autodispose_unregistered_instances” QoS on the Writer. Cyclone DDS aligns with OpenSplice.