For features such as dynamic discovery (used to avoid configuring network addresses for point-to-point connections), Ubik uses a highly configurable group communication framework. Ubik's distributed JNDI implementation makes use of that framework.
Ubik's group communication logic is built upon the org.sapia.ubik.mcast.EventChannel class, which implements group communication on top of two abstractions:
- org.sapia.ubik.mcast.BroadcastDispatcher: handles the broadcast of org.sapia.ubik.mcast.RemoteEvents to all other EventChannel nodes in a domain.
- org.sapia.ubik.mcast.UnicastDispatcher: handles the dispatching of RemoteEvents in a point-to-point manner - from one EventChannel to another.
Internally, an EventChannel uses its BroadcastDispatcher at startup to broadcast its presence to other EventChannel nodes. In turn, these other nodes will publish themselves to their new peer, this time using the UnicastDispatcher.
The BroadcastDispatcher is also used at shutdown: an EventChannel will then notify the other nodes in the domain that it is shutting, and thus disappearing from the network - this allows other nodes to immediately synchronize their view of the domain.
By default, the implementations of the dispatcher abstractions that are used by an EventChannel are org.sapia.ubik.mcast.udp.UDPBroadcastDispatcher and org.sapia.ubik.mcast.tcp.mina.MinaTcpUnicastDispatcher respectively. The former uses IP multicast to broadcast RemoteEvents. It is by default configured to listen to multicast address and port 18.104.22.168 and 5454, respectively. This can be configured through the following JVM properties:
- ubik.rmi.naming.mcast.address: used to specify the multicast address.
- ubik.rmi.naming.mcast.port: used to specifiy the multicast port.
The default dispatcher implementations can be configured otherwise - see further below for the details.
The Control Protocol at a Glance
Among all the event channel nodes that discover each other (and are thus part of the same domain), one is elected master - as the others are assigned the slave status. It is the master that periodically probes the slaves for their status - through "heartbeat requests" (which by default are sent every minute by the master).
The control protocol was designed to be scalable across a large number of nodes, and to minimize chattiness. The protocol does not rely on broadcast - since when using IP multicast, multiple domains can use the same multicast address, resulting in control messages crossing domains.
Rather then, the protocol is implemented over unicast, but using a cascading scheme where slave nodes are put to contribution: a first set of nodes is determined, where each relays the control messages to other nodes, and so on, until no nodes are left to contact. The number of nodes into which to split, at every stage, is determined by the ubik.rmi.naming.mcast.control.split.size JVM property - which defaults to 5.
For example, let's say there are 101 nodes, one of them being the master - it is always the master that initiates a control sequence as part of the protocol. Given a split size of 5, it means that the 100 nodes are divided in 5 sets of 20 nodes. The master node sends a copy of the control message to the first node in each group - each copy of the message holding the IDs of the 20 nodes of its corresponding set. At the next stage, the node receiving the control message removes its own ID from the set, and splits again. There are thus 20-1 nodes (19) to divide by 5, yielding 4 sets of 4 nodes to contact, and 1 group of 3. Since 4 and 3 cannot be further divided by 5, this will be the last time such a split occurs, meaning that each of these nodes will dispatch the control messages to 4 or 3 other nodes - accordingly. If you do the calculations you'll see that it takes 4 stages in the cascade to dispatch a message across a cluster of 1000 event channel nodes, given a split size of 5. Of course the larger that split size, the less dispatch stages you will incur. Let's just try with a split size of 10, with the 101 nodes of our cluster: the first split will yield 10 sets of 10 nodes. When the control message reaches the first node of each set, there are 9 nodes remaining to contact at each level. Therefore, given our split size, no further splits are done, and each of the nodes at that stage will relay the control message to the 9 remaining nodes it has in its set.
By default, since the size of control messages may be larger (given their payload in terms of having to carry node IDs around), the org.sapia.ubik.mcast.tcp.mina.MinaTcpUnicastDispatcher is used: it implements point-to-point communication over TCP, using a NIO-based transport. You can change this to using UDP, although we really do not recommend it. See the UDP Unicast section for details.
Broadcast using Avis
In environments where IP multicast is not supported, an alternate broadcast mechanism can be used: it relies on the Avis group communication framework. In this setup, an Avis "router" provides the broadcast fabric. It acts as a message queue (sort of), and all event channels become clients of it.
You can federate multiple such routers for more reliability - without overblowing it, otherwise you will create yourself a configuration nightmare. Two instances should be sufficient for most cases, since broadcast is sparingly used.
To use Avis broadcast implementation, you have to configure the following JVM properties:
- ubik.rmi.naming.broadcast.provider should have its value set to ubik.rmi.naming.broadcast.avis: this indicates that the org.sapia.ubik.mcast.avis.AvisBroadcastDispatcher should be used for broadcasting.
- ubik.rmi.naming.broadcast.avis.url should be set to the URL to use to connect to the Avis router - For example: elvin://localhost:2917.
The programmatic version would be:
import org.sapia.ubik.mcast.EventChannel; import org.sapia.ubik.rmi.Consts; import org.sapia.ubik.util.Props; Properties properties = new Properties(); properties.setProperty(Consts.BROADCAST_PROVIDER, Consts.BROADCAST_PROVIDER_AVIS); properties.setProperty(Consts.BROADCAST_AVIS_URL, "elvin://localhost:2917"); EventChannel channel = new EventChannel("myDomain", new Props().addProperties(properties));
The format of the URL is entirely Avis' - see its documentation for more info and for instructions detailing how to install the router.
You can use UDP for unicast group communication, but we do not recommend it. You have to be aware that a default packet size of 3.072 kb is set by default (configurable through the ubik.rmi.naming.mcast.bufsize JVM property). That default size may prove unsufficient, depending on the size of messages that are exchanged, and you may risk having some messages not going through.
To activate the org.sapia.ubik.mcast.udp.UDPUnicastDispatcher, you have to set the ubik.rmi.naming.unicast.provider to ubik.rmi.naming.unicast.udp
Programmatically, you would do it as follows:
import org.sapia.ubik.mcast.EventChannel; import org.sapia.ubik.rmi.Consts; import org.sapia.ubik.util.Props; Properties properties = new Properties(); properties.setProperty(Consts.UNICAST_PROVIDER, Consts.UNICAST_PROVIDER_UDP); EventChannel channel = new EventChannel("myDomain", new Props().addProperties(properties));