Problem Statement

Problem A: How do you test your multi-machine environment to see whether multicast traffic is supported?

Problem B: You have just installed GlassFish on multiple machines, created a cluster of instances spanning these machines, and started the cluster. Later you stop one of the instances. The admin GUI shows the status of all instances, but the GMS view change messages in the logs of the DAS and of some cluster instances list only a subset of the cluster: one or more instances are missing. Why are those instances missing from the view?

Problem C: You have in-memory replication enabled, but sessions are not being replicated. More specifically, sessions are not found when an instance is killed or stopped (barring requests that were in flight at the time of failure).

Explanation

The Shoal GMS runtime clustering library requires that the machines in the cluster are on the same subnet and that multicast traffic is enabled on that network. When multicast is not enabled, or some of the machines are not on the same subnet, the GMS view change log statements do not include the instances on the machines that cannot be reached. You can confirm this in the logs of the isolated instances: their GMS view change log statements list only one or a few of the isolated instances.

The Admin GUI codebase relies on a different code path to report instance status, so the GUI alone does not reveal this problem.

Solution

Before you get into the above situation, when you know you are going to run a GlassFish cluster across multiple machines, perform a basic multicast test to see whether multicast is supported between all machines involved.
Here's a basic multicast test you can do without involving GlassFish or Shoal GMS code. The Shoal GMS test code contains a basic DatagramSocket-based test that uses JDK APIs to send and receive multicast messages without going through Shoal or JXTA code paths.

1. Check out the Shoal sources on two or more machines and do a build (it takes a few seconds), following the instructions posted here.
2. Once the Shoal sources are built, open a terminal on each machine and change directory to shoal/gms.
3. On one of the terminals, run the ant target:
       ant test-mcastsender
   This runs the MulticastSender, which sends messages to the group.
4. On the other machines, run in the terminal:
       ant test-mcastsniffer
   This runs the MulticastSniffer.

You should see 9 messages on the sniffer, which confirms that multicast works properly. If you don't see these messages on the sniffer side, multicast traffic is not getting through between those machines. Contact your network admin to enable multicast traffic within your subnet.

Machines can be on the same subnet and still be connected to different switches under the same router. In such cases, if the switches are not configured to pass multicast traffic to the other switches in the subnet, you can experience the same issues as described in Problem B.

To make sure all machines support multicast, run the sender test on each machine in turn while running the sniffer test on the remaining machines. This verifies that every machine involved can both send and receive multicast messages.
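If checking out and building Shoal is not convenient, the same kind of check can be sketched directly against the JDK. The class below is not Shoal's MulticastSender or MulticastSniffer; it is a minimal stand-in, and the class name, group address 239.1.2.3, port 45678, and message text are all hypothetical values chosen for illustration. It sends 9 messages by default only to mirror the count mentioned above.

```java
import java.io.IOException;
import java.net.DatagramPacket;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.MulticastSocket;
import java.net.SocketTimeoutException;
import java.net.UnknownHostException;
import java.nio.charset.StandardCharsets;

class McastCheck {
    // Assumed values for illustration only; these are not Shoal's defaults.
    static final String GROUP = "239.1.2.3";
    static final int PORT = 45678;

    // True if the given literal IP address lies in the multicast range.
    static boolean isMulticastGroup(String literalIp) {
        try {
            return InetAddress.getByName(literalIp).isMulticastAddress();
        } catch (UnknownHostException e) {
            return false;
        }
    }

    // Builds the bytes for the message with the given sequence number.
    static byte[] payload(int seq) {
        return ("mcast-test " + seq).getBytes(StandardCharsets.UTF_8);
    }

    // Sends `count` messages to the group, one per second.
    static void send(int count) throws IOException, InterruptedException {
        InetAddress group = InetAddress.getByName(GROUP);
        try (MulticastSocket socket = new MulticastSocket()) {
            socket.setTimeToLive(4); // allow a few router hops
            for (int i = 1; i <= count; i++) {
                byte[] data = payload(i);
                socket.send(new DatagramPacket(data, data.length, group, PORT));
                System.out.println("Sent message " + i);
                Thread.sleep(1000);
            }
        }
    }

    // Joins the group and prints messages until none arrives within the
    // timeout. Returns the number of messages received.
    static int sniff(int timeoutMillis) throws IOException {
        InetAddress group = InetAddress.getByName(GROUP);
        int received = 0;
        try (MulticastSocket socket = new MulticastSocket(PORT)) {
            // A null interface defers to the system default interface.
            socket.joinGroup(new InetSocketAddress(group, PORT), null);
            socket.setSoTimeout(timeoutMillis);
            byte[] buffer = new byte[256];
            while (true) {
                DatagramPacket packet = new DatagramPacket(buffer, buffer.length);
                try {
                    socket.receive(packet);
                } catch (SocketTimeoutException e) {
                    break; // quiet for the whole timeout: stop listening
                }
                String text = new String(packet.getData(), packet.getOffset(),
                        packet.getLength(), StandardCharsets.UTF_8);
                System.out.println("Received '" + text + "' from "
                        + packet.getAddress().getHostAddress());
                received++;
            }
        }
        return received;
    }

    public static void main(String[] args) throws Exception {
        if (args.length == 1 && args[0].equals("send")) {
            send(9);
        } else if (args.length == 1 && args[0].equals("sniff")) {
            System.out.println("Total received: " + sniff(30000));
        } else {
            System.out.println("Usage: java McastCheck send|sniff");
        }
    }
}
```

Start `java McastCheck sniff` on the receiving machines first, then run `java McastCheck send` on one machine. On a multi-homed host the default interface may not be the one attached to the cluster subnet, so an empty result from this sketch is a hint to investigate rather than proof that multicast is blocked.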