If a machine has multiple network interfaces, how does one configure Shoal GMS to only use one network interface?

By default, Shoal GMS uses all available working network interfaces on a machine. Thus, to limit Shoal GMS to a single network interface, set the gms-bind-interface-address property on the cluster element.

Here is a shell script that demonstrates setting the cluster element gms-bind-interface-address property using the asadmin command line.

    #!/bin/sh -x
    ASADMIN=${AS_HOME}/bin/asadmin
    DAS=devtest-cluster-domain
    DAS_GMS_BIND_ADDRESS=129.148.71.176
    CLUSTER=devtest-cluster
    DASCONFIG=server-config
    INSTANCE1=instance1
    INSTANCE1_ADDRESS=129.148.71.176
    INSTANCE2=instance2
    INSTANCE3=instance3
    PORT="--port 4845"
    USER="--user admin"

    ${ASADMIN} start-domain ${DAS}

    # The cluster property refers to a per-server system property. The escaped \${...}
    # keeps the token literal so that it is resolved by each server, not by the shell.
    ${ASADMIN} set ${PORT} ${USER} ${CLUSTER}.property.gms-bind-interface-address=\${GMS_${CLUSTER}_BIND_ADDRESS}

    # Set the system property to a concrete address for the DAS and for each instance.
    # The address variables are left unescaped so that the shell expands them.
    ${ASADMIN} set ${PORT} ${USER} ${DASCONFIG}.system-property.GMS_${CLUSTER}_BIND_ADDRESS=${DAS_GMS_BIND_ADDRESS}
    ${ASADMIN} set ${PORT} ${USER} ${INSTANCE1}.system-property.GMS_${CLUSTER}_BIND_ADDRESS=${INSTANCE1_ADDRESS}
    # instance2 and instance3 reuse INSTANCE1_ADDRESS here; substitute each instance's
    # own address if the instances run on different hosts.
    ${ASADMIN} set ${PORT} ${USER} ${INSTANCE2}.system-property.GMS_${CLUSTER}_BIND_ADDRESS=${INSTANCE1_ADDRESS}
    ${ASADMIN} set ${PORT} ${USER} ${INSTANCE3}.system-property.GMS_${CLUSTER}_BIND_ADDRESS=${INSTANCE1_ADDRESS}

    # The following is needed for the SailFin communications application server if "default-cluster" is not deleted.
    # ${ASADMIN} set ${PORT} ${USER} default-cluster.property.gms-bind-interface-address=\${GMS_${CLUSTER}_BIND_ADDRESS}

    ${ASADMIN} stop-domain ${DAS}

Here is what the configuration looks like in domain.xml. Note that this example uses the system property name GMS_DEVTEST-CLUSTER_BIND_INTERFACE_ADDR and different addresses than the script above. In the domain.xml of the DAS, for a cluster named "devtest-cluster", one sets the following.

    <cluster config-ref="devtest-cluster-config" heartbeat-address="228.8.20.94" heartbeat-enabled="true" heartbeat-port="17227" name="devtest-cluster">
        <!-- ... deleted unrelated info ...-->
        <property name="gms-bind-interface-address" value="${GMS_DEVTEST-CLUSTER_BIND_INTERFACE_ADDR}"/>
    </cluster>

Each server, and possibly the DAS, can choose to explicitly set the bind interface. For example, to set it in the DAS and in instance1 of devtest-cluster, one adds the following system-property elements.

    <server config-ref="server-config" lb-weight="100" name="server">
        <!-- deleted non-essential info to this issue -->
        <system-property name="GMS_DEVTEST-CLUSTER_BIND_INTERFACE_ADDR" value="129.148.71.168"/>
    </server>
    <server config-ref="devtest-cluster-config" lb-weight="100" name="instance1" node-agent-ref="devtest-agent">
        <!-- deleted non-essential info to this issue -->
        <system-property name="GMS_DEVTEST-CLUSTER_BIND_INTERFACE_ADDR" value="129.148.71.169"/>
    </server>

If the system property is not set, then the group management service will perform its default binding.

After the above steps have been taken, all GlassFish processes (domain server (DAS), NodeAgent, clustered instances) should be stopped.

In order to ensure that the DAS never uses default binding, all clusters defined in the DAS domain.xml must have the gms-bind-interface-address property set.

Configuring GMS Failure Detection in Application Server

For an introduction to GMS Failure Detection and other configuration settings, please see the corresponding FAQ entry.

To decrease the time it takes for GMS to detect a hardware or network failure of a server instance within a cluster, lower the value of the failure-detection-tcp-retransmit-timeout property. The default value is currently 10 seconds. The total time that GMS takes to detect that a server instance has failed due to a hardware or network failure depends on this value. The configuration below sets the value to 3000 ms (3 seconds).

Configuration changes in domain.xml

In domain.xml, this is achieved by adding the property failure-detection-tcp-retransmit-timeout to the group-management-service element of the cluster's config.

    <config dynamic-reconfiguration-enabled="true" name="devtest-cluster-config">
        ...
        <group-management-service fd-protocol-max-tries="3" fd-protocol-timeout-in-millis="2000" merge-protocol-max-interval-in-millis="10000" merge-protocol-min-interval-in-millis="5000" ping-protocol-timeout-in-millis="5000" vs-protocol-timeout-in-millis="1500">
            <!-- The property below configures GMS so that, when it attempts to connect to a
                 suspected failed server instance, the TCP socket creation timeout is 3 seconds.
                 This value is probably too small but was necessary to achieve the goal of
                 detecting hardware failure within 15 seconds. The default value of 10 seconds
                 detects hardware failure in 28 seconds. -->
            <property name="failure-detection-tcp-retransmit-timeout" value="3000"/>
        </group-management-service>

It is also necessary to change this value for the domain admin server (DAS), since it is the GMS Master Node.

GMS Failure Detection States

The GMS failure detection algorithm uses the group-management-service configuration parameters from domain.xml.
GMS attempts a timed TCP operation against the suspected failed machine, with a timeout of failure-detection-tcp-retransmit-timeout ms. If this step times out, GMS proceeds to Failure Validation.
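
As a rough illustration of what a timed TCP operation looks like (this is not how GMS itself is implemented), the sketch below probes a host and port from the shell and gives up after a timeout given in milliseconds. The function names tcp_probe and ms_to_secs are hypothetical, and the sketch assumes that bash (for its /dev/tcp redirection) and the GNU coreutils timeout command are available.

```shell
# Hypothetical helpers for illustration only; they are not part of GMS or asadmin.

# Convert a millisecond count into a seconds string such as "3.000",
# the duration form accepted by GNU timeout.
ms_to_secs() {
    local ms="$1"
    printf '%d.%03d\n' "$((ms / 1000))" "$((ms % 1000))"
}

# Try to open a TCP connection to host:port, giving up after timeout_ms.
# Returns 0 if the connection opened; non-zero if it was refused or timed out.
tcp_probe() {
    local host="$1" port="$2" timeout_ms="$3"
    # bash treats /dev/tcp/HOST/PORT specially and opens a socket to it.
    timeout "$(ms_to_secs "$timeout_ms")" \
        bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$host" "$port" 2>/dev/null
}
```

Calling tcp_probe with a host, a port, and 3000 mirrors the 3-second setting shown earlier: a non-zero return means no connection opened within that time. Which port to probe is left open here, since the text does not say which TCP port GMS uses.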