GlassFish User-Managed Clusters Design Spec

This is the design specification for the user-managed clusters feature for GlassFish.

Authors

Note: This is the second major revision of this design.  See this page for the first revision. 



Introduction

In the GlassFish 3.1 release, clusters and clustered instances are managed by a domain administration server (DAS). The DAS is required to create, list, start, stop, and delete a cluster and the instances that are in a cluster. Configuration information is synchronized with members of the cluster by the DAS. The user-managed cluster feature provides the ability to create a cluster of instances without using a separate DAS to manage them. Responsibility for managing the cluster and its instances rests with the user, either by manually updating the configuration of each instance or by providing an external software system that does this management. Since the user provides the management for the cluster, the feature is called user-managed clusters.

The purpose of this document is to describe the design for the feature. This includes the following sections:

  • General information about how the feature works
  • Analysis of the use of the isDas method within the server and how these calls need to be changed
  • Changes to asadmin subcommands, including changes to existing commands for managing instances, and new commands
  • Various issues with containers including handling of the application id for the EJB container, timer management, and transaction management
  • Changes needed for JMS and Message Queue integration with clustering

Additional high level information about the requirements and design, such as packaging, i18n/l10n impact, etc. is available in the one-pager/project page for this feature. That information is not duplicated in this document.

Requirements

The requirements for this feature are specified on the one-pager/project page for this feature.

General Design

The basic idea for the feature is to provide the ability for a DAS to join a cluster as a core member of the cluster. Each member of the cluster is its own domain, i.e., it has an entry in the "domains" directory and would be started with the start-domain command. To simplify the implementation, the domain.xml does not have a <cluster> element for the cluster containing the DAS.  Rather, information is added to other elements such as <server> and <group-management-service> to provide the information that is needed to allow the DAS to join a cluster. The DAS would join the GMS group as a core member rather than an observer.

Domains, Clusters, and Instances

When using the user-managed cluster feature, the DAS for a domain can be a member of exactly one cluster.  The DAS can still manage other clusters of which it is not a member, but the DAS itself can only be a member of one cluster.  In 3.1, the DAS could not be a member of a cluster; this feature changes that.

To use the user-managed cluster feature, a user creates the DAS using the create-domain command.  To make the DAS a member of a cluster, the cluster-member-name and cluster-name properties are set on the server.

asadmin create-domain domain1
asadmin start-domain
asadmin set servers.server.server.property.cluster-member-name=domain1-instance1
asadmin set servers.server.server.property.cluster-name=domain1-cluster

// GMS configuration commands follow
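
Assuming the property names above, these set commands would produce a <server> element in the domain.xml roughly like the following (an illustrative fragment; other attributes and child elements are omitted):

```xml
<servers>
  <server config-ref="server-config" name="server">
    <property name="cluster-member-name" value="domain1-instance1"/>
    <property name="cluster-name" value="domain1-cluster"/>
  </server>
</servers>
```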

The RuntimeType of an instance in a user-managed cluster will be DAS.  This means that all places that call isDas or isInstance to determine behavior that affects cluster membership will need to be changed to instead check whether the instance is a member of a cluster. A new Server.isClusteredDas method is available to make this check (see details in the isDas section below).

As with any other DAS, the files for a member of a user-managed cluster will reside within a "domains" directory. 

Since all members of the cluster should have the same domain name, support for multiple instances of a user-managed cluster on the same machine requires the use of multiple domains directories.  This is specified on the create-domain command using the --domaindir option.
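
For example, a second member of the cluster on the same machine could be set up in a separate domains directory (an illustrative sketch; the path, port, and names are placeholders):

```
asadmin create-domain --domaindir /opt/domains2 --adminport 14848 domain1
asadmin start-domain --domaindir /opt/domains2 domain1
asadmin --port 14848 set servers.server.server.property.cluster-member-name=domain1-instance2
asadmin --port 14848 set servers.server.server.property.cluster-name=domain1-cluster
```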

This feature does not guarantee that the configuration for instances in a cluster stays in sync with each other.  It is the responsibility of the user or software provided by the user to do this.  When GlassFish instances communicate with one another, the software must handle potential errors caused by having configuration data, including the list of deployed applications, that is not in sync.

Static vs. Dynamic Cluster Configuration 

There are some modules in GlassFish that depend on knowing information about what instances are in a cluster.  There are two ways of getting this information.  The Cluster config bean contains static cluster configuration information, i.e., the information that is in the <cluster> element of the domain.xml.  This includes instances that may be up or down. The GMS subsystem provides dynamic cluster configuration information, specifically the list of members that are up, or that have been up since this instance was started. All places that need a list of cluster members will be made to use the dynamic cluster configuration information.   
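
The distinction can be illustrated with a simplified, self-contained model (this is not the GMS API; the class and method names below are invented for illustration): the static view is whatever the configuration lists, while the dynamic view is maintained from join and failure notifications.

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Simplified illustration of static vs. dynamic cluster views.
// The real information comes from the Cluster config bean (static)
// and from GMS notifications (dynamic); the types here are invented.
public class MembershipViews {
    private final List<String> configuredInstances;            // static: from <cluster> in domain.xml
    private final Set<String> aliveMembers = new TreeSet<>();  // dynamic: from GMS join/failure events

    public MembershipViews(List<String> configuredInstances) {
        this.configuredInstances = configuredInstances;
    }

    public void onJoin(String member)    { aliveMembers.add(member); }
    public void onFailure(String member) { aliveMembers.remove(member); }

    public List<String> staticView() { return configuredInstances; }
    public Set<String> dynamicView() { return aliveMembers; }
}
```

The static view includes instances that may be down; the dynamic view reflects only members that are currently up (or have been up since this instance started).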

The following classes, which call getCluster()/getClusterForInstance(..) depend on the static information from the <cluster> element and may need modification to get the information from the dynamic cluster configuration or new properties of the Server config bean.   

EJB Container

  • MigrateTimers.java

GMS

  • GMSAdapterService.java

JMS

  • ChangeMasterBrokerCommand.java
  • ActiveJmsResourceAdapter.java
  • JmsRaUtil.java
  • MQAddressList.java

Load Balancer

  • ConfigureLBWeightCommand.java
  • LBCommandsBase.java

Configuration changes for isDas() calls

There are numerous calls to isDas() throughout the code, but most of these do not need to be modified for a user-managed instance/clustered DAS. In the places where modifications are needed, new duck-typed methods on the Server config bean are required.

To identify if an instance is a clustered DAS, a new duck-typed method, isClusteredDas(), on the Server config bean is required.  A new Server.getClusterMemberName method is used to obtain the cluster member name for an instance in a user-managed cluster.

public static boolean isClusteredDas(Server server) {   <----- CHANGE REQUIRED
    boolean isClusteredDas = false;
    if (isDas(server)) {
        isClusteredDas = server.getProperty("cluster-member-name") != null;
    }
    return isClusteredDas;
}

public static String getClusterMemberName(Server server) {   <----- CHANGE REQUIRED
    return server.getPropertyValue("cluster-member-name", server.getName());
}
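
The semantics of the two methods can be sketched in a self-contained form (using a plain map in place of the Server config bean; the class below is invented for illustration only):

```java
import java.util.Map;

// Stand-in for the Server config bean: a name plus a property map.
// Mirrors the duck-typed isClusteredDas()/getClusterMemberName() logic.
public class ClusteredDasCheck {
    static boolean isClusteredDas(boolean isDas, Map<String, String> props) {
        // A DAS is a clustered DAS only when cluster-member-name is set.
        return isDas && props.get("cluster-member-name") != null;
    }

    static String getClusterMemberName(Map<String, String> props, String serverName) {
        // Falls back to the server name when the property is not set.
        return props.getOrDefault("cluster-member-name", serverName);
    }
}
```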

The remainder of this section analyzes the calls to the isDas method throughout the server.  No changes are needed from any of these areas except for those indicated with "CHANGE REQUIRED".

Admin

CommandRunnerImpl uses env.isDas(), in addition to whether the number of servers is greater than 1 or the cluster size is greater than 0, to ask 'Does this server require replication?'. There is only 1 server in a user-managed cluster, and the cluster element will not exist, so no change is required.
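
That decision can be modeled as a simple predicate (the class and parameter names below are illustrative, not the actual CommandRunnerImpl members):

```java
// Illustrative model of CommandRunnerImpl's replication decision:
// a command needs replication only when running on the DAS and there
// is something to replicate to (more than one server, or any cluster).
public class ReplicationCheck {
    static boolean requiresReplication(boolean isDas, int serverCount, int clusterCount) {
        return isDas && (serverCount > 1 || clusterCount > 0);
    }
}
```

In a user-managed cluster the domain has one server and no <cluster> element, so the predicate is false and no replication is attempted.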

Application

ApplicationConfigListener uses server.isDas() to help determine if the current instance matches the target. In a user-managed cluster, the application is deployed directly to the instance so the ApplicationRef will have Server as its parent. No change should be needed.

private boolean isCurrentInstanceMatchingTarget(Object parent) {
        // DAS receive all the events, so we need to figure out
        // whether we should take action on DAS depending on the event
        if (parent instanceof ApplicationRef) {
            Object grandparent = ((ApplicationRef)parent).getParent();
            if (grandparent instanceof Server) {
                Server gpServer = (Server)grandparent;
                if ( ! server.getName().equals(gpServer.getName())) {
                    return false;
                }
            } else if (grandparent instanceof Cluster) {
                if (server.isDas()) {
                    return false;
                }
            }
        }
        return true;
    }


ApplicationLifecycle.getVirtualServers(String target) returns the virtual server "server" if the target is "domain" and we are running on the DAS.
If a clustered DAS is always named "server", then no change is required.

private String getVirtualServers(String target) {
        if (env.isDas() && DeploymentUtils.isDomainTarget(target)) {
            target = "server";
        }


ApplicationLifecycle uses env.isDas() to load system applications on DAS. Currently, the following system applications are available under lib\install\applications: __admingui, __cp_jdbc_ra, __dm_jdbc_ra, __ds_jdbc_ra, __xa_jdbc_ra, jaxr-ra, jmsra, metro, ejb-timer-service-app.war, mejb.jar. Only the __admingui is registered in the domain.xml. If loading system applications on a user-managed instance is ok, then no change is required.

If a system application needs to be loaded on start-up, ApplicationLoaderService will load the system app if the server is the DAS or the system app is enabled. If loading a system app on a user-managed instance is ok, then no change is required.

If a system application, stand-alone resource adapter, or application is enabled, then the ApplicationLoaderService will load the app on the instance and also (partially) load on DAS so the application information is available on DAS. The user-managed instance should have the app loaded when the app is enabled, so no change should be required.

Admin Console

AdminConsoleAdapter uses !env.isDas() to return early when not running on the DAS. The Admin Console would be able to run on a clustered DAS. If we want to prevent the Admin Console from running on a clustered DAS, a change may be required here.

Config API

Server.isDas() - If the server name remains as "server", then no change is required.

ConfigRefValidator.isValid(..) - Currently, cannot change config-ref of DAS from "server-config". If the server name remains as "server", then no change is required.

ResourceUtil.getTargetsReferringResourceRef(String refName) - If the server isDas(), then SystemPropertyConstants.DAS_SERVER_NAME is added to the target list. SystemPropertyConstants.DAS_SERVER_NAME/DEFAULT_SERVER_INSTANCE_NAME and DAS_SERVER_CONFIG, which equal "server" and "server-config", do not need to change if the server name remains as "server".

Cluster

Some commands (CreateInstanceCommand, ListInstancesCommand, RestartInstanceCommand, StartClusterCommand, StopInstanceCommand) use !env.isDas() to allow the command to run only on DAS. StartInstanceCommand and StopClusterCommand also use env.isDas() to only run on DAS. These commands are not relevant for a clustered DAS. If the commands do not need to be prevented from running on a clustered DAS, no change is required.

Deployment

The EnableCommand/DisableCommand on the DAS will, if the target is a clustered instance, replicate the command to all instances in the cluster so they can update their configs. For a clustered DAS, ClusterOperationUtil.replicateCommand skips this replication for the DAS, so no change is required.

EJB Container

EjbContainerUtilImpl uses !isDas() to set _doDBReadBeforeTimeout = true. In this case, !isDas() is asking 'Should _doDBReadBeforeTimeout default to true?'. On a clustered instance the default is true, so a user-managed instance should also default to true. A change is required: the isDas() method on EjbContainerUtilImpl is modified to return false for a clustered DAS.

if (!isDas()) {
    // On a clustered instance default is true
    _doDBReadBeforeTimeout = true;
}

public boolean isDas() {
    return (env.isDas() && !server.isClusteredDas()) || env.isEmbedded();   <----- CHANGE REQUIRED
}


When EJBTimerService.createSchedules(..) is called during a deploy in a clustered deployment, only persistent schedule-based timers are created, and no timers are scheduled. When it is called during a deploy on a non-clustered instance, both persistent and non-persistent timers are created. Otherwise, only non-persistent timers are created by this method. The method createSchedules(containerId, applicationId, schedules, result, ownerIdOfThisServer_, true, (deploy && isDas)) is called by recoverAndCreateSchedules(..), so a clustered deployment is defined by 'deploy && isDas'. This will also be true for a deploy on a clustered DAS, so no change is required there. However, for timer deployment to a cluster of DASes, the EJB container needs to distinguish between the first deployment to a DAS instance and subsequent deployments (and the last undeploy), rather than between a DAS and an instance. The deploy/undeploy commands will change to pass in those flags.

DistributedEJBTimerServiceImpl uses !isDas() to ask 'Is the server required to 1) register for Planned Shutdown event, 2) set DB read before timeout to true, 3) register for transaction recovery events?'.
A user-managed instance may require the above steps as well. A change is required to allow the same behavior on a user-managed instance.

public void postConstruct() {
        if (!ejbContainerUtil.isDas() || server.isClusteredDas()) {   <----- CHANGE REQUIRED

            if (gmsAdapterService != null) {
                GMSAdapter gmsAdapter = gmsAdapterService.getGMSAdapter();
                if (gmsAdapter != null) {
                    // We only register interest in the Planned Shutdown event here.
                    // Because of the dependency between transaction recovery and
                    // timer migration, the timer migration operation during an
                    // unexpected failure is initiated by the transaction recovery
                    // subsystem.
                    gmsAdapter.registerPlannedShutdownListener(this);
                }
            }
            // Do DB read before timeout in a cluster
            setPerformDBReadBeforeTimeout(true);

            // Register for transaction recovery events
            recoveryResourceRegistry.addEventListener(this);
        }
    }


ReadOnlyBeanMessageCallBack uses !ejbContainerUtil.isDas() to ask 'Does this server need to 1) register as a GMS adapter message listener and 2) set itself as the DistributedReadOnlyBeanNotifier?'.
A user-managed instance is a clustered instance and may also require these actions. A change is required to allow the same behavior on a user-managed instance.

public void postConstruct() {
        if (!ejbContainerUtil.isDas() || server.isClusteredDas()) {   <----- CHANGE REQUIRED
            if (gmsAdapterService != null) {
                GMSAdapter gmsAdapter = gmsAdapterService.getGMSAdapter();
                if (gmsAdapter != null) {
                    gms = gmsAdapter.getModule();
                    gmsAdapter.registerMessageListener(GMS_READ_ONLY_COMPONENT_NAME, this);
                    _readOnlyBeanService.setDistributedReadOnlyBeanNotifier(this);
                }
            }
        }
    }


GMS

GMSAdapterImpl uses isDas to determine the member type as spectator for DAS or core for non-DAS. A change is required to set the clustered DAS as a core member, not a spectator. (Technical Requirement High Availability 3.6 P1 GMS clusters Support all methods that GMS provides for forming the cluster, i.e., multicast, non-multicast, etc.)

private void readGMSConfigProps(Properties configProps) {
        configProps.put(MEMBERTYPE_STRING, (isDas && !server.isClusteredDas()) ? SPECTATOR : CORE); <--- CHANGE REQUIRED


GMSAdapterImpl uses isDas to determine whether it is a bootstrapping node, where the DAS is a bootstrapping node. A bootstrapping node refers to a node that is used to bootstrap finding the cluster when multicast is not enabled. This is currently not being used in 3.x. The isDas call may not be the only way to determine whether a self-managed member is considered a bootstrap node, but at this time we are not using the concept at all, so it is okay to just comment that info out and we will work on it when implementing non-multicast support. (from Joe F.)

private void readGMSConfigProps(Properties configProps) {
...................................................
                    case IS_BOOTSTRAPPING_NODE:
                    configProps.put(keyName, isDas ? Boolean.TRUE.toString() : Boolean.FALSE.toString());
                    break;


GMSAdapterImpl has the following which may apply for a clustered DAS.

//fix gf it 12905
                if (testFailureRecoveryHandler && (!env.isDas() || server.isClusteredDas())) {   <----- CHANGE REQUIRED

                    // this must be here or appointed recovery server notification is not printed out for automated testing.
                    registerFailureRecoveryListener("GlassfishFailureRecoveryHandlerTest", this);
                }
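
A note on the combined checks used throughout this section: in Java, && binds more tightly than ||, so an expression like a && !b || c parses as (a && !b) || c, and parentheses are needed to express a && (!b || c). A runnable illustration:

```java
// Java operator precedence: a && b || c parses as (a && b) || c.
// The cluster checks in this spec intend a && (b || c), so the
// parentheses in the CHANGE REQUIRED lines are significant.
public class PrecedenceDemo {
    static boolean withoutParens(boolean enabled, boolean notDas, boolean clusteredDas) {
        return enabled && notDas || clusteredDas;   // (enabled && notDas) || clusteredDas
    }

    static boolean withParens(boolean enabled, boolean notDas, boolean clusteredDas) {
        return enabled && (notDas || clusteredDas); // intended grouping
    }
}
```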


(Technical Requirement Administration 4.6 P2 GMS get-health Support the get-health command on any instance in a user-managed cluster). The HealthHistory constructor currently takes a Cluster object to iterate over the instances. We'll add another constructor for the clustered DAS case so that the instance name is passed in instead and move the creation of the concurrent map out of the constructors. The current isDas() call can remain as-is.

public HealthHistory(Cluster cluster) {
        // move this constructor to static initializer and leave out the size
        healthMap = new ConcurrentHashMap<String, InstanceHealth>(
            cluster.getInstances().size());

        for (Server server : cluster.getInstances()) {
            if (server.isDas()) {   <----- LEAVE AS-IS
                continue;
            }
            // etc
        }
}
public HealthHistory(String instanceName) {
    // add instance to health table
}


This will require a change in GMSAdapterImpl to create the HealthHistory object differently in the case of a user-managed cluster.

if (cluster == null) {
            if (server.isClusteredDas()) {            <----- ADD THIS CHECK AND NEW METHOD
                initializeHealthHistory(server.getClusterMemberName());
            } else {
                logger.log(Level.WARNING, "gmsservice.nocluster.warning");
                return false;       //don't enable GMS
            }
        } else if (isDas) {
            // only want to do this in the case of the DAS
            initializeHealthHistory(cluster);
        }


The HealthHistory object is a GMS client and so is notified dynamically of any changes to the cluster. So it should be ok for the nth instance to come up with only itself in the table, but this will need to be tested. The HealthHistory object is also a listener for changes on the Cluster object when present; this won't happen in the clustered DAS case.

MBeanServer

DynamicInterceptor uses MbeanService.getInstance().isDas() to add "server" as a target. No change required.

JMS

JMSConfigListener uses thisServer.isDas() to skip updating the cluster broker list on the active JMS resource adapter when a config change event occurs on the Server config. A change is required so that a user-managed instance updates the broker list in the same way as any other clustered instance.

public UnprocessedChangeEvents changed(PropertyChangeEvent[] events) {
..........
if (eventName.equals(ServerTags.SERVER_REF)){
    String oldServerRef = oldValue != null ? oldValue.toString() : null;
    String newServerRef = newValue != null ? newValue.toString(): null;
    if (oldServerRef != null && newServerRef == null && (!thisServer.isDas() || thisServer.isClusteredDas())) {//instance has been deleted  <--- CHANGE REQUIRED
        _logger.log(Level.FINE, "Got Cluster change event for server_ref" + event.getSource() + " " + eventName + " " + oldServerRef + " " + newServerRef);
        String url = getBrokerList();
        aresourceAdapter.setClusterBrokerList(url);
        break;
    }//
} // else skip
if (event.getSource() instanceof Server) {
    _logger.log(Level.FINE, "In JMSConfigListener - received cluster event " + event.getSource());
    Server changedServer = (Server) event.getSource();
    if (thisServer.isDas() && !thisServer.isClusteredDas() )return null;   <----- CHANGE REQUIRED

        if(jmsProviderPort != null){
            String nodeName = changedServer.getNodeRef();
            String nodeHost = null;

            if(nodeName != null)
               nodeHost = domain.getNodeNamed(nodeName).getNodeHost();
            String url = getBrokerList();
            url = url + ",mq://" + nodeHost + ":" + jmsProviderPort;
            aresourceAdapter.setClusterBrokerList(url);
            break;
        }

    }
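
The broker list handling above is plain string manipulation: appending an mq://host:port address to a comma-separated list. A minimal sketch (the helper class and method are invented for illustration; 7676 below is just a sample port):

```java
// Illustrative helper for the comma-separated MQ broker list built above.
// Appends a broker address of the form mq://host:port.
public class BrokerList {
    static String appendBroker(String brokerList, String host, String port) {
        String addr = "mq://" + host + ":" + port;
        return (brokerList == null || brokerList.isEmpty()) ? addr : brokerList + "," + addr;
    }
}
```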


Security

AdminAccessController authenticate(GrizzlyRequest req) - With more recent changes to Grizzly, this check is going to be removed by another project. No change required.

if (authenticator != null) {
            /*
             * If an admin request includes a large payload and secure admin is
             * enabled and the request does NOT include a client cert, then
             * the getUsePrincipal invocation can cause problems.  So normally
             * the DAS will not look for a client cert. To override this, the user can
             * set org.glassfish.admin.DASCheckAdminCert=true but s/he should realize
             * that this can cause problems with large uploads if secure admin
             * is enabled and no client cert is present.
             */
            final Principal sslPrincipal = ! env.isDas() ||
                    Boolean.getBoolean(DAS_LOOK_FOR_CERT_PROPERTY_NAME) ? req.getUserPrincipal() : null;
            return authenticator.loginAsAdmin(user, password, as.getAuthRealmName(),
                    req.getRemoteHost(), authRelatedHeaders(req), sslPrincipal);
        }


Web Services

MetroContainer.isCluster() returns true if this is not DAS, not embedded, and GMS is enabled. MetroContainer will initialize the HA environment for a cluster if availability is enabled. Since a user-managed instance is a clustered DAS, a change is required for this initialization to happen for a user-managed instance. (Technical Requirement 3.3 High Availability P1 HA Messaging Metro HA aspect must work)

public void postConstruct() {
        ................
        if (isCluster() && isHaEnabled()) {
            final String clusterName = gmsAdapterService.getGMSAdapter().getClusterName();
            final String instanceName = gmsAdapterService.getGMSAdapter().getModule().getInstanceName();

            HighAvailabilityProvider.INSTANCE.initHaEnvironment(clusterName, instanceName);
            logger.info("metro.ha.environemt.initialized");
        }
        ...............
    }

    private boolean isCluster() {
        return (!env.isDas() || server.isClusteredDas()) && !env.isEmbedded() && gmsAdapterService.isGmsEnabled();   <----- CHANGE
    }


Command Changes

No command changes are needed to support creation and lifecycle for user-managed cluster instances since all configuration for this is done through properties that can be set with the set command.

Security Changes

The changes required for security are described in the section "Configuration changes for isDas() calls".

Container Issues

Application Id

For high availability of stateful session beans, the application id that is generated at deployment needs to be the same across the cluster. Today the application id is generated using System.currentTimeMillis() and then replicated by the DAS to the instances during cluster deployment.
However, with the user-managed cluster design, deployment is done on each unmanaged instance, so we need to ensure that the application id is the same on all the instances. To make this happen, the generation of the application id needs to change from using System.currentTimeMillis(). One way to ensure the same id is generated everywhere is to use an MD5 hash of the application to generate the application id.
A corner case today is that at redeployment, even if the application has not changed, a new application id is generated. With the MD5 hash approach, if the application has not changed but has been redeployed, the application id will remain the same. Details of why the application id needs to change on redeployment still need to be discussed. Except for this corner case, the MD5 hash should work for most cases. We will base the solution on an MD5 hash with some changes to address the corner case described above.
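
A minimal sketch of the MD5-based id generation (the class and method names are assumptions; the real change would hash the bytes of the deployed application archive):

```java
import java.math.BigInteger;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Illustrative: derive a deterministic application id from application
// content, so every instance in a user-managed cluster computes the same
// id, replacing the System.currentTimeMillis()-based generation.
public class ApplicationIdGenerator {
    static String generateId(byte[] applicationContent) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            byte[] digest = md5.digest(applicationContent);
            // Hex-encode, preserving leading zeros (32 hex chars for MD5).
            return String.format("%032x", new BigInteger(1, digest));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e); // required on all JREs
        }
    }
}
```

Because the id is a pure function of the application bytes, independently deployed copies on every instance agree, and an unchanged application redeployed keeps the same id (the corner case discussed above).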

Timers

Look at migrate timers (D: Chris)

Transactions

and transactions (D: Chris)

Clustering for JMS/MQ

Work with JMS/MQ team as to how JMS cluster will work with this feature (D:Chris)

Upgrade