GlassFish Server Open Source Edition 3.1 - Clustering Design Spec

This is the design specification for the 3.1 clustering support.

  • Bill Shannon (bill.shannon@sun.com)
  • Kedar Mhaswade (km@dev.java.net)
  • Jerome Dochez (dochez@dev.java.net)
  • Tom Mueller (tmueller@dev.java.net)

Introduction

Adding clustering support to GlassFish v3 involves a number of separate tasks. Here's a partial list:

  • Centralized administration support. This is the primary topic of this design spec.
  • Maintaining configurations for multiple servers in the Domain Admin Server. By carrying over the domain.xml design from v2, this is largely done.
  • Propagating changes to those configurations to individual servers. Jerome is designing this aspect, which will be written up elsewhere.
  • Synchronizing the state of individual servers with the DAS after restart.
  • Server lifecycle support - install, create, start, stop, restart.
  • Cluster membership and messaging support. This is largely a matter of porting and cleaning up the GMS API and implementation from v2.
  • Session state replication for HA. This is largely a matter of porting and cleaning up the replication SPI and implementation from v2.
  • Java EE spec required management support. This is still TBD.
  • JMS clustering support. This is still TBD.
  • AMX support. Current thinking is that we're going to deprecate AMX, but we need to assess the impact on the admin GUI, which is the biggest (maybe only) user of AMX.

Detailed Discussion of Architecture Components

High-level Components

The overall architecture for cluster support in GlassFish v3 is largely the same as it was for GlassFish v2. One significant change is that we will make the node agent optional. There is a single admin server for the entire cluster, the DAS (Domain Admin Server). The individual instances that are part of the same or different "clusters" run on the same or different nodes. A domain corresponds to a running process, the Domain Admin Server, and different domains can share the same installation.

A domain's file system is termed the "central repository" and it is the one that should be protected at all costs. Various nodes make copies of a subset of this structure; it is a subset because not all application/configuration data applies to every cluster/node. A node replicates only the portion that it needs. The DAS is the single entry point for administration, and it is a single point of failure for cluster administration, but not for application operation.

The following things are (still) not planned:

  • DAS failover. Several customers have asked for this, and it may be possible to do this later if we do the infrastructure right.
  • An efficient configuration backup/restore solution.

The following changes can be considered:

  • Monitoring of server instances can be done without going through DAS. In GlassFish v2, a high-overhead cascading solution was present to cascade the MBeans from server instances onto DAS. We may rethink that solution. At the same time, it may be desirable to let sysadmins monitor a particular cluster instance individually.

Installation

In GlassFish v2, every machine (node) that participates in clustering needs to be managed separately as far as product installation is concerned. This is probably manageable for a small number of nodes, but as the cluster size grows, it becomes unmanageable. We don't plan to change this approach for 3.1.

Data Synchronization

The application server data is of three types:

a) server software (modules and libraries)
b) server configuration (regular files)
c) application data and configuration (regular files)

In order to avoid the need to install the software on each node, we would need to be able to synchronize a); that's something to consider
for a future release. To support cluster management, we have to support synchronization of b) and c). This section talks about b) and c).

All the clients (server instances) and the DAS have to agree on the relative paths.

The domain's folder looks like the following:

(| implies a file, || implies a folder)

||---- <domain-name>
     ||---- config (config files common to all servers including the DAS)
            |---- admin-keyfile
            |---- cacerts.jks
            |---- default-web.xml
            |---- domain-passwords
            |---- domain.xml
            |---- keyfile
            |---- keystore.jks
            |---- logging.properties
            |---- login.conf
            |---- server.policy
            |---- sun-acc.xml
            |---- wss-server-config-1.0.xml
            |---- wss-server-config-2.0.xml
            |---- [XXX - any others?]
            ||---- <server/cluster-name>-config (cluster/server-specific data
                                         copied to instance's config folder)
     ||---- addons (only if domain was a v2 domain)
     ||---- applications
            ||---- <application-name>
                   | ---- <application-specific-paths>
     ||---- autodeploy (apps deployed to DAS only)
     ||---- bin (currently empty, used to contain startserv and stopserv)
     ||---- docroot (the default web-container docroot, files are copied to
		     instance's docroot)
     ||---- generated
            ||---- ejb
                   ||---- <application-name>
            ||---- jsp
                   ||---- <application-name>
            ||---- policy
                   ||---- <application-name>
            ||---- xml
                   ||---- <application-name>
     ||---- imq
     ||---- java-web-start
	    ||---- <application-name>
     ||---- jbi
     ||---- lib (libraries common to all servers including the DAS)
     ||---- logs
     |----  master-password
     ||---- session-store

The filesystem under the nodeagents directory (at the same level as the domains directory) is:

||---- <node-agent-name>
       ||---- agent
              ||---- config
                     |---- das.properties
       ||---- <server-instance-1>
              ||---- config
              ||---- applications
              ||---- generated
              ||---- (etc.)
       ||---- <server-instance-2>
              ||---- config
              ||---- applications
              ||---- generated
              ||---- (etc.)

Details of the synchronization algorithm on server startup:

This simplistic synchronization algorithm is based on a file's modtime and the java.io.File.setLastModified(long time) API. The DAS does not keep any delta information as far as modtimes are concerned; all it has is a specific modtime for each of the files it manages. The DAS does no delta calculations: a "difference" between two modtimes is not presented as a patch (in the traditional Unix sense) to be applied to the client's file system. The API, however, is kept independent of the actual criterion applied. A file's (or folder's) modtime is the criterion we use for this release, but it is possible that a better, more suitable criterion will be chosen for subsequent releases. The intent is for the criterion to be configurable, though that is not a must for this release.

The synchronization algorithm is file-based for files in the config directory, and directory-based in other cases (e.g., application directories). In the directory-based cases, only the modtime of the top level directory is considered. Thus, any management operation that affects the contents of one of these directories needs to be sure to update the mod time of the directory itself.

Another important aspect of the algorithm is that, by default, the contents of an entire folder (recursive traversal) are sent when requested. Standard compression schemes are employed when sending the contents.
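As a sketch of this modtime-driven selection (the class and method names here are illustrative, not the actual GlassFish API), the DAS-side comparison amounts to:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Illustrative sketch of the DAS-side modtime comparison. */
public class ModTimeSync {
    /**
     * Given the client's reported modtimes (relative name -> millis) and the
     * DAS's authoritative view, return the names the DAS should send back:
     * files the client is missing, or whose DAS copy is newer.
     */
    public static List<String> newerFiles(Map<String, Long> clientTimes,
                                          Map<String, Long> dasTimes) {
        List<String> toSend = new ArrayList<>();
        for (Map.Entry<String, Long> e : dasTimes.entrySet()) {
            Long clientTime = clientTimes.get(e.getKey());
            if (clientTime == null || e.getValue() > clientTime) {
                toSend.add(e.getKey());
            }
        }
        return toSend;
    }
}
```

Whole files (not patches) are returned for every name in the resulting list, consistent with the recursive-send behavior described above.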

As you probably know, we "upload" the archive (.jar/.war/.ear) when we do an archive deployment to the DAS. Until now, this archive was "uploaded" to a temporary location, which means it was thrown away. Going forward, we will retain this archive, because the exploded view of an application is a true extraction of its archive. When the archive is not uploaded (e.g., asadmin deploy --upload=false myapp.war), one will be copied. This is a simple optimization that we must do.

  • Client synchronization algorithm
Is there a DAS configuration available?
	no, then done

    Collect the mod times of the known files in the config directory.
    Send them to the DAS, asking for anything newer in the config directory.
    Was the DAS down?
	yes, then done
    Save the newer files the DAS returns.
    Was the domain.xml file updated?
	no, then done

    Synchronize other content as described below.
  • Server synchronization algorithm
Get requested filenames from client.
	All filenames are assumed to be relative to the specified directory.
	Ignore any filenames that start with non-alphanumeric characters.
	[XXX - what are the limitations on application names?]

    Is synchronization disabled for this client?
	yes, then done

    Is the request for the config directory?
	Is domain.xml out of date?
	    no, then done
	Compare the config files with what the client sent.
	Only <server/cluster-name>-config for *this* client is considered.
	Return missing or out-of-date items.

    Is this the applications directory?
	Figure out which applications the client *should* have.
	    Only applications deployed to *this* server are considered.
	    Skip any directory-deployed applications.
	    XXX - what if the application directory is a symlink?
	Compare the existing applications with what the client sent.
	Return missing or out-of-date items.
	Return some sort of flag for applications that the client has
		that should be removed.

	XXX - return the application directory, or the original archive?
	XXX - what if the user modifies the contents of the application dir?
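The "ignore any filenames that start with non-alphanumeric characters" step above can be sketched as follows; the class name and the added path-traversal check are illustrative assumptions, not the actual implementation:

```java
/** Illustrative server-side filter for requested filenames. */
public class SyncRequestFilter {
    public static boolean isAcceptableName(String name) {
        if (name == null || name.isEmpty()) {
            return false;
        }
        // Spec rule: ignore filenames starting with non-alphanumeric chars.
        if (!Character.isLetterOrDigit(name.charAt(0))) {
            return false;
        }
        // Assumption: also reject path traversal, since all names are
        // supposed to be relative to the specified directory.
        return !name.contains("..");
    }
}
```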

Synchronization criteria

To avoid unnecessary overhead in the common cases, different criteria are used for different content for synchronization. In particular, we don't want the client, and especially the server, to spend a lot of time collecting modification times for files if we can avoid it. The compromise is that in some cases extra steps may be required to notify the instance that data needs to be synchronized, and unchanged data may be sent as part of that synchronization. This is an area for future optimization if needed.

As described above, the primary optimization is based on whether the domain.xml file is out of date or not. The synchronization approach for other content is as follows:

  • domain.xml: Key special case, see above.
  • config files: File-by-file mod time check. Only modified files are sent.
  • applications: Check the mod time of each top-level application directory. If the application has changed, all the application files are sent, as well as all the generated content related to the application.
  • docroot: Check the mod time of each file or directory in the docroot directory, but not subdirectories. For each modified file or directory, send that file or (recursively) that directory. The docroot directory might be very large, so we don't want to check every file all the time. By checking files in the docroot directory, we pick up changes to index.html. See the Open Issue section for some more discussion about docroot synchronization.
  • lib: Recursively check the mod time of each file. For each modified file, send that file. We assume that typically there are relatively few files in the lib directory (fewer than 20).
  • config-specific directory: The config-specific directory is an optional subdirectory of the config directory, with the name of the instance's config. We check the mod time of any files or directories in the config-specific directory, but not any subdirectories. The config-specific directory may commonly contain lib and docroot subdirectories, and so might be very large.

In all cases, if the instance has a file or directory in the check list that the server does not, the server will tell the instance to remove that file or directory. The same approach applies to applications (which includes all their generated content).
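A minimal sketch of this shallow (top-level-only) comparison, including the removal list just described; all names are illustrative:

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Illustrative shallow comparison for docroot-style directories. */
public class DocrootSync {
    /** Entries the DAS should (re)send: missing on the instance, or newer on the DAS. */
    public static Set<String> toSend(Map<String, Long> instanceTimes,
                                     Map<String, Long> dasTimes) {
        Set<String> result = new HashSet<>();
        for (Map.Entry<String, Long> e : dasTimes.entrySet()) {
            Long t = instanceTimes.get(e.getKey());
            if (t == null || e.getValue() > t) {
                result.add(e.getKey());
            }
        }
        return result;
    }

    /** Entries the instance should remove: present locally, gone on the DAS. */
    public static Set<String> toRemove(Map<String, Long> instanceTimes,
                                       Map<String, Long> dasTimes) {
        Set<String> result = new HashSet<>(instanceTimes.keySet());
        result.removeAll(dasTimes.keySet());
        return result;
    }
}
```

Only top-level entries appear in the maps; a changed subdirectory is sent recursively as a unit, per the table above.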

The specific list of config files to consider is specified in an internal default file, config-files.  This file can be overridden by supplying a config-files file in the config directory of the domain.  The file contains the list of config files to be synchronized, with one file name per line.
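A sketch of how the config-files list might be resolved and parsed; the class name and the '#' comment convention are assumptions, since the spec only says one file name per line:

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative resolution of the config-files list. */
public class ConfigFileList {
    /**
     * Parse a config-files file body: one file name per line. Blank lines
     * and '#' comment lines (an assumed convention) are skipped.
     */
    public static List<String> parse(String body) {
        List<String> names = new ArrayList<>();
        for (String line : body.split("\\R")) {
            String t = line.trim();
            if (!t.isEmpty() && !t.startsWith("#")) {
                names.add(t);
            }
        }
        return names;
    }

    /**
     * The override file in the domain's config directory, if present,
     * replaces the internal default list entirely.
     */
    public static List<String> resolve(String internalDefault, String override) {
        return parse(override != null ? override : internalDefault);
    }
}
```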

XXX - What about commands that touch other files but don't change domain.xml,
such as create-file-user?

XXX - What about add-on modules that need to modify or extend the synchronization algorithm?

XXX - Should we externalize the entire synchronization algorithm in an xml file, as v2 did?

When to do startup synchronization?

I've considered two different approaches to startup synchronization:

  1. Synchronization is done early in the server process, before it reads domain.xml.
  2. Synchronization is done in the asadmin start-local-instance command, before the server process is started.

A big advantage of #2 is that the asadmin client already has all the infrastructure necessary to talk to the DAS. #1 would require a special startup service that is guaranteed to run before anything else that might use the configuration information. That seems fragile at best.

However, one of the issues is that the DAS will likely need to know that an instance is "starting", but not yet fully "started". This seems like something GMS could help with. I don't know whether we could run GMS in the asadmin process to notify the group that the server is starting, then run GMS in the server process, continuing in the "starting" state, and finally entering the "started" state. Alternatively, with approach #1, can GMS be started very early in the server startup process? GMS gets its configuration data from the domain.xml file, which is a problem with this proposal.

More research is needed to determine the best approach, and to determine if GMS can be used during startup synchronization.

Communication (Transport) Layer

The synchronization protocol uses the standard asadmin remote command facility. We need one command, something like "_get_newer_files". The body of the request should be an XML document of the form:

<get-files>
	<instance>name</instance>
	<directory>dir</directory>
	<files>
	    <file>
		<name>name</name>
		<time>time</time>
	    </file>
	    ...
	</files>
</get-files>

The response to this command is a zip file containing the newer files. It may contain more files than were requested. The mod times for the files are included in the zip file metadata. The existing mechanism for returning files (e.g., for get-client-stubs) can be used.
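A sketch of building that request body on the client side; illustrative only (real code would also XML-escape the file names):

```java
import java.util.Map;

/** Illustrative builder for the _get_newer_files request body shown above. */
public class GetFilesRequest {
    public static String build(String instance, String directory,
                               Map<String, Long> fileTimes) {
        StringBuilder sb = new StringBuilder("<get-files>\n");
        sb.append("  <instance>").append(instance).append("</instance>\n");
        sb.append("  <directory>").append(directory).append("</directory>\n");
        sb.append("  <files>\n");
        for (Map.Entry<String, Long> e : fileTimes.entrySet()) {
            // Real code would XML-escape the name; omitted for brevity.
            sb.append("    <file>\n");
            sb.append("      <name>").append(e.getKey()).append("</name>\n");
            sb.append("      <time>").append(e.getValue()).append("</time>\n");
            sb.append("    </file>\n");
        }
        sb.append("  </files>\n");
        sb.append("</get-files>\n");
        return sb.toString();
    }
}
```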

Command Replication

Administrative commands that are executed on the DAS are replicated to the affected server instances. This is done by sending to the server instances the same admin command request that was sent to the DAS. Thus, each server instance will need the same admin command listener as the DAS. This wiki page goes into the finer details of the command replication feature. As a result of replicating commands on the DAS and the individual instances, the DAS and the instances will make the same changes to the domain's configuration. There are two aspects of this approach that interact with what we've described here.

domain.xml synchronization

If the server instance is updating and rewriting its version of domain.xml, the mod time of its domain.xml will likely be different than the version on the DAS. When the server instance restarts, it would find out that its domain.xml is out of date, which would trigger the full synchronization algorithm.

To prevent this, the DAS will need to send additional metadata with each request, describing the mod time that the domain.xml file must have after the command completes.
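A sketch of the instance-side stamping, assuming the DAS modtime arrives as a long in the request metadata; java.io.File.setLastModified is the relevant JDK call:

```java
import java.io.File;
import java.io.IOException;

/** Illustrative stamping of a rewritten domain.xml with the DAS's modtime. */
public class DomainXmlStamp {
    /**
     * After the instance rewrites its domain.xml for a replicated command,
     * stamp the file with the modtime the DAS sent, so the next startup
     * modtime comparison sees the two copies as identical.
     */
    public static boolean stamp(File domainXml, long dasModTime) {
        return domainXml.setLastModified(dasModTime);
    }

    /** Self-contained demo on a temp file; returns the resulting modtime. */
    public static long demo(long dasModTime) {
        try {
            File f = File.createTempFile("domain", ".xml");
            stamp(f, dasModTime);
            long result = f.lastModified();
            f.delete();
            return result;
        } catch (IOException e) {
            return -1L;
        }
    }
}
```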

Application deployment

To allow deployment operations to be executed on the server instances as well as the DAS, the DAS needs to keep the original application archive, and send it to the instances so they can do the same deployment operation. See also the next section on application deployment issues.

XXX - are there any operations that are done during deployment that could not safely be executed identically on each server instance? For instance, does EJB CMP deployment access the database? We probably don't want every instance accessing the database to generate EJB CMP classes based on the database tables.

There are also interesting issues with handling application references, which Jerome will need to describe.

Synchronization Issues

Application Deployment Issues

Note that the application synchronization approach will not allow for (e.g.) hand-editing of deployment descriptors after deployment if we switch to synchronizing the original archive instead of the expanded application directory (e.g., for performance reasons).

docroot Directory

It's not clear how heavily used this directory is. Certainly people are expecting to modify the content in this directory directly. If there is typically lots of content here, synchronization might be expensive. Similar to above, we may need a command that says "please synchronize the docroot now".

Cold Start of a Server Instance Without Node Agent

A node agent is a process that controls the life cycle of the server instances. On each node (machine) we have a node agent process per GlassFish domain. For example, if a GlassFish domain d1 contains a cluster c1 spanning machines m1, m2 and m3 with three server instances s1, s2, s3 on each of them, we need three node agents n1, n2 and n3. This is how it was in GlassFish v2.

                   ____________________
                  |    _____________   |
                  |   |             |  |
                  |   |  s1 <--> n1 |  |
                  |   |             |  |
     _______      |   |_____________|  |
    |       |     |        m1          |
    |       |     |    _____________   |
    |  DAS  |     |   |             |  |
    |   d1  |     |   |  s2 <--> n1 |  |
    |_______|     |   |             |  |
                  |   |_____________|  |  c1 = {s1, s2, s3}
    d1 contains c1|        m2          |
    d1 contacts   |    _____________   |
    n1, n2, n3    |   |             |  |
                  |   |  s3 <--> n1 |  |
                  |   |             |  |
                  |   |_____________|  |
                  |        m3          |
                  |____________________|

Since n1, n2 and n3 are separate processes themselves, their life cycle needs to be managed by human administrators. Since we are making node agents optional for this release, we need an alternate mechanism for situations like:

  • start-cluster, which starts all the cluster instances
  • start-instance, which starts a clustered or non-clustered server instance

As a first step, for GlassFish v3.1, we will assume that server instances are managed manually, or by using the platform-specific facilities
(Windows services, Linux rc files, Solaris SMF).

In a future release we will consider an approach such as the following:

To remotely start server processes from the DAS process, we propose a solution that depends on the ubiquitous sshd, which is both standard and secure. Thus, when we want to start a process remotely from the DAS process, we contact the ssh daemon running on a given port (default: 22) on a given machine and ask it to start the GlassFish server process. If sshd is not running, the administrator needs to manually start the server (by using the local asadmin command start-server) or manually restart sshd.

For this to happen, the DAS needs to be an ssh client; pure-Java libraries for this are publicly available. In fact, the Hudson project uses this technique to remotely configure secondary Hudson machines. Thus, the above picture now looks like:

                   ____________________
                  |    _____________   |
                  |   |             |  |
                  |   |  s1 (sshd)  |  |
                  |   |             |  |
     _______      |   |_____________|  |
    |       |     |        m1          |
    |       |     |    _____________   |
    |  DAS  |     |   |             |  |
    |   d1  |     |   |  s2 (sshd)  |  |
    |_______|     |   |             |  |
                  |   |_____________|  |  c1 = {s1, s2, s3}
    d1 contains c1|        m2          |
    d1 contacts   |    _____________   |
    sshd on each  |   |             |  |
    of m1,m2,m3   |   |  s3 (sshd)  |  |
                  |   |             |  |
                  |   |_____________|  |
                  |        m3          |
                  |____________________|

Once a server is started on a machine, it follows the synchronization algorithm as described above.
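As a rough illustration of the proposed mechanism (the install path, command name, and use of a command-line ssh client here are assumptions; a pure-Java library such as JSch, which Hudson uses, could issue the same remote command), the DAS side might form the remote start command like this:

```java
import java.util.List;

/** Illustrative sketch of forming the remote start command run over ssh. */
public class RemoteStart {
    public static List<String> sshCommand(String host, int sshPort,
                                          String installDir, String instanceName) {
        return List.of(
            "ssh", "-p", Integer.toString(sshPort), host,
            installDir + "/bin/asadmin start-local-instance " + instanceName);
    }
}
```

A ProcessBuilder (or an ssh library session) would then execute this command; if sshd is not reachable, the fallback remains a manual local start as described above.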

Server Software Upgrade Implications

As a cluster's size grows, it is possible that not all nodes in the cluster are upgraded simultaneously, because that would mean service downtime. In order to cut the downtime and ensure service availability, the system should be designed in such a way that, for a limited period, different nodes can run slightly different versions of GlassFish. This means that we need to carefully manage the compatibility of the synchronization protocol and the config files.

Cluster Installation and Configuration

To install and configure a cluster, the following steps are needed:

  1. Install the software on the DAS machine.
  2. Create the domain.
  3. Create a cluster with no members.
  4. Install the software on an instance machine (if a different machine).
  5. Create a local instance, providing:
	DAS host, port
	admin username, password
	instance name (defaults to local host name)
	optionally, name of cluster to join
  6. Optionally, use "asadmin create-service" to manage the instance.

Stand-alone Instances

Instances can be created that are not part of a cluster. These are called stand-alone instances. Stand-alone instances share some behaviors with instances that are part of a cluster, but they are different in other ways.

Like clustered instances, a stand-alone instance:

  • is synchronized with the DAS when it starts,
  • can be the target of instance commands such as start-instance, stop-instance, list-instances, etc.
  • can be the target of various dynamic configuration commands, e.g., ping-connection-pool.
  • is configured with an application by deploying the application via the DAS. However, with a stand-alone instance, the target is the instance rather than the cluster. An application cannot be deployed directly to a stand-alone instance because DeployCommand only runs on the DAS; a HiddenDeployCommand runs on the instance instead.

Stand-alone instances are implemented primarily to preserve conceptual compatibility with GlassFish 2 and to provide simpler operation for people that want multiple independent instances without having to deal with the configuration of a cluster. A stand-alone instance cannot become part of a cluster at a later time.

The design consequence of supporting stand-alone instances is that when administration software performs an operation on an instance and retrieves the cluster information for that instance, the cluster may be null.

Instance States

This section gives details of how the DAS keeps track of the states of instances and how various events force an instance's state to change. Since various commands and subsystems need to know the state of an instance before taking an action, a new service, called InstanceStateService, will be made available. Interested code can use it by doing

@Inject InstanceStateService states;

in their code. Once this service is available, it is hoped that all the commands and subsystems that currently use various means (like executing the uptime or version commands) to ping an instance from the DAS can move away from that and use this service to get to know the state of an instance at any time.

The Big Picture

The following image shows the instance state diagram. The oval / circular boxes are various events and the rectangular / square boxes are the state of an instance as held in DAS.

The following image explains the various events that force state changes.

Here is a broad overview of how the DAS will keep track of the state of the instances:

  • The DAS will use two files (.instancestate and .replicationstate) to keep track of
    • the state of instances and
    • the state of a command that is going through replication
  • The above files will be present in the $GF_HOME/domains/<domain-name>/config directory and are required to ensure that the DAS can keep track of the state of the instance across DAS crashes / reboots.
  • The .instancestate file will have one line of information per instance (for those instances that were started at least once after their creation). The info stored will be the
    • state of an instance
    • details of commands that failed in that instance, if any
    • details of commands that are waiting to be executed on the instance, if any
  • The .replicationstate file will either be empty (which indicates that the DAS did not get restarted while a command was being replicated) or it will contain details of the command that was being replicated, the target specified for the command by the user, and the instances where the command had already succeeded by the time the DAS crashed or was brought down.
  • When DAS is started, it will read the above two state files, do a GMS poll and reconcile the state of instances based on events 1, 2, 3, 15, 16
  • When the DAS receives the _synchronize-files command, the state moves to the STARTING state and a timer is started for that instance. If the GMS JOIN_AND_READY event is not received within the timeout, then it is assumed that the instance's sync with the DAS failed and the instance goes back to the NOT_RUNNING state
  • All commands received while an instance is in the STARTING state are queued, and the queue is emptied once the JOIN_AND_READY event is received from GMS for that instance. In case an error happens while the commands are being dequeued, those commands are moved to the failed commands and the state of that instance is changed as per the diagram
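The timeout decision described above can be sketched as a pure function; the names are illustrative stand-ins, not the actual InstanceStateService types:

```java
/** Illustrative stand-ins for the STARTING-state timeout handling. */
public class StartingTimeout {
    public enum State { NOT_RUNNING, STARTING, RUNNING }

    /**
     * Decision applied when the per-instance timer fires: if GMS delivered
     * JOIN_AND_READY before the timeout, the instance is considered running
     * (and its queued commands can be drained); otherwise the synchronization
     * is presumed to have failed and the instance drops back to NOT_RUNNING.
     */
    public static State onStartTimeout(boolean joinAndReadyReceived) {
        return joinAndReadyReceived ? State.RUNNING : State.NOT_RUNNING;
    }
}
```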

To make life easy for the potential users of this service:

  • The current instance states defined in ServerEnvironment will be removed; the states defined in InstanceState will be the only ones
  • Those who are interested in displaying the state of an instance (like the list-instances command or the GUI) should just use InstanceState.StateType.getDescription() to display the state of an instance
  • Those who are interested in a live instance only may want to focus only on instances that are in RUNNING or RESTART_REQUIRED states, in that order of preference. For example, the recover-transactions command (which has to move an existing transaction from one instance to another) may want to move the transaction from source to destination only if the destination is in RUNNING or RESTART_REQUIRED state.
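A sketch of how such a consumer might use the proposed API; the enum values and descriptions here are illustrative stand-ins for InstanceState.StateType, not the real definitions:

```java
/** Illustrative stand-in for InstanceState.StateType and its consumers. */
public class StateTypeSketch {
    public enum StateType {
        NOT_RUNNING("not running"),
        STARTING("starting"),
        RUNNING("running"),
        RESTART_REQUIRED("requires restart"),
        NOT_REACHABLE("not reachable");

        private final String description;

        StateType(String description) { this.description = description; }

        /** What list-instances or the GUI would display. */
        public String getDescription() { return description; }

        /**
         * Whether a command like recover-transactions should target this
         * instance: RUNNING first, then RESTART_REQUIRED.
         */
        public boolean isUsableTarget() {
            return this == RUNNING || this == RESTART_REQUIRED;
        }
    }
}
```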

The one known drawback of this approach is that there is a chance that an instance is put in the RESTART_REQUIRED state even though a command got replicated at that instance successfully. This can happen in the following scenario:

  • The command went through OK on DAS
  • The DAS was trying to replicate the command on applicable instance(s)
  • The instances received the replicated command, executed it successfully, and sent the response back to the DAS
  • For whatever reason, the DAS never got the response

In the above sequence, the DAS will mark the instance as RESTART_REQUIRED, which is actually redundant. We will live with this drawback for now (and, if possible, find a solution later), because the alternative approach of keeping the state at the instances has potential issues such as:

  1. If a replicated command never reaches an instance, then the instance can never put itself in the RESTART_REQUIRED state unless the DAS also keeps track of such missing commands. If the DAS starts keeping track of missing commands, then the state calculation becomes complicated, because the states at the DAS and the instance have to be reconciled with each other every time
  2. Storing state at the instance is definitely lightweight, but if there is a temporary network error when the DAS pings an instance, the DAS has to mark that instance as NOT_REACHABLE, and there is no clear way of getting that instance out of the NOT_REACHABLE state unless the DAS periodically pings that instance. Also, if an instance is in the NOT_REACHABLE state, what to do about command replication is not a straightforward decision (we can either not replicate to that instance, which penalizes the instance unnecessarily, or ping again at the time of replication, which adds time to command replication and is not scalable)
  3. Storing all state at the DAS requires the DAS to read the .instancestate file at startup; this file will have the last known state of all instances (one line per instance). For a developer, who will most likely not have any instances, this will not add to the startup time (because the file will be empty). In a large installation with many instances, this will add a very small amount of time (the time to read a file with one line per instance). If state were maintained in the instances, the DAS would have to ping all instances at startup to get their state, which would likely take more time than reading a file and would not be as scalable. Even if we can assume the availability of GMS, the DAS will still have to ping the instances to see whether each instance requires a restart or not

File format and restrictions

Since the state is going to be saved in file .instancestate

Phased Implementation

Due to time and resource constraints, the above will be implemented in different phases:

  1. Phase 1: No command queueing (for commands that arrive when an instance is in the STARTING state); no .replicationstate file (so state changes if the DAS crashes during replication of a command will not be covered); no GMS events (so state changes because of unplanned instance shutdowns will not happen). Basically, this will be a simple implementation that won't take care of all failure scenarios
  2. Phase 2: Support for GMS events (if GMS is present); look at additional useful info that can be added to the state based on different GMS event types
  3. Phase 3: Add support for queueing of commands and for storing state in .replicationstate

Note 1: TBD: Can we queue the deploy command? Is there a way to invoke just the postDeploy/create-app/ref command on instances where a queued deploy command has to be executed? Needs more investigation.
Note 2: TBD: A first look at the resource commands indicates that they can be queued and executed on instances later; needs more investigation before we can make a final decision.
Note 3: TBD: Versioning the state files - do we need to take care of it? How important is it?

Upgrade

The clustering implementation for 3.1 must support upgrades of clusters from a GlassFish v2 installation. This section describes design details for accomplishing an upgrade. 

Generally, the steps for upgrading a domain with a cluster from v2 to 3.1 are:

  1. Upgrade the DAS for the domain.
  2. Establish the instances.

There are several options for step (2):

2a) Use the manual synchronization commands, export-sync-bundle and import-sync-bundle, to establish the instances on each node.

2b) Use a sequence of create-local-instance commands to establish the instances on each node.

2c) Copy the nodeagents directory tree from the existing node, and run some command to cause the instances to be upgraded for 3.1.

2d) Use a new "recreate-cluster-instances" command that would use the SSH capability to establish the instances on each node.

Of these, the 3.1 release will support (2a) and (2b). If application-specific data must be preserved from the v2 instances, it will be up to the user to get that data copied to the instances under 3.1. Note: having instance-specific data is discouraged. An enhancement for (2b) would be to have a single command, "recreate-local-node", that would run the right create-local-instance commands for a node so that the user doesn't have to worry about which instances are on which nodes. A recreate-local-node command is not planned for 3.1.

To upgrade the DAS for the domain, the clustering data from v2 must be converted to the format for 3.1.  This includes the clusters, servers, configs, and node-agent elements in the domain.xml file, and all elements that they reference. Conversion of this data is implemented using the upgrade framework that is already in place for GlassFish.

Since the supported options do not make use of the v2 nodeagents directory tree, there is no need to copy that over from the v2 installation. The node directory tree is recreated when the instances are reestablished in step 2.

Software Partitioning

The clustering software is partitioned into the following modules within the GlassFish source tree:

Module Name                    | Source Tree Subdirectory | Jar                | Purpose
Cluster Admin CLI              | cluster/cli              | cluster-cli.jar    | Contains only local asadmin commands. This module should never be loaded by the server.
Cluster Admin                  | cluster/admin            | cluster-admin.jar  | Contains (among other things) the remote admin commands that run in the server.
Cluster Common                 | cluster/common           | cluster-common.jar | Contains classes that are shared among the other modules.
Cluster SSH Provisioning       | cluster/ssh              | cluster-ssh.jar    | Contains software related to ssh support for remote nodes.
GlassFish GMS Bootstrap Module | cluster/gms-bootstrap    | gms-bootstrap.jar  | Software for detecting whether there are clusters that require loading the GMS software.
GlassFish GMS Module           | cluster/gms-adapter      | gms-adapter.jar    | Software that integrates the Shoal module into GlassFish for use in the group management service.

Cluster Admin Commands

The following new commands are used to configure and manage a cluster:

create-cluster

Usage: create-cluster
        [--config <config>]
        [--systemproperties (name=value)[:name=name]*]
        [--multicastport <multicastport>]
        [--multicastaddress <multicastaddress>]
        cluster_name

--multicastaddress and --multicastport are used to broadcast messages to all instances in the cluster. GMS uses them to monitor the health of instances in the cluster.

--multicastaddress is a renaming of the undocumented v2.x --heartbeataddress option. --heartbeataddress is retained as an alias of --multicastaddress. Valid values range from 224.0.0.0 through 239.255.255.255. The default is "228.9.XX.YY", where XX and YY are independent values between 0 and 255.

--multicastport is a renaming of the undocumented v2.x --heartbeatport option. --heartbeatport is retained as an alias of --multicastport. Valid values are from 2048 to 32000. The default is a generated value within that range.
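For illustration, defaults in the documented ranges could be generated as in the following sketch. The function names here are hypothetical, not the actual GlassFish implementation.

```shell
# Default multicast address: "228.9.XX.YY" where XX and YY are
# independent values between 0 and 255.
generate_multicast_address() {
  echo "228.9.$((RANDOM % 256)).$((RANDOM % 256))"
}

# Default multicast port: a generated value in the valid range 2048..32000.
generate_multicast_port() {
  echo $((2048 + RANDOM % (32000 - 2048 + 1)))
}
```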

The HADB options are no longer needed. They are ignored, with a warning that each option has been deprecated and may not be supported in a future release.

[--hosts hadb-host-list]
[--haagentport port_number]
[--haadminpassword password]
[--haadminpasswordfile file_name] [--devicesize devicesize ]
[--haproperty (name=value)[:name=value]*]
[--autohadb=false]

create-instance

Usage: create-instance
	--node <node_name>
       [--config <config_name> | --cluster <cluster_name>]
       [--systemproperties (name=value)[:name=name]*]
       [--portbase <port_number>]
       [--checkports[=<checkports(default:true)>]]
        instance_name

See the discussion on admin@glassfish.dev.java.net. Most likely this command will not be present in 3.1, but will be added later when we have node agent support, or the ssh-based equivalent.

create-local-instance

Usage: create-local-instance
       [--node <node_name>]
       [--nodedir <node_path>]
       [--savemasterpassword[=<savemasterpassword(default:false)>]]
       [--config <config_name> | --cluster <cluster_name>]
       [--systemproperties (name=value)[:name=name]*]
       [--portbase <portbase>]
       [--checkports[=<checkports(default:true)>]]
       instance_name

See the discussion on admin@glassfish.dev.java.net. This new local command will be used on a node to initialize a server instance on that node.

The create-local-instance command also performs part of the function that was performed by create-node-agent in v2. It creates the file system structure, including the das.properties file so that instances have the information that they need to contact the DAS.
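As an illustration, a das.properties file might contain entries such as the following; the exact key names shown here are assumptions for illustration, not a confirmed file format:

agent.das.host=das.example.com
agent.das.port=4848
agent.das.isSecure=false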

Many of those options wouldn't apply to the remote create-instance command, so that's probably a good reason to have two separate commands instead of a --local option on create-instance.

The --portbase and --checkports options work just like the corresponding arguments to create-domain.

create-service

Usage: create-service [--name <name>] 
        [--serviceproperties <serviceproperties>] 
        [--dry-run[=<dry-run(default:false)>]] 
        [--force[=<force(default:false)>]] [--domaindir <domaindir>] 
        [--serviceuser <serviceuser>] [--nodedir <nodedir>] [--node <node>] 
        [-?|--help[=<help(default:false)>]] [server_name]

Changes to the create-service command include:

  • Added the generic parameters that all commands that work with local instances use, namely:

--nodedir
--node

  •  The operand was changed from "domain_name" to "server_name". 
    • The command is smart enough to figure out whether the server is a domain or an instance, i.e., the same asadmin command will create services for both DASes and instances. 
  •  --serviceuser 
    • This applies only to Linux. The command itself *must* be run with root privileges. This option tells the command to set things up so that the specified user is the one who will run GlassFish at runtime. This is very useful for security reasons; users don't always (ever?) want root running GlassFish.

copy-config

Usage: copy-config
 	[--systemproperties  (name=value)[:name=value]*]
	source_configuration_name destination_configuration_name

delete-cluster

Usage: delete-cluster
 	cluster_name

The --autohadboverride option is ignored, with a warning that it has been deprecated and may not be supported in a future release.

[ --autohadboverride={true|false} ]

delete-instance

Usage: delete-instance
 	instance_name

delete-local-instance

Usage: delete-local-instance
       [--node <node_name>]
       [--nodedir <node_path>]
       instance_name

Unregisters an instance from domain.xml and deletes the filesystem structure for a local instance. The instance must be stopped. If this is the last instance using the node agent directory structure, then that directory structure is also removed. 

delete-config

Usage: delete-config
	configuration_name

export-sync-bundle

Usage: export-sync-bundle
	--target <cluster|stand-alone-instance> [--retrieve true|false] [file_name]

import-sync-bundle

Usage: import-sync-bundle
	[--node node_name] [--nodedir node_path] --file xyz-sync-bundle.zip instance_name

start-cluster

Usage: start-cluster
	[--verbose[=<verbose(default:false)>]]
	cluster_name

Always and only a remote command, as in v2.

The --autohadboverride option is ignored, with a warning that it has been deprecated and may not be supported in a future release.

[ --autohadboverride={true|false} ]

start-instance

Usage: start-instance
        [--nosync[=<nosync(default:false)>]]
        [--fullsync[=<fullsync(default:false)>]]
        [--debug={true|false}]
	instance_name

Need a variant of start-instance that works locally. It should probably mirror create-instance, either having a --local option or a start-local-instance command.

The --setenv option is ignored, with a warning that it has been deprecated and may not be supported in a future release.

[--setenv (name=value)[:name=name]*]

--setenv added the given properties to the environment of the instance. It is not needed in 3.1.

restart-instance

Usage: restart-instance
	[--debug={true|false}]
	instance_name

--debug is a boolean argument.

true --> restart the server with JPDA debugging enabled
false --> restart the server with JPDA debugging disabled
not set --> restart with whatever the setting is now in the running server

The instance restarts itself; if restart-instance is called on the DAS, the DAS restarts itself.
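The tri-state behavior above can be sketched as follows; resolve_debug is a hypothetical helper for illustration, not actual server code.

```shell
# Resolve the effective JPDA debug setting for a restart.
resolve_debug() {
  flag="$1"      # "true", "false", or "" (option not set)
  current="$2"   # the running server's current JPDA setting
  if [ -z "$flag" ]; then
    # not set: restart with whatever the running server uses now
    echo "$current"
  else
    # true/false: explicitly enable or disable JPDA debugging
    echo "$flag"
  fi
}
```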

restart-local-instance

Usage: restart-local-instance
        [--node <node>]
        [--nodedir <nodedir>]
	instance_name

Uses instance_name to determine the instance's host:port, then calls restart-instance on the instance itself.

start-local-instance

Usage: start-local-instance
       [--verbose[=<verbose(default:false)>]]
       [--debug[=<debug(default:false)>]]
       [--nosync[=<nosync(default:false)>]]
       [--syncfull[=<syncfull(default:false)>]]
       [--node <node_name>]
       [--nodedir <node_path>]
       instance_name

stop-cluster

Usage: stop-cluster
        [--verbose[=<verbose(default:false)>]]
	cluster_name

The --autohadboverride option is ignored, with a warning that it has been deprecated and may not be supported in a future release.

[ --autohadboverride={true|false} ]

stop-instance

Usage: stop-instance
        [--force[=<force(default:true)>]]
        instance_name

Always and only a remote command, as in v2.

stop-local-instance

Usage: stop-local-instance
        [--node <node>]
        [--nodedir <nodedir>]
        [--force[=<force(default:true)>]]
	instance_name

Do we need this? How does it work? Or do we just leave it to "kill" and the local services facility to stop the instance?

list-clusters

Usage: list-clusters [target]

list-instances

Usage: list-instances [--timeoutmsec=n] [--nostatus] [--standaloneonly] [target]

The three options to the list-instances command are new for 3.1. The --timeoutmsec option limits the time spent trying to determine the status of instances; the default is 2 sec. The --nostatus option causes list-instances to not display whether instances are running. The --standaloneonly option lists only stand-alone instances.

list-configs

Usage: list-configs [target]

create-node-ssh

Usage: create-node-ssh
        --nodehost <nodehost>
        [--installdir <installdir>]
        [--nodedir <nodedir>]
        [--sshport <sshport>]
        [--sshuser <sshuser>]
        [--sshkeyfile <sshkeyfile>]
        [--force[=<force(default:false)>]]
        node_name

--installdir default is the DAS installation directory.

create-node-config

Usage: create-node-config
        [--nodedir <nodedir>]
        [--nodehost <nodehost>]
        [--installdir <installdir>]
        node_name

Changes to Existing Commands

This section describes changes (or the lack of needed changes) for administrative commands that already exist in GlassFish.

create-domain

Support for the --template domain_template option will be added to the command as part of this feature.

delete-domain

No changes to delete-domain are required to support clusters.

stop-domain

As a stretch goal, it would be nice to have an option to stop-domain that would stop all clusters and instances in the domain. However, no changes to stop-domain are required to support clusters.

restart-domain

Add the following argument:

[--debug={true|false}]

--debug is a boolean argument.

true --> restart the server with JPDA debugging enabled
false --> restart the server with JPDA debugging disabled
not set --> restart with whatever the setting is now in the running server

Deployment in clustered environment

Deploy

Deployment to the DAS will be largely unmodified compared to the non-clustered version of the application server. Deployment to remote instances from the DAS will be handled by a hidden (GlassFish-private) deploy command. This command will take two parameters:

  • The original, unmodified archive that was deployed
  • The zipped-up generated directory for the deployed bits on the DAS

The deploy command on the DAS will first perform a normal deployment on the DAS (without loading, as in v2), and a supplemental command will be registered for this DeployCommand. This supplemental command will be responsible for sending the hidden command invocation to all necessary instances (as determined by the application-refs added). The following diagram shows the invocation path.

Client                            DAS                            DAS                    Remote Instance            Remote Instance
                              Admin Framework               deploy backend              hidden __deploy            deploy backend

asadmin deploy X.war
          |--------------->   Receives deploy
                                  |-------------------->  @ExecuteOn(runtime=DAS)
                                                          DeployCommand
                                                              |--> unpack archive
                                                              |--> deploy phases
                                                              |--> optional start
                                                              |--> writes domain.xml
                                   if failure<----------------|
       failure   <--------------------|
                                   if success
                                     |-> for each server
                                          |--------------------------------------------->@ExecuteOn(Runtime=Server)
                                                                                           HiddenDeployCommand
                                                                                             |--> unpack archive
                                                                                             |--> unpack generated
                                                                                             |--> writes domain.xml
                                                                                             |--> invokes load
                                                                                                        |-------------> loadApp()
                                          result <---------------------------------------------------------------------|
                                     |-> collect all results
     results    <----------------------------|

Undeploy

Nothing specific should be put in place to support undeployment in clustered mode. The undeploy command implementation will be annotated with the @ExecuteOn annotation to run on both the DAS and remote instances.

@ExecuteOn(runtimes={DAS, INSTANCE})
@Service
public class UndeployCommand implements AdminCommand {
...
}

Redeploy

On the DAS, the redeploy command is essentially implemented today as an undeploy command followed by a deploy command. As seen above, the undeploy command invocation will naturally take care of cleaning up the deployed bits on both the DAS and the remote instances. The follow-up deploy command will be unchanged from first-time deployment.

Open Issues

  • Need to describe how the master password is handled on server instances.
  • Can/should we optimize the synchronization algorithms for the case where the DAS and the server instance are on the same machine, e.g., to avoid copying the application data? Maybe it's enough to just support "directory deployed" applications that are deployed to a shared filesystem?
  • Describe how a rolling upgrade is done, both for applications and for the server code itself. See this separate page for early discussion on rolling upgrade.

Regarding docroot synchronization, here are some more ideas from Nazrul:

This was an open issue during our AS ARCH review. The current implementation does not pick up changes to files (e.g., in docroot) if they are two or more levels down in the directory hierarchy.

Here are some thoughts on how we may want to position this with end users.

We could have something like a synchronize command that users would have to invoke to get their content synchronized.

% asadmin synchronize --target <cluster | std-alone-instance>

The synchronize command may also have additional options to help identify a subset of the content that needs to be synchronized. For example:

% asadmin synchronize --application <name-of-application>
% asadmin synchronize --docroot
% asadmin synchronize --config

It would be great if we could synchronize while the server is running, so that the user can shut down the DAS afterwards and/or does not have to wait until the instances are shut down.

One option would be to use the SSH-based SCP feature that Rajiv is adding. We could save the synchronization zip in a known location that server startup can pick up.

The user may also use the --fullsync option during instance startup, but a full synchronization may be expensive in certain cases.

docroot location
============
Since our current implementation is not compatible with GlassFish v2.x (docroot changes, changes to deployment descriptors, JSPs, etc. are not picked up), we may also want to take this opportunity to point to a config-specific directory for docroot. For example:

Currently, docroot is global: every config points to <instanceRoot>/docroot, and all of its contents are globally synchronized across the domain. This can be a problem when we have 100 instances (10 clusters with 10 instances each).

Alternative to consider: Use <instanceRoot>/config/<config-name>/docroot

An example config-name is "cluster1-config". In this scheme, only the associated clustered instances get the docroot content.

A more complex method of synchronizing docroot has been deferred from the 3.1 release. See issue 12029 for the rationale. 


I put this comment here because I found the GlassFish "OS-process-view" optimisation in this document. I have waited for it a long time, so I appreciate it.

But I have two other recommendations (wishes) to make the instance OS processes easier to use:

    • Please don't plan any other process hierarchy in the future (like the earlier node agent and message broker).
    • Please provide a command-line option for a system.properties file, into which several system properties can be put that are currently passed one by one on the command line, so an administrator can keep the command line clean, holding only those things that need to be visible in the ps command's output.

All these things are important when one has a lot of instances on one host, to be able to overview them easily.

Thanks

Posted by rezsekzs at Sep 04, 2010 21:33