GlassFish Server Open Source Edition - User-Managed Clusters

1. Introduction

This project will focus on providing a User-Managed Clusters capability in GlassFish based on requirement CLUST-2 from the 3.2 PRD (internal link).

1.1. Project/Component Working Name

User-Managed Clusters

1.2. Names and E-mail Address of Document Authors

Tom Mueller <tmueller@java.net>
Rajiv Mordani <mode@java.net>
Chris Kasso <kasso@java.net>
Jennifer Chou <jc129909@java.net>

2. Project Summary

2.1. Project Description

The purpose of this project is to provide a user-managed clusters feature for GlassFish that allows clusters to be created without a domain administration server (DAS). 

2.2. Risks and Assumptions

Risks

  • Depending on how the requirements are defined for this feature and which requirements are selected for implementation, this may or may not require a fairly large change to the GlassFish implementation.  A large change represents greater risk.
  • This feature may be difficult to test due to the need to test it with an external management infrastructure. 

Assumptions

  • Some solution ideas for this feature assume the availability of shared storage between the instances in the cluster.

3. Problem Summary

3.1. Feature Overview

The requirement for this feature is: 

CLUST-2 Domain Admins must be able to establish homogeneous clusters without a DAS. Instances must have a homogenous configuration. Configurations can be kept in sync through out of band mechanisms like file copying and shared disks. This is a form of manual scaling that can delegate to other autoscaling environments like EC2 AutoScale.

The idea of having a cluster "without a DAS" is motivated by several perspectives:

  1. A GlassFish deployment may have another (non-GlassFish) management infrastructure that is already available to manage instances.  The goal in this case is to perform the operations that would normally be done via a DAS directly with this management infrastructure. This "management infrastructure" may be a software system or it may be a person. Reasons for using another management infrastructure include:
    1. It may be able to outperform the DAS, thereby allowing a larger cluster.
    2. It may be better tuned for the environment where the cluster is running (for example, operating clusters in different domains from a single management process).
    3. It may be managing services other than Java EE containers, e.g., databases, and may provide a single interface for all of them.
    4. The need for a dedicated host for the DAS can be avoided.
  2. The DAS currently is a single point of failure for administration in the clustered architecture. To provide a higher level of availability, either DAS availability can be increased, or the DAS can be eliminated, and responsibility for providing highly available management can be delegated to another management infrastructure (see #1).  

Environment Considerations

Some of the motivation for having a user-managed cluster comes from environmental considerations:

  • VMs (or even processes) are relatively expensive, so we want to limit the need for them.  For example, given the need for 100 clusters in 100 logical domains (i.e., separate domain.xml files, auth realms, etc., with no sharing between the instances in the clusters), it might be undesirable to have to run 100 DAS processes to manage those clusters.  Instead, a user of this feature could develop a management tool that manages all 100 clusters in a single process.

Use Cases

This section describes some sample use cases for this feature.  This is not a complete list of the ways this feature can be used.  This is meant to provide an illustration of typical use cases to assist in making design and implementation decisions.

GlassFish Clusters in an EC2 AutoScale Environment

The Amazon EC2 AutoScale feature allows a user to configure a group of virtual machine instances that are created and deleted automatically based on scaling rules. With this feature, it will be possible for the instances in an AutoScale group to form a GlassFish cluster, so that the application gets session failover and the other high availability features that are available with standard GlassFish clusters. Here is how it is expected to work (a sketch of the bootstrap logic follows the list):

  1. The user creates an AMI that contains the GlassFish installation and additional scripts.  These scripts use user data (a cluster name and an application WAR file) associated with the AutoScale launch configuration created in the next step to create a cluster and install the application.  Alternatively, this user data could be embedded into the AMI itself if a custom AMI is created for each AutoScale group.
  2. The user creates an AutoScale launch configuration that specifies the AMI from step 1, the name of the cluster, and the application to be deployed.
  3. The user creates an AutoScale group, specifying the launch configuration from step 2.
  4. This results in the first VM being created. When the VM boots, a user-provided script within the AMI uses commands provided by this feature to create a GlassFish instance in the specified cluster (see req. 1.1 and 1.3 below), start the instance (see req. 1.7), and deploy the application to the instance (see req. 2.1). When the instance is created, the GMS information needed to form the cluster is provided (see req. 3.6). At this point, a single instance is running the application.
  5. When AutoScale causes another VM to be created, the same process as in step 4 happens. However, this time, the GMS group is formed between the two instances, and they start sharing session failover information. Amazon updates the load balancer configuration for the AutoScale group so that requests are routed to both instances, and if one of the instances fails, sessions will automatically fail over to the other instance (see req. 3.1 and 3.2).
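The following is a rough sketch (not part of the proposal) of what the user-provided bootstrap logic in step 4 might look like, written in Java for consistency with the rest of this document even though a shell script would serve equally well. The install path, the user-data format, and the exact command options (particularly --cluster on create-local-instance) are illustrative assumptions; the real command set is defined in the design specification.

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.InetAddress;
import java.net.URL;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical bootstrap run by the AMI at boot; nothing below is an existing GlassFish API.
public class AutoScaleBootstrap {

    public static void main(String[] args) throws Exception {
        // EC2 user data (assumed here to be two lines: cluster name, then WAR location)
        BufferedReader in = new BufferedReader(new InputStreamReader(
                new URL("http://169.254.169.254/latest/user-data").openStream()));
        String clusterName = in.readLine();
        String warLocation = in.readLine();
        in.close();

        String instanceName = clusterName + "-" + InetAddress.getLocalHost().getHostName();

        // req. 1.1/1.3: create the instance in the user-managed cluster; the GMS
        // coordinates needed to form the cluster (req. 3.6) would be passed here as well
        asadmin("create-local-instance", "--cluster", clusterName, instanceName);
        // req. 1.7: start the instance
        asadmin("start-local-instance", instanceName);
        // req. 2.1: deploy the application directly to this instance
        asadmin("deploy", warLocation);
    }

    private static void asadmin(String... args) throws Exception {
        List<String> cmd = new ArrayList<String>();
        cmd.add("/opt/glassfish3/bin/asadmin");   // assumed install location inside the AMI
        cmd.addAll(Arrays.asList(args));
        Process p = new ProcessBuilder(cmd).inheritIO().start();
        if (p.waitFor() != 0) {
            throw new IllegalStateException("asadmin " + args[0] + " failed");
        }
    }
}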

Custom Solution for Independent Cluster Management for Multi-Tenancy

Company Example.com wants to offer a hosting solution based on GlassFish that provides customers with the ability to deploy highly-available applications to clusters. They are expecting to serve hundreds of customers, each needing relatively small clusters (2-10 instances each).  Example.com already has an infrastructure to create and manage virtual machines (VMs), with the ability to create VMs for other services such as a database. Each of Example.com's customers (the tenants) must have an independent configuration, i.e., it is not acceptable for tenant A and tenant B to have configuration data in the same domain.xml file. With GlassFish 3.1, Example.com could create a separate domain for each tenant, but this would mean running a DAS for each tenant.  To avoid this, Example.com uses this feature to implement a custom cluster management solution that creates clustered GlassFish instances on virtual machines, each with completely independent configuration information. This custom cluster management solution runs in its own GlassFish cluster (a standard 3.1 cluster). It maintains isolation of tenant data.  When a customer deploys an application, the following steps happen:

  1. The required number of VMs are allocated for the instances.  Each VM instance already has GlassFish installed and is accessible via SSH.
  2. The custom cluster management program uses SSH to run commands on each VM that create a GlassFish instance in a user-managed cluster (see req. 1.1 and 1.3), start the instance (see req. 1.7), and deploy the application to the instance (see req. 2.1). When each instance is created, the GMS information needed to form the cluster is provided (see req. 3.6). At this point, the instances are running the application as a cluster.

The custom cluster management solution provides the ability to monitor the instances (see req. 4.3), redeploy new versions of the application (see req. 2.3), add and remove instances from the cluster (see req. 1.3 and 1.4), and make other configuration changes to the cluster (see req. 4.1). 
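As an illustration only, the loop below sketches how such a custom cluster management program might drive the VMs over SSH. The host names, install path, and command options are assumptions; the actual commands available to it are the ones defined by this feature (see the CLI section and the design specification).

import java.util.Arrays;
import java.util.List;

// Hypothetical management-side code; not a GlassFish API.
public class TenantClusterProvisioner {

    // Provision one tenant's user-managed cluster on a set of already-created VMs.
    public void provision(String tenantCluster, List<String> vmHosts, String warPath)
            throws Exception {
        for (String host : vmHosts) {
            String instance = tenantCluster + "-" + host;
            // req. 1.1/1.3: create the instance, passing the GMS coordinates (req. 3.6)
            ssh(host, "/opt/glassfish3/bin/asadmin create-local-instance --cluster "
                    + tenantCluster + " " + instance);
            // req. 1.7: start the instance
            ssh(host, "/opt/glassfish3/bin/asadmin start-local-instance " + instance);
            // req. 2.1: deploy the tenant's application to the instance
            ssh(host, "/opt/glassfish3/bin/asadmin deploy " + warPath);
        }
    }

    private void ssh(String host, String remoteCommand) throws Exception {
        // Each VM already has GlassFish installed and is reachable over SSH (step 1 above).
        Process p = new ProcessBuilder(Arrays.asList("ssh", host, remoteCommand))
                .inheritIO().start();
        if (p.waitFor() != 0) {
            throw new IllegalStateException("Failed on " + host + ": " + remoteCommand);
        }
    }
}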

Terms

The concept of a domain as it relates to a user-managed cluster is unclear. When one has a DAS, the domain is the thing that the DAS manages. If there is no DAS, then what is the domain?

The concept of a node as it relates to a user-managed cluster is unclear. Nodes are the entities used by a DAS to manage instances. If there is no DAS, does there have to be a node? 

The term standard cluster refers to a normal 3.1 cluster that is managed by a DAS.  

The term user-managed cluster refers to a cluster that is being provided based on these requirements.

All of the requirements in the list below are to be interpreted within the context of the CLUST-2 requirement, i.e., without a DAS (whatever that means).

In each case, the ability to provide a certain feature may require a person or the (non-GlassFish) administration infrastructure to interact with one or more of the instances according to some protocol that is specified by the implementation.

3.2. Feature List (Technical Requirements)

Umbrella RFE: GLASSFISH-16431

1. Cluster Management

# Priority Name - Description
1.1 P1 create one user-managed cluster - Provide the ability to create a cluster consisting of instances that can be started.
1.2 P2 create multiple user-managed clusters - Provide the ability to create multiple clusters, i.e., clusters that have different names, different sets of applications, and which each have their own HA environment. This builds on the previous requirement (so if this is met, then the previous one is met too).
1.3 P1 add instances to cluster - Add an instance to a cluster.  This requires passing in sufficient information so that the instance joins the cluster. For the EC2 AutoScale case, this is what would be used as part of the launch configuration. (Open question: should there be a limit on how many?)
1.4 P1 delete instances from cluster - Delete an instance from a cluster.
1.5 P2 choose instance names - Allow the names of the instances to be of the user's choosing.
1.6 P2 choose cluster names - Allow the names of clusters to be of the user's choosing.
1.7 P1 instance start - Allow instances in a user-managed cluster to be started.
1.8 P1 instance stop - Allow instances in a user-managed cluster to be stopped.
1.9 P3 list local instances - List the user-managed cluster instances that have been defined on the local system (similar to list-domains in that it is a local command). This command should provide information about the cluster the instance is in.

2. Application Deployment

# Priority Name - Description
2.1 P1 single application deployment - Allow an application to be deployed to an instance in a user-managed cluster.
2.2 P2 multiple application deployment - Allow multiple applications to be deployed to an instance in a user-managed cluster. This builds on the previous requirement (so if this is met, then the previous one is met too).
2.3 P1 application redeploy - Allow an application that has been deployed to the user-managed cluster to be redeployed, i.e., undeployed and a new version deployed (or deploy --force), without recreating the cluster. This means that any cluster-specific configuration is retained, JMS messages are preserved, etc.
2.4 P1 application undeploy - Allow an application to be removed from a user-managed cluster.
2.5 P2 get-client-stubs - Support get-client-stubs and other app client related commands for an application deployed to a user-managed cluster.

3. High Availability

# Priority Name - Description
3.1 P1 session failover - Session state must fail over from instance to instance as it does with standard clusters.
3.2 P1 SFSB failover - State for stateful session beans must fail over from instance to instance as it does with standard clusters.
3.3 P1 HA messaging - The Metro HA aspects must work.
3.4 P2 migrate timers - Move timers from one instance to another.
3.5 P2 transactions - Support the various transaction-related commands (freeze, etc.).
3.6 P1 GMS clusters - Support all methods that GMS provides for forming the cluster, i.e., multicast, non-multicast, etc.
3.7 P3 load balancer configuration - Support configuration of a load balancer from the user-managed cluster.
3.8 P1 JMS conventional clusters - Support JMS conventional clusters.
3.9 P2 JMS enhanced clusters - Support JMS enhanced clusters (which provide high availability of the messages).
3.10 P3 span EC2 AZ - Support forming a user-managed cluster with instances in multiple EC2 availability zones.

4. Administration

# Priority Name - Description
4.1 P2 configuration changes - Allow any configuration change that is possible with asadmin on a standard cluster to be accomplished on a user-managed cluster without recreating the cluster.
4.2 P2 non-homogeneous error handling - The system must gracefully handle the situation where the user-managed cluster of instances is not homogeneous, with error messages indicating that an inconsistency has been detected.
4.3 P2 monitoring - Gather monitoring data from instances.
4.4 P2 collect log files - Collect log files from instances.
4.5 P3 backup - Provide backup for an instance within a user-managed cluster.
4.6 P2 GMS get-health - Support the get-health command on any instance in a user-managed cluster.
4.7 P3 console - Allow the console to be used with a user-managed cluster (need to define what needs to work within the console).

5. Installation

# Priority Name - Description
5.1 P1 separate installs - Instances can be created on different hosts using different installs of GlassFish, i.e., GlassFish does not need to be installed into a shared directory.
5.2 P3 remote install and create instance - Provide a command similar to the setup-ssh, install-node, create-node-ssh, create-instance sequence which would allow a user to copy the GlassFish bits to another host and create an instance for a user-managed cluster there.
5.3 P3 installer - Offer an option in the installer to create an instance for a user-managed cluster.
5.4 P1 supported environments - The feature must be usable in a virtualized environment such as EC2 AutoScale, but it also must be usable on a physical server or in a manually created virtualized environment.  The feature must be usable with all of the operating systems that GlassFish supports.

6. Miscellaneous

# Priority Name - Description
6.1 P1 security - Appropriate safeguards must be provided to prevent unauthorized modifications to an instance configuration.
6.2 P3 create-service - Support the create-service functionality for an instance in a user-managed cluster.
6.3 P2 uptime - Support uptime for an instance in a user-managed cluster.

Non-Requirements

  1. Data for instances in a user-managed cluster may or may not be stored in directories associated with domains and nodes. This is an implementation decision.
  2. The implementation may or may not require a shared file system or some other form of shared storage to work over and above what is required for a standard cluster. This is an implementation decision.

3.3. Justification

This is a P1 requirement from the PRD.

4. Technical Description

For the details about the design for user-managed clusters, see the design specification.  The information below is preliminary information that led to the eventual design.

4.1. Details

Possible Design Approaches

There are several basic approaches for implementing a user-managed cluster (not sure if these all qualify as "without a DAS"):

  1. Independent Instances: Delegate responsibility for instance synchronization (the process of keeping the cluster homogeneous) to the non-GlassFish management infrastructure. This means that each instance is administered independently and the instances do not communicate with one another for administration. This is how instances operate in 3.1 when a DAS is present.  With this approach, the management infrastructure is responsible for keeping the overall administrative state for the cluster and the domain, as an individual instance may not have the complete state. From a GlassFish perspective, instances are a hybrid between the DAS and clustered instances of 3.1.  They are like the DAS in that they can execute admin commands, but they are like clustered instances in that they can join GMS groups for clustering. 
  2. On-Demand DAS: Use the DAS as needed to configure the cluster. This might involve moving the DAS functionality to an "on-demand" DAS that could run as part of the asadmin command itself. The console would not be available with this approach.  The on-demand DAS would need to have access to the complete domain state, possibly stored on shared storage that is highly available. The on-demand DAS would be started as needed to perform management of the cluster, essentially acting as a configuration editor. If the configuration of the cluster doesn't need to be modified after initial creation, the DAS can be used to create the initial cluster (including deploying an application) and is not needed afterwards.
    1. One way to do this is to perform the config update on one of the instances, export the sync-bundle again, and reimport the sync-bundle on all other instances.
    2. Another way to do this is to have the DAS do command replication (this has problems if the instances aren't up, and the node hosts have to be right)  
  3. Distributed DAS: Move the DAS functionality to a "floating DAS" that runs in an instance.  With this approach, each instance that is capable of becoming the DAS has to have access to the complete state for the domain. The DAS can be elected using a service such as GMS. Instance-to-instance communication must be possible to perform instance synchronization (at startup and for command replication).  An asadmin command could be targeted at any instance, and if that instance isn't the DAS (at the moment), then it would forward the command to the one that is the DAS. The console too could run on any instance, with commands being forwarded to the instance that is operating as the DAS. One concern of this approach is that the instances in a cluster are no longer equal in the sense that the one that is the DAS has more work to do.
  4. Clustered DAS: Modify a DAS so that it can be a member of a cluster. Each member of the cluster would be its own domain, i.e., it would have an entry in the "domains" directory and would be started with the start-domain command.  However, the domain.xml would have a <cluster> element, with the same cluster name and configuration for each member of the cluster. The DAS would join the GMS group as a core member rather than an observer.

With the second and third approaches, GlassFish still provides centralized management for the cluster; it is just that the single point of failure for the DAS has been eliminated.

The sections below provide design ideas for the first two approaches. The "Unmanaged Instances" design is a form of the independent instances approach and is a stepping stone towards the Distributed DAS approach. For the 3.2 release, the Distributed DAS approach is too much to attempt.  The "Simple Approach for On-Demand DAS" provides very limited functionality.

Current Situation with GlassFish 3.1

With GlassFish 3.1, there are only two types of GlassFish servers, the DAS and an instance (either clustered or stand-alone).  The DAS is not able to be part of a cluster for HA; only a clustered instance can do that.  An application cannot be directly deployed to a clustered instance.  The data for a DAS, e.g., config, log, autodeploy, generated directories, is in a "domains directory", by default the "glassfish/domains" directory in a GlassFish install.  The data for instances is organized under "nodes" that represent servers where the instances are running.  The default nodes directory is "glassfish/nodes" in a GlassFish install.

Option 1: Unmanaged Instances

Define a new type of instance that has the clustering characteristics of a clustered instance, but the deployment and admin command processing characteristics of a DAS.  Since the clustered instances in GlassFish are managed by a DAS, this new type of instance can be called an "unmanaged instance" because it isn't managed by a DAS. This type of instance would accept asadmin commands (such as deploy) and the autodeploy directory would work, but it would also have the capability of being in a cluster, i.e., it would join a GMS group as a "CORE" member, HA sessions and other HA features would work, etc. What one would not get with an "unmanaged instance" is centralized management from the GlassFish DAS and the guarantee that the configuration of the instance stays in sync with other unmanaged instances that are in the same cluster. That guarantee would have to come from something else, such as the (non-GlassFish) management infrastructure.

Name ideas for this type of instance:

  • "unmanaged instance" because the instance isn't managed by a GlassFish DAS.  The current types of instances would be called "managed instances", either clustered or standalone.
  • "independent instance" because the instances do not communicate with one another for administration purposes
  • "user-managed instances" because the instances are part of user-managed clusters
  • "self managed instance" is a more positive spin, but the instance really isn't managing itself.  There's the non-GlassFish management infrastructre that is actually managing it.  
  • "externally-managed instance"  But that seems to be too much of a mouthful. 
  • "pe instance that can join a cluster" This is a relic from the V2,  should we use this in v3?

For the remainder of this proposal, this new instance type will be called an "unmanaged instance".

There are (at least) two areas of the system that would be affected by this.

  • One is the ServerEnvironment.isDas() and Server.isDas() calls that are spread throughout the system. We would need to look at each of those to determine whether an unmanaged instance should return true or false. 
  • The other area is the RuntimeType class that is used with the command infrastructure.  We would need to add a new type, RuntimeType.UNMANAGED_INSTANCE, and then look at every place where RuntimeType.DAS is used and determine whether RuntimeType.UNMANAGED_INSTANCE should be added.  Also, the default value for @ExecuteOn might have to be changed to include an unmanaged instance.

When a command is annotated with @ExecuteOn(DAS, INSTANCE), it means that the command should be executed on both a DAS and an INSTANCE; since a server cannot be both, this does not mean the command is executed twice. Adding UNMANAGED_INSTANCE to this would mean that the command should be executed on an UNMANAGED_INSTANCE too.
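For illustration, a command that opts in to the proposed runtime type might look like the sketch below. RuntimeType.UNMANAGED_INSTANCE does not exist today (it is left commented out), and a real command would also carry the usual @Service and @I18n annotations.

import org.glassfish.api.admin.AdminCommand;
import org.glassfish.api.admin.AdminCommandContext;
import org.glassfish.api.admin.ExecuteOn;
import org.glassfish.api.admin.RuntimeType;

// Sketch only: shows where the proposed new runtime type would be added.
@ExecuteOn({RuntimeType.DAS, RuntimeType.INSTANCE /*, RuntimeType.UNMANAGED_INSTANCE */})
public class SampleConfigCommand implements AdminCommand {

    public void execute(AdminCommandContext context) {
        // Apply the configuration change to the local server. On an unmanaged instance
        // the change would not be replicated to the other members of the cluster; keeping
        // the cluster homogeneous is the management infrastructure's job.
    }
}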

For an unmanaged instance, the deploy subcommand would work similarly to how it works when the target is the DAS (with no replication). Generally, whenever a command is executed on an unmanaged instance, the command would not be replicated to other unmanaged instances in the same cluster.  It is the responsibility of the EBS software and the host manager to make sure that every command that modifies the configuration of the instance is executed on every instance. For example, if an instance is down at the time that a configuration change is made, it is the responsibility of the host manager to make sure that any configuration updates that were missed while the instance was down are applied to the instance when it is restarted.

Domain Concept and Unmanaged Instances

In the current system, a DAS and instances, whether stand-alone or clustered, are organized into domains. Logically, the domain.xml in the instances contains the same information as the domain.xml in the DAS, and it is the DAS's responsibility to keep this synchronized. This is also true of the other data associated with a domain (logging.properties, file realm user files, deployed applications, password files, etc.).  Since there can be only one DAS for a domain, the configuration data for the DAS itself is in the domains/domainname directory. Configuration data for the instances associated with a domain is in various node directories.

With unmanaged instances, there is still the concept of a domain (at least the instances will still have a "domain.xml" file), but how this is represented on disk is not as clear.  There isn't a DAS for a set of unmanaged instances, so it doesn't make sense to have something under the "domains" directory.

Data Location

The data for unmanaged instances would need to go into a different area under the glassfish directory.  This doesn't really belong under domains and it doesn't belong under nodes. Maybe an "instances" directory?

DAS Determination

I'm not yet sure what to do with the isDas method.  I expect that it will need to return false. However, we would need to add an isUnmanagedInstance method that would be called in many places where isDas is called, e.g.,

if (serverenv.isDas() || serverenv.isUnmanagedInstance()) ...
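As a sketch only (none of the names below exist in the current code base), one way the environment could answer this new question, assuming the -type startup argument discussed below gains an UNMANAGED_INSTANCE value:

// Hypothetical illustration, not the existing ServerEnvironment class.
public class ServerEnvironmentSketch {

    public enum StartupType { DAS, INSTANCE, UNMANAGED_INSTANCE }

    private final StartupType type;   // parsed from the -type argument at server startup

    public ServerEnvironmentSketch(StartupType type) {
        this.type = type;
    }

    public boolean isDas()               { return type == StartupType.DAS; }
    public boolean isInstance()          { return type == StartupType.INSTANCE; }
    public boolean isUnmanagedInstance() { return type == StartupType.UNMANAGED_INSTANCE; }

    // Many call sites that test isDas() today are really asking "can this server
    // accept and execute admin commands?", which an unmanaged instance also can:
    public boolean acceptsAdminCommands() { return isDas() || isUnmanagedInstance(); }
}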

Currently, there is some inconsistency in how "DAS determination" is done.  The Server config bean, which has an isDas method, just compares the instance name to "server".  However, there is also a "-type" option that is passed to the server when it is started, and it can take the value of DAS or INSTANCE.  The ServerEnvironment class uses this value to determine the return value for its isDas method. This needs to be cleaned up.

For example, in an experiment that I ran yesterday, I created an instance called "cl2" and passed in the -type DAS and -instancename cl2 arguments, and it would happily run commands and it acted like a DAS, even though server.isDas was returning false.  (I did have to have a <server name="server"/> in the domain.xml.) When I changed the argument to -type INSTANCE, then it wouldn't run commands from asadmin and it formed a cluster with other similar instances. 

For an unmanaged instance, the only information one needs is for the instance one is running in, and that is available via:

@Inject Server server;

server.isDas()
server.isInstance()
etc.

But it is also available via ServerEnvironment methods.

Instance type information is available in the domain.xml.

In the deployment code, there are checks like isDASTarget that test whether the target name equals "server" and then handle things differently. We would have to change that too, and we need to be able to determine from the target name whether an instance is an unmanaged server instance.
IMHO, we need to eliminate any hardcoded references to "server" as the name of the DAS.
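A sketch of the kind of change being suggested, using the existing Domain and Server config beans; the isUnmanagedInstance accessor is hypothetical:

import com.sun.enterprise.config.serverbeans.Domain;
import com.sun.enterprise.config.serverbeans.Server;

// Illustration only of replacing the hardcoded name check with a config lookup.
public class TargetChecks {

    // Roughly what the deployment code does today.
    static boolean isDasTarget(String targetName) {
        return "server".equals(targetName);
    }

    // What it would need instead: look the target up in domain.xml and ask for its type,
    // rather than relying on the magic name "server".
    static boolean isUnmanagedInstanceTarget(Domain domain, String targetName) {
        Server s = domain.getServerNamed(targetName);
        return s != null && isUnmanagedInstance(s);
    }

    // Hypothetical: however the new instance type ends up being recorded in domain.xml.
    static boolean isUnmanagedInstance(Server s) {
        return false;   // placeholder
    }
}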

User-Managed Clustering and GMS

Currently, a DAS joins a GMS group as a SPECTATOR while clustered instances join the GMS group as CORE members. Only the CORE members are involved in session replication and other HA functions. An unmanaged instance must:

  • Be part of a cluster from the perspective of the GMS service.  So there must be a <cluster> element in the domain.xml to provide the data to GMS on what cluster to join. 
  • Join the cluster as a CORE member so that it can participate in the HA functions of the cluster. 

Since multicast may not be supported within the user's environment, there will need to be changes to GMS to provide the ability for cluster members to find each other through another mechanism. This design is being done by another project.  But the effect on this design is that the creator of the unmanaged instance will be expected to provide coordinates that are then stored in the <cluster> information, similarly to how the multicast bind address is stored today.  GMS will then use this to be able to connect with other members of the cluster. It is assumed that the management infrastructure or the person creating the instance will have information that uniquely identifies the cluster so that GMS can associate cluster members with each other.

To make an unmanaged instance part of a cluster, the domain.xml template that is used to create the instance will have a <cluster> element in it, prepopulated with this information as passed in from the management infrastructure.

Option 2: Simple Approach for On-Demand DAS

(Proposal from Bill Shannon).

When an instance is being created, it needs to figure out whether it's the first instance for this application or not. This assumes that we have writable shared storage for the cluster. (A sketch of this decision logic follows the two lists below.)

If it's the first instance, do this:

  • create a domain
  • start the domain
  • create a cluster with the maximum number of instances
  • fetch the app
  • deploy the app to the cluster
  • export the sync bundle for the cluster
  • save the sync bundle to shared storage
  • create the first instance
  • start the instance
  • stop the domain

For other instances, do this:

  • wait for the first instance to finish creating the sync bundle
  • fetch the sync bundle from shared storage
  • create an instance from the sync bundle
  • start the instance with --nosync
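A rough sketch, again in Java, of the decision logic described by the two lists above. The shared-storage layout, the lock-file trick, and the exact spelling of the asadmin options (--nopassword, the export-sync-bundle/import-sync-bundle arguments, etc.) are assumptions for illustration, not part of the proposal:

import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical per-VM bootstrap for the on-demand DAS approach.
public class OnDemandDasBootstrap {

    public static void main(String[] args) throws Exception {
        File shared = new File("/mnt/shared/cluster1");          // assumed writable shared storage
        File bundle = new File(shared, "cluster1-sync-bundle.zip");

        // createNewFile is atomic, so exactly one VM takes the "first instance" role.
        boolean firstInstance = new File(shared, "first-instance.lock").createNewFile();

        if (firstInstance) {
            asadmin("create-domain", "--nopassword", "domain1");
            asadmin("start-domain", "domain1");
            asadmin("create-cluster", "cluster1");
            // ...also pre-create the maximum number of instances in the cluster here...
            // fetch the application, then deploy it to the whole cluster:
            asadmin("deploy", "--target", "cluster1", "/opt/apps/myapp.war");
            // export the sync bundle to shared storage:
            asadmin("export-sync-bundle", "--target", "cluster1", bundle.getPath());
            asadmin("create-local-instance", "--cluster", "cluster1", "instance1");
            asadmin("start-local-instance", "instance1");
            asadmin("stop-domain", "domain1");
        } else {
            while (!bundle.exists()) {                           // wait for the first instance
                Thread.sleep(5000);
            }
            String name = chooseUnusedName(shared);              // via GMS or shared storage
            asadmin("import-sync-bundle", "--instance", name, bundle.getPath());
            asadmin("start-local-instance", "--nosync", name);
        }
    }

    private static String chooseUnusedName(File shared) {
        return "instance2";   // placeholder; see the naming discussion below
    }

    private static void asadmin(String... args) throws Exception {
        List<String> cmd = new ArrayList<String>();
        cmd.add("/opt/glassfish3/bin/asadmin");
        cmd.addAll(Arrays.asList(args));
        Process p = new ProcessBuilder(cmd).inheritIO().start();
        if (p.waitFor() != 0) {
            throw new IllegalStateException("asadmin " + args[0] + " failed");
        }
    }
}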

Once the instance is running, it works (almost) exactly as it does today. No need to handle a different type of instance or some DAS/instance hybrid.

Obviously there are some important details glossed over...

  • GMS has to work in this environment with no multicast and with some other mechanism for finding the group master. Perhaps the group master should write its address to shared storage.
  • Each instance needs to find other instances by asking GMS instead of relying on configured host names.
  • When each instance is created, a name for the instance must be chosen that doesn't conflict with any running instance from the set of names that were used when the first instance was created. GMS could help with that, or some other data stored in shared storage could be used.

You can imagine some ways to optimize this around the edges, especially for large applications. The key aspect of this approach is that the core of GlassFish is (almost) unchanged. Pretty much all the code continues to be used in the same environment it's used in today, which we know already works. The changes are almost all at the GMS level.

This may not be the most efficient or elegant approach, but I think it's the least risky approach and should be relatively cheap to implement.

Comparison of the Two Approaches

Note: both approaches can be made to meet all of the requirements.

Advantages of Option 1

  • WRT admin, this dovetails with what we had in V3 before clusters; this is where the product has been before.  V3 had PE instances only; clustering, though, is new.
  • Doesn't depend on shared storage
  • Faster for configuration changes (no need to start/stop DAS, do backups/exports)
  • Faster for other asadmin commands (such as uptime) because there is no need for the DAS
  • Instance configuration via asadmin is just like for a DAS, i.e., just run asadmin with the address of the instance (no need for front end as in option 2)
  • Implementation of the console for one of these instances would be easier (P3)
  • Potentially, the software that needs to be distributed to each host could be smaller
  • In the longer term, this might be easier to extend to provide the distributed DAS capability

Advantages of Option 2

  • Uses the GlassFish software as it is currently designed, in a way that we already know works
  • Don't need to take care of application id syncing (needed for EJBs) because it is already handled via domain.xml
  • Implementing export of load balancer data would be easier with this option

Design/Implementation tasks for Option 1

  • Setup FSD outline (D:Tom)
  • Use INSTANCE  RuntimeType  for new type of instance (D: Tom)
  • Check the isDas() calls to see what question is really being asked, and identify new configuration information needed to answer question for these instances.  (about 128 calls) (D: Jennifer)
  • Look at DAS-only @ExecuteOn for asadmin commands to see which ones apply for the new instance type (D: Chris)
  • Figure out how deploy and _deploy can be merged, and how the extra options for the first/last instance will be handled (D: Rajiv)
  • Change the security such that asadmin can work against the new type (D: Tom)
  • Plan to use the existing nodes directory structure for storing data files for the new instance type, cluster/domain name the same, one cluster per domain, instances put into nodes/localhost-<domainname> (D: Tom)
  • create asadmin local commands for create/delete/start/stop of new instance type (D: Tom)
    • create the template that is used with create-local-instance
    • work with GMS team to define create-local-instance options for GMS configuration for non-multicast (D: Tom)
  • Need to make sure that application id stays the same for all instances (needed for EJBs) - investigate using hash on app content to determine id (D: Rajiv)
  • Need to make sure that stuff that happens only on DAS only happen in one instance (e.g., creating a database instance) (D: Rajiv)
  • Modify list-commands so that it only returns the commands that can actually be executed on the new instance type (D: Tom)
  • Work with JMS/MQ team as to how JMS cluster will work with this feature (D:Chris)
  • Look at migrate timers and transactions (D: Chris)
  • Backup for an instance has to be implemented (P3) (D: Chris)
  • Need to modify create-service to support the new instance type (P3) (D: Jennifer)

Implementation tasks for Option 2 (draft)

  • create asadmin local commands for create/delete/start/stop of new instance type
  • Need a start instance that will import from a sync bundle
  • for any configuration command, a front-end or something would need to be written for asadmin in order to start the DAS, run the command, export, stop the DAS, etc.

Team recommendation: Option 1

However, there was a meeting on June 27 where Option 4 (Clustered DAS) was discussed, and the decision was made to go with Option 4 rather than Option 1.

4.2. Bug/RFE Number(s)

GLASSFISH-16431 user-managed Clusters

4.3. In Scope

The P1 requirements are "must haves".  The P3 requirements are generally extra credit and will only be permitted if time permits.  The P2 requirements are "nice to have".

4.4. Out of Scope

The following commands will not be supported for user-managed clusters (at least not in their current form).  Even if it is possible to run asadmin commands on an instance in a user-managed cluster (see req. 4.1), these commands will not be available.

  1. start/stop-cluster 
  2. start/stop-instance for the other instances in the cluster
  3. list-instances
  4. list-clusters
  5. create/delete-cluster

4.5. Interfaces

4.5.1 Public Interfaces

CLI

For specifics on the CLI interface changes for this feature, see the design specification.

All of these are local asadmin commands. This means there is no REST or GUI interface for these commands.

4.5.2 Private Interfaces

GMS

The user-managed clusters administration software has an interface with GMS for setting up the cluster. Specifically, this is....

4.5.3 Deprecated/Removed Interfaces

N/A

4.6. Doc Impact

The documentation impact is in the following areas:

  1. New man pages for new asadmin commands in the Reference Manual
  2. New sections in High Availability Administration Guide explaining user-managed clusters.
  3. Administration Guide overview section to introduce user-managed clusters where domains are explained
  4. Deployment Planning Guide to talk about user-managed clusters where clusters are introduced
  5. If req 5.3 is implemented, then the Installation Guide would be impacted.

4.7. Admin/Config Impact

This entire feature is about administration and configuration impact.

4.8. HA Impact

The following areas related to HA are impacted:

  • GMS to support the new ways of forming clusters without multicast
  • Potentially in the IIOP area, to provide the ability to create clusters without each instance having to have the list of the other instances in its domain.xml configuration data.

4.9. I18N/L10N Impact

Moderate, as needed for new admin commands.

4.10. Packaging, Delivery & Upgrade

4.10.1. Packaging

The changes due to this feature are expected to be in the Cluster Admin CLI module, cluster-cli.jar, in the glassfish-cluster IPS package. No new packages will be created.

4.10.2. Delivery

The feature will be delivered with the GlassFish download files that already include the glassfish-cluster package.

4.10.3. Upgrade and Migration

There are no requirements for being able to take any existing instance, cluster or otherwise, and convert it to be an instance in a user-managed cluster. So there is no upgrade or migration aspect of this feature.

4.11. Security Impact

The implementation of requirement 6.1 may have a security impact. It is TBD as to exactly what that impact is.

4.12. Compatibility Impact

If any existing commands, such as start-local-instance, are reused, compatibility will be maintained. 

4.13. Dependencies

4.13.1 Internal Dependencies

TBD

4.13.2 External Dependencies

TBD

4.14. Testing Impact and Dev Tests

The admin devtests will be enhanced with test cases for this feature. A user-managed cluster will be created with multiple instances on a single machine, an application will be deployed to each instance, HA will be tested, etc.

A similar structure can be used for QA testing.  At least one test should be written to demonstrate that the feature can be used in the Amazon AutoScale environment.

The tests from 3.1 that were used to test clustering cannot be used because they depend on the availability of the DAS.  So essentially all clustering tests that perform administration have to be rewritten. 

The admin devtests are already automated in the Hudson environment.

5. Reference Documents

Information on Other Cluster Implementations

6. Schedule

6.1. Projected Availability

See the JIRA issues listed in 4.2.

7. Other Information

Demos

  • To be filled in as demos come up each milestone or as appropriate.

Workspace

This feature is being developed in the following branch:

https://svn.java.net/svn/glassfish~svn/branches/umc

This branch was created from the 3.1.1 branch on June 12, 2011 based on revision 47473.

Research

Notes

Email Alias

Open Issues

  1. What should instances that are part of a user-managed cluster be called? 
  2. Which commands should be supported in these instances? (with floating DAS, everything?)
  3. What's the security model for having 4848 open on every instance with redirection to the floating DAS?
  4. How do you determine which instance is the DAS from start-local-instance?
  5. With option 3, how do you solve all of the "distributed database" problems?
  6. Should the floating DAS be based on storing configuration data in an HA database?
  7. When a DAS fails, sessions as well as the DAS functionality itself have to migrate to another instance.  How is this done in such a way that one instance doesn't get overloaded?