v1.0

1.0 Group Management Service (GMS) module Requirements

Feature # Priority Description Comments Milestone
GMS-1.0.01 P4 Administrators shall be able to configure a GMS group discovery mechanism for a site. Probably being replaced by environment-based automated generation of VIRTUAL_MULTICAST_URI_LIST. Mechanism is used to enable a GMS cluster when UDP multicast is unavailable between clustered instances.
Provide CLI to install a group discovery service as an OS service at a Well Known Address.
Provide CLI to configure VM template to reference a site-wide group discovery mechanism.
Provide CLI to configure S3-based group discovery.
(Derived from GlassFish 3.2 PRD Feature ID CLUST-1)
(See GF-3636 for issue that GMS requires UDP multicast.)
https://github.com/javaee/glassfish/issues/16413
Milestone: 4
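As an illustration of how a discovery list such as VIRTUAL_MULTICAST_URI_LIST might be consumed, here is a minimal sketch. It assumes the property value is a comma-separated list of scheme://host:port URIs; this page does not define the format, and the class name DiscoveryUriList is hypothetical rather than part of Shoal GMS.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper; the comma-separated scheme://host:port format is an assumption. */
final class DiscoveryUriList {
    /** Splits the property value on commas and validates each entry. */
    static List<URI> parse(String value) {
        List<URI> uris = new ArrayList<>();
        for (String part : value.split(",")) {
            String s = part.trim();
            if (s.isEmpty()) {
                continue; // tolerate trailing commas and blank entries
            }
            URI uri = URI.create(s); // throws IllegalArgumentException on bad syntax
            if (uri.getHost() == null || uri.getPort() < 0) {
                throw new IllegalArgumentException("Expected scheme://host:port but got: " + s);
            }
            uris.add(uri);
        }
        return uris;
    }
}
```

Rejecting entries without an explicit host and port keeps a misconfigured list from silently producing an empty bootstrap set.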
GMS-1.0.02 P1 Administrator shall be able to configure a cluster to not require UDP multicast.
Provide new cluster properties for the asadmin create-cluster subcommand to select non-multicast transport instead of the default UDP multicast.
(Derived from GlassFish 3.2 PRD Feature ID CLUST-1)
https://github.com/javaee/glassfish/issues/16414
Milestone: 3
GMS-1.0.03 P2 Administrator shall be able to configure GMS TCP point to point messages to use SSL.
Enables the administrator to secure application session data while it is replicated because session availability is enabled.
Reuse the existing ssl element defined in domain.xml to configure the SSLConfig object used with the Grizzly 2.0 filter.
See GF-14664
Milestone: 5
GMS-1.0.04 P2 Administrators shall be able to configure clustered instances to potentially be separated by a firewall.
Support for hybrid cloud (part private and part public cloud).
Default heartbeat failure detection configuration will require adjustment to account for potentially slower network throughput across the firewall.
https://github.com/javaee/glassfish/issues/16415
Milestone: 6
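The heartbeat adjustment mentioned for GMS-1.0.04 can be sketched with a simple timeout-based detector. This is only an illustration under assumed semantics (a member is suspected after a configurable number of missed heartbeat periods); the class HeartbeatDetector and its parameters are hypothetical and do not reflect the Shoal GMS implementation.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical timeout-based failure detector; not the Shoal GMS implementation. */
final class HeartbeatDetector {
    private final long periodMillis; // expected interval between heartbeats
    private final int maxMissed;     // missed heartbeats tolerated before suspecting a member
    private final Map<String, Long> lastSeen = new ConcurrentHashMap<>();

    HeartbeatDetector(long periodMillis, int maxMissed) {
        this.periodMillis = periodMillis;
        this.maxMissed = maxMissed;
    }

    /** Records a heartbeat received from a member at the given time. */
    void onHeartbeat(String member, long nowMillis) {
        lastSeen.put(member, nowMillis);
    }

    /** A member is suspected once more than maxMissed periods pass without a heartbeat. */
    boolean isSuspected(String member, long nowMillis) {
        Long seen = lastSeen.get(member);
        if (seen == null) {
            return false; // never heard from this member, so it is not tracked
        }
        return nowMillis - seen > periodMillis * maxMissed;
    }
}
```

With a 2000 ms period, raising maxMissed from 3 to 6 doubles the silence tolerated before a member is suspected, which is the kind of tuning a slower link across a firewall might need.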
GMS-1.0.05   (previously deleted)
   
GMS-1.0.06 P2 GMS Monitoring Stats Provider
Monitor GMS messaging by target component.
Monitor heartbeat overhead.
Monitor GMS notifications and how often a rebroadcast was necessary due to a dropped UDP multicast message.
See GF-12194 .
Task will be ongoing. Some stats will be ready earlier -- milestone date is for all stats.
Milestone: 5
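The rebroadcast statistic described for GMS-1.0.06 could be kept with a pair of thread-safe counters. The sketch below is an assumption about shape only; GmsMulticastStats and its method names are hypothetical and are not the GlassFish monitoring API.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical counters for GMS multicast monitoring; names are illustrative only. */
final class GmsMulticastStats {
    private final AtomicLong sent = new AtomicLong();
    private final AtomicLong rebroadcasts = new AtomicLong();

    /** Call once per multicast message sent. */
    void recordSend() {
        sent.incrementAndGet();
    }

    /** Call once each time a message had to be rebroadcast after a drop. */
    void recordRebroadcast() {
        rebroadcasts.incrementAndGet();
    }

    long getRebroadcasts() {
        return rebroadcasts.get();
    }

    /** Fraction of sends that needed a rebroadcast; 0.0 when nothing has been sent. */
    double rebroadcastRatio() {
        long s = sent.get();
        return s == 0 ? 0.0 : (double) rebroadcasts.get() / s;
    }
}
```

A ratio that climbs under load is the signal an administrator would look for when deciding whether the UDP buffer is too small.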
GMS-1.0.07 P3 Administrator shall be able to create up to 10 instances in a cluster
No known issues supporting this with or without multicast.

GlassFish 3.1 GMS QE tests for 9 instances due to limited machine resources.
(Derived from GlassFish 3.2 PRD Feature ID CLUST-3 Scalable Clusters)
 
GMS-1.0.08 P3 Administrator shall be able to create 20 clusters in a domain. This is a non-issue when DAS is not running and none of the clusters are experiencing high application load.

However, if DAS is running and is master for all 20 clusters, need to investigate this scenario for both UDP multicast and non-multicast configurations. Redesign of centralized master processing may be needed to support this case properly.
(Derived from GlassFish 3.2 PRD Feature ID CLUST-3 Scalable Clusters)

GlassFish 3.1 GMS QE has scenarios for 2 clusters.
 
GMS-1.0.09 P2 Administrator shall be able to configure heartbeats to be sent over UDP unicast transport when multicast is disabled.
https://github.com/javaee/glassfish/issues/16416
Milestone: TBD
GMS-1.0.10 P3 asadmin get-health needs to work in a self-configuring (ad hoc) cluster environment.
Derived from Cluster Management Requirement 1.0.7.
In GlassFish 3.1, this command runs only against the DAS, and there is no DAS in a self-configuring cluster environment.
Other commands that run against the DAS (list-instance, start-cluster) are non-requirements.
So we need to evaluate whether this command should be implemented for self-configuring clusters.

Health info could be stored in GMS master.
Command needs to be able to locate the master via cluster name.
(investigate means to associate clustername with configuration info such as GMS_DISCOVERY_URI_LIST)
(Derived from GF 3.2 PRD Feature ID CLUST-2 Ad Hoc Clusters)
https://github.com/javaee/glassfish/issues/16417
 
GMS-1.0.11 P1 New Heartbeat Failure Detection implementation optimized for non-multicast and no DAS
Self-configuring cluster case. (Note: This item's priority should track the priority of self-configuring clusters.)
https://github.com/javaee/glassfish/issues/16418
Milestone: 5
GMS-1.0.12 P3  Support for OS configured with IPv6 only
When multicast is not disabled, an IPv4-format multicast address is generated. This does not work in an IPv6-only environment.
See the workaround described in GF-16103
(specify an explicit IPv6 multicast address using the --multicastaddress parameter to create-cluster).
Typically, one runs in a dual-stack environment, and IPv4-mapped addresses over IPv6 sockets are the Java default for working in these environments.
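Before passing an explicit multicast address to create-cluster in an IPv6-only environment, it can help to confirm that the value really is an IPv6 multicast address. The following is a minimal sketch using only java.net; the class name MulticastAddressCheck is hypothetical and this check is not part of GMS or asadmin.

```java
import java.net.Inet6Address;
import java.net.InetAddress;
import java.net.UnknownHostException;

/** Hypothetical validation helper; not part of GMS or asadmin. */
final class MulticastAddressCheck {
    /**
     * Returns true if the literal parses as an IPv6 address in the multicast
     * range (ff00::/8). Pass an address literal; a host name would trigger a
     * DNS lookup inside InetAddress.getByName.
     */
    static boolean isIpv6Multicast(String literal) {
        try {
            InetAddress addr = InetAddress.getByName(literal);
            return addr instanceof Inet6Address && addr.isMulticastAddress();
        } catch (UnknownHostException e) {
            return false; // not a valid address
        }
    }
}
```

An IPv4 multicast address fails this check even though it is a valid multicast address, which matches the failure mode described above.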
 
GMS-1.0.13 P2 Virtual multicast optimization to send messages concurrently.
The only reason to wait for a send to complete is to be notified of failed delivery.
With Grizzly 2.0 using async send, it should be easy to have a no-wait mode for delivery.
Point-to-point messages could be sent synchronously, while unicast sends that are part of a broadcast could be sent without waiting for the send to complete.
https://github.com/javaee/glassfish/issues/16419
Milestone: 5
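The split between synchronous point-to-point sends and no-wait broadcast sends in GMS-1.0.13 can be sketched as follows. This is a minimal sketch that substitutes java.util.concurrent.CompletableFuture for Grizzly 2.0 async send; the class VirtualMulticast and the unicastSend callback are hypothetical stand-ins for the real transport.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.function.BiConsumer;

/** Hypothetical sketch of virtual multicast over unicast; not the Shoal or Grizzly API. */
final class VirtualMulticast {
    private final BiConsumer<String, byte[]> unicastSend; // blocking send to one member
    private final Executor executor;

    VirtualMulticast(BiConsumer<String, byte[]> unicastSend, Executor executor) {
        this.unicastSend = unicastSend;
        this.executor = executor;
    }

    /** Point-to-point: block until the send returns so a failed delivery reaches the caller. */
    void sendTo(String member, byte[] message) {
        unicastSend.accept(member, message);
    }

    /** Broadcast: start one send per member and return without waiting for any of them. */
    List<CompletableFuture<Void>> broadcast(List<String> members, byte[] message) {
        List<CompletableFuture<Void>> pending = new ArrayList<>();
        for (String member : members) {
            pending.add(CompletableFuture.runAsync(
                    () -> unicastSend.accept(member, message), executor));
        }
        return pending;
    }
}
```

Returning the futures lets a caller that does care about delivery failures inspect them later, while the common broadcast path never blocks on a slow member.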
GMS-1.0.14 P1 New GMS configuration info on cluster and group-management-service element in domain.xml
Cluster properties DISCOVERY_REGISTRATION_URI_LIST, VIRTUAL_MULTICAST_URI_LIST
New heartbeat failure detection implementation may need alternative configuration parameters. (given different algorithm) (unknown at this point)
SSL configuration for GMS TCP.
GMS Member authentication.
(Impacts GMSAdapterImpl config processing, asadmin create-cluster subcommand parameters)
If using S3 for group discovery, users will need to be able to specify AWS credentials as well. This should probably live in a separate file (e.g. a properties file in config directory).
https://github.com/javaee/glassfish/issues/16420
Can be done before MS4, but without GMS-1.0.01 there is nothing to test.
Milestone: 3
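For the S3 credentials file suggested under GMS-1.0.14, a properties-file reader might look like the sketch below. The property keys aws.accessKey and aws.secretKey and the class AwsCredentialsFile are assumptions made for illustration; this page does not define the file name or key names.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.Properties;

/** Hypothetical reader for an S3 credentials properties file; key names are assumed. */
final class AwsCredentialsFile {
    final String accessKey;
    final String secretKey;

    private AwsCredentialsFile(String accessKey, String secretKey) {
        this.accessKey = accessKey;
        this.secretKey = secretKey;
    }

    /** Loads both keys and fails fast if either one is missing. */
    static AwsCredentialsFile load(Reader in) {
        Properties props = new Properties();
        try {
            props.load(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        String access = props.getProperty("aws.accessKey"); // assumed key name
        String secret = props.getProperty("aws.secretKey"); // assumed key name
        if (access == null || secret == null) {
            throw new IllegalArgumentException("Both aws.accessKey and aws.secretKey are required");
        }
        return new AwsCredentialsFile(access.trim(), secret.trim());
    }
}
```

Keeping the secret in a separate file rather than in domain.xml cluster properties lets its file permissions be restricted independently.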
GMS-1.0.15 P3 GMS Member authentication when member is joining.
See GF-14663.  This would be an optional capability that would need to be configured by GlassFish administrator.
 
GMS-1.0.16 P1 Factor Shoal GMS grizzly transport dependent classes into shoal-gms-grizzly-1_9.jar and shoal-gms-grizzly-2_0.jar.
Completes the transition from Grizzly 1.9 to Grizzly 2.0 for the Shoal GMS impl jar. When completed, shoal-gms-grizzly-2_0.jar would be integrated into the GlassFish 3.2 branch. Currently, Shoal GMS Grizzly transport support for both 1.9 and 2.0 is in shoal-gms-impl.jar, which is integrated with the GlassFish 3.2 workspace.
https://github.com/javaee/glassfish/issues/16421
Milestone: 3
GMS-1.0.17 P3 Support secure communications with discovery service.
With our own REST implementation of the discovery service, it would be nice to have it be secured so that only GMS members can talk to it and the information is confidential. But this isn't necessary for having GMS work, and the service can live behind the firewall. The only information stored there is the location of the group master.
With help from the Jersey team, this may be easy to implement. But it's lower priority than getting everything working.
 

1.1 GMS module NON REQUIREMENTS

# Description Comments
GMS-1.1.1 Will not support network interface changes while cluster is running.
GMS will identify IPv4-only, IPv6-only, or dual-stack mode at startup and work in that mode.
GMS-1.1.2 Will not support network fragmentation that isolates group discovery service from clustered instances.
There should be a group discovery service per subnet if it is a concern that routers/network can fail.
GMS-1.1.3 Will not support hybrid approach of using UDP multicast within subnets and unicast between subnets.
Simplification to consider all members reachable by UDP multicast or to consider all members communicating via unicast.
GMS-1.1.4 Will not support multiple GMS clustered instances being started concurrently in the no-DAS case (self-configuring clusters) when no instances are running.
The DAS was always the first instance started in a cluster, so it became master before any other clustered instance started. As a result, the GMS Master Collision Resolution technique has never been tested in a GlassFish environment, and no tests exist for this case. Supporting it would require new tests, and this is the kind of concurrent, non-repeatable scenario that works most of the time and fails intermittently. Hence this simplifying non-requirement. Ensuring a 2 second delay between starting the first instance in a cluster and starting the next one avoids the chance of issues in the master collision resolution algorithm.
GMS-1.1.5 Will not support hybrid list of master discovery services.
It is not expected that the GMS_DISCOVERY_REGISTRY_URI_LIST cluster property will contain the address of one or more REST discovery services and an Amazon S3 service. We will not try to synchronize our master discovery information with an S3 bucket.

1.2 GMS module Dependencies on other subsystems

# Description Comments
GMS-1.2.1 Each instance name in a cluster must be unique.
Providing a requirement for Self-configuring clusters.
GMS-1.2.2 A DAS API to provide a UUID for a GlassFish domain.
Provides scoping for cluster name when DAS is present.

In GlassFish 3.1, clusters are scoped to a domain. One can have the same cluster name across multiple domains.
The intended use for this UUID is to provide a namespace scoping for clusters in the GMS Group Discovery mechanism.
P1 issues GMS-1.0.01 and GMS-1.0.02 require this.
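The namespace scoping that the domain UUID provides can be shown with a small sketch. It assumes the group discovery mechanism keys groups by a string built from the domain UUID and the cluster name; the class ScopedGroupName and the "/" separator are hypothetical choices for illustration.

```java
import java.util.UUID;

/** Hypothetical key builder; the UUID/clusterName layout is an assumption. */
final class ScopedGroupName {
    /** Builds a discovery key that is unique per domain even when cluster names repeat. */
    static String of(UUID domainId, String clusterName) {
        if (clusterName == null || clusterName.isEmpty() || clusterName.contains("/")) {
            throw new IllegalArgumentException("Invalid cluster name: " + clusterName);
        }
        return domainId + "/" + clusterName;
    }
}
```

Two domains that both define a cluster named cluster1 then register under different keys, so a site-wide discovery service does not merge them into one group.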
GMS-1.2.3 A Self-Configuring Cluster API to provide a namespace scoping for cluster names.
Scoping for cluster name when no DAS.

Alternative is that the GMS group discovery URI provides the namespace context.  For example,
one would specify an option to "asadmin get-health" that would provide the context to evaluate
the get-health clustername within. Using this approach, the Group Discovery service would not be able to be site wide.
P1 issues GMS-1.0.01 and GMS-1.0.02 require this.
GMS-1.2.4 Ability to associate GMS configuration info with cluster name for self-configuring clusters Uncertain if this info belongs in VM template or is specified at asadmin create-cluster time and associated with the cluster.
When there is no multicast, there needs to be a GROUP_REGISTRATION_URI_LIST for GMS  to dynamically locate its group.
Additionally, default GMS heartbeat failure detection parameters may need adjusting in a VM environment to reflect additional processing or network latencies that may occur in virtual machines. Such an observation was made in the GlassFish v2.1 timeframe (over JXTA); see GF-5827 for context.
GMS-1.2.5 Assistance from security team on how to incorporate GlassFish authentication into Shoal GMS membership join.
The high-level idea is that an authentication class, and perhaps a token, is passed in as a Shoal GMS property. Shoal GMS would use it as a pluggable authenticator to authenticate a GMS member when it tries to join the group.
GMS-1.2.6 Request a preference to not select GMS Master as instance to stop when elasticity manager is shrinking the cluster.
Unsure whether this request is achievable. The GMS Master is definitely not a single point of failure, so nothing breaks if this recommendation cannot be honored. However, if the GMS Master were always the longest-running cluster member and the EC2 environment favored stopping the longest-running instance, the GMS Master would thrash (it would constantly have to migrate).
GMS-1.2.7 Assistance from Jersey team.
We will need help from Jersey team for adding security to our discovery service implementation. May need help as well to implement the S3 API using Jersey so we don't depend on Amazon (3rd party) AWS client library.
GMS-1.2.8 In a virtual environment, orderly shutdown of the GlassFish app server, with the GlassFish event PREPARE_SHUTDOWN being sent, is required.
GMS registers a GlassFish event handler for PREPARE_SHUTDOWN that makes the instance leave the GMS group in an orderly way, with the GMS notification PLANNED_SHUTDOWN sent to all running clustered instances. If the virtual machine for the clustered instance is simply shut down AND the GlassFish event PREPARE_SHUTDOWN is not generated, GMS will probably detect the instance leaving as a FAILURE. (Failure detection may take a while if attempting to create a TCP socket to a VM that was shut down behaves like connecting to a powered-off machine.)
GMS-1.2.9 Some way to access AWS S3 credentials.
For working in AWS, users will need to be able to specify their credentials. Presumably, other subsystems besides GMS will need this kind of information. GMS will only need the S3 information: access key and secret key. (If no other system needs any user AWS information, then this data could be in a properties file in domain/config, but we would need to coordinate with the admin team on this.)

2.0 List of Metrics

Feature # Priority Description Comments
GMS-2.0.1 P1 Support for monitoring rebroadcasted UDP messages
Helps the administrator identify when the UDP buffer is too small.
GMS-2.0.X   To Be Identified

3.0 List of Actions that can be specified / taken by administrator

Feature # Priority Description Comments
GMS-3.0.1 P1 Start and stop a group discovery service.
 
GMS-3.0.2 P1 Configure cluster to use group discovery service.
Could be a Cluster property provided to asadmin subcommand create-cluster.
GMS-3.0.3 P2 Configure UDP unicast to be used as virtual broadcast transport.

GMS-3.0.4 P2 Configure SSL for GMS TCP transport.
Provides administrator capability to secure application session data when HA is enabled.
GMS-3.0.5 P2 Validate GMS configuration using "asadmin validate-gms-cluster".
Validate GMS configuration for instances in a cluster. Ensures that all cluster members are able to see each other. Relates to GF-12056
GMS-3.0.6 P2 Configure GMS member authentication for a cluster.
Configured at the cluster element level.  Could be generic cluster properties or a child element of cluster.

4.0 Support for GMS with no multicast on various Virtualization Providers and Host Operating Systems

PaaS Case

Provider | N/A | OEL 64 bit | Linux | Amazon Linux AMI [1] | Solaris 11 | Windows | Mac | AIX | Comments
No virtualization | - | P1 | P1 |   | - | P? | - | - |
Oracle VM 2.2 | - | P1 | - |   | - | - | - | - |

With Ad-Hoc Clusters

Provider | N/A | OEL 64 bit | Linux | Amazon Linux AMI [1] | Solaris 11 | Windows | Mac | AIX | Comments
No virtualization | - | P1 | P1 |   | - | P? | - | - | Will work with manually created bootstrap member list from user.
Amazon EC2 | - | P2 | - | P1 | - | - | - | - | Will automatically generate bootstrap member list through EC2-specific mechanism.
Oracle VM 2.2 | - | - | - |   | - | - | - | - | Not supported except through manually created bootstrap member list if OVM supports static IP addresses or dynamic DNS. (Would work the same as "No virtualization" case above.)

[1] - Amazon Linux 32/64 bit AMI with GlassFish 3.2 provisioned.  (derived from EC2 AutoScale Requirements for Self-Configuring Clusters)