Status: this work will be moved to different future versions.

GMS 3.2 One Pager

1. Introduction

1.1. Project/Component Working Name:
Dynamic Runtime Clustering Support via Shoal Group Management Service

1.2. Name(s) and e-mail address of Document Author(s)/Supplier:
Joe Fialli, joe.fialli@oracle.com
Bobby Bissett, bobby.bissett@oracle.com

1.3. Date of This Document:
05/05/2011 adapt to non-multicast support design change away from group discovery service and towards auto generation of VIRTUAL_MULTICAST_URI_LIST.
04/29/2011 incorporate impact of DAS-less Ad Hoc Clustering
04/25/2011 adding RFEs
04/11/2011 initial pass, extract info from GMS presentation at GlassFish 3.2 kickoff

2. Project Summary

2.1. Project Description:
Shoal Group Management Service provides the following functionality in a GlassFish cluster:
- Other GlassFish application services register handlers for GMS notification events. GMS notifications include events for a clustered member joining, becoming ready or leaving, as well as GroupLeadership change, FailureRecoveryAgent and FailureSuspected.
- get the current member status of any member in the cluster.
- send a message to one, some or all clustered instances in the cluster
  - HA uses this to replicate session data within the cluster
- get a list of Core or All clustered instances in the GlassFish cluster.
The above functionality must continue to work despite changes in the GlassFish 3.2 environment:
- A non-multicast configuration mode is being added to enable GMS to run in environments that do not support UDP multicast or where multicast is undesirable.
- Implementation changes may become necessary due to ad hoc clustering's removal of the DAS.
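As a rough illustration of what non-multicast mode implies, the sketch below simulates a broadcast by issuing one unicast send per known member. It is not the Shoal GMS API: the class name `VirtualMulticastSketch`, the `Sender` callback, and the comma-separated `tcp://host:port` list format are all assumptions made for this example, and the final format of VIRTUAL_MULTICAST_URI_LIST may differ.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: simulate a broadcast over unicast. Not the Shoal GMS API. */
final class VirtualMulticastSketch {

    /** Hypothetical callback standing in for a point-to-point send. */
    interface Sender {
        void send(URI target, String message);
    }

    /** Parses a comma-separated URI list (assumed format), skipping blank entries. */
    static List<URI> parseUriList(String uriList) {
        List<URI> members = new ArrayList<>();
        for (String entry : uriList.split(",")) {
            String trimmed = entry.trim();
            if (!trimmed.isEmpty()) {
                members.add(URI.create(trimmed));
            }
        }
        return members;
    }

    /** Sends the message to every member except the local one; returns the number of sends. */
    static int broadcast(List<URI> members, URI self, String message, Sender sender) {
        int sent = 0;
        for (URI member : members) {
            if (!member.equals(self)) {
                sender.send(member, message);
                sent++;
            }
        }
        return sent;
    }

    public static void main(String[] args) {
        List<URI> members = parseUriList(
                "tcp://10.0.0.1:9090, tcp://10.0.0.2:9090, tcp://10.0.0.3:9090");
        URI self = URI.create("tcp://10.0.0.1:9090");
        int sent = broadcast(members, self, "heartbeat",
                (target, msg) -> System.out.println("send " + msg + " -> " + target));
        System.out.println("unicast sends: " + sent);
    }
}
```

Each simulated broadcast costs one send per remote member, which is why operations of this kind are expected to be cheaper when real UDP multicast is available.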

2.2. Risks and Assumptions:
Risks:
- Ad hoc clustering is eliminating the DAS. The DAS was a member of a GMS group (GlassFish cluster) as a SPECTATOR and the initial GMS Master.
- GMS MASTER_COLLISION_RESOLUTION requires additional testing if multiple GlassFish instances for a single cluster start at once. Simply having a DAS around temporarily for configuring and starting a cluster would remove this risk.
- The namespace context that a DAS provided for a cluster needs to be considered. Two developers on the same subnet could each have a GlassFish cluster with the same name as long as those clusters were relative to different domains. What provides the cluster namespace context in this environment?
- The DAS as the GMS Master was isolated from heavy client load. With no DAS, heavy client load on a GMS Master that is a CORE member of the GlassFish cluster (providing HA replication backing storage) could impact GMS system operations such as sending out GMS notifications and detecting that other clustered instances have failed.
- New environments (both ad hoc clustering and virtual machine environments) and the non-multicast mode introduced in GlassFish 3.2 increase the testing matrix.
Assumptions:
- No need to be able to selectively enable security for GMS. Either all GMS TCP communications are secured or none are. (When TCP communications are secured in non-multicast mode, enabling UDP unicast is highly recommended so that GMS heartbeats do not have to travel over a secure transport.)
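The assumption above can be read as a simple transport selection rule. The following is a minimal sketch of one possible reading, not GMS code: the enum, the method names and the boolean flags are hypothetical, and sending heartbeats over UDP multicast whenever multicast is available is an assumption of this example.

```java
/** Hypothetical sketch of transport selection; names and flags are illustrative only. */
final class GmsTransportChoiceSketch {

    enum Transport { UDP_MULTICAST, UDP_UNICAST, TCP, TCP_SSL }

    /** Point-to-point messages: security is all-or-nothing for TCP traffic. */
    static Transport forMessages(boolean sslEnabled) {
        return sslEnabled ? Transport.TCP_SSL : Transport.TCP;
    }

    /**
     * Heartbeats: prefer UDP multicast (assumed), then UDP unicast if enabled,
     * otherwise fall back to the same TCP transport used for messages.
     */
    static Transport forHeartbeats(boolean multicastAvailable,
                                   boolean udpUnicastEnabled,
                                   boolean sslEnabled) {
        if (multicastAvailable) {
            return Transport.UDP_MULTICAST;
        }
        if (udpUnicastEnabled) {
            return Transport.UDP_UNICAST;
        }
        return forMessages(sslEnabled);
    }

    public static void main(String[] args) {
        // Non-multicast mode with SSL: heartbeats stay off the secure transport
        // only when UDP unicast is enabled.
        System.out.println(forHeartbeats(false, true, true));
        System.out.println(forHeartbeats(false, false, true));
    }
}
```

Under this reading, a secured non-multicast cluster without UDP unicast would carry every heartbeat over SSL, which is the cost the recommendation is meant to avoid.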

3. Problem Summary
Enable GMS to operate in the environments to be supported in GlassFish 3.2. These include virtual machine environments, where the network interface does not have a static IP address as typical server environments do, and environments that do not support UDP multicast, such as cloud computing. See complete technical requirements at [GlassFish 3.2 GMS Technical Requirements|GlassFish+3.2+Group+Management+Service+Requirements].

3.1. Problem Area:
In past releases, GMS has required UDP multicast traffic in order to dynamically find cluster members at runtime. An alternative GMS configuration mode that does not require UDP multicast traffic between clustered instances is being added for this release. Additionally, an asadmin subcommand validate-cluster (placeholder name, subject to change) is being added to validate that this configuration is working properly.

See the risks section for a description of the impact of ad hoc clustering removing the DAS. The DAS has always been the GMS master that performed centralized GMS processing for the cluster, isolated from the client load on the core cluster instances. To minimize the impact of high client load on GMS system processing, some of the centralized GMS processing needs to be decentralized and spread evenly among the clustered instances.

An administrator should be able to enable security for GMS messaging. This option will enable session data replicated within a GlassFish cluster by GMS messaging to be securely transported over the network.

3.2. Justification:
High availability of session data in a GlassFish cluster is implemented using GMS messaging between clustered instances. The HA module for session data relies on GMS notifications of clustered instances joining and leaving the cluster. Other cluster services relying on GMS include IIOP, IIOP load balancer, EJB timer migration, delegated transaction recovery and Metro RM.

4. Technical Description:

4.1. Details:
In prior GlassFish releases, GMS required UDP multicast between the clustered instances comprising a GlassFish cluster. If "asadmin validate-multicast" was not able to validate that multicast traffic was being sent and received between two systems hosting GlassFish clustered instances, GMS was not able to work properly for that GlassFish cluster.

To address GlassFish 3.2 PRD feature CLUST-1 (clustering must not require multicast), a non-multicast configuration mode is being added to GMS. UDP multicast mode will continue to be supported and will be the default mode of operation in environments that support UDP multicast.

New cluster or group management service configuration properties will be added to enable non-multicast broadcast. See GMS non-multicast design for details. The document describes configuration for non-multicast mode and possible optimizations to compensate for the loss of UDP multicast efficiencies.

New cluster or group management service configuration properties will need to be added to support security. The existing ssl element in domain.xml will be leveraged to address GF-14664. This ssl element will be used as a child of cluster or group-management-service to build an SSLConfig Java object, which Grizzly 2.0 allows to be provided as a filter to enable SSL for TCP Point2Point messages. HA only uses unicast for replicating session data, so enabling SSL for TCP results in all replicated session data being secured over network transport. This SSLConfig object is created in cluster/gms-adapter/GMSAdapterImpl initialization, passed as a GMS property for the GMS group, and then used within GMS to set the Grizzly 2.0 filter for enabling SSL on TCP connections.

4.2. Bug/RFE Number(s):
GF-16413 Administrator shall be able to configure a GMS group discovery mechanism for site. Addresses PRD feature: Clust-1 No Multicast.
GF-16414 GMS work without UDP multicast - P1
GMS on Virtual Machine with non-static IP address - P1
Addresses PRD feature: Clust-1 No Multicast
GF-16415 Work through firewalls in hybrid cloud - P2
GF-12056 asadmin validate-cluster using existing GMS configurations - P?
GF-14664 Security RFE: Configure GMS to use SSL for p2p msgs. - P2
GF-14663 Security RFE: Configure GMS to use member authentication. - P3
GF-12194 Monitoring Stats Provider
GF-16418 Minimize degradation of GMS level of Svc when no DAS - P2. Related to PRD feature CLUST-2 Ad Hoc Clusters.
GF-16421 Factor Shoal GMS grizzly transport dependent classes into one for grizzly 1.9 and one for grizzly 2.0. (default is grizzly 2.0)

4.3. In Scope:
See possible optimizations in GMS no-multicast design document.

4.4. Out of Scope:
- Encryption for GMS notifications and heartbeat messages configured to be sent over UDP transport.
- Default GMS failure detection config parameters are tuned for a local subnet. There is no automated tuning to adapt to the network latencies of geo-distributed cluster members or virtual machine environments. Manual configuration of GMS failure detection parameters is required to adapt to these environments in this release.
- While optimizations are going to be made to compensate when UDP multicast is not used to broadcast between instances in the cluster, certain GMS operations will have higher performance and require fewer system resources when using UDP multicast rather than simulating broadcast over a unicast transport.
- Ability to configure some GMS operations/messaging to use security and others to not use it. Security is enabled or disabled for all GMS operations using TCP.

4.5. Interfaces:
// Interfaces may be commands, files, directory structure, ports,
// DTD/Schema, tools, APIs, CLIs, etc.
// Note: In lieu of listing the interfaces in the one pager, providing
// a link to another specification which defines the interfaces
// is acceptable.
GF-12056 asadmin validate-cluster validates GMS configuration for instances in the cluster. - P2 or P3
GF-16417 asadmin get-health in ad hoc clustering env - P3
GlassFish 3.2 GMS Configuration in domain.xml

4.5.1. Public Interfaces:
NOTE: Auto-generation of a cluster's VIRTUAL_MULTICAST_URI_LIST is being considered to replace Group Discovery Service.
Group Discovery Service implemented as a restful service.
Group Discovery Service implemented using Amazon S3.
Shoal GMS in GlassFish 3.2 Discovery service page

4.5.2. Private Interfaces:
// List private interfaces which are externally observable.

4.5.3. Deprecated/Removed Interfaces:
None at this time.

4.6. Doc Impact:
// List any Documentation (man pages, manuals, service guides...)
// that will be impacted by this proposal.
Documentation will need to be updated to describe new GMS configurations for:
- non-multicast mode
- secured TCP communication
- enable unicast UDP for heartbeats when in non-multicast mode
- new asadmin command "validate-cluster", which validates that the current GMS configuration is working for the cluster
- deploying group discovery service
- specifying AWS S3 credentials (the hope is that other GlassFish subsystems will need this info and GMS will acquire it from GlassFish 3.2 commons)
Also, the cluster attributes multicast address and multicast port were required to have a value in GlassFish 3.1. The values were either provided when creating the cluster or generated at cluster creation time. For GlassFish 3.2, these attributes are not required to be set when a non-multicast configuration is provided.

4.7. Admin/Config Impact:
- Administrator can configure a cluster for GMS to use non-multicast mode.
  - For PaaS, the asadmin subcommand create-cluster will get new parameters or a generic cluster property.
  - For ad hoc clustering, the default configuration in the VM template hopefully will be enough. For the EC2 VM template, non-multicast with EC2 auto generation of VIRTUAL_MULTICAST_URI_LIST will be enabled. For non-EC2 environments, TBD.
- Administrator can configure GMS messaging over TCP to be secure (using SSL).
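The relaxed requirement on the multicast attributes could be checked along the following lines. This is only a sketch under stated assumptions: the class `GmsConfigCheckSketch` and the keys `multicast-address` and `multicast-port` are hypothetical stand-ins for the real cluster attributes, and a plain map stands in for the domain.xml configuration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of a cluster configuration check; key names are illustrative only. */
final class GmsConfigCheckSketch {

    static final String MULTICAST_ADDRESS = "multicast-address";      // assumed key
    static final String MULTICAST_PORT = "multicast-port";            // assumed key
    static final String VIRTUAL_LIST = "VIRTUAL_MULTICAST_URI_LIST";  // name used in this document

    /** True when the key is present with a non-blank value. */
    static boolean isSet(Map<String, String> config, String key) {
        String value = config.get(key);
        return value != null && !value.trim().isEmpty();
    }

    /** Returns a list of problems; an empty list means the configuration is acceptable. */
    static List<String> validate(Map<String, String> config) {
        List<String> problems = new ArrayList<>();
        if (isSet(config, VIRTUAL_LIST)) {
            // Non-multicast configuration provided: multicast attributes are optional.
            return problems;
        }
        if (!isSet(config, MULTICAST_ADDRESS)) {
            problems.add("missing " + MULTICAST_ADDRESS);
        }
        if (!isSet(config, MULTICAST_PORT)) {
            problems.add("missing " + MULTICAST_PORT);
        }
        return problems;
    }

    public static void main(String[] args) {
        Map<String, String> config = new HashMap<>();
        System.out.println("multicast mode, nothing set: " + validate(config));
        config.put(VIRTUAL_LIST, "tcp://10.0.0.1:9090,tcp://10.0.0.2:9090");
        System.out.println("non-multicast mode: " + validate(config));
    }
}
```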
Design document: GlassFish 3.2 GMS Configuration in domain.xml
New CLI command: validate-cluster

4.8. HA Impact:
HA requires GMS. GMS relies on UDP multicast by default. If that transport mechanism is not available between clustered instances, then this proposal requires the administrator to configure an alternative mechanism that enables GMS to work without UDP multicast.

4.9. I18N/L10N Impact:
No

4.10. Packaging, Delivery & Upgrade:

4.10.1. Packaging
No.

4.10.2. Delivery
None

4.10.3. Upgrade and Migration:
Upgrade only needs to properly initialize new properties introduced to support non-multicast. Since non-multicast mode did not exist in GlassFish 3.1, these new properties will be initialized to appropriate values so UDP multicast is still enabled.

4.11. Security Impact:
// How does this proposal interact with security-related APIs
// or interfaces? Does it rely on any Java policy or platform
// user/permissions implication? If the feature exposes any
// new ports, Or any similar communication points which may
// have security implications, note these here.
- Introducing capability to specify SSL for TCP connections.
- Introducing capability to specify using UDP unicast instead of TCP for virtual multicast.
- Introducing GMS member authentication. (Now a P3 so probably will not make this release with current schedule.)

4.12. Compatibility Impact
No incompatibilities have been identified at this time.

4.13. Dependencies:

4.13.1 Internal Dependencies
// List all internal dependencies this proposal has on other
// software. Include component version requirements if necessary.
- GlassFish 3.2 commons (to get AWS credentials)
- Admin changes removing the DAS for ad hoc clustering.

4.13.2 External Dependencies
Amazon Web Services (AWS) Java SDK
Shoal GMS 1.6 may require this to run in the Amazon EC2 environment.

4.14. Testing Impact
// How will the new feature(s) introduced by this project be tested?
// Do tests exist from prior releases (e.g. v2) that can be reused?
Existing QE and developer tests from the prior release can be reused to validate the non-multicast configuration. Probably only the QE scripts that configure a cluster or 2 clusters will need to change so that those clusters are configured with non-multicast mode.
Developer and QE scripts for initial configuration and creation of the cluster for the OVM environment and the ad hoc clustering environment need to change. Gathering of GlassFish 3.2 server log files by QE in the OVM and ad hoc clustering environments is impacted and will require adjustments.
// Will new tests need to be written? Can they be automated?
No new GMS notifications are planned for this release, so no new tests are needed for GMS notifications. However, the ability to configure GMS messaging to be sent over secure TCP needs testing. There are no existing GMS QE tests that send messages. This functionality could be tested as part of HA QE testing. The goal of enabling secure TCP for GMS messaging is to ensure that session data (http and ejb) replicated to another instance in the cluster is sent over a secure transport.
Additionally, new tests for GMS monitoring statistics are needed.

5. Reference Documents:
GlassFish 3.2 GMS Configuration in domain.xml
Shoal GMS in GlassFish 3.2 Design Document - non-multicast and neighbor heartbeat failure detection
OLD: Shoal GMS in GlassFish 3.2 Discovery Service Page: This page documents some of the options we had considered for doing group discovery through an external service when multicast was not available. We are moving away from this now (see next link).
Shoal GMS in GlassFish 3.2 Group Discovery Page: This page documents our options for doing GMS group discovery in the environments that we will support.
Shoal GMS in GlassFish 3.2 Requirements

6. Schedule:
See GlassFish 3.2 GMS Project Milestone Schedule

6.1. Projected Availability:
GlassFish 3.2 GMS Project page.