GlassFish Server Open Source Edition 3.1 - Dynamic Reconfiguration

This document details how dynamic reconfiguration will work when cluster support is added to GlassFish 3.1.

  • Jerome Dochez (dochez@dev.java.net)
  • Vijay Ramachandran (vijaysr@dev.java.net)

Scope of the Project

The dynamic reconfiguration feature for clustering in GlassFish Server Open Source Edition 3.1 covers how CLI commands sent from the user to the DAS (for example, asadmin create-jdbc-resource some-options --target some-target) are reflected in the target server instance (if the user-specified target is a server instance) or in all server instances that are part of the cluster (if the target is a cluster).

| Feature ID | Priority | Description | Eng Response | Owner(s) | Estimate (Man Days) | Source of Requirement | Status / Comments |
| DYREC-001 | P1 | Provide infrastructure to apply configuration changes dynamically (--target option in CLI) across a cluster | Yes | Vijay, Jerome, Sheetal | | Feature parity | Started; Issue 12030 |
| DYREC-002 | P1 | Show 'restart required' status for each server instance. If a server instance needs to be restarted, this status will reflect that | Yes | Vijay | | Feature parity | This status should survive DAS restarts until the server instance is synchronized with the DAS; if the DAS is unavailable during restart, synchronization will be skipped. Issue 12034 |
| DYREC-003 | P1 | Support for the dynamic-reconfiguration-enabled=false flag. This is needed for rolling upgrade | Yes | Vijay | | Feature parity | Issue 12035 |
| DYREC-004 | P1 | Clearly show status at the end of each operation. For all server instances where dynamic reconfiguration failed, identify the instances and suggest next steps for the user to recover from the error (for example, restarting the server instances) | Yes | Vijay | | Feature parity | Issue 12031 |
| DYREC-005 | See issue | Provide support (status, --target) for infrastructure-related CLIs | Yes | Vijay | | Feature parity | list-* commands should show status such as running, restart required, etc.; we need to add that support. Issue 12038 |
| DYREC-006 | See issue | Show a list of changes that were not dynamic and/or failed for some reason during dynamic reconfiguration. This list will help system administrators determine why a server instance should be restarted | Yes | Vijay | | | Issue 12037 |
| DYREC-008 | See issue | Provide detailed diagnostics to debug dynamic-reconfiguration-related failures in customer environments | Yes | Vijay | | Feature parity | Issue 12036, Issue 12039 |

High Level Overview of the Features

To effect the user-requested changes (such as creating resources, as mentioned above) on a server instance or all instances of a cluster, the following steps have to happen.

  1. Execute the command on the DAS and ensure that it goes through. If the command succeeds on the DAS, then:
  2. Replicate the same command on the target server instance, or on all server instances that are part of the specified cluster, reporting progress status back to the client.
  3. Ensure that all replicated commands complete successfully. If there are errors, report them back to the user while marking the instances where the command failed as requiring restart. Handle all error scenarios, differentiating complete failure of a command (which may trigger a rollback) from partial failure (the command failed on only a small subset of targets, which may not require a rollback). In failure scenarios, ensure that information on what failed is clearly available.
  4. Ensure that, when commands are replicated successfully on all instances, a subsequent restart of any instance does not result in unnecessary synchronization (one scenario is a mismatch between the domain.xml timestamps of the DAS and the instance).
  5. Take care of supplemental commands (commands that cannot be replicated exactly as they were run on the DAS; for example, creation of references on instances will be done as a supplemental command to the deploy command - click here for more information on this example).
  6. From a user perspective, the behavior from the CLI/console should remain the same except for the need to specify the target cluster or instance for the command. The CLI/console will continue to talk to the DAS (just as in GlassFish v3), and the DAS will decide whether to replicate the command (depending on the command, the RuntimeType specified, etc.). The command replication infrastructure will ensure that instances obey only commands from the DAS; direct commands to instances from anywhere else will not be obeyed.

The first step listed above already happens in GlassFish 3.0 and is common to clustered and standalone environments. Since the remaining steps are kicked off only after step 1 completes successfully, the rest of this document covers the details of step 2 onwards.
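The steps above can be sketched in code as a DAS-first flow. All class and method names below (CommandFlowSketch, runOnDas, replicateToTargets) are illustrative placeholders for this document's design, not GlassFish APIs:

```java
// Hypothetical sketch of the DAS-first execution flow described above.
enum ExitCode { SUCCESS, FAILURE }

class CommandFlowSketch {

    ExitCode runOnDas(String command) {
        // Placeholder for real DAS-side execution (step 1).
        return ExitCode.SUCCESS;
    }

    ExitCode replicateToTargets(String command, String target) {
        // Placeholder for the replication machinery described later (steps 2-3).
        return ExitCode.SUCCESS;
    }

    ExitCode execute(String command, String target) {
        // Step 1: run on the DAS first; replication happens only if this succeeds.
        if (runOnDas(command) == ExitCode.FAILURE) {
            return ExitCode.FAILURE;        // nothing was sent to instances yet
        }
        // Steps 2 onwards: replicate to the target instance or all cluster instances.
        return replicateToTargets(command, target);
    }
}
```

The key point the sketch captures is that replication is strictly gated on DAS success, so a DAS-side failure never leaves instances in a partially updated state.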

Implementation details

TBD: The API references below point to internal javadoc - they need to point to an external javadoc - how?
A new set of annotations and interfaces has been added to GlassFish 3.1 to implement the dynamic reconfiguration feature. The implementation approach is as follows:

Hooking into GlassFish 3.0

This section gives a high level view of how new code for dynamic reconfiguration will hook into / affect existing (GlassFish 3.0) command execution code on the server side.

  • Every CLI command should annotate itself with @ExecuteOn and specify the command's RuntimeType as part of @ExecuteOn. If the command is expected to execute only on the DAS, the RuntimeType should be DAS. If the command has to execute on the DAS and be replicated to applicable instances, the RuntimeType for @ExecuteOn should be set to DAS and INSTANCE.
  • If a command does not have @ExecuteOn, the implementation will treat it as having RuntimeType set to DAS and INSTANCE. This way, none of the existing non-local commands in GlassFish 3.0 have to be changed, and they will be executed on the DAS and all applicable server instances.
  • The RemoteCommand facility, currently used for communication between the user/CLI and the DAS, will be used (after some refactoring) for communication between the DAS and instances during command replication.
  • The GlassfishClusterExecutor will be plugged into the current server-side command execution as follows:
    • Currently, a non-local CLI command reaches the server through the AdminAdapter and, after authentication and other checks, flows into CommandRunnerImpl.doCommand, where parameters are injected and the actual command is executed.
    • Once the command has executed successfully, if the server environment is RuntimeType.DAS, control passes to the ClusterExecutor instead of returning. Details of the ClusterExecutor are in the next section.
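The annotation contract above can be illustrated with a self-contained sketch. The enum and annotation here are simplified stand-ins for the real org.glassfish.api.admin types, and the two command classes are hypothetical examples, not actual GlassFish commands:

```java
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;

// Simplified stand-ins for the GlassFish admin API types described above.
enum RuntimeType { DAS, INSTANCE }

@Retention(RetentionPolicy.RUNTIME)
@interface ExecuteOn {
    // Default mirrors the rule above: no explicit @ExecuteOn means DAS and INSTANCE.
    RuntimeType[] value() default { RuntimeType.DAS, RuntimeType.INSTANCE };
}

// Hypothetical command that must run on the DAS and be replicated to instances.
@ExecuteOn({ RuntimeType.DAS, RuntimeType.INSTANCE })
class CreateJdbcResourceCommand {
    // ... command implementation ...
}

// Hypothetical command that runs on the DAS only (never replicated).
@ExecuteOn(RuntimeType.DAS)
class ListDomainsCommand {
    // ... command implementation ...
}

public class ExecuteOnDemo {
    public static void main(String[] args) {
        ExecuteOn meta = CreateJdbcResourceCommand.class.getAnnotation(ExecuteOn.class);
        System.out.println(java.util.Arrays.toString(meta.value()));
    }
}
```

At runtime, the command runner would read this annotation (or apply the default) to decide whether to hand the command to the ClusterExecutor after DAS-side execution.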

Command Replication using ClusterExecutor

  • GlassFish 3.1 will have an implementation of the ClusterExecutor interface, the GlassfishClusterExecutor. This will be the default ClusterExecutor implementation, used whenever @ExecuteOn does not specify a ClusterExecutor type (which in turn means the GlassfishClusterExecutor will perform command replication for all GlassFish 3.0 remote commands).
  • As detailed in the previous section, control reaches the ClusterExecutor only on the DAS - not on the instances.
  • It will be the job of the cluster executor to
    • decode the target type and extract all instances on which the command has to be replicated
    • change the parameters (if required)
    • add the DAS timestamp to the parameters and ensure that the instance has an up-to-date timestamp for its domain.xml
    • send the command to all instances
    • collect progress status / results from the instances to which the command was sent
    • take care of errors (more details on this in the next section)
    • report success / errors to the user
    • upon successful completion, execute supplemental commands
    • in case of failures, set the restart-required state as appropriate
    • in case of failures, if the command is an UndoableCommand, try to undo the command
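The executor's core loop might look like the following sketch. The types, method names, and the dasDomainXmlTimestamp parameter are illustrative assumptions, not the real GlassFish signatures:

```java
import java.util.*;

enum ExitCode { SUCCESS, FAILURE }

// Hypothetical per-instance result holder.
record InstanceResult(String instance, ExitCode code, String message) {}

class ClusterExecutorSketch {

    InstanceResult sendToInstance(String instance, String command, Map<String, String> params) {
        // Placeholder: the real code reuses the refactored RemoteCommand facility.
        return new InstanceResult(instance, ExitCode.SUCCESS, "ok");
    }

    ExitCode replicate(String command, Map<String, String> params,
                       List<String> instances, long dasDomainXmlTimestamp) {
        // Carry the DAS domain.xml timestamp so instances can keep theirs in sync.
        Map<String, String> p = new HashMap<>(params);
        p.put("dasDomainXmlTimestamp", Long.toString(dasDomainXmlTimestamp));

        List<InstanceResult> failures = new ArrayList<>();
        for (String instance : instances) {
            InstanceResult r = sendToInstance(instance, command, p);
            if (r.code() == ExitCode.FAILURE) {
                failures.add(r);    // these instances get restart-required / undo handling
            }
        }
        // Report success only if every instance accepted the command.
        return failures.isEmpty() ? ExitCode.SUCCESS : ExitCode.FAILURE;
    }
}
```

The collected failure list feeds the error handling described in the "Command replication results and action taken" section: per-instance reporting, restart-required marking, and optional undo.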

Supplemental commands

Supplemental commands are commands that are specified to be executed upon successful completion of a CLI command. For example, deployment in a cluster will use a supplemental command to deploy the bits to the various instances, as explained here. The GlassfishClusterExecutor will look up the habitat to see whether supplemental commands have been registered for the current CLI command.

  • If supplemental commands are specified for execution before CLI execution (by using Supplemental.Before), the supplemental command will be executed on all nodes, and the primary CLI command will be executed if and only if the supplemental command's FailurePolicy is FailurePolicy.Ignore or FailurePolicy.Warn.
  • The behaviour of supplemental commands that are specified to be executed after the primary CLI command (and the effect of supplemental command execution on the overall command replication result) is detailed in the table in the next section.
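To illustrate the contract described above, here is a simplified sketch. The attribute names (on, ifFailure) and the CreateApplicationRefSupplemental class are assumptions for illustration only, not the exact org.glassfish.api.admin API:

```java
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;

// Illustrative stand-ins for the Supplemental / FailurePolicy contract.
enum FailurePolicy { Error, Warn, Ignore }
enum When { Before, After }

@Retention(RetentionPolicy.RUNTIME)
@interface Supplemental {
    String value();                                     // name of the primary command being supplemented
    When on() default When.After;                       // run before or after the primary command
    FailurePolicy ifFailure() default FailurePolicy.Error;
}

// Hypothetical example: runs after a successful "deploy";
// a failure here is reported only as a warning, not an overall failure.
@Supplemental(value = "deploy", on = When.After, ifFailure = FailurePolicy.Warn)
class CreateApplicationRefSupplemental {
    // ... supplemental command implementation ...
}

public class SupplementalDemo {
    public static void main(String[] args) {
        Supplemental s = CreateApplicationRefSupplemental.class.getAnnotation(Supplemental.class);
        System.out.println(s.value() + " / " + s.on() + " / " + s.ifFailure());
    }
}
```

With When.Before, the executor would consult ifFailure before deciding whether the primary command may proceed, matching the rule in the first bullet above.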

Undoable commands

Undoable commands give the command implementor a way to rollback a failed command execution. The table in the next section details the behaviour of command replication if the command implementor chooses to implement the UndoableCommand interface.
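A minimal sketch of the rollback idea follows; the method names and the helper class are illustrative (the real UndoableCommand interface may differ):

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative shape of an undoable command.
interface UndoableCommand {
    boolean execute(String instance);   // true on success
    void undo(String instance);         // compensate on an instance where execute succeeded
}

// Hypothetical replication helper showing how undo would be driven.
class ReplicatorWithRollback {

    /** Run the command on each instance; if any fail, undo the ones that succeeded. */
    static boolean runAll(UndoableCommand cmd, List<String> instances) {
        List<String> succeeded = new ArrayList<>();
        boolean allOk = true;
        for (String inst : instances) {
            if (cmd.execute(inst)) {
                succeeded.add(inst);
            } else {
                allOk = false;          // keep going so every failure can be reported
            }
        }
        if (!allOk) {
            for (String inst : succeeded) {
                cmd.undo(inst);         // best-effort rollback; undo failures are also reported
            }
        }
        return allOk;
    }
}
```

This mirrors the table in the next section: on partial failure, the undo runs only on the instances where the command originally succeeded.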

Command replication results and action taken

The table below details the action taken upon completion of command replication in various scenarios.

| Result of Command Replication | Is command undoable? | Supplemental commands present? | Overall result | Action taken |
| Failure on DAS | Does not matter | Does not matter | Failure | |
| Success on all instances | Does not matter | None | Success | Report success |
| Success on all instances | Does not matter | Supplemental.After | Result of supplemental command(s) execution and its FailurePolicy | Merged report of primary and supplemental command execution |
| Failure on all instances | Does not matter | Does not matter | Failure | Report failure; try to make the error report as specific as possible (network error, instances down, etc.) |
| Failure on one or more instances | No | No | Failure | Report the reason for each command failure; set restart-required |
| Failure on one or more instances | No | Supplemental.After | Failure | Execute the supplemental command only on the instances where the primary command succeeded; report the reason for each command failure; set restart-required |
| Failure on one or more instances | Yes | No | Failure | Try to undo the command on instances where it succeeded; report the reason for each command failure; if the undo fails, report that as well |
| Failure on one or more instances | Yes | Supplemental.After | Failure | Do not execute the supplemental command anywhere; try to undo the command on instances where it succeeded; report the reason for each command failure; if the undo fails, report that as well |
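The table can also be read as a small decision routine. The enum, method, and action strings below are illustrative only, chosen to paraphrase the table's rows:

```java
// Illustrative encoding of the replication-results table.
enum Outcome { SUCCESS_ALL, FAILURE_SOME, FAILURE_ALL }

class ReplicationPolicy {

    static String actionFor(Outcome outcome, boolean undoable, boolean supplementalAfter) {
        switch (outcome) {
            case SUCCESS_ALL:
                return supplementalAfter
                        ? "run supplemental commands; merge primary and supplemental reports"
                        : "report success";
            case FAILURE_ALL:
                return "report failure with a cause as specific as possible";
            case FAILURE_SOME:
                if (undoable) {
                    // Per the last two rows: supplemental commands are skipped
                    // entirely when an undo is attempted.
                    return "undo on succeeded instances; report each failure";
                }
                return supplementalAfter
                        ? "run supplemental only where primary succeeded; set restart-required"
                        : "report each failure; set restart-required";
        }
        throw new AssertionError("unreachable");
    }
}
```

Centralizing the decision like this keeps the per-row behaviour testable and makes it obvious that the undoable path always suppresses supplemental execution.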

Securing DAS-Instance communications

TBD: How the DAS will authenticate itself with an instance while doing command replication; whether we use SSL by default for communicating with instances or follow the way the admin adapter is configured; currently investigating various possibilities.

Milestone Schedule

Note: for deliverables that have a link to an issue in the status column, see IssueTracker for the status.

| Item # | Date/Milestone | Feature ID | Description | QA/Docs Handover | Status | Comments |
| 01. | MS1 | N/A | Detailed investigation and impl spec | No | Completed | Impl details ready; review completed |
| 02. | MS1 | DYREC-001 | Provide infrastructure to apply configuration changes dynamically (--target option in CLI) across a cluster | No | Completed | Will enable basic command replication; completes 40-50% of DYREC-001; partial support for supplemental commands to let the deployment team make progress; will not support features like undoable commands, etc. |
| 03. | MS1 | DYREC-005 | Provide --target support for infrastructure-related CLIs | No | Completed | Will enable basic command replication; we will select two commands (create-iiop-listener, createJDBCConnectionPool) and show them being replicated end to end |
| 04. | MS1 | DYREC-004 | Clearly show status at the end of each operation. For all server instances where dynamic reconfiguration failed, identify the instances and suggest next steps for the user to recover from the error (for example, restarting the server instances) | No | Completed | Will have some amount of consolidated results; completes 30-40% of DYREC-004 |
| 05. | MS2 | DYREC-001 | Provide infrastructure to apply configuration changes dynamically (--target option in CLI) across a cluster | Yes | Issue 12030 | QA will be able to test proper command replication with all scenarios handled for the createJDBCResource and createJDBCConnectionPool commands |
| 06. | MS2 | DYREC-004 | Clearly show status at the end of each operation. For all server instances where dynamic reconfiguration failed, identify the instances and suggest next steps for the user to recover from the error (for example, restarting the server instances) | Yes | Issue 12031 | |
| 08. | See issue | DYREC-002 | Show 'restart required' status for each server instance. If a server instance needs to be restarted, this status will reflect that | Yes | Issue 12034 | |
| 09. | MS3 | DYREC-003 | Support for the dynamic-reconfiguration-enabled=false flag. This is needed for rolling upgrade | Yes | Issue 12035 | |
| 10. | MS3 | DYREC-008 | Provide detailed diagnostics to debug dynamic-reconfiguration-related failures in customer environments | No | Issue 12036 | Provide proposal |
| 12. | See issue | DYREC-006 | Show a list of changes that were not dynamic and/or failed for some reason during dynamic reconfiguration. This list will help system administrators determine why a server instance should be restarted | Yes | Issue 12037 | |
| 13. | See issue | DYREC-005 | Provide --target support for infrastructure-related CLIs | Yes | Issue 12038 | Complete test and verification for all listed infrastructure commands |
| 14. | See issue | DYREC-008 | Provide detailed diagnostics to debug dynamic-reconfiguration-related failures in customer environments | Yes | Issue 12039 | Complete implementation |

Task List

| Task | Target Milestone | Start | End Date | Owner(s) | Feature ID | Status / Comments |
| Refactor the existing CLI-DAS communication mechanism for use in DAS-instance communication | Milestone 1 (5/24) | 4/19/10 | 4/23/10 | Bill Shannon | | Completed |
| Get basic communication working between the DAS and instances | Milestone 1 (5/24) | 4/26/10 | 4/30/10 | Vijay | | Completed; works with everything hardcoded |
| Send a command to remote instances and get a response back | Milestone 1 (5/24) | 5/3/10 | 5/10/10 | Vijay | | Completed |
| Remove hardcoded instance values and get command replication working with domain lookup | Milestone 1 (5/24) | 5/11/10 | 5/17/10 | Vijay | | Completed |
| Collect response status from all instances and report back to the CLI | Milestone 1 (5/24) | 5/18/10 | 5/21/10 | Vijay | | Completed |
| Complete command replication infrastructure and error reporting | Milestone 2 | | | Vijay | | |
| Design a persistent way to report server-restart | Milestone 3 | | | Vijay | | |
| Complete support for restart-required status | Milestone 3 | | | Vijay | | |
| Support for the dynamic-reconfig-enabled flag | Milestone 3 | | | Vijay | | |
| Support a queue for commands | Milestone 4 | | | Vijay | | |
| domain.xml timestamp updates in instances | Milestone 4 | | | Vijay | | |
| Test infrastructure, iiop commands | Milestone 4 | | | Vijay | | |
| Take care of failure recovery cases | Milestone 4 | | | Vijay | | |
| Study authentication and security requirements for DAS-instance communication | | | | Tim | | |
| Complete implementation of DAS-instance security | | | | Tim | | |
| Propose a solution for securing DAS-instance communication and get it approved | | | | Tim | | |
| Prelim implementation for securing DAS-instance communication | | | | Tim | | |

Demo

Command replication demo for MS1

Documentation

Dev Tests

References / Links

GlassFish Server Open Source Edition 3.1 Plan
GlassFish Server Open Source Edition 3.1 Clustering design Spec

Email Alias (Where Discussions are Taking Place)