GlassFish Server Open Source Edition 3.1 - Dynamic Reconfiguration

This document details how dynamic reconfiguration will work when cluster support is added to GlassFish 3.1.
- Jerome Dochez (dochez@dev.java.net)
- Vijay Ramachandran (vijaysr@dev.java.net)
Scope of the Project

Dynamic reconfiguration for clustering in GlassFish Server Open Source Edition 3.1 refers to the way CLI commands issued by the user to the DAS (for example, asadmin create-jdbc-resource some-options --target some-target) are reflected in the server instance (if the user-specified target is a server instance) or in all server instances that are part of the cluster (if the user-specified target is a cluster).
| Feature ID | Priority | Description | Eng Response | Owner(s) | Estimate (Man Days) | Source of Requirement | Status / Comments |
|---|---|---|---|---|---|---|---|
| DYREC-001 | P1 | Provide infrastructure to apply configuration changes dynamically (--target option in CLI) across a cluster | Yes | Vijay, Jerome, Sheetal | | Feature parity | Started; Issue 12030 |
| DYREC-002 | P1 | Show 'restart required' status for each server instance. If a server instance needs to be restarted, this status will reflect that | Yes | Vijay | | Feature parity | This status should survive DAS restarts until the server instance is synchronized with DAS; if DAS is unavailable during restart, synchronization will be skipped. Issue 12034 |
| DYREC-003 | P1 | Support for the dynamic-reconfiguration-enabled=false flag. This is needed for rolling upgrade | Yes | Vijay | | Feature parity | Issue 12035 |
| DYREC-004 | P1 | Clearly show status at the end of each operation. For all server instances where dynamic reconfiguration failed, identify the server instances and suggest next steps for users to recover from the error (for example, restart the server instances) | Yes | Vijay | | Feature parity | Issue 12031 |
| DYREC-005 | See issue | Provide support (status, --target) for infrastructure-related CLIs | Yes | Vijay | | Feature parity | list-* commands show status such as running, restart required, etc.; we need to add that support. Issue 12038 |
| DYREC-006 | See issue | Show a list of changes that were not dynamic and/or failed for some reason during dynamic reconfiguration. This list will help system administrators determine why a server instance should be restarted | Yes | Vijay | | | Issue 12037 |
| DYREC-008 | See issue | Provide detailed diagnostics to debug dynamic reconfiguration related failures in customer environments | Yes | Vijay | | Feature parity | Issue 12036, Issue 12039 |
High Level Overview of the Features

To effect the user-requested changes (such as creating resources, as mentioned above) on a server instance or on all instances of a cluster, the following steps have to happen.
- Execute the command on the DAS and ensure that it goes through. If the command succeeds on the DAS, then:
- Replicate the same command on the target server instance, or on all server instances that are part of the specified cluster, and report progress status back to the client.
- Ensure that all replicated commands go through. If there are errors, report them back to the user and mark the instances where the command failed as requiring restart. Handle all error scenarios, differentiating complete failure of a command (which may trigger a rollback) from partial failure (the command failed on only a small subset of the targets, which may not require a rollback). In failure scenarios, ensure that information on what failed is clearly available.
- Ensure that, when commands are replicated successfully on all instances, a subsequent restart of any instance does not result in unnecessary synchronization (one scenario is a mismatch of domain.xml timestamps between the DAS and an instance).
- Take care of supplemental commands (commands that cannot be replicated exactly as they were executed on the DAS; for example, creation of references in instances will be done as a supplemental command to the deploy command - click here for more information on this example).
- From a user perspective, the behavior from the CLI/console should remain the same except for the need to specify the target cluster or instance for the command. The CLI/console will continue to talk to the DAS (just as in GlassFish v3), and the DAS will decide whether or not to replicate the command (depending on the command, the RuntimeType specified, etc.). The command replication infrastructure will ensure that instances obey only commands from the DAS; direct commands to instances from anywhere else will not be obeyed.
The first step listed above already happens in GlassFish 3.0 and is common to both clustered and stand-alone environments. Since the remaining steps are kicked off only after step 1 completes successfully, the rest of this document goes into the details of step 2 onwards.

Implementation details

TBD: The API references below point to internal javadoc - they need to point to an external javadoc - how?

A new set of annotations and interfaces has been added to GlassFish 3.1 for the implementation of the dynamic reconfiguration feature. The implementation approach is as follows.

Hooking into GlassFish 3.0

This section gives a high-level view of how the new code for dynamic reconfiguration will hook into / affect the existing (GlassFish 3.0) command execution code on the server side.
- Every CLI command should annotate itself with @ExecuteOn and specify the RuntimeType of the command as part of @ExecuteOn (a sketch of such a command appears after this list). If the command is expected to be executed only on the DAS, the RuntimeType should be DAS. If the command has to be executed on the DAS and replicated on applicable instances, the RuntimeType for @ExecuteOn should be set to DAS and INSTANCE.
- If a command does not have @ExecuteOn, the implementation will treat it as having RuntimeType set to DAS and INSTANCE. This way, existing non-local commands in GlassFish 3.0 do not have to be changed and will be executed on the DAS and all applicable server instances.
- The RemoteCommand facility, currently used for communication between the user/CLI and the DAS, will be used (after some refactoring) for communication between the DAS and instances for command replication.
- The GlassfishClusterExecutor will be plugged into the current server-side command execution as follows:
  - Currently, a non-local CLI command reaches the server through the AdminAdapter and, after authentication and other checks, flows into CommandRunnerImpl.doCommand, where parameters are injected and the actual command is executed.
  - Once the command has executed successfully, instead of returning, and if the server's runtime type is DAS, control is handed to the ClusterExecutor. Details of the ClusterExecutor are in the next section.
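To make the annotation usage concrete, here is a minimal sketch of a remote command marked for replication. It is illustrative only: the class name, command logic, and parameter names are hypothetical, and the HK2 service registration normally required for a command is omitted; only @ExecuteOn, RuntimeType, and the AdminCommand contract come from the description above.

```java
import org.glassfish.api.Param;
import org.glassfish.api.admin.AdminCommand;
import org.glassfish.api.admin.AdminCommandContext;
import org.glassfish.api.admin.ExecuteOn;
import org.glassfish.api.admin.RuntimeType;

// Hypothetical command: executed on the DAS first, then replicated to the
// target instance or to every instance of the target cluster.
// (HK2 @Service registration omitted for brevity.)
@ExecuteOn({RuntimeType.DAS, RuntimeType.INSTANCE})
public class CreateSampleResourceCommand implements AdminCommand {

    // Target instance or cluster supplied by the user on the CLI.
    @Param(optional = true, defaultValue = "server")
    String target;

    // Name of the hypothetical resource being created.
    @Param(primary = true)
    String resourceName;

    @Override
    public void execute(AdminCommandContext context) {
        // Apply the configuration change locally. On the DAS, the replication
        // infrastructure then re-runs this same command on each applicable instance.
        context.getActionReport().setMessage(
                "Created resource " + resourceName + " on target " + target);
    }
}
```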
Command Replication using ClusterExecutor
- GlassFish 3.1 will have an implementation of the ClusterExecutor interface, the GlassfishClusterExecutor. This will be the default ClusterExecutor implementation used when @ExecuteOn does not specify a ClusterExecutor type. (This in turn means that GlassfishClusterExecutor will be used for command replication for all GlassFish 3.0 remote commands.)
- As detailed in the previous section, control comes to the ClusterExecutor only on the DAS - not on the instances.
- It will be the job of the cluster executor to (see the flow sketch after this list):
  - decode the target type and extract all instances on which the command has to be replicated
  - change the parameters (if required)
  - add the DAS timestamp as part of the parameters and ensure that the instance has an up-to-date timestamp for its domain.xml
  - send the command across to all instances
  - collect progress status / results from the instances to which the command was sent
  - take care of errors (more details on this in the next section)
  - report success / error to the user
  - upon successful completion, execute supplemental commands
  - in case of failures, set the restart required state as appropriate
  - in case of failures, if the command is an UndoableCommand, try to undo the command
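The flow described above can be summarized with a rough sketch. This is not the actual ClusterExecutor API; every type and helper name below (ReplicationFlowSketch, resolveInstances, sendToInstance, markRestartRequired, the _dasDomainXmlTimestamp parameter) is a hypothetical stand-in used only to show the order of operations.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative control flow only; all types and helpers are hypothetical
// stand-ins, not the GlassFish replication API.
public final class ReplicationFlowSketch {

    static final class InstanceResult {
        final String instanceName;
        final boolean success;
        final String message;
        InstanceResult(String instanceName, boolean success, String message) {
            this.instanceName = instanceName;
            this.success = success;
            this.message = message;
        }
    }

    public boolean replicate(String commandName, Map<String, String> parameters, String target) {
        // 1. Decode the target and extract the instances the command must be replicated to.
        List<String> instances = resolveInstances(target);

        // 2. Adjust parameters if required, and add the DAS domain.xml timestamp so each
        //    instance can verify that its copy of domain.xml is up to date.
        Map<String, String> replicatedParams = new HashMap<String, String>(parameters);
        replicatedParams.put("_dasDomainXmlTimestamp", String.valueOf(domainXmlTimestamp()));

        // 3. Send the command to every instance and collect the results.
        List<InstanceResult> results = new ArrayList<InstanceResult>();
        for (String instance : instances) {
            results.add(sendToInstance(instance, commandName, replicatedParams));
        }

        // 4. On failure, mark the affected instances as requiring a restart;
        //    the overall success/failure is reported back to the user.
        boolean allSucceeded = true;
        for (InstanceResult result : results) {
            if (!result.success) {
                allSucceeded = false;
                markRestartRequired(result.instanceName, result.message);
            }
        }
        return allSucceeded;
    }

    // --- hypothetical helpers, present only to make the sketch self-contained ---
    private List<String> resolveInstances(String target) { return new ArrayList<String>(); }
    private long domainXmlTimestamp() { return System.currentTimeMillis(); }
    private InstanceResult sendToInstance(String instance, String command, Map<String, String> params) {
        return new InstanceResult(instance, true, "");
    }
    private void markRestartRequired(String instance, String reason) { /* persist restart-required state */ }
}
```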
Supplemental commands

Supplemental commands are commands that are specified to be executed upon successful completion of a CLI command. For example, deployment in a cluster will use a supplemental command to deploy the bits on the various instances, as explained here. The GlassfishClusterExecutor will look up the habitat to see whether supplemental commands have been registered for the current CLI command (a sketch of such a declaration follows this list).
- If supplemental commands are specified for execution before CLI execution (by using Supplemental.Before), then the supplemental command will be executed on all nodes first, and the primary CLI command will be executed only if the supplemental command succeeds or its FailurePolicy is FailurePolicy.Ignore or FailurePolicy.Warn.
- The behaviour of supplemental commands that are specified to be executed after the primary CLI command (and the effect of supplemental command execution on the overall command replication result) is detailed in the table in the next section.
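As an illustration of how a supplemental command might be declared, here is a minimal sketch. The class name, the supplemented command ("deploy"), and the annotation attribute names (value, on, ifFailure) are assumptions made for the example; only the @Supplemental timing and FailurePolicy concepts come from the text above.

```java
import org.glassfish.api.admin.AdminCommand;
import org.glassfish.api.admin.AdminCommandContext;
import org.glassfish.api.admin.FailurePolicy;
import org.glassfish.api.admin.Supplemental;

// Hypothetical supplemental command attached to the "deploy" command.
// It is declared to run after the primary command (Supplemental.After in the
// terms used above); with FailurePolicy.Warn, a failure here downgrades the
// overall result to a warning instead of failing the primary command.
// (Attribute names are assumptions; HK2 @Service registration omitted.)
@Supplemental(value = "deploy",
              on = Supplemental.Timing.After,
              ifFailure = FailurePolicy.Warn)
public class CreateAppRefSupplementalCommand implements AdminCommand {

    @Override
    public void execute(AdminCommandContext context) {
        // Hypothetical supplemental work, e.g. creating application references
        // on the target instances once the deploy bits are in place.
        context.getActionReport().setMessage("Supplemental step completed");
    }
}
```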
Undoable commands

Undoable commands give the command implementor a way to roll back a failed command execution. The table in the next section details the behaviour of command replication when the command implementor chooses to implement the UndoableCommand interface.

Command replication results and action taken

The table below gives details on the action taken upon completion of command replication for various scenarios.
| Result of Command Replication | Is command undoable? | Supplemental commands present? | Overall result | Action taken |
|---|---|---|---|---|
| Failure on DAS | Does not matter | Does not matter | Failure | |
| Success on all instances | Does not matter | None | Success | Report success |
| Success on all instances | Does not matter | Supplemental.After | Result of supplemental command(s) execution and its FailurePolicy | Merged report of primary and supplemental command execution |
| Failure on all instances | Does not matter | Does not matter | Failure | Report failure; try to make the error report as specific as possible (network error, instances down, etc.) |
| Failure on one or more instances | No | No | Failure | Report reason for each command failure; set restart required |
| Failure on one or more instances | No | Supplemental.After | Failure | Execute supplemental command only on the instances where the command succeeded; report reason for each command failure; set restart required |
| Failure on one or more instances | Yes | No | Failure | Try to undo the command on instances where it succeeded; report reason for each command failure; if undo fails, report that as well |
| Failure on one or more instances | Yes | Supplemental.After | Failure | Do not execute the supplemental command anywhere; try to undo the command on instances where it succeeded; report reason for each command failure; if undo fails, report that as well |
Securing DAS-Instance communications

TBD: How the DAS will authenticate itself with an instance while doing command replication; whether we use SSL by default for communicating with instances or follow the way the admin adapter is configured. Currently investigating various possibilities.

Milestone Schedule

Note: for deliverables that have a link to an issue in the Status column, see IssueTracker for the status.
| Item # | Date/Milestone | Feature ID | Description | QA/Docs Handover | Status | Comments |
|---|---|---|---|---|---|---|
| 01. | MS1 | N/A | Detailed investigation and impl spec | No | Completed | Impl details ready; review completed |
| 02. | MS1 | DYREC-001 | Provide infrastructure to apply configuration changes dynamically (--target option in CLI) across a cluster | No | Completed | Will enable basic command replication; completes 40-50% of DYREC-001; partial support for supplemental commands to enable the deployment team to make progress; will not support features like undoable commands, etc. |
| 03. | MS1 | DYREC-005 | Provide --target support for infrastructure-related CLIs | No | Completed | Will enable basic command replication; we will select 2 commands (create-iiop-listener, createJDBCConnectionPool) and show them getting replicated end to end |
| 04. | MS1 | DYREC-004 | Clearly show status at the end of each operation. For all server instances where dynamic reconfiguration failed, identify the server instances and suggest next steps for users to recover from the error (for example, restart the server instances) | No | Completed | Will have some amount of consolidated results; completes 30-40% of DYREC-004 |
| 05. | MS2 | DYREC-001 | Provide infrastructure to apply configuration changes dynamically (--target option in CLI) across a cluster | Yes | Issue 12030 | QA will be able to test proper command replication with all scenarios handled for the createJDBCResource and createJDBCConnectionPool commands |
| 06. | MS2 | DYREC-004 | Clearly show status at the end of each operation. For all server instances where dynamic reconfiguration failed, identify the server instances and suggest next steps for users to recover from the error (for example, restart the server instances) | Yes | Issue 12031 | |
| 08. | See issue | DYREC-002 | Show 'restart required' status for each server instance. If a server instance needs to be restarted, this status will reflect that | Yes | Issue 12034 | |
| 09. | MS3 | DYREC-003 | Support for the dynamic-reconfiguration-enabled=false flag. This is needed for rolling upgrade | Yes | Issue 12035 | |
| 10. | MS3 | DYREC-008 | Provide detailed diagnostics to debug dynamic reconfiguration related failures in customer environments | No | Issue 12036 | Provide proposal |
| 12. | See issue | DYREC-006 | Show a list of changes that were not dynamic and/or failed for some reason during dynamic reconfiguration. This list will help system administrators determine why a server instance should be restarted | Yes | Issue 12037 | |
| 13. | See issue | DYREC-005 | Provide --target support for infrastructure-related CLIs | Yes | Issue 12038 | Complete test and verification for all listed infrastructure commands |
| 14. | See issue | DYREC-008 | Provide detailed diagnostics to debug dynamic reconfiguration related failures in customer environments | Yes | Issue 12039 | Complete implementation |
Task List
| Task | Target Milestone | Start Date | End Date | Owner(s) | Feature ID | Status / Comments |
|---|---|---|---|---|---|---|
| Refactor existing CLI-DAS communication mechanism for use in DAS-instance communication | Milestone 1 (5/24) | 4/19/10 | 4/23/10 | Bill Shannon | | Completed |
| Get basic communication working between DAS and instances | Milestone 1 (5/24) | 4/26/10 | 4/30/10 | Vijay | | Completed; works with everything hardcoded |
| Send a command to remote instances and get response back | Milestone 1 (5/24) | 5/3/10 | 5/10/10 | Vijay | | Completed |
| Remove hardcoded instance values and get command replication working with domain lookup | Milestone 1 (5/24) | 5/11/10 | 5/17/10 | Vijay | | Completed |
| Collect response status from all instances and report back to CLI | Milestone 1 (5/24) | 5/18/10 | 5/21/10 | Vijay | | Completed |
| Complete command replication infrastructure and error reporting | Milestone 2 | | | Vijay | | |
| Design persistent way to report server-restart | Milestone 3 | | | Vijay | | |
| Complete support for server 'restart required' status | Milestone 3 | | | Vijay | | |
| Support for dynamic-reconfig-enabled flag | Milestone 3 | | | Vijay | | |
| Support a queue for commands | Milestone 4 | | | Vijay | | |
| Domain.xml timestamp updates in instances | Milestone 4 | | | Vijay | | |
| Test infrastructure, iiop commands | Milestone 4 | | | Vijay | | |
| Take care of failure recovery cases | Milestone 4 | | | Vijay | | |
| Study authentication and security requirements for DAS-Instance communication | | | | Tim | | |
| Complete implementation of DAS-Instance security | | | | Tim | | |
| Propose solution for securing DAS-Instance communication and get it approved | | | | Tim | | |
| Prelim implementation for securing DAS-Instance communication | | | | Tim | | |
Demo
- Command replication demo for MS1

References
- GlassFish Server Open Source Edition 3.1 Plan
- GlassFish Server Open Source Edition 3.1 Clustering design Spec
- Email Alias (Where Discussions are Taking Place)