Clustering Infrastructure and Monitoring Sustaining TOI

This is a page to capture information for the Clustering Infrastructure and Monitoring TOI for Sustaining.

The clustering infrastructure feature consists of the command line interfaces (CLIs) for creating and managing clusters. The commands are listed under Feature Details below.

Although not part of this section, these commands also depend on the node configuration and SSH feature. 

These commands allow the user to create a cluster and clustered instances, or stand-alone instances.

There is a new synchronization algorithm that copies data from the DAS to the instances. Details about this algorithm are in the design specification.
Once a cluster is running, commands that change the cluster configuration trigger a command replication algorithm that results in dynamic reconfiguration of the cluster. 
The dynamic reconfiguration algorithm maintains state information about each instance in a .instancestate file.

When a GlassFish 2.1 installation that uses clusters is upgraded, the upgrade logic converts node-agents to nodes. After upgrading the DAS, the instances
are recreated using the create-local-instance command.
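
For instance, a clustered instance that previously ran under a 2.1 node-agent might be recreated on the upgraded host with something along these lines (the node, cluster, and instance names are placeholders; this is a sketch of the idea, not the exact upgrade procedure):

  asadmin create-local-instance --node node1 --cluster cluster1 instance1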

Information about the monitoring feature in GlassFish 3.1 can be found here:

Clustering Infrastructure Implementation

Developer Name(s): Byron Nevins, Jennifer Chou, Bhakti Mheta, Bill Shannon, Tom Mueller, Vijay Ramachandran, Tim Quinn, Sheetal Vartaket, Jerome Dochez, et al.

Description

This feature allows you to create clusters and clustered and stand-alone instances, start and stop clusters and instances, and manage configurations. This feature is based on similar functionality from GlassFish 2.1. The main differences from 2.1 are:

  • There are no longer any node-agents.  Centralized management of instances is accomplished using SSH. 
  • Dynamic reconfiguration is accomplished through command replication rather than data replication. Whenever an asadmin command is executed on the DAS, its execution is replicated on the required instances, bringing about the reconfiguration change that is needed for that instance (see the example below).
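
For example, a configuration change targeted at a cluster is executed on the DAS and then replicated to the running instances of that cluster, so each instance picks up the change without a separate data push. A sketch (the pool, resource, and cluster names here are made up):

  asadmin create-jdbc-resource --connectionpoolid DerbyPool --target cluster1 jdbc/exampleDS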

Feature Details

The feature design specification describes the following asadmin subcommands:

create/delete/list-cluster
create/delete/list-instance
create/delete-local-instance
copy/delete/list-config
start/stop-instance
start/stop-local-instance
restart-instance
restart-local-instance
start/stop-cluster
export-sync-bundle/import-sync-bundle
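
A typical sequence for creating and running a small cluster with these commands looks roughly like this (the cluster, node, and instance names are placeholders, and the nodes are assumed to already exist, e.g. created with create-node-ssh):

  asadmin create-cluster cluster1
  asadmin create-instance --cluster cluster1 --node node1 instance1
  asadmin create-instance --cluster cluster1 --node node2 instance2
  asadmin start-cluster cluster1
  asadmin list-instances
  asadmin stop-cluster cluster1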

In addition, there are several internal "hidden" commands:

_bootstrap-secure-admin
_create-config
_post-register-instance
_post-unregister-instance
_register-instance
_register-instance-at-instance
_restart-instance
_stop-instance
_synchronize-files
_unregister-instance

There are several new annotations to support command replication:

@ExecuteOn - indicates which type of node a command should be executed on.

@TargetType - indicates the valid types of targets for the --target option to a command.
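
As a rough illustration of how these are used, a replicated command class might look something like the following. This is a hypothetical example, not actual GlassFish source, and the package names in the imports are from memory and may differ slightly:

  import org.glassfish.api.Param;
  import org.glassfish.api.admin.AdminCommand;
  import org.glassfish.api.admin.AdminCommandContext;
  import org.glassfish.api.admin.ExecuteOn;
  import org.glassfish.api.admin.RuntimeType;
  import org.glassfish.config.support.CommandTarget;
  import org.glassfish.config.support.TargetType;
  import org.jvnet.hk2.annotations.Service;

  @Service(name = "example-command")   // hypothetical command name
  @ExecuteOn(RuntimeType.DAS)          // run on the DAS; the framework replicates as needed
  @TargetType({CommandTarget.DAS, CommandTarget.CLUSTER, CommandTarget.STANDALONE_INSTANCE})
  public class ExampleCommand implements AdminCommand {

      // the --target option is validated against the @TargetType list above
      @Param(name = "target", optional = true, defaultValue = "server")
      private String target;

      @Override
      public void execute(AdminCommandContext context) {
          // the command's work goes here; replication to instances is handled
          // by the command replication framework, not by the command itself
          context.getActionReport().setMessage("executed on target " + target);
      }
  }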

Documentation

Engineering Documents

Product Documentation

Demos

Source code information

  • v3/cluster - main module for clustering implementation
    • v3/cluster/admin (cluster-admin.jar) - remote commands that run within the server and code for command replication
    • v3/cluster/cli (cluster-cli.jar) - local commands that run within asadmin
    • v3/cluster/common (cluster-common.jar) - shared code for the clustering modules
  • v3/admin/config-api (config-api.jar) - config beans for domain.xml elements such as cluster, config, server, node, etc.

Design information

See the design specification for design details, especially the synchronization algorithm and the design for command replication.

Restart Info

All servers can be restarted remotely and locally.  This was rather hairy to implement just right.  Here are some highlights:

  • The server is restarted precisely the way it was started. E.g., if it was started with "java -jar" then that's how it is restarted. We bend over backwards to support any weird way you want to start the server. All command-line args, the classname, the classpath, etc. are remembered and reused. In addition, you can change the debug flag when restarting.
  • --verbose. In this case the server owns a window, and the trick is to keep owning that window. This is very difficult to do correctly. What happens:
    • asadmin is sitting there with a handle to the running server. When the server is restarted, it exits with a special return value (10). asadmin notices this and simply redoes exactly what it did to start the server in the first place. The asadmin JVM remains alive, which is the only way we can keep the window connected. Try it like so:
      1. asadmin start-domain --verbose
      2. asadmin restart-domain
  • Non-verbose. In this case we do the following:
    • At the end of the stop sequence, just before the JVM dies, we fire off a new JVM that will start the reincarnated server. The new JVM doesn't just start; that would have guaranteed a ton of race-condition bugs. Instead, it sits there and reads the stdout; this blocks until the stream closes, at which time the old server is completely dead.
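
The idea in the non-verbose case can be sketched like this (a simplified illustration, not the actual GlassFish launcher code; in particular, the assumption that the new JVM's stdin is wired to the old server process is just one possible wiring):

  import java.io.IOException;

  // Sketch: block until the process at the other end of the stream is
  // completely gone, then (and only then) start the replacement server.
  public class RestartGate {
      public static void main(String[] args) throws IOException {
          // read() returns -1 only when the other end closes the stream,
          // i.e. when the old JVM has fully exited
          while (System.in.read() != -1) {
              // discard any data; only end-of-stream matters
          }
          // no race with the old process is possible past this point;
          // the reincarnated server can be started here
      }
  }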

Defects

  • JIRA Sub-component name: admin (there isn't a separate category for clustering)
  • See the Admin iTeam Dashboard for admin-related issue information.

Debugging

NetBeans works very well for setting breakpoints and stepping through the clustering code. 

To see how command replication is working, see the following key classes:

  • CommandRunnerImpl (this implements the remote command processing)
  • GlassfishClusterExecutor (this figures out where the command needs to be executed)
  • ClusterOperationUtil (this actually executes the command remotely)

The logger to set to see admin log messages, including those for clustering, is:

javax.enterprise.system.tools.admin

To see debugging information for command execution, set the AS_DEBUG environment variable to "true".

  • set AS_LOGFILE=somefilename and every CLI command will be automatically logged to this file. Very handy for debugging, especially when you get tons of irrelevant information from a customer. This will cut to the chase. Set it once and forget it.
  • set AS_SUPER_DEBUG=true and you can attach a debugger to a restarting server. Without this, it would be difficult to debug.
  • Startup problems: The logging service starts long after the JVM fires up. You will NOT SEE most or all error messages that are generated before the logging service starts, and you will definitely not see any fatal JVM errors. E.g., add a bad JVM option and you will not have a clue what happened, UNLESS you start the server with the --verbose flag; then you are guaranteed to see everything. Remember this!
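
For example, to enable these debugging aids in one session (Windows syntax shown, use export on Unix; the log file path is a placeholder, and the set-log-levels invocation assumes that subcommand's name and syntax):

  set AS_DEBUG=true
  set AS_LOGFILE=c:\temp\asadmin.log
  set AS_SUPER_DEBUG=true
  asadmin set-log-levels javax.enterprise.system.tools.admin=FINE
  asadmin start-domain --verbose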

Tests