Document: V3.1ClusteringInfraOnePager
Reviewers: Mahesh Kannan, Roberto Chinnici, Joe Fialli
Review date: June 02, 2010
Response date: June 07, 2010
Item |
Section |
Comment |
Response |
MK-1 |
2.2 |
Are there any risks due to the absence of the node agent? |
The main risk is the compatibility break that results from removing the node agent. I'll add this to the document. - DONE |
MK-2 |
4.3 |
Do we have create-node-agent in V3.1? |
No. The one-pager and the design spec are being updated to reflect this. - DONE |
MK-3 |
4.5.2 |
What exactly does das.properties contain? |
It is a Java properties file. I added an example of its content to the document. (A representative sketch also appears after the last item below.) - DONE |
MK-4 |
4.5.3 |
Can you list the commands that are not supported in V3.1? Also, it would be great if you could document whether there are alternatives that the user must or can use. For example, administrators are expected to use rc scripts to monitor the health of instances (because the node agent is not present in V3.1). |
Will add a list of unsupported commands to the out-of-scope section. - DONE |
MK-5 |
4.13 |
This section needs to list the dependencies. |
Whoops - will fill that in. - DONE |
MK-6 |
GEN |
The design doc describes how the time stamps of directories and files are used to perform synchronization between the DAS and nodes. Does this mean that all instances in the cluster are assumed to be in time sync? |
No. When a file is synchronized, the modification time of the file on the DAS is set on the instance's copy. When checking whether a file needs to be updated, the algorithm checks whether the modification time is different (either before or after), and if it is, it synchronizes the file again. (A sketch of this check appears after the last item below.) |
MK-7 |
DESIGN DOC |
create-local-instance takes a number of node agent-related options. Are these still required in V3.1? |
Yes, the following node agent-related options can be passed to create-local-instance: --agentport, --agentproperties, --nodeagent, and --agentdir. The node agent directory structure is created automatically by create-local-instance if it is not already there. (An illustrative invocation appears after the last item below.) |
MK-8 |
DESIGN DOC |
Deployment in a clustered environment: The diagram seems to suggest that the hidden command will be run on each server instance. What happens if there is a failure on one of the servers? Does the entire deployment fail? Or does the deployer have to manually redeploy the application on that one node? |
Although discussed in the design doc, application deployment is actually out of scope for this one-pager because it is being handled by another sub-project. The design spec is shared among several projects. My understanding of the general plan for replicated commands, including deploy, is that if there is a failure on any instance, then the command will be rolled back on all instances. |
RC-1 |
4.10.3 |
Maybe it's a meta-issue, but I looked at the linked page for the project on upgrading to 3.1 and there is little information there about any of the clustering-specific issues in upgrading. I'd like to make sure that information doesn't fall through the cracks between the various one-pagers. |
Agreed. This is an area that needs more work. We've been in contact with Bobby Bisset about upgrade issues and have submitted several issues to the upgrade dashboard that is referenced in the spec. This coordination will continue throughout the implementation. |
RC-2 |
DESIGN DOC |
"Synchronization criteria" section: what's the rationale for treating lib and docroot directories under a config-specific directory differently from those under config? The same arguments used for the other case would apply here (libs can be large, docroot can contain many files), so I expected the two cases to be handled the same way. Another argument for that is that the distinction between the main config section and the per-server/cluster one is in principle invisible to clients. |
(From Bill) The docroot directory may contain a large number of small files. The lib directory may contain a small number of large files. The config-specific directory may contain both or either, which makes it hard to tell. We could handle "lib" and "docroot" subdirectories differently, but since the use of this directory, and its contents, are entirely unspecified and at best by convention, it's hard to know what exactly to do here. The goal is to avoid wasting time checking lots of files that probably didn't change, so the different behavior is based on the expected number of files, not the size of the files. |
RC-3 |
DESIGN DOC |
"When to do startup synchronization?" section: while I agree that it might take more work to implement, I'd think that doing sync-on-startup in the server itself would be less fragile than doing it in the client, in that the server, however started, will know how to get itself back in sync with the DAS. I'm a little wary about having asadmin join the group with GMS, as it would introduce the notion of a non-server member of the group. Maybe it's possible to use an in-between approach, where asadmin synchronized critical info and then lets the server do the rest. |
(From Bill) We don't plan to have asadmin join the GMS group. I've got a more complete design for this that I plan to send to the admin alias soon. It's actually much harder to do in the server after the server is fully started. You effectively have to "diff" the old and new domain.xml and figure out how to apply the changes to a running system. Doing it in the server before the server itself reads domain.xml is easier, but doesn't seem to have any obvious advantages over doing it in the code that starts the server. |
RC-4 |
DESIGN DOC |
"Communication (Transport) Layer" section: this business of retrieving a representation of a resource if not changed smells like REST (HTTP) to me. Issues like caching, compression, invalidation, etc. have pretty much standard answers there. I was wondering if the authors had spent some time looking at these issues while wearing a REST hat. |
(From Bill) I think the difference here is that we're batching multiple resources in a single request, as well as returning resources that you didn't even know to ask for. Possibly this could be cast in a more RESTful style, but given the limited use that doesn't seem important. |
RC-5 |
DESIGN DOC |
"Server Software Upgrade Implications" section: re the "need to carefully manage compatibility of the synchronization protocol and the config files", while reading the earlier sections I was expecting explicit version information to be included in both the protocol and the config files, since managing backward and forward compatibility without explicit version information can get tricky. |
(From Bill) This issue probably hasn't been given enough consideration. Designing a protocol to handle this sort of evolution gracefully is hard, and it's not clear that adding version numbers is either necessary or sufficient. |
JF-1 |
DESIGN DOC |
Under "When to do startup synchronization?" there is a question "Where does GMS get its configuration information?" It gets it from domain.xml. In fact, shoal-gms.jar can not be loaded if there is not a cluster in the domain.xml that has gms-enabled set to true. See GMS configuration design doc for more info on GMS configuration in domain.xml |
I've updated the document to indicate that GMS gets its configuration from the domain.xml file and added a reference to the configuration design doc for GMS. |
JF-2 |
DESIGN DOC |
This is a question about the propagation of cluster configuration info while a cluster is running. Suppose a GlassFish cluster clusterA has two instances, instance1 and instance2, and the cluster is started via "asadmin start-cluster". What happens when a new instance is added dynamically, i.e., "asadmin create-instance instance3" is executed while the DAS, instance1, and instance2 are already running? When is the configuration info for the newly created instance3 introduced into the in-memory config/domain.xml of the running instances instance1 and instance2? When instance3 is started and joins the running cluster, we are trying to gauge whether a config lookup of clusterA, performed by the already running instance1 and instance2 while processing the GMS "join" event for instance3, will see the config info that was created for instance3. We are trying to determine whether IIOP could get its endpoint info from the cluster config. The IIOP config information is definitely present in domain.xml for each clustered instance that is created before the members of the cluster are started. |
The already running instances are updated by the command replication mechanism, i.e., the create-instance command is replicated to all running instances, and the in-memory config is updated that way. |
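For the das.properties question (MK-3, section 4.5.2), the following is a representative sketch of the file's contents. It is not copied from the one-pager; the property names and values are illustrative assumptions and may not match the actual 3.1 file exactly:

    agent.das.host=das.example.com
    agent.das.port=4848
    agent.das.protocol=http
    agent.das.isSecure=false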
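For the question about time stamps and synchronization (the GEN item above), here is a minimal Java sketch of the comparison described in the response. It is an illustration only, not the actual implementation, and the class and method names are invented. The point is that a file is re-sent whenever the modification time recorded from the DAS differs in either direction, so the DAS and instance clocks never need to agree:

    import java.io.File;

    public class SyncCheck {
        // True if the instance's copy must be refreshed from the DAS.
        // dasModTime is the modification time reported by the DAS for this file.
        static boolean needsSync(long dasModTime, File instanceCopy) {
            if (!instanceCopy.exists()) {
                return true;                      // never synchronized before
            }
            // "different" means earlier OR later, not just "older than the DAS copy"
            return instanceCopy.lastModified() != dasModTime;
        }

        // After copying the file, stamp it with the DAS modification time so the
        // next comparison is independent of any clock skew between the hosts.
        static void recordDasTime(File instanceCopy, long dasModTime) {
            instanceCopy.setLastModified(dasModTime);
        }
    }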
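For the question about create-local-instance's node agent-related options, here is an illustrative invocation using only the options listed in the response. The instance name and all option values are placeholders, and the authoritative syntax for each option should be taken from the asadmin help output rather than from this sketch:

    asadmin create-local-instance --nodeagent <node-agent-name> \
        --agentdir <parent-dir-for-node-agents> --agentport <agent-port> \
        --agentproperties <name=value-list> instance1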
Here are the details of the discussion about RC's comments with Bill Shannon (WS):
(RC)>>> "Synchronization criteria" section: what's the rationale for
>>> treating lib and docroot directories under a config-specific
>>> directory differently from those under config? The same arguments
>>> used for the other case would apply here (libs can be large, docroot
>>> can contain many files), so I expected the two cases to be handled
>>> the same way. Another argument for that is that the distinction
>>> between the main config section and the per-server/cluster one is in
>>> principle invisible to clients.
>>
(WS)>> The docroot directory may contain a large number of small files.
>>
>> The lib directory may contain a small number of large files.
>>
>> The config-specific directory may contain both or either, which makes
>> it hard to tell. We could handle "lib" and "docroot" subdirectories
>> differently, but since the use of this directory, and its contents,
>> are entirely unspecified and at best by convention, it's hard to know
>> what exactly to do here.
>>
>> The goal is to avoid wasting time checking lots of files that probably
>> didn't change, so the different behavior is based on the expected number
>> of files, not the size of the files.
>
(RC)> I was going by this comment in the design document: "The config-specific
> directory may commonly contain lib and docroot subdirectories, and so might
> be very large". In such a situation, users may update docroot/index.html
> without realizing it will cause all contents of the parent directory to be
> sent over to the client, possibly including lots of files in docroot and
> some large files in lib.
>
> If that is not a common use case, then it doesn't matter.
(WS)I've asked, and people don't really know how often the config-specific
directory is used. People seem to think that it might be used for
libraries that you don't want to distribute to all machines. Using it
for machine or cluster specific docroots was less clear.
I originally thought there was some automatic, transparent override
associated with entries in the config-specific directory, but that's
not the case. Apparently you have to manually reconfigure the
virtual server to use a docroot located there.
Without more data on how people are actually using it, I'm not sure
what to do here. Do you have any better suggestions?
(RC)No. We may go back to it later if we ever get more data on how it's used.
(RC)>>> "When to do startup synchronization?" section: while I agree that it
>>> might take more work to implement, I'd think that doing
>>> sync-on-startup in the server itself would be less fragile than
>>> doing it in the client, in that the server, however started, will
>>> know how to get itself back in sync with the DAS.
>>
(WS)>> It's actually *much* harder to do in the server after the server is
>> fully started. You effectively have to "diff" the old and new domain.xml
>> and figure out how to apply the changes to a running system.
>
(RC)> I agree.
>
(WS)>> Doing it in the server before the server itself reads domain.xml is
>> easier, but doesn't seem to have any obvious advantages over doing it in
>> the code that starts the server.
>
(RC)> It depends on how much you have to "grow" the asadmin command to do that.
> If it needs to grow a lot, it may be easier to do the work in the server
> itself.
(WS)One of the advantages of doing this in asadmin instead of the server
itself was that all the infrastructure that was needed to interact with
the DAS was already there. I only had to add one relatively small new
local command (and improve the infrastructure in a few ways).
(RC)> I'd also imagine that modularity would help get the server to enter a
> kind of "update mode" at startup, before going fully operational (reading
> domain.xml, etc.), but in reality I don't know how deeply wired domain.xml
> is in the current code.
(WS)It's not impossible to insert new code that gets run before domain.xml
is read. The real issue was the lack of supporting infrastructure for
talking to the DAS.
I've since had to extract much of that infrastructure and make it
available for use in the DAS, so the DAS can talk to an instance.
So, the choice may not be as clear-cut now, but the current approach
seems to be working well.
(RC)>>> I'm a little wary about having asadmin join the group with GMS, as it
>>> would introduce the notion of a non-server member of the group. Maybe
>>> it's possible to use an in-between approach, where asadmin synchronizes
>>> critical info and then lets the server do the rest.
>>
(WS)>> We don't plan to have asadmin join the GMS group. I've got a more
>> complete design for this that I plan to send to the admin alias soon.
>>
(RC)>>> "Communication (Transport) Layer" section: this business of
>>> retrieving a representation of a resource if not changed smells like
>>> REST (HTTP) to me. Issues like caching, compression, invalidation,
>>> etc. have pretty much standard answers there. I was wondering if the
>>> authors had spent some time looking at these issues while wearing a
>>> REST hat.
>>
(WS)>> I think the difference here is that we're batching multiple resources
>> in a single request, as well as returning resources that you didn't
>> even know to ask for.
>>
>> Possibly this could be cast in a more RESTful style, but given the
>> limited use that doesn't seem important.
>
(RC)> It depends on how much of REST you'll be reinventing... ;-)
(WS)Not much. Right now it leverages the entire admin command infrastructure,
and just adds a command to synchronize certain sets of files. It's
pretty special-purpose, which seems fine since the only client we envision
is GlassFish itself.
(RC)>>> "Server Software Upgrade Implications" section: re the "need to
>>> carefully manage compatibility of the synchronization protocol and
>>> the config files", while reading the earlier sections I was
>>> expecting explicit version information to be included in both the
>>> protocol and the config files, since managing backward and forward
>>> compatibility without explicit version information can get tricky.
>>
(WS)>> This issue probably hasn't been given enough consideration.
>>
>> Designing a protocol to handle this sort of evolution gracefully is
>> *hard*, and it's not clear that adding version numbers is either
>> necessary or sufficient.
>
(RC)> That's true.
>
> I find it helps to know what version of the protocol each side is using,
> so that negotiation is explicit.
(WS)I *do* need to spend more time on this. We might be able to at least
detect when the client and server are not compatible, but I doubt that
we're going to spend any effort making old clients work with new servers
or vice versa.