Description

a.install 8.2ee in /opt/SUNWappserver

b. start domain1 and default node-agent "mynodeagent". Create few stand alone instances, cluster and some clustered instances.

c. stop node-agent and domain.

d. install 9.1ee in /opt/SUNWappserver

e. the installer asks for 3 components (to install) - nodeagent/loadbalancer/high availability (since this is an inplace upgrade). Atleast one of the components should be chosen. So i choose node-agent. The installation proceeds and fails at the end because of node-agent creation failure. (since it is already existent).

f. Invoke asupgrade from the bin directory of /opt/SUNWappserver.

g. During an upgrade, the domain is started up and the cluster, stand alone instances and clustered instances are created in the default node-agent "mynodeagent", that has already rendezvoused with the DAS. The entries for these clusters/stand-alone instances are populated in /opt/SUNWappserver/domains/domain1/config/domain.xml.

h. After a successful upgrade, the domain is started up fine.

i. While starting up the node-agent, the synchronization fails. The config context is not refreshed because of Exception in initializing SunPKCS11.

j. Because of this, the older domain.xml (8.2ee) remains in the nodeagent config directory and the startup will fail with some missing attributes like "heartbeat-port" in the <cluster> element.

Why does the synchronization fail here? Any ideas?

Evaluation

When the node-agent tries to synchronize with the DAS, the domain.xml in the nodeagent/config directory is parsed. When the parsing is done, the old domain.xml (8.2ee) exists in this directory and it does not match the sun-domain_1_3.dtd. Hence the exceptions related to the missing attributes are thrown. The synchronization fails because of this and the nde-agent does not start up.

However, when the domain.xml is copied over from domains/domain1/config directory to the nodeagents/mynodeagent/agent/config directory, the node-agent start up goes through fine.

According to Nandini, eventhough we copy the domain.xml from DAS config to node-agent config, the node-agent should not start up, since the domain was created afresh by the upgrade process and the domain's cert would be different.

This is still under investigation.

-Shalini.

The node-agent synchronization would not be a problem in the SBS Upgrade scenario. In the inplace upgrade, the node-agent would have synchronized (while creation of clusters and instances) in the 8.2ee installation before an upgrade to 9.1ee. During the upgrade, the domain.xml in the node-agent config directory is not modified. Upgrade tool assumes that the old domain.xml should work seamlessly with the new runtime. ie., 8.2ee domain.xml should work with the 9.1ee runtime binaries.

After upgrade process, during the synchronization process, the domain.xml in 8.2ee node-agent config directory is parsed first. But since this will be the 8.2ee domain.xml, the parsing fails due to the fact that it is incompatible with the new runtime. This is the issue.

-Shalini.

Update on 10/1/2007

Keywords: Inplace Upgrade, Synchronization, domain.xml

Shalini : The node-agent synchronization would be a problem in an inplace upgrade from 8.2ee to 9.1ee because the domain.xml in node-agent/config directory is parsed before synchronization happens. Since the domain.xml would refer to an older dtd in case of 8.x->9.x upgrade, the synchronization would fail.

Changing the rendezvous status of the node-agent(in das.properties, nodeagent.properties and domain.xml) before starting it up does not help it much since the failure is at the synchronization step.

The node-agent would have synchronized (while creation of clusters and instances) in the 8.2ee installation before an upgrade to 9.1ee. During the upgrade, the domain.xml in the node-agent config directory is not modified. Upgrade tool assumes that the old domain.xml should work seamlessly with the new runtime. ie., 8.2ee domain.xml should work with the 9.1ee runtime binaries.

After upgrade process, during the synchronization process, the domain.xml in 8.2ee node-agent config directory is parsed first. But since this will be the 8.2ee domain.xml, the parsing fails due to the fact that it is incompatible with the new runtime. This is the issue.

The transforms will be applied on the domain.xml in the domain1/config directory. But not in the domain.xml in the node-agent config directory. This has to be achieved via synchronization.

Nandini : This is an issue... Basically you have to force NA to sync for this special case disregarding the DAS trust-check part. Is there something like a CLI option for upgrade indicating the NA that it is being started post inplace upgrade (I doubt it)... Couple of hacks could be to use that option and insert the special logic or perhaps take NA to unbound state. Let me take a look at it and get back to you should we decide to take this approach.

Before that : Has it been decided that we live with the domain.xml incompatibility?

Prasad : I think there has been no decision on this, but at the same time Sreedhar has not been told that this might need to change. We might need to look at this option too.

Shalini : Upgrade tool will take care of domain.xml
incompatibilities within the domain configuration. But when it comes to
node-agent, it does not modify the domain.xml in the agent config. The
domain.xml that is in node-agent config should get synchronized after an
upgrade, to get the latest domain.xml that has been upgraded.

Hence, this is not an upgrade issue, since upgrade tool expects the
synchronization to happen by itself. It cannot take care of the
node-agent config incompatibilities with the upgraded domain.

Nandini : NA upgraded bits will not complain
to start with if the domain config is backward compatible.

Shalini : That is exactly the problem. The 9.1ee domain is not backward
compatible. Till it is backward compatible, the upgrade tool procedure is not going to be successful, whatever approach we follow to upgrade the domain configuration. Kedar suggested (1) for an upgrade process.

(1) Copy the 8.2ee domain config over to the 9.1ee domain config and add the required cluster attributes and other elements, so that the domain can start up successfully after an upgrade.

Even in this case, the domain.xml that is in the node-agent config directory is going to remain as 8.2ee's domain.xml. The DTD of the upgraded domain would have changed to sun-domain_1.3.dtd whereas the dtd of domain in node-agent config would be sun-domain_1.1.dtd. The node-agent config cannot be modified by the upgrade tool, since it expects the domain config to be backward compatible or else the
synchronization should go through without the domain validation in process. If either of these is true, then the upgraded domain will be synched with the node-agent config and the upgrade process will be a
success. Else the node-agent is never going to start up because the synchronization will fail.

From all these, it can be made clear that the node-agent config WILL NOT be modified by the upgrade tool It expects the node-agent to start up without exceptions after an upgrade, synching up the instances.

The problem will happen with the copy approach or create approach - whichever is taken by the upgrade procedure.

Nandini : Are we not going to maintain compatibility of domain.xml across
8.2 and 9.x?. Prasad had replied no decision has been made and that option can be looked at.

I suggest we explore this route. I believe Kedar was also inclined towards this solution. From the data we have about incompatibility, it does not look difficult to make it compatible.

So, If domain.xml is made compatible(it is not presently; we know that), issue will be resolved. So let's try to fix that.

Prasad : I guess that this means that we go and make the #REQUIRED attributes as #IMPLIED right ?

Shalini : Does this mean it is okay if the upgraded domain.xml has the cluster
attributes IMPLIED? If so, the domain will not start up after an upgrade. The doctype will be sun-domain_1.3.dtd in the upgraded domain and it says the "heartbeat-enabled", "heartbeat-port"and
"heartbeat-attribute" are REQUIRED.

Also i would like to know here, if "compatible" means "forward compatible" or "backward compatible"? The domain.xml from 8.2ee can be made 9.1ee compliant by adding these REQUIRED attributes that were
introduced in 9.1. A 9.1 domain cannot be made an 8.2, because the doctype changes in 9.1. Till the attributes are made IMPLIED in the 1.3 dtd, this is going to be a problem with node-agent synchronization.

Kedar : There could be 2 workarounds for this issue.

1. Modify the sun-domain_1.3.dtd to make the attributes : heartbeat-enabled, heartbeat-port and heartbeat-address IMPLIED. Bundle this in the appserv-rt.jar and test out the node-agent startup. If the node-agent startup goes through fine, then we can conclude that these are the only 3 hurdles that prevent the synchronzation.

2. To make the validation work, we should modify the node-agent domain.xml. In short, the following would have to be done.

  • When asadmin start-domain is invoked, an inline upgrade would be started up, if an old domain is found in the domains directory.The upgrade would go through fine and domain started up after that.

On Similar lines,

  • When asadmin start-node-agent is started, a DTD check has to be made and if the DTD is an older one, then the domain.xml validation might not go through fine. So the domain.xml has to be modified by some kind of upgrade process that will add the REQUIRED attributes so that validation goes through fine. After the validation, the latest domain.xml will be synchronized from the central repository.

Back