GlassFish Server Open Source Edition 3.1 Clustering Security Specification

This page describes how security-related artifacts are managed in a GlassFish V3.1 cluster using the framework defined by the Clustering Design Spec.

Kumar Jayanti
Nithya Subramanian

1. Introduction
2. Dynamic Re-Configuration
3. Data Synchronization

3.1 Synchronization of Security Related Server Configuration

3.1.1 Handling Changes to the Configuration

3.2 Synchronization of Application Policy Files generated by the JACC Provider(s)

3.2.1 Policy Providers
3.2.2 Two Approaches being discussed
3.2.3 Current Code for PolicyGeneration in V3 : Single Instance
3.2.4 What are the implications of this?
3.2.5 Handling Undeploy
3.2.6 Handling Disable and Enable
3.2.7 Conclusion

4. LoadBalancing and Failover of Secure IIOP Messages
5. Verify the alternate In-Memory JACC Provider works properly in Clustered Mode
6. New CLI Commands

1. Introduction

The following primary items need to be tracked in the overall clustering design spec from a security module perspective. Other critical aspects of a clustered runtime, such as Session Replication and High Availability and HTTP load balancing for secure requests, are handled by the WebTier and LB modules; see Clustering Availability. Load balancing and failover of secure IIOP messages is managed by the IIOP team, but the security module may have to restructure some of its CSIv2 handling code (more details on this below).

Dynamic Re-Configuration support in the form of --target option for Security related CLI commands
Data Synchronization
- Synchronization of Security Related Server Configuration
- Synchronization of Application Policy Files generated by the "default" JACC Provider
LoadBalancing and Failover of Secure IIOP messages
Verify the alternate In-Memory JACC Provider works properly in Clustered Mode

2. Dynamic Re-Configuration

Dynamic reconfiguration deals with how a CLI command issued by the user to the DAS (for example, asadmin create-file-user some-options --target some-target) gets reflected in the server instance (if the user-specified target is a server instance) or in all server instances that are part of the cluster (if the user-specified target is a cluster).
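
The target resolution described above can be sketched as follows. This is only an illustration under stated assumptions, not the GlassFish command replication code: the class TargetResolverSketch, its fields, and the names cluster1, i1, i2 and i3 are hypothetical.

```java
import java.util.List;
import java.util.Map;

// Hypothetical sketch of --target resolution on the DAS; not GlassFish code.
class TargetResolverSketch {
    // cluster name -> names of the server instances that belong to it
    private final Map<String, List<String>> clusters;
    // names of all known server instances (clustered or standalone)
    private final List<String> instances;

    TargetResolverSketch(Map<String, List<String>> clusters, List<String> instances) {
        this.clusters = clusters;
        this.instances = instances;
    }

    // Returns the instances on which the command must be replayed.
    List<String> resolve(String target) {
        if (clusters.containsKey(target)) {
            return clusters.get(target);      // cluster target: every member instance
        }
        if (instances.contains(target)) {
            return List.of(target);           // instance target: only that instance
        }
        throw new IllegalArgumentException("Unknown target: " + target);
    }

    public static void main(String[] args) {
        TargetResolverSketch resolver = new TargetResolverSketch(
                Map.of("cluster1", List.of("i1", "i2")),
                List.of("i1", "i2", "i3"));
        System.out.println(resolver.resolve("cluster1"));
        System.out.println(resolver.resolve("i3"));
    }
}
```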

The --target option would be supported for the following security-related CLI commands that exist in V3.0.

change-master-password, create-audit-module, delete-audit-module, list-audit-modules, create-auth-realm, delete-auth-realm, list-auth-realms, create-file-user, delete-file-user, update-file-user, list-file-users, list-file-groups, create-message-security-provider, delete-message-security-provider, list-message-security-providers, create-password-alias, delete-password-alias, update-password-alias, list-password-aliases, change-admin-password, login.

3. Data Synchronization

The need for data synchronization is generally the consequence of deploying an application or running any of the above commands in a cluster.

3.1 Synchronization of Security Related Server Configuration

The following security related configuration elements need synchronization when running inside a cluster.

1. Changes to <security-service> element in domain.xml
2. Changes to keystore/truststore
3. Changes to keyfile(s) (admin-realm keyfile and file-realm keyfile)
4. Changes to jceks files domain-passwords
5. Changes to JAAS login.conf
6. Changes to server.policy
7. Changes to security related jvm-options

- a. javax.net.ssl.keyStore
- b. javax.net.ssl.trustStore
- c. java.security.auth.login.config
- d. java.security.policy
- e. com.sun.enterprise.security.httpsOutboundKeyAlias
- f. java.security.manager

(1) will happen as a natural consequence of replicating admin commands to be executed on the server instances.

Prior to V3 there was a single JSR-115 provider and pluggability was not addressed. Since we now support pluggability of JACC providers, and two supported providers exist by default (under the security-service), there is a need for new CLI commands: create-jacc-provider, delete-jacc-provider and update-jacc-provider. Until such commands exist, any addition, deletion or update of a jacc-provider element would require manual synchronization of the security-service element or of domain.xml as a whole. We do not envision this as a very important use case though, since it is very rare that someone will change the JACC provider or add a new one.

(2) - (7) will be synchronized at server startup.

If (2) - (7) are modified as a consequence of an admin command, executing the same command on the instances should result in the same updates to the files on the instance.

If (2) - (7) are modified "manually", without going through the server, nothing is going to synchronize them. A command to force synchronization of such files might be needed, but we do not plan to have such a command for the V3.1 release, since the user can be asked to use the admin CLI command instead. However, (5) and (6) are an exception: there are no CLI commands today that allow login.conf and server.policy to be updated. This means that we may require a general file synchronization facility that can be used to propagate changes to these files.

3.1.1 Handling Changes to the Configuration

The next question when there is a configuration change is whether the server is prepared to handle changes made to these files by some other program without requiring a server restart, or whether a server instance must be restarted to pick up external changes in these files.

The Security runtime cannot handle all such updates in (1)-(7) without requiring a restart. Specifically changes to the following cannot be handled without a restart :

(1): Changes to the default jacc-provider attribute on the security-service would require a restart. The ConfigListener implemented by the security modules indicates this by returning an org.jvnet.hk2.config.NotProcessed from its changed() method. Changes to the DefaultP2RMapping status (activated/deactivated) and to the MappedPrincipalClass would also require a restart, and the ConfigListener on the instance would indicate the same.

(2): So far (V2 and V3.0) we have always required that the server be restarted whenever the keystore/truststore contents change. One can also change the keystore/truststore location via the javax.net.ssl.keyStore and javax.net.ssl.trustStore properties specified in the jvm-options; we do not handle this either, and it would require a restart. The primary reason for not handling this seems to be that there could be existing SSL connections (in the handshake phase or already connected), and re-initializing the SSLContext and its associated KeyManagers and TrustManagers (thereby changing cipher suites etc.) may break such connections.

(3): Changes to (3) are handled at runtime and do not require a server restart.

(4): Changes to (4) are handled at runtime and do not require a server restart.

(5): New additions to login.conf do not require a restart of the server. However, changes to existing definitions in (5) would require a restart. Also note that there is no CLI command to update login.conf, so a new command needs to be defined for this.

(6): Changes to (6) take effect at the next deployment of an application, when the PolicyContext of all applications is refreshed. Ideally, though, the server should be restarted, since changes to server.policy could affect the permissions granted not just to applications but also to other resources and to the server code itself. Again, there is no CLI command to update server.policy, so a new command needs to be defined for this.

(7): Changes to (7) would require a restart of the server/cluster.
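
The restart rules for items (1)-(7) can be summarized as a small lookup table. The sketch below only restates the rules of this section in code form; the class RestartRulesSketch and all of its enum names are hypothetical and do not correspond to any GlassFish type.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical summary of the restart rules in this section; not GlassFish code.
class RestartRulesSketch {
    enum Restart { NOT_REQUIRED, DEPENDS_ON_CHANGE, RECOMMENDED, REQUIRED }

    enum Artifact {
        SECURITY_SERVICE(1, Restart.DEPENDS_ON_CHANGE),   // e.g. default jacc-provider needs a restart
        KEYSTORE_TRUSTSTORE(2, Restart.REQUIRED),
        KEYFILES(3, Restart.NOT_REQUIRED),
        DOMAIN_PASSWORDS(4, Restart.NOT_REQUIRED),
        LOGIN_CONF(5, Restart.DEPENDS_ON_CHANGE),         // additions: no; edits to existing entries: yes
        SERVER_POLICY(6, Restart.RECOMMENDED),            // otherwise applied at the next deployment
        JVM_OPTIONS(7, Restart.REQUIRED);

        final int item;          // item number used in section 3.1
        final Restart restart;

        Artifact(int item, Restart restart) {
            this.item = item;
            this.restart = restart;
        }
    }

    // Lists the artifacts that fall under the given restart rule.
    static List<Artifact> withRule(Restart rule) {
        List<Artifact> result = new ArrayList<>();
        for (Artifact a : Artifact.values()) {
            if (a.restart == rule) {
                result.add(a);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        for (Artifact a : Artifact.values()) {
            System.out.println("(" + a.item + ") " + a + ": " + a.restart);
        }
    }
}
```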

3.2 Synchronization of Application Policy Files generated by the JACC Provider(s)

3.2.1 Policy Providers

We have two policy providers in GFV3. The first is the provider that has existed for a long time and that generates persistent policies on disk. The second, which is new in GFV3, is an in-memory provider that does not generate any policy files but performs policy translation every time the application is loaded on an instance.

So it is desirable that the approach we take for handling generated policies be Provider Agnostic.

3.2.2 Two Approaches being discussed

1. V2-Model : Synchronize the generated bits as well as the application bits and do a partial deployment on instances.
2. Alternate-Model : Only synchronize the application bits and regenerate artifacts as needed on instances, more of a full deployment on instances.

Apparently the current decision made for GlassFish V3.1 is to follow option (1), the V2-Model.

3.2.3 Current Code for PolicyGeneration in V3 : Single Instance

The code is Event Driven, based on events from Deployment system and the special event WebBundleDescriptor.AFTER_SERVLET_CONTEXT_INITIALIZED_EVENT.

The reason we use the event model, as opposed to having code inside the overridden generateArtifacts() method of the SecurityDeployer, is that in JavaEE 6 a ServletContextListener can inject policy once it is initialized, so the policies cannot be committed (or written to disk in the case of a persistent provider) until all the ServletContextListeners have been initialized. A sample sequence for a simple .war deployment is as follows:

1. Event: Deployment.MODULE_LOADED
- a) Process the web.xml constraints of the web module and perform policy translation for the module. The result is a PolicyConfiguration object. The state of the PolicyConfiguration is OPEN.
2. Event: Deployment.APPLICATION_LOADED
- a) Link the policies of all the web modules. This step can be thought of as a kind of consistency check, where deployment could fail if the Principal-To-Role-Mappings are found to differ across the modules.
3. Event: WebBundleDescriptor.AFTER_SERVLET_CONTEXT_INITIALIZED_EVENT
- a) If the policy of any web module has changed, redo policy translation for it and populate the PolicyConfiguration again.
- b) Generate the policy files for the web module by writing them to disk (commit the policies; the PolicyConfiguration is now in the IN_SERVICE state).
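
The three steps above can be modelled as a small state machine for one web module. This is a simplified sketch and not the SecurityDeployer code: the class PolicyLifecycleSketch, its methods, and the NONE state are hypothetical, and translation and commit are reduced to a counter and a state change.

```java
// Hypothetical model of the event sequence for one web module; not GlassFish code.
class PolicyLifecycleSketch {
    enum Event { MODULE_LOADED, APPLICATION_LOADED, AFTER_SERVLET_CONTEXT_INITIALIZED }
    enum State { NONE, OPEN, IN_SERVICE }

    private State state = State.NONE;
    private boolean linked = false;
    private boolean changedByListener = false;
    private int translations = 0;

    // Called when a ServletContextListener alters the module's security constraints.
    void markPolicyChanged() {
        changedByListener = true;
    }

    void onEvent(Event event) {
        switch (event) {
            case MODULE_LOADED:
                translate();                 // step 1a: policy translation
                state = State.OPEN;
                break;
            case APPLICATION_LOADED:
                requireOpen();
                linked = true;               // step 2a: link policies of the modules
                break;
            case AFTER_SERVLET_CONTEXT_INITIALIZED:
                requireOpen();
                if (!linked) {
                    throw new IllegalStateException("policies not linked yet");
                }
                if (changedByListener) {
                    translate();             // step 3a: redo translation
                    changedByListener = false;
                }
                state = State.IN_SERVICE;    // step 3b: commit
                break;
        }
    }

    private void requireOpen() {
        if (state != State.OPEN) {
            throw new IllegalStateException("expected OPEN but was " + state);
        }
    }

    private void translate() {
        translations++;
    }

    State state() { return state; }

    int translations() { return translations; }

    public static void main(String[] args) {
        PolicyLifecycleSketch module = new PolicyLifecycleSketch();
        module.onEvent(Event.MODULE_LOADED);
        module.onEvent(Event.APPLICATION_LOADED);
        module.markPolicyChanged();
        module.onEvent(Event.AFTER_SERVLET_CONTEXT_INITIALIZED);
        System.out.println(module.state() + " after " + module.translations() + " translations");
    }
}
```

The sketch makes the ordering constraint explicit: the commit happens only in the last step, which is why it cannot be moved into generateArtifacts().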

Given the current code it appears from a discussion with deployment team that none of these events would be raised during the partial deployment on the DAS (unless the DAS itself is a target). The three events above are raised while loading the application.

3.2.4 What are the implications of this?

The net effect seems to be that even though we will follow the V2-Model, w.r.t Security Policies the system would behave as if we have the Alternate-Model.

a.) For a fresh deployment of a new App : No policy would be generated/committed on the DAS and hence the synchronization system need not do anything.
b.) For a restart of the cluster instances : There will be no synchronization again from the DAS required since the policy was not originally written on the DAS. The instances have a way to see that policy has already been generated and so they internalize the policy.
c.) Would the performance difference between the following be significant?
- 1. The time taken to regenerate the policy on each instance.
- 2. The time taken to just synchronize the policy onto the instances from the DAS.

In particular, would (1) be slower than (2)? It appears the decision in V2 was primarily based on the assumption that (1) would be slower. One could think of a solution that does policy generation on the instances (as opposed to the DAS) only when the web/enterprise archive has ServletContextListeners that inject policy, but there is no general way to detect a priori whether a particular web/enterprise archive has such listeners.

d.) Would policy generation on the individual server instances result in a situation where the instances have different policies (possibly due to different principal-to-role mappings on the instances, or because the ServletContextListeners that executed on the instances injected different policies)?

3.2.5 Handling Undeploy

It appears the model for V3.1 is the reverse of V2. In V2, when an application is undeployed, it is first unloaded/undeployed from the instances and then undeployed from the DAS. With the new 3.1 command replication framework, the decision seems to be to undeploy from the DAS first and then from the instances.

The SecurityDeployer and EJBDeployer will use the clean() methods being called on the DAS and the instances to handle the undeployment of generated security-policies. Listening to APPLICATION_CLEANED events is probably unsuitable here since it is too late to fail an undeployment after APPLICATION_CLEANED (It appears APPLICATION_CLEANED is not being sent at all from the code).

3.2.6 Handling Disable and Enable

It appears that in V3.0.1 and the current V3.1 codebase, disabling an EJB application (unload()) does not destroy the corresponding EJBSecurityManagers. This needs to be fixed as part of the larger design for clustered V3.1.

The SecurityDeployer and EJBDeployer would listen to Deployment.MODULE_UNLOADED and MODULE_LOADED events on the instances to handle Disable and Enable of modules in a clustered application.

Alternatively, we could just depend on the unload() methods being called on the cluster instances to disable the application (destroy the security managers); the code that exists today already makes use of MODULE_LOADED events to enable the SecurityManagers for the application.
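
The enable/disable handling can be pictured as a registry that creates a security manager when a module is loaded and destroys it when the module is unloaded. The following is a minimal sketch under that assumption: SecurityManagerRegistrySketch and its methods are hypothetical, and a plain Object stands in for the real per-module security manager.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical registry of per-module security managers; not GlassFish code.
class SecurityManagerRegistrySketch {
    enum Event { MODULE_LOADED, MODULE_UNLOADED }

    // module name -> placeholder for the module's security manager
    private final Map<String, Object> managers = new ConcurrentHashMap<>();

    void onEvent(Event event, String moduleName) {
        switch (event) {
            case MODULE_LOADED:
                // Enable: create the manager once, even if the event repeats.
                managers.computeIfAbsent(moduleName, name -> new Object());
                break;
            case MODULE_UNLOADED:
                // Disable: drop the manager so that no stale state survives.
                managers.remove(moduleName);
                break;
        }
    }

    boolean hasManager(String moduleName) {
        return managers.containsKey(moduleName);
    }

    public static void main(String[] args) {
        SecurityManagerRegistrySketch registry = new SecurityManagerRegistrySketch();
        registry.onEvent(Event.MODULE_LOADED, "orders-ejb");
        System.out.println("enabled: " + registry.hasManager("orders-ejb"));
        registry.onEvent(Event.MODULE_UNLOADED, "orders-ejb");
        System.out.println("enabled: " + registry.hasManager("orders-ejb"));
    }
}
```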

3.2.7 Conclusion

Though the general model being followed in V3.1 for generated artifacts seems to be the V2-Model, it will not work for the generated security policy files of applications, for the reasons explained above. If it turns out that cluster deployment performance is an issue because of this, then we need to investigate other possibilities for V3.2.

The fact that the Security Deployer in V3 uses Events would allow it to safely generate the policies on the individual instances and the whole thing should work (despite the V2-Model for the rest of the system).

There is code in EJBDeployer where we currently do Security related Policy-Translation (not generation) during the generateArtifacts() call. This needs to be changed to follow the same model as in SecurityDeployer. The EventListener should look for MODULE_LOADED event to do policy translation for the EJB modules.

In terms of what is desirable for a cluster, it appears policies should just be available in a central repository and made highly available so that instances can get them when needed. The servlet-context-listener in JavaEE 6 does add complexity to this. Specifically, can the servlet-context-listeners be executed only once, on the DAS, if we are using the V2-Model? This seems unlikely.

4. LoadBalancing and Failover of Secure IIOP Messages

The main issue is that IIOP FOLB has never worked with CSIv2 in GlassFish V1.X/V2.X. During the Milestone Planning meetings it was indicated that, since EJBs are generally not Tier-1 objects, support for CSIv2 FOLB is no longer a release driver.

However the ORB team has planned on restructuring the ORB code in order to better integrate with CSIv2. Part of the restructuring would include moving some of the CSIv2 interfaces which currently lie in the Security Module into the ORB. More details on this task can be found in the IIOP FOLB OnePager for V3.1.

5. Verify the alternate In-Memory JACC Provider works properly in Clustered Mode

This is more of a testing and bug-fixing task where we try to enable the alternate "In-Memory" JACC Provider in clustered mode and ensure things are working fine.

6. New CLI Commands

The following new Security related CLI commands can be envisioned based on what has been described above.

1. A command to force synchronization of server.policy file modified manually on the DAS to the instances.
2. A command to force synchronization of login.conf file modified manually on the DAS to the instances.
3. Commands pertaining to <jacc-provider> element in <security-service>

- a. create-jacc-provider

 
    create-jacc-provider --policyconfigfactoryclass pc_factory_class
        --policyproviderclass pol_provider_class
        [--help]
        [--property (name=value)[:name=value]*]
        [--target target_name] jacc_provider_name
       

- b. delete-jacc-provider

 
    delete-jacc-provider
        [--help]
        [--target target_name] jacc_provider_name
       

- c. list-jacc-providers

 
    list-jacc-providers
        [--help]
        [--target target_name]
       

While (1) and (2) seem more important, support for (3) would be considered lower priority (P3) for the GFV3.1 release. We would like to finalize the new CLI command for handling (1) and (2) based on comments from reviewers, since it appears one could leverage a command with a much wider scope than just server.policy and login.conf. A possible structure is:

 
    synchronize-config
        [--help]
        config_name
       

When executed on the DAS, the command would copy the named configuration artifact from the DAS to the instances.
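
The copy step of such a command might look like the sketch below. It makes strong simplifying assumptions: the instance configuration directories are reachable as local paths (real instances may be on remote hosts), and the class SynchronizeConfigSketch and its method copyToInstances are hypothetical names, not part of asadmin or GlassFish.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.List;

// Hypothetical sketch of the copy step behind synchronize-config; not GlassFish code.
class SynchronizeConfigSketch {

    // Copies configName from the DAS config directory into every instance
    // config directory and returns the number of copies made.
    static int copyToInstances(Path dasConfigDir, List<Path> instanceConfigDirs,
                               String configName) throws IOException {
        Path source = dasConfigDir.resolve(configName);
        if (!Files.isRegularFile(source)) {
            throw new NoSuchFileException(source.toString());
        }
        int copied = 0;
        for (Path instanceDir : instanceConfigDirs) {
            Files.createDirectories(instanceDir);
            Files.copy(source, instanceDir.resolve(configName),
                    StandardCopyOption.REPLACE_EXISTING);
            copied++;
        }
        return copied;
    }

    public static void main(String[] args) throws IOException {
        // Temporary directories stand in for the DAS and instance config directories.
        Path das = Files.createTempDirectory("das-config");
        Path instance1 = Files.createTempDirectory("i1-config");
        Path instance2 = Files.createTempDirectory("i2-config");
        Files.write(das.resolve("server.policy"), "grant { };".getBytes());

        int copied = copyToInstances(das, List.of(instance1, instance2), "server.policy");
        System.out.println("copied to " + copied + " instances");
    }
}
```

As noted in section 3.1.1, copying the file is only half of the problem: for server.policy and for changes to existing login.conf entries, the instances may still need a restart before the new content takes effect.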