3.1 Startup Performance Analysis

Issues

Regressions from 3.0:

Issue Description Status
12379 5% regression in startup time because of autostart/ RESOLVED
12381 Bundles in modules/autostart and autodeploy/bundles can cause performance regression RESOLVED
12454 WebContainerStarter is taking much longer to start RESOLVED - duplicate
12477 WebContainerStarter looks at too many configs when determining whether to start RESOLVED
12574 deployment command brings in Cluster Admin module when there is no cluster RESOLVED
13076 Cluster SSH Provisioning is resolved after startup RESOLVED
13277 Monitoring bootstrap start-up performance problem RESOLVED
13278 security module start-up performance problem RESOLVED
13279 javax.xml.rpc.handler resolution at startup taken from metro rather than module that is already loaded RESOLVED
13626
(GRIZZLY-893)
admin-thread-pool size causes startup pipes RESOLVED in Grizzly,
integrated into GF on 10/15
13782 3.1 deployment performance - ASURLClassLoader.appendURL, urlSet.contains() call takes more time than v3.0 RESOLVED
13783 3.1 deployment performance - web container initialization RESOLVED - 12/16
13784 3.1 deployment performance - loading/starting components in web container RESOLVED - 1/3
13785 3.1 deployment performance - accessing web application takes longer RESOLVED - 11/18 - no specific changes, but no big regression here either
13786 Cluster Admin module becomes active after deploy without cluster RESOLVED
13789 3.1 deployment performance - thread/threadpool usage for the annotation scanning RESOLVED
13830 3.1 deployment performance - accessing web application takes longer RESOLVED - 11/18 - duplicate of 14010
13995 Performance in iterating org.jvnet.hk2.component.Habitat$5 RESOLVED
14010 3.1 deployment performance - AdminAdapter.service RESOLVED
14011 3.1 deployment performance - security check in WebPipeline RESOLVED - 12/12 - no changes
14116 performance for startup - WebModuleListener.configureJsp RESOLVED
14161 asadmin perf regression RESOLVED
14351 time to start domain the 2nd time is too large
RESOLVED - 11/8
14503 set felix.cache.singlebundlefile=true
RESOLVED - 11/8
14782 3.1 deployment performance - security container initialization
RESOLVED - 12/14 - no changes, due to slower class loading
14992 Slower classloading with Felix 3.0.6 RESOLVED - 12/30

Compensations for regressions elsewhere:

Issue Description Status
13612 javadb (derby) accessed during initial startup - shouldn't be RESOLVED
FELIX-2619 Bundle cache is rechecking non-existing files again and again RESOLVED in Felix, integrated into GF on 10/8
FELIX-2626 Bundle cache is rewriting some files when restarting bundles RESOLVED in Felix, integrated into GF on 10/8
??? Reduce number of Felix cache files by combining data into single file Implemented in Felix 3.0.5, but need to set felix.cache.singlebundlefile=true in the felix config file.
13813 Weld brings in JSF implementation INVESTIGATED, but INVALID
14007 move tools.jar from launcher class loader to application class loader RESOLVED as WONTFIX - EJB needs tools.jar where it is
14000 Consider disabling implicit boot delegation feature of Felix  
15560 JSP compiler javac classpath causes too many JARs to be opened
RESOLVED

Diagnostic issues:

Issue Description Status
13169 Add diagnostic information to explain why a module is being resolved  
13170 Add log messages for module resolution time  

Useful Links

Startup Regression

Here is an analysis of how the DAS module startup has changed from 3.0 to the 3.1 trunk on 9/14. This ignores changes to the versions of modules and looks only at changes in the initial state of modules. The data was obtained by running "asadmin start-domain" for both releases, and comparing the output of the "ps" command after telneting to port 6666.

1. Modules that went from Installed in 3.0 to Active or Resolved in 3.1

   Connectors Inbound Support  :  Resolved
   Admin Utilities classes  :  Active
   JMS Module  :  Active                                      OK
   Appserver Core Bootstraping Classes  :  Resolved           OK
   GlassFish Web container Embedded APIs  :  Resolved

2. Modules that went from Resolved in 3.0 to Active in 3.1

 
       

3. New modules in 3.1 that are Active or Resolved

libpam4j repackaged as a module  :  Resolved
   javax.resource API v.1.6  :  Resolved
   osgi.core  :  Resolved
   osgi.cmpn  :  Resolved
   GlassFish High Availability APIs and SPI  :  Resolved
   shoal-gms-api  :  Resolved                                  (EXPECTED - brought in by GMS Bootstrap)
   GMS Bootstrap Module  :  Active                             (EXPECTED - lazy loading of GMS)
   grizzly-websockets  :  Resolved
   Mojarra JSF Implementation 2.1.0-SNAPSHOT  :  Resolved      (NO CHANGE - new version of package)
   Mojarra JSF API Implementation 2.1.0-SNAPSHOT  :  Resolved  (NO CHANGE - new version of package)
   grizzly-lzma  :  Resolved
   Cluster SSH Provisioning  :  Resolved                       (UNEXPECTED!! - issue 13076)
   jersey-core  :  Resolved
   GlassFish SSL Implementation Module  :  Active
   trilead-ssh2 repackaged as a module  :  Resolved            (UNEXPECTED!! - issue 13076)
   Validation API  :  Resolved
   Base module for providing support for OSGi in Java EE applications  :  Resolved
   jersey-client  :  Resolved
   GlassFish ha-shoal-cache-bootstrap  :  Active
   OSGi resource locator bundle - used by various API providers that rely on META-INF/services mechanism to locate providers.  :  Active
   Appserver Logging Classes  :  Active
   Apache Felix EventAdmin  :  Active

4. New modules in 3.1 that are Installed

GlassFish ORB interface layer implementation
   Backup Restore classes
   Cluster Admin CLI
   Cluster Admin
   Web Container HA code
   GlassFish ha-shoal-store
   Load-Balancer admin
   GMS Module
   Data mapper for Jackson JSON processor
   shoal-cache
   AMX Combining
   Jackson JSON processor
   Admin Console Clustering Support Plugin
   shoal-gms-impl
   Cluster Common
   XML Compatibility extensions for Jackson data binding
   GlassFish Metro Glue Code
   Jersey server library repackaged as OSGi bundle for GlassFish
   jersey-json
   GlassFish ha-file-store
   JAX-RS provider for Json content type, using Jackson data binding

5. Modules from 3.0 that no longer exist in 3.1

org.jvnet.tiger-types repackaged as module  :  Resolved
   Admin Console Jruby Plugin  :  Installed
   Hibernate validator library and Validation API  :  Active
   Upgrade Tool  :  Installed
   javax.resource API v.1.6-alpha  :  Resolved
   AMX V3 Core implementation  :  Installed
   modules/grizzly-jruby.jar  :   Installed
   AMX V3 Config implementation  :  Installed
   Mojarra JSF Implementation 2.0.2-FCS  :  Resolved
   Mojarra JSF API Implementation 2.0.2-FCS  :  Resolved
   AMX V3 Core  :  Installed
   grizzly-cometd  :  Installed
   JRuby container and deployer  :  Installed
   AMX V3 extensions implementation  :  Installed
   AMX V3 Config  :  Installed
   Jackson library repackaged as OSGi bundle for GlassFish  :  Resolved
   Jersey bundle repackaged as OSGi module for GlassFish  :  Resolved
   CLI Framework  :  Installed
   AMX V3 Java EE Management  :  Installed
   jsr311-api  :  Resolved
   GlassFish ORB conector implementation  :  Active
   Administration Console JRuby Help Content  :  Installed
   JRuby sniffer for Glassfish v3  :  Active

6. Modules that were Resolved or Active in 3.0 but are Installed in 3.1

JSF implementation connector module  :  Installed
   Web Container glue code  :  Resolved
   GlassFish Core EJB container implementation  :  Installed
   GlassFish ORB connector implementation  :  Resolved
   WebServices Security and JSR 196 SOAP Profile Implementation  :  Installed
   Web module command line interface  :  Resolved
   Deployment Related Admin Classes  :  Installed
   WebTier Security Integration  :  Resolved
   Admin Console JTS Plugin  :  Installed
   JSR-109 implementation to deploy Metro  :  Resolved
   Deployment Related JavaEE Core Classes  :  Resolved
   JSP caching taglib connector module  :  Installed
   Core Servlet Container  :  Resolved
   Administration Console Java Transaction Service Help Content  :  Installed
   Web Container Common Utilities  :  Resolved
   Apache Felix Declarative Services  :  Installed
   JSTL implementation connector module  :  Installed
   JavaServer Pages  :  Installed
   Deployment Object Library  :  Resolved

With groups (1) and (3), we have 23 modules that are being Resolved in 3.1 during startup that were not being resolved in 3.0. I don't have a measure of the time that this takes, but this is likely leading to at least some of the startup performance regression.

Performance Data from the Logs

To enable gathering of information about what bundles are loaded at startup time, and other performance information, the following setting in the logging.properties file is useful:

javax.enterprise.system.core.level=FINE

After starting and stopping the benchmark domain, the server.log file has the following key message that indicates the different in startup time:

From 3.0:

[#|2010-06-24T08:06:58.475-0700|INFO|glassfishv3.0|javax.enterprise.system.core.com.sun.enterprise.v3.server|_ThreadID=11;_ThreadName=Thread-1;|GlassFish v3 (74.2) startup time : Felix(8513ms) startup services(2089ms) total(10602ms)|#]

From 3.1b05:

[#|2010-06-24T08:41:42.694-0700|INFO|glassfish3.1|javax.enterprise.system.core.com.sun.enterprise.v3.server|_ThreadID=14;_ThreadName=Thread-1;|GlassFish3.1-b05 (5) startup time : Felix(9980ms) startup services(10939ms) total(20919ms)|#]

This is the message that comes out right before the table of services.

Startup Breakdown

The startup time can be divided based on the following events:

  • asadmin start
  • GFLauncherLogger message, "Successfully launched in n msec"
  • first log message from DAS, "Running GlassFish Version..."
  • completion of services startup

The timing for these intervals between these events are:

version launch load service startup
3.0 ? 11 sec 1.3 sec
3.1b05 ? 12 sec 8.8 sec

The time between the asadmin start and the first log message cannot be determined because there is no reference for when asadmin actually starts.

There isn't any information in the server.log file that provides more details about the load time.

The service start time can be broken down by looking at the start times for each service which are printed in the logs. 3.1 has one more service, GmsAdapterService, that is started.

service 3.0 start time (msec) 3.1 start time (msec)
GrizzlyService 1143 2210
RestService 1 442
GlassFishNamingBuilder 3 5
ORBConnectorStartup 3 3
LifecycleModuleService 8 9
WebContainerStarter 25 142
ApplicationLoaderService 90 126
GmsAdapterService n/a 17
LogManagerService n/a 265
InstanceStateService n/a 20
LocalPasswordImpl n/a 12
IdmService n/a 8
Globals n/a 1
ShoalBackingStoreProxy n/a 14
ReplicationStoreProxy2 n/a 3
     

The WebContainerStarter takes significantly longer in 3.1 and deserves closer inspection. This was resolved with issue 12454.

There are several differences in the startup between 3.0 and 3.1.

  1. In 3.0, GrizzlyService is initialized before WebContainerStartup. In 3.1, WebContainerStartup is first.
  2. In 3.0, the WebContainerStartup service doesn't generate any log messages. However, in 3.1, it generates messages for the Grizzly framework startup (the same messages that are generated for the GrizzlyService in 3.0), and the HTTP listeners and the virtual servers are created. These latter operations take significant time and account for most of the 8+ seconds.
  3. In 3.1, there are 5 Grizzly Framework threads created, while in 3.0 there are 2. The message for these is "Starting Grizzly Framework...".

Deployment-Time Regression

Here is a comparison of the modules state after a start-domain, a deploy, and an access to the web application for the trunk on 7/22:

1. Modules that went from Installed to Active/Resolved:

Connectors Inbound Support  :  Active
   Admin Utilities classes  :  Resolved
   JMS Module  :  Active
   Appserver Core Bootstraping Classes  :  Active
   GlassFish Web container Embedded APIs  :  Resolved

2. New modules that are Active or Resolved:

libpam4j repackaged as a module  :  Resolvedlibpam4j repackaged as a module  :  Resolved
   Versioning Classes  :  Active
   GlassFish High Availability APIs and SPI  :  Resolved
   GMS Bootstrap Module  :  Active
   Apache Felix Bundle Repository  :  Active
   grizzly-websockets  :  Resolved
   grizzly-lzma  :  Resolved
   jersey-core  :  Resolved
   shoal-gms-impl  :  Resolved
   Base module for providing support for OSGi in Java EE applications  :  Active
   Jersey server library repackaged as OSGi bundle for GlassFish  

3. New modules that are Installed:

shoal-backing-store
   Backup Restore classes
   Cluster Admin CLI
   Cluster Admin
   Web Container HA code
   shoal-gms-api
   Load-Balancer admin
   GMS Module
   Cluster SSH Provisioning
   shoal-cache
   Jackson JSON processor
   trilead-ssh2 repackaged as a module
   Admin Console Clustering Support Plugin
   Cluster Common
   jersey-client
   jersey-json
   GlassFish ha-file-store

4. Old modules that are no longer there:

org.jvnet.tiger-types repackaged as module
   Upgrade Tool
   Jackson library repackaged as OSGi bundle for GlassFish
   CLI Framework
   jsr311-api
Here's what we know about why these modules have a different state than from 3.0.  We focus first on those that are Active.

  • "Cluster Admin" - This module is becoming active because the PostDeployCommand injects a ClusterExecutor even if the application is not being deploy to a cluster. Issue 12574 has been filed for this.
  • "JMS Module" - The JmsProviderLifecycle class was modified to implement the PostStartup interface so that it is automatically activated by a start-domain. This happened in revision 35125 as part of issue 11311 . Since this is a PostStartup activity, it is not clear that activation of the module has any impact on the startup benchmark (because of parallelism). It is also not clear how much time this module takes to load. 
  • "Connectors Inbound Support" - TBD
  • "Appserver Core Bootstrapping Classes" - In 3.1, the bootstrapping classes were made into a service that is started by hk2; see revision 35609. So this code was loaded and activated before in 3.0, it's just that hk2 now reports it as activated. 
  • "Versioning Classes" - These are the deployment versioning classes in deployment-versioning.jar. This is part of the new application versioning feature in 3.1.  This is active because the deploy command injects the VersionService that is defined by this module. I don't know how much time this takes to load.
  • "GlassFish GMS Bootstrap Module" - This is a new module that was added in 3.1 for clustering support.  It looks at whether any clusters are configured, and if not, does nothing else. Based on the timing above, it appears that this module adds about 50ms to the startup time. 
  • "Apache Felix Bundle Repository" - This is new module that is made active because it is in the "modules/autostart" directory.  It adds the OSGi Bundle Repository functionality to GlassFish. 
  • "Base module for providing support for OSGi in Java EE application" - this is made active because it is in the "modules/autostart" directory.  This is a new experimental feature in 3.1 that provide the ability to have hybrid applications (OSGi modules and Java EE features). See OSGi one-pager.  This and the previous item add about 8% to the server startup time.  Waiting for the Felix 3.1.0 FileInstall release to fix this (issue 12381 ).

Performance Improvement Ideas

Since the features that have been added for 3.1 will result in at least some startup regression, the only way to achieve the goal of no regression overall for the release is to find ways to improve the performance of the existing system.  This section lists ideas and observations that might lead to some performance improvement to make up for the additional code brought in for new 3.1 features. 

JSF Loading

The JSF modules are resolved during startup as a result of a dependency from the Weld module. The startup benchmark does not involve the use of JSF at all. So if we could prevent JSF from being resolved, then this might save some startup time. A simple test of removing the JSF and Weld modules and the osgi-web-container.jar module that brings in Weld resulted in the domain not starting up at all. NOTE: JSF was resolved as part of 3.0 too, so this idea is not trying to correct a regression.  Rather it is looking for an opportunity to make up for other regressions.

Truss Output Analysis

The truss command on Solaris provides a list of all the system calls that are called during the run of a program.  By starting and stopping the DAS under truss, we can look for patterns in the system call output that indicate inefficiencies. Here is a list of observations for the truss output.

  1. redundant stat64 calls:   On a single run, there are 7806 stat64 calls. 4122 of them do not find the file they are looking for. And there are only 3249 unique files that are checked. So this means that more than half of the stat64 calls are redundant. For example, there are 5 stat64 calls on the "refresh.counter" file for every OSGi bundle. That's 1195 calls, all of which do not find the file. There are 1099 calls looking for bundle.state files, 1059 of which do not find the file. Since we have only 239 bundles, we must be calling stat64 for the bundle.state file at least 4 times for each bundle. A problem was found in the Felix code such that these redundant calls can be eliminated. See FELIX-2619 (Richard)
  2. revision.location write: The revision.location file is being written for almost every bundle (238 times). The content being written to the revision.location file is the same as what is read in just prior to this, so the write is redundant. The Felix code is being modified to eliminate this. See FELIX-2626 (Richard)
  3. derby JAR file access: Many JAR files from the derby installation are being accessed (mainly via stat64) during initial startup, before the OSGi bundles are read in.  See issue 13612. (Tom investigating)
  4. double stat64 calls on JARs: On almost every JAR file that is read in, there are double stat64 calls, like this:
    /2:      0.0007  0.0001 stat64("/home/trm/test/glassfishv3/glassfish/modules/hk2-core.jar", 0xD025CB90) = 0
    /2:      0.0005  0.0001 stat64("/home/trm/test/glassfishv3/glassfish/modules/hk2-core.jar", 0xD025CB50) = 0
    /2:      0.0005  0.0000 brk(0x08256FF0)                                 = 0
    This appears to be something in the JDK class loader.
  5. double stat64 calls on JARs (2nd case): On some JARs, stat64 is called 2 times in a row, like this.  Since this is a derby JAR, it might be related to item 3.
    /2:      0.0005  0.0001 lwp_cond_signal(0x08192248)                     = 0
    /2:      0.0013  0.0001 stat64("/home/trm/test/glassfishv3/glassfish/config/../../javadb/lib/derbyLocale_ko_KR.jar", 0xD025C590) = 0
    /2:      0.0006  0.0001 stat64("/home/trm/test/glassfishv3/glassfish/config/../../javadb/lib/derbyLocale_ko_KR.jar", 0xD025C550) = 0
    /2:      0.0007  0.0002 open64("/home/trm/test/glassfishv3/glassfish/config/../../javadb/lib/derbyLocale_ko_KR.jar", O_RDONLY) = 22
    /2:      0.0005  0.0000 fstat64(22, 0xD025C590)                         = 0
    /2:      0.0003  0.0000 fcntl(22, F_GETFD, 0x082CA938)                  = 0
    /2:      0.0004  0.0001 fcntl(22, F_SETFD, 0x00000001)                  = 0
    This appears to be something in the JDK class loader.
  6. pipe calls: The admin-thread-pool is somehow related to the number of pipe calls that are made by the process. With the default value of 50, there are 55 pipe calls, consuming 110 file descriptors.  This seems to be excessive.  See issue 13626 . Alexey has provided a fix to try that will limit the number of pipes created to the number of threads. Testing in progress.
  7. OSGi bundle information file reads: There are 4 small files for each bundle in the OSGi module cache and all for of these are read for every bundle during a start-domain.  If these could be reduced, either by having one file per bundle or one file for the entire cache, then maybe there could be some improvement in startup time.  Richard provided a quick hack to the felix.jar file that creates one file for each bundle.  This saved 9.5% on the start-domain time and 13% on the Felix start time that is shown in the log. Looks promising.