GlassFish Wiki : MS5 Demo - Automatic Transaction Failover with EJB Timer Migration

Automatic Transaction Failover with EJB Timer Migration Demo

This demo shows automatic transaction failover combined with the timer migration if an instance that owns EJB Timers in a cluster is killed.

The demo uses ejb devtest

v2/appserv-tests/devtests/transaction/ee/timer_with_autorecovery

to step through various stages

Team

Marina Vatkina

Setup and Cluster creation

Compile the classes and create the WAR file to be deployed:

ant build

Setup:

Start JavaDB database
Start DAS.
Create an XA resource that can be shared by all instances in the cluster (_jdbc/xa_)
Register this resource as the _ejb-timer-service.timer-datasource_ with the _default-config_.
Create and start the cluster of 2 instances

ant setup

Running the test

ant run-recover

Programmatic timer is created on _in1_ in a separate method call to get it going

You can see in the instance log _glassfishv3/glassfish/nodes/localhost/in1/logs/server.log_ the following output:

 
        [#|2010-09-29T16:18:16.371-0700|INFO|glassfish3.1|javax.enterprise.system.tools.admin.org.glassfish.deployment.admin|_ThreadID=15;_ThreadName=Thread-1;|tx-ee-timer-with-autorecovery-web was successfully deployed in 12,952 milliseconds.|#]

[#|2010-09-29T16:18:48.907-0700|SEVERE|glassfish3.1|null|_ThreadID=15;_ThreadName=Thread-1;|In ___MyBean:timeout___ timer01 28080|#]

After that 3 rows are inserted on in1 into the default and xa databases:

 
        [#|2010-09-29T16:18:49.993-0700|SEVERE|glassfish3.1|null|_ThreadID=15;_ThreadName=Thread-1;|Call # 1|#]

[#|2010-09-29T16:18:49.996-0700|SEVERE|glassfish3.1|null|_ThreadID=15;_ThreadName=Thread-1;|Call # 2|#]

[#|2010-09-29T16:18:49.998-0700|SEVERE|glassfish3.1|null|_ThreadID=15;_ThreadName=Thread-1;|Call # 3|#]

[#|2010-09-29T16:18:50.001-0700|SEVERE|glassfish3.1|null|_ThreadID=15;_ThreadName=Thread-1;|Insert successfully|#]

[#|2010-09-29T16:18:50.023-0700|SEVERE|glassfish3.1|null|_ThreadID=15;_ThreadName=Thread-1;|Call # 1|#]

[#|2010-09-29T16:18:50.025-0700|SEVERE|glassfish3.1|null|_ThreadID=15;_ThreadName=Thread-1;|Call # 2|#]

[#|2010-09-29T16:18:50.025-0700|SEVERE|glassfish3.1|null|_ThreadID=15;_ThreadName=Thread-1;|Call # 3|#]

[#|2010-09-29T16:18:50.026-0700|SEVERE|glassfish3.1|null|_ThreadID=15;_ThreadName=Thread-1;|Insert successfully|#]

... But the transaction is left waiting between 2 phases of the two-phase commit (2PC):

 
        [#|2010-09-29T16:18:50.029-0700|WARNING|glassfish3.1|javax.enterprise.system.core.transaction.com.sun.jts.utils.RecoveryHooks|_ThreadID=15;_ThreadName=Thread-1;|JTS5057: FailPoint : [0]|#]

[#|2010-09-29T16:18:50.033-0700|WARNING|glassfish3.1|javax.enterprise.system.core.transaction.com.sun.jts.utils.RecoveryHooks|_ThreadID=15;_ThreadName=Thread-1;|JTS5057: FailPoint : [null]|#]

[#|2010-09-29T16:18:50.040-0700|WARNING|glassfish3.1|javax.enterprise.system.core.transaction.com.sun.jts.utils.RecoveryHooks|_ThreadID=15;_ThreadName=Thread-1;|JTS5057: FailPoint : [1]|#]

[#|2010-09-29T16:18:50.043-0700|WARNING|glassfish3.1|javax.enterprise.system.core.transaction.com.sun.jts.utils.RecoveryHooks|_ThreadID=15;_ThreadName=Thread-1;|JTS5057: FailPoint : [null]|#]

[#|2010-09-29T16:18:50.044-0700|WARNING|glassfish3.1|javax.enterprise.system.core.transaction.com.sun.jts.utils.RecoveryHooks|_ThreadID=15;_ThreadName=Thread-1;|JTS5057: FailPoint : [2]|#]

[#|2010-09-29T16:18:53.745-0700|SEVERE|glassfish3.1|null|_ThreadID=15;_ThreadName=Thread-1;|In ___MyBean:timeout___ timer01 28080|#]

At this point _in1_ is killed

Instance _in2_ receives GMS failure recovery notification after a short period of time (see instance log _glassfishv3/glassfish/nodes/localhost/in2/logs/server.log_)

 
        [#|2010-09-29T16:19:15.966-0700|INFO|glassfish3.1|ShoalLogger|_ThreadID=15;_ThreadName=Thread-1;|GMS1016: Analyzing new membership snapshot received as part of event : FAILURE_EVENT for Member: in1 of Group: c1|#]

[#|2010-09-29T16:19:15.966-0700|INFO|glassfish3.1|ShoalLogger|_ThreadID=15;_ThreadName=Thread-1;|GMS1019: The following member has failed: in1 of Group: c1|#]

[#|2010-09-29T16:19:15.968-0700|INFO|glassfish3.1|ShoalLogger|_ThreadID=15;_ThreadName=Thread-1;|GMS1102: Appointed Recovery Server:in2:for failed member:in1:for group:c1|#]

[#|2010-09-29T16:19:15.980-0700|INFO|glassfish3.1|ShoalLogger|_ThreadID=15;_ThreadName=Thread-1;|GMS1004: Sending FailureRecoveryNotification to component TRANSACTION-RECOVERY-SERVICE|#]

[#|2010-09-29T16:19:15.983-0700|INFO|glassfish3.1|ShoalLogger|_ThreadID=15;_ThreadName=Thread-1;|GMS1004: Sending FailureRecoveryNotification to component GlassfishV31|#]

[#|2010-09-29T16:19:15.983-0700|INFO|glassfish3.1|javax.enterprise.system.core.transaction.com.sun.jts.jta|_ThreadID=15;_ThreadName=Thread-1;|[GMSCallBack] failure recovery signal: com.sun.enterprise.ee.cms.impl.common.FailureRecoverySignalImpl@64d74895|#]

The Transaction Service waits 60 seconds (or the user set value) to ensure that indoubt xids are updated into the database, otherwise while doing the recovery an instance may not get all the correct indoubt xids.

The recovery process starts (see messages from _javax.enterprise.system.core.transaction.com.sun.enterprise.transaction.jts.recovery_) and when it is executed, you can see

 
        [#|2010-09-29T16:20:16.017-0700|INFO|glassfish3.1|javax.enterprise.resource.resourceadapter.com.sun.enterprise.connectors.inbound|_ThreadID=15;_ThreadName=Thread-1;|Recovery of Inbound Transactions started.|#] 
       

When resource recovery completes, automatic EJB Timer migration starts as the after event:

 
        [#|2010-09-29T16:20:16.463-0700|INFO|glassfish3.1|javax.enterprise.system.container.ejb.com.sun.ejb.containers|_ThreadID=15;_ThreadName=Thread-1;|[DistributedEJBTimerServiceImpl] afterRecovery event for instance in1|#]

[#|2010-09-29T16:20:16.463-0700|INFO|glassfish3.1|javax.enterprise.system.container.ejb.com.sun.ejb.containers|_ThreadID=15;_ThreadName=Thread-1;|[DistributedEJBTimerServiceImpl] migrating timers from in1|#]

[#|2010-09-29T16:20:16.463-0700|INFO|glassfish3.1|javax.enterprise.system.container.ejb.com.sun.ejb.containers|_ThreadID=15;_ThreadName=Thread-1;|Beginning timer migration process from owner in1 to in2|#]

[#|2010-09-29T16:20:16.699-0700|INFO|glassfish3.1|javax.enterprise.system.container.ejb.com.sun.ejb.containers|_ThreadID=15;_ThreadName=Thread-1;|Timer migration phase 1 complete. Changed ownership of 1 timers.  Now reactivating timers...|#]

[#|2010-09-29T16:20:16.701-0700|INFO|glassfish3.1|javax.enterprise.system.container.ejb.com.sun.ejb.containers|_ThreadID=15;_ThreadName=Thread-1;|Rescheduling missed expiration for periodic timer '1@@1285802327613@@in1@@in1' 'TimedObject = MyBean' 'Application = tx-ee-timer-with-autorecovery-web' 'CREATED' 'PERIODIC' 'Container ID = 84266337573208064' 'Wed Sep 29 16:18:48 PDT 2010' '5000' .  Last timer expiration occurred at Wed Sep 29 16:19:08 PDT 2010|#]

Delegated recovery removes the locks, and the EJB Timers are being fired:

 
        [#|2010-09-29T16:20:17.735-0700|SEVERE|glassfish3.1|null|_ThreadID=15;_ThreadName=Thread-1;|In ___MyBean:timeout___ timer01 28080|#]

[#|2010-09-29T16:20:18.773-0700|SEVERE|glassfish3.1|null|_ThreadID=15;_ThreadName=Thread-1;|In ___MyBean:timeout___ timer01 28080|#]

Validation code queries the table on the XA resource and finds 3 rows:

 
        [#|2010-09-29T16:20:35.743-0700|INFO|glassfish3.1|null|_ThreadID=15;_ThreadName=Thread-1;|Found: BAA280800 : BBB280800|#]

[#|2010-09-29T16:20:35.744-0700|INFO|glassfish3.1|null|_ThreadID=15;_ThreadName=Thread-1;|Found: BAA280801 : BBB280801|#]

[#|2010-09-29T16:20:35.744-0700|INFO|glassfish3.1|null|_ThreadID=15;_ThreadName=Thread-1;|Found: BAA280802 : BBB280802|#]

Cleanup

To undeploy the application, stop and remove the cluster do:

ant unsetup

This is the end of the demo and the test.