GlassFish Server Open Source Edition 3.1 - In-memory Replication One Pager
1. Introduction

1.1. Project/Component Working Name:
In-memory Replication in GlassFish Server Open Source Edition 3.1

1.2. Name(s) and e-mail address of Document Author(s)/Supplier:
Mahesh Kannan

1.3. Date of This Document:
May 15 2010

2. Project Summary

2.1. Project Description: In-memory Replication in GlassFish Server Open Source Edition 3.1
GlassFish V2 relied on the session state persistence feature to provide high availability of session state (HTTP and EJB) by enabling containers to persist HTTP or EJB session state to a persistent 'store' that can survive GlassFish instance failures. In order to interact with a variety of physical stores (like file system, in-memory, HADB, etc.), GlassFish V2 defined an SPI called the BackingStore SPI. The SPI defined a set of classes that a store provider must implement and a set of APIs for the containers to use. This allowed containers to persist session state to a variety of BackingStores without worrying about the actual physical store. One such BackingStore was in-memory replication, which replicated session state to another GlassFish instance so that the state remains available even in the presence of an instance failure.

In V3.1, the replication module has been redesigned to provide several improvements over the V2 implementation. First, it is an OSGi module just like many other modules in V3. Second, it internally uses a consistent hash algorithm to pick a replica instance within a cluster of instances. This allows the replication module to easily locate the replica (or find the replicated data) when the containers need to retrieve the data. The replication module allows any Serializable POJO to be replicated (and retrieved) in a cluster of GlassFish instances. This means that the module can be used to provide availability of any POJO, not just HTTP sessions and Stateful EJBs. Basically, the replication layer provides a set of APIs to (a) save, (b) retrieve, (c) update and (d) delete data (some POJO). The save method is implemented by replicating the data to another instance in the cluster, chosen using the consistent hash algorithm. The retrieve operation is implemented by locating the replica and then pulling the data (POJO) from it. The rest of the document describes the various details of the in-memory replication module.

2.2. Risks and Assumptions:
For V3.1, we will continue to use the BackingStore SPI that was approved for the Sailfin release. The SPI allows our containers to save state (HTTP Sessions / Stateful EJBs) to a variety of physical stores (like database, file system, in-memory replication, etc.). It is assumed that GMS will provide the APIs for the following:
1. APIs for sending and receiving serialized data (byte[]) to a specific target instance or to all members in the cluster.
2. APIs for receiving events about the health of members in the cluster. More specifically, the HA module relies on the existence of the JOIN_AND_READY, MEMBER_FAILED, CLUSTER_STOP and CLUSTER_START events. The JOIN_AND_READY event can additionally carry a sub-event (REJOIN) to indicate that the member has rejoined the group after leaving it for a brief period of time. This sub-event is typically generated when an instance is restarted (and joins the cluster) before GMS could detect the failure. More details can be found in the GMS one pager.
3. It is assumed that, for this release, we cannot guarantee availability of session state in the event of multiple (more than one) server failures. This is because the replication module replicates the data to only one replica instance and hence may not be able to recover state in the presence of more than one server failure.
4. Metro HA (particularly reliable messaging) requires larger payloads. While the exact payload sizes are application dependent, it is assumed that the GMS messaging layer can support these higher payload sizes.
5. For better performance, it is expected that the web container sets a cookie called 'replicaLocation' in the response. This cookie tells where the replica resides. It is assumed that the GlassFish LB plugin will - after a failure - route the request to the node indicated by the cookie.
6. The 'replicaLocation' cookie approach could be taken by the EJB container as well, but this is NOT planned for this release (as it requires more work in the ORB, request/response interceptors, etc.).

3. Problem Summary

3.1. Problem Area:

3.2. Justification: Why is it important to do this project?
GlassFish 2.x ensured HTTP and EJB session state availability by providing an in-memory replication module. GlassFish 3.0 did not have the necessary clustering infrastructure to provide this HA feature. Since clustering support will be available in GlassFish 3.1, we need a lightweight, open source implementation of the in-memory replication module.

4. Technical Description:
Similar to V2.x, the general approach is to replicate session state from each instance in a cluster to a back-up / replica instance. The replication layer itself will use the GMS communication APIs for transporting the data. It is assumed that GMS will use the appropriate transport APIs (like Grizzly APIs) to send the data. The current plan is to leverage GMS for cluster group membership services, including initial bootstrapping/startup and the various cluster shape changes (instances being stopped and restarted, instances failing, new instances being added to the cluster, etc.). It is assumed that the ReplicaSelector will register itself with GMS to know the current 'alive' members so that it can pick an 'alive' replica. The plan is to have as close to zero configuration as possible. Availability configuration will continue to work as it has for previous HA-enabled releases. The existing persistence-type ("replicated") will continue to be supported. This will allow QA and performance tests to run as they do with V2.x.

4.1. Details:
For the rest of the document we will use the following notations: n1, n2, n3, ... refer to GlassFish instances (nodes) in the cluster, and s1, s2, s3, ... refer to sessions (session keys) routed by the load balancer.
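For illustration, here is a minimal sketch of the 'alive member' bookkeeping that the ReplicaSelector needs in order to always pick a live replica, driven by the group events listed in section 2.2. The class, method and event names below are illustrative assumptions only and do not reflect the actual GMS or Shoal APIs:

    import java.util.Set;
    import java.util.concurrent.ConcurrentSkipListSet;

    // Hypothetical, simplified view of the GMS notifications described in section 2.2.
    // The real GMS callback API differs; this only illustrates the bookkeeping the
    // replication module needs so that it can always pick an 'alive' replica.
    class AliveMemberTracker {

        enum EventType { JOIN_AND_READY, MEMBER_FAILED, CLUSTER_STOP }

        private final Set<String> aliveMembers = new ConcurrentSkipListSet<String>();

        // Invoked (in this sketch) for each group event delivered by GMS.
        void onGroupEvent(EventType type, String memberName) {
            switch (type) {
                case JOIN_AND_READY:
                    // Covers both a fresh join and the REJOIN sub-event after a quick restart.
                    aliveMembers.add(memberName);
                    break;
                case MEMBER_FAILED:
                    aliveMembers.remove(memberName);
                    break;
                case CLUSTER_STOP:
                    aliveMembers.clear();
                    break;
            }
        }

        // A ReplicaSelector implementation would consult this snapshot when picking a replica.
        Set<String> currentAliveMembers() {
            return aliveMembers;
        }
    }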
Note: We will continue to use the same set of classes and interfaces that ASARCH approved for the Sailfin release.

4.1.1 Replication in V2
Let us assume that the load balancer routes sessions s1, s2 and s3 to node n1. In V2, n1 would have replicated s1, s2 and s3 to n2, because V2 followed a 'buddy' replication scheme wherein an instance replicates all of its data to its neighboring instance. Now, let's say that n1 fails and the load balancer routes s1 to n2, and s2 and s3 to n3. The container(s) in both instances will ask the BackingStore to 'load' the data. The load request for s1 can be satisfied 'locally' by n2, since n1 originally replicated s1 to n2. However, the load requests for s2 and s3 cannot be satisfied 'locally' by n3; it has to send a 'broadcast' query to the entire cluster to retrieve the data from n2. We have seen that this approach leads to severe performance problems because, under heavy load, (a) there will be a lot of these broadcast requests and (b) the instance that has the replica also has to serve extra data to the requesting instances.

V2.1 used a set of classes like Metadata, SimpleMetadata, CompositeMetadata and BatchMetadata to replicate / persist data into an instance of BackingStore. This meant that the BackingStore implementations needed to understand these objects to correctly store / replicate data. It also made it difficult to support a new container, because adding a container typically meant new artifacts that had to be understood by each BackingStore implementation.

4.1.2 Replication in V3
Again, assume that the load balancer routes sessions s1, s2 and s3 to node n1. In V3, n1 might replicate s1 to n2, and s2 and s3 to n3, because it uses a consistent hash algorithm to map a key to an (available) instance. The hash is called consistent because the function yields the same (consistent) instance name when computed from any instance in the cluster. In V3, instead of using a consistent hash function directly, the replication module defines an interface called ReplicaSelector. The inputs to the ReplicaSelector are the key and the group name, and the ReplicaSelector returns the name of an instance (that is alive within the specified group) to which the data must be replicated. Each backup instance stores the replicated data in memory. Project Shoal will provide the in-memory replication implementation: a class called DataStore that supports all methods of the GlassFish BackingStore interface. DataStore will use GMS under the covers to join a group and to replicate data (using the GMS send API). The reasons for doing this in project Shoal are:
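To make the ReplicaSelector contract described in 4.1.2 concrete, here is a minimal sketch of a hash-based selector. The interface shape (method name and parameters) is an assumption for illustration; a production consistent hash (for example, a hash ring) would also minimize re-mapping when membership changes, which the simple modulo mapping below does not attempt:

    import java.util.SortedSet;
    import java.util.TreeSet;

    // Hypothetical shape of the ReplicaSelector contract described in 4.1.2:
    // given a key and a group name, return the name of an alive instance that
    // should hold the replica. The method name and signature are assumptions.
    interface ReplicaSelector {
        String selectReplica(String key, String groupName);
    }

    // Every instance computes the same key-to-replica mapping as long as it sees
    // the same set of alive members (the 'consistent' property described above).
    class SimpleHashReplicaSelector implements ReplicaSelector {

        private final AliveMemberTracker tracker;   // from the sketch earlier in section 4.1
        private final String localInstanceName;

        SimpleHashReplicaSelector(AliveMemberTracker tracker, String localInstanceName) {
            this.tracker = tracker;
            this.localInstanceName = localInstanceName;
        }

        public String selectReplica(String key, String groupName) {
            // A real implementation would scope membership by groupName; the sketch
            // assumes the tracker already represents the requested group.
            SortedSet<String> members = new TreeSet<String>(tracker.currentAliveMembers());
            members.remove(localInstanceName);       // never replicate to ourselves
            if (members.isEmpty()) {
                return null;                         // single-instance cluster: no replica possible
            }
            int index = (key.hashCode() & 0x7fffffff) % members.size();
            int i = 0;
            for (String member : members) {
                if (i++ == index) {
                    return member;
                }
            }
            return members.first();                  // not reached
        }
    }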
The main interfaces used by the containers are (a) BackingStoreFactory, (b) BackingStore and (c) BatchBackingStore. BackingStoreFactory is used to create the other two stores. BatchBackingStore is used only by the EJB container. BackingStore will be used by Web, EJB and Metro.

In Sailfin, we took a different approach: the BackingStore SPI was extended by defining four new annotations, the most important being the @StoreEntry and @Attribute annotations. This allows containers to define any POJO that contains the data to be persisted. The POJOs are annotated with these annotations, which lets the containers describe each attribute of the POJO. Here is an example:

    @StoreEntry
    public class PersistentState {

        @Attribute(name="id")
        public String getId() {...}

        @Attribute(name="version")
        public long getVersion() {...}

        @Attribute(name="state")
        public byte[] getState() {...}

        @Attribute(name="data1")
        public String getData1() {...}
    }

Here we have defined a simple POJO called PersistentState and annotated the getter methods with the @Attribute annotation. Let's say that a container wants to store the above in a BackingStore. The container first calls BackingStoreFactory.createBackingStore() and passes PersistentState.class as one of the parameters. This allows the BackingStore implementation class to use the reflection APIs to examine the "Attributes" of this POJO. A database implementation might use these "Attributes" to create a table whose field names are the attribute names. Currently, only "Attributes" whose types are primitives, primitive wrappers, String and byte[] can be annotated with @Attribute. Once the POJO is defined, it must be run through an annotation processor that is defined in the ha-apt module. This will create a couple of classes:
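As a purely hypothetical illustration of the kind of code the ha-apt annotation processor generates (described further below), a dirty-tracking subclass might look roughly like this. The base class here is a stand-in with explicit setters, since the PersistentState example above elides them, and the annotations are omitted for brevity; the real generated code will differ:

    import java.util.HashSet;
    import java.util.Set;

    // Stand-in for an annotated POJO with getters and setters.
    class ExampleState {
        private String id;
        private long version;

        public String getId() { return id; }
        public void setId(String id) { this.id = id; }

        public long getVersion() { return version; }
        public void setVersion(long version) { this.version = version; }
    }

    // What a generated subclass might do: override each setter and remember
    // which attributes have been modified since the last save.
    class ExampleState_Sub extends ExampleState {

        private final Set<String> dirtyAttributes = new HashSet<String>();

        @Override
        public void setId(String id) {
            super.setId(id);
            dirtyAttributes.add("id");        // remember that 'id' was modified
        }

        @Override
        public void setVersion(long version) {
            super.setVersion(version);
            dirtyAttributes.add("version");
        }

        // A database-backed BackingStore could limit its UPDATE to these columns
        // instead of rewriting the whole row.
        public Set<String> getDirtyAttributes() {
            return dirtyAttributes;
        }
    }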
In V3.1, we will let the containers use any POJO that is annotated with @StoreEntry and @Attribute. In fact, it will be easy for the containers to annotate the V2 artifacts (like Metadata, WebMetadata, SimpleMetadata, SFSBBeanState, etc.). These generated sub-class POJOs are used in BackingStore operations. The generated sub class X_Sub (or PersistentState_Sub) contains generated methods that allow dirty detection: if a setter is called on, say, attribute 'y', the generated sub class remembers that attribute 'y' is dirty. This allows, say, a database implementation to perform updates efficiently.

4.1.3 Similarities between V2 and V3.1 replication
4.1.4 Assumptions about how BackingStore will be used
4.1.5 Note for BackingStore providers
Ensure that the implementation class is annotated with @Service, for example:

    @Service(name="replication")
    public class ReplicatedBackingStoreFactory {
        .......
    }

Also, to lazily load the actual BackingStoreFactory, write a separate OSGi module (V3 module) that implements the V3 Startup interface and registers a proxy BackingStoreFactory for the supported persistence type, as follows:

    public class ProxyForReplicationBackingStoreFactory implements Startup, BackingStoreFactory {

        public Startup.Lifecycle getLifecycle() {
            BackingStoreFactoryRegistry.register("replication", this);
            return Startup.Lifecycle.SERVER;
        }

        //Other BackingStoreFactory methods...
    }

4.1.6 Note for Containers
In V3.1, an instance of BackingStoreFactory can be obtained as follows:

4.2. Bug/RFE Number(s):

4.3. In Scope:
This project will deal with replication of HTTP Sessions, Stateful Session EJBs and some Metro artifacts.

4.4. Out of Scope:
4.5. Interfaces:
Note: We will continue to use the same set of classes and interfaces that ASARCH approved for the Sailfin release.

BackingStoreFactory

    /**
     * A factory for creating BackingStore(s). Every provider must provide an
     * implementation of this interface.
     *
     * The <code>createBackingStore(env)</code> method is called typically during
     * container creation time.
     *
     * The <code>createBatchBackingStore()</code> method will be called whenever
     * the container decides to save a set of states that belong to different
     * applications.
     */
    @Contract
    public interface BackingStoreFactory {

        /**
         * This method is called to create a BackingStore that will store objects
         * of type vClazz and keys of type kClazz.
         */
        public <K, V> BackingStore<K, V> createBackingStore(String storeName,
                Class<K> keyClazz, Class<V> vClazz, Properties env)
            throws BackingStoreException;

        /**
         * This method is called to store a set of objects atomically.
         */
        public BatchBackingStore createBatchBackingStore(Properties env)
            throws BackingStoreException;
    }

BackingStore

    /**
     * An object that stores objects (of type V) against an id (of type K). This class
     * defines the set of operations that a container could perform on a store.
     *
     * The store implementation must be thread safe.
     */
    public abstract class BackingStore<K, V> {

        protected String getStoreName() {...}

        protected void initialize(String storeName, Class<K> keyClazz, Class<V> vClazz,
                Properties env) {...}

        public abstract V load(K key, long version) throws BackingStoreException;

        public abstract void save(K key, V value, boolean isNew) throws BackingStoreException;

        public abstract void remove(K key) throws BackingStoreException;

        public abstract void updateTimestamp(K key, long time) throws BackingStoreException;

        public abstract int removeExpired(long idleForMillis) throws BackingStoreException;

        public abstract int size() throws BackingStoreException;

        //Typically called during appserver shutdown
        public void close() throws BackingStoreException { }

        //Typically called during app undeployment.
        public abstract void destroy() throws BackingStoreException;
    }

@StoreEntry

    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.TYPE)
    public @interface StoreEntry {
        String value() default "";
    }

@Attribute

    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.METHOD)
    public @interface Attribute {
        String value() default "";
    }

@Version

    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.TYPE)
    public @interface Version {
        String value() default "";
    }

4.5.3 Deprecated/Removed Interfaces:
4.6. Doc Impact:
Any asadmin commands that set various attributes of availability-service need to change (to not use any HADB configuration).

4.7. Admin/Config Impact:
Admin GUI pages that performed availability-service configuration need to change (to not use any HADB configuration). Identify changes to GUIs, CLI, agents, plugins...

4.8. HA Impact:
Depends on GMS for communication APIs.

4.9. I18N/L10N Impact:
Does this proposal impact internationalization or localization?

4.10. Packaging, Delivery & Upgrade:

4.10.1. Packaging
The BackingStore SPI classes will be available as an OSGi module and will be available in the web profile as well. The module will also contain a no-op BackingStore implementation.

4.10.2. Delivery
There will be three jars delivered to support the in-memory replication feature.
4.10.3. Upgrade and Migration:
4.11. Security Impact:
N/A

4.12. Compatibility Impact

4.12.1 Changes to BackingStore interface
We have added the following three methods to the BackingStore. These methods were added for Sailfin to support the query feature, but none of the GlassFish 3.1 containers will use these APIs.

    public Collection findByCriteria(Criteria<V> criteria, StoreEntryEvaluator<K, V> eval) {
        return (Collection) Collections.EMPTY_LIST;
    }

    public void removeByCriteria(Criteria<V> criteria, StoreEntryEvaluator<K, V> eval)
            throws BackingStoreException {
    }

    public Collection synchronizeKeys(Criteria<V> criteria, StoreEntryEvaluator<K, V> eval,
            boolean eagerFetch) {
        return Collections.EMPTY_LIST;
    }

4.12.2 Changes to availability-service element in domain.xml
Listed below are the relevant elements and attributes that were used to configure availability-service.

    <!ELEMENT availability-service (web-container-availability?, ejb-container-availability?, jms-availability?, property*)>

    <!ATTLIST availability-service
        availability-enabled %boolean; "true"
        ha-agent-hosts CDATA #IMPLIED
        ha-agent-port CDATA #IMPLIED
        ha-agent-password CDATA #IMPLIED
        ha-store-name CDATA #IMPLIED
        auto-manage-ha-store %boolean; "false"
        store-pool-name CDATA #IMPLIED
        ha-store-healthcheck-enabled %boolean; "false"
        ha-store-healthcheck-interval-in-seconds CDATA "5">

    <!ATTLIST web-container-availability
        availability-enabled %boolean; #IMPLIED
        persistence-type CDATA "memory"
        persistence-frequency %session-save-frequency; #IMPLIED
        persistence-scope %session-save-scope; #IMPLIED
        persistence-store-health-check-enabled %boolean; "false"
        sso-failover-enabled %boolean; "false"
        http-session-store-pool-name CDATA #IMPLIED>

    <!ATTLIST ejb-container-availability
        availability-enabled %boolean; #IMPLIED
        sfsb-ha-persistence-type CDATA "ha"
        sfsb-persistence-type CDATA "file"
        sfsb-checkpoint-enabled %boolean; #IMPLIED
        sfsb-quick-checkpoint-enabled %boolean; #IMPLIED
        sfsb-store-pool-name CDATA #IMPLIED>

The current plan is to retain the same V2 availability-service element. The HADB-related attributes will not be used by V3. The plan is to mark the corresponding config attributes as @Deprecated.

4.12.3 Impact on Upgrade tool
Since clustering was not in GlassFish v3, there is no upgrade from v3 to v3.1. Upgrade of GlassFish v2 clustering applications to GlassFish v3.1 is also NOT needed, since we will retain the same availability-service elements from the v2 domain.xml.

4.13. Dependencies:
The replication module depends heavily on the GMS module for the following:
In terms of maven dependencies,
4.14. Testing Impact:
In V2, GlassFish used buddy replication to replicate data. Some of the SIFT tests might have been coded for this specific style of replication. For example, after sending requests to instance1, the tests may expect some log messages to appear in instance2's server.log. These tests need to change. One approach is to rely on the 'replicaLocation' cookie (which tells the replica location).

5. Reference Documents:
HA BackingStore classes (fisheye)
Metro One Pager

6. Schedule:
6.1. Projected Availability: