The discussion mainly focused on the following list of design decisions.
The replication of the Sip Session and the Sip Application Session is configured per application, based on the deployment descriptor.
The replication of the DialogueSet and DialogueFragment is configured per container, probably as part of domain.xml.
The scope of the SS/SAS/DF/DS replication is either modified-session or session. 2007-06-09: However, a well-written application should not rely on this functionality. In general, modified-session is preferred for performance reasons. Both options will be provided and tested. Should we not just pick one option only? This would also simplify the testing.
The sip-transaction frequency is analogous to the web-method frequency, i.e., replication is driven by requests (incoming or outgoing).
The scope and frequency of the HTTP and SS/SAS replication must be the same.
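For illustration, the HTTP side of this configuration already exists in GlassFish's sun-web.xml; a sketch of what the per-application scope and frequency settings look like there (the SIP-side descriptor equivalent is an assumption at this point):

    <sun-web-app>
      <session-config>
        <session-manager persistence-type="replicated">
          <manager-properties>
            <!-- web-method: replicate at the end of each request;
                 the sip-transaction frequency discussed above would
                 be the SIP-side analogue -->
            <property name="persistenceFrequency" value="web-method"/>
          </manager-properties>
          <store-properties>
            <!-- modified-session (preferred) or session -->
            <property name="persistenceScope" value="modified-session"/>
          </store-properties>
        </session-manager>
      </session-config>
    </sun-web-app>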
Applications deployed on a container for which replication is not enabled will either fail, or will revert to non-replicating mode with a warning. We should consider dynamically determining whether the DF/DS should be replicated, based on whether any of the applications associated with the chain have replication active. If none of the applications associated with a DF have replication enabled, it is a waste of resources to replicate the DF/DS. If this is optimised, then allocating and initialising the JXTA resources can be delayed until the first application with replication enabled is deployed, improving startup times; see the sketch below. EVDV: Still, I think that it might be useful to always enable replication for the container by default, or at least have a default replication configuration. Explicit configuration of the container might not be needed if replication can be fully driven by the applications on top.
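A minimal sketch of the dynamic check proposed above; all type and method names (ReplicationRegistry etc.) are assumptions, not the actual SailFin classes:

    import java.util.Set;
    import java.util.concurrent.ConcurrentHashMap;

    // Sketch: replicate a DF/DS only if at least one application in its
    // chain has replication enabled. Names are illustrative only.
    public final class ReplicationRegistry {

        private final Set<String> appsWithReplication = ConcurrentHashMap.newKeySet();

        // Called when an application is deployed. The first application
        // with replication enabled could also trigger the (delayed)
        // allocation of the JXTA resources, as proposed above.
        public void appDeployed(String appName, boolean replicationEnabled) {
            if (replicationEnabled && appsWithReplication.add(appName)
                    && appsWithReplication.size() == 1) {
                initialiseJxtaResources();
            }
        }

        public void appUndeployed(String appName) {
            appsWithReplication.remove(appName);
        }

        // Called on every replication decision for a DF/DS.
        public boolean shouldReplicate(Iterable<String> appsInChain) {
            for (String app : appsInChain) {
                if (appsWithReplication.contains(app)) {
                    return true;
                }
            }
            return false;
        }

        private void initialiseJxtaResources() {
            // placeholder for JXTA pipe/queue setup
        }
    }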
Both asynchronous and synchronous replication are supported. It must be noted that the synchronous replication does not actually do a full round-trip handshake. There is code present for a fully synchronous solution, but this has never been system tested, and it would perform worse than the current synchronous replication (it requires a response message). Before the SailFin release the replication framework will use a new version of JXTA that offers various performance improvements; the queueing will be removed from the JXTA part and only the TCP queue will remain. AP EVDV + Johan: evaluate whether the current synchronous replication is acceptable. 20070906: The fully synchronous solution might suffer from hanging threads after a crash of the buddy; the GMS notifications indicating a buddy down will not unblock these threads. However, since the assumption is that a fully synchronous solution is not needed, this is not a problem.
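To make the distinction concrete, a sketch of the two modes (the ReplicationChannel abstraction and the timeout are assumptions, not the actual SailFin/JXTA API):

    import java.util.concurrent.TimeUnit;

    // Hypothetical transport abstraction; not the real framework API.
    interface ReplicationChannel {
        void write(byte[] replica);                    // hand off towards TCP
        boolean awaitAck(long timeout, TimeUnit unit)  // wait for buddy's response
                throws InterruptedException;
    }

    final class Replicator {
        private final ReplicationChannel channel;

        Replicator(ReplicationChannel channel) {
            this.channel = channel;
        }

        // Current "synchronous" replication: returns as soon as the message
        // has been written towards the buddy; no response message needed.
        void replicateSynchronous(byte[] replica) {
            channel.write(replica);
        }

        // Fully synchronous variant (present in the code, never system
        // tested): blocks until the buddy acknowledges, costing an extra
        // message and risking a hanging thread if the buddy crashes
        // (see the 20070906 note above).
        void replicateFullySynchronous(byte[] replica) throws InterruptedException {
            channel.write(replica);
            if (!channel.awaitAck(5, TimeUnit.SECONDS)) { // timeout is illustrative
                throw new IllegalStateException("no ack from buddy");
            }
        }
    }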
After a failure we will not try to eagerly re-activate the sessions. The mean time between failures, in combination with the mean time for re-activation (driven by either a timeout on the SAS or a message on the session), will ensure that the chance of session loss at a second failure is negligible. AP EVDV + Johan: check that the assumptions the model makes about mean time between failures and re-activation are acceptable. E.g., in practice the failure distribution might not match the model's assumptions (e.g., more failures under high-load situations).
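For illustration only (the numbers are assumptions, not requirements): if the mean time between failures of an instance is 30 days and the mean time until a session is re-activated (by SAS timeout or by a message) is 15 minutes, then the fraction of sessions still un-reactivated when a second, independent failure hits is roughly 15 / (30 * 24 * 60) ≈ 0.035%. The AP above is precisely about validating such assumptions, e.g. whether failures are really independent rather than clustered under high load.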
We will monitor the timers in the replica for timer expiry. When a replica's timer expires, the replica will be activated. There will be at least one timer per SAS. This might prove a problem for the replica: after a failure of the primary instance, the replica might have to scan a million timers (300K sessions, each with 3 or more timers). In general, the number of objects is an issue. In a normal situation (no @SAK or encoded URI, only one application involved in the call) there will be four objects per sip session: one SS, one SAS, one DF and one DS (although the latter two might be modelled as one replicated object). When stating 300K sessions per instance, is this assuming all of these are present? AP EVDV + Johan: make clear how many of each object should be present in the performance requirements of 300K sessions. It has to be investigated where the replica is activated on the timeout. AP EVDV + Joel: check if activating the timer on the correct instance is possible. If it is not possible to do the re-activation on the correct instance, then it could still be beneficial to spread the re-activations over the cluster: the chance of being activated on the 'right' instance is the same, but at least the load will not be concentrated on the replica of the failed instance. 20070906: The indications are that it would indeed be possible to activate on the correct instance. The BEKey is already stored as part of the SAS, and the LBManager, which would implement the consistent hashing algorithm (see the sketch below), is accessible from the SipSessionManager. The same mechanism as the reverse repair could be used to trigger the loading of the session on the correct instance; it would merely be one session id in the list instead of multiple session ids. The main question is how the replica gets the BEKey. Maybe it should then also be stored in deserialised form? Or we make this a two-step approach: first activate the replica on the buddy, then peek inside, then trigger the migration to the correct instance?
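A sketch of the consistent-hash lookup the LBManager would provide to map a BEKey to the instance that should re-activate an expired replica (all names and the hash scheme are assumptions; how the replica obtains the BEKey is exactly the open question above):

    import java.util.SortedMap;
    import java.util.TreeMap;

    // Sketch of a consistent-hash ring. Names are illustrative only.
    final class ConsistentHashRing {

        private final SortedMap<Integer, String> ring = new TreeMap<>();

        void addInstance(String instanceName) {
            // a real implementation would add multiple virtual nodes per instance
            ring.put(hash(instanceName), instanceName);
        }

        void removeInstance(String instanceName) {
            ring.remove(hash(instanceName));
        }

        // The instance on which the session with this BEKey should be activated.
        String instanceFor(String beKey) {
            if (ring.isEmpty()) {
                throw new IllegalStateException("no instances in the ring");
            }
            SortedMap<Integer, String> tail = ring.tailMap(hash(beKey));
            return tail.isEmpty() ? ring.get(ring.firstKey()) : tail.get(tail.firstKey());
        }

        private static int hash(String key) {
            return key.hashCode() & 0x7fffffff; // non-negative; a real ring would use a stronger hash
        }
    }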
For the timer-based activation the replica must be able to check whether the re-activated copy is the most recent version in the cluster (since there is no version number in the timeout). For this a peek functionality is needed, similar to the current load mechanism, but reporting only the version instead of the complete replica.
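The peek could be a stripped-down sibling of the existing load request; a hypothetical interface sketch (the real framework API may differ):

    // Hypothetical store interface; names are assumptions.
    interface ReplicaStore {

        // Existing behaviour: broadcast, return the full replica.
        byte[] load(String sessionId);

        // Proposed peek: same broadcast, but each instance answers only
        // with the version number it holds, so the re-activating instance
        // can tell whether its timer-expired copy is the most recent.
        long peekVersion(String sessionId);
    }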
Migration of active sessions is needed in a UC-based LB strategy (and in other situations). To support this there must be a variant of the load request that removes the active version. It is recommended that all replica versions also be removed, but only once the re-activation or migration has completed. We have to be careful with making a destructive load; the code must take care to issue the load request only once. EVDV: I still do not completely get this. If the load request is done twice, then the second one will not result in a broadcast, since the correct version is already present. If both load requests occur on the same instance from different threads, then I see a potential problem similar to the locking problem. If the requests come from different instances, then we are back to the locking problem. I am not sure that solving the locking problem would not also automatically solve this.
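One way to make the destructive load single-shot on a given instance (a sketch only; it does not address the cross-instance race, which is the locking problem mentioned above):

    import java.util.Set;
    import java.util.concurrent.ConcurrentHashMap;

    // Sketch: guard so the destructive (removing) load is issued at most
    // once per session id from this instance. Names are illustrative.
    final class DestructiveLoadGuard {

        private final Set<String> inFlight = ConcurrentHashMap.newKeySet();

        // Returns true for exactly one caller per session id; that caller
        // performs the destructive load, all others fall back to a plain load.
        boolean tryAcquire(String sessionId) {
            return inFlight.add(sessionId);
        }

        void release(String sessionId) {
            inFlight.remove(sessionId);
        }
    }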
SipSessionsUtil.getApplicationSession(id) will only return the version of the SAS that is available on the local instance; if the SAS is not present on the local instance it will return null. Also covering the migration case, at a loss of performance, is a nice-to-have.
Shutdown will be handled similarly to failure: we will not actively re-activate sessions. There might be a quiescing, probably only at the message level. This means that the LB will not quiesce, but the operator will still wait for a while until all the traffic in the instance has terminated. There is no support for basing the actual shutdown on the presence of traffic; this is purely a human decision, typically based on time instead of on counters.
After a restart or recovery a repair-under-load and a reverse-repair are done. Reverse repair needs information about the origin of the data.
We could optimise the latter by only loading from the buddy, but this would expose us to the risk of multiple active copies (e.g., in race conditions).
The objects (SS/SAS/DS/DF and possibly timers) are replicated separately.
As stated above, the replicas can become inconsistent. Each object (or rather, artifact) also contains a version number. However, these version numbers cannot really be used for checking the consistency between artifacts, since versioned references between objects would lead to every object in a tree being updated for every replication of a part. This means that, in addition to the chance of getting inconsistent replicas, these inconsistent replicas cannot be detected either.
The version numbers are used in the HTTP situation to check for staleness of the active session and to request the correct version of the session in case the active session is stale. In SIP there is no comparable cookie mechanism that can be used to transport the latest version of the session. In SIP the CSeq can be used for similar purposes, but it has a lot of limitations: there may be gaps in the CSeq number range, so it can only be used as a hint of a potentially stale session rather than a guarantee of a stale session. This also opens up the possibility of a DoS attack: a malicious or badly written client that skips CSeq numbers will cause unnecessary load requests.
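A sketch of using CSeq as a staleness hint only (the threshold logic is an assumption; note that a CSeq-skipping client triggers exactly the unnecessary load described above):

    // Sketch: treat a CSeq jump as a *hint* of staleness, never as proof.
    // Gaps in the CSeq range are legal, so this can only trigger a check.
    final class StalenessHint {

        // cseqInRequest: CSeq of the incoming request;
        // cseqSeenLocally: highest CSeq seen by the local active copy.
        static boolean possiblyStale(long cseqInRequest, long cseqSeenLocally) {
            // A higher CSeq than we have seen suggests another instance may
            // have processed requests we missed; worth a (possibly wasted)
            // peek or load request.
            return cseqInRequest > cseqSeenLocally + 1;
        }
    }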
During upgrade the replication is suspended. The latter was an assumption made by me (Erik), which has to be validated: does this actually occur in a normal failure situation? In addition to lazy re-activation there might also be a lazy re-replication mechanism at work, whereas the FSD assumed an active re-replication mechanism (repair-under-load) for both the failure and the restore/restart case. AP Jan: check with Larry whether we do lazy re-replication or repair-under-load.
The servlet context will not be replicated. The JSR 289 proposed way to do this is to allow the application to add additional keys to a SAS (called linkSession now in SailFin, but likely something like addKey() in the final version of JSR 289). Allowing the application to look up a SAS by its secondary key might complicate the design, since the replication framework has to be aware of the secondary keys. Possibly the secondary index can be created on the fly?
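The on-the-fly secondary index could be as simple as a concurrent map kept next to the primary one (a sketch; the linkSession/addKey semantics are as described above, all other names are assumptions):

    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.ConcurrentMap;

    // Sketch of an on-the-fly secondary index for SAS lookup by the keys an
    // application adds via linkSession()/addKey(). Names are illustrative.
    final class SecondaryIndex {

        // secondary key -> primary SAS id
        private final ConcurrentMap<String, String> index = new ConcurrentHashMap<>();

        // The replication framework must see this call, so the index can
        // also be rebuilt on a replica after fail-over.
        void addKey(String secondaryKey, String sasId) {
            index.put(secondaryKey, sasId);
        }

        String lookup(String secondaryKey) {
            return index.get(secondaryKey);
        }

        void removeKeysFor(String sasId) {
            index.values().removeIf(sasId::equals);
        }
    }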
There are two ways of ensuring that the number of sessions and the related memory consumption are kept in check: the overload protection mechanism from EAS 4.1, and a setting on the active cache that controls the maximum number of active sessions. Together with an estimated session size, the latter should give some protection as well.
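A sketch of the second mechanism, the active-cache cap (the rejection behaviour and names are assumptions; max sessions times estimated session size gives the rough memory bound):

    import java.util.concurrent.atomic.AtomicInteger;

    // Sketch: cap on the number of active sessions as a memory guard.
    final class ActiveSessionCap {

        private final int maxActiveSessions;
        private final AtomicInteger active = new AtomicInteger();

        ActiveSessionCap(int maxActiveSessions) {
            this.maxActiveSessions = maxActiveSessions;
        }

        // Called before creating/activating a session; false means the
        // request should be rejected by the overload protection.
        boolean tryActivate() {
            while (true) {
                int current = active.get();
                if (current >= maxActiveSessions) {
                    return false;
                }
                if (active.compareAndSet(current, current + 1)) {
                    return true;
                }
            }
        }

        void deactivated() {
            active.decrementAndGet();
        }
    }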
Discussion still ongoing; the proposal is to reject requests for migrating a locked active session.
There was a proposal from /// to do a reconciliation. This would remove all the duplicates and stale replicas from the cluster after a merge.
Just as replication is based on single artifacts, so will migration and re-activation be.
The LB will not do any quiescing before shutdown; it will handle shutdown similarly to failure. However, the instance is not shut down until it has had the time to handle the ongoing requests. This could be based on some counter of the number of ongoing requests, but it will not be automated.
We will use a dual/redundant Ethernet setup.