Feedback on Rolling Upgrade Design

Erik vd V: The document focuses on Rolling Upgrade with SSR. A mechanism is presented that ensures session retention for SIP sessions. But somewhere it should also be described what mechanisms are in place for retention of other data during rolling upgrade. E.g., how is EJB stateful session bean retention ensured? Is this mechanism also based on the backup-restore-reconciliate principle, or on eager replication and eager re-activation? Also, the HTTP sessions are not really described in the document.

Stoffe's comments for review 1/9 2008

Title
EVDV: done

If the backup is optional, then when it is not used how much more is it than a normal shutdown?
EVDV: PA5 already states when it could be wise not to use backup: in case an application has a small replica cache with a high session modification rate, the approach will not work optimally, since almost all the restored data will have become obsolete by the time the instance is restored. By making the backup optional, and the reconciliation resistant against partial backups, there is an option to cater for installations with these kinds of applications as well. When the sequence is integrated in the shutdown, there would be no difference at all between a normal shutdown and a non-backup shutdown.

There is in general too little distinction between UDP and TCP, since both react differently to the scenarios.
EVDV: I suppose this mainly applies to section 2.2.2. It is stated that for outgoing requests, due to the fact that the outgoing connection cannot be established, a 5xx error is returned. This is regardless of TCP or UDP. Is this not what you mean? This will stop retransmissions, I hope.

One thing I cannot understand is why it is bad to have a transition state where all connections are still up but only the listeners are closed. Existing responses would still arrive, completing transactions, while if the current instance generated a response then there is also an already established TCP connection to the other node. That one could be reused!
EVDV: The CLB team did not want to introduce an asymmetric implementation where there would be response stickiness, but no associated request stickiness and no FE stickiness. This is actually what was my preferred alternative, but since the (limited) BE quiescing was not needed to meet the requirements, I let it go. A new request that arrives on an existing connection should be dropped or rejected with a Retry-After, but in most cases this is an error case or a connection reuse scenario.

2.1.2.1
EVDV: The problem is that the 408 is internally generated. There will be a different view by the client and by the server. The client can think the transaction succeeded, so the client and the BE will have a different view on the same transaction, and on the same dialog/session. The client will expect any changes resulting from this response to have been applied.

2) It says "not yet persisted transaction" (but the transactions are never persisted anyhow!?)
EVDV: Whoops. I modified the text. It now talks about a not yet completed transaction and mentions that replication only happens on completed transactions.

Also, where is the problem? The node that is shutting down will not receive the request, but that is expected.
EVDV: Let's take the ACK. The ACK will be routed according to the rehashed BE, where it is dropped (no 481 is returned on an ACK). The client will not notice this. It will consider the ACK transmitted.
However, since the server originally handling the INVITE did not see the ACK, it will retransmit the INVITE (if it is still capable of doing so...).

3) What happens to the response is very dependent on what is put in the Via header...
EVDV: But the Via header now always includes the FE that was used to route the request, unless a co-located BE handled it. This is because we decided to always route the response via the FE that handled the request, to enable connection reuse in the TCP case (and for SSL).

2.1.3 Why is there interaction with VIP? (This should be done under the hood by the VIP implementation when closing the listener.)
EVDV: What you want to do is close existing external connections, but keep internal connections open and, optionally, allow new internal connections to be established. This requires actively closing the external connections (difficult). It also requires some mechanism to avoid new external connections while allowing internal connection establishment. This could be done in several ways: different listeners for internal and external connections, which was too difficult to implement according to the CLB; or not closing the port, but ensuring that the IP-sprayer does not route any more external traffic to the instance. This is where the VIP communication comes in. The Barracuda has a SOAP API to achieve this. We could achieve something similar with VIP.

2.2.2 Is the session leakage not fixed with the timer on every SAS?

2.6 The connection closing is not so good for TLS. The idle closing is already in the GrizzlyNetworkManager.
EVDV: I do not really understand this. Sure, we will lose transactions when closing the TLS connections. But this is an acceptable loss.

Sreeram's comments
EVDV: will add an abbreviations and concepts section
EVDV: Well, the question is who automates this. According to MMAS, it is acceptable if the building blocks are available and the automation is done in the SAF context.
EVDV: Good point. I left out the roll-back capability completely. The idea was that there are several ways to roll back. One is a smooth roll-back, essentially another rolling upgrade, but back to the old version. Another would be a complete reboot (with loss of all sessions). Note that such a backup and roll-back solution is larger than just the SSR; it would also involve things like DB backup etc.
EVDV: There is a requirement on session loss (2.3.7.1). There is a mention of transaction loss as well (2.3.7.2). Furthermore, it is stated that, from a requirements perspective, an upgrade will be modeled as a shutdown/restart. It is mentioned that for each instance failure there is an acceptable transaction loss of 1/n-th of all ongoing transactions. For a complete rolling upgrade this adds up to n times 1/n-th, i.e., roughly the equivalent of one full set of ongoing transactions spread over the upgrade (see the illustration below). Should this be made more clear?
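As a back-of-the-envelope illustration (the concrete numbers below are assumed, not taken from the requirements), one way to read the 1/n-per-instance budget over a full rolling upgrade is:

    % L_total: transactions lost over the whole rolling upgrade
    % T_i: transactions in flight when instance i is restarted, n: number of instances
    \[
      L_{\mathrm{total}} = \sum_{i=1}^{n} \frac{T_i}{n}
      \approx n \cdot \frac{T}{n} = T
      \qquad (\text{assuming roughly constant load } T_i \approx T)
    \]

So with n = 10 instances, each restart may lose up to one tenth of the transactions in flight at that moment, and the upgrade as a whole loses roughly the equivalent of one such in-flight set, spread over the whole upgrade window.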
EVDV: We do not rely on lazy replication mechanisms to establish the invariant that there are two copies of every data item. However, we stop there. We do not actively start migrating sessions back to establish a situation where everything is on its home instance. One of the reasons for this is that eager migration has a chance of failing due to locks that are obtained. The lazy variant of migration uses the Retry-After to place the retry problem at the client; an eager migration variant would have to take responsibility for these retries itself. And, again, eager migration is not needed for robustness; the only thing that is important is that every data item has two copies.

In Section 2.2, Step 10: Reconciliation, your statement seems to suggest that my expectations are met. Please clarify.
EVDV: Except for eager migration, they are. I'll clarify the text.
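To illustrate the lazy variant mentioned above (in the actual design this is handled by the container/CLB rather than by application code), here is a minimal sketch against the SIP servlet API of pushing the retry to the client with a 503 and Retry-After; the sessionIsLocallyAvailable() check is a hypothetical placeholder:

    import java.io.IOException;

    import javax.servlet.ServletException;
    import javax.servlet.sip.SipServlet;
    import javax.servlet.sip.SipServletRequest;
    import javax.servlet.sip.SipServletResponse;

    public class LazyMigrationSketch extends SipServlet {

        @Override
        protected void doRequest(SipServletRequest req) throws ServletException, IOException {
            // Hypothetical check: can the session this request belongs to be
            // activated on this instance right now (e.g. no lock held elsewhere)?
            if (!sessionIsLocallyAvailable(req)) {
                // Lazy variant: make the client retry instead of migrating eagerly.
                SipServletResponse resp =
                        req.createResponse(SipServletResponse.SC_SERVICE_UNAVAILABLE);
                resp.setHeader("Retry-After", "2");
                resp.send();
                return;
            }
            super.doRequest(req); // normal dispatch to doInvite(), doBye(), ...
        }

        private boolean sessionIsLocallyAvailable(SipServletRequest req) {
            return true; // placeholder for the container/SSR-specific activation check
        }
    }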
EVDV: ??
EVDV: Yes, I will add some suggestions. The main one is to use the SAS timer. Also, if there are problems with the established RTP sessions, this might be a hint to the app.
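A minimal sketch of the SAS timer suggestion, assuming the SIP servlet TimerService; what the application actually does in the timeout is application-specific and only hinted at here:

    import java.io.IOException;

    import javax.servlet.ServletException;
    import javax.servlet.sip.ServletTimer;
    import javax.servlet.sip.SipApplicationSession;
    import javax.servlet.sip.SipServlet;
    import javax.servlet.sip.SipServletRequest;
    import javax.servlet.sip.TimerListener;
    import javax.servlet.sip.TimerService;

    // Note: the TimerListener must also be declared as a listener in sip.xml.
    public class SasWatchdogSketch extends SipServlet implements TimerListener {

        @Override
        protected void doInvite(SipServletRequest req) throws ServletException, IOException {
            SipApplicationSession sas = req.getApplicationSession();
            TimerService timers = (TimerService) getServletContext()
                    .getAttribute("javax.servlet.sip.TimerService");
            // Arm a per-SAS timer; if the dialog goes silent (e.g. an ACK or BYE was
            // lost around an upgrade), the timeout lets the app clean up the session
            // instead of leaking it.
            timers.createTimer(sas, 30 * 60 * 1000L, false, "sas-watchdog");
            // ... normal INVITE handling continues here ...
        }

        public void timeout(ServletTimer timer) {
            SipApplicationSession sas = timer.getApplicationSession();
            // Application-specific liveness check (e.g. is RTP still flowing?) goes
            // here; as a last resort the session can simply be invalidated.
            sas.invalidate();
        }
    }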
EVDV: We do not. Drain is for things on the wire, in the TCP buffers, etc. There is no way of monitoring all of that. I think it will be a time-based drain.
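A minimal sketch of such a time-based drain, using only standard Java; stopListeners and continueShutdown are hypothetical hooks standing in for the container's actual quiescing steps:

    import java.util.concurrent.Executors;
    import java.util.concurrent.ScheduledExecutorService;
    import java.util.concurrent.TimeUnit;

    public final class TimedDrainSketch {

        private final ScheduledExecutorService scheduler =
                Executors.newSingleThreadScheduledExecutor();

        // Close the listeners immediately, then give traffic that is already on the
        // wire or in TCP buffers a fixed grace period before shutdown continues.
        public void drainAndShutdown(Runnable stopListeners,
                                     Runnable continueShutdown,
                                     long drainSeconds) {
            stopListeners.run();
            scheduler.schedule(continueShutdown, drainSeconds, TimeUnit.SECONDS);
        }
    }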
EVDV: We made the backup phase optional. If the app expects that the majority of the data will have been outdated by the time the instance is restored, it can rely on the reconciliation. It is like a diff between a full and an empty file, in that case. The design is not really optimised for this, since the eager replication and re-activation done during reconciliation are not bulk, but on a per-item basis. The eager re-activation is even a broadcast, a number of acks, another broadcast and a save, so not inexpensive.
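To make the cost difference concrete, here is a rough sketch of the two paths; ReplicaCache and GroupChannel are hypothetical placeholder interfaces used only to illustrate the per-item broadcast/ack/broadcast/save sequence described above, not the actual SSR API:

    import java.util.Map;

    // Hypothetical interfaces, for illustration only.
    interface GroupChannel {
        void broadcast(String message, String itemId);
        void awaitAcks(String itemId);
    }

    interface ReplicaCache {
        Map<String, byte[]> items();
        void save(String itemId, byte[] data);
    }

    final class ReconciliationCostSketch {

        // Bulk path: one sequential pass over the backup, cheap per item.
        static void restoreFromBackup(ReplicaCache cache, Map<String, byte[]> backup) {
            for (Map.Entry<String, byte[]> e : backup.entrySet()) {
                cache.save(e.getKey(), e.getValue());
            }
        }

        // Per-item path: broadcast, wait for acks, second broadcast, then save,
        // repeated for every single item.
        static void reconcilePerItem(ReplicaCache cache, GroupChannel channel) {
            for (Map.Entry<String, byte[]> e : cache.items().entrySet()) {
                channel.broadcast("claim-ownership", e.getKey());
                channel.awaitAcks(e.getKey());
                channel.broadcast("ownership-taken", e.getKey());
                cache.save(e.getKey(), e.getValue());
            }
        }
    }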
EVDV: The numbering of your doc must be different. 2.4 is not empty in my copy.
EVDV: Yes. But only when configured to do so. This should only be for some time after restart, or at any time the operator sees or expects unbalanced connections. This is not what I would consider the final solution to this problem, but a sufficient one...

Binod's comments
EVDV: this would be, e.g., something that is already in the TCP buffer on the BE or still on the wire.
EVDV: It says the backup is to a file. The file can be on a ram-disk. In a normal application upgrade or AS upgrade scenario, this should be very fast.
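A minimal sketch of such a file backup using plain Java serialization; the target file is configurable and could point at a ram-disk mount, and the actual SSR backup format is an assumption here:

    import java.io.File;
    import java.io.FileInputStream;
    import java.io.FileOutputStream;
    import java.io.IOException;
    import java.io.ObjectInputStream;
    import java.io.ObjectOutputStream;
    import java.io.Serializable;
    import java.util.HashMap;
    import java.util.Map;

    public final class ReplicaBackupSketch {

        // Writes the replica cache contents to a single file, e.g. on a ram-disk.
        static void backup(Map<String, Serializable> replicaCache, File file)
                throws IOException {
            ObjectOutputStream out = new ObjectOutputStream(new FileOutputStream(file));
            try {
                out.writeObject(new HashMap<String, Serializable>(replicaCache));
            } finally {
                out.close();
            }
        }

        // Reads the backup back in when the upgraded instance restarts.
        @SuppressWarnings("unchecked")
        static Map<String, Serializable> restore(File file)
                throws IOException, ClassNotFoundException {
            ObjectInputStream in = new ObjectInputStream(new FileInputStream(file));
            try {
                return (Map<String, Serializable>) in.readObject();
            } finally {
                in.close();
            }
        }
    }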
EVDV: I do not understand the question.
EVDV: I agree.
EVDV: I agree that this is a short term solution. I do not think there will be much extra transaction loss (see reply to Sreeram).
EVDV: 10 instances in 2 hours is what the requirements say. We really have to get some feedback on this from some early tests we can do here. Adrian did do something, which suggests that the bottleneck may be in the serialisation/deserialisation effort and not so much in the file access. We also have to investigate using the full potential of multi-core to speed up the process (a sketch of what that could look like is given at the end of these comments).

Bhavani's initial comments:

1. In section 2.1.2.1, another possible case is an outgoing associated request: the ACK in case of an INVITE, for example. When the BE sends out the ACK, the dialog structure is replicated. But if, by that time, the FE is not available, the ACK does not get sent to the client.
EVDV: I think this is not true. The ACK is a new transaction and does not have to follow the Via header. Instead it is sent directly to the contact or via the routes. The FE is not included in that, meaning that the ACK will be sent directly from the BE to the client. At least that is my understanding.
If the ACK does not reach the client, the client will resend the 200 OK response. If that response ends up in a different instance, then there are 2 possibilities: (a) If the dialog structure was already replicated, then the replicated DS is used. But eventually the replication will happen, leading to a session leakage. The replicated session(s) will remain in the replica cache until the load-advisory is issued and they are re-activated in another instance.

2. In section 2.5, since the expat list is disabled during the rolling upgrade, we should probably add a note saying that the "broadcast" is always used during session creation or session lookup.
EVDV: True. I will add such a note.

3. In general, do we support the neighbouring instance going down when the instance is being rolled? If not, what is the advantage of storing the sessions to the disk?
EVDV: We do not have any requirements on session retention in case of failures during a rolling upgrade. Having said that, it is interesting how this would be handled.

4. When the instance is being rolled, if the session(s) get re-activated in another instance, then do the session(s) get removed from the replica cache from where they got re-activated?
EVDV: Yes.
For example, let us have instances i1, i2, i3, i4, with i1 being rolled. Let us say traffic requests s1 (which belonged to i1 before the roll), and let's assume s1 gets activated in i3. When s1 gets re-activated in i3, will it get removed from the replica cache of i2?
EVDV: Yes. This is part of the normal re-activation and also applies when there is no rolling upgrade. If it did not get removed, then s1's replica would be in both i2 and i4, which might lead to problems during active cache reconciliation.
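Referring back to the serialisation/deserialisation remark above: a minimal sketch, using only standard Java, of spreading the deserialisation of a backup across cores; the input format (a map of per-session serialised blobs) is an assumption for illustration, not the actual SSR backup layout:

    import java.io.ByteArrayInputStream;
    import java.io.IOException;
    import java.io.ObjectInputStream;
    import java.io.Serializable;
    import java.util.ArrayList;
    import java.util.List;
    import java.util.Map;
    import java.util.concurrent.Callable;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    public final class ParallelRestoreSketch {

        // Deserialises the per-session blobs of a backup in parallel, so that restore
        // time is bounded by file I/O rather than by single-threaded deserialisation.
        static Map<String, Serializable> restore(Map<String, byte[]> blobs, int threads)
                throws Exception {
            ExecutorService pool = Executors.newFixedThreadPool(threads);
            final Map<String, Serializable> result =
                    new ConcurrentHashMap<String, Serializable>();
            List<Callable<Void>> tasks = new ArrayList<Callable<Void>>();
            for (final Map.Entry<String, byte[]> e : blobs.entrySet()) {
                tasks.add(new Callable<Void>() {
                    public Void call() throws IOException, ClassNotFoundException {
                        ObjectInputStream in =
                                new ObjectInputStream(new ByteArrayInputStream(e.getValue()));
                        result.put(e.getKey(), (Serializable) in.readObject());
                        in.close();
                        return null;
                    }
                });
            }
            pool.invokeAll(tasks); // waits for all deserialisation tasks to finish
            pool.shutdown();
            return result;
        }
    }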