Notes on running out of threads when start-instance commands are run in parallel (13021: simultaneous start-instance commands fail).

What is unique to this command is that it runs another command (start-local-instance), which in turn calls back to the DAS to synchronize the file system. Each callback uses another thread from the grizzly thread pool. This can lead to a deadlock: several start-instance commands have started and invoked start-local-instance, and every one of those is calling back to the DAS to synchronize the file system. If the start-instance commands already hold all the threads in the pool, no threads are available for the sync calls and none of the commands can complete. Eventually the commands start to time out. We have increased the size of the admin thread pool, but a user can always try to start enough instances in parallel to exhaust the threads in the pool.
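
The exhaustion described above can be reproduced outside GlassFish with a plain java.util.concurrent pool. The following is only a sketch under assumptions: the class and method names are made up for illustration, a fixed thread pool stands in for the grizzly pool, an outer task stands in for start-instance, and an empty inner task stands in for the sync callback. The latch is there only to make the timing deterministic.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical demo class; not part of GlassFish or grizzly.
class PoolStarvationDemo {

    /**
     * Runs `commands` outer tasks on one fixed pool of `poolSize` threads.
     * Each outer task submits an inner "sync" task to the SAME pool and blocks
     * until it finishes. Returns true if every outer task completed in time.
     */
    static boolean allCommandsComplete(int poolSize, int commands, long timeoutMs)
            throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        // No outer task calls back until every thread that can be taken by an
        // outer task has been taken; this removes the race from the demo.
        CountDownLatch started = new CountDownLatch(Math.min(poolSize, commands));
        List<Future<?>> outer = new ArrayList<>();
        try {
            for (int i = 0; i < commands; i++) {
                outer.add(pool.submit(() -> {
                    started.countDown();
                    started.await();
                    Future<?> sync = pool.submit(() -> { }); // the "call back"
                    sync.get();                              // blocks a pool thread
                    return null;
                }));
            }
            long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
            for (Future<?> f : outer) {
                long remaining = deadline - System.nanoTime();
                try {
                    f.get(Math.max(remaining, 0), TimeUnit.NANOSECONDS);
                } catch (TimeoutException | ExecutionException e) {
                    return false; // at least one command never finished
                }
            }
            return true;
        } finally {
            pool.shutdownNow(); // interrupt any threads still stuck in get()
        }
    }
}
```

With more pool threads than commands, for example allCommandsComplete(4, 2, 2000), the inner tasks find a free thread and the call returns true. With as many commands as threads, for example allCommandsComplete(2, 2, 500), every thread is blocked waiting on a task that can never be scheduled, and the call returns false after the timeout.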

The following proposal will alleviate the problem and allow commands like start-instance to run in parallel successfully regardless of the number of instances the user is trying to start at once. We specifically don't want an unbounded thread pool, as that could be a security risk. Instead, we want to release the grizzly thread so that it does not sit waiting for a long-running command, while still holding the response until the initial command has completed. This is done by using a custom thread pool in the AdminAdapter code.

  • Add a new annotation for commands like start-instance. The new annotation is called @UseThreadPool(name="pool-name"). A new thread pool will be declared within the "server-config".
  • AdminAdapter will access this new thread pool.
  • The original thread that was servicing the command has the grizzly context and calls grizzlyResponse.suspend() to signal to grizzly that the thread can be returned to the thread pool.
  • A new thread from the thread pool will be used to execute the command.
  • When processing is complete, the new thread resumes the response by calling grizzlyResponse.resume().
  • AdminAdapter code is still responsible for building the Action Report with the results of the command.
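
One possible shape for the annotation and the check that AdminAdapter would make is sketched below. Only the annotation name and its name attribute come from the proposal; the retention and target settings, the stub command classes, the pool name "admin-long-running-pool" and the PoolLookup helper are assumptions made for illustration.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

// Must be retained at runtime so the adapter can read it by reflection.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@interface UseThreadPool {
    String name(); // name of the thread pool declared in the configuration
}

// Hypothetical stand-in for a long-running command such as start-instance.
@UseThreadPool(name = "admin-long-running-pool")
class StartInstanceCommandStub { }

// Hypothetical stand-in for an ordinary command with no annotation.
class ListInstancesCommandStub { }

// Hypothetical helper; the real lookup would live in AdminAdapter.
class PoolLookup {

    /** Returns the pool name from the annotation, or null if not annotated. */
    static String poolNameFor(Object command) {
        UseThreadPool annotation = command.getClass().getAnnotation(UseThreadPool.class);
        return annotation == null ? null : annotation.name();
    }
}
```

A null result here means the command runs on the grizzly thread exactly as it does today; a non-null result selects the pool to hand the command to.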

Here is some pseudo-code based on what Alexey sent:

static class AdminAdapter extends GrizzlyAdapter {

        @Override
        public void service(final GrizzlyRequest grizzlyRequest,
                final GrizzlyResponse grizzlyResponse) {

            // get the command, check if it is annotated with @UseThreadPool
            if (annotated) {
                grizzlyResponse.suspend();   // suspend the response; it stays open after service() returns
                threadpool = get thread pool named in annotation

                threadpool.execute(new Runnable() {  // run the command in a separate thread

                    @Override
                    public void run() {
                        try {
                            doCommand(....); // run the command the same way it is normally run, but in a different thread
                            // write the response (same code that is at the end of AdminAdapter.service() )
                        } catch (IOException e) {
                            // report the failure rather than swallowing it
                        } catch (InterruptedException e) {
                            Thread.currentThread().interrupt();  // preserve the interrupt status
                        } finally {
                            grizzlyResponse.resume();  // always finish the HTTP request processing
                        }
                    }
                });
                return;    // return from service(); this releases the grizzly thread for another request, but doesn't finish the response
            }

            // not annotated: run the command on the grizzly thread as before
        }
}
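
The effect of the hand-off can be checked without grizzly. The sketch below is a simulation under assumptions, not AdminAdapter code: a small fixed pool stands in for the grizzly pool, a second fixed pool stands in for the pool named in the annotation, and a CompletableFuture stands in for the suspended response, with complete() playing the role of grizzlyResponse.resume(). All class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical demo class; not part of GlassFish or grizzly.
class HandOffDemo {

    /** Returns true if every simulated command finished within timeoutMs. */
    static boolean allCommandsComplete(int requestThreads, int commandThreads,
                                       int commands, long timeoutMs)
            throws InterruptedException {
        ExecutorService requestPool = Executors.newFixedThreadPool(requestThreads);
        ExecutorService commandPool = Executors.newFixedThreadPool(commandThreads);
        List<CompletableFuture<Void>> responses = new ArrayList<>();
        try {
            for (int i = 0; i < commands; i++) {
                CompletableFuture<Void> response = new CompletableFuture<>();
                responses.add(response);
                requestPool.execute(() -> {
                    // "suspend": hand the command off and return immediately,
                    // which frees this request thread for other work.
                    commandPool.execute(() -> {
                        try {
                            // The command calls back and needs a request thread.
                            requestPool.submit(() -> { }).get();
                            response.complete(null); // "resume"
                        } catch (Exception e) {
                            response.completeExceptionally(e);
                        }
                    });
                });
            }
            CompletableFuture<Void> all =
                    CompletableFuture.allOf(responses.toArray(new CompletableFuture[0]));
            try {
                all.get(timeoutMs, TimeUnit.MILLISECONDS);
                return true;
            } catch (TimeoutException | ExecutionException e) {
                return false;
            }
        } finally {
            requestPool.shutdownNow();
            commandPool.shutdownNow();
        }
    }
}
```

Because the request threads never block on a command, a call such as allCommandsComplete(2, 2, 50, 5000) is expected to complete even though there are far more commands than threads in either pool. Both pools stay bounded; commands beyond the command pool's size simply wait in its queue.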