Server start and restart

Server restart has been a highly desired feature for a long time. The Admin Console Team is especially excited about this feature so that a user can easily restart the server after making configuration changes that require a restart. The user will be able to restart remotely.

There are several ways to implement server restart. All of them require some process to be running on the server's behalf so that it can restart the server. Two different approaches are discussed here. They differ widely in their levels of:

  • development cost
  • ease of use
  • reliability

I Asadmin as a Watchdog

  • development cost: low
  • ease of use: medium
  • reliability: high

This solution leverages an already existing process and behavior. This is overwhelmingly our choice for V3 final.

Background

asadmin start-domain is the supported way to start the server in an external process. It starts the server in a new JVM and exits. If you add the --verbose option then asadmin starts the server and remains running, waiting for the server to exit. In that mode asadmin holds a reference to the Java Process object of the server's JVM. It also has a console window. The option is called "verbose" because it echoes all log messages to its console window. If the console window is disposed of (e.g. by entering ^C) then not only is the asadmin process killed but the server's JVM is also destroyed.

Goals

  • Add a new option, --watchdog, to asadmin start-domain.
  • Start the domain on the domain machine with a watchdog process running.
  • The watchdog does not require a console window but will have one if --verbose is also chosen.
  • Add a new CLI command, restart-domain.

Details

  • The user starts the domain with asadmin start-domain --watchdog
  • If verbose is not also set, asadmin does not directly start the domain. Instead it creates and starts another JVM which in turn runs start-domain. The original asadmin then exits.
    • Why? This is how we get rid of the unwanted console window.
  • If verbose is set then we do everything in this asadmin process.
  • The user runs the asadmin restart-domain command, which makes the server's JVM process exit with the exit value 10.
  • asadmin sees the special return value and starts up the server again with the original arguments.

If startup fails for any reason in this restart mode, the launcher gives up and exits with whatever exit code the server process exited with. The launcher clearly logs or displays the errors so that corrective action can be taken. No retry attempts are specified for this release. This facility provides a clean implementation of domain restart from the Admin Console and the CLI.
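
The restart cycle described above can be sketched roughly as follows. This is a minimal illustration in Java, not the actual asadmin code: the class WatchdogSketch, the ServerLauncher interface and the ProcessLauncher class are hypothetical names introduced for this sketch. Only the exit value 10 and the "Restarting Domain..." message come from this page.

```java
import java.io.IOException;
import java.util.List;

/** Sketch of the watchdog loop. All type names here are hypothetical. */
final class WatchdogSketch {

    /** Exit value that restart-domain makes the server JVM exit with. */
    static final int RESTART_EXIT_CODE = 10;

    /** Hypothetical seam: start the server JVM, block until it exits, return its exit value. */
    interface ServerLauncher {
        int launchAndWait(List<String> command) throws IOException, InterruptedException;
    }

    /** Launcher backed by a real child process whose output goes to this console. */
    static final class ProcessLauncher implements ServerLauncher {
        @Override
        public int launchAndWait(List<String> command) throws IOException, InterruptedException {
            Process server = new ProcessBuilder(command).inheritIO().start();
            return server.waitFor();
        }
    }

    /**
     * Runs the server and relaunches it with the original arguments for as long
     * as it exits with RESTART_EXIT_CODE. Any other exit value (a normal stop,
     * a kill from the OS, a failed startup) ends the loop and is returned, so
     * the caller can exit with the same value. No retries are attempted.
     */
    static int watch(ServerLauncher launcher, List<String> originalCommand)
            throws IOException, InterruptedException {
        int exitCode = launcher.launchAndWait(originalCommand);
        while (exitCode == RESTART_EXIT_CODE) {
            System.out.println("Restarting Domain...");
            exitCode = launcher.launchAndWait(originalCommand);
        }
        return exitCode;
    }
}
```

Under these assumptions the watchdog's entry point would end with something like System.exit(WatchdogSketch.watch(new WatchdogSketch.ProcessLauncher(), command)), where command is whatever command line was used to start the server the first time.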

Notes

  • If the server won't stop properly via a System.exit(10) then a restart is not attempted.
  • Killing the server JVM from the OS itself results in an exit value of 1, so no restart is attempted.
  • We purposely don't want to add a state to asadmin where it will always try to restart – e.g. asadmin start-domain --alwaysrestart
  • The new CLI command, restart-domain, should be very easy to implement. It is exactly the same as stop-domain with one difference: the server exits with 10 instead of 0.
  • For now, starting the domain with asadmin start-domain --watchdog will leave a console window running. The stretch goal is to guarantee no console window. I would like reviewers to approve this first because it is fairly complicated to implement.
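
To illustrate how little separates the two commands on the server side, here is a minimal Java sketch. It is not the real stop-domain implementation, which this page does not show: DomainCommandsSketch and its members are hypothetical names. The exit call is passed in as an IntConsumer so that the sketch can be exercised without terminating the JVM; a real server would presumably pass System::exit.

```java
import java.util.function.IntConsumer;

/** Sketch: stop and restart share the shutdown path and differ only in the exit value. */
final class DomainCommandsSketch {

    static final int STOP_EXIT_CODE = 0;
    static final int RESTART_EXIT_CODE = 10; // the value the watchdog treats as "restart me"

    private final Runnable shutdownWork; // hypothetical: whatever stop-domain does before exiting
    private final IntConsumer exit;      // hypothetical seam; System::exit in a real server

    DomainCommandsSketch(Runnable shutdownWork, IntConsumer exit) {
        this.shutdownWork = shutdownWork;
        this.exit = exit;
    }

    void stopDomain() {
        shutDownAndExit(STOP_EXIT_CODE);
    }

    void restartDomain() {
        shutDownAndExit(RESTART_EXIT_CODE);
    }

    private void shutDownAndExit(int exitCode) {
        shutdownWork.run();
        exit.accept(exitCode);
    }
}
```

If the domain was started without --watchdog, nothing is waiting on the exit value, so restartDomain() in this sketch simply stops the server.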

Documentation Ideas/Requirements:

  1. User must start the domain with the --watchdog option to enable restarts
  2. If the previous step is forgotten then a call to the remote command, RestartDomainCommand, will simply stop the domain.
  3. The --watchdog option must be added to the CLI man page for start-domain
  4. A new man page must be created for the new remote command – RestartDomainCommand (asadmin restart-domain)

QA Testing Suggestions:

  • After starting the domain with asadmin start-domain --watchdog:
    • call asadmin stop-domain. The domain should stop and not restart.
    • Kill the domain externally via, say, the kill command. The domain should NOT restart.
    • Kill the console where asadmin start-domain --watchdog is running by entering ^C. The domain should stop and not restart.
    • Run jps then asadmin restart-domain then jps again. You should see the server running with some pid, then it should reappear in the list with a different pid. The console window should say "Restarting Domain..."
    • Repeat the previous step over and over and over.
  • asadmin start-domain --verbose
    • Restarting should NOT work. It should only work with the --watchdog option
  • asadmin start-domain --verbose --watchdog
    • This should work exactly like asadmin start-domain --watchdog except that log messages are printed to the console.

II Server Reincarnation

  • development cost: very high
  • ease of use: very high
  • reliability: medium

In this solution the server itself starts a separate JVM process that will run asadmin start-domain. The running server starts the new process when it is near death. Asadmin is given the process ID (pid) of the running server's JVM. Asadmin waits for the operating system to confirm that the process has died and then starts the domain.

Notes

  • Guaranteeing the running server's demise is the sticky and expensive part of this solution, but without that guarantee the solution would be far less reliable.
  • Windows assigns pids in a much less predictable way than *NIX. What if Windows reuses the dead server's pid right away, before asadmin has had a chance to check? Asadmin would then wait for that unrelated process to stop.
  • There is a requirement for native code. The plan would be to use JNA. Nevertheless it is still native code.
  • This solution is great from a user's perspective because the server did not have to be started in any special way
  • All running servers are capable of restarting, whether the request is remote or local.
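
The "wait for the old server to die" step, including a guard against the pid reuse problem noted above, could look roughly like the following. This is only a sketch under a significant assumption: it relies on java.lang.ProcessHandle, which exists only on Java 9 and later, so on older JVMs the native route described in the notes would still be needed. ReincarnationSketch and its method names are hypothetical.

```java
import java.time.Instant;
import java.util.Optional;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Sketch: wait for a process to die, identified by pid plus its start time. */
final class ReincarnationSketch {

    /** True only when both start times are known and differ, i.e. the pid now belongs to another process. */
    static boolean pidWasReused(Optional<Instant> recordedStart, Optional<Instant> observedStart) {
        return recordedStart.isPresent() && observedStart.isPresent()
                && !recordedStart.get().equals(observedStart.get());
    }

    /**
     * Returns true once the old server process is gone, false if it is still
     * alive when the timeout expires. Must be called from a different process
     * than the one being watched (onExit() rejects the current process).
     */
    static boolean awaitDeath(long pid, Optional<Instant> recordedStart, long timeout, TimeUnit unit)
            throws InterruptedException {
        Optional<ProcessHandle> handle = ProcessHandle.of(pid);
        if (!handle.isPresent()) {
            return true; // no process with that pid: already dead
        }
        ProcessHandle process = handle.get();
        if (pidWasReused(recordedStart, process.info().startInstant())) {
            return true; // the pid is alive, but it is not the old server
        }
        try {
            process.onExit().get(timeout, unit);
            return true;
        } catch (TimeoutException e) {
            return false;
        } catch (ExecutionException e) {
            throw new IllegalStateException("Could not observe exit of pid " + pid, e);
        }
    }
}
```

In this sketch the dying server would hand both its pid and its start time (for example ProcessHandle.current().info().startInstant()) to the new asadmin process. How those values are passed is left open here. The start time is not reported on every platform, which is why the guard falls back to waiting when either value is missing.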