The Problem

12942 Mac: restart fails for instance started with start-instance

When an instance is started over SSH (using start-instance), it appears to run OK. But when that instance is restarted, the new instance fails to start because it can't resolve the hostname of the DAS. There are other symptoms too, such as the user.name Java system property being set to "?".

Mac only. Solaris and Linux AOK.

The Root Cause

On the Mac a process has an Execution Context that scopes which Mac OS services the process can locate. These services provide basic functionality, like resolving hostnames and getting the current user name. A process started over SSH loses its context when the SSH session ends. This means the process (and its children) can't do basic things like looking up a hostname. That's what is happening in the case of our bug. As long as the SSH session is open all is well, but once it is closed things can break. The symptom we see is the restart bug.
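
The two symptoms described above can be probed from inside a JVM. The following is a minimal sketch, not GlassFish code: the class ContextCheck and its helper methods are hypothetical names introduced for illustration. It treats a user.name of "?" (or a missing value) as a sign of a lost context, and tries to resolve a hostname given on the command line, falling back to localhost, which may resolve even when other names do not.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical diagnostic helper; not part of GlassFish.
public class ContextCheck {

    // Returns false for the values seen in the bug: null, empty, or "?".
    static boolean userNameLooksValid(String userName) {
        return userName != null && !userName.isEmpty() && !userName.equals("?");
    }

    // Returns true if the JVM can resolve the given hostname.
    static boolean canResolve(String host) {
        try {
            InetAddress.getByName(host);
            return true;
        } catch (UnknownHostException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Pass the DAS hostname as the first argument to test the real failure case.
        String host = args.length > 0 ? args[0] : "localhost";
        String userName = System.getProperty("user.name");
        System.out.println("user.name = " + userName
                + " (valid: " + userNameLooksValid(userName) + ")");
        System.out.println("can resolve " + host + ": " + canResolve(host));
    }
}
```

Running this as a child of a process that was started over SSH, once while the session is open and once after it has closed, is one way to check whether the context has been lost.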

The Fix

Mac OS provides a special utility for launching long-running processes that preserves the Execution Context: /usr/libexec/StartupItemContext

The proposed fix is to change start-local-instance so that, on Mac OS, it uses StartupItemContext to launch the JVM of the instance like this:

/usr/libexec/StartupItemContext /usr/bin/java . . .

The new code in GFLauncher.launchInstance() looks like this:

        List<String> cmds = null;
        if (OS.isDarwin()) {
            // On MacOS we need to start long running process with
            // StartupItemContext. See IT 12942
            cmds = new ArrayList<String>();
            cmds.add("/usr/libexec/StartupItemContext");
            cmds.addAll(getCommandLine());
        } else {
            cmds = getCommandLine();
        }
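
The same wrapping logic can be tried outside GlassFish. The sketch below is self-contained and makes some assumptions: the class LaunchCommandSketch and the method wrapForOs are hypothetical, the os.name system property stands in for OS.isDarwin(), and a fixed argument list stands in for getCommandLine(). It only builds and prints the command list; it does not launch anything.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical stand-alone version of the wrapping logic; not GlassFish code.
public class LaunchCommandSketch {

    static final String STARTUP_ITEM_CONTEXT = "/usr/libexec/StartupItemContext";

    // Prefixes the command line with StartupItemContext when osName looks like Mac OS.
    // Assumption: os.name contains "mac" on Mac OS (for example "Mac OS X").
    static List<String> wrapForOs(String osName, List<String> commandLine) {
        boolean isMac = osName != null
                && osName.toLowerCase(Locale.ROOT).contains("mac");
        if (!isMac) {
            return commandLine;
        }
        List<String> cmds = new ArrayList<String>(commandLine.size() + 1);
        cmds.add(STARTUP_ITEM_CONTEXT);
        cmds.addAll(commandLine);
        return cmds;
    }

    public static void main(String[] args) {
        // Placeholder for the real JVM command line of the instance.
        List<String> jvm = Arrays.asList("/usr/bin/java", "-version");
        List<String> cmds = wrapForOs(System.getProperty("os.name"), jvm);
        // The resulting list could be handed to new ProcessBuilder(cmds).
        System.out.println(cmds);
    }
}
```

Keeping the OS name as a parameter makes the Mac branch checkable on any platform, which may be useful since the bug itself only reproduces on the Mac.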

In addition to fixing the original bug (over SSH), this change will allow the user to start an instance (or a domain?), then log out and still have the instance behave well and be restartable.

Todo
  • Test
  • Finish reading technical note
  • Check behavior when start-local-instance (and hence /usr/libexec/StartupItemContext) is run by a non-admin user.
References