GlassFish Server Open Source Edition 3.2 - Platform Services One Pager

(template version 1.92)

1. Introduction

1.1 Project/Component Working Name:

GlassFish Server Open Source Edition 3.2 Platform Services Improvements

1.2 Name and Email Address of Author

Byron Nevins

1.3. Date of This Document:

Started: 04/18/11

2. Project Summary

2.1. Project Description:

Platform Services Improvements

2.2. Risks and Assumptions:

Platform Services are, obviously, dependent on native code and tools for multiple operating systems. There are a great many flavors of Linux. We assume that the ancient common denominator of init.d scripts will work on all Linux deployments.

Is it possible for a savvy and/or clueless user to foul up the operating system configuration so badly that it is impossible for our Platform Services to work? Yes! E.g. he or she can redefine all the standard run-levels. We will not support things like that.

Assumptions:

  • I will have access to the appropriate hardware so that I can develop and test on multiple platforms.
  • I will be able to purchase reference materials and have time to do the necessary research about the multiple platforms.

3. Problem Summary

We need richer support for running the native tools on multiple platforms for handling the lifecycle of GlassFish servers.

3.1. Problem Area:

We currently support creating services on Windows, all versions of Linux, and Solaris 10+ SMF. We need to extend this to include all versions of UNIX, including non-SMF Solaris.

After creation, users are on their own and must use the platform's native tools to manage the services.

3.2. Justification:

It is important to do this in order to bring Platform Services closer to the Java ideal of "write once, run anywhere". Services for our product will be platform-independent, as far as the user is concerned.
In addition, the experienced and/or intrepid user is free to use the platform's native tools instead of, or in addition to, our tools.

4. Technical Description:

4.1. Details:

Linux and non-SMF UNIX will be done at the lowest common denominator. We manipulate scripts directly in the special services area of the file system. This mechanism was developed decades ago, which makes it old-fashioned and very time-consuming to work with, but phenomenally well-tested, well-documented and robust.
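
To make the "special services area" concrete, here is a minimal sketch of the conventional System V layout: one script under /etc/init.d, plus S (start) and K (kill) links in the per-run-level rc directories. The class name InitScriptLayout, the service name, the run levels and the sequence number are all assumptions for illustration, and some distributions keep these directories elsewhere. This is not the code GlassFish actually uses.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: lists the files a SysV-style service would need. */
final class InitScriptLayout {

    /**
     * @param serviceName name of the init script (illustrative only)
     * @param startLevels run levels in which the service should be started
     * @param killLevels  run levels in which the service should be stopped
     * @param sequence    two-digit ordering number used in the link names
     */
    static List<String> paths(String serviceName, int[] startLevels,
                              int[] killLevels, int sequence) {
        List<String> result = new ArrayList<>();
        // The script itself lives in the shared init.d directory.
        result.add("/etc/init.d/" + serviceName);
        String seq = String.format("%02d", sequence);
        // "S" links start the service when the run level is entered.
        for (int level : startLevels) {
            result.add("/etc/rc" + level + ".d/S" + seq + serviceName);
        }
        // "K" links stop (kill) the service when the run level is entered.
        for (int level : killLevels) {
            result.add("/etc/rc" + level + ".d/K" + seq + serviceName);
        }
        return result;
    }
}
```

For example, InitScriptLayout.paths("GlassFish_domain1", new int[] {3, 5}, new int[] {0, 6}, 20) lists the script plus two start links and two kill links.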

SMF will be worked with directly when available on the platform.

Windows will be worked with directly.

You can see in the existing code precisely how we interact with these three main implementation areas. Each area will be expanded to include the new features.

4.2. Bug/RFE Number:

RFE

4.3. In Scope:

4.3.1 New commands

New asadmin commands will be developed to join the existing create-service command:

  • delete-service
  • list-services
  • start-service
  • restart-service
  • stop-service

4.3.2 New Auto-Restart Behavior

Until now, a user who wanted a service to restart automatically had to set this up using the platform's native tools. We will now take this over internally. We will add a new Service Property (see create-service details below) that sets the number of times a restart is attempted after a crash.
A very nice feature here is that our code knows the difference between a deliberate shutdown and a crash. If the server is deliberately shut down, we do nothing. If it crashes, we will restart it if the conditions apply.

By default we will try up to 3 times to restart a server that crashes, and then quit.
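
The decision described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the class AutoRestartDecision, the Exit enum and the method name are hypothetical, and how the real code detects a crash versus a deliberate shutdown is not shown here.

```java
/** Hypothetical sketch of the auto-restart decision; not the actual implementation. */
final class AutoRestartDecision {

    /** How the server process ended. */
    enum Exit { DELIBERATE_SHUTDOWN, CRASH }

    /** Default number of restart attempts stated in this section. */
    static final int DEFAULT_RESTART_TRIES = 3;

    /**
     * @param exit          how the server stopped
     * @param restartTries  configured limit (non-negative in this sketch)
     * @param attemptsSoFar restarts already attempted
     * @return true if another restart should be attempted
     */
    static boolean shouldRestart(Exit exit, int restartTries, int attemptsSoFar) {
        if (exit == Exit.DELIBERATE_SHUTDOWN) {
            return false; // the user asked for the shutdown, so do nothing
        }
        // After a crash, keep trying until the limit is used up.
        return attemptsSoFar < restartTries;
    }
}
```

With the default limit, the first three crashes each lead to a restart attempt and a fourth does not.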

4.4. Out of Scope:

We will not support the multiple ad hoc tools available on particular Linux distributions. E.g. Ubuntu has one way of working with services and Red Hat has a completely different solution. They all share the ancient tried-and-true solution.

4.5. Interfaces:

4.5.1 Public Interfaces

As mentioned before the new public interfaces are the new asadmin commands discussed above.

4.5.1.1 create-service

For completeness, here is the usage for the existing command from 3.1. One change is planned: --force will be true by default instead of false.

  • --name optional, default is the domain or instance name
  • --serviceproperties optional, no default. See below for the new option.
  • -n or --dry-run optional – do everything except the actual creation
  • --force default is false in 3.1; in 3.2 the default will be true. This simply undoes a previous service creation.
  • --domaindir the usual option possibly needed for commands that specify a domain
  • --serviceuser (Linux only) specifies a different user that will run the commands
  • --nodedir the usual option possibly needed for commands that specify an instance
  • --node the usual option needed for commands that specify an instance
  • --help the usual help command
  • domain_or_instance_name the name of the server
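
As an illustration of how these options fit together, here is a minimal sketch that assembles a create-service command line. It uses only options listed above. The class name CreateServiceCommand is hypothetical, and it assumes that asadmin is on the PATH and that RESTART_TRIES is passed through --serviceproperties.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper that builds an asadmin create-service invocation. */
final class CreateServiceCommand {

    /**
     * @param serverName   domain or instance name (the operand)
     * @param serviceName  value for --name, or null to accept the default
     * @param restartTries value for the RESTART_TRIES service property
     * @param dryRun       true to do everything except the actual creation
     */
    static List<String> build(String serverName, String serviceName,
                              int restartTries, boolean dryRun) {
        List<String> cmd = new ArrayList<>();
        cmd.add("asadmin"); // assumption: asadmin is on the PATH
        cmd.add("create-service");
        if (serviceName != null) {
            cmd.add("--name");
            cmd.add(serviceName);
        }
        // Assumption: the new property travels in --serviceproperties.
        cmd.add("--serviceproperties");
        cmd.add("RESTART_TRIES=" + restartTries);
        if (dryRun) {
            cmd.add("--dry-run=true");
        }
        cmd.add(serverName); // the operand goes last
        return cmd;
    }
}
```

The resulting list could be handed to java.lang.ProcessBuilder. Running it first with the dry-run flag set is a cheap way to check the arguments before anything is created.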

4.5.1.1.1 AutoRestart option

RESTART_TRIES=number, where -1 means restart indefinitely, 0 means do not restart, and any other number means try that many times. Any of these commands resets the counter (as appropriate to the server type, of course):

  • stop-service
  • stop-instance
  • stop-domain
  • stop-local-instance
  • restart-service
  • restart-domain
  • restart-local-instance
  • restart-instance
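
The counter semantics above could be kept in something like the following sketch. The class RestartCounter and its methods are hypothetical names, and rejecting values below -1 is an assumption made here, not something this document specifies.

```java
/** Hypothetical bookkeeping for RESTART_TRIES; not the actual GlassFish code. */
final class RestartCounter {
    private final int restartTries; // -1 = no limit, 0 = never, N = at most N tries
    private int attempts;           // restarts attempted since the last reset

    RestartCounter(int restartTries) {
        if (restartTries < -1) {
            // Assumption: values below -1 are treated as invalid.
            throw new IllegalArgumentException("RESTART_TRIES must be -1 or greater");
        }
        this.restartTries = restartTries;
    }

    /** Call after a crash. Returns true, and counts the attempt, if a restart is allowed. */
    boolean tryRestart() {
        if (restartTries == -1) {
            return true; // restart indefinitely
        }
        if (attempts >= restartTries) {
            return false; // limit reached (always the case when the limit is 0)
        }
        attempts++;
        return true;
    }

    /** Call when one of the stop or restart commands listed above is run. */
    void reset() {
        attempts = 0;
    }
}
```

With a limit of 2, two calls to tryRestart() succeed and a third fails until reset() is called.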

4.5.1.2 delete-service

This command exists in 3.1 as the undocumented command named _delete-service.
In 3.2 it will be exposed as delete-service with these options:

  • --name optional, default is the domain or instance name
  • --domaindir the usual option possibly needed for commands that specify a domain
  • --nodedir the usual option possibly needed for commands that specify an instance
  • --node the usual option needed for commands that specify an instance
  • --help the usual help command
  • domain_or_instance_name the name of the server

4.5.1.3 list-services

This command will report on all GlassFish services that are discoverable, i.e. all services that were created by create-service.
Options:

  • --domaindir If specified, will look inside here for domains
  • --nodedir If specified, will look inside here for instances
  • --node If specified, will look inside here for instances
  • --help the usual help command

4.5.1.4 start-service

The start-service command will start a service that is not currently running.
Options:

  • --name optional, default is the domain or instance name
  • --domaindir If specified, will look inside here for domains
  • --nodedir If specified, will look inside here for instances
  • --node If specified, will look inside here for instances
  • --help the usual help command

4.5.1.5 stop-service

The stop-service command will stop a service that is currently running.
Options:

  • --name optional, default is the domain or instance name
  • --domaindir If specified, will look inside here for domains
  • --nodedir If specified, will look inside here for instances
  • --node If specified, will look inside here for instances
  • --help the usual help command

4.5.1.6 restart-service

The restart-service command will restart a service that is currently running.
If the service happens to not be running then it will be started.
Options:

  • --name optional, default is the domain or instance name
  • --domaindir If specified, will look inside here for domains
  • --nodedir If specified, will look inside here for instances
  • --node If specified, will look inside here for instances
  • --help the usual help command
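
The intended behavior of the three lifecycle commands can be summarized in a small sketch. All names here are hypothetical. The restart-service rows follow the description above. Treating start-service on a running service, and stop-service on a stopped one, as no-ops is an assumption, since this document does not say what happens in those cases.

```java
import java.util.List;

/** Hypothetical summary of what each lifecycle command does in each state. */
final class ServiceCommandPlan {
    enum State { STOPPED, RUNNING }
    enum Command { START_SERVICE, STOP_SERVICE, RESTART_SERVICE }
    enum Action { STOP, START }

    /** Returns the low-level actions, in order, for a command in a given state. */
    static List<Action> plan(State current, Command command) {
        boolean running = current == State.RUNNING;
        switch (command) {
            case START_SERVICE:
                // Assumption: starting a running service does nothing.
                return running ? List.of() : List.of(Action.START);
            case STOP_SERVICE:
                // Assumption: stopping a stopped service does nothing.
                return running ? List.of(Action.STOP) : List.of();
            case RESTART_SERVICE:
                // From the text: if the service is not running, it is simply started.
                return running ? List.of(Action.STOP, Action.START)
                               : List.of(Action.START);
            default:
                throw new IllegalArgumentException("Unknown command: " + command);
        }
    }
}
```

For instance, restart-service on a stopped service yields only a START action, while on a running service it yields STOP followed by START.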

4.5.2 Private Interfaces

No new private interfaces that are observable externally.

4.5.3 Deprecated/Removed Interfaces:

The asadmin command _delete-service will be removed. It is, by definition, not supported, so it is not really a public interface.
I list it here for completeness.
The command will be replaced by the fully supported command delete-service.

4.6. Doc Impact:

Rather extensive new documentation will be required for this feature.

4.7. Admin/Config Impact:

This change is part of Admin, so the impact has been discussed already above.
The Admin Console will need to support and use the new commands.

4.8. HA Impact:

None.

4.9. I18N/L10N Impact:

No impact.

4.10. Packaging, Delivery & Upgrade:

4.10.1. Packaging

No new packages will be necessary.

4.10.2. Delivery

No impact.

4.10.3. Upgrade and Migration:

No backward compatibility issues. We plan on requiring zero changes to domain.xml so no upgrade issues there.

4.11. Security Impact:

Obviously there are huge security implications for running services on any platform. But there is nothing new to consider for this feature enhancement.

4.12. Compatibility Impact

Old interfaces remain the same. We are adding new commands and functionality - not modifying the existing behavior.

4.13. Dependencies:

4.13.1 Internal Dependencies

Common utilities.

4.13.2 External Dependencies

winsw.exe is a C# program for wrapping Java programs as Windows services. It is a Kenai project and is already in GlassFish 3.1, so it should not be an issue.
Sathyan Catari has expressed interest in developing native code to replace winsw. winsw works fine but doesn't report errors very well. Meanwhile, I don't know the C# language, nor do I have any tools for building C# applications. I believe we can live with winsw as-is.

4.14. Testing Impact:

This is a very difficult area to test. We need many different platforms. I don't believe it was tested thoroughly for GF 3.0 or 3.1. I did plenty of manual testing for 3.1 on the platforms I have access to (Solaris 10, Windows XP, Ubuntu).
This is an example of an area where, in my opinion, it would be a waste of resources to make automated tests. Instead, what we need is a dedicated engineer running through all of the different commands and observing the results on many different machines.

5. Reference Documents:

List of related documents, if any (BugID's, RFP's, papers).

Explain how/where to obtain the documents, and what each contains, not just their titles.

6. Schedule:

The issues and their assigned milestones are listed below.

Issue   What                              Milestone
10266   delete-service command            3
16522   list-services command             3
16526   restart-service command           4
16523   Add support for all UNIX          4
16525   auto restart                      5
16311   improve OS integration            5
16140   -Xrs (signals) issue on Windows   5
11692   Advanced Improvements for SMF     6

6.1. Projected Availability:

Indicate which milestones from the current schedule the project will be:

  • Initially integrated: Varies. The new commands and functionality are staged across milestones 3-6
  • Feature complete (ready for handoff to QA): MS5
  • At production quality level: MS6

7.0 Reviewer Comments

From Tom Mueller, April 26, 2011

1. Section 3.1 talks about supporting more platforms (non-SMF Solaris), but I don't see this in RFE 16311. Where did this come from? I'd suggest removing it. Would this include adding AIX too?

2. Section 3.1 doesn't talk about implementing 3a. Specifically, in the case where you have a service that will automatically restart the server, if the user runs stop-instance, the instance shouldn't be restarted. Also, on Windows, there is the notion of a service being in Manual or Automatic mode. Does a start/stop-instance affect that setting?

Maybe what I'm looking for is to see your last comment in the one-pager. Or maybe a reference to the RFE in section 3.1 saying that the details of exactly what will be implemented are there.

3. I'm not convinced that we really need start-service, stop-service or restart-service. P3?

4. Section 4.5.1.1.1 - is RESTART_TRIES passed in as a "--serviceproperties" option? Is it really all uppercase with an underscore?

5. WRT restarting a server, is this implemented by the GlassFish code or via the automatic mechanisms of the OS? I seem to recall that SMF has a restart mechanism itself.

6. If RESTART_TRIES is set to 3, I can see not trying to start the server again if it fails to start 3 times in a row. But what happens if the server crashes once per day for 3 days? Or how about once a month for 3 months? Does it not get restarted then? Should there be some timeout after which the count is reset?

7. Section 4.12: Isn't the change of the default value for the --force option a compatibility issue?

8. Maybe this is for the design spec, but it would be really helpful to have a state transition diagram that shows what happens for various actions, including the GF commands (stop-instance, etc.) as well as the OS commands "svcadm ..." or GUI actions.


From John Clingan, April 27, 2011

3.1 Technically, "We need to extend this to include all versions of UNIX including non-SMF Solaris." is not a requirement. The solution only needs to include supported platforms, but the supported platform matrix may change in the future :-)

4.3.1 Regarding Tom's comment about (re)start/stop-service, how would (re)start/stop-instance (or domain) differ? Wouldn't (re)start/stop-instance do the same thing? Byron, is there a semantic difference in your mind?

4.3.2: Can you clarify "we will now take this over internally"? Does this mean the DAS? The solution must be able to watchdog 100 remote instances today, and perhaps hundreds in the future. Also, note that *all* the solutions require administrator/root privileges (or "process management" privileges in an RBAC environment). Since this is the case, why not use inittab for non-SMF Unix/Linux environments to distribute the watchdog role? Also, how would auto-restart deal with the competing SMF & Windows Service restarting roles?