March 09, 2010

  • Attendees: Dies, Anissa, Jennifer, Ken, Nazrul, Bill, Byron, Paul, Tom, ???
  • Today's meeting was a continuation of last week's cluster support discussion.
    The following will be the same between GF V2 and V3 clustering:
  • The DAS manages all the data of all nodes.
  • It sends out this data to the instances using some synchronization
    mechanism.
    What will be different in GF V3.1 is:
  • There will be no Node Agent at first. The main functionality of the
    node agent was to start and stop remote instances. This can be done by
    the OS's facilities.
    Some of the reasons for the changes are:
  • The node agent was basically duplicating the OS's services functionality.
  • There were scalability issues in GF V2.x where the DAS suffered when
    many nodes started to synchronize.
  • A rewrite of the related code is required anyway, because the way the
    domain.xml information is handled has changed radically in GF V3.
  • There were also issues because the code that was running in the
    cluster and non-cluster cases was different.
    One thing that needs to be investigated (tried out) is whether and how
    much optimization is required for the synchronization process between
    the nodes and the DAS. For example, Ericsson used 40 instances on
    different machines.
    How to do the synchronization is still being considered. Using a
    database or an SVN repository is a bad idea because of performance
    issues, and SVN is not pure Java. Mature rsync implementations in Java
    do not seem to be available.
    There will be two timings of synchronization:
    1. When an instance starts. This could even be done just before the
       instance starts (from the asadmin command that starts the instance,
       as it has all the infrastructure to connect to the DAS anyway),
       instead of from the VM process of the starting instance.
    2. When operations are done on the DAS which need to be propagated to
       the instances. The DAS could keep a global incrementing number
       counting its state changes, so this number can be compared with the
       remote instances' numbers to quickly see if they are up to date.
       (Currently the plan is to use the timestamp of the DAS's domain.xml
       for this; a sketch of the idea follows below.)
    In the case of an instance restart, step 1 could be skipped. The window
    of the restart is short, so there may be no need to sync everything again?
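    A minimal sketch of the state-change number idea, assuming a
    hypothetical ConfigVersion helper on the DAS and a last-synced number
    remembered by each instance (none of these names are existing GlassFish
    APIs; the plan noted above is to compare domain.xml timestamps instead,
    but the check would be the same either way):

        import java.util.concurrent.atomic.AtomicLong;

        // Hypothetical illustration only, not an existing GlassFish class:
        // the DAS bumps a counter on every configuration change, and an
        // instance compares its last-synced value against the DAS's current
        // value to decide whether a full resynchronization is needed.
        public class ConfigVersion {

            // Monotonically increasing count of DAS state changes.
            private final AtomicLong version = new AtomicLong(0);

            // Called by the DAS whenever an operation changes the configuration.
            public long bump() {
                return version.incrementAndGet();
            }

            public long current() {
                return version.get();
            }

            // Instance-side check: if the numbers match, the synchronization
            // step can be skipped, e.g. on a quick restart as discussed above.
            public static boolean needsSync(long instanceLastSynced, long dasCurrent) {
                return instanceLastSynced != dasCurrent;
            }
        }

    If the domain.xml timestamp is used as the number, the same comparison
    applies; the trade-off is that a timestamp can move backwards (e.g.
    after restoring a backup), while a counter has to be persisted by the
    DAS itself.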
    Two things that need to be taken care of are:
  • What should happen if a user tries to run an asadmin command on the
    DAS while the DAS is synchronizing its state with the remote nodes?
    Queue the command? Does that mean the asadmin call won't return until
    the queued command has completed? What if the synchronization brings
    down the DAS? How will the user know whether the command has been
    executed or not?
  • The DAS will propagate state changes to the remote instances. What if
    an instance is starting but not yet ready to accept the DAS's commands?
    How does the DAS know?
    -> Maybe we can make use of GMS here.
  • The second topic was the issue Markus recently brought up.
    Our understanding is that, according to Markus (an external contributor),
    "real" Windows applications don't use batch files. Starting GlassFish
    from the Start Menu makes a command prompt window come up briefly, which
    looks very clunky.
    See: https://github.com/javaee/glassfish/issues/6784
    There are other OSS projects that use batch files (Maven, for example),
    but that is no excuse for GlassFish if that is not the proper way.
    More importantly, starting GlassFish from the Start Menu should not pop
    up a command prompt window. Depending on the functionality the GlassFish
    installer offers, we should try using a shortcut that runs GlassFish via
    'javaw.exe' to prevent the window from popping up (a possible shortcut
    target is sketched below).
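    For illustration only, a shortcut Target along these lines might work,
    assuming the admin CLI jar can be launched directly and substituting the
    real install and JRE paths (both placeholders are assumptions, not an
    agreed solution):

        <JRE>\bin\javaw.exe -jar <GF_INSTALL>\glassfish\modules\admin-cli.jar start-domain domain1

    One caveat: javaw.exe suppresses the console, but it also hides any
    output from start-domain, so startup failures would have to be surfaced
    some other way (e.g. via the server log or a dialog).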
    On Windows 7 there might be other issues (privileges on executables).
    Furthermore, relying on Windows' (standard) ability to execute jar
    files (as they're mapped to 'java -jar') would lead to complaints from
    people who have mapped jar files to their zip utility.