StAX.next Discussion

This page has been created as a place to share ideas about the next version of StAX, i.e. StAX.next. If you are interested in discussing any of the other JAXP components, see JaxpNextDiscussion.

Null vs. Empty String Inconsistencies

Issues with DTDs

The specification states that when an XMLStreamReader is positioned on a DTD event, calling XMLStreamReader.getText() returns the internal subset. Currently many StAX implementations actually return the full DOCTYPE declaration. Clarification is needed as to which behavior is correct. If the correct behavior is to return only the internal subset, then there must be APIs available to extract the other details (such as the element name, system and public identifiers) so that XML object models built from StAX can be constructed correctly.

The attached [MSXMLStreamReader|http://wiki.glassfish.java.net/attach/StaxNextDiscussion/MSXMLStreamReader.java] and [MSXMLStreamWriter|http://wiki.glassfish.java.net/attach/StaxNextDiscussion/MSXMLStreamWriter.java] interfaces are extensions to the StAX APIs (including Typed extensions) that we have had at the firm for quite some time. Before going into the details of the extensions, I'd like to take this opportunity to thank Jason Petrone, who should also take credit for these APIs but has since left the firm.

The key additional APIs for the XMLStreamReader are:
- getValueAsXXX, for example getValueAsInt().
- getAttributeValueAsXXX, for example getAttributeValueAsInt().
- getValueListReader().
- next(int event, String nsURI, String localName).
- skipElement(), skipParentElement().
- getDepth().
- bookmark().
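
As a rough illustration of what some of the reader extensions listed above save the caller from writing, here is a minimal sketch built only on the standard XMLStreamReader. The class ReaderSketch and the helpers advanceTo, skipElement and elementAsInt are hypothetical names invented for this example; they are not the MSXMLStreamReader API and only approximate next(int, String, String), skipElement() and getValueAsInt(). In particular, elementAsInt uses getElementText() on a START_ELEMENT rather than converting a CHARACTERS event directly.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helpers that approximate some of the listed extensions
// using only the standard StAX reader API.
class ReaderSketch {

    // Stand-in for next(int event, String nsURI, String localName):
    // advance until the event type matches; a null nsURI or localName is not compared.
    static boolean advanceTo(XMLStreamReader r, int event, String nsURI, String localName)
            throws XMLStreamException {
        while (r.hasNext()) {
            if (r.next() != event) continue;
            if (!r.isStartElement() && !r.isEndElement()) return true; // no name to compare
            // Implementations differ on null vs. "" for "no namespace", so normalize.
            String ns = r.getNamespaceURI() == null ? "" : r.getNamespaceURI();
            boolean nsOk = nsURI == null || nsURI.equals(ns);
            boolean nameOk = localName == null || localName.equals(r.getLocalName());
            if (nsOk && nameOk) return true;
        }
        return false; // reached the end of the document without a match
    }

    // Stand-in for skipElement(): the reader must be on a START_ELEMENT;
    // on return it is positioned on the matching END_ELEMENT.
    static void skipElement(XMLStreamReader r) throws XMLStreamException {
        if (!r.isStartElement()) throw new IllegalStateException("not on START_ELEMENT");
        int depth = 1; // the caller would otherwise have to track this manually
        while (depth > 0) {
            int e = r.next();
            if (e == XMLStreamConstants.START_ELEMENT) depth++;
            else if (e == XMLStreamConstants.END_ELEMENT) depth--;
        }
    }

    // Stand-in for getValueAsInt(): reads the text of a text-only element as an int.
    static int elementAsInt(XMLStreamReader r) throws XMLStreamException {
        return Integer.parseInt(r.getElementText().trim());
    }

    // Skips the <meta> subtree, then reads <qty> as an int.
    static int demo() {
        String xml = "<order><meta><note>skip me</note></meta><qty> 42 </qty></order>";
        try {
            XMLStreamReader r = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            advanceTo(r, XMLStreamConstants.START_ELEMENT, null, "meta");
            skipElement(r);
            advanceTo(r, XMLStreamConstants.START_ELEMENT, null, "qty");
            int value = elementAsInt(r);
            r.close();
            return value;
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Each helper has to re-derive state (depth, name matching, text conversion) that the reader already knows, which is the motivation for exposing these operations on the reader itself.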
In our API, primitive types are considered synonymous with a CHARACTERS event. Hence the getValueAsXXX methods are valid in states where getText() would be used. Under the covers the appropriate conversion is made, with dates assumed to be in ISO 8601 format.

There are additional typed extensions for attributes which perform a conversion similar to getValueAsXXX. For APIs that return a primitive type (for example getAttributeValueAsInt()), a NoSuchAttributeException is thrown when the attribute is absent, because null cannot be returned.

To conclude the Typed extensions, there is something known as a ValueListReader. This caters for the case where the content of an element is actually a list of values delimited by some character. This is particularly useful for matrix-type content.

An overloaded next(int, String, String) method has been added which is similar in nature to require() but actually advances the reader to that event. The namespace URI and local name are optional; if either is not set, the respective comparison is ignored.

The skipElement() and skipParentElement() APIs allow the current element or the parent element to be skipped. This subtree-skipping functionality is very useful when performing XPath over a stream.

For convenience there is an API that exposes the current element depth, so that users do not need to manage this state themselves.

Finally, we have just added the ability to bookmark() the reader. Essentially this provides multiple mark/reset functionality over an XML stream. This effectively adds random access, which is especially useful in anything that requires some form of speculative processing.

The key additional APIs for the XMLStreamWriter are:
- writeXXX, for example writeLong().
- writeAttribute(..., XXX)
- writeListSeparator()
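
The writer extensions listed above can be approximated on top of the standard XMLStreamWriter as follows. This is only a sketch: WriterSketch and its static helpers are hypothetical names that mirror the listed methods, not the MSXMLStreamWriter interface itself, and the choice of a single space as the list separator is an assumption made for the example.

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

// Hypothetical helpers that mimic the listed writer extensions
// using only the standard StAX writer API.
class WriterSketch {

    // Stand-in for writeLong(): primitive element content written as characters.
    static void writeLong(XMLStreamWriter w, long value) throws XMLStreamException {
        w.writeCharacters(Long.toString(value));
    }

    // Stand-in for writeAttribute(..., XXX) with an int value.
    static void writeAttribute(XMLStreamWriter w, String name, int value)
            throws XMLStreamException {
        w.writeAttribute(name, Integer.toString(value));
    }

    // Stand-in for writeListSeparator(); a single space is assumed here.
    static void writeListSeparator(XMLStreamWriter w) throws XMLStreamException {
        w.writeCharacters(" ");
    }

    // Writes one <row> element whose content is a separated list of longs.
    static String demo() {
        try {
            StringWriter out = new StringWriter();
            XMLStreamWriter w = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
            w.writeStartElement("row");
            long[] values = {1L, 20L, 300L};
            writeAttribute(w, "cols", values.length);
            for (int i = 0; i < values.length; i++) {
                if (i > 0) writeListSeparator(w);
                writeLong(w, values[i]);
            }
            w.writeEndElement();
            w.close();
            return out.toString();
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Having these on the writer keeps the string conversion and the separator convention in one place instead of scattering them through application code.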
In a similar vein to the reader extensions, there are APIs for writing out primitive element content as well as primitive attribute content. There is also an extension to deal with delimiting a list of values.

Write the XML Declaration using StreamWriter

Currently, there are two processes that may determine the encoding:
1) Creating a stream writer using the XMLOutputFactory. In this process, the encoding may be set but is not required. No default is specified.
2) Using the XMLStreamWriter's writeStartDocument methods. The defaults are specified as XML version 1.0 and encoding utf-8. The third writeStartDocument method, i.e. writeStartDocument(String encoding, String version), states that an XMLStreamException should be thrown if the given encoding does not match the encoding of the underlying stream.

These specifications lead to the following issues:
a. There is no explicit order of preference that makes clear which encoding should be used. A logical deduction may be: first the encoding of the underlying stream, then the encoding set using the third writeStartDocument method, i.e. writeStartDocument(String encoding, String version), and finally the default encoding specified in the writeStartDocument definition.
b. For the above reason, assuming the default encoding for the first and second writeStartDocument methods could conflict with the encoding of the underlying stream.
c. Assuming the encoding can be set using writeStartDocument(String encoding, String version) may also cause a conflict, since the encoding of the underlying stream may be different.

Proposed solutions:

Proposal 1: this proposal focuses on maintaining backward compatibility, to avoid the long process it would take to make any behavior change in the JDK. This solution requires no change to applications that use any of the methods to write XML documents.
1) XMLOutputFactory: add three setters and getters, setXMLVersion/getXMLVersion, setXMLEncoding/getXMLEncoding and setStandAlone. Defaults: version 1.0, encoding "UTF-8", standalone "not set". The encoding set by setXMLEncoding is used to create the XMLStreamWriter; however, it may be overridden by the one specified in the create writer method.
2) XMLStreamWriter: add writeXMLDeclaration(), which writes the XML declaration using the version, encoding and standalone information set on the XMLOutputFactory.
3) XMLStreamWriter: deprecate the writeStartDocument methods.

Write the XML Declaration using XMLEventWriter

The XMLEventWriter has similar problems, which may be a little more complicated given that the XMLEventFactory has no knowledge of the XMLEventWriter. The XMLEventFactory contains four createStartDocument methods. Notable differences are as follows:
1) createStartDocument(String encoding) vs. XMLStreamWriter.writeStartDocument(String version)
2) an additional createStartDocument(String encoding, String version, boolean standalone)
3) no defaults are specified
4) no conflict behavior is specified, either here or in the XMLEventWriter.add(XMLEvent event) method.

Proposed solutions:

Proposal 1: this proposal focuses on maintaining backward compatibility, as well as aligning the solution with that for the XMLStreamWriter.
1) XMLEventWriter: add createXMLDeclaration(XMLOutputFactory), which uses the properties set on the XMLOutputFactory to create a StartDocument event.
2) XMLEventWriter: deprecate the createStartDocument methods.
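
For reference, the existing behavior discussed in the two sections above can be sketched with the current API. The example below sets the encoding in both places for the stream writer (factory and writeStartDocument) with matching values, and creates a StartDocument event through XMLEventFactory. DeclarationSketch and its method names are made up for this illustration. What happens when the two encodings differ is exactly the point that varies between implementations, so the sketch deliberately avoids that case.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.stream.XMLEventFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;
import javax.xml.stream.events.StartDocument;

// Illustrative only: shows where the encoding is supplied in the current API.
class DeclarationSketch {

    // The encoding is given twice: once when the writer is created and once
    // in writeStartDocument. Here both agree, so no conflict can arise.
    static String streamDeclaration() {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            XMLStreamWriter w = XMLOutputFactory.newInstance()
                    .createXMLStreamWriter(bytes, "UTF-8");
            w.writeStartDocument("UTF-8", "1.0"); // argument order: encoding, version
            w.writeEmptyElement("root");
            w.writeEndDocument();
            w.close();
            return new String(bytes.toByteArray(), StandardCharsets.UTF_8);
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
    }

    // The event factory has no underlying stream, so nothing checks the
    // encoding at creation time.
    static StartDocument eventDeclaration() {
        return XMLEventFactory.newInstance().createStartDocument("UTF-8", "1.0", true);
    }

    public static void main(String[] args) {
        System.out.println(streamDeclaration());
        StartDocument sd = eventDeclaration();
        System.out.println(sd.getVersion() + " " + sd.getCharacterEncodingScheme()
                + " standalone=" + sd.isStandalone());
    }
}
```

Under the proposals above, the version, encoding and standalone values would instead be set once on the XMLOutputFactory and reused by both writers.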