This is the first of three articles describing the ServingXML pipeline language.
ServingXML is a language for building XML pipelines, and an extensible Java framework for defining the elements of the language. This article gives a short introduction to some of the basic ideas. It focuses on pipelines where the input is an XML stream and the output is a serialized XML stream.
ServingXML responds to requests by invoking a service, which in turn reads content and subjects it to a number of transformations, and finally writes output.
ServingXML makes it easy to implement SAX pipelines like Example 5 in Appendix F of Michael Kay's XSLT Programmer's Reference, 2nd Edition. This example is a three-stage pipeline, where the first stage is a SAX filter written in Java, the second stage is an XSLT transformation, and the third stage is another SAX filter written in Java. In ServingXML, it may be expressed as follows.
Figure 1. SAX pipeline
<sx:resources xmlns:sx="http://www.servingxml.com/core">
  <sx:service id="myPipeline">
    <sx:serialize>
      <sx:transform>
        <sx:saxFilter class="PreFilter"/>
        <sx:xslt>
          <sx:urlSource url="filter.xsl"/>
        </sx:xslt>
        <sx:saxFilter class="PostFilter"/>
      </sx:transform>
    </sx:serialize>
  </sx:service>
</sx:resources>
To execute the myPipeline service, you need to do two things.

1. Compile the two Java classes, PreFilter and PostFilter, and copy the .class files into the dir/classes directory.

2. Run the command

servingxml -r resources.xml myPipeline < input.xml > output.xml

Here dir is the directory where the ServingXML software is installed, resources.xml defines the "myPipeline" service, and input.xml and output.xml are your input and output.
The pipeline body may be thought of as a sequence of processing steps applied to the default input stream. The input stream is parsed and transformed into a stream of SAX events, and the events pass through a number of stages. They pass through the inner sx:transform element, flowing through the SAX PreFilter, the XSLT stylesheet, and the SAX PostFilter, in that order, on their way to an sx:serialize element, where they are serialized to an output stream.
Transform elements can be nested to any depth, and each can contain an arbitrary number of filters. The flow is always from the innermost element to the outermost element, and within a transform stage, from the top filter to the bottom filter.
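For illustration, here is a minimal sketch of that ordering, assuming an inner transform is nested inline within an outer one (the step1.xsl and step2.xsl stylesheets are hypothetical):

<sx:transform>
  <sx:transform>
    <sx:xslt><sx:urlSource url="step1.xsl"/></sx:xslt>  <!-- runs first -->
    <sx:xslt><sx:urlSource url="step2.xsl"/></sx:xslt>  <!-- runs second -->
  </sx:transform>
  <sx:saxFilter class="PostFilter"/>                    <!-- runs last, in the outer stage -->
</sx:transform>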
In the example above, the service myPipeline is an example of a resource. Resources are identified by an absolute or relative URI. We could have written the resources script like this:

<sx:resources xmlns:sx="http://www.servingxml.com/core"
              xmlns:myns="http://mycompany.com/mynames/">
  <sx:service id="myns:myPipeline">
  ...

Then we would need to identify the service with a full URI:

servingxml -r resources.xml http://mycompany.com/mynames/myPipeline < input.xml > output.xml

Note that ServingXML follows the RDF convention for converting QNames into URIs, by concatenating the XML namespace URI and local name.
In the SAX pipeline example, the service "myPipeline" executes one task, represented by the sx:serialize element, which serializes the XML generated by the XML pipeline body into text and writes it to the standard output. A service, however, may execute multiple tasks, including

sx:serialize
sx:recordStream
jm:sendMail
swing:runApp
sx:runService
The sx:parameter element is used to define a parameter as a QName-value pair, for example,

<sx:parameter name="validate">no</sx:parameter>
A parameter defined inside an element is accessible to sibling and descendant elements, but not to ancestor elements. If the parameter has the same QName as a parameter in an ancestor, the new value replaces the old one within the scope of siblings and descendants, but not in the scope of ancestors, where the old value remains visible. This is to avoid side effects.
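As an illustration of the scoping rule, consider a sketch like the following (the placement of sx:parameter follows the examples later in this article; the values are only illustrative):

<sx:service id="myPipeline">
  <sx:parameter name="validate">no</sx:parameter>     <!-- visible to the elements that follow -->
  <sx:serialize>
    <sx:parameter name="validate">yes</sx:parameter>  <!-- shadows "no" for its siblings and descendants -->
    ...
  </sx:serialize>
  <!-- outside sx:serialize, the value of validate is still "no" -->
</sx:service>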
The application processing the resources script may pass additional parameters to the script. For example, the console app may pass the parameter validate like this:

servingxml -r resources.xml myPipeline validate=yes < input.xml > output.xml
If you want to define a default value for the parameter, you must do so with an sx:defaultValue element as follows.

<sx:parameter name="validate"><sx:defaultValue>no</sx:defaultValue></sx:parameter>

A passed parameter cannot override a parameter defined in a resources script unless the script's value is a default value, enclosed by an sx:defaultValue element. More generally, a parameter in an ancestor cannot override a parameter in a descendant unless the descendant's value is a default value.
ServingXML supports conditional processing with an sx:choose element, which tests XPath boolean expressions against parameters to determine which of several alternative pipeline bodies to execute. Here's an example:
<sx:resources xmlns:sx="http://www.servingxml.com/core"
              xmlns:msv="http://www.servingxml.com/extensions/msv">
  <sx:service id="myPipeline">
    <sx:parameter name="validate"><sx:defaultValue>yes</sx:defaultValue></sx:parameter>
    <sx:serialize>
      <sx:choose>
        <sx:when test="$validate = 'yes'">
          <sx:transform>
            <sx:saxFilter class="PreFilter"/>
            <sx:xslt><sx:urlSource url="filter.xsl"/></sx:xslt>
            <sx:saxFilter class="PostFilter"/>
            <msv:schemaValidator>
              <sx:urlSource url="mySchema.xsd"/>
            </msv:schemaValidator>
          </sx:transform>
        </sx:when>
        <sx:otherwise>
          <sx:transform>
            <sx:saxFilter class="PreFilter"/>
            <sx:xslt><sx:urlSource url="filter.xsl"/></sx:xslt>
            <sx:saxFilter class="PostFilter"/>
          </sx:transform>
        </sx:otherwise>
      </sx:choose>
    </sx:serialize>
  </sx:service>
</sx:resources>
If the validate parameter is "yes", the pipeline service will stream the SAX events through the first three filters, and also through the Sun Multi-Schema Validator, which is implemented by the msv:schemaValidator component; if it is "no", the validation step is skipped. The sx:parameter element at the top of the script initializes the validate parameter to "yes", so by default the validation step will be performed. This may be overridden by passing a validate parameter on the command line, like this:

servingxml -r resources.xml myPipeline validate=no < input.xml > output.xml
The resources defined in a resources script may be given ids and referred to by reference. For example, the SAX pipeline example may be rewritten as follows.
Figure 2. SAX pipeline with references
<sx:resources xmlns:sx="http://www.servingxml.com/core">
  <sx:service id="myPipeline">
    <sx:serialize>
      <sx:transform>
        <sx:content ref="myPreFilter"/>
        <sx:content ref="myFilter"/>
        <sx:content ref="myPostFilter"/>
      </sx:transform>
    </sx:serialize>
  </sx:service>
  <sx:saxFilter id="myPreFilter" class="PreFilter"/>
  <sx:xslt id="myFilter">
    <sx:urlSource url="filter.xsl"/>
  </sx:xslt>
  <sx:saxFilter id="myPostFilter" class="PostFilter"/>
</sx:resources>
Note that we could have written <sx:saxFilter ref="myPreFilter"/>, but instead we wrote <sx:content ref="myPreFilter"/>, substituting the abstract component sx:content for the derived sx:saxFilter. Names given to components must be unique up to the abstract component level: for instance, a service and a filter may both be named "myPipeline", but an sx:saxFilter and an sx:xslt must be named differently.
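For illustration, a sketch of this rule (the ids are hypothetical):

<!-- Allowed: a service and a content component may share a name,
     because they belong to different abstract component families -->
<sx:service id="myPipeline"> ... </sx:service>
<sx:saxFilter id="myPipeline" class="PostFilter"/>

<!-- Not allowed: sx:saxFilter and sx:xslt are both derived from the
     abstract component sx:content, so their ids must differ -->
<sx:saxFilter id="myFilter" class="PreFilter"/>
<sx:xslt id="myFilter"><sx:urlSource url="filter.xsl"/></sx:xslt>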
In our example so far, XML input is read from standard input and XML output is written to standard output. We can, however, specify sources of input and sinks of output explicitly in the resources script. Below, we specify an input file named "input.xml", and an output file named "output.xml".
Figure 3. SAX pipeline with specified input source and output sink
<sx:resources xmlns:sx="http://www.servingxml.com/core">
  <sx:service id="myPipeline">
    <sx:serialize>
      <sx:xsltSerializer>
        <sx:fileSink file="output.xml"/>
      </sx:xsltSerializer>
      <sx:transform>
        <sx:content ref="myInput"/>
        <sx:content ref="myPreFilter"/>
        <sx:content ref="myFilter"/>
        <sx:content ref="myPostFilter"/>
      </sx:transform>
    </sx:serialize>
  </sx:service>
  <sx:document id="myInput">
    <sx:fileSource file="input.xml"/>
  </sx:document>
  <sx:saxFilter id="myPreFilter" class="PreFilter"/>
  <sx:xslt id="myFilter">
    <sx:urlSource url="filter.xsl"/>
  </sx:xslt>
  <sx:saxFilter id="myPostFilter" class="PostFilter"/>
</sx:resources>
The file attribute in sx:fileSource, the url attribute in sx:urlSource, and the file attribute in sx:fileSink can contain parameters. We can, for example, include parameters in the input and output filenames, like this,

<sx:fileSink file="{$myOutput}.xml"/>
<sx:fileSource file="{$myInput}.xml"/>

and run the pipeline with passed parameters,

servingxml -r resources.xml myPipeline myInput=input myOutput=output
ServingXML supports the idea of abstract elements. New elements can be created as specializations of abstract elements and used interchangeably with core ServingXML elements in resources scripts. Want your XML serialized to a file on an FTP server? Use the ftpSink:
<sx:resources xmlns:sx="http://www.servingxml.com/core"
              xmlns:edt="http://www.servingxml.com/extensions/edtftp">
  <edt:ftpClient id="myFtpClient" host="tor3" user="dap" password="spring"/>
  <sx:service id="myPipeline">
    <sx:serialize>
      <sx:xsltSerializer>
        <edt:ftpSink remoteDirectory="incoming" remoteFile="output.xml">
          <edt:ftpClient ref="myFtpClient"/>
        </edt:ftpSink>
      </sx:xsltSerializer>
      ...
Pipeline bodies may be composed out of other pipeline bodies. In the example below, four common steps in preparing invoices are collected in the sx:transform element named "steps1-4". This pipeline body is used in two other pipeline bodies that are specialized to produce HTML and XSL-FO output.
Figure 4. Composition of pipeline bodies
<sx:resources xmlns:sx="http://www.servingxml.com/core"
              xmlns:fop="http://www.servingxml.com/extensions/fop">
  <sx:service id="invoice-html">
    <sx:serialize>
      <sx:transform>
        <sx:document><sx:urlSource url="invoice.xml"/></sx:document>
        <sx:transform ref="steps1-4"/>
        <sx:xslt><sx:urlSource url="styles/invoice2html.xsl"/></sx:xslt>
      </sx:transform>
    </sx:serialize>
  </sx:service>
  <sx:service id="invoice-pdf">
    <sx:serialize>
      <fop:foSerializer/>
      <sx:transform>
        <sx:document><sx:urlSource url="invoice.xml"/></sx:document>
        <sx:transform ref="steps1-4"/>
        <sx:xslt><sx:urlSource url="styles/invoice2fo.xsl"/></sx:xslt>
      </sx:transform>
    </sx:serialize>
  </sx:service>
  <sx:transform id="steps1-4">
    <sx:xslt><sx:urlSource url="styles/step1.xsl"/></sx:xslt>
    <sx:xslt><sx:urlSource url="styles/step2.xsl"/></sx:xslt>
    <sx:xslt><sx:urlSource url="styles/step3.xsl"/></sx:xslt>
    <sx:xslt><sx:urlSource url="styles/step4.xsl"/></sx:xslt>
  </sx:transform>
</sx:resources>
The ServingXML implementation acts as a URI resolver for an XSLT stylesheet in the pipeline. If an XSLT stylesheet uses the document function to reference a URI, an attempt will be made to resolve that URI against content identified by QName. The URI will be resolved if it matches the identifier obtained by concatenating the namespace URI and the local name of content defined in the resources script. If there is no match, URI resolution reverts to the default URI resolution for the transformer.
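For example, a sketch of how such a match might be set up (the myns prefix and the lookup document are hypothetical):

<sx:resources xmlns:sx="http://www.servingxml.com/core"
              xmlns:myns="http://mycompany.com/mynames/">
  <sx:document id="myns:lookup">
    <sx:fileSource file="lookup.xml"/>
  </sx:document>
  ...
</sx:resources>

A stylesheet running in the pipeline could then refer to this content as document('http://mycompany.com/mynames/lookup'), the concatenation of the namespace URI and the local name of the resource id.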
The ServingXML implementation will recognize query parameters such as ?directory=input in the URI passed to the document() function. These parameters may be referenced in XML content elements.
ServingXML supports filters that extract subtrees and perform serialization or other tasks on those subtrees. For example, suppose we have a file invoices.xml containing multiple invoice elements.

<invoices>
  <invoice id="200302-01" ...
  <invoice id="200302-02" ...
</invoices>
By applying the resources script below, we can produce a separate PDF file for each invoice, each filename being identified by the invoice id.
Figure 5. Resources script
<sx:resources xmlns:sx="http://www.servingxml.com/core"
              xmlns:fop="http://www.servingxml.com/extensions/fop"
              xmlns:inv="http://www.telio.be/ns/2002/invoice">
  <sx:service id="invoices">
    <sx:transform>
      <!-- Here we extract a subtree from the SAX stream -->
      <sx:processSubtree path="/inv:invoices/inv:invoice">
        <!-- Transform invoice subtree to pdf -->
        <sx:serialize>
          <!-- We initialize a parameter with an XPath expression applied to the document subtree -->
          <sx:parameter name="invoice-name" select="@id"/>
          <fop:foSerializer>
            <sx:fileSink file="output/invoice{$invoice-name}.pdf"/>
          </fop:foSerializer>
          <sx:transform>
            <sx:transform ref="steps1-4"/>
            <sx:xslt><sx:urlSource url="styles/invoice2fo.xsl"/></sx:xslt>
          </sx:transform>
        </sx:serialize>
      </sx:processSubtree>
    </sx:transform>
  </sx:service>
  <sx:transform id="steps1-4">
    <sx:xslt><sx:urlSource url="styles/step1.xsl"/></sx:xslt>
    <sx:xslt><sx:urlSource url="styles/step2.xsl"/></sx:xslt>
    <sx:xslt><sx:urlSource url="styles/step3.xsl"/></sx:xslt>
    <sx:xslt><sx:urlSource url="styles/step4.xsl"/></sx:xslt>
  </sx:transform>
</sx:resources>
The sx:processSubtree element has a path attribute that references a SAXPath pattern, used to extract subtrees from the stream of SAX events. A SAXPath pattern is an expression that matches on a stack of SAX events as they flow through a SAX filter. The syntax for a SAXPath pattern is a restricted XSLT match pattern, including the parts that make sense for filtering on the SAX startElement event. The match pattern is evaluated against the path of elements leading to the current element, the attributes of the elements, and any parameters in scope.
A SAXPath pattern consists of a series of one or more elements separated by "/" or "//". An absolute SAXPath pattern begins with a "/" or "//", and is matched against the entire path of elements. A relative SAXPath pattern is matched against a portion of the path that ends at the current element. A "//" expands to match any series of elements separating two matched path entries. The wildcard "*" may be used to match against any element. Predicates that a path entry must satisfy may be appended to the entry with square brackets.
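For illustration, here are a few patterns of the kind this syntax allows, drawing on the invoice vocabulary above (the predicate value is only an example):

/inv:invoices/inv:invoice - an absolute pattern: an inv:invoice element directly inside the inv:invoices document element.
//inv:invoice[@id='200302-01'] - "//" skips any intervening elements; the predicate in square brackets tests the id attribute.
inv:invoices/* - a relative pattern: the wildcard "*" matches any element whose parent is inv:invoices.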
ServingXML supports the notion of an XML tee, to fork a stream of SAX events. Suppose, for example, we wanted to serialize each invoice in the previous example to HTML as well as PDF. One way to do this is to insert an sx:tagTee element in the pipeline, like this:
<sx:resources xmlns:sx="http://www.servingxml.com/core"
              xmlns:fop="http://www.servingxml.com/extensions/fop"
              xmlns:inv="http://www.telio.be/ns/2002/invoice">
  <sx:service id="invoices">
    <sx:transform>
      <!-- Here we extract a document subtree from the SAX stream -->
      <sx:processSubtree path="/inv:invoices/inv:invoice">
        <sx:transform>
          <!-- We initialize a parameter with an XPath expression applied to the document subtree -->
          <sx:parameter name="invoice-name" select="@id"/>
          <fop:foSerializer>
            <sx:fileSink file="output/invoice{$invoice-name}.pdf"/>
          </fop:foSerializer>
          <sx:transform>
            <sx:transform ref="steps1-4"/>
            <!-- Tee - invoice document subtree to html -->
            <sx:tagTee>
              <sx:xslt documentBase="documents/">
                <sx:urlSource url="styles/invoice2html.xsl"/>
              </sx:xslt>
              <sx:xsltSerializer>
                <sx:fileSink file="output/invoice{$invoice-name}.html"/>
              </sx:xsltSerializer>
            </sx:tagTee>
            <sx:xslt documentBase="documents/">
              <sx:urlSource url="styles/invoice2fo.xsl"/>
            </sx:xslt>
          </sx:transform>
        </sx:transform>
      </sx:processSubtree>
    </sx:transform>
  </sx:service>
  <sx:transform id="html-output">
    <sx:xslt documentBase="documents/">
      <sx:urlSource url="styles/invoice2html.xsl"/>
    </sx:xslt>
    <sx:xsltSerializer>
      <sx:fileSink file="output/invoice{$invoice-name}.html"/>
    </sx:xsltSerializer>
  </sx:transform>
  ...
</sx:resources>
As a resources script gets bigger, it becomes desirable to reorganize it, perhaps splitting off the content and filter elements into separate files, and grouping resource names into distinct namespaces. We may, for example, wish to decompose the resources.xml file as follows.

documents.xml - File of documents with names assigned from the namespace http://mycompany.com/mynames/.
<sx:resources xmlns:sx="http://www.servingxml.com/core"
              xmlns:myns="http://mycompany.com/mynames/">
  <sx:document id="myns:myInput">
    <sx:fileSource file="input.xml"/>
  </sx:document>
</sx:resources>
filters.xml - File of filter definitions.

<sx:resources xmlns:sx="http://www.servingxml.com/core">
  <sx:saxFilter id="myPreFilter" class="PreFilter"/>
  <sx:xslt id="myFilter"><sx:urlSource url="filter.xsl"/></sx:xslt>
  <sx:saxFilter id="myPostFilter" class="PostFilter"/>
</sx:resources>

services.xml - File of service definitions.
We now need to import the content and filter definitions in the services.xml file, and we do that using the sx:include instruction.
Figure 6. Resources script with includes
<sx:resources xmlns:sx="http://www.servingxml.com/core"
              xmlns:edt="http://www.servingxml.com/extensions/edtftp"
              xmlns:myns="http://mycompany.com/mynames/">
  <sx:include href="documents.xml"/>
  <sx:include href="filters.xml"/>
  <sx:service id="myPipeline">
    <sx:serialize>
      <sx:xsltSerializer>
        <edt:ftpSink remoteFile="output.xml">
          <edt:ftpClient ref="myFtpClient"/>
        </edt:ftpSink>
      </sx:xsltSerializer>
      <sx:transform>
        <sx:content ref="myns:myInput"/>
        <sx:content ref="myPreFilter"/>
        <sx:content ref="myFilter"/>
        <sx:content ref="myPostFilter"/>
      </sx:transform>
    </sx:serialize>
  </sx:service>
  <edt:ftpClient id="myFtpClient" host="myHost" user="xxx" password="xxx"/>
</sx:resources>
A number of elements support custom implementations by accepting a Java class that implements a defined interface and a list of custom properties. These include sx:saxReader, sx:saxFilter, sx:customSerializer, sx:customRecordFilter, sx:customJdbcConnection, and sx:dynamicContent.
New components may be created as extensions and used interchangeably with framework components in resources scripts. The edtftpj extension, for example, provides the edt:ftpSource and edt:ftpSink implementations of the abstract sx:streamSource and sx:streamSink components. Including the extension in the deployment build requires only that an entry be added in the build-extensions.xml file.