This is the first of three articles describing the Serving XML pipeline language.
Serving XML is a language for building XML pipelines, and an extensible Java framework for defining the elements of the language. This article gives a short introduction to some of the basic ideas. It focuses on pipelines where the input is an XML stream and the output is a serialized XML stream.
Serving XML responds to requests by invoking a service, which in turn reads content and subjects it to a number of transformations, and finally writes output.
Serving XML makes it easy to implement SAX pipelines like Example 5 in Appendix F of Michael Kay's XSLT Programmer's Reference, 2nd Edition. This example is a three-stage pipeline, where the first stage is a SAX filter written in Java, the second stage is an XSLT transformation, and the third stage is another SAX filter written in Java. In Serving XML, it may be expressed as follows.
Figure 1. SAX pipeline
<px:resources xmlns:px="http://www.presentingxml.com/PresentingXML">
  <px:service name="myPipeline">
    <px:serialize>
      <px:transform>
        <px:saxFilter class="PreFilter"/>
        <px:style>
          <px:urlSource url="filter.xsl"/>
        </px:style>
        <px:saxFilter class="PostFilter"/>
      </px:transform>
    </px:serialize>
  </px:service>
</px:resources>
To execute the myPipeline service, you need to do two things.
Compile the two Java classes, PreFilter and PostFilter, and copy the .class files into the dir/classes directory.
Run the command
java -jar dir/presentingxml.jar -r resources.xml myPipeline < input.xml > output.xml
Here dir is the directory where the Serving XML software is installed, resources.xml defines the "myPipeline" service, and input.xml and output.xml are your input and output.
The pipeline body may be thought of as a sequence of processing steps applied to the default input stream. The input stream is parsed and transformed into a stream of SAX events that flow outward from the centre, from child element to parent element, and, within a px:transform element, from the top down. In this example they pass through the inner px:transform element, flowing through the SAX PreFilter, the XSLT stylesheet, and the SAX PostFilter, in that order, on their way to the px:serialize element, where they are serialized to an output stream.
Transform elements can be nested to any depth, and each can contain an arbitrary number of filters. The flow is always from the innermost element to the outermost element, and within a transform stage, from the top filter to the bottom filter.
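The nesting rule can be sketched with a small script. The service name "nestedPipeline" and the stylesheets a.xsl, b.xsl and c.xsl are hypothetical and do not come from the examples in this article. Under the rule just described, SAX events would pass through a.xsl and then b.xsl in the inner px:transform, and only afterwards through c.xsl in the outer one, before reaching px:serialize.

```xml
<px:resources xmlns:px="http://www.presentingxml.com/PresentingXML">
  <!-- "nestedPipeline", a.xsl, b.xsl and c.xsl are illustrative names only -->
  <px:service name="nestedPipeline">
    <px:serialize>
      <!-- Outer transform: its children are applied from top to bottom -->
      <px:transform>
        <!-- Inner transform: events leave here before c.xsl sees them -->
        <px:transform>
          <px:style><px:urlSource url="a.xsl"/></px:style>
          <px:style><px:urlSource url="b.xsl"/></px:style>
        </px:transform>
        <px:style><px:urlSource url="c.xsl"/></px:style>
      </px:transform>
    </px:serialize>
  </px:service>
</px:resources>
```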
In the SAX pipeline example, the service "myPipeline" performs one task, represented by the px:serialize element, which serializes the XML generated by the XML pipeline body into text, and writes it to the standard output. A service, however, may perform multiple tasks, including px:serialize, px:writeRecords, jm:sendMail, swing:runApp, and px:runService.
The px:parameter element is used to define a parameter as a QName-value pair, for example,
<px:parameter name="validate">no</px:parameter>
A parameter defined inside an element is visible to all siblings and all their descendants. It is not visible to ancestors. If the parameter has the same QName as a parameter in an ancestor, the new value replaces the old one within the scope of the siblings and descendants, while the old value remains visible to ancestors. It is not possible to change the parameter value of an ancestor; changes are visible to siblings and descendants only. This is to avoid side effects.
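As a rough illustration of these scoping rules, the sketch below defines validate at the service level and redefines it inside px:serialize. It is a minimal example assembled from elements used elsewhere in this article, and the placement of the two px:parameter elements is an assumption made for illustration.

```xml
<px:resources xmlns:px="http://www.presentingxml.com/PresentingXML">
  <px:service name="myPipeline">
    <!-- Defined at service level: visible to px:serialize and its descendants -->
    <px:parameter name="validate">no</px:parameter>
    <px:serialize>
      <!-- Same QName redefined: the new value applies to the siblings of this
           element and their descendants, here the px:transform below -->
      <px:parameter name="validate">yes</px:parameter>
      <px:transform>
        <px:saxFilter class="PreFilter"/>
      </px:transform>
    </px:serialize>
    <!-- At service level the value is still "no" -->
  </px:service>
</px:resources>
```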
The application processing the resources script may pass additional parameters to the script. For example, the console app may pass the parameter validate like this:
java -jar dir/presentingxml.jar -r resources.xml myPipeline validate=yes < input.xml > output.xml
If you want to define a default value for the parameter, you must do so with a px:defaultValue element, as follows.
<px:parameter name="validate"><px:defaultValue>no</px:defaultValue></px:parameter>
A passed parameter cannot override a parameter defined in a resources script unless the script's value is a default value, enclosed by a px:defaultValue element. More generally, a parameter in an ancestor cannot override a parameter in a descendant unless the descendant's value is a default value.
Serving XML supports conditional execution with a px:choose element, which tests XPath boolean expressions against parameters to determine which of several alternative pipeline bodies to execute. Here's an example:
<px:resources xmlns:px="http://www.presentingxml.com/PresentingXML"
              xmlns:msv="http://www.presentingxml.com/PresentingXML/msv">
  <px:service name="myPipeline">
    <px:parameter name="validate"><px:defaultValue>yes</px:defaultValue></px:parameter>
    <px:serialize>
      <px:choose>
        <px:when test="$validate = 'yes'">
          <px:transform>
            <px:saxFilter class="PreFilter"/>
            <px:style><px:urlSource url="filter.xsl"/></px:style>
            <px:saxFilter class="PostFilter"/>
            <msv:msvFilter schema="mySchema"/>
          </px:transform>
        </px:when>
        <px:otherwise>
          <px:transform>
            <px:saxFilter class="PreFilter"/>
            <px:style><px:urlSource url="filter.xsl"/></px:style>
            <px:saxFilter class="PostFilter"/>
          </px:transform>
        </px:otherwise>
      </px:choose>
    </px:serialize>
  </px:service>
</px:resources>
If the validate parameter is "yes", the pipeline service will stream the SAX events through the first three filters, and also through the Sun Multi-Schema Validator, which is implemented by the msv:msvFilter component; for any other value, such as "no", the validation step is skipped. The px:parameter element at the top of the script initializes the validate parameter to "yes", so by default the validation step will be performed. This may be overridden by passing a validate parameter on the command line, like this:
java -jar dir/presentingxml.jar -r resources.xml myPipeline validate=no < input.xml > output.xml
The resources defined in a resources script may be given names and referred to by reference. For example, the SAX pipeline example may be rewritten as follows.
Figure 2. SAX pipeline with references
<px:resources xmlns:px="http://www.presentingxml.com/PresentingXML">
  <px:service name="myPipeline">
    <px:serialize>
      <px:transform>
        <px:filter ref="myPreFilter"/>
        <px:filter ref="myFilter"/>
        <px:filter ref="myPostFilter"/>
      </px:transform>
    </px:serialize>
  </px:service>
  <px:saxFilter name="myPreFilter" class="PreFilter"/>
  <px:style name="myFilter">
    <px:urlSource url="filter.xsl"/>
  </px:style>
  <px:saxFilter name="myPostFilter" class="PostFilter"/>
</px:resources>
Note that we could have written <px:saxFilter ref="myPreFilter"/>, but instead we wrote <px:filter ref="myPreFilter"/>, substituting the abstract component px:filter for the derived px:saxFilter.
Names given to components must be unique up to the abstract component level. For instance, a service and a filter may both be named "myPipeline", but a px:saxFilter and a px:style must be named differently.
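The naming rule might look like this in practice. The sketch below reuses the name "myPipeline" for both a service and a filter, which the rule above permits because the two belong to different abstract components. The layout is illustrative only.

```xml
<px:resources xmlns:px="http://www.presentingxml.com/PresentingXML">
  <!-- A service named "myPipeline" -->
  <px:service name="myPipeline">
    <px:serialize>
      <px:transform>
        <!-- Refers to the filter named "myPipeline", not to the service -->
        <px:filter ref="myPipeline"/>
      </px:transform>
    </px:serialize>
  </px:service>
  <!-- A filter that shares the service's name -->
  <px:saxFilter name="myPipeline" class="PreFilter"/>
  <!-- A px:style also named "myPipeline" would clash with the px:saxFilter
       above, because both derive from px:filter -->
</px:resources>
```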
In our examples so far, XML input is read from standard input and XML output is written to standard output. We can, however, specify sources of input and sinks of output explicitly in the resources script. Below, we specify an input file named "input.xml", and an output file named "output.xml" on a remote host.
Figure 3. SAX pipeline with specified input source and output sink
<px:resources xmlns:px="http://www.presentingxml.com/PresentingXML"
              xmlns:edt="http://www.presentingxml.com/PresentingXML/edtftp">
  <px:service name="myPipeline">
    <px:serialize>
      <px:xmlEmitter>
        <edt:ftpSink remoteFile="output.xml">
          <edt:ftpClient ref="myFtpClient"/>
        </edt:ftpSink>
      </px:xmlEmitter>
      <px:transform>
        <px:content ref="myInput"/>
        <px:filter ref="myPreFilter"/>
        <px:filter ref="myFilter"/>
        <px:filter ref="myPostFilter"/>
      </px:transform>
    </px:serialize>
  </px:service>
  <edt:ftpClient name="myFtpClient" host="myHost" user="xxx" password="xxx"/>
  <px:document name="myInput">
    <px:fileSource file="input.xml"/>
  </px:document>
  <px:saxFilter name="myPreFilter" class="PreFilter"/>
  <px:style name="myFilter">
    <px:urlSource url="filter.xsl"/>
  </px:style>
  <px:saxFilter name="myPostFilter" class="PostFilter"/>
</px:resources>
The attributes remoteFile in edt:ftpSink, file in px:fileSource, and url in px:urlSource can contain parameters. We can, for example, include parameters in the input and output filenames, like this,
<edt:ftpSink remoteFile="{$myOutput}.xml">
  <edt:ftpClient ref="myFtpClient"/>
</edt:ftpSink>

<px:fileSource file="{$myInput}.xml"/>
and run the pipeline with passed parameters,
java -jar dir/presentingxml.jar -r resources.xml myPipeline myInput=input myOutput=output
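The url attribute of px:urlSource could be parameterized in the same way. In the sketch below, "stylesheet" is a hypothetical parameter name chosen for illustration, and its default value "filter" makes the script fall back to the filter.xsl stylesheet used in the earlier examples.

```xml
<px:resources xmlns:px="http://www.presentingxml.com/PresentingXML">
  <px:service name="myPipeline">
    <!-- "stylesheet" is an illustrative parameter; the default selects filter.xsl -->
    <px:parameter name="stylesheet"><px:defaultValue>filter</px:defaultValue></px:parameter>
    <px:serialize>
      <px:transform>
        <!-- The parameter is substituted into the url attribute -->
        <px:style><px:urlSource url="{$stylesheet}.xsl"/></px:style>
      </px:transform>
    </px:serialize>
  </px:service>
</px:resources>
```

Because the value is wrapped in px:defaultValue, passing stylesheet=other on the command line, in the same way as validate=yes above, would be expected to select other.xsl instead.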
Pipeline bodies may be composed out of other pipeline bodies. In the example below, four common steps in preparing invoices are collected in the px:transform element named "steps1-4". This pipeline body is used in two other pipeline bodies that are specialized to produce HTML and XSL-FO output.
Figure 4. Composition of pipeline bodies
<px:resources xmlns:px="http://www.presentingxml.com/PresentingXML"
              xmlns:fop="http://www.presentingxml.com/PresentingXML/fop">
  <px:service name="invoice-html">
    <px:serialize>
      <px:transform>
        <px:document><px:urlSource url="invoice.xml"/></px:document>
        <px:transform ref="steps1-4"/>
        <px:style><px:urlSource url="styles/invoice2html.xsl"/></px:style>
      </px:transform>
    </px:serialize>
  </px:service>
  <px:service name="invoice-pdf">
    <px:serialize>
      <fop:foEmitter/>
      <px:transform>
        <px:document><px:urlSource url="invoice.xml"/></px:document>
        <px:transform ref="steps1-4"/>
        <px:style><px:urlSource url="styles/invoice2fo.xsl"/></px:style>
      </px:transform>
    </px:serialize>
  </px:service>
  <px:transform name="steps1-4">
    <px:style><px:urlSource url="styles/step1.xsl"/></px:style>
    <px:style><px:urlSource url="styles/step2.xsl"/></px:style>
    <px:style><px:urlSource url="styles/step3.xsl"/></px:style>
    <px:style><px:urlSource url="styles/step4.xsl"/></px:style>
  </px:transform>
</px:resources>
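The same composition technique might also remove the duplication in the earlier conditional example, where both branches of px:choose repeat the same three filters. The sketch below is one possible arrangement, not a tested script: the name "commonSteps" is hypothetical, and it assumes that a named px:transform can be referenced inside px:when and px:otherwise just as it is inside px:transform.

```xml
<px:resources xmlns:px="http://www.presentingxml.com/PresentingXML"
              xmlns:msv="http://www.presentingxml.com/PresentingXML/msv">
  <px:service name="myPipeline">
    <px:parameter name="validate"><px:defaultValue>yes</px:defaultValue></px:parameter>
    <px:serialize>
      <px:choose>
        <px:when test="$validate = 'yes'">
          <px:transform>
            <!-- Shared filters first, then the validation step -->
            <px:transform ref="commonSteps"/>
            <msv:msvFilter schema="mySchema"/>
          </px:transform>
        </px:when>
        <px:otherwise>
          <!-- Shared filters only -->
          <px:transform ref="commonSteps"/>
        </px:otherwise>
      </px:choose>
    </px:serialize>
  </px:service>
  <!-- "commonSteps" is an illustrative name for the three shared filters -->
  <px:transform name="commonSteps">
    <px:saxFilter class="PreFilter"/>
    <px:style><px:urlSource url="filter.xsl"/></px:style>
    <px:saxFilter class="PostFilter"/>
  </px:transform>
</px:resources>
```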
Serving XML supports filters that extract document fragments and perform serialization or other tasks on those fragments. For example, suppose we have a file invoices.xml containing multiple invoice elements.
<invoices>
  <invoice id="200302-01" ...
  <invoice id="200302-02" ...
</invoices>
By applying the resources script below, we can produce a separate PDF file and a separate HTML file for each invoice, each filename being identified by the invoice id.
Figure 5. Resources script
<px:resources xmlns:px="http://www.presentingxml.com/PresentingXML"
              xmlns:fop="http://www.presentingxml.com/PresentingXML/fop"
              xmlns:inv="http://www.telio.be/ns/2002/invoice">
  <px:service name="invoices">
    <px:transform>
      <!-- Here we extract a document fragment from the SAX stream -->
      <px:taskRunnerFilter path="/inv:invoices/inv:invoice">
        <!-- First task - invoice document fragment to pdf-->
        <px:serialize>
          <!-- We initialize a parameter with an XPATH expression applied to the document fragment -->
          <px:parameter name="invoice-name" select="@id"/>
          <fop:foEmitter>
            <px:fileSink file="output/invoice{$invoice-name}.pdf"/>
          </fop:foEmitter>
          <px:transform>
            <px:transform ref="steps1-4"/>
            <px:style><px:urlSource url="styles/invoice2fo.xsl"/></px:style>
          </px:transform>
        </px:serialize>
        <!-- Second task - invoice document fragment to html-->
        <px:serialize>
          <px:parameter name="invoice-name" select="@id"/>
          <px:xmlEmitter>
            <px:fileSink file="output/invoice{$invoice-name}.html"/>
          </px:xmlEmitter>
          <px:transform>
            <px:transform ref="steps1-4"/>
            <px:style><px:urlSource url="styles/invoice2html.xsl"/></px:style>
          </px:transform>
        </px:serialize>
      </px:taskRunnerFilter>
    </px:transform>
  </px:service>
  <px:transform name="steps1-4">
    <px:style><px:urlSource url="styles/step1.xsl"/></px:style>
    <px:style><px:urlSource url="styles/step2.xsl"/></px:style>
    <px:style><px:urlSource url="styles/step3.xsl"/></px:style>
    <px:style><px:urlSource url="styles/step4.xsl"/></px:style>
  </px:transform>
</px:resources>
As a resources script gets bigger, it becomes desirable to reorganize it, perhaps splitting off the content and filter elements into separate files, and grouping resource names into distinct namespaces. We may, for example, wish to decompose the resources.xml file as follows.
documents.xml - File of documents with names assigned from the namespace http://www.mydomain.com/documents.
<px:resources xmlns:px="http://www.presentingxml.com/PresentingXML"
              xmlns:mydoc="http://www.mydomain.com/documents">
  <px:document name="mydoc:myInput">
    <px:fileSource file="input.xml"/>
  </px:document>
</px:resources>
filters.xml - File of filter definitions.
<px:resources xmlns:px="http://www.presentingxml.com/PresentingXML">
  <px:saxFilter name="myPreFilter" class="PreFilter"/>
  <px:style name="myFilter"><px:urlSource url="filter.xsl"/></px:style>
  <px:saxFilter name="myPostFilter" class="PostFilter"/>
</px:resources>
services.xml - File of service definitions.
We now need to import the content and filter definitions into the services.xml file, and we do that using the px:include instruction.
Figure 6. Resources script with includes
<px:resources xmlns:px="http://www.presentingxml.com/PresentingXML"
              xmlns:edt="http://www.presentingxml.com/PresentingXML/edtftp"
              xmlns:mydoc="http://www.mydomain.com/documents">
  <px:include href="documents.xml"/>
  <px:include href="filters.xml"/>
  <px:service name="myPipeline">
    <px:serialize>
      <px:xmlEmitter>
        <edt:ftpSink remoteFile="output.xml">
          <edt:ftpClient ref="myFtpClient"/>
        </edt:ftpSink>
      </px:xmlEmitter>
      <px:transform>
        <px:content ref="mydoc:myInput"/>
        <px:filter ref="myPreFilter"/>
        <px:filter ref="myFilter"/>
        <px:filter ref="myPostFilter"/>
      </px:transform>
    </px:serialize>
  </px:service>
  <edt:ftpClient name="myFtpClient" host="myHost" user="xxx" password="xxx"/>
</px:resources>
A number of elements support custom implementations by accepting a Java class that implements a defined interface and a list of custom properties. These include px:saxReader, px:saxFilter, px:customEmitter, px:customRecordFilter, px:customJdbcConnection, and px:dynamicContent.
New components may be created as extensions and used interchangeably with framework components in resources scripts. The edtftpj extension, for example, provides the edt:ftpSource and edt:ftpSink implementations of the abstract px:streamSource and px:streamSink components. Including the extension in the deployment build requires only that an entry be added in the build-extensions.xml file.