Serving XML: Pipeline Language

Daniel Parker


First Example
Resources
Tasks
Parameters
Conditional Processing
Referencing Resources by Id
Sources and Sinks
Abstract Elements
Composition
XSLT URI Resolution
Document Subtrees
XML Tee
Organizing Resource Scripts
Customization
Extendability

This is the first of three articles describing the ServingXML pipeline language.

ServingXML is a language for building XML pipelines, and an extensible Java framework for defining the elements of the language. This article gives a short introduction to some of the basic ideas. It focuses on pipelines where the input is an XML stream and the output is a serialized XML stream.

First Example

ServingXML responds to requests by invoking a service, which in turn reads content and subjects it to a number of transformations, and finally writes output.

ServingXML makes it easy to implement SAX pipelines like Example 5 in Michael Kay's XSLT 2nd Edition Programmer's Reference, Appendix F. This example is a three-stage pipeline, where the first stage is a SAX filter written in Java, the second stage is an XSLT transformation, and the third stage is another SAX filter written in Java. In ServingXML, it may be expressed as follows.

Figure 1. SAX pipeline


<sx:resources xmlns:sx="http://www.servingxml.com/core">
  <sx:service id="myPipeline">
    <sx:serialize>
      <sx:transform>
        <sx:saxFilter class="PreFilter"/>
        <sx:xslt>
          <sx:urlSource url="filter.xsl"/>
        </sx:xslt>
        <sx:saxFilter class="PostFilter"/>     
      </sx:transform>
    </sx:serialize>
  </sx:service>
</sx:resources>

To execute the myPipeline service, you need to do two things.

  • Compile the two Java classes, PreFilter and PostFilter, and copy the .class files into the dir/classes directory.

  • Run the command

    
    servingxml -r resources.xml myPipeline 
        < input.xml > output.xml
    
    

Here dir is the directory where the ServingXML software is installed, resources.xml defines the "myPipeline" service, and input.xml and output.xml are your input and output.
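
The contents of filter.xsl are not shown in this article. For trying out the pipeline, one possible stand-in is the standard XSLT identity transform, which copies its input through unchanged. This is only an illustrative sketch, not the stylesheet from the cited example; replace the template with real transformation logic as needed.

```xml
<?xml version="1.0"?>
<!-- filter.xsl: hypothetical stand-in stylesheet (identity transform) -->
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Copy every node and attribute through unchanged -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```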

The pipeline body may be thought of as a sequence of processing steps applied to the default input stream. The input stream is parsed into a stream of SAX events, and the events pass through a number of stages. They pass through the inner sx:transform element, flowing through the SAX PreFilter, the XSLT stylesheet, and the SAX PostFilter, in that order, on their way to the enclosing sx:serialize element, where they are serialized to an output stream.

Transform elements can be nested to any depth, and each can contain an arbitrary number of filters. The flow is always from the innermost element to the outermost element, and within a transform stage, from the top filter to the bottom filter.
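
The ordering rule can be illustrated with a small sketch. The service id and the class names FilterA, FilterB and FilterC are hypothetical, and the numbering in the comments is simply a reading of the rule stated above (innermost first, then top to bottom).

```xml
<sx:resources xmlns:sx="http://www.servingxml.com/core">
  <sx:service id="nestedPipeline">
    <sx:serialize>
      <!-- Outer transform: its remaining filters receive the events
           that leave the inner transform -->
      <sx:transform>
        <!-- Inner transform: events pass through here first -->
        <sx:transform>
          <sx:saxFilter class="FilterA"/>   <!-- 1st -->
          <sx:saxFilter class="FilterB"/>   <!-- 2nd -->
        </sx:transform>
        <sx:xslt>                           <!-- 3rd -->
          <sx:urlSource url="filter.xsl"/>
        </sx:xslt>
        <sx:saxFilter class="FilterC"/>     <!-- 4th -->
      </sx:transform>
    </sx:serialize>
  </sx:service>
</sx:resources>
```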

Resources

In the example above, the service myPipeline is an example of a resource. Resources are identified by an absolute or relative URI. We could have written the resources script like this:


<sx:resources xmlns:sx="http://www.servingxml.com/core"
              xmlns:myns="http://mycompany.com/mynames/">
  <sx:service id="myns:myPipeline"> ...

Then we would need to identify the service with a full URI:


servingxml -r resources.xml http://mycompany.com/mynames/myPipeline 
    < input.xml > output.xml

Note that ServingXML follows the RDF convention for converting QNames into URIs, by concatenating the XML namespace URI and local name.

Tasks

In the SAX pipeline example, the service "myPipeline" executes one task, represented by the sx:serialize element, which serializes the XML generated by the XML pipeline body into text, and writes it to the standard output. A service, however, may execute multiple tasks, including

  • serializing XML to a file (sx:serialize)
  • writing records to a file (sx:recordStream)
  • sending mail (jm:sendMail)
  • starting a Swing application (swing:runApp)
  • running another service (sx:runService)

Parameters

The sx:parameter element is used to define a parameter as a QName-value pair, for example,


  <sx:parameter name="validate">no</sx:parameter>

A parameter defined inside an element is accessible to sibling and descendant elements, but not to ancestor elements. If the parameter has the same QName as a parameter in an ancestor, the new value replaces the old one only within the scope of siblings and descendants; ancestors still see the old value. This avoids side effects.

The application processing the resources script may pass additional parameters to the script. For example, the console app may pass the parameter validate like this:


servingxml -r resources.xml myPipeline validate=yes
    < input.xml > output.xml

If you want to define a default value for the parameter, you must do so with a sx:defaultValue element as follows.


  <sx:parameter name="validate"><sx:defaultValue>no</sx:defaultValue></sx:parameter>

A passed parameter cannot override a parameter defined in a resources script unless the script's value is a default value, enclosed by a sx:defaultValue element. More generally, a parameter in an ancestor cannot override a parameter in a descendant unless the descendant's value is a default value.
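
The two rules can be seen side by side in a sketch like the following. The parameter names suffix and stylesheet and the output path are hypothetical; the script only combines elements already introduced, and the behavior noted in the comments is what the rules above imply rather than something verified here.

```xml
<sx:resources xmlns:sx="http://www.servingxml.com/core">
  <sx:service id="myPipeline">
    <!-- Default only: a passed "suffix" parameter replaces this value -->
    <sx:parameter name="suffix"><sx:defaultValue>draft</sx:defaultValue></sx:parameter>

    <sx:serialize>
      <!-- Fixed value: visible to siblings and descendants inside
           sx:serialize; a passed "stylesheet" parameter does not replace it -->
      <sx:parameter name="stylesheet">filter</sx:parameter>
      <sx:xsltSerializer>
        <sx:fileSink file="output/result-{$suffix}.xml"/>
      </sx:xsltSerializer>
      <sx:transform>
        <sx:xslt><sx:urlSource url="{$stylesheet}.xsl"/></sx:xslt>
      </sx:transform>
    </sx:serialize>
  </sx:service>
</sx:resources>
```

Under these rules, running the service with suffix=final stylesheet=other on the command line would be expected to write output/result-final.xml while still applying filter.xsl.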

Conditional Processing

ServingXML supports conditional processing with a sx:choose element, which tests XPath boolean expressions against parameters to determine which of several alternative pipeline bodies to execute. Here's an example:


<sx:resources xmlns:sx="http://www.servingxml.com/core"
              xmlns:msv="http://www.servingxml.com/extensions/msv">
  <sx:service id="myPipeline">
  
    <sx:parameter name="validate"><sx:defaultValue>yes</sx:defaultValue></sx:parameter>
    
    <sx:serialize>
      <sx:choose>
        <sx:when test="$validate = 'yes'">
          <sx:transform>
            <sx:saxFilter class="PreFilter"/>
            <sx:xslt><sx:urlSource url="filter.xsl"/></sx:xslt>
            <sx:saxFilter class="PostFilter"/>   
            <msv:schemaValidator>
              <sx:urlSource url="mySchema.xsd"/>
            </msv:schemaValidator>
          </sx:transform>
        </sx:when>  
        <sx:otherwise>
          <sx:transform>
            <sx:saxFilter class="PreFilter"/>
            <sx:xslt><sx:urlSource url="filter.xsl"/></sx:xslt>
            <sx:saxFilter class="PostFilter"/>   
          </sx:transform>
        </sx:otherwise>  
      </sx:choose>
    </sx:serialize>

  </sx:service>
</sx:resources>

If the validate parameter is "yes", the service streams the SAX events through the first three filters and then through the Sun Multi-Schema Validator, which is implemented by the msv:schemaValidator component; for any other value, such as "no", the validation step is skipped. The sx:parameter element at the top of the script gives validate a default value of "yes", so validation is performed by default. This may be overridden by passing a validate parameter on the command line, like this:


servingxml -r resources.xml myPipeline validate=no
    < input.xml > output.xml
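
Both branches of the conditional example repeat the same three filters. One way to avoid the repetition, sketched here under the assumption that the sx:transform id/ref mechanism described under Composition may be used inside sx:when and sx:otherwise, is to define the shared steps once and reference them. The id commonSteps is hypothetical.

```xml
<sx:resources xmlns:sx="http://www.servingxml.com/core"
              xmlns:msv="http://www.servingxml.com/extensions/msv">
  <sx:service id="myPipeline">
    <sx:parameter name="validate"><sx:defaultValue>yes</sx:defaultValue></sx:parameter>
    <sx:serialize>
      <sx:choose>
        <sx:when test="$validate = 'yes'">
          <sx:transform>
            <!-- Shared filters first, then the validator -->
            <sx:transform ref="commonSteps"/>
            <msv:schemaValidator>
              <sx:urlSource url="mySchema.xsd"/>
            </msv:schemaValidator>
          </sx:transform>
        </sx:when>
        <sx:otherwise>
          <sx:transform ref="commonSteps"/>
        </sx:otherwise>
      </sx:choose>
    </sx:serialize>
  </sx:service>

  <!-- Hypothetical id; holds the three filters shared by both branches -->
  <sx:transform id="commonSteps">
    <sx:saxFilter class="PreFilter"/>
    <sx:xslt><sx:urlSource url="filter.xsl"/></sx:xslt>
    <sx:saxFilter class="PostFilter"/>
  </sx:transform>
</sx:resources>
```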

Referencing Resources by Id

The resources defined in a resources script may be given ids and referred to by reference. For example, the SAX pipeline example may be rewritten as follows.

Figure 2. SAX pipeline with references


<sx:resources xmlns:sx="http://www.servingxml.com/core">
  <sx:service id="myPipeline">
    <sx:serialize>
      <sx:transform>
        <sx:content ref="myPreFilter"/>
        <sx:content ref="myFilter"/>     
        <sx:content ref="myPostFilter"/>     
      </sx:transform>
    </sx:serialize>
  </sx:service>
  
  <sx:saxFilter id="myPreFilter" class="PreFilter"/>
  <sx:xslt id="myFilter">
    <sx:urlSource url="filter.xsl"/>
  </sx:xslt>
  <sx:saxFilter id="myPostFilter" class="PostFilter"/>     
</sx:resources>

Note that we could have written <sx:saxFilter ref="myPreFilter"/>, but instead we wrote <sx:content ref="myPreFilter"/>, substituting the abstract component sx:content for the derived sx:saxFilter. Names given to components must be unique up to the abstract component level. For instance, a service and a filter may both be named "myPipeline", but a sx:saxFilter and a sx:xslt must be named differently, because both are kinds of sx:content.
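
For comparison, here is a sketch of the same script using references through the derived elements rather than sx:content. The sx:saxFilter form is the one mentioned above; using ref on sx:xslt in the same way is an assumption made by analogy.

```xml
<sx:resources xmlns:sx="http://www.servingxml.com/core">
  <sx:service id="myPipeline">
    <sx:serialize>
      <sx:transform>
        <!-- Same references as Figure 2, written with the derived elements -->
        <sx:saxFilter ref="myPreFilter"/>
        <sx:xslt ref="myFilter"/>
        <sx:saxFilter ref="myPostFilter"/>
      </sx:transform>
    </sx:serialize>
  </sx:service>

  <sx:saxFilter id="myPreFilter" class="PreFilter"/>
  <sx:xslt id="myFilter">
    <sx:urlSource url="filter.xsl"/>
  </sx:xslt>
  <sx:saxFilter id="myPostFilter" class="PostFilter"/>
</sx:resources>
```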

Sources and Sinks

In our examples so far, XML input is read from standard input and XML output is written to standard output. We can, however, specify sources of input and sinks of output explicitly in the resources script. Below, we specify an input file named "input.xml", and an output file named "output.xml".

Figure 3. SAX pipeline with specified input source and output sink


<sx:resources xmlns:sx="http://www.servingxml.com/core">

  <sx:service id="myPipeline">
    <sx:serialize>
      <sx:xsltSerializer>
        <sx:fileSink file="output.xml"/>
      </sx:xsltSerializer>
      <sx:transform>
        <sx:content ref="myInput"/>
        <sx:content ref="myPreFilter"/>
        <sx:content ref="myFilter"/>     
        <sx:content ref="myPostFilter"/>     
      </sx:transform>
    </sx:serialize>
  </sx:service>
  
  <sx:document id="myInput">
    <sx:fileSource file="input.xml"/>
  </sx:document>

  <sx:saxFilter id="myPreFilter" class="PreFilter"/>
  <sx:xslt id="myFilter">
    <sx:urlSource url="filter.xsl"/>
  </sx:xslt>
  <sx:saxFilter id="myPostFilter" class="PostFilter"/>     
</sx:resources>

The attributes file in sx:fileSource, url in sx:urlSource and file in sx:fileSink can contain parameters. We can, for example, include parameters in the input and output filenames, like this,


  <sx:fileSink file="{$myOutput}.xml"/>
  
  <sx:fileSource file="{$myInput}.xml"/>

and run the pipeline with passed parameters,


servingxml -r resources.xml myPipeline 
    myInput=input myOutput=output

Abstract Elements

ServingXML supports the idea of abstract elements. New elements can be created as specializations of abstract elements and used interchangeably with core ServingXML elements in resources scripts. Want your XML serialized to a file on an FTP server? Use the edt:ftpSink element:


<sx:resources xmlns:sx="http://www.servingxml.com/core"
             xmlns:edt="http://www.servingxml.com/extensions/edtftp">

 <edt:ftpClient id="myFtpClient"
                host="tor3" user="dap" password="spring"/>

 <sx:service id="myPipeline">

   <sx:serialize>
    <sx:xsltSerializer>
      <edt:ftpSink remoteDirectory="incoming" remoteFile="output.xml">
        <edt:ftpClient ref="myFtpClient"/>
      </edt:ftpSink>
    </sx:xsltSerializer> ...

  

Composition

Pipeline bodies may be composed out of other pipeline bodies. In the example below, four common steps in preparing invoices are collected in the sx:transform element named "steps1-4". This pipeline body is used in two other pipeline bodies that are specialized to produce HTML and XSL-FO output.

Figure 4. Composition of pipeline bodies


<sx:resources xmlns:sx="http://www.servingxml.com/core"
              xmlns:fop="http://www.servingxml.com/extensions/fop">
  
  <sx:service id="invoice-html">                         
    <sx:serialize>
      <sx:transform>
        <sx:document><sx:urlSource url="invoice.xml"/></sx:document>
        <sx:transform ref="steps1-4"/>
        <sx:xslt><sx:urlSource url="styles/invoice2html.xsl"/></sx:xslt> 
      </sx:transform>
    </sx:serialize>
  </sx:service>

  <sx:service id="invoice-pdf">                         
    <sx:serialize>
      <fop:foSerializer/>
      <sx:transform>
        <sx:document><sx:urlSource url="invoice.xml"/></sx:document>
        <sx:transform ref="steps1-4"/>
        <sx:xslt><sx:urlSource url="styles/invoice2fo.xsl"/></sx:xslt> 
      </sx:transform>
    </sx:serialize>
  </sx:service>

  <sx:transform id="steps1-4">
    <sx:xslt><sx:urlSource url="styles/step1.xsl"/></sx:xslt> 
    <sx:xslt><sx:urlSource url="styles/step2.xsl"/></sx:xslt> 
    <sx:xslt><sx:urlSource url="styles/step3.xsl"/></sx:xslt> 
    <sx:xslt><sx:urlSource url="styles/step4.xsl"/></sx:xslt> 
  </sx:transform>

</sx:resources>


XSLT URI Resolution

The ServingXML implementation acts as a URI resolver for an XSLT stylesheet in the pipeline. If an XSLT stylesheet uses the document function to reference a URI, an attempt will be made to resolve that URI against content identified by QName. The URI will be resolved if it matches the identifier obtained by concatenating the namespace URI and the local name of content defined in the resources script. If there is no match, URI resolution reverts to the default URI resolution for the transformer.

The ServingXML implementation will recognize query parameters such as ?directory=input in the URI passed to the document() function. These parameters may be referenced in XML content elements.
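
As an illustration of the matching rule, the stylesheet sketched below calls document() with a URI formed from a namespace URI and a local name. It assumes, hypothetically, that the resources script declares a document with id myns:lookup, where the myns prefix is bound to http://mycompany.com/mynames/; neither the id nor the output element name comes from ServingXML itself.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Assumed to be declared in the resources script as
         <sx:document id="myns:lookup"> ... </sx:document>
       with xmlns:myns="http://mycompany.com/mynames/", so that the URI
       below equals namespace URI + local name. -->
  <xsl:variable name="lookup"
                select="document('http://mycompany.com/mynames/lookup')"/>

  <xsl:template match="/">
    <result>
      <!-- Copy the resolved document's root element into the output -->
      <xsl:copy-of select="$lookup/*"/>
    </result>
  </xsl:template>

</xsl:stylesheet>
```

Appending a query string such as ?directory=input to that URI is the mechanism described above for passing parameters; exactly how a content element reads the value (for example as {$directory} in a file attribute) is an assumption and is not shown here.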

Document Subtrees

ServingXML supports filters that extract subtrees and perform serialization or other tasks on those subtrees. For example, suppose we have a file invoices.xml containing multiple invoice elements.


<invoices>
  <invoice id="200302-01" ...
  
  <invoice id="200302-02" ...
</invoices>

By applying the resources script below, we can produce a separate PDF file for each invoice, each filename being identified by the invoice id.

Figure 5. Resources script


<sx:resources xmlns:sx="http://www.servingxml.com/core"
              xmlns:fop="http://www.servingxml.com/extensions/fop"
              xmlns:inv="http://www.telio.be/ns/2002/invoice">
   
  <sx:service id="invoices"> 
    <sx:transform>
      <!-- Here we extract a subtree from the SAX stream -->
      <sx:processSubtree path="/inv:invoices/inv:invoice">
         <!-- Transform invoice subtree to pdf-->
         <sx:serialize>
             <!-- We initialize a parameter with an XPath expression
                  applied to the document subtree -->
            <sx:parameter name="invoice-name" select="@id"/> 
            <fop:foSerializer>
              <sx:fileSink file="output/invoice{$invoice-name}.pdf"/>
            </fop:foSerializer>
            <sx:transform>
              <sx:transform ref="steps1-4"/>
              <sx:xslt><sx:urlSource url="styles/invoice2fo.xsl"/></sx:xslt> 
            </sx:transform>
         </sx:serialize>
      </sx:processSubtree>
    </sx:transform>
  </sx:service>

  <sx:transform id="steps1-4">
    <sx:xslt><sx:urlSource url="styles/step1.xsl"/></sx:xslt> 
    <sx:xslt><sx:urlSource url="styles/step2.xsl"/></sx:xslt> 
    <sx:xslt><sx:urlSource url="styles/step3.xsl"/></sx:xslt> 
    <sx:xslt><sx:urlSource url="styles/step4.xsl"/></sx:xslt> 
  </sx:transform>

</sx:resources>


The sx:processSubtree element has an attribute path that references a SAXPath pattern, to extract subtrees from the stream of SAX events. A SAXPath pattern is an expression that matches on a stack of SAX events as they flow through a SAX filter. The syntax for a SAXPath is a restricted XSLT match pattern, including the parts that make sense for filtering on the SAX startElement event. The match pattern is evaluated against the path of elements leading to the current element, the attributes of the elements, and any parameters in scope.

A SAXPath pattern consists of a series of one or more elements separated by "/" or "//". An absolute SAXPath pattern begins with a "/" or "//", and is matched against the entire path of elements. A relative SAXPath pattern is matched against a portion of the path that ends at the current element. A "//" expands to match any series of elements separating two matched path entries. The wildcard "*" may be used to match against any element. Predicates that a path entry must satisfy may be appended to the entry with square brackets.
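
The sketch below shows how these pattern forms might look in a path attribute. The service id and output file name are hypothetical, and the predicate syntax is assumed to follow XSLT match patterns, as the description above suggests; none of the patterns has been checked against a ServingXML release.

```xml
<sx:resources xmlns:sx="http://www.servingxml.com/core"
              xmlns:inv="http://www.telio.be/ns/2002/invoice">
  <sx:service id="one-invoice">
    <sx:transform>
      <!-- Alternative values for the path attribute, read against the rules above:
             /inv:invoices/inv:invoice    absolute: invoice directly under the root
             //inv:invoice                absolute: invoice at any depth
             inv:invoice                  relative: matched against the end of the path
             /inv:invoices/*              wildcard: any child element of invoices
      -->
      <!-- Predicate in square brackets: only the invoice with this id attribute -->
      <sx:processSubtree path="/inv:invoices/inv:invoice[@id='200302-01']">
        <sx:serialize>
          <sx:xsltSerializer>
            <sx:fileSink file="output/invoice-200302-01.xml"/>
          </sx:xsltSerializer>
          <sx:transform>
            <sx:xslt><sx:urlSource url="filter.xsl"/></sx:xslt>
          </sx:transform>
        </sx:serialize>
      </sx:processSubtree>
    </sx:transform>
  </sx:service>
</sx:resources>
```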

XML Tee

ServingXML supports the notion of an XML tee, to fork a stream of SAX events. Suppose, for example, we wanted to serialize each invoice in the previous example to HTML as well as PDF. One way to do this is to insert an sx:tagTee element in the pipeline, like this:


<sx:resources xmlns:sx="http://www.servingxml.com/core"
              xmlns:fop="http://www.servingxml.com/extensions/fop"
              xmlns:inv="http://www.telio.be/ns/2002/invoice">

  <sx:service id="invoices">
  
    <sx:transform>
      <!-- Here we extract a document subtree from the SAX stream -->
      <sx:processSubtree path="/inv:invoices/inv:invoice">
        <sx:transform>
          <!-- We initialize a parameter with an XPath expression
               applied to the document subtree -->
          <sx:parameter name="invoice-name" select="@id"/>
          <fop:foSerializer>
            <sx:fileSink file="output/invoice{$invoice-name}.pdf"/>
          </fop:foSerializer>
          <sx:transform>
            <sx:transform ref="steps1-4"/>
            <!-- Tee - invoice document subtree to html-->
            <sx:tagTee>
              <sx:xslt documentBase="documents/">
                <sx:urlSource url="styles/invoice2html.xsl"/>
              </sx:xslt>
              <sx:xsltSerializer>
                <sx:fileSink file="output/invoice{$invoice-name}.html"/>
              </sx:xsltSerializer>
            </sx:tagTee>
            <sx:xslt documentBase="documents/">
              <sx:urlSource url="styles/invoice2fo.xsl"/>
            </sx:xslt>
          </sx:transform>
        </sx:transform>
      </sx:processSubtree>
    </sx:transform>
  </sx:service>

  <sx:transform id="html-output">
    <sx:xslt documentBase="documents/">
      <sx:urlSource url="styles/invoice2html.xsl"/>
    </sx:xslt>
    <sx:xsltSerializer>
      <sx:fileSink file="output/invoice{$invoice-name}.html"/>
    </sx:xsltSerializer>
  </sx:transform>
  ...

</sx:resources>

Organizing Resource Scripts

As a resources script gets bigger, it becomes desirable to reorganize it, perhaps splitting off the content and filter elements into separate files, and grouping resource names into distinct namespaces. We may, for example, wish to decompose the resources.xml file as follows.

  • documents.xml - File of documents with names assigned from the namespace http://mycompany.com/mynames/.

    
    <sx:resources xmlns:sx="http://www.servingxml.com/core"
                          xmlns:myns="http://mycompany.com/mynames/">
      <sx:document id="myns:myInput">
        <sx:fileSource file="input.xml"/>
      </sx:document>
    </sx:resources>
    

  • filters.xml - File of filter definitions.

    
    <sx:resources xmlns:sx="http://www.servingxml.com/core">
      <sx:saxFilter id="myPreFilter" class="PreFilter"/>
      <sx:xslt id="myFilter"><sx:urlSource url="filter.xsl"/></sx:xslt>
      <sx:saxFilter id="myPostFilter" class="PostFilter"/>     
    </sx:resources>
    

  • services.xml - File of service definitions.

We now need to import the content and filter definitions in the services.xml file, and we do that using the sx:include instruction.

Figure 6. Resources script with includes


<sx:resources xmlns:sx="http://www.servingxml.com/core"
              xmlns:edt="http://www.servingxml.com/extensions/edtftp"
              xmlns:myns="http://mycompany.com/mynames/">
  <sx:include href="documents.xml"/>
  <sx:include href="filters.xml"/>

  <sx:service id="myPipeline">
    <sx:serialize>
      <sx:xsltSerializer>
        <edt:ftpSink remoteFile="output.xml">
            <edt:ftpClient ref="myFtpClient"/>
        </edt:ftpSink>
      </sx:xsltSerializer>
      
      <sx:transform>
        <sx:content ref="myns:myInput"/>
        <sx:content ref="myPreFilter"/>
        <sx:content ref="myFilter"/>     
        <sx:content ref="myPostFilter"/>     
      </sx:transform>
    </sx:serialize>
  </sx:service>
    
  <edt:ftpClient id="myFtpClient" host="myHost" user="xxx" password="xxx"/>
  
</sx:resources>


Customization

A number of elements support custom implementations by accepting a Java class that implements a defined interface and a list of custom properties. These include sx:saxReader, sx:saxFilter, sx:customSerializer, sx:customRecordFilter, sx:customJdbcConnection, and sx:dynamicContent.

Extendability

New components may be created as extensions and used interchangeably with framework components in resources scripts. The edtftpj extension, for example, provides the edt:ftpSource and edt:ftpSink implementations of the abstract sx:streamSource and sx:streamSink components. Including the extension in the deployment build requires only that an entry be added in the build-extensions.xml file.