Serving XML: Pipeline Language

Daniel Parker


First example
Tasks
Parameters
Conditional execution
Referencing resources by name
Sources and sinks
Composition
Document fragments
Organizing resource scripts
Customization
Extendability

This is the first of three articles describing the Serving XML pipeline language.

Serving XML is a language for building XML pipelines, and an extendible Java framework for defining the elements of the language. This article gives a short introduction to some of the basic ideas. It focuses on pipelines where the input is an XML stream and the output is a serialized XML stream.

First example

Serving XML responds to requests by invoking a service, which in turn reads content and subjects it to a number of transformations, and finally writes output.

Serving XML makes it easy to implement SAX pipelines like Example 5 in Michael Kay's XSLT 2nd Edition Programmer's Reference, Appendix F. This example is a three-stage pipeline, where the first stage is a SAX filter written in Java, the second stage is an XSLT transformation, and the third stage is another SAX filter written in Java. In Serving XML, it may be expressed as follows.

Figure 1. SAX pipeline


<px:resources xmlns:px="http://www.presentingxml.com/PresentingXML">
  <px:service name="myPipeline">
    <px:serialize>
      <px:transform>
        <px:saxFilter class="PreFilter"/>
        <px:style>
          <px:urlSource url="filter.xsl"/>
        </px:style>
        <px:saxFilter class="PostFilter"/>     
      </px:transform>
    </px:serialize>
  </px:service>
</px:resources>

To execute the myPipeline service, you need to do two things.

  • Compile the two Java classes, PreFilter and PostFilter, and copy the .class files into the dir/classes directory.

  • Run the command

    
    java -jar dir/presentingxml.jar -r resources.xml myPipeline 
        < input.xml > output.xml
    
    

Here dir is the directory where the Serving XML software is installed, resources.xml defines the "myPipeline" service, and input.xml and output.xml are your input and output.
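
The source of PreFilter and PostFilter is not shown in this article. As a minimal sketch, assuming that px:saxFilter accepts any class that implements the standard org.xml.sax.XMLFilter interface and has a no-argument constructor (an assumption, not something stated above), a pre-filter might look like the following. The whitespace-stripping behaviour and the PreFilterDemo harness are hypothetical and chosen only for illustration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLFilterImpl;

// Hypothetical PreFilter: drops whitespace-only text events and passes
// every other SAX event through unchanged.
class PreFilter extends XMLFilterImpl {
    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        String text = new String(ch, start, length);
        if (!text.trim().isEmpty()) {
            super.characters(ch, start, length);
        }
    }
}

// Small harness that runs the filter outside Serving XML, using plain JAXP.
class PreFilterDemo {
    static String run(String xml) {
        try {
            SAXParserFactory spf = SAXParserFactory.newInstance();
            spf.setNamespaceAware(true);
            XMLReader parser = spf.newSAXParser().getXMLReader();

            PreFilter filter = new PreFilter();
            filter.setParent(parser); // the filter pulls its events from the parser

            StringWriter out = new StringWriter();
            // An identity transformer serializes the filtered SAX events.
            TransformerFactory.newInstance().newTransformer().transform(
                new SAXSource(filter, new InputSource(new StringReader(xml))),
                new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(run("<doc>\n  <item>a</item>\n</doc>"));
    }
}
```

Extending XMLFilterImpl keeps the class short, because only the events of interest need to be overridden.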

The pipeline body may be thought of as a sequence of processing steps applied to the default input stream. The input stream is parsed into a stream of SAX events that flow outward from the centre, from child element to parent element, and, within a px:transform element, from the top down. Here the events pass through the px:transform element, flowing through the SAX PreFilter, the XSLT stylesheet, and the SAX PostFilter, in that order, and then reach the enclosing px:serialize element, where they are serialized to an output stream.

Transform elements can be nested to any depth, and each can contain an arbitrary number of filters. The flow is always from the innermost element to the outermost element, and within a transform stage, from the top filter to the bottom filter.
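
To make the flow order concrete, the three-stage pipeline can be sketched by hand with the standard JAXP and SAX APIs. This is not how Serving XML is implemented; it is only an illustration of the same top-to-bottom ordering. TrailFilter is a hypothetical stand-in for both PreFilter and PostFilter, and the inline stylesheet is a stand-in for filter.xsl. Each stage appends its label to a "trail" attribute, so the order of the stages is visible in the output.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLFilter;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.XMLFilterImpl;

// Hypothetical filter: appends its label to a "trail" attribute on every element.
class TrailFilter extends XMLFilterImpl {
    private final String label;

    TrailFilter(String label) {
        this.label = label;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts)
            throws SAXException {
        AttributesImpl copy = new AttributesImpl(atts);
        int i = copy.getIndex("trail");
        if (i < 0) {
            copy.addAttribute("", "trail", "trail", "CDATA", label);
        } else {
            copy.setValue(i, copy.getValue(i) + "+" + label);
        }
        super.startElement(uri, localName, qName, copy);
    }
}

class SaxPipelineSketch {
    // Stand-in for filter.xsl: an identity copy that appends "+xslt" to @trail.
    static final String XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:template match='@*|node()'>"
      + "<xsl:copy><xsl:apply-templates select='@*|node()'/></xsl:copy>"
      + "</xsl:template>"
      + "<xsl:template match='@trail'>"
      + "<xsl:attribute name='trail'><xsl:value-of select=\"concat(., '+xslt')\"/></xsl:attribute>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    static String run(String xml) {
        try {
            SAXParserFactory spf = SAXParserFactory.newInstance();
            spf.setNamespaceAware(true);
            XMLReader parser = spf.newSAXParser().getXMLReader();

            // Stage 1: SAX filter reading from the parser.
            XMLFilter pre = new TrailFilter("pre");
            pre.setParent(parser);

            // Stage 2: XSLT stylesheet wrapped as a SAX filter.
            SAXTransformerFactory stf =
                (SAXTransformerFactory) TransformerFactory.newInstance();
            XMLFilter style = stf.newXMLFilter(new StreamSource(new StringReader(XSL)));
            style.setParent(pre);

            // Stage 3: SAX filter reading from the stylesheet stage.
            XMLFilter post = new TrailFilter("post");
            post.setParent(style);

            // Serialize the events leaving the last stage.
            StringWriter out = new StringWriter();
            stf.newTransformer().transform(
                new SAXSource(post, new InputSource(new StringReader(xml))),
                new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(run("<doc><item/></doc>"));
    }
}
```

Each filter names the previous stage as its parent, which is the hand-written counterpart of listing the filters from top to bottom inside px:transform.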

Tasks

In the SAX pipeline example, the service "myPipeline" performs one task, represented by the px:serialize element, which serializes the XML generated by the XML pipeline body into text, and writes it to the standard output. A service, however, may perform multiple tasks, including

  • serializing XML to a file (px:serialize)
  • writing records to a file (px:writeRecords)
  • sending mail (jm:sendMail)
  • starting a Swing application (swing:runApp)
  • running another service (px:runService)

Parameters

The px:parameter element is used to define a parameter as a QName-value pair, for example,


  <px:parameter name="validate">no</px:parameter>

A parameter defined inside an element is visible to all siblings and all their descendants, but not to ancestors. If the parameter has the same QName as a parameter defined in an ancestor, the new value replaces the old one within the scope of the siblings and descendants only; ancestors continue to see the old value. A parameter value in an ancestor can never be changed from below. This avoids side effects.

The application processing the resources script may pass additional parameters to the script. For example, the console app may pass the parameter validate like this:


java -jar dir/presentingxml.jar -r resources.xml myPipeline validate=yes
    < input.xml > output.xml

If you want to define a default value for the parameter, you must do so with a px:defaultValue element as follows.


  <px:parameter name="validate"><px:defaultValue>no</px:defaultValue></px:parameter>

A passed parameter cannot override a parameter defined in a resources script unless the script's value is a default value, enclosed by a px:defaultValue element. More generally, a parameter in an ancestor cannot override a parameter in a descendant unless the descendant's value is a default value.
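
The scoping and default-value rules can be modelled with a chain of scopes. The sketch below is a toy model of the rules as described in this section, not the framework's implementation; the ParamScope class and its method names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model: each element gets a scope whose parent is the enclosing scope.
class ParamScope {
    private final ParamScope parent;
    private final Map<String, String> values = new HashMap<>();

    ParamScope(ParamScope parent) {
        this.parent = parent;
    }

    // Models <px:parameter name="...">value</px:parameter>:
    // always shadows any value inherited from an ancestor.
    void define(String name, String value) {
        values.put(name, value);
    }

    // Models a value wrapped in px:defaultValue:
    // takes effect only if no ancestor (or caller) already supplied a value.
    void defineDefault(String name, String value) {
        if (lookup(name) == null) {
            values.put(name, value);
        }
    }

    // Looks in this scope first, then walks up through the ancestors.
    String lookup(String name) {
        for (ParamScope s = this; s != null; s = s.parent) {
            String v = s.values.get(name);
            if (v != null) {
                return v;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        ParamScope passed = new ParamScope(null);
        passed.define("validate", "yes");            // validate=yes on the command line

        ParamScope withDefault = new ParamScope(passed);
        withDefault.defineDefault("validate", "no"); // script supplies only a default
        System.out.println(withDefault.lookup("validate"));

        ParamScope withFixed = new ParamScope(passed);
        withFixed.define("validate", "no");          // script fixes the value
        System.out.println(withFixed.lookup("validate"));
        System.out.println(passed.lookup("validate")); // ancestor is unaffected
    }
}
```

Because a child scope never writes into its parent, a definition lower in the script cannot leak upward, which is the side-effect guarantee described above.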

Conditional execution

Serving XML supports conditional execution with a px:choose element, which tests XPath boolean expressions against parameters to determine which of several alternative pipeline bodies to execute. Here is an example.


<px:resources xmlns:px="http://www.presentingxml.com/PresentingXML"
              xmlns:msv="http://www.presentingxml.com/PresentingXML/msv">
  <px:service name="myPipeline">
  
    <px:parameter name="validate"><px:defaultValue>yes</px:defaultValue></px:parameter>
    
    <px:serialize>
      <px:choose>
        <px:when test="$validate = 'yes'">
          <px:transform>
            <px:saxFilter class="PreFilter"/>
            <px:style><px:urlSource url="filter.xsl"/></px:style>
            <px:saxFilter class="PostFilter"/>   
            <msv:msvFilter schema="mySchema"/>
          </px:transform>
        </px:when>  
        <px:otherwise>
          <px:transform>
            <px:saxFilter class="PreFilter"/>
            <px:style><px:urlSource url="filter.xsl"/></px:style>
            <px:saxFilter class="PostFilter"/>   
          </px:transform>
        </px:otherwise>  
      </px:choose>
    </px:serialize>

  </px:service>
</px:resources>

If the validate parameter is "yes", the pipeline service will stream the SAX events through the first three filters, and also through the Sun Multi-Schema Validator, which is implemented by the msv:msvFilter component; if it has any other value, such as "no", the validation step is skipped. The px:parameter element at the top of the script gives the validate parameter a default value of "yes", so by default the validation step will be performed. This may be overridden by passing a validate parameter on the command line, like this:


java -jar dir/presentingxml.jar -r resources.xml myPipeline validate=no
    < input.xml > output.xml

Referencing resources by name

The resources defined in a resources script may be given names and referred to by reference. For example, the SAX pipeline example may be rewritten as follows.

Figure 2. SAX pipeline with references


<px:resources xmlns:px="http://www.presentingxml.com/PresentingXML">
  <px:service name="myPipeline">
    <px:serialize>
      <px:transform>
        <px:filter ref="myPreFilter"/>
        <px:filter ref="myFilter"/>     
        <px:filter ref="myPostFilter"/>     
      </px:transform>
    </px:serialize>
  </px:service>
  
  <px:saxFilter name="myPreFilter" class="PreFilter"/>
  <px:style name="myFilter">
    <px:urlSource url="filter.xsl"/>
  </px:style>
  <px:saxFilter name="myPostFilter" class="PostFilter"/>     
</px:resources>

Note that we could have written <px:saxFilter ref="myPreFilter"/>, but instead we wrote <px:filter ref="myPreFilter"/>, substituting the abstract component px:filter for the derived px:saxFilter. Names given to components must be unique up to the abstract component level. For instance, a service and a filter may both be named "myPipeline", but a px:saxFilter and a px:style must be named differently.
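
The uniqueness rule can be pictured as a registry keyed by abstract component and name. The sketch below is a toy model under that reading, not the framework's code; the ResourceRegistry class is hypothetical, and the mapping of px:style to px:filter is inferred from Figure 2, where a px:style is referenced through px:filter.

```java
import java.util.HashMap;
import java.util.Map;

// Toy registry: names must be unique per abstract component, not per element.
class ResourceRegistry {
    // Maps a concrete element to the abstract component it derives from.
    private static final Map<String, String> ABSTRACT_KIND = new HashMap<>();
    static {
        ABSTRACT_KIND.put("px:saxFilter", "px:filter");
        ABSTRACT_KIND.put("px:style", "px:filter");
        ABSTRACT_KIND.put("px:service", "px:service");
    }

    private final Map<String, String> byKey = new HashMap<>();

    // Registers a named component; rejects a second name under the same abstract kind.
    void register(String element, String name) {
        String kind = ABSTRACT_KIND.getOrDefault(element, element);
        String key = kind + "#" + name;
        if (byKey.putIfAbsent(key, element) != null) {
            throw new IllegalStateException(
                "Duplicate name '" + name + "' for abstract component " + kind);
        }
    }

    // Resolves a reference such as <px:filter ref="myPreFilter"/> to its concrete element.
    String resolve(String abstractKind, String name) {
        return byKey.get(abstractKind + "#" + name);
    }

    public static void main(String[] args) {
        ResourceRegistry registry = new ResourceRegistry();
        registry.register("px:service", "myPipeline");
        registry.register("px:saxFilter", "myPipeline"); // allowed: different abstract kind
        registry.register("px:saxFilter", "myPreFilter");
        System.out.println(registry.resolve("px:filter", "myPreFilter"));
        try {
            registry.register("px:style", "myPreFilter"); // rejected: both are filters
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```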

Sources and sinks

In the examples so far, XML input is read from standard input and XML output is written to standard output. We can, however, specify sources of input and sinks of output explicitly in the resources script. Below, we specify an input file named "input.xml" and an output file named "output.xml" on a remote host.

Figure 3. SAX pipeline with specified input source and output sink


<px:resources xmlns:px="http://www.presentingxml.com/PresentingXML"
              xmlns:edt="http://www.presentingxml.com/PresentingXML/edtftp">
  <px:service name="myPipeline">
    <px:serialize>
      <px:xmlEmitter>
        <edt:ftpSink remoteFile="output.xml">
            <edt:ftpClient ref="myFtpClient"/>
        </edt:ftpSink>
      </px:xmlEmitter>
      <px:transform>
        <px:content ref="myInput"/>
        <px:filter ref="myPreFilter"/>
        <px:filter ref="myFilter"/>     
        <px:filter ref="myPostFilter"/>     
      </px:transform>
    </px:serialize>
  </px:service>
    
  <edt:ftpClient name="myFtpClient" host="myHost" user="xxx" password="xxx"/>
  
  <px:document name="myInput">
    <px:fileSource file="input.xml"/>
  </px:document>

  <px:saxFilter name="myPreFilter" class="PreFilter"/>
  <px:style name="myFilter">
    <px:urlSource url="filter.xsl"/>
  </px:style>
  <px:saxFilter name="myPostFilter" class="PostFilter"/>     
</px:resources>

The attributes remoteFile in edt:ftpSink, file in px:fileSource, and url in px:urlSource can contain parameters. We can, for example, include parameters in the input and output filenames, like this,


  <edt:ftpSink remoteFile="{$myOutput}.xml">
      <edt:ftpClient ref="myFtpClient"/>
  </edt:ftpSink>
  
  <px:fileSource file="{$myInput}.xml"/>

and run the pipeline with passed parameters,


java -jar dir/presentingxml.jar -r resources.xml myPipeline 
    myInput=input myOutput=output
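
The substitution of {$name} references in attribute values can be sketched as follows. This is a hedged illustration of the idea only: the AttributeTemplate class is hypothetical, the set of characters allowed in a name is a guess, and throwing on an undefined parameter is a choice made for the sketch, since the framework's behaviour in that case is not described here.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Expands {$name} references in an attribute value from a parameter map.
class AttributeTemplate {
    // Assumed name syntax: a letter or underscore, then word characters, dots or hyphens.
    private static final Pattern REF = Pattern.compile("\\{\\$([A-Za-z_][\\w.-]*)\\}");

    static String expand(String template, Map<String, String> params) {
        Matcher m = REF.matcher(template);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            String value = params.get(m.group(1));
            if (value == null) {
                // Sketch-only policy; the real behaviour is not documented above.
                throw new IllegalArgumentException("Undefined parameter: " + m.group(1));
            }
            // quoteReplacement keeps '$' and '\' in the value from being interpreted.
            m.appendReplacement(sb, Matcher.quoteReplacement(value));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> params = new HashMap<>();
        params.put("myInput", "input");
        params.put("myOutput", "output");
        System.out.println(expand("{$myInput}.xml", params));
        System.out.println(expand("{$myOutput}.xml", params));
    }
}
```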

Composition

Pipeline bodies may be composed out of other pipeline bodies. In the example below, four common steps in preparing invoices are collected in the px:transform element named "steps1-4". This pipeline body is used in two other pipeline bodies that are specialized to produce HTML and XSL-FO output.

Figure 4. Composition of pipeline bodies


<px:resources xmlns:px="http://www.presentingxml.com/PresentingXML"
              xmlns:fop="http://www.presentingxml.com/PresentingXML/fop">
  
  <px:service name="invoice-html">                         
    <px:serialize>
      <px:transform>
        <px:document><px:urlSource url="invoice.xml"/></px:document>
        <px:transform ref="steps1-4"/>
        <px:style><px:urlSource url="styles/invoice2html.xsl"/></px:style> 
      </px:transform>
    </px:serialize>
  </px:service>

  <px:service name="invoice-pdf">                         
    <px:serialize>
      <fop:foEmitter/>
      <px:transform>
        <px:document><px:urlSource url="invoice.xml"/></px:document>
        <px:transform ref="steps1-4"/>
        <px:style><px:urlSource url="styles/invoice2fo.xsl"/></px:style> 
      </px:transform>
    </px:serialize>
  </px:service>

  <px:transform name="steps1-4">
    <px:style><px:urlSource url="styles/step1.xsl"/></px:style> 
    <px:style><px:urlSource url="styles/step2.xsl"/></px:style> 
    <px:style><px:urlSource url="styles/step3.xsl"/></px:style> 
    <px:style><px:urlSource url="styles/step4.xsl"/></px:style> 
  </px:transform>

</px:resources>

Document fragments

Serving XML supports filters that extract document fragments and perform serialization or other tasks on those fragments. For example, suppose we have a file invoices.xml containing multiple invoice elements.


<invoices>
  <invoice id="200302-01" ...
  
  <invoice id="200302-02" ...
</invoices>

By applying the resources script below, we can produce a separate PDF file and a separate HTML file for each invoice, with the invoice id embedded in each filename.

Figure 5. Resources script


<px:resources xmlns:px="http://www.presentingxml.com/PresentingXML"
              xmlns:fop="http://www.presentingxml.com/PresentingXML/fop"
              xmlns:inv="http://www.telio.be/ns/2002/invoice">
   
  <px:service name="invoices"> 
    <px:transform>
      <!-- Here we extract a document fragment from the SAX stream -->
      <px:taskRunnerFilter path="/inv:invoices/inv:invoice">
         <!-- First task - invoice document fragment to pdf-->
         <px:serialize>
             <!-- We initialize a parameter with an XPATH expression
                  applied to the document fragment -->
            <px:parameter name="invoice-name" select="@id"/> 
            <fop:foEmitter>
              <px:fileSink file="output/invoice{$invoice-name}.pdf"/>
            </fop:foEmitter>
            <px:transform>
              <px:transform ref="steps1-4"/>
              <px:style><px:urlSource url="styles/invoice2fo.xsl"/></px:style> 
            </px:transform>
         </px:serialize>
         <!-- Second task - invoice document fragment to html-->
         <px:serialize>
            <px:parameter name="invoice-name" select="@id"/> 
            <px:xmlEmitter>
              <px:fileSink file="output/invoice{$invoice-name}.html"/>
            </px:xmlEmitter>
            <px:transform>
              <px:transform ref="steps1-4"/>
              <px:style><px:urlSource url="styles/invoice2html.xsl"/></px:style> 
            </px:transform>
         </px:serialize>
      </px:taskRunnerFilter>
    </px:transform>
  </px:service>

  <px:transform name="steps1-4">
    <px:style><px:urlSource url="styles/step1.xsl"/></px:style> 
    <px:style><px:urlSource url="styles/step2.xsl"/></px:style> 
    <px:style><px:urlSource url="styles/step3.xsl"/></px:style> 
    <px:style><px:urlSource url="styles/step4.xsl"/></px:style> 
  </px:transform>

</px:resources>
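
The fragment selection step can be illustrated with a plain SAX handler. The sketch below only detects each element matching /inv:invoices/inv:invoice and derives the HTML filename from its id attribute; it does not run the nested tasks, and it is not how px:taskRunnerFilter is implemented. The InvoiceFragmentScanner class is hypothetical, and the sample input in main assumes that the invoice elements are in the namespace bound to the inv prefix in the script.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Tracks the current element path and records one output filename per invoice.
class InvoiceFragmentScanner extends DefaultHandler {
    static final String INV_NS = "http://www.telio.be/ns/2002/invoice";

    private final Deque<String> path = new ArrayDeque<>();
    private final List<String> fileNames = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        path.addLast("{" + uri + "}" + localName);
        // Equivalent of the path /inv:invoices/inv:invoice
        boolean isInvoice = path.size() == 2
            && path.peekFirst().equals("{" + INV_NS + "}invoices")
            && path.peekLast().equals("{" + INV_NS + "}invoice");
        if (isInvoice) {
            // Equivalent of <px:parameter name="invoice-name" select="@id"/>
            String invoiceName = atts.getValue("id");
            fileNames.add("output/invoice" + invoiceName + ".html");
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        path.removeLast();
    }

    // Parses the document and returns the filenames in document order.
    static List<String> scan(String xml) {
        try {
            SAXParserFactory spf = SAXParserFactory.newInstance();
            spf.setNamespaceAware(true);
            InvoiceFragmentScanner scanner = new InvoiceFragmentScanner();
            spf.newSAXParser().parse(new InputSource(new StringReader(xml)), scanner);
            return scanner.fileNames;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<invoices xmlns='" + INV_NS + "'>"
            + "<invoice id='200302-01'/><invoice id='200302-02'/></invoices>";
        System.out.println(scan(xml));
    }
}
```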

Organizing resource scripts

As a resources script gets bigger, it becomes desirable to reorganize it, perhaps splitting off the content and filter elements into separate files, and grouping resource names into distinct namespaces. We may, for example, wish to decompose the resources.xml file as follows.

  • documents.xml - File of documents with names assigned from the namespace http://www.mydomain.com/documents.

    
    <px:resources xmlns:px="http://www.presentingxml.com/PresentingXML"
                          xmlns:mydoc="http://www.mydomain.com/documents">
      <px:document name="mydoc:myInput">
        <px:fileSource file="input.xml"/>
      </px:document>
    </px:resources>
    

  • filters.xml - File of filter definitions.

    
    <px:resources xmlns:px="http://www.presentingxml.com/PresentingXML">
      <px:saxFilter name="myPreFilter" class="PreFilter"/>
      <px:style name="myFilter"><px:urlSource url="filter.xsl"/></px:style>
      <px:saxFilter name="myPostFilter" class="PostFilter"/>     
    </px:resources>
    

  • services.xml - File of service definitions.

We now need to import the content and filter definitions into the services.xml file, and we do that using the px:include instruction.

Figure 6. Resources script with includes


<px:resources xmlns:px="http://www.presentingxml.com/PresentingXML"
              xmlns:edt="http://www.presentingxml.com/PresentingXML/edtftp"
              xmlns:mydoc="http://www.mydomain.com/documents">
  <px:include href="documents.xml"/>
  <px:include href="filters.xml"/>

  <px:service name="myPipeline">
    <px:serialize>
      <px:xmlEmitter>
        <edt:ftpSink remoteFile="output.xml">
            <edt:ftpClient ref="myFtpClient"/>
        </edt:ftpSink>
      </px:xmlEmitter>
      
      <px:transform>
        <px:content ref="mydoc:myInput"/>
        <px:filter ref="myPreFilter"/>
        <px:filter ref="myFilter"/>     
        <px:filter ref="myPostFilter"/>     
      </px:transform>
    </px:serialize>
  </px:service>
    
  <edt:ftpClient name="myFtpClient" host="myHost" user="xxx" password="xxx"/>
  
</px:resources>

Customization

A number of elements support custom implementations by accepting a Java class that implements a defined interface and a list of custom properties. These include px:saxReader, px:saxFilter, px:customEmitter, px:customRecordFilter, px:customJdbcConnection, and px:dynamicContent.

Extendability

New components may be created as extensions and used interchangeably with framework components in resources scripts. The edtftpj extension, for example, provides the edt:ftpSource and edt:ftpSink implementations of the abstract px:streamSource and px:streamSink components. Including the extension in the deployment build requires only that an entry be added in the build-extensions.xml file.