Splitting XML Well with XSLT 2

Paul R. Brown @ 2009-09-30

I recently needed to split a result set from a Solr query into a collection of smaller groups of add requests for POSTing into a different core. There are ways to make the split work with text-processing tools (split and friends), but it's always an open question whether an ad hoc approach will trip over some markup; it's just better to use XML tooling. By no coincidence (based on features missing from XSLT 1), XSLT 2 makes it easy to do the right thing.

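First up is grouping in chunks of 2000 records. Inside group-by, position() is the record's position among the selected doc elements, so integer division on the zero-based position puts exactly 2000 records in every group except possibly the last (rounding position() div 2000 would leave only 999 in the first group):

<xsl:for-each-group select="/response/result/doc"
                    group-by="(position() - 1) idiv 2000">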
...
</xsl:for-each-group>
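
To sanity-check the chunk boundaries before writing any files, a throwaway stylesheet along these lines reports how many records land in each group. It is only a sketch, assuming the same /response/result/doc layout; the key it uses, (position() - 1) idiv 2000, puts records 1 through 2000 in group 0, 2001 through 4000 in group 1, and so on.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Plain-text report: one line per group. -->
  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- position() is evaluated against the selected doc elements,
         so the key is the zero-based chunk index. -->
    <xsl:for-each-group select="/response/result/doc"
                        group-by="(position() - 1) idiv 2000">
      <xsl:value-of select="concat('group ', current-grouping-key(),
                                   ': ', count(current-group()),
                                   ' docs&#10;')"/>
    </xsl:for-each-group>
  </xsl:template>

</xsl:stylesheet>
```

Every line of the report should show 2000 docs except, possibly, the last one.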

Outputting each chunk to a file named for the index of the group is also a one-liner:

<xsl:result-document href="{current-grouping-key()}_out.xml">
  <add>
    <xsl:for-each select="current-group()">
      <doc>
        <xsl:apply-templates />
      </doc>
    </xsl:for-each>
  </add>
</xsl:result-document>
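
Put together, the two pieces might sit in a stylesheet like the following. This is a sketch rather than the exact stylesheet used here: the two field templates are hypothetical and assume the usual Solr layout, where a response carries typed elements such as <str name="id"> (with <arr> wrapping multi-valued fields) and an add request wants <field name="id">. The chunk key is written as (position() - 1) idiv 2000 so that every file except possibly the last holds exactly 2000 records.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="xml" indent="yes"/>

  <!-- Drop whitespace-only text nodes from the source so they are
       not copied into the output by the built-in text template. -->
  <xsl:strip-space elements="*"/>

  <xsl:template match="/">
    <xsl:for-each-group select="/response/result/doc"
                        group-by="(position() - 1) idiv 2000">
      <!-- One output file per group: 0_out.xml, 1_out.xml, ... -->
      <xsl:result-document href="{current-grouping-key()}_out.xml">
        <add>
          <xsl:for-each select="current-group()">
            <doc>
              <xsl:apply-templates/>
            </doc>
          </xsl:for-each>
        </add>
      </xsl:result-document>
    </xsl:for-each-group>
  </xsl:template>

  <!-- Assumption: single-valued fields, e.g. <str name="id">42</str>,
       become <field name="id">42</field>. -->
  <xsl:template match="doc/*[@name]">
    <field name="{@name}">
      <xsl:value-of select="."/>
    </field>
  </xsl:template>

  <!-- Assumption: multi-valued fields arrive as <arr name="..."> and
       become one <field> per value. The explicit priority settles the
       conflict with the template above. -->
  <xsl:template match="doc/arr" priority="1">
    <xsl:for-each select="*">
      <field name="{../@name}">
        <xsl:value-of select="."/>
      </field>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
```

The chunk files are written relative to the base output URI of the transformation, so it is worth running it from (or pointing it at) an empty directory.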

And that's it. The only trick is choosing an XSLT 2 processor, and the superlative Saxon (from Saxonica) is my default choice.
