Element blocks example

Pentaho Data Integration

Version
9.3.x
Part Number
MK-95PDIA003-15

This example parses the XML Input Stream (StAX) Test 2 - Element Blocks.xml file, which has two main sample data blocks: Analyzer Lists and Products. The data blocks are separated by splitting the parent XML path into levels and routing the rows with Switch / Case steps. You could also perform this separation with the string contains option of the Switch / Case step, or with other steps. For more complex processing, use mappings (sub-transformations) for the different data blocks so that each block is clearly represented.
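The routing idea can be sketched outside of PDI in a few lines of Python. This is only an illustration of splitting a parent path into levels and switching on them, not the sample transformation itself. The function names, the block labels, and the choice of which levels to test are assumptions made for this sketch.

```python
def split_path_levels(xml_path):
    """Split a slash-separated XML path into a list of element names."""
    return [part for part in xml_path.split("/") if part]


def route_block(parent_path):
    """Return a hypothetical block label for a row, based on its parent path.

    Mimics a Switch / Case decision on individual path levels:
    the second level selects the main block, and the third level
    narrows the AnalyzerResult branch down to AnalyzerLists.
    """
    levels = split_path_levels(parent_path)
    padded = levels + [None] * 3  # pad so short paths do not raise IndexError
    second, third = padded[1], padded[2]

    if second == "Products":
        return "products"
    if second == "AnalyzerResult" and third == "AnalyzerLists":
        return "analyzer_lists"
    return "default"  # anything else, for example AnalyzerDummyTest


if __name__ == "__main__":
    for path in (
        "/ProductInformation/AnalyzerResult/AnalyzerLists/AnalyzerList",
        "/ProductInformation/Products/Product/MetaData",
        "/ProductInformation/AnalyzerResult/AnalyzerDummyTest",
    ):
        print(path, "->", route_block(path))
```

Comparing whole path levels rather than searching for a substring avoids accidental matches, such as a rule for Product also catching ProductInformation.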

Here is the XML sample with different element blocks:
<?xml version="1.0" encoding="UTF-8"?>
<ProductInformation ExportTime="2010-11-23 23:56:40"
    ExportContext="german" ContextID="german" WorkspaceID="Test" id="1"
    parent="0">
    <AnalyzerResult>
        <AnalyzerLists>
            <AnalyzerList name="items.added">
                <AnalyzerElement ItemID="product?id=123456"
                    ProductID="123456" />
                <AnalyzerElement ItemID="product?id=789"
                    ProductID="789" />
            </AnalyzerList>
            <AnalyzerList name="items.deleted">
                <AnalyzerElement ItemID="product?id=111111"
                    ProductID="111111" />
                <AnalyzerElement ItemID="product?id=222222"
                    ProductID="222222" />
            </AnalyzerList>
            <AnalyzerList name="items.dummy_test">
                <AnalyzerElement ItemID="product?id=test1"
                    ProductID="test1" />
                <AnalyzerElement ItemID="product?id=test2"
                    ProductID="test2" />
            </AnalyzerList>
        </AnalyzerLists>
        <AnalyzerDummyTest>
            <AnalyzerDummyTest name="Dummy not processed" />
        </AnalyzerDummyTest>
    </AnalyzerResult>
    <Products>
        <Product id="123456" name="Product A">
            <MetaData>
                <Value AttributeID="AttrA">false</Value>
                <Value AttributeID="AttrB">true</Value>
                <Value AttributeID="AttrShortName">
                    Product A Short Name
                </Value>
                <Value AttributeID="AttrLongName">
                    Product A Long Name
                </Value>
            </MetaData>
        </Product>
        <Product id="789" name="Product B">
            <MetaData>
                <Value AttributeID="AttrA">true</Value>
                <Value AttributeID="AttrB">false</Value>
                <Value AttributeID="AttrShortName">
                    Product B Short Name
                </Value>
                <Value AttributeID="AttrLongName">
                    Product B Long Name
                </Value>
            </MetaData>
        </Product>
    </Products>
</ProductInformation>

Depending on the selected fields, a preview of the step looks like the image below:
Step preview

The preview shows the original streaming information, elements, and attributes from the XML file, along with other helpful fields such as the element level.
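To make the row-per-event idea concrete, the following Python sketch streams a shortened copy of the sample and emits one row per start or end element. It uses the standard library's iterparse rather than the PDI step, so the field names (type, level, path, parent_path, and so on) are illustrative assumptions, and the root element is numbered as level 0 here, which may differ from the step's own numbering.

```python
import io
import xml.etree.ElementTree as ET

# Shortened copy of the sample file, used only for this sketch.
SAMPLE = b"""<ProductInformation id="1">
  <Products>
    <Product id="123456" name="Product A">
      <MetaData>
        <Value AttributeID="AttrA">false</Value>
      </MetaData>
    </Product>
  </Products>
</ProductInformation>"""


def stream_rows(source):
    """Yield one dict per start/end element event while streaming the XML."""
    stack = []  # names of the currently open elements
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            stack.append(elem.tag)
        path = "/" + "/".join(stack)
        parent_path = "/" + "/".join(stack[:-1]) if len(stack) > 1 else ""
        yield {
            "type": "START_ELEMENT" if event == "start" else "END_ELEMENT",
            "level": len(stack) - 1,  # root element is level 0 in this sketch
            "path": path,
            "parent_path": parent_path,
            "name": elem.tag,
            # Attributes are available at the start event; text only at the end event.
            "attributes": dict(elem.attrib) if event == "start" else {},
            "text": (elem.text or "").strip() if event == "end" else "",
        }
        if event == "end":
            stack.pop()
            elem.clear()  # release the element's content once it is closed


if __name__ == "__main__":
    for row in stream_rows(io.BytesIO(SAMPLE)):
        print(row["type"], row["level"], row["path"], row["attributes"], row["text"])
```

Because each row carries its own path, parent path, and level, later steps can filter or route rows without knowing the whole document structure.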

The transformation looks like the image below:
Example transformation

The result for the Analyzer List block looks like this:
Analyzer lists results

The result for the Products block (split into two separate data streams for the end system) looks like the image below:
Products results
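
As a rough illustration of splitting the Products block into two streams, the Python sketch below produces one list of product rows and one list of attribute value rows. The page does not spell out how the two streams are laid out, so the layout used here (product header rows, and value rows keyed by product ID) is an assumption, as are the function and field names. The embedded XML is a shortened copy of the sample.

```python
import io
import xml.etree.ElementTree as ET

# Shortened copy of the Products block from the sample file.
PRODUCTS_XML = b"""<ProductInformation>
  <Products>
    <Product id="123456" name="Product A">
      <MetaData>
        <Value AttributeID="AttrA">false</Value>
        <Value AttributeID="AttrShortName">
            Product A Short Name
        </Value>
      </MetaData>
    </Product>
    <Product id="789" name="Product B">
      <MetaData>
        <Value AttributeID="AttrA">true</Value>
      </MetaData>
    </Product>
  </Products>
</ProductInformation>"""


def split_products(source):
    """Return (products, values): two hypothetical output streams."""
    products = []  # one row per Product element
    values = []    # one row per Value element, keyed by the product ID
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != "Product":
            continue
        # At the Product end event, all of its child elements are complete.
        products.append({"id": elem.get("id"), "name": elem.get("name")})
        for value in elem.iter("Value"):
            values.append({
                "product_id": elem.get("id"),
                "attribute_id": value.get("AttributeID"),
                "value": (value.text or "").strip(),
            })
        elem.clear()  # free the processed product before reading the next one
    return products, values


if __name__ == "__main__":
    products, values = split_products(io.BytesIO(PRODUCTS_XML))
    for row in products:
        print("product:", row)
    for row in values:
        print("value:", row)
```

Carrying the product ID into every value row keeps the two streams joinable in the target system.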