Goal 

The XML Parser uses XPath to extract fields from a complex XML event. It is generally used in conjunction with the XML Source Reader: the reader provides individual events to the XML Parser, and the XML Parser extracts fields from each event and maps them to the output schema.

A simple example: assume you have an XML event that looks as follows:

<employee>
  <name>
   <first>Joltie</first>
   <last>Root</last>
  </name>
  <address>
   <street> 180, Mars Ave </street>
   <city> Marsville </city>
   <state> Marscity </state>
   <country> M.A.R.S </country>
   <coordinates>
     <lat>89</lat>
     <long>117</long>
   </coordinates>
  </address>
  <dob>
    <day>1</day>
    <month>Jan</month>
    <year>2177</year>
  </dob>
</employee>

The user wants to extract the following fields from the XML event:

  • first
  • last
  • lat
  • long
  • dob year

The user specifies the following XPath expressions to extract those fields:

/employee/name/first
/employee/name/last
/employee/address/coordinates/lat
/employee/address/coordinates/long
/employee/dob/year
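
The extraction above can be sketched in a few lines of Python. This is only an illustration, not the plugin's implementation: the names `XPATH_MAPPING`, `select` and `extract_fields` are hypothetical, and because Python's standard `xml.etree.ElementTree` supports only a subset of XPath (and no absolute paths on an element), the `select` helper handles the leading root step itself. The event is trimmed to the elements that are actually extracted.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the employee event, keeping only the extracted elements.
EVENT = """<employee>
  <name>
    <first>Joltie</first>
    <last>Root</last>
  </name>
  <address>
    <coordinates>
      <lat>89</lat>
      <long>117</long>
    </coordinates>
  </address>
  <dob>
    <year>2177</year>
  </dob>
</employee>"""

# Hypothetical mapping: output field name -> XPath expression.
XPATH_MAPPING = {
    "first": "/employee/name/first",
    "last": "/employee/name/last",
    "lat": "/employee/address/coordinates/lat",
    "long": "/employee/address/coordinates/long",
    "year": "/employee/dob/year",
}


def select(root, xpath):
    """Return the elements matched by a simple absolute path such as /a/b/c."""
    steps = xpath.strip("/").split("/")
    if steps[0] != root.tag:
        return []
    if len(steps) == 1:
        return [root]
    # ElementTree only accepts relative paths, so drop the root step.
    return root.findall("./" + "/".join(steps[1:]))


def extract_fields(xml_text, mapping):
    """Build one output record (field name -> text value) from one XML event."""
    root = ET.fromstring(xml_text)
    record = {}
    for field, xpath in mapping.items():
        matches = select(root, xpath)
        has_text = bool(matches) and matches[0].text is not None
        record[field] = matches[0].text.strip() if has_text else None
    return record


record = extract_fields(EVENT, XPATH_MAPPING)
print(record)
```

All values come back as strings here; converting them to typed output fields is a separate step.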

Checklist

  • User stories documented 
  • User stories reviewed 
  • Design documented 
  • Design reviewed 
  • Feature merged 
  • Examples and guides 
  • Integration tests 
  • Documentation for feature 
  • Short video demonstrating the feature

Use-case

  • User is able to specify the input field that contains the XML event or record.
  • User is able to specify the XML encoding (default is UTF-8).
  • The plugin should ignore comments in the XML.
  • User is able to specify a collection of XPath to output field name mappings.
    • User is able to extract values from attributes (as supported by XPath).
    • User is NOT able to specify XPaths that evaluate to arrays. Doing so should result in a runtime error.
  • User is able to specify the output field types, and the plugin performs the appropriate conversions.
  • User is able to specify what should happen when an error occurs during processing:
    • User can specify that the error record should be ignored.
    • User can specify that processing should stop upon error.
    • User can specify that all error records should be written to a separate dataset.
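
A minimal sketch of how these requirements might fit together, written in Python for illustration only. Everything named here (`CONVERTERS`, `FieldError`, `extract`, `process`, the `on_error` values and the `id` attribute in the sample events) is an assumption made for the example, not the plugin's API. "Array" is interpreted as an XPath that matches more than one node, and only simple absolute paths with an optional trailing `@attribute` step are handled. Python's default `ElementTree` parser already discards XML comments.

```python
import xml.etree.ElementTree as ET

# Hypothetical type names mapped to Python converters.
CONVERTERS = {"string": str, "int": int, "double": float}


class FieldError(Exception):
    """Raised when a field cannot be extracted or converted."""


def extract(root, xpath, type_name):
    """Evaluate a simple absolute path (optionally ending in @attr) and convert it."""
    steps = xpath.strip("/").split("/")
    attribute = steps.pop()[1:] if steps[-1].startswith("@") else None
    if not steps or steps[0] != root.tag:
        return None
    nodes = [root] if len(steps) == 1 else root.findall("./" + "/".join(steps[1:]))
    if len(nodes) > 1:
        # Assumed meaning of "array": the path matched several nodes.
        raise FieldError(f"{xpath} matched {len(nodes)} nodes; arrays are not supported")
    if not nodes:
        return None
    raw = nodes[0].get(attribute) if attribute else nodes[0].text
    if raw is None:
        return None
    try:
        return CONVERTERS[type_name](raw.strip())
    except ValueError as exc:
        raise FieldError(f"cannot convert {raw!r} to {type_name} for {xpath}") from exc


def process(events, mapping, on_error="stop"):
    """Parse events; on_error is 'ignore', 'stop' or 'dataset' (hypothetical names)."""
    records, errors = [], []
    for xml_text in events:
        try:
            root = ET.fromstring(xml_text)  # comments are dropped by default
            records.append(
                {name: extract(root, xp, t) for name, (xp, t) in mapping.items()}
            )
        except (FieldError, ET.ParseError) as exc:
            if on_error == "stop":
                raise
            if on_error == "dataset":
                errors.append({"event": xml_text, "error": str(exc)})
            # on_error == "ignore": silently skip the record
    return records, errors


MAPPING = {
    "id": ("/employee/@id", "int"),
    "first": ("/employee/name/first", "string"),
    "lat": ("/employee/address/coordinates/lat", "double"),
}
GOOD = ('<employee id="7"><!-- note --><name><first>Joltie</first></name>'
        '<address><coordinates><lat>89</lat></coordinates></address></employee>')
BAD_TYPE = '<employee id="x"><name><first>Joltie</first></name></employee>'
ARRAY = '<employee id="8"><name><first>A</first><first>B</first></name></employee>'

records, errors = process([GOOD, BAD_TYPE, ARRAY], MAPPING, on_error="dataset")
print(records)
print(len(errors))
```

With `on_error="dataset"` the two failing events end up in `errors` while the good record is still emitted; with `on_error="stop"` the first failure propagates to the caller.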

 

Design


Example

 

Questions/Clarifications

Clarifications:

  1. For defining the output field names, output field types and XPath values, one of the following approaches can be used:
    1. A common widget with two text boxes and a drop-down, or
    2. A key-value widget that takes the output field name and XPath expression, plus a second output schema widget.
  2. User is able to specify what should happen when an error occurs during processing. Errors could be:
    1. Illegal character
    2. Type conversion error
    3. NULL or EMPTY value for a non-nullable column
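
The three error kinds listed above could be told apart roughly as follows. This is a sketch under assumptions: the function `check_field` and the error labels are hypothetical, and Python's parser reports an illegal character as a generic `ParseError`, so the sketch cannot distinguish it from other well-formedness failures.

```python
import xml.etree.ElementTree as ET


def check_field(xml_text, path, convert, nullable):
    """Return (value, error_kind); error_kind is None when the field is valid."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError:
        # Illegal characters surface here, along with any other malformed XML.
        return None, "illegal_character"
    node = root.find(path)
    raw = node.text.strip() if node is not None and node.text else ""
    if raw == "":
        # Missing or empty value: only an error for non-nullable fields.
        return (None, None) if nullable else (None, "null_or_empty")
    try:
        return convert(raw), None
    except ValueError:
        return None, "type_conversion"


print(check_field("<e><lat>89</lat></e>", "./lat", int, False))
print(check_field("<e><lat>\x01</lat></e>", "./lat", int, False))
print(check_field("<e><lat>north</lat></e>", "./lat", int, False))
print(check_field("<e><lat> </lat></e>", "./lat", int, False))
```

Each error kind can then be routed according to the error-handling option the user selected.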
