XML
Parser is an Active and Connected transformation.
XML
Parser transformation is use to extract XML inside a pipeline. The XML Parser
transformation lets you extract XML data from messaging systems, such as TIBCO
or MQ Series, and from other sources, such as files or databases. The XML
Parser transformation functionality is similar to the XML source functionality,
except it parses the XML in the pipeline.
For
example, you might want to extract XML data from a Flat File (data will be in
XML Format) and pass the data to relational targets.
The XML Parser
transformation reads XML data from a single input port and writes data to one
or more output ports.
When
the Integration Service processes an XML Parser transformation, it reads a row
of XML data, parses the XML, and returns data through output groups. The XML
Parser transformation returns non-XML data in pass-through ports. You can parse
XML messages from sources such as JMS or IBM WebSphere MQ.
The
XML Parser transformation has one input group and one or more output groups.
The input group has one input port, DataInput, which accepts an XML document in
a string.
When
we create an XML Parser transformation, use the XML Wizard to import an XML,
DTD, or XML schema file. For example, you can import the following Employee DTD
file:
<!ELEMENT
EMPLOYEES (EMPLOYEE+)>
<!ELEMENT
EMPID CDATA #REQUIRED>
<!ELEMENT
ENAME (#PCDATA)>
<!ELEMENT
JOB (#PCDATA)>
<!ELEMENT
MGR (#PCDATA)>
<!ELEMENT
HIREDATE (#PCDATA)>
<!ELEMENT
SAL #CDATA #REQUIRED>
<!ELEMENT
COMM (#PCDATA)>
<!ELEMENT
DEPTNO CDATA #REQUIRED>
The
Designer creates a root view, X_1. X_1 is the parent of X_0.
Each
view in the XML Parser transformation has at least one key to establish its
relationship with another view. If we do not designate the keys in the XML
Editor, the Designer creates the primary and foreign keys for each view. The
keys are of datatype bigint. The keys are called generated keys because the
Integration Service creates the key values each time it returns a row from the
XML Parser transformation.
When
the Designer creates a primary or foreign key column, it assigns a column name
with a prefix. In an XML definition, the prefix is XPK_ for a generated
primary key column and XFK_ for a generated foreign key column. A foreign key
always refers to a primary key in another group. A generated foreign key column
always refers to a generated primary key column.
For
example, the group X_0 has the XPK_COLUMN primary key. Each group has the
foreign key column XFK_0.
The
repository stores the key values. You cannot change the values in the
repository, but you can choose to reset or restart the sequence numbers after a
session.
XML
Parser Input Validation
We
can configure the XML Parser transformation to validate XML before parsing it.
The XML Parser transformation validates the XML against a schema. If the XML is
not valid for the schema, a row error occurs. The XML Parser transformation
returns the XML and associated error messages to a separate output group. You
can pass the invalid XML and error message to a target.
The
session runs with a source-based commit. A message in the commit transaction
has an invalid XML payload. To prevent the commit from failing, you can
configure the XML Parser transformation to return the invalid XML to a separate
output group from the valid data. The XML Parser transformation processes the
valid XML messages and completes the transaction.
The
session log contains a message that indicates when Route Invalid Payload
Through Data Flow is enabled. When you set the session tracing level to Normal,
the Integration Service writes a message to the session log that indicates
whether the validation is successful. The log message contains the location of
the schema the XML Parser accessed to validate the XML. When XML streaming is
enabled and the XML is invalid, the Integration Service truncates the XML and
passes it to the Invalid_Payload port. The Integration Service logs the invalid
XML in the session log.
To
configure the XML Parser transformation to validate the XML, enable the Route
Invalid Payload Through Data Flow option on the Midstream XML Parser tab. The
Designer adds the following ports to the XML Parser transformation:
- Invalid_Payload.
Returns invalid XML messages to the pipeline. If the XML payload is valid,
the Invalid_Payload port contains a null value. This port has the same
precision as the DataInput port.
- Error_Status.
Contains the error string or status returned from the XML validation. If
the XML is valid for the current row, Error_Status contains a null value.
This port has the same precision as the DataInput port.
The
following mapping shows an XML Parser transformation that routes invalid XML
messages to an Errors target table:
a) XML Parser transformation. Receives
the XML message data in the DataInput port. When the XML is valid,
the XML Parser transformation returns the employee (X_0) data and passes
it to a target. When the XML is not valid, the XML Parser transformation
returns the XML in the Invalid_Payload port. It returns an error message in the
Error_Status port.
b) XML_Errors target definition. Receives invalid XML
and error messages.
Sample Mapping:
1.
Connect and Open the folder if not already opened.
2.
Select Tools --> Mapping Designer
3.
Select Mappings --> Create
It
will pop-up "Mapping Name". Enter the mapping name of your
choice" and Click on 'OK'. Example: m_employee_parsar
4.
Drag the Source and Target definitions into workspace if they are already exist.
If not click here to know how to create or import Table definitions.
Note:
a)
In General Source definition will have only one port/column, datatype string
with length max 64000.
b)
Here we have 2 target file, EMPLOYEE target for output records and
XML_Parser_Error for Parser Error records.
a)
That will appear you 'Select the transformation type to create:'
b)
Select 'XML Parser' from drop down and 'Enter a new name for this
transformation:' as "EMP_PARSER"
c)
Click 'Create' then it will pop-up 'Import XML Definition'
d)
Give any rootname in Optional XML Rootname: 'EMPLOYEE' as highlighted in red above
e)
Select 'Non-XML Source' as highlighted in red above and
Select Scott.EMP and click on >> as highlighted
in Green
f)
Select Scott.EMP from right side window 'Selected Source' and click on 'Open'
g) Click
on No
h)
XML Wizard (step 1) will pop-up then click 'Next'
i) Select
'Entity Relationships', Click 'Finish' and Click 'Done'
or
b)
Click in the workspace in Mapping Designer then it will pop-up 'Import XML
Definition'
c)
Give any rootname in Optional XML Rootname: 'EMPLOYEE' as highlighted in red above
d) Select
'Non-XML Source' as highlighted in red above and Select
Scott.EMP and click on >> as highlighted in Green
e)
Select Scott.EMP from right side window 'Selected Source' and click on 'Open'
f)
Click on No
g) XML
Wizard (step 1) will pop-up then click 'Next'
h)
Select 'Entity Relationships' and Click 'Finish'
i)
Select ' EMP' XML Parser transformation in workspace and Right Click -->
Edit.
j)
In Transformation tab --> Click on 'Rename' highlighted above which will
pop-up 'Rename Transformation'. Enter the Transformation Name:
"EMP_PARSER"
k) Click
on 'OK'
l)
Click on 'Apply' and 'OK'.
6)
Select 'EMP_PARSER' xml parser transformation and click on edit -->
Midstream XML Parser tab. From Properties option select 'Restart and Route
invalid payload through data flow' options as below.
Click
on 'Apply' and 'OK'
7)
Drag 'Field1' from ' SQ_x_employee' Source Qualifier to ' EMP_PARSER' XML Parser
Transformation.
8)
Drag required port from 1st group of from XML Parser transformation to Target
Employee (file) and 'ErrorOutputGroup' to XML_Parser_Error target.
9)
Select 'Mapping' from Menu --> Validate.
10)
Select 'Repository' from Menu --> Save.
very good thank you.
ReplyDeletethank you very much
ReplyDeleteHi,
ReplyDeleteMay i knw how to add new field(input and output fields apart from default) to xml parser trans.?
my requirment is xml data is coming from Oracle table. here there is one field xml_text having xml content and app_id having application id related to xml content. now i want to validate the xml content trhu parsert trans then if it is invalid i want to separate those xml content along with app_id value.
in the normal procedure i can track only xml content but how could i also link app_id field from xml parser to next trans/target.
Much appriciate on your quick help.
Did you get ur answer already?
DeleteHi, I also have the same issue Like Abhi. Can someone suggest the soluntion.??
DeleteCan some one help me in get the solution for abhi's Query
ReplyDeleteHey Swaroop,
ReplyDeleteCan you explain you requirement clearly with some example, I can help you.
Thanks,
Gowtham
Very very clear explanation Gowtham
ReplyDeleteThank you for a clear explanation
ReplyDeleteI have the .xml file as source then sam ecan be done through XML Source qaualifier right?
ReplyDeleteThank you.Well it was nice post and very helpful information on Informatica Online Training Hyderbad
ReplyDelete