Mapping Between XML Nodes and Data Packet Fields
XML provides a text-based way to store or describe structured data. Datasets provide another way to store and describe structured data. Therefore, to convert an XML document into a dataset, you must identify the correspondences between the nodes in an XML document and the fields in a dataset.
Consider, for example, an XML document that represents a set of email messages. It might look like the following (containing a single message):
    <?xml version="1.0" standalone="yes" ?>
    <email>
      <head>
        <from>
          <name>Dave Boss</name>
          <address>[email protected]</address>
        </from>
        <to>
          <name>Joe Engineer</name>
          <address>[email protected]</address>
        </to>
        <cc>
          <name>Robin Smith</name>
          <address>[email protected]</address>
        </cc>
        <cc>
          <name>Leonard Devon</name>
          <address>[email protected]</address>
        </cc>
      </head>
      <body>
        <subject>XML components</subject>
        <content>
          Joe,
          Attached is the specification for the XML component support in Delphi.
          This looks like a good solution to our business-to-business application!
          Also attached, please find the project schedule. Do you think it's reasonable?
          Dave.
        </content>
        <attachment attachfile="XMLSpec.txt"/>
        <attachment attachfile="Schedule.txt"/>
      </body>
    </email>
One natural mapping between this document and a dataset would map each email message to a single record. The record would have fields for the sender's name and address. Because an email message can have multiple recipients, the recipient element (<to>) would map to a nested dataset. Similarly, the cc list maps to a nested dataset. The subject line would map to a string field, while the message itself (<content>) would probably be a memo field. The names of attachment files would map to a nested dataset because one message can have several attachments. Thus, the email above would map to a dataset something like the following:
SenderName | SenderAddress | To | CC | Subject | Content | Attach |
---|---|---|---|---|---|---|
Dave Boss | [email protected] | (DataSet) | (DataSet) | XML components | (MEMO) | (DataSet) |
where the nested dataset in the "To" field is
Name | Address |
---|---|
Joe Engineer | [email protected] |
the nested dataset in the "CC" field is
Name | Address |
---|---|
Robin Smith | [email protected] |
Leonard Devon | [email protected] |
and the nested dataset in the "Attach" field is
Attachfile |
---|
XMLSpec.txt |
Schedule.txt |
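As a rough illustration of this mapping, the following Python sketch uses the standard xml.etree.ElementTree module (not the Delphi XML components) to turn an abbreviated copy of the sample message into one record, with lists of rows standing in for the nested datasets. The function names email_to_record and people are hypothetical, and the example.com addresses are placeholders chosen for the sketch.

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of the sample message. The example.com addresses are
# placeholders for this sketch, not values from the original document.
SAMPLE = """<email>
  <head>
    <from><name>Dave Boss</name><address>dave@example.com</address></from>
    <to><name>Joe Engineer</name><address>joe@example.com</address></to>
    <cc><name>Robin Smith</name><address>robin@example.com</address></cc>
    <cc><name>Leonard Devon</name><address>leonard@example.com</address></cc>
  </head>
  <body>
    <subject>XML components</subject>
    <content>Joe, attached is the specification. Dave.</content>
    <attachment attachfile="XMLSpec.txt"/>
    <attachment attachfile="Schedule.txt"/>
  </body>
</email>"""


def people(parent, tag):
    """Map each repeated <tag> child to one row of a nested dataset."""
    return [
        {"Name": node.findtext("name"), "Address": node.findtext("address")}
        for node in parent.findall(tag)
    ]


def email_to_record(email):
    """Map one <email> element to a single record (a dict of fields)."""
    head = email.find("head")
    body = email.find("body")
    return {
        "SenderName": head.findtext("from/name"),
        "SenderAddress": head.findtext("from/address"),
        "To": people(head, "to"),        # nested dataset
        "CC": people(head, "cc"),        # nested dataset
        "Subject": body.findtext("subject"),
        "Content": body.findtext("content").strip(),  # memo-style text
        "Attach": [                      # nested dataset built from attributes
            {"Attachfile": node.get("attachfile")}
            for node in body.findall("attachment")
        ],
    }


record = email_to_record(ET.fromstring(SAMPLE))
```

Here record["CC"] holds two rows and record["Attach"] holds one row per <attachment> element, mirroring the nested datasets shown in the tables above.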
Defining such a mapping involves identifying those nodes of the XML document that can be repeated and mapping them to nested datasets. Tagged elements that have values and appear only once (such as <content>...</content>) map to fields whose datatype reflects the type of data that can appear as the value. Attributes of a tag (such as the attachfile attribute of the <attachment> tag) also map to fields.
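The classification described here can be approximated with a short Python sketch, again using the standard xml.etree.ElementTree module. The classify function and its path notation are hypothetical. It inspects a single sample document, so it only detects elements that actually repeat in that sample; an element such as <to>, which can be repeated but occurs once, would be missed, and a schema or DTD would be the more reliable source.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def classify(element, path="", out=None):
    """Label each path as 'nested dataset' or 'field' (heuristic sketch)."""
    out = {} if out is None else out
    # Count how often each tag occurs among this element's direct children.
    counts = Counter(child.tag for child in element)
    for child in element:
        child_path = f"{path}/{child.tag}"
        if counts[child.tag] > 1:
            # Repeated under the same parent: candidate nested dataset.
            out[child_path] = "nested dataset"
        elif len(child) == 0 and (child.text or "").strip():
            # Single leaf element that carries a value: candidate field.
            out.setdefault(child_path, "field")
        for attr in child.attrib:
            # Attributes map to fields as well.
            out.setdefault(f"{child_path}/@{attr}", "field")
        classify(child, child_path, out)
    return out


DOC = """<email>
  <head>
    <from><name>Dave Boss</name></from>
    <cc><name>Robin Smith</name></cc>
    <cc><name>Leonard Devon</name></cc>
  </head>
  <body>
    <subject>XML components</subject>
    <attachment attachfile="XMLSpec.txt"/>
    <attachment attachfile="Schedule.txt"/>
  </body>
</email>"""

kinds = classify(ET.fromstring(DOC))
for node_path, kind in kinds.items():
    print(node_path, "->", kind)
```

Container elements that neither repeat nor carry a value of their own get no entry in the result.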
Note that not all tags in the XML document appear in the corresponding dataset. For example, the <head>...</head> element has no corresponding field in the resulting dataset. Typically, only elements that have values, elements that can be repeated, and attributes of a tag map to the fields (including nested dataset fields) of a dataset. The exception to this rule is when a parent node in the XML document maps to a field whose value is built up from the values of its child nodes. For example, an XML document might contain a set of tags such as
    <FullName>
      <Title> Mr. </Title>
      <FirstName> John </FirstName>
      <LastName> Smith </LastName>
    </FullName>
which could map to a single dataset field with the value
Mr. John Smith
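One way such a composite value could be assembled is sketched below in Python with the standard xml.etree.ElementTree module. The composite_value function is hypothetical, and joining the trimmed child values with single spaces is an assumption about how the parts are combined.

```python
import xml.etree.ElementTree as ET

FULL_NAME = (
    "<FullName> <Title> Mr. </Title> <FirstName> John </FirstName> "
    "<LastName> Smith </LastName> </FullName>"
)


def composite_value(parent):
    """Build one field value from the trimmed text of the child nodes."""
    parts = (child.text.strip() for child in parent if child.text)
    # Skip children whose text is only whitespace, then join with spaces.
    return " ".join(part for part in parts if part)


value = composite_value(ET.fromstring(FULL_NAME))
print(value)
```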