Tuesday, July 5, 2016

How do you get at the details of the XML content in R?

In a 2006 post on getting at the innards of an XML document, I used Microsoft.XMLDOM as an ActiveX object.

The Document is the root of an XML document, and its constituent parts are Element, Node, Attribute and Text. In the present post I am using the same XML document that I probed in my previous posts.


In a recent post I showed how to parse an XML document using R (see here as well). In this post I describe how to get at the 'innards' of an XML document using the function xmlSApply().

With these three statements in R you can parse the XML document:
> library(XML)
> fileName = "C:\\Users\\Jayaram\\Desktop\\SQLServer2016D\\R Server\\Mystudents.xml"
> xmlFile <- xmlTreeParse(fileName)

The xmlFile object now holds the parsed document.

The root node (r) of the XML document is obtained using xmlRoot():
-------------
> r <- xmlRoot(xmlFile)
-----------
Now you can look at the details of the elements as shown here using xmlName, xmlValue, xmlAttrs and xmlChildren:
---------
> xmlSApply(r[[2]], xmlName)
         name   legacySkill
       "name" "legacySkill"
--------------------
> xmlSApply(r[[1]], xmlValue)
           name     legacySkill
  "Linda Jones" "Access, VB5.0"
------------
> xmlSApply(r[[1]], xmlAttrs)
$name
NULL

$legacySkill
NULL
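
The NULL values above simply mean that the name and legacySkill elements carry no attributes. As a rough sketch of what xmlAttrs returns when attributes are present, the snippet below parses a small inline XML string. The students/student element names and the id attribute are assumptions made for illustration only; they are not taken from Mystudents.xml.

```r
library(XML)

# Hypothetical XML: the <students>/<student> names and the id attribute
# are assumed for illustration and do not come from Mystudents.xml.
xmlText <- paste0(
  "<students>",
  "<student id=\"1\">",
  "<name>Linda Jones</name>",
  "<legacySkill>Access, VB5.0</legacySkill>",
  "</student>",
  "</students>"
)

# asText = TRUE tells xmlTreeParse that the input is XML text, not a file name
doc  <- xmlTreeParse(xmlText, asText = TRUE)
root <- xmlRoot(doc)

# Attributes of the first child, returned as a named character vector
studentAttrs <- xmlAttrs(root[[1]])
print(studentAttrs)

# A single attribute can also be read by name
studentId <- xmlGetAttr(root[[1]], "id")
print(studentId)

# The grandchildren have no attributes, so each entry is NULL
childAttrs <- xmlSApply(root[[1]], xmlAttrs)
print(childAttrs)
```

Here xmlAttrs is called once directly on the element that has an attribute and once through xmlSApply on its children, which mirrors the call shown above.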
---------------
> xmlSApply(r[[1]], xmlChildren)
$name.text
Linda Jones

$legacySkill.text
Access, VB5.0
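
The calls above look at one child of the root at a time. A minimal sketch of applying the same idea to every child of the root, and collecting the values into a data frame, could look like the following. The inline XML is a hypothetical stand-in for Mystudents.xml: the students/student element names and the second record are invented for illustration.

```r
library(XML)

# Hypothetical stand-in for Mystudents.xml. The element names <students>
# and <student> and the second record are assumptions for illustration.
xmlText <- paste0(
  "<students>",
  "<student><name>Linda Jones</name>",
  "<legacySkill>Access, VB5.0</legacySkill></student>",
  "<student><name>Example Student</name>",
  "<legacySkill>Example Skill</legacySkill></student>",
  "</students>"
)

doc  <- xmlTreeParse(xmlText, asText = TRUE)
root <- xmlRoot(doc)

# Number of child records under the root
recordCount <- xmlSize(root)

# For each record, pull the text of one named child element.
# unname() drops the repeated "student" names that xmlSApply attaches.
studentNames  <- unname(xmlSApply(root, function(rec) xmlValue(rec[["name"]])))
studentSkills <- unname(xmlSApply(root, function(rec) xmlValue(rec[["legacySkill"]])))

# Collect the values: one row per record, one column per field
students <- data.frame(
  name = studentNames,
  legacySkill = studentSkills,
  stringsAsFactors = FALSE
)
print(students)
```

This assumes every record contains both a name and a legacySkill element; a record missing either one would need extra handling.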

>
