Friday, July 1, 2016

How to parse an XML document in R?

This post shows how to parse an XML document in R. I often use the following simple XML file in my posts and examples. A few precautions are listed at the end of this post.


[Image: MyStudentsXMLDoc, the Mystudents.xml file]

If your XML document is not on the local server, you can place it there by copying the XML file to the local web server root (inetpub/wwwroot).

The parsing step uses the function xmlTreeParse() from the XML package. The program first reads the document with the readLines() function, as shown in the following. Type these statements after launching RStudio or RGui.


[Image: MyStudnetsXMLDoc_2, the R statements that read and parse Mystudents.xml]
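The statements themselves appear only as an image, so here is a minimal, self-contained sketch of the same two steps (readLines(), then xmlTreeParse()), assuming the XML package is installed (install.packages("XML")). Because it cannot rely on http://localhost, it writes a hypothetical copy of Mystudents.xml to a temporary file. The root element wclass matches the parsed output below, but the student, name and skills tags are assumptions. readLines() returns one string per line, so the lines are joined and passed to xmlTreeParse() with asText = TRUE.

```r
library(XML)  # provides xmlTreeParse(); install.packages("XML") if missing

# Hypothetical stand-in for Mystudents.xml. Only the root <wclass> is known;
# the <student>, <name> and <skills> tag names are assumptions.
xml_lines <- c(
  '<?xml version="1.0" encoding="utf-8"?>',
  '<wclass>',
  '  <student><name>Linda Jones</name><skills>Access, VB5.0</skills></student>',
  '  <student><name>Adam Davidson</name><skills>Cobol, Mainframe</skills></student>',
  '  <student><name>Charles Boyer</name><skills>HTML, Photoshop</skills></student>',
  '  <student><name>Charles Amos</name><skills>Cobol, Mainframe</skills></student>',
  '</wclass>'
)

# writeLines() ends every line with a newline, so the file has a final EOL.
xml_file <- tempfile(fileext = ".xml")
writeLines(xml_lines, xml_file)

# With the file on the local web server you would read the URL instead:
# doc_text <- readLines("http://localhost/Mystudents.xml")
doc_text <- readLines(xml_file)

# Join the lines into one string and parse it as XML text.
doc <- xmlTreeParse(paste(doc_text, collapse = "\n"), asText = TRUE)
print(doc)
```

Running it should print a parsed tree comparable to the response below; xmlRoot(doc) then gives the wclass node itself.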

R immediately prints the following response:

$doc
$file
[1] ""

$version
[1

$children
$children$wclass

 

  Linda Jones
  Access, VB5.0
 

 
  Adam Davidson
  Cobol, Mainframe
 

 
  Charles Boyer
  HTML, Photoshop
 

 
  Charles Amos
  Cobol, Mainframe
 


attr(,"class")
[1] "XMLDocumentContent"

$dtd
$external
NULL

$internal
NULL

attr(,"class")
[1] "DTDList"
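The printed tree mainly confirms that parsing worked. To pull the values out of it, the following sketch walks the children of the root node with xmlRoot(), xmlSApply() and xmlValue() and collects them into a data frame. It makes the same assumptions as before: the XML package is installed and the student, name and skills tag names are hypothetical.

```r
library(XML)

# Hypothetical stand-in for Mystudents.xml (tag names below <wclass> assumed).
xml_text <- paste(c(
  '<wclass>',
  '  <student><name>Linda Jones</name><skills>Access, VB5.0</skills></student>',
  '  <student><name>Adam Davidson</name><skills>Cobol, Mainframe</skills></student>',
  '  <student><name>Charles Boyer</name><skills>HTML, Photoshop</skills></student>',
  '  <student><name>Charles Amos</name><skills>Cobol, Mainframe</skills></student>',
  '</wclass>'
), collapse = "\n")

doc  <- xmlTreeParse(xml_text, asText = TRUE)
root <- xmlRoot(doc)      # the <wclass> node

# node[["name"]] selects the first child element with that tag name;
# xmlValue() returns its text content.
student_names  <- xmlSApply(root, function(s) xmlValue(s[["name"]]))
student_skills <- xmlSApply(root, function(s) xmlValue(s[["skills"]]))

students <- data.frame(name = unname(student_names),
                       skills = unname(student_skills),
                       stringsAsFactors = FALSE)
print(students)
```

Selecting child elements by name, rather than calling xmlValue() on a whole student node, keeps the name and the skills in separate columns instead of one concatenated string.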


1. R is case sensitive, so pay attention to how you type function names such as readLines() and xmlTreeParse().
2. The Mystudents.xml file should end with a final newline (carriage return). Otherwise readLines() gives the warning:
 incomplete final line found on 'http://localhost/Mystudents.xml'
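Both precautions can be checked directly in base R. This sketch writes a small file without a final newline to a temporary path (a stand-in for Mystudents.xml), catches the warning that readLines() raises, and shows two ways around it: passing warn = FALSE, or appending the missing newline to the file. It also confirms that readLines exists while readlines does not.

```r
# R is case sensitive: readLines() exists, readlines() does not.
print(exists("readLines"))   # TRUE
print(exists("readlines"))   # FALSE

# Stand-in file whose last line has no trailing newline (cat() adds none).
xml_file <- tempfile(fileext = ".xml")
cat("<wclass>\n</wclass>", file = xml_file)

# readLines() still returns both lines, but warns about the missing final EOL.
got_warning <- FALSE
doc_lines <- withCallingHandlers(
  readLines(xml_file),
  warning = function(w) {
    if (grepl("incomplete final line", conditionMessage(w))) {
      got_warning <<- TRUE
    }
    message("readLines() warned: ", conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)

# Fix 1: switch the check off for this call.
quiet_lines <- readLines(xml_file, warn = FALSE)

# Fix 2: add the final newline to the file; readLines() then reads it silently.
cat("\n", file = xml_file, append = TRUE)
fixed_lines <- readLines(xml_file)
```

Adding the newline to the file is the cleaner fix, since warn = FALSE would also hide the message for files that are genuinely truncated.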

