Data Science Training in Mumbai: Reading XML Files in R Programming Using Visual Studio

XML is an officially approved W3C format for sharing both data and its structure as plain text, whether on the open Internet (the World Wide Web), on intranets, or elsewhere.

XML stands for Extensible Markup Language. Like HTML, it contains markup tags.

Unlike HTML, however, where the markup tags describe the structure of the page, in XML the markup tags describe the meaning of the data contained in the file.


The design goals of XML emphasize simplicity, generality, and usability across the Internet.

XML is a textual data format with strong support via Unicode for different human languages.

Although the design of XML focuses on documents, the language is widely used to represent arbitrary data structures, such as those used in web services.

To create and run our R program in Visual Studio, we must first install some tools.

Open Visual Studio on your PC and click Tools > Get Tools and Features….


In the feature installation window, select and install the following:

For Visual Studio 2017: the workload containing the Tools for Data Science.

If the workload above does not install properly, you can install R Tools for Visual Studio separately from the following link. This step is optional.

https://docs.microsoft.com/en-us/visualstudio/rtvs/installing-r-tools-for-visual-studio

For better support, you can also download and install R itself from the following link.

For Windows Users:

https://ftp.iitm.ac.in/cran/

Then install the XML package by typing the following command in the R Interactive window of Visual Studio (or in any R console):

install.packages("XML")
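Running install.packages() every time the script starts is unnecessary. The sketch below installs XML only when requireNamespace() reports that it is missing, then attaches it. The repos URL is an assumption (the CRAN cloud mirror); any CRAN mirror, such as the one linked above, should work as well.

```r
# Install the XML package only if it is missing, then attach it.
# Assumption: the cloud.r-project.org CRAN mirror; any CRAN mirror would do.
if (!requireNamespace("XML", quietly = TRUE)) {
  install.packages("XML", repos = "https://cloud.r-project.org")
}
library("XML")

# Quick sanity check: print the installed version of the package.
print(packageVersion("XML"))
```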

For Linux Users:

On Red Hat based distributions, you can install R quickly with the yum command (root privileges are required):
$ sudo yum install R

On Ubuntu or other Debian-based distributions, use apt-get instead:
$ sudo apt-get install r-base

Now let us create our R application in Visual Studio.
Go to File > New > Project and add a new R project.


This is our project structure:


On the top left is a new R file (script.R), where we can edit source code with all of the Visual Studio IDE's editing features.
On the bottom left is the R Interactive window, where you can develop and test code interactively.

You can also use the R Interactive window directly, without opening a new project.

Before we start coding, create a file with the .xml extension in a location of your choice and put the following dummy data in it.

<RECORDS>
   <EMPLOYEE>
      <ID>1</ID>
      <NAME>VV</NAME>
      <SALARY>623.3</SALARY>
      <STARTDATE>1/1/2012</STARTDATE>
      <DEPT>IT</DEPT>
   </EMPLOYEE>
   <EMPLOYEE>
      <ID>2</ID>
      <NAME>ww</NAME>
      <SALARY>515.2</SALARY>
      <STARTDATE>9/23/2013</STARTDATE>
      <DEPT>Operations</DEPT>
   </EMPLOYEE>
   <EMPLOYEE>
      <ID>3</ID>
      <NAME>MM</NAME>
      <SALARY>611</SALARY>
      <STARTDATE>11/15/2014</STARTDATE>
      <DEPT>IT</DEPT>
   </EMPLOYEE>
   <EMPLOYEE>
      <ID>4</ID>
      <NAME>Ryan</NAME>
      <SALARY>729</SALARY>
      <STARTDATE>5/11/2014</STARTDATE>
      <DEPT>HR</DEPT>
   </EMPLOYEE>
   <EMPLOYEE>
      <ID>5</ID>
      <NAME>Gary</NAME>
      <SALARY>843.25</SALARY>
      <STARTDATE>3/27/2015</STARTDATE>
      <DEPT>Finance</DEPT>
   </EMPLOYEE>
   <EMPLOYEE>
      <ID>6</ID>
      <NAME>Nina</NAME>
      <SALARY>578</SALARY>
      <STARTDATE>5/21/2013</STARTDATE>
      <DEPT>IT</DEPT>
   </EMPLOYEE>
   <EMPLOYEE>
      <ID>7</ID>
      <NAME>Simon</NAME>
      <SALARY>632.8</SALARY>
      <STARTDATE>7/30/2013</STARTDATE>
      <DEPT>Operations</DEPT>
   </EMPLOYEE>
   <EMPLOYEE>
      <ID>8</ID>
      <NAME>Guru</NAME>
      <SALARY>722.5</SALARY>
      <STARTDATE>6/17/2014</STARTDATE>
      <DEPT>Finance</DEPT>
   </EMPLOYEE>
</RECORDS>
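If you prefer not to create the file by hand, the sketch below writes the same dummy data from R using only base functions. It assumes the file goes to R's temporary directory (tempdir()) rather than D:\Data Science\RProgramme, and the names employees, records and xml_path are purely illustrative.

```r
# The same eight employees as the file above (illustrative variable names).
employees <- data.frame(
  ID        = 1:8,
  NAME      = c("VV", "ww", "MM", "Ryan", "Gary", "Nina", "Simon", "Guru"),
  SALARY    = c("623.3", "515.2", "611", "729", "843.25", "578", "632.8", "722.5"),
  STARTDATE = c("1/1/2012", "9/23/2013", "11/15/2014", "5/11/2014",
                "3/27/2015", "5/21/2013", "7/30/2013", "6/17/2014"),
  DEPT      = c("IT", "Operations", "IT", "HR", "Finance", "IT",
                "Operations", "Finance"),
  stringsAsFactors = FALSE
)

# One <EMPLOYEE> block per row; sprintf() is vectorised over the columns.
records <- sprintf(
  paste0("  <EMPLOYEE>\n",
         "    <ID>%d</ID>\n",
         "    <NAME>%s</NAME>\n",
         "    <SALARY>%s</SALARY>\n",
         "    <STARTDATE>%s</STARTDATE>\n",
         "    <DEPT>%s</DEPT>\n",
         "  </EMPLOYEE>"),
  employees$ID, employees$NAME, employees$SALARY,
  employees$STARTDATE, employees$DEPT
)

# Assumption: write to R's temp directory instead of D:\Data Science\RProgramme.
xml_path <- file.path(tempdir(), "1.xml")
writeLines(c("<RECORDS>", records, "</RECORDS>"), xml_path)

# Show the first record to confirm the layout.
cat(readLines(xml_path, n = 8), sep = "\n")
```

You can then pass xml_path to xmlParse() in place of the hard-coded D: drive path.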

Now let’s look at the code

# Load the package required to read XML files.
library("XML")

# Also load the other required package.
library("methods")

# Pass the input file name to the function; 1.xml is the XML file we created with the data shown above.
result <- xmlParse(file = "D:\\Data Science\\RProgramme\\1.xml")

# Print the result.
print(result)

Output:

<?xml version="1.0"?>
<RECORDS>
  <EMPLOYEE>
    <ID>1</ID>
    <NAME>VV</NAME>
    <SALARY>623.3</SALARY>
    <STARTDATE>1/1/2012</STARTDATE>
    <DEPT>IT</DEPT>
  </EMPLOYEE>
  <EMPLOYEE>
    <ID>2</ID>
    <NAME>ww</NAME>
    <SALARY>515.2</SALARY>
    <STARTDATE>9/23/2013</STARTDATE>
    <DEPT>Operations</DEPT>
  </EMPLOYEE>
  <EMPLOYEE>
    <ID>3</ID>
    <NAME>MM</NAME>
    <SALARY>611</SALARY>
    <STARTDATE>11/15/2014</STARTDATE>
    <DEPT>IT</DEPT>
  </EMPLOYEE>
  <EMPLOYEE>
    <ID>4</ID>
    <NAME>Ryan</NAME>
    <SALARY>729</SALARY>
    <STARTDATE>5/11/2014</STARTDATE>
    <DEPT>HR</DEPT>
  </EMPLOYEE>
  <EMPLOYEE>
    <ID>5</ID>
    <NAME>Gary</NAME>
    <SALARY>843.25</SALARY>
    <STARTDATE>3/27/2015</STARTDATE>
    <DEPT>Finance</DEPT>
  </EMPLOYEE>
  <EMPLOYEE>
    <ID>6</ID>
    <NAME>Nina</NAME>
    <SALARY>578</SALARY>
    <STARTDATE>5/21/2013</STARTDATE>
    <DEPT>IT</DEPT>
  </EMPLOYEE>
  <EMPLOYEE>
    <ID>7</ID>
    <NAME>Simon</NAME>
    <SALARY>632.8</SALARY>
    <STARTDATE>7/30/2013</STARTDATE>
    <DEPT>Operations</DEPT>
  </EMPLOYEE>
  <EMPLOYEE>
    <ID>8</ID>
    <NAME>Guru</NAME>
    <SALARY>722.5</SALARY>
    <STARTDATE>6/17/2014</STARTDATE>
    <DEPT>Finance</DEPT>
  </EMPLOYEE>
</RECORDS>
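The path passed to xmlParse() above is specific to the author's D: drive. The minimal sketch below checks that the file exists before parsing, so a wrong path produces a clear message. It assumes a hypothetical helper called parse_xml_file() and a tiny two-record sample written to tempdir(), so that the block runs on its own.

```r
library("XML")

# Hypothetical helper: parse an XML file, failing early with a readable message.
parse_xml_file <- function(path) {
  if (!file.exists(path)) {
    stop("XML file not found: ", path)
  }
  xmlParse(file = path)
}

# Assumption: a tiny sample file in R's temp directory stands in for
# D:\Data Science\RProgramme\1.xml so the example is self-contained.
sample_path <- file.path(tempdir(), "sample.xml")
writeLines(c(
  "<RECORDS>",
  "  <EMPLOYEE><ID>1</ID><NAME>VV</NAME></EMPLOYEE>",
  "  <EMPLOYEE><ID>2</ID><NAME>ww</NAME></EMPLOYEE>",
  "</RECORDS>"
), sample_path)

doc <- parse_xml_file(sample_path)
print(class(doc))
```

On Windows, R also accepts forward slashes in paths, so "D:/Data Science/RProgramme/1.xml" works just as well as the doubled backslashes.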

# Get the root node, then print its first child node (the first EMPLOYEE record).
rootnode <- xmlRoot(result)
print(rootnode[1])

O/P:

The output is a node list holding the first EMPLOYEE record (ID 1, NAME VV).

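Single brackets (rootnode[1]) return a node list, while double brackets (rootnode[[1]]) return the node itself. To pick out specific values, an XPath query is often simpler than indexing. The sketch below uses xmlName(), xpathSApply() and xmlValue() from the XML package on a small inline sample; the sample is an assumption that keeps the block self-contained.

```r
library("XML")

# Assumption: a two-record inline sample instead of 1.xml.
doc <- xmlParse(
  "<RECORDS>
     <EMPLOYEE><ID>1</ID><NAME>VV</NAME><DEPT>IT</DEPT></EMPLOYEE>
     <EMPLOYEE><ID>2</ID><NAME>ww</NAME><DEPT>Operations</DEPT></EMPLOYEE>
   </RECORDS>",
  asText = TRUE
)
rootnode <- xmlRoot(doc)

# Double brackets return the node itself; xmlName() reports its tag name.
first_employee <- rootnode[[1]]
print(xmlName(first_employee))

# XPath query: the text of every NAME element, in document order.
names_vec <- xpathSApply(doc, "//EMPLOYEE/NAME", xmlValue)
print(names_vec)

# XPath with a predicate: the DEPT of the employee whose ID is 2.
print(xpathSApply(doc, "//EMPLOYEE[ID='2']/DEPT", xmlValue))
```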
# Find number of nodes in the root.
rootsize <- xmlSize(rootnode)

# Print the result.
print(rootsize)

O/P:

8
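xmlSize() counts only the direct children of the node it is given. As a cross-check, the sketch below counts the EMPLOYEE elements with an XPath query through getNodeSet() and also applies xmlSize() to a single record. It uses a three-record inline sample rather than 1.xml. Whitespace between the elements is not counted, because xmlParse() drops blank text nodes by default (its ignoreBlanks argument).

```r
library("XML")

# Assumption: a three-record inline sample rather than the full 1.xml.
doc <- xmlParse(
  "<RECORDS>
     <EMPLOYEE><ID>1</ID><DEPT>IT</DEPT></EMPLOYEE>
     <EMPLOYEE><ID>2</ID><DEPT>Operations</DEPT></EMPLOYEE>
     <EMPLOYEE><ID>3</ID><DEPT>IT</DEPT></EMPLOYEE>
   </RECORDS>",
  asText = TRUE
)
rootnode <- xmlRoot(doc)

# xmlSize() counts the direct children of the root node.
n_children <- xmlSize(rootnode)

# The same count via XPath: all EMPLOYEE elements directly under RECORDS.
n_employees <- length(getNodeSet(doc, "/RECORDS/EMPLOYEE"))

# xmlSize() on a single EMPLOYEE node counts its fields (ID and DEPT here).
n_fields <- xmlSize(rootnode[[1]])

print(c(children = n_children, employees = n_employees, fields = n_fields))
```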



# Convert the XML data to a data frame.
xmldataframe <- xmlToDataFrame("D:\\Data Science\\RProgramme\\1.xml")
print(xmldataframe)

O/P:

A data frame with 8 rows (one per EMPLOYEE) and the columns ID, NAME, SALARY, STARTDATE and DEPT.

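xmlToDataFrame() returns every column as text, so SALARY and STARTDATE need converting before any calculations. The minimal sketch below assumes the dates are month/day/year, as in the dummy data, and uses a three-record inline sample in place of 1.xml.

```r
library("XML")

# Assumption: a three-record inline sample in the same shape as 1.xml.
doc <- xmlParse(
  "<RECORDS>
     <EMPLOYEE><ID>1</ID><NAME>VV</NAME><SALARY>623.3</SALARY>
       <STARTDATE>1/1/2012</STARTDATE><DEPT>IT</DEPT></EMPLOYEE>
     <EMPLOYEE><ID>2</ID><NAME>ww</NAME><SALARY>515.2</SALARY>
       <STARTDATE>9/23/2013</STARTDATE><DEPT>Operations</DEPT></EMPLOYEE>
     <EMPLOYEE><ID>3</ID><NAME>MM</NAME><SALARY>611</SALARY>
       <STARTDATE>11/15/2014</STARTDATE><DEPT>IT</DEPT></EMPLOYEE>
   </RECORDS>",
  asText = TRUE
)

# xmlToDataFrame() accepts a parsed document as well as a file name;
# the columns come back as text.
df <- xmlToDataFrame(doc, stringsAsFactors = FALSE)

# Convert the text columns to proper types.
df$ID        <- as.integer(df$ID)
df$SALARY    <- as.numeric(df$SALARY)
df$STARTDATE <- as.Date(df$STARTDATE, format = "%m/%d/%Y")  # month/day/year

str(df)

# Example analysis: average salary per department.
print(aggregate(SALARY ~ DEPT, data = df, FUN = mean))
```

The same conversion lines can be applied to xmldataframe read from the full file.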
