A decision tree is a decision support tool: a graphical model of the possible solutions to a decision, where each solution depends on certain conditions.
It is called a decision tree because it starts from a single box (the root), which then branches off into a number of solutions, just like the branches of a tree.
Decision trees are mainly used in operations research and more specifically in decision analysis.
They help us identify and decide on the strategy most likely to reach a goal, and they are also a popular tool in machine learning.
In the following demonstration we will use R to build a decision tree.
Initial Environment Setup
Before we write any code, let's first set up a proper environment for R programming.
To install R, go to the following link, then download and install the software.
For Windows Users:
https://ftp.iitm.ac.in/cran/
Then type and run the following code in RGui:
install.packages("ggplot2")
install.packages("party")
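If you run the setup more than once, it can be handy to check which packages are already present before installing. The following is a minimal sketch using base R only; the helper name missing_packages is made up for this illustration and is not part of R or of any package.

```r
# Hypothetical helper: return the packages from `pkgs` that are not installed.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# The two packages installed in this article.
to_install <- missing_packages(c("ggplot2", "party"))
print(to_install)

# Uncomment to install whatever is listed above.
# if (length(to_install) > 0) install.packages(to_install)
```

The install line is left commented out so that running the sketch only reports what is missing and changes nothing on your machine.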
For Linux Users:
If you are on Linux, you can install R from the command line with your distribution's package manager. On distributions that use yum, the command looks like this (the package may need an extra repository such as EPEL to be enabled):
$ sudo yum install R
For Ubuntu Linux or other Debian-based systems, a more direct method is:
$ sudo apt-get install r-base
The R package "party" provides the function ctree(), which is used to create and analyse decision trees.
The basic syntax for creating a decision tree is as follows:
ctree(formula, data)
Here, formula describes the response variable and the predictor variables, while data is the name of the data set used.
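To make the formula argument a little more concrete, here is a small sketch in base R. It uses the column names of the readingSkills data set that appears later in this article; the variable names tree_formula, response and predictors are only illustrative.

```r
# In a formula, the response goes left of `~` and the predictors go right.
tree_formula <- nativeSpeaker ~ age + shoeSize + score

# all.vars() lists every variable named in the formula, response first.
vars <- all.vars(tree_formula)
response <- vars[1]
predictors <- vars[-1]

print(response)
print(predictors)
```

Read the formula as "predict nativeSpeaker from age, shoeSize and score".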
We will use the data set named readingSkills, which ships with the party package, to create a decision tree.
For each person, the data records the variables "age", "shoeSize" and "score" (the reading score), along with whether the person is a native speaker of the language ("nativeSpeaker").
Let’s start coding in RGui
# Load the party package. It will automatically load other dependent packages.
library(party)
# Print some records from data set readingSkills.
print(head(readingSkills))
The command above reads the readingSkills data frame and prints its first six records on the console.
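Before building the tree, it can also help to look at the overall shape of the data and not just the first rows. This is a minimal sketch, assuming the party package is installed; it uses only base R functions on the readingSkills data frame.

```r
library(party)  # provides the readingSkills data set

# Number of rows and columns in the data set.
print(dim(readingSkills))

# Column types: nativeSpeaker is a factor, the other columns are numeric.
str(readingSkills)

# Count of native and non-native speakers.
print(table(readingSkills$nativeSpeaker))
```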
The code below uses ctree() to generate a decision tree:
# Load the party package. It will automatically load other dependent packages
library(party)
# Create the input data frame.
input.dat <- readingSkills[c(1:105),]
print(input.dat)
# Create the tree.
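output.tree <- ctree(
  nativeSpeaker ~ age + shoeSize + score,
  data = input.dat)
# Open a PNG device and give the chart file a name.
# This must happen before plotting, otherwise the file stays empty.
png(file = "decision_tree.png")
# Plot the tree into the file.
plot(output.tree)
# Close the device to save the file.
dev.off()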
Executing the code produces the decision tree chart.
From this chart we can conclude that anyone whose score is less than 38.306 and whose age is more than 6 is classified as not being a native speaker of the language.
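The tree above was trained on the first 105 rows only, so the remaining rows can be used to check how well its rules hold up on data it has not seen. The sketch below is one way to do that, assuming the party package is installed. Holding out rows 106 onward is an assumption made for this illustration, and the names train.dat, test.dat and tree are illustrative as well.

```r
library(party)

# Same training rows as in the article; the remaining rows are held out.
train.dat <- readingSkills[1:105, ]
test.dat  <- readingSkills[106:nrow(readingSkills), ]

tree <- ctree(nativeSpeaker ~ age + shoeSize + score, data = train.dat)

# Text form of the tree: shows each split variable and its threshold.
print(tree)

# Predicted class for each held-out row.
predicted <- predict(tree, newdata = test.dat)

# Confusion matrix: rows are predictions, columns are the true labels.
confusion <- table(predicted = predicted, actual = test.dat$nativeSpeaker)
print(confusion)

# Share of held-out rows classified correctly.
accuracy <- mean(predicted == test.dat$nativeSpeaker)
print(accuracy)
```

The printed tree gives the same thresholds as the chart in text form, which makes them easier to read off exactly.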