
Lab DA1

Machine Learning Lab


Name: Suprit Darshan Shrestha
Reg.no:19BCE2584
Data cleaning

Columns with close or similar values were combined to decrease the number of columns; a minimal sketch of this kind of merge is given below.
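The report does not state which columns of Data1.csv were combined or by what rule, so the snippet below is only a hypothetical illustration: the column names and the row-wise mean used here are placeholders, not the exact cleaning step that was applied.

import pandas as pd

# Toy data with made-up values; 'nuc_mean' and 'nuc_avg' stand in for two
# near-duplicate columns of the real dataset.
df = pd.DataFrame({
    'nuc_mean': [0.41, 0.55, 0.47],
    'nuc_avg':  [0.40, 0.57, 0.46],
    'Diagnosis': [0, 1, 0],
})

# Collapse the two similar columns into their row-wise mean and drop the
# originals, reducing the number of columns by one.
df['nuc'] = df[['nuc_mean', 'nuc_avg']].mean(axis=1)
df = df.drop(columns=['nuc_mean', 'nuc_avg'])
print(df)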

Find-S algorithm:
Code:
import csv

# Load the training instances from Data1.csv, skipping the header row.
a = []
with open('Data1.csv', 'r') as csvfile:
    next(csvfile)
    for row in csv.reader(csvfile):
        a.append(row)
print(a)

print("\nThe total number of training instances are : ", len(a))

num_attribute = len(a[0]) - 1

print("\nThe initial hypothesis is : ")
hypothesis = ['0'] * num_attribute
print(hypothesis)

# Find-S: generalise the hypothesis only on positive instances
# (here an instance is positive when its Diagnosis label is "0").
for i in range(0, len(a)):
    if a[i][num_attribute] == "0":
        print("\nInstance ", i + 1, "is", a[i], " and is Positive Instance")
        for j in range(0, num_attribute):
            if hypothesis[j] == '0' or hypothesis[j] == a[i][j]:
                hypothesis[j] = a[i][j]
            else:
                hypothesis[j] = '?'
        print("The hypothesis for the training instance", i + 1, " is: ", hypothesis, "\n")

    if a[i][num_attribute] == "1":
        print("\nInstance ", i + 1, "is", a[i], " and is Negative Instance Hence Ignored")
        print("The hypothesis for the training instance", i + 1, " is: ", hypothesis, "\n")

print("\nThe Maximally specific hypothesis for the training instance is ", hypothesis)
Output:
Decision Tree:
Code:
import pandas as pd

# Load the cleaned dataset; the last column ("Diagnosis") is the target.
dataset = pd.read_csv('Data1.csv')
X = dataset.iloc[:, :-1].values
y = dataset.iloc[:, -1].values

# Hold out 25% of the data for testing.
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# Standardise the features (the scaler is fitted on the training split only).
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)

# Entropy-based (ID3-style) decision tree.
from sklearn.tree import DecisionTreeClassifier
classifier = DecisionTreeClassifier(criterion='entropy', random_state=0)
classifier.fit(X_train, y_train)
# Fitted classifier (remaining parameters are scikit-learn defaults):
# DecisionTreeClassifier(class_weight=None, criterion='entropy', max_depth=None,
#                        max_features=None, max_leaf_nodes=None,
#                        min_samples_leaf=1, min_samples_split=2,
#                        min_weight_fraction_leaf=0.0,
#                        random_state=0, splitter='best')

# Evaluate on the held-out test split.
y_pred = classifier.predict(X_test)
from sklearn import metrics  # Import scikit-learn metrics module for accuracy calculation
print("Accuracy:", metrics.accuracy_score(y_test, y_pred))
Output:

ID3:
Code:
import pandas as pd
import math
import numpy as np

# Load the dataset; every column except "Diagnosis" is a candidate attribute.
data = pd.read_csv('Data1.csv')
features = [feat for feat in data]
features.remove("Diagnosis")

class Node:
    """A node of the ID3 tree: either an internal test on an attribute or a leaf."""
    def __init__(self):
        self.children = []
        self.value = ""
        self.isLeaf = False
        self.pred = ""

def entropy(examples):
    """Entropy of the Diagnosis label over the given examples."""
    pos = 0.0
    neg = 0.0
    for _, row in examples.iterrows():
        if row["Diagnosis"] == 0:
            pos += 1
        else:
            neg += 1
    if pos == 0.0 or neg == 0.0:
        return 0.0
    else:
        p = pos / (pos + neg)
        n = neg / (pos + neg)
        return -(p * math.log(p, 2) + n * math.log(n, 2))

def info_gain(examples, attr):
    """Information gain of splitting the examples on attribute attr."""
    uniq = np.unique(examples[attr])
    print("\n", uniq)
    gain = entropy(examples)
    for u in uniq:
        subdata = examples[examples[attr] == u]
        sub_e = entropy(subdata)
        gain -= (float(len(subdata)) / float(len(examples))) * sub_e
    return gain

def ID3(examples, attrs):
    """Recursively build the tree by splitting on the attribute with maximum gain."""
    root = Node()

    max_gain = 0
    max_feat = ""
    for feature in attrs:
        gain = info_gain(examples, feature)
        if gain > max_gain:
            max_gain = gain
            max_feat = feature
    root.value = max_feat

    uniq = np.unique(examples[max_feat])
    for u in uniq:
        subdata = examples[examples[max_feat] == u]
        if entropy(subdata) == 0.0:
            # Pure subset: create a leaf holding the predicted Diagnosis value.
            newNode = Node()
            newNode.isLeaf = True
            newNode.value = u
            newNode.pred = np.unique(subdata["Diagnosis"])
            root.children.append(newNode)
        else:
            # Impure subset: recurse on the remaining attributes.
            dummyNode = Node()
            dummyNode.value = u
            new_attrs = attrs.copy()
            new_attrs.remove(max_feat)
            child = ID3(subdata, new_attrs)
            dummyNode.children.append(child)
            root.children.append(dummyNode)
    return root

def printTree(root: Node, depth=0):
    """Print the tree with one tab of indentation per level."""
    for i in range(depth):
        print("\t", end="")
    print(root.value, end="")
    if root.isLeaf:
        print(" -> ", root.pred)
    print()
    for child in root.children:
        printTree(child, depth + 1)

root = ID3(data, features)
printTree(root)
Output:
Clean visualization:
Code:
# Feature names for the plot. The target column "Diagnosis" is left out here,
# assuming it is the last column of Data1.csv and therefore not part of X_train.
feature_cols = ['area', 'peri', 'ECC', 'solidity', 'orient_wbc', 'nuc,AVG', 'entropy_cyt', 'AXIS']

from sklearn.tree import DecisionTreeClassifier, export_graphviz
import six
import sys
sys.modules['sklearn.externals.six'] = six  # compatibility shim for code written against older scikit-learn
from sklearn.externals.six import StringIO
from IPython.display import Image
import pydotplus

# Fit a decision tree on the same training split prepared above.
dot_data = StringIO()
clf = DecisionTreeClassifier()
clf = clf.fit(X_train, y_train)

# Export the tree to DOT format and render it as a PNG with pydotplus.
export_graphviz(clf, out_file=dot_data,
                filled=True, rounded=True,
                special_characters=True, feature_names=feature_cols, class_names=['0', '1'])
graph = pydotplus.graph_from_dot_data(dot_data.getvalue())
graph.write_png('diabetes.png')
Image(graph.create_png())
Output:
Conclusion:
Decision trees give a simple yet effective answer to discrimination (classification) problems, and they are one of the few data-analysis approaches whose results can be presented quickly to a non-specialist audience without getting lost in hard mathematical formulas, as the rule listing sketched below illustrates.
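As a small illustration of that explainability, scikit-learn can dump a fitted tree as plain if/else rules with export_text. The sketch below assumes the classifier trained in the Decision Tree section and the feature_cols list from the visualization section are still defined in the same session, with the number of feature names matching the number of columns of X_train.

from sklearn.tree import export_text

# Print the fitted tree as indented if/else rules readable without any maths.
# Assumes `classifier` and `feature_cols` from the sections above are in scope.
rules = export_text(classifier, feature_names=feature_cols)
print(rules)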
