A=c("f","t","t","f","t","f","f","f","t","f")
B=c("t","t","t","t","t","f","f","f","t","t")
class=c("+","+","+","-","+","-","-","-","-","-")
df=data.frame(A,B,class)
df
   A B class
1  f t     +
2  t t     +
3  t t     +
4  f t     -
5  t t     +
6  f f     -
7  f f     -
8  f f     -
9  t t     -
10 f t     -
I partitioned the data on attribute A and on attribute B and counted the classes in each branch, as follows:
         {A}
       [T , F]
       /     \
 -------     -------
 [3+,1-]     [1+,5-]

         {B}
       [T , F]
       /     \
 -------     -------
 [4+,3-]     [0+,3-]
Based on the formula above, I calculated the entropy with the following R code.
1. For attribute A
t=table(A,class)
t
   class
A   - +
  f 5 1
  t 1 3
prop1=t[1,]/sum(t[1,])
prop1
- +
0.8333333 0.1666667
prop2=t[2,]/sum(t[2,])
prop2
- +
0.25 0.75
H1=-(prop1[1]*log2(prop1[1]))-(prop1[2]*log2(prop1[2]))
H1
0.6500224
H2=-(prop2[1]*log2(prop2[1]))-(prop2[2]*log2(prop2[2]))
H2
0.8112781
entropy=(table(A)[1]/length(A))*H1 +(table(A)[2]/length(A))*H2
entropy
0.7145247
2. For attribute B
t=table(B,class)
t
   class
B   - +
  f 3 0
  t 3 4
prop1=t[1,]/sum(t[1,])
prop1
- +
1 0
prop2=t[2,]/sum(t[2,])
prop2
- +
0.4285714 0.5714286
H1=-(prop1[1]*log2(prop1[1]))-(prop1[2]*log2(prop1[2]))
H1
NaN
H2=-(prop2[1]*log2(prop2[1]))-(prop2[2]*log2(prop2[2]))
H2
0.9852281
entropy=(table(B)[1]/length(B))*H1 +(table(B)[2]/length(B))*H2
entropy
NaN
When I calculate the entropy for attribute B, the result is NaN. This happens because one of the proportions is zero: log2(0) evaluates to -Inf in R, and 0 * -Inf is NaN. In this situation, how can I fix the calculation, or how can I make H1 give me zero instead of NaN?
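
For reference, here is a minimal sketch of one common way around this, assuming the usual convention that a term with zero proportion contributes 0 to the entropy (0 * log2(0) is taken as 0). The helper names entropy_bits and split_entropy are hypothetical and only used for illustration; the idea is simply to drop the zero proportions before taking the logarithm.

```r
# Hypothetical helpers, not from any package.
# Entropy (in bits) of a vector of class counts, treating 0 * log2(0) as 0.
entropy_bits <- function(counts) {
  p <- counts / sum(counts)
  p <- p[p > 0]          # drop zero proportions so log2(0) is never evaluated
  -sum(p * log2(p))
}

# Weighted entropy after splitting on an attribute.
split_entropy <- function(attr, cls) {
  tab <- table(attr, cls)
  weights <- rowSums(tab) / sum(tab)       # share of rows in each branch
  branch_h <- apply(tab, 1, entropy_bits)  # entropy of each branch
  sum(weights * branch_h)
}

A <- c("f","t","t","f","t","f","f","f","t","f")
B <- c("t","t","t","t","t","f","f","f","t","t")
class <- c("+","+","+","-","+","-","-","-","-","-")

split_entropy(A, class)  # about 0.7145, same as the manual result for A
split_entropy(B, class)  # about 0.6897 instead of NaN
```

With this convention the pure branch B = f ([0+,3-]) has entropy 0, so only the B = t branch contributes: 7/10 * 0.9852281, which is roughly 0.6897.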