Entropy and Information Gain in Decision Trees

  • Information gain measures the change in entropy after a dataset is split on an attribute.
  • It tells us how much information a feature provides about the class.
  • Based on the information gain values, we decide which node to split on as we build the decision tree.
  • A decision tree algorithm always tries to maximize information gain, so the node/attribute with the highest information gain is split first. It can be calculated with the formula below (a short Python sketch after the definitions illustrates both calculations):
    • Information Gain = Entropy(S) − [(Weighted Avg) × Entropy(each subset produced by the split)]
    • Entropy: Entropy is a metric that measures the impurity, i.e. the randomness, in a set of samples. For a two-class ("yes"/"no") problem it can be calculated as:
    • Entropy(S) = −P(yes) log₂ P(yes) − P(no) log₂ P(no)

    Where,

    • S = the set of training samples
    • P(yes) = the proportion of "yes" samples in S
    • P(no) = the proportion of "no" samples in S
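
To make the two formulas concrete, here is a minimal Python sketch that computes Entropy(S) and the information gain of a split. The toy "Outlook"/"play" dataset and all names in it are hypothetical, chosen only for illustration:

```python
from collections import Counter
import math

def entropy(labels):
    """Entropy of a list of class labels: -sum over classes of p * log2(p)."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def information_gain(rows, labels, attribute_index):
    """Entropy(S) minus the weighted average entropy of the subsets
    produced by splitting on the given attribute."""
    total = len(labels)
    gain = entropy(labels)
    # Group the class labels by the attribute's value.
    subsets = {}
    for row, label in zip(rows, labels):
        subsets.setdefault(row[attribute_index], []).append(label)
    # Subtract each subset's entropy, weighted by its share of the samples.
    for subset_labels in subsets.values():
        gain -= (len(subset_labels) / total) * entropy(subset_labels)
    return gain

# Hypothetical toy dataset: one attribute ("Outlook") and a yes/no class label.
outlook = [["Sunny"], ["Sunny"], ["Overcast"], ["Rain"], ["Rain"], ["Overcast"]]
play    = ["no", "no", "yes", "yes", "no", "yes"]

print("Entropy(S)       =", round(entropy(play), 3))
print("Gain(S, Outlook) =", round(information_gain(outlook, play, 0), 3))
```

With three "yes" and three "no" labels, Entropy(S) = 1.0, and splitting on Outlook yields a gain of about 0.667, so this split removes roughly two thirds of the impurity.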
