Wednesday, July 8, 2020

Decision Tree

Decision Tree: How To Create A Perfect Decision Tree?
A Decision Tree has many analogies in real life and, as it turns out, has influenced a wide area of Machine Learning, covering both Classification and Regression. In decision analysis, a decision tree can be used to visually and explicitly represent decisions and decision making. The outline of what I'll be covering in this blog is as follows:

What is a Decision Tree?
Advantages and Disadvantages of a Decision Tree
Creating a Decision Tree

What is a Decision Tree?

A decision tree is a map of the possible outcomes of a series of related choices. It allows an individual or organization to weigh possible actions against one another based on their costs, probabilities, and benefits.
As the name goes, it uses a tree-like model of decisions. They can be used either to drive informal discussion or to map out an algorithm that predicts the best choice mathematically.

A decision tree typically starts with a single node, which branches into possible outcomes. Each of those outcomes leads to additional nodes, which branch off into other possibilities. This gives it a tree-like shape. There are three different types of nodes: chance nodes, decision nodes, and end nodes. A chance node, represented by a circle, shows the probabilities of certain results. A decision node, represented by a square, shows a decision to be made, and an end node shows the final outcome of a decision path.

Advantages and Disadvantages of Decision Trees

Advantages:
Decision trees generate understandable rules.
Decision trees perform classification without requiring much computation.
Decision trees are capable of handling both continuous and categorical variables.
Decision trees provide a clear indication of which fields are most important for prediction or classification.

Disadvantages:
Decision trees are less appropriate for estimation tasks where the goal is to predict the value of a continuous attribute.
Decision trees are prone to errors in classification problems with many classes and a relatively small number of training examples.
Decision trees can be computationally expensive to train. At each node, each candidate splitting field must be sorted before its best split can be found. In some algorithms, combinations of fields are used and a search must be made for optimal combining weights. Pruning algorithms can also be expensive, since many candidate sub-trees must be formed and compared.

Creating a Decision Tree

Let us consider a scenario where a new planet is discovered by a group of astronomers. Now the question is whether it could be the next Earth. The answer to this question will revolutionize the way people live.
Well, literally! There are n number of deciding factors which need to be thoroughly researched to make an intelligent decision. These factors can be whether water is present on the planet, what the temperature is, whether the surface is prone to continuous storms, whether flora and fauna survive the climate, and so on.

Let us create a decision tree to find out whether we have discovered a new habitat:
The habitable temperature falls into the range 0 to 100 degrees Celsius (273 to 373 K).
Is water present or not?
Do flora and fauna flourish?
Does the planet have a stormy surface?

Thus, we have a decision tree with us.

Classification Rules: Classification rules are the cases in which all the scenarios are taken into consideration and a class variable is assigned to each.

Class Variable: Each leaf node is assigned a class variable. A class variable is the final output which leads to our decision.

Let us derive the classification rules from the decision tree created:
1. If Temperature is not between 273 and 373 K: Survival Difficult
2. If Temperature is between 273 and 373 K, and water is not present: Survival Difficult
3. If Temperature is between 273 and 373 K, water is present, and flora and fauna are not present: Survival Difficult
4. If Temperature is between 273 and 373 K, water is present, flora and fauna are present, and a stormy surface is not present: Survival Probable
5. If Temperature is between 273 and 373 K, water is present, flora and fauna are present, and a stormy surface is present: Survival Difficult

A decision tree has the following constituents:
Root Node: The factor of temperature is considered as the root in this case.
Internal Node: A node with one incoming edge and two or more outgoing edges.
Leaf Node: A terminal node with no outgoing edge.

As the decision tree is now constructed, starting from the root node we check the test condition and assign the control to one of the outgoing edges, where the condition is tested again and the next node is assigned.
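Walking the tree from root to leaf like this can be sketched as a chain of condition checks. Below is a minimal sketch; the function and parameter names are my own choices for illustration, with the temperature test expressed in Kelvin as in the classification rules above.

```python
def classify_survival(temp_k, has_water, has_flora_fauna, is_stormy):
    """Apply the decision tree's classification rules in root-to-leaf order.

    temp_k is the temperature in Kelvin; the other arguments are booleans.
    """
    if not (273 <= temp_k <= 373):  # root node: temperature test
        return "Survival Difficult"
    if not has_water:               # internal node: water test
        return "Survival Difficult"
    if not has_flora_fauna:         # internal node: flora and fauna test
        return "Survival Difficult"
    if is_stormy:                   # internal node: stormy surface test
        return "Survival Difficult"
    return "Survival Probable"      # leaf: all tests passed

print(classify_survival(300, True, True, False))  # Survival Probable
```

Each `if` corresponds to one internal node of the tree, and each `return` to a leaf with its class variable.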
The decision tree is said to be complete when all the test conditions lead to a leaf node. The leaf node contains the class labels, which vote in favor of or against the decision.

Now, you might wonder why we started with the temperature attribute at the root. If you choose any other attribute, the decision tree constructed will be different. Correct: for a particular set of attributes, there can be numerous different trees created. We need to choose the optimal tree, which is done by following an algorithmic approach. We will now see the greedy approach to creating a perfect decision tree.

The Greedy Approach

The greedy approach is based on the concept of heuristic problem solving: making a locally optimal choice at each node. By making these locally optimal choices, we reach an approximately optimal solution globally. The algorithm can be summarized as:
1. At each stage (node), pick out the best feature as the test condition.
2. Split the node into the possible outcomes (internal nodes).
3. Repeat the above steps until all the test conditions have been exhausted into leaf nodes.

When you start to implement the algorithm, the first question is: how do you pick the starting test condition? The answer lies in the values of Entropy and Information Gain. Let us see what they are and how they impact our decision tree creation.

Entropy: Entropy in a decision tree stands for homogeneity.
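Both quantities are straightforward to compute. Here is a minimal sketch, with function names of my own choosing, using the class counts from the buys_computer example discussed later in this post (9 "yes" and 5 "no" overall; splitting on Age gives partitions of 2/3, 4/0, and 3/2):

```python
from math import log2

def entropy(counts):
    """Shannon entropy of a class distribution, given as a list of class counts."""
    total = sum(counts)
    return sum(-(c / total) * log2(c / total) for c in counts if c > 0)

def information_gain(parent_counts, partitions):
    """Drop in entropy achieved by splitting the parent into the given partitions."""
    n = sum(parent_counts)
    weighted = sum(sum(p) / n * entropy(p) for p in partitions)
    return entropy(parent_counts) - weighted

print(round(entropy([7, 7]), 3))   # 1.0 -> evenly divided (impure)
print(round(entropy([14, 0]), 3))  # 0.0 -> completely homogeneous (pure)

# buys_computer data: 9 yes / 5 no overall; splitting on Age gives
# <=30 -> (2 yes, 3 no), 31..40 -> (4 yes, 0 no), >40 -> (3 yes, 2 no)
print(round(entropy([9, 5]), 3))                                     # 0.94
print(round(information_gain([9, 5], [[2, 3], [4, 0], [3, 2]]), 3))  # 0.247
```

Note that the exact gain is 0.2467; the figure of 0.246 used below comes from subtracting the rounded intermediate values 0.940 and 0.694.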
If the data is completely homogeneous, the entropy is 0; if the data is evenly divided (50/50), the entropy is 1.

Information Gain: Information Gain is the decrease in entropy when a node is split. The attribute with the highest information gain is selected for splitting. Based on the computed values of Entropy and Information Gain, we choose the best attribute at any particular step.

Let us consider the following data. There can be n number of decision trees that can be formulated from this set of attributes.

Tree Creation Trial 1: Here we take up the attribute Student as the initial test condition.

Tree Creation Trial 2: Similarly, why choose Student? We could just as well choose Income as the test condition.

Creating the Perfect Decision Tree With the Greedy Approach

Let us follow the greedy approach and construct the optimal decision tree. There are two classes involved: Yes, i.e. the person buys a computer, and No, i.e. he does not. To calculate Entropy and Information Gain, we compute the probability of each of these two classes.
Positive: For buys_computer = yes, the probability comes out to be:
Negative: For buys_computer = no, the probability comes out to be:
Entropy in D: We now calculate the Entropy by putting these probability values into the formula stated above.

We have already classified the values of Entropy, which are:
Entropy = 0: Data is completely homogeneous (pure)
Entropy = 1: Data is divided 50/50 (impure)

Our value of Entropy is 0.940, which means our set is nearly impure.

Let us delve deeper to find the most suitable attribute and calculate the Information Gain.

What is the information gain if we split on Age? This data represents how many people falling into a specific age bracket buy or do not buy the product. For example, for people aged 30 or less, 2 people buy (Yes) and 3 people do not buy (No) the product; Info(D) is calculated for each of these 3 categories of people, as represented in the last column. The Info(D) for the Age attribute is
computed as the weighted total over these 3 ranges of age values. Now, the question is: what is the information gain if we split on the Age attribute? The difference between the total information value (0.940) and the information computed for the Age attribute (0.694) gives the information gain. This is the deciding factor for whether we should split at Age or at some other attribute. Similarly, we calculate the information gain for the rest of the attributes:

Information Gain (Age) = 0.246
Information Gain (Income) = 0.029
Information Gain (Student) = 0.151
Information Gain (credit_rating) = 0.048

On comparing these values of gain for all the attributes, we find that the information gain for Age is the highest. Thus, splitting at Age is a good decision. Similarly, at each split, we compare the information gains to decide whether that attribute should be chosen for the split or not. Thus, the optimal tree created looks like:

The classification rules for this tree can be jotted down as:
If a person's age is less than 30 and he is not a student, he will not buy the product: Age(<30) ^ student(no) = NO
If a person's age is less than 30 and he is a student, he will buy the product: Age(<30) ^ student(yes) = YES
If a person's age is between 31 and 40, he is most likely to buy: Age(31...40) = YES
If a person's age is greater than 40 and he has an excellent credit rating, he will not buy: Age(>40) ^ credit_rating(excellent) = NO
If a person's age is greater than 40, with a fair credit rating, he will probably buy: Age(>40) ^ credit_rating(fair) = YES

Thus, we achieve the perfect decision tree!

Now that you have gone through our Decision Tree blog, you can check out Edureka's Data Science Certification Training. Got a question for us?
Please mention it in the comments section and we will get back to you.
