Decision Systems

Created by @malax

What is a decision tree?

A decision tree is a flowchart-like structure used for decision-making and classification. It consists of nodes that represent tests on attributes, branches that represent the outcomes of those tests, and leaves that represent class labels.

How does a decision tree classify data?

A decision tree classifies data by asking a series of questions based on the features of the data. Each question leads to a branch that narrows down the possibilities until a final classification is reached at the leaf nodes.
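
As a minimal sketch of this question-by-question descent, the tree below is stored as nested dictionaries. The weather tree, the `classify` function, and the field names (`test`, `yes`, `no`, `label`) are illustrative assumptions, not part of the deck.

```python
def classify(tree, record):
    """Follow one root-to-leaf path and return the leaf's class label."""
    while "label" not in tree:              # internal node: ask its question
        answer = tree["test"](record)
        tree = tree["yes"] if answer else tree["no"]
    return tree["label"]                    # leaf node: final classification

# Hypothetical toy tree (assumption, for illustration only).
toy_tree = {
    "question": "Is it raining?",
    "test": lambda r: r["raining"],
    "yes": {"label": "stay in"},
    "no": {
        "question": "Is it warmer than 15 C?",
        "test": lambda r: r["temp_c"] > 15,
        "yes": {"label": "go out"},
        "no": {"label": "stay in"},
    },
}

print(classify(toy_tree, {"raining": False, "temp_c": 20}))
```

Each loop iteration answers one question and moves one level down, so the number of questions asked is at most the depth of the tree.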

What is the first question in the loan decision tree example?

The first question in the loan decision tree example is, 'Are you over 30?' This question separates applicants into two groups: those over 30 and those aged 30 or under.

What happens if an applicant is over 30 and has more than two kids?

If an applicant is over 30 and has more than two kids, the decision tree indicates that they will not receive the loan. This is based on the assumption that they may have a higher risk of not being able to repay.
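
The one path described here can be sketched as nested conditions. This sketch assumes the question about kids comes directly after the age question; the function name `loan_decision` and its parameters are hypothetical, and branches the deck does not describe return `None` rather than a guessed outcome.

```python
def loan_decision(age, num_kids):
    """Return 'no loan' for the path the deck describes, else None."""
    if age > 30:                 # root question: "Are you over 30?"
        if num_kids > 2:         # assumed to be the next question on this branch
            return "no loan"     # leaf stated in the deck
        return None              # outcome not given in the deck
    return None                  # the 30-or-under branch is not described

print(loan_decision(45, 3))
```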

What is the significance of leaf nodes in a decision tree?

Leaf nodes in a decision tree represent the final outcomes or class labels after all questions have been answered. They indicate the classification result based on the path taken through the tree.

What are the advantages of using decision trees?

Decision trees are easy to understand and interpret, can handle both numerical and categorical data, require little data preparation, and perform well with large datasets.

What is overfitting in the context of decision trees?

Overfitting occurs when a decision tree model becomes too complex and captures noise in the training data rather than the underlying pattern. This can lead to poor performance on unseen data.

What does it mean for a problem to be NP-Complete in relation to decision trees?

An NP-Complete problem belongs to a class of problems for which no efficient (polynomial-time) exact algorithm is known. Learning an optimal decision tree from a dataset is NP-Complete. Intuitively, the learner must choose a feature to test at every node, and the number of possible trees grows combinatorially with the number of features.

How do decision trees handle both numerical and categorical data?

Decision trees can handle both types of data because a question at a node can test a numerical value (like age, compared against a threshold) or a categorical value (like a credit rating of good or poor). This versatility makes them useful for a wide range of classification tasks.
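
A small sketch of the two kinds of question, assuming a threshold comparison for numerical features and an equality check for categorical ones. The helper names and the `housing` feature are hypothetical.

```python
def numeric_test(feature, threshold):
    """Build a question of the form 'is feature > threshold?'."""
    return lambda record: record[feature] > threshold

def categorical_test(feature, category):
    """Build a question of the form 'is feature equal to category?'."""
    return lambda record: record[feature] == category

over_30 = numeric_test("age", 30)
is_renter = categorical_test("housing", "rent")   # hypothetical feature

applicant = {"age": 42, "housing": "own"}
print(over_30(applicant), is_renter(applicant))
```

Both helpers return a yes/no answer, so either kind of question can sit at any internal node of the same tree.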

What is the role of features in a decision tree?

Features are the attributes or characteristics of the data that are used to make decisions in a decision tree. Each feature is evaluated at different nodes to guide the classification process.

What is cross-validation in the context of decision trees?

Cross-validation is a technique used to assess how well a decision tree model generalizes to an independent dataset. The data is split into subsets; the tree is repeatedly trained on some subsets and tested on the held-out remainder, and the resulting accuracy and misclassification rates are averaged.
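
One common form is k-fold cross-validation, sketched below in plain Python. The deck does not name a specific scheme, so the contiguous fold layout, the function names, and the majority-label stand-in learner (used here in place of a real tree learner) are all assumptions.

```python
from collections import Counter

def k_fold_splits(n, k):
    """Yield (train_indices, test_indices) for k contiguous folds."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, test
        start += size

def cross_validate(X, y, k, fit, predict):
    """Return the mean misclassification rate over k folds."""
    rates = []
    for train, test in k_fold_splits(len(X), k):
        model = fit([X[i] for i in train], [y[i] for i in train])
        errors = sum(predict(model, X[i]) != y[i] for i in test)
        rates.append(errors / len(test))
    return sum(rates) / len(rates)

# Stand-in learner (assumption): always predict the most common training label.
def fit_majority(X, y):
    return Counter(y).most_common(1)[0][0]

def predict_majority(model, x):
    return model

X = [[a] for a in range(6)]
y = ["yes", "yes", "yes", "yes", "yes", "no"]
rate = cross_validate(X, y, k=3, fit=fit_majority, predict=predict_majority)
print(rate)
```

Passing `fit` and `predict` as arguments keeps the evaluation loop independent of the model, so a tree learner could be substituted without changing the cross-validation code.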

Why are greedy algorithms used in decision tree learning?

Greedy algorithms are used in decision tree learning because they make locally optimal choices at each stage to build the tree. While they may not always produce the optimal overall tree, they are efficient and practical for large datasets.
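
A single greedy step can be sketched as follows: try every feature and threshold, and keep the split that looks best right now. Gini impurity is used as the scoring measure here as an assumption (the deck does not name one), and the function names and sample data are hypothetical.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 0.0 when all labels agree."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(rows, labels):
    """Greedy step: return the (feature_index, threshold) with the lowest
    weighted impurity, without looking ahead to later splits."""
    best, best_score = None, float("inf")
    n = len(rows)
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            left = [lab for r, lab in zip(rows, labels) if r[f] <= t]
            right = [lab for r, lab in zip(rows, labels) if r[f] > t]
            if not left or not right:       # skip splits that separate nothing
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best_score:
                best, best_score = (f, t), score
    return best

# Hypothetical data: columns are (age, number of kids).
rows = [[25, 0], [28, 3], [35, 1], [40, 3]]
labels = ["a", "a", "b", "b"]
print(best_split(rows, labels))
```

A full learner would apply `best_split` recursively to each side of the chosen split. Because each choice is made once and never revisited, the result is fast to compute but not guaranteed to be the optimal tree.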

What is a binary classification problem?

A binary classification problem is a type of classification task where the outcome can take one of two possible classes. Examples include determining if an email is spam or not, or if a patient has a disease.

What is the importance of the classification rule in a decision tree?

The classification rule in a decision tree is the path taken from the root to a leaf node, which defines how the data is classified based on the answers to the questions at each node.
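
To make the path-as-rule idea concrete, the sketch below lists every root-to-leaf path of a small tree as a rule. The nested-dictionary layout, the `rules` function, and the example questions are illustrative assumptions.

```python
def rules(tree, path=()):
    """Yield (conditions, label) for every root-to-leaf path."""
    if "label" in tree:                      # leaf: the path so far is one rule
        yield list(path), tree["label"]
        return
    question = tree["question"]
    yield from rules(tree["yes"], path + (question,))
    yield from rules(tree["no"], path + ("not (" + question + ")",))

# Hypothetical tree with questions stored as text.
example_tree = {
    "question": "x > 5",
    "yes": {"label": "A"},
    "no": {
        "question": "color == 'red'",
        "yes": {"label": "B"},
        "no": {"label": "A"},
    },
}

for conditions, label in rules(example_tree):
    print(" and ".join(conditions), "=>", label)
```

A tree with three leaves yields exactly three rules, one per leaf.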

How does a decision tree mirror human decision-making?

A decision tree mirrors human decision-making by structuring questions in a way that reflects how people naturally think through choices. This makes the model intuitive and easy to interpret.

What is the impact of data perturbation on decision trees?

Decision trees can be unstable under data perturbation: slight changes in the training data may produce a different tree structure, which can in turn change the model's predictions.

What is the relationship between features and outcomes in classification?

In classification, features are the input variables that describe the data, while outcomes are the labels or categories assigned to the data based on the features. The goal is to predict the outcome based on the features.

What is the difference between binary and m-ary classification?

Binary classification involves exactly two possible outcomes, while m-ary classification involves more than two categories. For example, distinguishing between a cat and a dog is binary, while classifying animals as cat, dog, cow, or elephant is m-ary.

What does it mean for a decision tree to be an open box model?

An open box model, like a decision tree, allows users to see and understand the decision-making process. Unlike black-box models, where the internal workings are hidden, decision trees provide transparency in how decisions are made.

How does a decision tree perform with large datasets?

Decision trees scale well to large datasets: they require little data preparation, greedy learning keeps training practical, and classifying a record only requires following a single path from the root to a leaf.

What is the significance of the internal nodes in a decision tree?

Internal nodes in a decision tree represent tests or questions about the features of the data. Each internal node splits the data based on the answers to these questions, guiding the classification process.