Classification and Regression Trees

Introduction

CART (Classification and Regression Trees) is one of the most important tools used in modern data mining, machine learning and predictive analytics. CART has revolutionised the field of advanced analytics and inaugurated the current era of data science. A CART model can quickly reveal important data relationships, automatically search for patterns and uncover hidden structure, even in highly complex data. CART can be used to build accurate and reliable predictive models for a wide range of applications across industry verticals. Common applications include credit scoring, drug discovery, targeted marketing, fraud detection, financial market modelling, manufacturing quality control, engineering, clinical research, length of patient service, and predictive maintenance. CART also supports high-speed deployment for real-time prediction at scale. Moreover, understanding CART is essential because it forms the foundation for other powerful algorithms such as TreeNet Gradient Boosting, Random Forests and Multivariate Adaptive Regression Splines (MARS).

Origin of CART (Classification and Regression Trees)

The Classification and Regression Trees methodology, also known as CART, was introduced in 1984 by four world-renowned statisticians (Leo Breiman, Jerome H. Friedman, Richard A. Olshen and Charles J. Stone) from Stanford University and the University of California at Berkeley. When the CART monograph was first published, it revolutionised the emerging field of decision trees in advanced analytics. For the first time, an entire methodology was introduced that included multiple tree-growing methods, tree pruning, methods for dealing with unbalanced target classes, adaptation to the cost of learning and the cost of mistakes, self-testing strategies, and cross-validation. Built on decades of machine learning and statistical research, CART provides reliable performance and accurate results.

What is CART?

CART stands for Classification and Regression Trees. It is a robust decision-tree tool used for data mining, machine learning and predictive modelling. To understand Classification and Regression Trees better, we first need to know the concept of a decision tree. A decision tree has a tree-like structure with its parent or root node at the top, and uses splitting algorithms to divide that node into further sub-nodes. Decision trees are a commonly used technique in supervised learning. The goal is to create a model that predicts the value of a target variable, whether categorical or continuous, from several input variables. CART also includes special provisions for handling ordered categorical data and for growing probability trees.
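To make the idea of splitting a root node into sub-nodes concrete, here is a minimal sketch of how one split might be chosen for a categorical target. It assumes the Gini impurity criterion, which is commonly associated with CART classification but is not named in this article. The function names and the toy data are hypothetical, and this is an illustration rather than the code used in Minitab or Salford Predictive Modeler.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a node: 1 - sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def best_split(values, labels):
    """Search one numeric predictor for the rule 'value <= threshold'
    that gives the lowest weighted Gini impurity in the two sub-nodes.

    Returns (threshold, weighted_impurity); threshold is None when no
    split improves on the unsplit (parent) node.
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = (None, gini(labels))  # baseline: impurity of the parent node
    for i in range(1, n):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # no threshold can separate identical values
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = ((lo + hi) / 2, score)  # midpoint between neighbours
    return best


# Hypothetical toy data: hours studied and exam outcome.
hours = [1, 2, 3, 6, 7, 8]
outcome = ["Fail", "Fail", "Fail", "Pass", "Pass", "Pass"]

threshold, impurity = best_split(hours, outcome)
print(threshold, impurity)  # prints: 4.5 0.0
```

A full tree would repeat this search over every predictor and then recurse into each sub-node until a stopping rule is met. Pruning, which the CART methodology also covers, is left out of this sketch.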


Decision Tree Basics

Types of Decision Trees

  • Classification Tree: A classification tree is used to build a decision tree for a categorical response (commonly known as the target) with many categorical or continuous predictors (factors). The categorical response can be binomial or multinomial (e.g. Pass/Fail, or High, Medium and Low). It reveals important patterns and relationships between a categorical response and important predictors within highly complicated data, without using parametric methods. It can also identify groups in the data with desirable characteristics and predict response values for new observations. For example, a credit card company can use a classification tree to identify, based on several predictors, which customers will or will not take a credit card.
  • Regression Tree: A regression tree is used to build a decision tree for a continuous response (commonly known as the target) with many categorical or continuous predictors (factors). The continuous response is a real number (e.g. piston diameter or blood pressure level). It likewise reveals important patterns and relationships between a continuous response and predictors within highly complicated data, without using parametric methods. It can also identify groups in the data with desirable characteristics and predict response values for new observations. For example, a pharmaceutical company can use a regression tree to identify which of several candidate predictors affect the dissolution rate.
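The difference between the two tree types can be illustrated with a one-split regression tree. The sketch below assumes the usual least-squares criterion (choose the split that minimises the summed squared error around each sub-node mean) and predicts with the mean response of the sub-node an observation falls into. A classification tree would predict the majority class there instead. The function names, the predictor and the dissolution figures are all hypothetical.

```python
from statistics import mean


def sse(values):
    """Sum of squared deviations from the node mean (0 for an empty node)."""
    if not values:
        return 0.0
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def best_regression_split(x, y):
    """Find the threshold on one numeric predictor that minimises the
    total squared error of the two resulting sub-nodes."""
    pairs = sorted(zip(x, y))
    best_threshold, best_total = None, sse(y)  # baseline: no split
    for i in range(1, len(pairs)):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # identical values cannot be separated
        left = [t for _, t in pairs[:i]]
        right = [t for _, t in pairs[i:]]
        total = sse(left) + sse(right)
        if total < best_total:
            best_threshold, best_total = (lo + hi) / 2, total
    return best_threshold, best_total


def fit_stump(x, y):
    """Fit a one-split regression tree and return a predict function."""
    threshold, _ = best_regression_split(x, y)
    if threshold is None:
        overall = mean(y)
        return lambda x_new: overall
    left_mean = mean(t for v, t in zip(x, y) if v <= threshold)
    right_mean = mean(t for v, t in zip(x, y) if v > threshold)
    return lambda x_new: left_mean if x_new <= threshold else right_mean


# Hypothetical data: a process setting and the measured dissolution rate (%).
setting = [10, 12, 14, 20, 22, 24]
dissolution = [80.0, 82.0, 81.0, 60.0, 62.0, 61.0]

threshold, total_error = best_regression_split(setting, dissolution)
predict = fit_stump(setting, dissolution)
print(threshold, predict(11), predict(23))  # prints: 17.0 81.0 61.0
```

Each leaf of a regression tree therefore reports a single number, so the fitted model is a step function of the predictors rather than a smooth curve.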

Classification and Regression Trees (CART) in Minitab

Minitab is data analytics software with which we can predict, visualise, analyse and harness the power of data. It lets you dive deep into the data, forecast your business to make better decisions, reduce costs and stop mistakes before they happen. It helps you identify the significant factors affecting your process and uncover hidden relationships between variables. Business analytics tools are also available to help with your toughest business problems. Minitab is applied in sectors such as automotive, healthcare, energy, agriculture, pharmaceuticals, marketing and telecom.

With the recent update (Minitab® 19.2020.1 or higher), CART has been added to the new Minitab menu "Predictive Analytics". Under Predictive Analytics there are two options: CART Classification and CART Regression. Together they cover both types of response, categorical and continuous.

Classification and Regression Trees (CART) in Salford Predictive Modeler

Salford Predictive Modeler (SPM) is an integrated suite of machine learning and predictive analytics software. It includes various data mining techniques such as classification, clustering, association and prediction. Other methods include regression, survival analysis, missing value analysis, data binning and many more. SPM is a highly accurate and ultra-fast platform for developing predictive, descriptive and analytical models from databases and datasets of any size, complexity or organisation. The SPM software suite includes CART, MARS, TreeNet and Random Forests, as well as powerful new automation and modelling capabilities not found elsewhere.

In both software packages (Minitab and Salford Predictive Modeler), CART is the only decision-tree methodology based on the original CART code developed by the world-renowned professors from Stanford University and the University of California at Berkeley. The CART methodology remains proprietary and includes enhancements from decades of experience with practical applications. Only Minitab and Salford Predictive Modeler have access to this code, which now includes enhancements co-developed by Minitab and CART's originators.

      • Available for both types of response: categorical and continuous.
      • Easily handles extreme outliers and many missing values.
      • Ideal for large data sets.
      • No distributional assumptions are required, because it is a non-parametric method.
      • The output is simple to understand and interpret.

We conduct various training programmes, covering both statistical training and Minitab software training. Certified statistical training courses include Predictive Analytics Masterclass, Essential Statistics For Business Analytics, SPC Masterclass and DOE Masterclass (basic to advanced level). Certified Minitab software training courses include Minitab Essentials, Statistical Tools for Pharmaceuticals, and Statistical Quality Analysis & Factorial Designs (basic to advanced level).

We also provide a wide range of Analytics Solutions like Business Analytics, Digital Process Automation, Enterprise Information Management, Enterprise Decisions Management and Business Consulting Services for Organisations to enhance their decision support systems.