
What Is Ensemble Learning?

Sulochana Kamshetty

3 years ago

Table of Contents
  • Ensembles
  • Advantages
  • Disadvantages
  • What is Stacking?
  • Radial Basis Functions Stacking
  • Description about Radial basis functions
  • Bagging
  • Building a Meta Classifier

The other day, we were working on a project at insideAIML. We struggled to decide which model was best overall and which one gave the better output, among other points of confusion. Of the solutions we tried, ensembles gave the most satisfactory results for our requirements.
Let's cover the important points about ensembles:
  • What are ensembles?
  • Important points about ensembles
  • Code for working with ensembles
  • Final output and understanding of ensembles



Combine multiple classifiers.


Split data and create multiple classifiers on different training data.


Boost the ability of a classifier to learn specific samples in the dataset.

Build Ensemble Classifiers


Advantages

  • Improve predictive performance.
  • Can improve other types of classifiers automatically.
  • Easy to implement.
  • Require little parameter tuning.


Disadvantages

The combined classifier is not very transparent (it is a black box), and it is not a compact representation.

What is stacking?

Stacking is an ensemble learning technique that combines multiple classification or regression models via a meta-classifier or a meta-regressor. The base-level models are trained on the complete training set, then the meta-model is trained on the outputs of the base-level models as features.
The base level often consists of different learning algorithms, so stacking ensembles are often heterogeneous.
A good practice when implementing stacking is to use simple classifiers at the combination stage.

Examples: logistic regression, voting, mean

Radial Basis Functions Stacking

Radial functions are simply a class of functions. In principle, they could be employed in any sort of model (linear or nonlinear) and any sort of network (single-layer or multi-layer). However, since Broomhead and Lowe's seminal 1988 paper introduced radial basis function networks, RBF networks have traditionally been associated with radial functions in a single-layer network.

Description of radial basis functions (figure; CRAN: RSNNS)

Steps to follow:
  • Choice of kernel function: Gaussian, which is why this is also called the Gaussian method
  • Choice of centers: K-means cluster centers
  • Training the output weights w_jk: back-propagation
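
The steps above can be sketched in a few lines. This is only an illustration under stated assumptions: the toy dataset, the number of centers (10) and the kernel width `gamma` are made up, and the output weights are fitted with scikit-learn's `LogisticRegression` on the hidden-layer activations rather than with a hand-written back-propagation loop.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_moons
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Toy nonlinear data (assumption: not the article's dataset).
X, y = make_moons(n_samples=400, noise=0.2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Centers: K-means cluster centers of the training data.
kmeans = KMeans(n_clusters=10, n_init=10, random_state=0).fit(X_train)
centers = kmeans.cluster_centers_

def rbf_features(X, centers, gamma=2.0):
    """Gaussian kernel activation of every sample against every center."""
    sq_dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-gamma * sq_dist)

# Output weights: a linear layer trained on the hidden-layer activations
# (swapped in here for explicit back-propagation).
output_layer = LogisticRegression(max_iter=1000)
output_layer.fit(rbf_features(X_train, centers), y_train)
rbf_accuracy = output_layer.score(rbf_features(X_test, centers), y_test)
print(f"RBF network test accuracy: {rbf_accuracy:.3f}")
```

The hidden layer has one Gaussian unit per center, so the number of clusters controls the capacity of the network.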



Bagging

Bagging is good for unstable learners, as it reduces variance and overfitting. How do we generate a large number of unstable learners?
  • Choose records randomly
  • Choose attributes randomly


  • Logistic regression and linear regression apply a biased model to the data. They fail with a bootstrap of records but are fine with a bootstrap of features.
  • Rules and decision trees are unbiased but can select the best attribute. They should work wonders with selection of records or features.
  • Training
      o Given a dataset S, at each iteration i, a training set Si is sampled with replacement from S
      o Si may contain only a few of the features used in S
      o A classifier Ci is learned for each Si


  • Prediction: given an unseen sample X
      o Each classifier Ci returns its class prediction
      o The bagged classifier H counts the votes and assigns the class with the most votes to X


Bagging can also be applied to the prediction of continuous values by taking the average of the individual predictions.
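
As a rough sketch of the training and voting procedure described above, the loop below builds the bagged classifier by hand. The synthetic dataset and the choice of 25 decision trees are assumptions made only for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic data (assumption: stands in for the dataset S).
X, y = make_classification(n_samples=500, n_features=10, n_informative=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

rng = np.random.default_rng(0)
n_estimators = 25
classifiers = []
for i in range(n_estimators):
    # Sample Si with replacement from S (same size as S).
    idx = rng.integers(0, len(X_train), size=len(X_train))
    # Learn a classifier Ci on Si.
    tree = DecisionTreeClassifier(random_state=i).fit(X_train[idx], y_train[idx])
    classifiers.append(tree)

# Each Ci votes; H assigns the class with the most votes.
votes = np.stack([clf.predict(X_test) for clf in classifiers])  # (n_estimators, n_test)
bagged_pred = np.array([np.bincount(col).argmax() for col in votes.T])
bagged_accuracy = (bagged_pred == y_test).mean()
print(f"Bagged accuracy: {bagged_accuracy:.3f}")
```

For regression, the vote would be replaced by `votes.mean(axis=0)` over the individual predictions.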

Random forests

  • Select a large number of data sets through bagging (same number of samples in all sets).
  • Use m random input variables at each node of each tree. m should be much less than M (total attributes).
  • Each tree is fully grown and not pruned.
  • The mode (for classification) or the average (for regression) of all the trees is used as the prediction.
  • The random forests algorithm: given a training set S,
      o For i = 1 to k do:
          - Build subset Si by sampling with replacement from S
          - Learn tree Ti from Si; at each node, choose the best split from a random subset of F features
          - Each tree grows to the largest extent, with no pruning
      o Make predictions according to the majority vote of the set of k trees
Random Forests
  • It is one of the most accurate learning algorithms available
  • Accuracy is best when the trees are least correlated and each one is strong
Q: How to do attribute selection?
  • Suppose you built 200 trees in a random forest; 180 trees selected CCAvg as the feature at the first node, and 130 trees did not use Family in any of the nodes.
  • What if you counted the level at which a feature was used?
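
One way to explore the attribute-selection question is to inspect the fitted trees directly. The sketch below is an illustration on synthetic data (the feature indices are placeholders, not CCAvg or Family): it counts how many of 200 trees split on each feature at the first node and how many trees never use a feature, and it also reads scikit-learn's built-in `feature_importances_`.

```python
from collections import Counter
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic data with 8 features, 3 of them informative (assumption).
X, y = make_classification(n_samples=500, n_features=8, n_informative=3,
                           n_redundant=0, random_state=0)
forest = RandomForestClassifier(n_estimators=200, max_features="sqrt",
                                random_state=0).fit(X, y)

# How many trees split on each feature at the first (root) node?
root_counts = Counter(int(tree.tree_.feature[0]) for tree in forest.estimators_)

# How many trees never use a given feature in any node?
unused_counts = Counter()
for tree in forest.estimators_:
    split_features = tree.tree_.feature
    used = set(split_features[split_features >= 0].tolist())  # leaves are marked -2
    for j in range(X.shape[1]):
        if j not in used:
            unused_counts[j] += 1

# Impurity-based importance averaged over all trees.
importances = forest.feature_importances_
print(root_counts.most_common(3))
print(importances.round(3))
```

Features that often appear at the root and are rarely unused are good candidates to keep.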

Ensemble Methods

  • Random forests (like many machine learning algorithms) are an example of a tool that is useful for analyzing scientific data.
  • But the cleverest algorithms are no substitute for human intelligence and knowledge of the data in the problem.
  • Take the output of random forests not as absolute truth, but as smart computer-generated guesses that may be helpful in leading to a deeper understanding of the problem.

Gradient Boosting:

  1. Given a dataset S[x, y], build a model F(x) that optimizes an error measure or loss function, e.g. the sum of squared errors.
  2. Compute the residual error h_i = y_i - F(x_i).
  3. Build the dataset S1[x, h].
  4. Repeat steps 1-3 using dataset Sn.
  5. The final model is the combination of the models built on S through Sn.

How do classifiers learn?

Decision Trees:

Entropy is computed from the number of times an attribute leads to the target variable.
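
To make the entropy idea concrete, here is a small sketch that computes the entropy of a target and the information gain of one attribute. The `outlook` and `play` arrays are hypothetical example values, not data from this article.

```python
import numpy as np

def entropy(labels):
    """Shannon entropy (in bits) of a label array."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

def information_gain(feature, labels):
    """Entropy of the target minus the weighted entropy after splitting on the feature."""
    weighted = 0.0
    for value in np.unique(feature):
        mask = feature == value
        weighted += mask.mean() * entropy(labels[mask])
    return entropy(labels) - weighted

# Hypothetical attribute and target values.
outlook = np.array(["sunny", "sunny", "rain", "rain", "overcast", "overcast"])
play = np.array(["no", "no", "yes", "no", "yes", "yes"])
gain = information_gain(outlook, play)
print(f"Target entropy: {entropy(play):.3f}, gain from outlook: {gain:.3f}")
```

A decision tree picks, at each node, the attribute with the highest gain.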

Neural network

Neural networks mostly learn by backpropagation, using the error output_i - true_i.


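Support vector machines

An optimization problem: $\max_{\alpha} \sum_i \alpha_i - \tfrac{1}{2} \sum_{i,j} \alpha_i \alpha_j y_i y_j \, (x_i \cdot x_j)$

Can we boost or enhance the learning of some samples?
  • Count some samples multiple times
  • Change the error to Importance_i × (output_i - true_i)
  • Add constraints to some of the α_i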
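
Counting some samples multiple times can be approximated with per-sample weights. The sketch below is an illustration under stated assumptions (synthetic data, depth-1 trees, and an arbitrary weight of 3 for missed samples): it trains one weak learner, then trains a second one that pays more attention to the samples the first one got wrong.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Synthetic data (assumption).
X, y = make_classification(n_samples=300, n_features=6, n_informative=3, random_state=1)

# First weak learner: a depth-1 tree, every sample equally important.
first = DecisionTreeClassifier(max_depth=1, random_state=0).fit(X, y)
missed = first.predict(X) != y

# Boost the missed samples: a weight of 3 acts like counting each of them three times.
importance = np.where(missed, 3.0, 1.0)
second = DecisionTreeClassifier(max_depth=1, random_state=0).fit(
    X, y, sample_weight=importance
)

acc_first_on_missed = (first.predict(X[missed]) == y[missed]).mean()
acc_second_on_missed = (second.predict(X[missed]) == y[missed]).mean()
print(f"On the samples the first learner missed: "
      f"first={acc_first_on_missed:.2f}, second={acc_second_on_missed:.2f}")
```

Boosting algorithms such as AdaBoost repeat this reweighting over many rounds and combine the learners with a weighted vote.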


Is weak learning related to strong learning?
  • If the weak learner gives better-than-chance accuracy on all possible distributions, yes.
  • Put together an algorithm:
      o Learn a classifier
      o Learn a second classifier that works better on the instances where the first one failed
      o Learn a third one for the instances where both failed
  • General ensemble technique: randomization
      o Can randomize learning algorithms instead of inputs
      o Some algorithms already have a random component, e.g. random initialization
      o Most algorithms can be randomized: pick from the N best options at random instead of always picking the best one, e.g. the split rule in a decision tree
      o Random projection in kNN (Freund and Dasgupta 08)
  • Simple ensembles: mixture of experts
      o Use multiple learners
      o Use a control switch that applies the suitable learner for each region of the data
      o Control is done using expectation maximization

Coding with ensembles in Python

Concepts to Learn.

Let's start the coding part. The methods above require similar preprocessing steps, so only the important, minimal steps are covered here. Your code does not need to follow the same format, but you can use this as a template.

Step 1: importing the important libraries

import os
import pandas as pd

1. Check the dimensions and type
2. Print column names and check the datatypes of columns (dtypes)
3. Check the missing values
4. Check the frequency of the target variable (value_counts())
5. Drop the Id column
6. Do necessary type conversions, such as numeric to category (astype('category'))
7. Check the summary of the dataframe (describe())
8. Convert categorical columns to dummies (pd.get_dummies)
9. Split the data into train and test (use the sklearn package)
10. Standardize the data, numerical attributes only (import StandardScaler)
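
A minimal sketch of these ten steps is shown below. The dataframe is generated on the spot, and the column names (`ID`, `Income`, `Family`, `Target`) are hypothetical placeholders rather than the project's real data, so treat it as a template only.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Hypothetical toy data standing in for the real dataset.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "ID": np.arange(100),
    "Income": rng.normal(60, 15, size=100),
    "Family": rng.integers(1, 5, size=100),
    "Target": rng.integers(0, 2, size=100),
})

print(df.shape)                                   # 1. dimensions
print(df.dtypes)                                  # 2. column names and dtypes
print(df.isnull().sum())                          # 3. missing values
print(df["Target"].value_counts())                # 4. target frequency
df = df.drop(columns="ID")                        # 5. drop the Id column
df["Family"] = df["Family"].astype("category")    # 6. numeric to category
print(df.describe())                              # 7. summary

# 8. categorical columns to dummies
X = pd.get_dummies(df.drop(columns="Target"), columns=["Family"])
y = df["Target"]

# 9. train/test split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)
X_train, X_test = X_train.copy(), X_test.copy()

# 10. standardize numerical attributes only, fitting the scaler on the training set
num_cols = ["Income"]
scaler = StandardScaler().fit(X_train[num_cols])
X_train[num_cols] = scaler.transform(X_train[num_cols])
X_test[num_cols] = scaler.transform(X_test[num_cols])
```

Fitting the scaler on the training set only keeps information from the test set out of the model.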

Now it is time for model building.


Import BaggingClassifier to work with bagging. The printout of the fitted estimator lists the important parameters to work with.
from sklearn.ensemble import BaggingClassifier

clf = BaggingClassifier(n_estimators=10)
# X_train is an assumed name for the training features; y_train holds the training labels
clf.fit(X=X_train, y=y_train)
BaggingClassifier(base_estimator=None, bootstrap=True, bootstrap_features=False,
                  max_features=1.0, max_samples=1.0, n_estimators=10,
                  n_jobs=None, oob_score=False, random_state=None, verbose=0,
                  warm_start=False)

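Grid search with cross-validation:

from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

# Tune the depth of the base trees and the fraction of records given to each one.
# The 'base_estimator__' prefix matches the scikit-learn version used here;
# from scikit-learn 1.2 onward the parameter is named 'estimator'.
param_grid = {
    'base_estimator__max_depth': [1, 2, 3, 4, 5],
    'max_samples': [0.05, 0.1, 0.2, 0.5],
}
clf = GridSearchCV(
    BaggingClassifier(DecisionTreeClassifier(), n_estimators=100, max_features=0.5),
    param_grid, scoring='accuracy')
# %time is an IPython/Jupyter magic; X_train is an assumed name for the training features
%time clf.fit(X_train, y_train)
FutureWarning: The default value of cv will change from 3 to 5 in version 0.22. Specify it explicitly to silence this warning.
  warnings.warn(CV_WARNING, FutureWarning)
CPU times: user 30.2 s, sys: 588 ms, total: 30.8 s
Wall time: 33.1 s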


from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
import numpy as np
import statistics as stat

model1 = DecisionTreeClassifier()
model2 = KNeighborsClassifier()
model3 = LogisticRegression()
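
One simple way to combine three such models is majority voting. The sketch below is an assumption about how they might be used, not the original project's code: it fits each model on synthetic data and takes the most common prediction per sample with `statistics.mode`.

```python
import numpy as np
import statistics as stat
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary classification data (assumption).
X, y = make_classification(n_samples=400, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model1 = DecisionTreeClassifier(random_state=0)
model2 = KNeighborsClassifier()
model3 = LogisticRegression(max_iter=1000)

# Fit every model and collect its test predictions.
predictions = [m.fit(X_train, y_train).predict(X_test) for m in (model1, model2, model3)]

# Majority vote per test sample: three binary votes always have a unique mode.
final_pred = np.array([stat.mode(votes) for votes in zip(*predictions)])
voting_accuracy = (final_pred == y_test).mean()
print(f"Voting accuracy: {voting_accuracy:.3f}")
```

With an odd number of binary classifiers there are no ties; for other setups, scikit-learn's `VotingClassifier` handles the bookkeeping.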

Building a Meta Classifier
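
A meta classifier can be sketched with scikit-learn's `StackingClassifier`, which is available from version 0.22 onward. The synthetic data and the choice of a decision tree and kNN as base models with logistic regression as the simple combiner are assumptions for illustration. Note that `StackingClassifier` trains the meta-model on cross-validated predictions of the base models, a common refinement of using their raw training-set outputs.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary classification data (assumption).
X, y = make_classification(n_samples=400, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Heterogeneous base-level models.
base_models = [
    ("tree", DecisionTreeClassifier(random_state=0)),
    ("knn", KNeighborsClassifier()),
]

# A simple classifier at the combination stage: logistic regression.
stack = StackingClassifier(
    estimators=base_models,
    final_estimator=LogisticRegression(max_iter=1000),
    cv=5,
)
stack.fit(X_train, y_train)
stacking_accuracy = stack.score(X_test, y_test)
print(f"Stacking accuracy: {stacking_accuracy:.3f}")
```

Keeping the final estimator simple limits the risk that the meta-model overfits the base models' outputs.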

I hope this in-depth look at ensembles gave you enough understanding of the topic to explore related ones.
Happy Learning...:-)
