Data Wrangling in Python

Shashank Shanu

9 months ago

Data wrangling is the process of manipulating data in various formats to get it ready for analysis or for use alongside other data. It involves operations such as merging, grouping and concatenating. It is an important part of any data science project, and many data scientists spend much of their time on it.
Python, through libraries such as pandas, provides convenient tools for applying these wrangling methods to various datasets to meet the analytical goal at hand.
How to Merge Different Datasets?
The pandas library provides a function, merge, which helps us join different datasets according to our requirements. It works on DataFrame objects much like the join operations between tables in a standard relational database.
pd.merge(left, right, how='inner', on=None, left_on=None, right_on=None,
         left_index=False, right_index=False, sort=False)
Let’s take an example to get a better understanding.
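# import the pandas library
import pandas as pd

left_df = pd.DataFrame({
    'S.No.': [1, 2, 3, 4, 5],
    'Name': ['Alex', 'Amy', 'Allen', 'Alice', 'Ayoung'],
    'subjects': ['Maths', 'Physics', 'Chemistry', 'Biology', 'Civics']})

right_df = pd.DataFrame({
    'S.No.': [1, 2, 3, 4, 5],
    'Name': ['Billy', 'Brian', 'Bran', 'Bryce', 'Betty'],
    'subjects': ['Physics', 'Chemistry', 'History', 'Biology', 'Civics']})

# display both DataFrames
print(left_df)
print(right_df)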

   S.No.    Name  subjects
0      1    Alex     Maths
1      2     Amy   Physics
2      3   Allen Chemistry
3      4   Alice   Biology
4      5  Ayoung    Civics

   S.No.   Name  subjects
0      1  Billy   Physics
1      2  Brian Chemistry
2      3   Bran   History
3      4  Bryce   Biology
4      5  Betty    Civics
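The two DataFrames above are only printed, not yet joined. Here is a minimal sketch of an actual merge, assuming we want to match rows on the shared subjects column. The choice of key and the _left/_right suffixes are assumptions made for illustration; they are not part of the original listing.

```python
import pandas as pd

left_df = pd.DataFrame({
    'S.No.': [1, 2, 3, 4, 5],
    'Name': ['Alex', 'Amy', 'Allen', 'Alice', 'Ayoung'],
    'subjects': ['Maths', 'Physics', 'Chemistry', 'Biology', 'Civics']})

right_df = pd.DataFrame({
    'S.No.': [1, 2, 3, 4, 5],
    'Name': ['Billy', 'Brian', 'Bran', 'Bryce', 'Betty'],
    'subjects': ['Physics', 'Chemistry', 'History', 'Biology', 'Civics']})

# Inner join: keep only the subjects that appear in both frames.
# The suffixes (an illustrative choice) disambiguate the overlapping
# 'S.No.' and 'Name' columns.
inner = pd.merge(left_df, right_df, on='subjects', how='inner',
                 suffixes=('_left', '_right'))
print(inner)

# Left join: keep every row of left_df; rows with no match in
# right_df get NaN in the right-hand columns.
left_join = pd.merge(left_df, right_df, on='subjects', how='left',
                     suffixes=('_left', '_right'))
print(left_join)
```

With how='inner' only the subjects present in both frames survive, while how='left' keeps the Maths row and fills the right-hand columns with NaN.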
How to Group Data in a DataFrame?
During exploratory data analysis we often need to group a DataFrame and look at the results for each group present in the dataset. Pandas provides built-in methods, such as groupby, that split the data into groups.
Let’s take an example.
Below, we group the data by year and then retrieve the rows for a specific year.
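# import the pandas library
import pandas as pd

# Only the 2014 rows (index 0, 2, 4 and 9) appear in the output below;
# the other Year and Points values, and the twelfth Team entry,
# are illustrative.
ipl = {'Team': ['Riders', 'Riders', 'Devils', 'Devils', 'Kings', 'Kings',
                'Kings', 'Kings', 'Riders', 'Royals', 'Royals', 'Riders'],
       'Rank': [1, 2, 2, 3, 3, 4, 1, 1, 2, 4, 1, 2],
       'Year': [2014, 2015, 2014, 2015, 2014, 2015,
                2016, 2017, 2016, 2014, 2015, 2017],
       'Points': [876, 789, 863, 673, 741, 812,
                  756, 788, 694, 701, 804, 690]}

df = pd.DataFrame(ipl)

# group by year, then select the 2014 group
grouped = df.groupby('Year')
print(grouped.get_group(2014))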

     Team Rank  Year  Points
0  Riders    1  2014     876
2  Devils    2  2014     863
4   Kings    3  2014     741
9  Royals    4  2014     701
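Selecting a single group is only one use of groupby. As a rough sketch, the grouped object can also be iterated over or aggregated. In the data below, the 2014 rows are taken from the output above, while the 2015 rows are made-up values added only so that there is more than one group.

```python
import pandas as pd

# 2014 rows come from the output above; the 2015 rows are
# hypothetical values used purely for illustration.
df = pd.DataFrame({
    'Team': ['Riders', 'Devils', 'Kings', 'Royals', 'Riders', 'Devils'],
    'Year': [2014, 2014, 2014, 2014, 2015, 2015],
    'Points': [876, 863, 741, 701, 800, 700]})

grouped = df.groupby('Year')

# Iterating over a GroupBy yields (group key, sub-frame) pairs.
for year, group in grouped:
    print(year, len(group))

# Aggregate each group: total and mean points per year.
summary = grouped['Points'].agg(['sum', 'mean'])
print(summary)
```

The summary frame has one row per year, which is usually the shape needed for reporting or plotting.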
How to Concatenate Different DataFrames?
We often also need to concatenate DataFrames to accomplish a task. Pandas provides the concat function for easily combining Series and DataFrame objects.
Let’s take an example.
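import pandas as pd

first_df = pd.DataFrame({
    'Name': ['Alex', 'Amy', 'Allen', 'Alice', 'Ayoung'],
    'subject_id': ['sub1', 'sub2', 'sub4', 'sub6', 'sub5'],
    'Marks_scored': [98, 90, 87, 69, 78]},
    index=[1, 2, 3, 4, 5])

# Brian's mark is missing from the original listing; 80 is a placeholder
second_df = pd.DataFrame({
    'Name': ['Billy', 'Brian', 'Bran', 'Bryce', 'Betty'],
    'subject_id': ['sub2', 'sub4', 'sub3', 'sub6', 'sub5'],
    'Marks_scored': [89, 80, 79, 97, 88]},
    index=[1, 2, 3, 4, 5])

# stack the two DataFrames row-wise; the original index labels are kept
print(pd.concat([first_df, second_df]))

     Name subject_id  Marks_scored
1    Alex       sub1            98
2     Amy       sub2            90
3   Allen       sub4            87
4   Alice       sub6            69
5  Ayoung       sub5            78
1   Billy       sub2            89
2   Brian       sub4            80
3    Bran       sub3            79
4   Bryce       sub6            97
5   Betty       sub5            88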
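By default, concatenation keeps each frame's original index labels, which is why the labels 1 to 5 repeat in the output above. The sketch below shows two common ways to deal with that, using a small subset of the rows above. The variable names first and second, and the index labels [1, 2], are assumptions chosen for illustration.

```python
import pandas as pd

# A small subset of the rows shown above; the index labels are
# chosen so that they repeat across the two frames.
first = pd.DataFrame({
    'Name': ['Alex', 'Amy'],
    'subject_id': ['sub1', 'sub2'],
    'Marks_scored': [98, 90]}, index=[1, 2])
second = pd.DataFrame({
    'Name': ['Billy', 'Bran'],
    'subject_id': ['sub2', 'sub3'],
    'Marks_scored': [89, 79]}, index=[1, 2])

# ignore_index=True discards the original labels and renumbers from 0.
renumbered = pd.concat([first, second], ignore_index=True)
print(renumbered)

# keys=... adds an outer index level recording which frame
# each row came from.
labelled = pd.concat([first, second], keys=['first', 'second'])
print(labelled.loc['second'])
```

Use ignore_index when the old labels carry no meaning, and keys when you need to trace each row back to its source frame.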
I hope you enjoyed this article and now have a working understanding of data wrangling in Python.
For more blogs and courses on data science, machine learning, artificial intelligence and emerging technologies, visit us at InsideAIML.
Thanks for reading…
Happy Learning…
