
Reindexing in Python Pandas

Shashank Shanu

a year ago

Table of Contents
  • Reindexing
  • Example
  • How to Reindex to Align with Other Objects?
  • How to Fill Values while Reindexing?
  • How to Limit Filling Values while Reindexing?
  • How to Rename in Python?

Reindexing

Reindexing conforms a DataFrame to a new set of row and column labels along a particular axis. Rows or columns whose labels are in the new set keep their data, new labels get missing values, and labels left out of the new set are dropped.
Reindexing lets us:
  • Insert missing-value (NaN) markers at label locations where no data existed before.
  • Reorder the existing data to match a new set of labels.
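
Both operations can be seen on a small Series. This is a minimal sketch; the Series s and its labels 'a' to 'd' are hypothetical illustration data.

```python
import pandas as pd

# Hypothetical example data: three values labeled 'a', 'b' and 'c'
s = pd.Series([1.0, 2.0, 3.0], index=['a', 'b', 'c'])

# Reorder the existing labels and add a new label 'd' that has no data
reordered = s.reindex(['c', 'a', 'd'])
print(reordered)
# 'c' and 'a' keep their values in the new order, 'd' becomes NaN,
# and 'b' is dropped because it is not in the new label list
```

Note that reindex() returns a new object; s itself still has the labels 'a', 'b' and 'c'.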

Example

import pandas as pd
import numpy as np
N=20
data = pd.DataFrame({
   'A': pd.date_range(start='2016-01-01',periods=N,freq='D'),
   'x': np.linspace(0,stop=N-1,num=N),
   'y': np.random.rand(N),
   'C': np.random.choice(['Low','Medium','High'],N).tolist(),
   'D': np.random.normal(100, 10, size=(N)).tolist()
})
# Keep rows 0, 2 and 5 and columns A, C and B; data has no column B, so it is filled with NaN
data_reindexed = data.reindex(index=[0,2,5], columns=['A', 'C', 'B'])
print(data_reindexed)
Output:
           A     C   B
0 2016-01-01  High NaN
2 2016-01-03   Low NaN
5 2016-01-06  High NaN

How to Reindex to Align with Other Objects?

Suppose we want to take an object and reindex its axes so that they are labeled the same as another object. The reindex_like() method does this in one call.
Let's take an example to get a better understanding.

Example:

import pandas as pd
import numpy as np
data1 = pd.DataFrame(np.random.randn(10,3),columns=['column1','column2','column3'])
data2 = pd.DataFrame(np.random.randn(7,3),columns=['column1','column2','column3'])
# Conform data1 to data2's labels: data2 only has rows 0 to 6, so rows 7 to 9 are dropped
data1 = data1.reindex_like(data2)
print(data1)
Output:

    column1   column2   column3
0  0.271240  0.201199 -0.151743
1 -0.269379  0.262300  0.019942
2  0.685737 -0.233194 -0.652832
3 -1.416394 -0.587026  1.065789
4 -0.590154 -2.194137  0.707365
5  0.393549  1.801881 -2.529611
6  0.062660 -0.996452 -0.029740
Note – Here, data1 is reindexed like data2 and the result is assigned back to data1. Because data2 only has the row labels 0 to 6, rows 7 to 9 of data1 are dropped. If the column names do not match, NaN is filled in for the entire column.
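
The last point about mismatched column names can be checked with a small sketch. The frames left and right are hypothetical; right deliberately has a column c that left lacks.

```python
import pandas as pd

# Hypothetical frames: 'left' has columns a and b, 'right' has columns a and c
left = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})
right = pd.DataFrame({'a': [0, 0], 'c': [0, 0]})

# Conform 'left' to the row labels (0, 1) and column labels (a, c) of 'right'
aligned = left.reindex_like(right)
print(aligned)
# Column 'a' keeps the values of rows 0 and 1, column 'c' is entirely NaN,
# and column 'b' and row 2 are dropped
```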

How to Fill Values while Reindexing?

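We can also fill in missing values while reindexing.
The reindex() method (and reindex_like()) takes an optional method parameter that controls how the new labels are filled. Its options are:
  • pad/ffill – fills values in the forward direction, taking the value of the preceding existing label.
  • bfill/backfill – fills values in the backward direction, taking the value of the next existing label.
  • nearest – fills values from the nearest existing index label.
These fill methods only apply when the index is monotonically increasing or decreasing.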
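
To compare the three methods side by side, here is a minimal sketch on a hypothetical Series that has data only at labels 0, 3 and 6. The distances were chosen so that no new label is exactly halfway between two existing ones, which keeps the nearest result unambiguous.

```python
import pandas as pd

# Hypothetical Series with data only at labels 0, 3 and 6
s = pd.Series([10, 20, 30], index=[0, 3, 6])
new_labels = range(7)  # 0, 1, ..., 6

no_fill = s.reindex(new_labels)                    # labels 1, 2, 4, 5 become NaN
forward = s.reindex(new_labels, method='ffill')    # value of the previous existing label
backward = s.reindex(new_labels, method='bfill')   # value of the next existing label
nearest = s.reindex(new_labels, method='nearest')  # value of the closest existing label

print(pd.DataFrame({'none': no_fill, 'ffill': forward,
                    'bfill': backward, 'nearest': nearest}))
```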

Example

import pandas as pd
import numpy as np
df1 = pd.DataFrame(np.random.randn(6,3),columns=['col1','col2','col3'])
df2 = pd.DataFrame(np.random.randn(2,3),columns=['col1','col2','col3'])
# Without a fill method, rows 2 to 5 (missing from df2) are filled with NaN
print(df2.reindex_like(df1))
# Forward-fill the new rows with the values of the last existing row
print("Data Frame with Forward Fill:")
print(df2.reindex_like(df1, method='ffill'))
Output
       col1      col2      col3
0 -1.046918  0.608691  1.081329
1 -0.396384 -0.176895 -1.896393
2       NaN       NaN       NaN
3       NaN       NaN       NaN
4       NaN       NaN       NaN
5       NaN       NaN       NaN
Data Frame with Forward Fill:
       col1      col2      col3
0 -1.046918  0.608691  1.081329
1 -0.396384 -0.176895 -1.896393
2 -0.396384 -0.176895 -1.896393
3 -0.396384 -0.176895 -1.896393
4 -0.396384 -0.176895 -1.896393
5 -0.396384 -0.176895 -1.896393
Note – In the above example, the last four rows (labels 2 to 5) did not exist in df2, so forward filling copies the values of row 1 into them.

How to Limit Filling Values while Reindexing?

The reindex() method (and reindex_like()) also takes a limit parameter, which sets the maximum number of consecutive new labels to fill when a fill method is given.
Let's understand this with an example:
import pandas as pd
import numpy as np
df1 = pd.DataFrame(np.random.randn(6,3),columns=['col1','col2','col3'])
df2 = pd.DataFrame(np.random.randn(2,3),columns=['col1','col2','col3'])
# Without a fill method, rows 2 to 5 (missing from df2) are filled with NaN
print(df2.reindex_like(df1))
# Forward-fill, but fill at most one consecutive new row
print("Data Frame with Forward Fill limiting to 1:")
print(df2.reindex_like(df1, method='ffill', limit=1))
Output
       col1      col2      col3
0  0.824697  0.122557 -0.156242
1  0.528174 -1.140847 -1.158778
2       NaN       NaN       NaN
3       NaN       NaN       NaN
4       NaN       NaN       NaN
5       NaN       NaN       NaN
Data Frame with Forward Fill limiting to 1:
       col1      col2      col3
0  0.824697  0.122557 -0.156242
1  0.528174 -1.140847 -1.158778
2  0.528174 -1.140847 -1.158778
3       NaN       NaN       NaN
4       NaN       NaN       NaN
5       NaN       NaN       NaN
Note – Here only the row labeled 2 is filled, using the values of row 1. The remaining rows (labels 3 to 5) stay NaN because the limit of one consecutive fill has been reached.
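
The limit counts consecutive fills after each existing label, so it starts over at every label that has data. A minimal sketch with a hypothetical Series that has data only at labels 0 and 4:

```python
import pandas as pd

# Hypothetical Series with data only at labels 0 and 4
s = pd.Series([1.0, 2.0], index=[0, 4])

# Forward-fill at most 2 consecutive new labels after each existing label
limited = s.reindex(range(8), method='ffill', limit=2)
print(limited)
# Labels 1 and 2 take 1.0 and label 3 stays NaN;
# labels 5 and 6 take 2.0 and label 7 stays NaN
```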

How to Rename in Python?

Pandas provides a rename() method that relabels an axis based on a mapping (a dict or a Series) or an arbitrary function.
Let's take an example to understand it:
import pandas as pd
import numpy as np
data1 = pd.DataFrame(np.random.randn(6,3),columns=['col1','col2','col3'])
print(data1)
print("After renaming the rows and columns:")
# Rename two columns and the first three row labels; labels not in the dicts are unchanged
print(data1.rename(columns={'col1': 'c1', 'col2': 'c2'},
                   index={0: 'apple', 1: 'banana', 2: 'mango'}))
Output
       col1      col2      col3
0  0.047170  0.378306 -1.198150
1  1.183208 -2.195630 -0.798192
2  0.256581  0.627994 -0.674260
3  0.240853  1.677340  1.497613
4  0.820688  0.920151 -1.431485
5 -0.010474 -0.228373 -0.392640
After renaming the rows and columns:
              c1        c2      col3
apple   0.047170  0.378306 -1.198150
banana  1.183208 -2.195630 -0.798192
mango   0.256581  0.627994 -0.674260
3       0.240853  1.677340  1.497613
4       0.820688  0.920151 -1.431485
5      -0.010474 -0.228373 -0.392640
The rename() method also has an inplace parameter, which defaults to False, so rename() returns a new DataFrame and leaves the original untouched. Pass inplace=True to modify the DataFrame in place; rename() then returns None.
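
As mentioned earlier, rename() also accepts a function, which is applied to every label on the chosen axis. The sketch below uses a hypothetical two-column DataFrame to show that form together with inplace=True.

```python
import pandas as pd

# Hypothetical DataFrame for illustration
df = pd.DataFrame({'col1': [1, 2], 'col2': [3, 4]})

# A function is applied to every label on the chosen axis
upper = df.rename(columns=str.upper)            # columns become COL1, COL2
labeled = df.rename(index=lambda i: f"row{i}")  # index becomes row0, row1

# With inplace=True the original frame is modified and rename() returns None
result = df.rename(columns={'col1': 'c1'}, inplace=True)
print(upper.columns.tolist(), labeled.index.tolist(), result, df.columns.tolist())
```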
I hope you enjoyed reading this article and now know how reindexing works in Python Pandas.
For more such blogs/courses on data science, machine learning, artificial intelligence and emerging new technologies, do visit us at InsideAIML.
Thanks for reading…
Happy Learning…
