#### World's Best AI Learning Platform with profoundly Demanding Certification Programs

Designed by IITian's, only for AI Learners.

Nimish Khurana

10 months ago

- Percent_change

- Covariance

- Cov Series

- Correlation

- Data Ranking

Pandas is a well-known and widely used Python library for data
manipulation and analysis. It provides numerous methods and functions that
speed up data analysis and preprocessing. On top of that, pandas also provides
statistical functions that can be used to further understand the data.

Statistical methods help in understanding and analyzing the behavior of data.

In this article we will look at a few statistical functions that are
commonly used on pandas objects.

Percent_change (`pct_change()`) can be used with Series and DataFrames; the Panel
type was removed in pandas 0.25. It compares every element with its prior element
and computes the relative change as a fraction, so 0.5 means a 50% increase.

```
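import pandas as pd
import numpy as np

# Relative change between consecutive elements of a Series
s = pd.Series([1, 2, 3, 4, 5, 4])
print(s.pct_change())

# Column-wise relative change on a DataFrame of random values
df = pd.DataFrame(np.random.randn(5, 2))
print(df.pct_change())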
```

Its **output** is as follows (the DataFrame values vary from run to run because the data is random):

```
0         NaN
1    1.000000
2    0.500000
3    0.333333
4    0.250000
5   -0.200000
dtype: float64
           0         1
0        NaN       NaN
1 -15.151902  0.174730
2  -0.746374 -1.449088
3  -3.582229 -3.165836
4  15.601150 -1.860434
```

By default, `pct_change()` operates down each column; to apply it
row-wise instead, pass the `axis=1` argument.
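As a minimal sketch of the two points above, the snippet below first checks `pct_change()` against the manual formula `current / previous - 1`, then compares the default column-wise behavior with `axis=1`. The column names `q1`, `q2`, `q3` and all values are made up for illustration.

```python
import numpy as np
import pandas as pd

# pct_change() matches the manual formula: current / previous - 1
s = pd.Series([1, 2, 3, 4, 5, 4])
manual = s / s.shift(1) - 1
formula_matches = np.allclose(s.pct_change(), manual, equal_nan=True)

# Hypothetical quarterly figures, one row per product
df = pd.DataFrame({
    "q1": [100.0, 200.0],
    "q2": [110.0, 150.0],
    "q3": [121.0, 300.0],
})

col_change = df.pct_change()        # default: compare each row with the row above
row_change = df.pct_change(axis=1)  # compare each column with the column to its left

print(col_change)
print(row_change)
```

With `axis=1` the first column is all NaN, because there is no earlier column to compare against; with the default, the first row is all NaN for the same reason.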

Covariance is computed on series data. The Series object has a `cov` method
that computes the covariance between two Series. Missing values (NaN) are
excluded automatically.

```
import pandas as pd
import numpy as np

# Two independent Series of random values
s1 = pd.Series(np.random.randn(10))
s2 = pd.Series(np.random.randn(10))

# Sample covariance between the two Series
print(s1.cov(s2))
```

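This prints a single floating-point number, the sample covariance of the two series; the value differs from run to run because the inputs are random.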

When applied to a DataFrame, `cov()` computes the pairwise covariance
between all of its columns.

```
import pandas as pd
import numpy as np

frame = pd.DataFrame(np.random.randn(10, 5), columns=['a', 'b', 'c', 'd', 'e'])

# Covariance between two individual columns
print(frame['a'].cov(frame['b']))

# Full covariance matrix across all columns
print(frame.cov())
```

Its **output** is as follows (the values vary from run to run because the data is random):

```
-0.58312921152741437
          a         b         c         d         e
a  1.780628 -0.583129 -0.185575  0.003679 -0.136558
b -0.583129  1.297011  0.136530 -0.523719  0.251064
c -0.185575  0.136530  0.915227 -0.053881 -0.058926
d  0.003679 -0.523719 -0.053881  1.521426 -0.487694
e -0.136558  0.251064 -0.058926 -0.487694  0.960761
```
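To make the automatic exclusion of missing values concrete, here is a small sketch on made-up data. It compares `Series.cov` with a hand-computed sample covariance that uses only the complete pairs. The variable names and values are illustrative assumptions, not part of the original example.

```python
import numpy as np
import pandas as pd

# The last pair is incomplete because s1 has a NaN there
s1 = pd.Series([1.0, 2.0, 3.0, 4.0, np.nan])
s2 = pd.Series([2.0, 4.0, 6.0, 9.0, 100.0])

cov_pandas = s1.cov(s2)

# Manual sample covariance over complete pairs only (divides by n - 1)
valid = s1.notna() & s2.notna()
x, y = s1[valid], s2[valid]
cov_manual = ((x - x.mean()) * (y - y.mean())).sum() / (len(x) - 1)

print(cov_pandas, cov_manual)
```

The two numbers agree, which shows that the value 100.0 paired with the NaN does not influence the result.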

Correlation measures the strength of the relationship between any two arrays
of values (Series). The `method` parameter selects how it is computed:
`pearson` (the default, which measures linear association), `spearman`, or `kendall`.

```
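import pandas as pd
import numpy as np

frame = pd.DataFrame(np.random.randn(10, 5), columns=['a', 'b', 'c', 'd', 'e'])

# Pearson correlation between two individual columns
print(frame['a'].corr(frame['b']))

# Full correlation matrix across all columns
print(frame.corr())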
```

Its **output** is as follows (the values vary from run to run because the data is random):

```
-0.383712785514
          a         b         c         d         e
a  1.000000 -0.383713 -0.145368  0.002235 -0.104405
b -0.383713  1.000000  0.125311 -0.372821  0.224908
c -0.145368  0.125311  1.000000 -0.045661 -0.062840
d  0.002235 -0.372821 -0.045661  1.000000 -0.403380
e -0.104405  0.224908 -0.062840 -0.403380  1.000000
```

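In pandas releases before 2.0, any non-numeric column present in the DataFrame
is excluded automatically. From pandas 2.0 onward, `numeric_only` defaults to
`False`, so pass `numeric_only=True` to `corr()` to get the same behavior.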
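The following sketch illustrates the `method` parameter on made-up data where `y` is a monotonic but non-linear function of `x`. It selects the numeric columns explicitly with `select_dtypes`, which is one way to sidestep version differences in how non-numeric columns are handled. The column names are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    "x": [1, 2, 3, 4, 5],
    "y": [1, 4, 9, 16, 25],   # y = x**2: monotonic, but not linear
    "label": list("abcde"),   # non-numeric column
})

# Keep only numeric columns before computing correlations
numeric = df.select_dtypes(include="number")

pearson = numeric.corr()                     # linear association
spearman = numeric.corr(method="spearman")   # rank-based association

# Spearman correlation is the Pearson correlation of the ranks
spearman_via_rank = numeric.rank().corr()

print(pearson.loc["x", "y"])
print(spearman.loc["x", "y"])
```

Because `y` always increases with `x`, the Spearman coefficient is 1, while the Pearson coefficient is slightly below 1 since the relationship is not a straight line.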

Data Ranking produces a rank for each element in an array of values. In case of ties,
each tied element is assigned the mean of the ranks the group would occupy.

```
import pandas as pd
import numpy as np

s = pd.Series(np.random.randn(5), index=list('abcde'))
s['d'] = s['b']  # so there's a tie

# Rank the values; tied elements share the average rank
print(s.rank())
```

Its **output** is as follows (the exact ranks depend on the random values drawn):

```
a    1.0
b    3.5
c    2.0
d    3.5
e    5.0
dtype: float64
```

`rank()` optionally takes an `ascending` parameter, which is `True` by default; when it is
`False`, the data is reverse-ranked, with larger values assigned a smaller rank.
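A short sketch of the effect of `ascending`, using fixed made-up values so the result is reproducible:

```python
import pandas as pd

# 'b' and 'd' are tied at 30
s = pd.Series([10, 30, 20, 30, 50], index=list("abcde"))

ascending_rank = s.rank()                   # smallest value gets rank 1
descending_rank = s.rank(ascending=False)   # largest value gets rank 1

print(pd.DataFrame({"ascending": ascending_rank, "descending": descending_rank}))
```

The tied values still share an averaged rank in both directions: 3.5 when ascending and 2.5 when descending.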

`rank()` supports different tie-breaking methods, specified with the `method` parameter:

- **average**: average rank of the tied group

- **min**: lowest rank in the group

- **max**: highest rank in the group

- **first**: ranks assigned in the order the values appear in the array
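To compare the tie-breaking methods side by side, the sketch below ranks one small Series with each of them. The data is made up, with `b` and `d` tied.

```python
import pandas as pd

# 'b' and 'd' are tied at 30 and would occupy ranks 3 and 4
s = pd.Series([10, 30, 20, 30, 50], index=list("abcde"))

# One column of ranks per tie-breaking method
methods = ["average", "min", "max", "first"]
comparison = pd.DataFrame({m: s.rank(method=m) for m in methods})

print(comparison)
```

Only the rows for `b` and `d` differ between columns: they get 3.5 and 3.5 with `average`, 3 and 3 with `min`, 4 and 4 with `max`, and 3 and 4 with `first`.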
