Python - Processing JSON Data

Neha Kumawat

5 months ago

Python - Processing JSON Data | insideAIML
Table of Contents
  • Introduction
  • Input Data
  • Read the JSON File
      • Importing Pandas library
      • Pandas read_json()
      • Convert the object to a JSON string using dataframe.to_json

Introduction

JSON (JavaScript Object Notation) is one of the most widely used formats for exchanging data on the Web. The exchange can take place between two applications running in different geographical locations or on the same machine. JSON is both human-readable and machine-readable: applications and libraries can parse JSON documents, and people can also inspect the data and make sense of it.
Analyzing a JSON dataset is much more convenient with Pandas. Pandas also lets you convert a list of lists to a DataFrame and specify the column names separately.
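To make the list-of-lists conversion concrete, here is a minimal sketch. The sample rows and column names are illustrative assumptions chosen to match the employee data used later in this article.

```python
import pandas as pd

# Hypothetical sample rows: each inner list is one record.
rows = [[1, "Amit", "IT"], [2, "Asif", "HR"], [3, "Rohit", "IT"]]

# The column names are passed separately from the data.
df = pd.DataFrame(rows, columns=["id", "name", "dept"])
print(df)
```

Passing the names through the columns argument keeps the raw data free of header rows.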

Input Data

Create a JSON file by copying the following data into a text editor such as Notepad. Save the file with the .json extension, choosing All Files (*.*) as the file type.
input.json
{
	"employees": [
		{
			"id": 1,
			"name": "Amit",
			"dept": "IT"
		},
		{
			"id": 2,
			"name": "Asif",
			"dept": "HR"
		},
		{
			"id": 3,
			"name": "Rohit",
			"dept": "IT"
		}
	]
}
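If you would rather not use a text editor, the same file can be produced from Python. This is only a sketch using the standard json module; it assumes you want input.json written to the current working directory.

```python
import json

# The same employee records shown above, as a Python dictionary.
employees = {
    "employees": [
        {"id": 1, "name": "Amit", "dept": "IT"},
        {"id": 2, "name": "Asif", "dept": "HR"},
        {"id": 3, "name": "Rohit", "dept": "IT"},
    ]
}

# Write the dictionary to input.json in the current working directory.
with open("input.json", "w", encoding="utf-8") as f:
    json.dump(employees, f, indent=4)
```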

Read the JSON File

Importing Pandas library

We work with the JSON data using pandas, the Python data analysis library.
import pandas as pd

Pandas read_json()

To read a JSON file with Pandas, we use the read_json() function and pass it the path of the file we want to read. It returns a Pandas DataFrame that stores the data in rows and columns.
Syntax
pandas.read_json(path_or_buf=None, orient=None, typ='frame', dtype=None, convert_axes=None,
 convert_dates=True, keep_default_dates=True, numpy=False, precise_float=False, date_unit=None,
 encoding=None, encoding_errors='strict', lines=False, chunksize=None, compression='infer',
 nrows=None, storage_options=None)
Parameters
  • path_or_buf : a valid JSON str, path object or file-like object
  • orient : str
  • typ : {‘frame’, ‘series’}, default ‘frame’
  • dtype : bool or dict, default None
  • convert_axes : bool, default None
  • convert_dates : bool or list of str, default True
  • keep_default_dates : bool, default True
  • numpy : bool, default False
  • precise_float : bool, default False
  • date_unit : str, default None
  • encoding : str, default is ‘utf-8’
  • encoding_errors : str, optional, default “strict”
  • lines : bool, default False
  • chunksize : int, optional
  • compression : {‘infer’, ‘gzip’, ‘bz2’, ‘zip’, ‘xz’, None}, default ‘infer’
  • nrows : int, optional
  • storage_options : dict, optional
Returns
  • Series or DataFrame
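As a rough illustration of two of these parameters, the sketch below reads the same records once as JSON Lines (lines=True) and once as a JSON array (orient="records"). The in-memory strings are made-up sample data, and io.StringIO stands in for a file so the example needs nothing on disk.

```python
import io

import pandas as pd

# Hypothetical JSON Lines text: one JSON object per line.
json_lines = io.StringIO(
    '{"id": 1, "name": "Amit", "dept": "IT"}\n'
    '{"id": 2, "name": "Asif", "dept": "HR"}\n'
)
df_lines = pd.read_json(json_lines, lines=True)

# The same records as a single JSON array of objects.
json_array = io.StringIO(
    '[{"id": 1, "name": "Amit", "dept": "IT"},'
    ' {"id": 2, "name": "Asif", "dept": "HR"}]'
)
df_records = pd.read_json(json_array, orient="records")

print(df_lines)
print(df_records)
```

Both calls should produce a two-row DataFrame with the columns id, name and dept.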
Example

import pandas as pd

# Read input.json from the current working directory into a DataFrame.
data = pd.read_json('input.json')
print(data)
Output
                                  employees
0   {'id': 1, 'name': 'Amit', 'dept': 'IT'}
1   {'id': 2, 'name': 'Asif', 'dept': 'HR'}
2  {'id': 3, 'name': 'Rohit', 'dept': 'IT'}
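In the output above, each employee record is still a dictionary packed into a single employees column. One possible way to get one column per field is pandas.json_normalize, sketched below. The JSON is embedded as a string (a copy of input.json) only to keep the example self-contained.

```python
import json

import pandas as pd

# Inline copy of the input.json content, so no file is needed here.
raw = json.loads("""
{
    "employees": [
        {"id": 1, "name": "Amit", "dept": "IT"},
        {"id": 2, "name": "Asif", "dept": "HR"},
        {"id": 3, "name": "Rohit", "dept": "IT"}
    ]
}
""")

# Flatten the list of records into one column per key.
employees = pd.json_normalize(raw["employees"])
print(employees)
```

The resulting DataFrame has the columns id, name and dept, which is usually easier to filter and group than a column of dictionaries.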

Convert the object to a JSON string using dataframe.to_json

Syntax
DataFrame.to_json(path_or_buf=None, orient=None, date_format=None, double_precision=10,
 force_ascii=True, date_unit='ms', default_handler=None, lines=False, compression='infer', 
index=True, indent=None, storage_options=None)
Parameters
  • path_or_buf : str or file handle, optional
  • orient : str
  • date_format : {None, ‘epoch’, ‘iso’}
  • double_precision : int, default 10
  • force_ascii : bool, default True
  • date_unit : str, default ‘ms’ (milliseconds)
  • default_handler : callable, default None
  • lines : bool, default False
  • compression : {‘infer’, ‘gzip’, ‘bz2’, ‘zip’, ‘xz’, None}
  • index : bool, default True
  • indent : int, optional
  • storage_options : dict, optional 
Returns
  • None or str
Example
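import json

import pandas as pd

# Build a small DataFrame with labeled rows and columns.
df = pd.DataFrame(
    [["x", "y"], ["w", "z"]],
    index=["row_1", "row_2"],
    columns=["col_1", "col_2"],
)

# orient="split" serializes columns, index and data under separate keys.
result = df.to_json(orient="split")

# Parse the string again so it can be pretty-printed with indentation.
parsed = json.loads(result)
print(json.dumps(parsed, indent=3))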
Output
{
   "columns": [
      "col_1",
      "col_2"
   ],
   "index": [
      "row_1",
      "row_2"
   ],
   "data": [
      [
         "x",
         "y"
      ],
      [
         "w",
         "z"
      ]
   ]
}
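The orient parameter controls the shape of the JSON that to_json produces. As a small sketch of the difference, the code below serializes the same DataFrame with orient="records" and then reads an orient="split" string back with read_json. Wrapping the string in io.StringIO is a choice made here so that read_json receives a file-like object.

```python
import io

import pandas as pd

df = pd.DataFrame(
    [["x", "y"], ["w", "z"]],
    index=["row_1", "row_2"],
    columns=["col_1", "col_2"],
)

# orient="records" emits one JSON object per row and drops the index labels.
records_json = df.to_json(orient="records")
print(records_json)

# orient="split" keeps the index, so the frame can be rebuilt from the string.
split_json = df.to_json(orient="split")
restored = pd.read_json(io.StringIO(split_json), orient="split")
print(restored)
```

Use "records" when the row labels do not matter, and "split" when the index and column labels need to survive a round trip.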