How garbage collection implemented in python?

By Albert, 2 months ago
  • Bookmark

what is the methodology behind the garbage collector of python ?

Garbage collector
1 Answer

There are two aspects to memory management and garbage collection in Python:

1. Reference counting

2. Generational garbage collection

Reference counting :

a Python object’s reference count is incremented whenever the object is referenced, and it’s decremented when an object is dereferenced. If an object’s reference count is 0, the memory for the object is deallocated.

Viewing reference counts in Python

You can use the sys module from the Python standard library to check reference counts for a particular object. There are a few ways to increase the reference count for an object, such as 

  • Assigning an object to a variable.
  • Adding an object to a data structure, such as appending to a list or adding as a property on a class instance.
  • Passing the object as an argument to a function.

Example :

import the sys module, then create a variable and check its reference count:

>>> import sys
>>> c = 'variable'     #count = 1
>>> sys.getrefcount(c) #count = 2

Notice that there are two references to our variable a. One is from creating the variable. The second is when we pass the variable a to the sys.getrefcount() function.

If you add the variable to a data structure, such as a list or a dictionary, you’ll see the reference count increase:

>>> import sys
>>> c = 'variable'   				                #count = 1
>>> x = [c] # Make a list with c as an element.	    #count = 2
>>> y = { 'key': c } # Create a dictionary with a as one of the values. #count = 3
>>> sys.getrefcount(a)  				            #count = 4

As shown above, the reference count of a increases when added to a list or a dictionary.

Generational garbage collection :

In addition to the reference counting strategy for memory management, Python also uses a method called a generational garbage collector.

Example :

>>> class MyClass(object):
>>> a = MyClass()
>>> a.obj = a
>>> del a

In the example above, we defined a new class. We then created an instance of the class and assigned the instance to be a property on itself. Finally, we deleted the instance.

By deleting the instance, it’s no longer accessible in our Python program. However, Python didn’t destroy the instance from memory. The instance doesn’t have a reference count of zero because it has a reference to itself.

We call this type of problem a reference cycle, and you can’t solve it by reference counting. This is the point of the generational garbage collector, which is accessible by the gc module in the standard library.

Using the gc module :

Import the gc module. You can then check the configured thresholds of your garbage collector with the get_threshold() method:

>>> import gc
>>> gc.get_threshold()
(700, 10, 10)

By default, Python has a threshold of 700 for the youngest generation and 10 for each of the two older generations.

You can check the number of objects in each of your generations with the get_count() method:

>>> import gc
>>> gc.get_count()
(211, 7, 1)

In this example, we have 596 objects in our youngest generation, two objects in the next generation, and one object in the oldest generation.

As you can see, Python creates a number of objects by default before you even start executing your program. You can trigger a manual garbage collection process by using the gc.collect() method:

>>> gc.get_count()
(211, 7, 1)
>>> gc.collect()
>>> gc.get_count()
(69, 0, 0)

Running a garbage collection process cleans up a huge amount of objects—577 in the first generation and three more in the older generations.

You can alter the thresholds for triggering garbage collection by using the set_threshold() method in the gc module:

>>> import gc
>>> gc.get_threshold()
(700, 10, 10)
>>> gc.set_threshold(800, 15, 15)
>>> gc.get_threshold()
(800, 15, 15)

In the example above, we increase each of our thresholds from their defaults. Increasing the threshold will reduce the frequency at which the garbage collector runs. This will be less computationally expensive in your program at the expense of keeping dead objects around longer.

Your Answer


How To Land a Job in Data Science

Jun 24th (7:00 PM) 239 Registered
More webinars

Related Discussions

Running random forest algorithm with one variable

View More