
Beautiful Soup - Beautiful Objects

David Bane

2 years ago

Beautiful Soup - Beautiful Objects | insideAIML
Table of Contents
  • Introduction
  • Comparing objects for equality
  • Copying Beautiful Soup objects

Introduction

The starting point of any Beautiful Soup project is the BeautifulSoup object. A BeautifulSoup object represents the input HTML/XML document it was created from.
We can pass either a string or a file-like object to Beautiful Soup. The markup may come from a file stored locally on our machine or from a web page.
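As a minimal sketch of the two input styles, the snippet below parses the same markup once from a string and once from a file-like object. The sample HTML and the variable names are made up for illustration, and io.StringIO only stands in for an open file or a downloaded page.

```python
import io
from bs4 import BeautifulSoup

# Hypothetical sample markup, used only for this illustration.
html = "<p>Hello, <b>world</b>!</p>"

# Parse markup supplied as a string.
soup_from_string = BeautifulSoup(html, "html.parser")

# Parse the same markup supplied as a file-like object.
# io.StringIO stands in for an open local file or a fetched web page.
soup_from_file = BeautifulSoup(io.StringIO(html), "html.parser")

print(soup_from_string.b.string)
print(str(soup_from_string) == str(soup_from_file))
```

Both calls should yield equivalent trees, since Beautiful Soup reads the file-like object's contents before parsing.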
The most common Beautiful Soup objects are:
  • Tag
  • NavigableString
  • BeautifulSoup
  • Comment
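The four object types listed above can be seen in one small document. This is an illustrative sketch only: the markup and variable names are assumptions, not part of the original article.

```python
from bs4 import BeautifulSoup, Comment, NavigableString, Tag

# Hypothetical markup containing a tag, some text and a comment.
doc = "<p>Some <b>bold</b> text<!-- a note --></p>"
soup = BeautifulSoup(doc, "html.parser")

tag = soup.b                    # a Tag: the <b> element
text = soup.b.string            # a NavigableString: the text inside <b>
comment = soup.p.contents[-1]   # a Comment: the last child of <p>

for obj in (soup, tag, text, comment):
    print(type(obj).__name__)
```

Note that Comment is a special kind of NavigableString, so an isinstance check against NavigableString also succeeds for comments.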

Comparing objects for equality

In Beautiful Soup, two NavigableString or Tag objects are equal if they represent the same HTML/XML markup.
In the example below, the two <b> tags are treated as equal even though they live in different parts of the object tree, because they both look like "<b>Java</b>".
from bs4 import BeautifulSoup

markup = "<p>Learn Python and <b>Java</b> and advanced <b>Java</b>! from Tutorialspoint</p>"
soup = BeautifulSoup(markup, "html.parser")
first_b, second_b = soup.find_all('b')
print(first_b == second_b)
# True
print(first_b.previous_element == second_b.previous_element)
# False
However, to check whether two variables refer to the same object, use Python's is operator instead of ==.
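A short sketch of the identity check follows. The markup string is written here with explicit <p> and <b> tags, which is an assumption based on the find_all('b') and soup.p calls used in this article.

```python
from bs4 import BeautifulSoup

# Assumed markup: two <b>Java</b> tags inside a single <p>.
markup = "<p>Learn Python and <b>Java</b> and advanced <b>Java</b>! from Tutorialspoint</p>"
soup = BeautifulSoup(markup, "html.parser")
first_b, second_b = soup.find_all("b")

print(first_b == second_b)   # True: both represent the markup <b>Java</b>
print(first_b is second_b)   # False: they are two distinct objects in the tree
```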

Copying Beautiful Soup objects

To create a copy of any Tag or NavigableString, use the copy.copy() function, as shown below:
import copy
p_copy = copy.copy(soup.p)
print(p_copy)
# <p>Learn Python and <b>Java</b> and advanced <b>Java</b>! from Tutorialspoint</p>
Although the original and the copy contain the same markup, they are not the same object: comparing them with == gives True, but comparing them with is gives False.
The only real difference is that the copy is completely detached from the original Beautiful Soup object tree, just as if extract() had been called on it.
This is because two different Tag objects cannot occupy the same place in the tree at the same time.
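The points above can be checked with a few lines. This is a sketch under the assumption that the markup wraps the text in a <p> tag, as the soup.p call in this article implies.

```python
import copy
from bs4 import BeautifulSoup

# Assumed markup: the same paragraph used in the equality example.
markup = "<p>Learn Python and <b>Java</b> and advanced <b>Java</b>! from Tutorialspoint</p>"
soup = BeautifulSoup(markup, "html.parser")
p_copy = copy.copy(soup.p)

print(soup.p == p_copy)       # True: same markup
print(soup.p is p_copy)       # False: a different object
print(p_copy.parent)          # None: the copy is detached from the tree
print(soup.p.parent is soup)  # True: the original is still attached
```

Because the copy has no parent, it can be inserted elsewhere in the tree without removing the original from its place.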
  
