Beautiful Soup - Beautiful Objects

David Bane

a year ago

The starting point of any Beautiful Soup project is the BeautifulSoup object. A BeautifulSoup object represents the HTML/XML document that was parsed to create it.
We can pass Beautiful Soup either a string or a file-like object, where the file-like object may be a file stored locally on our machine or the content of a web page.
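As a minimal sketch of both input styles, the snippet below builds one soup from a string and one from a file-like object. The sample markup is made up for illustration, and io.StringIO is only an assumed stand-in for an opened local file.

```python
import io
from bs4 import BeautifulSoup

# Sample markup, assumed for illustration only.
html = "<p>Learn Python and <b>Java</b></p>"

# 1. Pass the markup directly as a string.
soup_from_string = BeautifulSoup(html, "html.parser")

# 2. Pass a file-like object; StringIO stands in for an opened file here.
soup_from_file = BeautifulSoup(io.StringIO(html), "html.parser")

print(soup_from_string.b.string)  # Java
print(soup_from_file.b.string)    # Java
```

Either way, the resulting object exposes the same parsed tree.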
The most common Beautiful Soup objects are −
  • Tag
  • NavigableString
  • BeautifulSoup
  • Comment
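To make the four object types listed above concrete, here is a small sketch that pulls one of each out of a parsed document. The markup is an assumption chosen so that a tag, some text, and a comment all appear.

```python
from bs4 import BeautifulSoup, Comment, NavigableString, Tag

# Sample markup, assumed for illustration only.
soup = BeautifulSoup("<b>Java<!--a note--></b>", "html.parser")

tag = soup.b               # Tag: an HTML/XML element
text = tag.contents[0]     # NavigableString: the text inside a tag
comment = tag.contents[1]  # Comment: a special kind of NavigableString

print(type(soup).__name__)     # BeautifulSoup
print(type(tag).__name__)      # Tag
print(type(text).__name__)     # NavigableString
print(type(comment).__name__)  # Comment
```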

Comparing objects for equality

In Beautiful Soup, two NavigableString or Tag objects are equal if they represent the same HTML/XML markup.
In the example below, the two <b> tags are treated as equal even though they live in different parts of the object tree, because both look like "<b>Java</b>".
from bs4 import BeautifulSoup
markup = "<p>Learn Python and <b>Java</b> and advanced <b>Java</b>! from Tutorialspoint</p>"
soup = BeautifulSoup(markup, "html.parser")
first_b, second_b = soup.find_all('b')
print(first_b == second_b)  # True: both tags have the same markup
print(first_b.previous_element == second_b.previous_element)  # False: the preceding text differs
However, to check whether two variables refer to the very same object, use the is operator, which compares identity rather than markup.
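The identity check can be sketched as follows. The markup is assumed to carry the two <b> tags that the equality example relies on.

```python
from bs4 import BeautifulSoup

# Markup assumed to contain two identical <b> tags.
markup = "<p>Learn Python and <b>Java</b> and advanced <b>Java</b>! from Tutorialspoint</p>"
soup = BeautifulSoup(markup, "html.parser")
first_b, second_b = soup.find_all("b")

print(first_b == second_b)  # True: same markup
print(first_b is second_b)  # False: two distinct objects in the tree
print(first_b is soup.b)    # True: soup.b returns the first <b> tag itself
```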

Copying Beautiful Soup objects

To create a copy of any Tag or NavigableString, use the copy.copy() function, as shown below −
import copy
p_copy = copy.copy(soup.p)
print(p_copy)
# <p>Learn Python and <b>Java</b> and advanced <b>Java</b>! from Tutorialspoint</p>
Although the original and the copy contain the same markup, they are two distinct objects.
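A short sketch of that distinction, comparing the original tag with its copy by both equality and identity. The markup is assumed to be the same sample used in the equality example.

```python
import copy
from bs4 import BeautifulSoup

# Markup assumed to match the earlier example.
markup = "<p>Learn Python and <b>Java</b> and advanced <b>Java</b>! from Tutorialspoint</p>"
soup = BeautifulSoup(markup, "html.parser")
p_copy = copy.copy(soup.p)

print(soup.p == p_copy)  # True: same markup
print(soup.p is p_copy)  # False: not the same object
```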
The only real difference is that the copy is completely detached from the original Beautiful Soup object tree, just as if extract() had been called on it.
This is because two different Tag objects cannot occupy the same space at the same time.
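The detachment can be checked by looking at each tag's parent, as in this sketch. It again assumes the sample markup with a single <p> element parsed by html.parser.

```python
import copy
from bs4 import BeautifulSoup

# Markup assumed to match the earlier example.
markup = "<p>Learn Python and <b>Java</b> and advanced <b>Java</b>! from Tutorialspoint</p>"
soup = BeautifulSoup(markup, "html.parser")
p_copy = copy.copy(soup.p)

print(soup.p.parent is soup)  # True: the original <p> sits inside the soup
print(p_copy.parent)          # None: the copy belongs to no tree
```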
