Python - Building URLs

Neha Kumawat

5 months ago

Table of Contents
  • Build_URL
  • Split the URLS
  • URL Quoting
The requests module can help us build URLs and manipulate their values dynamically. Through requests.compat it re-exports the URL helpers of Python's standard urllib.parse module. With them, any part of a URL's path can be extracted programmatically and replaced with new values to build new URLs.

Build_URL

The example below uses urljoin to resolve relative references against a base URL. Depending on the reference, urljoin moves to a parent folder of the path, replaces the path entirely, or appends a query string or fragment.
from requests.compat import urljoin
base='https://stackoverflow.com/questions/3764291'
print(urljoin(base,'.'))
print(urljoin(base,'..'))
print(urljoin(base,'...'))
print(urljoin(base,'/3764299/'))
url_query = urljoin(base,'?vers=1.0')
print(url_query)
url_sec = urljoin(url_query,'#section-5.4')
print(url_sec)
When we run the above program, we get the following output −
https://stackoverflow.com/questions/
https://stackoverflow.com/
https://stackoverflow.com/questions/...
https://stackoverflow.com/3764299/
https://stackoverflow.com/questions/3764291?vers=1.0
https://stackoverflow.com/questions/3764291?vers=1.0#section-5.4
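One detail the output above hints at is that urljoin treats the last path segment of the base as a file name unless the base ends with a slash. The short sketch below illustrates this with the same base URL; the 'answers' reference and the example.com address are made up for illustration, and urljoin is imported from urllib.parse, which is the function requests.compat re-exports.

```python
from urllib.parse import urljoin

base = 'https://stackoverflow.com/questions/3764291'

# Without a trailing slash, the last segment ('3764291') is replaced.
no_slash = urljoin(base, 'answers')

# With a trailing slash, the reference is appended below that segment.
with_slash = urljoin(base + '/', 'answers')

# An absolute reference ignores the base completely.
absolute = urljoin(base, 'https://example.com/other')

print(no_slash)
print(with_slash)
print(absolute)
```

If the intent is to add a sub-path, it is usually safest to make sure the base ends with '/' before joining.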

Split the URLS

URLs can also be split into their component parts. The urlparse method separates the query parameters and the fragment attached to a URL from the main address, as shown below.
urlparse parses a URL into 6 components:
<scheme>://<netloc>/<path>;<params>?<query>#<fragment>
It returns a 6-tuple: (scheme, netloc, path, params, query, fragment).
The related urlsplit function returns a 5-tuple instead: (scheme, netloc, path, query, fragment). It does not split params out of the path.
Note that neither function breaks the components up into smaller bits; for example, netloc stays a single string.
Example 1:
from requests.compat import urlparse
url1 = 'https://docs.python.org/2/py-modindex.html#cap-f'
url2='https://docs.python.org/2/search.html?q=urlparse'
print(urlparse(url1))
print(urlparse(url2))
When we run the above program, we get the following output −

ParseResult(scheme='https', netloc='docs.python.org', path='/2/py-modindex.html', params='', query='', fragment='cap-f')
ParseResult(scheme='https', netloc='docs.python.org', path='/2/search.html', params='', query='q=urlparse', fragment='')
Example 2:
# In Python 3 the old urlparse module lives in urllib.parse
from urllib.parse import urlsplit
print(urlsplit("https://www.insideaiml.com/blog/"))
Output
SplitResult(scheme='https', netloc='www.insideaiml.com', path='/blog/', query='', fragment='')
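The parsed result is a named tuple, so its fields can be read by name and swapped out to build a new URL. The sketch below is one possible way to do this with urllib.parse, reusing the search URL from Example 1; the replacement value 'urljoin' is only an illustrative choice.

```python
from urllib.parse import urlparse, urlunparse, parse_qs, urlencode

parts = urlparse('https://docs.python.org/2/search.html?q=urlparse')

# Fields are available by name as well as by position.
host = parts.netloc
path = parts.path

# parse_qs turns the query string into a dict of lists.
query = parse_qs(parts.query)

# Replace the query and reassemble the URL.
new_query = urlencode({'q': 'urljoin'})
new_url = urlunparse(parts._replace(query=new_query))

print(host, path, query)
print(new_url)
```

Because the result is an immutable tuple, _replace returns a modified copy and leaves the original parts untouched.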

URL Quoting

Each part of a URL, e.g. the path info, the query, etc., has a different set of reserved characters that must be quoted.
RFC 3986, Uniform Resource Identifier (URI): Generic Syntax, lists the following reserved and unreserved characters.
unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"
reserved = gen-delims / sub-delims
gen-delims = ":" / "/" / "?" / "#" / "[" / "]" / "@"
sub-delims = "!" / "$" / "&" / "'" / "(" / ")" / "*" / "+" / "," / ";" / "="
Each of the reserved characters is reserved in some component of a URL, but not necessarily in all of them.
Python 3.7 moved from RFC 2396 to RFC 3986 for quoting URL strings. As a result, "~" is now in the set of unreserved characters, so quote leaves it unencoded.
By default, the quote function is intended for quoting the path section of a URL. Thus, it will not encode '/'. This character is reserved, but in typical usage the quote function is being called on a path where the existing slash characters are used as reserved characters.
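To make the default behaviour concrete, the following sketch compares quote with its default safe='/' against a call with safe='' and against quote_plus, which targets query-string values. The sample strings are invented for illustration, and the last line assumes Python 3.7 or later.

```python
from urllib.parse import quote, quote_plus

# Default safe='/' keeps slashes, so a whole path can be quoted at once.
path = quote('/docs/my file.txt')

# safe='' also encodes '/', which suits a single path segment.
segment = quote('a/b c', safe='')

# quote_plus encodes spaces as '+' and does not treat '/' as safe.
form_value = quote_plus('a b&c')

# '~' is unreserved, so it is left alone (Python 3.7+).
tilde = quote('~user/notes')

print(path, segment, form_value, tilde)
```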
The full signature is quote(string, safe='/', encoding=None, errors=None). string and safe may be either str or bytes objects. encoding and errors must not be specified if string is a bytes object.
The optional encoding and errors parameters specify how to deal with non-ASCII characters, as accepted by the str.encode method. By default, encoding='utf-8' (characters are encoded with UTF-8), and errors='strict' (unsupported characters raise a UnicodeEncodeError).
Example: 
quote('/El Niño/') yields '/El%20Ni%C3%B1o/'
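The effect of the encoding and errors parameters can be seen in a short sketch built on the same string. The choice of 'latin-1' and 'ascii' here is only for illustration; unquote is used at the end to show the reverse operation.

```python
from urllib.parse import quote, unquote

text = '/El Niño/'

utf8 = quote(text)                         # default encoding is UTF-8
latin1 = quote(text, encoding='latin-1')   # 'ñ' becomes a single byte
from_bytes = quote(text.encode('utf-8'))   # bytes input, no encoding argument

# With errors='strict' (the default), an unencodable character raises.
try:
    quote(text, encoding='ascii')
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True

# errors='replace' substitutes '?', which is then percent-encoded.
replaced = quote(text, encoding='ascii', errors='replace')

# unquote reverses the quoting (UTF-8 by default).
round_trip = unquote(utf8)

print(utf8, latin1, replaced, round_trip)
```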
I hope you enjoyed reading this article and now know how to build URLs in Python.
Thanks for reading…
Happy Learning…
