
Python - Building URLs

Neha Kumawat

2 years ago

Build_URL

Split the URLs
URL Quoting

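The requests module can help us build URLs and manipulate parts of a URL dynamically. Its requests.compat submodule re-exports URL helpers such as urljoin, urlparse and quote from Python's standard urllib.parse module, so the same functions can also be imported from urllib.parse directly. Any sub-directory of a URL can be extracted programmatically, and parts of it can then be substituted with new values to build new URLs.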
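As a rough sketch of substituting part of a URL, the snippet below swaps the last path segment of a URL for a new value. It uses the standard library's urllib.parse directly. replace_last_segment is a hypothetical helper written for illustration; it is not part of requests or urllib.

```python
from urllib.parse import urlsplit, urlunsplit


def replace_last_segment(url: str, new_segment: str) -> str:
    """Hypothetical helper: return url with its final path segment replaced."""
    parts = urlsplit(url)
    # '/questions/3764291' -> ['', 'questions', '3764291']
    segments = parts.path.split('/')
    segments[-1] = new_segment
    # SplitResult is a named tuple, so _replace returns a modified copy;
    # the query and fragment are carried over unchanged.
    return urlunsplit(parts._replace(path='/'.join(segments)))


print(replace_last_segment('https://stackoverflow.com/questions/3764291', '3764299'))
# https://stackoverflow.com/questions/3764299
print(replace_last_segment('https://example.com/a/b?x=1', 'c'))
# https://example.com/a/c?x=1
```

Working on the parsed components rather than on the raw string keeps the query string and fragment intact.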

Build_URL

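The example below uses urljoin to navigate the different sub-folders of a URL path. urljoin resolves a relative reference (such as '.', '..', a new path, a query string or a fragment) against a base URL, which lets us add new values to the base URL.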

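from requests.compat import urljoin  # same function as urllib.parse.urljoin

base = 'https://stackoverflow.com/questions/3764291'
print(urljoin(base, '.'))          # current "directory" of the base path
print(urljoin(base, '..'))         # one level up
print(urljoin(base, '...'))        # '...' is an ordinary segment, not a navigation step
print(urljoin(base, '/3764299/'))  # a leading '/' replaces the whole path
url_query = urljoin(base, '?vers=1.0')         # attach a query string
print(url_query)
url_sec = urljoin(url_query, '#section-5.4')   # attach a fragment
print(url_sec)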

When we run the above program, we get the following output:

https://stackoverflow.com/questions/
https://stackoverflow.com/
https://stackoverflow.com/questions/...
https://stackoverflow.com/3764299/
https://stackoverflow.com/questions/3764291?vers=1.0
https://stackoverflow.com/questions/3764291?vers=1.0#section-5.4
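One detail of urljoin that often surprises people is how it treats a trailing slash on the base URL. The sketch below uses a made-up base URL on example.com purely for illustration:

```python
from urllib.parse import urljoin

# Without a trailing slash, the last segment of the base is treated like a
# "file" and is replaced by the relative reference.
print(urljoin('https://example.com/api/v1', 'users'))
# https://example.com/api/users

# With a trailing slash, the base path is treated like a "directory" and the
# relative reference is appended to it.
print(urljoin('https://example.com/api/v1/', 'users'))
# https://example.com/api/v1/users

# A relative reference starting with '/' replaces the whole path.
print(urljoin('https://example.com/api/v1/', '/users'))
# https://example.com/users
```

So when appending segments to a base path, it is usually safest to make sure the base ends with '/' and the relative part does not start with one.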

Split the URLs

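A URL can also be split into its component parts. The urlparse method separates the scheme and network location from the path and from any extra parameters, query string or fragment attached to the URL, as shown below.

urlparse parses a URL into 6 components and returns them as a 6-tuple (a named tuple): (scheme, netloc, path, params, query, fragment). The related urlsplit method does the same but returns a 5-tuple without params: (scheme, netloc, path, query, fragment).

Note that neither method breaks the components up into smaller pieces (for example, netloc stays a single string), and neither decodes % escapes.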

Example 1:

from requests.compat import urlparse  # same function as urllib.parse.urlparse
url1 = 'https://docs.python.org/2/py-modindex.html#cap-f'  # URL with a fragment
url2 = 'https://docs.python.org/2/search.html?q=urlparse'  # URL with a query string
print(urlparse(url1))
print(urlparse(url2))

When we run the above program, we get the following output:


ParseResult(scheme='https', netloc='docs.python.org', path='/2/py-modindex.html', params='', query='', fragment='cap-f')
ParseResult(scheme='https', netloc='docs.python.org', path='/2/search.html', params='', query='q=urlparse', fragment='')
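Because ParseResult is a named tuple, its fields can be read by name, and it also exposes a few computed attributes such as hostname and port. The sketch below uses a variation of url2 with an explicit port and an extra query parameter (both added purely for illustration), and parse_qs from urllib.parse to decode the query string:

```python
from urllib.parse import urlparse, parse_qs

result = urlparse('https://docs.python.org:443/2/search.html?q=urlparse&check_keywords=yes')

# Named-tuple fields
print(result.netloc)    # docs.python.org:443
# Computed attributes derived from netloc
print(result.hostname)  # docs.python.org
print(result.port)      # 443

# parse_qs turns the query string into a dict mapping names to lists of values
print(parse_qs(result.query))  # {'q': ['urlparse'], 'check_keywords': ['yes']}
```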

Example 2:

from urllib.parse import urlsplit  # Python 3 (in Python 2 this lived in the urlparse module)
print(urlsplit("https://www.insideaiml.com/blog/"))

Output

SplitResult(scheme='https', netloc='www.insideaiml.com', path='/blog/', query='', fragment='')
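The difference between urlparse and urlsplit only shows up when the last path segment carries ';' parameters. The URL below is a made-up example chosen to show that case, together with urlunsplit rebuilding the URL from its five parts:

```python
from urllib.parse import urlparse, urlsplit, urlunsplit

url = 'https://example.com/files;type=a?id=7#top'

# urlparse moves the ';params' part of the last path segment into .params
print(urlparse(url).path)    # /files
print(urlparse(url).params)  # type=a

# urlsplit leaves it inside the path
print(urlsplit(url).path)    # /files;type=a

# urlunsplit reassembles the five components into the original URL
print(urlunsplit(urlsplit(url)) == url)  # True
```

Since ';' parameters are rarely used today, urlsplit is often the simpler choice.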

URL Quoting

Each part of a URL, e.g. the path info, the query, etc., has a different set of reserved characters that must be quoted.

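RFC 3986 Uniform Resource Identifier (URI): Generic Syntax defines the following unreserved and reserved characters.

unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"
reserved   = gen-delims / sub-delims
gen-delims = ":" / "/" / "?" / "#" / "[" / "]" / "@"
sub-delims = "!" / "$" / "&" / "'" / "(" / ")" / "*" / "+" / "," / ";" / "="

Each of the reserved characters is reserved in some component of a URL, but not necessarily in all of them.

Python 3.7 switched from RFC 2396 to RFC 3986 for quoting URL strings. As a result, "~" is now in the set of unreserved characters, so quote no longer percent-encodes it.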
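A quick way to see the reserved and unreserved sets in action is to call quote with safe='' so that not even '/' is kept. This is a minimal check and assumes Python 3.7 or later for the '~' behaviour:

```python
from urllib.parse import quote

# With safe='', every reserved character is percent-encoded
print(quote(':/?#[]@', safe=''))      # %3A%2F%3F%23%5B%5D%40
print(quote("!$&'()*+,;=", safe=''))  # %21%24%26%27%28%29%2A%2B%2C%3B%3D

# Unreserved characters (letters, digits, '-', '.', '_', '~') are never encoded
print(quote('~user/file-name_v1.0.txt'))  # ~user/file-name_v1.0.txt
```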

The quote function lives in urllib.parse (requests.compat re-exports it as well). By default it is intended for quoting the path section of a URL, so its safe parameter defaults to '/' and it does not encode slashes. The slash is a reserved character, but in typical usage quote is called on a whole path where the existing slashes must be kept as segment separators.
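To illustrate the default safe='/' behaviour, here is a made-up path quoted as a whole, and a made-up value that contains a slash but is meant to be a single path segment:

```python
from urllib.parse import quote

path = '/docs/El Niño/report 1.pdf'
# Default safe='/': slashes stay, because they separate path segments
print(quote(path))              # /docs/El%20Ni%C3%B1o/report%201.pdf

segment = 'a/b test'
# safe='' also encodes '/', so the value stays one segment
print(quote(segment, safe=''))  # a%2Fb%20test
```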

The string and safe arguments may be either str or bytes objects. The encoding and errors arguments must not be specified if string is a bytes object.

The optional encoding and errors parameters specify how to deal with non-ASCII characters, as accepted by the str.encode method. By default, encoding='utf-8' (characters are encoded with UTF-8) and errors='strict' (unsupported characters raise a UnicodeEncodeError).
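The following sketch exercises the encoding and errors parameters and the bytes restriction described above. The strings are arbitrary examples:

```python
from urllib.parse import quote

# str input is encoded with UTF-8 by default: 'ñ' becomes two bytes
print(quote('Niño'))                      # Ni%C3%B1o
# With Latin-1, 'ñ' becomes a single byte
print(quote('Niño', encoding='latin-1'))  # Ni%F1o

# errors='strict' (the default) raises if the encoding cannot represent a character
try:
    quote('Niño', encoding='ascii')
except UnicodeEncodeError as exc:
    print('UnicodeEncodeError:', exc.reason)

# bytes input is percent-encoded as-is
print(quote(b'Ni\xf1o'))                  # Ni%F1o
# so passing encoding (or errors) together with bytes is rejected
try:
    quote(b'Ni\xf1o', encoding='utf-8')
except TypeError as exc:
    print('TypeError:', exc)
```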

Example:

quote('/El Niño/') yields '/El%20Ni%C3%B1o/'
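Quoting is only half of the story when building URLs. As a small sketch (the host and path are made up), unquote reverses quote, and urlencode quotes query-string values. It uses quote_plus by default, so spaces become '+':

```python
from urllib.parse import quote, unquote, urlencode, urlunsplit

# unquote reverses quote
print(unquote('/El%20Ni%C3%B1o/'))  # /El Niño/

# urlencode quotes each key and value and joins them with '&'
query = urlencode({'q': 'El Niño', 'page': 2})
print(query)  # q=El+Ni%C3%B1o&page=2

# Assemble a full URL from a quoted path and the encoded query
url = urlunsplit(('https', 'example.com', quote('/search/El Niño'), query, ''))
print(url)  # https://example.com/search/El%20Ni%C3%B1o?q=El+Ni%C3%B1o&page=2
```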


I hope you enjoyed reading this article and that you now know how to build, split and quote URLs in Python.

For more such blogs/courses on data science, machine learning, artificial intelligence and emerging new technologies do visit us at InsideAIML.

Thanks for reading…

Happy Learning…
