[python]Threading

These are my own revision notes on threading.
To use threads in Python, import the threading module from the standard library.

urls = {
    "google": "https://google.com",
    "youtube": "https://youtube.com",
    "twitter": "https://twitter.com",
    "facebook": "https://facebook.com",
    "reddit": "https://reddit.com",
    "linkedin": "https://linkedin.com",
    "stackoverflow": "https://stackoverflow.com",
    "python": "https://www.python.org",
    "distrowatch": "https://distrowatch.org"
}

The dictionary above maps site names to URLs. The script downloads the response body of each site and saves it to a file named after the key.

For each key in urls, a Thread object is created. The constructor takes these arguments: target is the function to call, name is an optional name for the thread, and args is a tuple of the arguments to pass to target.
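As a quick reminder of how those three arguments fit together, here is a minimal sketch. The function save_page is a hypothetical stand-in that only prints its arguments, so nothing is downloaded. Note that creating a Thread does not run it; the thread is not alive until start() is called.

```python
import threading


def save_page(filename, url):
    # Hypothetical stand-in for the real download function:
    # it only reports what it was asked to do.
    print("{} would save {} to {}".format(
        threading.current_thread().name, url, filename))


# target: the function to call in the new thread
# name:   optional label, handy in log messages
# args:   tuple of positional arguments passed to target
t = threading.Thread(
    target=save_page,
    name="google",
    args=("google", "https://google.com"),
)

print(t.name)        # the name given above
print(t.is_alive())  # the thread has been created but not started
```

When target takes a single argument, args still has to be a tuple, so it needs a trailing comma, for example args=("google",).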

After the Thread object is created, it is appended to a list of threads and then started.

After the loop has finished, every thread in the list is joined. The loop below waits for all threads to finish.

    for thread in threads:
        thread.join()
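To see the create, append, start, and join steps in one place without touching the network, here is a small sketch. The worker function and the 0.1 second sleep are assumptions that stand in for real work.

```python
import threading
import time

results = []


def worker(label):
    # Simulated work; a real worker would download and save a page here.
    time.sleep(0.1)
    results.append(label)


threads = []
for label in ["a", "b", "c"]:
    t = threading.Thread(target=worker, name=label, args=(label,))
    threads.append(t)
    t.start()

# join() blocks until the given thread has finished, so after this
# loop every worker has appended its label.
for thread in threads:
    thread.join()

# The order of completion is not guaranteed, so sort before printing.
print(sorted(results))
```

Without the join loop, the final print could run while some workers are still sleeping and show an incomplete list.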

Before threading…
Without threading, the for loop handles the sites sequentially, one after another.
The time taken to finish the task was about 9 seconds.

import requests
from time import time


urls = {
    "google": "https://google.com",
    "youtube": "https://youtube.com",
    "twitter": "https://twitter.com",
    "facebook": "https://facebook.com",
    "reddit": "https://reddit.com",
    "linkedin": "https://linkedin.com",
    "stackoverflow": "https://stackoverflow.com",
    "python": "https://www.python.org",
    "distrowatch": "https://distrowatch.org"
}


def get_http_data(filename, url):
    resp = requests.get(url)
    with open(filename, "wb") as file:
        file.write(resp.content)


if __name__ == '__main__':
    start = time()
    # Fetch the sites one at a time; each request blocks until it completes.
    for key, url in urls.items():
        print("Saving {}...".format(key))
        get_http_data(key, url)
        print("{} saved...".format(key))
    print("Elapsed time: {} seconds".format(time() - start))

After threading is used…
With threading, the time taken to download and save the contents dropped to 3 seconds.

import requests
import threading
from time import time


urls = {
    "google": "https://google.com",
    "youtube": "https://youtube.com",
    "twitter": "https://twitter.com",
    "facebook": "https://facebook.com",
    "reddit": "https://reddit.com",
    "linkedin": "https://linkedin.com",
    "stackoverflow": "https://stackoverflow.com",
    "python": "https://www.python.org",
    "distrowatch": "https://distrowatch.org"
}

threads = []


def get_http_data(filename, url):
    resp = requests.get(url)
    with open(filename, "wb") as file:
        file.write(resp.content)


if __name__ == '__main__':
    start = time()
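    for key, url in urls.items():
        # One thread per site; the thread is named after the dictionary key.
        t = threading.Thread(target=get_http_data, name=key, args=(key, url))
        threads.append(t)
        # Use t.name here: threading.current_thread() would report the main thread.
        print("Start thread {} saving {}".format(t.name, key))
        t.start()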

    for thread in threads:
        thread.join()
        print("Thread {} finished...".format(thread.name))

    print("Finished all threads at {} seconds...".format(time() - start))
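The speed-up comes from the threads waiting on the network at the same time instead of one after another. The sketch below reproduces that effect offline. It is only an illustration: fake_download and the 0.2 second DELAY are assumptions that replace the real requests.get call, and perf_counter is used instead of time for a steadier clock.

```python
import threading
from time import perf_counter, sleep

DELAY = 0.2  # assumed stand-in for the network wait of one request
NAMES = ["google", "youtube", "twitter", "facebook", "reddit"]


def fake_download(name):
    # Sleeping mimics waiting for a response; no network access happens.
    sleep(DELAY)


def run_sequential(names):
    start = perf_counter()
    for name in names:
        fake_download(name)
    return perf_counter() - start


def run_threaded(names):
    start = perf_counter()
    threads = [
        threading.Thread(target=fake_download, name=n, args=(n,))
        for n in names
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return perf_counter() - start


sequential_elapsed = run_sequential(NAMES)
threaded_elapsed = run_threaded(NAMES)
print("Sequential: {:.2f} seconds".format(sequential_elapsed))
print("Threaded:   {:.2f} seconds".format(threaded_elapsed))
```

The sequential run takes roughly DELAY times the number of sites, while the threaded run takes roughly one DELAY, because the waits overlap. This only helps for work that spends its time waiting, such as network requests or file I/O.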