[python]Threading

This is a revision note for myself on threading.
To use threads in Python, the threading module from the standard library has to be imported.

urls = {
    "google": "https://google.com",
    "youtube": "https://youtube.com",
    "twitter": "https://twitter.com",
    "facebook": "https://facebook.com",
    "reddit": "https://reddit.com",
    "linkedin": "https://linkedin.com",
    "stackoverflow": "https://stackoverflow.com",
    "python": "https://www.python.org",
    "distrowatch": "https://distrowatch.org"
}

The dictionary above holds the websites to fetch. The script saves the body of each website's response into a file.

For each key in urls, a Thread object is created. The constructor takes these arguments: target is the function to call, name is the name of the thread (optional), and args is a tuple of the arguments to pass to target.

After the Thread object is created, it is appended to a list of threads and then started.

After the loop has finished, the threads are joined. The loop below waits for all threads to finish.

    for thread in threads:
        thread.join()
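
The create, append, start and join steps described above can be tried without any network access. This is only a minimal sketch: record_length, results and samples are hypothetical names made up for illustration, standing in for the download function and the urls dictionary.

```python
import threading

results = {}  # hypothetical shared dict, filled in by the worker threads


def record_length(name, text):
    # Hypothetical stand-in for the download function: no network needed.
    results[name] = len(text)


samples = {"a": "alpha", "b": "beta", "c": "gamma"}  # made-up data

threads = []
for key in samples:
    # target is the callable, name labels the thread, args must be a tuple.
    t = threading.Thread(target=record_length, name=key, args=(key, samples[key]))
    threads.append(t)
    t.start()

# Wait for every thread to finish before reading the results.
for thread in threads:
    thread.join()

print(sorted(results.items()))
```

Note that when target takes a single argument, args still has to be a tuple, for example args=(key,).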

Before threading…
Without threading, the for loop downloads the websites sequentially, one after another.

The time taken to finish the task was about 9 seconds.

import requests
from time import time


urls = {
    "google": "https://google.com",
    "youtube": "https://youtube.com",
    "twitter": "https://twitter.com",
    "facebook": "https://facebook.com",
    "reddit": "https://reddit.com",
    "linkedin": "https://linkedin.com",
    "stackoverflow": "https://stackoverflow.com",
    "python": "https://www.python.org",
    "distrowatch": "https://distrowatch.org"
}


def get_http_data(filename, url):
    # Download the page at url and write the raw response body to filename.
    resp = requests.get(url)
    with open(filename, "wb") as file:
        file.write(resp.content)


if __name__ == '__main__':
    start = time()
    for key, url in urls.items():
        print("Saving {}...".format(key))
        # Each download blocks until it completes before the next one starts.
        get_http_data(key, url)
        print("{} saved...".format(key))
    print("Elapsed time: {} seconds".format(time() - start))

After threading is used…
With threading, the time taken to download and save the contents dropped to about 3 seconds.

import requests
import threading
from time import time


urls = {
    "google": "https://google.com",
    "youtube": "https://youtube.com",
    "twitter": "https://twitter.com",
    "facebook": "https://facebook.com",
    "reddit": "https://reddit.com",
    "linkedin": "https://linkedin.com",
    "stackoverflow": "https://stackoverflow.com",
    "python": "https://www.python.org",
    "distrowatch": "https://distrowatch.org"
}

threads = list()


def get_http_data(filename, url):
    # Runs inside a worker thread: download url and save the body to filename.
    resp = requests.get(url)
    with open(filename, "wb") as file:
        file.write(resp.content)


if __name__ == '__main__':
    start = time()
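    for key, url in urls.items():
        # One thread per website; name is optional and only used for logging.
        t = threading.Thread(target=get_http_data, name=key, args=(key, url))
        threads.append(t)
        # Use t.name here: current_thread() would report the main thread.
        print("Start thread {} saving {}".format(t.name, key))
        t.start()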

    for thread in threads:
        thread.join()
        print("Thread {} finished...".format(thread.name))

    print("Finished all threads at {} seconds...".format(time() - start))
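
The speed-up above most likely comes from the threads waiting on the network at the same time, rather than one after another. A rough sketch of that effect, assuming each request can be modelled by a fixed delay: fake_download, DELAY and NAMES are hypothetical and replace the real HTTP calls with time.sleep, so the numbers only illustrate the pattern and are not a measurement of the script above.

```python
import threading
from time import sleep, time

DELAY = 0.2  # hypothetical stand-in for the time one HTTP request takes
NAMES = ["google", "youtube", "twitter"]


def fake_download(name):
    # Sleeping releases the GIL in CPython, much like waiting on a socket.
    sleep(DELAY)


def run_sequential():
    # Each fake download finishes before the next one starts.
    start = time()
    for name in NAMES:
        fake_download(name)
    return time() - start


def run_threaded():
    # All fake downloads wait at the same time, one thread each.
    start = time()
    threads = [
        threading.Thread(target=fake_download, name=n, args=(n,)) for n in NAMES
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time() - start


if __name__ == "__main__":
    print("sequential: {:.2f}s".format(run_sequential()))
    print("threaded:   {:.2f}s".format(run_threaded()))
```

The sequential run should take roughly the sum of the delays, while the threaded run should take roughly the longest single delay. This only helps for work that spends its time waiting, such as network I/O.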

Published by cyruslab