This is a revision for myself on threading.
To use threads in Python, the threading module
has to be imported.
    urls = {
        "google": "https://google.com",
        "youtube": "https://youtube.com",
        "twitter": "https://twitter.com",
        "facebook": "https://facebook.com",
        "reddit": "https://reddit.com",
        "linkedin": "https://linkedin.com",
        "stackoverflow": "https://stackoverflow.com",
        "python": "https://www.python.org",
        "distrowatch": "https://distrowatch.org"
    }
The dictionary above maps a short name to a website URL. The script saves the body of each website into a file named after its key.
For each key in urls, a thread object is created. The Thread constructor takes these arguments:
target is the function to call, name is the name of the thread (this argument is optional),
and args is a tuple of the arguments passed to target.
After the thread object is created, it is appended to a list of threads and then started.
After the loop has finished, the main thread joins each thread in the list. The loop below waits for all threads to finish.
    for thread in threads:
        thread.join()
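The steps described so far (create, append, start, then join) can be tried in isolation with a small sketch that avoids the network. The `record_square` worker and the `results` dictionary are hypothetical names used only for illustration; they are not part of the download script in this post.

```python
import threading

results = {}  # hypothetical shared dict, filled in by the worker threads


def record_square(label, number):
    # Hypothetical worker: stores number squared under the given label.
    results[label] = number * number


threads = []
for label, number in {"two": 2, "three": 3, "four": 4}.items():
    # target is the callable, name is optional, args must be a tuple.
    t = threading.Thread(target=record_square, name=label, args=(label, number))
    threads.append(t)
    t.start()

# join() blocks until the given thread has finished.
for t in threads:
    t.join()

print(sorted(results.items()))
```

Once the join loop has returned, every worker has finished, so it is safe to read `results` from the main thread.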
Before threading…
Without threading, the for loop downloads the sites sequentially, one after another.
The time taken to finish the task was about 9 seconds.
    import requests
    from time import time

    urls = {
        "google": "https://google.com",
        "youtube": "https://youtube.com",
        "twitter": "https://twitter.com",
        "facebook": "https://facebook.com",
        "reddit": "https://reddit.com",
        "linkedin": "https://linkedin.com",
        "stackoverflow": "https://stackoverflow.com",
        "python": "https://www.python.org",
        "distrowatch": "https://distrowatch.org"
    }


    def get_http_data(filename, url):
        resp = requests.get(url)
        with open(filename, "wb") as file:
            file.write(resp.content)


    if __name__ == '__main__':
        start = time()
        for key in urls.keys():
            print("Saving {}...".format(key))
            get_http_data(key, urls[key])
            print("{} saved...".format(key))

        print("End time:{}".format(time() - start))
After threading is used…
With threading, the time taken to finish downloading and saving the contents dropped to 3 seconds. The work is mostly waiting on the network, so the threads can overlap that waiting time.
    import requests
    import threading
    from time import time

    urls = {
        "google": "https://google.com",
        "youtube": "https://youtube.com",
        "twitter": "https://twitter.com",
        "facebook": "https://facebook.com",
        "reddit": "https://reddit.com",
        "linkedin": "https://linkedin.com",
        "stackoverflow": "https://stackoverflow.com",
        "python": "https://www.python.org",
        "distrowatch": "https://distrowatch.org"
    }

    threads = list()


    def get_http_data(filename, url):
        # Download the page and write the raw response body to a file.
        resp = requests.get(url)
        with open(filename, "wb") as file:
            file.write(resp.content)


    if __name__ == '__main__':
        start = time()
        for key in urls.keys():
            t = threading.Thread(target=get_http_data, name=key, args=(key, urls[key]))
            threads.append(t)
            # Use t.name here: current_thread() in this loop is the main thread.
            print("Start thread {} saving {}".format(t.name, key))
            t.start()

        # Wait for every download thread to finish before reporting the time.
        for thread in threads:
            thread.join()
            print("Thread {} finished...".format(thread.name))

        print("Finished all threads at {} seconds...".format(time() - start))
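To see where the speed-up comes from without depending on the network, the download can be replaced by a short sleep. This is only a rough sketch: `fake_download`, the 0.2 second delay, and the five task names are assumptions made for illustration, and real timings will vary with the sites and the connection.

```python
import threading
from time import perf_counter, sleep

DELAY = 0.2  # assumed stand-in for the time spent waiting on a server
names = ["a", "b", "c", "d", "e"]  # hypothetical task names


def fake_download(name):
    # Hypothetical stand-in for requests.get(): it only waits, like blocking I/O.
    sleep(DELAY)


def run_sequential():
    # Run the tasks one after another and return the elapsed time.
    start = perf_counter()
    for name in names:
        fake_download(name)
    return perf_counter() - start


def run_threaded():
    # Run one thread per task, wait for all of them, and return the elapsed time.
    start = perf_counter()
    threads = [threading.Thread(target=fake_download, name=n, args=(n,)) for n in names]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return perf_counter() - start


sequential_time = run_sequential()
threaded_time = run_threaded()
print("Sequential: {:.2f} seconds".format(sequential_time))
print("Threaded:   {:.2f} seconds".format(threaded_time))
```

The sequential run waits for each delay in turn, roughly five times DELAY, while the threaded run overlaps the waits. This is the same effect seen with the real downloads.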