[python] Threading: acquiring and releasing a lock to access a shared resource.

In the previous post about threading, there was no risk of data corruption, because each thread fetched from a different URL and wrote to its own file.

What if the requirement is to append all the data fetched from multiple websites to a single file? That creates a race condition: two or more threads may try to write to the same file at the same time.

To keep a single shared resource consistent and avoid data corruption caused by the race condition, a lock is required.

A thread that has finished fetching acquires the lock, opens the file and writes to it. Once the content is written, the thread releases the lock so that the other threads that have finished fetching can acquire it in turn and do their own writes. For this to work, all threads must share the same lock object.
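The acquire/release cycle can be tried without any network access. The sketch below is only an illustration and is not part of the original script: the names counter, counter_lock, add_many and run are made up for this example, and a shared counter stands in for the shared file.

```python
import threading

counter = 0                       # the shared resource (stand-in for the file)
counter_lock = threading.Lock()   # one lock object shared by every thread


def add_many(times):
    """Increment the shared counter `times` times, holding the lock for each step."""
    global counter
    for _ in range(times):
        counter_lock.acquire()
        try:
            counter += 1
        finally:
            # always release, otherwise the other threads would wait forever.
            counter_lock.release()


def run(num_threads=4, times=10000):
    """Reset the counter, run the worker threads, and return the final value."""
    global counter
    counter = 0
    workers = [threading.Thread(target=add_many, args=(times,))
               for _ in range(num_threads)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return counter


if __name__ == "__main__":
    print(run())
```

Because every increment happens while the lock is held, the final value is always num_threads * times. The try/finally is there so that the lock is released even if the protected code raises an exception.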

import requests
import threading
from time import time


urls = {
    "google": "https://google.com",
    "youtube": "https://youtube.com",
    "twitter": "https://twitter.com",
    "facebook": "https://facebook.com",
    "reddit": "https://reddit.com",
    "linkedin": "https://linkedin.com",
    "stackoverflow": "https://stackoverflow.com",
    "python": "https://www.python.org",
    "distrowatch": "https://distrowatch.org"
}

threads = list()


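# a single lock shared by all threads; a lock created inside the function
# would be a new object on every call and would not block other threads.
get_http_lock = threading.Lock()


def get_http_data(filename, url):
    resp = requests.get(url)
    # thread acquires the lock to write to the same file.
    get_http_lock.acquire()
    try:
        with open(filename, "ab") as file:
            file.write(resp.content)
    finally:
        # release the lock, even if the write fails, so other threads can write.
        get_http_lock.release()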


if __name__ == '__main__':
    start = time()
    for key, url in urls.items():
        # each thread calls the function to fetch the url content and append it to the same file.
        t = threading.Thread(target=get_http_data, name=key, args=("onefile.txt", url))
        # add thread to the threads list.
        threads.append(t)
        # t.name is the new thread's name; current_thread() here would be the main thread.
        print("Start thread {} saving content of {} to onefile.txt".format(t.name, url))
        # start the thread
        t.start()

    # wait for each thread to finish, one by one, before exiting the main thread.
    for thread in threads:
        thread.join()
        print("Thread {} finished...".format(thread.name))

    print("Finished all threads in {} seconds...".format(time() - start))

The output: [image: t3]
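A lock can also be used as a context manager, which acquires it on entering the with block and releases it on leaving, even when an exception is raised. The following is a minimal offline sketch of the same append-to-one-file pattern, not the script above: write_lock, append_record, write_all and demo are hypothetical names, and the byte payloads stand in for the downloaded pages.

```python
import os
import tempfile
import threading

write_lock = threading.Lock()  # shared by all writer threads


def append_record(path, payload):
    """Append one payload to the shared file while holding the lock."""
    with write_lock:
        with open(path, "ab") as f:
            f.write(payload)


def write_all(path, payloads):
    """Start one thread per payload and wait for all of them to finish."""
    workers = [threading.Thread(target=append_record, args=(path, p))
               for p in payloads]
    for w in workers:
        w.start()
    for w in workers:
        w.join()


def demo():
    """Write 8 payloads to a temporary file and return (file content, payloads)."""
    payloads = [("record-%d\n" % i).encode() * 1000 for i in range(8)]
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        write_all(path, payloads)
        with open(path, "rb") as f:
            data = f.read()
    finally:
        os.remove(path)
    return data, payloads


if __name__ == "__main__":
    data, payloads = demo()
    print(len(data) == sum(len(p) for p in payloads))
```

The order in which the payloads land in the file depends on thread scheduling, so it can differ from run to run. What the lock ensures is that each payload is written as one uninterrupted block.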
