[python] Turn the interesting contents of a text file into a dictionary

Introduction

This exercise downloads the CHECKSUM file from the CentOS download page. The downloaded data is written to a temporary file, which is destroyed as soon as it is closed. The data read back from the temporary file is then processed into a dictionary, where each key is a file name and each value is its hash digest.

Tempfile usage

I am using the NamedTemporaryFile function to create a temporary file object that I can both write to and read from. NamedTemporaryFile lets me customize the prefix of the temporary file name. The mode used is w+b because the content coming from the requests response is bytes.

from tempfile import NamedTemporaryFile

# r is the response object returned by requests.get(); see the full script further down.
with NamedTemporaryFile(mode="w+b", prefix="tmp") as tmp:
    tmp.write(r.content)
    tmp.seek(0)
    data = tmp.read().decode("utf-8")

The with statement is used so that I do not need to worry about forgetting to close the file after use. After the write, the file pointer sits at the end of the file, so tmp.seek(0) is used to move it back to the beginning before reading the entire contents. The usage is the same as the file object returned by the open function in Python; if you know how to use open, learning how to use tempfile is a no-brainer.
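To make the write, rewind and read cycle easy to try without a network connection, here is a minimal sketch. The payload bytes and the "checksum_" prefix are made up for illustration; they stand in for r.content and the prefix used above. It also checks that the temporary file is gone once the with block ends.

```python
import os
from tempfile import NamedTemporaryFile

# Made-up bytes standing in for r.content from a real download.
payload = b"first line\nsecond line\n"

with NamedTemporaryFile(mode="w+b", prefix="checksum_") as tmp:
    tmp_path = tmp.name                    # full path of the temporary file
    tmp.write(payload)
    position_after_write = tmp.tell()      # the pointer now sits after the data
    tmp.seek(0)                            # rewind before reading
    data = tmp.read().decode("utf-8")
    existed_inside = os.path.exists(tmp_path)

# With the default delete=True the file is removed when it is closed.
exists_after = os.path.exists(tmp_path)

print(os.path.basename(tmp_path).startswith("checksum_"))
print(position_after_write == len(payload))
print(existed_inside, exists_after)
```

If tmp.seek(0) is left out, tmp.read() starts from the end of the data and returns empty bytes.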

Download CHECKSUM from website

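Downloading a file from a web site can be done with the requests module: call the get method with stream turned on, then save the downloaded content to a file on the computer. Note that accessing r.content reads the whole response body into memory, so stream=True only makes a difference when the body is read in chunks.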
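For larger files it may be worth reading the body piece by piece instead of all at once. The following is only a sketch of that idea, using the iter_content method of the requests response. The helper names write_chunks and download_text, the chunk size of 8192 bytes and the 30 second timeout are my own assumptions, not part of the script in this post.

```python
from tempfile import NamedTemporaryFile


def write_chunks(chunks, fileobj):
    """Write an iterable of bytes chunks to fileobj and return the byte count."""
    total = 0
    for chunk in chunks:
        if chunk:  # skip empty chunks
            fileobj.write(chunk)
            total += len(chunk)
    return total


def download_text(url, chunk_size=8192):
    """Stream url into a temporary file, then return its content as text."""
    # Imported here so that write_chunks can be tried without requests installed.
    import requests

    with requests.get(url, stream=True, timeout=30) as r, \
            NamedTemporaryFile(mode="w+b", prefix="tmp") as tmp:
        r.raise_for_status()  # stop early on HTTP errors such as 404
        write_chunks(r.iter_content(chunk_size=chunk_size), tmp)
        tmp.seek(0)
        return tmp.read().decode("utf-8")
```

For a small text file like CHECKSUM the difference is negligible, so reading r.content in one go is a reasonable choice there.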

Below is the code that downloads the checksum file from a CentOS mirror and reads its content back from the tempfile.

import requests
from tempfile import NamedTemporaryFile

checksum_dict = dict()
base_url = "http://download.nus.edu.sg/mirror/centos/8.2.2004/isos/x86_64/"
CHECKSUM = base_url + "CHECKSUM"
with requests.get(CHECKSUM, stream=True) as r, NamedTemporaryFile(mode="w+b", prefix="tmp") as tmp:
    tmp.write(r.content)
    # Reset the file pointer to the beginning of the file in order
    # to read its content from its beginning.
    tmp.seek(0)
    # The data has to be converted to string from byte, so that string method
    # splitlines can be used.
    data = tmp.read().decode("utf-8")
for line in data.splitlines():
    if "SHA256" in line:
        # Filter only the sha256 digest.
        # The key is the file name, and the value is the digest.
        fields = line.split()
        checksum_dict[fields[1].lstrip("(").rstrip(")")] = fields[-1]
print(checksum_dict)

The original checksum file content looks like this:

CHECKSUM FILE CONTENT FROM CENTOS.

After the script runs, the information is stored in a dictionary.

Converting the text into a dictionary makes the data more structured.
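To make the transformation concrete without downloading anything, here is a small offline sketch. The sample text is made up: the "SHA256 (name) = digest" layout is inferred from the way the script above splits each line, the comment lines are an assumption about what else the file may contain, and the digests are placeholders rather than real CentOS values. It uses a regular expression instead of split, which is a stricter variant of the same filter, and parse_checksums is a hypothetical helper name.

```python
import re

# Placeholder digests, not real CentOS values.
DIGEST_A = "a" * 64
DIGEST_B = "b" * 64

# Made-up sample in the assumed "SHA256 (name) = digest" layout.
SAMPLE = (
    "# example-boot.iso: 1024 bytes\n"
    f"SHA256 (example-boot.iso) = {DIGEST_A}\n"
    "# example-dvd1.iso: 2048 bytes\n"
    f"SHA256 (example-dvd1.iso) = {DIGEST_B}\n"
)

# Matches lines such as: SHA256 (file name) = 64 hex characters
LINE_RE = re.compile(r"^SHA256 \((?P<name>.+)\) = (?P<digest>[0-9a-fA-F]{64})$")


def parse_checksums(text):
    """Return {file name: SHA256 digest} for every matching line in text."""
    result = {}
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            result[match.group("name")] = match.group("digest")
    return result


checksums = parse_checksums(SAMPLE)
print(checksums)
```

Because the pattern requires the full layout, a line that merely mentions SHA256 somewhere is skipped instead of ending up in the dictionary.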