[python] Download files from an FTP server with multithreading

You may wish to enroll in the course by Ardit Sulce. To be fair to him, I will not reveal the username and password for his ftp.pyclass.com server; to use his resources for practicing and learning data science, you have to enroll in his course.

The code presented here has 5 parts:

  1. security.crypto, for encrypting and decrypting credentials.
  2. ftp_cmd, for downloading a file from the FTP server.
  3. init_cred, run first to enter the username and password for the FTP server; it writes a JSON file that stores the encrypted credential filename and the key filename.
  4. main.py, which decrypts the credentials and starts the download threads.
  5. network_threads.get_files, which uses threading to download all the files.

security.crypto


Contains the functions for encrypting and decrypting. They protect the username and password by storing them in an encrypted text file, which is useful when no vault is available. You will notice I use cryptography.fernet a lot for encryption and decryption because of its ease of use; Fernet is a recipe that uses AES with a 128-bit key in CBC mode, plus an HMAC-SHA256 signature.

In this section I break the functions down into bite-size pieces, instead of presenting the entire code with comments, for clearer documentation.

check_key() function


This function checks whether a symmetric key file already exists; if there is no encryption key yet, it creates one and writes it to a file. The caveat is that if the key is lost after it was used to encrypt data, that data is lost forever.

def check_key(key_file):
    if not exists(key_file):
        key = Fernet.generate_key()
        with open(key_file, "wb") as file:
            file.write(key)
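
To make the "create only if missing" behaviour concrete, here is a minimal sketch that runs without the cryptography package. It assumes a stand-in key generator, make_key(), which is hypothetical and not part of the original module; it only mimics the shape of a Fernet key (32 random bytes, URL-safe base64 encoded). The read_key() helper is also just for this demo. Calling check_key() a second time should leave the existing key untouched, so data encrypted earlier stays readable.

```python
import base64
import os
import tempfile
from os.path import exists, join


def make_key():
    # hypothetical stand-in for Fernet.generate_key():
    # 32 random bytes, URL-safe base64 encoded.
    return base64.urlsafe_b64encode(os.urandom(32))


def check_key(key_file, generate=make_key):
    # same logic as above: only create the key when the file is missing.
    if not exists(key_file):
        with open(key_file, "wb") as file:
            file.write(generate())


def read_key(key_file):
    # demo helper: return the raw bytes stored in the key file.
    with open(key_file, "rb") as file:
        return file.read()


with tempfile.TemporaryDirectory() as tmp:
    key_path = join(tmp, "demo.key")
    check_key(key_path)              # first call creates the key
    first_key = read_key(key_path)
    check_key(key_path)              # second call must not overwrite it
    second_key = read_key(key_path)

print(len(first_key), first_key == second_key)
```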

prompt_credential()


This function prompts the user to enter a username and password. getpass() reads the password without echoing it, so nothing the user types appears on the console.
The function then returns a dictionary with the username and password.

def prompt_credential():
    username = input("Username:")
    password = getpass()
    return {
        "username": username,
        "password": password
    }

use_key(key_file)


This function reads the key bytes from the key file and returns them. It calls check_key() first, so the key file is created if it does not exist yet.

def use_key(key_file):
    check_key(key_file)
    with open(key_file, "rb") as file:
        key_byte = file.read()
    return key_byte

encrypt(filename, data, key)


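This function encrypts the data with the Fernet key stored in the key file and writes the ciphertext to the specified filename. Note that the key argument is the name of the key file, not the key itself.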

def encrypt(filename, data, key):
    key_byte = use_key(key)
    cipher = Fernet(key_byte)
    cipher_text = cipher.encrypt(data.encode('utf-8'))
    with open(filename, "wb") as file:
        file.write(cipher_text)

decrypt(filename, key, convert_to_json=False)


This function decrypts the data read from the encrypted file, which contains the credential dictionary. By default it returns the plain text as a string; with convert_to_json=True it parses the JSON text and returns a dictionary.
The decrypted bytes have to be decoded with utf-8 to become a string.

def decrypt(filename, key, convert_to_json=False):
    key_byte = use_key(key)
    with open(filename, "rb") as file:
        cipher_text = file.read()
    cipher = Fernet(key_byte)
    plain_text = cipher.decrypt(cipher_text)
    if convert_to_json:
        return dict(json.loads(plain_text.decode('utf-8')))
    else:
        return plain_text.decode('utf-8')
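
The decode-then-parse step at the end of decrypt() can be tried on its own. This is a minimal sketch in which plain_text is a hand-written stand-in for the bytes that cipher.decrypt() would return; the credential values are made up for illustration.

```python
import json

# stand-in for the bytes returned by cipher.decrypt(); values are made up.
plain_text = json.dumps(
    {"username": "demo_user", "password": "demo_pass"}
).encode("utf-8")

# convert_to_json=False: decode the bytes into a str.
as_text = plain_text.decode("utf-8")

# convert_to_json=True: parse the JSON text into a dictionary.
as_dict = json.loads(as_text)

print(type(as_text).__name__, type(as_dict).__name__)
print(as_dict["username"])
```

json.loads already returns a dict for a JSON object, so the extra dict(...) call in decrypt() is harmless but not required.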

init_credential()


This function gets the username and password, encrypts them, and saves the result to an encrypted file. It is the main function of the module and uses the prompt_credential, check_key and encrypt functions.
init_credential() returns a dictionary with the key filename and the encrypted filename for future use.

def init_credential():
    cred = prompt_credential()
    key_filename = input("Key file name:")
    cipher_text_filename = input("Filename for encrypted file:")
    print("Creating key {}...\n".format(key_filename))
    check_key(key_filename)
    print("Key {} created...\n".format(key_filename))
    encrypt(cipher_text_filename, json.dumps(cred), key_filename)
    print("Encrypted data into file {}".format(cipher_text_filename))
    return {
        "key_filename": key_filename,
        "encrypted_filename": cipher_text_filename
    }

Entire code of security.crypto

from cryptography.fernet import Fernet
from os.path import exists
from getpass import getpass
import json


def check_key(key_file):
    if not exists(key_file):
        key = Fernet.generate_key()
        with open(key_file, "wb") as file:
            file.write(key)


def prompt_credential():
    username = input("Username:")
    password = getpass()
    return {
        "username": username,
        "password": password
    }


def use_key(key_file):
    check_key(key_file)
    with open(key_file, "rb") as file:
        key_byte = file.read()
    return key_byte


def encrypt(filename, data, key):
    key_byte = use_key(key)
    cipher = Fernet(key_byte)
    cipher_text = cipher.encrypt(data.encode('utf-8'))
    with open(filename, "wb") as file:
        file.write(cipher_text)


def decrypt(filename, key, convert_to_json=False):
    key_byte = use_key(key)
    with open(filename, "rb") as file:
        cipher_text = file.read()
    cipher = Fernet(key_byte)
    plain_text = cipher.decrypt(cipher_text)
    if convert_to_json:
        return dict(json.loads(plain_text.decode('utf-8')))
    else:
        return plain_text.decode('utf-8')


def init_credential():
    cred = prompt_credential()
    key_filename = input("Key file name:")
    cipher_text_filename = input("Filename for encrypted file:")
    print("Creating key {}...\n".format(key_filename))
    check_key(key_filename)
    print("Key {} created...\n".format(key_filename))
    encrypt(cipher_text_filename, json.dumps(cred), key_filename)
    print("Encrypted data into file {}".format(cipher_text_filename))
    return {
        "key_filename": key_filename,
        "encrypted_filename": cipher_text_filename
    }

init_cred.py run this first


Run this script first to get the username and password from the user.
json.dump serializes a Python object (here the dictionary returned by init_credential()) as JSON and writes it to a file; it takes the data and a file object, which is j here.

from security.crypto import init_credential
import json

crypto_result = init_credential()
with open("crypto_result.json", "w") as j:
    json.dump(crypto_result, j)
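
As a quick check of what ends up in crypto_result.json, the sketch below writes a dictionary with json.dump and reads it back with json.load, which is what main.py does later. The filenames in the dictionary are made-up examples, and a temporary directory is used so nothing is left on disk.

```python
import json
import tempfile
from os.path import join

# made-up example of what init_credential() returns.
crypto_result = {
    "key_filename": "ftp.key",
    "encrypted_filename": "ftp_cred.bin",
}

with tempfile.TemporaryDirectory() as tmp:
    path = join(tmp, "crypto_result.json")
    # json.dump(data, file_object) serializes the dictionary into the file.
    with open(path, "w") as j:
        json.dump(crypto_result, j)
    # json.load(file_object) parses the file back into a dictionary.
    with open(path, "r") as j:
        j_data = json.load(j)

print(j_data == crypto_result)
```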

ftp_cmd.py is the collection of FTP command functions


Currently this script has only one function, for downloading files.

from ftplib import FTP
from os import chdir

def ftp_downloader(filename, dir, cred_dict, host="ftp.pyclass.com"):
    # the with block closes the FTP connection automatically.
    with FTP(host, cred_dict['username'], cred_dict['password']) as ftp_client:
        # FTP command to change the remote working directory.
        ftp_client.cwd(dir)
        # save the downloads to the specified Windows directory.
        # note: chdir changes the working directory of the whole process,
        # so every thread writes to (and main.py later reads from) D:\temp.
        chdir("D:\\temp")
        with open(filename, "wb") as file:
            # retrbinary calls the callback once for each block received,
            # so pass the method itself (file.write), not a call (file.write()).
            ftp_client.retrbinary("RETR {}".format(filename), file.write)
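
Why file.write is passed without parentheses is easier to see with a stand-in. The sketch below does not talk to an FTP server: fake_retrbinary is a hypothetical helper that imitates how retrbinary hands each received block to the callback, and an in-memory BytesIO buffer plays the role of the local file. The payload is made-up content.

```python
import io


def fake_retrbinary(payload, callback, blocksize=8192):
    # hypothetical stand-in for FTP.retrbinary: split the payload into
    # blocks and call the callback once per block.
    for start in range(0, len(payload), blocksize):
        callback(payload[start:start + blocksize])


payload = b"station-info line\n" * 1000   # made-up file content
buffer = io.BytesIO()                      # plays the role of the opened file

calls = []


def counting_write(block):
    # wrap buffer.write so the number of callback calls can be counted.
    calls.append(len(block))
    return buffer.write(block)


fake_retrbinary(payload, counting_write, blocksize=4096)

print(len(calls), buffer.getvalue() == payload)
```

Writing file.write() instead would call the method right away (and fail, since no data is given) rather than handing the method over to be called for each block.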

Threading to download files


This is a subclass of Thread that downloads a file from the FTP server in its own thread.

from threading import Thread
from ftp_cmd import ftp_downloader


# sub class of Thread
class GetFilesFromFTP(Thread):
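    def __init__(self, filename, dir, cred_dict, name=None):
        # let Thread handle the name; with name=None it picks a default
        # such as "Thread-1" instead of storing the string "None".
        super().__init__(name=name)
        self.filename = filename
        self.dir = dir
        self.cred_dict = cred_dict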

    # overriding the run method of Thread class
    def run(self):
        print("Downloading from /{}/{}".format(self.dir, self.filename))
        ftp_downloader(self.filename, self.dir, self.cred_dict)
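
The same pattern can be tried without an FTP server. In this sketch, fake_download is a hypothetical stand-in that sleeps briefly instead of transferring a file, and DemoThread mirrors the subclass above; the file names are made up. Since the threads wait concurrently, the total time is expected to be close to one delay rather than the sum of all delays.

```python
from threading import Thread
from time import sleep, time


def fake_download(filename, delay=0.2):
    # hypothetical stand-in for ftp_downloader: wait instead of downloading.
    sleep(delay)
    return "{} done".format(filename)


class DemoThread(Thread):
    def __init__(self, filename, name=None):
        super().__init__(name=name)
        self.filename = filename
        self.result = None

    # run() is what start() executes in the new thread.
    def run(self):
        self.result = fake_download(self.filename)


files = ["a.txt", "b.txt", "c.txt"]   # made-up file names
start = time()
threads = [DemoThread(f, name=f) for f in files]
for t in threads:
    t.start()
for t in threads:
    t.join()                           # wait for each thread to finish
elapsed = time() - start

print([t.result for t in threads])
print("elapsed {:.2f}s".format(elapsed))
```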

main.py runs all scripts together


This script ties everything together: it reads the credentials, then creates one thread per file from the subclass GetFilesFromFTP.

from network_threads.get_files import GetFilesFromFTP
from security.crypto import decrypt
import json
from time import time
from os.path import getsize

# get the file names so that these names are used for reading credential.
with open("crypto_result.json", "r") as j:
    j_data = json.load(j)

# grab the credential
cred_dict = decrypt(j_data.get("encrypted_filename"),
                    j_data["key_filename"],
                    convert_to_json=True)

# files I need to download from ftp server.
files = ["data-format.txt",
         "data-technical-document.txt",
         "isd-lite-format.pdf",
         "station-info-metadata.txt",
         "station-info.txt"]

# for collecting child threads.
threads = list()

if __name__ == '__main__':
    start = time()
    # for each file create a thread.
    for file in files:
        t = GetFilesFromFTP(file, "Data", cred_dict, name=file)
        threads.append(t)
        print("Starting thread to download {}".format(file))
        t.start()

    # wait for every thread; join the loop variable, not the last t created.
    for thread in threads:
        thread.join()
        print("Thread {} is finished, joining back to main thread...".format(thread.name))
    print("All threads finished, total {} seconds...".format(time() - start))

    print("Checking the downloaded file size in bytes...\r")
    for file in files:
        print("{} is {} bytes..\r".format(file, getsize(file)))
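
As an alternative to managing Thread objects by hand, the standard library's concurrent.futures.ThreadPoolExecutor can run the same kind of work. This is only a sketch of that swapped-in technique, not what main.py does: fake_download is a hypothetical stand-in for ftp_downloader, and only two of the file names from the list above are used.

```python
from concurrent.futures import ThreadPoolExecutor
from time import sleep


def fake_download(filename):
    # hypothetical stand-in for ftp_downloader: wait instead of downloading.
    sleep(0.1)
    return filename, len(filename)


files = ["data-format.txt", "station-info.txt"]

# the with block waits for all submitted work before it exits,
# which replaces the explicit join loop.
with ThreadPoolExecutor(max_workers=len(files)) as pool:
    # map() returns the results in the same order as the input list.
    results = list(pool.map(fake_download, files))

for name, size in results:
    print("{} -> {}".format(name, size))
```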

What does the concurrent download look like, and how long does it take?


ftp_threads
