[python] Download multiple files from different directories on an FTP server

File patterns
This post shows how to download multiple files that share the same naming pattern from different directories. On ftp.pyclass.com the directories are organized by year, and each year directory contains gzip files that follow the pattern stationId-Year. In this example I will be downloading the gzip files for station id 010010-99999 between the years 2009 and 2015, so the files will look like 010010-99999-2009.gz, 010010-99999-2010.gz and so on.
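As a small illustration of the pattern, the remote paths can be built from the station id and a range of years. This is only a sketch: the helper name remote_paths is hypothetical, and the top-level directory "Data" is taken from the main.py listing further down.

```python
def remote_paths(ftp_dir, station_id, first_year, last_year):
    """Return one remote path per year, first_year..last_year inclusive."""
    return [
        "/{}/{}/{}-{}.gz".format(ftp_dir, year, station_id, year)
        # range() excludes its end value, hence the + 1
        for year in range(first_year, last_year + 1)
    ]


paths = remote_paths("Data", "010010-99999", 2009, 2015)
for path in paths:
    print(path)
```

Each path has the form /Data/<year>/<stationId>-<year>.gz, which is the same string the download function passes to RETR.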

Limitation on using multithreaded connections
The FTP server allows a maximum of 8 connections, so at most 8 files can be downloaded at any one time. The script can be modified to launch threaded downloads instead of threaded connections.
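If there are ever more files than allowed connections, one possible way to stay under the limit is a fixed-size thread pool. The sketch below is not part of the original scripts: fake_download is a hypothetical stand-in for the real per-file download, and the pool size of 8 simply mirrors the server limit mentioned above.

```python
from concurrent.futures import ThreadPoolExecutor

MAX_CONNECTIONS = 8  # connection limit stated for the FTP server


def fake_download(year):
    """Hypothetical placeholder for a real download; returns the file name."""
    return "010010-99999-{}.gz".format(year)


def download_all(years, download=fake_download, max_workers=MAX_CONNECTIONS):
    """Call download(year) for every year, with at most max_workers running at once."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() returns results in the same order as the input years
        return list(pool.map(download, years))


results = download_all(range(2000, 2016))
print("{} files processed".format(len(results)))
```

With a pool, the 16 years in this example are processed by no more than 8 worker threads at a time, so the number of simultaneous connections never exceeds the limit even when the number of files does.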

Speed of downloading concurrently
[Image: speed1]

The modified code
This is the main.py code, which runs all the scripts together.

from network_threads.get_files import GetFilesFromFTP
from security.crypto import decrypt
import json
from time import time


# get the file names that are needed for reading the credentials.
with open("crypto_result.json", "r") as j:
    j_data = json.load(j)

# grab the credentials
cred_dict = decrypt(j_data.get("encrypted_filename"),
                    j_data["key_filename"],
                    convert_to_json=True)


# for collecting child threads.
threads = list()
stationId = "010010-99999"
ftp_dir = "Data"
if __name__ == '__main__':
    start = time()
    # for each file create a thread.
    # range() excludes its end value, so 2016 is needed to include 2015.
    for year in range(2009, 2016):
        t = GetFilesFromFTP(ftp_dir, cred_dict, year, stationId, name="{}-{}.gz".format(stationId, year))
        threads.append(t)
        print("Starting thread to download /{}/{}/{}-{}.gz".format(ftp_dir, year, stationId, year))
        t.start()

    for thread in threads:
        # join each collected thread, not just the last one created
        thread.join()
        print("Thread {} is finished, joining back to main thread...".format(thread.name))
    print("All threads finished, total {} seconds...".format(time() - start))

One more function has been added to the script ftp_cmd.py.

from ftplib import FTP, error_perm
from os import chdir, remove
from os.path import basename


def ftp_downloader_year(cred_dict, dir, year, stationId, host="ftp.pyclass.com"):
    with FTP(host, cred_dict['username'], cred_dict['password']) as ftp_client:
        ftp_client.cwd(dir)
        chdir("D:\\temp")
        fullpath = "/{}/{}/{}-{}.gz".format(dir, year, stationId, year)
        filename = basename(fullpath)
        try:
            with open(filename, "wb") as file:
                ftp_client.retrbinary("RETR {}".format(fullpath), file.write)
        except error_perm:
            print("{} is not available".format(filename))
            remove(filename)

The class in get_files.py is modified as follows:

from threading import Thread
from ftp_cmd import ftp_downloader_year


# sub class of Thread
class GetFilesFromFTP(Thread):
    def __init__(self, dir, cred_dict, year, stationId, name=None):
        # pass name to Thread so that name=None keeps the default thread name
        super().__init__(name=name)
        self.year = year
        self.dir = dir
        self.cred_dict = cred_dict
        self.stationId = stationId

    # overriding the run method of Thread class
    def run(self):
        print("Downloading from /{}/{}/{}-{}.gz".format(self.dir, self.year, self.stationId, self.year))
        ftp_downloader_year(self.cred_dict, self.dir, self.year, self.stationId)