File patterns
This downloads multiple files that share the same naming pattern from different directories. On ftp.pyclass.com the directories are organized by year, and within each year there are gzip files named with the pattern stationId-Year.
In this example I will download the gzip files for station id 010010-99999 between the years 2009 and 2015, so the files will look like 010010-99999-2009.gz, 010010-99999-2010.gz and so on.
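As a minimal sketch of the naming pattern described above, the helper below builds one remote path per year for a single station. The function name build_remote_paths is hypothetical and not part of the original scripts; the "Data" directory name is the one used in main.py.

```python
def build_remote_paths(ftp_dir, station_id, first_year, last_year):
    """Return one remote path per year, first_year through last_year inclusive."""
    return [
        "/{}/{}/{}-{}.gz".format(ftp_dir, year, station_id, year)
        # range() excludes its end value, hence the + 1
        for year in range(first_year, last_year + 1)
    ]


# Hypothetical usage with the station id from this example.
paths = build_remote_paths("Data", "010010-99999", 2009, 2015)
for path in paths:
    print(path)
```

Building the list up front also makes it easy to check how many files will be requested before any connection is opened.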
Limitation of using multithreaded connections
The FTP server allows a maximum of 8 connections, so at most 8 files can be downloaded at any one time. The script can be modified to launch threaded downloads instead of threaded connections.
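One way to stay under the connection limit when there are more than 8 files is to cap the number of worker threads. The sketch below is an assumption about how that could be done with the standard library's ThreadPoolExecutor, not the approach used in the scripts on this page. The names download_all, MAX_CONNECTIONS and fake_download are hypothetical, and fake_download stands in for a real FTP download so the sketch runs without a server.

```python
from concurrent.futures import ThreadPoolExecutor

MAX_CONNECTIONS = 8  # the limit stated above for the FTP server


def download_all(download_one, years, max_workers=MAX_CONNECTIONS):
    """Call download_one(year) for every year, with at most max_workers running at once."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() returns results in input order and re-raises any worker exception.
        return list(pool.map(download_one, years))


if __name__ == "__main__":
    # Stand-in for a real download; it only returns the file name it would fetch.
    def fake_download(year):
        return "010010-99999-{}.gz".format(year)

    print(download_all(fake_download, range(2000, 2016)))
```

With a pool, the remaining files wait in a queue until a worker is free, so the number of simultaneous connections never exceeds max_workers.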
Speed of downloading concurrently
The modified code
This is the main.py code, which runs all the scripts together.
from network_threads.get_files import GetFilesFromFTP
from security.crypto import decrypt
import json
from time import time

# Get the file names so that these names are used for reading the credentials.
with open("crypto_result.json", "r") as j:
    j_data = json.load(j)

# Grab the credentials.
cred_dict = decrypt(j_data.get("encrypted_filename"), j_data["key_filename"], convert_to_json=True)

# For collecting child threads.
threads = list()
stationId = "010010-99999"
ftp_dir = "Data"

if __name__ == '__main__':
    start = time()
    # For each file create a thread.
    # Note: range() excludes its end value, so this covers 2009 through 2014.
    for year in range(2009, 2015):
        t = GetFilesFromFTP(ftp_dir, cred_dict, year, stationId, name="{}-{}.gz".format(stationId, year))
        threads.append(t)
        print("Starting thread to download /{}/{}/{}-{}.gz".format(ftp_dir, year, stationId, year))
        t.start()

    for thread in threads:
        # Join every collected thread, not only the last one created.
        thread.join()
        print("Thread {} is finished, joining back to main thread...".format(thread.name))

    print("All threads finished, total {} seconds...".format(time() - start))
The script ftp_cmd.py has one more function added.
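# Imports this function relies on.
from ftplib import FTP, error_perm
from os import chdir, remove
from os.path import basename


def ftp_downloader_year(cred_dict, dir, year, stationId, host="ftp.pyclass.com"):
    with FTP(host, cred_dict['username'], cred_dict['password']) as ftp_client:
        ftp_client.cwd(dir)
        # Local folder that receives the downloads.
        chdir("D:\\temp")
        fullpath = "/{}/{}/{}-{}.gz".format(dir, year, stationId, year)
        filename = basename(fullpath)
        try:
            with open(filename, "wb") as file:
                ftp_client.retrbinary("RETR {}".format(fullpath), file.write)
        except error_perm:
            # The server refused the request, so delete the empty local file.
            print("{} is not available".format(filename))
            remove(filename)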
The class in get_files.py is modified as follows:
from threading import Thread
from ftp_cmd import ftp_downloader_year


# Subclass of Thread
class GetFilesFromFTP(Thread):
    def __init__(self, dir, cred_dict, year, stationId, name=None):
        # Pass the name to Thread so that a missing name falls back to the default thread name.
        super().__init__(name=name)
        self.year = year
        self.dir = dir
        self.cred_dict = cred_dict
        self.stationId = stationId

    # Overriding the run method of the Thread class
    def run(self):
        print("Downloading from /{}/{}/{}-{}.gz".format(self.dir, self.year, self.stationId, self.year))
        ftp_downloader_year(self.cred_dict, self.dir, self.year, self.stationId)