I used to use PDFsam to merge PDF files when submitting my claims, which consist of a claim application form and many receipts, all in PDF format. However, since I know Python, an easier and free way to merge PDFs is to use the PyPDF2 module. Credit goes to the PyPDF2 team for abstracting the complicated PDF processing away from Python scripters.
PyPDF2 is the main module for processing PDF files in a simple way. The main classes used here are PdfFileMerger and PdfFileReader; I have also imported PdfReadError to catch the exception raised when there is a problem opening a PDF file.
I create a merger object with PdfFileMerger, append each PDF file to it, and finally write the result to a fresh PDF file containing the merged content of all the specified PDF files.
PdfFileReader is used to make sure that each file specified by the user is indeed a PDF file and not something else.
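PdfFileReader validates a file by actually trying to parse it. As a lighter alternative (not what the script below does), a minimal sketch could check only the `%PDF-` signature that PDF files begin with. `looks_like_pdf` is a hypothetical helper name, and since it inspects only the first few bytes, a truncated or corrupt PDF with a correct header would still pass.

```python
PDF_SIGNATURE = b"%PDF-"  # PDF files begin with this header, e.g. b"%PDF-1.7"


def looks_like_pdf(path):
    """Hypothetical helper: cheap check on the first bytes of a file.

    Unlike parsing with PdfFileReader, this does not validate the rest
    of the file, so a corrupt PDF with a correct header still passes.
    """
    try:
        with open(path, "rb") as f:
            return f.read(len(PDF_SIGNATURE)) == PDF_SIGNATURE
    except OSError:  # missing file, directory, permission problem, ...
        return False
```

A check like this could run before handing a file to PdfFileReader, but it is weaker, so the parse-based check remains the stricter test.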
The argparse module handles the command-line arguments. There are two options:
- -f / --file: accepts a list of PDF file names. This option is mandatory, as the script needs to know which PDF files to merge.
- -o / --outfile: accepts the filename of the merged PDF file. If it is not specified, the default filename merged.pdf is used.
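A minimal sketch of how these two options parse, passing an explicit argument list to `parse_args()` instead of reading `sys.argv`. One assumption differs from the script below: it sets `default="merged.pdf"` on `-o`, so the if/else around `merge_pdfs` is no longer needed. `build_parser` is a hypothetical helper name.

```python
from argparse import ArgumentParser


def build_parser():
    parser = ArgumentParser(description="Merge PDF files")
    # nargs='+' collects one or more file names into a list
    parser.add_argument("-f", "--file", nargs="+", dest="user_inputs",
                        help="pdf files to merge", required=True)
    # default= replaces the "if not specified" branch in the script
    parser.add_argument("-o", "--outfile", dest="output_filename",
                        default="merged.pdf",
                        help="filename of the merged pdf (default: merged.pdf)")
    return parser


args = build_parser().parse_args(["-f", "form.pdf", "receipt1.pdf", "receipt2.pdf"])
print(args.user_inputs)      # ['form.pdf', 'receipt1.pdf', 'receipt2.pdf']
print(args.output_filename)  # merged.pdf
```

Leaving out `-f` entirely makes argparse print a usage error and exit with status 2, which is what `required=True` buys you.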
The argparse module is ideal for command-line Python scripts, as it needs less code than the getopt module for the same job.
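For comparison, here is a rough sketch of the same two options using the standard getopt module. getopt has no `nargs`, no `required` flag and no auto-generated help, so this sketch (`parse_with_getopt` is a hypothetical name) makes `-f` repeatable and checks for missing files by hand.

```python
import getopt


def parse_with_getopt(argv):
    """Hypothetical getopt equivalent of the two options described above."""
    # "f:" and "o:" mean both short options take a value;
    # the long forms need a trailing "=" for the same reason
    opts, _extra = getopt.getopt(argv, "f:o:", ["file=", "outfile="])
    files, output = [], "merged.pdf"
    for opt, value in opts:
        if opt in ("-f", "--file"):
            files.append(value)  # one file per -f, so repeat -f for more
        elif opt in ("-o", "--outfile"):
            output = value
    if not files:  # getopt cannot mark an option as required
        raise SystemExit("error: -f/--file is required")
    return files, output
```

Note that `-f a.pdf b.pdf` would leave `b.pdf` in `_extra`, which this sketch silently ignores; handling that case is more code argparse already does for you.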
For an argument that accepts a collection of items, you need to specify the nargs option when adding it to the parser. Besides an integer N (exactly N arguments), nargs accepts these symbols:
- nargs='+': 1 or more arguments
- nargs='?': 0 or 1 argument
- nargs='*': 0 or more arguments
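The three symbols can be compared side by side in a small sketch; the option names `--plus`, `--opt` and `--star` and the `const`/`default` values are made up for this demo. With `nargs='?'`, an option given without a value takes `const`, and an option that is absent takes `default`.

```python
from argparse import ArgumentParser

parser = ArgumentParser(prog="nargs_demo")
parser.add_argument("--plus", nargs="+")  # 1 or more values
parser.add_argument("--opt", nargs="?", const="CONST", default="DEFAULT")  # 0 or 1
parser.add_argument("--star", nargs="*")  # 0 or more values

given = parser.parse_args(["--plus", "a", "b", "--opt", "--star"])
print(given.plus, given.opt, given.star)     # ['a', 'b'] CONST []

absent = parser.parse_args([])
print(absent.plus, absent.opt, absent.star)  # None DEFAULT None
```

Passing `--plus` with no value is rejected with a usage error, since `+` demands at least one argument.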
If you know regex, the symbols used in nargs are simple, since they mean the same as the regex quantifiers. If you have not learned regular expressions, you should pick up the knowledge now; regex is a must-know topic for programming. I recommend taking The Complete Regular Expressions (Regex) Course For Beginners.
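To make the regex connection concrete: `+`, `?` and `*` are regex quantifiers meaning one or more, zero or one, and zero or more. The sketch below uses Python's `re` module; `allows` is a hypothetical helper that asks whether a quantifier accepts exactly n repetitions of a character.

```python
import re


def allows(quantifier, n):
    """True if 'a' followed by the quantifier matches exactly n 'a' characters."""
    return re.fullmatch("a" + quantifier, "a" * n) is not None


for q in ("+", "?", "*"):
    # Show which of the counts 0, 1, 2 each quantifier accepts
    print(q, [n for n in range(3) if allows(q, n)])
# + [1, 2]
# ? [0, 1]
# * [0, 1, 2]
```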
```python
"""
Merging PDFs into one big PDF has never been so easy and free
thanks to the creators of PyPDF2.
"""
from argparse import ArgumentParser

# Written against the PyPDF2 1.x API (PdfFileMerger, PdfFileReader, PyPDF2.utils)
from PyPDF2 import PdfFileMerger, PdfFileReader
from PyPDF2.utils import PdfReadError


def is_pdf(pdf_path):
    """Return True if PyPDF2 can open pdf_path as a PDF file."""
    try:
        with open(pdf_path, "rb") as pdf_file:
            PdfFileReader(pdf_file)
        return True
    except (PdfReadError, OSError):
        # Possible exceptions:
        # PdfReadError - when there is a problem opening a PDF file.
        # OSError - when a non-PDF file such as a .txt is opened
        #   (PyPDF2 throws OSError: [Errno 22] Invalid argument), and
        #   FileNotFoundError (a subclass of OSError) when a filename
        #   in the argument list does not exist.
        return False


def merge_pdfs(list_of_pdf, merged_filename="merged.pdf"):
    merger = PdfFileMerger()
    for pdf in list_of_pdf:
        if is_pdf(pdf):
            merger.append(pdf)
        else:
            print(f"Skipping {pdf}: not a readable PDF file")
    try:
        merger.write(merged_filename)
    except (PdfReadError, AttributeError) as e:
        print(e)
    finally:
        merger.close()  # release the file handles opened by append()


if __name__ == '__main__':
    parser = ArgumentParser()
    # -f / --file accepts a list of arguments; nargs='+' means 1 or more
    parser.add_argument('-f', '--file', nargs='+', dest="user_inputs",
                        help='pdf files to merge', required=True)
    parser.add_argument('-o', '--outfile', dest='output_filename',
                        help="filename after merged pdf, default is "
                             "merged.pdf if not specified",
                        required=False)
    args = parser.parse_args()
    if args.output_filename:
        merge_pdfs(args.user_inputs, args.output_filename)
    else:
        merge_pdfs(args.user_inputs)
```