[Django]-How would you write an `is_pdf(path_to_file)` function in Python?

0👍

I found pdfminer.six (pypi.org/project/pdfminer.six) and put together a simple example; see if it is useful to you. Here a.pdf is an empty file. I don’t know how it behaves when reading a PDF file that another program is still writing.

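from pdfminer.high_level import extract_text

try:
    # extract_text parses the whole document; a file pdfminer cannot parse is expected to raise here
    text = extract_text("D:\\a.pdf")
    print(text)
except Exception:
    # pdfminer raises several exception types for malformed input, so catch broadly
    print("invalid PDF file")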

— update —

Alternatively, the pdfminer.six GitHub repository has an example that uses PDFDocument, on line 53 of tools/pdfstats.py:
https://github.com/pdfminer/pdfminer.six/blob/develop/tools/pdfstats.py

I wrote a similar example:

from pdfminer.pdfdocument import PDFDocument
from pdfminer.pdfparser import PDFParser

try:
    # Open in binary mode; the with block closes the file afterwards
    with open("D:\\a.pdf", "rb") as pdf_file:
        parser = PDFParser(pdf_file)
        password = ""
        document = PDFDocument(parser, password)
        print(document.info)
        print(document.xrefs)
except Exception:
    # Covers a missing file as well as pdfminer parsing errors
    print("invalid PDF file")

In my example a.pdf is empty, so an exception is raised. open() itself only fails if the file is missing or unreadable, so the exception most likely comes from PDFParser or PDFDocument, and I’m guessing the same applies in your case. If no exception is raised, the PDFDocument.info attribute might be useful.

— update 2 —

I’ve realized that the document object has an xrefs attribute. The PDFParser class has this explanation: "It also reads XRefs at the end of every PDF file." Checking the value of document.xrefs might be useful.
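
Since the concern above is a file that another program is still writing, a cheap pre-check before full parsing might help. The following is a minimal sketch using only the standard library, not pdfminer. It assumes a complete PDF starts with the %PDF- header and carries the %%EOF marker near its end, after the xref data. The function name looks_like_complete_pdf and the tail_size parameter are made up for illustration.

```python
import os


def looks_like_complete_pdf(path, tail_size=1024):
    """Heuristic: True if the file starts with %PDF- and has %%EOF near its end."""
    try:
        with open(path, "rb") as f:
            # A PDF file is expected to begin with the "%PDF-" header
            if f.read(5) != b"%PDF-":
                return False
            # Read only the last tail_size bytes to look for the end marker
            f.seek(0, os.SEEK_END)
            size = f.tell()
            f.seek(max(0, size - tail_size))
            tail = f.read()
    except OSError:
        # Missing or unreadable file
        return False
    return b"%%EOF" in tail
```

This is only a heuristic: a file can pass it and still be corrupt, so it complements the PDFDocument check rather than replacing it.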

-1👍

I suspect you could just write a script that emails you or a team distribution list a listing of all the files in the directory. However, if you’re only asking how to search a directory natively, without installing modules, I would import os and re.

import os

# ***** Search File *****
files = os.listdir(r"C:\Users\PATH")
print(files)
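
To narrow that listing down to PDF candidates with just os and re, something like the following might work. It is a sketch that only looks at file names, not at file contents, and the helper name list_pdf_names is hypothetical.

```python
import os
import re

# Matches names ending in ".pdf", ignoring case
PDF_NAME = re.compile(r"\.pdf$", re.IGNORECASE)


def list_pdf_names(directory):
    """Return the sorted names of regular files in directory that end in .pdf."""
    return sorted(
        name
        for name in os.listdir(directory)
        if PDF_NAME.search(name)
        and os.path.isfile(os.path.join(directory, name))
    )
```

A name ending in .pdf says nothing about whether the file is a valid PDF, so each result would still need a content check.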
