[Django] python-magic identifies uploaded ppt, docx, and Word files as 'application/zip'


I came across this issue recently too. python-magic wraps libmagic, the library behind the Unix file command, which uses a database file to identify documents (see man file). By default this database does not include instructions on how to identify .docx, .pptx, and .xlsx file types.
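For context, .docx, .pptx, and .xlsx files are ZIP archives with a known internal layout, which is why a database without specific rules falls back to 'application/zip'. The sketch below is not part of python-magic; it is a standard-library cross-check that looks inside the archive instead. The names OOXML_TYPES and guess_ooxml_mime are hypothetical, and it assumes the whole file is available in memory, since the ZIP directory sits at the end of the file.

```python
import io
import zipfile

# Assumed mapping: top-level folder inside the archive -> MIME type.
OOXML_TYPES = {
    "word/": "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
    "ppt/": "application/vnd.openxmlformats-officedocument.presentationml.presentation",
    "xl/": "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
}


def guess_ooxml_mime(data):
    """Return an Office Open XML MIME type for the given bytes, or None."""
    buffer = io.BytesIO(data)
    if not zipfile.is_zipfile(buffer):
        return None  # not a ZIP archive at all
    buffer.seek(0)
    with zipfile.ZipFile(buffer) as archive:
        names = archive.namelist()
    # Every OOXML package carries this manifest at the archive root.
    if "[Content_Types].xml" not in names:
        return None  # an ordinary ZIP file
    for prefix, mime in OOXML_TYPES.items():
        if any(name.startswith(prefix) for name in names):
            return mime
    return None
```

This only inspects member names, so it is a heuristic rather than a validation of the document's contents.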

You can give the file command the additional information it needs to identify these types by adding instructions to /etc/magic (see https://serverfault.com/a/377792).

This should then work:

magic.from_file("path_to_the_file.docx", mime=True)

Returns 'application/vnd.openxmlformats-officedocument.wordprocessingml.document'

One thing to note about the from_buffer example in the python-magic usage instructions on GitHub: reading only the first 1024 bytes does not seem to work for .docx, .pptx, and .xlsx file types (even with the additional information in /etc/magic):

magic.from_buffer(open("path_to_the_file.docx", "rb").read(1024), mime=True)

Returns 'application/zip'

It seems you need to give it more data to correctly identify these file types:

magic.from_buffer(open("path_to_the_file.docx", "rb").read(2000), mime=True)

Returns 'application/vnd.openxmlformats-officedocument.wordprocessingml.document'

I'm not sure of the exact amount needed.
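Since the exact amount is unknown, one option is to read a more generous sample and rewind the file afterwards. The following is only a sketch: sniff_mime is a hypothetical helper, the 8192-byte default is an assumption rather than a documented threshold, and the detector parameter exists only so the function can be exercised without python-magic installed. By default it calls magic.from_buffer(data, mime=True) as in the examples above.

```python
import io


def sniff_mime(fileobj, sample_size=8192, detector=None):
    """Detect the MIME type of a binary file-like object from its first bytes."""
    if detector is None:
        import magic  # python-magic, imported lazily so the helper loads without it

        def detector(data):
            return magic.from_buffer(data, mime=True)

    position = fileobj.tell()
    try:
        sample = fileobj.read(sample_size)
    finally:
        # Restore the read position so later code can still save the upload.
        fileobj.seek(position)
    return detector(sample)


if __name__ == "__main__":
    # Demo with a stand-in detector; swap it out to use python-magic itself.
    fake_upload = io.BytesIO(b"PK\x03\x04" + b"\x00" * 100)
    print(sniff_mime(fake_upload, detector=lambda data: f"{len(data)} bytes sampled"))
```

In a Django view this could presumably be called on an uploaded file object such as request.FILES["document"], where "document" is a made-up field name, as long as the object supports read, seek, and tell.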

👤Sion
