[Django] Large PDFs taking exponentially longer time with ReportLab

2👍

I spent a lot of time finding the cause of the problem described above. Instead of LongTable, you can try my BigDataTable class, which is optimized for processing big data.

Gist: BigDataTable, a faster LongTable for big data

Tested with 6500 rows and 7 columns:

  • LongTable: more than 1 hour of total document build time
  • BigDataTable: ~24.2 seconds of total document build time
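
For reference, the slow baseline looks roughly like the minimal sketch below. The data shape matches the benchmark above, while the file name, cell contents, and styling are assumptions; the BigDataTable replacement itself lives in the gist linked above.

```python
from reportlab.lib import colors
from reportlab.lib.pagesizes import A4
from reportlab.platypus import LongTable, SimpleDocTemplate, TableStyle

# Hypothetical test data: 6500 rows x 7 columns, matching the benchmark.
header = [f"col{c}" for c in range(7)]
rows = [[f"r{r}-c{c}" for c in range(7)] for r in range(6500)]

doc = SimpleDocTemplate("big.pdf", pagesize=A4)
table = LongTable([header] + rows, repeatRows=1)  # repeat header on each page
table.setStyle(TableStyle([
    ("GRID", (0, 0), (-1, -1), 0.25, colors.grey),
    ("BACKGROUND", (0, 0), (-1, 0), colors.lightgrey),
]))
doc.build([table])  # this build step is what slows down as the table grows
```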
👤 DShost

0👍

35k pages is not exactly mainstream PDF use, so any glitches are not entirely unexpected. A few ideas to explore:

  • It could simply be that the machine runs out of RAM dealing with the amount of data, and a hardware upgrade would help.
  • You could try splitting the data into several tables rather than one big one to see if this improves performance (see the chunking sketch after this list).
  • Would it be possible to split the content either temporarily (to be stitched back together into one file with a different tool like Ghostscript; see the merge sketch after this list) or permanently into several files?
  • Would it be possible to handle pagination yourself (e.g. if the length of the content elements is predictable)? It may (or may not) be that pagination of very large tables gets out of hand.
  • You could try testing a different data structure than LongTable over the same length of content to check whether the problem is tied to that particular structure; if it is, you might be able to find an alternative.
  • Finally (or firstly, depending on your inclination), you could look into the relevant code and/or raise an issue with the ReportLab team.
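
To illustrate the table-splitting and do-it-yourself-pagination ideas above, here is a minimal sketch that feeds the document several smaller LongTable flowables instead of one huge one. The chunk size of 500 and the data shape are assumptions; tune the chunk size empirically.

```python
from reportlab.platypus import LongTable, SimpleDocTemplate

CHUNK = 500  # hypothetical chunk size; tune until build time stops improving

def chunked_tables(header, rows, chunk=CHUNK):
    """Yield one LongTable per slice of rows, each repeating the header."""
    for i in range(0, len(rows), chunk):
        yield LongTable([header] + rows[i:i + chunk], repeatRows=1)

header = [f"col{c}" for c in range(7)]
rows = [[f"r{r}-c{c}" for c in range(7)] for r in range(6500)]

doc = SimpleDocTemplate("big_chunked.pdf")
doc.build(list(chunked_tables(header, rows)))
```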
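And a sketch of the temporary-split idea: build several smaller PDFs, then stitch them back into one file with Ghostscript. The gs invocation uses standard Ghostscript flags; the part-file names and the slice size of 1000 rows are assumptions.

```python
import subprocess

from reportlab.platypus import LongTable, SimpleDocTemplate

header = [f"col{c}" for c in range(7)]
rows = [[f"r{r}-c{c}" for c in range(7)] for r in range(6500)]

# Build one small PDF per slice of the data.
part_files = []
for n, i in enumerate(range(0, len(rows), 1000)):
    path = f"part_{n}.pdf"
    SimpleDocTemplate(path).build(
        [LongTable([header] + rows[i:i + 1000], repeatRows=1)]
    )
    part_files.append(path)

# Stitch the parts back into a single file with Ghostscript.
subprocess.run(
    ["gs", "-dBATCH", "-dNOPAUSE", "-q",
     "-sDEVICE=pdfwrite", "-sOutputFile=merged.pdf", *part_files],
    check=True,
)
```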
👤 Endre Both
