[Django]-Save memory in Python. How to iterate over the lines of a 2-million-line file and save them efficiently?

5👍

Make sure that Django’s DEBUG setting is set to False
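The reason this matters: with DEBUG = True, Django appends every executed SQL query to django.db.connection.queries, so a loop that saves two million rows keeps two million query records in memory. A minimal sketch, assuming a standard settings.py:

```python
# settings.py -- with DEBUG = True, Django logs every SQL query in
# django.db.connection.queries, which grows without bound during a
# 2-million-row import.
DEBUG = False
```

If you need DEBUG on while testing, django.db.reset_queries() clears that list and can be called every few thousand rows instead.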

2👍

This looks perfectly fine to me. Iterating over the file like that or using xreadlines() will read each line as needed (with sane buffering behind the scenes). Memory usage should not grow as you read in more and more data.
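For reference, a minimal sketch of that lazy pattern (the file name and parsing are placeholders, not taken from the question):

```python
# Iterating over the file object itself reads one buffered line at a
# time; the whole 2-million-line file is never held in memory.
with open("import_file.txt") as f:
    for line in f:
        fields = line.rstrip("\n").split("\t")  # placeholder parsing
        # ... build and save the POI for this line ...
```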

As for performance, you should profile your app. Most likely the bottleneck is somewhere in a deeper function, like POI.save().
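A quick way to confirm where the time goes, using the standard-library cProfile module (myimport.import_pois and the file name are placeholders for whatever function drives your loop):

```python
import cProfile
import pstats

from myimport import import_pois  # placeholder: the function that runs the loop

# Profile the whole import and dump the stats to a file.
cProfile.runctx("import_pois('import_file.txt')", globals(), locals(), "import.prof")

# Print the 10 functions with the highest cumulative time; if POI.save()
# and the database layer dominate, that is where to optimise.
pstats.Stats("import.prof").sort_stats("cumulative").print_stats(10)
```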

2👍

There’s nothing in the data you’ve given us to suggest a problem: is memory consumption actually going up as you read more and more lines? That would be cause for worry, but there’s no indication it would happen in the code you’ve shown, assuming that p.save() writes the object to a database or file rather than keeping it in memory, of course. There’s nothing real to be gained by adding del statements, as the memory is recycled on each iteration of the loop anyway.
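If you want to verify that rather than take it on faith, you can watch the process’s peak memory as the loop runs. A sketch using the standard-library resource module (Unix only; the file name is a placeholder, and ru_maxrss is kilobytes on Linux, bytes on macOS):

```python
import resource

def peak_rss_kb():
    # Peak resident set size of this process so far.
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss

with open("import_file.txt") as f:  # placeholder filename
    for lineno, line in enumerate(f, 1):
        # ... build and save the POI ...
        if lineno % 100000 == 0:
            # If this figure keeps climbing steadily, something really
            # is accumulating in memory.
            print("line %d, peak RSS %d" % (lineno, peak_rss_kb()))
```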

This could be sped up if there’s a faster way to populate a POI instance than binding its attributes one by one, e.g. by passing those attributes (maybe as keyword arguments? positional would be faster…) to the POI constructor. Whether that’s possible depends on the geonames.models module, of which I know nothing, so I can only offer very generic advice. For example, if the module lets you save a bunch of POIs in a single gulp, then building them (say) 100 at a time and saving them in bunches should yield a speedup, at the cost of slightly higher memory consumption; a sketch of that approach follows.
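As a concrete illustration of both suggestions (passing the attributes straight to the constructor and saving in bunches), here is a hedged sketch. The field names and file format are invented placeholders, and it assumes a Django version whose default manager provides bulk_create; if geonames.models exposes its own batch-save helper, use that instead.

```python
from geonames.models import POI

BATCH_SIZE = 100

def import_pois(path):
    batch = []
    with open(path) as f:
        for line in f:
            # Placeholder fields: adjust to the real POI model.
            name, lat, lon = line.rstrip("\n").split("\t")
            # Build the instance in one constructor call instead of
            # assigning attributes one by one afterwards.
            batch.append(POI(name=name, latitude=float(lat), longitude=float(lon)))
            if len(batch) >= BATCH_SIZE:
                # One INSERT for the whole bunch instead of 100 separate
                # save() calls; costs a little extra memory per batch.
                POI.objects.bulk_create(batch)
                batch = []
    if batch:
        POI.objects.bulk_create(batch)
```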
