I suggest you offload the crawl out of Django and expose it as another JSON service your web app talks to, likely with a higher max timeout than your normal calls.
There are three ways you could approach this:
- Use something like ScrapyRT and have your Django app make a request to the URL where the spider server is running (see the first sketch below).
- Deploy your spiders to a scrapyd server and have Django read from an SQLite database where the spiders are configured to drop their data (see the second sketch below).
- Run your spiders on Scrapinghub’s Scrapy Cloud. This gives you the same as the second option, but also lets you pick up the data by calling the items endpoint of the API (see the third sketch below).
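
For the first option, here is a minimal sketch of a Django view forwarding a crawl to a ScrapyRT server assumed to be running on its default port 9080; the spider name `my_spider` and the fallback URL are placeholders.

```python
import requests
from django.http import JsonResponse

SCRAPYRT_URL = "http://localhost:9080/crawl.json"  # assumed ScrapyRT host/port

def crawl(request):
    target = request.GET.get("url", "https://example.com")
    # ScrapyRT runs the spider synchronously, so allow a generous timeout.
    resp = requests.get(
        SCRAPYRT_URL,
        params={"spider_name": "my_spider", "url": target},
        timeout=120,
    )
    resp.raise_for_status()
    # ScrapyRT returns the scraped items under the "items" key of its JSON body.
    return JsonResponse({"items": resp.json().get("items", [])})
```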
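
For the second option, a rough sketch of a Scrapy item pipeline that drops items into an SQLite file Django can read; the database path and table name are placeholders for whatever your project actually uses.

```python
import json
import sqlite3

class SQLitePipeline:
    def open_spider(self, spider):
        # Placeholder path: point this at a file both Scrapy and Django can reach.
        self.conn = sqlite3.connect("/path/to/shared/items.db")
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS items (spider TEXT, data TEXT)"
        )

    def process_item(self, item, spider):
        # Store the whole item as JSON so the schema stays flexible.
        self.conn.execute(
            "INSERT INTO items (spider, data) VALUES (?, ?)",
            (spider.name, json.dumps(dict(item))),
        )
        self.conn.commit()
        return item

    def close_spider(self, spider):
        self.conn.close()
```

Enable it in the spider project's `ITEM_PIPELINES` setting, and Django can query the same file with its own `sqlite3` connection or a second database entry in `DATABASES`.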
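
For the third option, a sketch of pulling a finished job's items from the Scrapy Cloud items API, assuming HTTP basic auth with your API key as the username; the API key and the `project/spider/job` ID are placeholders.

```python
import requests

API_KEY = "YOUR_SCRAPINGHUB_API_KEY"   # placeholder
JOB_ID = "123456/1/1"                  # placeholder: project/spider/job

resp = requests.get(
    f"https://storage.scrapinghub.com/items/{JOB_ID}",
    auth=(API_KEY, ""),                # API key as username, empty password
    params={"format": "json"},
    timeout=60,
)
resp.raise_for_status()
for item in resp.json():
    print(item)
```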
Source: stackexchange.com