[Django] Python Django Scrapy: return item to controller

I haven't tested this together with Django REST framework, but the following snippet lets you run a spider from a Python script and collect the scraped items so you can handle them afterwards.

from scrapy import signals
from scrapy.crawler import Crawler, CrawlerProcess
from ... import MysiteSpider

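items = []


def collect_items(item, response, spider):
    # Signal handler: Scrapy calls this once for every item a spider yields.
    items.append(item)


# Create the crawler explicitly so the handler is connected before the crawl starts.
crawler = Crawler(MysiteSpider)
crawler.signals.connect(collect_items, signals.item_scraped)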

process = CrawlerProcess()
process.crawl(crawler)
process.start()  # the script will block here until the crawling is finished

# at this point, the "items" list holds the scraped items

For the record, this works, but there might be better ways to do it 🙂
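
If you want to call this from a view or other controller code, one option is to wrap the same idea in a function that returns the items. This is only a minimal sketch, not something I have run inside Django: `ItemCollector` and `run_spider` are hypothetical names of mine, not part of Scrapy, and I'm assuming `process.create_crawler()` behaves the same as building the `Crawler` by hand.

```python
class ItemCollector:
    """Hypothetical helper: gathers items delivered by Scrapy's item_scraped signal."""

    def __init__(self):
        self.items = []

    def add(self, item, response, spider):
        # Same arguments as the collect_items handler in the snippet above.
        self.items.append(item)


def run_spider(spider_cls, settings=None):
    """Hypothetical helper: run spider_cls to completion and return its items.

    Blocks until the crawl is finished.
    """
    # Imported lazily so that importing this module does not pull in Scrapy/Twisted.
    from scrapy import signals
    from scrapy.crawler import CrawlerProcess

    collector = ItemCollector()
    process = CrawlerProcess(settings=settings)
    crawler = process.create_crawler(spider_cls)
    crawler.signals.connect(collector.add, signal=signals.item_scraped)
    process.crawl(crawler)
    process.start()  # blocks here until the crawling is finished
    return collector.items
```

Usage would be something like `items = run_spider(MysiteSpider)`. One caveat: `process.start()` runs the Twisted reactor, which cannot be restarted within the same Python process, so in a long-lived server process this will likely only work for the first call. Running the crawl in a separate process per request is probably safer there.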
