5👍
Okay, I have found a solution to my problem. It’s a bit ugly, but it works. Since the Django project’s manage.py command does not accept Scrapy’s command-line options, I split each option string into two arguments that manage.py does accept. After manage.py has parsed them, I rejoin the two arguments and pass them to Scrapy.
That is, instead of writing
python manage.py scrapy crawl domain.com -o scraped_data.json -t json
I put a space between the dash and the option letter, like this:
python manage.py scrapy crawl domain.com - o scraped_data.json - t json
My handle function looks like this:
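def handle(self, *args, **options):
    from scrapy.cmdline import execute

    # self._argv is assumed to hold the raw argument list
    raw = self._argv[1:]
    arguments = []
    i = 0
    while i < len(raw):
        # Rejoin a lone '-' or '--' with the option name that follows it.
        # Building a new list avoids mutating a list while iterating over it.
        if raw[i] in ('-', '--') and i + 1 < len(raw):
            arguments.append(raw[i] + raw[i + 1])
            i += 2
        else:
            arguments.append(raw[i])
            i += 1
    execute(arguments)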
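To see what the rejoining step does to the argument list, the loop can be pulled out into a standalone function. This is only a sketch: the name `rejoin_split_options` is made up for illustration and is not part of Django or Scrapy.

```python
def rejoin_split_options(argv):
    """Merge a lone '-' or '--' with the argument that follows it."""
    merged = []
    i = 0
    while i < len(argv):
        if argv[i] in ('-', '--') and i + 1 < len(argv):
            # '-' followed by 'o' becomes '-o'
            merged.append(argv[i] + argv[i + 1])
            i += 2
        else:
            merged.append(argv[i])
            i += 1
    return merged


# Hypothetical input, mirroring the split command line shown above
split_args = ['scrapy', 'crawl', 'domain.com',
              '-', 'o', 'scraped_data.json', '-', 't', 'json']
print(rejoin_split_options(split_args))
# prints ['scrapy', 'crawl', 'domain.com', '-o', 'scraped_data.json', '-t', 'json']
```

A trailing lone dash with nothing after it is passed through unchanged rather than raising an IndexError.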
Meanwhile, Mikhail Korobov has provided the optimal solution. See here:
# -*- coding: utf-8 -*-
# myapp/management/commands/scrapy.py
from __future__ import absolute_import
from django.core.management.base import BaseCommand

class Command(BaseCommand):

    def run_from_argv(self, argv):
        self._argv = argv
        self.execute()

    def handle(self, *args, **options):
        from scrapy.cmdline import execute
        execute(self._argv[1:])
3👍
I think you’re really looking for Guideline 10 of the POSIX argument syntax conventions:
The argument -- should be accepted as a delimiter indicating the end of options.
Any following arguments should be treated as operands, even if they begin with
the '-' character. The -- argument should not be used as an option or as an operand.
Python’s optparse module behaves this way, even under Windows.
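As a rough illustration of that behaviour, here is a small standalone sketch using optparse directly. The `-v` option is hypothetical and only there so the parser has something to recognise; the argument list is adapted from the Scrapy command line in this thread.

```python
from optparse import OptionParser

parser = OptionParser()
# Hypothetical option, defined only for this demonstration
parser.add_option("-v", "--verbose", action="store_true", default=False)

argv = ["-v", "crawl", "domain.com", "--",
        "-o", "scraped_data.json", "-t", "json"]
options, operands = parser.parse_args(argv)

print(options.verbose)
# prints True
print(operands)
# prints ['crawl', 'domain.com', '-o', 'scraped_data.json', '-t', 'json']
```

Everything after `--` is returned as a plain operand, so `-o` and `-t` are never checked against the parser's own options, and the `--` itself is dropped from the result.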
I put the scrapy project settings module in the argument list, so I can create separate scrapy projects in independent apps:
# <app>/management/commands/scrapy.py
from __future__ import absolute_import
import os
from django.core.management.base import BaseCommand

class Command(BaseCommand):

    def handle(self, *args, **options):
        os.environ['SCRAPY_SETTINGS_MODULE'] = args[0]
        from scrapy.cmdline import execute
        # scrapy ignores args[0], requires a mutable seq
        execute(list(args))
Invoked as follows:
python manage.py scrapy myapp.scrapyproj.settings crawl domain.com -- -o scraped_data.json -t json
Tested with Scrapy 0.12 and Django 1.3.1.