1👍
You could try raw_html extraction, as described under known issues: https://github.com/grangier/python-goose#known-issues
That lets you fetch the page yourself and handle the encoding/decoding of the raw HTML before passing it to Goose.
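A minimal sketch of that idea, assuming the page bytes have already been fetched by other means. The helper names `decode_html` and `extract_from_raw_html` and the fallback encoding list are illustrative assumptions, not part of python-goose. The only Goose call used is `extract(raw_html=...)`. Whether this resolves the error depends on the page.

```python
# -*- coding: utf-8 -*-

def decode_html(raw_bytes, encodings=("utf-8", "latin-1")):
    """Decode fetched HTML bytes to text, trying each encoding in order.

    Hypothetical helper: the default encoding list is an assumption.
    """
    for encoding in encodings:
        try:
            return raw_bytes.decode(encoding)
        except UnicodeDecodeError:
            continue
    # Last resort: decode with the first encoding, replacing bad bytes.
    return raw_bytes.decode(encodings[0], "replace")


def extract_from_raw_html(raw_bytes):
    """Pass already-decoded HTML to Goose instead of letting it fetch the URL."""
    # Imported lazily; assumes python-goose is installed.
    from goose import Goose
    return Goose().extract(raw_html=decode_html(raw_bytes))
```

Decoding before the call keeps the encoding decision in your own code, so you can change the fallback order for a site that serves a different charset.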
0👍
It may help to use unicode for all strings. Put
from __future__ import unicode_literals
at the very top of your Python file, before any other code, and try again.
👤OBu
0👍
Try adding a u prefix to the string. I don't see any unusual characters in it, but I often use Hebrew in my Django code, and the coding line at the top of the file is not always enough:
article = g.extract(url=u"http://www.sportingnews.com/ncaa-football/story/2013-09-17/week-4-exit-poll-johnny-manziel-alabama-oregon-texas-mack-brown-mariota")
👤yuvi
0👍
Although I can't reproduce the error with this URL, I have had similar problems with python-goose. Try:
from goose.configuration import Configuration
from goose import Goose
config = Configuration()
config.parser_class = 'soupparser' # this helped me
g = Goose(config)
article = g.extract(url="http://www.sportingnews.com/ncaa-football/story/2013-09-17/week-4-exit-poll-johnny-manziel-alabama-oregon-texas-mack-brown-mariota")
Source:stackexchange.com