1👍
If you are using the requests-aws4auth package, you can use the following wrapper class in place of the AWS4Auth class. It encodes the headers created by AWS4Auth into byte strings, thus avoiding the UnicodeDecodeError downstream.
from requests_aws4auth import AWS4Auth

class AWS4AuthEncodingFix(AWS4Auth):
    def __call__(self, request):
        request = super(AWS4AuthEncodingFix, self).__call__(request)
        # Iterate over a snapshot, since headers are re-keyed inside the loop.
        for header_name in list(request.headers):
            self._encode_header_to_utf8(request, header_name)
        return request

    def _encode_header_to_utf8(self, request, header_name):
        # Python 2 only: the `unicode` type does not exist on Python 3.
        value = request.headers[header_name]
        if isinstance(value, unicode):
            value = value.encode('utf-8')
        if isinstance(header_name, unicode):
            del request.headers[header_name]
            header_name = header_name.encode('utf-8')
        request.headers[header_name] = value
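If you want to check the conversion on its own, without requests or AWS credentials, here is a minimal sketch of the same idea applied to a plain dict. The function name encode_headers_to_utf8 and the sample header value are made up for illustration; they are not part of requests-aws4auth.

```python
def encode_headers_to_utf8(headers):
    """Return a copy of `headers` with text names and values encoded to UTF-8 bytes.

    Hypothetical helper for illustration only; it mirrors the wrapper class
    above but works on a plain dict instead of a requests header object.
    """
    text_type = type(u"")  # `unicode` on Python 2, `str` on Python 3
    encoded = {}
    for name, value in headers.items():
        if isinstance(name, text_type):
            name = name.encode("utf-8")
        if isinstance(value, text_type):
            value = value.encode("utf-8")
        encoded[name] = value
    return encoded


# Sample header with a made-up timestamp value.
sample = {u"x-amz-date": u"20160101T000000Z"}
result = encode_headers_to_utf8(sample)
```

After this, every key and value in result is a byte string, which is what httplib needs in Python 2 to keep the request buffer as str.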
0👍
I suspect you’re correct about the Arabic chars now showing up in the DB.
- https://github.com/elastic/elasticsearch-py/issues/392
- https://github.com/django-haystack/django-haystack/issues/1072
are also possibly related to this issue. The first link seems to describe some kind of workaround, but doesn’t go into much detail. I suspect what the author meant by
"The proper fix is to use unicode type instead of str or set the default encoding properly to (I assume) utf-8."
is that you need to check that the machine it’s running on has LANG=en_US.UTF-8
or at least some UTF-8 LANG.
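To check what the machine is actually configured with, something like the following could be run with the same interpreter and environment as the app. This is only a sketch: the function names are mine, and the rule for spotting a UTF-8 locale (look for "utf8" in LC_ALL, LC_CTYPE or LANG, in that order) is a simplification.

```python
import locale
import os
import sys


def encoding_report(environ=None):
    """Collect the encoding-related settings visible to this process."""
    environ = os.environ if environ is None else environ
    return {
        "LANG": environ.get("LANG"),
        "LC_ALL": environ.get("LC_ALL"),
        "default_encoding": sys.getdefaultencoding(),
        "preferred_encoding": locale.getpreferredencoding(False),
    }


def lang_is_utf8(environ=None):
    """Rough check: does the first locale variable that is set name UTF-8?"""
    environ = os.environ if environ is None else environ
    value = (environ.get("LC_ALL") or environ.get("LC_CTYPE")
             or environ.get("LANG") or "")
    return "utf8" in value.lower().replace("-", "")


report = encoding_report()
for key in sorted(report):
    print("%s = %s" % (key, report[key]))
print("UTF-8 locale: %s" % lang_is_utf8())
```

If the last line prints False under the web server or worker process, that is at least worth fixing before digging further.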
0👍
Elasticsearch supports different encodings, so having Arabic characters shouldn’t be the problem.
Since you are using AWS, I will assume you also use an authorization library such as requests-aws4auth.
If that is the case, notice that during authorization some unicode headers are added, like u'x-amz-date'. That is a problem, because Python’s httplib performs the following during _send_output(): msg = "\r\n".join(self._buffer), where _buffer is a list of the HTTP headers. Having unicode headers makes msg be of <type 'unicode'>, while it really should be of type str. (Here is a similar issue with a different auth library.)
The line that raises the exception, msg += message_body, raises it because Python needs to decode message_body to unicode so that it matches the type of msg. The exception is raised because py-elasticsearch has already encoded the body to a byte string, so the implicit decode, which uses the default ASCII codec, fails on the non-ASCII bytes (as explained here).
You may want to try replacing the auth library (for example with DavidMuller/aws-requests-auth) and see if it fixes the problem.
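To make the failure described above concrete, here is a small sketch that spells out the implicit step by hand, so it runs on both Python 2 and Python 3. It does not call httplib; append_body is a made-up stand-in for what msg += message_body does in Python 2 when msg is unicode, and the request line and header values are invented.

```python
# Header lines as httplib would buffer them. They are text (unicode) here,
# standing in for headers added by the auth library; the values are made up.
header_lines = [
    u"POST /my-index/_search HTTP/1.1",
    u"x-amz-date: 20160101T000000Z",
    u"",
    u"",
]
msg = u"\r\n".join(header_lines)  # unicode, not a byte string

# The body has already been encoded to UTF-8 bytes by the client library.
message_body = u'{"query": "\u0645\u0631\u062d\u0628\u0627"}'.encode("utf-8")


def append_body(msg, message_body, codec="ascii"):
    """Mimic Python 2's `msg += message_body` for a unicode `msg`:
    the byte-string body is decoded with the default codec first."""
    return msg + message_body.decode(codec)


try:
    append_body(msg, message_body)
    decode_failed = False
except UnicodeDecodeError as exc:
    decode_failed = True
    print("implicit decode failed: %s" % exc)

# With byte-string headers there is nothing to decode, so the
# concatenation is plain bytes + bytes and succeeds.
request_bytes = msg.encode("ascii") + message_body
```

The decode step fails as soon as the body contains a non-ASCII byte, which is why the error only shows up with Arabic (or any other non-ASCII) content.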