[Django]-Django dumpdata UTF-8 (Unicode)

12👍

You could use django-admin.py dumpdata yourapp to dump the data for that purpose.

Or if you use MySQL, you could use the mysqldump command to dump the whole database.

And this thread has many ways to dump data, including manual methods.

UPDATE: because OP edited the question.

To convert JSON-escaped strings (\uXXXX) into human-readable text, you could use this (Python 2 only, since it relies on str.decode):

open("mydata-new.json","wb").write(open("mydata.json").read().decode("unicode_escape").encode("utf8"))
👤YOU

18👍

After struggling with similar issues, I’ve just found that the xml formatter handles UTF-8 properly.

manage.py dumpdata --format=xml > output.xml

I had to transfer data from Django 0.96 to Django 1.3. After numerous tries with dump/load data, I’ve finally succeeded using xml. No side effects for now.

Hope this helps someone, as I landed on this thread when looking for a solution.

👤Tisho

14👍

This solution from @Julian Polard’s post worked for me.

Basically, just add -Xutf8 right after py or python when running this command:

python -Xutf8 manage.py dumpdata > data.json
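
To see whether the flag makes a difference on a given machine, it may help to print the encodings Python has picked. This is a minimal sketch, assuming Python 3.7 or later (where UTF-8 mode exists). The function name `encoding_report` is made up for illustration and is not part of Django:

```python
import locale
import sys


def encoding_report():
    """Collect the settings that decide how redirected output is encoded."""
    return {
        # 1 when UTF-8 mode is active (for example via -Xutf8 or PYTHONUTF8=1)
        "utf8_mode": sys.flags.utf8_mode,
        # Encoding used when writing to stdout (for example with `> data.json`)
        "stdout_encoding": getattr(sys.stdout, "encoding", None),
        # Default encoding for open() when no encoding argument is given
        "preferred_encoding": locale.getpreferredencoding(False),
    }


if __name__ == "__main__":
    for name, value in encoding_report().items():
        print(f"{name}: {value}")
```

Running the script once with `python` and once with `python -Xutf8`, with output redirected to a file, should show whether the encodings differ on your system.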

Please upvote his answer as well if this worked for you ^_^

6👍

You need to either find the call to json.dump*() in the Django code and pass the additional option ensure_ascii=False, then encode the result afterwards, or use json.load*() to load the dumped JSON and dump it again with that option.
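
The second option (load, then dump again) can be sketched with the standard library alone. This is only an illustration: the function name `rewrite_without_escapes` and the file paths are hypothetical, and it assumes the dump is plain ASCII or UTF-8 JSON that fits in memory.

```python
import json


def rewrite_without_escapes(src_path, dst_path, indent=2):
    """Load a JSON fixture and write it back with non-ASCII characters kept as-is."""
    with open(src_path, encoding="utf-8") as src:
        data = json.load(src)
    with open(dst_path, "w", encoding="utf-8") as dst:
        # ensure_ascii=False stops json from emitting \uXXXX escapes
        json.dump(data, dst, ensure_ascii=False, indent=indent)
```

For example, `rewrite_without_escapes("mydata.json", "mydata-new.json")` would produce a readable copy. Because the file is parsed and re-serialized by the json module, the result stays valid JSON.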

5👍

Here I wrote a snippet for that.
Works for me!

👤dir01

4👍

You can create your own serializer that passes the ensure_ascii=False argument to the json.dumps function:

# serializers/json_no_uescape.py
from django.core.serializers.json import *


class Serializer(Serializer):

    def _init_options(self):
        super(Serializer, self)._init_options()
        self.json_kwargs['ensure_ascii'] = False

Then register the new serializer (for example, in your app’s __init__.py file):

from django.core.serializers import register_serializer

register_serializer('json-no-uescape', 'serializers.json_no_uescape')

Then you can run:

manage.py dumpdata --format=json-no-uescape > output.json

2👍

YOU has provided a good answer that is accepted. However, note that Python 3 distinguishes between text and binary data, so both files must be opened in binary mode:

open("mydata-new.json","wb").write(open("mydata.json", "rb").read().decode("unicode_escape").encode("utf8"))

Otherwise, the error AttributeError: 'str' object has no attribute 'decode' will be raised.
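
The same conversion can be unpacked into a small function so that both files are closed explicitly. This is a sketch under the same assumptions as the one-liner (Python 3, a dump that contains only \uXXXX escapes); the name `unescape_file` is made up for illustration.

```python
def unescape_file(src_path, dst_path):
    """Python 3 version of the one-liner, with files closed explicitly."""
    with open(src_path, "rb") as src:
        raw = src.read()  # bytes, so .decode() is available
    # Caution: unicode_escape also interprets \" and \n, not only \uXXXX
    text = raw.decode("unicode_escape")
    with open(dst_path, "wb") as dst:
        dst.write(text.encode("utf8"))
```

One caveat with this approach: if the data contains escaped quotes, or characters written as surrogate pairs (such as emoji), the output may no longer be valid JSON or the final encode step may fail, so it is worth checking the result before loading it.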

1👍

I usually add the following lines to my Makefile:

.PHONY: dump

# make APP=core MODEL=Schema dump
dump:
    @python manage.py dumpdata --indent=2 --natural-foreign --natural-primary ${APP}.${MODEL} | \
    python -c "import sys; sys.stdout.write(sys.stdin.read().encode().decode('unicode_escape'))" \
    > ${APP}/fixtures/${MODEL}.json

This works for the standard Django project structure; adjust the paths if your project structure is different.

1👍

This problem has been fixed for both JSON and YAML in Django 3.1.

1👍

Here’s a new solution.

I just shared a repo on GitHub: django-dump-load-utf8.

It’s not a bad solution, but I think this is a bug in Django, and fixing it in Django itself would be better. I hope someone can merge my project into Django.

manage.py dumpdatautf8 --output data.json
manage.py loaddatautf8 data.json

0👍

import codecs
src = "/categories.json"
dst = "/categories-new.json"
source = codecs.open(src, 'r').read().decode('string-escape')
codecs.open(dst, "wb").write(source)

0👍

I encountered the same issue. After reading all the answers, I came up with a mix of Ali’s and darthwade’s answers:

manage.py dumpdata app.category --indent=2 > categories.json
manage.py shell

import codecs
src = "/categories.json"
dst = "/categories-new.json"
source = codecs.open(src, "rb").read().decode('unicode-escape')
codecs.open(dst, "wb","utf-8").write(source)

In Python 3, I had to open the file in binary mode and decode it as unicode-escape. I also passed utf-8 when opening the output file in write (binary) mode.

I hope it helps 🙂

0👍

Here is the solution from djangoproject.com:
In Windows Settings, there’s a "Use Unicode UTF-8 for worldwide language support" box under "Language" – "Administrative Language Settings" – "Change system locale" – "Region Settings". If we apply that and reboot, then we get a sensible, modern default encoding from Python.

0👍

In 2023, I still had a rough time with this. I had to follow @wertartem’s suggestion and then change the file encoding of the output file to get it to work. It seems the "-Xutf8" flag wasn’t necessary for me, but someone reading this might need to follow all 3 steps.

I also had a smaller issue, which I solved by excluding admin.logentry from the export (I added these options: "-e auth -e contenttypes -e auth.Permission -e admin.logentry").

My full process:

  1. For proper encoding, at least for Windows, make sure utf-8 for
    worldwide language support is enabled. To do this, (at least for
    Windows 11) go to "Time & Language" > "Language & Region". Under
    "Related Settings", click "Administrative Language Settings". Click
    "Change System Locale". Check the box for "Beta: Use Unicode UTF-8
    for worldwide language support". Restart the computer. Once enabled,
    skip this step for future exports.
  2. Run this command in terminal (here, I’m exporting to a subdirectory and excluding several apps and models from the export): python -Xutf8 manage.py dumpdata
    --format=json --natural-foreign --natural-primary -e auth -e contenttypes -e auth.Permission -e admin.logentry >
    databases/seeds/dump.json
  3. Open this "dump.json" file and run the vscode command
    "Change File Encoding" to save with UTF-8 encoding. If vscode
    crashes, this can be done in sublime text instead by opening the file and
    saving with encoding from the file menu.
  4. Change connection to the new database.
  5. python manage.py reset_db
  6. python manage.py migrate
  7. python manage.py loaddata "databases/seeds/dump.json"
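
Step 3 can also be scripted instead of done in an editor. This is a minimal sketch, assuming the dump was written as UTF-16, which is what output redirection in Windows PowerShell 5 typically produces. The source encoding is an assumption and was not stated above, so adjust `source_encoding` to match your file. The function name `reencode_to_utf8` is hypothetical.

```python
def reencode_to_utf8(src_path, dst_path, source_encoding="utf-16"):
    """Read a text file in a known encoding and save a UTF-8 copy."""
    # Reading in text mode decodes the bytes (and strips a UTF-16 BOM if present)
    with open(src_path, encoding=source_encoding) as src:
        text = src.read()
    # newline="" writes line endings exactly as they appear in `text`
    with open(dst_path, "w", encoding="utf-8", newline="") as dst:
        dst.write(text)
```

For example, `reencode_to_utf8("databases/seeds/dump.json", "databases/seeds/dump-utf8.json")` would write a UTF-8 copy next to the original, which could then be passed to loaddata.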

Your step 2 command may benefit from (but does not require) slight modification. See: https://docs.djangoproject.com/en/4.2/ref/django-admin/#dumpdata
