1👍
From the Python documentation:
\w:
When the LOCALE and UNICODE flags are not specified, matches any alphanumeric character and the underscore; this is equivalent to the set [a-zA-Z0-9_]. With LOCALE, it will match the set [0-9_] plus whatever characters are defined as alphanumeric for the current locale. If UNICODE is set, this will match the characters [0-9_] plus whatever is classified as alphanumeric in the Unicode character properties database.
You just need to add the re.UNICODE flag and pass the string as unicode (as u'mystring' or unicode(string)) for it to work.
>>> re.findall(r'\w+', '/review_metas/2108/발견/24986/')
['review_metas', '2108', '24986']
>>> re.findall(r'\w+', u'/review_metas/2108/발견/24986/', re.UNICODE)
[u'review_metas', u'2108', u'\ubc1c\uacac', u'24986']
In your example:
>>> expr = r'^/review_metas/(?P<review_meta_id>\d+)/(?P<slug>[-~\w]+)/(?P<review_thread_id>\d+)/$'
>>> url = u'/review_metas/2108/발견/24986/'
>>> re.match(expr, url)
None
>>> f = re.match(expr, url, re.UNICODE)
>>> f
<_sre.SRE_Match at 0x7f2e08dd8620>
>>> f.group('slug')
u'\ubc1c\uacac'
Just by passing a proper unicode string and adding the re.UNICODE flag, your parser works fine.
I don’t know how Django handles URLs internally (I have never used Django), but if there is no way to pass the unicode flag to Django, you can replace the character class in your slug pattern, [-~\w]+, with [^/]+.
r'^/review_metas/(?P<review_meta_id>\d+)/(?P<slug>[^/]+)/(?P<review_thread_id>\d+)/$'
It reads as "anything but '/'".
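A minimal sketch of that fallback, assuming Python 3. There, str patterns are already Unicode-aware, so re.ASCII is passed here only to reproduce the ASCII-only \w behaviour shown above. The variable names are illustrative and not part of the original question.

```python
import re

# Original slug class: with ASCII-only matching, \w does not cover Hangul.
WORD_PATTERN = r'^/review_metas/(?P<review_meta_id>\d+)/(?P<slug>[-~\w]+)/(?P<review_thread_id>\d+)/$'
# Fallback slug class: any run of characters that are not '/'.
SEGMENT_PATTERN = r'^/review_metas/(?P<review_meta_id>\d+)/(?P<slug>[^/]+)/(?P<review_thread_id>\d+)/$'

url = '/review_metas/2108/발견/24986/'

# re.ASCII restricts \w to [a-zA-Z0-9_], so the original pattern fails.
word_match = re.match(WORD_PATTERN, url, re.ASCII)
# [^/]+ does not depend on \w at all, so it matches regardless of flags.
segment_match = re.match(SEGMENT_PATTERN, url, re.ASCII)

print(word_match)                     # None
print(segment_match.group('slug'))    # 발견
```

Because [^/]+ never consults the alphanumeric classification, it behaves the same whether or not a Unicode flag can be supplied.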
1👍
Use
re.findall(pattern, string, flags=re.U)
or just
re.findall(pattern, string, re.U)
You’ll run into the same problem whenever you parse text in a language whose letters fall outside ASCII (e.g., Czech, Russian or Chinese).
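As a rough illustration with a non-Latin script, assuming Python 3 (where re.U is already the default for str patterns and re.ASCII switches it off), the sample text below is made up for the example:

```python
import re

text = 'привет мир 2108'  # hypothetical sample: two Russian words and a number

# With Unicode matching, \w covers Cyrillic letters as well as digits.
unicode_words = re.findall(r'\w+', text, re.U)
# With ASCII-only matching, only the digits count as word characters.
ascii_words = re.findall(r'\w+', text, re.ASCII)

print(unicode_words)  # ['привет', 'мир', '2108']
print(ascii_words)    # ['2108']
```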
0👍
Use this:
r'^/review_metas/(?P<review_meta_id>\d+)/(?P<slug>.*)/(?P<review_thread_id>\d+)/$'
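One thing to keep in mind with .* is that it also matches '/', so the slug group can swallow extra path segments. The sketch below, assuming Python 3 and a made-up URL with one segment too many, compares it with the stricter [^/]+ class:

```python
import re

DOT_STAR = r'^/review_metas/(?P<review_meta_id>\d+)/(?P<slug>.*)/(?P<review_thread_id>\d+)/$'
NO_SLASH = r'^/review_metas/(?P<review_meta_id>\d+)/(?P<slug>[^/]+)/(?P<review_thread_id>\d+)/$'

# Hypothetical URL containing an extra path segment between the two ids.
url = '/review_metas/2108/a/b/24986/'

loose = re.match(DOT_STAR, url)   # .* spans the slash between 'a' and 'b'
strict = re.match(NO_SLASH, url)  # [^/]+ stops at the first slash, so no match

print(loose.group('slug'))  # a/b
print(strict)               # None
```

Whether the looser match is acceptable depends on how the slug is used afterwards; if a slug must be a single path segment, [^/]+ is the safer choice.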