Use escaped URL in Django URL regex mismatch

1👍

Consider a simple project like this:

urls.py

from django.contrib import admin
from django.urls import path, re_path
from . import views

urlpatterns = [
    re_path(r"https?[-a-zA-Z0-9%._+~#=]+", views.test, name="test"),
    path('admin/', admin.site.urls),
]

views.py

from django.http import HttpResponse

def test(request, obj, field):
    print(f"The object is {obj}")
    print(f"The field is {field}")
    return HttpResponse("Test test")

When visiting the following URL: /objects/http%3A%2F%2F0.0.0.0%3A3030%2Fu%2F%3Fid%3Dc789793d-9538-4a27-9dd0-7bb487253da1/foo

You get this error:

[Screenshot: Django URL error page, with the relevant part outlined in red]

Django matches URL patterns against the decoded path, not the percent-encoded one. By the time the regex is applied, objects/http%3A%2F%2F0.0.0.0%3A3030%2Fu%2F%3Fid%3Dc789793d-9538-4a27-9dd0-7bb487253da1/foo has become objects/http://0.0.0.0:3030/u/?id=c789793d-9538-4a27-9dd0-7bb487253da1/foo. The character class in the original pattern does not include :, / or ?, so it no longer matches. You will have to write an equivalent regex that matches the decoded URL.
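To see the mismatch outside Django, here is a minimal sketch using only the standard library. It assumes that urllib.parse.unquote is a close enough stand-in for the decoding that happens before URL resolution; the variable names are mine, not part of the project above.

```python
import re
from urllib.parse import unquote

# Path as it appears in the browser, without the leading slash.
encoded_path = (
    "objects/http%3A%2F%2F0.0.0.0%3A3030%2Fu%2F%3Fid%3D"
    "c789793d-9538-4a27-9dd0-7bb487253da1/foo"
)

# Assumption: unquote() approximates the decoding done before URL matching.
decoded_path = unquote(encoded_path)

# The original pattern from urls.py.
original_pattern = re.compile(r"https?[-a-zA-Z0-9%._+~#=]+")

# Matches the encoded path, because % is in the character class.
encoded_match = original_pattern.search(encoded_path)

# Fails on the decoded path: "http" is followed by ":", which is not in the class.
decoded_match = original_pattern.search(decoded_path)

print(decoded_path)
print(encoded_match is not None, decoded_match is not None)
```

The pattern only ever sees the second form, which is why the request falls through to the error page.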

Something like this will work:

urls.py

from django.contrib import admin
from django.urls import path, re_path
from . import views

urlpatterns = [
    re_path(r"(?P<obj>https?:\/\/.*\?id=[\d\w-]+)\/(?P<field>foo|bar)", views.test, name="test"),
    path('admin/', admin.site.urls),
]

views.py

from django.http import HttpResponse

def test(request, obj, field):
    print(f"The object is {obj}")
    print(f"The field is {field}")
    return HttpResponse("Test test")

Visiting the URL /objects/http%3A%2F%2F0.0.0.0%3A3030%2Fu%2F%3Fid%3Dc789793d-9538-4a27-9dd0-7bb487253da1/foo will print the following to the console:

The object is http://0.0.0.0:3030/u/?id=c789793d-9538-4a27-9dd0-7bb487253da1 
The field is foo
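You can also check the pattern without starting the dev server. This is only a sketch with the plain re module, and it assumes the resolver applies an unanchored pattern like this one as a search over the decoded path (with the leading slash removed) rather than as a full match.

```python
import re

# Same pattern as in the re_path() call above.
pattern = re.compile(r"(?P<obj>https?:\/\/.*\?id=[\d\w-]+)\/(?P<field>foo|bar)")

# Decoded path, without the leading slash.
decoded_path = "objects/http://0.0.0.0:3030/u/?id=c789793d-9538-4a27-9dd0-7bb487253da1/foo"

match = pattern.search(decoded_path)

# Named groups become the keyword arguments passed to the view.
captured = match.groupdict()
print(captured["obj"])
print(captured["field"])
```

The two named groups line up with the obj and field parameters of the test view.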

0👍

If I understand your issue correctly, you are trying to get a regex match and then immediately send a request to the resulting URL.

If that is the case, you are sending the request to an improperly formatted URL. The first regex you posted works fine for extracting the URL you are after, but the URL it captures is still percent-encoded.

You need to "unquote" the URL before making the request.

import re
from urllib.parse import unquote

path = '/objects/http%3A%2F%2F0.0.0.0%3A3030%2Fu%2F%3Fid%3Dc789793d-9538-4a27-9dd0-7bb487253da1/foo'

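# Match against the still-encoded path, then decode the captured URL.
match = re.search(r"https?[-a-zA-Z0-9%._+~#=]+", path)
url = match[0]
print(url)           # percent-encoded URL
print(unquote(url))  # decoded URL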

This results in the following output:

http%3A%2F%2F0.0.0.0%3A3030%2Fu%2F%3Fid%3Dc789793d-9538-4a27-9dd0-7bb487253da1

http://0.0.0.0:3030/u/?id=c789793d-9538-4a27-9dd0-7bb487253da1
