Dealing directly with HTML code as plain text is never a good idea. If you simply do a replace on the HTML text, you can run into problems like this:
<img src="static.example.com/jinja-templating"/>
becoming:
<img src="static.example.com/<a href='/glossary?word=jinja'>jinja</a>-templating"/>
which is absolutely destructive: the src attribute now contains markup, so it no longer points to the image.
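A minimal sketch that reproduces the breakage with a plain string replace (the variable names are only illustrative):

```python
html = '<img src="static.example.com/jinja-templating"/>'
link = "<a href='/glossary?word=jinja'>jinja</a>"

# str.replace knows nothing about tags, so it also rewrites the attribute value
broken = html.replace("jinja", link)
print(broken)
```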
So what can I do?
HTML Parser
I highly recommend learning and using an HTML parser like BeautifulSoup, so that you only modify text nodes and never touch tag names or attribute values.
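As a rough sketch of the parser approach: the example below uses Python's built-in html.parser rather than BeautifulSoup, only so that it runs without third-party packages. The idea is the same with either: walk the document and link the word only inside text nodes. The names GlossaryLinker and add_glossary_links are made up for this illustration, and the /glossary?word= URL is taken from the example above.

```python
import re
from html import escape
from html.parser import HTMLParser
from urllib.parse import quote


class GlossaryLinker(HTMLParser):
    """Re-emit the HTML as parsed, linking a word only inside text nodes."""

    SKIP = {"a", "script", "style"}  # never add links inside these elements

    def __init__(self, word):
        # convert_charrefs=False keeps entities such as &amp; as written
        super().__init__(convert_charrefs=False)
        self.pattern = re.compile(r"\b%s\b" % re.escape(word))
        self.link = '<a href="/glossary?word=%s">%s</a>' % (quote(word), escape(word))
        self.out = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.skip_depth += 1
        self.out.append(self.get_starttag_text())  # raw tag, attributes untouched

    def handle_startendtag(self, tag, attrs):
        self.out.append(self.get_starttag_text())  # e.g. <img .../>

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.skip_depth:
            self.skip_depth -= 1
        self.out.append("</%s>" % tag)

    def handle_data(self, data):
        if not self.skip_depth:
            data = self.pattern.sub(lambda m: self.link, data)
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append("&%s;" % name)

    def handle_charref(self, name):
        self.out.append("&#%s;" % name)

    def handle_comment(self, data):
        self.out.append("<!--%s-->" % data)

    def handle_decl(self, decl):
        self.out.append("<!%s>" % decl)


def add_glossary_links(html, word):
    parser = GlossaryLinker(word)
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


print(add_glossary_links('<img src="static.example.com/jinja-templating"/>', "jinja"))
```

Because the tag text is copied through verbatim, the img tag from the example above comes out unchanged, while the same word in ordinary text would be linked.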
Regex
Regex is not considered safe for processing HTML either, but at times it can
get the job done. For your case, here is a regular expression that might
work:
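import re
html = '<div id="term"><span style="term:10px">term</span><img src="static.example.com/term"/></div><div>the technology term is amazing</div>'
# Match "term" only in text between a closing '>' and the next '<',
# so attribute values such as id="term" and src=".../term" are left alone.
# Limitations: only one occurrence per text node is linked (the last one, because
# the first group is greedy), and longer words such as "terminal" also match.
glossaried = re.sub(r'>([^<>]*)term([^<>]*)<', r'>\1<a href="/glossary?word=term">term</a>\2<', html)
print(glossaried)
Output:
<div id="term"><span style="term:10px"><a href="/glossary?word=term">term</a></span><img src="static.example.com/term"/></div><div>the technology <a href="/glossary?word=term">term</a> is amazing</div>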
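If a single text node can contain the word more than once, one possible variation (a sketch, not a replacement for a real parser) is to split the HTML into tags and text first and substitute only in the text pieces. The function name link_term is made up for this example. It still assumes that no attribute value, comment or script contains a literal '>'.

```python
import re


def link_term(html, word):
    link = '<a href="/glossary?word=%s">%s</a>' % (word, word)
    pattern = re.compile(r"\b%s\b" % re.escape(word))  # whole words only
    # The capturing group makes re.split keep the tags in the result,
    # so the pieces alternate: text, tag, text, tag, ...
    pieces = re.split(r"(<[^>]*>)", html)
    for i in range(0, len(pieces), 2):  # even indexes are the text between tags
        pieces[i] = pattern.sub(lambda m: link, pieces[i])
    return "".join(pieces)


print(link_term("<p>term and term</p>", "term"))
```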
Source:stackexchange.com