Fix Python 3 invalid escape sequence in re pattern

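Python 3 processes backslash escapes in ordinary string literals before the re module ever sees the pattern. \s is not a recognized string escape, so the literal triggers an "invalid escape sequence" warning (a DeprecationWarning since Python 3.6).

Declare the pattern as a raw string instead by prefixing it with r, so the backslashes reach the regex engine unchanged.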
Joseph Marie Alba 2021-05-16 05:55:35 +08:00 committed by GitHub
parent d84d02349c
commit 95bb5e5599

@@ -912,7 +912,7 @@ def extract_images_from_html(doc, content):
         return '<img src="{file_url}"'.format(file_url=file_url)
     if content and isinstance(content, string_types):
-        content = re.sub('<img[^>]*src\s*=\s*["\'](?=data:)(.*?)["\']', _save_file, content)
+        content = re.sub(r'<img[^>]*src\s*=\s*["\'](?=data:)(.*?)["\']', _save_file, content)
     return content
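
The difference between the two literals in the diff can be sketched as follows. This is a minimal illustration, not code from the repository: the helper `escape_warnings`, the constant name `IMG_DATA_URI`, and the sample HTML are hypothetical. It checks which literal form triggers a compile-time warning, and uses `\b` to show a case where a non-raw literal silently changes what the pattern matches.

```python
import re
import warnings

# The pattern from the diff, written as a raw string so that the
# backslashes reach the regex engine unchanged.
IMG_DATA_URI = r'<img[^>]*src\s*=\s*["\'](?=data:)(.*?)["\']'


def escape_warnings(source):
    """Compile a snippet of Python source and return the warning
    categories raised while compiling it (hypothetical helper)."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        compile(source, "<demo>", "eval")
    return [w.category for w in caught]


# Source text of the two literal forms: '\s' and r'\s'.
non_raw = escape_warnings("'\\s'")   # invalid escape sequence warning
raw = escape_warnings("r'\\s'")      # no warning

# The raw pattern captures the data: URI from a sample tag.
sample = '<img alt="x" src = "data:image/png;base64,AAAA">'
payload = re.search(IMG_DATA_URI, sample).group(1)

# Where non-raw literals actually break: '\b' is a backspace character,
# while r'\b' is the regex word-boundary assertion.
backspace_hit = re.search('\bfoo', 'a foo')
boundary_hit = re.search(r'\bfoo', 'a foo')

print(non_raw, raw, payload, backspace_hit, boundary_hit is not None)
```

The warning category depends on the interpreter version (DeprecationWarning from 3.6, SyntaxWarning from 3.12), so the sketch only checks that some warning is raised for the non-raw form.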