Regex Help - Python - Extract All Image Url From Css
I am trying to extract all the image (.jpg, .png, .gif) URIs from CSS files. Sample CSS:
.blockpricecont{width:660px;height:75px;background:url('../images/postBack.jpg') repeat-x;
Solution 1:
import re
print(re.findall(r'url\(([^)]+)\)', target_text))
I think that should work. Note that the captured values keep any quotes around the URL.
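Building on that pattern, here is a minimal sketch that also strips the optional quotes and keeps only the image extensions named in the question. The names `URL_RE`, `IMAGE_EXTENSIONS`, and `extract_image_urls` are made up for illustration, and the sketch assumes URLs without query strings or escaped parentheses.

```python
import re

# Matches url(...) with optional whitespace and optional single or double quotes.
# Group 1 is the quote character (possibly empty), group 2 is the URL itself.
URL_RE = re.compile(r"""url\(\s*(['"]?)(.*?)\1\s*\)""", re.IGNORECASE)

# Extensions taken from the question; extend as needed.
IMAGE_EXTENSIONS = (".jpg", ".png", ".gif")


def extract_image_urls(css_text):
    """Return url(...) targets that end in one of IMAGE_EXTENSIONS."""
    urls = [m.group(2) for m in URL_RE.finditer(css_text)]
    return [u for u in urls if u.lower().endswith(IMAGE_EXTENSIONS)]


sample = ".blockpricecont{width:660px;height:75px;background:url('../images/postBack.jpg') repeat-x;}"
print(extract_image_urls(sample))  # ['../images/postBack.jpg']
```

Filtering by extension after matching keeps the regex simple, at the cost of missing URLs that carry a query string such as `?v=2`.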
Solution 2:
The simplest way would be to eliminate comments before matching:
css = re.sub(r'(?s)/\*.*?\*/', '', css)
However, I do agree with Matthew that using a dedicated parser would be better. Here's an example with tinycss:
import tinycss
def urls_from_css(css):
    parser = tinycss.make_parser()
    for r in parser.parse_stylesheet(css).rules:
        for d in r.declarations:
            for tok in d.value:
                # URI tokens hold the target of url(...)
                if tok.type == 'URI':
                    yield tok.value
for url in urls_from_css(css):
    print(url)
Solution 3:
Maybe this way: first strip the comments with re.sub, then use re.findall to pick out the quoted image paths.
import re
example_css = """.blockpricecont{width:660px;height:75px;background:url('../images/postBack.jpg')
repeat-x;/*background:url('../images/tabdata.jpg') repeat-x;*/border: 1px solid #B7B7B7;"""
# Remove /* ... */ comments; the lazy .*? stops at the first closing */
css_comments_removed = re.sub(r'/\*.*?\*/', '', example_css)
# Capture quoted strings that end in a three-letter extension
pattern = re.compile(r"('.*?\.[a-z]{3}')")
matches = pattern.findall(css_comments_removed)
for i in matches:
    print(i)
This prints:
'../images/postBack.jpg'
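Two details of the comment-stripping step are easy to get wrong: whether the quantifier is greedy, and whether `.` is allowed to cross newlines. The following sketch compares the three variants on a made-up stylesheet (the `css` string and variable names are illustrative only).

```python
import re

# Illustrative input: one multi-line comment, one rule to keep, one single-line comment.
css = """a{color:red}/* first
comment */b{background:url('keep.png')}/* second */"""

# Greedy: a single match runs from the first /* to the last */,
# so the rule for b is deleted along with the comments.
greedy = re.sub(r'/\*.*\*/', '', css, flags=re.DOTALL)

# Lazy with DOTALL: each comment is matched and removed separately.
lazy = re.sub(r'/\*.*?\*/', '', css, flags=re.DOTALL)

# Lazy without DOTALL: '.' does not match a newline,
# so the multi-line comment is left in place.
no_dotall = re.sub(r'/\*.*?\*/', '', css)

print(repr(greedy))
print(repr(lazy))
print(repr(no_dotall))
```

For stylesheets that may contain multi-line comments, the lazy pattern combined with `re.DOTALL` (or the inline `(?s)` flag) is the variant that removes every comment and nothing else in this example.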
Solution 4:
This would probably be better suited to a CSS parser. I haven't used it, but I've seen this one recommended before.