
Regex Help - Python - Extract All Image Url From Css

I am trying to extract all the image (.jpg, .png, .gif) URIs from CSS files. Sample CSS:

.blockpricecont{width:660px;height:75px;background:url('../images/postBack.jpg') repeat-x;

Solution 1:

import re

print(re.findall(r'url\(([^)]+)\)', target_text))

I think that should work. Note that the captured group keeps any quotes around the URL.
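Since the question only wants image files, here is a minimal sketch of how the pattern above might be applied to the sample CSS. The quote stripping and the extension filter are my own additions, not part of the original answer:

```python
import re

# Sample rule taken from the question.
target_text = ".blockpricecont{width:660px;height:75px;background:url('../images/postBack.jpg') repeat-x;"

# Capture everything between "url(" and ")"; the quotes are still included.
raw = re.findall(r'url\(([^)]+)\)', target_text)

# Drop surrounding whitespace and optional single or double quotes.
urls = [u.strip().strip('\'"') for u in raw]

# Keep only the image types the question asks about.
images = [u for u in urls if u.lower().endswith(('.jpg', '.png', '.gif'))]
print(images)
```

This still matches URLs inside commented-out rules, so comments would need to be removed first.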

Solution 2:

The simplest way would be to eliminate comments before matching:

css = re.sub(r'(?s)/\*.*?\*/', '', css)
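The quantifier matters here: a greedy `.*` runs from the first `/*` to the last `*/` and takes any live rules in between with it. A small sketch of the difference, using a made-up stylesheet (the selectors and file names are only for illustration):

```python
import re

# Hypothetical stylesheet: two comments with a live rule between them.
css = "/* a */ .x{background:url(a.png)} /* b */ .y{background:url(b.gif)}"

# Greedy: .* spans from the first /* to the last */, deleting .x as well.
greedy = re.sub(r'(?s)/\*.*\*/', '', css)

# Non-greedy: .*? stops at the nearest */, so only the comments are removed.
lazy = re.sub(r'(?s)/\*.*?\*/', '', css)

print(repr(greedy))
print(repr(lazy))
```

With the greedy pattern the `.x` rule and its URL disappear; with the non-greedy one both rules survive.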

However, I do agree with Matthew that using a dedicated parser would be better. Here's an example with tinycss:

import tinycss

def urls_from_css(css):
    # Walk every declaration and yield the value of each URI token.
    parser = tinycss.make_parser()
    for r in parser.parse_stylesheet(css).rules:
        for d in r.declarations:
            for tok in d.value:
                if tok.type == 'URI':
                    yield tok.value

for url in urls_from_css(css):
    print(url)

Solution 3:

Maybe this way: first strip the comments with re.sub, then use re.findall to pick out the URLs.

import re

example_css = """.blockpricecont{width:660px;height:75px;background:url('../images/postBack.jpg') 
repeat-x;/*background:url('../images/tabdata.jpg') repeat-x;*/border: 1px solid #B7B7B7;"""

css_comments_removed = re.sub(r'\/\*.*?\*\/', '', example_css)

pattern = re.compile(r"(\'.*?\.[a-z]{3}\')")
matches = pattern.findall(css_comments_removed)
for i in matches:
    print(i)

prints

'../images/postBack.jpg'
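The two steps (strip comments, then match) could also be wrapped in one helper. This is only a sketch under a few assumptions of mine: the function name `extract_image_urls` is hypothetical, it uses `re.DOTALL` so that comments spanning several lines are removed too, and it matches `url(...)` values rather than any quoted string ending in a three-letter extension:

```python
import re

def extract_image_urls(css):
    """Return image URLs found in url(...) values outside CSS comments."""
    # DOTALL lets the non-greedy match cross line breaks inside a comment.
    css = re.sub(r'/\*.*?\*/', '', css, flags=re.DOTALL)
    # Optional opening quote, the URL itself, then the same optional quote.
    pattern = re.compile(r'''url\(\s*(['"]?)(.*?)\1\s*\)''', re.IGNORECASE)
    urls = [m.group(2) for m in pattern.finditer(css)]
    # Keep only the image types from the question.
    return [u for u in urls if u.lower().endswith(('.jpg', '.png', '.gif'))]

# Variant of the example above with the comment split across two lines.
example_css = """.blockpricecont{background:url('../images/postBack.jpg') repeat-x;
/*background:url('../images/tabdata.jpg')
repeat-x;*/border: 1px solid #B7B7B7;}"""

print(extract_image_urls(example_css))
```

The returned URLs come back without their quotes, and the commented-out tabdata.jpg is skipped.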

Solution 4:

This would probably be better suited to a CSS parser. I haven't used it, but I've seen this one recommended before.
