python - how to build a dictionary from an XPath selection?
Please help me write an XPath expression.
HTML:
<div class="tabitem"> <p><strong>product composition</strong></p> <p>93% polyamide 7% elastane</p> <p>lining: 100% polyester</p><p>dress length: 90 cm</p> <p><strong>product attributes;</strong></p> <p>: boat neck, long sleeve, midi, zip, concealed, laced, side</p> <p>lining type: full lining</p> </div>
From this HTML I need the following dictionary:

    data['product composition'] = '93% polyamide 7% elastane lining: 100% polyester dress length: 90 cm'
    data['product attributes;'] = ': boat neck, long sleeve, midi, zip, concealed, laced, side lining type: full lining'

It is important that the number of elements can vary, i.e. I need a universal solution.
Get every strong tag inside a p tag, take its parent, then walk the parent's next siblings until you reach a p tag with a strong tag inside or there are no more siblings left:

    from lxml.html import fromstring

    html_data = """<div class="tabitem"> <p><strong>product composition</strong></p> <p>93% polyamide 7% elastane</p> <p>lining: 100% polyester</p><p>dress length: 90 cm</p> <p><strong>product attributes;</strong></p> <p>: boat neck, long sleeve, midi, zip, concealed, laced, side</p> <p>lining type: full lining</p> </div>"""

    tree = fromstring(html_data)
    data = {}
    for strong in tree.xpath('//p/strong'):
        # The <p> that wraps the heading
        parent = strong.getparent()
        description = []
        # Collect the following sibling <p> tags until the next heading or the end
        next_p = parent.getnext()
        while next_p is not None and not next_p.xpath('.//strong'):
            description.append(next_p.text)
            next_p = next_p.getnext()
        data[strong.text] = " ".join(description)

    print(data)

For the HTML above this prints:

    {'product composition': '93% polyamide 7% elastane lining: 100% polyester dress length: 90 cm', 'product attributes;': ': boat neck, long sleeve, midi, zip, concealed, laced, side lining type: full lining'}
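
If the paragraphs may contain nested tags (where `.text` can be `None`), a single pass over the paragraphs is one possible variation. The sketch below is not the code from the answer: it swaps lxml for the standard library's `xml.etree.ElementTree`, which only works if the snippet is well-formed XML, and `group_by_heading` is a hypothetical helper name. The same loop should carry over to an lxml tree.

```python
import xml.etree.ElementTree as ET

# Same markup as in the question; ElementTree needs it to be well-formed XML.
html_data = """<div class="tabitem"> <p><strong>product composition</strong></p> <p>93% polyamide 7% elastane</p> <p>lining: 100% polyester</p><p>dress length: 90 cm</p> <p><strong>product attributes;</strong></p> <p>: boat neck, long sleeve, midi, zip, concealed, laced, side</p> <p>lining type: full lining</p> </div>"""


def group_by_heading(root):
    """Map each <p><strong> heading to the joined text of the <p> tags after it.

    Hypothetical helper: assumes the headings and their paragraphs are
    direct children of `root`.
    """
    groups = {}
    current = None  # heading that the following paragraphs belong to
    for p in root.findall('p'):
        # All text inside the paragraph, including text of nested tags.
        text = ''.join(p.itertext()).strip()
        if p.find('.//strong') is not None:
            # A paragraph containing <strong> starts a new group.
            current = text
            groups[current] = []
        elif current is not None and text:
            groups[current].append(text)
    return {heading: ' '.join(parts) for heading, parts in groups.items()}


data = group_by_heading(ET.fromstring(html_data))
print(data)
```

Paragraphs that appear before the first heading are skipped, and a heading with no paragraphs after it maps to an empty string.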