Python - BeautifulSoup help: how to extract text from improperly named tags in an HTML file?

July 15, 2012

    <tr> <td nowrap> good1 </td> <td class = "td_left" nowrap=""> 1 </td> </tr>
    <tr0> <td nowrap> good2 </td> <td class = "td_left" nowrap="">  </td> </tr0>

How can I parse this using Python? Please help. I want the result to be the list ['good1', 1, 'good2', None].

Find the tr tags, then the tds inside each of them:

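    from bs4 import BeautifulSoup

    page = """<tr> <td nowrap> good1 </td> <td nowrap class = "td_left"> 1 </td> </tr>
    <tr> <td nowrap> good2 </td> <td nowrap class = "td_left"> 2 </td> </tr>"""

    # Name the parser explicitly so the result does not depend on what is installed.
    soup = BeautifulSoup(page, "html.parser")
    # Search from the soup itself: html.parser does not add a <body> to a fragment.
    rows = soup.find_all('tr')
    # Collect the stripped text of every td, row by row.
    print([td.text.strip() for row in rows for td in row.find_all('td')])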

This prints:

[u'good1', u'1', u'good2', u'2']

Note that strip() gets rid of leading and trailing whitespace.
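The snippet above uses well-formed tr tags and returns every cell as a string. For the markup in the question, which contains a tr0 tag and an empty cell, a rough sketch along the following lines might get closer to the requested list. It assumes the odd row tags always look like "tr" followed by optional digits, and that whole-number cells should become ints; the to_value helper is a made-up name for illustration, not part of BeautifulSoup.

```python
import re
from bs4 import BeautifulSoup

# Markup from the question, including the non-standard <tr0> tag.
page = """<tr> <td nowrap> good1 </td> <td class = "td_left" nowrap=""> 1 </td> </tr>
<tr0> <td nowrap> good2 </td> <td class = "td_left" nowrap="">  </td> </tr0>"""

soup = BeautifulSoup(page, "html.parser")


def to_value(text):
    """Hypothetical helper: empty cell -> None, digits -> int, else stripped text."""
    text = text.strip()
    if not text:
        return None
    if text.isdigit():
        return int(text)
    return text


# Assumption: row tags are named "tr" plus optional digits (tr, tr0, tr1, ...).
rows = soup.find_all(re.compile(r"^tr\d*$"))
result = [to_value(td.get_text()) for row in rows for td in row.find_all("td")]
print(result)
```

With html.parser under Python 3 this should print ['good1', 1, 'good2', None]. Other parsers, such as lxml, may treat unknown or misplaced tags differently, so the output is worth checking if you switch.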

Hope that helps.
