python - BeautifulSoup help: how to extract content from text with improper tags in an HTML file?


<tr> <td nowrap> good1 </td> <td class = "td_left" nowrap=""> 1 </td> </tr>
<tr0> <td nowrap> good2 </td> <td class = "td_left" nowrap="">  </td> </tr0>

How can I parse this using Python? Please help. I want the result to be the list ['good1', 1, 'good2', None].

Find all the tr tags, then the td tags inside each of them:

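from bs4 import BeautifulSoup

page = """<tr> <td nowrap> good1 </td> <td nowrap class = "td_left"> 1 </td> </tr>
<tr> <td nowrap> good2 </td> <td nowrap class = "td_left"> 2 </td> </tr>"""

# Name the parser explicitly; html.parser adds no <body>, so search from the root.
soup = BeautifulSoup(page, 'html.parser')
rows = soup.find_all('tr')

# Collect the stripped text of every td in every row.
print([td.text.strip() for row in rows for td in row.find_all('td')])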

This prints the following under Python 2 (Python 3 shows the same strings without the u prefix):

[u'good1', u'1', u'good2', u'2'] 

Note that strip() gets rid of leading and trailing whitespace.
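The sample in the question also contains a nonstandard tr0 row and an empty cell, and asks for 1 as an integer and None for the blank. Here is a minimal sketch of one way to get there, assuming the built-in html.parser keeps the unknown tr0 tag and the td tags nested inside it. The helper name to_value is made up for illustration and is not part of BeautifulSoup.

```python
from bs4 import BeautifulSoup

page = """<tr> <td nowrap> good1 </td> <td class = "td_left" nowrap=""> 1 </td> </tr>
<tr0> <td nowrap> good2 </td> <td class = "td_left" nowrap="">  </td> </tr0>"""


def to_value(text):
    """Hypothetical helper: map cell text to None, an int, or a stripped string."""
    text = text.strip()
    if not text:
        return None          # empty cell
    if text.isdigit():
        return int(text)     # purely numeric cell
    return text


soup = BeautifulSoup(page, "html.parser")

# Search for td directly, so the name of the enclosing row tag (tr or tr0) does not matter.
result = [to_value(td.get_text()) for td in soup.find_all("td")]
print(result)
```

Searching for td from the root sidesteps the malformed row tags entirely. Other parsers (lxml, html5lib) may repair or discard this markup differently, so the result is worth checking if you switch parsers.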

Hope that helps.

