python 2.7 - Trying to extract from the deep node with scrapy, results are bad

September 15, 2015

As a beginner I'm having a hard time, so I'm here to ask for help. I'm trying to extract prices from an HTML page, and they are nested deeply:

enter image description here

The second price location:

enter image description here

    from scrapy.spider import Spider
    from scrapy.selector import Selector

    from mymarket.items import MymarketItem


    class MySpider(Spider):
        name = "mymarket"
        allowed_domains = ["url"]
        start_urls = [
            "http://url"
        ]

        def parse(self, response):
            sel = Selector(response)
            titles = sel.xpath('//table[@class="tab_product_list"]//tr')
            items = []
            for t in titles:
                item = MymarketItem()
                item["price"] = t.xpath('//tr//span[2]/text()').extract()
                items.append(item)

            return items

I'm trying to export the scraped prices to CSV. The export is being populated like this:

enter image description here

And I want them sorted like this in the .csv:

enter image description here etc.

Can anyone point out the faulty part of the XPath, or how I can make the prices sorted "properly"?

It's difficult to tell what's wrong with the path. Install the FirePath extension for Firefox to test your XPath queries. One note for now:

titles = sel.xpath('//table[@class="tab_product_list"]//tr')

In the screenshot you have nested tables, so //tr will give you the trs of the nested tables too.
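To make the nested-table point concrete, here is a minimal sketch that uses the standard library's xml.etree.ElementTree instead of Scrapy, so it runs without a project. The markup is invented for illustration (the real page is only visible in the screenshots), and ElementTree's limited XPath stands in for Scrapy's selectors, so treat it as an approximation of the behaviour rather than the actual page.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: an outer product table whose rows each wrap an inner table.
PAGE = """
<html><body>
<table class="tab_product_list">
  <tr><td>
    <table>
      <tr><td><span>Product A</span><span style="color:red;">10.00</span></td></tr>
    </table>
  </td></tr>
  <tr><td>
    <table>
      <tr><td><span>Product B</span><span style="color:red;">20.00</span></td></tr>
    </table>
  </td></tr>
</table>
</body></html>
""".strip()

PRICE = ".//span[@style='color:red;']"

root = ET.fromstring(PAGE)
outer = root.find(".//table[@class='tab_product_list']")

# Descendant axis: picks up the rows of the nested tables as well.
all_rows = outer.findall(".//tr")
# Child axis: only the rows that belong directly to the outer table.
direct_rows = outer.findall("./tr")

# Every row (outer or inner) can reach a price span, so the
# descendant version yields each price twice.
prices_from_all = [row.find(PRICE).text for row in all_rows]
prices_from_direct = [row.find(PRICE).text for row in direct_rows]

print(len(all_rows), prices_from_all)
print(len(direct_rows), prices_from_direct)
```

With this sample the descendant query returns four rows and duplicated prices, while the child query returns the two product rows only.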

    def parse(self, response):
        sel = Selector(response)
        titles = sel.xpath('//table[@class="tab_product_list"]/tr')  # or tbody
        items = []
        for t in titles:
            item = MymarketItem()
            # the leading "." keeps the query relative to the current row
            item["price"] = t.xpath('.//span[@style="color:red;"]/text()').extract()[0]
            items.append(item)

        return items
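The other change in the code above is the leading dot in the inner query. In Scrapy, an XPath that starts with // is evaluated against the whole document even when it is called on a row selector, which is probably why every exported row held all the prices. The sketch below emulates that with xml.etree.ElementTree: searching from the root inside the loop stands in for the absolute query, and searching from the row stands in for the relative one. The markup and the header row are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: one header row plus three product rows.
PAGE = """
<html><body>
<table class="tab_product_list">
  <tr><th>Name</th><th>Price</th></tr>
  <tr><td><span>Product A</span><span style="color:red;">10.00</span></td></tr>
  <tr><td><span>Product B</span><span style="color:red;">20.00</span></td></tr>
  <tr><td><span>Product C</span><span style="color:red;">30.00</span></td></tr>
</table>
</body></html>
""".strip()

PRICE = ".//span[@style='color:red;']"

root = ET.fromstring(PAGE)
rows = root.findall(".//table[@class='tab_product_list']/tr")

# Emulates t.xpath('//tr//span[2]/text()'): the search ignores the current
# row and scans the whole document, so each item gets every price.
absolute_items = [
    {"price": [span.text for span in root.findall(PRICE)]} for _ in rows
]

# Emulates t.xpath('.//span[...]/text()'): the search starts at the row.
relative_items = []
for row in rows:
    span = row.find(PRICE)
    if span is None:
        # Rows without a price (the header here) are skipped; in Scrapy,
        # .extract()[0] would raise IndexError on such a row.
        continue
    relative_items.append({"price": span.text})

print(absolute_items[0])
print(relative_items)
```

Each relative item carries a single price string, so an export gets one price per row. The guard for rows without a price is worth keeping in the spider too if the real table has header or spacer rows.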
