I have some simple code, shown below. Scraping works: I can see the print statements producing the correct data. In the pipeline, initialization works fine. However, the process_item function is not getting called; the print statement at the start of the function is never executed.
Spider: comosham.py

```python
import scrapy
from scrapy.spider import Spider
from scrapy.selector import Selector
from scrapy.http import Request
from activityadvisor.items import comoshamlocation
from activityadvisor.items import comoshamactivity
from activityadvisor.items import comoshamrates
import re


class comosham(Spider):
    name = "comosham"
    allowed_domains = ["www.comoshambhala.com"]
    start_urls = [
        "http://www.comoshambhala.com/singapore/classes/schedules",
        "http://www.comoshambhala.com/singapore/about/location-contact",
        "http://www.comoshambhala.com/singapore/rates-and-offers/rates-classes",
        "http://www.comoshambhala.com/singapore/rates-and-offers/rates-classes/rates-private-classes"
    ]

    def parse(self, response):
        category = (response.url)[39:44]
        print 'in parse'
        if category == 'class':
            pass
            """self.gen_req_class(response)"""
        elif category == 'about':
            print 'about call parse_location'
            self.parse_location(response)
        elif category == 'rates':
            pass
            """self.parse_rates(response)"""
        else:
            print 'cant find appropriate category! check check check!! raising level 5 alarm - moron :d'

    def parse_location(self, response):
        print 'in parse_location'
        item = comoshamlocation()
        item['category'] = 'location'
        loc = Selector(response).xpath('((//div[@id = "node-2266"]/div/div/div)[1]/div/div/p//text())').extract()
        item['address'] = loc[2]+loc[3]+loc[4]+(loc[5])[1:11]
        item['pin'] = (loc[5])[11:18]
        item['phone'] = (loc[9])[6:20]
        item['fax'] = (loc[10])[6:20]
        item['email'] = loc[12]
        print item['address'],item['pin'],item['phone'],item['fax'],item['email']
        return item
```

Items file:
```python
import scrapy
from scrapy.item import Item, Field


class comoshamlocation(Item):
    address = Field()
    pin = Field()
    phone = Field()
    fax = Field()
    email = Field()
    category = Field()
```

Pipeline file:
```python
import csv


class comoshampipeline(object):
    def __init__(self):
        self.locationdump = csv.writer(open('./scraped data/comosham/comoshamlocation.csv','wb'))
        self.locationdump.writerow(['address','pin','phone','fax','email'])

    def process_item(self,item,spider):
        print 'processing item now'
        if item['category'] == 'location':
            print item['address'],item['pin'],item['phone'],item['fax'],item['email']
            self.locationdump.writerow([item['address'],item['pin'],item['phone'],item['fax'],item['email']])
        else:
            pass
```
Your problem is that you never yield the item. `parse_location` returns the item to `parse`, but `parse` never yields it.
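To see why the discarded return value matters, here is a toy model in plain Python 3 (no Scrapy involved). `run_callback` is a hypothetical stand-in for the engine, not Scrapy's actual implementation: it only hands to the pipeline what the callback itself produces. `parse_broken` mirrors the question (the helper's result is thrown away, so the callback returns `None`), and `parse_fixed` mirrors the answer.

```python
def process_item(item, log):
    """Stand-in for a pipeline's process_item: records every item it receives."""
    log.append(item)
    return item


def run_callback(callback, response, log):
    """Toy engine: pipe every item the callback produced into process_item.

    A callback that neither yields nor returns anything produces no items.
    """
    output = callback(response)
    if output is None:
        return []
    if isinstance(output, dict):  # a single returned item, not an iterable
        output = [output]
    return [process_item(item, log) for item in output]


def parse_location(response):
    # Hypothetical stand-in for the real parse_location: builds one item.
    return {"category": "location", "source": response}


def parse_broken(response):
    # Mirrors the question: the helper's return value is simply discarded.
    parse_location(response)


def parse_fixed(response):
    # Mirrors the answer: the item is yielded, so the engine can see it.
    yield parse_location(response)


broken_log, fixed_log = [], []
run_callback(parse_broken, "about", broken_log)
run_callback(parse_fixed, "about", fixed_log)
print(len(broken_log), len(fixed_log))  # 0 1
```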
The solution is to replace

`self.parse_location(response)` with

`yield self.parse_location(response)`. More specifically, `process_item` never gets called if no items are yielded.
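Once items do reach the pipeline, note that Scrapy's item pipeline documentation expects `process_item` to return the item (or raise `DropItem`); the pipeline in the question falls through and returns `None`, which any later pipeline would then receive. Below is a minimal Python 3 sketch of the same pipeline under that assumption. The class name `ComoshamPipelineSketch` and the `path` parameter are hypothetical; `open_spider` and `close_spider` are standard pipeline hooks. Python 3's `csv` module wants text mode with `newline=""` rather than the Python 2 `'wb'` mode used above.

```python
import csv

FIELDS = ["address", "pin", "phone", "fax", "email"]


class ComoshamPipelineSketch:
    """Hypothetical Python 3 rework of comoshampipeline (same CSV columns)."""

    def __init__(self, path="./scraped data/comosham/comoshamlocation.csv"):
        self.path = path
        self.file = None
        self.writer = None

    def open_spider(self, spider):
        # Open the CSV once per crawl; newline="" is what csv expects.
        self.file = open(self.path, "w", newline="")
        self.writer = csv.writer(self.file)
        self.writer.writerow(FIELDS)

    def close_spider(self, spider):
        # Flush and close the file when the crawl ends.
        self.file.close()

    def process_item(self, item, spider):
        if item.get("category") == "location":
            self.writer.writerow([item[field] for field in FIELDS])
        # Return the item so any later pipelines still receive it.
        return item
```

Here the file is opened in `open_spider` rather than `__init__`, so it is tied to the crawl's lifetime and closed cleanly at the end.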