Python, Scrapy, Pipeline: function "process_item" not getting called


I have some simple code, shown below. Scraping works: I can see the print statements generating the correct data. In the pipeline, initialization works fine. However, the process_item function is not getting called; the print statement at the start of the function is never executed.

Spider: comosham.py

import scrapy
from scrapy.spider import Spider
from scrapy.selector import Selector
from scrapy.http import Request
from activityadvisor.items import comoshamlocation
from activityadvisor.items import comoshamactivity
from activityadvisor.items import comoshamrates
import re


class comosham(Spider):
    name = "comosham"
    allowed_domains = ["www.comoshambhala.com"]
    start_urls = [
        "http://www.comoshambhala.com/singapore/classes/schedules",
        "http://www.comoshambhala.com/singapore/about/location-contact",
        "http://www.comoshambhala.com/singapore/rates-and-offers/rates-classes",
        "http://www.comoshambhala.com/singapore/rates-and-offers/rates-classes/rates-private-classes"
    ]

    def parse(self, response):
        category = (response.url)[39:44]
        print 'in parse'
        if category == 'class':
            pass
            """self.gen_req_class(response)"""
        elif category == 'about':
            print 'about call parse_location'
            self.parse_location(response)
        elif category == 'rates':
            pass
            """self.parse_rates(response)"""
        else:
            print 'cant find appropriate category! check check check!! raising level 5 alarm - moron :d'

    def parse_location(self, response):
        print 'in parse_location'
        item = comoshamlocation()
        item['category'] = 'location'
        loc = Selector(response).xpath('((//div[@id = "node-2266"]/div/div/div)[1]/div/div/p//text())').extract()
        item['address'] = loc[2]+loc[3]+loc[4]+(loc[5])[1:11]
        item['pin'] = (loc[5])[11:18]
        item['phone'] = (loc[9])[6:20]
        item['fax'] = (loc[10])[6:20]
        item['email'] = loc[12]
        print item['address'],item['pin'],item['phone'],item['fax'],item['email']
        return item

Items file:

import scrapy
from scrapy.item import Item, Field


class comoshamlocation(Item):
    address = Field()
    pin = Field()
    phone = Field()
    fax = Field()
    email = Field()
    category = Field()

Pipeline file:

import csv


class comoshampipeline(object):
    def __init__(self):
        self.locationdump = csv.writer(open('./scraped data/comosham/comoshamlocation.csv','wb'))
        self.locationdump.writerow(['address','pin','phone','fax','email'])

    def process_item(self,item,spider):
        print 'processing item now'
        if item['category'] == 'location':
            print item['address'],item['pin'],item['phone'],item['fax'],item['email']
            self.locationdump.writerow([item['address'],item['pin'],item['phone'],item['fax'],item['email']])
        else:
            pass

Your problem is that you never actually yield the item. parse_location returns the item to parse, but parse never yields it.

The solution is to replace:

self.parse_location(response) 

with

yield self.parse_location(response) 

More specifically, process_item never gets called if no items are yielded.
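To see why the missing yield matters, here is a minimal sketch that does not use Scrapy at all. The names parse_broken, parse_fixed and run_pipeline are hypothetical stand-ins invented for illustration; run_pipeline only imitates the general idea of handing every item a callback produces to process_item, and is not how Scrapy is implemented internally.

```python
def parse_location(response):
    # Stand-in for the real callback: builds and returns a single item.
    return {"category": "location", "source": response}


def parse_broken(response):
    # The returned item is discarded, so this function returns None.
    parse_location(response)


def parse_fixed(response):
    # Yielding hands the item to whoever iterates over the callback's result.
    yield parse_location(response)


def run_pipeline(callback, response):
    """Hypothetical stand-in for the engine: feed each produced item to process_item."""
    processed = []

    def process_item(item):
        processed.append(item)

    # A callback that produces nothing returns None; treat that as "no items".
    result = callback(response)
    for item in result or []:
        process_item(item)
    return processed


broken_items = run_pipeline(parse_broken, "page")  # process_item is never called
fixed_items = run_pipeline(parse_fixed, "page")    # process_item is called once
```

With parse_broken, the loop has nothing to iterate over, which matches the symptom in the question: the pipeline is initialized, but process_item never runs.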

