I'm new to Scrapy. I have a basic spider, similar to the example below:
```python
import scrapy
from scrapy.selector import Selector
from scrapy.http import Request
from scrapy.exceptions import CloseSpider


class MySpider(scrapy.Spider):
    name = 'myspider'
    allowed_domains = ['example.com']  # the domain the spider is allowed to crawl
    start_urls = ['http://www.example.com/content/']  # the URL where the spider starts crawling
    page_incr = 1
    flag = 0

    def parse(self, response):
        sel = Selector(response)
        stuffs = sel.xpath('//a/@href')
        for stuff in stuffs:
            link = stuff.extract()
            req1 = Request(url=link, callback=self.parse_item)
            yield req1

        # Request the first page of the AJAX-loaded content
        url = 'http://www.example.com/content/?q=ajax//date/%d&page=%d' % (self.page_incr, self.page_incr)
        req2 = Request(url=url,
                       headers={"Referer": "http://www.example.com/content",
                                "X-Requested-With": "XMLHttpRequest"},
                       callback=self.parse_xhr)
        yield req2

    def parse_xhr(self, response):
        sel = Selector(response)
        stuffs = sel.xpath('//a/@href')
        for stuff in stuffs:
            link = stuff.extract()
            yield Request(url=link, callback=self.parse_item)

        # Stop after 5 consecutive empty AJAX pages
        content = sel.xpath('//a/@href').extract()
        if content == []:
            self.flag += 1
            if self.flag == 5:
                raise CloseSpider('WARNING: <spider forced stop>')
        else:
            self.flag = 0

        # Request the next AJAX page
        self.page_incr += 1
        url = 'http://www.example.com/content/?q=ajax//date/%d&page=%d' % (self.page_incr, self.page_incr)
        req3 = Request(url=url,
                       headers={"Referer": "http://www.example.com/content",
                                "X-Requested-With": "XMLHttpRequest"},
                       callback=self.parse_xhr)
        yield req3

    def parse_item(self, response):
        pass
```

When I try to run the crawl, I get this error:
```
line 24, in parse
    req1 = Request(url=link, callback=self.parse_item)
exceptions.AttributeError: 'MySpider' object has no attribute 'parse_item'
```

I'm not getting it... please help me see what I'm doing wrong! Thanks for your time and help.
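Your `parse_item()` method is incorrectly indented (with 5 spaces instead of 4), so Python does not define it as a method of `MySpider`, and looking up `self.parse_item` raises the `AttributeError`. Indent it at the same level as `parse()` and `parse_xhr()`, and don't mix tabs and spaces in the same file.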
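Here is a minimal, Scrapy-free sketch of how a mis-indented `def` produces exactly this kind of error. It assumes the stray `def parse_item` ended up nested inside the preceding method (which is what happens when it is indented too deeply, for example through mixed tabs and spaces). It is not the poster's actual file, and the class names `BrokenSpider` and `FixedSpider` are hypothetical.

```python
class BrokenSpider:
    """Hypothetical class: parse_item is nested inside parse_xhr by accident."""

    def parse(self, response):
        # Fails: the class itself never received a parse_item attribute.
        return self.parse_item(response)

    def parse_xhr(self, response):
        result = response

        # Indented one level too deep: this def executes as a statement inside
        # parse_xhr and creates a local function that is discarded when
        # parse_xhr returns. It is NOT a method of BrokenSpider.
        def parse_item(self, response):
            pass

        return result


class FixedSpider:
    """Hypothetical class: parse_item at the same indentation as its siblings."""

    def parse(self, response):
        return self.parse_item(response)

    def parse_xhr(self, response):
        return response

    def parse_item(self, response):
        return None


print(hasattr(BrokenSpider, "parse_item"))  # False
print(hasattr(FixedSpider, "parse_item"))   # True
try:
    BrokenSpider().parse("dummy response")
except AttributeError as exc:
    print(exc)  # 'BrokenSpider' object has no attribute 'parse_item'
```

If you suspect mixed tabs and spaces, the standard library's `tabnanny` module (`python -m tabnanny yourspider.py`) reports ambiguous indentation.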