
Unable To Generate Csv For Next Pages Using Scrapy

I am a newbie to Python and Scrapy. Here is my code to get the product name, price, image, and title from all the next pages: import scrapy class TestSpider(scrapy.Spider): na

Solution 1:

Your issue is the line below:

yield scrapy.Request(next_page, callback=self.parse)

The next_page URL extracted from the page is a relative URL, and scrapy.Request needs an absolute one. So you should use

yield response.follow(next_page, callback=self.parse)

This automatically resolves relative URLs against the URL of the current response.
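To make "resolving a relative URL" concrete, here is a minimal sketch using only the standard library. It is not Scrapy's own code, just an illustration of the kind of joining that response.follow does for you; both URLs below are made-up examples, not values taken from the question.

```python
from urllib.parse import urljoin

# Hypothetical values for illustration: the URL of the page being parsed
# and a relative href as it might come out of a "next page" link.
page_url = "https://www.amazon.in/s/ref=sr_pg_1?page=1"
next_page = "/s/ref=sr_pg_2?page=2"

# Joining the relative href with the page URL yields an absolute URL
# that includes the scheme and host.
absolute_url = urljoin(page_url, next_page)
print(absolute_url)
```

Passing the relative href straight to scrapy.Request fails because it has no scheme; joining it with the page URL first (or letting response.follow do so) avoids that.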

Edit-1

I just realized that you are visiting the individual product pages, but you only need to extract the data from the results page. So your parse_post function is not needed at all. Here is how to do it:

import scrapy


class TestSpider(scrapy.Spider):
    name = "testdoc1"
    allowed_domains = ['amazon.in']
    start_urls = [
        "https://www.amazon.in/s/ref=amb_link_46?ie=UTF8&bbn=1389432031&rh=i%3Aelectronics%2Cn%3A976419031%2Cn%3A%21976420031%2Cn%3A1389401031%2Cn%3A1389432031%2Cp_89%3AApple&pf_rd_m=A1VBAL9TL5WCBF&pf_rd_s=merchandised-search-leftnav&pf_rd_r=CYS25V3W021MSYPQ32FB&pf_rd_r=CYS25V3W021MSYPQ32FB&pf_rd_t=101&pf_rd_p=1ce3e975-c6e8-479a-8485-2e490b9f58a9&pf_rd_p=1ce3e975-c6e8-479a-8485-2e490b9f58a9&pf_rd_i=1389401031"]

    def parse(self, response):
        for post in response.css('li.s-result-item'):
            item = dict()
            item['Name'] = post.xpath(
                './/h2[contains(@class,"a-size-base s-inline  s-access-title  a-text-normal")]/text()').extract()
            item['Price'] = post.xpath(
                './/span[contains(@class,"a-size-base a-color-price s-price a-text-bold")]/text()').extract()
            item['Image'] = post.xpath('.//img[contains(@class,"s-access-image cfMarker")]/@src').extract()
            item['Link'] = post.xpath(
                './/a[contains(@class,"a-link-normal s-access-detail-page  s-color-twister-title-link a-text-normal")]/@href').extract()
            yield item

        # If the results page links to a next page, follow it and parse it the same way.
        next_page = response.xpath('(//a[@class="pagnNext"])[1]/@href').extract_first()
        if next_page:
            yield response.follow(next_page, callback=self.parse)
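Note that each field in the spider above is filled with .extract(), so every value is a list of matches rather than a single string. Scrapy's feed export normally writes the CSV for you (for example, scrapy crawl testdoc1 -o items.csv), but the sketch below shows, with the standard library only, what the rows look like once the lists are flattened. The sample items and the flatten helper are hypothetical and exist only for illustration.

```python
import csv
import io

# Hypothetical items shaped like the dicts yielded by parse(): every value
# is a list because .extract() returns all matches (possibly none).
items = [
    {"Name": ["Example Phone"], "Price": ["49,999"],
     "Image": ["https://example.com/a.jpg"], "Link": ["https://example.com/a"]},
    {"Name": ["Example Tablet"], "Price": [],
     "Image": ["https://example.com/b.jpg"], "Link": ["https://example.com/b"]},
]


def flatten(item):
    """Join each list of matches into one string; an empty list becomes ''."""
    return {key: " ".join(value.strip() for value in values)
            for key, values in item.items()}


# Write to an in-memory buffer so the example has no side effects on disk.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["Name", "Price", "Image", "Link"])
writer.writeheader()
for item in items:
    writer.writerow(flatten(item))

csv_text = buffer.getvalue()
print(csv_text)
```

If you only ever want the first match per field, using extract_first() in the spider instead of extract() gives you plain strings and makes this flattening step unnecessary.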
