
Problems Writing Scraped Data To Csv With Slavic Characters (UnicodeEncodeError & TypeError)

Intention / Wanted result: To scrape the link titles (i.e. the text of the links with each item) from a Czech website: https://www.bezrealitky.cz/vypis/nabidka-prodej/byt/praha An

Solution 1:

To scrape Czech websites and avoid these errors, I use the unidecode module. It transliterates Unicode text to ASCII, so a character such as "ž" is written out as "z" and the CSV export never has to encode a non-ASCII character.
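To illustrate what ASCII transliteration means for Czech text, here is a minimal standard-library sketch. It is not how unidecode is implemented: it only strips combining accents after Unicode decomposition, which happens to be enough for Czech diacritics. The helper name `to_ascii` and the sample title are made up for this example.

```python
import unicodedata


def to_ascii(text):
    """Approximate ASCII transliteration using only the standard library."""
    # NFKD splits each accented letter into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFKD", text)
    # Drop the combining marks and keep the base letters.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Characters that are still non-ASCII (for example the Polish "ł") are
    # simply discarded here; unidecode maps such characters to ASCII instead.
    return stripped.encode("ascii", "ignore").decode("ascii")


# Hypothetical listing title, used only for illustration.
title = "Prodej bytu 2+kk, Praha - Žižkov"
print(to_ascii(title))  # prints: Prodej bytu 2+kk, Praha - Zizkov
```

In the spider, unidecode plays the role of `to_ascii` and handles many more scripts and symbols than this sketch does.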

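# -*- coding: utf-8 -*-
import scrapy
from unidecode import unidecode

# BezrealitkyItem is the project's own Item class. The module path below is
# an assumption; adjust it to your project layout.
from ..items import BezrealitkyItem


class BezrealitkySpider(scrapy.Spider):
    name = 'bezrealitky'
    start_urls = [
        'https://www.bezrealitky.cz/vypis/nabidka-prodej/byt/praha'
    ]

    def parse(self, response):
        for record in response.xpath('//*[starts-with(@class,"record")]'):
            # Create a fresh item per record so earlier results are not overwritten.
            item = BezrealitkyItem()
            # Query relative to the current record, not the whole response.
            # The index [1] is kept from the original selector.
            title = record.xpath('.//div[@class="details"]/h2/a[@href]/text()').extract()[1]
            # unidecode takes the Unicode string directly; no .encode() is needed.
            item['title'] = unidecode(title)
            yield item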

Since I use an ItemLoader, my code looks roughly like this:

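# -*- coding: utf-8 -*-
# On older Scrapy versions, import MapCompose from scrapy.loader.processors instead.
from itemloaders.processors import MapCompose
from scrapy.loader import ItemLoader
from unidecode import unidecode

class BaseItemLoader(ItemLoader):
    # Transliterate every value loaded into the 'title' field to ASCII.
    title_in = MapCompose(unidecode)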
