Problems Writing Scraped Data to CSV with Slavic Characters (UnicodeEncodeError & TypeError)
Intention / Wanted result: To scrape the link titles (i.e. the text of the links with each item) from a Czech website: https://www.bezrealitky.cz/vypis/nabidka-prodej/byt/praha An
Solution 1:
To scrape Czech websites and avoid these errors, I use the unidecode module. It transliterates Unicode text to plain ASCII, so a title such as "Žižkov" becomes "Zizkov" and can be written out without encoding errors.
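# -*- coding: utf-8 -*-
import scrapy
from unidecode import unidecode

# BezrealitkyItem is assumed to be defined in the project's items.py with a 'title' field.

class BezrealitkySpider(scrapy.Spider):
    name = 'bezrealitky'
    start_urls = [
        'https://www.bezrealitky.cz/vypis/nabidka-prodej/byt/praha'
    ]

    def parse(self, response):
        items = []
        for record in response.xpath('//*[starts-with(@class,"record")]'):
            # Query relative to the current record, not the whole response.
            title = record.xpath('.//div[@class="details"]/h2/a[@href]/text()').extract_first()
            if title is None:
                continue
            # Create a fresh item per record so earlier titles are not overwritten.
            item = BezrealitkyItem()
            # unidecode expects text, so pass the string without encoding it to bytes first.
            item['title'] = unidecode(title)
            items.append(item)
        return items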
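To make the effect of transliteration concrete, here is a rough standard-library approximation. It is not how unidecode works internally (unidecode uses lookup tables and covers many more scripts), and the function name strip_diacritics and the sample titles are made up for illustration.

```python
import unicodedata


def strip_diacritics(text):
    """Approximate ASCII transliteration for Latin text with diacritics."""
    # NFKD splits e.g. 'č' into 'c' plus a combining caron (U+030C).
    decomposed = unicodedata.normalize("NFKD", text)
    # Drop the combining marks and keep the base letters.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# Hypothetical titles, similar in shape to what the spider might extract.
sample_titles = ["Prodej bytu 2+kk, Praha 5 - Smíchov", "Pronájem, Žižkov"]
ascii_titles = [strip_diacritics(t) for t in sample_titles]
```

This handles the Czech alphabet, but letters with no Unicode decomposition (for example Polish "ł") pass through unchanged, which is one reason to prefer unidecode in the spider itself.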
Since I use an ItemLoader, my code looks roughly like this:
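# -*- coding: utf-8 -*-
from itemloaders.processors import MapCompose  # older Scrapy: scrapy.loader.processors
from scrapy.loader import ItemLoader
from unidecode import unidecode

class BaseItemLoader(ItemLoader):
    title_in = MapCompose(unidecode)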
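If the titles are written to CSV by hand rather than through Scrapy's feed exports, and losing the diacritics is not acceptable, an alternative is to set the file encoding explicitly. The following is a minimal sketch assuming Python 3; the helper names, the file name and the sample titles are hypothetical.

```python
import csv
import os
import tempfile

# Hypothetical sample data standing in for scraped titles.
titles = ["Prodej bytu 3+1, Praha 4 - Krč", "Prodej bytu 2+kk, Praha 8 - Libeň"]


def write_titles(path, rows):
    # An explicit encoding avoids depending on the platform default,
    # which is a common source of UnicodeEncodeError.
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["title"])
        for title in rows:
            writer.writerow([title])


def read_titles(path):
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        next(reader)  # skip the header row
        return [row[0] for row in reader]


path = os.path.join(tempfile.mkdtemp(), "titles.csv")
write_titles(path, titles)
```

Reading the file back with read_titles(path) should return the original titles with their Czech characters intact.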