
Feed Dataframe With Webscraping

I'm trying to append some scraped values to a dataframe. I have this code: import time import requests import pandas import pandas as pd from bs4 import BeautifulSoup from seleni

Solution 1:

The main problem you have is the locators.

1. First, compare the locators I use with the ones in your code.
2. Second, add explicit waits: from selenium.webdriver.support import expected_conditions as EC
3. Third, remove unnecessary code.

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC


driver = webdriver.Chrome(executable_path='/snap/bin/chromium.chromedriver')
url = "https://www.remax.pt/comprar?searchQueryState={%22regionName%22:%22%22,%22businessType%22:1,%22listingClass%22:1,%22page%22:1,%22sort%22:{%22fieldToSort%22:%22ContractDate%22,%22order%22:1},%22mapIsOpen%22:false,%22listingTypes%22:[],%22prn%22:%22%22}"
driver.get(url)
wait = WebDriverWait(driver, 10)
wait.until(EC.presence_of_all_elements_located((By.XPATH, "//div[@class='row results-list ']/div")))
rows = driver.find_elements(By.XPATH, "//div[@class='row results-list ']/div")
data = []
for row in rows:
    price = row.find_element(By.XPATH, ".//p[@class='listing-price']").text
    address = row.find_element(By.XPATH, ".//h2[@class='listing-address']").text
    listing_type = row.find_element(By.XPATH, ".//li[@class='listing-type']").text
    area = row.find_element(By.XPATH, ".//li[@class='listing-area']").text
    quartos = row.find_element(By.XPATH, ".//li[@class='listing-bedroom']").text
    data.append([price, address, listing_type, area, quartos])
driver.quit()

Please note that I did this on Linux; your Chrome driver location may be different. Also, to print the list use:

for p in data:
    print(p)

You can modify it as you like. I received the following output:

['240 000 €', 'Lisboa -  Lisboa, Carnide', 'Apartamento', '54 m\n2', '1']
['280 000 €', 'Lisboa -  Lisboa, Beato', 'Apartamento', '80 m\n2', '1']
['285 000 €', 'Lisboa -  Lisboa, Beato', 'Apartamento', '83 m\n2', '1']
['290 000 €', 'Lisboa -  Lisboa, Beato', 'Apartamento', '85 m\n2', '1']
['280 000 €', 'Lisboa -  Lisboa, Beato', 'Apartamento', '80 m\n2', '1']
['290 000 €', 'Lisboa -  Lisboa, Beato', 'Apartamento', '85 m\n2', '1']
['285 000 €', 'Lisboa -  Lisboa, Beato', 'Apartamento', '83 m\n2', '1']
['80 000 €', 'Santarém -  Cartaxo, Ereira e Lapa', 'Terreno', '12440 m\n2', '1']
['260 000 €', 'Lisboa -  Sintra, Queluz e Belas', 'Prédio', '454 m\n2', '1']
['37 500 €', 'Santarém -  Torres Novas, Torres Novas (Santa Maria, Salvador e Santiago)', 'Prédio', '92 m\n2', '1']
['505 000 €', 'Lisboa -  Sintra, Algueirão-Mem Martins', 'Duplex', '357 m\n2', '1']
['135 700 €', 'Lisboa -  Mafra, Milharado', 'Terreno', '310 m\n2', '1']
['132 800 €', 'Lisboa -  Mafra, Milharado', 'Terreno', '310 m\n2', '1']
['133 440 €', 'Lisboa -  Mafra, Milharado', 'Terreno', '310 m\n2', '1']
['179 000 €', 'Lisboa -  Mafra, Milharado', 'Terreno', '310 m\n2', '1']
['75 000 €', 'Lisboa -  Vila Franca de Xira, Vila Franca de Xira', 'Apartamento', '52 m\n2', '1']
['575 000 €', 'Porto -  Matosinhos, Matosinhos e Leça da Palmeira', 'Apartamento', '140 m\n2', '1']
['35 000 €', 'Setúbal -  Almada, Caparica e Trafaria', 'Outros - Habitação', '93 m\n2', '1']
['550 000 €', 'Leiria -  Alcobaça, Évora de Alcobaça', 'Moradia', '160 m\n2', '1']
['550 000 €', 'Lisboa -  Loures, Santa Iria de Azoia, São João da Talha e Bobadela', 'Moradia', '476 m\n2', '1']
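Since the original question was about feeding a dataframe, the scraped list can be loaded into pandas in one step. This is a minimal sketch assuming the rows come out in the same [price, address, type, area, bedrooms] order shown above; the column names and the hardcoded sample rows are my own illustration, not from the question.

```python
import pandas as pd

# Sample rows in the same shape as the scraped `data` list (assumed layout)
data = [
    ['240 000 €', 'Lisboa -  Lisboa, Carnide', 'Apartamento', '54 m\n2', '1'],
    ['280 000 €', 'Lisboa -  Lisboa, Beato', 'Apartamento', '80 m\n2', '1'],
]

df = pd.DataFrame(data, columns=['price', 'address', 'type', 'area', 'bedrooms'])

# The area strings carry a stray newline from the superscript markup
# ('54 m\n2'); strip it so the column reads '54 m2'.
df['area'] = df['area'].str.replace('\n', '', regex=False)

print(df)
```

Building the full list first and constructing the DataFrame once at the end is also faster than appending row by row inside the scraping loop.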
