Python Bs4 Scraper Only Returning First 9 Results From Each Page
I got this code set up working as intended - only it's not quite working as intended... Everything seemed to be going great until I checked my csv output file and noticed that I'm
Solution 1:
Be aware that you bear full responsibility for scraping Zillow. This is a technical answer for demonstration purposes only, as I've been warned by the site's developers before :).
import requests
import pandas as pd

# Browser-like headers sent with every request in the session.
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:89.0) Gecko/20100101 Firefox/89.0",
    "Accept": "*/*",
    "Accept-Language": "en-US,en;q=0.5",
    "Pragma": "no-cache",
    "Cache-Control": "no-cache"
}


def main(url):
    with requests.Session() as req:
        req.headers.update(headers)
        # Touch the home page first so the session keeps any cookies the site sets.
        req.head('https://www.zillow.com/')
        for item in range(1, 2):
            # item can be used here to loop by refactoring `cat1` to be `cat2` and so on
            params = {
                "searchQueryState": '{"pagination":{"currentPage":2},"usersSearchTerm":"Orange County, CA","mapBounds":{"west":-118.84559473828126,"east":-116.68678126171876,"south":33.34208982842918,"north":33.99173886991076},"regionSelection":[{"regionId":1286,"regionType":4}],"isMapVisible":true,"filterState":{"isAllHomes":{"value":true},"sortSelection":{"value":"globalrelevanceex"}},"isListVisible":true,"mapZoom":9}',
                "wants": '{"cat1":["mapResults"]}'
            }
            r = req.get(url, params=params)
            # The listings sit under cat1 -> searchResults -> mapResults in the JSON response.
            df = pd.DataFrame(r.json()['cat1']['searchResults']['mapResults'])
            print(df)
            df.to_csv('data.csv', index=False)


main('https://www.zillow.com/search/GetSearchPageState.htm')
Output:
         zpid       price  ...  streetViewMetadataURL  streetViewURL
0    25608235    $990,900  ...                    NaN            NaN
1    25586987  $1,070,100  ...                    NaN            NaN
2    25154858    $681,100  ...                    NaN            NaN
3    25486269    $834,200  ...                    NaN            NaN
4    25762795    $696,900  ...                    NaN            NaN
..        ...         ...  ...                    ...            ...
495  25538170    $975,000  ...                    NaN            NaN
496  25622055    $575,000  ...                    NaN            NaN
497  25657278    $649,900  ...                    NaN            NaN
498  63114426  $1,578,000  ...                    NaN            NaN
499  25643107     $89,900  ...                    NaN            NaN

[500 rows x 40 columns]
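
The request above hard-codes `"currentPage":2` inside a long JSON string, which makes it awkward to vary the page from the loop. As a minimal sketch, the same parameters can be built from a dictionary and serialized with `json.dumps`. The names `BASE_QUERY_STATE` and `build_params` are hypothetical helpers, not part of the original answer, and whether the endpoint actually returns different `mapResults` for different `currentPage` values is an assumption the answer does not confirm (its single request already returned 500 rows).

```python
import json

# Search state copied from the request in the answer; only the page number varies.
BASE_QUERY_STATE = {
    "pagination": {},
    "usersSearchTerm": "Orange County, CA",
    "mapBounds": {
        "west": -118.84559473828126,
        "east": -116.68678126171876,
        "south": 33.34208982842918,
        "north": 33.99173886991076,
    },
    "regionSelection": [{"regionId": 1286, "regionType": 4}],
    "isMapVisible": True,
    "filterState": {
        "isAllHomes": {"value": True},
        "sortSelection": {"value": "globalrelevanceex"},
    },
    "isListVisible": True,
    "mapZoom": 9,
}


def build_params(page):
    """Return the query parameters for one page (hypothetical helper)."""
    # Shallow copy, then replace the pagination entry so the base dict is untouched.
    state = dict(BASE_QUERY_STATE)
    state["pagination"] = {"currentPage": page}
    return {
        # Both values are sent as JSON-encoded strings, as in the original request.
        "searchQueryState": json.dumps(state),
        "wants": json.dumps({"cat1": ["mapResults"]}),
    }
```

Inside the loop, `params = build_params(item)` would then replace the literal dictionary, and the rest of the request stays the same.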