
Python Bs4 Scraper Only Returning First 9 Results From Each Page

I got this code set up working as intended - only it's not quite working as intended... Everything seemed to be going great until I checked my csv output file and noticed that I'm

Solution 1:

Be aware that you bear full responsibility for scraping Zillow. This is a technical answer for illustration only, as I've been warned by the site's developers before :).

import requests
import pandas as pd

headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:89.0) Gecko/20100101 Firefox/89.0",
    "Accept": "*/*",
    "Accept-Language": "en-US,en;q=0.5",
    "Pragma": "no-cache",
    "Cache-Control": "no-cache"
}


def main(url):
    with requests.Session() as req:
        req.headers.update(headers)
        req.head('https://www.zillow.com/')
        for item in range(1, 2):
            # item can be used here to loop by refactoring `cat1` to be `cat2` and so on
            params = {
                "searchQueryState": '{"pagination":{"currentPage":2},"usersSearchTerm":"Orange County, CA","mapBounds":{"west":-118.84559473828126,"east":-116.68678126171876,"south":33.34208982842918,"north":33.99173886991076},"regionSelection":[{"regionId":1286,"regionType":4}],"isMapVisible":true,"filterState":{"isAllHomes":{"value":true},"sortSelection":{"value":"globalrelevanceex"}},"isListVisible":true,"mapZoom":9}',
                "wants": '{"cat1":["mapResults"]}'
            }
            r = req.get(url, params=params)
            df = pd.DataFrame(r.json()['cat1']['searchResults']['mapResults'])
            print(df)
            df.to_csv('data.csv', index=False)


main('https://www.zillow.com/search/GetSearchPageState.htm')

Output:

         zpid       price  ... streetViewMetadataURL  streetViewURL
0    25608235    $990,900  ...                   NaN            NaN
1    25586987  $1,070,100  ...                   NaN            NaN
2    25154858    $681,100  ...                   NaN            NaN
3    25486269    $834,200  ...                   NaN            NaN
4    25762795    $696,900  ...                   NaN            NaN
..        ...         ...  ...                   ...            ...
495  25538170    $975,000  ...                   NaN            NaN
496  25622055    $575,000  ...                   NaN            NaN
497  25657278    $649,900  ...                   NaN            NaN
498  63114426  $1,578,000  ...                   NaN            NaN
499  25643107     $89,900  ...                   NaN            NaN

[500 rows x 40 columns]
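
The code above hard-codes `"currentPage":2` inside the `searchQueryState` string, so the loop variable `item` never reaches the request. A minimal sketch of one way to build the same parameters per page with `json.dumps` is shown below. The helper name `build_params` is hypothetical, and whether the endpoint actually returns different `mapResults` for each `currentPage` is an assumption that the answer does not confirm.

```python
import json


def build_params(page):
    """Return the query parameters for one results page (hypothetical helper)."""
    # Same search state as the hard-coded string in the answer, written as a
    # dict so the page number can be substituted.
    search_query_state = {
        "pagination": {"currentPage": page},
        "usersSearchTerm": "Orange County, CA",
        "mapBounds": {
            "west": -118.84559473828126,
            "east": -116.68678126171876,
            "south": 33.34208982842918,
            "north": 33.99173886991076,
        },
        "regionSelection": [{"regionId": 1286, "regionType": 4}],
        "isMapVisible": True,
        "filterState": {
            "isAllHomes": {"value": True},
            "sortSelection": {"value": "globalrelevanceex"},
        },
        "isListVisible": True,
        "mapZoom": 9,
    }
    # Compact separators reproduce the whitespace-free JSON used in the answer.
    return {
        "searchQueryState": json.dumps(search_query_state, separators=(",", ":")),
        "wants": json.dumps({"cat1": ["mapResults"]}, separators=(",", ":")),
    }
```

Inside the loop, `params = build_params(item)` would then replace the literal dict. Note that `df.to_csv('data.csv', index=False)` sits inside the loop too, so with more than one page each iteration would overwrite the previous file unless the frames are collected first.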
