Urllib2 Returns A Different Page Than The Browser Does?

June 22, 2024

I'm trying to scrape a page (my router's admin page), but the device seems to be serving a different page to urllib2 than to my browser. Has anyone seen this before? How can I get

Solution 1:

Use Firebug to watch which headers and cookies are sent to the server. Then emulate the same request with urllib2.Request and cookielib.

EDIT: You can also use mechanize.
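A minimal sketch of the urllib2.Request plus cookielib approach, not a definitive implementation. The router address and header values are hypothetical placeholders; the real ones should be copied from what the browser actually sends. The import fallback is there only because Python 3 renamed urllib2 to urllib.request and cookielib to http.cookiejar.

```python
try:  # Python 2 module names, as used in this thread
    import urllib2 as urlrequest
    import cookielib as cookiejar
except ImportError:  # Python 3 renamed both modules
    import urllib.request as urlrequest
    import http.cookiejar as cookiejar

# Hypothetical values: replace with what the browser's network panel shows.
ROUTER_URL = "http://192.168.1.1/"
BROWSER_HEADERS = {
    "User-Agent": "Mozilla/5.0",
    "Accept": "text/html,application/xhtml+xml",
    "Accept-Language": "en-US,en;q=0.5",
}


def make_opener_and_request(url, headers):
    """Return an opener that stores cookies, its jar, and a request with the given headers."""
    jar = cookiejar.CookieJar()
    opener = urlrequest.build_opener(urlrequest.HTTPCookieProcessor(jar))
    request = urlrequest.Request(url, headers=headers)
    return opener, jar, request


opener, jar, request = make_opener_and_request(ROUTER_URL, BROWSER_HEADERS)

# Uncomment to perform the actual network call against the device:
# html = opener.open(request).read()
```

Reusing the same opener for later requests means any cookies the device sets are sent back automatically.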

Solution 2:

It may be simpler than Wireshark to use Firebug to see the form of the request being made, and then emulate the same request in your code.

Solution 3:

Use Wireshark to see what your browser's request looks like, and add the missing parts so that your request looks the same.

To tweak urllib2 headers, try this.
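As a rough illustration of "adding the missing parts", the helper below copies onto a request every captured browser header that the request does not already carry. The function name, the captured header values, and the URL are all hypothetical. Skipping Host, Content-Length, Connection, and Accept-Encoding is an assumption of this sketch: urllib2 sets the first ones itself and does not decompress gzip responses on its own.

```python
try:  # Python 2 module name, as used in this thread
    import urllib2 as urlrequest
except ImportError:  # Python 3 renamed the module
    import urllib.request as urlrequest

# Headers that are deliberately not copied in this sketch (see the note above).
SKIPPED = {"host", "content-length", "connection", "accept-encoding"}


def add_missing_headers(request, browser_headers):
    """Add captured headers the request lacks; return the sorted names that were added."""
    added = []
    for name, value in browser_headers.items():
        if name.lower() in SKIPPED:
            continue
        # Request stores header names in capitalize() form, e.g. "User-agent".
        if not request.has_header(name.capitalize()):
            request.add_header(name, value)
            added.append(name)
    return sorted(added)


# Hypothetical capture from Wireshark or Firebug.
captured = {
    "Host": "192.168.1.1",
    "User-Agent": "Mozilla/5.0",
    "Referer": "http://192.168.1.1/",
    "Accept-Encoding": "gzip, deflate",
}
request = urlrequest.Request(
    "http://192.168.1.1/status.html",
    headers={"User-Agent": "Mozilla/5.0"},
)
added = add_missing_headers(request, captured)
```

In this example only Referer is added, because User-Agent is already present and the other two are skipped.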

Solution 4:

Probably this isn't working because you haven't supplied credentials for the admin page.

Use mechanize to load the login page and fill out the username/password.

Then you should have a cookie set to allow you to continue to the admin page.

It is much harder using just urllib2. You will need to manage the cookies yourself if you choose to stick to that route.
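If you do stick with urllib2, managing the cookies yourself might look roughly like the sketch below. It assumes the router uses a plain HTML login form that sets a session cookie; the URLs, the form field names (username, password), and the credentials are hypothetical and must be read off the real login page.

```python
try:  # Python 2 module names, as used in this thread
    import urllib2 as urlrequest
    import cookielib as cookiejar
    from urllib import urlencode
except ImportError:  # Python 3 equivalents
    import urllib.request as urlrequest
    import http.cookiejar as cookiejar
    from urllib.parse import urlencode

# Hypothetical URLs: take the real ones from the router's login form.
LOGIN_URL = "http://192.168.1.1/login.cgi"
ADMIN_URL = "http://192.168.1.1/admin.html"


def build_login_request(url, username, password):
    """Build a POST request carrying the form fields (field names are assumptions)."""
    form = urlencode({"username": username, "password": password}).encode("ascii")
    # Supplying data makes the request a POST instead of a GET.
    return urlrequest.Request(url, data=form, headers={"User-Agent": "Mozilla/5.0"})


# One jar and one opener for the whole session, so cookies persist across requests.
jar = cookiejar.CookieJar()
opener = urlrequest.build_opener(urlrequest.HTTPCookieProcessor(jar))
login_request = build_login_request(LOGIN_URL, "admin", "secret")

# Uncomment to talk to the device:
# opener.open(login_request)            # the server's session cookie lands in `jar`
# html = opener.open(ADMIN_URL).read()  # the same opener sends the cookie back
```

The key point is that both requests go through the same opener; a fresh urllib2.urlopen call would not carry the login cookie.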

Solution 5:

In my case it was one of the following:

1) The website detected that the access was not coming from a browser, so I had to fake a browser in Python like this:

# Build an opener to fake a browser... Google here I come!
opener = urllib2.build_opener()
# To fake the browser
opener.addheaders = [('User-agent', 'Mozilla/5.0')]
# Read the page
soup = BeautifulSoup(opener.open(url).read())

2) The content of the page was filled in dynamically by JavaScript. In that case, read the following post: https://stackoverflow.com/a/11460633/2160507
