
How To Resolve requests.get Not Working Over VPN?

I am trying to scrape a website using requests in Python.

url = 'https://stackoverflow.com/questions/23013220/max-retries-exceeded-with-url'
# set the headers like we are a browser

Solution 1:

In my case, the problem was related to IPv6.

Our VPN used split tunneling, and it seems the VPN configuration does not support IPv6.

For example, this would hang forever:

import requests
requests.get('https://pokeapi.co/api/v2/pokemon')

But if you add a timeout, the request succeeds:

requests.get('https://pokeapi.co/api/v2/pokemon', timeout=1)

But not all machines had this problem, so I compared the output of the following snippet between two machines:

import socket

for line in socket.getaddrinfo('pokeapi.co', 443):
    print(line)

The working one only returned IPv4 addresses. The non-working machine returned both IPv4 and IPv6 addresses.
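A slightly more compact version of that comparison, as a sketch using only the standard library: the hypothetical `addresses_by_family` helper below restricts the lookup to TCP (which avoids the duplicate entries `getaddrinfo` returns for each socket type) and splits the unique addresses into IPv4 and IPv6.

```python
import socket


def addresses_by_family(host, port=443):
    """Return the unique IPv4 and IPv6 addresses the resolver reports for host."""
    found = {"IPv4": set(), "IPv6": set()}
    # SOCK_STREAM only, so each address shows up once instead of once per socket type.
    for family, _type, _proto, _canonname, sockaddr in socket.getaddrinfo(
        host, port, type=socket.SOCK_STREAM
    ):
        if family == socket.AF_INET:
            found["IPv4"].add(sockaddr[0])
        elif family == socket.AF_INET6:
            found["IPv6"].add(sockaddr[0])
    return {name: sorted(addrs) for name, addrs in found.items()}
```

Calling `addresses_by_family("pokeapi.co")` on each machine would then show at a glance whether the IPv6 list is empty.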

So my theory is that, with a timeout specified, the IPv6 connection attempt fails quickly and Python then falls back to IPv4, where the request succeeds.
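That fallback matches how the standard library's `socket.create_connection` behaves: it tries each address returned by `getaddrinfo` in turn until one connects. The hypothetical helper below spells that loop out so you can see which address family actually succeeded; the per-attempt `timeout` is what keeps an unreachable IPv6 route from stalling everything. It is a simplified sketch, not the actual connection code used by requests/urllib3.

```python
import socket


def connect_first_reachable(host, port, timeout=1.0):
    """Try each resolved address in order; return (family_name, address, sock) for the first that connects."""
    errors = []
    for family, socktype, proto, _canonname, sockaddr in socket.getaddrinfo(
        host, port, type=socket.SOCK_STREAM
    ):
        sock = socket.socket(family, socktype, proto)
        # Bound each attempt so a dead IPv6 route fails fast instead of hanging.
        sock.settimeout(timeout)
        try:
            sock.connect(sockaddr)
        except OSError as exc:  # socket timeouts are OSError subclasses too
            sock.close()
            errors.append((sockaddr[0], exc))
            continue  # fall back to the next address, e.g. IPv4 after IPv6
        family_name = "IPv6" if family == socket.AF_INET6 else "IPv4"
        return family_name, sockaddr[0], sock
    raise OSError(f"could not connect to {host}:{port}: {errors}")
```

If the theory is right, `connect_first_reachable("pokeapi.co", 443)` on an affected machine would return an IPv4 result after the IPv6 attempt times out. The caller is responsible for closing the returned socket.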

Ultimately, we resolved this by disabling IPv6 on the machine (`networksetup` is a macOS command):

networksetup -setv6off "Wi-Fi"

I assume this could also be fixed in the VPN configuration instead.
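If disabling IPv6 system-wide is not an option, a narrower workaround is to make only your Python process resolve IPv4 addresses. The sketch below temporarily wraps `socket.getaddrinfo` so every lookup asks for `AF_INET` only. It assumes your HTTP client (requests, via urllib3) resolves hosts through `socket.getaddrinfo`, and because it swaps a module-level function, it affects every thread in the process while the `with` block is active. `force_ipv4` is a hypothetical name.

```python
import contextlib
import socket


@contextlib.contextmanager
def force_ipv4():
    """Temporarily make socket.getaddrinfo return IPv4 results only."""
    original = socket.getaddrinfo

    def ipv4_only(host, port, family=0, type=0, proto=0, flags=0):
        # Ignore the requested family and always ask the resolver for IPv4 records.
        return original(host, port, socket.AF_INET, type, proto, flags)

    socket.getaddrinfo = ipv4_only
    try:
        yield
    finally:
        # Restore the real resolver even if the code inside the block raised.
        socket.getaddrinfo = original
```

Usage would look like `with force_ipv4(): requests.get('https://pokeapi.co/api/v2/pokemon', timeout=5)`.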

Solution 2:

Try something like this:

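import requests
from fake_useragent import UserAgent  # third-party package that provides UserAgent()

url = "https://stackoverflow.com/questions/23013220/max-retries-exceeded-with-url"
ua = UserAgent()
headers = {"User-Agent": ua.random}
# download the homepage, ignoring proxy settings from the environment
s = requests.Session()
s.trust_env = False
response = s.get(url, headers=headers)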

The problem seems to be caused by a difference in the User-Agent settings.

Solution 3:

Try disabling trust_env on your session:

s = requests.Session()
s.trust_env = False  # don't trust environment settings for proxy configuration, default authentication and similar
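To see which environment proxy settings `trust_env` would otherwise pick up, you can inspect them with the standard library. `urllib.request.getproxies()` reads the `*_proxy` environment variables (on Windows and macOS it may fall back to the system configuration), and requests reads the same kind of environment settings when `trust_env` is true. The `report_environment_proxies` helper is hypothetical.

```python
import urllib.request


def report_environment_proxies():
    """Return the proxy mapping urllib derives from the environment, e.g. {'https': 'http://proxy:8080'}."""
    proxies = urllib.request.getproxies()
    # The 'no' key, if present, holds the NO_PROXY exclusion list rather than a proxy URL.
    return {scheme: url for scheme, url in proxies.items() if scheme != "no"}
```

If this returns a proxy that is unreachable over the VPN, that would explain why disabling `trust_env` helps.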

Alternatively, you can disable proxies for a particular domain, such as the one from the question:

import os
os.environ['NO_PROXY'] = 'stackoverflow.com'
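To sanity-check which hosts a given `NO_PROXY` value would exempt, the standard library's matcher can be called directly. Note that `urllib.request.proxy_bypass_environment` is not part of urllib's documented API, and requests applies its own, similar check, so treat this sketch as a quick approximation rather than requests' exact rule. `bypasses_proxy` is a hypothetical helper.

```python
import urllib.request


def bypasses_proxy(host, no_proxy):
    """Return True if host matches the comma-separated NO_PROXY list (exact or subdomain match)."""
    # Pass the list under the 'no' key, the same key getproxies() uses for NO_PROXY.
    return bool(urllib.request.proxy_bypass_environment(host, {"no": no_proxy}))
```

With `'stackoverflow.com'`, both `stackoverflow.com` and its subdomains are exempt, while a look-alike such as `notstackoverflow.com` is not.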

Solution 4:

In my organization, I have to run my program over a VPN from different geographic locations, so we have multiple proxy configurations.

I found it simpler to use a package called PyPAC to get my proxy details automatically:

from pypac import PACSession
from requests.auth import HTTPProxyAuth
session = PACSession()
# when a username and password are required:
# session = PACSession(proxy_auth=HTTPProxyAuth(name, password))

r = session.get('http://example.org')

How this works:

The package locates the PAC (proxy auto-config) file configured by the organization. This file contains the proxy configuration details.
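A PAC file defines a JavaScript function, `FindProxyForURL(url, host)`, whose return value is a semicolon-separated list of directives such as `PROXY proxy.example.com:8080; DIRECT`. The hypothetical helper below is not PyPAC's API; it only illustrates how such a value could be turned into a `proxies` dictionary for plain requests by taking the first usable directive. Mapping `SOCKS` to `socks5://` is an assumption, and `proxy.example.com` is a placeholder.

```python
def pac_value_to_proxies(pac_value):
    """Map the first usable directive of a FindProxyForURL() result to a requests-style proxies dict.

    Returns None for DIRECT (no proxy). Only PROXY and SOCKS directives are handled.
    """
    for directive in pac_value.split(";"):
        parts = directive.split()
        if not parts:
            continue  # skip empty entries, e.g. from a trailing semicolon
        kind = parts[0].upper()
        if kind == "DIRECT":
            return None
        if kind == "PROXY" and len(parts) == 2:
            url = f"http://{parts[1]}"
        elif kind == "SOCKS" and len(parts) == 2:
            # Assumes SOCKS5; requests needs the optional requests[socks] extra for socks5:// URLs.
            url = f"socks5://{parts[1]}"
        else:
            continue  # unrecognized directive; try the next one
        return {"http": url, "https": url}
    raise ValueError(f"no usable directive in PAC value: {pac_value!r}")
```

PyPAC does this kind of work for you (including locating and evaluating the PAC file), which is why `PACSession` needs no manual proxy settings.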
