Html Parsing Using Bs4
Solution 1:
The <p> tag you're looking for is very common in the document, and it doesn't have any unique attributes, so we can't select it directly.
A possible solution would be to select the tag by index, as in bloopiebloopie's answer. However, that only works if you know the exact position of the tag, and it breaks as soon as the page layout changes.
Another possible solution would be to find a neighbouring tag that has distinguishing attributes/text and select our tag in relation to that.
In this case we can find the previous tag, the one with the text "Maps & Images", and use find_next() to select the tag that follows it.
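import requests
from bs4 import BeautifulSoup

url = 'http://www.fortwiki.com/Battery_Adair'
r = requests.get(url)
soup = BeautifulSoup(r.text, "html.parser")
# Find the unique <b> label first; string= is the current name of the text= argument
b = soup.find('b', string='Maps & Images')
if b:
    # The coordinates are in the tag that immediately follows the label
    lat_long = b.find_next().text
    print(lat_long)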
This method should find the coordinate data on any www.fortwiki.com page that has a map.
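A self-contained way to try the find_next approach without fetching the page. The HTML below is a made-up stand-in (the real fortwiki markup isn't shown here), assumed to put the coordinates in the first <p> after the "Maps & Images" label. Passing "p" to find_next() is a slightly more defensive variant of the bare call, since it skips any non-<p> tag that happens to sit in between.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, assumed to resemble the layout described above:
# a <b> label followed later by the <p> that holds the coordinates.
html = """
<table>
  <tr><td><p>Some unrelated text</p></td></tr>
  <tr><td><b>Maps &amp; Images</b></td></tr>
  <tr><td><p>Lat: 24.5477038 Long: -81.8104541</p></td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

# Locate the unique label first; string= compares against the tag's text,
# and the &amp; entity has already been decoded to "&" by the parser.
label = soup.find("b", string="Maps & Images")

# find_next("p") walks forward in document order from the label and
# returns the first <p> it meets, ignoring the <p> that came before it.
coords = label.find_next("p").get_text(strip=True) if label else None
print(coords)  # Lat: 24.5477038 Long: -81.8104541
```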
Solution 2:
You can use a regular expression (the re module) to match partial text inside a tag.
import re
import requests
from bs4 import BeautifulSoup
url = 'http://www.fortwiki.com/Battery_Adair'
r = requests.get(url)
soup = BeautifulSoup(r.text, "html.parser")
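# Raw string (r'...') so \s, \d and \. reach the regex engine instead of being treated as string escapes
lat_long = soup.find('p', string=re.compile(r'Lat:\s\d+\.\d+\sLong:')).text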
print(lat_long)
# Lat: 24.5477038 Long: -81.8104541
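An offline sketch of the same idea against a hypothetical HTML snippet (not the real page). It shows that the compiled pattern only has to match part of the <p> text, and that the other <p> tags are skipped. The optional -? is an addition so a negative latitude would also match.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical snippet standing in for the fortwiki page: several <p> tags,
# only one of which holds the coordinates.
html = """
<div>
  <p>Battery Adair</p>
  <p>Lat: 24.5477038 Long: -81.8104541</p>
  <p>Maps &amp; Images</p>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

# BeautifulSoup calls .search() on a compiled pattern, so the pattern
# only needs to match the beginning of the coordinates, not the whole text.
pattern = re.compile(r"Lat:\s-?\d+\.\d+\sLong:")

p = soup.find("p", string=pattern)
print(p.get_text() if p else "no match")  # Lat: 24.5477038 Long: -81.8104541
```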
Solution 3:
I am not exactly sure what you want, but this works for me. There are probably neater ways of doing it; I am new to Python.
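import requests
from bs4 import BeautifulSoup

soup = BeautifulSoup(requests.get("http://www.fortwiki.com/Battery_Adair").content, "html.parser")
# Take the 9th <p> inside the first table of the article body (depends on its position)
x = soup.find("div", id="mw-content-text").find("table").find_all("p")[8]
x = x.get_text()
x = x.split("Long:")
lat = x[0].split(" ")[1]
lng = x[1].strip()  # remove the space left over after "Long:"
print("LAT = " + lat)
# LAT = 24.5477038
print("LNG = " + lng)
# LNG = -81.8104541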
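The split() calls above depend on the exact spacing in the text. A sketch of a somewhat sturdier parsing step using only the standard library: parse_lat_long and COORDS are hypothetical names, and the pattern assumes the text keeps the "Lat: ... Long: ..." shape seen on the Battery_Adair page. It returns floats, or None when no coordinates are found.

```python
import re

# Hypothetical pattern: assumes the "Lat: <number> Long: <number>" layout;
# \s* tolerates missing or extra whitespace around the numbers.
COORDS = re.compile(r"Lat:\s*(-?\d+(?:\.\d+)?)\s*Long:\s*(-?\d+(?:\.\d+)?)")


def parse_lat_long(text):
    """Return (lat, lng) as floats, or None if the text has no coordinates."""
    match = COORDS.search(text)
    if match is None:
        return None
    return float(match.group(1)), float(match.group(2))


print(parse_lat_long("Lat: 24.5477038 Long: -81.8104541"))  # (24.5477038, -81.8104541)
print(parse_lat_long("Lat:24.5477038   Long:-81.8104541"))  # (24.5477038, -81.8104541)
print(parse_lat_long("No coordinates here"))  # None
```

Using search() rather than match() lets the coordinates sit anywhere in the string, so the helper would also work on the text of a larger element than the single <p>.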