
Html Parsing Using Bs4

I am parsing an HTML page and am having a hard time figuring out how to pull a certain 'p' tag that has no class or id. I am trying to reach the 'p' tag that contains the lat and long.

Solution 1:

The <p> tag you're looking for is very common in the document, and it doesn't have any unique attributes, so we can't select it directly.

A possible solution would be to select the tag by index, as in bloopiebloopie's answer. However, that won't work unless you know the exact position of the tag.

Another possible solution would be to find a neighbouring tag that has distinguishing attributes/text and select our tag in relation to that. In this case we can find the previous tag with text: "Maps & Images", and use find_next to select the next tag.

import requests
from bs4 import BeautifulSoup

url = 'http://www.fortwiki.com/Battery_Adair'
r = requests.get(url)
soup = BeautifulSoup(r.text, "html.parser")

b = soup.find('b', text='Maps & Images')
if b:
    lat_long = b.find_next().text

This method should find the coordinates data in any www.fortwiki.com page that has a map.
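
To try the neighbour-based lookup without hitting the network, here is a minimal sketch on a made-up HTML fragment. The fragment is only my assumption of what the relevant part of the page looks like, not a copy of it. It also passes 'p' to find_next so that only a paragraph is accepted, which is a small variation on the code above.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment standing in for the real page (an assumption).
html = """
<table>
  <tr><td><p>Some unrelated paragraph</p></td></tr>
  <tr><td><b>Maps &amp; Images</b>
      <p>Lat: 24.5477038 Long: -81.8104541</p></td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

# Locate the distinctive neighbour first, then step to the next <p> after it.
b = soup.find("b", string="Maps & Images")
lat_long = b.find_next("p").get_text() if b else None
print(lat_long)
# Lat: 24.5477038 Long: -81.8104541
```

If the heading is missing, lat_long is None here instead of being left undefined.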

Solution 2:

You can use re to match partial text inside a tag.

import re
import requests
from bs4 import BeautifulSoup

url = 'http://www.fortwiki.com/Battery_Adair'
r = requests.get(url)
soup = BeautifulSoup(r.text, "html.parser")

# Raw string so that \s and \d reach the regex engine unchanged
lat_long = soup.find('p', text=re.compile(r'Lat:\s\d+\.\d+\sLong:')).text
print(lat_long)
# Lat: 24.5477038 Long: -81.8104541
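
If you need the two numbers rather than the whole string, a follow-up regex with capture groups can split them out. This is only a sketch that assumes the "Lat: <number> Long: <number>" layout shown in the output above; parse_lat_long and COORD_RE are names I made up for illustration.

```python
import re

# Assumes the "Lat: <number> Long: <number>" layout; either number may be negative.
COORD_RE = re.compile(r"Lat:\s*(-?\d+(?:\.\d+)?)\s+Long:\s*(-?\d+(?:\.\d+)?)")


def parse_lat_long(text):
    """Return (lat, long) as floats, or None if the text does not match."""
    match = COORD_RE.search(text)
    if match is None:
        return None
    return float(match.group(1)), float(match.group(2))


print(parse_lat_long("Lat: 24.5477038 Long: -81.8104541"))
# (24.5477038, -81.8104541)
```

Returning None for non-matching text lets the caller decide what to do with pages that have no coordinates.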

Solution 3:

I am not exactly sure what you want, but this works for me. There are probably neater ways of doing it. I am new to Python.

import requests
from bs4 import BeautifulSoup

soup = BeautifulSoup(requests.get("http://www.fortwiki.com/Battery_Adair").content, "html.parser")
x = soup.find("div", id="mw-content-text").find("table").find_all("p")[8]
x = x.get_text()
x = x.split("Long:")
lat = x[0].split(" ")[1]
long = x[1]
print("LAT = " + lat)
# LAT = 24.5477038 
print("LNG = " + long)
# LNG = -81.8104541
