Python Parse File For IP Addresses
Solution 1:
The $ anchor in your expression prevents you from finding anything but the last entry. Remove it, then use the list returned by .findall():
import re
ips = []  # collected addresses; text holds the file contents
found = re.findall(r'(?:[\d]{1,3})\.(?:[\d]{1,3})\.(?:[\d]{1,3})\.(?:[\d]{1,3})', text)  # no $ anchor
ips.extend(found)
re.findall() always returns a list, which may be empty.
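To see what the $ anchor does, compare the pattern with and without it on a small sample. The three-line input below is made up for illustration:

```python
import re

# Hypothetical sample input: three lines, each ending with an address.
text = "host a 10.0.0.1\nhost b 192.168.1.20\nhost c 172.16.0.5"

octets = r"(?:[\d]{1,3})\.(?:[\d]{1,3})\.(?:[\d]{1,3})\.(?:[\d]{1,3})"

# With a trailing $ (and no re.MULTILINE), only a match at the very end of the string qualifies.
anchored = re.findall(octets + r"$", text)

# Without the anchor, every occurrence is returned in order.
unanchored = re.findall(octets, text)

print(anchored)    # ['172.16.0.5']
print(unanchored)  # ['10.0.0.1', '192.168.1.20', '172.16.0.5']
```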
- If you only want unique addresses, use a set instead of a list.
- If you need to validate IP addresses (including ignoring private-use networks and local addresses), consider using the ipaddress.IPv4Address() class; its is_private and is_loopback attributes help with that filtering.
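Both suggestions can be combined in one pass. This is a sketch only: it assumes "local" means loopback and that ipaddress's is_private flag is an acceptable definition of private-use, and the function name public_ipv4_addresses is made up for this example:

```python
import ipaddress
import re

IP_PATTERN = re.compile(r"\d{1,3}(?:\.\d{1,3}){3}")


def public_ipv4_addresses(text):
    """Return the unique, valid, non-private, non-loopback IPv4 addresses in text."""
    unique = set()  # a set drops duplicates automatically
    for candidate in IP_PATTERN.findall(text):
        try:
            address = ipaddress.IPv4Address(candidate)
        except ipaddress.AddressValueError:
            continue  # e.g. 999.1.1.1 matches the regex but is not a valid address
        if address.is_private or address.is_loopback:
            continue  # skip private-use networks and local addresses
        unique.add(str(address))
    return unique


print(sorted(public_ipv4_addresses("8.8.8.8 10.0.0.1 127.0.0.1 999.1.1.1 8.8.8.8 1.1.1.1")))
# ['1.1.1.1', '8.8.8.8']
```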
Solution 2:
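The findall function returns a list of matches, and you aren't iterating through each match. The trailing $ also has to go; otherwise only an address at the very end of the text is found.
import re
# findall() always returns a list (possibly empty), so no None check is needed.
matches = re.findall(r'(?:[\d]{1,3})\.(?:[\d]{1,3})\.(?:[\d]{1,3})\.(?:[\d]{1,3})', text)
for match in matches:
    if match not in ips:
        ips.append(match)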
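The "if match not in ips" check scans the whole list on every match. A sketch of an alternative that keeps first-seen order while deduplicating with average constant-time lookups, relying on dicts preserving insertion order (guaranteed since Python 3.7); the function name unique_in_order is made up:

```python
import re


def unique_in_order(text):
    """Return each dotted-quad-looking string once, in order of first appearance."""
    matches = re.findall(r"\d{1,3}(?:\.\d{1,3}){3}", text)
    # dict keys are unique and keep insertion order, so later duplicates are dropped.
    return list(dict.fromkeys(matches))


print(unique_in_order("1.2.3.4 5.6.7.8 1.2.3.4"))  # ['1.2.3.4', '5.6.7.8']
```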
Solution 3:
Extracting IP Addresses From File
I answered a similar question in this discussion. In short, it's a solution based on one of my ongoing projects for extracting Network and Host Based Indicators from different types of input data (e.g. string, file, blog posting, etc.): https://github.com/JohnnyWachter/intel
I would import the CleanData and ExtractIPs classes from that project's Data and IPAddresses modules, then use them to accomplish your task in the following manner:
#!/usr/bin/env python
"""Extract IPv4 Addresses From Input File."""

from Data import CleanData  # Format and Clean the Input Data.
from IPAddresses import ExtractIPs  # Extract IPs From Input Data.


def get_ip_addresses(input_file_path):
    """
    Read contents of input file and extract IPv4 Addresses.
    :param input_file_path: fully qualified path to input file. Expecting str
    :returns: dictionary of IPv4 and IPv4-like Address lists
    :rtype: dict
    """
    input_data = []  # Empty list to house formatted input data.
    input_data.extend(CleanData(input_file_path).to_list())
    results = ExtractIPs(input_data).get_ipv4_results()
    return results
Now that you have a dictionary of lists, you can easily access the data you want and output it however you like. The example below uses the function above, printing the results to the console and writing them to a specified output file:
# Extract the desired data using the aforementioned function.
ipv4_results = get_ip_addresses('/path/to/input/file')
# Open your output file in 'append' mode.
with open('/path/to/output/file', 'a') as outfile:
    # Ensure that the list of valid IPv4 Addresses is not empty.
    if ipv4_results['valid_ips']:
        for ip_address in ipv4_results['valid_ips']:
            # Print to console.
            print(ip_address)
            # Write to output file, one address per line.
            outfile.write(ip_address + '\n')
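If you would rather not depend on the intel project, a similarly shaped result can be sketched with the standard library alone. This is not that project's implementation: only the 'valid_ips' key comes from the example above, while the 'invalid_ips' key and the function names classify_ipv4 and get_ip_addresses_stdlib are hypothetical:

```python
import ipaddress
import re

IPV4_LIKE = re.compile(r"\d{1,3}(?:\.\d{1,3}){3}")


def classify_ipv4(text):
    """Split IPv4-looking strings into valid and invalid lists (hypothetical layout)."""
    results = {"valid_ips": [], "invalid_ips": []}
    for candidate in IPV4_LIKE.findall(text):
        try:
            ipaddress.IPv4Address(candidate)
        except ipaddress.AddressValueError:
            results["invalid_ips"].append(candidate)  # e.g. an octet above 255
        else:
            results["valid_ips"].append(candidate)
    return results


def get_ip_addresses_stdlib(input_file_path):
    """Hypothetical stdlib-only stand-in for get_ip_addresses()."""
    with open(input_file_path) as infile:
        return classify_ipv4(infile.read())
```

Because it returns a dict with a 'valid_ips' list, the output loop above should work with it unchanged.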
Solution 4:
Without the re.MULTILINE flag, $ matches only at the end of the string (or just before a trailing newline).
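A small demonstration of the difference, using a made-up two-line sample:

```python
import re

text = "first 10.0.0.1\nsecond 192.168.0.7\n"
pattern = r"\d{1,3}(?:\.\d{1,3}){3}$"

# Without re.MULTILINE, $ matches only at the end of the string (or just before a final newline).
single = re.findall(pattern, text)
print(single)    # ['192.168.0.7']

# With re.MULTILINE, $ also matches just before every newline, so each line's trailing IP is found.
per_line = re.findall(pattern, text, re.MULTILINE)
print(per_line)  # ['10.0.0.1', '192.168.0.7']
```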
To make debugging easier, split the code into several parts that you can test independently:
import re

def extract_ips(data):
    return re.findall(r"\d{1,3}(?:\.\d{1,3}){3}", data)
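Because extract_ips() takes a string rather than a file, it can be checked on its own before any file handling is involved. A few quick assertions with made-up inputs, which also show that the pattern checks shape only:

```python
import re


def extract_ips(data):
    return re.findall(r"\d{1,3}(?:\.\d{1,3}){3}", data)


# Plain dotted-quad addresses are found, in order.
assert extract_ips("from 10.1.2.3 to 10.1.2.4") == ["10.1.2.3", "10.1.2.4"]
# No match yields an empty list, not None.
assert extract_ips("no addresses here") == []
# Out-of-range octets still match, because the pattern only checks the shape.
assert extract_ips("999.999.999.999") == ["999.999.999.999"]
# Integer and IPv6 forms are not recognised.
assert extract_ips("2130706433 1::1") == []
```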
The regex misses some valid IPs, e.g. 2130706433 (127.0.0.1 written as a single decimal number) or the IPv6 address "1::1". Conversely, it matches invalid strings such as 999.999.999.999. You could validate an IP string using socket.inet_aton() or the more general socket.inet_pton(). You could even split the input into pieces without searching for IPs at all, and use these functions to keep only the valid ones.
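A sketch of the split-then-validate idea using socket.inet_pton(), assuming whitespace-separated input (tokens with attached punctuation such as "10.0.0.1," would be rejected); the helper names is_valid_ip and valid_ips_in are made up. Note that inet_pton() accepts only the dotted-quad form for IPv4, so it rejects 2130706433; whether inet_aton() accepts that form depends on the platform's C library.

```python
import socket


def is_valid_ip(candidate):
    """Return True if candidate parses as an IPv4 or IPv6 address."""
    for family in (socket.AF_INET, socket.AF_INET6):
        try:
            socket.inet_pton(family, candidate)
        except (OSError, ValueError):
            continue  # not valid for this family; try the next one
        return True
    return False


def valid_ips_in(text):
    """Split on whitespace and keep only the tokens that are valid addresses."""
    return [token for token in text.split() if is_valid_ip(token)]


print(valid_ips_in("host 10.0.0.1 bad 999.999.999.999 v6 1::1"))  # ['10.0.0.1', '1::1']
```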
If the input file is small and you don't need to preserve the original order of the IPs:
with open(filename) as infile, open(outfilename, "w") as outfile:
    outfile.write("\n".join(set(extract_ips(infile.read()))))
Otherwise:
with open(filename) as infile, open(outfilename, "w") as outfile:
    seen = set()
    for line in infile:
        for ip in extract_ips(line):
            if ip not in seen:
                seen.add(ip)
                print(ip, file=outfile)
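In the same spirit of testable parts, the streaming loop can be wrapped in a function that takes file objects, so it can be exercised with io.StringIO instead of real files. A sketch; write_unique_ips is a made-up name:

```python
import io
import re


def extract_ips(data):
    return re.findall(r"\d{1,3}(?:\.\d{1,3}){3}", data)


def write_unique_ips(infile, outfile):
    """Copy each IP from infile to outfile once, in order of first appearance."""
    seen = set()
    for line in infile:  # reads one line at a time, so large files are fine
        for ip in extract_ips(line):
            if ip not in seen:
                seen.add(ip)
                print(ip, file=outfile)


source = io.StringIO("a 1.2.3.4\nb 5.6.7.8 1.2.3.4\n")
sink = io.StringIO()
write_unique_ips(source, sink)
# sink.getvalue() is now "1.2.3.4\n5.6.7.8\n"
```

With real files, pass the objects opened by the with statement above instead of the StringIO instances.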