What Is an Efficient Way To Check If the Current Word Is Close To A Word In a String?
Solution 1:
There are a lot of ways to approach this. This one solves all of your examples. I added a minimum similarity filter so that only the higher-quality matches are returned. This is what allows the 'ly' to be dropped in the last sample, as it is not all that close to any of the words.
You can install levenshtein with pip install python-Levenshtein
import Levenshtein

def find_match(str1, str2):
    min_similarity = .75
    output = []
    results = [[Levenshtein.jaro_winkler(x, y) for x in str1.split()] for y in str2.split()]
    for x in results:
        if max(x) >= min_similarity:
            output.append(str1.split()[x.index(max(x))])
    return output
Here are the results for each sample you proposed:
find_match("is looking good", "looks goo")
['looking','good']

find_match("you are really looking good", "lok goo")
['looking','good']

find_match("Stu is actually SEVERLY sunburnt....it hurts!!!", "hurts!!")
['hurts!!!']

find_match("you guys were absolutely amazing tonight, a...", "ly amazin")
['amazing']
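If installing a third-party package is not an option, the same thresholding idea can be sketched with the standard library. Note that this swaps in a different technique from the code above: it scores words with difflib.SequenceMatcher.ratio() rather than Jaro-Winkler, so the scores will differ. The name find_match_difflib and the 0.6 cutoff are assumptions chosen for illustration and may need tuning for your data.

```python
from difflib import SequenceMatcher


def find_match_difflib(str1, str2, min_similarity=0.6):
    """Return, for each word of str2, the closest word of str1 above a cutoff.

    Hypothetical helper: uses difflib's ratio, not Jaro-Winkler.
    """
    candidates = str1.split()
    output = []
    for query in str2.split():
        # Score the query word against every candidate word.
        scored = [(SequenceMatcher(None, word, query).ratio(), word)
                  for word in candidates]
        if not scored:
            continue
        best_score, best_word = max(scored, key=lambda pair: pair[0])
        # Keep only matches that clear the similarity cutoff.
        if best_score >= min_similarity:
            output.append(best_word)
    return output


print(find_match_difflib("is looking good", "looks goo"))
```

As with the Jaro-Winkler version, the cutoff is what filters out weak fragments such as 'ly'.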
Solution 2:
Like this:
str1 = "wow...it looks amazing"
str2 = "looks amazi"
str3 = []

# Checking for similar strings in both strings:
for n in str1.split():
    for m in str2.split():
        if m in n:
            str3.append(n)

# If found 2 similar strings:
if len(str3) == 2:
    # If their indexes align:
    if str1.split().index(str3[1]) - str1.split().index(str3[0]) == 1:
        print(' '.join(str3))
elif len(str3) == 1:
    print(str3[0])
Output:
looks amazing
UPDATE with condition given by the OP:
str1 = "good..."
str2 = "god.."
str3 = []

# Checking for similar strings in both strings:
for n in str1.split():
    for m in str2.split():
        # Calculating matching character in the 2 words:
        c = ''
        for i in m:
            if i in n:
                c += i
        # If the amount of matching characters is greater or equal to 50% the length of the larger word
        # or the smaller word is in the larger word:
        if len(list(c)) >= len(n)*0.50 or m in n:
            str3.append(n)

# If found 2 similar strings:
if len(str3) == 2:
    # If their indexes align:
    if str1.split().index(str3[1]) - str1.split().index(str3[0]) == 1:
        print(' '.join(str3))
elif len(str3) == 1:
    print(str3[0])
Solution 3:
I worked through it with regular expressions:
import re

def check_regex(str1, str2):
    # New list to store the updated value
    str_new = []
    for i in str2:
        # regular expression for comparing the strings
        x = ['['+i+']', '^'+i, i+'$', '('+i+')']
        for k in x:
            h = 0
            for j in str1:
                # Conditions to make sure the word is close enough to the particular word
                if "".join(re.findall(k, j)) == i or ("".join(re.findall(k, j)) in i and abs(len("".join(re.findall(k, j)))-len(i)) == 1 and len(i) != 2):
                    str_new.append(j)
                    h = 1
                    break
            if h == 1:
                break
    return str_new

str1 = input().split()
str2 = input().split()
print(" ".join(check_regex(str1, str2)))
Solution 4:
You can use the Jaccard coefficient in this case. First, split both the first and the second string on spaces. Then, for every word in str2, compute the Jaccard coefficient against every word in str1, and replace the word with the one from str1 that gives the highest Jaccard coefficient.
You can use sklearn.metrics.jaccard_score.
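The steps above can be sketched in plain Python, without scikit-learn. This is a minimal sketch under one assumption the answer does not spell out: each word is treated as the set of its characters. The names jaccard and closest_words are hypothetical helpers, not part of any library.

```python
from typing import List


def jaccard(a: str, b: str) -> float:
    """Jaccard coefficient of the character sets of two words (an assumption)."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    if not union:
        return 0.0
    return len(set_a & set_b) / len(union)


def closest_words(str1: str, str2: str) -> List[str]:
    """For each word in str2, pick the word in str1 with the highest coefficient."""
    candidates = str1.split()
    if not candidates:
        return []
    return [max(candidates, key=lambda word: jaccard(word, query))
            for query in str2.split()]


print(closest_words("wow...it looks amazing", "looks amazi"))
```

Because character sets ignore order and repetition, "good" and "god" score as identical here. Unlike Solution 1, this sketch always returns the best candidate, so a minimum-score filter may be worth adding if weak matches should be dropped.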