
Matching Strings Between Two Characters

I answered a question the other day about finding the strings that occur between two specified characters. I ended up with this fairly basic regular expression: >>> impor

Solution 1:

I'll explain why your pattern behaved that way. For overlapping matches, see the answer already provided by cᴏʟᴅsᴘᴇᴇᴅ, which uses the regex module's findall method with the overlapped=True keyword argument.


Your pattern matches like that because the space at the start of the regex matches the first space in the input, and the non-greedy quantifier .*? then matches as little as possible between that space and the next (. So it is operating correctly. To see this more clearly, try the input string 'here is an example()another example()'.
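Here is a minimal sketch of that behaviour with the standard re module. The question text above is cut off, so the pattern used here is an assumption: it is the one that Solution 2 builds from ' ' and '('.

```python
import re

# Assumed pattern: a space, a lazy capture, then a literal "(".
pattern = r' (.*?)\('

single = re.findall(pattern, 'here is an example()')
double = re.findall(pattern, 'here is an example()another example()')

print(single)  # ['is an example']
print(double)  # ['is an example', 'example']
```

In both inputs the first match starts at the first space, and the lazy quantifier only controls where the match ends, not where it starts.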

Now, to get the shortest match in this case, you can use a zero-width negative lookahead to ensure that no further space appears after the one that starts the match:

 (?!.* )(.*?)\(

So:

In [81]: re.findall(r' (?!.* )(.*?)\(', 'here is an example()')
Out[81]: ['example']
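One caveat worth sketching: the lookahead rejects a space if any other space follows it on the line, so only the last space can start a match. If the input might contain several such groups, a negated character class is one possible alternative. The second pattern and the sample text below are illustrative assumptions, not part of the original answer.

```python
import re

# Hypothetical input with two "word(" groups.
text = 'here is an example() and another one()'

# Lookahead version: only the last space on the line can start a match.
lookahead = re.findall(r' (?!.* )(.*?)\(', text)

# Negated class: the captured text may contain neither a space nor "(".
negated = re.findall(r' ([^ (]*)\(', text)

print(lookahead)  # ['one']
print(negated)    # ['example', 'one']
```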

Solution 2:

You're looking for overlapping regex matching. Simply put, the standard re module in Python has no direct option for this.

You can, however, use the regex module (pip install it first). Call regex.findall and set overlapped=True.

import re
import regex

a, b = ' ', '('
text = 'here is an example()'

# Escape both delimiters, then ask regex for overlapping matches.
regex.findall('{}(.*?){}'.format(*map(re.escape, (a, b))), text, overlapped=True)
# ['is an example', 'an example', 'example']
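If installing regex is not an option, a commonly used workaround with the standard re module is to wrap the whole pattern in a zero-width lookahead, so that each match consumes no characters and the scan can restart one position later. This is only a sketch of that idea, not what the answer above proposes, and it reuses the same a, b and text values.

```python
import re

a, b = ' ', '('
text = 'here is an example()'

# The outer (?=...) makes every match zero-width; the inner group is what findall returns.
pattern = '(?={}(.*?){})'.format(re.escape(a), re.escape(b))
overlapping = re.findall(pattern, text)

print(overlapping)  # ['is an example', 'an example', 'example']
```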
