
Extract The Name And Span Of Regex Matched Groups

I have a regex that looks like: rgx = '(?P<foo>ABC)(?P<bar>DEF)?(?P<norf>HIJK)'. Getting the matched string is no problem: m.group(name). However, I need to extra

Solution 1:

Iterate over the group names (the keys of m.groupdict()) and print the span of each one with m.span(name):

import re

rgx = '(?P<foo>ABC)(?P<bar>DEF)?(?P<norf>HIJK)'
p = re.compile(rgx, re.IGNORECASE)
m = p.match('ABCDEFHIJKLM')
for key in m.groupdict():
    print(key, m.span(key))

This prints:

foo (0, 3)
bar (3, 6)
norf (6, 10)
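
Because bar is optional, it may not take part in the match at all. In that case m.groupdict() still lists it (with the value None) and m.span('bar') returns (-1, -1). A minimal sketch of skipping such groups, using a hypothetical helper named named_spans:

```python
import re

# Same pattern as above; the "bar" group is optional.
rgx = '(?P<foo>ABC)(?P<bar>DEF)?(?P<norf>HIJK)'
p = re.compile(rgx, re.IGNORECASE)


def named_spans(m):
    """Hypothetical helper: {name: (start, end)} for named groups that participated.

    A group that did not take part in the match has the value None in
    groupdict() and the span (-1, -1), so it is left out.
    """
    return {name: m.span(name)
            for name, value in m.groupdict().items()
            if value is not None}


m = p.match('ABCHIJK')   # no "DEF", so "bar" does not participate
print(m.span('bar'))     # (-1, -1)
print(named_spans(m))    # {'foo': (0, 3), 'norf': (3, 7)}
```

Checking value is not None (rather than testing for an empty string) keeps a group that matched an empty string, since such a group still has a valid span.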

Edit: In Python 2 (and in Python 3 before 3.7), dictionary keys have no guaranteed order, so you may wish to choose the iteration order explicitly. In the example below, sorted(...) returns the group names ordered by their span tuples, i.e. by where each group sits in the matched string:

for key in sorted(m.groupdict(), key=m.span):
    print(key, m.span(key))
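
Sorting by span has one side effect worth knowing: a group that did not participate has the span (-1, -1), so it sorts to the front. The sketch below (assuming the same pattern) contrasts that with sorting by group number, taken from the compiled pattern's groupindex mapping, which keeps the order in which the groups are defined:

```python
import re

rgx = '(?P<foo>ABC)(?P<bar>DEF)?(?P<norf>HIJK)'
p = re.compile(rgx, re.IGNORECASE)

m = p.match('ABCHIJK')  # "bar" does not participate in this match

# Sorting by span: the non-participating group, with span (-1, -1), comes first.
by_span = sorted(m.groupdict(), key=m.span)
print(by_span)    # ['bar', 'foo', 'norf']

# Sorting by group number: definition order, whether or not a group matched.
by_number = sorted(m.groupdict(), key=p.groupindex.get)
print(by_number)  # ['foo', 'bar', 'norf']
```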

Solution 2:

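You can use the compiled pattern's groupindex attribute (RegexObject.groupindex in Python 2, re.Pattern.groupindex in Python 3), which maps each group name to its group number: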

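import re

rgx = '(?P<foo>ABC)(?P<bar>DEF)?(?P<norf>HIJK)'
p = re.compile(rgx, re.IGNORECASE)
m = p.match('ABCDEFHIJK')

# groupindex maps name -> group number; sorting by number gives definition order
for name, n in sorted(m.re.groupindex.items(), key=lambda x: x[1]):
    print(name, m.group(n), m.span(n))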
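
If the pattern can occur more than once in the input, the same groupindex lookup can be applied to every match returned by re.finditer. A minimal sketch, using a hypothetical helper all_named_spans and a made-up sample string:

```python
import re

rgx = '(?P<foo>ABC)(?P<bar>DEF)?(?P<norf>HIJK)'
p = re.compile(rgx, re.IGNORECASE)


def all_named_spans(pattern, text):
    """Hypothetical helper: for each match, a list of (name, text, span) tuples.

    Names are ordered by group number, taken from pattern.groupindex.
    Non-participating groups appear with text None and span (-1, -1).
    """
    names = sorted(pattern.groupindex, key=pattern.groupindex.get)
    return [[(name, m.group(name), m.span(name)) for name in names]
            for m in pattern.finditer(text)]


for groups in all_named_spans(p, 'ABCHIJK xx abcdefhijk'):
    print(groups)
```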
