regex - Find all substrings with at least one group


I'm trying to find the substrings in a string that meet a condition.

Let's say we've got this string:

s = 'some text 1a 2a 3 xx sometext 1b yyy text 2b.' 

I need to apply the search pattern {(one (group of words), two (another group of words), three (another group of words)), word}. The first three positions are optional, but at least one of them must be present. If so, I need the word after them. The output should be:

2a  1a  3 xx
1b  yyy
2b

I wrote this expression:

import re

find_it = re.compile(r"((?P<one>\b1a\s|\b1b\s)|" +
                     r"(?P<two>\b2a\s|\b2b\s)|" +
                     r"(?P<three>\b3\s|\b3b\s))+" +
                     r"(?P<word>\w+)?")
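Since each group holds its own list of words, one way to avoid writing the alternations by hand is to build them from lists. This is a minimal sketch; the lists ONE, TWO, THREE and the helper alternation() are hypothetical names, filled here with the words from the question so the generated pattern can be compared with the hand-written one.

```python
import re

# Hypothetical word lists; in practice each group would hold its own words.
ONE = ["1a", "1b"]
TWO = ["2a", "2b"]
THREE = ["3", "3b"]


# Hypothetical helper: turn a word list into alternatives of the form \b<word>\s.
# re.escape() guards against words that contain regex metacharacters.
def alternation(words):
    return "|".join(r"\b" + re.escape(w) + r"\s" for w in words)


pattern = ("((?P<one>" + alternation(ONE) + ")|"
           "(?P<two>" + alternation(TWO) + ")|"
           "(?P<three>" + alternation(THREE) + "))+"
           r"(?P<word>\w+)?")
find_it = re.compile(pattern)
print(find_it.pattern)
```

With these lists the generated pattern is identical to the one written out by hand above; only the lists change when the real word sets are used.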

Every group contains a set of different words (not just 1a, 1b), so I can't mix them into one group. A group should be None if it is empty. The result is wrong:

find_it.findall(s)
> 2a  1a  2a   3 xx
> 1b  1b    yyy
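findall() returns one tuple per match (all groups, including the unnamed outer one) and reports a group that did not take part in the match as an empty string, which is where the blanks come from. If None is wanted for empty groups, one option, sketched here with the question's corrected pattern, is finditer() with Match.groupdict(), which reports non-participating named groups as None. Note also that 2b is never matched: \b2b\s requires whitespace after 2b, and the string has a dot there.

```python
import re

s = 'some text 1a 2a 3 xx sometext 1b yyy text 2b.'
find_it = re.compile(r"((?P<one>\b1a\s|\b1b\s)|"
                     r"(?P<two>\b2a\s|\b2b\s)|"
                     r"(?P<three>\b3\s|\b3b\s))+"
                     r"(?P<word>\w+)?")

# findall(): one tuple per match, non-participating groups become ''
tuples = find_it.findall(s)
print(tuples[1])  # ('1b ', '1b ', '', '', 'yyy')

# finditer() + groupdict(): only named groups, non-participating ones are None
dicts = [m.groupdict() for m in find_it.finditer(s)]
print(dicts[1])  # {'one': '1b ', 'two': None, 'three': None, 'word': 'yyy'}
```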

I'd be grateful for any help!

You can use the following regex:

>>> reg = re.compile(r'((?:(?:[12][ab]|3b?)\s?)+(?:\w+|\.))')
>>> reg.findall(s)
['1a 2a 3 xx', '1b yyy', '2b.']
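A quick way to see what the optional \s? permits: on a made-up input where two codes are glued together (1a2a, not taken from the question), this regex still returns them as a single match, while on the question's string it gives the result shown above.

```python
import re

s = 'some text 1a 2a 3 xx sometext 1b yyy text 2b.'
reg = re.compile(r'((?:(?:[12][ab]|3b?)\s?)+(?:\w+|\.))')

print(reg.findall(s))  # ['1a 2a 3 xx', '1b yyy', '2b.']

# With \s? optional, codes do not need to be separated by whitespace.
print(reg.findall('1a2a xx'))  # ['1a2a xx']
```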

This is a concise regex that uses character classes and the ? modifier. Its core is the following alternation, which has two parts:

[12][ab]|3b? 

[12][ab] matches 1a, 1b, 2a and 2b, and 3b? matches 3b and 3.
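The claim about the core alternation can be checked with fullmatch() against a few tokens; 3a and 4a are extra tokens added here only as negative cases.

```python
import re

core = re.compile(r'[12][ab]|3b?')

tokens = ['1a', '1b', '2a', '2b', '3', '3b', '3a', '4a']
# fullmatch() requires the whole token to match, not just a prefix
matched = [t for t in tokens if core.fullmatch(t)]
print(matched)  # ['1a', '1b', '2a', '2b', '3', '3b']
```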

And if you don't want the dot at the end of 2b, you can use the following regex, which uses a positive lookahead and is more general than the preceding one (because making \s optional in the first group is not a good idea):

>>> reg = re.compile(r'((?:(?:[12][ab]|3b?)\s)+\w+|(?:(?:[12][ab]|3b?))+(?=\.|$))')
>>> reg.findall(s)
['1a 2a 3 xx', '1b yyy', '2b']
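The lookahead (?=\.|$) only checks what follows without consuming it, so the dot stays outside the match. This small sketch also uses a second, made-up string that ends without a dot to exercise the $ branch.

```python
import re

reg = re.compile(r'((?:(?:[12][ab]|3b?)\s)+\w+|(?:(?:[12][ab]|3b?))+(?=\.|$))')

s = 'some text 1a 2a 3 xx sometext 1b yyy text 2b.'
last = list(reg.finditer(s))[-1]
# The match ends right before the dot; the dot itself was not consumed.
print(last.group(), s[last.end()])  # 2b .

# Hypothetical input without a trailing dot: the $ branch applies instead.
print(reg.findall('some text 1b yyy text 2b'))  # ['1b yyy', '2b']
```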

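Also, if the numbers and letters in your example substrings are just instances of a more general pattern, you can use [0-9][a-z] in a more general regex (the output below comes from a version of s that also contains the substring 5h 9 7y examole):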

>>> reg = re.compile(r'((?:[0-9][a-z]?\s)+\w+|(?:[0-9][a-z]?)+(?=\.|$))')
>>> reg.findall(s)
['1a 2a 3 xx', '1b yyy', '5h 9 7y examole', '2b']
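The exact string behind that output is not shown. As an assumption, inserting 5h 9 7y examole before 2b in the question's string (a hypothetical reconstruction, named s2 here) reproduces the same result with the general regex:

```python
import re

reg = re.compile(r'((?:[0-9][a-z]?\s)+\w+|(?:[0-9][a-z]?)+(?=\.|$))')

# Hypothetical test string: the question's s with one extra substring added.
s2 = 'some text 1a 2a 3 xx sometext 1b yyy text 5h 9 7y examole 2b.'
print(reg.findall(s2))
# ['1a 2a 3 xx', '1b yyy', '5h 9 7y examole', '2b']
```

Here [a-z]? lets a bare digit such as 9 count as a code, in the same way that 3b? accepts a bare 3 in the narrower regex.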
