I'm trying to find the substrings of a string that meet a condition.
Let's say we've got the string:
s = 'some text 1a 2a 3 xx sometext 1b yyy text 2b.'
I need to apply a search pattern of the form {(one (a group of words), two (another group of words), three (another group of words)), word}. The first three positions are optional, but there should be at least one of them. If there is, I need the word after them. The output should be:
2a 1a 3 xx 1b yyy 2b
I wrote this expression:
find_it = re.compile(r"((?P<one>\b1a\s|\b1b\s)|" +
                     r"(?P<two>\b2a\s|\b2b\s)|" +
                     r"(?P<three>\b3\s|\b3b\s))+" +
                     r"(?P<word>\w+)?")
Every group can contain a set of different words (not just 1a, 1b), and I can't mix them into one group. A group should be None if it is empty. The result is wrong:
find_it.findall(s)
> 2a 1a 2a 3 xx
> 1b 1b yyy
I would be grateful for any help!
You can use the following regex:
>>> import re
>>> reg = re.compile(r'((?:(?:[12][ab]|3b?)\s?)+(?:\w+|\.))')
>>> reg.findall(s)
['1a 2a 3 xx', '1b yyy', '2b.']
This is a concise regex that uses character classes and the ? modifier. Its core contains two parts:
[12][ab]|3b?
[12][ab] matches 1a, 1b, 2a and 2b, and 3b? matches 3b and 3.
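To see what that core alternation accepts on its own, here is a small sketch that tests it against a few single tokens with fullmatch. The names core, tokens and accepted are only for illustration and are not part of the regex above.

```python
import re

# The core alternation from the answer, without the repetition,
# whitespace and trailing-word parts.
core = re.compile(r'[12][ab]|3b?')

# Hypothetical sample tokens, chosen only to illustrate the pattern.
tokens = ['1a', '1b', '2a', '2b', '3', '3b', '3a', 'xx']

# fullmatch requires the whole token to match, so '3a' and 'xx' are rejected.
accepted = [t for t in tokens if core.fullmatch(t)]
print(accepted)
```

Only the six tokens listed in the explanation above should survive the filter.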
And if you don't want the dot at the end of 2b, you can use the following regex, which uses a positive lookahead and is more general than the preceding one (because making \s optional in the first group is not a good idea):
>>> reg = re.compile(r'((?:(?:[12][ab]|3b?)\s)+\w+|(?:(?:[12][ab]|3b?))+(?=\.|$))')
>>> reg.findall(s)
['1a 2a 3 xx', '1b yyy', '2b']
Also, if the numbers and example substrings are just instances, you can use [0-9][a-z]? as a more general regex (the output below is for a string s that also contains 5h 9 7y examole):
>>> reg = re.compile(r'((?:[0-9][a-z]?\s)+\w+|(?:[0-9][a-z]?)+(?=\.|$))')
>>> reg.findall(s)
['1a 2a 3 xx', '1b yyy', '5h 9 7y examole', '2b']
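The question also asks for each group separately, with None where a group is empty. The regexes above return whole substrings, so here is a rough sketch of one possible way to get the per-group values, assuming named groups with finditer and groupdict are acceptable. The names pattern and rows are mine, and the alternatives inside each group are just the sample words from the question. It relies on Python's re keeping the last value captured by a group across repetitions of the enclosing (?:...)+.

```python
import re

s = 'some text 1a 2a 3 xx sometext 1b yyy text 2b.'

# Each named group captures only the keyword itself; the whitespace is
# consumed outside the group. \b keeps keywords from matching inside
# longer words. Replace the alternatives with your own word sets.
pattern = re.compile(
    r"(?:(?P<one>\b(?:1a|1b)\b)\s*"
    r"|(?P<two>\b(?:2a|2b)\b)\s*"
    r"|(?P<three>\b(?:3|3b)\b)\s*)+"
    r"(?P<word>\w+)?"
)

# groupdict() maps every named group to its text, or to None if that
# group did not take part in the match.
rows = [m.groupdict() for m in pattern.finditer(s)]
for row in rows:
    print(row)
```

With the sample string this should give three dictionaries: one with all four keys filled, one with only one and word, and one with only two (there is no word after 2b, so word is None there). If the same group can occur twice in one run, only its last value is kept.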