I am trying to search for exact words in a file. I read the file line by line and loop through the lines to find the exact words. Since the in keyword is not suitable for finding exact words, I am using a regex pattern.
def findword(w):
    return re.compile(r'\b({0})\b'.format(w), flags=re.IGNORECASE).search
The problem is that this function doesn't recognize square brackets, as in [xyz].
For example,
findword('data_var_cod[0]')('cod_byte1 = data_var_cod[0]') returns None, whereas
findword('data_var_cod')('cod_byte1 = data_var_cod') returns <_sre.SRE_Match object at 0x0000000015622288>
Can someone please help me tweak the regex pattern?
It's because the regex engine treats square brackets as a character class, since they are regex special characters. To get rid of the problem you need to escape the regex special characters. You can use the re.escape function:
def findword(w):
    return re.compile(r'\b({0})\b'.format(re.escape(w)), flags=re.IGNORECASE).search
Also, as a more Pythonic way to get all matches, you can use re.findall(), which returns a list of matches, or re.finditer(), which returns an iterator of match objects.
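As a minimal sketch of what escaping changes, the snippet below compares the unescaped and escaped patterns and then collects every occurrence with re.findall() and re.finditer(). The sample strings and variable names are made up for illustration, and word boundaries are left out here so that only the effect of re.escape is shown.

```python
import re

term = 'data_var_cod[0]'
text = 'a = data_var_cod[0]; b = DATA_VAR_COD[0]'

# Unescaped, [0] is a character class: the pattern means
# "data_var_cod" followed by the single character "0".
unescaped_hit = re.search(term, 'x = data_var_cod[0]')   # no match
class_hit = re.search(term, 'x = data_var_cod0')         # matches

# re.escape makes the brackets literal.
escaped = re.escape(term)
pattern = re.compile(escaped, flags=re.IGNORECASE)

# findall returns the matched strings, finditer yields match objects.
all_matches = pattern.findall(text)
starts = [m.start() for m in pattern.finditer(text)]

print(escaped)
print(all_matches)
print(starts)
```

re.finditer() is the better fit when you also need positions, since each match object carries start() and end().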
But this is still not a complete or efficient solution, because \b only matches between a word character and a non-word character. A search term that starts or ends with a non-word character, such as a bracket, will therefore fail to match when it sits next to a space or at the edge of the string:
>>> ss = 'hello string [processing] in python.'
>>> re.compile(r'\b({0})\b'.format(re.escape('[processing]')), flags=re.IGNORECASE).search(ss)
>>>
>>> re.compile(r'({})'.format(re.escape('[processing]')), flags=re.IGNORECASE).search(ss).group(0)
'[processing]'
So I suggest removing the word boundaries if your words contain non-word characters.
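One possible way to apply that suggestion automatically is to add \b only on the side where the search term actually begins or ends with a word character. This is a sketch under that assumption, not part of the original answer, and compile_exact is a hypothetical helper name.

```python
import re

def compile_exact(term):
    """Hypothetical helper: use \\b only where the term has a word character."""
    # \b is meaningful only next to a word character (\w).
    prefix = r'\b' if re.match(r'\w', term) else ''
    suffix = r'\b' if re.search(r'\w\Z', term) else ''
    return re.compile(prefix + re.escape(term) + suffix, flags=re.IGNORECASE)

ss = 'hello string [processing] in python.'

bracketed = compile_exact('[processing]').search(ss)
indexed = compile_exact('data_var_cod[0]').search('cod_byte1 = data_var_cod[0]')
# A plain word still gets both boundaries, so it does not match inside a longer name.
partial = compile_exact('data_var_cod').search('x = data_var_cod2')

print(bracketed.group(0))
print(indexed.group(0))
print(partial)
```

With this approach 'data_var_cod[0]' keeps the leading boundary but drops the trailing one, because ']' is not a word character.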
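But as a more general way, you can use the following regex, which uses a non-capturing group and a positive look-ahead to match words that are preceded by a space or the start of the string, and followed by a space, a period, or the end of the string:
r'(?: |^)({})(?=[. ]|$)'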
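As a rough usage sketch, the pattern can be plugged into the same findword function, written here as r'(?: |^)({})(?=[. ]|$)' with the term passed through re.escape. The sample strings are the ones used earlier in this thread. Note that the full match may include the leading space, so the word itself is read from group(1).

```python
import re

def findword(w):
    # The leading alternative consumes a space or anchors at the start;
    # the look-ahead requires a space, a period, or the end of the string.
    pattern = r'(?: |^)({})(?=[. ]|$)'.format(re.escape(w))
    return re.compile(pattern, flags=re.IGNORECASE).search

ss = 'hello string [processing] in python.'

m1 = findword('data_var_cod[0]')('cod_byte1 = data_var_cod[0]')
m2 = findword('[processing]')(ss)
m3 = findword('python')(ss)
# Followed by '[' rather than a space, period, or end of string: no match.
m4 = findword('data_var_cod')('cod_byte1 = data_var_cod[0]')

print(m1.group(1))
print(m2.group(1))
print(m3.group(1))
print(m4)
```

This version only treats spaces, periods, and string edges as delimiters, so a word followed by a comma or a tab would need those characters added to the look-ahead class.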