I'm a beginner programmer, still learning a lot. Right now I'm working with a very large text file, and I want to look at the frequency of characters in different chunks of the text. For example, how many times do the characters "a" and "b" appear in text[0:600] versus [600:1200] versus [1200:1800], etc.? Right now I only know how to print text[0:600], but I don't know how to write the syntax to tell Python to look for "a" and "b" in only that chunk of text.
I was thinking maybe the best way to write this would be something like, "for each of these chunks I have, tell me the frequency counts of 'a' and 'b'." Does this seem doable?
Thank you so much!
Here is what I have so far, if you want to see it. It's very simple:
import re

f = open('text.txt')
fa = f.read()
fa = fa.lower()
corn = re.sub(r'chr', '', fa)  # delete chromosome title
potato = re.sub(r'[^atcg]', '', corn)  # delete all other characters
print(potato[0:50])
You already know how to split the text. The general case is:
interval = 600
chunks = [text[idx:idx+interval] for idx in range(0, len(text), interval)]

And to count the occurrences of a substring (in this case 'a') in each chunk:
term = 'a'
term_counts = [chunk.count(term) for chunk in chunks]
# zip them to make it nicer (note that zip returns an iterator in Python 3)
chunks_with_counts = zip(chunks, term_counts)

Example:
>>> text = "the quick brown fox jumps over the lazy dog"
>>> interval = 3
>>> chunks = [text[idx:idx+interval] for idx in range(0, len(text), interval)]
>>> chunks
['the', ' qu', 'ick', ' br', 'own', ' fo', 'x j', 'ump', 's o', 'ver', ' th', 'e l', 'azy', ' do', 'g']
>>> term = 'o'
>>> term_counts = [chunk.count(term) for chunk in chunks]
>>> term_counts
[0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0]
>>> chunks_with_counts = zip(chunks, term_counts)
>>> list(chunks_with_counts)
[('the', 0), (' qu', 0), ('ick', 0), (' br', 0), ('own', 1), (' fo', 1), ('x j', 0), ('ump', 0), ('s o', 1), ('ver', 0), (' th', 0), ('e l', 0), ('azy', 0), (' do', 1), ('g', 0)]
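Since the question asks about more than one character per chunk ('a' and 'b'), here is a minimal sketch of one way to extend the same chunking idea. It swaps str.count for collections.Counter from the standard library. The function name chunk_char_counts and the sample string are made up for illustration; in the question's code, the cleaned string potato would take the place of sample.

```python
from collections import Counter

def chunk_char_counts(text, interval, chars):
    """Return one dict per chunk, mapping each character in `chars` to its count."""
    results = []
    for start in range(0, len(text), interval):
        chunk = text[start:start + interval]
        counts = Counter(chunk)  # tallies every character in this chunk
        # Counter returns 0 for characters that never appear in the chunk
        results.append({c: counts[c] for c in chars})
    return results

# Hypothetical stand-in for the cleaned sequence (`potato` in the question)
sample = "aabbbacgtabba"
for i, counts in enumerate(chunk_char_counts(sample, 5, "ab")):
    print(i * 5, counts)
```

Each dict lines up with one chunk in order, so entry 0 covers text[0:interval], entry 1 covers text[interval:2*interval], and so on. For only one or two characters, calling chunk.count() per character as shown above is just as reasonable.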