this code taking more half hour data set of 200000 floats.
import numpy np try: import progressbar pbar = progressbar.progressbar(widgets=[progressbar.percentage(), progressbar.counter('%5d'), progressbar.bar(), progressbar.eta()]) except: pbar = list block_length = np.loadtxt('bb.txt.gz') # data file http://filebin.ca/29lbyfknskqj/bb.txt.gz (2mb, 200000 float numbers) n = len(block_length) - 1 # arrays store best configuration best = np.zeros(n, dtype=float) last = np.zeros(n, dtype=int) log = np.log # start first data cell; add 1 cell @ each iteration r in pbar(range(n)): # compute fit_vec : fitness of putative last block (end @ r) #fit_vec = fitfunc.fitness( t_k = block_length[:r + 1] - block_length[r + 1] #n_k = np.cumsum(x[:r + 1][::-1])[::-1] n_k = np.arange(r + 1, 0, -1) fit_vec = n_k * (log(n_k) - log(t_k)) prior = 4 - log(73.53 * 0.05 * ((r+1) ** -0.478)) a_r = fit_vec - prior #fitfunc.prior(r + 1, n) a_r[1:] += best[:r] i_max = np.argmax(a_r) last[r] = i_max best[r] = a_r[i_max] # find changepoints iteratively peeling off last block change_points = np.zeros(n, dtype=int) i_cp = n ind = n while true: i_cp -= 1 change_points[i_cp] = ind if ind == 0: break ind = last[ind - 1] change_points = change_points[i_cp:] print edges[change_points] # show result the first loop slow because length of arrays r @ every iteration, i.e. increasing, leading n^2 complexity.
is there way optimize code further, e.g. through pre-computation? happy solutions using other programming languages.
i can replicate a_r (up fit-prior step) upper triangular nxn matrix with:
def trilog(n): nn = n[:-1,none]-n[none,1:] nn[np.tril_indices_from(nn,-1)]=1 return nn t_k = trilog(block_length) n_k = trilog(-np.arange(n+1)) fit_vec = n_k * (np.log(n_k) - np.log(t_k)) r = np.arange(n)+1 prior = 4 - log(73.53 * 0.05 * (r ** -0.478)) a_r = fit_vec - prior a_r = np.triu(a_r,0) print(a_r) i haven't worked through logic of calculation , applying best.
i've done small arrays. full problem, corresponding matrix large memory.
b=np.ones((200000,200000),float) so memory considerations might stuck for r in range(n) iteration.
Comments
Post a Comment