I have 5 different files. Part of each file looks like:

    ifile1.txt   ifile2.txt   ifile3.txt   ifile4.txt   ifile5.txt
    2            3            2            3            2
    1            2            /no value    2            3
    /no value    2            4            3            /no value
    3            1            0            0            1
    /no value    /no value    /no value    /no value    /no value

I need to compute the average over these 5 files without considering the missing values, i.e.

    ofile.txt
    2.4 2.0 3.0 1.0 99999

Here
    2.4   = (2+3+2+3+2)/5
    2.0   = (1+2+2+3)/4
    3.0   = (2+4+3)/3
    1.0   = (3+1+0+0+1)/5
    99999 = all values missing

I am trying the following way, but it doesn't feel like the proper way:
    paste ifile1.txt ifile2.txt ifile3.txt ifile4.txt ifile5.txt > ofile.txt
    tr '\n' ' ' < ofile.txt > ofile1.txt
    awk '!/\//{sum += $1; count++} {print count ? (sum/count) : count; sum=count=0}' ofile1.txt > ofile2.txt
    awk '!/\//{sum += $2; count++} {print count ? (sum/count) : count; sum=count=0}' ofile1.txt > ofile3.txt
    awk '!/\//{sum += $3; count++} {print count ? (sum/count) : count; sum=count=0}' ofile1.txt > ofile4.txt
    awk '!/\//{sum += $4; count++} {print count ? (sum/count) : count; sum=count=0}' ofile1.txt > ofile5.txt
    awk '!/\//{sum += $5; count++} {print count ? (sum/count) : count; sum=count=0}' ofile1.txt > ofile6.txt
    paste ofile2.txt ofile3.txt ofile4.txt ofile5.txt ofile6.txt > ofile7.txt
    tr '\n' ' ' < ofile7.txt > ofile.txt
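Incidentally, the chain of intermediate files can be avoided entirely with a single `paste | awk` pipeline. This is only a sketch, assuming the five sample files from the question and relying on paste's default tab delimiter so that the embedded space in `/no value` survives field splitting:

```shell
# Recreate the five sample columns from the question (one value per line).
printf '2\n1\n/no value\n3\n/no value\n' > ifile1.txt
printf '3\n2\n2\n1\n/no value\n'         > ifile2.txt
printf '2\n/no value\n4\n0\n/no value\n' > ifile3.txt
printf '3\n2\n3\n0\n/no value\n'         > ifile4.txt
printf '2\n3\n/no value\n1\n/no value\n' > ifile5.txt

# paste joins corresponding lines with tabs; -F'\t' keeps "/no value"
# as a single field despite its embedded space.
paste ifile1.txt ifile2.txt ifile3.txt ifile4.txt ifile5.txt |
awk -F'\t' '{
    s = 0; n = 0
    for (i = 1; i <= NF; i++)
        if ($i != "/no value") { s += $i; n++ }
    print (n ? s / n : 99999)   # 99999 when every field is missing
}'
```

This prints one average per line (2.4, 2, 3, 1, 99999 for the sample data); a final `tr '\n' ' '` would put them on one line as in the attempt above.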
The following script.awk delivers what you want:
    BEGIN {
        gap = -1;
        maxidx = -1;
    }
    {
        if (NR != FNR + gap) {
            idx = 0;
            gap = NR - FNR;
        }
        if (idx > maxidx) {
            maxidx = idx;
            count[idx] = 0;
            sum[idx] = 0;
        }
        if ($0 != "/no value") {
            count[idx]++;
            sum[idx] += $0;
        }
        idx++;
    }
    END {
        for (idx = 0; idx <= maxidx; idx++) {
            if (count[idx] == 0) {
                sum[idx] = 99999;
                count[idx] = 1;
            }
            print sum[idx] / count[idx];
        }
    }

You call it with:
    awk -f script.awk ifile*.txt

It allows an arbitrary number of input files, each with an arbitrary number of lines. It works as follows:
    BEGIN {
        gap = -1;
        maxidx = -1;
    }

This BEGIN section runs before any lines are processed, and sets the current gap and the maximum index accordingly.
The gap is the difference between the overall line number NR and the per-file line number FNR; it is used to detect when we switch files, which is handy when processing multiple input files.
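A minimal demonstration of that NR/FNR relationship (the file names here are just for illustration):

```shell
# Two throwaway input files.
printf 'x\ny\n' > a.txt
printf 'z\n'    > b.txt

# FNR restarts at 1 for each input file while NR keeps counting,
# so NR - FNR changes exactly when awk moves on to the next file.
awk 'FNR == 1 { print "start of", FILENAME, "NR=" NR, "FNR=" FNR }' a.txt b.txt
# → start of a.txt NR=1 FNR=1
# → start of b.txt NR=3 FNR=1
```

For non-empty files, `FNR == 1` fires at the same moments as the gap test; script.awk's gap formulation just makes the file boundary explicit as an arithmetic condition.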
The maximum index is used to figure out the largest line count, so that the correct number of records can be output at the END.
    {
        if (NR != FNR + gap) {
            idx = 0;
            gap = NR - FNR;
        }
        if (idx > maxidx) {
            maxidx = idx;
            count[idx] = 0;
            sum[idx] = 0;
        }
        if ($0 != "/no value") {
            count[idx]++;
            sum[idx] += $0;
        }
        idx++;
    }

The code above is the meat of the solution, executed once per line. The first if statement detects whether you have moved on to a new file, so that corresponding lines across files can be aggregated. It means the first line of each input file is used to calculate the average on the first line of the output file, and so on.
The second if statement adjusts maxidx if the current line number is beyond the highest one we have encountered so far. This covers the case where, say, file 1 has 7 lines but file 2 has 9 (not the case here, but it's worth handling anyway). A previously unencountered line number means we initialise its sum and count to zero.
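To see that unequal-length handling in action, here is a cut-down version of the same idea (using `FNR == 1` instead of the gap test purely to keep the sketch short; the file names are made up):

```shell
printf '1\n2\n'       > short.txt
printf '10\n20\n30\n' > long.txt

# Per-line averages across files of unequal length: line 3 exists only
# in long.txt, so its average is simply 30/1.
awk 'FNR == 1 { idx = 0 }
     { if (idx > max) max = idx; sum[idx] += $0; cnt[idx]++; idx++ }
     END { for (i = 0; i <= max; i++) print sum[i] / cnt[i] }' short.txt long.txt
# → 5.5
# → 11
# → 30
```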
The final if statement updates the sum and count if the line contains anything other than /no value.
And then, of course, we bump the line index for the next time through.
    END {
        for (idx = 0; idx <= maxidx; idx++) {
            if (count[idx] == 0) {
                sum[idx] = 99999;
                count[idx] = 1;
            }
            print sum[idx] / count[idx];
        }
    }

In terms of outputting the data, it's a simple matter of going through the array and calculating each average from the sum and count. Notice that, if the count is 0 (all corresponding entries were /no value), we adjust the sum and count so that 99999 is printed instead. Then we print the average.
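That 99999 sentinel can be checked in isolation with an all-missing input (the file name is illustrative):

```shell
printf '/no value\n/no value\n' > miss.txt

# count stays 0 for an all-missing column, so the END block
# substitutes the 99999 sentinel before dividing.
awk '$0 != "/no value" { s += $0; c++ }
     END { if (c == 0) { s = 99999; c = 1 } print s / c }' miss.txt
# → 99999
```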
So, running that code over your input files gives, as requested:
    $ awk -f script.awk ifile*.txt
    2.4
    2
    3
    1
    99999