regex - GREP for a dynamic pattern in a file and print the other lines that have the former pattern and another pattern
Let's say I have a log file that looks like this:
    06/30/2015 00:17:20.716 info 06z07mjbyxfpzs matched line
    06/30/2015 00:17:20.723 info 06z07mjbyxfpzs data xxyyzz
    06/30/2015 00:17:20.735 info 06z07mdgc66vhc matched line
    06/30/2015 00:17:20.759 info 06z07mgdq9thty data xxyyzz
    06/30/2015 00:17:20.755 info 06z07mdgc66vhc matched line
    06/30/2015 00:17:20.784 info 06z07mdgc66vhc data xxyyzz
    06/30/2015 00:17:20.827 info 06z07n2q9s4g07 data xxyyzz
    06/30/2015 00:17:20.855 info 06z07mxt44cf03 data xxyyzz
    06/30/2015 00:17:20.861 info 06z07n5mxfykhg data xxyyzz
    06/30/2015 00:17:20.873 info 06z07nm473brzb data xxyyzz
    06/30/2015 00:17:20.902 info 06z07mm059k0tz data xxyyzz
    06/30/2015 00:17:20.970 info 06z07nx2lv9wzc matched line
    06/30/2015 00:17:20.974 info 06z07nx2lv9wzc data xxyyzz
    06/30/2015 00:17:20.991 info 06z07ngwmw16zz matched line
    06/30/2015 00:17:20.994 info 06z07ngwmw16zz data xxyyzz
    06/30/2015 00:17:21.085 info 06z07n42c6qczx data xxyyzz
    06/30/2015 00:17:21.094 info 06z07nmgpjppv1 matched line
    06/30/2015 00:17:21.094 info 06z07mxr42tzzw data xxyyzz
    06/30/2015 00:17:21.094 info 06z07mwbfvcgd3 data xxyyzz
    06/30/2015 00:17:21.095 info 06z07nmgpjppv1 matched line
    06/30/2015 00:17:21.100 info 06z07nmgpjppv1 data xxyyzz
    06/30/2015 00:17:21.123 info 06z07p0ybwlv0b data xxyyzz
    06/30/2015 00:17:21.132 info 06z07nslzf66hk matched line
    06/30/2015 00:17:21.137 info 06z07nslzf66hk data xxyyzz

What I want is:
- if any line contains "matched line", I need the unique ID in column 4 (e.g. 06z07mjbyxfpzs),
- search for the other lines that have this unique ID plus the text "some data xxyyzz", and
- print the lines matching both patterns (unique ID + "some data xxyyzz") to the console as the final output.
So in this case the output should be:
    06/30/2015 00:17:20.723 info 06z07mjbyxfpzs data xxyyzz
    06/30/2015 00:17:20.784 info 06z07mdgc66vhc data xxyyzz
    06/30/2015 00:17:20.974 info 06z07nx2lv9wzc data xxyyzz
    06/30/2015 00:17:20.994 info 06z07ngwmw16zz data xxyyzz
    06/30/2015 00:17:21.100 info 06z07nmgpjppv1 data xxyyzz
    06/30/2015 00:17:21.137 info 06z07nslzf66hk data xxyyzz

The file I am talking about here is huge (~200 GB, with millions of records) and sits on a shared server, so I cannot run scripts or commands that take a lot of time.
[Edit] - Currently I am doing this through fgrep, printing the unique IDs of the "matched line" lines to one file and the "some data xxyyzz" lines to another. I am looking for a single-line grep, awk or sed command (without having to create multiple files for fgrep).
[Edit 2] - The data is not in a file; rather, it is the intermediate output of a series of grep and sort commands.
[Edit 3] - Updated sample input (not in order, jumbled):
    06/30/2015 00:17:21.094 info 06z07nmgpjppv1 matched line
    06/30/2015 00:17:20.716 info 06z07mjbyxfpzs matched line
    06/30/2015 00:17:20.735 info 06z07mdgc66vhc matched line
    06/30/2015 00:17:20.759 info 06z07mgdq9thty data xxyyzz
    06/30/2015 00:17:20.755 info 06z07mdgc66vhc matched line
    06/30/2015 00:17:20.784 info 06z07mdgc66vhc data xxyyzz
    06/30/2015 00:17:20.827 info 06z07n2q9s4g07 data xxyyzz
    06/30/2015 00:17:20.855 info 06z07mxt44cf03 data xxyyzz
    06/30/2015 00:17:20.861 info 06z07n5mxfykhg data xxyyzz
    06/30/2015 00:17:20.873 info 06z07nm473brzb data xxyyzz
    06/30/2015 00:17:20.723 info 06z07mjbyxfpzs data xxyyzz
    06/30/2015 00:17:20.902 info 06z07mm059k0tz data xxyyzz
    06/30/2015 00:17:20.970 info 06z07nx2lv9wzc matched line
    06/30/2015 00:17:20.974 info 06z07nx2lv9wzc data xxyyzz
    06/30/2015 00:17:20.991 info 06z07ngwmw16zz matched line
    06/30/2015 00:17:21.085 info 06z07n42c6qczx data xxyyzz
    06/30/2015 00:17:21.094 info 06z07nmgpjppv1 matched line
    06/30/2015 00:17:21.094 info 06z07mxr42tzzw data xxyyzz
    06/30/2015 00:17:20.994 info 06z07ngwmw16zz data xxyyzz
    06/30/2015 00:17:21.094 info 06z07mwbfvcgd3 data xxyyzz
    06/30/2015 00:17:21.095 info 06z07nmgpjppv1 matched line
    06/30/2015 00:17:21.100 info 06z07nmgpjppv1 data xxyyzz
    06/30/2015 00:17:21.123 info 06z07p0ybwlv0b data xxyyzz
    06/30/2015 00:17:21.132 info 06z07nslzf66hk matched line
    06/30/2015 00:17:21.137 info 06z07nslzf66hk data xxyyzz
Ordered data
The following goes through the file only once, so it should be fast:
    $ awk '/matched line/{id=$4;next;} id==$4' file.log
    06/30/2015 00:17:20.723 info 06z07mjbyxfpzs data xxyyzz
    06/30/2015 00:17:20.784 info 06z07mdgc66vhc data xxyyzz
    06/30/2015 00:17:20.974 info 06z07nx2lv9wzc data xxyyzz
    06/30/2015 00:17:20.994 info 06z07ngwmw16zz data xxyyzz
    06/30/2015 00:17:21.100 info 06z07nmgpjppv1 data xxyyzz
    06/30/2015 00:17:21.137 info 06z07nslzf66hk data xxyyzz

In the sample input (original question), each data line comes after its matched line, with no matched line for a different ID in between. That ordering is what enables this fast, simple solution.
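A small self-contained sketch for trying the ordered version without the real log: it pipes a few lines copied from the original sample into the same awk program. The function name `pick_ordered` and the shortened sample are illustrative assumptions, not part of the question.

```shell
#!/bin/sh
# Hypothetical helper: runs the ordered one-liner on stdin,
# or on any files passed as arguments.
pick_ordered() {
  awk '/matched line/{id=$4;next;} id==$4' "$@"
}

# A few lines copied from the original (ordered) sample input.
sample='06/30/2015 00:17:20.716 info 06z07mjbyxfpzs matched line
06/30/2015 00:17:20.723 info 06z07mjbyxfpzs data xxyyzz
06/30/2015 00:17:20.735 info 06z07mdgc66vhc matched line
06/30/2015 00:17:20.759 info 06z07mgdq9thty data xxyyzz
06/30/2015 00:17:20.755 info 06z07mdgc66vhc matched line
06/30/2015 00:17:20.784 info 06z07mdgc66vhc data xxyyzz
06/30/2015 00:17:20.827 info 06z07n2q9s4g07 data xxyyzz'

# Feed the sample through a pipe, as in a real pipeline.
printf '%s\n' "$sample" | pick_ordered
```

Only the two data lines whose ID equals the most recently saved ID are printed (06z07mjbyxfpzs and 06z07mdgc66vhc); the 06z07mgdq9thty and 06z07n2q9s4g07 lines are skipped.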
How to use it in a pipeline
awk works in pipelines. If the input is not a file but, as in Edit 2, comes from a pipeline, use it like this:
    cmd1 <file.log | cmd2 | awk '/matched line/{id=$4;next;} id==$4' | cmd3

How it works
    /matched line/{id=$4;next;}

Any time we find a line containing the text "matched line", we save its ID in the variable id. Since we do not want to print the matched line, we tell awk to skip the rest of the commands and jump to the next line.

    id==$4

Any time the current line has an ID (field 4) that matches our saved id, we print the line. (In awk terminology, id==$4 is a condition: it evaluates to true or false. When the condition is true, the associated action is performed. In this case we specified no action, so awk performs the default action, which is to print the line.)
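The same ordered program can also be written out over several lines with comments, which may make it easier to adapt. This is a sketch intended to behave like the one-liner; the wrapper name `pick_ordered_commented` is hypothetical.

```shell
#!/bin/sh
# Hypothetical wrapper: the ordered one-liner written out with comments.
pick_ordered_commented() {
  awk '
    # Rule 1: on a "matched line" record, remember its ID (field 4)
    # and skip the remaining rules, so the matched line is not printed.
    /matched line/ { id = $4; next }

    # Rule 2: a bare condition with no action. When the current ID
    # equals the most recently saved ID, awk prints the whole record.
    id == $4
  ' "$@"
}

printf '%s\n' \
  '06/30/2015 00:17:20.970 info 06z07nx2lv9wzc matched line' \
  '06/30/2015 00:17:20.974 info 06z07nx2lv9wzc data xxyyzz' \
  '06/30/2015 00:17:21.085 info 06z07n42c6qczx data xxyyzz' |
  pick_ordered_commented
```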
Partially ordered data
In Edit 3, the data lines can appear at a random location after their matched line. In that case:
    $ awk '/matched line/{id[$4]=1;next;} id[$4]' file.log
    06/30/2015 00:17:20.784 info 06z07mdgc66vhc data xxyyzz
    06/30/2015 00:17:20.723 info 06z07mjbyxfpzs data xxyyzz
    06/30/2015 00:17:20.974 info 06z07nx2lv9wzc data xxyyzz
    06/30/2015 00:17:20.994 info 06z07ngwmw16zz data xxyyzz
    06/30/2015 00:17:21.100 info 06z07nmgpjppv1 data xxyyzz
    06/30/2015 00:17:21.137 info 06z07nslzf66hk data xxyyzz

Or, in a pipeline:

    cmd1 file.log | awk '/matched line/{id[$4]=1;next;} id[$4]'
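A possible refinement of the partially ordered version, offered as a sketch rather than a tested drop-in. In awk, merely referencing id[$4] in a condition creates an empty array element for that ID, so on millions of records the array can end up holding every ID in the input. Testing membership with ($4 in id) does not create elements, so the array only holds IDs taken from matched lines. The extra /data xxyyzz/ test follows the question's requirement that the printed line contain the data text; that exact text is an assumption (the sample lines show "data xxyyzz", while the question mentions "some data xxyyzz"), and the function name `pick_any_order` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: prints data lines whose ID (field 4) appeared on an
# earlier "matched line" record, anywhere after that record.
pick_any_order() {
  awk '
    # Record every ID seen on a matched line; do not print the line itself.
    /matched line/ { id[$4] = 1; next }

    # "($4 in id)" tests membership without creating a new array element.
    # The /data xxyyzz/ test is an assumed refinement: adjust the text to
    # whatever the real data lines contain.
    ($4 in id) && /data xxyyzz/
  ' "$@"
}

# A few lines from the jumbled sample in Edit 3.
printf '%s\n' \
  '06/30/2015 00:17:20.716 info 06z07mjbyxfpzs matched line' \
  '06/30/2015 00:17:20.759 info 06z07mgdq9thty data xxyyzz' \
  '06/30/2015 00:17:20.873 info 06z07nm473brzb data xxyyzz' \
  '06/30/2015 00:17:20.723 info 06z07mjbyxfpzs data xxyyzz' |
  pick_any_order
```

On the real data this would be pick_any_order file.log, or cmd1 <file.log | pick_any_order in a pipeline. Like the one-liner, it still cannot catch a data line that appears before its matched line, because awk reads the input only once.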