I'm trying to ingest log files into Hadoop.
I'd like to use Oozie to trigger the ingestion task (written in Spark), and have Oozie pass the filenames to the task.
I expect the log files to be laid out as:
/example/${year}-${month}-${day}-${hour}:${minute}/log1/log1.log
/example/${year}-${month}-${day}-${hour}:${minute}/log1/log1.1.log
/example/${year}-${month}-${day}-${hour}:${minute}/log1/log1.2.log
/example/${year}-${month}-${day}-${hour}:${minute}/log2/log2.log
/example/${year}-${month}-${day}-${hour}:${minute}/log2/log2.1.log
/example/${year}-${month}-${day}-${hour}:${minute}/log2/log2.2.log
(etc.)
So, I have two problems:
1. How can Oozie generate the file names under /example/${year}-${month}-${day}-${hour}:${minute}/log1/ and pass them to the app; and
2. How can Oozie, in parallel, generate the file names under /example/${year}-${month}-${day}-${hour}:${minute}/log2/ and pass them to a second invocation of the task?
The date/time-based file name can be created with a small Java program that you call from the Oozie workflow.xml, something like:
String processedDateString = (new SimpleDateFormat("yyyyMMddHHmmss")).format(new Date(timeInMillis));
Then, when calling the same jar in the workflow:
<main-class>namefile.jar</main-class>
<arg>path=${output_path}</arg>
<arg>name=${name}</arg>
<arg>processeddate=${(wf:actionData('rename')['processeddate'])}</arg>
For copying/moving, you can use the same Java program or a copy action.
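For wf:actionData('rename') to return anything, the Java action named 'rename' has to declare <capture-output/> and write its values to the properties file that Oozie names in the oozie.action.output.properties system property. Here is a minimal sketch of such a program. The class name ProcessedDateEmitter, the helper names, and the choice of UTC are my assumptions, not something from the original setup:

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Properties;
import java.util.TimeZone;

// Hypothetical class name; use whatever your jar actually contains.
public class ProcessedDateEmitter {

    // Format epoch milliseconds as yyyyMMddHHmmss.
    // Pattern letters are case-sensitive: MM = month, HH = hour (0-23), mm = minute.
    // UTC is an assumption here; pick the zone your log directories use.
    static String formatProcessedDate(long timeInMillis) {
        SimpleDateFormat fmt = new SimpleDateFormat("yyyyMMddHHmmss");
        fmt.setTimeZone(TimeZone.getTimeZone("UTC"));
        return fmt.format(new Date(timeInMillis));
    }

    // Write a single key/value pair as a Java properties file.
    static void writeActionOutput(File target, String key, String value) throws IOException {
        Properties props = new Properties();
        props.setProperty(key, value);
        try (OutputStream out = new FileOutputStream(target)) {
            props.store(out, null);
        }
    }

    public static void main(String[] args) throws IOException {
        String processedDate = formatProcessedDate(System.currentTimeMillis());
        // Oozie sets this system property when the action has <capture-output/>.
        String outputPath = System.getProperty("oozie.action.output.properties");
        if (outputPath != null) {
            writeActionOutput(new File(outputPath), "processeddate", processedDate);
        } else {
            // Running outside Oozie: just print the value.
            System.out.println("processeddate=" + processedDate);
        }
    }
}
```

The key written here ("processeddate") has to match the key you look up in wf:actionData in the later action.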
The log1 and log2 locations can be specified in job.properties.
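As a rough sketch of how the per-log directories could be derived from those properties, assume job.properties carries a base directory and a comma-separated list of log names (for example baseDir=/example and logNames=log1,log2, both property names made up for illustration) and that they are passed to the Java program as arguments. LogDirResolver and the UTC time zone are likewise assumptions:

```java
import java.text.SimpleDateFormat;
import java.util.ArrayList;
import java.util.Date;
import java.util.List;
import java.util.TimeZone;

// Hypothetical helper: builds /<baseDir>/<yyyy-MM-dd-HH:mm>/<logName>/ for each log name.
public class LogDirResolver {

    static List<String> logDirs(String baseDir, long timeInMillis, String logNamesCsv) {
        // Matches the ${year}-${month}-${day}-${hour}:${minute} layout from the question.
        SimpleDateFormat fmt = new SimpleDateFormat("yyyy-MM-dd-HH:mm");
        fmt.setTimeZone(TimeZone.getTimeZone("UTC")); // assumption: directories are named in UTC
        String stamp = fmt.format(new Date(timeInMillis));

        List<String> dirs = new ArrayList<>();
        for (String name : logNamesCsv.split(",")) {
            String trimmed = name.trim();
            if (!trimmed.isEmpty()) {
                dirs.add(baseDir + "/" + stamp + "/" + trimmed + "/");
            }
        }
        return dirs;
    }

    public static void main(String[] args) {
        // Usage: LogDirResolver <baseDir> <logNamesCsv>
        for (String dir : logDirs(args[0], System.currentTimeMillis(), args[1])) {
            System.out.println(dir);
        }
    }
}
```

Each resulting directory could then be handed to its own invocation of the Spark task, which would cover the "in parallel" part of the question.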