hadoop - Oozie: generate the set of files in a directory


I'm trying to ingest log files into Hadoop.

I'd like to use Oozie to trigger the ingestion task (written in Spark), and have Oozie pass the filenames to the task.

I expect the log files to be laid out as:

    /example/${year}-${month}-${day}-${hour}:${minute}/log1/log1.log
    /example/${year}-${month}-${day}-${hour}:${minute}/log1/log1.1.log
    /example/${year}-${month}-${day}-${hour}:${minute}/log1/log1.2.log
    /example/${year}-${month}-${day}-${hour}:${minute}/log2/log2.log
    /example/${year}-${month}-${day}-${hour}:${minute}/log2/log2.1.log
    /example/${year}-${month}-${day}-${hour}:${minute}/log2/log2.2.log

(etc).

So, I have 2 problems:

  1. How can Oozie generate the file names under /example/${year}-${month}-${day}-${hour}:${minute}/log1/ and pass them to the app; and

  2. How can Oozie, in parallel, generate the file names under /example/${year}-${month}-${day}-${hour}:${minute}/log2/ and pass them to a second invocation of the task.

The date-time-based file name can be created with a small Java program, which you can call from the Oozie workflow.xml, something like:

    String processedDateString = (new SimpleDateFormat("yyyyMMddHHmmss")).format(new Date(timeInMillis));
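As a rough, self-contained sketch of that one-liner: in `SimpleDateFormat` patterns `MM` is the month and `mm` the minute, and `HH` is the 24-hour clock, so case matters. The class name `ProcessedDate`, the choice of UTC, and the second pattern (which mirrors the `${year}-${month}-${day}-${hour}:${minute}` directory layout from the question) are assumptions for illustration, not part of the original answer.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

// Hypothetical helper class, for illustration only.
class ProcessedDate {
    // Formats epoch milliseconds with the given pattern, using UTC (an assumption).
    static String format(long timeInMillis, String pattern) {
        SimpleDateFormat fmt = new SimpleDateFormat(pattern);
        fmt.setTimeZone(TimeZone.getTimeZone("UTC"));
        return fmt.format(new Date(timeInMillis));
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis();
        // Compact form, as in the snippet above.
        System.out.println(format(now, "yyyyMMddHHmmss"));
        // Directory form matching the layout in the question.
        System.out.println("/example/" + format(now, "yyyy-MM-dd-HH:mm"));
    }
}
```

If the cluster's directories are named in local time rather than UTC, the time zone would need to change accordingly.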

and when calling the same JAR in the workflow:

      <main-class>namefile.jar</main-class>
      <arg>path=${output_path}</arg>
      <arg>name=${name}</arg>
      <arg>processeddate=${(wf:actionData('rename')['processeddate'])}</arg>
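One detail worth spelling out: `wf:actionData('rename')` only returns something if the `rename` java action declares `<capture-output/>` and its main class writes a Java properties file to the path Oozie supplies in the `oozie.action.output.properties` system property. A minimal sketch of that writing side follows; the class name `RenameOutput` and the placeholder date value are hypothetical, and only the key name `processeddate` is taken from the workflow snippet above.

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.Properties;

// Hypothetical class name, for illustration only.
class RenameOutput {
    // Writes a single key/value pair as a Java properties file at outputPath.
    static void writeActionData(String outputPath, String key, String value) throws IOException {
        Properties props = new Properties();
        props.setProperty(key, value);
        try (OutputStream out = new FileOutputStream(outputPath)) {
            props.store(out, null);
        }
    }

    public static void main(String[] args) throws IOException {
        // Oozie sets this system property for a java action that declares <capture-output/>.
        String outputPath = System.getProperty("oozie.action.output.properties");
        if (outputPath == null) {
            System.err.println("oozie.action.output.properties is not set; not running under Oozie?");
            return;
        }
        // Placeholder value: a real program would compute the formatted date here.
        writeActionData(outputPath, "processeddate", "19700101000000");
    }
}
```

A later action can then read the value through `wf:actionData('rename')['processeddate']`.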

For copying or moving the files, you can use the same Java program for the copy action.

The log1 and log2 locations can be specified in job.properties.
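To make the "generate the file names under a directory" part concrete, here is a minimal sketch that lists the regular files in one log directory and joins them into a single comma-separated argument. It uses `java.nio.file` against a local filesystem purely so the example is self-contained; on HDFS the listing would go through Hadoop's `FileSystem` API instead. The class name `LogFileLister` and the comma-separated output format are assumptions, and running the log1 and log2 invocations in parallel would be handled by the workflow's fork/join nodes, which this sketch does not cover.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper class, for illustration only.
class LogFileLister {
    // Returns the regular files directly under dir, sorted by path string
    // and joined with commas. Subdirectories are skipped.
    static String listAsArgument(Path dir) throws IOException {
        try (Stream<Path> entries = Files.list(dir)) {
            return entries
                    .filter(Files::isRegularFile)
                    .map(Path::toString)
                    .sorted()
                    .collect(Collectors.joining(","));
        }
    }

    public static void main(String[] args) throws IOException {
        // Expected arguments: the timestamped base directory, then one or more
        // log subdirectory names (for example: log1 log2).
        if (args.length < 2) {
            System.err.println("usage: LogFileLister <baseDir> <logDir>...");
            return;
        }
        Path base = Paths.get(args[0]);
        for (int i = 1; i < args.length; i++) {
            System.out.println(args[i] + "=" + listAsArgument(base.resolve(args[i])));
        }
    }
}
```

Each printed line could be written out as captured action data and handed to the matching Spark invocation as its file list.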

