Is it possible to build an AWS EMR cluster with a master node and a set of task (slave) nodes, without core nodes, when I am sure the source data is in S3 and the processed result is going to be stored in S3?
Basically, the question is: what is the need for a DataNode process when EMR is going to process data in S3 (and not store or use it in HDFS)?
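Core nodes in EMR provide both compute resources and HDFS. In Hadoop 2.x, the compute side is provided by the YARN NodeManager. Even if the application's input and output are both on S3, YARN (and other application layers such as Hive) uses HDFS to stage JARs, split info, session data, etc. So the cluster still needs core nodes, even though it can keep them few and put most of the compute on task nodes.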
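As a rough illustration of the layout this implies (a small core group kept for HDFS staging, with most of the compute on task nodes), here is a minimal sketch that builds an instance-group list in the shape that boto3's EMR `run_job_flow` call takes under `Instances["InstanceGroups"]`. The helper name `build_instance_groups`, the group names, the node counts, and the `m5.xlarge` instance type are all assumptions made for the example, not values from the question.

```python
def build_instance_groups(core_count, task_count, instance_type="m5.xlarge"):
    """Build an InstanceGroups list for an EMR cluster (hypothetical helper).

    The instance type and group names are illustrative placeholders.
    """
    # Even with input and output on S3, YARN and Hive stage JARs, split
    # info and session data in HDFS, so at least one core node is kept.
    if core_count < 1:
        raise ValueError("keep at least one core node for HDFS staging")

    groups = [
        {
            "Name": "master",
            "InstanceRole": "MASTER",
            "InstanceType": instance_type,
            "InstanceCount": 1,
        },
        {
            "Name": "core",
            "InstanceRole": "CORE",  # runs DataNode (HDFS) and NodeManager
            "InstanceType": instance_type,
            "InstanceCount": core_count,
        },
    ]

    # Task nodes only add compute (NodeManager); they do not host HDFS.
    if task_count > 0:
        groups.append(
            {
                "Name": "task",
                "InstanceRole": "TASK",
                "InstanceType": instance_type,
                "InstanceCount": task_count,
            }
        )
    return groups


# Example: one core node for staging, four task nodes for compute.
groups = build_instance_groups(core_count=1, task_count=4)
```

The resulting list could then be passed as `Instances={"InstanceGroups": groups}` when creating the cluster. The core count here is only a starting point; how much HDFS capacity the staging data actually needs depends on the workload.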