How to bootstrap installation of Python modules on Amazon EMR? -


i want basic, fire spark cluster through emr console , run spark script depends on python package (for example, arrow). straightforward way of doing this?

the straightforward way create bash script containing installation commands, copy s3, , set bootstrap action console point script.

here's example i'm using in production:

s3://mybucket/bootstrap/install_python_modules.sh

#!/bin/bash -xe  # non-standard , non-amazon machine image python modules: sudo pip install -u \   awscli            \   boto              \   ciso8601          \   ujson             \   workalendar  sudo yum install -y python-psycopg2 

Comments