Using Docker for HPC with Sun Grid Engine


I am wondering if it is possible to create a virtual cluster with Docker that can run scripts designed for HPC clusters that use SGE for cluster management. These are pretty large/complicated workflows that I cannot re-write, e.g. for Torque/PBS. Theoretically, I should be able to trick Docker into thinking there are multiple nodes, like an internal HPC cluster. If you can save me the pain by telling me it can't be done, I'd be appreciative.

Warning: I am not a cluster admin; I'm more of an end user. I am running on Mac OS X 10.9.5.

    Client version: 1.7.0
    Client API version: 1.19
    Go version (client): go1.4.2
    Git commit (client): 0baf609
    OS/Arch (client): darwin/amd64
    Server version: 1.7.0
    Server API version: 1.19
    Go version (server): go1.4.2
    Git commit (server): 0baf609
    OS/Arch (server): linux/amd64
    bash-3.2$ boot2docker version
    boot2docker-cli version: v1.7.0
    Git commit: 7d89508

I've been using a derivative of this image (the Dockerfile is here). The steps are pretty straightforward, following the instructions on the website:

  1. Create a machine:

    docker-machine create -d virtualbox local

  1. Make it the active machine:

    eval "$(docker-machine env local)"

  1. Generate a swarm discovery token:

    docker run --rm swarm create

  1. Create the swarm master:

    docker-machine create \
        -d virtualbox \
        --swarm \
        --swarm-master \
        --swarm-discovery token://$token \
        swarm-master

  1. Use the token to create a swarm node:

    docker-machine create \
        -d virtualbox \
        --swarm \
        --swarm-discovery token://$token \
        swarm-agent-00

  1. Add another node:

    docker-machine create \
        -d virtualbox \
        --swarm \
        --swarm-discovery token://$token \
        swarm-agent-01

Now here's the crazy part. When I try to source the environment using the command `eval "$(docker-machine env --swarm swarm-master)"`, the stupid thing says it cannot connect to the Docker daemon: is 'docker -d' running on the host? I tried `eval $(docker-machine env swarm-master)` instead, and that works, but I'm not 100% sure it's the right thing to do:

    NAME             ACTIVE   DRIVER       STATE     URL                         SWARM
    local                     virtualbox   Running   tcp://192.168.99.105:2376
    swarm-agent-00            virtualbox   Running   tcp://192.168.99.107:2376   swarm-master
    swarm-agent-01            virtualbox   Running   tcp://192.168.99.108:2376   swarm-master
    swarm-master     *        virtualbox   Running   tcp://192.168.99.106:2376   swarm-master (master)
  1. At this point, I build a multi-container app using a YAML file:
    bior:
      image: stevenhart/bior_annotate
      command: login -f sgeadmin
      volumes:
        - .:/data
      links:
        - sge
    sge:
      build: .
      ports:
        - "6444"
        - "6445"
        - "6446"

and bring it up with `docker-compose up`.

  1. Then I open a shell in a new container:

    docker run -it --rm dockersge_sge login -f sgeadmin

But here's the problem.

When I run `qhost`, I get the following:

    HOSTNAME                ARCH         NCPU NSOC NCOR NTHR  LOAD  MEMTOT  MEMUSE  SWAPTO  SWAPUS
    ----------------------------------------------------------------------------------------------
    global                  -               -    -    -    -     -       -       -       -       -
    6bf6f6fda409            lx-amd64        1    1    1    1  0.01  996.2m   96.2m    1.1g     0.0

Shouldn't it think there are multiple CPUs, i.e. one for each of the swarm nodes?

I assume you are running `qhost` inside Docker.

The thing with swarm is, it doesn't combine the hosts into one big machine (I used to think so, too).

Instead, if you have, for example, five 1-core machines, swarm will pick the machine running as few containers as possible and run your container on that machine.

So the swarm controller spreads containers across the cluster, rather than combining the hosts into one.
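To make that concrete, here is a minimal, purely illustrative Python sketch of the "spread" placement idea (this is a hypothetical model, not Swarm's actual code): each container is placed whole onto the least-loaded node, and no node's CPUs are ever pooled with another's.

```python
def spread_schedule(nodes, containers):
    """Assign each container, whole, to the node currently holding
    the fewest containers ('spread' strategy). Resources are never
    combined across nodes."""
    placement = {node: [] for node in nodes}
    for container in containers:
        # Pick the least-loaded node (ties broken by node order).
        target = min(placement, key=lambda n: len(placement[n]))
        placement[target].append(container)
    return placement

# Two 1-core agents, three containers: each container still only ever
# sees the single CPU of the node it landed on.
cluster = ["swarm-agent-00", "swarm-agent-01"]
print(spread_schedule(cluster, ["bior", "sge", "worker"]))
```

So even with several 1-core agents, any single container (and the SGE host inside it) still sees just one CPU, which matches the `qhost` output above.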

Hope this helps! If you have additional questions, please ask :)

Update

I'm not sure if it suits you, but if you don't like swarm, I'd recommend Kubernetes. I use it on Raspberry Pis. It's cool and more mature than swarm, with things like auto-healing and so on.

I don't know for sure, but surely there's a way of integrating Docker with Hadoop too...

