How to deploy Python scripts and run them at a specific time on Apache Spark?
I have a set of simple Python 2.7 scripts and a set of Linux nodes. I want to run these scripts on these nodes at a specific time.
Each script can run on any node, but a script must not run on multiple nodes simultaneously.
So I want to complete three simple tasks:
- Deploy the set of scripts.
- Run the main script with specific parameters on a node at a specific time.
- Get the result when the script has finished.
It seems I am able to complete the first task. I have the following code snippet:
    import urllib
    import urlparse
    from pyspark import SparkContext

    def path2url(path):
        # Convert a local filesystem path into a file: URL.
        return urlparse.urljoin('file:', urllib.pathname2url(path))

    master_url = "spark://my-pc:7077"
    deploy_zip_path = "deploy.zip"
    sc = SparkContext(master=("%s" % master_url),
                      appName="job submitter",
                      pyFiles=[path2url("%s" % deploy_zip_path)])

But I have a problem: this code launches the tasks. I only want to deploy the scripts to the nodes.
I recommend keeping the code that deploys your PySpark scripts outside of the PySpark scripts themselves.
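As a rough illustration of keeping deployment separate, the packaging step could live in its own small script that never creates a SparkContext. This is only a sketch: the function name `build_deploy_zip` and the `scripts` directory are hypothetical, and copying the archive to each node (with whatever file-transfer tool you already use) is deliberately left out.

```python
import os
import zipfile


def build_deploy_zip(script_dir, zip_path="deploy.zip"):
    """Pack every .py file under script_dir into zip_path.

    Returns the list of archive member names that were written.
    """
    archived = []
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for root, _dirs, files in os.walk(script_dir):
            for name in sorted(files):
                if not name.endswith(".py"):
                    continue  # skip data files, notes, compiled files, etc.
                full_path = os.path.join(root, name)
                # Store paths relative to script_dir so that modules
                # can be imported from the root of the archive.
                arcname = os.path.relpath(full_path, script_dir)
                archive.write(full_path, arcname)
                archived.append(arcname)
    return archived


# Example (hypothetical directory name):
#     build_deploy_zip("scripts", "deploy.zip")
```

Because this script only uses the standard library, running it does not start any Spark tasks. The resulting archive can later be handed to Spark when a job is actually submitted.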
Chronos is a job scheduler that runs on Apache Mesos, and Spark can run on Mesos. Chronos runs jobs as shell commands, so you can run your scripts with whatever arguments you specify. You will need to deploy Spark and your scripts to the Mesos nodes. Then you can submit your Spark scripts from Chronos using the spark-submit command.
You can store the results by writing to some kind of storage mechanism within your PySpark scripts. Spark has support for text files, HDFS, Amazon S3, and more. If Spark doesn't support the storage mechanism you need, you can use an external library that does. For example, I write to Cassandra in my PySpark scripts using cassandra-driver.
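To make the scheduling step more concrete, here is a minimal sketch that builds the spark-submit shell command and wraps it in a job definition for Chronos. Everything specific in it is an assumption for illustration: the Mesos master URL, the file paths, the job name, and the owner address are hypothetical. The JSON field names and the `R1/<start>/<period>` schedule string follow my recollection of the Chronos job format, so check them against the documentation for your Chronos version. Sending the payload to Chronos is left out.

```python
import json

try:
    from shlex import quote  # Python 3
except ImportError:
    from pipes import quote  # Python 2.7


def spark_submit_command(script, args=(),
                         master="mesos://my-mesos-master:5050",  # hypothetical host
                         py_files=None):
    """Build a shell-safe spark-submit command line as a single string."""
    parts = ["spark-submit", "--master", master]
    if py_files:
        # Extra Python files (for example a zip of helper modules).
        parts += ["--py-files", py_files]
    parts.append(script)
    parts.extend(args)
    return " ".join(quote(str(p)) for p in parts)


def chronos_job(name, command, start_time_utc, owner="owner@example.com"):
    """Return a dict describing a Chronos job (field names are assumptions)."""
    return {
        "name": name,
        "command": command,
        # ISO 8601 repeating interval: repeat count / start time / period.
        "schedule": "R1/%s/PT24H" % start_time_utc,
        "epsilon": "PT30M",
        "owner": owner,
    }


command = spark_submit_command(
    "/opt/scripts/main.py",                 # hypothetical path on the node
    ["--date", "2015-06-01"],
    py_files="/opt/scripts/deploy.zip",     # hypothetical path on the node
)
payload = json.dumps(chronos_job("run-main-script", command,
                                 "2015-06-01T02:00:00Z"))
```

Since Chronos only sees a shell command, the PySpark script itself stays unaware of the scheduler, and it remains responsible for writing its own results to storage.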