I've encountered several examples of SparkAction jobs in Oozie, and most of them are in Java. I edited one a little and ran the example in Cloudera CDH QuickStart 5.4.0 (with Spark version 1.4.0).
workflow.xml
<workflow-app xmlns='uri:oozie:workflow:0.5' name='sparkfilecopy'>
    <start to='spark-node' />
    <action name='spark-node'>
        <spark xmlns="uri:oozie:spark-action:0.1">
            <job-tracker>${jobtracker}</job-tracker>
            <name-node>${namenode}</name-node>
            <prepare>
                <delete path="${namenode}/user/${wf:user()}/${examplesroot}/output-data/spark"/>
            </prepare>
            <master>${master}</master>
            <mode>${mode}</mode>
            <name>spark-filecopy</name>
            <class>org.apache.oozie.example.SparkFileCopy</class>
            <jar>${namenode}/user/${wf:user()}/${examplesroot}/apps/spark/lib/oozie-examples.jar</jar>
            <arg>${namenode}/user/${wf:user()}/${examplesroot}/input-data/text/data.txt</arg>
            <arg>${namenode}/user/${wf:user()}/${examplesroot}/output-data/spark</arg>
        </spark>
        <ok to="end" />
        <error to="fail" />
    </action>
    <kill name="fail">
        <message>Workflow failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name='end' />
</workflow-app>

job.properties
namenode=hdfs://quickstart.cloudera:8020
jobtracker=quickstart.cloudera:8032
master=local[2]
mode=client
examplesroot=examples
oozie.use.system.libpath=true
oozie.wf.application.path=${namenode}/user/${user.name}/${examplesroot}/apps/spark

The Oozie workflow example (in Java) was able to complete its task.
However, I've written a spark-submit job using Python / PySpark. I tried removing the <class> tag and pointing the jar at my script:

<jar>my_pyspark_job.py</jar>

but I get an error in the logs when I attempt to run the Oozie Spark job:

Launcher ERROR, reason: Main class [org.apache.oozie.action.hadoop.SparkMain], exit code [2]

What should I be placing in the <class> and <jar> tags if I'm using Python / PySpark?
I struggled a lot with the spark-action in Oozie too. I set up the sharelib and tried to pass the appropriate jars using the --jars option within the <spark-opts> </spark-opts> tags, but to no avail.
I always ended up getting some error or other. The most I could do was run Java/Python Spark jobs in local mode through the spark-action.
However, I got my Spark jobs running in Oozie in all modes of execution by using the shell action. The major problem with the shell action is that shell jobs are deployed as the 'yarn' user. If you happen to deploy your Oozie Spark job from a user account other than yarn, you'll end up with a Permission Denied error (because the user is not able to access the Spark assembly jar copied into the /user/yarn/.sparkStaging directory). The way to solve this is to set the HADOOP_USER_NAME environment variable to the user account name through which you deploy your Oozie workflow.
Below is a workflow that illustrates this configuration. I deploy my Oozie workflows from the ambari-qa user.
<workflow-app xmlns="uri:oozie:workflow:0.4" name="sparkjob">
    <start to="spark-shell-node"/>
    <action name="spark-shell-node">
        <shell xmlns="uri:oozie:shell-action:0.2">
            <job-tracker>${jobtracker}</job-tracker>
            <name-node>${namenode}</name-node>
            <configuration>
                <property>
                    <name>oozie.launcher.mapred.job.queue.name</name>
                    <value>launcher2</value>
                </property>
                <property>
                    <name>mapred.job.queue.name</name>
                    <value>default</value>
                </property>
                <property>
                    <name>oozie.hive.defaults</name>
                    <value>/user/ambari-qa/sparkactionpython/hive-site.xml</value>
                </property>
            </configuration>
            <exec>/usr/hdp/current/spark-client/bin/spark-submit</exec>
            <argument>--master</argument>
            <argument>yarn-cluster</argument>
            <argument>wordcount.py</argument>
            <env-var>HADOOP_USER_NAME=ambari-qa</env-var>
            <file>/user/ambari-qa/sparkactionpython/wordcount.py#wordcount.py</file>
            <capture-output/>
        </shell>
        <ok to="end"/>
        <error to="spark-fail"/>
    </action>
    <kill name="spark-fail">
        <message>Shell action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end"/>
</workflow-app>

Hope this helps!
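The wordcount.py script that the shell action ships is not shown in this post. As a rough idea of what such a script might look like, here is a minimal sketch using the basic PySpark RDD API. The function names `tokenize` and `run`, and the two command-line arguments (input path and output path), are assumptions made for illustration. The workflow above passes no arguments to the script, so you would either add two more <argument> elements after wordcount.py or hardcode the paths.

```python
import re
import sys


def tokenize(line):
    """Split a line into lowercase words, dropping punctuation."""
    return re.findall(r"[a-z0-9']+", line.lower())


def run(input_path, output_path):
    """Count words in input_path and write (word, count) pairs to output_path."""
    # Imported here so that tokenize() can be exercised without Spark installed.
    from pyspark import SparkContext

    sc = SparkContext(appName="wordcount")
    try:
        counts = (
            sc.textFile(input_path)
            .flatMap(tokenize)
            .map(lambda word: (word, 1))
            .reduceByKey(lambda a, b: a + b)
        )
        counts.saveAsTextFile(output_path)
    finally:
        sc.stop()


if __name__ == "__main__":
    # Hypothetical usage: wordcount.py <input path> <output path>
    if len(sys.argv) == 3:
        run(sys.argv[1], sys.argv[2])
    else:
        sys.stderr.write("usage: wordcount.py <input path> <output path>\n")
```

The master is not set inside the script, so it is taken from the `--master yarn-cluster` arguments that the shell action passes to spark-submit.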