apache spark - "KeyError: 'SPARK_HOME'", "Can't load main class from JAR" when running PySpark as an Oozie workflow job
This issue is a continuation of a previous question, which was seemingly resolved but led to the issue described here.
I am using Spark 1.4.0 on the Cloudera QuickStart VM CDH-5.4.0. When I run my PySpark script as a SparkAction in Oozie, I encounter this error in the Oozie job / container logs:
    KeyError: 'SPARK_HOME'
I then came across this solution, which is for Spark 1.3.0, although I still gave it a try. The documentation seems to say that this issue is already fixed in Spark versions 1.3.2 and 1.4.0 (but here I am, encountering the same issue).
The suggested solution in the link was that I need to set spark.yarn.appMasterEnv.SPARK_HOME and spark.executorEnv.SPARK_HOME to anything, even if it is just a path that does not point to the actual SPARK_HOME (i.e., /bogus, although I did set these to the actual SPARK_HOME).
Here's my workflow after the change:
    <spark xmlns="uri:oozie:spark-action:0.1">
        <job-tracker>${resourcemanager}</job-tracker>
        <name-node>${namenode}</name-node>
        <master>local[2]</master>
        <mode>client</mode>
        <name>${name}</name>
        <jar>${workflowrootlocal}/lib/my_pyspark_job.py</jar>
        <spark-opts>--conf spark.yarn.appMasterEnv.SPARK_HOME=/usr/lib/spark spark.executorEnv.SPARK_HOME=/usr/lib/spark</spark-opts>
    </spark>
This seems to solve the original problem above. However, it leads to another error when I inspect the stderr of the Oozie container log:
    Error: Cannot load main class from JAR file:/var/lib/hadoop-yarn/cache/yarn/nm-local-dir/usercache/cloudera/appcache/application_1437103727449_0011/container_1437103727449_0011_01_000001/spark.executorEnv.SPARK_HOME=/usr/lib/spark
If I am using Python, it should not expect a main class, right? Please note that in my previous related post, the Oozie job example shipped with the Cloudera QuickStart VM CDH-5.4.0, which features a SparkAction written in Java, was working in my tests. The issue seems to occur only with Python.
I would appreciate it if anyone can help.
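One possible reading of the second error, offered as an assumption rather than a confirmed diagnosis: in the spark-opts shown in the workflow, only the first property is preceded by `--conf`, so the second `key=value` token may be taken as the application file. Since that token does not end in `.py`, it would be treated like a JAR, and a main class would be looked for in it. The toy parser below illustrates the idea. `split_submit_args` and both token lists are hypothetical and are not Spark's or Oozie's actual code.

```python
def split_submit_args(tokens):
    """Toy model of spark-submit-style parsing (NOT Spark's real parser).

    Leading `--conf key=value` pairs are collected as configuration.
    The first token that is not part of such a pair is treated as the
    application file; everything after it is an application argument.
    """
    conf = {}
    i = 0
    while i + 1 < len(tokens) and tokens[i] == "--conf":
        key, _, value = tokens[i + 1].partition("=")
        conf[key] = value
        i += 2
    primary = tokens[i] if i < len(tokens) else None
    app_args = tokens[i + 1:]
    return conf, primary, app_args


# Hypothetical token list modelled on the spark-opts as posted:
# the second property has no `--conf` in front of it.
as_posted = [
    "--conf", "spark.yarn.appMasterEnv.SPARK_HOME=/usr/lib/spark",
    "spark.executorEnv.SPARK_HOME=/usr/lib/spark",
    "my_pyspark_job.py",
]

# Same options, with `--conf` repeated before each property.
with_second_conf = [
    "--conf", "spark.yarn.appMasterEnv.SPARK_HOME=/usr/lib/spark",
    "--conf", "spark.executorEnv.SPARK_HOME=/usr/lib/spark",
    "my_pyspark_job.py",
]

for label, tokens in (("as posted", as_posted), ("with second --conf", with_second_conf)):
    conf, primary, app_args = split_submit_args(tokens)
    print(label, "-> application file:", primary)
```

If this reading is right, repeating `--conf` before each property in the spark-opts element would be worth trying.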
Rather than setting the spark.yarn.appMasterEnv.SPARK_HOME and spark.executorEnv.SPARK_HOME variables, try adding the following lines of code to your Python script before setting SparkConf():
    import os
    os.environ["SPARK_HOME"] = "/path/to/spark/installed/location"
I found the reference here.
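A minimal sketch of where that assignment could sit in a PySpark script, assuming the placeholder path from the answer is replaced with the real installation directory. The helper name `ensure_spark_home`, the application name `my_pyspark_job`, and the small test computation are hypothetical and only serve the illustration.

```python
import os

# Assumption: placeholder path taken from the answer; replace it with the
# directory where Spark is actually installed on the cluster nodes.
SPARK_HOME_PATH = "/path/to/spark/installed/location"


def ensure_spark_home(path=SPARK_HOME_PATH):
    """Set SPARK_HOME for the current process and return the value in effect."""
    os.environ["SPARK_HOME"] = path
    return os.environ["SPARK_HOME"]


def main():
    # Set the variable first, before any SparkConf or SparkContext exists.
    ensure_spark_home()

    # Imported here so the variable is already set when PySpark starts up.
    from pyspark import SparkConf, SparkContext

    conf = SparkConf().setAppName("my_pyspark_job")  # hypothetical app name
    sc = SparkContext(conf=conf)
    try:
        # Trivial job to confirm that the context came up.
        print(sc.parallelize(range(10)).sum())
    finally:
        sc.stop()
```

Calling `main()` from the script's entry point would run the job. The assignment only affects the Python process that executes it, so it has to happen before the context is created.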
This helped me resolve the error I was facing, but I ran into the following error afterwards:
    Traceback (most recent call last):
      File "/usr/hdp/current/spark-client/analyticsjar/boxplot_outlier.py", line 129, in <module>
        main()
      File "/usr/hdp/current/spark-client/analyticsjar/boxplot_outlier.py", line 60, in main
        sc = SparkContext(conf=conf)
      File "/hadoop/yarn/local/filecache/1314/spark-core_2.10-1.1.0.jar/pyspark/context.py", line 107, in __init__
      File "/hadoop/yarn/local/filecache/1314/spark-core_2.10-1.1.0.jar/pyspark/context.py", line 155, in _do_init
      File "/hadoop/yarn/local/filecache/1314/spark-core_2.10-1.1.0.jar/pyspark/context.py", line 201, in _initialize_context
      File "/hadoop/yarn/local/filecache/1314/spark-core_2.10-1.1.0.jar/py4j/java_gateway.py", line 701, in __call__
      File "/hadoop/yarn/local/filecache/1314/spark-core_2.10-1.1.0.jar/py4j/protocol.py", line 300, in get_return_value
    py4j.protocol.Py4JJavaError: An error occurred while calling None.org.apache.spark.api.java.JavaSparkContext.
    : java.lang.SecurityException: class "javax.servlet.FilterRegistration"'s signer information does not match signer information of other classes in the same package