python - java.util.HashMap missing in PySpark session

I'm working with Apache Spark 1.4.0 on Windows 7 x64, with Java 1.8.0_45 x64 and Python 2.7.10 x86, in IPython 3.2.0.

I am attempting to write a DataFrame-based program in an IPython notebook that reads from and writes to a SQL Server database.

So far I can read data from the database:

    from pyspark.sql import SQLContext
    sqlContext = SQLContext(sc)
    df = sqlContext.load(source="jdbc", url="jdbc:sqlserver://serverurl", dbtable="dbname.tablename", driver="com.microsoft.sqlserver.jdbc.SQLServerDriver", user="username", password="password")

and convert the data to a pandas DataFrame and do whatever I want with it. (This was more than a little hassle, but it works after adding Microsoft's sqljdbc42.jar to spark.driver.extraClassPath in spark-defaults.conf.)

The current problem arises when I go to write the data back to SQL Server with the DataFrameWriter API:

    df.write.jdbc("jdbc:sqlserver://serverurl", "dbname.sparktesttable1", dict(driver="com.microsoft.sqlserver.jdbc.SQLServerDriver", user="username", password="password"))

    ---------------------------------------------------------------------------
    Py4JError                                 Traceback (most recent call last)
    <ipython-input-19-8502a3e85b1e> in <module>()
    ----> 1 df.write.jdbc("jdbc:sqlserver://jdbc:sqlserver", "dbname.sparktesttable1", dict(driver="com.microsoft.sqlserver.jdbc.SQLServerDriver", user="username", password="password"))

    c:\users\user\downloads\spark-1.4.0-bin-hadoop2.6\python\pyspark\sql\readwriter.pyc in jdbc(self, url, table, mode, properties)
        394         for k in properties:
        395             jprop.setProperty(k, properties[k])
    --> 396         self._jwrite.mode(mode).jdbc(url, table, jprop)
        397
        398

    c:\python27\lib\site-packages\py4j\java_gateway.pyc in __call__(self, *args)
        536         answer = self.gateway_client.send_command(command)
        537         return_value = get_return_value(answer, self.gateway_client,
    --> 538                 self.target_id, self.name)
        539
        540         for temp_arg in temp_args:

    c:\python27\lib\site-packages\py4j\protocol.pyc in get_return_value(answer, gateway_client, target_id, name)
        302                 raise Py4JError(
        303                     'An error occurred while calling {0}{1}{2}. Trace:\n{3}\n'.
    --> 304                     format(target_id, '.', name, value))
        305         else:
        306             raise Py4JError(

    Py4JError: An error occurred while calling o49.mode. Trace:
    py4j.Py4JException: Method mode([class java.util.HashMap]) does not exist
        at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:333)
        at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:342)
        at py4j.Gateway.invoke(Gateway.java:252)
        at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
        at py4j.commands.CallCommand.execute(CallCommand.java:79)
        at py4j.GatewayConnection.run(GatewayConnection.java:207)
        at java.lang.Thread.run(Unknown Source)

The problem seems to be that Py4J cannot find the Java java.util.HashMap class when it goes to convert my connection properties dictionary into a JVM object. Adding rt.jar (with its path) to spark.driver.extraClassPath does not resolve the issue. Removing the dictionary from the write command avoids the error, but of course the write then fails due to the lack of a driver and authentication.

Edit: the o49.mode part of the error changes from run to run.

Davies Liu on the Spark users mailing list found the problem. There is a subtle difference between the Scala and Python APIs that I missed. You have to pass in a mode string (such as "overwrite") as the 3rd parameter in the Python API, but not in the Scala API. Changing the statement as follows resolves the issue:

    df.write.jdbc("jdbc:sqlserver://serverurl", "dbname.sparktesttable1", "overwrite", dict(driver="com.microsoft.sqlserver.jdbc.SQLServerDriver", user="username", password="password"))
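
To see why the original call produced "Method mode([class java.util.HashMap]) does not exist", it may help to look at how the arguments bind. The traceback shows the Python signature as jdbc(self, url, table, mode, properties), so a dict passed in third position lands in mode, and Py4J then sends it to the JVM as a java.util.HashMap. The sketch below is only an illustration: jdbc_stub is a hypothetical stand-in that copies that parameter order and does not touch Spark or a database.

```python
# Hypothetical stand-in for DataFrameWriter.jdbc; it only mirrors the
# parameter order (url, table, mode, properties) shown in the traceback.
def jdbc_stub(url, table, mode="error", properties=None):
    # Report which value ended up bound to which parameter.
    return {"mode": mode, "properties": properties or {}}

props = dict(user="username", password="password")

# Original call: the dict is the 3rd positional argument, so it binds to mode.
wrong = jdbc_stub("jdbc:sqlserver://serverurl", "dbname.sparktesttable1", props)

# Corrected call: name the arguments so nothing can shift position.
right = jdbc_stub("jdbc:sqlserver://serverurl", "dbname.sparktesttable1",
                  mode="overwrite", properties=props)

print(type(wrong["mode"]).__name__)  # prints "dict": the dict landed in mode
print(right["mode"])                 # prints "overwrite"
```

Assuming the real method accepts the same parameter names as keywords, as the signature in the traceback suggests, writing mode="overwrite" and properties=dict(...) explicitly in the df.write.jdbc call would guard against the same positional slip.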
