I'm working with Apache Spark 1.4.0 on Windows 7 x64, with Java 1.8.0_45 x64 and Python 2.7.10 x86 in IPython 3.2.0.
I am attempting to write a DataFrame-based program in an IPython notebook that reads from and writes to a SQL Server database.
So far I can read data from the database:
    from pyspark.sql import SQLContext
    sqlContext = SQLContext(sc)
    df = sqlContext.load(source="jdbc",
                         url="jdbc:sqlserver://serverurl",
                         dbtable="dbname.tablename",
                         driver="com.microsoft.sqlserver.jdbc.SQLServerDriver",
                         user="username",
                         password="password")

and convert the data to a Pandas DataFrame and do whatever I want with it. (This was more than a little hassle, but it works after adding Microsoft's sqljdbc42.jar to spark.driver.extraClassPath in spark-defaults.conf.)
The current problem arises when I go to write the data back to SQL Server with the DataFrameWriter API:
    df.write.jdbc("jdbc:sqlserver://serverurl", "dbname.sparktesttable1", dict(driver="com.microsoft.sqlserver.jdbc.SQLServerDriver", user="username", password="password"))
This fails with the following error:

    ---------------------------------------------------------------------------
    Py4JError                                 Traceback (most recent call last)
    <ipython-input-19-8502a3e85b1e> in <module>()
    ----> 1 df.write.jdbc("jdbc:sqlserver://jdbc:sqlserver", "dbname.sparktesttable1", dict(driver="com.microsoft.sqlserver.jdbc.SQLServerDriver", user="username", password="password"))

    c:\users\user\downloads\spark-1.4.0-bin-hadoop2.6\python\pyspark\sql\readwriter.pyc in jdbc(self, url, table, mode, properties)
        394         for k in properties:
        395             jprop.setProperty(k, properties[k])
    --> 396         self._jwrite.mode(mode).jdbc(url, table, jprop)
        397
        398

    c:\python27\lib\site-packages\py4j\java_gateway.pyc in __call__(self, *args)
        536         answer = self.gateway_client.send_command(command)
        537         return_value = get_return_value(answer, self.gateway_client,
    --> 538                 self.target_id, self.name)
        539
        540         for temp_arg in temp_args:

    c:\python27\lib\site-packages\py4j\protocol.pyc in get_return_value(answer, gateway_client, target_id, name)
        302                 raise Py4JError(
        303                     'An error occurred while calling {0}{1}{2}. Trace:\n{3}\n'.
    --> 304                     format(target_id, '.', name, value))
        305             else:
        306                 raise Py4JError(

    Py4JError: An error occurred while calling o49.mode. Trace:
    py4j.Py4JException: Method mode([class java.util.HashMap]) does not exist
        at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:333)
        at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:342)
        at py4j.Gateway.invoke(Gateway.java:252)
        at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
        at py4j.commands.CallCommand.execute(CallCommand.java:79)
        at py4j.GatewayConnection.run(GatewayConnection.java:207)
        at java.lang.Thread.run(Unknown Source)

The problem seems to be that py4j cannot find the Java java.util.HashMap class when it goes to convert my connectionProperties dictionary into a JVM object. Adding my rt.jar (with path) to spark.driver.extraClassPath does not resolve the issue. Removing the dictionary from the write command avoids this error, but of course the write then fails due to the lack of a driver and authentication.
Edit: The o49.mode part of the error changes from run to run.
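Davies Liu on the Spark users mailing list found the problem. There is a subtle difference between the Scala and Python APIs that I missed: in the Python API you have to pass a mode string (such as "overwrite") as the 3rd parameter, but not in the Scala API. In my original call the properties dictionary therefore landed in the mode position, which is why py4j reported that mode([class java.util.HashMap]) does not exist. Changing the statement as follows resolves the issue: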
    df.write.jdbc("jdbc:sqlserver://serverurl", "dbname.sparktesttable1", "overwrite", dict(driver="com.microsoft.sqlserver.jdbc.SQLServerDriver", user="username", password="password"))
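
To see why the argument order matters, here is a minimal sketch that runs without Spark. The function jdbc_stub is a hypothetical stand-in, not part of PySpark; it only copies the parameter order shown in the traceback frame, jdbc(self, url, table, mode, properties). The default values in the stub are assumptions made for illustration.

```python
# Hypothetical stand-in for DataFrameWriter.jdbc. It mirrors only the
# parameter order from the traceback: (url, table, mode, properties).
# The defaults ("error", None) are assumptions, not taken from the source.
def jdbc_stub(url, table, mode="error", properties=None):
    """Return which parameter each argument ended up bound to."""
    return {"mode": mode, "properties": properties}


url = "jdbc:sqlserver://serverurl"
table = "dbname.sparktesttable1"
props = dict(user="username", password="password")

# Dictionary passed third: it binds to `mode`, so the JVM side would be
# asked to call mode(HashMap), which does not exist.
wrong = jdbc_stub(url, table, props)

# Mode string passed third: the dictionary lands in `properties`.
right = jdbc_stub(url, table, "overwrite", props)

# Passing the dictionary by keyword also avoids the positional mix-up.
by_keyword = jdbc_stub(url, table, properties=props)

print(wrong)
print(right)
print(by_keyword)
```

With a real DataFrame the keyword form would presumably be df.write.jdbc(url, table, properties=props), which leaves the mode at whatever default the installed Spark version uses. I have not verified that form against Spark 1.4.0, so passing the mode explicitly as in the corrected statement is the safer choice.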