I'm running a Spark job with Spark version 1.4 and Cassandra 2.18. I can telnet from the master, and it works to the Cassandra machine. Sometimes the job runs fine, and sometimes I get the following exception. Why would this happen only sometimes?
"Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 7, 172.28.0.162): java.io.IOException: Failed to open native connection to Cassandra at {172.28.0.164}:9042 at com.datastax.spark.connector.cql.CassandraConnector$.com$datastax$spark$connector$cql$CassandraConnector$$createSession(CassandraConnector.scala:155) "
Sometimes it also gives me this exception along with the one above:
Caused by: com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: /172.28.0.164:9042 (com.datastax.driver.core.TransportException: [/172.28.0.164:9042] Connection has been closed))
I had the second error, "NoHostAvailableException", happen to me quite a few times this week while I was porting Python Spark to Java Spark.
I was having issues with the driver thread being nearly out of memory and the GC taking up all my cores (98% of all 8 cores), pausing the JVM all the time.
In Python, when this happens it's much more obvious (to me), so it took me a bit of time to realize what was going on, and I got this error quite a few times.
I had two theories on the root cause, but the solution in both cases was to stop the GC from going crazy.
- First theory: because the JVM was pausing so often, it couldn't connect to Cassandra.
- Second theory: Cassandra was running on the same machine as Spark, and the JVM was taking 100% of the CPU, so Cassandra couldn't answer in time, and to the driver it looked as if there were no Cassandra host.
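One way to check whether GC pressure is behind either theory is to look at how much of the JVM's uptime has gone to garbage collection. The following is only a minimal sketch using the standard `java.lang.management` beans; the class name `GcOverhead` and its helper methods are made up for illustration and are not part of Spark or the Cassandra connector. The reported figure is approximate, since collectors account for their time differently.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper (not from the original post): reports roughly how much
// of the JVM's uptime has been spent in garbage collection.
public class GcOverhead {

    /** Sums the accumulated collection time (ms) over all collectors in this JVM. */
    public static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime(); // -1 when the collector does not report a time
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    /** Ratio of GC time to uptime; returns 0.0 when uptime is not positive. */
    public static double gcFraction(long gcMillis, long uptimeMillis) {
        if (uptimeMillis <= 0) {
            return 0.0;
        }
        return (double) gcMillis / uptimeMillis;
    }

    public static void main(String[] args) {
        long uptime = ManagementFactory.getRuntimeMXBean().getUptime();
        long gc = totalGcMillis();
        System.out.printf("GC time: %d ms of %d ms uptime (%.1f%%)%n",
                gc, uptime, 100.0 * gcFraction(gc, uptime));
    }
}
```

Calling `GcOverhead.totalGcMillis()` from the driver shortly before the connection is opened, and logging it again after a failure, would show whether the errors line up with long GC stretches. A persistently high fraction suggests giving the driver more memory or reducing what it holds on the heap.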
Hope this helps!