java - When using collect() on JavaRDD in Spark, the application becomes slower


I am using Apache Spark with the spark-cassandra-connector for data analysis. For that, I created a JavaRDD from a Cassandra table that contains more than 3 million records.

    JavaRDD<MyTestBean> testRDD = testFunction
            .cassandraTable("keyspace", "table")
            .map(new Function<CassandraRow, MyTestBean>() {
                @Override
                public MyTestBean call(final CassandraRow line) {
                    final MyTestBean beanObj = new MyTestBean();
                    beanObj.setId(line.getString("id"));
                    return beanObj;
                }
            }).filter(new Function<MyTestBean, Boolean>() {
                public Boolean call(MyTestBean s) {
                    long id = s.getId();
                    // final_check is declared elsewhere (not shown)
                    if (id > 600000) {
                        final_check = true;
                    }
                    return final_check;
                }
            });

It won't return any rows after filtering. When I apply

    testRDD.count();

it took approximately 2 minutes to show a count of 0. Is this behavior expected, or are there possible issues on my side?
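
For context, here is a minimal sketch of why a count of 0 can still be slow. It assumes the filter runs on the Spark side after the map step, as in the code above. It uses plain Java streams as a stand-in, not Spark or the Cassandra connector. The class `LazyCountSketch`, the counter `rowsMapped`, and the method `buildPipeline` are hypothetical names for illustration. Like RDD transformations, the `map` and `filter` stages are lazy, and the terminal `count()` has to read and map every row before it can report that none matched.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.stream.LongStream;
import java.util.stream.Stream;

public class LazyCountSketch {
    // Counts how many rows were actually read and mapped (illustration only).
    static final AtomicLong rowsMapped = new AtomicLong();

    // Hypothetical stand-in for the pipeline in the question: map every row, then filter.
    static Stream<Long> buildPipeline(long totalRows) {
        return LongStream.rangeClosed(1, totalRows)
                .boxed()
                .map(id -> {
                    rowsMapped.incrementAndGet(); // one unit of work per source row
                    return id;
                })
                .filter(id -> id > 600000);
    }

    public static void main(String[] args) {
        // Building the pipeline does no work yet.
        Stream<Long> pipeline = buildPipeline(500_000);
        System.out.println("mapped before count: " + rowsMapped.get());

        // The terminal operation forces every row through map and filter.
        long matches = pipeline.count();
        System.out.println("matches: " + matches
                + ", mapped after count: " + rowsMapped.get());
    }
}
```

In this sketch no id exceeds 600000, so the count is 0, yet all 500,000 rows are still mapped. By analogy, the time observed in the question would mostly be the cost of scanning the table, not of counting the result.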

