scala - Stack overflow exception with reduce and union


I am trying to run the following. I am trying to divide the data into multiple parts, apply some operations to each part, and then join the results. While "take and foreach" works fine, the "count" operation fails with a stack overflow exception.

    // studenttablerdd is an RDD of data read from the student table
    // the student table contains data related to each student
    val studentscoringlist = studenttablerdd
      .map(data => data(student_id_idx))
      .distinct
      .collect
      .map { studentid => studenttablerdd.filter(x => x(student_id_idx) == studentid) }
    val studentprofilingrdd = studentscoringlist
      .map(data => scorestudentdata(1, data, trained_studentmodellist))
      .filter(_ != null)
      .reduce(_.union(_))
    studentprofilingrdd.take(10).foreach(println(_))
    studentprofilingrdd.count // throws stack overflow exception

  1. val studentscoringlist = studenttablerdd.map(data => data(student_id_idx)).distinct.collect.map{studentid => {studenttablerdd.filter(x => x(student_id_idx) == studentid)}}
     Here you get a list of RDDs from the source RDD. Each RDD holds the data for one unique studentid, and the union of all of them is, of course, equal to studenttablerdd. That is strange, to say the least. No real work is done on the data here: there is one expensive operation (collect) and a lot of lazy transformations. (Useless splitting and computation?)
  2. val studentrdd = studentscoringlist.map(data => scorestudentdata(1,data,trained_studentmodellist))
     This transforms each element, which is fine (one more step that looks unnecessary so far).
  3. filter(_!=null)
     If scorestudentdata can return null, the code is wrong; it is bad style (another step that looks unnecessary so far).
  4. reduce(_.union(_))
     This joins the RDDs back together. Again, one more unnecessary step.
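
The steps above do not say why count overflows the stack. One likely explanation, offered here as an assumption rather than something confirmed in this thread, is that reduce(_.union(_)) in step 4 nests one binary union inside another for every student, so the resulting RDD lineage becomes very deep. A minimal sketch of the usual alternative, SparkContext.union, which combines a whole sequence of RDDs in a single n-ary union, is shown below. The object name UnionSketch, the helper unionAll, the local master setting, and the one-element RDDs standing in for the per-student RDDs are all hypothetical.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object UnionSketch {
  // Combine many RDDs with one n-ary union instead of a chain of binary unions.
  def unionAll(sc: SparkContext, parts: Seq[RDD[Int]]): RDD[Int] =
    sc.union(parts)

  def main(args: Array[String]): Unit = {
    // App name and local master are assumptions for a self-contained demo.
    val conf = new SparkConf().setAppName("union-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      // Hypothetical stand-in for the per-student RDDs: 1000 one-element RDDs.
      val parts: Seq[RDD[Int]] = (1 to 1000).map(i => sc.parallelize(Seq(i), 1))
      val combined = unionAll(sc, parts)
      println(combined.count())
    } finally {
      sc.stop()
    }
  }
}
```

In the question's code this would mean replacing the final reduce(_.union(_)) with a single call to the union method on the SparkContext, passing the filtered collection of RDDs.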

This code gets the same result:

    studenttablerdd map { data =>
      val score = scorestudentdata(1, data, trained_studentmodellist)
      // wrap a possibly-null result so that nulls can be dropped safely
      if (score == null) None else Some(score)
    } collect {
      case Some(score) => score
    }
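
The Option-and-collect idiom used in the snippet above can be tried on a plain Scala collection without Spark. The sketch below assumes a hypothetical scoring function named score that returns null for some inputs; it is only a stand-in for scorestudentdata, whose real signature is not shown in this thread.

```scala
object OptionFilterSketch {
  // Hypothetical stand-in for a scoring function that may return null.
  def score(studentId: Int): String =
    if (studentId % 2 == 0) s"score-$studentId" else null

  // Option(x) turns null into None; collect then keeps only the defined values.
  def scoreAll(ids: Seq[Int]): Seq[String] =
    ids.map(id => Option(score(id))).collect { case Some(s) => s }

  def main(args: Array[String]): Unit =
    scoreAll(Seq(1, 2, 3, 4)).foreach(println)
}
```

Wrapping the result with Option(...) avoids the explicit null check, and ids.flatMap(id => Option(score(id))) would be an equivalent, shorter form.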

But I suppose that is not your purpose.

