scala - Spark: Is there a way to find the number of records in an RDD?


I have a directory structure in HDFS as follows:

/dir1/dir2/dir3/2011/01/01/*
/dir1/dir2/dir3/2011/01/02/*
...

I read the files as follows (or at least, let's assume that this is how I read them):

val data = sc.textFile("/dir1/dir2/dir3/2011/**/**")

I want to make sure I have read all the data under 2011 (all months and dates), and I thought checking the size of the RDD would give me an idea.
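The size of the RDD is only a total, so on its own it cannot show which days are missing. A complementary check is to list the directories the glob actually matches and compare them against the calendar. This is a minimal sketch, assuming the /yyyy/MM/dd layout shown above and that the data lives on the default FileSystem from sc.hadoopConfiguration; the names CoverageCheck, dayOf, missingDays and matchedDirs are hypothetical, not part of Spark or Hadoop.

```scala
import java.time.LocalDate
import scala.util.Try
import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.spark.SparkContext

object CoverageCheck {
  // Parses the last three path components (yyyy/MM/dd) into a date.
  // Assumes the layout from the question: /dir1/dir2/dir3/yyyy/MM/dd
  def dayOf(dir: String): Option[LocalDate] =
    dir.split("/").filter(_.nonEmpty).takeRight(3) match {
      case Array(y, m, d) => Try(LocalDate.of(y.toInt, m.toInt, d.toInt)).toOption
      case _              => None
    }

  // Calendar days of `year` for which no matching directory was found.
  def missingDays(dirs: Seq[String], year: Int): List[LocalDate] = {
    val present = dirs.flatMap(dayOf).toSet
    Iterator
      .iterate(LocalDate.of(year, 1, 1))(_.plusDays(1))
      .takeWhile(_.getYear == year)
      .filterNot(present)
      .toList
  }

  // Directories the glob actually matches on the default FileSystem.
  def matchedDirs(sc: SparkContext, glob: String): Seq[String] = {
    val fs = FileSystem.get(sc.hadoopConfiguration)
    // globStatus may return null (e.g. for a non-glob path that does not exist)
    Option(fs.globStatus(new Path(glob)))
      .map(_.toSeq)
      .getOrElse(Seq.empty)
      .filter(_.isDirectory)
      .map(_.getPath.toUri.getPath)
  }
}
```

Usage: CoverageCheck.missingDays(CoverageCheck.matchedDirs(sc, "/dir1/dir2/dir3/2011/*/*"), 2011) returning an empty list means every day of 2011 has a directory. Note that an existing but empty day directory still counts as present.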

Use count(), which returns the number of elements in the RDD; see the list of RDD actions in the Spark programming guide.
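A minimal sketch of using count() on the RDD from the question. The object and helper names (CountRecords, countRecords) are hypothetical, and setMaster("local[*]") is assumed only for trying it out locally. With textFile, each line of every matched file is one record, so the result is a line count, not a file count.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object CountRecords {
  // Number of records (lines) across all files matched by `path`.
  // count() is an action: it runs a job over the RDD and returns a Long.
  def countRecords(sc: SparkContext, path: String): Long =
    sc.textFile(path).count()

  def main(args: Array[String]): Unit = {
    // local[*] is only for local experiments; with spark-submit, leave the master unset.
    val conf = new SparkConf().setAppName("CountRecords").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      val total = countRecords(sc, "/dir1/dir2/dir3/2011/*/*")
      println(s"records under 2011: $total")
    } finally {
      sc.stop()
    }
  }
}
```

Because count() reads every record, it costs a full pass over the input. For very large inputs where a rough figure is enough, RDD also offers countApprox(timeout, confidence).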

