ETL: read, transform, and stream to Hadoop

I need to build a server that reads large CSV data files (100 GBs) in a directory, transforms some fields, and streams them to a Hadoop cluster.

These files are copied over from other servers at random times (100s of times/day). It takes a long time to finish copying a file.

I need to:

My question is: is there an open source ETL tool that can provide all 5 of these, and that works well with Hadoop/Spark streaming? I assume this process is standard, but I couldn't find anything yet.

Thank you.

Flume or Kafka will serve your purpose. Both are well integrated with Spark and Hadoop.
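
To make the file-handling side concrete, here is a minimal Python sketch of the part that sits in front of Flume or Kafka: skip files that are still being copied, transform rows one at a time, and hand each row to a sink. It is only an illustration under some assumptions, not a Flume or Kafka configuration. The function names (`is_copy_finished`, `transform`, `stream_rows`, `process_directory`), the "not modified for `quiet_seconds`" heuristic, the whitespace-trimming transform, and the `.done` rename are all hypothetical choices of mine; `send` stands in for whatever actually ships the row (for example a Kafka producer call).

```python
import csv
import os
import time
from typing import Callable, Dict, Iterator


def is_copy_finished(path: str, quiet_seconds: float = 60.0) -> bool:
    """Heuristic (an assumption): a file that has not been modified for
    `quiet_seconds` is treated as fully copied."""
    return (time.time() - os.path.getmtime(path)) >= quiet_seconds


def transform(row: Dict[str, str]) -> Dict[str, str]:
    """Placeholder transformation: trim whitespace around string fields."""
    return {
        key: value.strip() if isinstance(value, str) else value
        for key, value in row.items()
    }


def stream_rows(path: str) -> Iterator[Dict[str, str]]:
    """Yield transformed rows one at a time, so a very large file is
    never loaded into memory as a whole."""
    with open(path, newline="") as handle:
        for row in csv.DictReader(handle):
            yield transform(row)


def process_directory(
    directory: str,
    send: Callable[[Dict[str, str]], None],
    quiet_seconds: float = 60.0,
) -> int:
    """Pass every row of every finished .csv file to `send`.

    Returns the number of rows sent. Processed files are renamed with a
    `.done` suffix (hypothetical convention) so they are not sent twice.
    """
    sent = 0
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if not name.endswith(".csv"):
            continue
        if not is_copy_finished(path, quiet_seconds):
            continue  # still being copied; pick it up on a later pass
        for row in stream_rows(path):
            send(row)
            sent += 1
        os.rename(path, path + ".done")
    return sent
```

You would call `process_directory` periodically (from a loop or a scheduler) and pass in a `send` function that publishes to your transport. The modification-time check is only a heuristic; if you control the copying side, having it write to a temporary name and rename the file once the copy completes is a more reliable signal.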
