What is the relevance of specifying the number of tasks in Storm?


I want to know the actual relevance of using tasks in Storm with respect to output or performance, since tasks do not add parallelism. If I choose more than 1 task for a component, will it change the output or the flow? And if I choose number of tasks > executors, how does that make a difference in the flow or output (here I am taking the basic word count example)? It would be helpful if someone could explain this, with or without an example.

For example, say I have a topology with 3 bolts and 1 spout, and I have specified 2 worker ports; that means these 4 components (1 spout and 3 bolts) run on these workers only. I have specified 2 executors for the 1st bolt, which means there are 2 threads of that bolt running in parallel. Now, if I specify number of tasks = 3, how does this make a difference, whether in output or performance? And if I have specified fields grouping, will the grouping be spread across the different executors (please correct me if I am wrong)?

Did you read this article? https://storm.apache.org/documentation/understanding-the-parallelism-of-a-storm-topology.html

To pick up your example: if you set #tasks=3 and specify 2 executors, then with fieldsGrouping the data is partitioned into 3 substreams (= #tasks). 2 substreams go to one executor and the third goes to the second executor. However, using 3 tasks with 2 executors allows you to increase the number of executors to 3 later using the rebalance command.
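The partitioning described above can be sketched as a small simulation. This is only a simplified model, not Storm's actual code: the helper names `assign_tasks` and `task_for_key` are hypothetical, the byte-sum hash is a stand-in for whatever hash Storm applies to the grouping fields, and the contiguous split of tasks across executors is an assumption made for illustration.

```python
def assign_tasks(num_tasks, num_executors):
    """Split task ids 0..num_tasks-1 into contiguous groups, one group per executor.

    Assumption: tasks are spread as evenly as possible; Storm's real
    scheduler may order or group them differently.
    """
    base, extra = divmod(num_tasks, num_executors)
    groups, start = [], 0
    for e in range(num_executors):
        size = base + (1 if e < extra else 0)
        groups.append(list(range(start, start + size)))
        start += size
    return groups


def task_for_key(key, num_tasks):
    """Stand-in for fields grouping: the same key always maps to the same task."""
    return sum(key.encode("utf-8")) % num_tasks


# 3 tasks on 2 executors: one executor runs two tasks, the other runs one.
before = assign_tasks(3, 2)

# After raising the executor count to 3, every task gets its own thread.
# The number of tasks (and so the key-to-task mapping) stays the same.
after = assign_tasks(3, 3)

if __name__ == "__main__":
    print("2 executors:", before)
    print("3 executors:", after)
    for word in ["storm", "bolt", "spout"]:
        print(word, "-> task", task_for_key(word, 3))
```

Note that in this model the key-to-task mapping depends only on the number of tasks, so changing the number of executors moves tasks between threads without changing which task sees which word.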

As long as you do not want to increase the number of executors during execution, #tasks should be equal to #executors (i.e., don't specify #tasks).

For your example (if you don't want to change the parallelism at runtime), you can get an imbalanced workload across the two executors (one executor processes 33% of the data, the other 66%). However, this is a problem only in this special case and not in general. If you assume you have 4 tasks, each executor processes 2 substreams and no imbalance occurs.
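The 33%/66% split and the balanced 4-task case can be checked with a few lines of arithmetic. This is a rough sketch that assumes every substream carries the same amount of data and that tasks are divided as evenly as possible among executors; `executor_shares` is a hypothetical helper, not part of Storm.

```python
from fractions import Fraction


def executor_shares(num_tasks, num_executors):
    """Return each executor's share of the data as an exact fraction.

    Assumption: each task (substream) receives an equal share of the
    tuples, and tasks are distributed as evenly as possible.
    """
    base, extra = divmod(num_tasks, num_executors)
    return [
        Fraction(base + (1 if e < extra else 0), num_tasks)
        for e in range(num_executors)
    ]


if __name__ == "__main__":
    # 3 tasks, 2 executors: shares are 2/3 and 1/3 (imbalanced).
    print(executor_shares(3, 2))
    # 4 tasks, 2 executors: shares are 1/2 and 1/2 (balanced).
    print(executor_shares(4, 2))
```

With skewed keys (as in a real word count, where some words are far more frequent than others) the actual load can differ from these even-split figures.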

