loops - Perl: fastest way to count with big data (300 million+ lines)

I have a dataset:

domain,ip,org
emileaben.com, 94.31.44.1, level 3 communications
anaplan.com, 94.31.44.12, level 3 communications
anaplan.com, 94.31.44.15, abc
anaplan.com, 94.31.44.19, level 3 communications

I want to count the number of IPs per domain per organization, which should give this result:

domain,countip,org
anaplan.com, 2, level 3 communications
emileaben.com, 1, level 3 communications
anaplan.com, 1, abc

Can anyone help?

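From the command line, without sorting (the /r modifier requires Perl 5.14 or later):

perl -F, -ane '
  BEGIN { $" = "," }     # join interpolated array elements with a comma
  $. > 1 or next;        # skip the header line
  $h{"@F[0,2]"}++;       # count each domain,org pair
  END { print $k =~ s|,\K| $v,|r while ($k,$v) = each(%h) }
' file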

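With sorting by count, highest first:

perl -F, -ane '
  BEGIN { $" = "," }     # join interpolated array elements with a comma
  $. > 1 or next;        # skip the header line
  $h{"@F[0,2]"}++;       # count each domain,org pair
  END { print s|,\K| $h{$_},|r for sort { $h{$b} <=> $h{$a} } keys %h }
' file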
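For readers who prefer a script over a one-liner, here is a minimal sketch of the same hash-counting idea. The names `count_ips` and `print_counts` are made up for illustration, and the tab-joined hash key assumes that neither the domain nor the organization contains a tab. As with the one-liners, it counts rows per domain/organization pair; it does not deduplicate repeated IPs. Memory use grows with the number of distinct pairs, not with the number of input lines.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Count rows per (domain, org) pair from a filehandle of "domain, ip, org" lines.
# Returns a hash reference keyed by "domain<TAB>org" (hypothetical key layout).
sub count_ips {
    my ($fh) = @_;
    my %count;
    my $header = <$fh>;                 # skip the header line
    while (my $line = <$fh>) {
        chomp $line;
        next unless length $line;       # ignore blank lines
        # Split on commas with optional surrounding whitespace.
        # The limit of 3 keeps any further commas inside the org field.
        my ($domain, $ip, $org) = split /\s*,\s*/, $line, 3;
        next unless defined $org;       # ignore lines with fewer than 3 fields
        $count{"$domain\t$org"}++;
    }
    return \%count;
}

# Print "domain, count, org" lines, highest count first (ties sorted by key).
sub print_counts {
    my ($count, $out) = @_;
    print {$out} "domain,countip,org\n";
    for my $key (sort { $count->{$b} <=> $count->{$a} || $a cmp $b } keys %$count) {
        my ($domain, $org) = split /\t/, $key, 2;
        print {$out} "$domain, $count->{$key}, $org\n";
    }
}

# Run only when an input file is given on the command line.
if (@ARGV) {
    open my $fh, '<', $ARGV[0] or die "Cannot open $ARGV[0]: $!";
    print_counts(count_ips($fh), \*STDOUT);
    close $fh;
}
```

Saved under a name of your choice, for example `count_ips.pl`, it could be run as `perl count_ips.pl file`.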
