I'm running on a 32-core machine and I wish to parallelize a simple operation. Given an ip_addresses.txt file such as:

    1.2.3.4
    8.8.8.8
    120.120.120.120

I'd like to resolve these IPs using a script, called script.sh, which resolves IPs to their respective ISPs. It reads an IP and outputs the following, for example when given 1.2.3.4, which works fine:

    echo 1.2.3.4 | ./script.sh
    1.2.3.4|google

The ip_addresses.txt file contains several million unique IPs, and I was thinking of parallelizing the call to the script. I tried:

    cat ip_addresses.txt | parallel ./script.sh

But there is no output. I'd expect to have:

    1.2.3.4|google
    120.120.120.120|taiwan academic network

This way I can redirect them to a file.
My script is as follows:

    #!/bin/bash
    while read ip
    do
      ret=$(/home/sco/twdir/product/trunk/ext/libmaxminddb-1.0.3/bin/mmdblookup --file /home/sco/twdir/product/trunk/ext/libmaxminddb-1.0.3/geoip2-isp.mmdb --ip $ip isp 2>/dev/null | grep -v '^$' | grep -v '^ not find' | cut -d "\"" -f 2)
      [[ $ret != "" ]] && echo -n "$ip|" && echo $ret;
    done

What did I miss? Although I checked the tutorials, I can't sort it out.
Your script reads multiple lines from standard input (stdin). GNU Parallel defaults to putting each input line on the command line as an argument, so the script started with nothing to read. To make GNU Parallel give the input on stdin instead, use --pipe:

    cat ip_addresses.txt | parallel --pipe ./script.sh

This will run 1 job per core and pass each job 1 MB of data. Looking up addresses is not CPU-hard, so you might run 10 jobs per CPU (1000%):

    cat ip_addresses.txt | parallel -j 1000% --pipe ./script.sh

That might hit the file handle limit, so:

    cat ip_addresses.txt |\
      parallel --pipe --block 50m --round-robin -j100 parallel --pipe -j50 ./script.sh

This will run 100*50 = 5000 jobs in parallel.
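Whether a planned job count is likely to run into the file handle limit can be estimated before launching anything. The following is only a rough sketch: it assumes the relevant limit is the per-process open-file limit reported by the shell builtin `ulimit -n`, and it assumes a cost of about 4 descriptors per job, which is a guess and not a documented figure. The names `total_jobs`, `check_fd_budget` and `FDS_PER_JOB` are hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: number of jobs started by two nested parallel levels.
# $1 = jobs at the outer level, $2 = jobs per inner instance
total_jobs() {
  echo $(( $1 * $2 ))
}

# Assumption (not a documented value): descriptors one job costs its parent.
FDS_PER_JOB=4

# Print "ok" or "too many" for a job count run by ONE parallel instance.
# $1 = jobs in that instance, $2 = descriptor limit (number or "unlimited")
check_fd_budget() {
  if [ "$2" = "unlimited" ] || [ $(( $1 * FDS_PER_JOB )) -le "$2" ]; then
    echo ok
  else
    echo "too many"
  fi
}

limit=$(ulimit -n)
echo "open-file limit: $limit"
echo "nested total: $(total_jobs 100 50) jobs"
# -j 1000% on the 32-core machine from the question means 320 jobs at once.
echo "320 jobs in one instance: $(check_fd_budget 320 "$limit")"
echo "100 jobs in one instance: $(check_fd_budget 100 "$limit")"
```

The point of nesting is that each parallel instance only manages its own 100 or 50 children, so the per-instance count stays well below the flat count that `-j 1000%` would ask of a single process.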
If you do not want to wait for a full 1 MB to be processed before you get output, you can lower the block size to 1k:

    cat ip_addresses.txt | parallel -j 1000% --pipe --block-size 1k ./script.sh

    cat ip_addresses.txt |\
      parallel --pipe --block 50k --round-robin -j100 parallel --pipe --block 1k -j50 ./script.sh
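The reason the original `parallel ./script.sh` attempt printed nothing can be reproduced without GNU Parallel or the MaxMind database. This is a minimal sketch, assuming a hypothetical stand-in function `lookup_stub` that, like script.sh, only reads lines from stdin and ignores its arguments. The empty stdin of argument mode is simulated here with /dev/null.

```shell
#!/bin/bash
# Hypothetical stand-in for script.sh: reads IPs from stdin, ignores arguments.
lookup_stub() {
  while read -r ip; do
    echo "$ip|stub-isp"
  done
}

# Argument mode: the IP arrives as "$1" and there is nothing to read on stdin,
# so the loop body never runs and nothing is printed.
as_argument=$(lookup_stub 1.2.3.4 </dev/null)

# Pipe mode: the IPs arrive on stdin, which is what --pipe arranges.
as_stdin=$(printf '1.2.3.4\n8.8.8.8\n' | lookup_stub)

echo "argument mode: '${as_argument}'"
echo "stdin mode:"
echo "$as_stdin"
```

An alternative would be to leave the parallel call unchanged and rewrite the script to take the IP from `$1`, but that would start one process per address, whereas --pipe lets each process work through a whole block of lines.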