Can't start jobs with GNU Parallel


I'm running a 32-core machine and wish to parallelize a simple operation. Given an ip_addresses.txt file such as:

1.2.3.4
8.8.8.8
120.120.120.120

I'd like to resolve these IPs using a script called script.sh, which resolves IPs to their respective ISPs. Given an IP, it outputs the following; for example, when given 1.2.3.4 it works fine:

echo 1.2.3.4 | ./script.sh
1.2.3.4|google

The ip_addresses.txt file contains several million unique IPs, so I was thinking of parallelizing the calls to the script. I tried:

cat ip_addresses.txt | parallel ./script.sh 

But there is no output. I'd expect to get:

1.2.3.4|google
120.120.120.120|taiwan academic network

That way I can redirect them to a file.

My script is as follows:

#!/bin/bash
# Read one IP per line from stdin; print "ip|isp" for each IP that has an ISP entry.
while IFS= read -r ip; do
  ret=$(/home/sco/twdir/product/trunk/ext/libmaxminddb-1.0.3/bin/mmdblookup \
          --file /home/sco/twdir/product/trunk/ext/libmaxminddb-1.0.3/geoip2-isp.mmdb \
          --ip "$ip" isp 2>/dev/null |
        grep -v '^$' | grep -v '^  Could not find' | cut -d '"' -f 2)
  [[ $ret != "" ]] && printf '%s|%s\n' "$ip" "$ret"
done
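The text-processing half of script.sh can be checked without the MaxMind database by feeding the same grep/cut filters a canned sample. This is a sketch that assumes mmdblookup prints a matched string as a quoted value followed by a type tag (e.g. `  "Example ISP" <utf8_string>`) between blank lines, and that its not-found message starts with "  Could not find"; the helpers `parse_isp` and `format_line` and the sample ISP name are hypothetical.

```shell
#!/bin/bash
# parse_isp: hypothetical helper applying the same filters as script.sh
# to mmdblookup-style text on stdin; prints the first double-quoted field.
parse_isp() {
  grep -v '^$' | grep -v '^  Could not find' | cut -d '"' -f 2
}

# format_line: hypothetical helper that prints "ip|isp" only when isp is
# non-empty, mirroring the [[ $ret != "" ]] check in script.sh.
format_line() {
  if [ -n "$2" ]; then
    printf '%s|%s\n' "$1" "$2"
  fi
}

# Canned text in the assumed mmdblookup format (not real lookup data).
isp=$(printf '\n  "Example ISP" <utf8_string>\n\n' | parse_isp)
format_line 192.0.2.1 "$isp"   # prints 192.0.2.1|Example ISP
```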

What did I miss? Although I checked the tutorials, I can't sort it out.

Your script reads multiple lines from standard input (stdin), but GNU Parallel by default puts each input line on the command line as an argument. To make GNU Parallel give the input to the job on stdin, use --pipe:

cat ip_addresses.txt | parallel --pipe ./script.sh 
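Alternatively, the script could accept IPs either as arguments or on stdin, so that both `parallel ./script.sh` (arguments) and `parallel --pipe ./script.sh` (stdin) feed it. A sketch under that assumption; `lookup_isp` is a hypothetical stub standing in for the mmdblookup pipeline from the question, so the example runs without the database.

```shell
#!/bin/bash
# lookup_isp: hypothetical stub standing in for the mmdblookup | grep | cut
# pipeline in script.sh; it answers only for the 192.0.2.0/24 documentation range.
lookup_isp() {
  case $1 in
    192.0.2.*) echo "Example ISP" ;;
    *) ;;  # no entry: print nothing
  esac
}

# emit: print "ip|isp" only when the lookup returned something.
emit() {
  local isp
  isp=$(lookup_isp "$1")
  if [ -n "$isp" ]; then
    printf '%s|%s\n' "$1" "$isp"
  fi
}

# main: use command-line arguments if there are any (GNU Parallel's default),
# otherwise read one IP per line from stdin (what --pipe provides).
main() {
  if [ "$#" -gt 0 ]; then
    for ip in "$@"; do emit "$ip"; done
  else
    # "|| [ -n ... ]" also handles a last line without a trailing newline
    while IFS= read -r ip || [ -n "$ip" ]; do emit "$ip"; done
  fi
}

# Demo calls; in a real script.sh you would end with: main "$@"
main 192.0.2.1 198.51.100.7
printf '192.0.2.5\n198.51.100.7\n' | main
```

Note that in argument mode GNU Parallel starts one process per IP by default, which is slow for millions of lines; the --pipe form hands each process a whole block of IPs instead.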

This will run one job per core and pass each job 1 MB of data. Since looking up addresses is not CPU-intensive, you might run 10 jobs per CPU (1000%):

cat ip_addresses.txt | parallel -j 1000% --pipe ./script.sh 
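On the 32-core machine from the question, `-j 1000%` works out to 32 × 10 = 320 jobs running at once, assuming GNU Parallel detects 32 CPUs (depending on the version it may count hardware threads rather than cores). The arithmetic as a small sketch; `jobslots` is a hypothetical helper, and `nproc` from GNU coreutils is used only as an approximation of what GNU Parallel detects.

```shell
#!/bin/bash
# jobslots: hypothetical helper mirroring -j N%: cpus * percent / 100
# (integer arithmetic; GNU Parallel's own rounding may differ).
jobslots() {
  echo $(( $1 * $2 / 100 ))
}

jobslots 32 1000   # the 32-core machine from the question: 320

# On the current machine, if nproc is available.
if command -v nproc >/dev/null 2>&1; then
  jobslots "$(nproc)" 1000
fi
```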

That might hit the file handle limit, so you can nest two GNU Parallel instances instead:

cat ip_addresses.txt |\
  parallel --pipe --block 50m --round-robin -j100 parallel --pipe -j50 ./script.sh

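This will run 100*50 = 5000 jobs in parallel: the outer GNU Parallel distributes 50 MB blocks round-robin to 100 inner GNU Parallel instances, and each of those runs 50 copies of script.sh.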

If you do not want to wait for a full 1 MB to be processed before getting any output, you can lower the block size to 1k:

cat ip_addresses.txt | parallel -j 1000% --pipe --block-size 1k ./script.sh

cat ip_addresses.txt |\
  parallel --pipe --block 50k --round-robin -j100 parallel --pipe --block 1k -j50 ./script.sh
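Why a smaller block gives earlier output: with --pipe each job gets a block as its stdin, and by default GNU Parallel prints a job's output only after that job finishes. A rough sketch of how many IPs land in one block; `lines_per_block` is a hypothetical helper, and the 16-byte average line length is an assumption (the longest dotted-quad IPv4 line, 120.120.120.120 plus its newline, is 16 bytes, so real blocks hold somewhat more lines).

```shell
#!/bin/bash
# lines_per_block: hypothetical helper; estimates input lines per --pipe block
# as block_bytes / avg_line_bytes (integer division, so it rounds down).
lines_per_block() {
  echo $(( $1 / $2 ))
}

# Assumed 16 bytes per line (an upper bound for IPv4 lines with a Unix newline).
lines_per_block 1000000 16   # block of about 1 MB: 62500
lines_per_block 1000 16      # block of about 1 kB: 62
```

So with roughly 1 MB blocks each script.sh looks up tens of thousands of IPs before printing anything, versus a few dozen with 1k blocks, at the cost of starting many more processes.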
