Submitting YARN jobs to a remote Hadoop cluster via a SOCKS proxy


I am trying to access a firewalled Hadoop cluster running YARN via a SOCKS proxy. The cluster itself does not use proxied connections; only the client running on my local machine (e.g. a laptop) does. The laptop is connected via `ssh -D 9999 user@gateway-host` to a machine that can see the Hadoop cluster.

In the Hadoop configuration file core-site.xml on the laptop, I have the following lines:

    <property>
        <name>hadoop.socks.server</name>
        <value>localhost:9999</value>
    </property>
    <property>
        <name>hadoop.rpc.socket.factory.class.default</name>
        <value>org.apache.hadoop.net.SocksSocketFactory</value>
    </property>
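As a rough illustration of what the `hadoop.socks.server` value means, the following Java sketch parses the `host:port` string into a `java.net.Proxy` of type SOCKS, the kind of object a SOCKS-based socket factory routes its connections through. The class `SocksProxyConfig` is hypothetical and not part of Hadoop; it assumes a plain `host:port` value and does not open any connection.

```java
import java.net.InetSocketAddress;
import java.net.Proxy;

// Hypothetical helper, not part of Hadoop: turns a "host:port" string such as
// the hadoop.socks.server value into a SOCKS proxy descriptor pointing at the
// local end of the `ssh -D 9999` tunnel.
class SocksProxyConfig {

    static Proxy fromHostPort(String hostPort) {
        int colon = hostPort.lastIndexOf(':');
        if (colon <= 0 || colon == hostPort.length() - 1) {
            throw new IllegalArgumentException("expected host:port, got " + hostPort);
        }
        String host = hostPort.substring(0, colon);
        // Integer.parseInt throws NumberFormatException (an IllegalArgumentException) on bad ports.
        int port = Integer.parseInt(hostPort.substring(colon + 1));
        // createUnresolved skips DNS; no socket is opened here.
        return new Proxy(Proxy.Type.SOCKS, InetSocketAddress.createUnresolved(host, port));
    }

    public static void main(String[] args) {
        Proxy proxy = fromHostPort("localhost:9999");
        System.out.println(proxy.type() + " via " + proxy.address());
    }
}
```

On the laptop, `localhost:9999` is where `ssh -D` listens. The same value is meaningless on a cluster node, where nothing listens on that port.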

Accessing HDFS this way works great. However, when I try to submit a YARN job, it fails, and I can see in the logs that the nodes are not able to talk to each other:

    java.io.IOException: Failed on local exception: java.net.SocketException: Connection refused; Host Details : local host is: "host1"; destination host is: "host2":8030;
        at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:772)

Here host1 and host2 are both part of the Hadoop cluster (8030 is the default port of the ResourceManager scheduler).

My guess is that the Hadoop nodes are trying to communicate with each other via the SOCKS proxy and failing, because no proxy server exists on each host. Is there a way to fix this apart from setting up a dedicated proxy server?

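You are right: the Hadoop nodes must not use the SOCKS proxy to communicate with each other. The submitted job carries the client's configuration, including the SOCKS socket factory, so the cluster side picks up the laptop's proxy settings unless it is prevented from doing so. You can achieve this by marking the socket factory setting as final on the cluster side.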

In core-site.xml on the cluster, add a final tag to the default socket factory property:

    <property>
        <name>hadoop.rpc.socket.factory.class.default</name>
        <value>org.apache.hadoop.net.StandardSocketFactory</value>
        <final>true</final>
    </property>
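The effect of `<final>true</final>` can be pictured with a simplified model of layered configuration: resources are applied in order, and once a key has been loaded as final, later resources (here, the settings carried over from the laptop) cannot override it. The class `LayeredConfig` below is a hypothetical sketch of that rule only; it is not Hadoop's `Configuration` class and ignores everything else that class does.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical, simplified model of layered configuration with final keys.
// It is NOT org.apache.hadoop.conf.Configuration; it only mimics the rule that
// a key loaded as final ignores overrides from resources loaded later.
class LayeredConfig {
    private final Map<String, String> values = new HashMap<>();
    private final Set<String> finals = new HashSet<>();

    /** Applies one resource, e.g. a site file or settings shipped with a job. */
    void load(Map<String, String> resource, Set<String> markedFinal) {
        for (Map.Entry<String, String> e : resource.entrySet()) {
            String key = e.getKey();
            if (finals.contains(key)) {
                continue; // a final key keeps its earlier value
            }
            values.put(key, e.getValue());
            if (markedFinal.contains(key)) {
                finals.add(key);
            }
        }
    }

    String get(String key) {
        return values.get(key);
    }

    public static void main(String[] args) {
        String key = "hadoop.rpc.socket.factory.class.default";
        LayeredConfig nodeConf = new LayeredConfig();
        // Cluster-side core-site.xml, with the property marked final.
        nodeConf.load(Map.of(key, "org.apache.hadoop.net.StandardSocketFactory"), Set.of(key));
        // Settings arriving from the laptop's client configuration.
        nodeConf.load(Map.of(key, "org.apache.hadoop.net.SocksSocketFactory"), Set.of());
        System.out.println(nodeConf.get(key)); // prints org.apache.hadoop.net.StandardSocketFactory
    }
}
```

Without the final flag, the second `load` call would win and the node would again try to reach a SOCKS proxy at `localhost:9999`.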

You must then restart the cluster services for the change to take effect.

