I am new to Pig. My input data looks like this:
(message,nil,2015-07-01,22:58:53.66,e,machine.com.name,12,0xd6,string,string ,0,0.0,key=value&key=123456789&key=value&key=us&key=company&key=message&key=123456789&key=string&key=string&key=string&key=string)
I have written the Java UDF below to parse the last string of the input data:

    package com.pig.udf;

    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.Arrays;
    import java.util.HashMap;
    import java.util.Map;

    import org.apache.pig.EvalFunc;
    import org.apache.pig.data.Tuple;

    public class pigudf extends EvalFunc<Map> {

        @Override
        public Map<String, String> exec(Tuple input) throws IOException {
            // If the tuple is null, has fewer than 3 values, or has an even
            // number of values
            if (input == null || input.size() < 3 || (input.size() % 2 == 0)) {
                throw new IOException("incorrect number of values.");
            }
            String source = (String) input.get(0);
            System.out.println("input source" + source);
            String delim = (input.size() > 1) ? (String) input.get(1) : "&";
            int length = (input.size() > 2) ? (Integer) input.get(2) : 0;
            if (source == null || delim == null) {
                return null;
            }
            String[] splits = source.split(delim, length);
            System.out.println("splits" + splits);
            ArrayList<String> arraylist = new ArrayList<String>(
                    Arrays.asList(splits));
            Map<String, String> map = new HashMap<String, String>();
            for (String keyvalue : arraylist) {
                int end = keyvalue.indexOf('=');
                if (end != -1) {
                    map.put(keyvalue.substring(0, end), keyvalue.substring(end + 1));
                }
            }
            System.out.println("map" + map);
            return map;
        }
    }

When I run the Pig script with the above Java UDF, I get the error below:
    Pig Stack Trace
    ---------------
    ERROR 1066: Unable to open iterator for alias c

    org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to open iterator for alias c
        at org.apache.pig.PigServer.openIterator(PigServer.java:892)
        at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:774)
        at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:372)
        at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:198)
        at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:173)
        at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:84)
        at org.apache.pig.Main.run(Main.java:607)
        at org.apache.pig.Main.main(Main.java:156)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
    Caused by: java.io.IOException: Job terminated with anomalous status FAILED
        at org.apache.pig.PigServer.openIterator(PigServer.java:884)
        ... 13 more

    Application log
    -------------------------------------------------------------------
    Application application_1436453941326_0020 failed 2 times due to AM Container for appattempt_1436453941326_0020_000002 exited with exitCode: 1
    For more detailed output, check application tracking page: http://quickstart.cloudera:8088/proxy/application_1436453941326_0020/ Then, click on links to logs of each attempt.
    Diagnostics: Exception from container-launch.
    Container id: container_1436453941326_0020_02_000001
    Exit code: 1
    Stack trace: ExitCodeException exitCode=1:
        at org.apache.hadoop.util.Shell.runCommand(Shell.java:538)
        at org.apache.hadoop.util.Shell.run(Shell.java:455)
        at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:715)
        at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:211)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
    Container exited with a non-zero exit code 1
    Failing this attempt. Failing the application.

My script runs fine without the Java UDF and gives me an output file too. The issue arises only when I include the Java UDF in the Pig script. There is no Java version mismatch between the Java UDF and the machine running Pig. Any pointers are appreciated.
Pig script:

    REGISTER '/home/cloudera/pig/pigudf_1.7.jar';
    REGISTER '/home/cloudera/pig/pig.jar';
    a = LOAD 'logs_message.txt' USING PigStorage(',') AS (component:chararray, nil:chararray, date:chararray, time:chararray, e:chararray, machine_address:chararray, number1:chararray, hex_number:chararray, cal_type:chararray, cal_name:chararray, number2:chararray, number3:chararray, data:chararray)
    b = FILTER a BY cal_name MATCHES 'changedmessage';
    c = FOREACH b GENERATE cal_name, com.pig.udf.pigudf(data) AS datamap;
    DUMP c;
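I see 3 issues with your code:
- You're missing a semicolon at the end of the LOAD statement. I'm not sure how it runs like this, so I'm assuming it's a mistake made when copying to Stack Overflow.
- You name a variable "e", which is a reserved keyword. I'm not sure what impact that will have, but it wouldn't be safe. See the list of reserved Pig keywords in the Pig documentation.
- (This is what's causing the error.) Your validations make no sense. It looks like you created a split function designed to take 3 or fewer parameters (the string to split, the delimiter, and the max split size). Yet you're validating that the input has at least 3 parameters, and that it has an odd number of parameters. Your script passes a single argument (data), so input.size() is 1 and the first check always throws the IOException. It seems the second validation was intended for the string after you've split it, not before.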
The validation should look more like this:

    if (input == null || input.size() == 0 || input.size() > 3) {
        throw new IOException("incorrect number of values.");
    }
    // ...
    if (splits.length % 2 != 0)
        throw new IOException("invalid key value pairs");

I'd also advise against running programs on a cluster or on Hadoop until you've debugged them; get them working locally first. If you use the PigServer class, you can debug UDFs on your development machine through Eclipse or a different IDE.
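To make the "get it working locally first" advice concrete, here is a minimal sketch that pulls the parsing logic out of the UDF so it can be run and stepped through without Pig or a cluster. The class name KeyValueParser and the sample string are hypothetical and not from the question. The sketch also treats the delimiter as literal text via Pattern.quote, which is an assumption (the original passes it to String.split as a regular expression). It checks each token for an '=' rather than checking the token count, because every token already holds a whole key=value pair.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Pig: holds only the parsing step of the
// UDF so it can be exercised from a plain main method or a debugger.
class KeyValueParser {

    // Splits source on a literal delimiter and maps each "key=value" token.
    // limit follows String.split semantics; 0 means "no limit".
    // Returns null for null input, mirroring the UDF in the question.
    static Map<String, String> parse(String source, String delim, int limit) {
        if (source == null || delim == null) {
            return null;
        }
        Map<String, String> map = new LinkedHashMap<>();
        // Assumption: the delimiter is plain text such as "&" or "|",
        // so it is quoted instead of being interpreted as a regex.
        for (String token : source.split(Pattern.quote(delim), limit)) {
            int end = token.indexOf('=');
            if (end > 0) { // skip tokens with no '=' or with an empty key
                map.put(token.substring(0, end), token.substring(end + 1));
            }
        }
        return map;
    }

    public static void main(String[] args) {
        // Made-up sample input for illustration only.
        System.out.println(parse("user=alice&region=us&id=123456789", "&", 0));
    }
}
```

Inside exec, the UDF would then only need to validate the argument count, read the optional delimiter and limit from the tuple, and delegate to a method like this one. Note that a map keeps one value per key, so input that repeats the same key retains only the last value.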