I'm using Nutch to crawl a website and am writing a plugin for it. I use Jaunt 1.0.0.1 to parse the HTML. For example, I have this line:

    Element infoBooksItem = body.findFirst("<div class=info_books_item>");

It throws an error when the page has no <div class=info_books_item>. I've been looking at the Jaunt javadocs, but I can't figure out how to check whether such an element exists.
You are correct: the findFirst method throws an exception if the element is not found. You can wrap the call in a try-catch block, catch the NotFound exception, and handle the missing element there. If you only need a boolean detector, you can write a helper method that does not throw:
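    import com.jaunt.Element;
    import com.jaunt.NotFound;

    // Returns true if 'element' contains a match for the query 'target', false otherwise.
    public boolean has(Element element, String target) {
        try {
            element.findFirst(target);
            return true;
        } catch (NotFound n) {
            return false;
        }
    }

Alternatively, you can use the findEvery method, which does not throw an exception when nothing matches, as a boolean detector: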
    if (body.findEvery("<div class=info_books_item>").size() > 0) {
        // at least one <div class=info_books_item> exists on the page
    }
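One more option: calling has(...) and then findFirst(...) searches the page twice. A helper that returns an Optional lets you test for the element and use it in a single lookup. The sketch below is only an illustration and is meant to run on its own, so it uses hypothetical stand-ins (the Searchable interface and a local NotFound class) in place of Jaunt's Element and com.jaunt.NotFound. With Jaunt on the classpath, the equivalent helper would take an Element and return Optional<Element>.

```java
import java.util.List;
import java.util.Optional;

public class OptionalFind {
    // Hypothetical stand-in for com.jaunt.NotFound (a checked exception).
    static class NotFound extends Exception {
        NotFound(String message) {
            super(message);
        }
    }

    // Hypothetical stand-in for the single Jaunt call used here:
    // findFirst(query) returns the first match or throws NotFound.
    interface Searchable<T> {
        T findFirst(String query) throws NotFound;
    }

    // Converts "throws when missing" into an Optional, searching only once.
    static <T> Optional<T> findOptional(Searchable<T> root, String query) {
        try {
            return Optional.of(root.findFirst(query));
        } catch (NotFound e) {
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        // Fake page holding one matching element, to exercise both branches.
        List<String> page = List.of("<div class=info_books_item>");
        Searchable<String> body = query -> {
            if (page.contains(query)) {
                return query;
            }
            throw new NotFound("no match for " + query);
        };

        // Runs the lambda only when the element was found.
        findOptional(body, "<div class=info_books_item>")
            .ifPresent(item -> System.out.println("found " + item));
        // Missing element: empty Optional, no exception reaches the caller.
        System.out.println(findOptional(body, "<div class=missing>").isPresent());
    }
}
```

The try-catch stays in one place, and callers choose between isPresent(), ifPresent(...) or orElse(...) instead of writing their own exception handling.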