This question already has answers here:
- Extract columns of text from a PDF file using iText (7 answers)
- How to read a table in a PDF using iText Java? (3 answers)
I want to extract the contents of a table in a PDF like this:

I wrote a Java program using the iText Java PDF library that can read the contents of a PDF file line by line, but I do not know how to get the contents of the table:

    import com.itextpdf.text.pdf.PdfReader;
    import com.itextpdf.text.pdf.parser.PdfTextExtractor;

    public class PDFReader {
        public static void main(String[] args) {
            // TODO, add application code
            System.out.println("lecteur pdf");
            System.out.println(readPdf("d:/test.pdf"));
        }

        private static String readPdf(String pdfUrl) {
            StringBuilder str = new StringBuilder();
            try {
                PdfReader reader = new PdfReader(pdfUrl);
                int n = reader.getNumberOfPages();
                // Pages are numbered from 1 to n inclusive, so use <= to include the last page
                for (int i = 1; i <= n; i++) {
                    String str2 = PdfTextExtractor.getTextFromPage(reader, i);
                    str.append(str2);
                    System.out.println(str);
                }
            } catch (Exception err) {
                err.printStackTrace();
            }
            return String.format("%s", str);
        }
    }

It gives this:

But that's not what I want. I want to extract the contents of the table line by line and column by column, for example, saving each line in a Java array:
The first array contains: "n°", "date observations", "texte"
The second array contains: "029/14", "le 1er sept 2014 remplace avurnav...", "sete compter du lundi 7 juillet 2014 débuteront les trav..."
The third array contains: "037/14", "le 15 octobre 2014 remplace avurnav ...", "sete du 15 septembre 2014 au 15 juillet 2015, travaux ...."
And so on.
Thanks.
If the PDF library doesn't support extracting tables, you may have to identify the character sequences that commonly begin or end each field and split the data into an array yourself. For instance, the first field looks like NNN/NN, the second field ends with NNNN/NN, and the third field ends where the next first field begins.
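As a rough illustration of that splitting idea, here is a minimal sketch that works on the text already extracted from the page. It assumes every row starts with an NNN/NN number at the beginning of a line and that the second field ends with an NNNN/NN reference. The class name `TableRowSplitter` and its methods are made up for this example and are not part of iText.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not an iText class.
class TableRowSplitter {
    // Assumption: a row begins with a number such as 029/14 at the start of a line.
    private static final Pattern ROW_START = Pattern.compile("(?m)^\\d{3}/\\d{2}\\b");
    // Assumption: the second field ends with a reference of the form NNNN/NN.
    private static final Pattern SECOND_END = Pattern.compile("\\b\\d{4}/\\d{2}\\b");

    static List<String[]> splitRows(String text) {
        // Record where each first field starts and ends.
        List<int[]> starts = new ArrayList<>();
        Matcher m = ROW_START.matcher(text);
        while (m.find()) {
            starts.add(new int[] {m.start(), m.end()});
        }

        List<String[]> rows = new ArrayList<>();
        for (int i = 0; i < starts.size(); i++) {
            int[] cur = starts.get(i);
            // A row runs until the next first field begins (or the text ends).
            int rowEnd = (i + 1 < starts.size()) ? starts.get(i + 1)[0] : text.length();
            String first = text.substring(cur[0], cur[1]);
            String rest = text.substring(cur[1], rowEnd);

            String second = rest;
            String third = "";
            Matcher s = SECOND_END.matcher(rest);
            if (s.find()) {
                second = rest.substring(0, s.end());
                third = rest.substring(s.end());
            }
            rows.add(new String[] {first, normalize(second), normalize(third)});
        }
        return rows;
    }

    // Collapse line breaks and repeated spaces inside a field.
    private static String normalize(String s) {
        return s.replaceAll("\\s+", " ").trim();
    }
}
```

You would call `TableRowSplitter.splitRows(...)` on the string returned by your `readPdf` method. The header row does not match the NNN/NN pattern, so it is skipped and would need to be handled separately. If the real data does not follow these patterns consistently, the regular expressions will need adjusting.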
This is a tricky problem. I have had to use coordinate-based approaches to deal with it before, but the PDF library may not support extracting the position of letters, only the actual text.
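To give an idea of what a coordinate-based approach can look like, here is a minimal sketch. It assumes you have already collected each piece of text together with its x/y position (how you get those depends on the library and is not shown here) and that you know roughly where each column starts. `CoordinateTableBuilder`, `Chunk`, and the parameters are hypothetical names used only for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper, not part of any PDF library.
class CoordinateTableBuilder {
    // A piece of text with the position of its left edge and baseline.
    static final class Chunk {
        final float x;
        final float y;
        final String text;

        Chunk(float x, float y, String text) {
            this.x = x;
            this.y = y;
            this.text = text;
        }
    }

    // Groups chunks into visual lines by y, then assigns each chunk to a column by x.
    // columnStarts holds the assumed left edge of each column, in ascending order.
    static List<String[]> toRows(List<Chunk> chunks, float[] columnStarts, float yTolerance) {
        List<Chunk> sorted = new ArrayList<>(chunks);
        // PDF y coordinates grow upward, so the largest y is the top of the page.
        sorted.sort(Comparator.comparingDouble((Chunk c) -> c.y).reversed());

        List<String[]> rows = new ArrayList<>();
        List<Chunk> line = new ArrayList<>();
        for (Chunk c : sorted) {
            // Start a new line when the baseline moves by more than the tolerance.
            if (!line.isEmpty() && Math.abs(line.get(0).y - c.y) > yTolerance) {
                rows.add(toCells(line, columnStarts));
                line = new ArrayList<>();
            }
            line.add(c);
        }
        if (!line.isEmpty()) {
            rows.add(toCells(line, columnStarts));
        }
        return rows;
    }

    private static String[] toCells(List<Chunk> line, float[] columnStarts) {
        // Read the line left to right.
        line.sort(Comparator.comparingDouble((Chunk c) -> c.x));
        String[] cells = new String[columnStarts.length];
        Arrays.fill(cells, "");
        for (Chunk c : line) {
            // Pick the last column whose left edge is at or before the chunk.
            int col = 0;
            for (int i = 0; i < columnStarts.length; i++) {
                if (c.x >= columnStarts[i]) {
                    col = i;
                }
            }
            cells[col] = cells[col].isEmpty() ? c.text : cells[col] + " " + c.text;
        }
        return cells;
    }
}
```

Note that this treats every visual line as its own row. A cell that wraps over several lines comes out as extra rows with an empty first column, which you would then have to merge into the row above.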