java - How to extract the contents of a table in a PDF file?


I want to extract the contents of a table in a PDF file, like this one:

[image: the table in the PDF]

I wrote a Java program using the iText Java PDF library that can read the contents of a PDF file line by line, but I do not know how to get the contents of the table.

    import com.itextpdf.text.pdf.PdfReader;
    import com.itextpdf.text.pdf.parser.PdfTextExtractor;

    public class PDFReader {

        public static void main(String[] args) {
            // TODO, add application code
            System.out.println("lecteur pdf");
            System.out.println(readPDF("d:/test.pdf"));
        }

        private static String readPDF(String pdf_url) {
            StringBuilder str = new StringBuilder();
            try {
                PdfReader reader = new PdfReader(pdf_url);
                int n = reader.getNumberOfPages();
                // Pages are numbered from 1 to n inclusive.
                for (int i = 1; i <= n; i++) {
                    String str2 = PdfTextExtractor.getTextFromPage(reader, i);
                    str.append(str2);
                    System.out.println(str);
                }
            } catch (Exception err) {
                err.printStackTrace();
            }
            return String.format("%s", str);
        }
    }

This is the output I get:

[image: the program's output]

But that's not what I want. I want to extract the contents of the table line by line and column by column, for example by saving each line in a Java array.

The first array would contain: "n°", "date observations", "texte"

The second array would contain: "029/14", "le 1er sept 2014 remplace avurnav...", "sete compter du lundi 7 juillet 2014 débuteront les trav..."

The third array would contain: "037/14", "le 15 octobre 2014 remplace avurnav ...", "sete du 15 septembre 2014 au 15 juillet 2015, travaux ...."

And so on.

Thanks.

If the PDF library doesn't support extracting tables, you may have to identify character sequences that commonly begin or end each field and use them to split the data into an array. For instance, the first field has the form nnn/nn, the second field ends with nnnn/nn, and the third field ends where the next first field begins.
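Here is a minimal sketch of that splitting idea, using only `java.util.regex`. It assumes the text extracted for the table body is one string in which every row starts with nnn/nn and the second field ends with nnnn/nn, as described above. The class name `TableRowSplitter`, the method `splitRows` and the sample text in `main` are made up for illustration, and the header row is simply skipped because it does not match the pattern.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TableRowSplitter {

    // Assumed row layout: "nnn/nn <text ending in nnnn/nn> <free text>".
    private static final Pattern ROW = Pattern.compile(
            "\\b(\\d{3}/\\d{2})\\b\\s+"              // field 1: nnn/nn
            + "(.*?\\b\\d{4}/\\d{2})\\b\\s+"         // field 2: shortest text ending in nnnn/nn
            + "(.*?)"                                // field 3: everything up to...
            + "(?=\\s*\\b\\d{3}/\\d{2}\\b|\\s*$)",   // ...the next field 1, or the end of the input
            Pattern.DOTALL);                         // let '.' cross line breaks inside a cell

    /** Returns one String[3] per table row found in the extracted text. */
    static List<String[]> splitRows(String text) {
        List<String[]> rows = new ArrayList<String[]>();
        Matcher m = ROW.matcher(text);
        while (m.find()) {
            rows.add(new String[] { clean(m.group(1)), clean(m.group(2)), clean(m.group(3)) });
        }
        return rows;
    }

    // Collapses line breaks and repeated spaces left over from the PDF layout.
    private static String clean(String s) {
        return s.replaceAll("\\s+", " ").trim();
    }

    public static void main(String[] args) {
        // Invented sample text, standing in for the output of the extraction step.
        String text = "001/14 note A ref 1234/14 first body\nsecond line\n"
                + "002/14 note B ref 5678/14 second body";
        for (String[] row : splitRows(text)) {
            System.out.println(Arrays.toString(row));
        }
    }
}
```

The word boundaries keep the nnn/nn pattern from matching inside an nnnn/nn reference. If the third column can itself contain something shaped like nnn/nn, this will split the row too early, and a coordinate-based approach is probably the safer route.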

This is a tricky problem. I have had to use coordinate-based approaches to deal with it before, but your PDF library may not support extracting the position of letters, only the actual text.
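To give an idea of what a coordinate-based approach can look like, here is a rough sketch that is independent of any PDF library. It assumes you can already obtain text chunks together with the x/y position where each one starts. In iText 5 I believe a `RenderListener` receives `TextRenderInfo` objects that carry baseline coordinates, but check that against your version. `Chunk`, `CoordinateTableBuilder`, `toRows`, the column left edges and the tolerance are all hypothetical names and values chosen for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class CoordinateTableBuilder {

    /** A piece of text and the position where it starts (hypothetical type). */
    static class Chunk {
        final String text;
        final float x;
        final float y;

        Chunk(String text, float x, float y) {
            this.text = text;
            this.x = x;
            this.y = y;
        }
    }

    /**
     * Clusters chunks into visual lines by y, assigns each chunk to a column using
     * the ascending left edges in columnStarts, and merges a line whose first cell
     * is empty into the previous row (assumed to be a wrapped cell).
     */
    static List<String[]> toRows(List<Chunk> chunks, float[] columnStarts, float yTolerance) {
        List<Chunk> sorted = new ArrayList<Chunk>(chunks);
        // PDF y coordinates grow upward, so the largest y is the top of the page.
        sorted.sort(Comparator.comparingDouble((Chunk c) -> c.y).reversed());

        List<List<Chunk>> lines = new ArrayList<List<Chunk>>();
        float lineY = 0f;
        for (Chunk c : sorted) {
            if (lines.isEmpty() || lineY - c.y > yTolerance) {
                lines.add(new ArrayList<Chunk>());   // start a new visual line
                lineY = c.y;
            }
            lines.get(lines.size() - 1).add(c);
        }

        List<String[]> rows = new ArrayList<String[]>();
        for (List<Chunk> line : lines) {
            line.sort(Comparator.comparingDouble((Chunk c) -> c.x));   // left to right
            String[] cells = new String[columnStarts.length];
            Arrays.fill(cells, "");
            for (Chunk c : line) {
                int col = columnOf(c.x, columnStarts);
                cells[col] = join(cells[col], c.text);
            }
            if (cells[0].isEmpty() && !rows.isEmpty()) {
                String[] previous = rows.get(rows.size() - 1);
                for (int i = 0; i < cells.length; i++) {
                    previous[i] = join(previous[i], cells[i]);
                }
            } else {
                rows.add(cells);
            }
        }
        return rows;
    }

    // Index of the last column whose left edge is at or before x.
    private static int columnOf(float x, float[] columnStarts) {
        int col = 0;
        for (int i = 0; i < columnStarts.length; i++) {
            if (x >= columnStarts[i]) {
                col = i;
            }
        }
        return col;
    }

    private static String join(String a, String b) {
        if (a.isEmpty()) return b;
        if (b.isEmpty()) return a;
        return a + " " + b;
    }
}
```

The column edges have to come from somewhere, for example from the x positions of the header cells. The merge rule only works if the first column is never empty in a real row, which seems to hold for the table in the question.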

