Reading PDF document with iTextSharp creates string with repeating first page


I use iTextSharp to read in PDF files and parse them using the string I receive. I have encountered strange behavior with some PDF files. When getting the string of, for example, a 4-page PDF, the string is filled with the pages in the following order:

1 2 1 3 1 4

My code for reading the files is as follows:

    using (PdfReader reader = new PdfReader(fileStream))
    {
        StringBuilder sb = new StringBuilder();

        ITextExtractionStrategy strategy = new SimpleTextExtractionStrategy();
        for (int page = 0; page < reader.NumberOfPages; page++)
        {
            string text = PdfTextExtractor.GetTextFromPage(reader, page + 1, strategy);
            if (!string.IsNullOrWhiteSpace(text))
                sb.Append(Encoding.UTF8.GetString(Encoding.Convert(Encoding.Default, Encoding.UTF8, Encoding.Default.GetBytes(text))));
        }

        Debug.WriteLine(sb.ToString());
    }

Here is a link to a file with which the behaviour occurs:

https://onedrive.live.com/redir?resid=d9feff3bf45e05fd!1536&authkey=!aflrlskavlg89yy&ithint=file%2cpdf

I hope you guys can help me out!

Thanks to Chris Haas I found out what was going wrong. The samples I found online on how to use iTextSharp are either incorrect or incorrect for my implementation.

The SimpleTextExtractionStrategy needs to be instantiated for every page you try to read. Not doing this causes the text of previously read pages to be repeated in the resulting string.
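To illustrate why a shared strategy object causes this, here is a minimal sketch that does not use iTextSharp at all. `AccumulatingStrategy`, `StrategyReuseDemo` and its helper `GetTextFromPage` are hypothetical stand-ins written for this example, on the assumption that the real strategy keeps the text it has collected for as long as the object lives. The duplication pattern of this toy model is not meant to reproduce the exact page order reported in the question, only the leak of earlier pages into later results.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical stand-in for an extraction strategy that keeps appending
// to an internal buffer for as long as the object lives.
public sealed class AccumulatingStrategy
{
    private readonly StringBuilder _buffer = new StringBuilder();

    // Called once per chunk of text found on a page.
    public void Render(string chunk) => _buffer.Append(chunk);

    // Returns everything rendered so far, not just the latest page.
    public string GetResultantText() => _buffer.ToString();
}

public static class StrategyReuseDemo
{
    // Hypothetical helper: "extracts" a page by rendering it into the strategy.
    private static string GetTextFromPage(string pageContent, AccumulatingStrategy strategy)
    {
        strategy.Render(pageContent);
        return strategy.GetResultantText();
    }

    public static string ExtractWithSharedStrategy(IReadOnlyList<string> pages)
    {
        var sb = new StringBuilder();
        var strategy = new AccumulatingStrategy(); // created once: state leaks across pages
        foreach (string page in pages)
            sb.Append(GetTextFromPage(page, strategy));
        return sb.ToString();
    }

    public static string ExtractWithFreshStrategy(IReadOnlyList<string> pages)
    {
        var sb = new StringBuilder();
        foreach (string page in pages)
            sb.Append(GetTextFromPage(page, new AccumulatingStrategy())); // new state per page
        return sb.ToString();
    }

    public static void Main()
    {
        var pages = new[] { "1", "2", "3", "4" };
        Console.WriteLine(ExtractWithSharedStrategy(pages)); // prints 1121231234
        Console.WriteLine(ExtractWithFreshStrategy(pages));  // prints 1234
    }
}
```

With the shared instance, every call returns the earlier pages again, so the combined string contains them several times. Creating the strategy inside the loop gives each page its own empty buffer.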

Also, the line where the StringBuilder is appended to can be changed from:

    sb.Append(Encoding.UTF8.GetString(Encoding.Convert(Encoding.Default, Encoding.UTF8, Encoding.Default.GetBytes(text))));

to

    sb.Append(text);
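As a rough check of why the conversion can be dropped, the sketch below runs the same round trip with the source encoding passed in as a parameter. `EncodingRoundTripDemo` and `RoundTrip` are names made up for this example, and `Encoding.ASCII` is only an assumption standing in for a narrow `Encoding.Default`, which differs per machine and runtime. The .NET string is already Unicode, so the round trip either returns the same text or loses characters the intermediate encoding cannot represent.

```csharp
using System;
using System.Text;

public static class EncodingRoundTripDemo
{
    // Same shape as the conversion in the question, with the source
    // encoding made a parameter so the effect is reproducible anywhere.
    public static string RoundTrip(string text, Encoding legacy)
    {
        byte[] legacyBytes = legacy.GetBytes(text); // may replace unmappable chars with '?'
        byte[] utf8Bytes = Encoding.Convert(legacy, Encoding.UTF8, legacyBytes);
        return Encoding.UTF8.GetString(utf8Bytes);
    }

    public static void Main()
    {
        // Plain ASCII text survives unchanged, so the conversion does nothing useful.
        Console.WriteLine(RoundTrip("cafe", Encoding.ASCII) == "cafe");      // True
        // A character outside the intermediate encoding is replaced by '?'.
        Console.WriteLine(RoundTrip("caf\u00e9", Encoding.ASCII) == "caf?"); // True
    }
}
```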

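Thus, the following code gives the correct result:

    // Needs: using System.Diagnostics; using System.Text;
    //        using iTextSharp.text.pdf; using iTextSharp.text.pdf.parser;
    using (PdfReader reader = new PdfReader(fileStream))
    {
        StringBuilder sb = new StringBuilder();

        // iTextSharp page numbers are 1-based, hence page + 1.
        for (int page = 0; page < reader.NumberOfPages; page++)
        {
            // A new strategy per page, so no text carries over from earlier pages.
            string text = PdfTextExtractor.GetTextFromPage(reader, page + 1, new SimpleTextExtractionStrategy());
            if (!string.IsNullOrWhiteSpace(text))
                sb.Append(text);
        }

        Debug.WriteLine(sb.ToString());
    }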
