ruby - How to convert PDF to Excel or CSV in Rails 4


I have searched a lot and have no choice but to ask here. Does anyone know of an online converter that has an API, or a gem, that can convert a PDF to an Excel or CSV file?

I am not sure whether this is the best place to ask, either.

My application is in Rails 4.2. The PDF file contains a header and a big table with 10 columns.

More info: the user uploads a PDF via a form, and I need to grab the PDF, parse it to CSV, and read the content. I tried reading the content with the pdf-reader gem, but the result wasn't promising.

I have used freepdfconvert.com/pdf-excel; unfortunately, they don't supply an API. (I have contacted them.)

[Image: sample PDF]

This piece of code, which converts a PDF to text, is handy. Gem: pdf-reader

    require 'pdf-reader'

    def self.parse
      reader = PDF::Reader.new("pdf_uploaded_by_user.pdf")
      # Print the extracted text of every page
      reader.pages.each do |page|
        puts page.text
      end
    end

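Now, if you check the attached sample PDF, you will see that some fields might be empty. That means I can't simply split each line of text on spaces and put the pieces in an array, because I won't be able to map the array to the correct fields.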
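To make the problem concrete, here is a minimal sketch. The sample rows, the 10-character column width, and the helper names (`layout`, `split_on_spaces`, `slice_by_columns`) are all made up for illustration and are not taken from the real PDF. It shows how splitting on whitespace loses an empty field, and how cutting each line at assumed fixed column offsets would keep it. That alternative only works if the extracted text really keeps its columns aligned with spaces, which is not guaranteed for every PDF.

```ruby
# Hypothetical column layout; a real PDF would need its own measured offsets.
COLUMN_WIDTH = 10
COLUMN_STARTS = [0, 10, 20].freeze

# Build a space-aligned line, standing in for one line of extracted page text.
def layout(fields)
  fields.map { |field| field.ljust(COLUMN_WIDTH) }.join.rstrip
end

# Naive approach: split on runs of whitespace. Empty fields vanish.
def split_on_spaces(line)
  line.split
end

# Alternative: cut the line at the assumed column start offsets.
def slice_by_columns(line, starts = COLUMN_STARTS)
  starts.each_with_index.map do |start, index|
    stop = starts[index + 1] || line.length
    # A slice that starts past the end of the line is nil, so fall back to "".
    (line[start...stop] || "").strip
  end
end

full_row  = layout(["Alice", "Paris", "100"])
empty_mid = layout(["Bob", "", "250"])

p split_on_spaces(full_row)    # three fields
p split_on_spaces(empty_mid)   # only two fields: the empty one is lost
p slice_by_columns(empty_mid)  # three fields, the middle one empty
```

With the whitespace split, the second row comes back with two elements, so there is no way to tell which column the missing value belonged to.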

Thank you.

OK, after lots of research I couldn't find an API or proper software that does it. Here is how I did it.

I first extract the table out of the PDF with the PDFTables API (pdftables.com). It is cheap.

Then I convert the HTML table to CSV.

(This is not ideal, but it works.)

Here is the code:

    require 'httmultiparty'
    require 'nokogiri'
    require 'csv'

    class PageTextReceiver
      include HTTMultiParty
      base_uri 'http://localhost:3000'

      # Upload the PDF to PDFTables and save the HTML response to disk.
      def run
        response = PageTextReceiver.post('https://pdftables.com/api?key=myapikey', :query => { f: File.new("/path/to/pdf/uploaded_pdf.pdf", "r") })

        File.open('/path/to/save/as/html/response.html', 'w') do |f|
          f.puts response
        end
      end

      # Read the saved HTML (the file written by run) and write each table row as one CSV row.
      def convert
        f = File.open("/path/to/saved/html/response.html")
        doc = Nokogiri::HTML(f)
        f.close
        # Keyword options work on both older and newer Ruby versions.
        csv = CSV.open("path/to/csv/t.csv", 'w', col_sep: ",", quote_char: "'", force_quotes: true)
        doc.xpath('//table/tr').each do |row|
          tarray = []
          row.xpath('td').each do |cell|
            tarray << cell.text
          end
          csv << tarray
        end
        csv.close
      end
    end

Now run this:

    #> page = PageTextReceiver.new
    #> page.run
    #> page.convert

It is not refactored; it is a proof of concept. You need to consider performance.

I might use Sidekiq to run it in the background and move the result to the main thread.
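As a small illustration of the CSV options passed in `convert`, the sketch below writes a few rows in memory with `CSV.generate` instead of to a file. The sample rows and the `rows_to_csv` helper name are hypothetical. With `force_quotes: true`, an empty cell is written as an empty quoted field rather than being dropped, so the columns stay aligned. Because the quote character is a single quote, an apostrophe inside a cell gets doubled.

```ruby
require 'csv'

# Same options as in convert, passed as keyword arguments.
CSV_OPTIONS = { col_sep: ",", quote_char: "'", force_quotes: true }.freeze

# Hypothetical helper: turn an array of rows (arrays of strings) into CSV text.
def rows_to_csv(rows)
  CSV.generate(**CSV_OPTIONS) do |csv|
    rows.each { |row| csv << row }
  end
end

# Made-up rows standing in for the text of each <td>; the second has an empty cell.
rows = [
  ["Alice", "Paris", "100"],
  ["Bob", "", "250"]
]

text = rows_to_csv(rows)
puts text

# Reading it back with the same quote character recovers the original cells.
p CSV.parse(text, col_sep: ",", quote_char: "'")
```

Anything that reads the resulting file has to be told about the single-quote character as well, since most CSV readers assume double quotes by default.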

