ruby - How to convert PDF to Excel or CSV in Rails 4
I have searched a lot and have no choice but to ask here. Do you guys know of an online converter that has an API, or a gem (or gems), that can convert a PDF to an Excel or CSV file?
I am not sure if this is the best place to ask, either.
My application is in Rails 4.2. The PDF file contains a header and a big table with 10 columns.
More info: the user uploads the PDF via a form, then I need to grab the PDF, parse it to CSV, and read the content. I tried to read the content with the pdf-reader gem, but the result wasn't promising.
I have used freepdfconvert.com/pdf-excel, but unfortunately they don't supply an API. (I have contacted them.)
sample pdf
This piece of code, which converts a PDF to text, is handy. Gem: pdf-reader
require 'pdf/reader'

# Print the extracted text of every page in the uploaded PDF.
def self.parse
  reader = PDF::Reader.new("pdf_uploaded_by_user.pdf")
  reader.pages.each do |page|
    puts page.text
  end
end
Now, if you check the attached sample PDF, you will see that some fields might be empty. That means I can't split each line of text on spaces and put it in an array, because I won't be able to map the array to the correct fields.
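To illustrate the problem, here is a minimal sketch using made-up rows. The column names (`name`, `city`, `amount`), the values, and the helper `naive_map` are all hypothetical and are not taken from the sample PDF.

```ruby
# Hypothetical column names for a three-column table.
HEADERS = %w[name city amount]

# A row where every column has a value.
full_row  = "Alice   London   42"
# Same layout, but the middle column (city) is empty in the PDF.
empty_row = "Bob              17"

# Naive approach: split on whitespace and pair values with headers in order.
def naive_map(line)
  HEADERS.zip(line.split).to_h
end

p naive_map(full_row)   # every value lands under the right header
p naive_map(empty_row)  # "17" is assigned to city and amount is nil
```

Because `split` collapses the run of spaces, the empty column disappears and every later value shifts one field to the left.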
Thank you.
OK, after lots of research I couldn't find an API or proper software that does it. Here is how I did it.
I first extract the table out of the PDF with this table API: PDFTables. It is cheap.
Then I convert the HTML table to CSV.
(This is not ideal, but it works.)
Here is the code:
require 'httmultiparty'
require 'nokogiri'
require 'csv'

class PageTextReceiver
  include HTTMultiParty
  base_uri 'http://localhost:3000'

  # Upload the PDF to the PDFTables API and save the HTML response to disk.
  def run
    response = PageTextReceiver.post('https://pdftables.com/api?key=myapikey', :query => { f: File.new("/path/to/pdf/uploaded_pdf.pdf", "r") })
    File.open('/path/to/save/as/html/response.html', 'w') do |f|
      f.puts response.body
    end
  end

  # Read the saved HTML and write each table row as one quoted CSV row.
  def convert
    f = File.open("/path/to/saved/html/response.html")
    doc = Nokogiri::HTML(f)
    f.close
    csv = CSV.open("path/to/csv/t.csv", 'w', col_sep: ",", quote_char: "'", force_quotes: true)
    doc.xpath('//table/tr').each do |row|
      tarray = []
      row.xpath('td').each do |cell|
        tarray << cell.text
      end
      csv << tarray
    end
    csv.close
  end
end
Now run this:
#> page = PageTextReceiver.new
#> page.run
#> page.convert
It is not refactored; it is a proof of concept. You need to consider performance.
I might use Sidekiq to run it in the background and move the result to the main thread.
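The idea of running the conversion in the background and handing the result back can be sketched with a plain Ruby thread. This is only a stand-in for a Sidekiq job, not Sidekiq itself, and `convert_pdf` is a hypothetical placeholder for the real API call and parsing.

```ruby
# Hypothetical stand-in for the slow PDF -> CSV conversion.
def convert_pdf(path)
  sleep 0.1 # simulate the API call and the HTML parsing
  "#{path}.csv"
end

# Start the work in a background thread so the caller is not blocked.
worker = Thread.new { convert_pdf("uploaded_pdf.pdf") }

puts "main thread keeps working..."

# Thread#value waits for the thread to finish and returns the block's result.
csv_path = worker.value
puts csv_path
```

In a real Rails app a job queue would typically store the result somewhere (for example in the database) rather than return it to the waiting request, so treat this as an illustration of the flow only.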