apache spark - pySpark DataFrame FloatType() with file coming in as unicode -


Hello, I have the following schema:

[StructField(record_id,StringType,true), StructField(offer_id,FloatType,true)] 

The file I am importing comes in as unicode. Even with `sc.textFile`'s use_unicode flag set to False, it still pulls the values in as strings and raises an error. My question is: before I load the data into the DataFrame, do I have to cleanse it (i.e. convert the unicode values to float) before declaring the field as FloatType?

What is an efficient way to do this that scales to 1000's of fields?

  1. It is not good practice to convert implicitly between unrelated data types, and (almost) no system can do it automagically. So yes, you have to tell the system to do it, and the system accepts while taking the risk of failure in the future (what happens if the string field suddenly contains "abc"?).
  2. You should use a map function as a translation layer between `sc.textFile` and the `createDataFrame` or `applySchema` step. The casting to the correct data types should happen there.
  3. If you have 1000s of fields, you may want to implement an infer-schema mechanism that takes a sample of the data, decides which schema to use, and then applies it to the whole dataset.
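The translation layer from point 2 can be sketched as a plain parsing function that is mapped over the raw lines before `createDataFrame`. The field names, the comma delimiter, and the `safe_cast` helper are illustrative assumptions, not from the question:

```python
# Sketch of the "translation layer" between sc.textFile and createDataFrame.
# safe_cast returns None (which becomes SQL NULL) instead of failing the job
# when a value such as "abc" shows up in a numeric column.

def safe_cast(value, caster):
    """Cast one unicode field, returning None on failure."""
    try:
        return caster(value)
    except (TypeError, ValueError):
        return None

def parse_line(line):
    """Turn one raw unicode CSV line into a (record_id, offer_id) tuple."""
    record_id, offer_id = line.strip().split(",")
    return (record_id, safe_cast(offer_id, float))

# With Spark, the parsed RDD then feeds createDataFrame with the schema:
#   rows = sc.textFile("offers.csv").map(parse_line)
#   df = sqlContext.createDataFrame(rows, schema)
```

Keeping the casting in one mapped function means the DataFrame only ever sees values that already match the declared types.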

(Assuming the Spark 1.3.1 release.)
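For point 3, the sample-based inference can be sketched in pure Python: a column is treated as float only if every sampled value parses as one, otherwise it stays a string. The type labels and the all-or-nothing rule are assumptions for illustration:

```python
# Illustrative schema inference from a small sample of parsed rows.
# A column becomes "float" only when all sampled values parse as floats.

def infer_column_types(sample_rows):
    def is_float(value):
        try:
            float(value)
            return True
        except (TypeError, ValueError):
            return False
    columns = zip(*sample_rows)  # transpose rows into columns
    return ["float" if all(is_float(v) for v in col) else "string"
            for col in columns]
```

The inferred labels would then map onto `pyspark.sql.types` (`FloatType` / `StringType`) when building the `StructType` that is applied to the full dataset.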
