apache spark - pySpark DataFrame FloatType() with file coming in as unicode -


Hello, I have the following schema:

[StructField(record_id,StringType,true), StructField(offer_id,FloatType,true)] 

The file I am importing comes in as unicode. Even with `sc.textFile`'s use_unicode flag set to False, it still pulls the values in as strings and raises an error. My question is: before I load the data into the DataFrame, do I have to cleanse it (i.e. convert the unicode values to float) before declaring the field as FloatType?

What is an efficient way to do this that scales to 1000's of fields?

  1. It is not good practice to convert implicitly between unrelated data types, and (almost) no system can do it automagically. So yes, you have to tell the system to do it, and the system accepts while taking the risk of failure in the future (what happens if the string field suddenly contains "abc"?).
  2. You should use a map function as a translation layer between `sc.textFile` and the `createDataFrame` or `applySchema` step. The casting to the correct data types should happen there.
  3. If you have 1000s of fields, you may want to implement an infer-schema mechanism that takes a sample of the data, decides which schema to use, and then applies it to the whole dataset.
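The translation layer from point 2 can be sketched as a plain parsing function that is mapped over the raw lines before `createDataFrame`. The field names, the comma delimiter, and the `safe_cast` helper are illustrative assumptions, not from the question:

```python
# Sketch of the "translation layer" between sc.textFile and createDataFrame.
# safe_cast returns None (which becomes SQL NULL) instead of failing the job
# when a value such as "abc" shows up in a numeric column.

def safe_cast(value, caster):
    """Cast one unicode field, returning None on failure."""
    try:
        return caster(value)
    except (TypeError, ValueError):
        return None

def parse_line(line):
    """Turn one raw unicode CSV line into a (record_id, offer_id) tuple."""
    record_id, offer_id = line.strip().split(",")
    return (record_id, safe_cast(offer_id, float))

# With Spark, the parsed RDD then feeds createDataFrame with the schema:
#   rows = sc.textFile("offers.csv").map(parse_line)
#   df = sqlContext.createDataFrame(rows, schema)
```

Keeping the casting in one mapped function means the DataFrame only ever sees values that already match the declared types.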

(Assuming the Spark 1.3.1 release.)
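For point 3, the sample-based inference can be sketched in pure Python: a column is treated as float only if every sampled value parses as one, otherwise it stays a string. The type labels and the all-or-nothing rule are assumptions for illustration:

```python
# Illustrative schema inference from a small sample of parsed rows.
# A column becomes "float" only when all sampled values parse as floats.

def infer_column_types(sample_rows):
    def is_float(value):
        try:
            float(value)
            return True
        except (TypeError, ValueError):
            return False
    columns = zip(*sample_rows)  # transpose rows into columns
    return ["float" if all(is_float(v) for v in col) else "string"
            for col in columns]
```

The inferred labels would then map onto `pyspark.sql.types` (`FloatType` / `StringType`) when building the `StructType` that is applied to the full dataset.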
