R JSON UTF-8 parsing
I have an issue when trying to parse a JSON file containing Russian (Cyrillic) text in R. The file looks like this:
[{"text": "Валера!", "type": "status"}, {"text": "когда выйдет", "type": "status"}, {"text": "КАК ДЕЛА?!)", "type": "status"}]
and is saved in UTF-8 encoding. I tried the libraries rjson, RJSONIO and jsonlite to parse it, but it doesn't work:

    library(jsonlite)
    allFiles <- fromJSON(txt = "ru_json_example_short.txt")

gives me this error:
    Error in feed_push_parser(buf) :
      lexical error: invalid char in json text.
                                           [{"text": "Валера!", "
                         (right here) ------^
When I save the file in ANSI encoding, it parses fine, but then the Russian letters are turned into question marks, so the output is unusable. Does anyone know how to parse such a JSON file in R, please?

Edit: the above applies to a UTF-8 file saved in Windows Notepad. When I save it in PSPad and parse it, the result looks like this:
                                                                           text   type
    1                         <U+0412><U+0430><U+043B><U+0435><U+0440><U+0430>! status
    2 <U+043A><U+043E><U+0433><U+0434><U+0430> <U+0432><U+044B><U+0439><U+0434><U+0435><U+0442> status
    3                 <U+041A><U+0410><U+041A> <U+0414><U+0415><U+041B><U+0410>?!) status
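Try the following:

    library(jsonlite)
    # Read the file line by line, join the lines with commas and wrap them in [ ]
    dat <- fromJSON(sprintf("[%s]", paste(readLines("./ru_json_example_short.txt"), collapse = ",")))
    dat

    [[1]]
              text   type
    1      Валера! status
    2 когда выйдет status
    3  КАК ДЕЛА?!) status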
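Since the error only shows up with the file saved by Windows Notepad, one plausible cause (an assumption, not confirmed in the question) is that Notepad wrote a byte order mark (BOM) at the start of the UTF-8 file, which the JSON parser then rejects as an invalid character. A minimal sketch of reading the file as UTF-8 and dropping a leading BOM before parsing might look like this; `read_json_utf8` is a hypothetical helper name, not part of jsonlite:

```r
library(jsonlite)

# Hypothetical helper: parse a UTF-8 JSON file that may start with a BOM.
read_json_utf8 <- function(path) {
  # encoding = "UTF-8" marks the strings as UTF-8 instead of
  # treating them as being in the native (e.g. ANSI) encoding
  lines <- readLines(path, encoding = "UTF-8", warn = FALSE)
  txt <- paste(lines, collapse = "\n")
  # Remove a leading byte order mark (U+FEFF) if one is present
  txt <- sub("^\ufeff", "", txt)
  fromJSON(txt)
}

# Usage, assuming the file from the question is in the working directory:
# dat <- read_json_utf8("ru_json_example_short.txt")
```

Whether the Cyrillic text then prints as letters or as `<U+....>` escapes can still depend on the console locale, so on Windows it may be worth checking the values with `Encoding()` or writing them back to a UTF-8 file rather than relying on the printed data frame.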