hadoop - Pig JSON MultiStorage?

Using Pig (0.14), I'm interested in the following use case: I wish to process raw JSON into multiple output directories based upon a key, and store the result (aggregated data) as JSON. The JSON has an evolving (dynamic) schema and is read in with Elephant Bird, which (so far) has not caused problems.

I can either store the output in the correct directories (using MultiStorage) or as JSON (using JsonStorage), but not both. As far as I can tell, there is no publicly available UDF for this purpose.

Have I missed something, or is this a case of writing my own UDF to perform this? It seems a simple use case, and I would have thought it would be supported.

For anyone looking for an answer to this: a UDF is required.

It is possible (and relatively straightforward) to combine the code of the built-in JsonStorage and the Piggybank MultiStorage UDFs to create a pseudo "JsonMultiStorage" class.
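A real UDF would be a Pig `StoreFunc` and needs the Pig and Hadoop jars, so the following is only a Pig-free sketch of the combined idea, assuming flat records supplied as field-name/value maps. Each record is encoded as one JSON object per line (the JsonStorage half) and appended to a file in a sub-directory named after the value of one key field (the MultiStorage half). The class name `JsonMultiStorageSketch`, the `<key>-0000` file name and the minimal JSON encoder are all hypothetical; nested bags, tuples and maps, which a real port of JsonStorage would have to handle, are not covered.

```java
import java.io.IOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical, Pig-free sketch of a "JsonMultiStorage": each record is written as one
 * JSON line (JsonStorage half) under a directory named after a key field (MultiStorage half).
 */
public class JsonMultiStorageSketch implements AutoCloseable {
    private final Path root;
    private final String keyField;
    // One open writer per distinct key value seen so far.
    private final Map<String, Writer> writers = new HashMap<>();

    public JsonMultiStorageSketch(Path root, String keyField) {
        this.root = root;
        this.keyField = keyField;
    }

    /** Appends one flat record as a JSON line under root/<keyValue>/<keyValue>-0000. */
    public void putNext(Map<String, Object> record) throws IOException {
        // A null key becomes directory "null"; real code should also sanitize values like "../x".
        String key = String.valueOf(record.get(keyField));
        Writer w = writers.get(key);
        if (w == null) {
            Path dir = root.resolve(key);
            Files.createDirectories(dir);
            // Assumed file name, loosely modelled on MultiStorage's "<key>-<part>" naming.
            w = Files.newBufferedWriter(dir.resolve(key + "-0000"), StandardCharsets.UTF_8);
            writers.put(key, w);
        }
        w.write(toJson(record));
        w.write('\n');
    }

    /** Minimal JSON encoder for flat records: strings, numbers, booleans and null only. */
    static String toJson(Map<String, Object> record) {
        StringBuilder sb = new StringBuilder("{");
        boolean first = true;
        for (Map.Entry<String, Object> e : record.entrySet()) {
            if (!first) sb.append(',');
            first = false;
            sb.append(quote(e.getKey())).append(':');
            Object v = e.getValue();
            if (v == null || v instanceof Number || v instanceof Boolean) {
                sb.append(v); // appends "null" for null; NaN/Infinity would not be valid JSON
            } else {
                sb.append(quote(v.toString())); // everything else is emitted as a string
            }
        }
        return sb.append('}').toString();
    }

    /** Quotes a string, escaping quotes, backslashes and control characters. */
    static String quote(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"': sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                default:
                    if (c < 0x20) sb.append(String.format("\\u%04x", (int) c));
                    else sb.append(c);
            }
        }
        return sb.append('"').toString();
    }

    /** Builds an insertion-ordered record from alternating name/value arguments. */
    static Map<String, Object> row(Object... kv) {
        Map<String, Object> m = new LinkedHashMap<>();
        for (int i = 0; i < kv.length; i += 2) m.put((String) kv[i], kv[i + 1]);
        return m;
    }

    @Override
    public void close() throws IOException {
        for (Writer w : writers.values()) w.close();
    }

    public static void main(String[] args) throws IOException {
        Path out = Files.createTempDirectory("json-multistorage");
        try (JsonMultiStorageSketch store = new JsonMultiStorageSketch(out, "country")) {
            store.putNext(row("country", "uk", "visits", 3));
            store.putNext(row("country", "us", "visits", 5));
            store.putNext(row("country", "uk", "visits", 7));
        }
        for (String key : new String[] {"uk", "us"}) {
            Path file = out.resolve(key).resolve(key + "-0000");
            System.out.println(file + " -> " + Files.readAllLines(file, StandardCharsets.UTF_8));
        }
    }
}
```

The per-key writer map mirrors, in spirit, how MultiStorage fans records out to one output per key value. In an actual UDF this logic would sit inside the record writer used by the store function, and the JSON encoding would follow the Pig schema (as JsonStorage does) rather than a plain map.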

Braziel