hadoop - Pig JSON MultiStorage?


Using Pig (0.14), I'm interested in the following use case: I wish to process raw JSON into multiple output directories based upon a key, and store the result (aggregated data) as JSON. The JSON has an evolving (dynamic) schema, which is read in with elephant-bird and (so far) has not caused any problems.

I can either store the output in the correct directories (using MultiStorage) or as JSON (using JsonStorage), but not both. As far as I can tell, there is no publicly available UDF for this purpose.
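To make the two alternatives concrete, here is a minimal sketch of what each store looks like on its own. The input and output paths, the jar names, the `key` field, and the count aggregation are all assumptions for illustration; only the loader and storage class names come from the libraries mentioned above.

```pig
-- Jar names and locations are assumptions; adjust to your installation.
REGISTER piggybank.jar;
REGISTER elephant-bird-pig.jar;

-- Load schemaless JSON as a map (hypothetical input path).
raw = LOAD 'input/events'
      USING com.twitter.elephantbird.pig.load.JsonLoader('-nestedLoad')
      AS (json:map[]);

-- Hypothetical aggregation: count records per value of a 'key' field.
keyed   = FOREACH raw GENERATE (chararray) json#'key' AS key;
grouped = GROUP keyed BY key;
agg     = FOREACH grouped GENERATE group AS key, COUNT(keyed) AS n;

-- Option 1: one directory per key (field index 0), but delimited text, not JSON.
STORE agg INTO 'out/by_key'
      USING org.apache.pig.piggybank.storage.MultiStorage('out/by_key', '0');

-- Option 2: JSON output, but everything lands in a single directory.
STORE agg INTO 'out/json' USING JsonStorage();
```

Neither store gives per-key directories containing JSON, which is the gap being asked about.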

Have I missed something, or is this a case of writing my own UDF to perform this? It seems like a simple use case, and I would have thought it would have been supported.

For anyone looking for an answer to this: a UDF is required.

It is possible (and relatively straightforward) to combine the JsonStorage and MultiStorage UDFs (MultiStorage ships in Piggybank) to create a pseudo "JsonMultiStorage" class.

