google cloud dataflow - Partition data coming from CSV so I can process larger batches rather than individual lines


I'm getting started with Google Cloud Dataflow and have written a simple flow that reads a CSV file from Cloud Storage. One of the steps involves calling a web service to enrich the results. The web service in question performs much better when sent several hundred requests in bulk.
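For reference, here is roughly what the flow looks like today, with one web-service call per element (a sketch against the Dataflow Java SDK 1.x; EnrichFn, ToTableRowFn, the GCS path, table name, and schema are placeholders, not my real code):

import com.google.api.services.bigquery.model.TableFieldSchema;
import com.google.api.services.bigquery.model.TableSchema;
import com.google.cloud.dataflow.sdk.Pipeline;
import com.google.cloud.dataflow.sdk.io.BigQueryIO;
import com.google.cloud.dataflow.sdk.io.TextIO;
import com.google.cloud.dataflow.sdk.options.PipelineOptionsFactory;
import com.google.cloud.dataflow.sdk.transforms.ParDo;
import java.util.Arrays;

Pipeline p = Pipeline.create(PipelineOptionsFactory.create());

// Placeholder schema for the BigQuery sink.
TableSchema schema = new TableSchema().setFields(Arrays.asList(
    new TableFieldSchema().setName("line").setType("STRING")));

p.apply(TextIO.Read.from("gs://my-bucket/input.csv"))
 .apply(ParDo.of(new EnrichFn()))      // currently calls the web service once per line
 .apply(ParDo.of(new ToTableRowFn()))  // String -> TableRow for the BigQuery sink
 .apply(BigQueryIO.Write.to("my-project:my_dataset.my_table").withSchema(schema));

p.run();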

In looking at the API, I don't see a great way to aggregate 100 elements of a PCollection into a single ParDo execution. The results would then need to be split back out to handle the last step of the flow, which is writing to a BigQuery table.

I'm not sure if I need to use windowing to get what I want. Most of the windowing examples I see seem geared more towards counting over a given time period.

You can buffer elements in a local member variable of your DoFn, call the web service whenever the buffer is large enough, and flush any remainder in finishBundle. For example:

import com.google.cloud.dataflow.sdk.transforms.DoFn;
import java.util.ArrayList;
import java.util.List;

class CallServiceFn extends DoFn<String, String> {
  private static final int MAX_CALL_SIZE = 100; // tune to what the service handles well
  private final List<String> elements = new ArrayList<>();

  @Override
  public void processElement(ProcessContext c) {
    elements.add(c.element());
    if (elements.size() >= MAX_CALL_SIZE) {
      // callServiceWithData is your bulk web-service call (not shown).
      for (String result : callServiceWithData(elements)) {
        c.output(result);
      }
      elements.clear();
    }
  }

  @Override
  public void finishBundle(Context c) {
    // Flush whatever is still buffered when the bundle ends.
    if (!elements.isEmpty()) {
      for (String result : callServiceWithData(elements)) {
        c.output(result);
      }
      elements.clear();
    }
  }
}
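Wiring this into the pipeline then just means swapping the buffered DoFn in for the per-element call (a sketch reusing the placeholder names from the question):

p.apply(TextIO.Read.from("gs://my-bucket/input.csv"))
 .apply(ParDo.of(new CallServiceFn()))  // batched web-service calls
 .apply(ParDo.of(new ToTableRowFn()))
 .apply(BigQueryIO.Write.to("my-project:my_dataset.my_table").withSchema(schema));

Note that the runner decides where bundles begin and end, so finishBundle may see anywhere from zero to MAX_CALL_SIZE - 1 buffered elements; clearing the buffer after each flush keeps a reused DoFn instance from re-sending data.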
