Spark: how to use global config variables in executors


I have a global config object in my Spark app.

    object Config {
      var lambda = 0.01
    }

I set the value of lambda according to the user's input.

    object MyApp {
      def main(args: Array[String]): Unit = {
        Config.lambda = args(0).toDouble
        ...
        rdd.map(_ * Config.lambda)
      }
    }

I found that the modification does not take effect in the executors: there the value of lambda is always 0.01. I guess a modification made in the driver's JVM does not affect the executors' JVMs.

Is there another solution?

I found a similar question on Stack Overflow:

How to set and get static variables from Spark?

In his answer, @DanielL. gives three solutions:

  1. Put the value inside a closure that is serialized to the executors to perform a task.

But I wonder how to write such a closure and how it gets serialized to the executors. Could anyone give me a code example?

  2. If the values are fixed, or the configuration is available on the executor nodes (lives inside the jar, etc.), you can have a lazy val, guaranteeing initialization only once.

What if I declare lambda as a lazy val? Will a modification in the driver take effect in the executors? Could you give me a code example?

  3. Create a broadcast variable with the data. I know this way, but it needs a local Broadcast[] variable that wraps the Config object, right? For example:

    val config = sc.broadcast(Config)

and then use config.value.lambda in the executors, right?

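  1. Put the value inside a closure

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.rdd.RDD

    object Config { var lambda = 0.01 }

    object SOTest {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("StaticVar"))
        val r = sc.parallelize(1 to 10, 3)
        Config.lambda = 0.02
        mul(r).collect.foreach(println)
        sc.stop()
      }

      def mul(rdd: RDD[Int]) = {
        // Copy the value into a local val on the driver; the closure
        // captures this copy and ships it to the executors.
        val l = Config.lambda
        rdd.map(_ * l)
      }
    }

  2. lazy val for once-only initialisation

    import org.apache.spark.{SparkConf, SparkContext}

    object SOTest {
      def main(args: Array[String]): Unit = {
        // Local lazy val: evaluated at most once, on first use.
        lazy val lambda = args(0).toDouble
        val sc = new SparkContext(new SparkConf().setAppName("StaticVar"))
        val r = sc.parallelize(1 to 10, 3)
        r.map(_ * lambda).collect.foreach(println)
        sc.stop()
      }
    }

  3. Create a broadcast variable with the data

    import org.apache.spark.{SparkConf, SparkContext}

    object Config { var lambda = 0.01 }

    object SOTest {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("StaticVar"))
        val r = sc.parallelize(1 to 10, 3)

        Config.lambda = 0.04
        // Broadcast the Double itself, not the Config object.
        val bc = sc.broadcast(Config.lambda)
        r.map(_ * bc.value).collect.foreach(println)

        sc.stop()
      }
    }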
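The first approach works because the closure carries its own copy of the local value, whereas a reference to Config.lambda is looked up again in whichever JVM runs the task. The sketch below imitates that without a cluster. It assumes Scala 2.12 or later (where function literals are serializable) and round-trips two closures through plain Java serialization, then resets Config.lambda to stand in for an executor JVM that has only ever seen the default. The names ClosureDemo, roundTrip, scaleWithLocalCopy and scaleWithGlobal are made up for this illustration and are not part of Spark.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}

object Config { var lambda = 0.01 }

object ClosureDemo {
  // Serialize and deserialize a value in memory, roughly what happens to a
  // task closure on its way from the driver to an executor.
  def roundTrip[A <: AnyRef](a: A): A = {
    val buf = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(buf)
    out.writeObject(a)
    out.close()
    val in = new ObjectInputStream(new ByteArrayInputStream(buf.toByteArray))
    in.readObject().asInstanceOf[A]
  }

  // Captures a local copy: the Double travels inside the closure.
  def scaleWithLocalCopy(): Int => Double = {
    val l = Config.lambda
    x => x * l
  }

  // Captures nothing: Config.lambda is read when the function is called.
  def scaleWithGlobal(): Int => Double =
    x => x * Config.lambda

  def main(args: Array[String]): Unit = {
    Config.lambda = 2.0                       // "driver" sets the value
    val shippedCopy   = roundTrip(scaleWithLocalCopy())
    val shippedGlobal = roundTrip(scaleWithGlobal())

    Config.lambda = 0.01                      // pretend we are now on an executor

    println(shippedCopy(10))                  // still uses 2.0
    println(shippedGlobal(10))                // uses the "executor" default
  }
}
```

Only the first closure keeps the value that was set on the driver, which mirrors the behaviour reported in the question.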

Note: you shouldn't pass the Config object to sc.broadcast() directly. Spark serialises the value before transferring it to the executors, but Config is not serialisable. Another thing to mention: a broadcast variable does not really fit this situation, because you are only sharing a single value.
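If there are several settings rather than one number, one possible way around the serialisation problem is to copy them into an immutable case class and broadcast that instead of the singleton. The following is a minimal sketch, not the original answer's code: Settings, BroadcastSettingsDemo and scaled are hypothetical names, and setMaster("local[2]") is only there so the example can run on a single machine.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical settings holder. Case classes are Serializable by default,
// unlike the plain `object Config` from the question.
final case class Settings(lambda: Double)

object BroadcastSettingsDemo {
  // Multiplies 1..10 by the broadcast lambda and returns the results.
  def scaled(sc: SparkContext, settings: Settings): Array[Double] = {
    val bc = sc.broadcast(settings)           // shipped once per executor
    sc.parallelize(1 to 10, 3)
      .map(_ * bc.value.lambda)               // read on the executor side
      .collect()
  }

  def main(args: Array[String]): Unit = {
    // Fall back to 0.04 when no argument is given (assumption for the demo).
    val lambda = if (args.nonEmpty) args(0).toDouble else 0.04
    val conf = new SparkConf().setAppName("StaticVar").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try scaled(sc, Settings(lambda)).foreach(println)
    finally sc.stop()
  }
}
```

When submitting to a real cluster you would normally drop the setMaster call and let spark-submit supply the master.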

