Spark: use the global config variables in executors
I have a global config object in my Spark app:

    object Config { var lambda = 0.01 }

and I set the value of lambda according to the user's input:

    object MyApp {
      def main(args: Array[String]): Unit = {
        Config.lambda = args(0).toDouble
        ...
        rdd.map(_ * Config.lambda)
      }
    }

I found that the modification does not take effect in the executors: there, the value of lambda is always 0.01. I guess a modification made in the driver's JVM does not affect the executors' JVMs.

Is there another solution?
I found a similar question on Stack Overflow:
How to set and get static variables from Spark?
In @DanielL.'s answer, he gives three solutions:
1. Put the value inside a closure to be serialized to the executors to perform a task.
   But I wonder how to write the closure and how it gets serialized to the executors. Could anyone give me a code example?
2. If your values are fixed or the configuration is available on the executor nodes (lives inside the jar, etc.), then you can have a lazy val, guaranteeing initialization only once.
   What if I declare lambda as a lazy val? Would a modification in the driver take effect in the executors? Could you give me a code example?
3. Create a broadcast variable with the data. I know this way, but it also needs a local Broadcast[] variable that wraps the Config object, right? For example:

       val config = sc.broadcast(Config)

   and then use config.value.lambda in the executors, right?
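On the lazy val question in point 2, here is a minimal plain-Scala sketch (no Spark needed) of what "initialization only once" means. A lazy val inside an object is evaluated on first access, separately in each JVM, so a value assigned on the driver would not travel to the executors; the approach only helps when each executor can work out the value by itself. The names ExecutorConfig, initCount and the property "app.lambda" are made up for illustration, and how that property reaches the executor JVMs (for example via spark.executor.extraJavaOptions) is left as an assumption.

```scala
object ExecutorConfig {
  // Counts how often the initializer body runs (demonstration only).
  var initCount = 0

  // Evaluated on first access in this JVM, then cached.
  // "app.lambda" is a hypothetical property name; 0.01 is the default from the question.
  lazy val lambda: Double = {
    initCount += 1
    sys.props.get("app.lambda").map(_.toDouble).getOrElse(0.01)
  }
}

object LazyValDemo {
  def main(args: Array[String]): Unit = {
    // Three accesses to the lazy val, but the initializer runs only once.
    val results = (1 to 3).map(_ * ExecutorConfig.lambda)
    println(results)
    println(s"initializer ran ${ExecutorConfig.initCount} time(s)")
  }
}
```

Inside a Spark job, a closure such as rdd.map(_ * ExecutorConfig.lambda) would trigger this initialization on each executor the first time a task touches it.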
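- Put the value inside a closure

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.rdd.RDD

    object Config { var lambda = 0.01 }

    object SOTest {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("StaticVar"))
        val r = sc.parallelize(1 to 10, 3)
        Config.lambda = 0.02
        mul(r).collect.foreach(println)
        sc.stop()
      }

      def mul(rdd: RDD[Int]) = {
        // Copy the value into a local val so the closure captures the value itself,
        // not a reference to the Config singleton.
        val l = Config.lambda
        rdd.map(_ * l)
      }
    }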
- lazy val for once-only initialisation

    import org.apache.spark.{SparkConf, SparkContext}

    object SOTest {
      def main(args: Array[String]): Unit = {
        // Evaluated at most once, on first use.
        lazy val lambda = args(0).toDouble
        val sc = new SparkContext(new SparkConf().setAppName("StaticVar"))
        val r = sc.parallelize(1 to 10, 3)
        r.map(_ * lambda).collect.foreach(println)
        sc.stop()
      }
    }
- Create a broadcast variable with the data

    import org.apache.spark.{SparkConf, SparkContext}

    object Config { var lambda = 0.01 }

    object SOTest {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("StaticVar"))
        val r = sc.parallelize(1 to 10, 3)
        Config.lambda = 0.04
        // Broadcast the Double itself, not the Config object.
        val bc = sc.broadcast(Config.lambda)
        r.map(_ * bc.value).collect.foreach(println)
        sc.stop()
      }
    }
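To illustrate why the first approach copies Config.lambda into a local val, here is a rough plain-Scala simulation that needs no Spark. It serializes two functions with ordinary Java serialization, then resets Config.lambda to its default to imitate a fresh executor JVM before deserializing and running them. This is only a sketch of the idea; Spark's actual task shipping involves more machinery, and ClosureDemo and its helper names are invented for the example.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}

object Config { var lambda = 0.01 }

object ClosureDemo {
  // Turn a function into bytes, roughly what happens when a task is shipped.
  def serialize(f: Int => Double): Array[Byte] = {
    val bytes = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(bytes)
    out.writeObject(f)
    out.close()
    bytes.toByteArray
  }

  // Rebuild the function from bytes, as the receiving side would.
  def deserialize(data: Array[Byte]): Int => Double = {
    val in = new ObjectInputStream(new ByteArrayInputStream(data))
    try in.readObject().asInstanceOf[Int => Double] finally in.close()
  }

  // Returns (result via singleton reference, result via captured local copy).
  def run(x: Int): (Double, Double) = {
    Config.lambda = 0.02                                // the "driver" updates the singleton
    val viaSingleton: Int => Double = _ * Config.lambda // looks up Config when it runs
    val l = Config.lambda                               // local copy, captured by value
    val viaLocal: Int => Double = _ * l
    val (a, b) = (serialize(viaSingleton), serialize(viaLocal))
    Config.lambda = 0.01                                // imitate a fresh executor JVM (assumption)
    (deserialize(a)(x), deserialize(b)(x))
  }

  def main(args: Array[String]): Unit = println(run(10))
}
```

The function that refers to Config.lambda sees whatever value the singleton has in the JVM where it runs, while the function that captured l carries the driver's value inside its serialized form.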
Note: You shouldn't pass the Config object into sc.broadcast() directly, because that would serialise Config before transferring it to the executors, and Config is not serialisable. One more thing to mention: a broadcast variable is not a good fit for the situation here, because you are only sharing a single value.
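The serialisability point can be checked without Spark. The sketch below, assuming plain Java serialization (Spark's default), tries to serialize the Config singleton and a small case class wrapper. Settings and SerializableCheck are hypothetical names introduced only for this example.

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

object Config { var lambda = 0.01 }   // plain singleton, does not extend Serializable

case class Settings(lambda: Double)   // hypothetical wrapper; case classes are Serializable

object SerializableCheck {
  // True if Java serialization accepts the value, false if it rejects it.
  def canSerialize(value: AnyRef): Boolean = {
    val out = new ObjectOutputStream(new ByteArrayOutputStream())
    try { out.writeObject(value); true }
    catch { case _: NotSerializableException => false }
    finally out.close()
  }

  def main(args: Array[String]): Unit = {
    println(s"Config serializable: ${canSerialize(Config)}")
    println(s"Settings serializable: ${canSerialize(Settings(0.04))}")
  }
}
```

If several settings had to be shared, an immutable value such as Settings(Config.lambda) could be passed to sc.broadcast and read through bc.value.lambda inside the map.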