Apache Kafka - Confluent Platform: Schema Registry Subjects
I am working with Confluent Platform, the platform offered by the creators of Apache Kafka, and I have a question:
In the documentation of the Schema Registry API reference, they mention the abstraction of a "subject". You register a schema under a "subject" of the form topicname-key or topicname-value, yet there is no explanation of why you need (as it implies) a separate schema for the key and the value of messages on a given topic. Nor is there any direct statement to the effect that registering under a "subject" associates the schema with that topic, other than mnemonically.
Further confusing matters, subsequent examples ("Get a schema version for a subject" and "Register a new schema under a subject") on that page do not use that format for the subject name, and instead use just a topic name as the "subject" value. If anyone has insight into a) why there are these two "subjects" per topic, and b) what the proper usage is, it would be appreciated.
The Confluent Schema Registry is a bit inconsistent with subject names :)
Indeed, KafkaAvroSerializer (used by the new Kafka 0.8.2 producer) uses the topic-key|value pattern for subjects (link) whereas KafkaAvroEncoder (for the old producer) uses the schema.getName()-value pattern (link).
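The difference between the two naming patterns can be sketched in a few lines of Python (an illustration only; `topic_subject` and `record_subject` are made-up helper names, not Confluent APIs):

```python
def topic_subject(topic: str, is_key: bool) -> str:
    """Subject name the way KafkaAvroSerializer derives it:
    one subject per topic for keys, one for values."""
    return f"{topic}-{'key' if is_key else 'value'}"

def record_subject(schema_name: str) -> str:
    """Subject name the way KafkaAvroEncoder derives it:
    based on the Avro schema's name, independent of the topic."""
    return f"{schema_name}-value"

print(topic_subject("logs", is_key=True))    # logs-key
print(topic_subject("logs", is_key=False))   # logs-value
print(record_subject("LogEntry"))            # LogEntry-value
```

So with the serializer the subject is tied to the topic, while with the encoder it is tied to the schema itself.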
The reason why one would have two different subjects per topic (one for the key, one for the value) is pretty simple:
Say you have an Avro schema representing a log entry, and each log entry has source information attached to it:
{
  "type": "record",
  "name": "LogEntry",
  "fields": [
    { "name": "line", "type": "string" },
    {
      "name": "source",
      "type": {
        "type": "record",
        "name": "SourceInfo",
        "fields": [
          { "name": "host", "type": "string" },
          { "name": "...", "type": "string" }
        ]
      }
    }
  ]
}
A common use case is that you want to partition entries by source, so you would have two subjects associated with the topic (and subjects are basically revisions of Avro schemas) - one for the key (which is SourceInfo) and one for the value (LogEntry).
Having these two subjects allows partitioning and storing the data as long as a schema registry is running and your producers/consumers can talk to it. Any modifications to these schemas are reflected in the schema registry, and as long as they satisfy the compatibility settings everything should serialize/deserialize without you having to care about this.
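To make the "subjects are revisions of schemas" idea concrete, here is a minimal in-memory sketch of a registry (purely illustrative - this is not how the Confluent implementation works internally, and the schema strings are placeholders):

```python
class TinyRegistry:
    """Toy model of a schema registry: each subject holds an
    ordered list of schema versions."""

    def __init__(self):
        self.subjects = {}  # subject name -> list of schema strings

    def register(self, subject: str, schema: str) -> int:
        """Register a schema under a subject; returns its version number."""
        versions = self.subjects.setdefault(subject, [])
        if schema in versions:
            return versions.index(schema) + 1  # already registered
        versions.append(schema)
        return len(versions)

    def latest(self, subject: str) -> str:
        return self.subjects[subject][-1]

registry = TinyRegistry()
# Key and value schemas of the "logs" topic live under separate subjects.
registry.register("logs-key", "SourceInfo schema v1")
registry.register("logs-value", "LogEntry schema v1")
# Evolving the value schema adds a new version under the same subject.
registry.register("logs-value", "LogEntry schema v2")
print(registry.latest("logs-value"))  # LogEntry schema v2
```

The real registry also runs compatibility checks before accepting a new version; this sketch skips that part.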
Note: everything below is just my personal thoughts, and maybe I don't yet fully understand how this is supposed to work, so I might be wrong.
I actually like how KafkaAvroEncoder is implemented more than KafkaAvroSerializer. KafkaAvroEncoder does not in any way enforce one schema per topic key/value, whereas KafkaAvroSerializer does. This might be an issue when you plan to produce data with multiple Avro schemas into one topic. In that case KafkaAvroSerializer would try to update the topic-key and topic-value subjects, and 99% of the time this would break because compatibility is violated (if you have multiple Avro schemas, they are almost always different and incompatible with each other).
On the other side, KafkaAvroEncoder cares only about schema names, so you may safely produce data with multiple Avro schemas into one topic and it should work just fine (you will just have as many subjects as schemas).
This inconsistency is still unclear to me and I hope the Confluent guys can explain it if they see this question/answer.
Hope this helps you.