hadoop - Why use multiple column families in HBase?


Why use multiple column families in HBase? What are the advantages of doing so?

This is already documented in the official HBase guide. The relevant passages are quoted below.

On the number of column families

HBase currently does not do well with anything above two or three column families, so keep the number of column families in your schema low. Currently, flushing and compactions are done on a per-region basis, so if one column family is carrying the bulk of the data and bringing on flushes, the adjacent families will also be flushed even though the amount of data they carry is small. When many column families exist, the flushing and compaction interaction can make for a bunch of needless I/O (to be addressed by changing flushing and compaction to work on a per-column-family basis). For more information on compactions, see Compaction.

Try to make do with one column family if you can in your schemas. Only introduce a second and third column family in the case where data access is usually column scoped; i.e. you query one column family or the other, but usually not both at the same time.
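To make the flushing argument concrete, here is a toy model in plain Java. It is not HBase code: the class name, the `bulk` and `meta` family names, the 128 MiB threshold and the write sizes are all assumptions chosen for illustration. The only behaviour it borrows from the guide is that a flush is triggered for the region as a whole and every family gets flushed with it.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Toy model of per-region flushing. Illustrative only; not HBase code. */
public class PerRegionFlushSketch {

    /** Hypothetical flush threshold for the whole region, in bytes. */
    static final long REGION_FLUSH_THRESHOLD = 128L * 1024 * 1024;

    /**
     * Simulates write rounds against one region. In each round every family
     * buffers bytesPerRound.get(family) bytes in memory. Once the buffered
     * total for the region reaches the threshold, every non-empty family is
     * flushed to its own file. Returns the number of files written per family.
     */
    static Map<String, Integer> simulate(Map<String, Long> bytesPerRound, int rounds) {
        Map<String, Long> buffered = new LinkedHashMap<>();
        Map<String, Integer> filesWritten = new LinkedHashMap<>();
        for (String family : bytesPerRound.keySet()) {
            buffered.put(family, 0L);
            filesWritten.put(family, 0);
        }
        for (int round = 0; round < rounds; round++) {
            long regionTotal = 0;
            for (Map.Entry<String, Long> e : bytesPerRound.entrySet()) {
                long size = buffered.get(e.getKey()) + e.getValue();
                buffered.put(e.getKey(), size);
                regionTotal += size;
            }
            if (regionTotal >= REGION_FLUSH_THRESHOLD) {
                // The flush decision is per region, so small families flush too.
                for (String family : buffered.keySet()) {
                    if (buffered.get(family) > 0) {
                        filesWritten.merge(family, 1, Integer::sum);
                        buffered.put(family, 0L);
                    }
                }
            }
        }
        return filesWritten;
    }

    public static void main(String[] args) {
        Map<String, Long> load = new LinkedHashMap<>();
        load.put("bulk", 1024L * 1024); // 1 MiB per round (assumed)
        load.put("meta", 1024L);        // 1 KiB per round (assumed)
        System.out.println(simulate(load, 1000));
    }
}
```

In this model both families end up with the same number of flush files, although `meta` buffers roughly a thousandth of what `bulk` does. Those many tiny files are the needless I/O the guide warns about, since they later have to be compacted as well.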

33.1. Cardinality of ColumnFamilies

Where multiple ColumnFamilies exist in a single table, be aware of the cardinality (i.e., number of rows). If ColumnFamilyA has 1 million rows and ColumnFamilyB has 1 billion rows, ColumnFamilyA's data will likely be spread across many, many regions (and RegionServers). This makes mass scans for ColumnFamilyA less efficient.
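A rough back-of-the-envelope version of that cardinality example, again in plain Java. It assumes, purely for illustration, that a region holds about 10 million rows and that ColumnFamilyA's rows are spread evenly over the key space. Real regions split on size rather than row count, so treat the numbers as a sketch.

```java
/** Back-of-the-envelope arithmetic for the cardinality example. Illustrative only. */
public class CardinalitySketch {

    /** Regions needed for totalRows if each region holds rowsPerRegion rows (rounded up). */
    static long regionCount(long totalRows, long rowsPerRegion) {
        return (totalRows + rowsPerRegion - 1) / rowsPerRegion;
    }

    public static void main(String[] args) {
        long rowsPerRegion = 10_000_000L; // assumed region capacity
        long rowsA = 1_000_000L;          // ColumnFamilyA
        long rowsB = 1_000_000_000L;      // ColumnFamilyB

        // The larger family drives how many regions the table is split into.
        long sharedRegions = regionCount(rowsB, rowsPerRegion);
        // If ColumnFamilyA lived in a table of its own, it would need far fewer regions.
        long aloneRegions = regionCount(rowsA, rowsPerRegion);

        System.out.println("Regions in the shared table: " + sharedRegions);
        System.out.println("Regions if ColumnFamilyA were alone: " + aloneRegions);
        System.out.println("Average ColumnFamilyA rows per shared region: "
                + (rowsA / sharedRegions));
    }
}
```

Under these assumptions a full scan of ColumnFamilyA has to visit every region that ColumnFamilyB forced into existence, and finds only a thin slice of its own data in each one.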

One example is an analytics table with daily, monthly, yearly and total column families, each one with its own TTL settings (expiration) and with columns for the corresponding date ranges (days, months, years...). They are different scopes, and when you query the table you fetch only one type of aggregation at a time, e.g. retrieve the daily stats of the last 30 days.

If you want to know more about schema design, take a look at the great Introduction to HBase Schema Design by Amandeep Khurana.

