hadoop - Why use multiple column families in HBase?
Why use multiple column families in HBase, and what are the advantages of these tuples?
This is already documented in the official HBase guide. Take a look at these statements: On the number of column families: HBase currently does not do well with anything above two or three column families, so keep the number of column families in your schema low. Currently, flushing and compactions are done on a per-region basis, so if one column family is carrying the bulk of the data and bringing on flushes, the adjacent families will also be flushed even though the amount of data they carry is small. When many column families exist, the flushing and compaction interaction can make for a bunch of needless I/O (to be addressed by changing flushing and compaction to work on a per-column-family basis). For more information on compactions, see Compaction.
Try to make do with one column family if you can in your schemas. Only introduce a second and third column family in the case where data access is usually column scoped; i.e. you query one column family or the other, but usually not both at the same time.
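To make the flushing argument concrete, here is a minimal toy model in Python. It is not HBase code: the `Region` class, the threshold value and the write sizes are all assumptions chosen for illustration. It only mimics the rule quoted above, namely that a flush is triggered per region and writes out every family's buffer.

```python
from dataclasses import dataclass, field

FLUSH_THRESHOLD_KB = 128_000  # hypothetical per-region buffer limit


@dataclass
class Region:
    """Toy region with one in-memory buffer per column family."""
    memstores: dict = field(default_factory=dict)      # family -> buffered KB
    files_written: dict = field(default_factory=dict)  # family -> flushed file sizes

    def put(self, family, size_kb):
        self.memstores[family] = self.memstores.get(family, 0) + size_kb
        # The flush decision looks at the region as a whole, not at one family.
        if sum(self.memstores.values()) >= FLUSH_THRESHOLD_KB:
            self.flush()

    def flush(self):
        # Per-region flush: every family with buffered data writes a file,
        # no matter how little data it holds.
        for family, size_kb in list(self.memstores.items()):
            if size_kb > 0:
                self.files_written.setdefault(family, []).append(size_kb)
            self.memstores[family] = 0


def simulate(iterations):
    """Write 10 MB to family 'big' and 1 KB to family 'small' per iteration."""
    region = Region()
    for _ in range(iterations):
        region.put("big", 10_000)
        region.put("small", 1)
    return region


if __name__ == "__main__":
    region = simulate(130)
    for family, files in region.files_written.items():
        print(family, "files:", len(files), "largest (KB):", max(files))
```

In this model the small family ends up with exactly as many files as the big one, each only a few KB in size. Those tiny files are the needless I/O the guide refers to, since they later have to be compacted as well.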
33.1. Cardinality of ColumnFamilies
Where multiple ColumnFamilies exist in a single table, be aware of the cardinality (i.e., the number of rows). If ColumnFamilyA has 1 million rows and ColumnFamilyB has 1 billion rows, ColumnFamilyA's data will likely be spread across many, many regions (and RegionServers). This makes mass scans of ColumnFamilyA less efficient.
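A back-of-the-envelope calculation shows why. The sketch below assumes that a table is split into regions purely by total size, that all families share the same region boundaries, and that ColumnFamilyA's rows are spread evenly over the key space. The function name, row sizes and region size are hypothetical values, not HBase defaults.

```python
def regions_touched(rows_a, rows_b, bytes_per_row_a, bytes_per_row_b,
                    region_size_bytes):
    """Return (regions if A were stored alone, regions in the shared table).

    Assumes a full scan of family A has to visit every region of the table
    it lives in, because A's rows are scattered across the whole key space.
    """
    size_a = rows_a * bytes_per_row_a
    size_b = rows_b * bytes_per_row_b

    def ceil_div(total, chunk):
        # Integer ceiling division; at least one region always exists.
        return max(1, (total + chunk - 1) // chunk)

    alone = ceil_div(size_a, region_size_bytes)
    shared = ceil_div(size_a + size_b, region_size_bytes)
    return alone, shared


if __name__ == "__main__":
    alone, shared = regions_touched(
        rows_a=1_000_000,
        rows_b=1_000_000_000,
        bytes_per_row_a=1_000,             # assumed average row size
        bytes_per_row_b=1_000,             # assumed average row size
        region_size_bytes=10_000_000_000,  # assumed 10 GB regions
    )
    print("regions scanned if A is alone:", alone)
    print("regions scanned in the shared table:", shared)
```

Under these assumptions, a scan that would touch a single region if ColumnFamilyA had its own table has to visit about a hundred regions when it shares a table with ColumnFamilyB.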
One example would be an analytics table with daily, monthly, yearly and total column families, each one with its own TTL settings (expiration) and with columns for each date range (days, months, years...). They have different scopes, and when you query the table you fetch only one type of aggregation at a time, e.g. retrieve the daily stats of the last 30 days.
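As a rough sketch of what such a schema could look like, the Python helper below builds an HBase shell `create` statement with one TTL per family. The table name `analytics`, the TTL values and the helper itself are assumptions made for illustration; TTLs are given in seconds, and the family without a TTL never expires.

```python
DAY = 24 * 60 * 60  # seconds per day

# Hypothetical retention periods; pick values that match your own needs.
FAMILY_TTLS = {
    "daily": 90 * DAY,
    "monthly": 3 * 365 * DAY,
    "yearly": 10 * 365 * DAY,
    "total": None,  # no TTL: keep forever
}


def create_statement(table, family_ttls):
    """Build an HBase shell 'create' command with a per-family TTL."""
    specs = []
    for name, ttl in family_ttls.items():
        if ttl is None:
            specs.append(f"{{NAME => '{name}'}}")
        else:
            specs.append(f"{{NAME => '{name}', TTL => {ttl}}}")
    return f"create '{table}', " + ", ".join(specs)


if __name__ == "__main__":
    # Paste the printed line into the HBase shell to create the table.
    print(create_statement("analytics", FAMILY_TTLS))
```

Because each aggregation lives in its own family, a query for the last 30 days of daily stats only needs to read the `daily` family and leaves the other three untouched.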
If you want to know more about schema design, take a look at this great introduction to HBase schema design by Amandeep Khurana.