database - What is a good algorithm to check whether or not a number exist in multiple sets without searching them all? -
scenario
let's have multiple databases in 3 zones. zone a
, b
, , c
. each zone in different geographical location. @ same time, have application route username , password based on geographical location of user. example, user a
redirected database in zone a
. user b
zone b
, on.
now, let's user a
moves zone b
. application query zone b
, won't find anything. querying zone a
, zone c
might take time due zones far away, , have query databases in zones.
my question
how can verify if string/number exists in multiple sets?
or
how can verify row exist in database before sending query?
my algorithm
this not perfect, give idea i'm trying do
if have database following 3 users
- foo
- bar
- foobar
we take hash of 3 users, , next prime number if hash not prime.
sum = hash(foo).nextprime() * hash(bar).nextprime() * hash(foobar).nextprime()
that sum shared between zones. if want check foo
, can take hash of foo, , next prime, take gcd(foo,sum)
. if it's not equal one. means foo exist in database. if equal one, means foo doesn't exist @ all. if want add new username. can sum = sum * hash(newusername).nextprime().
sum grow point faster query databases.
do know similar algorithm solve problem?
one data structure suitable application bloom filter.
a bloom filter probabilistic data structure allows test whether item in set. if test returning false item not in set (0% false negatives), if true may in set, not guaranteed (false positives possible).
the filter implemented bit array m bits , set of k hash functions. add item array (e.g. username), hash item using each of hash functions , take modulo m of each hash value compute indexes set in bit array. test if item in set, compute hashes , indexes , check of corresponding bits in array set 1. if of them 0 them item not in set, if 1 item in set, there small chance may not be, percentage of false positives can reduced using larger m.
to implement k hash functions, possible use same hashing algorithm (e.g. crc32, md5 etc) append different salts username string each before passing hash function, creating "new" hash functions each salt. given m , n (number of elements being added), optimal number of hash functions k = (m / n) ln 2
for application, bloom filter bit array shared across zones b c etc. when user attempts login, first check in database of local zone, , if present log them in normal. if not present in local database, check bloom filter , if result negative know sure don't exist in zone. if positive, still need check databases in other zones (because of possibility of false positive), presumably isn't big issue because contacting other zones in case transfer user's data in case true positive.
one down-side of using bloom filter is difficult (though not impossible) remove elements set once have been added.
Comments
Post a Comment