python - Obtain a list containing string elements excluding elements prefixed with any other element from initial list -
i have trouble filtering list of strings. found similar question here not need.
the input list is:
l = ['ab', 'xc', 'abb', 'abed', 'sdfdg', 'abfdsdg', 'xccc']
and expected result
['ab', 'xc', 'sdfdg']
the order of items in result not important
the filter function must fast because size of list big
my current solution is
l = ['ab', 'xc', 'abb', 'abed', 'sdfdg', 'abfdsdg', 'xccc'] in range(0, len(l) - 1): j in range(i + 1, len(l)): if l[j].startswith(l[i]): l[j] = l[i] else: if l[i].startswith(l[j]): l[i] = l[j] print list(set(l))
edit
after multiple tests big input data, list 1500000 strings, best solution is:
def filter(l): if l==[]: return [] l2=[] l2.append(l[0]) llen = len(l) k=0 itter = 0 while k<llen: addkelem = '' j=0 l2len = len(l2) while j<l2len: if (l2[j].startswith(l[k]) , l[k]!= l2[j]): l2[j]=l[k] l.remove(l[k]) llen-=1 j-=1 addkelem = '' continue if (l[k].startswith(l2[j])): addkelem = '' break elif(l[k] not in l2): addkelem = l[k] j+=1 if addkelem != '': l2.append(addkelem) addkelem = '' k+=1 return l2
for execution time around of 213 seconds
sample imput data - each line string in list
this algorithm completes task in 0.97 second on computer, the input file submitted author (154mb):
l.sort() last_str = l[0] filtered = [last_str] app = filtered.append str in l: if not str.startswith(last_str): last_str = str app(str) # commented because of massive amount of data print. # print filtered
the algorithm simple: first sort list lexicographically, search first string isn't prefixed first 1 of list, search next 1 isn't prefixed last unprefixed one, etc.
if list sorted (your example file seems sorted), can remove l.sort()
line, result in o(n) complexity in both time , memory.
Comments
Post a Comment