python - Obtain a list containing string elements excluding elements prefixed with any other element from initial list -


i have trouble filtering list of strings. found similar question here not need.

the input list is:

l = ['ab', 'xc', 'abb', 'abed', 'sdfdg', 'abfdsdg', 'xccc'] 

and expected result

['ab', 'xc', 'sdfdg'] 

the order of items in result not important

the filter function must fast because size of list big

my current solution is

l = ['ab', 'xc', 'abb', 'abed', 'sdfdg', 'abfdsdg', 'xccc'] in range(0, len(l) - 1):     j in range(i + 1, len(l)):         if l[j].startswith(l[i]):             l[j] = l[i]         else:             if l[i].startswith(l[j]):                 l[i] = l[j]  print list(set(l))  

edit

after multiple tests big input data, list 1500000 strings, best solution is:

def filter(l):     if l==[]:         return []     l2=[]     l2.append(l[0])     llen = len(l)     k=0     itter = 0     while k<llen:         addkelem = ''         j=0         l2len = len(l2)         while j<l2len:             if (l2[j].startswith(l[k]) , l[k]!= l2[j]):                 l2[j]=l[k]                 l.remove(l[k])                 llen-=1                 j-=1                 addkelem = ''                 continue             if (l[k].startswith(l2[j])):                 addkelem = ''                 break             elif(l[k] not in l2):                 addkelem = l[k]             j+=1         if addkelem != '':             l2.append(addkelem)             addkelem = ''         k+=1     return l2 

for execution time around of 213 seconds

sample imput data - each line string in list

this algorithm completes task in 0.97 second on computer, the input file submitted author (154mb):

l.sort()  last_str = l[0] filtered = [last_str] app      = filtered.append  str in l:     if not str.startswith(last_str):         last_str = str         app(str)  # commented because of massive amount of data print. # print filtered 

the algorithm simple: first sort list lexicographically, search first string isn't prefixed first 1 of list, search next 1 isn't prefixed last unprefixed one, etc.

if list sorted (your example file seems sorted), can remove l.sort() line, result in o(n) complexity in both time , memory.


Comments

Popular posts from this blog

android - MPAndroidChart - How to add Annotations or images to the chart -

javascript - Add class to another page attribute using URL id - Jquery -

firefox - Where is 'webgl.osmesalib' parameter? -