python - Carriage returns and splitting cell values in a pandas DataFrame


My data is in a pandas DataFrame, and each row is structured like this:

> df={'date1': '0 \r created february 21, 2015', 'amt': '$50,815 raised 498 donors'} 

I want:

> df={'month': 'february', 'day': 21, 'year': '2015', 'cur': '$', 'raised': '50815', 'num_donor': '498'}

In df.date1, many of the cells contain carriage returns, sometimes several in a row (at the beginning and end of the strings). Is there a way to remove them from the entire DataFrame?
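One way to strip carriage returns from every text cell at once is `DataFrame.replace` with `regex=True`, which substitutes matching substrings inside string values and leaves non-string cells alone. A minimal sketch, assuming the columns hold plain strings like the example row above; the sample data is made up for illustration:

```python
import pandas as pd

# Hypothetical sample data shaped like the question's rows
df = pd.DataFrame({
    'date1': ['\r\r created february 21, 2015\r'],
    'amt': ['\r$50,815 raised 498 donors\r\r'],
})

# Remove every carriage return anywhere inside every string cell
clean = df.replace(r'\r', '', regex=True)

# Then trim leading/trailing whitespace left behind, also via regex replace
clean = clean.replace(r'^\s+|\s+$', '', regex=True)

print(clean.loc[0, 'date1'])  # created february 21, 2015
print(clean.loc[0, 'amt'])    # $50,815 raised 498 donors
```

Both calls return a new DataFrame, so assign the result back (here to `clean`) rather than expecting `df` to change in place.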

In some cases, this works:

> df['date1'] = df['date1'].map(lambda x: str(x).lstrip('\r created').rstrip('...')) 

but not in others (the code differs between columns). For example, none of the following removes \r or ',':

> df['raised2'][0]  # value: ',50,815,\r'

> df['raised2'] = df['raised2'].map(lambda x: str(x).lstrip('\r').rstrip('\r'))

> rm_carriage = lambda x: re.findall("^/\r*(.*?)/\r*$", str(x))

> df.applymap(carriage_function)
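A few things likely explain why those attempts fall short. `str.lstrip`/`str.rstrip` take a *set of characters*, not a prefix or suffix, so `lstrip('\r created')` only works by accident and can eat letters of the month; they also only ever touch the ends of a string, so they cannot remove the commas inside `,50,815,\r`. The regex uses JavaScript-style `/` delimiters, which Python treats as literal slashes, and `re.findall` returns a list rather than a cleaned string. Finally, `carriage_function` is not the name of the lambda (`rm_carriage`), and `applymap` returns a new DataFrame instead of modifying `df`. The sketch below uses `re.sub` instead; `clean_cell` is a hypothetical helper name, and `str.removeprefix` requires Python 3.9 or later:

```python
import re

import pandas as pd

# Hypothetical column shaped like the question's df['raised2']
df = pd.DataFrame({'raised2': [',50,815,\r', '\r1,200\r\r']})

# lstrip treats its argument as a SET of characters, not a prefix:
# every leading char in {'\r', ' ', 'c', 'r', 'e', 'a', 't', 'd'} is removed,
# including the "dece" of "december".
print('\r created december 5, 2015'.lstrip('\r created'))  # mber 5, 2015

# Removing a literal prefix instead (Python 3.9+)
print('\r created december 5, 2015'.strip().removeprefix('created '))  # december 5, 2015

def clean_cell(x):
    """Remove all carriage returns and commas anywhere in the value."""
    return re.sub(r'[\r,]', '', str(x))

# Series.map returns a new Series, so assign it back to the column
df['raised2'] = df['raised2'].map(clean_cell)
print(df['raised2'].tolist())  # ['50815', '1200']
```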

This gets me the month, but the same logic does not work for the day or year:

> df['month'] = df['date1'].map(lambda x: x.split()[0])

> df['day'] = df['date1'].map(lambda x: x.split()[1])  # IndexError
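An `IndexError` from `x.split()[1]` means at least one cell splits into fewer than two tokens, for example an empty string or a cell that holds only carriage returns (`split()` with no argument splits on any whitespace, `\r` included). A minimal sketch, assuming such short rows exist: pandas' `.str.split()` followed by `.str[i]` returns `NaN` for an out-of-range position instead of raising, which makes the bad rows easy to find. The sample values are made up:

```python
import pandas as pd

# Hypothetical date column; the second cell is only carriage returns
df = pd.DataFrame({'date1': ['february 21, 2015\r', '\r\r']})

# Drop the comma, then split on any whitespace (including \r)
parts = df['date1'].str.replace(',', '', regex=False).str.split()
df['month'] = parts.str[0]
df['day'] = parts.str[1]   # NaN instead of IndexError for short rows
df['year'] = parts.str[2]

# Show the rows that did not contain a full date
print(df[df['day'].isna()])
```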

It depends on your exact data, but for your example this should work:

    # Replace carriage returns and commas with spaces, then split on any whitespace
    # (str.split() with no pattern drops the empty strings left by repeated separators)
    df['date1_splitted'] = df.date1.str.replace(r'\r|,', ' ', regex=True).str.split()
    # Index from the end so leading text such as 'created' is skipped
    df['year'] = df.date1_splitted.str[-1]
    df['day'] = df.date1_splitted.str[-2]
    df['month'] = df.date1_splitted.str[-3]
    # Drop carriage returns and thousands separators, then split the amount text
    df['amt_splitted'] = df.amt.str.replace(r'\r|,', '', regex=True).str.split()
    df['cur'] = df.amt_splitted.str[0].str[0]
    df['raised'] = df.amt_splitted.str[0].str[1:]
    df['num_donors'] = df.amt_splitted.str[-2]
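An alternative, not part of the answer above, is to pull each field out with one regular expression per column using `Series.str.extract`, whose named groups become column names. This is a sketch that assumes every `date1` cell contains `<month> <day>, <year>` and every `amt` cell contains `<currency symbol><amount> raised <count> donors`; rows that do not match come back as `NaN`:

```python
import pandas as pd

# Sample row taken from the question
df = pd.DataFrame({
    'date1': ['0 \r created february 21, 2015'],
    'amt': ['$50,815 raised 498 donors'],
})

# Month name, day and year anywhere in the text; \r and extra words are ignored
dates = df['date1'].str.extract(
    r'(?P<month>[a-zA-Z]+)\s+(?P<day>\d{1,2}),\s*(?P<year>\d{4})')

# Currency symbol, amount (with thousands separators) and donor count
amts = df['amt'].str.extract(
    r'(?P<cur>[^\d\s])(?P<raised>[\d,]+)\s+raised\s+(?P<num_donor>\d+)')
amts['raised'] = amts['raised'].str.replace(',', '', regex=False)

result = pd.concat([dates, amts], axis=1)
print(result.iloc[0].to_dict())
```

Because the patterns search for the fields rather than relying on token positions, stray carriage returns or extra words before the date do not shift the columns.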
