python - carriage lines and splitting cell values in pandas DataFrame
My data is in a pandas DataFrame, and each row is structured like this:
> df={'date1': '0 \r created february 21, 2015', 'amt': '$50,815 raised 498 donors'}
I want:
> df={'month': 'february', 'day': 21, 'year': '2015', 'cur': '$', 'raised': '50815', 'num_donor': '498'}
In df.date1, many of the cells contain carriage returns, sometimes several in a row (at the beginning and end of the strings). Is there a way to remove them from the entire dataframe?
In some cases, this works:
> df['date1'] = df['date1'].map(lambda x: str(x).lstrip('\r created').rstrip('...'))
But it does not always work (the code differs between columns). For example, none of the following removes \r or ',':
> df['raised2'][0] = ,50,815,\r
> df['raised2'] = df['raised2'].map(lambda x: str(x).lstrip('\r').rstrip('\r'))
> rm_carriage = lambda x: re.findall("^/\r*(.*?)/\r*$", str(x))
> df.applymap(carriage_function)
This gets me the month, but the same logic does not get the day or year:
> df['month'] = df['date1'].map(lambda x: x.split()[0])
> df['day'] = df['date1'].map(lambda x: x.split()[1]) # IndexError
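It depends on the exact data, but for your example this should work:
> # Replace carriage returns and commas with spaces, then split and drop empty tokens
> df['date1_splitted'] = df.date1.str.replace('\r|,', ' ', regex=True).apply(lambda x: list(filter(None, x.split(' '))))
> # Index from the end so leading tokens such as '0' or 'created' do not shift the fields
> df['year'] = df.date1_splitted.apply(lambda x: x[-1])
> df['day'] = df.date1_splitted.apply(lambda x: x[-2])
> df['month'] = df.date1_splitted.apply(lambda x: x[-3])
> # Drop carriage returns and commas from the amount before splitting on spaces
> df['amt_splitted'] = df.amt.str.replace('\r|,', '', regex=True).apply(lambda x: x.split(' '))
> df['cur'] = df.amt_splitted.apply(lambda x: x[0][0])
> df['raised'] = df.amt_splitted.apply(lambda x: x[0][1:])
> df['num_donors'] = df.amt_splitted.apply(lambda x: x[-2])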
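As for removing carriage returns from the entire dataframe in one go, here is a minimal sketch. It assumes a reasonably recent pandas and that every cell follows the "created <month> <day>, <year>" and "<currency><amount> raised <n> donors" patterns from the question. The sample rows and the names `clean`, `date_parts`, `amt_parts` and `result` are made up for illustration. It swaps the positional splitting for `str.extract` with named groups, which may be less sensitive to stray leading tokens:

```python
import pandas as pd

# Hypothetical sample rows modelled on the question's data.
df = pd.DataFrame({
    "date1": ["\r created february 21, 2015\r", "\r\r created march 3, 2014"],
    "amt": ["$50,815 raised 498 donors\r", "\r$1,200 raised 17 donors"],
})

# Replace every run of carriage returns with one space in all string cells,
# then trim leading/trailing whitespace in each string column.
clean = df.replace(r"\r+", " ", regex=True)
clean = clean.apply(
    lambda col: col.str.strip() if pd.api.types.is_string_dtype(col) else col
)

# Named groups become column names; str.extract keeps the first match per cell.
date_parts = clean["date1"].str.extract(
    r"(?P<month>[A-Za-z]+)\s+(?P<day>\d{1,2}),\s*(?P<year>\d{4})"
)
amt_parts = clean["amt"].str.extract(
    r"(?P<cur>[^\d\s])(?P<raised>[\d,]+)\s+raised\s+(?P<num_donors>[\d,]+)\s+donors"
)

# Remove thousands separators from the raised amount (literal replace, no regex).
amt_parts["raised"] = amt_parts["raised"].str.replace(",", "", regex=False)

result = pd.concat([date_parts, amt_parts], axis=1)
print(result)
```

All extracted values are still strings. Cells that do not match the assumed patterns come back as NaN rather than raising an IndexError, so they are easy to inspect afterwards.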