python - Fill NA values in a pandas Series with a stop


I'm analyzing a time series and, based on some criteria, I can pick out the rows that are either the start or the end of an event. At this point, the series looks like this (I've left out repetitive values for brevity):

The setup

    import numpy as np
    import pandas
    from pandas import Timestamp

    datadict = {'event': {
        Timestamp('2010-01-01 00:20:00', tz=None): 'event start',
        Timestamp('2010-01-01 00:30:00', tz=None): '--',
        Timestamp('2010-01-01 00:40:00', tz=None): '--',
        Timestamp('2010-01-01 00:50:00', tz=None): '--',
        Timestamp('2010-01-01 01:00:00', tz=None): '--',
        Timestamp('2010-01-01 01:10:00', tz=None): 'event end',
        Timestamp('2010-01-01 01:20:00', tz=None): '--',
        Timestamp('2010-01-01 02:20:00', tz=None): '--',
        Timestamp('2010-01-01 02:30:00', tz=None): 'event start',
        Timestamp('2010-01-01 02:40:00', tz=None): '--',
        Timestamp('2010-01-01 02:50:00', tz=None): '--',
        Timestamp('2010-01-01 03:00:00', tz=None): '--',
        Timestamp('2010-01-01 03:10:00', tz=None): '--',
        Timestamp('2010-01-01 03:20:00', tz=None): '--',
        Timestamp('2010-01-01 03:30:00', tz=None): 'event end',
    }}
    data = pandas.DataFrame.from_dict(datadict)

                               event
    2010-01-01 00:20:00  event start
    2010-01-01 00:30:00           --
    2010-01-01 00:40:00           --
    2010-01-01 00:50:00           --
    2010-01-01 01:00:00           --
    2010-01-01 01:10:00    event end
    2010-01-01 01:20:00           --
    2010-01-01 02:20:00           --
    2010-01-01 02:30:00  event start
    2010-01-01 02:40:00           --
    2010-01-01 02:50:00           --
    2010-01-01 03:00:00           --
    2010-01-01 03:10:00           --
    2010-01-01 03:20:00           --
    2010-01-01 03:30:00    event end

Here's what I'd like to achieve (ideally without for loops):

                               event  event number
    2010-01-01 00:20:00  event start             1
    2010-01-01 00:30:00           --             1
    2010-01-01 00:40:00           --             1
    2010-01-01 00:50:00           --             1
    2010-01-01 01:00:00           --             1
    2010-01-01 01:10:00    event end             1
    2010-01-01 01:20:00           --            NA
    2010-01-01 02:20:00           --            NA
    2010-01-01 02:30:00  event start             2
    2010-01-01 02:40:00           --             2
    2010-01-01 02:50:00           --             2
    2010-01-01 03:00:00           --             2
    2010-01-01 03:10:00           --             2
    2010-01-01 03:20:00           --             2
    2010-01-01 03:30:00    event end             2
    2010-01-01 03:40:00           --            NA
    2010-01-01 03:50:00           --            NA

Here's what I've tried

With optimistic assumptions about the quality of the data, I can get the event numbers like this:

    table = data[data.event != '--'].reset_index()
    table['event number'] = 1 + np.floor(table.index / 2)
    table = table.set_index('index')

                               event  event number
    index
    2010-01-01 00:20:00  event start             1
    2010-01-01 01:10:00    event end             1
    2010-01-01 02:30:00  event start             2
    2010-01-01 03:30:00    event end             2

I can then join this back to the original DataFrame and use fillna with method='ffill':

    data2 = data.join(table[['event number']])
    data2['filled'] = data2['event number'].fillna(method='ffill')

                               event  event number  filled
    2010-01-01 00:20:00  event start             1       1
    2010-01-01 00:30:00           --           NaN       1
    2010-01-01 00:40:00           --           NaN       1
    2010-01-01 00:50:00           --           NaN       1
    2010-01-01 01:00:00           --           NaN       1
    2010-01-01 01:10:00    event end             1       1
    2010-01-01 01:20:00           --           NaN       1  # <- d'oh
    2010-01-01 02:20:00           --           NaN       1  # <- d'oh
    2010-01-01 02:30:00  event start             2       2
    2010-01-01 02:40:00           --           NaN       2
    2010-01-01 02:50:00           --           NaN       2
    2010-01-01 03:00:00           --           NaN       2
    2010-01-01 03:10:00           --           NaN       2
    2010-01-01 03:20:00           --           NaN       2
    2010-01-01 03:30:00    event end             2       2

The problem

As you can see, the time between events (01:20 through 02:20) is being associated with event #1.

The question

Is there any way to skip over these sections without looping?

You can achieve this by looking at the cumulative sums of the number of event starts and the number of event ends. First, number the rows by counting event starts:

    >>> data['event number'] = (data.event == 'event start').cumsum()
    >>> data
                               event  event number
    2010-01-01 00:20:00  event start             1
    2010-01-01 00:30:00           --             1
    2010-01-01 00:40:00           --             1
    2010-01-01 00:50:00           --             1
    2010-01-01 01:00:00           --             1
    2010-01-01 01:10:00    event end             1
    2010-01-01 01:20:00           --             1
    2010-01-01 02:20:00           --             1
    2010-01-01 02:30:00  event start             2
    2010-01-01 02:40:00           --             2
    2010-01-01 02:50:00           --             2
    2010-01-01 03:00:00           --             2
    2010-01-01 03:10:00           --             2
    2010-01-01 03:20:00           --             2
    2010-01-01 03:30:00    event end             2

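Now you need to set NaN where there is no event. These places correspond to rows where the cumulative sum of event starts equals the cumulative sum of event ends. The event-end series is shifted by one row so that the 'event end' row itself still counts as part of its event: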

    >>> idx = data['event number'] == (data.event.shift(1) == 'event end').cumsum()
    >>> data.loc[idx, 'event number'] = np.nan
    >>> data
                               event  event number
    2010-01-01 00:20:00  event start             1
    2010-01-01 00:30:00           --             1
    2010-01-01 00:40:00           --             1
    2010-01-01 00:50:00           --             1
    2010-01-01 01:00:00           --             1
    2010-01-01 01:10:00    event end             1
    2010-01-01 01:20:00           --           NaN
    2010-01-01 02:20:00           --           NaN
    2010-01-01 02:30:00  event start             2
    2010-01-01 02:40:00           --             2
    2010-01-01 02:50:00           --             2
    2010-01-01 03:00:00           --             2
    2010-01-01 03:10:00           --             2
    2010-01-01 03:20:00           --             2
    2010-01-01 03:30:00    event end             2

    [15 rows x 2 columns]
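The two steps above can also be folded into one helper. The following is a minimal sketch rather than a definitive implementation: the function name `number_events` and its `start`/`end` parameters are hypothetical, and it assumes well-formed data in which every 'event start' is eventually followed by a matching 'event end'. Instead of assigning NaN after the fact, it uses `Series.where` to keep the start count only on rows where more events have started than have ended.

```python
import pandas as pd


def number_events(events, start="event start", end="event end"):
    """Label each row with its event number, or NaN between events.

    Hypothetical helper; assumes starts and ends alternate cleanly.
    """
    # Number of events that have started up to and including each row.
    started = (events == start).cumsum()
    # Number of events that ended strictly before each row; the shift keeps
    # the 'event end' row itself inside its event.
    ended = (events.shift(1) == end).cumsum()
    # A row is inside an event when more events have started than ended.
    return started.where(started > ended)


# Small illustrative series (not the data from the question).
index = pd.date_range("2010-01-01 00:20", periods=8, freq="10min")
events = pd.Series(
    ["event start", "--", "event end", "--", "--", "event start", "--", "event end"],
    index=index,
)
numbers = number_events(events)
print(numbers)
```

With the question's DataFrame this would be called as `data['event number'] = number_events(data.event)`. Rows before the first 'event start' also come out as NaN, since both counts are still zero there.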
