python - Modify DataFrame passed as argument
I have a time-series DataFrame (df) to which I need to add a column, and then pass df to a function that modifies the content of a time slice of a single column. The idea is as follows:
    import pandas as pd

    rng = pd.date_range('1/1/2011', periods=3, freq='h')
    df = pd.DataFrame([0, 0, 0], columns=['a'], index=rng)
    df['b'] = 0

    def v(dff, n):
        dff.loc[rng[0]:rng[1], :].b = n
As far as I understand Python argument passing, the call v(df, n) should modify the DataFrame. The problem is that it does not do so every time.
The following code demonstrates the problem:
    v(df, 1)
    print("after first: ", df)
    v(df, 2)
    print("after second: ", df)

    ('after first: ',                      a  b
    2011-01-01 00:00:00  0  0
    2011-01-01 01:00:00  0  0
    2011-01-01 02:00:00  0  0

    [3 rows x 2 columns])
    ('after second: ',                      a  b
    2011-01-01 00:00:00  0  2
    2011-01-01 01:00:00  0  2
    2011-01-01 02:00:00  0  0
This is surprising, because I would expect column b to be either 0,0,0 both times, or first 1,1,0 and then 2,2,0.
Things get even stranger if I put a single print(df) before the first call to v. This code:
    print("before: ", df)
    v(df, 1)
    print("after first: ", df)
    v(df, 2)
    print("after second: ", df)

produces:

    ('before: ',                      a  b
    2011-01-01 00:00:00  0  0
    2011-01-01 01:00:00  0  0
    2011-01-01 02:00:00  0  0

    [3 rows x 2 columns])
    ('after first: ',                      a  b
    2011-01-01 00:00:00  0  1
    2011-01-01 01:00:00  0  1
    2011-01-01 02:00:00  0  0

    [3 rows x 2 columns])
    ('after second: ',                      a  b
    2011-01-01 00:00:00  0  2
    2011-01-01 01:00:00  0  2
    2011-01-01 02:00:00  0  0
So the result depends on whether I print df before I call the function that modifies it!
This happens if, and only if, I add a new column to df, then take a time range slice and modify the column. If I create the DataFrame with 2 columns in the first place, things work as expected.
What is going on? Is this a bug in pandas or in Python, or is my understanding of how things work in Python fundamentally wrong?
Thanks
I think the problem is chained indexing.
It will work if you change the function like this:
    def v(dff, n):
        dff.loc[rng[0]:rng[1], 'b'] = n
Then it works as expected. This is the recommended form, because the assignment works in all cases.
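For anyone who wants to check this, here is a minimal, self-contained sketch of the corrected function run against the question's setup. It assumes a reasonably recent pandas; the names after_first and after_second are introduced here only to capture column b after each call and are not part of the original code.

```python
import pandas as pd

# Same setup as in the question: one column at construction, one added later.
rng = pd.date_range("1/1/2011", periods=3, freq="h")
df = pd.DataFrame([0, 0, 0], columns=["a"], index=rng)
df["b"] = 0


def v(dff, n):
    # A single .loc call selects the rows and the column together, so the
    # assignment is applied to dff itself. The chained form
    # dff.loc[rows, :].b = n assigns to an intermediate object that may be
    # a copy, in which case the change never reaches dff.
    dff.loc[rng[0]:rng[1], "b"] = n


v(df, 1)
after_first = df["b"].tolist()   # hypothetical helper name, for illustration

v(df, 2)
after_second = df["b"].tolist()  # hypothetical helper name, for illustration

print("after first: ", after_first)
print("after second: ", after_second)
```

Note that label-based slicing with .loc includes both endpoints, which is why the first two rows are the ones that change.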