python - Equivalent of Series.map for DataFrame?


Using Series.map with a Series argument, I can take the elements of a Series and use them as indices into another Series. I want to do the same thing with the columns of a DataFrame, using each row as a set of index levels into a MultiIndex-ed Series. Here is an example:

    >>> import numpy as np
    >>> import pandas
    >>> d = pandas.DataFrame([["a", 1], ["b", 2], ["c", 3]], columns=["x", "y"])
    >>> d
       x  y
    0  a  1
    1  b  2
    2  c  3

    [3 rows x 2 columns]
    >>> s = pandas.Series(np.arange(9), index=pandas.MultiIndex.from_product([["a", "b", "c"], [1, 2, 3]]))
    >>> s
    a  1    0
       2    1
       3    2
    b  1    3
       2    4
       3    5
    c  1    6
       2    7
       3    8
    dtype: int32

What I would like is to be able to do d.map(s), so that each row of d is taken as a tuple and used to index into the MultiIndex of s. That is, I want the same result as this:

    >>> s.ix[[("a", 1), ("b", 2), ("c", 3)]]
    a  1    0
    b  2    4
    c  3    8
    dtype: int32

However, DataFrame, unlike Series, has no map method. The other obvious alternative, s.ix[d], gives me the error "Cannot index with multidimensional key", so this is apparently not supported either.

I know I can do it by converting the DataFrame to a list of lists, or by using a row-wise apply to grab each item one by one, but isn't there a way to do it without that amount of overhead? How can I do the equivalent of Series.map on multiple columns at once?
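For reference, the row-by-row workaround mentioned above might look roughly like the sketch below. It rebuilds the example data from the question; the names `keys` and `slow_result` are illustrative only and not part of the original post.

```python
import numpy as np
import pandas as pd

# Example data from the question.
d = pd.DataFrame([["a", 1], ["b", 2], ["c", 3]], columns=["x", "y"])
s = pd.Series(
    np.arange(9),
    index=pd.MultiIndex.from_product([["a", "b", "c"], [1, 2, 3]]),
)

# Turn each row of d into a plain (x, y) tuple ...
keys = list(d.itertuples(index=False, name=None))

# ... and look the tuples up in s one at a time.
# This is the per-row overhead the question wants to avoid.
slow_result = [s.loc[key] for key in keys]
```

Each lookup goes through the full indexing machinery separately, so the cost grows with the number of rows.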

You could create a MultiIndex from the DataFrame and index with ix/loc using that:

    In [11]: mi = pd.MultiIndex.from_arrays(d.values.T)

    In [12]: s.loc[mi]  # can also use ix
    Out[12]:
    a  1    0
    b  2    4
    c  3    8
    dtype: int64
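A self-contained variant of the same idea, as a sketch rather than the answer's exact code: it assumes a pandas version that provides `MultiIndex.from_frame` (added in 0.24), which builds the index straight from the columns and keeps each column's dtype. `.ix` was removed in pandas 1.0, so only `.loc` is used here. The name `mapped` is illustrative.

```python
import numpy as np
import pandas as pd

# Example data from the question.
d = pd.DataFrame([["a", 1], ["b", 2], ["c", 3]], columns=["x", "y"])
s = pd.Series(
    np.arange(9),
    index=pd.MultiIndex.from_product([["a", "b", "c"], [1, 2, 3]]),
)

# One MultiIndex entry per row of d, built column by column.
mi = pd.MultiIndex.from_frame(d)

# A single vectorized lookup of all (x, y) pairs in s.
result = s.loc[mi]

# To mimic Series.map, re-attach the looked-up values to d's own index.
mapped = pd.Series(result.to_numpy(), index=d.index)
```

`s.loc[mi]` raises a KeyError if any row of `d` is missing from `s`; `s.reindex(mi)` could be used instead if missing pairs should come back as NaN.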

This is pretty efficient:

    In [21]: s = pandas.Series(np.arange(1000*1000), index=pandas.MultiIndex.from_product([range(1000), range(1000)]))

    In [22]: d = pandas.DataFrame(list(zip(range(1000), range(1000))), columns=["x", "y"])

    In [23]: %timeit mi = pd.MultiIndex.from_arrays(d.values.T); s.loc[mi]
    100 loops, best of 3: 2.77 ms per loop

    In [24]: %timeit s.apply(lambda x: x + 1)  # at least compared to apply
    1 loops, best of 3: 3.14 s per loop
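To compare like with like, one could time the vectorized lookup against a row-wise apply that performs the same lookups. The sketch below uses the standard `timeit` module and a smaller, arbitrarily chosen size `n`; the helper names `vectorized` and `row_wise` are hypothetical, and no timings are claimed here since they depend on the machine and the pandas version.

```python
import timeit

import numpy as np
import pandas as pd

n = 200  # arbitrary size for the sketch; the answer above used 1000

s = pd.Series(
    np.arange(n * n),
    index=pd.MultiIndex.from_product([range(n), range(n)]),
)
d = pd.DataFrame({"x": range(n), "y": range(n)})


def vectorized():
    # Build the MultiIndex once and look up every row in a single call.
    return s.loc[pd.MultiIndex.from_frame(d)]


def row_wise():
    # Look up one (x, y) pair per row through DataFrame.apply.
    return d.apply(lambda row: s.loc[(row["x"], row["y"])], axis=1)


t_vec = timeit.timeit(vectorized, number=5)
t_row = timeit.timeit(row_wise, number=5)
print(f"vectorized: {t_vec / 5:.6f} s per call")
print(f"row-wise:   {t_row / 5:.6f} s per call")
```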
