python - Equivalent of Series.map for DataFrame?
Using Series.map with a Series argument, I can take the elements of one Series and use them as indices into another Series. I want to do the same thing with the columns of a DataFrame, using each row as a set of index levels into a MultiIndex-ed Series. Here is an example:
>>> d = pandas.DataFrame([["a", 1], ["b", 2], ["c", 3]], columns=["x", "y"])
>>> d
   x  y
0  a  1
1  b  2
2  c  3

[3 rows x 2 columns]
>>> s = pandas.Series(np.arange(9), index=pandas.MultiIndex.from_product([["a", "b", "c"], [1, 2, 3]]))
>>> s
a  1    0
   2    1
   3    2
b  1    3
   2    4
   3    5
c  1    6
   2    7
   3    8
dtype: int32
What I would like to be able to do is d.map(s), where each row of d should be taken as a tuple to use as an index into the MultiIndex of s. That is, I want the same result as this:
>>> s.ix[[("a", 1), ("b", 2), ("c", 3)]]
a  1    0
b  2    4
c  3    8
dtype: int32
However, a DataFrame, unlike a Series, has no map method. The other obvious alternative, s.ix[d], gives me the error "Cannot index with multidimensional key", so that is apparently not supported either.
I know I can do this by converting the DataFrame to a list of lists, or by using a row-wise apply to grab each item one by one, but isn't there a way to do it without that amount of overhead? How can I do the equivalent of Series.map on multiple columns at once?
You could create a MultiIndex from the DataFrame and ix/loc using that:
In [11]: mi = pd.MultiIndex.from_arrays(d.values.T)

In [12]: s.loc[mi]  # can also use ix
Out[12]:
a  1    0
b  2    4
c  3    8
dtype: int64
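For reference, the same lookup can be sketched as a self-contained script. This is only an illustration under a few assumptions: it builds the MultiIndex with pd.MultiIndex.from_frame (a swapped-in alternative to from_arrays(d.values.T) that keeps each column's dtype) and uses .loc rather than .ix; the name result is introduced here for illustration.

```python
import numpy as np
import pandas as pd

# Same data as in the question.
d = pd.DataFrame([["a", 1], ["b", 2], ["c", 3]], columns=["x", "y"])
s = pd.Series(
    np.arange(9),
    index=pd.MultiIndex.from_product([["a", "b", "c"], [1, 2, 3]]),
)

# Assumption: from_frame is used here instead of from_arrays(d.values.T);
# each row of d becomes one (x, y) tuple in the resulting MultiIndex.
mi = pd.MultiIndex.from_frame(d)

# Look up every row's tuple in s at once, without a Python-level loop.
result = s.loc[mi]
print(result)
```

With the data above, result holds the values 0, 4 and 8, one per row of d.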
This is pretty efficient:
In [21]: s = pandas.Series(np.arange(1000*1000), index=pandas.MultiIndex.from_product([range(1000), range(1000)]))

In [22]: d = pandas.DataFrame(list(zip(range(1000), range(1000))), columns=["x", "y"])

In [23]: %timeit mi = pd.MultiIndex.from_arrays(d.values.T); s.loc[mi]
100 loops, best of 3: 2.77 ms per loop

In [24]: %timeit s.apply(lambda x: x + 1)  # at least compared to apply
1 loops, best of 3: 3.14 s per loop
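One difference from Series.map is worth keeping in mind: map yields NaN for values that are missing from the mapping, while recent pandas versions raise a KeyError from .loc when a requested label is absent. If the NaN behaviour is wanted, a possible variant is to use reindex on the same kind of MultiIndex. The sketch below is an assumption-laden illustration, not part of the original answer; the names d_missing, mi_missing and looked_up are hypothetical.

```python
import numpy as np
import pandas as pd

s = pd.Series(
    np.arange(9),
    index=pd.MultiIndex.from_product([["a", "b", "c"], [1, 2, 3]]),
)

# Hypothetical frame whose last row, ("z", 9), has no match in s.
d_missing = pd.DataFrame([["a", 1], ["b", 2], ["z", 9]], columns=["x", "y"])
mi_missing = pd.MultiIndex.from_frame(d_missing)

# reindex aligns s to the requested tuples and fills unmatched ones with NaN,
# which is closer to what Series.map does for unmapped values.
looked_up = s.reindex(mi_missing)
print(looked_up)
```

Because of the NaN, the result is upcast to a floating-point dtype.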