Some illustrative data in DataFrame (MultiIndex) format:
|entity| year |value|
+------+------+-----+
|  a   | 1999 |  2  |
|      | 2004 |  5  |
|  b   | 2003 |  3  |
|      | 2007 |  2  |
|      | 2014 |  7  |
I want to calculate the slope using scipy.stats.linregress for each entity (a, b) in the above example. I tried using groupby on the first column, following the split-apply-combine advice, but that seems problematic since it expects one Series of values per group (a, b), whereas I need to operate on the two columns on the right.
This can be done in R via plyr, but I am not sure how to approach it in pandas.
A function can be applied to a groupby object with the apply function. The passed function in this case is linregress. Please see below:
In [4]: x = pd.DataFrame({'entity':['a','a','b','b','b'],
                          'year':[1999,2004,2003,2007,2014],
                          'value':[2,5,3,2,7]})

In [5]: x
Out[5]:
  entity  value  year
0      a      2  1999
1      a      5  2004
2      b      3  2003
3      b      2  2007
4      b      7  2014

In [6]: from scipy.stats import linregress

In [7]: x.groupby('entity').apply(lambda v: linregress(v.year, v.value)[0])
Out[7]:
entity
a    0.600000
b    0.403226
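
If the data is kept in the MultiIndex layout shown in the question rather than as flat columns, the same idea might look like the sketch below. It assumes the index levels are named entity and year (taken from the table header), and fit_slope is a hypothetical helper name, not something from pandas or scipy. It reads the slope attribute of the linregress result instead of indexing with [0].

```python
import pandas as pd
from scipy.stats import linregress

# Same values as in the session above, but indexed by (entity, year)
# to mirror the MultiIndex layout from the question (assumed level names).
df = pd.DataFrame(
    {
        "entity": ["a", "a", "b", "b", "b"],
        "year": [1999, 2004, 2003, 2007, 2014],
        "value": [2, 5, 3, 2, 7],
    }
).set_index(["entity", "year"])


def fit_slope(values):
    """Hypothetical helper: regress one entity's values on its years."""
    # Each group keeps the full MultiIndex, so the years can be read from it.
    years = values.index.get_level_values("year")
    return linregress(years, values.to_numpy()).slope


# Group the value column by the entity index level and fit one line per group.
slopes = df["value"].groupby(level="entity").apply(fit_slope)
print(slopes)
```

The result is a Series indexed by entity, so an individual slope can be looked up with slopes["a"] or slopes["b"].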