Using Pandas groupby to calculate many slopes


I have some illustrative data in a DataFrame (MultiIndex) format:

    |entity| year |value|
    +------+------+-----+
    |   a  | 1999 |  2  |
    |      | 2004 |  5  |
    |   b  | 2003 |  3  |
    |      | 2007 |  2  |
    |      | 2014 |  7  |

I would like to calculate the slope using scipy.stats.linregress for each entity (a and b in the example above). I tried using groupby on the first column, following the split-apply-combine advice, but that seems problematic: it expects one Series of values per group (a, b), whereas I need to operate on the two columns on the right (year and value).

This can be done in R via plyr, but I'm not sure how to approach it in pandas.

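A function can be applied to each group with the groupby apply function. Each group is passed to that function as a DataFrame, so both columns are available. The function in this case is a lambda that calls linregress and keeps the first element of its result, the slope. Please see below: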

    In [4]: x = pd.DataFrame({'entity': ['a', 'a', 'b', 'b', 'b'],
                              'year': [1999, 2004, 2003, 2007, 2014],
                              'value': [2, 5, 3, 2, 7]})

    In [5]: x
    Out[5]:
      entity  value  year
    0      a      2  1999
    1      a      5  2004
    2      b      3  2003
    3      b      2  2007
    4      b      7  2014

    In [6]: from scipy.stats import linregress

    In [7]: x.groupby('entity').apply(lambda v: linregress(v.year, v.value)[0])
    Out[7]:
    entity
    a    0.600000
    b    0.403226
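
If you want more than the slope, one possible variation is to return a Series from the applied function, so that each group yields a labelled row. This is only a sketch under a few assumptions: the helper name fit_line and the variable names df and fits are made up for illustration, and it relies on the result of linregress exposing slope and intercept attributes, as recent SciPy versions do. Selecting the year and value columns explicitly means the function only receives the data it needs.

```python
import pandas as pd
from scipy.stats import linregress

# Same illustrative data as in the session above.
df = pd.DataFrame({
    "entity": ["a", "a", "b", "b", "b"],
    "year": [1999, 2004, 2003, 2007, 2014],
    "value": [2, 5, 3, 2, 7],
})


def fit_line(group):
    """Regress value on year for one entity (hypothetical helper)."""
    result = linregress(group["year"], group["value"])
    # Returning a Series makes apply build one column per statistic.
    return pd.Series({"slope": result.slope, "intercept": result.intercept})


# Each group arrives in fit_line as a DataFrame holding year and value.
fits = df.groupby("entity")[["year", "value"]].apply(fit_line)
print(fits)
```

The result is a DataFrame indexed by entity with one column per statistic, so the slopes are available as fits["slope"].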