i have sports data of following structure:
season, country, league, hometeam, awayteam, status
the status can h, or d or a
i want create pivot table each season, country , league showing h percentage of total matches.
import pandas pd grouped_success_rate = df_data.groupby(["season", "country", "league"]) homewins_per_league = grouped_success_rate.apply(lambda x: x[(x["status"] == "h")].shape[0]).unstack("season") homewins_per_league.fillna(0, inplace=true) homewins_per_league["total"] = homewins_per_league.apply(lambda x: sum(x), axis=1) total_matches_per_league = grouped_success_rate.apply(lambda x: len(x["status"])).unstack("season") total_matches_per_league.fillna(0, inplace=true) total_matches_per_league["total"] = total_matches_per_league.apply(lambda x: sum(x), axis=1) homewins_rate_per_league = (homewins_per_league / total_matches_per_league).applymap(lambda x: round(x, 3))
as can see group dataframe, have create separate dataframe specific option , total. there way without creating 2 dataframe ?
i think easiest way add column dataframe first:
df_data['is_h'] = df_data['status'] == 'h' df_data.groupby(["season", "country", "league"])['is_h'].mean()