Let's assume I have a data frame:
    import pandas as pd

    df = pd.DataFrame({'label': [0, 1, 2, 0, 1, 2],
                       'cat_col': [1, 1, 2, 2, 3, 3]})

       cat_col  label
    0        1      0
    1        1      1
    2        2      2
    3        2      0
    4        3      1
    5        3      2
I want to transform the data frame into the following:
    cat_col, label, count_when_label_is_0, count_when_label_is_1, count_when_label_is_2
    1        0      1,                     1,                     0
    1        1      1,                     1,                     0
    ...
So I add one column for each label value (the label is multinomial), and in each row I put the number of rows with that label value whose cat_col equals the row's cat_col. What I have works but is slow:
    size = df[['cat_col', 'label']].groupby(['cat_col', 'label']).size()

    def get_size(cat_val, label_val):
        # size is indexed by (cat_col, label); a missing pair counts as 0
        if label_val in size[cat_val]:
            return size[cat_val][label_val]
        return 0

    for label_val in range(9):  # 9 classes in the multinomial label
        df['new_col_' + str(label_val)] = df['cat_col'].apply(
            lambda cat_val: get_size(cat_val, label_val))
You can use pivot_table:
    In [11]: df.pivot_table(index="cat_col", columns="label", aggfunc=len, fill_value=0)
    Out[11]:
    label    0  1  2
    cat_col
    1        1  1  0
    2        1  0  1
    3        0  1  1
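The table above has one row per cat_col value, while the question asks for the counts on every original row. A minimal sketch of that last step follows. It swaps in pd.crosstab to build the same count table and then joins it back on cat_col. The column prefix is taken from the question's desired output; the names counts and result are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'label': [0, 1, 2, 0, 1, 2],
                   'cat_col': [1, 1, 2, 2, 3, 3]})

# Count rows per (cat_col, label) pair; pairs that never occur become 0.
counts = pd.crosstab(df['cat_col'], df['label'])

# Rename the label columns 0, 1, 2 to count_when_label_is_0, ... (prefix
# chosen to match the question's desired output).
counts = counts.add_prefix('count_when_label_is_')

# Attach the counts to every original row by matching the cat_col column
# against the index of the count table.
result = df.join(counts, on='cat_col')
print(result)
```

Only label values that actually occur in the data get a column this way. If a fixed set of classes is needed (the question mentions 9), reindexing the columns of the count table with fill_value=0 before renaming would be one way to add the missing ones.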