I have an array of values that belong to a set. I want to transform the array into a binary matrix, where each column of the matrix represents one possible value of the set, and each row has a 1 in the column that matches the input array and 0 in the others. I think the name for this is a binary pivot.
The input array is a column of a table type.
Example of the input array (the previous example used capital letters, which led to misinterpretation):
'apple'
'banana'
'cherry'
'dragonfruit'
'apple'
'cherry'
So, in this example the input can assume 4 different values: 'apple', 'banana', 'cherry' or 'dragonfruit'. In a real scenario there can be more than 4.
Example of the output matrix:
1 0 0 0
0 1 0 0
0 0 1 0
0 0 0 1
1 0 0 0
0 0 1 0
I have achieved the desired behavior, but I would like to know if there is a better way to perform this operation, either in a vectorized way (without a for-loop over each category) or using a built-in function.
function [ binmatrix, categs ] = pivottobinarymatrix( input )
    categorizedinput = categorical(input);
    categs = categories(categorizedinput);
    % one row per input entry, one column per category
    binmatrix = zeros(size(input, 1), size(categs, 1));
    for i = 1 : size(categs, 1)
        binmatrix(:,i) = ismember(categorizedinput, categs(i));
    end
end
For 50,000 entries and 9 categories it performed in 0.075137 seconds.
Edit: I've improved the examples, because the previous examples led to misinterpretation.
I'm going to assume that your input array is a cell array of characters like so:
inputarray = {'apple', 'banana', 'cherry', 'dragonfruit', 'apple', 'cherry'};
You can convert the above into a numeric array using the third output of the unique function. What's great about unique is that it assigns each unique ID in sorted order, so if you have a cell array of characters, it respects the lexicographical ordering of the characters.
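As a quick illustration of that third output, here is a minimal sketch on the six-word example. The variable name categs is just my choice for the first output, and I force the IDs into a row vector because I don't want to rely on the orientation unique returns:

```matlab
% Minimal sketch: map each string to an integer ID with unique's third output.
inputarray = {'apple', 'banana', 'cherry', 'dragonfruit', 'apple', 'cherry'};
[categs, ~, inputnum] = unique(inputarray);
inputnum = inputnum(:).';   % force a row vector regardless of what unique returned
disp(categs);               % the distinct values, in sorted order
disp(inputnum);             % 1 2 3 4 1 3
```

Each ID is the position of the word in the sorted list of distinct values, which is exactly the column index needed in the binary matrix.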
Next, declare a matrix of zeros (like you did above) and use sub2ind to index into the matrix and set the values to 1.
Something like this. Bear in mind that I initialized the output differently. It's a trick I learned to allocate a matrix of zeroes quite fast. See here: Faster way to initialize arrays via empty matrix multiplication? (Matlab)
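To make the allocation trick concrete, here is a small sketch. It assumes the variable (called A here purely for illustration) does not exist yet; if it already exists, the assignment only sets or grows that existing array instead of creating a fresh one:

```matlab
% Assigning to the bottom-right element of a variable that does not exist yet
% makes MATLAB create the whole array and fill every other element with zeros.
clear A;              % make sure A is not already defined
A(3, 4) = 0;          % creates a 3-by-4 matrix of zeros
disp(size(A));        % 3 4
disp(any(A(:)));      % 0, i.e. every element is zero
```

That is why the snippets here clear binmatrix before reusing the name.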
inputarray = {'apple', 'banana', 'cherry', 'dragonfruit', 'apple', 'cherry'};
[~,~,inputnum] = unique(inputarray);
inputnum = inputnum.'; %// make compatible in dimensions
binmatrix(numel(inputarray), max(inputnum)) = 0;
binmatrix(sub2ind(size(binmatrix), 1:numel(inputarray), inputnum)) = 1;
Another method is to create a sparse array with the right row and column positions set to 1, then convert it to a full matrix with full.
Something like:
inputarray = {'apple', 'banana', 'cherry', 'dragonfruit', 'apple', 'cherry'};
[~,~,inputnum] = unique(inputarray);
inputnum = inputnum.'; %// make compatible in dimensions
binmatrix = sparse(1:numel(inputarray), inputnum, 1, numel(inputarray), max(inputnum));
binmatrix = full(binmatrix);
Let's put this in a timing script. I've incorporated the 2 methods above, plus your old method, plus Divakar's (only the first method) and brodroll's (very ingenious btw) method. For Divakar's and brodroll's methods, I have used the third output of unique, as your original inquiry had capital letters which confused us all. Using the third output you can convert our previous methods to your new specifications.
BTW, your example and your code are mismatched. Your example has each column as an index, while the code has each row. For the timing tests, I'm going to transpose your result.
I'm running MATLAB R2013a on Mac OS X 10.10.3 with 16 GB of RAM and an Intel i7 2.3 GHz processor. So:
clear all;
close all;

%// generate dictionary
chars = {'apple', 'banana', 'cherry', 'dragonfruit'};
rng(123);

%// generate 50000 random words
v = randi(numel(chars), 50000, 1);
inputarray = chars(v);
[~,~,inputnum] = unique(inputarray);
inputnum = inputnum.'; %// make compatible in dimensions

%// timing #1 - sub2ind
tic;
binmatrix(numel(inputarray), max(inputnum)) = 0;
binmatrix(sub2ind(size(binmatrix), 1:numel(inputarray), inputnum)) = 1;
t = toc;
clear binmatrix;

%// timing #2 - sparse
tic;
binmatrix = sparse(1:numel(inputarray), inputnum, 1, numel(inputarray), max(inputnum));
binmatrix = full(binmatrix);
t2 = toc;
clear binmatrix;

%// timing #3 - ismember and for loop
tic;
binmatrix = zeros(numel(inputarray), numel(chars));
for i = 1 : size(binmatrix,1)
    binmatrix(i,:) = ismember(chars, inputarray(i));
end
t3 = toc;

%// timing #4 - bsxfun
clear binmatrix;
tic;
binmatrix = bsxfun(@eq,inputnum',unique(inputnum)); %// changed to make dimensions match
t4 = toc;
clear binmatrix;

%// timing #5 - raw sub2ind
tic;
binmatrix(numel(inputarray), max(inputnum)) = 0;
binmatrix( (inputnum-1)*size(binmatrix,1) + [1:numel(inputarray)] ) = 1;
t5 = toc;

fprintf('timing using sub2ind: %f seconds\n', t);
fprintf('timing using sparse: %f seconds\n', t2);
fprintf('timing using ismember and loop: %f seconds\n', t3);
fprintf('timing using bsxfun: %f seconds\n', t4);
fprintf('timing using raw sub2ind: %f seconds\n', t5);
We get:
timing using sub2ind: 0.004223 seconds
timing using sparse: 0.004252 seconds
timing using ismember and loop: 2.771389 seconds
timing using bsxfun: 0.020739 seconds
timing using raw sub2ind: 0.000773 seconds
In terms of rank:
- Raw sub2ind
- sub2ind
- sparse
- bsxfun
- OP's method
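In case the "raw sub2ind" entry looks mysterious: it relies on MATLAB storing matrices in column-major order, so element (r, c) of a matrix with nRows rows sits at linear index (c - 1)*nRows + r. Here is a minimal sketch on the six-word example. The names nRows, nCols, rows and cols are only for illustration, and the IDs are typed in by hand rather than computed with unique:

```matlab
% Check that the hand-computed linear index matches sub2ind on the small example.
nRows = 6;
nCols = 4;
rows = 1:nRows;              % one row per input entry
cols = [1 2 3 4 1 3];        % category ID of each entry (apple = 1, banana = 2, ...)

linearBuiltin = sub2ind([nRows nCols], rows, cols);
linearRaw = (cols - 1) * nRows + rows;    % column-major linear index
disp(isequal(linearBuiltin, linearRaw));  % 1

binmatrix = zeros(nRows, nCols);
binmatrix(linearRaw) = 1;
disp(binmatrix);             % same matrix as in the question
```

Computing the index by hand gives the same positions as sub2ind, just without going through the function call.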