Performance - Pivot to binary matrix from categorical array


I have an array whose values belong to a set. I want to transform this array into a binary matrix in which each column represents one possible value of the set, and each row has a 1 in the column that matches the input array's value and 0 in all the others. I think the name for this is a binary pivot.

The input array is a column of a table.

Example of the input array (the previous example used capital letters, which led to misinterpretation):

'apple'
'banana'
'cherry'
'dragonfruit'
'apple'
'cherry'

So, in this example input we assume 4 different values: 'apple', 'banana', 'cherry' or 'dragonfruit', but in a real scenario there can be more than 4.

Example output matrix:

1 0 0 0
0 1 0 0
0 0 1 0
0 0 0 1
1 0 0 0
0 0 1 0

I have achieved the desired behavior, but I would like to know if there is a better way to perform this operation, either in a vectorized way (without a for-loop over each category) or using a built-in function.

    function [binmatrix, categs] = pivottobinarymatrix(input)
        categorizedinput = categorical(input);
        categs = categories(categorizedinput);
        binmatrix = zeros(size(categorizedinput, 1), size(categs, 1));
        for i = 1:size(categs, 1)
            binmatrix(:, i) = ismember(categorizedinput, categs(i));
        end
    end

For 50,000 entries and 9 categories, this ran in 0.075137 seconds.

Edit: I've improved the examples, because the previous examples led to misinterpretation.

I'm going to assume that your input array is a cell array of characters, like so:

    inputarray = {'apple', 'banana', 'cherry', 'dragonfruit', 'apple', 'cherry'};

You can convert the above into a numeric array using the third output of the `unique` function. What's great is that `unique` assigns each unique value an ID in sorted order, so if you have a cell array of characters, it respects the lexicographical ordering of the strings.
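To make the mapping concrete, here is a minimal sketch on the example input. It assumes only the standard three-output form of `unique`; the name `uniquevals` is illustrative. The `(:).'` reshape is there because the orientation of the third output has varied between versions, so forcing a row keeps the display predictable.

```matlab
%// Third output of unique: index of each element in the sorted list of unique values
inputarray = {'apple', 'banana', 'cherry', 'dragonfruit', 'apple', 'cherry'};
[uniquevals, ~, inputnum] = unique(inputarray);
disp(uniquevals);      %// sorted unique strings: apple, banana, cherry, dragonfruit
disp(inputnum(:).');   %// 1 2 3 4 1 3
```

Each number is the column that should hold the 1 for that row of the output matrix.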

Next, declare a matrix of zeros (like you did above), then use `sub2ind` to index into the matrix and set the corresponding values to 1.

Something like this. Bear in mind that I initialized the output differently. It's a trick I learned for allocating a matrix of zeroes quite fast. See here: Faster way to initialize arrays via empty matrix multiplication? (Matlab)
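A minimal sketch of that allocation trick in isolation. Assigning 0 to the bottom-right element of a variable that does not exist yet makes MATLAB create the whole array and pad it with zeros. The caveat is that if the variable already exists and is at least that large, nothing gets reset, which is why the timing script below clears `binmatrix` between runs.

```matlab
%// Allocate a zero matrix by assigning to its last element
clear binmatrix;        %// the trick assumes binmatrix does not exist yet
binmatrix(6, 4) = 0;    %// MATLAB grows the array to 6-by-4 and fills the rest with zeros
disp(size(binmatrix));  %// 6 4
```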

    inputarray = {'apple', 'banana', 'cherry', 'dragonfruit', 'apple', 'cherry'};
    [~,~,inputnum] = unique(inputarray);
    inputnum = inputnum.'; %// make compatible in dimensions
    binmatrix(numel(inputarray), max(inputnum)) = 0;
    binmatrix(sub2ind(size(binmatrix), 1:numel(inputarray), inputnum)) = 1;
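Since the question's input is a table column that the original function already converts with `categorical`, a variation worth sketching (it is not one of the methods timed below) skips `unique` entirely: `double` on a categorical array returns each element's index into `categories(...)`. This assumes every element has a defined category; an `<undefined>` element becomes `NaN`, and `sub2ind` would then throw an error.

```matlab
%// Sketch: category codes from a categorical column feed sub2ind directly
inputarray = {'apple', 'banana', 'cherry', 'dragonfruit', 'apple', 'cherry'};
c = categorical(inputarray(:));   %// column vector, like a table variable
codes = double(c);                %// index of each element into categories(c)
categs = categories(c);           %// sorted category names, one per output column
binmatrix = zeros(numel(c), numel(categs));
binmatrix(sub2ind(size(binmatrix), (1:numel(c)).', codes)) = 1;
disp(binmatrix);
```

Because `categories` comes back sorted by default, the column order matches the one produced by `unique`.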

Another method is to create a sparse matrix with the right row and column positions set to 1, then convert it to a full matrix.

something like:

    inputarray = {'apple', 'banana', 'cherry', 'dragonfruit', 'apple', 'cherry'};
    [~,~,inputnum] = unique(inputarray);
    inputnum = inputnum.'; %// make compatible in dimensions
    binmatrix = sparse(1:numel(inputarray), inputnum, 1, numel(inputarray), max(inputnum));
    binmatrix = full(binmatrix);
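As a quick sanity check, the following sketch runs both the `sub2ind` and `sparse` approaches on the example input and compares them with the expected output matrix from the question. The variable names `binsub`, `binsp` and `expected` are illustrative only.

```matlab
%// Both approaches should reproduce the question's example output
inputarray = {'apple', 'banana', 'cherry', 'dragonfruit', 'apple', 'cherry'};
[~, ~, inputnum] = unique(inputarray);
inputnum = inputnum(:).';                 %// row vector, whatever unique's orientation
n = numel(inputarray);

clear binsub;
binsub(n, max(inputnum)) = 0;             %// zero-filled allocation trick
binsub(sub2ind(size(binsub), 1:n, inputnum)) = 1;

binsp = full(sparse(1:n, inputnum, 1, n, max(inputnum)));

expected = [1 0 0 0; 0 1 0 0; 0 0 1 0; 0 0 0 1; 1 0 0 0; 0 0 1 0];
disp(isequal(binsub, expected) && isequal(binsp, expected));   %// 1
```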

Let's put this into a timing script. I've incorporated the two methods above, plus your old method, plus Divakar's (only the first method) and brodroll's (very ingenious, by the way) methods. For Divakar's and brodroll's methods, I have used the third output of `unique`, because the original inquiry had capital letters, which confused us all. Using the third output, you can convert the previous methods to the new specifications.

By the way, your example and your code are mismatched: your example sets each column, whereas your code indexes each row. For the timing tests, I'm going to transpose the result. I'm running MATLAB R2013a on Mac OS X 10.10.3 with 16 GB of RAM and an Intel i7 2.3 GHz processor. So:

    clear all; close all;

    %// Generate dictionary
    chars = {'apple', 'banana', 'cherry', 'dragonfruit'};

    rng(123);

    %// Generate 50000 random words
    v = randi(numel(chars), 50000, 1);
    inputarray = chars(v);
    [~,~,inputnum] = unique(inputarray);
    inputnum = inputnum.'; %// make compatible in dimensions

    %// Timing #1 - sub2ind
    tic;
    binmatrix(numel(inputarray), max(inputnum)) = 0;
    binmatrix(sub2ind(size(binmatrix), 1:numel(inputarray), inputnum)) = 1;
    t = toc;

    clear binmatrix;

    %// Timing #2 - sparse
    tic;
    binmatrix = sparse(1:numel(inputarray), inputnum, 1, numel(inputarray), max(inputnum));
    binmatrix = full(binmatrix);
    t2 = toc;

    clear binmatrix;

    %// Timing #3 - ismember and loop
    tic;
    binmatrix = zeros(numel(inputarray), numel(chars));
    for i = 1:size(binmatrix,1)
        binmatrix(i,:) = ismember(chars, inputarray(i));
    end
    t3 = toc;

    %// Timing #4 - bsxfun
    clear binmatrix;
    tic;
    binmatrix = bsxfun(@eq, inputnum', unique(inputnum)); %// changed to make dimensions match
    t4 = toc;

    clear binmatrix;

    %// Timing #5 - raw sub2ind
    tic;
    binmatrix(numel(inputarray), max(inputnum)) = 0;
    binmatrix( (inputnum-1)*size(binmatrix,1) + [1:numel(inputarray)] ) = 1;
    t5 = toc;

    fprintf('Timing using sub2ind: %f seconds\n', t);
    fprintf('Timing using sparse: %f seconds\n', t2);
    fprintf('Timing using ismember and loop: %f seconds\n', t3);
    fprintf('Timing using bsxfun: %f seconds\n', t4);
    fprintf('Timing using raw sub2ind: %f seconds\n', t5);

We get:

    Timing using sub2ind: 0.004223 seconds
    Timing using sparse: 0.004252 seconds
    Timing using ismember and loop: 2.771389 seconds
    Timing using bsxfun: 0.020739 seconds
    Timing using raw sub2ind: 0.000773 seconds

In terms of rank:

  1. raw sub2ind
  2. sub2ind
  3. sparse
  4. bsxfun
  5. OP's method
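The winning "raw sub2ind" method relies on MATLAB storing matrices in column-major order: element `(r, c)` of an `m`-row matrix sits at linear index `(c-1)*m + r`, so the indices can be computed directly without calling `sub2ind`. A small sketch, assuming the category indices `[1 2 3 4 1 3]` from the example input, showing that both give the same linear indices:

```matlab
%// Column-major linear indexing: element (r, c) of an m-by-k matrix is at (c-1)*m + r
inputnum = [1 2 3 4 1 3];                %// category index for each row (example input)
m = numel(inputnum);
k = max(inputnum);
rawidx = (inputnum - 1) * m + (1:m);     %// hand-computed linear indices
libidx = sub2ind([m k], 1:m, inputnum);  %// same indices via sub2ind
disp(rawidx);                            %// 1 8 15 22 5 18
disp(isequal(rawidx, libidx));           %// 1
```

The two results are identical. The speed difference is plausibly overhead inside `sub2ind` itself, such as its argument checking, but that is an interpretation of the timings, not something they measure directly.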