Octave has a great built-in function called
nchoosek
which makes shuffling a breeze. Called with scalar arguments it returns the value of the binomial coefficient, which is the number of ways you can choose k things from n things (k<=n). For fixed n, nchoosek is maximum when k=n/2. That is plotted below for n=2:24.Notice that's a log-log scale, as n increases it quickly becomes intractable to perform a complete permutation test.
Suppose we have two data vectors, and we want to know if they are from populations with different means. We can use the
normrnd()
function to draw 8 samples from a normal distribution with 0 mean and 8 samples from a normal distribution with a mean of 1.n=8;
x1 = normrnd(0,1,n,1);
x2 = normrnd(1,1,n,1);
perms = nchoosek( [x1;x2], n );
We stored all 12870 permutations of the vectors (16 choose 8) in the matrix
perms
. Now we use the built-in function complement
to find the difference of the means under each of those labellings.for(i=1:length(m1) )
comps(i,:) = complement( perms(i,:), [x1;x2] );
endfor
m1 = mean( perms, 2 ); % take row-wise averages, DIM=2
m2 = mean( comps, 2 ); % take row-wise averages, DIM=2
m_shuff = m1 - m2;
Now we can use the distribution of
m_shuff
to decide if the difference in the means of our two data vectors is significant.Octave script with all of these calculations: shuffle.m.
What about the bootstrap you say? That's really easy in Octave too:
ReplyDeletex = empirical_rnd( n, data );
Draws a bootstrap sample (that's with replacement) of size n from your data vector.