Saturday, February 2, 2013

No Interactions? OFAT is still a Bad Idea


Suppose you are trying to estimate the effects that 6 factors have on a response, and you know that none of the factors influences the effect of the others, so that a simple model like this

Y = b_1 X_1 + b_2 X_2 + b_3 X_3 + b_4 X_4 + b_5 X_5 + b_6 X_6    (1)

is the perfect choice. How should you get the data you need to estimate the b_i’s? You may be tempted to design a test that estimates each coefficient by changing one factor at a time (OFAT). There are no interaction terms (e.g. b_7 X_1 X_4) in equation 1, so there’s no need to perform any runs that change several of the X’s at once, right? Wrong.

Table 2 shows a 36-run OFAT design in which each of the 12 treatments is run three times. Table 1 shows a 32-run D-optimal design with no repeated runs. You might expect the replication in Table 2 to give you a better estimate of error, but you’d be wrong. In fact, as Figure 1 shows, the average standard error of the coefficient estimates for the model in equation 1 is significantly lower for the D-optimal design most of the time, even though it has fewer runs than the OFAT design.

Figure 1: 1000 fits of the model in equation 1 to synthetic data
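A comparison of this kind can be sketched in a few lines of Python. This is a minimal sketch, not the script behind Figure 1: the true coefficients (all 1), the noise model (independent normal errors with sigma = 1), the fixed seed, and every function name below are assumptions made for illustration. The two designs are rebuilt from Tables 1 and 2, and the model is fit by ordinary least squares with no intercept, as in equation 1.

```python
import math
import random

# Candidate-row labels of the 32 D-optimal runs, copied from Table 1.
DOPT_ROWS = [1, 2, 3, 5, 8, 9, 12, 14, 15, 17, 20, 22, 23, 26, 27, 29, 32,
             34, 35, 37, 40, 41, 44, 48, 49, 54, 55, 56, 59, 60, 62, 63]


def dopt_design():
    """Rebuild Table 1: label i is row i of a 2^6 factorial, X1 varying fastest."""
    return [[1 if ((i - 1) >> k) & 1 else -1 for k in range(6)]
            for i in DOPT_ROWS]


def ofat_design(repeats=3):
    """Rebuild Table 2: each factor set to +1, then -1, with the rest at 0."""
    return [[sign if j == k else 0 for j in range(6)]
            for _ in range(repeats) for sign in (1, -1) for k in range(6)]


def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination."""
    n = len(matrix)
    aug = [[float(v) for v in row] + [float(i == j) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        # Partial pivoting: bring the largest remaining entry onto the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def mean_estimated_se(design, n_fits=1000, sigma=1.0, seed=0):
    """Average, over n_fits synthetic data sets, of the mean coefficient SE."""
    rng = random.Random(seed)
    n, p = len(design), len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(p)]
           for i in range(p)]
    xtx_inv = invert(xtx)
    true_b = [1.0] * p  # assumed true coefficients, not taken from the post
    total = 0.0
    for _ in range(n_fits):
        # Synthetic response: equation 1 plus assumed normal noise.
        y = [sum(b * x for b, x in zip(true_b, row)) + rng.gauss(0.0, sigma)
             for row in design]
        xty = [sum(row[i] * yi for row, yi in zip(design, y)) for i in range(p)]
        b_hat = [sum(xtx_inv[i][j] * xty[j] for j in range(p)) for i in range(p)]
        rss = sum((yi - sum(b * x for b, x in zip(b_hat, row))) ** 2
                  for row, yi in zip(design, y))
        s2 = rss / (n - p)  # residual variance estimate
        total += sum(math.sqrt(s2 * xtx_inv[i][i]) for i in range(p)) / p
    return total / n_fits


print("D-optimal:", round(mean_estimated_se(dopt_design()), 3))
print("OFAT:     ", round(mean_estimated_se(ofat_design()), 3))
```

Changing `sigma`, `true_b`, or `seed` will change the printed numbers, but under these assumptions it should not change which design comes out ahead.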

Why does this happen? Each run in the D-optimal design contributes to the estimate of every term in the model. Each run in the OFAT design, however, contributes to the estimate of only a single term. The “error bars” for OFAT designs will almost always be significantly larger than those for D-optimal designs (other optimality criteria give largely the same improvement over OFAT in practice).
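The same point can be made with the usual least-squares variance formula. This is a sketch that assumes independent errors with a common variance σ², which the post does not state explicitly. X is the design matrix whose rows are the runs in Table 1 or Table 2.

```latex
\[
\mathrm{Var}(\hat{b}) = \sigma^2 (X^\top X)^{-1},
\qquad
\mathrm{SE}(\hat{b}_i) = \sigma \sqrt{\left[(X^\top X)^{-1}\right]_{ii}}
\]
% OFAT (Table 2): each column has three +1s, three -1s, and thirty 0s,
% and no run has more than one nonzero entry, so the columns are orthogonal.
\[
X^\top X = 6 I
\quad\Longrightarrow\quad
\mathrm{SE}(\hat{b}_i) = \frac{\sigma}{\sqrt{6}} \approx 0.41\,\sigma
\]
% D-optimal (Table 1): every entry is +1 or -1, so each diagonal entry is 32.
\[
(X^\top X)_{ii} = 32
\quad\Longrightarrow\quad
\mathrm{SE}(\hat{b}_i) \ge \frac{\sigma}{\sqrt{32}} \approx 0.18\,\sigma
\]
```

The lower bound for the D-optimal design is attained when its columns are exactly orthogonal and is approached when they are nearly so. Put differently, only 6 of the 36 OFAT runs carry any information about a given coefficient, while all 32 D-optimal runs do.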



Table 1: D-Optimal Design

      X1  X2  X3  X4  X5  X6
   1  -1  -1  -1  -1  -1  -1
   2   1  -1  -1  -1  -1  -1
   3  -1   1  -1  -1  -1  -1
   5  -1  -1   1  -1  -1  -1
   8   1   1   1  -1  -1  -1
   9  -1  -1  -1   1  -1  -1
  12   1   1  -1   1  -1  -1
  14   1  -1   1   1  -1  -1
  15  -1   1   1   1  -1  -1
  17  -1  -1  -1  -1   1  -1
  20   1   1  -1  -1   1  -1
  22   1  -1   1  -1   1  -1
  23  -1   1   1  -1   1  -1
  26   1  -1  -1   1   1  -1
  27  -1   1  -1   1   1  -1
  29  -1  -1   1   1   1  -1
  32   1   1   1   1   1  -1
  34   1  -1  -1  -1  -1   1
  35  -1   1  -1  -1  -1   1
  37  -1  -1   1  -1  -1   1
  40   1   1   1  -1  -1   1
  41  -1  -1  -1   1  -1   1
  44   1   1  -1   1  -1   1
  48   1   1   1   1  -1   1
  49  -1  -1  -1  -1   1   1
  54   1  -1   1  -1   1   1
  55  -1   1   1  -1   1   1
  56   1   1   1  -1   1   1
  59  -1   1  -1   1   1   1
  60   1   1  -1   1   1   1
  62   1  -1   1   1   1   1
  63  -1   1   1   1   1   1












Table 2: OFAT Design

      X1  X2  X3  X4  X5  X6
   1   1   0   0   0   0   0
   2   0   1   0   0   0   0
   3   0   0   1   0   0   0
   4   0   0   0   1   0   0
   5   0   0   0   0   1   0
   6   0   0   0   0   0   1
   7  -1   0   0   0   0   0
   8   0  -1   0   0   0   0
   9   0   0  -1   0   0   0
  10   0   0   0  -1   0   0
  11   0   0   0   0  -1   0
  12   0   0   0   0   0  -1
  13   1   0   0   0   0   0
  14   0   1   0   0   0   0
  15   0   0   1   0   0   0
  16   0   0   0   1   0   0
  17   0   0   0   0   1   0
  18   0   0   0   0   0   1
  19  -1   0   0   0   0   0
  20   0  -1   0   0   0   0
  21   0   0  -1   0   0   0
  22   0   0   0  -1   0   0
  23   0   0   0   0  -1   0
  24   0   0   0   0   0  -1
  25   1   0   0   0   0   0
  26   0   1   0   0   0   0
  27   0   0   1   0   0   0
  28   0   0   0   1   0   0
  29   0   0   0   0   1   0
  30   0   0   0   0   0   1
  31  -1   0   0   0   0   0
  32   0  -1   0   0   0   0
  33   0   0  -1   0   0   0
  34   0   0   0  -1   0   0
  35   0   0   0   0  -1   0
  36   0   0   0   0   0  -1









2 comments:

  1. Hello VC, I am a bit confused by the topic of the post (and also I am not good with R). Basic question: in the table 1 shouldn't these be 0 and 1, not -1 and 1?

    1. It's common practice to center the factor levels so a two-level factor takes values -1 and 1.

      With the AlgDesign function gen.factorial used in the script above you can change this with the 'center' option (center=FALSE instead of center=TRUE).

      My goal with the post certainly wasn't to confuse, so please ask more questions if you've got them. Anything in particular that is especially confusing?
