A good, well-conceived control underlies any business insight. It may be so obvious that you are not even aware of it, but it will be there.

Why have a control?
If the insight is in the differences, then you need a norm to be different from. Having a good control is about providing that norm.

Controls can be manufactured in a variety of ways, both before and after the data has been gathered. As with anything else, a little preparation can pay huge dividends here. If a control is factored into the data gathering, cleaner data is created and the insights produced are far more likely to be valid.

Ways of creating a control

There is a (literally) infinite number of ways of creating a control. A few are listed below, but remember that the objective is to ensure your control matches your experiment as closely as possible. If your control is atypical, the business insight you gain will be just plain wrong.

Some straightforward ways of creating a control include:

  • By Time
  • By Location
  • By Demographic
  • By Exposure
  • Self Selection
  • By Specific

Sample Bias
Some problems have specific criteria that will immediately suggest themselves. Some criteria are easier than others to set aside as a control while creating or gathering the data. Often these are very useful. Sometimes they are dangerous. It is worth examining why they are easier to separate than other data. If there is a fundamental difference that has knock-on effects, you may end up with contaminated data.

As an example, if you were sampling apples to see how productive an orchard should be, one simplified way to do it might be as follows:

  1. Count the apples on a selection of trees
  2. Measure the size of some apples
  3. Calculate the average size of an apple
  4. Multiply the number of apples by the average size of the apples
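The four steps above can be sketched in a few lines of Python. This is only an illustration: the function name, the figures, and the choice to treat "size" as weight in kilograms are all assumptions, as is the extra step of scaling the sampled trees up to the whole orchard.

```python
from statistics import mean

def estimate_harvest(apple_counts, apple_weights_kg, total_trees):
    """Rough harvest estimate (kg) from sampled trees and sampled apples.

    All names and units here are hypothetical, chosen for illustration.
    """
    # Step 1: apples counted on each sampled tree, averaged and
    # scaled up to the whole orchard (an added assumption).
    estimated_apples = mean(apple_counts) * total_trees
    # Steps 2 and 3: average size of the apples that were measured.
    average_weight = mean(apple_weights_kg)
    # Step 4: number of apples multiplied by average apple size.
    return estimated_apples * average_weight

# Made-up sample: three trees counted, three apples weighed.
counts = [100, 120, 80]
weights_kg = [0.15, 0.20, 0.25]
estimate = estimate_harvest(counts, weights_kg, total_trees=50)
print(f"Estimated harvest: {estimate:.0f} kg")
```

Every input to this calculation is a sample, so any bias in which trees are counted or which apples are weighed flows straight through to the final figure.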

There are two different openings for sampling bias here. The first is in choosing the trees. You could just sample trees in an easily accessible area. These might be exposed to more or less sun, be more or less sheltered from the wind, and stand on better or worse soil than the other trees.
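One common way to avoid picking only the convenient trees is to let a random number generator choose them. The sketch below assumes every tree in the orchard has been given a number; the function name and the figures are hypothetical.

```python
import random

def choose_trees(total_trees, sample_size, seed=None):
    """Pick tree numbers at random so accessibility plays no part.

    Assumes trees are numbered 0 .. total_trees - 1 (hypothetical scheme).
    """
    rng = random.Random(seed)  # a seed makes the selection repeatable
    # sample() draws without replacement, so no tree is picked twice.
    return sorted(rng.sample(range(total_trees), sample_size))

trees_to_count = choose_trees(total_trees=200, sample_size=10, seed=1)
print(trees_to_count)
```

Random selection does not guarantee a representative sample, but it removes the link between how easy a tree is to reach and whether it gets counted.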

Let us assume you have selected the trees as a representative sample. Now we gather the apples. The easiest thing is to pick ones that are within easy reach. They all look the same, but perspective does funny things. The sample could be out by a factor of around three just from sampling low down the tree. Maybe the higher apples are more exposed to pilfering by birds and never grow so large. Maybe they are less exposed to grazing by local children (scrumping) and so grow much bigger.

Whatever the case, they are from the same tree but they may not be representative at all. An apple that is 20% smaller across the middle is about half as big by volume, so the apple harvest estimate is out by a factor of two. A harvest that yields only half the fruit you estimated illustrates the importance of accuracy, and the only way to achieve it is to avoid sampling bias. A mistake in your control can have a huge impact upon your final answers.
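The "20% smaller is about half as big" figure follows from volume growing with the cube of width. A minimal check, assuming all apples are the same shape so that only their scale differs (the function name is hypothetical):

```python
def volume_ratio(diameter_ratio):
    """Relative volume of an apple whose width is scaled by diameter_ratio.

    Assumes apples share the same shape, so volume scales with width cubed.
    """
    return diameter_ratio ** 3

# An apple 20% smaller across the middle has 0.8 times the width.
ratio = volume_ratio(0.8)
print(f"Relative volume: {ratio:.2f}")  # roughly 0.51, i.e. about half
```

A modest error in measured width therefore becomes a large error in the estimated harvest.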