## Friday, August 14, 2015

### Tukey End Count (Better disc Part III)

You are definitely a nerd when you have a favorite statistical test.  The Tukey End Count is mine.

The advantages of the Tukey End Count (TEC) are that it does not require a calculator and it can provide useful confidence (I will use 80% confidence) with limited data.

#### Background:

Assume for a moment that you have two discs.  One we will call S for smooth and the other we will call T for textured.  Also assume that your athletes can make 10 throws in a practice without significant degradation in ability.  Ten throws total means five throws with each disc.  Alternating discs is not "random", but it is "fair" in case performance droops over the session.

Further, assume that both discs will produce identical distributions in the distance they can be thrown.  That is, they may look different but they will perform the same in the field.  This is the null hypothesis and certain types of data will force us to admit that this hypothesis is probably incorrect.

#### Example:

Make the ten throws, recording the distance and whether the disc was (S) or (T).  Then rank order the throws from longest to shortest.  You might get a list that looks like this:

Longest (T)-(T)-(S)-(S)-(S)-(T)-(T)-(S)-(T)-(S) Shortest distance.

Notice that no distances are listed...just the order.
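If the throws are logged as (distance, disc) pairs, the rank ordering is just a sort. Below is a minimal Python sketch; the function name `rank_order` is made up, and the distances are placeholder numbers chosen only so that the output matches the example string, not real measurements.

```python
def rank_order(throws):
    """Return disc labels ordered from longest to shortest throw.

    `throws` is a list of (distance, disc) pairs. Ties keep their recorded
    order, because Python's sort is stable even with reverse=True.
    """
    ranked = sorted(throws, key=lambda throw: throw[0], reverse=True)
    return "".join(disc for _, disc in ranked)

# Placeholder distances (not real measurements), in the order thrown,
# alternating S and T; chosen only to reproduce the example string.
example = [(42.8, "S"), (44.0, "T"), (42.1, "S"), (43.5, "T"), (41.9, "S"),
           (41.0, "T"), (39.7, "S"), (40.6, "T"), (38.5, "S"), (39.2, "T")]
print(rank_order(example))  # TTSSSTTSTS
```

Only the resulting string of letters is used from here on; the distances themselves drop out.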

If the discs are truly identical, then Smooth and Textured are equally likely to produce the longest throw.  Once the longest throw is set aside, nine throws remain and four of them are from the same disc, so there is a 4/9 chance that the second longest throw matches the longest.  Of the eight throws left after that, three match, so there is a 3/8 chance that the third longest throw matches as well.  Multiplying out, 1 × (4/9) × (3/8) = 1/6, about a 17% chance that random chance alone could have delivered three of a kind at the top.  So three-in-a-row gives us a confidence of 83%, slightly above our arbitrary 80% confidence limit.

The string shown above does not meet that standard: only the two longest throws are (T).
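The multiplication can be checked with exact fractions. A small Python sketch, assuming five throws per disc; `chance_top_run` is a hypothetical helper name, not anything standard.

```python
from fractions import Fraction

def chance_top_run(run, per_disc=5):
    """Chance that the `run` longest throws all come from one disc,
    assuming both discs have identical distance distributions."""
    total = 2 * per_disc
    p = Fraction(1)  # the longest throw can come from either disc
    for i in range(1, run):
        # i throws of that disc are already used: per_disc - i of it
        # remain among the total - i throws not yet ranked
        p *= Fraction(per_disc - i, total - i)
    return p

p = chance_top_run(3)
print(p, f"{float(p):.1%}", f"{float(1 - p):.1%}")  # 1/6 16.7% 83.3%
```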

#### But what about the other end of the string?

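Now bring the shortest throw into the count.  If the distributions for the two populations (S) and (T) are identical and the two longest throws are one species, the other eight throws contain three of that species and five of the other, so there is a 5/8 chance that the shortest throw is the other species.  1 × (4/9) × (5/8) = 5/18, about a 28% chance that random chance produced that pattern, or only about 72% confidence.  Still not there.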

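But what if the top two were (T) and the bottom two were (S); what confidence will that give?  Let's check it out.  Once the two longest (T) throws and the shortest (S) throw are set aside, the remaining seven throws include four (S), so there is a 4/7 chance that the second shortest throw is also (S).  1 × (4/9) × (5/8) × (4/7) = 10/63, about a 15.9% chance that the pattern was generated by random chance, or 84% confidence that the original hypothesis, "that both distributions are identical", is incorrect.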
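Because identical distributions make every arrangement of five (S) and five (T) labels equally likely (ignoring ties), the hand calculation can also be checked by brute force: list all 252 orderings and count the ones that match. A Python sketch; the helper names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations

def all_orderings(per_disc=5):
    """Yield every longest-to-shortest ordering of S and T labels."""
    total = 2 * per_disc
    for t_ranks in combinations(range(total), per_disc):
        chosen = set(t_ranks)  # ranks (0 = longest) that hold a T throw
        yield "".join("T" if r in chosen else "S" for r in range(total))

def two_and_two(order):
    """Top two throws share a disc and the bottom two are the other disc."""
    return order[0] == order[1] != order[-1] == order[-2]

orders = list(all_orderings())
hits = sum(two_and_two(o) for o in orders)
p = Fraction(hits, len(orders))
print(hits, len(orders), p, f"{float(p):.1%}")  # 40 252 10/63 15.9%
```

Counting every case avoids any slip in the chain of conditional fractions, at the cost of a loop that is only practical for small numbers of throws.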

#### Summary:

In a test of ten throws where five are (S) and five are (T), an end count where the three (or more) longest throws are all of one species provides more than 80% confidence that the two species are not the same.

Failing that,  an end count where the two longest throws are from one species and two (or more) of the shortest throws are from the other species is also sufficient to yield more than 80% confidence that the distributions are different.

Of course, it is possible to test different types of textured tapes and different patterns of application as (A)-(B) comparisons.
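The two summary rules can be written as a pair of checks on the ranked string. In this Python sketch (function names are hypothetical), the example string from earlier fails both, and enumerating all 252 orderings gives the chance that at least one rule passes by luck when the discs really are identical.

```python
from fractions import Fraction
from itertools import combinations

def three_at_top(order):
    """Rule 1: the three longest throws are all the same disc."""
    return order[0] == order[1] == order[2]

def two_and_two(order):
    """Rule 2: the two longest share a disc; the two shortest are the other disc."""
    return order[0] == order[1] != order[-1] == order[-2]

def passes_tec(order):
    """Apply rule 1, and fall back to rule 2 if rule 1 fails."""
    return three_at_top(order) or two_and_two(order)

print(passes_tec("TTSSSTTSTS"))  # False

# Every way to place five T throws among ten ranks (the rest are S).
orderings = ["".join("T" if rank in t_ranks else "S" for rank in range(10))
             for t_ranks in map(set, combinations(range(10), 5))]
lucky = Fraction(sum(map(passes_tec, orderings)), len(orderings))
print(lucky, f"{float(lucky):.1%}")  # 31/126 24.6%
```

That last number is worth noticing. Each rule on its own stays under the 20% false-alarm level, but falling back to the second rule when the first one fails lets luck in through either door: 62 of the 252 orderings pass one rule or the other, about 25%, which works out to roughly 75% confidence for the combined procedure.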

#### The psychology of athletic performance

Many athletes perform better when they "feel lucky".  Fiddling around with different surface treatments can convince the athlete that they have a "lucky" disc.

An additional factor is the Hawthorne effect: people often change their behavior simply because they know they are being observed.

#### A few observations

Using too low a "confidence" threshold increases the risk of adopting changes that add noise but no additional performance to the system.  Using too high a threshold may cause throwers to discard changes that have the potential to improve their distances.  I pulled 80% out of my back end because nobody will die if the wrong decision is made, nor will a company lose five million dollars.

Tukey End Count tests can be done with different numbers of throws.  The confidences have to be calculated for each of those specific cases.
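For other test sizes the counting argument generalizes. A Python sketch, assuming equal numbers of throws with each disc; `end_count_chance` is a hypothetical name. It counts the orderings in which one disc fills the `top` longest ranks and the other disc fills the `bottom` shortest ranks, then divides by all equally likely orderings.

```python
from fractions import Fraction
from math import comb

def end_count_chance(per_disc, top, bottom=0):
    """Chance, under identical distributions, that the `top` longest throws
    are all one disc and the `bottom` shortest throws are all the other disc."""
    if not (1 <= top <= per_disc and 0 <= bottom <= per_disc):
        raise ValueError("top must be 1..per_disc and bottom 0..per_disc")
    middle = 2 * per_disc - top - bottom  # ranks not pinned by the pattern
    # place the top disc's remaining throws among the middle ranks;
    # double it because either disc could be the one on top
    favorable = 2 * comb(middle, per_disc - top)
    return Fraction(favorable, comb(2 * per_disc, per_disc))

for n in (4, 5, 6):
    three = end_count_chance(n, 3)
    two_two = end_count_chance(n, 2, 2)
    print(n, f"{float(three):.1%}", f"{float(two_two):.1%}")
# 4 14.3% 17.1%
# 5 16.7% 15.9%
# 6 18.2% 15.2%
```

With five throws per disc it reproduces the hand-worked 1/6 and 10/63. It also shows that the rules of thumb do not carry over unchanged: at six throws per disc, three in a row at the top happens by luck about 18% of the time.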

Throwers who are more repeatable in their technique will have more separation of populations because there will be less random noise blurring the distributions.

Some treatments may show increased sensitivity to throwing technique.  That is, they may be less forgiving of poor throws but highly rewarding of superb throws.  That would show up in the TEC as (T) appearing at both ends of the distribution...in both the longest and shortest throws.  While such equipment fails the TEC, it may be strategically smart to use it in meets where a marginal thrower's only chance of placing is a great throw combined with the "spiky" performing equipment.

You still have to use your brain.  If a thrower PRs by a wide margin with a disc that fails the TEC, it might still be worth further investigation.

The TEC is a one-time deal.  There is a temptation to re-run it when a piece of equipment comes close to passing.  The issue is that if you run the test enough times, the equipment will eventually pass on random chance alone.
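How fast does re-running erode the protection? If each run is treated as an independent test with a chance p of passing by luck, the chance of at least one lucky pass in k runs is 1 - (1 - p)^k. A Python sketch using the 1/6 figure for the three-at-the-top rule; treating repeated runs as independent is an assumption, and `chance_of_lucky_pass` is a hypothetical name.

```python
from fractions import Fraction

def chance_of_lucky_pass(p_single, runs):
    """Chance that identical discs pass at least once in `runs` tests,
    treating each run as independent with pass chance `p_single`."""
    return 1 - (1 - p_single) ** runs

for runs in range(1, 5):
    p = chance_of_lucky_pass(Fraction(1, 6), runs)
    print(runs, f"{float(p):.1%}")  # 1 16.7%, 2 30.6%, 3 42.1%, 4 51.8%
```

By the fourth run, two identical discs are more likely than not to have "passed" at least once.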

Since there will be athlete-equipment interactions, the athlete may have to fine-tune their technique to the equipment.  As speculated earlier, improved boundary layer performance may come at the price of increased roll of the disc in flight.  The athlete will have to compensate for that.  What that means is that an athlete might get one result one week...(S) being better...but get a different result after more exposure to the "improved" disc.

Part of the price you pay for the simplicity of the TEC method is that you lose the ability to discriminate between small differences in performance.  You cannot use this test to pick fly specks out of the pepper.