Tuesday, June 11, 2019

Sudden Death Testing Theory

Have you ever marveled at the reliability of most electric appliances?

Take a color TV for example. If you were to construct it from discrete resistors, capacitors, chokes and transistors, it would take millions of parts. If you add in lengths of conductors and "connections" it probably pushes ten million failure points.

And yet we buy one without testing it at the store to ensure it works. We plug it in and it works. Ten or twenty years later we get tired of looking at it and put it on the curb for recycling, where it is stolen by college kids who put it in their dorm room. The college kids spill beer on it for five years and then throw it in the dumpster out back, where it is stolen by freshmen for another five years of use.

How did this marvel achieve such astronomical levels of reliability? Ten million failure points, thirty years of abuse and nobody remarks on it because that level of performance is the expectation.
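To put a rough number on that expectation, here is a back-of-the-envelope sketch in Python. It assumes, purely for illustration, that the failure points fail independently of one another and that we want 99 percent of sets to survive the whole thirty years. Neither assumption comes from any real TV specification, and the function name is made up.

```python
import math

def allowed_failure_probability(failure_points, device_survival):
    """Largest per-point failure probability that still lets the whole
    device survive with probability `device_survival`, assuming every
    failure point is independent and any single failure kills the device."""
    # Solve (1 - q) ** failure_points == device_survival for q.
    # expm1 keeps precision when q is tiny.
    return -math.expm1(math.log(device_survival) / failure_points)

# Assumed figures: ten million failure points, 99% of sets still working.
q = allowed_failure_probability(10_000_000, 0.99)
print(f"Each failure point may fail with probability of about {q:.3e}")
```

Under those assumptions each individual failure point is allowed to fail about once in a billion over the life of the set.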

Sudden Death Testing
Traditional testing theory involves taking a discrete part and testing it "on the bench" until it fails. Then another part is tested. Then a third and so on. After each part is validated to certain levels of performance the parts are built into assemblies which are then tested on the bench or under accelerated conditions. If it passes the tests it is released for sale to the public.

Sometimes the assembly almost meets the criteria and is released anyway. That is an economics decision. Huge amounts of capital have been tied up building tools and advertising the product in the expectation it would be ready at a given time.

Sudden Death Testing is a little bit different.

In Sudden Death Testing the technician randomly scoops up a large number of parts. For the sake of this essay, let's say it is a hundred parts.

The technician puts them into a gang fixture where all hundred parts get cycled at the same time. The technician stops the test when the first part fails.

Then he reloads another hundred and tests them until the first part fails.

He continues to do this until he has enough failures to perform a Weibull analysis.
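The procedure can be sketched in a few lines of Python. This is a simulation under assumptions the essay does not state: part lives follow a two-parameter Weibull distribution, the fit uses median rank regression with Bernard's approximation, and the shape, scale and number of runs are invented values. The function names are hypothetical. It leans on one property of the Weibull distribution: the first failure among n identical parts is itself Weibull with the same shape and a scale divided by n^(1/shape), so a fit to the first failures can be scaled back to the whole population.

```python
import math
import random

def first_failure_times(runs, parts_per_run, shape, scale, rng):
    """Simulate sudden death runs: each run is stopped at its first failure."""
    return [
        min(rng.weibullvariate(scale, shape) for _ in range(parts_per_run))
        for _ in range(runs)
    ]

def fit_weibull(times):
    """Median rank regression. Returns (shape, scale) of the fitted line."""
    ts = sorted(times)
    n = len(ts)
    xs = [math.log(t) for t in ts]
    # Bernard's approximation to the median rank of the i-th failure.
    ys = [math.log(-math.log(1 - (i - 0.3) / (n + 0.4))) for i in range(1, n + 1)]
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x
    # ln(-ln(1 - F)) = shape * ln(t) - shape * ln(scale)
    return slope, math.exp(-intercept / slope)

def population_scale(first_failure_scale, shape, parts_per_run):
    """Scale the first-failure fit back up to the life of a typical part."""
    return first_failure_scale * parts_per_run ** (1 / shape)

# Invented numbers: 300 runs of 100 parts, true shape 2.0, true scale 1000 cycles.
rng = random.Random(42)
times = first_failure_times(300, 100, 2.0, 1000.0, rng)
shape, ff_scale = fit_weibull(times)
print("fitted shape:", round(shape, 2))
print("estimated population scale:", round(population_scale(ff_scale, shape, 100)))
```

The fixture is only ever tied up for the life of the weakest part in each load, yet the fit still recovers an estimate for the population as a whole.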

Why?
Because complex devices fail when the weakest component fails.

Unlike sports, the most interesting end of the curve is not how long the strongest man lasts but how soon the weakest sister leaves the contest.

There is no point in prolonging the test and tying up the fixture once the weakest sister has been identified and her time to failure recorded.

Running thousands or hundreds of thousands of components instead of tens of parts greatly increases the chances of finding the true weakest sisters.
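A quick Python sketch shows why the sample size matters so much. It assumes each part has the same small, independent chance of being a weak sister. The one-in-ten-thousand rate below is a made-up figure and the function name is hypothetical.

```python
import math

def chance_of_catching_a_weak_sister(weak_fraction, parts_tested):
    """Probability that at least one weak part is in the sample,
    assuming each part is weak independently with the same probability."""
    # 1 - (1 - p) ** n, written with log1p/expm1 so tiny p stays accurate.
    return -math.expm1(parts_tested * math.log1p(-weak_fraction))

# Assumed defect rate: one weak part in ten thousand.
for parts in (10, 1_000, 100_000):
    chance = chance_of_catching_a_weak_sister(1e-4, parts)
    print(f"{parts:>7} parts tested: {chance:.4f}")
```

With tens of parts the weak sister almost never shows up in the sample. With a hundred thousand parts she is all but certain to be there.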

Example
Consider a technician who has been tasked with studying capacitors, electrical gizmos that store electrical energy in a way analogous to how a spring stores mechanical energy.

Let us suppose that the primary failure mode is the device shorting out at the ends.

The technician is able to have the Integrated Circuit shop etch a million capacitors onto a single chip that is the same size and shape as the "real" chip the capacitors will be sprinkled across.

The technician orders a batch of one hundred of these chips for his first run.

One hundred chips times a million capacitors per chip equals one-hundred-million capacitors at a whack.

Now, let us imagine that, due to vagaries in the process, the etching is a little less crisp at the extreme corner of each of the four chips that sat at the corners of the block in which the chips were etched en masse.

Sudden Death Testing will find that weak capacitor.

Maybe four weak capacitors out of one-hundred-million is OK in most industries but it is not in consumer electronics. Remember, this device is going to get beer spilled on it, puked on, rained on and thrown in a trash dumpster a couple of times. It will have many unanticipated stresses placed on it.

The team of engineers will continue to tweak the geometry of the capacitor until even the four weakest sisters per hundred million meet the requirements. It will only make the others stronger.

Over-reaction?
Maybe. Maybe not.

Someday things might get sporty. When the chips are down and things really matter, you will want processes (physical, mental and moral) where even the weakest sisters can get the job done.

Avoid processes and people who mollycoddle and make excuses for the weak sisters.

Stop feeding the pigeons.