

  A Statistical Primer for
Understanding the Messiah Strad Research

by Michael R. Weeks

 

Almost a year ago, the editors of the Soundpost Online posed several questions regarding the research done by Topham and McCormick to date the Messiah Strad.  Since the questions have not been answered in the interim, I decided to take up the challenge. I am not an expert on dendrochronology (DDC), but I can comment on the basic statistics used in the article. My aim here is not to resolve the issue, but to explain some of the basic statistical concepts, so the readers can make a more informed analysis by themselves. Because this is intended as a “layman’s” guide, I will take some liberties with the mathematics to simplify the analysis. If you are a statistician by trade, please humor me.

I’ll take each question individually and not necessarily in the order in which they were posed by the editors.

1)  Explain the t-value.

To understand the t-value we must start with the basics. Many processes behave in what statisticians call a “normal” manner. Roughly, this means that the process has some mean (or average value) and the error associated with the mean clusters around the mean in a predictable distribution. Most of us recognize the distribution as a bell curve.

For example, if one flips a coin, the probability of getting heads is 50%. So if I flipped a coin 100 times, I should expect to come out with about 50 heads. However, we know intuitively that if I did several trials of 100 coin flips the number of heads would be something like 48, 54, 41, and 52.  I wouldn’t get 50 heads every time. However, if I did an infinite number of trials of 100 coin flips, the mean number of heads would approach 50. In my real world of finite attempts I would get a mean and another number called the standard deviation. Trials which resulted in a wide dispersion of results would have a large standard deviation, and trials in which most of the results were close to 50 would have a small standard deviation.
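
The coin-flip experiment described above can be simulated in a few lines. What follows is a minimal Python sketch and not part of the original discussion; the function name `simulate_heads`, the number of trials, and the seed are all illustrative assumptions.

```python
import random
import statistics

def simulate_heads(trials, flips=100, seed=0):
    """Return the number of heads seen in each trial of `flips` fair coin flips."""
    rng = random.Random(seed)  # fixed seed (an arbitrary choice) so runs are repeatable
    return [sum(rng.random() < 0.5 for _ in range(flips)) for _ in range(trials)]

counts = simulate_heads(trials=1000)
mean_heads = statistics.mean(counts)  # settles near 50 as the number of trials grows
sd_heads = statistics.stdev(counts)   # spread of the trial results around that mean
print(f"mean = {mean_heads:.2f}, standard deviation = {sd_heads:.2f}")
```

In theory the spread for 100 fair flips is 5 heads (the square root of 100 × 0.5 × 0.5). The standard deviation of 2 used in the worked examples in this article is simply a convenient round number.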

Now we can address the t-value. The t-value is simply a measure of how many standard deviations I am from the mean (in actuality, t-values are adjusted for sample size, but that doesn’t really concern us here). So let’s say that we have determined that our standard deviation is 2 (i.e. the typical distance of a result from the mean), and then we take a new sample and flip 54 heads. We then can calculate the t-value for this trial. We are 4 units away from our mean of 50 (54–50) and the standard deviation is 2. This means that in terms of standard deviations we are 2 deviations from the mean (4÷2). Thus, 2 is our t-value. Intuitively, we can see that the farther we are from the mean, the larger the t-value. One can look at a statistical chart to find probabilities associated with each t-value -- which leads us to the next question…
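
The arithmetic in this example can be written as a one-line function. This is only a sketch of the simplified definition used here (it ignores the sample-size adjustment mentioned above), and the name `t_value` is an illustrative choice.

```python
def t_value(observed, mean, std_dev):
    """Distance of `observed` from `mean`, measured in standard deviations."""
    return (observed - mean) / std_dev

# 54 heads, a mean of 50, and a standard deviation of 2
print(t_value(54, 50, 2))  # prints 2.0
```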

2)  Are the t-values ascribed to successful matches throughout the paper by Topham and McCormick consistent with the level of statistical probability generally accepted for a dendrochronological match?

The short answer is yes. Here's why... We talked about the mean and standard deviation in the previous example. However, for many more complicated examples, we don’t really know the mean. In that case we may make a hypothesis about the mean. Let’s continue with the coin-flip example. Now let’s say I have a “trick” coin. It is more likely to come up with heads than tails. I don’t know what the exact probability distribution is, so I make a guess (or as a statistician would say - I formulate a hypothesis). I guess that the “true” number of heads in a 100-flip trial will be 60 (if I take an infinite number of trials). Now I take a few trials and come up with an average of 55 heads with a standard deviation of 2 (or in this case our statistician friends would insist that we call the standard deviation a standard error). Now, I find that I am 2.5 standard deviations from my suggested mean of 60 ((60-55)÷2). If the mean were really 60, I could go to the textbook tables to find the probability of getting 55 in my trial. If I go to the chart I find that the chance of getting this result (55) is less than 1%. So is my mean really 60?  Probably not…
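
For readers without a textbook table at hand, the lookup can be approximated in Python. This sketch assumes a standard normal (bell) curve in place of the sample-size-adjusted t-table, so its numbers are approximations; the function name `upper_tail_probability` is an illustrative choice.

```python
import math

def upper_tail_probability(t):
    """Chance of landing at least `t` standard errors from the mean, in one
    direction, under a standard normal curve (an approximation to the t-table)."""
    return 0.5 * math.erfc(t / math.sqrt(2))

t = abs(55 - 60) / 2  # 2.5 standard errors from the hypothesized mean of 60
p = upper_tail_probability(t)
print(f"t = {t}, one-directional probability = {p:.4f}")
```

Under this approximation the one-directional probability is roughly 0.6%. Counting misses of the same size in either direction doubles it to about 1.2%, which is still small enough to cast doubt on a mean of 60.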

So when do I decide if my result is close enough? For various reasons, most researchers use a probability of 5% to test their hypotheses. This equates to a t-value of approximately  2. So in this case, if my result was between 56 and 64, I would  say that there is a good chance that 60 could be the mean.

The terminology that is used in these cases by statisticians is instructive. A researcher will analyze the data and then make one of two judgments. One outcome is that we can reject the hypothesis.  In the example above, the researcher would say that since the t‑value was greater than two, we can reject the hypothesis that the mean was 60 (at the 5% level). The second possible outcome is that we can “not reject” the hypothesis. If the result was 57, this would be the case. Notice that we don’t say that we “accept” the hypothesis. We can never really be sure whether the actual mean is 60; we just know that 60 is a “reasonable” possibility.
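
The two possible judgments can be sketched as a small decision rule. This is a simplified illustration that uses the rough cutoff of 2 from the text; the function name `judge_hypothesis` and its parameters are assumptions made for the example.

```python
def judge_hypothesis(observed, hypothesized_mean, std_error, critical_t=2.0):
    """Return 'reject' or 'cannot reject' using the rough 5% rule of |t| > 2."""
    t = abs(observed - hypothesized_mean) / std_error
    return "reject" if t > critical_t else "cannot reject"

# Hypothesized mean of 60 heads with a standard error of 2, as in the text
for heads in (55, 57, 64):
    print(heads, judge_hypothesis(heads, hypothesized_mean=60, std_error=2))
```

Note that the function never returns "accept": a result inside the 56 to 64 range only means that 60 remains a reasonable possibility.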

How does this apply to our question? We know intuitively that every tree does not grow in exactly the same manner. There is some amount of statistical variation among trees. This variation (or error) gives us a range of results from different trees. In this analysis we could simplify the variation by aggregating the results to an age range. In the Messiah example we can say that the mean is 1682 (the year that Topham and McCormick found as their experimental result for the age of the wood). If the wood really dates from later than 1716 (the attribution date), what is the probability we would have gotten a result of 1682? The answer is that the probability is very small indeed.

Unfortunately the analysis is not quite as straight-forward as our previous example, since the t-values come from a cross-analysis with other instruments and other dendrochronological data.  However, the t-values that are obtained with some of the cross-analysis are considered huge to statisticians. Textbook charts do not even list the probabilities for t-values greater than 5, and some of the Topham and McCormick results give t-values of 6.5 and 10. The probabilities associated with these t-values are astronomically small.
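
To give a feel for how quickly the probabilities shrink, the sketch below evaluates the tail of a standard normal curve at a few t-values. The normal curve is an assumption made for illustration only; since the T&M t-values come from cross-analysis, these numbers should be read as orders of magnitude and not as the probabilities attached to their results.

```python
import math

def upper_tail_probability(t):
    """One-directional tail probability under a standard normal curve."""
    return 0.5 * math.erfc(t / math.sqrt(2))

for t in (2, 5, 6.5, 10):
    print(f"t = {t:>4}: probability ~ {upper_tail_probability(t):.1e}")
```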

Now for those skeptics out there, there are some interesting results that point to the indeterminate nature of statistical dendrochronological analysis. These results are not from the Messiah, but from the “Milstein” Strad of 1716. Topham and McCormick (T&M) make the argument that the Messiah is genuine because it is from 1716 and matches so well with two instruments from 1717.  However, two of the worst t-values (1.2 and 0.9, much less than our stated requirement of 2) in the data come from the Milstein Strad’s cross-analysis with the same two instruments from 1717.

How can this be? Well, the date associated with the wood of the Milstein Strad is 1706. As you can see this is much closer to our attribution date of 1716.  Consequently the data are closer to the mean and we “cannot reject” the hypothesis that the wood was younger than 1716. Does this mean that the Milstein Strad is a fake? No, it just means that dendrochronology is inconclusive in this case when compared with the instruments used to make the Messiah comparison. We must look at other factors to make an attribution. The Milstein Strad does match another instrument attributed to Stradivari in 1716, but not the ones which provide such convincing evidence for the Messiah Strad.

In fact, we are probably exceedingly lucky that we got “good” statistical results for the Messie from this analysis.  If the tree had been cut closer in time to the making of the violin, we might still have had many more years of speculation.

3)  Would T&M’s results for the Messiah have produced a conclusive date without the cross-match to the Italian Instrument Master Chronology? In other words, would a date have emerged from the data in the absence of an anticipated date?

T&M would have been able to get a date without the instrument master chronology. They could have used the pure dendrochronological data from the nearby forests; however, it seems that the instrument master chronology probably improved their findings. Certainly there is a certain amount of “bootstrapping” to this process, but it seems inevitable in an investigation of this type. The question then becomes: if we have data from hundreds of years of violin research, why not use it?

4)  There were three questions regarding whether the research conforms to accepted dendrochronological practice.

As I stated earlier, I am not an expert in dendrochronological research; however, I have been through the academic peer review process. Given that this research was published in a leading academic journal, I feel pretty confident that it conforms to accepted practice. If not, I do not think it would have been able to successfully navigate the peer review maze.

At times academic communities do show biases and publish suspect data (some of the controversial global-warming data on both sides of that issue comes to mind), but I doubt that is the case here. While this may be a controversial issue in the violin world, I don’t think most readers of the Journal of Archaeological Science would find it controversial.

5)  I will add three other questions to the mix to clarify some of the concepts in this discussion.  What are the problems associated with widespread use of these techniques?

The problems that come from the use of this process arise from the statistical uncertainty associated with the process. Most processes that are evaluated statistically will arrive at a solution which includes a confidence interval.  This interval will give the range that equates to a 95% (the most commonly used interval) certainty of the solution.  So for the Messiah date we should have a date range associated with the results, not a single discrete date.  We’ll call the discrete date the DDC date (for dendrochronology). A graph of the results would look something like the example in Figure 1. A caution: the graphs presented here are only simplified depictions of the comparison process. They are not a representation of the complex comparisons that were actually done for the T&M paper.  Obviously, so much of the T&M data is embedded within the results that it would be impossible to completely reconstruct their results without access to all of the raw data and much dendrochronological expertise.

Figure 1: Confidence Interval Depiction for Messiah Strad (not to scale)

As you can see from the figure, the attribution date falls outside of the 95% confidence interval. This means that we can be at least 95% confident that the tree was cut prior to 1716. The problem is in calculating the confidence interval. In DDC, the date is based on a comparison with a known database. Since most of the time we know where the tree was harvested, we can get a good database. In the case of 300 year old violins, we are not positive where the tree originated. For this reason, T&M decided to use an aggregation of the data available. This was probably the best choice for an analysis, but it makes calculating a confidence interval nearly impossible. To analyze the results we must make a confidence interval approximation. In the case of the Messiah Strad, the t-values were so high we can be reasonably certain that the attribution date lies outside the 95% confidence interval.

The Milstein Strad, however, does not provide such a convincing case. Figure 2 provides a simplified representation of the results for the Milstein Strad. In this case it is more difficult to use the DDC date to justify the attribution date with 95% confidence. I make this assessment based on the low t-values associated with the two 1717 Strads that were used for the Messiah.

Figure 2: Confidence Interval Depiction for the Milstein Strad (not to scale)
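
The comparison depicted in the two figures can be sketched numerically. The DDC dates (1682 and 1706) and the attribution date (1716) come from the discussion above, but the standard error of 10 years is invented purely for illustration; it is not a figure from T&M, and as noted, a real confidence interval for this kind of data is very hard to calculate.

```python
def confidence_interval(ddc_date, std_error, z=1.96):
    """Approximate 95% interval around a DDC date (z = 1.96 for a normal curve)."""
    margin = z * std_error
    return ddc_date - margin, ddc_date + margin

def attribution_outside(ddc_date, attribution_date, std_error):
    """True if the attribution date falls outside the approximate 95% interval."""
    low, high = confidence_interval(ddc_date, std_error)
    return not (low <= attribution_date <= high)

HYPOTHETICAL_STD_ERROR = 10  # years; invented for illustration, NOT a T&M figure

for name, ddc_date in (("Messiah", 1682), ("Milstein", 1706)):
    outside = attribution_outside(ddc_date, 1716, HYPOTHETICAL_STD_ERROR)
    print(name, confidence_interval(ddc_date, HYPOTHETICAL_STD_ERROR),
          "attribution outside interval" if outside else "attribution inside interval")
```

With this made-up standard error, the 1716 attribution date falls outside the interval around 1682 but inside the interval around 1706, which mirrors the contrast between the two figures.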

6)  Could DDC results be misused?

As you can see from these examples, the possibility of the misuse of DDC dates is great.  Both of the tests result in DDC dates that support the attribution; however, only the Messiah results would stand up in a “scientific court of law.” The Milstein Strad results would not stand up to the same scrutiny. I fear, however, that unscrupulous sellers might use these inconclusive dates to justify a spurious attribution on other lesser violins which lack the detailed provenance of most Strads. The sales pitch would go something like this: “I got a DDC analysis and the wood dates from 1750; therefore, my attribution of 1755 is correct.” In reality, the statistical error associated with DDC results like these is probably so large that they cannot be relied upon. One must have some measure of the confidence interval to make a sound assessment.

A corollary to this is that someone could go “DDC shopping” to get the results they want. Since each analysis (different databases, different measurements, and different cross-analysis data) will result in a slightly different answer one could continue to do the analysis until one found the results one desired. If the results are conclusive (say a t-value which leads to a 99.9% confidence), then DDC shopping would be cost prohibitive. If the results were less conclusive, then a few further trials might produce the desired results.
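
The arithmetic behind “DDC shopping” can be sketched as follows. The example assumes that each repeated analysis is independent and has the same chance of producing the desired answer, which is a simplification; the per-analysis chances shown are hypothetical values chosen for illustration.

```python
def chance_of_desired_result(p_single, attempts):
    """Probability that at least one of `attempts` independent analyses
    gives the desired answer, when each has probability `p_single` of doing so."""
    return 1 - (1 - p_single) ** attempts

# Hypothetical per-analysis chances: a conclusive case (0.1%) and a murky one (30%)
for p_single in (0.001, 0.30):
    for attempts in (1, 3, 5):
        print(p_single, attempts, round(chance_of_desired_result(p_single, attempts), 3))
```

When the evidence is conclusive, repeating the analysis a handful of times barely moves the odds; when it is murky, a few attempts are enough to make the desired answer likely.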

7)  Where could the T&M analysis have gone wrong?

One place that the analysis could have gone wrong is purely “Murphy’s Law.” We said that we have achieved very good statistical results; however, the results from these experiments could be the “one in a million” chance that, given the data, we would achieve the results obtained by T&M. This is extremely unlikely, but so is winning the lottery. Have you bought a lottery ticket lately?

A second possible avenue for mistakes would be measurement error. These are very fine measurements and any malfunctioning equipment or improper technique would probably result in incorrect results.

In conclusion, I hope these elaborations will enable readers to tackle the research of Topham and McCormick and make their own decisions.  We will never have total certainty in matters such as this; we can only weigh the evidence for ourselves.

About the author: Michael Weeks is a doctoral candidate in management studies at Templeton College, University of Oxford. He is also a violinist and a member of the Oxford Symphony Orchestra, a local community orchestra. He can be reached at mikeweeks@aol.com.
