by Michael R. Weeks

Almost a year ago, the editors of the Soundpost Online posed several questions regarding the research done by Topham and McCormick to date the Messiah Strad. Since the questions have not been answered in the interim, I decided to take up the challenge. I am not an expert on dendrochronology (DDC), but I can comment on the basic statistics used in the article.

My aim here is not to resolve the issue, but to explain some of the basic statistical concepts so that readers can make a more informed analysis for themselves. Because this is intended as a “layman’s” guide, I will take some liberties with the mathematics to simplify the analysis. If you are a statistician by trade, please humor me. I’ll take each question individually, and not necessarily in the order in which the editors posed them.

1) Explain the t-value.

To understand the t-value we must start with the basics.
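For example, if one flips a fair coin, the probability of getting heads is 50%, so in a trial of 100 flips we would expect about 50 heads; that expected count is the mean.

Now we can address the t-value. The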
t-value is simply a measure of how many standard deviations a result lies from the mean (in actuality, t-values are adjusted for sample size, but that doesn’t really concern us here). So let’s say that we have determined that our standard deviation is 2 (i.e. the typical spread of results around the mean), and then we run a new trial and flip 54 heads. We can then calculate the t-value for this trial. We are 4 units away from our mean of 50 (54–50) and the standard deviation is 2. This means that, in terms of standard deviations, we are 2 deviations from the mean (4÷2). Thus, 2 is our t-value. Intuitively, we can see that the farther we are from the mean, the larger the t-value.
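
For readers who like to see the arithmetic spelled out, here is a minimal sketch of the calculation above. The function name `t_value` is my own label for illustration, and, like the discussion above, it ignores the sample-size adjustment that a real t-statistic includes.

```python
def t_value(observed, mean, std_dev):
    """Number of standard deviations between an observation and the mean.

    Simplified on purpose: no adjustment for sample size.
    """
    return (observed - mean) / std_dev

# Figures from the coin-flip example: 54 heads, mean of 50, standard deviation of 2.
print(t_value(54, 50, 2))  # 2.0
```
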
One can look at a statistical chart to find the probability associated with each t-value, which leads us to the next question…

2) Are the t-values ascribed to successful matches throughout the paper by Topham and McCormick consistent with the level of statistical probability generally accepted for a dendrochronological match?

The short answer is yes. Here's why...

We
talked about the mean and standard deviation in the previous example. However, for many more complicated examples, we don’t really know the mean. In that case we may make a hypothesis about the mean.

Let’s continue with the coin-flip example. Now let’s say I have a “trick” coin. It is more likely to come up heads than tails. I don’t know what the exact probability distribution is, so I make a guess (or, as a statistician would say, I formulate a hypothesis). I guess that the “true” number of heads in a 100-flip trial will be 60 (if I take an infinite number of trials). Now I take a few trials and come up with an average of 55 heads with a standard deviation of 2 (or in this case our statistician friends would insist that we call the standard deviation a standard error). Now I find that I am 2.5 standard deviations from my suggested mean of 60 ((60-55)÷2).

If the mean really were 60, I could go to the textbook tables to find the probability of getting 55 in my trial. If I go to the chart I find that the chance of getting this result (55) is less than 1%. So is my mean really 60? Probably not…

So when do I decide if my result is close enough? For various reasons, most researchers use a probability of 5% to test their hypotheses. This equates to a t-value of approximately 2.
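
The chart lookups mentioned above can be approximated in a few lines. This is only a sketch under a simplifying assumption: it uses the normal (bell) curve rather than the exact t-distribution found in textbook tables, which is reasonable only for large samples. The function name `tail_probability` is hypothetical. The one-sided figure asks how often a result would fall at least this far below the guess; the two-sided figure counts misses in either direction, which is how the conventional 5% cut-off is usually stated.

```python
import math

def tail_probability(t, two_sided=True):
    """Chance of a result at least |t| standard deviations from the mean.

    Assumption: normal approximation, not the exact t-distribution.
    """
    one_sided = 0.5 * math.erfc(abs(t) / math.sqrt(2))
    return 2 * one_sided if two_sided else one_sided

# Trick-coin example: guessed mean 60, observed average 55, standard error 2.
t = (60 - 55) / 2
print(t)                                               # 2.5
print(round(tail_probability(t, two_sided=False), 4))  # about 0.006, i.e. under 1%
print(round(tail_probability(t), 4))                   # about 0.012

# The 5% convention corresponds to a t-value close to 2.
print(round(tail_probability(1.96), 3))                # 0.05
```
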
So in this case, if my result was between 56 and 64, I would say that there is a good chance that 60 could be the mean.

The terminology that statisticians use in these cases is instructive. A researcher will analyze the data and then make one of two judgments. One outcome is that we can reject the hypothesis. In the example above, the researcher would say that since the t-value was greater than two, we can reject the hypothesis that the mean was 60 (at the 5% level). The second possible outcome is that we can “not reject” the hypothesis. If the result was 57, this would be the case. Notice that we don’t say that we “accept” the hypothesis. We can never really be sure whether the actual mean is 60; we just know that 60 is a “reasonable” possibility.

How does this apply to our question? We
know intuitively that not every tree grows in exactly the same manner. There is some amount of statistical variation among trees. This variation (or error) gives us a range of results from different trees. In this analysis we could simplify the variation by aggregating the results to an age range. In the Messiah example we can say that the mean is 1682 (the year that Topham and McCormick found as their experimental result for the age of the wood). If the wood really dates from later than 1716 (the attribution date), what is the probability we would have gotten a result of 1682? The answer is that the probability is very small indeed.

Unfortunately the analysis is not quite as straightforward as our previous example, since the t-values come from a cross-analysis with other instruments and other dendrochronological data. However, the t-values that are obtained with some of the cross-analysis are considered huge by statisticians. Textbook charts do not even list the probabilities for t-values greater than 5, and some of the Topham and McCormick results give t-values of 6.5 and 10.
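
To get a rough feel for the scale involved, the sketch below converts a few t-values into "1 in N" odds. It assumes a plain normal curve, which is my simplification for illustration only; the t-values in dendrochronology are derived from ring-width correlations, so these figures indicate orders of magnitude and are not the probabilities T&M would report. The function name `two_sided_tail` is hypothetical.

```python
import math

def two_sided_tail(t):
    """Chance of landing at least |t| standard deviations from the mean,
    in either direction, under a normal-curve approximation."""
    return math.erfc(abs(t) / math.sqrt(2))

# 2 is the usual threshold; 6.5 and 10 are t-values quoted from the T&M results.
for t in (2, 5, 6.5, 10):
    p = two_sided_tail(t)
    print(f"t = {t}: p = {p:.1e}, or about 1 in {1 / p:,.0f}")
```
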
The probabilities associated with these t-values are astronomically small.

Now for those skeptics out there, there are some interesting results that point to the indeterminate nature of statistical dendrochronological analysis. These results are not from the Messiah, but from the “Milstein” Strad of 1716. T&M make the argument that the Messiah is genuine because it is from 1716 and matches so well with two instruments from 1717. However, two of the worst t-stats in the data (1.2 and 0.9, much less than our stated requirement of 2) come from the Milstein Strad’s cross-analysis with the same two instruments from 1717. How can this be? Well, the date associated with the wood of the Milstein Strad is 1706. As you can see, this is much closer to our attribution date of 1716. Consequently the data are closer to the mean and we “cannot reject” the hypothesis that the wood was younger than 1716.

Does this mean that the Milstein Strad is a fake? No, it just means that dendrochronology is inconclusive in this case when compared with the instruments used to make the Messiah comparison. We must look at other factors to make an attribution. The Milstein Strad does match another instrument attributed to Stradivari in 1716, but not the ones which provide such convincing evidence for the Messiah Strad. In fact, we are probably exceedingly lucky that we got “good” statistical results for the Messie from this analysis. If the tree had been cut closer in time to the making of the violin, we might still have had many more years of speculation.

3)
Would T&M’s results for the Messiah have produced a conclusive date without the cross-match to the Italian Instrument Master Chronology? In other words, would a date have emerged from the data in the absence of an anticipated date?

T&M would have been able to get a date without the instrument master chronology. They could have used the pure dendrochronological data from the nearby forests; however, it seems that the instrument master chronology probably improved their findings. Certainly there is a certain amount of “bootstrapping” to this process, but it seems inevitable in an investigation of this type. The question then becomes: if we have data from hundreds of years of violin research, why not use it?

4)
There were three questions regarding whether the research conforms to accepted dendrochronological practice.

As I stated earlier, I am not an expert in dendrochronological research; however, I have been through the academic peer review process. Given that this research was published in a leading academic journal, I feel pretty confident that it conforms to accepted practice. If not, I do not think it would have been able to successfully navigate the peer review maze. At times academic communities do show biases and publish suspect data (some of the controversial global-warming data on both sides of that issue comes to mind), but I doubt that is the case here. While this may be a controversial issue in the violin world, I don’t think most readers of the Journal of Archaeological Science would find it controversial.

5)
I will add three other questions to the mix to clarify some of the concepts in this discussion. What are the problems associated with widespread use of these techniques?

The problems that come from the use of this process arise from the statistical uncertainty associated with it. Most processes that are evaluated statistically will arrive at a solution which includes a confidence interval. This interval gives the range that equates to a 95% (the most commonly used level) certainty of the solution. So for the Messiah date we should have a date range associated with the results, not a single discrete date.
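
As a small illustration of what such an interval looks like, the sketch below reuses the trick-coin numbers from earlier (an average of 55 heads with a standard error of 2), since the article does not have a standard error for the Messiah date to plug in. The function name `confidence_interval` and the rounded multiplier of 2 are simplifying assumptions on my part.

```python
def confidence_interval(estimate, std_error, t_crit=2.0):
    """Approximate 95% interval: the estimate plus or minus about two standard errors."""
    margin = t_crit * std_error
    return estimate - margin, estimate + margin

# Trick-coin trials: average of 55 heads, standard error of 2.
low, high = confidence_interval(55, 2)
print(low, high)          # 51.0 59.0
print(low <= 60 <= high)  # False: the guessed mean of 60 falls outside the interval
```

The same logic applies to dates: if a proposed year falls outside the interval around the measured year, the data argue against it at roughly the 95% level.
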
We’ll call the discrete date the DDC date (for dendrochronology). A graph of the results would look something like the example in Figure 1. A caution: the graphs presented here are only simplified depictions of the comparison process. They are not a representation of the complex comparisons that were actually done for the T&M paper. Obviously, so much of the T&M data is embedded within the results that it would be impossible to completely reconstruct their results without access to all of the raw data and much dendrochronological expertise.

Figure 1: Confidence Interval Depiction for Messiah Strad (not to scale)

As you can see from the figure, the attribution date falls outside of the 95% confidence interval. This means that we can be at least 95% confident that the tree was cut prior to 1716.

The Milstein Strad, however, does not provide such a convincing case. Figure 2 provides a simplified representation of the results for the Milstein Strad. In this case it is more difficult to use the DDC date to justify the attribution date with 95% confidence. I make this assessment based on the low t-values associated with the two 1717 Strads that were used for the Messiah.

Figure 2: Confidence Interval Depiction for the Milstein Strad (not to scale)

6) Could DDC results be misused?

As
you can see from these examples, the possibility of the misuse of DDC dates is great. Both of the tests result in DDC dates that support the attribution; however, only the Messiah results would stand up in a “scientific court of law.” The Milstein Strad results would not stand up to the same scrutiny. I fear, however, that unscrupulous sellers might use such inconclusive dates to justify a spurious attribution on other lesser violins which lack the detailed provenance of most Strads. The sales pitch would go something like this: “I got a DDC analysis and the wood dates from 1750; therefore, my attribution of 1755 is correct.” In reality, the statistical error associated with DDC results like these is probably so large that they cannot be relied upon. One must have some measure of the confidence interval to make a sound assessment.

A corollary to this is that someone could go “DDC shopping” to get the results they want. Since each analysis (different databases, different measurements, and different cross-analysis data) will result in a slightly different answer, one could continue to do the analysis until one found the results one desired. If the results are conclusive (say a t-value which leads to a 99.9% confidence), then DDC shopping would be cost prohibitive.
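
A back-of-the-envelope sketch shows why. It assumes, purely for illustration, that each repeat analysis is independent and has a fixed chance of producing the result the shopper wants: 0.1% when the evidence against it is at the 99.9% level, and a hypothetical 5% for a weaker case. Real repeat analyses share data and would not be fully independent, and the function name `analyses_needed` is my own.

```python
import math

def analyses_needed(p_desired, target=0.5):
    """Independent analyses needed before the chance of at least one
    'desired' result reaches `target`, given a per-analysis chance p_desired."""
    return math.ceil(math.log(1 - target) / math.log(1 - p_desired))

print(analyses_needed(0.001))  # 693 analyses for an even chance at the 99.9% level
print(analyses_needed(0.05))   # 14 analyses when each has a 5% chance
```
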
If the results were less conclusive, then a few further trials might produce the desired results.

7) Where could the T&M analysis have gone wrong?

One place that the analysis could have gone wrong is purely “Murphy’s Law.” We said that we have achieved very good statistical results; however, the results obtained by T&M could simply be the “one in a million” outcome that arises by chance. This is extremely unlikely, but so is winning the lottery. Have you bought a lottery ticket lately?

A second possible avenue for mistakes would be measurement error. These are very fine measurements, and any malfunctioning equipment or improper technique would probably produce incorrect results.

In conclusion, I hope these elaborations will enable readers to tackle the research of Topham and McCormick and make their own decisions. We will never have total certainty in matters such as this; we can only weigh the evidence for ourselves.

About
the author: Michael Weeks is a doctoral candidate in management studies at Templeton College, University of Oxford. He is also a violinist and a member of the Oxford Symphony Orchestra, a local community orchestra. He can be reached at mikeweeks@aol.com.