diversity indexes

Mon Mar 11 08:23:24 EST 1996

At 10:40 AM 3/11/96 gmt+0200, you wrote: 
>Dear all, 
>I'm doing research comparing coral reef fishes in two
>areas, using a video camera.
>What I need to know, urgently, is how to compare, statistically,
>diversity indexes to see if the differences between them are significant.
>Are there any scientific papers on similar studies? 
>I would appreciate it if you would help me. Thanks! 
>Sorry for any duplication! 
> 
>-- 
>amorim 

I'm sure that I'll be corrected quickly by this list's membership if I'm
wrong (and I'll start off by saying that I'm by no means a statistician),
but I researched similarity and diversity indices in depth when I was still
in my graduate program back in the early 1980s.  I've since held the view
that statistical inferences between indices are usually meaningless.
Diversity indices (or any index, for that matter, e.g., habitat suitability
indices) attempt to represent, in a single value, the mass of "information"
that exists in large, often multidimensional (species, space and time) data
sets, and they lose much of that information in the process.

An example: two or more diversity index values can be identical, while the
underlying data on which each is based are entirely different.  I won't go
into the mathematics here, but suffice it to say that you can generate
exactly the same diversity index value (e.g., Shannon and Weaver's H') from
any number of data sets, each differing from the others in the number of
species, the number of individuals of each species, and even the species
composition of the community.  Since the index values can be identical, a
statistical test comparing them will find no significant difference between
the data sets, even though the data sets themselves can be widely divergent.
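To make this concrete, here is a small Python sketch (my own illustration,
with made-up counts rather than data from any survey) that computes Shannon
and Weaver's H' = -sum(p_i ln p_i) for two hypothetical sites.  Site A has
four equally common species (100 fish); site B has five species, one of
them dominant (80 fish); the two sites share no species at all.  Both come
out at H' = ln 4, about 1.3863.

```python
import math

def shannon_h(counts):
    """Shannon-Weaver H' = -sum(p_i * ln p_i), from a dict of species -> count."""
    n = sum(counts.values())
    return -sum((c / n) * math.log(c / n) for c in counts.values() if c > 0)

# Hypothetical counts (illustration only); the two sites share no species.
site_a = {"sp1": 25, "sp2": 25, "sp3": 25, "sp4": 25}             # 4 species, perfectly even, N = 100
site_b = {"sp5": 40, "sp6": 10, "sp7": 10, "sp8": 10, "sp9": 10}  # 5 species, one dominant, N = 80

h_a = shannon_h(site_a)
h_b = shannon_h(site_b)
print(f"H'(A) = {h_a:.4f}, H'(B) = {h_b:.4f}")  # both print 1.3863 (= ln 4)
print("identical:", math.isclose(h_a, h_b))     # True
```

Richness, total abundance, evenness and species list all differ, yet the
single index value cannot tell the two sites apart.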

I also recall reading once that species diversity indices in particular are
often highly correlated with each other, simply because of the formulas
used to calculate them.  I apologize; I don't recall offhand the paper that
described this "phenomenon (?)," but I'd be happy to look it up in my
collection if you're interested.
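As a rough illustration of why this can happen (not a reconstruction of
that paper), the Python sketch below computes two common indices, Shannon's
H' and the Gini-Simpson index 1 - sum(p_i^2), for a series of hypothetical
10-species communities whose abundances follow a geometric series, p_i
proportional to k^i.  The geometric-series model and the choice of
Gini-Simpson as the second index are my assumptions for the example.  Both
indices are functions of the same proportions p_i, so as k moves from
strong dominance (0.1) toward near-evenness (0.9) they rise together and
their correlation across these communities is very high.

```python
import math

def geometric_community(k, s=10):
    """Hypothetical community of s species with p_i proportional to k**i (i = 0..s-1)."""
    weights = [k ** i for i in range(s)]
    total = sum(weights)
    return [w / total for w in weights]

def shannon(p):
    """Shannon's H' = -sum(p_i ln p_i)."""
    return -sum(x * math.log(x) for x in p if x > 0)

def gini_simpson(p):
    """Gini-Simpson index 1 - sum(p_i^2)."""
    return 1.0 - sum(x * x for x in p)

def pearson(xs, ys):
    """Plain Pearson product-moment correlation."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)

# k = 0.1 is strong dominance by one species; k = 0.9 is close to perfectly even.
ratios = [0.1 * j for j in range(1, 10)]
h = [shannon(geometric_community(k)) for k in ratios]
d = [gini_simpson(geometric_community(k)) for k in ratios]
for k, hv, dv in zip(ratios, h, d):
    print(f"k = {k:.1f}   H' = {hv:.3f}   1 - sum p^2 = {dv:.3f}")
print(f"Pearson r = {pearson(h, d):.3f}")
```

Real survey data would scatter more than this idealized series, but the
shared dependence on the p_i is the point.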

Combine this problem with the failure of most data commonly encountered in
ecological field studies to even closely satisfy the assumptions of
classical statistical methods and probability theory (missing data, mixed
data [binary, rank, quantitative], independent and normal error
distributions, homogeneity of variances, additivity of effects, etc.), and I
think one would be hard pressed to defend any statistically "significant
differences" between two or more indices.

I said "usually" above; there may be limited instances where indices can be
used as predictor variables, IF considerable thought is given, a priori, to
the underlying hypotheses and to the potential relationships among the
variables.  Assume, for example, that you have an impacted area and a
control area, and you want to develop an index that "describes" that
impact.  After sampling both areas and using the log-transformed species
abundances as the variables in a discriminant analysis, the linear additive
discriminant function of those log-transformed abundances would represent
(by definition) the best linear predictor of that impact, and the most
efficient test of the null hypothesis: "no impact."

The definition of impacted and control areas could be incorporated into the
analysis by first performing a cluster analysis to group the samples into
faunally homogeneous assemblages.  If a two-group solution could somehow be
interpreted as representing the impacted and control groups of samples
that are spatially contiguous, discriminant analysis could be used to
define an index of the faunal differences between those two groups.  If,
however, the two groups were *derived* by cluster analysis, then no
significance test would be appropriate, since the groups were created from
the outset so as to maximize the differences on the discriminating
variables.  If the groups were defined a priori, on the other hand, then
tests of the null hypothesis of no difference in species composition
between impacted and control areas may be appropriate.

My suggestion: first, know what question you are asking, and second, try
to use a statistical method that retains more of the underlying biological
information.
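For anyone who wants to try the impacted-versus-control idea, here is a
minimal Python sketch built on several assumptions of my own: the counts
are hypothetical, abundances are transformed as log(x + 1) so that zero
counts are allowed, the discriminant is Fisher's two-group linear
discriminant (weights w = S^-1 (m1 - m2), with S the pooled within-group
covariance), and the a priori groups are tested with a label-permutation
test on the Mahalanobis D^2 = w . (m1 - m2).  The permutation test is one
reasonable choice, not the only one, and as noted above it is only
meaningful when the groups were fixed before looking at the data, not when
they came out of a cluster analysis.

```python
import math
import random

def log_abund(samples):
    """log(x + 1) transform of raw counts; the +1 (an assumption) allows zero counts."""
    return [[math.log1p(c) for c in row] for row in samples]

def mean_vec(rows):
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def pooled_cov(g1, g2):
    """Pooled within-group covariance matrix of two groups of samples."""
    p = len(g1[0])
    s = [[0.0] * p for _ in range(p)]
    for g in (g1, g2):
        m = mean_vec(g)
        for row in g:
            d = [x - mu for x, mu in zip(row, m)]
            for i in range(p):
                for j in range(p):
                    s[i][j] += d[i] * d[j]
    dof = len(g1) + len(g2) - 2
    return [[v / dof for v in r] for r in s]

def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[piv] = m[piv], m[c]
        for r in range(n):
            if r != c:
                f = m[r][c] / m[c][c]
                m[r] = [x - f * y for x, y in zip(m[r], m[c])]
    return [m[i][n] / m[i][i] for i in range(n)]

def discriminant(g1, g2):
    """Fisher weights w = S^-1 (m1 - m2) and Mahalanobis D^2 = w . (m1 - m2)."""
    diff = [a - b for a, b in zip(mean_vec(g1), mean_vec(g2))]
    w = solve(pooled_cov(g1, g2), diff)
    return w, sum(wi * di for wi, di in zip(w, diff))

def permutation_p(g1, g2, n_perm=999, seed=1):
    """Permutation p-value for D^2; only meaningful if the groups were fixed a priori."""
    _, observed = discriminant(g1, g2)
    pooled = g1 + g2
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        _, d2 = discriminant(pooled[:len(g1)], pooled[len(g1):])
        if d2 >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)  # counts the observed labelling as one permutation

# Hypothetical counts of three fish species per video transect (not real data).
control = [[12, 30, 5], [15, 25, 8], [10, 35, 6], [14, 28, 4], [11, 32, 7]]
impacted = [[3, 40, 1], [5, 38, 0], [2, 45, 2], [4, 36, 1], [6, 42, 0]]

g_imp, g_ctl = log_abund(impacted), log_abund(control)
w, d2 = discriminant(g_imp, g_ctl)
print("discriminant weights:", [round(x, 2) for x in w])
print("Mahalanobis D^2:", round(d2, 2))
print("permutation p:", permutation_p(g_imp, g_ctl))
```

The discriminant score w . x of each transect then serves as the "impact
index."  With only five transects per area the permutation distribution is
coarse (just 252 distinct ways to split ten samples five and five), so
more replicate samples per area would be needed in practice.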

As for papers that you might want to consider, try R. Pikul, 1974,
"Development of environmental indices," in J.W. Pratt (Ed.), Statistical
and Mathematical Aspects of Pollution Problems.  Marcel Dekker, New York.

Hope this helps 

Michael Noah 

    _/      _/     _/  _/      _/ _/    _/ U.S. Navy, COMFLEACT Yokosuka 
   _/_/    _/    _/_/  _/    _/   _/  _/   Environmental Department 
  _/  _/  _/   _/  _/  _/  _/     _/_/     PSC 473 Box 1 Code 1000 
 _/    _/_/  _/_/_/_/  _/_/       _/       FPO AP 96349-1100 
_/      _/ _/      _/  _/        _/        243-7311 / 011-81-311-743-7311 
                                       FAX 243-9027 / 011-81-311-743-9027 

           Michael D. Noah                 mnoah at gol.com 

      "Mother, mother ocean, I have heard your call"  J. Buffett        

   U N I V E R S I T Y  o f  N E B R A S K A    C O R N H U S K E R S 
   N a t i o n a l  C h a m p i o n s   1 9 7 0 - 7 1   1 9 9 4 - 9 5