Data Analysis (data sheet)
1) Score the individual genotypes
2) Calculate genotype frequencies
3) Calculate allele frequencies
4) Using the observed allele frequencies, calculate the genotype frequencies you would expect under Hardy-Weinberg equilibrium
5) Use a goodness-of-fit test (Chi-square) to compare the expected and observed frequencies.
(adapted from JonBaker’s curriculum unit)
Calculating Genotype and Allele Frequencies
From your HinfI restriction digest gel you will (hopefully!) be able to determine a genotype for each fish (see the gel above and the schematic drawing below).
Use the data sheet to score the number of fish that have each genotype: AA, AB, or BB (use a separate data sheet for each population). Disregard samples where a genotype cannot be clearly established. If desired, you could try repeating the PCR on that sample, digesting more of the PCR product, or running more of the digest on the gel, depending on what the problem seems to be.
After recording the genotypes on the data sheet under the "Observed" heading, calculate the frequencies of each genotype in that population, by dividing each genotype's total by the total number of genotypes obtained.
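As a minimal sketch of this step, the Python snippet below turns genotype counts into genotype frequencies. The counts (6 AA, 4 AB, 2 BB) are borrowed from the worked example in this handout, and the function name `genotype_frequencies` is only an illustrative choice.

```python
def genotype_frequencies(counts):
    """Return each genotype's share of all scored fish."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no genotypes were scored")
    return {genotype: n / total for genotype, n in counts.items()}


# Example counts (hypothetical population): 6 AA, 4 AB, 2 BB = 12 fish
observed = {"AA": 6, "AB": 4, "BB": 2}
observed_freqs = genotype_frequencies(observed)

for genotype, freq in observed_freqs.items():
    print(f"{genotype}: {freq:.2f}")
```

The three frequencies should always add up to one, which is a quick check that no fish was counted twice or left out.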
Now it's time to calculate the allele frequencies, i.e. the proportion of A alleles and B alleles in your population (how many A's and B's you had, divided by the total number of alleles; use the bottom part of the data sheet). Remembering that each fish is diploid, how many total alleles were in your population? That's right, double the total number of fish scored. To calculate the frequency of each allele, start by counting how many A's and B's you had. For example, if you had 6 AA, 4 AB, and 2 BB genotypes (total = 12 fish), then there are 16 A's and 8 B's (total = 24 alleles; remember, each AA genotype contributes two A's and each AB genotype contributes one A to the total, and likewise for B's). Now divide the number of A's by the total number of alleles to get the frequency of A (which we will now call "p"; in this example 16 ÷ 24, or 0.67), and likewise for the frequency of B (which we will now call "q"; in this example 8 ÷ 24, or 0.33, as it must be, since p and q have to add up to one).
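The allele count described above can be sketched in Python as follows. The function name `allele_frequencies` and its argument names are illustrative assumptions; the numbers are the 6 AA, 4 AB, 2 BB example from the text.

```python
def allele_frequencies(n_aa, n_ab, n_bb):
    """Return (p, q), the frequencies of alleles A and B."""
    total_alleles = 2 * (n_aa + n_ab + n_bb)  # each diploid fish carries two alleles
    if total_alleles == 0:
        raise ValueError("no genotypes were scored")
    a_count = 2 * n_aa + n_ab  # each AA contributes two A's, each AB one
    b_count = 2 * n_bb + n_ab  # each BB contributes two B's, each AB one
    return a_count / total_alleles, b_count / total_alleles


p, q = allele_frequencies(6, 4, 2)
print(f"p = {p:.2f}, q = {q:.2f}")  # p = 0.67, q = 0.33
```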
Now you're ready to do the Hardy Weinberg analysis. Essentially, this analysis involves figuring out how many fish of each genotype you would have expected given your allele frequencies, IF there were random mating, no selection, no mutation, no migration, and a large population size with respect to the prolactin-2 gene. The Hardy Weinberg theorem says that if these five basic conditions are met, allele frequencies for a given gene will not change from one generation to the next. This is the first thing you want to establish before you can compare one population to another.
The Hardy-Weinberg Theorem really is a model that states that for any set of allele frequencies one can figure out the expected genotype frequencies if the population is not undergoing any evolution. If a population is in Hardy Weinberg equilibrium (HWE), the observed genotype frequencies will conform to $p^2 + 2pq + q^2 = 1$, where $p^2$ = freq (AA), $2pq$ = freq (AB), and $q^2$ = freq (BB). To determine whether your population at your locus is in HWE, take the p and q values you calculated above for your population (observed data), and calculate the genotype frequencies you would EXPECT if the population were in HWE. To do this calculation, use the value for p that you calculated and square it to get the expected frequency of AA genotypes. Continue for each genotype using $2pq$ and $q^2$. Write these numbers on the data table line for expected genotype frequencies. To convert these expected frequencies into actual numbers of fish to compare to your actual population, multiply each of the expected frequencies by the total number of fish in the actual population to figure out how many fish you would have expected to be of each genotype if the population were in HWE. For the example given here, expected AA frequency = $p^2 = (0.67)^2 = 0.45$. Multiplying 0.45 × 12 total fish = 5.4 expected AA fish. Do the same set of calculations to get your expected number of fish for the other two genotypes, and write the numbers on the data sheet under "expected fish."
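The expected-count calculation can be sketched in Python as below. The function name `expected_counts` is an illustrative assumption, and the inputs (p = 16/24, 12 fish) come from the worked example in the text.

```python
def expected_counts(p, n_fish):
    """Expected number of fish of each genotype under HWE."""
    q = 1 - p
    return {
        "AA": p ** 2 * n_fish,     # p squared times the number of fish
        "AB": 2 * p * q * n_fish,  # 2pq times the number of fish
        "BB": q ** 2 * n_fish,     # q squared times the number of fish
    }


p = 16 / 24  # frequency of allele A in the worked example
expected = expected_counts(p, n_fish=12)

for genotype, n in expected.items():
    print(f"{genotype}: {n:.1f} expected fish")
```

Because this sketch does not round p to 0.67 or the AA frequency to 0.45 along the way, it gives about 5.3 expected AA fish rather than the 5.4 obtained by hand; the difference is only rounding.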
O.K. so you expected 5.4 AA fish, and you actually counted 6 AA fish. Doesn't seem that different, but the question is how do you know whether your expected number of genotypes is significantly different from your observed number of different genotypes? To answer this question we need a statistical tool to help us.
We will use the statistical test known as the Chi Square Goodness of Fit test to determine whether there is a significant difference between the actual and expected numbers of each genotype. We first formulate a conservative hypothesis, called the null hypothesis (H0), which states that there is no difference between the observed and expected values; therefore the population is in Hardy Weinberg equilibrium. We also state the alternate hypothesis (HA) that there is a significant difference, and therefore the population is not in Hardy Weinberg equilibrium. The test tells you the probability of getting differences at least as large as the ones you found by chance alone if the null hypothesis is true, i.e. the lower the probability generated by the test, the stronger the evidence that the difference you see between the observed and expected results is actually the result of a violation of one of the five conditions listed above.
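The test measures the size of the difference between the two sets of numbers by first plugging them into the equation:
$$\chi^2 = \sum \frac{(O - E)^2}{E}$$
where O is the observed number of fish of a genotype, E is the expected number of fish of that genotype, and the sum is taken over the three genotypes. This equation gives us the calculated Chi Square value, which we compare against a table of Chi Square critical values to find the probability that a difference this large would arise by chance alone. For the example given above, the expected numbers are 5.4 AA, 5.3 AB ($2pq = 0.44$, times 12 fish) and 1.3 BB ($q^2 = 0.11$, times 12 fish), so the $\chi^2$ calculation would be:
$$\chi^2 = \frac{(6 - 5.4)^2}{5.4} + \frac{(4 - 5.3)^2}{5.3} + \frac{(2 - 1.3)^2}{1.3} \approx 0.067 + 0.319 + 0.377 \approx 0.76$$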
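The same arithmetic can be sketched in Python. The function name `chi_square` is an illustrative assumption. The expected counts of 5.3 AB and 1.3 BB are not stated elsewhere in this handout; they are worked out here from the example's rounded frequencies (2pq = 0.44 and q squared = 0.11, each times 12 fish), in the same way the text arrives at 5.4 AA.

```python
def chi_square(observed, expected):
    """Sum of (observed - expected)^2 / expected over all genotypes."""
    return sum(
        (observed[g] - expected[g]) ** 2 / expected[g]
        for g in observed
    )


observed = {"AA": 6, "AB": 4, "BB": 2}
expected = {"AA": 5.4, "AB": 5.3, "BB": 1.3}  # rounded values, see note above

chi2 = chi_square(observed, expected)
print(f"chi-square = {chi2:.2f}")  # chi-square = 0.76
```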
Before we look up this value in the Chi Square table, we have to choose a significance level with which we are comfortable. For instance, a significance level (also called the alpha value) of 0.05 means that 5% of the time you might actually say that there is a difference when there really is no difference. For most scientific purposes, the significance level is set by convention at 0.05, meaning we call a difference significant only if a difference that large would arise by chance alone less than 5% of the time in a population that really is in HWE.
We also need to calculate the degrees of freedom (n). The number of degrees of freedom is equal to the number of classes (in our case, the three genotypes) minus one (because if we know two of the expected genotype frequencies we automatically know the third) minus the number of independent values we calculated from our observed data to determine our expected values (these independent values are the allele frequencies – only one of which is independent, because if we know p then we automatically know q). The equation for calculating degrees of freedom is:
d.f. = k – 1 – m
where k is the number of classes (genotypes) and m is the number of independent values we calculate from the data (allele frequencies). For a two allele system there are three classes and one independent value, thus there is 1 degree of freedom.
d.f. = 3 – 1 – 1 = 1
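A small Python sketch of the degrees-of-freedom rule follows. The function names are illustrative assumptions. The second helper, which derives k and m from the number of alleles, goes slightly beyond the two-allele case covered in this handout and is included only to show where the 3 and the 1 come from.

```python
def degrees_of_freedom(n_classes, n_estimated):
    """d.f. = k - 1 - m, as defined in the text."""
    df = n_classes - 1 - n_estimated
    if df < 1:
        raise ValueError("not enough genotype classes to test")
    return df


def hwe_degrees_of_freedom(n_alleles):
    """Hypothetical helper: d.f. for an HWE test at a locus with n_alleles alleles."""
    n_genotypes = n_alleles * (n_alleles + 1) // 2  # k: possible diploid genotypes
    n_estimated = n_alleles - 1                     # m: independent allele frequencies
    return degrees_of_freedom(n_genotypes, n_estimated)


print(degrees_of_freedom(3, 1))   # two-allele case from the text: 1
print(hwe_degrees_of_freedom(2))  # same case, derived from the allele count: 1
```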
Now you are ready to go to the Chi Square table of critical values (note: this is a partial table)
        Probability of exceeding the critical value
d.f.    0.10      0.05      0.025     0.01      0.001
----------------------------------------------------------------
1       2.706     3.841     5.024     6.635     10.828
2       4.605     5.991     7.378     9.210     13.816
3       6.251     7.815     9.348     11.345    16.266
4       7.779     9.488     11.143    13.277    18.467
5       9.236     11.070    12.833    15.086    20.515
The rows are arranged by increasing degrees of freedom, so you will be using only the top row (for one degree of freedom) for the goodness of fit test. The columns are arranged by probability, with 0.10 (or 10%) on the left and 0.001 (or 0.1% chance) on the far right.
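Reading the table can be sketched as a simple lookup in Python. The values are copied from the partial table in this handout; the names `CRITICAL_VALUES` and `critical_value` are illustrative assumptions.

```python
# Partial table of chi-square critical values: d.f. -> {probability: value}
CRITICAL_VALUES = {
    1: {0.10: 2.706, 0.05: 3.841, 0.025: 5.024, 0.01: 6.635, 0.001: 10.828},
    2: {0.10: 4.605, 0.05: 5.991, 0.025: 7.378, 0.01: 9.210, 0.001: 13.816},
    3: {0.10: 6.251, 0.05: 7.815, 0.025: 9.348, 0.01: 11.345, 0.001: 16.266},
    4: {0.10: 7.779, 0.05: 9.488, 0.025: 11.143, 0.01: 13.277, 0.001: 18.467},
    5: {0.10: 9.236, 0.05: 11.070, 0.025: 12.833, 0.01: 15.086, 0.001: 20.515},
}


def critical_value(df, alpha=0.05):
    """Look up the critical value for a given row (d.f.) and column (alpha)."""
    try:
        return CRITICAL_VALUES[df][alpha]
    except KeyError:
        raise ValueError(f"no table entry for d.f.={df}, alpha={alpha}") from None


print(critical_value(1))        # top row, 0.05 column: 3.841
print(critical_value(1, 0.01))  # top row, 0.01 column: 6.635
```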
For the 5% level, go to the 0.05 column, where the number in the top row is 3.841 (about 3.84). If your calculated Chi Square value is less than 3.84, then a difference this large between observed and expected would arise by chance alone more than 5% of the time; therefore you would accept (strictly speaking, fail to reject) the null hypothesis that the observed and expected values are not significantly different, and treat your population as being in Hardy Weinberg equilibrium.
If your calculated Chi Square value is greater than 3.84, then a difference this large between your observed and expected values would arise solely by chance less than 5% of the time. Therefore, the difference would be considered significant, and you would reject the null hypothesis that the difference was due to chance. You would then accept the alternative hypothesis that there was indeed a significant difference due to something besides chance.
In this case, you would have cause to believe that at least one of the five conditions for H-W equilibrium had been violated. You would probably want to confirm your results by increasing your sample size before trying to determine which condition may have been violated.
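The decision rule can be sketched in Python as follows. The function names are illustrative assumptions, and the Chi Square value of 0.76 is an approximate figure worked out for the 6 AA, 4 AB, 2 BB example using its rounded expected counts. As an aside that is not part of the data sheet procedure, for one degree of freedom the exact probability can be computed from the standard library's complementary error function instead of being bracketed from the table.

```python
import math

CRITICAL_VALUE = 3.841  # 1 d.f., 0.05 column of the table above


def p_value_1df(chi2):
    """Probability that a chi-square variable with 1 d.f. is at least chi2."""
    return math.erfc(math.sqrt(chi2 / 2))


def hwe_verdict(chi2, critical_value=CRITICAL_VALUE):
    """Compare the calculated chi-square value with the critical value."""
    if chi2 > critical_value:
        return "reject H0: genotype counts differ significantly from HWE"
    return "fail to reject H0: genotype counts are consistent with HWE"


chi2 = 0.76  # approximate value for the worked example (an assumption, see above)
print(hwe_verdict(chi2))
print(f"p-value = {p_value_1df(chi2):.2f}")
```

With only 12 fish, a result like this says little either way, which is one more reason to increase the sample size before drawing conclusions.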
If your population is in Hardy Weinberg equilibrium for a given locus, then it tells you that there is no detectable evolution going on in terms of that particular gene. These fish mate randomly with each other: AA fish don't prefer other AA fish; they'd be just as happy to mate with AB or BB fish. BB fish are not surviving to parenthood any more or less often than the AAs or ABs. The A and B alleles are not mutating at a significant rate. And the population is behaving as though there is a large pool of fish with these alleles.
Once we know our population is in Hardy Weinberg equilibrium, and therefore that it is nonevolving in terms of the gene in question, we are ready to compare it to another population for that locus. If it is not in Hardy Weinberg equilibrium, there may be natural selection or nonrandom mating or something else going on, so we wouldn't be able to do a fair comparison to another population. In that case, we would probably want to look at another locus, hopefully one that is in Hardy Weinberg equilibrium.