1) Score the individual genotypes

2) Calculate genotype frequencies

3) Calculate allele frequencies

4) Using the observed allele frequencies, calculate the genotype frequencies you would expect under Hardy-Weinberg equilibrium

5) Use a goodness-of-fit test (Chi-square) to compare the expected and observed frequencies.

Hardy Weinberg Analysis

(adapted from Jon Baker’s curriculum unit)

Summary

The first question that population geneticists ask after they collect genotype data for a population is whether or not the population is in Hardy Weinberg equilibrium for that locus. This test involves comparing the observed and expected genotype frequencies that you calculated for each of the populations above. When a population is in Hardy-Weinberg equilibrium for a given genetic locus, it means that there is random mating (with respect to that locus), no selection, no mutation, no gene flow, and a population large enough to avoid the random effects of genetic drift. Population geneticists generally use population genetic profiles to determine how reproductively isolated populations are from one another. However, if selection is operating on a locus and one finds differences between populations at that locus, the difference may be due to selection acting differently on the two populations, rather than the result of reproductive isolation between them. Thus, if one is comparing the allele frequencies of two populations, one first needs to determine whether each population is in Hardy Weinberg equilibrium. If they are not, one may well be comparing apples and oranges, since one population may have selection causing it to be out of equilibrium, and the other might have nonrandom mating or some other process causing the deviation from Hardy-Weinberg equilibrium. If both populations are in Hardy Weinberg equilibrium, then it is more likely that a difference in the allele frequencies of the two populations is due to reproductive isolation.

Calculating Genotype and Allele Frequencies

From your HinfI restriction digest gel you will (hopefully!) be able to distinguish a genotype for each fish. (See gel above, and schematic drawing below.)

Use the data sheet to score the number of fish that have each genotype: AA, AB, or BB. (Use a separate data sheet for each population.) Disregard samples where a genotype cannot be clearly established. If desired, you could try repeating the PCR on that sample, or digesting more of the PCR product, or running more of the digest on the gel, depending on what the problem seems to be.

After recording the genotypes on the data sheet under the "Observed" heading, calculate the frequencies of each genotype in that population, by dividing each genotype's total by the total number of genotypes obtained.
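The division described above can be sketched in a few lines of Python. This is only an illustration: the function name `genotype_frequencies` and the counts (6 AA, 4 AB, 2 BB) are placeholders, so substitute the numbers from your own data sheet.

```python
def genotype_frequencies(counts):
    """Divide each genotype's count by the total number of genotypes scored."""
    total = sum(counts.values())
    return {genotype: n / total for genotype, n in counts.items()}

# Illustrative counts only; replace with your own observed numbers.
observed = {"AA": 6, "AB": 4, "BB": 2}
frequencies = genotype_frequencies(observed)

for genotype, freq in frequencies.items():
    print(genotype, round(freq, 2))
```

The three frequencies should always add up to one, which is a quick way to catch a scoring or arithmetic slip.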

Now it's time to calculate the allele frequencies, i.e. the proportion of A alleles and B alleles in your population (how many A's and B's you had, divided by the total number of alleles; use the bottom part of the data sheet). Remembering that each fish is diploid, how many total alleles were in your population? That's right, double the total number of fish scored. To calculate the frequency of each allele, start by counting how many A's and B's you had. For example, if you had 6 AA, 4 AB, and 2 BB genotypes (total = 12 fish), then there are 16 A's and 8 B's (total = 24 alleles: remember, each AA genotype contributes two A's and each AB genotype contributes one A to the total; likewise with B's). Now divide the number of A's by the total number of alleles to get the frequency of A (which we will now call "p"; in this example 16 ÷ 24 or .67), and likewise for the frequency of B (which we will now call "q"; it would have to be .33, since p and q have to add up to one).
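The allele count in this example can be checked with a short Python sketch. The function name `allele_frequencies` is made up for illustration; the counts are the 6 AA, 4 AB, and 2 BB fish from the example.

```python
def allele_frequencies(n_aa, n_ab, n_bb):
    """Return (p, q), the frequencies of alleles A and B."""
    total_alleles = 2 * (n_aa + n_ab + n_bb)  # each fish is diploid
    count_a = 2 * n_aa + n_ab  # AA gives two A's, AB gives one
    count_b = 2 * n_bb + n_ab  # BB gives two B's, AB gives one
    return count_a / total_alleles, count_b / total_alleles

p, q = allele_frequencies(6, 4, 2)
print(round(p, 2), round(q, 2))  # 0.67 0.33
```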

Now you're ready to do the Hardy Weinberg analysis. Essentially this analysis involves figuring out how many fish of each genotype you would have expected given your allele frequencies, IF there were random mating, no selection, no mutation, no migration, and a large population size with respect to the prolactin-2 gene. The Hardy Weinberg theorem says that if these five basic conditions are met, allele frequencies for a given gene will not change from one generation to the next. This is the first thing you want to establish before you can compare one population to another.

The Hardy-Weinberg Theorem really is a model that states that for any set of allele frequencies one can figure out the expected genotype frequencies if the population is not undergoing any evolution. If a population is in Hardy Weinberg equilibrium (HWE), the observed genotype frequencies will conform to p² + 2pq + q², where p² = freq (AA), 2pq = freq (AB), and q² = freq (BB). To determine whether your population at your locus is in HWE take the p and q values you calculated above for your population (observed data), and calculate the genotype frequencies you would EXPECT if the population were in HWE. To do this calculation, use the value for p that you calculated and square it to get the expected frequency of AA genotypes. Continue for each genotype using 2pq and q². Write these numbers on the data table line for expected genotype frequencies. To convert these expected frequencies into actual numbers of fish to compare to your actual population you need to multiply each of the expected frequencies by the total number of fish in the actual population to figure out how many fish you would have expected to be of each genotype, if the population were in HWE. For the example given here, expected AA frequency = p² = (.67)² = .45. Multiplying .45 x 12 total fish = 5.4 expected AA fish. Do the same set of calculations to get your expected number of fish for the other two genotypes, and write the numbers on the data sheet under "expected fish."
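As a rough check on the arithmetic, the expected numbers of fish can be computed as follows. The function name `expected_genotype_counts` is hypothetical, and the sketch uses the unrounded p (16/24) from the example, so it gives about 5.3 expected AA fish. The worked example gets 5.4 because it rounds p² to .45 before multiplying.

```python
def expected_genotype_counts(p, n_fish):
    """Expected numbers of AA, AB and BB fish under Hardy-Weinberg equilibrium."""
    q = 1 - p  # two alleles, so the frequencies add up to one
    return {
        "AA": p * p * n_fish,      # p squared
        "AB": 2 * p * q * n_fish,  # 2pq
        "BB": q * q * n_fish,      # q squared
    }

# Example population: p = 16/24, 12 fish scored.
expected = expected_genotype_counts(16 / 24, 12)
for genotype, count in expected.items():
    print(genotype, round(count, 1))
```

The expected numbers should add up to the total number of fish scored (12 here), even though they are usually not whole numbers.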

O.K. so you expected 5.4 AA fish, and you actually counted 6 AA fish. Doesn't seem that different, but the question is how do you know whether your expected number of genotypes is significantly different from your observed number of different genotypes? To answer this question we need a statistical tool to help us.

Chi Square Goodness-of-Fit Test

We will use the statistical test known as Chi Square Goodness of Fit to determine whether there is a significant difference between the number of actual and expected genotypes. We first formulate a conservative hypothesis, called the null hypothesis (H_0), which states that there is no difference between the observed and expected values; therefore the population is in Hardy Weinberg equilibrium. We also state the alternate hypothesis (H_A) that there is a significant difference, and therefore the population is not in Hardy Weinberg equilibrium. This test tells you the probability of arriving at the differences you find by chance alone, i.e. the lower the probability generated by the test, the greater the likelihood that the difference you see between the observed and expected result is actually the result of a violation of one of the five conditions listed above.

The test determines the significance of the difference between the two sets of numbers, by first plugging them into the equation:

χ² = Σ [ (observed – expected)² / expected ]

This equation gives us the calculated Chi Square value (χ²). Comparing it against a table of Chi Square critical values tells us how probable it is that a difference this large arose by chance alone. For the example given above, the χ² calculation would be:

(6 – 5.4)²/5.4 + (4 – 5.3)²/5.3 + (2 – 1.3)²/1.3 = .07 + .32 + .38 = .77
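The same sum can be sketched in Python, using the observed (6, 4, 2) and expected (5.4, 5.3, 1.3) numbers from the example. The function name `chi_square` is made up for illustration. Because the code adds the unrounded terms, it prints 0.76; the hand calculation gets .77 because each term is rounded before adding.

```python
def chi_square(observed, expected):
    """Sum of (observed - expected)^2 / expected over all genotype classes."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

observed_fish = [6, 4, 2]        # AA, AB, BB counted on the gel
expected_fish = [5.4, 5.3, 1.3]  # AA, AB, BB expected under HWE

chi2 = chi_square(observed_fish, expected_fish)
print(round(chi2, 2))  # 0.76
```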

Before we compare this value against the Chi Square table, we have to choose a significance level with which we are comfortable. For instance, a significance level (also called the alpha value) of 0.05 means that 5% of the time you might actually say that there is a difference, when there really is no difference. For most scientific purposes, the significance level is set by convention at 0.05, meaning that when a population really is in equilibrium, a difference large enough to be called significant will turn up by chance alone only 5% of the time.

We also need to calculate the degrees of freedom (ν, or d.f.). The number of degrees of freedom is equal to the number of classes (in our case, the three genotypes) minus one (because if we know two of the expected genotype frequencies we automatically know the third) minus the number of independent values we calculated from our observed data to determine our expected values (these independent values are the allele frequencies, only one of which is independent, because if we know p then we automatically know q). The equation for calculating degrees of freedom is:

d.f. = k – 1 – m

where k is the number of classes (genotypes) and m is the number of independent values we calculate from the data (allele frequencies). For a two-allele system there are three classes and one independent value, thus there is 1 degree of freedom.

d.f. = 3 – 1 – 1 = 1

Now you are ready to go to the Chi Square table of critical values (note: this is a partial table)

Probability of exceeding the critical value

d.f.    0.10     0.05     0.025    0.01     0.001
----------------------------------------------------------------
1       2.706    3.841    5.024    6.635    10.828
2       4.605    5.991    7.378    9.210    13.816
3       6.251    7.815    9.348    11.345   16.266
4       7.779    9.488    11.143   13.277   18.467
5       9.236    11.070   12.833   15.086   20.515

The rows are arranged by increasing degrees of freedom, so you will be using only the top row (for one degree of freedom) for the goodness of fit test. The columns are arranged by probability, with 0.10 (or 10%) on the left and 0.001 (or 0.1%) on the far right. For the 5% level, go to the 0.05 column, where the number in the top row is 3.84. If your calculated Chi Square value is less than 3.84, then a difference this large between observed and expected would turn up by chance alone more than 5% of the time; therefore you would accept the null hypothesis that the observed and expected values are not significantly different, and treat your population as being in Hardy Weinberg equilibrium. In the example above, .77 is well below 3.84, so that population passes the test.

If your calculated Chi Square value is greater than 3.84, then a difference this large would arise by chance alone less than 5% of the time if the population really were in equilibrium. Therefore, the difference would be considered significant, and you would reject the null hypothesis that the difference was due to chance. You would then accept the alternative hypothesis that there was indeed a significant difference due to something besides chance.

In this case, you would have cause to believe that one of the five conditions for H-W equilibrium had been violated. You would probably want to confirm your results by increasing your sample size before trying to determine which condition may have been violated.
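The table lookup and the accept-or-reject decision can be sketched as below. The names `CRITICAL_VALUES_DF1`, `degrees_of_freedom`, and `rejects_hwe` are made up for illustration. The critical values are copied from the one-degree-of-freedom row of the table, and the value 4.2 is a hypothetical result included only to show the other outcome.

```python
# Critical values for 1 degree of freedom, keyed by significance level.
CRITICAL_VALUES_DF1 = {0.10: 2.706, 0.05: 3.841, 0.025: 5.024,
                       0.01: 6.635, 0.001: 10.828}

def degrees_of_freedom(n_classes, n_estimated):
    """d.f. = k - 1 - m."""
    return n_classes - 1 - n_estimated

def rejects_hwe(chi2, alpha=0.05):
    """True if the calculated chi-square exceeds the critical value (1 d.f.)."""
    return chi2 > CRITICAL_VALUES_DF1[alpha]

print(degrees_of_freedom(3, 1))  # 1
print(rejects_hwe(0.77))         # False: do not reject the null hypothesis
print(rejects_hwe(4.2))          # True: hypothetical value above 3.841
```

This sketch only handles the two-allele case (one degree of freedom); a locus with more alleles would need the matching row of the table.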

If your population is in Hardy Weinberg equilibrium for a given locus, then it tells you that there is no evolution going on in terms of that particular gene. These fish mate randomly with each other and AA fish don't prefer other AA fish; they'd be just as happy to mate with AB or BB fish. BB fish are not surviving to parenthood any more than the AAs or ABs. The A and B alleles are not mutating at a significant rate. And the population is behaving as though there is a large pool of fish with these alleles.

Once we know our population is in Hardy Weinberg equilibrium, and therefore that it is nonevolving in terms of the gene in question, we are ready to compare it to another population for that locus. If it is not in Hardy Weinberg equilibrium, there may be natural selection or nonrandom mating or something else going on, so we wouldn't be able to do a fair comparison to another population. In that case, we would probably want to look at another locus, hopefully one that is in Hardy Weinberg equilibrium.