Wednesday, July 27, 2011

Homozygosity Mapping and LOD Score

1. HOMOZYGOSITY MAPPING
Homozygosity is the state of possessing two identical forms of a particular gene (alleles), one inherited from each parent.
Homozygosity mapping is a method for mapping genes in the human genome. It is used to detect genes that cause disease only when both copies in an individual are mutated (the individual is homozygous for the mutation). The technique works for recessive disorders, in which a mutant allele must be inherited from each parent; an individual with one mutant and one normal allele (a heterozygote) expresses the normal version and shows no disease symptoms. It is an efficient way to map human recessive traits using the DNA of children of consanguineous (inbred) parents.
2. HOW IS HOMOZYGOSITY MAPPING DONE
Homozygosity mapping can be done by any of the following methods:
1. SNP arrays
2. RFLP
3. Microsatellite markers



2.1. SNP ARRAYS FOR HOMOZYGOSITY MAPPING
It was proposed that a set of SNPs evenly spread across the human genome could be used to screen two populations (typically populations with and without a disorder) and that some SNPs would associate more with the disease group, thus implicating the SNP, or a DNA sequence close by, in the disease state. A massive technological effort followed and whole genome scans, using tens of thousands of SNPs, were made a reality with the advent of array-based technologies.
One recent advance is the development of high-density SNP microarrays for genotyping. The SNP arrays overcome low marker informativity by using a large number of markers to achieve greater coverage at finer resolution.

BARDET-BIEDL SYNDROME
We used SNP microarray genotyping for homozygosity mapping in a small consanguineous Israeli Bedouin family with autosomal recessive Bardet-Biedl syndrome (BBS; obesity, pigmentary retinopathy, polydactyly, hypogonadism, renal and cardiac abnormalities, and cognitive impairment) in which previous linkage studies using short tandem repeat (STR) polymorphisms failed to identify a disease locus.
SNP genotyping revealed a homozygous candidate region. Mutation analysis in the region of homozygosity identified a conserved homozygous missense mutation in the TRIM32 gene, a gene coding for an E3 ubiquitin ligase. Functional analysis of this gene in zebrafish and expression correlation analyses among other BBS genes in an expression quantitative trait loci data set demonstrated that TRIM32 is a BBS gene. This study shows the value of high-density SNP genotyping for homozygosity mapping and the use of expression correlation data for evaluation of candidate genes, and identifies the proteasome degradation pathway as a pathway involved in BBS.
Positional cloning is a powerful approach to identifying genes mutated in monogenic (single-gene) diseases in humans and animal models. The genomic DNA region (and its embedded polymorphisms) harboring the disease-causing mutation segregates with the disease in the pedigrees analyzed. Until recently, however, positional cloning relied on the availability of large pedigrees to reach significant linkage.
This protocol describes the use of whole genome genotyping on sporadic consanguineous patients to identify potential disease loci and subsequent positional candidate genes, by homozygosity mapping (autozygosity). It takes advantage of high density single nucleotide polymorphism (SNP) genotyping arrays, and of the assumption that unrelated patients from several consanguineous families are mutated in the same gene.
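The idea of looking for a region that is homozygous in several consanguineous patients can be sketched in Python. This is only an illustration under stated assumptions, not the protocol's actual software: the function names, the two-letter genotype encoding, the `min_snps` threshold and the toy genotypes are all hypothetical.

```python
from typing import Dict, List, Tuple

def homozygous_runs(genotypes: List[str], min_snps: int = 3) -> List[Tuple[int, int]]:
    """Return inclusive (start, end) index pairs of runs of homozygous SNP calls.

    Assumption: each call is a two-letter string such as "AA" or "AG";
    anything else (e.g. a no-call like "--") is treated as not homozygous.
    """
    runs, start = [], None
    for i, g in enumerate(genotypes):
        is_hom = len(g) == 2 and g[0] == g[1] and g[0] in "ACGT"
        if is_hom and start is None:
            start = i                      # a new run begins here
        elif not is_hom and start is not None:
            if i - start >= min_snps:      # keep only sufficiently long runs
                runs.append((start, i - 1))
            start = None
    if start is not None and len(genotypes) - start >= min_snps:
        runs.append((start, len(genotypes) - 1))
    return runs

def shared_homozygous_snps(patients: Dict[str, List[str]], min_snps: int = 3) -> List[int]:
    """Return SNP indices that lie inside a homozygous run in every patient."""
    shared = None
    for calls in patients.values():
        covered = set()
        for s, e in homozygous_runs(calls, min_snps):
            covered.update(range(s, e + 1))
        shared = covered if shared is None else shared & covered
    return sorted(shared or [])

# Hypothetical genotypes at seven consecutive SNPs for two patients.
patients = {
    "patient_1": ["AA", "AG", "GG", "CC", "TT", "AA", "AG"],
    "patient_2": ["AG", "AA", "GG", "CC", "TT", "AG", "AA"],
}
print(shared_homozygous_snps(patients))  # [2, 3, 4]
```

In real data the runs would be measured in physical or genetic distance rather than SNP counts, and genotyping errors would have to be tolerated; both are left out here for brevity.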

2.2. RFLP (RESTRICTION FRAGMENT LENGTH POLYMORPHISM)
RFLP is a method used by molecular biologists to follow a particular sequence of DNA as it is passed on to other cells. It is a technique that exploits variations in homologous DNA sequences. The term refers to a difference between samples of homologous DNA molecules that arises from differing locations of restriction enzyme sites. By cutting two different DNA molecules with the same restriction enzyme, scientists can compare the lengths of the fragments; two identical molecules will have identical fragments, while two similar molecules may be largely alike, with perhaps a few differences in fragment size. These differences in restriction fragment lengths are called polymorphisms and are used in all types of DNA typing.
RFLP PRODUCTION
RFLP methodology involves
• Cutting a particular region of DNA with known variability, with restriction enzymes
• Separating the DNA fragments by agarose gel electrophoresis
• Determining the number of fragments and relative sizes
The pattern of fragment sizes will differ for each individual tested.
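The cutting and sizing steps can be mimicked in a few lines of Python. This is a simplified sketch: it assumes linear DNA, a single enzyme (EcoRI, which recognises GAATTC and cuts after the first G), and two made-up alleles that differ by one base inside the recognition site. The function name and sequences are hypothetical.

```python
from typing import List

def digest(sequence: str, site: str = "GAATTC", cut_offset: int = 1) -> List[int]:
    """Return fragment lengths after cutting linear DNA at every occurrence of `site`.

    `cut_offset` is the position within the site where the strand is cut
    (1 for EcoRI: G^AATTC).
    """
    sequence = sequence.upper()
    cuts = []
    pos = sequence.find(site)
    while pos != -1:
        cuts.append(pos + cut_offset)
        pos = sequence.find(site, pos + 1)
    bounds = [0] + cuts + [len(sequence)]
    return [b - a for a, b in zip(bounds, bounds[1:])]

# Hypothetical alleles: allele_2 has lost the EcoRI site through an A->G change.
allele_1 = "ATGC" * 5 + "GAATTC" + "ATGC" * 10
allele_2 = "ATGC" * 5 + "GAGTTC" + "ATGC" * 10

print(digest(allele_1))  # [21, 45]  two fragments
print(digest(allele_2))  # [66]      one uncut fragment
```

On a gel, the first allele would show two bands and the second a single larger band, which is the fragment length polymorphism.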
APPLICATIONS
The RFLP technique has many applications, such as:
DNA fingerprinting in forensic science
Tracing ancestry
Studying evolution and migration of wild life
Detection and diagnosis of certain diseases
Genetic mapping (to calculate the genetic distance between two loci)
2.3. MICROSATELLITE MARKERS
Microsatellites are simple sequence tandem repeats (SSTRs). The repeat units are generally di-, tri-, tetra- or pentanucleotides. For example, a common repeat motif in birds is (AC)n, where the two nucleotides A and C are repeated in bead-like fashion a variable number of times (n can range from 8 to 50). Microsatellites tend to occur in non-coding regions of the DNA (this should be fairly obvious for long dinucleotide repeats), although a few human genetic disorders are caused by trinucleotide microsatellites in coding regions. On each side of the repeat unit are flanking regions that consist of "unordered" DNA. The flanking regions are critical because they allow us to develop locus-specific primers to amplify the microsatellites with PCR (polymerase chain reaction). That is, given a stretch of unordered DNA 30-50 base pairs (bp) long, the probability of finding that particular stretch more than once in the genome becomes vanishingly small. In contrast, a given repeat unit (say (AC)19) may occur in thousands of places in the genome. We use this combination of widely occurring repeat units and locus-specific flanking regions as part of our strategy for finding and developing microsatellite primers. The primers for PCR are sequences from these unique flanking regions. With a forward and a reverse primer on either side of the microsatellite, we can amplify a fairly short (100 to 500 bp) locus-specific microsatellite region.
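As a rough illustration of what an (AC)n microsatellite looks like in sequence data, the Python sketch below scans a sequence for runs of a repeated motif using a regular expression. The function name, the flanking sequences and the choice of 12 repeats are invented for the example.

```python
import re
from typing import List, Tuple

def find_repeats(sequence: str, motif: str = "AC", min_repeats: int = 8) -> List[Tuple[int, int, int]]:
    """Return (start, end, repeat_count) for each run of `motif` repeated
    at least `min_repeats` times; `end` is exclusive."""
    pattern = re.compile(r"(?:%s){%d,}" % (re.escape(motif), min_repeats))
    hits = []
    for m in pattern.finditer(sequence.upper()):
        hits.append((m.start(), m.end(), (m.end() - m.start()) // len(motif)))
    return hits

# Hypothetical locus: unique 20 bp flanks on either side of (AC)12.
left_flank = "GATTACAGGCTTAACGGTCG"
right_flank = "TGGCCATTAGCGTACGTTAG"
locus = left_flank + "AC" * 12 + right_flank

print(find_repeats(locus))  # [(20, 44, 12)]
```

Two individuals carrying, say, (AC)12 and (AC)15 at this locus would give PCR products differing by 6 bp when amplified with primers from the same flanks.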
3. HOMOZYGOSITY MAPPER
HomozygosityMapper is a web-based approach to homozygosity mapping. It stores marker data in a database into which users can upload their SNP genotype files. The software analyses the data within a few minutes, detects homozygous stretches, and presents the results in a graphical interface. It also offers the option to zoom into single chromosomes and user-defined chromosomal regions. It is integrated with the gene search engine GeneDistiller, which helps users identify the most promising candidate genes. Users can restrict access to their uploaded data or make them public. Hence HomozygosityMapper can also serve as a data repository for homozygosity mapping studies.
4. GENETIC LINKAGE
Genetic linkage means that certain genes tend to be inherited together, because they are on the same chromosome. Thus parental combinations of characters are found more frequently in offspring than non parental. Genetic loci that are physically close to one another on the same chromosome tend to stay together during meiosis, and are thus genetically linked.

DISCOVERY
In 1905, the three geneticists William Bateson, Edith Rebecca Saunders, and Reginald C. Punnett discovered an apparent exception to one of Mendel's foundational proposals: the principle of independent assortment.
In their work with pea plants, these researchers noticed that not all of their crosses yielded results that reflected the principle of independent assortment. Specifically, some phenotypes appeared far more frequently than traditional Mendelian genetics would predict. Based on these findings, they proposed that certain alleles must somehow be coupled with one another, although they weren't sure how this linkage occurred. The answer to this question came just seven years later, when Thomas Hunt Morgan used fruit flies to demonstrate that linked genes must be real physical objects that are located in close proximity on the same chromosome.
In 1910, Morgan discovered a male fly with mutant white eyes; fruit flies normally have red eyes. Morgan crossed this white-eyed male to its red-eyed sisters. When he then inbred the heterozygous red-eyed F1 flies, the traits of the F2 progeny did not assort independently. Morgan expected a 1:1:1:1 ratio of red-eyed females, red-eyed males, white-eyed males, and white-eyed females. Instead, he observed the following phenotypes in his F2 generation:
2,459 red-eyed females
1,011 red-eyed males
782 white-eyed males
There were no white-eyed females, and Morgan wondered whether this was because the trait was sex-limited and only expressed in male flies. To test whether this trait was sex limited he completed a second cross between the original white-eyed male fly and some of his F1 daughters. These crosses produced an F2 generation with the following phenotypes:
129 red-eyed females
132 red-eyed males
88 white-eyed females
86 white-eyed males
Thus, the results of this cross did produce white-eyed females, and the groups had approximately equal numbers. Morgan therefore hypothesized that the eye color trait was connected with the sex factor. This in turn led to the idea of genetic linkage, which means that when two genes are closely associated on the same chromosome, they do not assort independently.
LINKAGE MAP
A linkage map is a genetic map of a species or experimental population that shows the position of its known genes or genetic markers relative to each other in terms of recombination frequency, rather than as specific physical distance along each chromosome. Linkage mapping is critical for identifying the location of genes that cause genetic diseases.
A genetic map is a map based on the frequencies of recombination between markers during crossover of homologous chromosomes. The greater the frequency of recombination (segregation) between two genetic markers, the farther apart they are assumed to be. Conversely, the lower the frequency of recombination between the markers, the smaller the physical distance between them. Historically, the markers originally used were detectable phenotypes derived from coding DNA sequences; eventually, confirmed or assumed non-coding DNA sequences such as microsatellites or those generating restriction fragment length polymorphisms (RFLPs) have been used.
Genetic maps help researchers to locate other markers, such as other genes by testing for genetic linkage of the already known markers.
A genetic map is not a physical map (such as a radiation reduced hybrid map) or gene map.
A linkage map is a map of the genes on a chromosome based on linkage analysis. It does not show the physical distances between genes but rather their relative positions, as determined by how often two gene loci are inherited together. The closer two genes are (the more tightly they are linked), the more often they will be inherited together.
Linkage distance is measured in centimorgans (cM).
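The relationship between recombination frequency and map distance can be made concrete with a short Python sketch. The rule of thumb that 1% recombination corresponds to about 1 cM holds only for closely linked loci. The Haldane mapping function is not discussed in the text above; it is added here as one common correction, and it assumes no crossover interference. The function names and the counts are hypothetical.

```python
import math

def recombination_fraction(recombinants: int, total: int) -> float:
    """Fraction of offspring (or gametes) that are recombinant."""
    return recombinants / total

def simple_map_distance_cM(theta: float) -> float:
    """Rule of thumb: 1% recombination is about 1 cM (short distances only)."""
    return 100.0 * theta

def haldane_map_distance_cM(theta: float) -> float:
    """Haldane mapping function (assumes no interference); needs theta < 0.5."""
    if not 0.0 <= theta < 0.5:
        raise ValueError("theta must be in [0, 0.5)")
    return -50.0 * math.log(1.0 - 2.0 * theta)

theta = recombination_fraction(recombinants=8, total=100)  # hypothetical counts
print(simple_map_distance_cM(theta))
print(haldane_map_distance_cM(theta))
```

The two estimates agree closely for small recombination fractions and diverge as the fraction approaches 0.5, where loci behave as unlinked.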
CONSTRUCTING A GENETIC LINKAGE MAP
Genetic linkage maps of each chromosome are made by determining how frequently two markers are passed together from parent to child. Because genetic material is sometimes exchanged during the production of sperm and egg cells, groups of traits (or markers) originally together on one chromosome may not be inherited together. Closely linked markers are less likely to be separated by such exchanges. As an example, consider the chromosome 4 pairs of each individual in a family. The father has two traits that can be detected in any child who inherits them: a short known DNA sequence used as a genetic marker (M) and Huntington's disease (HD). If one child receives only a single trait (M) from that particular chromosome, the father's genetic material must have recombined during the process of sperm production. The frequency of this event helps determine the distance between the two DNA sequences on a genetic map.


5. LOD SCORE METHOD FOR ESTIMATING RECOMBINATION FREQUENCY

The LOD score (logarithm (base 10) of odds), developed by Newton E. Morton, is a statistical test often used for linkage analysis in human, animal, and plant populations. The LOD score compares the likelihood of obtaining the test data if the two loci are indeed linked, to the likelihood of observing the same data purely by chance. Positive LOD scores favor the presence of linkage, whereas negative LOD scores indicate that linkage is less likely. Computerized LOD score analysis is a simple way to analyze complex family pedigrees in order to determine the linkage between Mendelian traits (or between a trait and a marker, or two markers).
The method is described in greater detail by Strachan and Read. Briefly, it works as follows:
Establish a pedigree
Make a number of estimates of recombination frequency
Calculate a LOD score for each estimate
The estimate with the highest LOD score will be considered the best estimate
The LOD score is calculated as follows:

$$\mathrm{LOD} = Z = \log_{10} \frac{(1-\theta)^{NR}\,\theta^{R}}{0.5^{(NR+R)}}$$

Where:
 NR denotes the number of non-recombinant offspring,
 R denotes the number of recombinant offspring.
 Theta (θ) is the recombinant fraction; it is equal to R / (NR + R).
The reason 0.5 is used in the denominator is that any alleles that are completely unlinked (e.g. alleles on separate chromosomes) have a 50% chance of recombination, due to independent assortment.
In practice, LOD scores are looked up in a table which lists LOD scores for various standard pedigrees and various values of recombination frequency.
By convention, a LOD score greater than 3.0 is considered evidence for linkage. A LOD score of +3 indicates 1000 to 1 odds that the linkage being observed did not occur by chance. On the other hand, a LOD score less than -2.0 is considered evidence to exclude linkage. Although it is very unlikely that a LOD score of 3 would be obtained from a single pedigree, the mathematical properties of the test allow data from a number of pedigrees to be combined by summing the LOD scores. It is important to keep in mind that the traditional cutoff of LOD > +3 is an arbitrary one, and that for certain types of linkage studies, particularly analyses of complex genetic traits with hundreds of markers, the criterion should probably be raised to a somewhat higher cutoff.
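The calculation described in this section, a log10 odds ratio with (1 - θ)^NR × θ^R in the numerator and 0.5^(NR + R) in the denominator, can be written as a small Python function. This is a sketch for the simple case of directly counted recombinants and non-recombinants; the function name and the pedigree counts are made up for illustration.

```python
import math

def lod_score(nr: int, r: int, theta: float) -> float:
    """LOD = log10( (1 - theta)^NR * theta^R / 0.5^(NR + R) ).

    nr    : number of non-recombinant offspring
    r     : number of recombinant offspring
    theta : recombination fraction being tested, between 0 and 0.5
    """
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must be between 0 and 0.5")
    likelihood_linked = (1.0 - theta) ** nr * theta ** r
    likelihood_unlinked = 0.5 ** (nr + r)
    if likelihood_linked == 0.0:
        # theta = 0 is impossible once any recombinant has been observed
        return float("-inf")
    return math.log10(likelihood_linked / likelihood_unlinked)

# Hypothetical (NR, R) counts from three pedigrees, all tested at theta = 0.10.
pedigrees = [(9, 1), (7, 1), (10, 2)]
theta = 0.10
per_family = [lod_score(nr, r, theta) for nr, r in pedigrees]
total = sum(per_family)  # LOD scores from independent pedigrees add
print(per_family, total)
print("evidence for linkage" if total > 3.0 else "not yet conclusive")
```

Summing works because the families are independent, so their likelihood ratios multiply and the logarithms add.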
5.1. MAPPING GENES WITH THE LOD SCORE METHOD
The difficulty of mapping genes in humans can be overcome with the "LOD score method", which estimates genetic distances in situations other than simple testcrosses. Data obtained from pedigrees are used to estimate recombination frequencies, and from them map distances. It is one of the fundamental methods of human genetics in use today. Spreadsheet programs such as Lotus 1-2-3 or Microsoft Excel make the calculation straightforward.
WHAT IS NEEDED?
The fundamental aim of this problem is to determine R, the recombinant fraction (fraction of gametes that are recombinant), using data from relatively small families. R can vary from 0 (2 genes completely linked) to 0.50 (2 genes unlinked).
STEPS INVOLVED
There are 4 basic steps in the method:-
(1) Determine the expected frequencies of F2 phenotypes for every value of R from 0.01 to 0.50
(2) Determine the "likelihood" (L) that the family data observed resulted from a given R value: the maximum likelihood is the best estimate of R for the given data
(3) Determine the Odds Ratio and the logarithm of the odds ratio (lod score) by comparing the Likelihood for each value of R to the Likelihood for unlinked genes (R = 0.50)
(4) Add LOD scores from different families to achieve an acceptably high lod score so a specific most likely R can be assigned.
The example considered here involves two genes showing complete dominance: the heterozygote is indistinguishable from the dominant homozygote.
STEP 1: CALCULATE THE EXPECTED FREQUENCY OF OFFSPRING FOR VALUES OF R FROM 0 TO 0.50
The expected offspring numbers are calculated as follows:
Determine the frequency of each gamete produced by the F1's. For example, if R= 0.20, then 20% of the gametes produced by either parent will be recombinant. Since there are two types of recombinant gamete, A b and a B, the frequency of each will be 0.10. Since 80% of the gametes will be parental, the frequency of the parental types A B and a b will be 0.40 each.
Use a Punnett square to determine the offspring being formed from the union of the gametes. Multiply the gamete frequencies to get the offspring frequency. For instance, one cell of the Punnett square has the A B gamete from the father combining with the A b gamete from the mother. The frequency of the A B gamete is 0.40 and the frequency of the A b gamete is 0.10. Thus the frequency of the offspring in this cell is 0.40 x 0.10 = 0.04.
Determine the phenotype for each cell in the Punnett square and add up the frequencies to get the total frequency for each offspring phenotype.

Gametes (freq) | A B (0.40)   | A b (0.10)   | a B (0.10)   | a b (0.40)
A B (0.40)     | A B/A B 0.16 | A b/A B 0.04 | a B/A B 0.04 | a b/A B 0.16
A b (0.10)     | A B/A b 0.04 | A b/A b 0.01 | a B/A b 0.01 | a b/A b 0.04
a B (0.10)     | A B/a B 0.04 | A b/a B 0.01 | a B/a B 0.01 | a b/a B 0.04
a b (0.40)     | A B/a b 0.16 | A b/a b 0.04 | a B/a b 0.04 | a b/a b 0.16

F2 Phenotype | Cell Sums                                                    | Expected Freq
A_ B_        | 0.16 + 0.04 + 0.04 + 0.16 + 0.04 + 0.01 + 0.04 + 0.01 + 0.16 | 0.66
A_ bb        | 0.01 + 0.04 + 0.04                                           | 0.09
aa B_        | 0.01 + 0.04 + 0.04                                           | 0.09
aa bb        | 0.16                                                         | 0.16
These values are obtained by using a Punnett square to determine the genotypes, multiplying the frequencies of the two gametes that go into each type of offspring, and then adding up the offspring that have the same phenotype.
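Step 1 lends itself to a short program in place of a hand-drawn Punnett square. The Python sketch below assumes, as in the example above, that both F1 parents carry the alleles in the same arrangement (A B on one chromosome, a b on the other) and that both genes show complete dominance. The function name and phenotype labels are chosen for illustration.

```python
from itertools import product
from typing import Dict

def expected_phenotype_freqs(R: float) -> Dict[str, float]:
    """Expected F2 phenotype frequencies for recombinant fraction R."""
    # Gamete frequencies: parental types share (1 - R), recombinant types share R.
    gametes = {
        ("A", "B"): (1 - R) / 2,
        ("a", "b"): (1 - R) / 2,
        ("A", "b"): R / 2,
        ("a", "B"): R / 2,
    }
    freqs = {"A_B_": 0.0, "A_bb": 0.0, "aaB_": 0.0, "aabb": 0.0}
    # Every ordered pair of gametes is one cell of the Punnett square.
    for (g1, f1), (g2, f2) in product(gametes.items(), repeat=2):
        has_dominant_a = "A" in (g1[0], g2[0])
        has_dominant_b = "B" in (g1[1], g2[1])
        key = ("A_" if has_dominant_a else "aa") + ("B_" if has_dominant_b else "bb")
        freqs[key] += f1 * f2
    return freqs

for phenotype, freq in expected_phenotype_freqs(0.20).items():
    print(phenotype, round(freq, 2))
# A_B_ 0.66
# A_bb 0.09
# aaB_ 0.09
# aabb 0.16
```

Calling the function in a loop over R = 0.01, 0.02, ..., 0.50 gives the full set of expected frequencies that the method requires.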
STEP 2: EXAMINE THE OBSERVED FAMILY DATA IN LIGHT OF THE EXPECTED DISTRIBUTION OF OFFSPRING FOR EACH R VALUE
This is done by determining the likelihood (L) of the observed family for each value of R. The likelihood is simply the probability of the observed family, as determined using the multinomial theorem, an extension of the binomial theorem.

First define the terms for the observed family:
a = number of A_ B_ offspring
b = number of A_ bb offspring
c = number of aa B_ offspring
d = number of aa bb offspring
n = total offspring (= a + b + c + d)
Then define the terms for the expected family proportions (obtained from step 1 above):
p = expected proportion of A _ B _ offspring
q = expected proportion of A_ bb offspring
r = expected proportion of aa B_ offspring
s = expected proportion of aa bb offspring
The term of the multinomial expansion that describes the actual family is $p^a q^b r^c s^d$ multiplied by a coefficient.
The coefficient is: n! / (a! b! c! d!), where ! means "factorial".
This is very similar to the coefficient for the binomial.
Thus, the likelihood equation is: $L = \frac{n!}{a!\,b!\,c!\,d!}\; p^a q^b r^c s^d$
The expected phenotype proportions for R = 0.20 (20 map units between A and B) were calculated above. They are: A_ B_ = 0.66; A_ bb = 0.09; aa B_ = 0.09; aa bb = 0.16. Suppose a family of 5 children has 2 with the A_ B_ phenotype, none with A_ bb, 1 with aa B_, and 2 with aa bb.
$L = \frac{5!}{2!\,0!\,1!\,2!}\,(0.66)^2\,(0.09)^0\,(0.09)^1\,(0.16)^2$
$L = 30 \times 0.4356 \times 0.09 \times 0.0256$
$L = 0.0301$
The likelihood (L) needs to be calculated for all values of R between 0.01 and 0.50. Note that the coefficient will be the same for all values of R; the coefficient only depends on the observed data. When this is done, the value of R with the highest likelihood is the best estimate of R that can be obtained with data from this particular family.
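The likelihood calculation can be checked with a few lines of Python. This sketch takes the observed counts and expected proportions in the order A_ B_, A_ bb, aa B_, aa bb; the function name is illustrative.

```python
from math import factorial
from typing import Sequence

def multinomial_likelihood(counts: Sequence[int], probs: Sequence[float]) -> float:
    """L = n! / (a! b! c! d!) * p^a * q^b * r^c * s^d"""
    n = sum(counts)
    coefficient = factorial(n)
    for c in counts:
        coefficient //= factorial(c)   # exact integer division at every step
    likelihood = float(coefficient)
    for c, p in zip(counts, probs):
        likelihood *= p ** c
    return likelihood

observed = (2, 0, 1, 2)                   # the family of 5 children in the example
expected_r020 = (0.66, 0.09, 0.09, 0.16)  # expected proportions for R = 0.20
print(round(multinomial_likelihood(observed, expected_r020), 4))  # 0.0301
```

The coefficient (30 for this family) is computed once from the observed counts and is the same for every value of R.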
STEPS 3 AND 4: COMBINING DATA FROM SEVERAL FAMILIES
Data from several different families need to be compared and combined to get a good estimate of R. To do this, the L values must be standardized by calculating the Odds Ratio (OR), which is the L for each R value divided by the L for R = 0.50 (unlinked). Then the base 10 logarithm of the Odds Ratio is taken; this is the LOD score. LOD scores from different families can be added. (This is equivalent to multiplying the Odds Ratios, as in the AND rule for two events, family 1 AND family 2, both occurring.) A total LOD score of 3.0 or more at some value of R is considered strong evidence of linkage between the two genes.
For R = 0.20, the Odds Ratio = $L_{0.20} / L_{0.50}$. We calculated $L_{0.20}$ = 0.0301 above; $L_{0.50}$ = 0.00695. The Odds Ratio is thus 4.331 and the LOD score is the base 10 logarithm of this, 0.637. Clearly it would take several families of this size to reach a LOD score of 3.0.
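Steps 3 and 4 can be sketched in Python as a scan over R. Two shortcuts are taken here and should be read as assumptions of the sketch rather than part of the method as described: the expected phenotype proportions come from a closed form (aa bb = ((1 - R)/2)^2, each single-recessive class = 1/4 minus that, A_ B_ = 1/2 plus that), which gives the same numbers as summing the Punnett square cells, and the multinomial coefficient is omitted because it cancels in the odds ratio. All function names are illustrative.

```python
import math
from typing import Dict, List, Sequence, Tuple

def phenotype_probs(R: float) -> Tuple[float, float, float, float]:
    """Expected proportions of (A_B_, A_bb, aaB_, aabb) for recombinant fraction R."""
    double_recessive = ((1 - R) / 2) ** 2
    single_recessive = 0.25 - double_recessive
    return (0.5 + double_recessive, single_recessive, single_recessive, double_recessive)

def log10_kernel(counts: Sequence[int], R: float) -> float:
    """log10 of p^a q^b r^c s^d (the coefficient is left out on purpose)."""
    return sum(c * math.log10(p) for c, p in zip(counts, phenotype_probs(R)) if c > 0)

def lod(counts: Sequence[int], R: float) -> float:
    """LOD score of one family: log10( L(R) / L(0.50) )."""
    return log10_kernel(counts, R) - log10_kernel(counts, 0.5)

def lod_table(families: List[Sequence[int]], grid: List[float]) -> Dict[float, float]:
    """Total LOD score across families for each candidate R."""
    return {R: sum(lod(f, R) for f in families) for R in grid}

family = (2, 0, 1, 2)                      # A_B_, A_bb, aaB_, aabb counts
print(round(lod(family, 0.20), 3))         # 0.637

grid = [i / 100 for i in range(1, 51)]     # R = 0.01 ... 0.50
table = lod_table([family], grid)
best_R = max(table, key=table.get)
print(best_R, round(table[best_R], 3))
```

Passing several families to `lod_table` adds their LOD scores at each R, which is how the total can eventually reach the conventional threshold of 3.0.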

REFERENCES
Discovery and Types of Genetic Linkage By: Ingrid Lobo, Ph.D. (Write Science Right) & Kenna Shaw, Ph.D. (Executive Editor, Nature Education) © 2008 Nature Education Citation: Lobo, I. & Shaw, K. (2008) Discovery and types of genetic linkage. Nature Education 1(1)
http://www.bios.niu.edu/johns/lodprob.htm
http://www.ornl.gov/sci/techresources/Human_Genome/publicat/primer/fig8.html
http://www.bio.davidson.edu/courses/genomics/method/RFLP.html
http://biotech.about.com/od/glossary/g/RFLPdef.htm
http://www.nlm.nih.gov/visibleproofs/education/dna/rflp.pdf
http://www.biology-online.org/dictionary/Genetic_linkage
European Journal of Human Genetics (2007) 15, 362–368. doi:10.1038/sj.ejhg.5201761.
Protocol Exchange (2007) doi:10.1038/nprot.2007.343.
