Disclaimer: This is Untrue.
2.3.6 Genetic Polymorphism
2.3.6.1 Genetic Tracing
Advancements in DNA science have shown that Y-DNA and mitochondrial DNA analyses are useful for tracing paternal and maternal ancestry, respectively.
Both Y-DNA and mitochondrial DNA are classified into categories called haplogroups, which are groups of similar haplotypes. The word haplotype is derived from "haploid genotype." Haploid refers to having a single set of chromosomes, as opposed to diploid, which refers to two sets.
As mentioned earlier, the human cell nucleus typically contains 2 sets of 23 double-stranded DNA helix molecules—one set inherited from the mother and one from the father—making it diploid. In contrast, haploid refers to a single set of DNA, such as that found in gametes or unpaired DNA types.
Y-DNA is inherited only from the father and has no maternal counterpart, while mitochondrial DNA is inherited only from the mother and has no paternal counterpart. Because each is passed down uniparentally and without recombination, they are considered haploid, and their variations can be grouped into haplogroups.
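The inheritance pattern described above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the `Person` class, the `child` function, and the labels such as "Y-1" and "mt-A" are hypothetical names invented for the example, and mutation is ignored entirely.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Person:
    sex: str                 # "M" or "F"
    mt: str                  # mitochondrial DNA label (hypothetical)
    y: Optional[str] = None  # Y-DNA label; only males carry one


def child(father: Person, mother: Person, sex: str) -> Person:
    """mtDNA is copied from the mother; Y-DNA is copied from the father (sons only)."""
    return Person(sex=sex, mt=mother.mt, y=father.y if sex == "M" else None)


# Hypothetical labels, used only to make the two lines of descent visible.
grandfather = Person("M", mt="mt-A", y="Y-1")
grandmother = Person("F", mt="mt-B")
father = child(grandfather, grandmother, "M")
mother = Person("F", mt="mt-C")
son = child(father, mother, "M")
daughter = child(father, mother, "F")

print(son.y, son.mt)            # Y-1 mt-C
print(daughter.y, daughter.mt)  # None mt-C
```

In this sketch the son carries the paternal grandfather's Y label unchanged, while both children carry only the mother's mitochondrial label, which is why each marker traces a single line of descent.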
Y-DNA sequences are categorized into Y-DNA haplogroups, and mitochondrial DNA sequences are categorized into mitochondrial DNA haplogroups. Simplified haplogroup trees for both types can be found at the following sources:
*
"Human Y-DNA Haplogroups on Wikipedia"
http://en.wikipedia.org/wiki/Human_Y-chromosome_DNA_haplogroups
*
"Human Mitochondrial DNA Haplogroups on Wikipedia"
http://en.wikipedia.org/wiki/Human_mitochondrial_DNA_haplogroup
Global distributions of Y-DNA and mitochondrial DNA haplogroups can be viewed here (click the thumbnail map to download the haplogroup maps of the world):
*
"World Haplogroup Map Illinois"
http://www.scs.illinois.edu/~mcdonald/
For more detailed data, refer to the following websites:
*
"Y-DNA Haplogroups by Ethnic Groups on Wikipedia"
http://en.wikipedia.org/wiki/Y-DNA_haplogroups_by_ethnic_groups
*
"Y-Chromosome Haplogroups by Populations on Wikipedia"
http://en.wikipedia.org/wiki/Y-chromosome_haplogroups_by_populations
*
"Mitochondrial DNA Haplogroups by Populations on Wikipedia"
http://en.wikipedia.org/wiki/MtDna_haplogroups_by_populations
2.3.6.2 Categories of Genetic Polymorphism
Haplogroup analysis builds on this background, and interpreting it accurately requires a solid understanding of DNA and polymorphisms.
There are approximately 6 billion (6,000,000,000) base pairs (bp) of double-stranded DNA in a single human cell. Each cell contains 46 double-stranded DNA helix molecules, so each molecule contains, on average, about 130 million base pairs. Among humans, the base sequences are broadly similar. However, except in the case of identical twins, they are not exactly the same. These differences in DNA base sequences are known as genetic polymorphisms. The prefix "poly" means "many," and "morph" means "form"; thus, "genetic polymorphism" refers to variation in DNA base sequences. There are several types of genetic polymorphism.
DNA base sequences can vary in different ways. For example, if the original
sequence is (1), the following sequences—(2), (3), and (4)—are possible variations:
(1) AACATCAGCAGCAGCAGCAGCGCTTAG
(2) AACATCAGCAGCAGCAGCAGCAGCAGCGCTTAG
(3) AACATCAGCAGCAGCAGCGCTTAG
(4) AACGTCAGCAGCAGCAGCAGCGCTTAG
(1) has 5 repeats of the CAG sequence.
(2) has 7 repeats of the CAG sequence (2 more than (1)).
(3) has 4 repeats of the CAG sequence (1 fewer than (1)).
(4) has 5 repeats of the CAG sequence, but the 4th base, A in (1), is replaced by G.
Sequences (2), (3), and (4) are variants of (1). Variation of this kind among sequences is referred to as genetic polymorphism.
*
"Polymorphism on Wikipedia"
http://en.wikipedia.org/wiki/Polymorphism_(biology)
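The comparison of sequences (1) to (4) can be reproduced with a short Python sketch. The helper names `make`, `cag_repeats`, and `substitutions` are hypothetical, and the sequences are assembled from their parts (prefix, CAG repeats, suffix) following the description in the text, so that the repeat count is explicit.

```python
import re


def make(prefix: str, repeats: int) -> str:
    """Assemble a sequence as prefix + CAG repeats + the shared suffix."""
    return prefix + "CAG" * repeats + "CGCTTAG"


seqs = {
    1: make("AACAT", 5),
    2: make("AACAT", 7),
    3: make("AACAT", 4),
    4: make("AACGT", 5),  # 4th base A replaced by G
}


def cag_repeats(seq: str) -> int:
    """Number of units in the longest uninterrupted run of CAG."""
    runs = re.findall(r"(?:CAG)+", seq)
    return max((len(run) // 3 for run in runs), default=0)


def substitutions(ref: str, alt: str):
    """1-based positions where two equal-length sequences differ."""
    if len(ref) != len(alt):
        raise ValueError("sequences must have the same length")
    return [(i + 1, a, b) for i, (a, b) in enumerate(zip(ref, alt)) if a != b]


for number, seq in seqs.items():
    print(number, cag_repeats(seq))

print(substitutions(seqs[1], seqs[4]))  # [(4, 'A', 'G')]
```

Sequences (2) and (3) differ from (1) in repeat count and therefore in length, whereas (4) has the same length as (1) and differs at a single position.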
Genetic polymorphisms are commonly categorized into 3 types,
depending on the length and structure of the variation. Although definitions can
vary slightly, they are generally classified as follows:
SNP
The polymorphism in (4) is called a "Single Nucleotide Polymorphism (SNP)," since just one base (a single nucleotide) differs from (1).
STRP
The polymorphisms in (2) and (3) are called "Short Tandem Repeat Polymorphisms (STRPs)," since a short base sequence (the 3-base unit CAG in this case) repeats in tandem and the number of repetitions varies. How long a unit may be and still count as "short" depends on the definition: the range is given as anywhere from 2-5 bases to 2-9 bases. STRPs are closely associated with the concept of the microsatellite.
The reason why repeats such as STRPs and VNTRs (described below) arise is discussed later in connection with retrotransposons.
*
"Short Tandem Repeat on Wikipedia"
http://en.wikipedia.org/wiki/Short_tandem_repeat
*
"Microsatellite on Wikipedia"
http://en.wikipedia.org/wiki/Microsatellite_(genetics)
VNTR
The third type of genetic polymorphism is the "Variable Number Tandem Repeat (VNTR)."
A VNTR is a variation in the number of repetitions of a medium-length base sequence.
An example is shown below. In this case the repeat unit is 13 bp long.
The number of repetitions varies as well.
Here it may range from 12 to 17; the example shows 12 repeats, one unit per line.
ACAGGGTGTGGGG
ACAGGGTGTGGGG
ACAGGGTGTGGGG
ACAGGGTGTGGGG
ACAGGGTGTGGGG
ACAGGGTGTGGGG
ACAGGGTGTGGGG
ACAGGGTGTGGGG
ACAGGGTGTGGGG
ACAGGGTGTGGGG
ACAGGGTGTGGGG
ACAGGGTGTGGGG
The unit length that qualifies as a VNTR is roughly 10-80 bp, although the exact limits are debated.
VNTRs are commonly defined in association with minisatellites.
*
"Variable Number Tandem Repeat on Wikipedia"
http://en.wikipedia.org/wiki/Variable_number_tandem_repeat
*
"Minisatellite on Wikipedia"
http://en.wikipedia.org/wiki/Minisatellite
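As a rough illustration of the VNTR example above, the following Python sketch builds alleles from the 13 bp unit and counts the repeats back out. The function names `allele` and `count_tandem` are hypothetical, and the sketch assumes the allele consists of nothing but exact copies of the unit.

```python
UNIT = "ACAGGGTGTGGGG"  # the 13 bp repeat unit from the example


def allele(repeats: int) -> str:
    """Build an allele made of `repeats` exact tandem copies of UNIT."""
    return UNIT * repeats


def count_tandem(seq: str, unit: str) -> int:
    """Count consecutive exact copies of `unit` starting at the beginning of `seq`."""
    if not unit:
        raise ValueError("unit must not be empty")
    n = 0
    while seq.startswith(unit, n * len(unit)):
        n += 1
    return n


# Alleles with 12 to 17 repeats differ only in length.
for repeats in range(12, 18):
    seq = allele(repeats)
    print(repeats, len(seq), count_tandem(seq, UNIT))
```

With a 13 bp unit, the 12-repeat allele is 156 bp long and the 17-repeat allele is 221 bp long, so the alleles can be told apart by length alone.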
2.3.6.3 Categories of DNA Sequences
There are approximately 6 billion (6,000,000,000) base pairs (bp) in a single human cell. Each cell contains 46 double-stranded DNA helix molecules, which together make up two genomes. One genome consists of 23 double-stranded DNA helix molecules and approximately 3 billion (3,000,000,000) base pairs.
(2 genomes × 23 double-stranded DNA helix molecules = 46 DNA helix molecules in total.)
Thus, on average, each double-stranded DNA molecule contains about 130 million (130,000,000) base pairs (6,000,000,000 / 46).
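The arithmetic above can be checked directly. The constant names in this small Python sketch are chosen for the example; the input figures are the approximate values quoted in the text.

```python
TOTAL_BP = 6_000_000_000   # base pairs per cell (approximate, from the text)
GENOMES = 2                # one maternal and one paternal genome
MOLECULES_PER_GENOME = 23

molecules = GENOMES * MOLECULES_PER_GENOME
bp_per_genome = TOTAL_BP // GENOMES
bp_per_molecule = TOTAL_BP / molecules

print(molecules)                  # 46
print(f"{bp_per_genome:,}")       # 3,000,000,000
print(f"{bp_per_molecule:,.0f}")  # 130,434,783
```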
DNA sequences can be categorized into gene regions and non-gene regions.
While the definition of a "gene" can be debated, it is generally understood to refer to a region of DNA that is transcribed into mRNA, tRNA, or rRNA, along with adjacent regulatory sequences.
(One gene typically corresponds to one protein or functional RNA product, and a gene region may contain introns.)
It is estimated that there are approximately 22,000 genes in one human genome (i.e., across the 23 double-stranded DNA helix molecules). Of the roughly 3 billion base pairs in a genome, about 30%—approximately 900 million base pairs—are considered gene regions. The remaining 70% is often referred to as non-gene regions.
DNA sequences can also be categorized in another way: into coding regions and non-coding regions.
Coding regions refer specifically to sequences that are translated into amino acids according to the genetic codon table. These coding regions are naturally a subset of the broader gene regions.
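To make "translated according to the codon table" concrete, here is a minimal Python sketch. The `CODONS` dictionary is a deliberately partial excerpt of the standard genetic code (only the codons needed here), and the input sequence is a made-up example, not a real gene.

```python
# Partial excerpt of the standard genetic code (DNA codons); "*" marks a stop codon.
CODONS = {
    "ATG": "M",  # methionine (start)
    "CAG": "Q",  # glutamine
    "GCT": "A",  # alanine
    "TAA": "*",
    "TAG": "*",
    "TGA": "*",
}


def translate(cds: str) -> str:
    """Translate a coding sequence codon by codon until a stop codon is reached."""
    protein = []
    for i in range(0, len(cds) - 2, 3):
        amino_acid = CODONS[cds[i:i + 3]]  # KeyError for codons outside the excerpt
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


print(translate("ATGCAGCAGGCTTAA"))  # MQQA
```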
Thus, the approximately 3 billion base pairs of DNA in the 23 double-stranded DNA molecules (one genome) can be roughly classified as follows.
Gene DNA regions (regions associated with RNA) (30% of a genome)
  Gene DNA regions associated with mRNA creation (amino acid/protein creation)
    Coding DNA regions (coding sequences: CDS) (1-1.5% of a genome)
    Non-coding DNA regions adjacent to coding DNA regions (27% of a genome)
      Untranslated regions (UTR)
      Introns
      Spacer DNA: (*Spacer DNA might be categorized as Non-repeat Sequences)
    (* CDS and UTRs together are called exons)
  Gene DNA regions associated with non-coding RNA (2% of a genome)
    Gene DNA regions associated with tRNA creation
    Gene DNA regions associated with rRNA creation
Non-gene DNA regions (regions not so associated with RNA) (70% of a genome)
  Non-repeat Sequences (16% of a genome)
    Pseudogenes (non-functional sequences)
    Spacer DNA: (*Spacer DNA might be categorized as Non-coding DNA)
  Repeated Sequences (54% of a genome)
    Tandem Repeated Sequences (8% of a genome)
      Satellite DNA (large tandemly repeated sequences which mostly compose centromeres)
      Minisatellite (Variable Number Tandem Repeat) (repeats of some 10-60 base sequences)
        e.g. Telomeres
      Microsatellite (Short Tandem Repeat) (repeats of some 2-6 base sequences)
    Interspersed Repeated Sequences (46% of a genome)
      Retrotransposon Repeated Sequences (43% of a genome)
        Long Terminal Repeat Retrotransposons and Endogenous Retroviruses (8% of a genome)
        Non-Long Terminal Repeat Retrotransposons (35% of a genome)
          Short Interspersed Nuclear Elements (SINEs) (14% of a genome)
            e.g. Alu Sequences
          Long Interspersed Nuclear Elements (LINEs) (21% of a genome)
      Inverted Repeated Sequences surrounding DNA Transposons (3% of a genome)
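The percentages in this classification are meant to nest: each category should roughly equal the sum of its subcategories. The Python sketch below checks that, using the figures listed above. The tuple layout, the shortened labels, and the `check` function are assumptions made for the example; the coding-region figure uses the upper end (1.5%) of the quoted range, and the mRNA-gene level is folded into "Gene regions" for brevity.

```python
# Each node is (label, percent of genome, children).
tree = ("Genome", 100, [
    ("Gene regions", 30, [
        ("Coding regions (CDS)", 1.5, []),
        ("Non-coding regions adjacent to CDS", 27, []),
        ("Non-coding RNA genes", 2, []),
    ]),
    ("Non-gene regions", 70, [
        ("Non-repeat sequences", 16, []),
        ("Repeated sequences", 54, [
            ("Tandem repeats", 8, []),
            ("Interspersed repeats", 46, [
                ("Retrotransposons", 43, [
                    ("LTR retrotransposons and ERVs", 8, []),
                    ("Non-LTR retrotransposons", 35, [
                        ("SINEs", 14, []),
                        ("LINEs", 21, []),
                    ]),
                ]),
                ("DNA transposons", 3, []),
            ]),
        ]),
    ]),
])


def check(node, tol=1.0):
    """Return (label, stated, summed) for nodes whose children miss the stated value by more than tol."""
    label, percent, children = node
    problems = []
    if children:
        total = sum(child[1] for child in children)
        if abs(total - percent) > tol:
            problems.append((label, percent, total))
        for child in children:
            problems.extend(check(child, tol))
    return problems


print(check(tree))           # [] : consistent within one percentage point
print(check(tree, tol=0.1))  # only the rounded gene-region figure stands out

GENOME_BP = 3_000_000_000
print(f"{GENOME_BP * 30 // 100:,}")  # 900,000,000 bp in gene regions
```

A tolerance of one percentage point is used because the figures are rounded; with a tighter tolerance only the gene-region total (30% stated, 30.5% summed) is flagged.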
2.3.6.4 Mutability of DNA Sequences and Lineage Tracing
2.3.6.4.1 Tandem Repeated Sequences (STRP and VNTR)
STRPs and VNTRs—types of tandem repeat polymorphisms described above—correspond to microsatellites and minisatellites, respectively, and are typically found in non-gene regions of DNA as discussed in the previous section.
Because the repeat number of tandem repeats (such as STRPs and VNTRs) changes relatively often from one generation to the next, through variations introduced during the formation of reproductive cells, they are not well suited to reliable lineage tracing.
For example, the mutation rate of STRPs is estimated to be approximately 0.0001 per locus per generation, regardless of whether one adopts an evolutionary framework. These changes in repeat number are usually attributed to strand slippage during DNA replication, and may also involve the activity of enzymes associated with retrotransposons.
2.3.6.4.2 Single Nucleotide Polymorphisms (SNPs)
In contrast, SNPs (Single Nucleotide Polymorphisms), in which a single base (nucleotide) is changed, are distributed throughout the genome and can be found across both gene and non-gene regions.
SNPs arise primarily from errors during DNA replication. Since SNPs are considered to occur far less frequently than tandem repeat mutations, they are more suitable for lineage tracing.
The estimated mutation rate of SNPs is approximately 0.000000001 to 0.00000001 per base pair per generation, independent of any specific evolutionary assumptions. For this reason, haplogroups, which are used to trace ancestral lineages, are defined by SNP markers.
*
"Haplogroup on Wikipedia"
http://en.wikipedia.org/wiki/Haplogroup
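The difference between the two mutation rates can be made concrete with a back-of-the-envelope calculation. The Python sketch below assumes, as a simplification, that each quoted rate is a constant and independent probability of change per marker per generation; the function name `prob_unchanged` is hypothetical.

```python
STR_RATE = 1e-4  # STRP rate quoted in the text, taken as per marker per generation
SNP_RATE = 1e-8  # upper end of the SNP range quoted in the text


def prob_unchanged(rate: float, generations: int) -> float:
    """Probability that a marker undergoes no mutation over the given number of generations."""
    return (1.0 - rate) ** generations


for generations in (100, 1_000, 10_000):
    print(
        generations,
        round(prob_unchanged(STR_RATE, generations), 4),
        round(prob_unchanged(SNP_RATE, generations), 6),
    )
```

Under these assumptions, a single STRP marker has changed with substantial probability after 10,000 generations, while a single SNP site is almost certainly unchanged, which is the property that makes SNPs stable markers for defining haplogroups.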
2.3.6.4.3 Retrotransposon
In addition, retrotransposons are a distinct category of DNA sequences worth understanding. Copies derived from retrotransposons are found throughout the genome. Although most retrotransposon copies in the present-day genome appear to be stable and no longer generate new copies, it is believed that in the past they copied themselves and inserted the copies into various regions of the genome.
Typically, DNA sequences—such as those with promoter regions—are transcribed into RNA, in accordance with the principles of the so-called central dogma of molecular biology.
However, it was later discovered that, in rare cases, RNA can be reverse-transcribed back into DNA. The enzyme responsible for this process is called reverse transcriptase.
*
"Reverse Transcriptase on Wikipedia"
http://en.wikipedia.org/wiki/Reverse_transcriptase
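At the level of base pairing, reverse transcription amounts to building the DNA strand complementary to an RNA template. The following Python sketch shows only that base-pairing step and is not a model of the enzyme; the function name and the example RNA are hypothetical.

```python
# Base pairing between an RNA template and the DNA strand synthesized from it.
RNA_TO_DNA = {"A": "T", "U": "A", "G": "C", "C": "G"}


def reverse_transcribe(rna: str) -> str:
    """Return the complementary DNA strand, written 5' to 3'."""
    return "".join(RNA_TO_DNA[base] for base in reversed(rna.upper()))


rna = "AUGCAGCAGUAA"            # made-up RNA, written 5' to 3'
print(reverse_transcribe(rna))  # TTACTGCTGCAT
```

The template is read in reverse because the two strands of a nucleic acid duplex run in opposite directions.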
As a result of reverse transcription, copies of specific DNA sequences can become integrated into existing DNA strands. One well-known example of a repeated retrotransposon sequence is the Alu sequence.
*
"Alu Sequence on Wikipedia"
http://en.wikipedia.org/wiki/Alu_sequence
The name "Alu" derives from AluI, a restriction enzyme originally isolated from the bacterium Arthrobacter luteus, whose recognition site is found in these sequences.
*
"Arthrobacter on Wikipedia"
http://en.wikipedia.org/wiki/Arthrobacter
*Retrovirus
Certain aspects of repeated DNA sequences, particularly interspersed repeated sequences, are quite similar to the characteristics of retroviruses. It is therefore hypothesized that the origin of interspersed repeated sequences may be linked to ancient retroviral activity.
According to the widely accepted account, retroviruses, such as the human immunodeficiency virus (HIV), are viruses that insert a DNA copy of their RNA genome, produced by reverse transcription, into the double-stranded DNA of the host cells they infect.
The general process of retroviral infection, replication, and structure can be summarized as follows:
A retrovirus is a particle approximately 100 nanometers in diameter. Its outer spherical membrane, called the envelope, consists of a lipid bilayer that is acquired from the membrane of the host cell when the virus exits it. The envelope is studded with glycoprotein projections that facilitate cell entry.
Inside the envelope, a retrovirus carries two identical single-stranded RNA molecules, typically 7 to 10 kilobases in length.
Schematic Cell Infection, Virus Production, and Virus Structure of HIV
*
"Retrovirus on Wikipedia"
https://en.wikipedia.org/wiki/Retrovirus
*Attribution:
https://en.wikipedia.org/wiki/File:CMVschema.svg