Disclaimer: This is Untrue.


2.3.6 Genetic Polymorphism

2.3.6.1 Genetic Tracing

DNA analysis has shown that Y-DNA and mitochondrial DNA are useful for tracing paternal and maternal ancestry, respectively.
Sequences of Y-DNA and mitochondrial DNA are categorized into "haplogroups." A haplogroup is a group of similar haplotypes. "Haplotype" is a contraction of "haploid genotype." Haploid is the antonym of diploid. As mentioned before, a human cell nucleus contains 2 similar but slightly different sets of 23 double-stranded helix DNAs (chromosomes): one set of 23 from the mother and the other set of 23 from the father. This state is called "diploid." In contrast, "haploid" means having only one set of double-stranded helix DNAs. A "haploid genotype" therefore means a genetic type defined on a single set of double-stranded helix DNAs, inherited from either the mother or the father.
Y-DNA has no counterpart from the mother, and mitochondrial DNA has no counterpart from the father. Y-DNA and mitochondrial DNA can therefore be regarded as haploid. Their sequences are described as "haploid genotypes" (haplotypes) and, consequently, grouped into "haplogroups."
Sequences of Y-DNA are categorized into Y-DNA haplogroups. An outline of the categorization can be seen as a haplogroup tree on the following website.
* "Human Y-DNA Haplogroups in Wikipedia" http://en.wikipedia.org/wiki/Human_Y-chromosome_DNA_haplogroups
Sequences of mitochondrial DNA are categorized into mitochondrial DNA haplogroups. An outline of the categorization can be seen as a haplogroup tree on the following website.
* "Human Mitochondrial DNA Haplogroups in Wikipedia" http://en.wikipedia.org/wiki/Human_mitochondrial_DNA_haplogroup
The worldwide distribution of Y-DNA haplogroups and mitochondrial DNA haplogroups can be seen on the following website. Click the thumbnail map there to download the haplogroup maps of the world.
* "World Haplogroup Map Illinois" http://www.scs.illinois.edu/~mcdonald/
Details can be found on the following websites.
* "Y-DNA Haplogroups by Ethnic Groups in Wikipedia" http://en.wikipedia.org/wiki/Y-DNA_haplogroups_by_ethnic_groups
* "Y-Chromosome Haplogroups by Populations in Wikipedia" http://en.wikipedia.org/wiki/Y-chromosome_haplogroups_by_populations
* "Mitochondrial DNA Haplogroups by Populations in Wikipedia" http://en.wikipedia.org/wiki/MtDna_haplogroups_by_populations

2.3.6.2 Categories of Genetic Polymorphism

Haplogroup information can be analyzed against this background, but an accurate understanding requires some knowledge of genetic polymorphism and DNA.

A human cell contains some 6,000,000,000 base pairs (bp) of double-stranded helix DNA. One cell has 46 double-stranded helix DNAs, so one double-stranded helix DNA has some 130,000,000 base pairs (bp) on average. Base sequences are basically similar among all humans. However, they are not identical, except in identical twins. This difference or diversity of DNA base sequences is called genetic polymorphism. "Poly" means "many," "morph" means "form," and "genetic polymorphism" roughly means "diversity, variation, or variety of DNA base sequences." There are several types of genetic polymorphism.

Base sequences of DNA sometimes vary. For example, if the original sequence is (1), variants could look like (2), (3), and (4).

(1) AACATCAGCAGCAGCAGCAGCGCTTAG
(2) AACATCAGCAGCAGCAGCAGCAGCAGCGCTTAG
(3) AACATCAGCAGCAGCAGCGCTTAG
(4) AACGTCAGCAGCAGCAGCAGCGCTTAG

(1) has 5 repeats of the CAG sequence.
(2) has 7 repeats of the CAG sequence. (2 more CAG units are added.)
(3) has 4 repeats of the CAG sequence. (1 CAG unit is lost.)
(4) has 5 repeats of the CAG sequence, but the 4th base A in (1) is replaced by G.
(2), (3), and (4) are examples of genetic polymorphism.
The existence of such variant sequences is called "polymorphism."
* "Polymorphism in Wikipedia" http://en.wikipedia.org/wiki/Polymorphism_(biology)

Genetic polymorphism is commonly divided into the following 3 categories, depending on whether a single base is changed or, for repeats, on the length of the repeated unit. The exact definitions are somewhat controversial.

SNP
The type of genetic polymorphism in (4) could be called "Single Nucleotide Polymorphism (SNP)," since just one base (single nucleotide) is changed in (4) (compared with (1)).
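
The comparison between (1) and (4) can be sketched in a few lines of Python. This is only an illustration: the function name `find_snps` is made up for this example, and it assumes the two sequences have the same length and are already aligned, which is true for (1) and (4) but not for real sequencing data in general.

```python
def find_snps(reference: str, variant: str) -> list[tuple[int, str, str]]:
    """Return (1-based position, reference base, variant base) for each mismatch."""
    if len(reference) != len(variant):
        raise ValueError("this sketch only compares sequences of equal length")
    return [
        (position, ref_base, var_base)
        for position, (ref_base, var_base) in enumerate(zip(reference, variant), start=1)
        if ref_base != var_base
    ]


seq1 = "AACATCAGCAGCAGCAGCAGCGCTTAG"  # sequence (1) from the text
seq4 = "AACGTCAGCAGCAGCAGCAGCGCTTAG"  # sequence (4) from the text

print(find_snps(seq1, seq4))  # one difference: position 4, A replaced by G
```

A single reported difference corresponds to a single nucleotide change, which is what the name SNP describes.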

STRP
The type of genetic polymorphism in (2) and (3) is called "Short Tandem Repeat Polymorphism (STRP)," since a short base sequence (the 3-base sequence CAG in this case) repeats in tandem and the number of repeats varies. What counts as "short" for the repeated unit differs by source, ranging from 2-5 bases to 2-9 bases. STRP is closely associated with the concept of Microsatellite.
The reason why repeats such as STRP and VNTR (mentioned below) arise is discussed in connection with "Retrotransposon" below.
* "Short Tandem Repeat in Wikipedia" http://en.wikipedia.org/wiki/Short_tandem_repeat
* "Microsatellite in Wikipedia" http://en.wikipedia.org/wiki/Microsatellite_(genetics)
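
Counting the repeats in (1), (2), and (3) can be sketched as follows. The helper name `longest_tandem_run` is hypothetical, and the sequences are written here as flank + repeats + flank to make the structure described in the text visible. The sketch assumes the repeated unit (CAG) is already known.

```python
import re


def longest_tandem_run(sequence: str, unit: str) -> int:
    """Return the largest number of back-to-back copies of `unit` in `sequence`."""
    runs = re.findall(f"(?:{re.escape(unit)})+", sequence)
    return max((len(run) // len(unit) for run in runs), default=0)


LEFT_FLANK, RIGHT_FLANK = "AACAT", "CGCTTAG"  # bases around the CAG repeats

sequences = {
    "(1)": LEFT_FLANK + "CAG" * 5 + RIGHT_FLANK,
    "(2)": LEFT_FLANK + "CAG" * 7 + RIGHT_FLANK,
    "(3)": LEFT_FLANK + "CAG" * 4 + RIGHT_FLANK,
}

for label, sequence in sequences.items():
    print(label, longest_tandem_run(sequence, "CAG"))
```

The three sequences share the same flanking bases and differ only in the repeat count, which is the defining feature of STRP.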

VNTR
Another type of genetic polymorphism is "Variable Number of Tandem Repeat (VNTR)." VNTR is defined as variation in the number of repeats of a medium-length base sequence.
An example of VNTR is as follows. In this case, the length of the unit is 13 bp. The number of repeats varies here as well; in this case it may range from 12 to 17 (12 repeats are shown).
ACAGGGTGTGGGG ACAGGGTGTGGGG ACAGGGTGTGGGG ACAGGGTGTGGGG ACAGGGTGTGGGG ACAGGGTGTGGGG ACAGGGTGTGGGG ACAGGGTGTGGGG ACAGGGTGTGGGG ACAGGGTGTGGGG ACAGGGTGTGGGG ACAGGGTGTGGGG
The unit length for VNTR is usually given as some 10-80 bp, although the details are controversial. VNTR is commonly defined in association with Minisatellite.
* "Variable Number Tandem Repeat in Wikipedia" http://en.wikipedia.org/wiki/Variable_number_tandem_repeat
* "Minisatellite in Wikipedia" http://en.wikipedia.org/wiki/Minisatellite
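
The distinction between STRP and VNTR by unit length can be sketched as below. The function names are hypothetical, and the thresholds (2-9 bases for STRP, 10-80 bp for VNTR) are simply the widest ranges quoted in this section, which the text itself describes as controversial. The sketch also assumes a pure tandem array with no flanking bases.

```python
def smallest_repeat_unit(array: str) -> str:
    """Return the shortest unit whose repetition reproduces `array` exactly."""
    for size in range(1, len(array) + 1):
        if len(array) % size == 0 and array[:size] * (len(array) // size) == array:
            return array[:size]
    return array  # only reached for an empty string


def classify_by_unit_length(unit_length: int) -> str:
    """Classify a tandem repeat using the unit-length ranges quoted in the text."""
    if 2 <= unit_length <= 9:
        return "STRP (Microsatellite)"
    if 10 <= unit_length <= 80:
        return "VNTR (Minisatellite)"
    return "outside the ranges quoted in the text"


vntr_array = "ACAGGGTGTGGGG" * 12  # the 13 bp unit from the example, 12 repeats
unit = smallest_repeat_unit(vntr_array)

print(len(unit), len(vntr_array) // len(unit), classify_by_unit_length(len(unit)))
print(classify_by_unit_length(len(smallest_repeat_unit("CAG" * 5))))
```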

2.3.6.3 Categories of DNA Sequences

One human cell contains some 6,000,000,000 base pairs. One cell has 46 double-stranded helix DNAs, which make up 2 genomes. 1 genome consists of 23 double-stranded helix DNAs, or 3,000,000,000 base pairs (bp). (2 genomes × 23 double-stranded helix DNAs = 46 DNAs) One double-stranded helix DNA therefore has 130,000,000 base pairs (bp) (=6,000,000,000/46) on average. DNA sequences can be categorized into "gene DNA regions" and "non-gene DNA regions."
The definition of "gene" is disputed, but the dominant meaning of "a gene" is a unit (or information) of "a DNA sequence (region) that is transcribed into mRNA, tRNA, or rRNA, together with adjacent related sequences." (One gene basically corresponds to one protein or one RNA to be produced. A gene (region) may include introns.) One genome (23 double-stranded helix DNAs) is said to contain some 22,000 genes (regions).
Of the 3,000,000,000 base pairs (bp) of the 23 double-stranded helix DNAs, gene DNA regions account for 30%, or 900,000,000 base pairs (bp). The other 70% might be called "non-gene DNA regions."
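
The arithmetic behind these round figures can be checked with a short Python sketch. The constant names are made up for this example, and the values are the approximate figures quoted in the text, not measured data.

```python
TOTAL_BP_PER_CELL = 6_000_000_000   # base pairs in one cell (2 genomes)
DNA_MOLECULES_PER_CELL = 46         # double-stranded helix DNAs per cell
GENOME_BP = 3_000_000_000           # base pairs in one genome
GENE_PERCENT = 30                   # share of gene DNA regions, in percent

average_bp = TOTAL_BP_PER_CELL / DNA_MOLECULES_PER_CELL
gene_bp = GENOME_BP * GENE_PERCENT // 100
non_gene_bp = GENOME_BP - gene_bp

print(f"average per DNA molecule: about {average_bp:,.0f} bp")
print(f"gene DNA regions: {gene_bp:,} bp")
print(f"non-gene DNA regions: {non_gene_bp:,} bp")
```

The average comes out slightly above 130,000,000 bp, which the text rounds down.
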
DNA sequences can also be categorized in another way, into "coding DNA regions" and "non-coding DNA regions." "Coding DNA regions" are base sequence regions that code for amino acids according to the codon table. "Coding DNA regions" are naturally included in "gene DNA regions."
The sequence of 3,000,000,000 base pairs (bp) of the 23 human double-stranded helix DNAs (one genome) can then be roughly categorized as follows.

Gene DNA regions (regions associated with RNAs) (30% of a genome)
  Gene DNA regions associated with mRNA creation (amino acid/protein creation)
    Coding DNA regions (coding sequences: CDS) (1-1.5% of a genome)
    Non-coding DNA region adjacent to coding DNA regions (27% of a genome)
      Untranslated regions (UTR)
      Introns
      Spacer DNA: (*Spacer DNA might be categorized as Non-repeat Sequences)
    (* CDS and UTR are called exon)
  Gene DNA regions associated with non-coding RNAs (2% of a genome)
    Gene DNA regions associated with tRNA creation
    Gene DNA regions associated with rRNA creation
Non-gene DNA regions (regions not so associated with RNAs) (70% of a genome)
  Non-repeat Sequences (16% of a genome)
    Pseudogenes (non-functional sequences)
    Spacer DNA: (*Spacer DNA might be categorized as Non-coding DNA)
  Repeated Sequences (54% of a genome)
    Tandem Repeated Sequences (8% of a genome)
      Satellite DNA (large tandemly repeated sequences which mostly compose centromeres)
      Minisatellite (Variable Number of Tandem Repeat) (repeats of some 10-60 base sequences)
        e.g. Telomeres
      Microsatellite (Short Tandem Repeat) (repeats of some 2-6 base sequences)
    Interspersed Repeated Sequences (46% of a genome)
      Retrotransposon Repeated Sequences (43% of a genome)
        Long Terminal Repeat Retrotransposon and Endogenous Retroviruses (8% of a genome)
        Non-Long Terminal Repeat Retrotransposon (35% of a genome)
          Short Interspersed Nuclear Elements (SINEs) (14% of a genome)
            e.g. Alu Sequences
          Long Interspersed Nuclear Elements (LINEs) (21% of a genome)
      Inverted Repeated Sequences surrounding DNA Transposons (3% of a genome)
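
The percentages in the outline above can be checked for internal consistency with a small tree structure. This is a sketch only: the `Region` class and `find_mismatches` function are hypothetical, the coding share is taken as 1.25% (an assumed midpoint of the 1-1.5% range), and unlabeled levels of the outline are flattened into their parent.

```python
from dataclasses import dataclass, field


@dataclass
class Region:
    name: str
    percent: float  # share of one genome, in percent
    children: list["Region"] = field(default_factory=list)


def find_mismatches(region: Region, tolerance: float = 0.5) -> list[str]:
    """List every region whose children do not add up to its own share."""
    problems = []
    if region.children:
        total = sum(child.percent for child in region.children)
        if abs(total - region.percent) > tolerance:
            problems.append(f"{region.name}: children sum to {total}, expected {region.percent}")
        for child in region.children:
            problems.extend(find_mismatches(child, tolerance))
    return problems


genome = Region("Genome", 100, [
    Region("Gene DNA regions", 30, [
        Region("Coding DNA regions", 1.25),  # assumption: midpoint of 1-1.5%
        Region("Non-coding DNA adjacent to coding regions", 27),
        Region("Genes for non-coding RNAs", 2),
    ]),
    Region("Non-gene DNA regions", 70, [
        Region("Non-repeat Sequences", 16),
        Region("Repeated Sequences", 54, [
            Region("Tandem Repeated Sequences", 8),
            Region("Interspersed Repeated Sequences", 46, [
                Region("Retrotransposon Repeated Sequences", 43, [
                    Region("LTR Retrotransposons and Endogenous Retroviruses", 8),
                    Region("Non-LTR Retrotransposons", 35, [
                        Region("SINEs", 14),
                        Region("LINEs", 21),
                    ]),
                ]),
                Region("Inverted repeats around DNA Transposons", 3),
            ]),
        ]),
    ]),
])

print(find_mismatches(genome))  # an empty list means every level adds up
```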
        
2.3.6.4 Mutability of DNA Sequences and Lineage Tracing

2.3.6.4.1 Tandem Repeated Sequences (STRP and VNTR)

STRP and VNTR, the tandem-repeat polymorphisms mentioned above, correspond to Microsatellite and Minisatellite, respectively, in the Non-gene DNA regions of the categorization above. Since changes in the number of tandem repeats (STRP and VNTR) are said to occur frequently from generation to generation (through altered reproductive cells), STRP and VNTR are not well suited to tracing lineage. For example, the frequency (probability) of an STRP change is said to be some 0.0001/generation (regardless of whether this is based on the theory of evolution or not). The variation in repeat number supposedly arises through enzymes associated with Retrotransposon.

2.3.6.4.2 Single Nucleotide Polymorphism (SNP)

In contrast, SNPs (Single Nucleotide Polymorphism; just one base (single nucleotide) is changed) are found throughout the genome, across all the categories above. SNPs arise through copying errors in DNA replication. Since SNP changes are said to occur much less frequently, SNPs are well suited to tracing lineage. The frequency (probability) of an SNP is said to be some 0.000000001/generation - 0.00000001/generation (regardless of whether this is based on the theory of evolution or not). That is why haplogroups are defined in terms of SNPs.
* "Haplogroup in Wikipedia" http://en.wikipedia.org/wiki/Haplogroup
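
The difference between the quoted rates can be made concrete with a rough calculation. The sketch below assumes that each generation is an independent trial with a fixed rate and that a changed marker never changes back. The function name and the figure of 1,000 generations are arbitrary choices for illustration, not values from the sources above.

```python
import math


def probability_changed(rate_per_generation: float, generations: int) -> float:
    """Probability that a marker changes at least once over `generations`.

    Computes 1 - (1 - rate)**generations in a numerically stable way.
    """
    return -math.expm1(generations * math.log1p(-rate_per_generation))


GENERATIONS = 1_000  # arbitrary illustrative value

rates = {
    "STRP": 1e-4,              # 0.0001/generation, as quoted above
    "SNP (upper bound)": 1e-8,  # 0.00000001/generation
    "SNP (lower bound)": 1e-9,  # 0.000000001/generation
}

for label, rate in rates.items():
    print(f"{label}: {probability_changed(rate, GENERATIONS):.2e}")
```

Under these assumptions an STRP marker has close to a 10% chance of changing within 1,000 generations, while an SNP has a chance on the order of one in 100,000 or less. This is why an SNP, once present, serves as a stable marker of a lineage.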

2.3.6.4.3 Retrotransposon

In addition, Retrotransposons are distinctive sequences worth knowing about. Copies of Retrotransposon sequences are found frequently in genomes. Present-day Retrotransposons seem to be stable and do not appear to create new copies. However, it seems that Retrotransposons once created copies of themselves and inserted them into DNA sequences.
Generally, specific DNA sequences (those with promoters and so on) are transcribed into RNAs. This is part of the so-called "central dogma."
However, it was found that, in rare cases, RNAs can be copied back into DNA sequences (reverse transcription). The enzymes that carry out reverse transcription are called "reverse transcriptases."
* "Reverse Transcriptase in Wikipedia" http://en.wikipedia.org/wiki/Reverse_transcriptase
In this way, copies of specific DNA sequences can be inserted into existing DNA sequences. A representative repeated Retrotransposon sequence is the "Alu sequence."
* "Alu Sequence in Wikipedia" http://en.wikipedia.org/wiki/Alu_sequence
It is named after a restriction enzyme (Alu I) isolated from the bacterium "Arthrobacter luteus."
* "Arthrobacter in Wikipedia" http://en.wikipedia.org/wiki/Arthrobacter
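
The "copy and paste" behavior described above (DNA transcribed into RNA, RNA reverse-transcribed into DNA, and the copy inserted elsewhere) can be sketched as a toy model. Everything here is a simplification: the function names are hypothetical, the short sequences are invented and are not a real Alu sequence, and strands are handled as plain strings written 5' to 3'.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Return the complementary DNA strand, written 5' to 3'."""
    return dna.translate(COMPLEMENT)[::-1]


def transcribe(coding_strand: str) -> str:
    """DNA -> RNA: the RNA matches the coding strand with U in place of T."""
    return coding_strand.replace("T", "U")


def reverse_transcribe(rna: str) -> str:
    """RNA -> first-strand DNA copy (complementary to the RNA)."""
    return reverse_complement(rna.replace("U", "T"))


def retrotranspose(genome: str, start: int, end: int, target: int) -> str:
    """Copy genome[start:end] via an RNA intermediate and insert it at `target`."""
    rna = transcribe(genome[start:end])
    first_strand = reverse_transcribe(rna)
    new_copy = reverse_complement(first_strand)  # second strand matches the original
    return genome[:target] + new_copy + genome[target:]


# Toy genome: flank + element + flank + flank (all sequences invented)
toy_genome = "TTTT" + "GGCCGAT" + "AAAA" + "CCCC"
after = retrotranspose(toy_genome, start=4, end=11, target=15)

print(toy_genome)
print(after)
```

The original element stays in place and a second copy appears at the target site, so the sequence grows with each insertion. Repeated over time, this kind of process would yield interspersed repeats.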

*Retrovirus
The features of Repeated Sequences, particularly Interspersed Repeated Sequences, are quite similar to those of Retroviruses. The origin of Interspersed Repeated Sequences might therefore be attributed to Retroviruses.
According to widely accepted theories, retroviruses such as the human immunodeficiency virus (HIV) are a type of virus that inserts short double-stranded helix DNAs, reverse-transcribed from their RNA genomes, into the double-stranded helix DNAs of the host cells they have invaded. Their cell infection, virus production, and virus structure are shown schematically below.
A retrovirus is a particle about 100 nm in diameter. The outer spherical membrane, called the envelope, consists of a lipid bilayer taken from the previous host cell when the virus left it. The projections consist of glycoproteins. A retrovirus contains two identical single-stranded RNA molecules 7-10 kilobases in length.

Schematic Cell Infection, Virus Production, and Virus Structure of HIV

* "Retrovirus in Wikipedia" https://en.wikipedia.org/wiki/Retrovirus


*Attribution: https://en.wikipedia.org/wiki/File:CMVschema.svg





