Sequence types and features ontology (SO)

A structured controlled vocabulary for sequence annotation, for the exchange of annotation data and for the description of sequence objects in databases.



2KB_downstream_variant [SO_0002083]

A sequence variant located within 2KB 3’ of a gene.

2KB_upstream_variant [SO_0001636]

A sequence variant located within 2KB 5’ of a gene.

3_prime_UTR_elongation [SO_0002016]

A sequence variant that causes the extension of the 3’ UTR with regard to the reference sequence.

3_prime_UTR_exon_variant [SO_0002089]

A UTR variant in the exonic sequence of the 3’ UTR. Requested by visze, GitHub tracker ID 346.

3_prime_UTR_intron_variant [SO_0002090]

A UTR variant in the intronic sequence of the 3’ UTR. Requested by visze, GitHub tracker ID 346.

3_prime_UTR_truncation [SO_0002015]

A sequence variant that causes the reduction of the 3’ UTR with regard to the reference sequence.

3_prime_UTR_variant [SO_0001624]

A UTR variant of the 3’ UTR. EBI term: 3prime UTR variations - In 3prime UTR.

3D_polypeptide_structure_variant [SO_0001599]

A sequence variant that changes the resulting polypeptide structure.

4_methylcytosine [SO_0001919]

A cytosine methylated at the nitrogen in position 4 (N4).

5_carboxylcytosine [SO_0001966]

A modified DNA cytosine base feature, modified by a carboxy group at the 5 carbon.

5_formylcytosine [SO_0001961]

A modified DNA cytosine base feature, modified by a formyl group at the 5 carbon.

5_hydroxymethylcytosine [SO_0001960]

A modified DNA cytosine base feature, modified by a hydroxymethyl group at the 5 carbon.

5_methylcytosine [SO_0001918]

A cytosine methylated at the 5 carbon.

5_prime_UTR_elongation [SO_0002014]

A sequence variant that causes the extension of the 5’ UTR with regard to the reference sequence.

5_prime_UTR_exon_variant [SO_0002092]

A UTR variant in the exonic sequence of the 5’ UTR. Requested by visze, GitHub tracker ID 346.

5_prime_UTR_intron_variant [SO_0002091]

A UTR variant in the intronic sequence of the 5’ UTR. Requested by visze, GitHub tracker ID 346.

5_prime_UTR_premature_start_codon_gain_variant [SO_0001988]

A 5’ UTR variant where a premature start codon is gained.

5_prime_UTR_premature_start_codon_loss_variant [SO_0001989]

A 5’ UTR variant where a premature start codon is lost.

5_prime_UTR_premature_start_codon_variant [SO_0001983]

A 5’ UTR variant where a premature start codon is introduced, moved or lost. Requested by Andy Menzies at the Sanger Institute. This is not necessarily a protein-coding change: a premature start codon can affect the production of a mature protein product by providing a competing translation start point. Some genes balance their expression this way; for example, THPO requires the presence of a premature start codon to limit expression, and its loss leads to familial thrombocythemia.

5_prime_UTR_truncation [SO_0002013]

A sequence variant that causes the reduction of the 5’ UTR with regard to the reference sequence.

5_prime_UTR_variant [SO_0001623]

A UTR variant of the 5’ UTR. EBI term: 5prime UTR variations - In 5prime UTR (untranslated region).

500B_downstream_variant [SO_0001634]

A sequence variant located within 500 bp (half a kb) of the end of a gene.

5KB_downstream_variant [SO_0001633]

A sequence variant located within 5 KB of the end of a gene. EBI term Downstream variations - Within 5 kb downstream of the 3prime end of a transcript.

5KB_upstream_variant [SO_0001635]

A sequence variant located within 5KB 5’ of a gene. EBI term Upstream variations - Within 5 kb upstream of the 5prime end of a transcript.
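
The upstream and downstream window terms above can be assigned with a small positional check. The sketch below is illustrative only: the function name `flanking_terms` is hypothetical, it assumes 1-based inclusive gene coordinates and 1 kb = 1,000 bp, and it measures from the gene ends even though the EBI wording refers to transcript ends. It returns every window that contains the variant rather than choosing a single most specific term.

```python
def flanking_terms(pos, gene_start, gene_end, strand):
    """Return the SO flanking-variant terms whose window contains `pos`.

    Assumptions: 1-based inclusive gene coordinates, strand "+" or "-",
    1 kb = 1,000 bp. A variant inside the gene gets no flanking term.
    """
    if gene_start <= pos <= gene_end:
        return []
    # 5' (upstream) and 3' (downstream) are defined relative to the gene strand:
    # on "+" lower coordinates are 5', on "-" lower coordinates are 3'.
    side = "upstream" if (pos < gene_start) == (strand == "+") else "downstream"
    dist = gene_start - pos if pos < gene_start else pos - gene_end
    windows = {
        "upstream": [(2_000, "2KB_upstream_variant"), (5_000, "5KB_upstream_variant")],
        "downstream": [
            (500, "500B_downstream_variant"),
            (2_000, "2KB_downstream_variant"),
            (5_000, "5KB_downstream_variant"),
        ],
    }
    return [term for limit, term in windows[side] if dist <= limit]
```

For a gene at 1000-2000 on the "+" strand, a variant at 4500 lies 2,500 bp past the 3’ end and so falls only in the 5 kb downstream window.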

8_oxoadenine [SO_0001967]

A DNA adenine base modified by an oxo group at the 8 carbon, often the product of DNA damage.

8_oxoguanine [SO_0001965]

A DNA guanine base modified by an oxo group at the 8 carbon, often the product of DNA damage.

A_box_type_1 [SO_0001675]

An A box within an RNA polymerase III type 1 promoter. The A box is found in both type 1 and type 2 pol III promoters, so sub-typing here allows the part_of relationship of each subtype to remain true.

A_box_type_2 [SO_0001676]

An A box within an RNA polymerase III type 2 promoter. The A box is found in both type 1 and type 2 pol III promoters, so sub-typing here allows the part_of relationship of each subtype to remain true.

A_minor_RNA_motif [SO_0000022]

A region forming a motif, composed of adenines, where the minor groove edges are inserted into the minor groove of another helix.

A_to_C_transversion [SO_1000024]

A transversion from adenine to cytosine.

A_to_G_transition [SO_1000015]

A transition of an adenine to a guanine.

A_to_T_transversion [SO_1000025]

A transversion from adenine to thymine.
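
Transitions exchange a purine for a purine (A and G) or a pyrimidine for a pyrimidine (C and T); every other single-base change is a transversion. The sketch below classifies a substitution on that basis. The helper `so_style_term` simply reproduces the `X_to_Y_<class>` naming pattern seen in the terms listed here; treating that pattern as general is an assumption, and both function names are hypothetical.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def substitution_class(ref, alt):
    """Classify a single-base DNA substitution as a transition or transversion."""
    ref, alt = ref.upper(), alt.upper()
    bases = PURINES | PYRIMIDINES
    if ref not in bases or alt not in bases or ref == alt:
        raise ValueError(f"not a single-base substitution: {ref}>{alt}")
    # A transition stays within one chemical class of base.
    if {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES:
        return "transition"
    return "transversion"


def so_style_term(ref, alt):
    """Build a name following the X_to_Y_<class> pattern (assumed, not checked against SO)."""
    return f"{ref.upper()}_to_{alt.upper()}_{substitution_class(ref, alt)}"
```

For example, `so_style_term("A", "G")` gives `A_to_G_transition`, matching SO_1000015 above.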

AACCCT_box [SO_0001901]

A conserved 17-bp sequence (5’-ATCA(C/A)AACCCTAACCCT-3’) that is commonly present upstream of the start site of histone transcription units and functions as a transcription factor binding site.

aberrant_processed_transcript [SO_0000681]

A transcript that has been processed “incorrectly”, for example by the failure of splicing of one or more exons.

accessible_DNA_region [SO_0002331]

A region of DNA that is depleted of nucleosomes and accessible to DNA-binding proteins including transcription factors and nucleases. Added as part of GREEKC terms. See GitHub Issues #531 & #534.

Ace2_UAS [SO_0001857]

A promoter element with consensus sequence CCAGCC, bound by the fungal transcription factor Ace2.

active_peptide [SO_0001064]

Active peptides are biologically active peptides released from a precursor molecule. Hormones, neuropeptides and antimicrobial peptides are examples of active peptides. They are typically short (<40 amino acids) in length.

adaptive_island [SO_0000775]

An adaptive island is a genomic island that provides an adaptive advantage to the host. The iron-uptake ability of many pathogens is conveyed by adaptive islands. Reference: Dobrindt U, Hochhut B, Hentschel U, Hacker J. Genomic islands in pathogenic and environmental microorganisms. Nature Reviews Microbiology 2, 414-424 (2004); doi:10.1038/nrmicro884.

alanine [SO_0001435]

A non-polar, hydrophobic amino acid encoded by the codons GCN (GCT, GCC, GCA and GCG). A placeholder for a cross-product with ChEBI.
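
Codon sets such as GCN use IUPAC nucleotide ambiguity codes (N = any base, R = A/G, Y = C/T, and so on). The sketch below expands a degenerate codon into its concrete codons; the function name `expand_codon` is hypothetical. The same helper covers the other amino acid entries in this list, e.g. arginine as `expand_codon("CGN") + expand_codon("AGR")`.

```python
from itertools import product

# IUPAC nucleotide ambiguity codes (DNA alphabet).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def expand_codon(pattern):
    """Expand a degenerate codon such as 'GCN' into every concrete codon."""
    choices = (IUPAC[code] for code in pattern.upper())
    return ["".join(bases) for bases in product(*choices)]
```

`expand_codon("GCN")` returns `['GCA', 'GCC', 'GCG', 'GCT']`, the four alanine codons.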

alanine_tRNA_primary_transcript [SO_0000211]

A primary transcript encoding alanyl tRNA.

alanyl_tRNA [SO_0000254]

A tRNA sequence that has an alanine anticodon, and a 3’ alanine binding region.

allele [SO_0001023]

An allele is one of a set of coexisting sequence variants of a gene.

allelic_frequency [SO_0002119]

A physical quality which inheres in the allele by virtue of the number of instances of the allele within a population. This is the relative frequency of the allele at a given locus in a population. Requested by the HL7 clinical genomics group.
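
As a relative frequency, the value is the count of one allele divided by the total number of allele copies observed at the locus. The sketch below assumes genotypes are given as per-individual tuples of alleles; the function name and input shape are illustrative assumptions.

```python
from collections import Counter


def allele_frequencies(genotypes):
    """Relative frequency of each allele at one locus.

    `genotypes` is a list of per-individual allele tuples, e.g. ("A", "G")
    for a diploid heterozygote; every allele copy is counted once.
    """
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no alleles observed")
    return {allele: n / total for allele, n in counts.items()}
```

With genotypes AA, AG, AA and GG there are 5 A copies and 3 G copies out of 8, so the frequencies are 0.625 and 0.375.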

allelically_excluded [SO_0000137]

Allelic exclusion is a process occurring in diploid organisms whereby one allele of a gene is inactivated and not expressed in a given cell. Examples are X-inactivation and immunoglobulin formation.

allelically_excluded_gene [SO_0000897]

A gene that is allelically_excluded.

allopolyploid [SO_0001256]

A polyploid whose multiple chromosome sets were derived from different organisms.

alpha_beta_motif [SO_0100008]

A motif of five consecutive residues and two H-bonds in which: there is an H-bond between the CO of residue(i) and the NH of residue(i+4), an H-bond between the CO of residue(i) and the NH of residue(i+3), and the phi angles of residues (i+1), (i+2) and (i+3) are negative.

alteration_attribute [SO_0001508]

An attribute of alteration of one or more chromosomes.

alternate_sequence_site [SO_0001149]

Description of sequence variants produced by alternative splicing, alternative promoter usage, alternative initiation and ribosomal frameshifting. Discrete.

alternatively_spliced [SO_0000877]

An attribute describing a situation where a gene may encode more than one transcript.

alternatively_spliced_transcript [SO_1001187]

A transcript that is alternatively spliced.

Alu_deletion [SO_0002070]

A deletion of an Alu mobile element with respect to a reference.

Alu_insertion [SO_0002063]

An insertion of sequence from the Alu family of mobile elements.

ambisense_ssRNA_viral_sequence [SO_0001202]

An ambisense ssRNA viral sequence is an ss_RNA_viral_sequence: the sequence of a single-stranded RNA virus with both messenger and anti-messenger polarity.

amino_acid_deletion [SO_0001604]

A sequence variant within a CDS resulting in the loss of an amino acid from the resulting polypeptide.

amino_acid_insertion [SO_0001605]

A sequence variant within a CDS resulting in the gain of an amino acid in the resulting polypeptide.

amino_acid_substitution [SO_0001606]

A sequence variant of a codon resulting in the substitution of one amino acid for another in the resulting polypeptide.

amplification_origin [SO_0000750]

An origin_of_replication that is used for the amplification of a chromosomal nucleic acid sequence.

anchor_binding_site [SO_0000977]

Part of an edited transcript only.

anchor_region [SO_0000931]

A region of a guide_RNA that base-pairs to a target mRNA.

androgen_response_element [SO_0001853]

A non-palindromic sequence found in the promoters of genes whose expression is regulated in response to androgen.

aneuploid_chromosome [SO_0000550]

A chromosome structural variation whereby a chromosome is either present in addition to the normal chromosome complement or missing from it. Examples are nullo-4, haplo-4 and triplo-4 in Drosophila.

annotation_directed_improved_draft [SO_0001489]

The status of a whole-genome sequence where annotation and verification of coding regions have occurred.

anti_ARRET [SO_0001926]

A non-coding RNA transcript derived from the transcription of the telomere. These transcripts are antisense to ARRET transcripts. Telomeric transcription has been documented in mammals, birds, fish, plants and yeast. Requested by Antonia Lock, October 2012.

anticodon [SO_0001174]

A sequence of three nucleotide bases in tRNA which recognizes a codon in mRNA.
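
Because the anticodon pairs antiparallel with the codon, its 5’-to-3’ sequence is the reverse complement of the codon. The sketch below assumes strict Watson-Crick pairing and ignores wobble pairing and modified bases such as inosine, so it is a simplification rather than a tRNA gene predictor; the function name is hypothetical.

```python
# Watson-Crick complements in the RNA alphabet.
RNA_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def anticodon_for(codon):
    """Return the 5'->3' anticodon that pairs with a 5'->3' codon.

    Strict Watson-Crick pairing only: wobble pairing and modified bases
    (e.g. inosine) are ignored.
    """
    codon = codon.upper().replace("T", "U")
    if len(codon) != 3 or set(codon) - set("ACGU"):
        raise ValueError(f"not a codon: {codon}")
    # Complement each base, then reverse because the strands are antiparallel.
    return codon.translate(RNA_COMPLEMENT)[::-1]
```

For the alanine codon GCU this gives the anticodon AGC, and for the start codon AUG it gives CAU.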

anticodon_loop [SO_0001173]

A sequence of seven nucleotide bases in tRNA which contains the anticodon. It has the sequence 5’-pyrimidine-purine-anticodon-modified purine-any base-3’.

antiparallel_beta_strand [SO_0001112]

A peptide region that is hydrogen-bonded to another peptide region running in the opposite direction (one running N-terminal to C-terminal and the other C-terminal to N-terminal). Hydrogen bonding occurs between every other C=O on one strand and every other N-H on the adjacent strand. In this case, if two atoms C-alpha (i) and C-alpha (j) are adjacent in two hydrogen-bonded beta strands, then residues i and j form two mutual backbone hydrogen bonds to each other’s flanking peptide groups; this is known as a close pair of hydrogen bonds. The peptide backbone dihedral angles (phi, psi) are about (-140 degrees, 135 degrees) in antiparallel sheets. Range.

antisense_lncRNA [SO_0001904]

Non-coding RNA transcribed from the opposite DNA strand to other transcripts, which overlaps in part with sense RNA. Relationship is_a SO:0000644 antisense_RNA added 23 April 2021. See GitHub Issue #443.

antisense_primary_transcript [SO_0000645]

The reverse complement of the primary transcript.

AP_1_binding_site [SO_0001842]

A promoter element with consensus sequence TGACTCA, bound by AP-1 and related transcription factors.

apicoplast_chromosome [SO_0001259]

A chromosome originating in an apicoplast.

apicoplast_gene [SO_0000091]

A gene from apicoplast sequence.

apicoplast_sequence [SO_0000743]

DNA belonging to the genome of an apicoplast, a non-photosynthetic plastid.

aptamer [SO_0000031]

A DNA or RNA molecule selected from a random pool based on its ability to bind other molecules.

archaeal_intron [SO_1001271]

An intron characteristic of archaeal tRNA and rRNA genes, where the intron transcript forms a bulge-helix-bulge motif that is recognised by a splicing endoribonuclease. Intron characteristic of tRNA genes; spliced by an endonuclease-ligase mediated mechanism.

archaeosine [SO_0001323]

Archaeosine is a modified 7-deazaguanosine.

arginine [SO_0001451]

A positively charged, hydrophilic amino acid encoded by the codons CGN (CGT, CGC, CGA and CGG), AGA and AGG. A placeholder for a cross-product with ChEBI.

arginine_tRNA_primary_transcript [SO_0000212]

A primary transcript encoding arginyl tRNA (SO:0000255).

arginyl_tRNA [SO_0001036]

A tRNA sequence that has an arginine anticodon, and a 3’ arginine binding region.

ARIA [SO_0001925]

A non-coding RNA transcript derived from the transcription of the telomere. These transcripts consist of C-rich repeats. Telomeric transcription has been documented in mammals, birds, fish, plants and yeast. Requested by Antonia Lock, October 2012.

ARRET [SO_0001924]

A non-coding RNA transcript complementary to the subtelomeric tract of the TERRA transcript but devoid of the repeats. Telomeric transcription has been documented in mammals, birds, fish, plants and yeast. Requested by Antonia Lock, October 2012.

ARS [SO_0000436]

A sequence that can autonomously replicate, as a plasmid, when transformed into a bacterial host.

ARS_consensus_sequence [SO_0002004]

The ACS is an 11-bp sequence of the form 5’-WTTTAYRTTTW-3’ which is at the core of every yeast ARS, and is necessary but not sufficient for recognition and binding by the origin recognition complex (ORC). Functional ARSs require an ACS, as well as other cis elements in the 5’ (C domain) and 3’ (B domain) flanking sequences of the ACS.
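
The ACS consensus 5’-WTTTAYRTTTW-3’ is written in IUPAC codes (W = A/T, Y = C/T, R = A/G), so it translates directly into a regular expression. The sketch below scans both strands by also searching for the reverse complement of the pattern; the function names are hypothetical, and a consensus match alone does not establish a functional ARS, which also needs its B and C flanking domains.

```python
import re

IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T", "R": "AG", "Y": "CT",
    "W": "AT", "S": "CG", "K": "GT", "M": "AC", "N": "ACGT",
}
# Complement of each IUPAC code (W, S and N are self-complementary).
IUPAC_COMPLEMENT = str.maketrans("ACGTRYWSKMN", "TGCAYRWSMKN")


def to_regex(pattern):
    """Turn an IUPAC pattern into a regex, e.g. 'WTTTA' -> '[AT]TTTA'."""
    return "".join(f"[{IUPAC[c]}]" if len(IUPAC[c]) > 1 else c for c in pattern)


def find_motif(seq, pattern):
    """Return sorted (0-based start, strand) pairs for matches on either strand."""
    seq, pattern = seq.upper(), pattern.upper()
    rc_pattern = pattern.translate(IUPAC_COMPLEMENT)[::-1]
    hits = set()
    for strand, pat in (("+", pattern), ("-", rc_pattern)):
        # A zero-width lookahead lets overlapping matches be reported.
        for match in re.finditer(f"(?={to_regex(pat)})", seq):
            hits.add((match.start(), strand))
    return sorted(hits)
```

Minus-strand hits are reported at the leftmost coordinate on the given (+) strand.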

asparagine [SO_0001449]

A polar, hydrophilic amino acid encoded by the codons AAT and AAC. A placeholder for a cross-product with ChEBI.

asparagine_tRNA_primary_transcript [SO_0000213]

A primary transcript encoding asparaginyl tRNA (SO:0000256).

asparaginyl_tRNA [SO_0000256]

A tRNA sequence that has an asparagine anticodon, and a 3’ asparagine binding region.

aspartic_acid [SO_0001453]

A negatively charged, hydrophilic amino acid encoded by the codons GAT and GAC. A placeholder for a cross-product with ChEBI.

aspartic_acid_tRNA_primary_transcript [SO_0000214]

A primary transcript encoding aspartyl tRNA (SO:0000257).

aspartyl_tRNA [SO_0000257]

A tRNA sequence that has an aspartic acid anticodon, and a 3’ aspartic acid binding region.

ASPE_primer [SO_0001698]

A primer containing an SNV at the 3’ end for accurate genotyping.

assembly_error_correction [SO_0001525]

A region of sequence where the final nucleotide assignment differs from the original assembly due to an improvement that replaces a mistake.

assortment_derived_aneuploid [SO_0000803]

A multi-chromosome aberration generated by reassortment of other aberration components; presumed to have a deficiency or a duplication.

assortment_derived_deficiency [SO_0000802]

A multi-chromosome deficiency aberration generated by reassortment of other aberration components.

assortment_derived_deficiency_plus_duplication [SO_0000801]

A multi-chromosome aberration generated by reassortment of other aberration components; presumed to have a deficiency and a duplication.

assortment_derived_duplication [SO_0000800]

A multi-chromosome duplication aberration generated by reassortment of other aberration components.

assortment_derived_variation [SO_0001504]

A chromosome variation derived from an event during meiosis.

asx_motif [SO_0001106]

A motif of five consecutive residues and two H-bonds in which: Residue(i) is Aspartate or Asparagine (Asx), side-chain O of residue(i) is H-bonded to the main-chain NH of residue(i+2) or (i+3), main-chain CO of residue(i) is H-bonded to the main-chain NH of residue(i+3) or (i+4).

asx_turn [SO_0000912]

A motif of three consecutive residues and one H-bond in which: residue(i) is Aspartate or Asparagine (Asx), the side-chain O of residue(i) is H-bonded to the main-chain NH of residue(i+2).

asx_turn_left_handed_type_one [SO_0001129]

Left handed type I (dihedral angles): Residue(i): -140 degrees < (chi1 - 120 degrees) < -20 degrees, -90 degrees < (psi + 120 degrees) < +40 degrees. Residue(i+1): -140 degrees < phi < -20 degrees, -90 degrees < psi < +40 degrees.

asx_turn_left_handed_type_two [SO_0001130]

Left handed type II (dihedral angles): Residue(i): -140 degrees < (chi1 - 120 degrees) < -20 degrees, +80 degrees < (psi + 120 degrees) < +180 degrees. Residue(i+1): +20 degrees < phi < +140 degrees, -40 degrees < psi < +90 degrees.

asx_turn_right_handed_type_one [SO_0001132]

Right handed type I (dihedral angles): Residue(i): -140 degrees < (chi1 - 120 degrees) < -20 degrees, -90 degrees < (psi + 120 degrees) < +40 degrees. Residue(i+1): -140 degrees < phi < -20 degrees, -90 degrees < psi < +40 degrees.

asx_turn_right_handed_type_two [SO_0001131]

Right handed type II (dihedral angles): Residue(i): -140 degrees < (chi1 - 120 degrees) < -20 degrees, +80 degrees < (psi + 120 degrees) < +180 degrees. Residue(i+1): +20 degrees < phi < +140 degrees, -40 degrees < psi < +90 degrees.

asymmetric_RNA_internal_loop [SO_0000021]

An internal RNA loop where one of the strands includes more bases than the corresponding region on the other strand.

attB_site [SO_0000943]

An integration/excision site of a bacterial chromosome at which a recombinase acts to insert foreign DNA containing a cognate integration/excision site.

attC_site [SO_0000950]

An attC site is a sequence required for the integration of DNA (a gene cassette) into an integron.

attCtn_site [SO_0001043]

An attachment site located on a conjugative transposon and used for site-specific integration of a conjugative transposon.

attenuator [SO_0000140]

A sequence segment located within the five prime end of an mRNA that causes premature termination of translation.

attI_site [SO_0000367]

A region within an integron, adjacent to an integrase, at which site specific recombination involving an attC_site takes place.

attL_site [SO_0000944]

A region that results from recombination between attP_site and attB_site, composed of the 5’ portion of attB_site and the 3’ portion of attP_site.

attP_site [SO_0000942]

An integration/excision site of a phage chromosome at which a recombinase acts to insert the phage DNA at a cognate integration/excision site on a bacterial chromosome.

attR_site [SO_0000945]

A region that results from recombination between attP_site and attB_site, composed of the 5’ portion of attP_site and the 3’ portion of attB_site.
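
The attL and attR definitions are two halves of the same exchange: each product joins the 5’ part of one parent site to the 3’ part of the other. The sketch below models this with plain strings under a deliberately simplified assumption, namely a single crossover at the 5’ edge of a core sequence shared by attB and attP; real integrase strand exchange is staggered within the core. The function name and the placeholder sequences are hypothetical.

```python
def recombine_att(attB, attP, core):
    """Return (attL, attR) for a single crossover at the start of the shared core.

    Simplified model (assumption): `core` occurs exactly once in each site,
    and strand exchange happens at the 5' edge of the core.
    """
    if attB.count(core) != 1 or attP.count(core) != 1:
        raise ValueError("core must occur exactly once in each site")
    b, p = attB.index(core), attP.index(core)
    attL = attB[:b] + attP[p:]  # 5' part of attB + 3' part of attP
    attR = attP[:p] + attB[b:]  # 5' part of attP + 3' part of attB
    return attL, attR
```

With `attB = "b5" + "CORE" + "b3"` and `attP = "p5" + "CORE" + "p3"`, the products are `b5COREp3` (attL) and `p5COREb3` (attR).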

AUG_initiated_uORF [SO_0002150]

A uORF beginning with the canonical start codon AUG.

autocatalytically_spliced_intron [SO_0000588]

A self-spliced intron.

autoregulated [SO_0000471]

The gene product is involved in its own transcriptional regulation.

autosynaptic_chromosome [SO_1000136]

An autosynaptic chromosome is the aneuploid product of recombination between a pericentric inversion and a cytologically wild-type chromosome.

B_box [SO_0000620]

A variably distant linear promoter region with consensus sequence AGGTTCCAnnCC, recognized and bound by TFIIIC.

BAC [SO_0000153]

Bacterial Artificial Chromosome, a cloning vector that can be propagated as a mini-chromosome in a bacterial host. This term is mapped to MGED. Do not obsolete without consulting MGED ontology.

BAC_cloned_genomic_insert [SO_0000992]

A region of DNA that has been inserted into the bacterial genome using a bacterial artificial chromosome. Requested by Andy Schroder - Flybase Harvard, Nov 2006.

BAC_end [SO_0000999]

A region of sequence from the end of a BAC clone that may provide a highly specific marker. Requested by Keith Boroevich December, 2006.

BAC_read_contig [SO_0001866]

A contig of BAC reads. Requested by Bayer Cropscience December, 2011.

bacterial_RNApol_promoter [SO_0000613]

A DNA sequence to which bacterial RNA polymerase binds, to begin transcription. Former parent RNA_polymerase_promoter (SO:0001203) was merged with promoter (SO:0000167) in Aug 2020 as part of GREEKC.

bacterial_RNApol_promoter_sigma_70 [SO_0001671]

A DNA sequence to which bacterial RNA polymerase sigma 70 binds, to begin transcription.

bacterial_RNApol_promoter_sigma_ecf_element [SO_0001913]

A bacterial promoter with sigma ECF factor binding dependency. This type of bacterial promoter requires a sigma ECF factor to bind to identified -10 and -35 sequence regions in order to mediate binding of the RNA polymerase to the promoter region as part of transcription initiation. Requested by Kevin Clancy, Invitrogen, May 2012.

bacterial_RNApol_promoter_sigma54 [SO_0001672]

A DNA sequence to which bacterial RNA polymerase sigma 54 binds, to begin transcription.

bacterial_terminator [SO_0000614]

A terminator signal for bacterial transcription. Moved to transcriptional_cis_regulatory_region (SO:0001055) from gene_group_regulatory_region (SO:0000752) on 11 Feb 2021 when SO:0000752 was merged into SO:0001055. See GitHub Issue #529.

base_call_error_correction [SO_0001526]

A region of sequence where the final nucleotide assignment is different from that given by the base caller due to an improvement that replaces a mistake.

base_pair [SO_0000028]

Two bases paired opposite each other by hydrogen bonds creating a secondary structure.

benign_variant [SO_0001770]

A variant that does not affect the function of the gene or cause disease.

beta_bulge [SO_0001107]

A motif of three residues within a beta-sheet in which the main chains of two consecutive residues are H-bonded to that of the third, and in which the dihedral angles are as follows: Residue(i): -140 degrees < phi < -20 degrees, -90 degrees < psi < 40 degrees. Residue (i+1): -180 degrees < phi < -25 degrees or +120 degrees < phi < +180 degrees, +40 degrees < psi < +180 degrees or -180 degrees < psi < -120 degrees.

beta_bulge_loop [SO_0001108]

A motif of three residues within a beta-sheet consisting of two H-bonds. Beta bulge loops often occur at the loop ends of beta-hairpins.

beta_bulge_loop_five [SO_0001109]

A motif of three residues within a beta-sheet consisting of two H-bonds in which: the main-chain NH of residue(i) is H-bonded to the main-chain CO of residue(i+4), the main-chain CO of residue i is H-bonded to the main-chain NH of residue(i+3), these loops have an RL nest at residues i+2 and i+3.

beta_bulge_loop_six [SO_0001110]

A motif of three residues within a beta-sheet consisting of two H-bonds in which: the main-chain NH of residue(i) is H-bonded to the main-chain CO of residue(i+5), the main-chain CO of residue i is H-bonded to the main-chain NH of residue(i+4), these loops have an RL nest at residues i+3 and i+4.

beta_turn [SO_0001133]

A motif of four consecutive residues that may contain one H-bond, which, if present, is between the main-chain CO of the first residue and the main-chain NH of the fourth. It is characterized by the dihedral angles of the second and third residues, which are the basis for sub-categorization.

beta_turn_left_handed_type_one [SO_0001134]

Left handed type I: A motif of four consecutive residues that may contain one H-bond, which, if present, is between the main-chain CO of the first residue and the main-chain NH of the fourth. It is characterized by the dihedral angles:- Residue(i+1): -140 degrees > phi > -20 degrees, -90 degrees > psi > +40 degrees. Residue(i+2): -140 degrees > phi > -20 degrees, -90 degrees > psi > +40 degrees.

beta_turn_left_handed_type_two [SO_0001135]

Left handed type II: A motif of four consecutive residues that may contain one H-bond, which, if present, is between the main-chain CO of the first residue and the main-chain NH of the fourth. It is characterized by the dihedral angles: Residue(i+1): -140 degrees > phi > -20 degrees, +80 degrees > psi > +180 degrees. Residue(i+2): +20 degrees > phi > +140 degrees, -40 degrees > psi > +90 degrees.

beta_turn_right_handed_type_one [SO_0001136]

Right handed type I: A motif of four consecutive residues that may contain one H-bond, which, if present, is between the main-chain CO of the first residue and the main-chain NH of the fourth. It is characterized by the dihedral angles: Residue(i+1): -140 degrees < phi < -20 degrees, -90 degrees < psi < +40 degrees. Residue(i+2): -140 degrees < phi < -20 degrees, -90 degrees < psi < +40 degrees.

beta_turn_right_handed_type_two [SO_0001137]

Right handed type II: A motif of four consecutive residues that may contain one H-bond, which, if present, is between the main-chain CO of the first residue and the main-chain NH of the fourth. It is characterized by the dihedral angles: Residue(i+1): -140 degrees < phi < -20 degrees, +80 degrees < psi < +180 degrees. Residue(i+2): +20 degrees < phi < +140 degrees, -40 degrees < psi < +90 degrees.
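
The right-handed type I and type II beta turn definitions reduce to interval checks on the (phi, psi) angles of residues i+1 and i+2. The sketch below applies only those angle ranges; it does not check the optional i to i+3 H-bond, treats the bounds as strict, and assumes angles are given in degrees in the range -180 to +180. The function names are hypothetical.

```python
def in_range(x, lo, hi):
    """Strict open-interval test, matching the '<' bounds in the definitions."""
    return lo < x < hi


def classify_beta_turn(phi1, psi1, phi2, psi2):
    """Classify a turn from (phi, psi) of residues i+1 and i+2, in degrees.

    Only the right-handed type I and type II angle ranges are checked.
    """
    if in_range(phi1, -140, -20) and in_range(psi1, -90, 40):
        if in_range(phi2, -140, -20) and in_range(psi2, -90, 40):
            return "beta_turn_right_handed_type_one"
    if in_range(phi1, -140, -20) and in_range(psi1, 80, 180):
        if in_range(phi2, 20, 140) and in_range(psi2, -40, 90):
            return "beta_turn_right_handed_type_two"
    return None
```

Typical type I angles such as (-60, -30) and (-90, 0) fall in the first branch; typical type II angles such as (-60, 120) and (80, 0) fall in the second.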

beta_turn_type_eight [SO_0001155]

A motif of four consecutive peptide residues that may contain one H-bond, which, if present, is between the main-chain CO of the first residue and the main-chain NH of the fourth and is characterized by the dihedral angles: Residue(i+1): phi ~ -60 degrees, psi ~ -30 degrees. Residue(i+2): phi ~ -120 degrees, psi ~ 120 degrees.

beta_turn_type_six [SO_0001150]

A motif of four consecutive peptide residues of type VIa or type VIb and where the i+2 residue is cis-proline.

beta_turn_type_six_a [SO_0001151]

A motif of four consecutive peptide residues, of which the i+2 residue is proline, and that may contain one H-bond, which, if present, is between the main-chain CO of the first residue and the main-chain NH of the fourth and is characterized by the dihedral angles: Residue(i+1): phi ~ -60 degrees, psi ~ 120 degrees. Residue(i+2): phi ~ -90 degrees, psi ~ 0 degrees.

beta_turn_type_six_a_one [SO_0001152]

A type VIa beta turn with the following phi and psi angles on amino acid residues 2 and 3: phi-2 = -60 degrees, psi-2 = 120 degrees, phi-3 = -90 degrees, psi-3 = 0 degrees.

beta_turn_type_six_a_two [SO_0001153]

A type VIa beta turn with the following phi and psi angles on amino acid residues 2 and 3: phi-2 = -120 degrees, psi-2 = 120 degrees, phi-3 = -60 degrees, psi-3 = 0 degrees.

beta_turn_type_six_b [SO_0001154]

A motif of four consecutive peptide residues, of which the i+2 residue is proline, and that may contain one H-bond, which, if present, is between the main-chain CO of the first residue and the main-chain NH of the fourth and is characterized by the dihedral angles: Residue(i+1): phi ~ -120 degrees, psi ~ 120 degrees. Residue(i+2): phi ~ -60 degrees, psi ~ 0 degrees.

bidirectional_gene_fusion [SO_0002086]

A sequence variant whereby two genes on opposite strands have become joined. Requested by the SnpEff team, Feb 2016.

bidirectional_promoter [SO_0000568]

A promoter that can allow for transcription in both directions. Definition updated in Aug 2020 by Dave Sant.

binding_site [SO_0000409]

A biological_region of sequence that, in the molecule, interacts selectively and non-covalently with other molecules. A region on the surface of a molecule that may interact with another molecule. When applied to polypeptides: Amino acids involved in binding or interactions. It can also apply to an amino acid bond which is represented by the positions of the two flanking amino acids. See GO:0005488 : binding.

biochemical_region_of_peptide [SO_0100001]

A region of a peptide that is involved in a biochemical function. Range.

biological_region [SO_0001411]

A region defined by its disposition to be involved in a biological process.

biomaterial_region [SO_0001409]

A region which is intended for use in an experiment.

bipartite_duplication [SO_1000149]

An interchromosomal mutation whereby the (large) region between the first two breaks listed is lost, and the two flanking segments (one of them centric) are joined as a translocation to the free ends resulting from the third break.

bipartite_inversion [SO_1000151]

A chromosomal inversion caused by three breaks in the same chromosome; both central segments are inverted in place (i.e., they are not transposed).

blocked_reading_frame [SO_0000718]

A reading_frame that is interrupted by one or more stop codons; usually identified through inter-genomic sequence comparisons. Term requested by Rama from SGD.

blunt_end_restriction_enzyme_cleavage_junction [SO_0001693]

A restriction enzyme cleavage site where both strands are cut at the same position.

blunt_end_restriction_enzyme_cleavage_site [SO_0001691]

A restriction enzyme recognition site that, when cleaved, results in no overhangs.

bound_by_factor [SO_0000277]

An attribute describing a sequence that is bound by another molecule. Formerly called transcript_by_bound_factor.

bound_by_nucleic_acid [SO_0000876]

An attribute describing a sequence that is bound by a nucleic acid.

bound_by_protein [SO_0000875]

An attribute describing a sequence that is bound by a protein.

boundary_element [SO_0002020]

Boundary elements are DNA motifs that prevent heterochromatin from spreading into neighboring euchromatic regions. Requested by Antonia Lock. Insulator is included as a related synonym since this is used to refer to insulator in the literature (NCBI:cf).

branch_site [SO_0000611]

A pyrimidine-rich sequence near the 3’ end of an intron to which the 5’ end becomes covalently bound during nuclear splicing. The resulting structure resembles a lariat.

BREd_motif [SO_0001663]

A core RNA polymerase II promoter element with consensus (G/A)T(T/G/A)(T/A)(G/T)(T/G)(T/G).

BREu_motif [SO_0000016]

A sequence element characteristic of some RNA polymerase II promoters, located immediately upstream of some TATA box elements at -37 to -32 with respect to the TSS (+1). Consensus sequence is (G|C)(G|C)(G|A)CGCC. Binds TFIIB.

Bruno_response_element [SO_0001181]

A cis-acting element found in the 3’ UTR of some mRNA which is bound by the Drosophila Bruno protein and its homologs. Not to be confused with BREu_motif (SO:0000016), which binds transcription factor II B.

C_box [SO_0000622]

An RNA polymerase III type 1 promoter element with consensus sequence CAnnCCn.

C_cluster [SO_0000558]

Genomic DNA of immunoglobulin/T-cell receptor gene including more than one C-gene.

C_D_box_snoRNA [SO_0000593]

Most box C/D snoRNAs also contain long (>10 nt) sequences complementary to rRNA. Boxes C and D, as well as boxes C’ and D’, are usually located in close proximity, and form a structure known as the box C/D motif. This motif is important for snoRNA stability, processing, nucleolar targeting and function. A small number of box C/D snoRNAs are involved in rRNA processing; most, however, are known or predicted to serve as guide RNAs in ribose methylation of rRNA. Targeting involves direct base pairing of the snoRNA at the rRNA site to be modified and selection of an rRNA nucleotide a fixed distance from box D or D’.

C_D_box_snoRNA_encoding [SO_0000585]

snoRNA that is associated with guiding methylation of nucleotides. It contains two short conserved sequence motifs: C (RUGAUGA) near the 5-prime end and D (CUGA) near the 3-prime end.
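
The two box motifs can be located with exact pattern matching: box C (RUGAUGA, where R = A/G) near the 5’ end and box D (CUGA) near the 3’ end. The sketch below takes the 5’-most box C and the 3’-most box D and requires C to come first. It is only an illustration under that assumption: real boxes often tolerate mismatches, and the function name is hypothetical.

```python
import re


def find_cd_boxes(snorna):
    """Return 0-based (box C start, box D start), or None if not found in order.

    Exact matches only: box C = RUGAUGA (R = A/G), box D = CUGA.
    DNA input is accepted and converted to the RNA alphabet.
    """
    s = snorna.upper().replace("T", "U")
    c_hits = [m.start() for m in re.finditer(r"(?=[AG]UGAUGA)", s)]
    d_hits = [m.start() for m in re.finditer(r"(?=CUGA)", s)]
    if not c_hits or not d_hits:
        return None
    c_box, d_box = c_hits[0], d_hits[-1]  # 5'-most C box, 3'-most D box
    return (c_box, d_box) if c_box < d_box else None
```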

C_D_box_snoRNA_primary_transcript [SO_0000595]

A primary transcript encoding a small nucleolar RNA of the box C/D family.

C_gene_segment [SO_0000478]

Genomic DNA of immunoglobulin/T-cell receptor gene including C-region (and introns if present) with 5’ UTR (SO:0000204) and 3’ UTR (SO:0000205).

C_region [SO_0001834]

The constant region of an immunoglobulin polypeptide sequence.

c_terminal_region [SO_0100015]

The more polar, carboxy-terminal region of the signal peptide (approx 3-7 aa).

C_to_A_transversion [SO_1000019]

A transversion from cytosine to adenine.

C_to_G_transversion [SO_1000020]

A transversion from cytosine to guanine.

C_to_T_transition [SO_1000011]

A transition from cytosine to thymine.

C_to_T_transition_at_pCpG_site [SO_1000012]

The transition of cytosine to thymine occurring at a pCpG site as a consequence of the spontaneous deamination of 5-methylcytosine.
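
Whether a C-to-T change sits at a CpG can be read from the base immediately 3’ of the cytosine. Because a CpG is palindromic, a G-to-A change on the given strand is the same event seen from the other strand. The sketch below checks sequence context only; it cannot tell whether the cytosine was actually methylated. The function name and the 0-based coordinate convention are assumptions.

```python
def is_cpg_c_to_t(seq, pos, ref, alt):
    """True if a substitution at 0-based `pos` of the + strand is a C>T at a CpG.

    A G>A on the + strand also counts, because it is a C>T on the - strand.
    Only sequence context is checked, not methylation status.
    """
    seq, ref, alt = seq.upper(), ref.upper(), alt.upper()
    if seq[pos] != ref:
        raise ValueError("reference base does not match the sequence")
    if (ref, alt) == ("C", "T"):
        return pos + 1 < len(seq) and seq[pos + 1] == "G"
    if (ref, alt) == ("G", "A"):
        return pos > 0 and seq[pos - 1] == "C"
    return False
```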

CAGE_cluster [SO_0001917]

A kind of transcription_initiation_cluster defined by the clustering of CAGE tags on a sequence region.

CAGE_tag [SO_0001916]

A CAGE tag is a sequence tag that corresponds to 5’ ends of mRNA at cap sites, produced by cap analysis gene expression and used to identify transcriptional start sites.

candidate_gene [SO_0001867]

A gene suspected of being involved in the expression of a trait. Requested by Bayer Cropscience December, 2011.

canonical_five_prime_splice_site [SO_0000677]

The canonical 5’ splice site has the sequence “GT”.

canonical_three_prime_splice_site [SO_0000676]

The canonical 3’ splice site has the sequence “AG”.
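
Taken together, the two canonical splice sites mean that a canonical intron begins with GT and ends with AG on the transcribed strand (the GT-AG rule). A minimal sketch of that check, assuming 0-based, half-open intron coordinates on the plus strand of a genomic string; the function names are hypothetical, and introns with non-canonical borders such as GC-AG are simply reported as non-canonical.

```python
def has_canonical_splice_sites(intron: str) -> bool:
    """True if the intron starts with the canonical 5' splice site GT and ends
    with the canonical 3' splice site AG (DNA alphabet, transcribed strand)."""
    intron = intron.upper()
    return len(intron) >= 4 and intron.startswith("GT") and intron.endswith("AG")


def intron_is_canonical(genome: str, start: int, end: int) -> bool:
    """Check the plus-strand intron genome[start:end] (0-based, half-open)."""
    return has_canonical_splice_sites(genome[start:end])


genome = "CCCAGGTAAGTTTTCTTTCAGGTG"
print(intron_is_canonical(genome, 5, 21))            # True
print(has_canonical_splice_sites("GCAAGTCCCTAG"))    # False
```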

cap [SO_0000581]

A structure consisting of a 7-methylguanosine in 5’-5’ triphosphate linkage with the first nucleotide of an mRNA. It is added post-transcriptionally, and is not encoded in the DNA.

capped [SO_0000146]

An attribute describing a sequence, usually an mRNA, that is capped by the addition of a modified guanine nucleotide at the 5’ end.

capped_mRNA [SO_0000862]

An mRNA that is capped.

capped_primary_transcript [SO_0000861]

A primary transcript that is capped.

CArG_box [SO_0002156]

A promoter element bound by the MADS family of transcription factors with consensus 5’-(C/T)TA(T/A)4TA(G/A)-3’. Requested by Antonia Lock.

cassette_array_member [SO_0005847]

A gene that is a member of a gene cassette, which is a mobile genetic element.

cassette_pseudogene [SO_0001434]

A cassette pseudogene is a kind of gene in an inactive form which may recombine at a telomeric locus to form a functional copy. Requested by the Trypanosome community.

catalytic_residue [SO_0001104]

Amino acid involved in the activity of an enzyme. Discrete.

catmat_left_handed_four [SO_0100005]

A motif of 4 consecutive residues with dihedral angles as follows: res i: phi -90 (bounds -120 to -60), psi -10 (bounds -50 to 30); res i+1: phi -90 (bounds -120 to -60), psi -10 (bounds -50 to 30); res i+2: phi -75 (bounds -100 to -50), psi 140 (bounds 110 to 170). As in the three-residue motif, the O-to-O distance is further restricted to less than 5 Angstroms; in this case the two oxygen atoms are the main-chain carbonyl oxygen atoms of residues i-1 and i+2.

catmat_left_handed_three [SO_0100004]

A motif of 3 consecutive residues with dihedral angles as follows: res i: phi -90 (bounds -120 to -60), psi -10 (bounds -50 to 30); res i+1: phi -75 (bounds -100 to -50), psi 140 (bounds 110 to 170). An extra restriction on the O-to-O distance would be useful, requiring it to be less than 5 Angstroms; more precisely, the two oxygen atoms are the main-chain carbonyl oxygen atoms of residues i-1 and i+1.

catmat_right_handed_four [SO_0100007]

A motif of 4 consecutive residues with dihedral angles as follows: res i: phi -90 (bounds -120 to -60), psi -10 (bounds -50 to 30); res i+1: phi -90 (bounds -120 to -60), psi -10 (bounds -50 to 30); res i+2: phi -75 (bounds -100 to -50), psi 140 (bounds 110 to 170). As in the three-residue motif, the O-to-O distance is further restricted to less than 5 Angstroms; in this case the two oxygen atoms are the main-chain carbonyl oxygen atoms of residues i-1 and i+2.

catmat_right_handed_three [SO_0100006]

A motif of 3 consecutive residues with dihedral angles as follows: res i: phi -90 (bounds -120 to -60), psi -10 (bounds -50 to 30); res i+1: phi -75 (bounds -100 to -50), psi 140 (bounds 110 to 170). An extra restriction on the O-to-O distance would be useful, requiring it to be less than 5 Angstroms; more precisely, the two oxygen atoms are the main-chain carbonyl oxygen atoms of residues i-1 and i+1.
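
A minimal Python sketch of how the dihedral-angle windows in the catmat definitions could be applied to a list of per-residue (phi, psi) angles. As written in this section, the left- and right-handed definitions give identical bounds, so the sketch uses one table per motif length; it also skips the O-to-O distance restriction, which would need atomic coordinates. The names `CATMAT_THREE`, `CATMAT_FOUR` and `find_motif_starts` are hypothetical.

```python
from typing import List, Sequence, Tuple

# (phi_min, phi_max, psi_min, psi_max) in degrees for residues i, i+1, ...,
# copied from the catmat definitions above.
CATMAT_THREE = [(-120, -60, -50, 30), (-100, -50, 110, 170)]
CATMAT_FOUR = [(-120, -60, -50, 30), (-120, -60, -50, 30), (-100, -50, 110, 170)]

Angles = Sequence[Tuple[float, float]]  # (phi, psi) per residue


def in_bounds(angles: Angles, bounds) -> bool:
    """True if every (phi, psi) pair lies inside its inclusive window."""
    return all(
        phi_lo <= phi <= phi_hi and psi_lo <= psi <= psi_hi
        for (phi, psi), (phi_lo, phi_hi, psi_lo, psi_hi) in zip(angles, bounds)
    )


def find_motif_starts(angles: Angles, bounds) -> List[int]:
    """Return 0-based indices i whose residues i, i+1, ... satisfy `bounds`.

    i starts at 1 because each motif also includes residue i-1, whose
    carbonyl oxygen takes part in the (unchecked) O-to-O distance rule.
    """
    k = len(bounds)
    return [i for i in range(1, len(angles) - k + 1)
            if in_bounds(angles[i:i + k], bounds)]


phi_psi = [(-60, -40), (-90, -10), (-75, 140), (-60, 140)]
print(find_motif_starts(phi_psi, CATMAT_THREE))  # [1]
```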

CCA_tail [SO_0001175]

Base sequence at the 3’ end of a tRNA. The 3’-hydroxyl group on the terminal adenosine is the attachment point for the amino acid.

CCAAT_motif [SO_0001856]

A promoter element with consensus sequence CCAAT, bound by a protein complex that represses transcription in response to low iron levels.

cDNA_clone [SO_0000317]

Complementary DNA: a piece of DNA copied from an mRNA and spliced into a vector for propagation in a suitable host. This term is mapped to MGED. Do not obsolete without consulting the MGED ontology.

cDNA_match [SO_0000689]

A match against cDNA sequence.

CDRE_motif [SO_0001865]

An RNA polymerase II promoter element found in the promoters of genes regulated by calcineurin. The consensus sequence is GNGGCKCA.
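
Several promoter-element definitions in this section give consensus sequences in IUPAC nucleotide codes (for example N = any base and K = G or T here; R = A or G in CSL_response_element). A minimal Python sketch, assuming a plain DNA string, that turns such a consensus into a regular expression and scans both strands. The function names are hypothetical, and a consensus match is only a first filter, not evidence that a factor binds.

```python
import re
from typing import List, Tuple

# IUPAC nucleotide codes mapped to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]", "K": "[GT]", "M": "[AC]",
    "B": "[CGT]", "D": "[AGT]", "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def consensus_to_regex(consensus: str):
    # A zero-width lookahead reports overlapping matches.
    return re.compile("(?=" + "".join(IUPAC[c] for c in consensus.upper()) + ")")


def find_consensus(seq: str, consensus: str) -> List[Tuple[int, str]]:
    """Return (0-based plus-strand start, strand) for every match on either strand."""
    seq = seq.upper()
    pattern = consensus_to_regex(consensus)
    width = len(consensus)
    hits = [(m.start(), "+") for m in pattern.finditer(seq)]
    reverse_complement = seq.translate(COMPLEMENT)[::-1]
    # A hit at position p on the reverse complement starts at plus-strand
    # position len(seq) - p - width.
    hits += [(len(seq) - m.start() - width, "-")
             for m in pattern.finditer(reverse_complement)]
    return sorted(hits)


print(find_consensus("TTGAGGCGCATT", "GNGGCKCA"))  # [(2, '+')]
print(find_consensus("ATGACGTCAT", "TGACGTCA"))    # [(1, '+'), (1, '-')]
```

A palindromic consensus such as the CRE site TGACGTCA is its own reverse complement, so each occurrence is reported once per strand.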

CDS [SO_0000316]

A contiguous sequence which begins with, and includes, a start codon and ends with, and includes, a stop codon.
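
A minimal sketch of this definition as a validity check on a DNA string. The whole-codon length check, the internal-stop check and the ATG-only default start codon are assumptions added for illustration (standard genetic code); non-canonical starts such as CTG (see CTG_start_codon) can be passed in explicitly. The function name is hypothetical.

```python
STOP_CODONS = frozenset({"TAA", "TAG", "TGA"})  # standard genetic code


def is_complete_cds(seq: str, start_codons=frozenset({"ATG"})) -> bool:
    """True if seq begins with a start codon, ends with a stop codon, is a whole
    number of codons long, and has no in-frame stop codon before the last one."""
    seq = seq.upper()
    if len(seq) < 6 or len(seq) % 3 != 0:
        return False
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    return (codons[0] in start_codons
            and codons[-1] in STOP_CODONS
            and not any(c in STOP_CODONS for c in codons[1:-1]))


print(is_complete_cds("ATGGCTTGA"))                  # True
print(is_complete_cds("ATGTAAGCTTGA"))               # False: internal stop
print(is_complete_cds("CTGGCTTGA", {"ATG", "CTG"}))  # True
```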

CDS_extension [SO_0002227]

A sequence variant extending the CDS, that causes elongation of the resulting polypeptide sequence. Added as per request by Edward Wallace GitHub issue #480 (https://github.com/The-Sequence-Ontology/SO-Ontologies/issues/480)

CDS_five_prime_extension [SO_0002228]

A sequence variant extending the CDS at the 5’ end, that causes elongation of the resulting polypeptide sequence at the N terminus. Added as per request by Edward Wallace GitHub issue #480 (https://github.com/The-Sequence-Ontology/SO-Ontologies/issues/480)

CDS_fragment [SO_0001384]

A portion of a CDS that is not the complete CDS.

CDS_independently_known [SO_1001246]

A CDS with the evidence status of being independently known.

CDS_predicted [SO_1001254]

A CDS that is predicted.

CDS_region [SO_0000851]

A region of a CDS.

CDS_supported_by_domain_match_data [SO_1001249]

A CDS that is supported by domain similarity.

CDS_supported_by_EST_or_cDNA_data [SO_1001259]

A CDS that is supported by similarity to EST or cDNA data.

CDS_supported_by_peptide_spectrum_match [SO_0002071]

A CDS that is supported by proteomics data.

CDS_supported_by_sequence_similarity_data [SO_1001251]

A CDS that is supported by sequence similarity data.

CDS_three_prime_extension [SO_0002229]

A sequence variant extending the CDS at the 3’ end, that causes elongation of the resulting polypeptide sequence at the C terminus. Added as per request by Edward Wallace GitHub issue #480 (https://github.com/The-Sequence-Ontology/SO-Ontologies/issues/480)

central_hydrophobic_region_of_signal_peptide [SO_0100016]

The central, hydrophobic region of the signal peptide (approx 7-15 aa).

centromere [SO_0000577]

A region of chromosome where the spindle fibers attach during mitosis and meiosis.

centromere_DNA_Element_I [SO_0001493]

A centromere DNA Element I (CDEI) is a conserved region, part of the centromere, consisting of an 8-11 bp consensus region that enables binding by centromere binding factor 1 (Cbf1p). This term was requested 2009-10-16 by Michel Dumontier, tracker id 2880699.

centromere_DNA_Element_II [SO_0001494]

A centromere DNA Element II (CDEII) is a conserved region, part of the centromere, consisting of an AT-rich consensus region ~75-100 bp in length. This term was requested 2009-10-16 by Michel Dumontier, tracker id 2880699.

centromere_DNA_Element_III [SO_0001495]

A centromere DNA Element III (CDEIII) is a conserved region, part of the centromere, consisting of a 25-bp consensus region that enables binding by the centromere DNA binding factor 3 (CBF3) complex. This term was requested 2009-10-16 by Michel Dumontier, tracker id 2880699.

centromeric_repeat [SO_0001797]

A repeat region found within the modular centromere.

chimeric_cDNA_clone [SO_0000810]

A cDNA clone invalidated because it is chimeric.

ChIP_seq_region [SO_0001697]

A region of sequence identified by ChIP-seq technology to contain a protein binding site.

chloroplast_chromosome [SO_0000820]

A chromosome originating in a chloroplast.

chloroplast_DNA [SO_0001033]

DNA belonging to the genome of a chloroplast, a photosynthetic plastid. This term is used by MO.

chloroplast_DNA_read [SO_0001930]

A sequencer read of a chloroplast DNA sample. Requested by Bayer Cropscience, October, 2012.

chloroplast_sequence [SO_0000745]

DNA belonging to the genome of a chloroplast, a green plastid for photosynthesis.

chromoplast_chromosome [SO_0000821]

A chromosome originating in a chromoplast.

chromoplast_gene [SO_0000093]

A gene from chromoplast_sequence.

chromoplast_sequence [SO_0000744]

DNA belonging to the genome of a chromoplast, a colored plastid for synthesis and storage of pigments.

chromosomal_deletion [SO_1000029]

An incomplete chromosome.

chromosomal_regulatory_element [SO_0000626]

Regions of the chromosome that are important for regulating binding of chromosomes to the nuclear matrix.

chromosomal_structural_element [SO_0000628]

Regions of the chromosome that are important for structural elements.

chromosomal_transposition [SO_0000453]

A chromosome structure variant whereby a region of a chromosome has been transferred to another position. Among interchromosomal rearrangements, the term transposition is reserved for that class in which the telomeres of the chromosomes involved are coupled (that is to say, form the two ends of a single DNA molecule) as in wild-type.

chromosomal_variation_attribute [SO_0001509]

An attribute of a change in the structure or number of chromosomes.

chromosomally_aberrant_genome [SO_0001524]

When a genome contains an abnormal number of chromosomes.

chromosome_arm [SO_0000105]

A region of the chromosome between the centromere and the telomere. Human chromosomes have two arms, the p arm (short) and the q arm (long) which are separated from each other by the centromere.

chromosome_band [SO_0000341]

A cytologically distinguishable feature of a chromosome, often made visible by staining, and usually alternating light and dark. “Band” is a term of convenience used to hierarchically organize morphologically defined chromosome features: chromosome > arm > region > band > sub-band.

chromosome_breakage_sequence [SO_0000670]

A sequence within the micronuclear DNA of ciliates at which chromosome breakage and telomere addition occurs during nuclear differentiation.

chromosome_breakpoint [SO_0001021]

A chromosomal region that may sustain a double-strand break, resulting in a recombination event.

chromosome_fission [SO_1000141]

A chromosome that occurred by the division of a larger chromosome.

chromosome_number_variation [SO_1000182]

A kind of chromosome variation where the chromosome complement is not an exact multiple of the haploid number.
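
The definition reduces to a divisibility test. A minimal sketch, using a haploid number of 23 purely as an example value: a count of 47 is not an exact multiple of 23, whereas a triploid count of 69 is. The function name is hypothetical.

```python
def is_not_multiple_of_haploid(chromosome_count: int, haploid_number: int) -> bool:
    """True when the chromosome complement is not an exact multiple of the
    haploid number, the condition stated in the definition above."""
    if haploid_number <= 0 or chromosome_count < 0:
        raise ValueError("haploid number must be positive and count non-negative")
    return chromosome_count % haploid_number != 0


print(is_not_multiple_of_haploid(47, 23))  # True
print(is_not_multiple_of_haploid(69, 23))  # False
```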

chromosome_part [SO_0000830]

A region of a chromosome. This is a manufactured term that serves to allow the parts of a chromosome to have an is_a path to the root.

chromosome_structure_variation [SO_1000183]

An alteration of the genome that leads to a change in the structure or number of one or more chromosomes.

chromosome_variation [SO_0000240]

A deviation in chromosome structure or number.

circular [SO_0000988]

A quality of a nucleotide polymer that has no terminal nucleotide residues. Attributes added to describe the different kinds of replicon. SO workshop, September 2006.

circular_double_stranded_DNA_chromosome [SO_0000958]

Structural unit composed of a self-replicating, double-stranded, circular DNA molecule.

circular_double_stranded_RNA_chromosome [SO_0000967]

Structural unit composed of a self-replicating, double-stranded, circular RNA molecule.

circular_single_stranded_DNA_chromosome [SO_0000960]

Structural unit composed of a self-replicating, single-stranded, circular DNA molecule.

circular_single_stranded_RNA_chromosome [SO_0000966]

Structural unit composed of a self-replicating, single-stranded, circular RNA molecule.

cis_acting_homologous_chromosome_pairing_region [SO_0002025]

A genome region where homologous chromosome pairing occurs preferentially during early meiotic prophase of meiosis I. Comment: An example of this is the Sme2 locus in fission yeast S. pombe, where it is coincident with a ribonuclear complex termed the “Mei2 dot”. This term was requested by Val Wood, PomBase.

cis_regulatory_frameshift_element [SO_0001427]

A structural region in an RNA molecule which promotes ribosomal frameshifting of cis coding sequence. Moved from transcription_regulatory_region (SO:0001679) to transcriptional_cis_regulatory_region (SO:0001055) by Dave Sant on Feb 11, 2021 when transcription_regulatory_region was merged into transcriptional_cis_regulatory_region to be consistent with GO and reduce redundancy as part of the GREEKC consortium. See GitHub Issue #527.

cis_regulatory_module [SO_0000727]

A regulatory region where transcription factor binding sites are clustered to regulate various aspects of transcriptional activity. (CRMs can be located a few kb to hundreds of kb upstream of the core promoter, in the coding sequence, within introns, in untranslated region (UTR) sequences, or even on a different chromosome.) A single gene can be regulated by multiple CRMs to give precise control of its spatial and temporal expression. CRMs function as nodes in large, intertwined regulatory networks. CRM DNA accessibility is subject to regulation by dbTFs and transcription co-TFs. Requested by Stephen Grossmann Dec 2004. Changed relationship from has_part SO:0000235 TF_binding_site to TF_binding_site is part_of SO:0000727 CRM in response to requests from GREEKC initiative in Aug 2020. Removed 3’ from definition because 5’ UTRs are included as well, notified by Colin Logie of GREEKC. Nov 9 2020. DS Updated name from ‘CRM’ to ‘cis_regulatory_module’ on 08 Feb 2021. See GitHub Issue #526. DS Added final sentence to definition as part of GREEKC Feb 16, 2021. See GitHub Issue #534.

cis_splice_site [SO_0001419]

Intronic 2 bp region bordering an exon. A splice_site that is adjacent_to an exon and overlaps an intron.

class_I_RNA [SO_0000990]

Small non-coding RNA (55-65 nt long) containing highly conserved 5’ and 3’ ends (16 and 8 nt, respectively) that are predicted to come together to form a stem structure. Identified in the social amoeba Dictyostelium discoideum and localized in the cytoplasm. Requested by Karen Pilcher - Dictybase. song-Term Tracker-1574577.

class_II_RNA [SO_0000989]

Small non-coding RNA (59-60 nt long) containing 5’ and 3’ ends that are predicted to come together to form a stem structure. Identified in the social amoeba Dictyostelium discoideum and localized in the cytoplasm.

cleaved_for_gpi_anchor_region [SO_0001408]

The C-terminal residues of a polypeptide which are exchanged for a GPI-anchor.

cleaved_initiator_methionine [SO_0000691]

The initiator methionine that has been cleaved from a mature polypeptide sequence.

cleaved_peptide_region [SO_0100011]

The cleaved_peptide_region is the region of a peptide sequence that is cleaved during maturation. Range.

clip [SO_0000303]

Part of the primary transcript that is clipped off during processing.

clone_end [SO_0001793]

A read from an end of the clone sequence.

clone_insert [SO_0000753]

The region of sequence that has been inserted and is being propagated by the clone.

clone_insert_end [SO_0000103]

The end of the clone insert.

clone_insert_start [SO_0000179]

The start of the clone insert.

cloned_cDNA_insert [SO_0000913]

A clone insert made from cDNA.

cloned_genomic_insert [SO_0000914]

A clone insert made from genomic DNA.

cloned_region [SO_0000785]

The region of sequence that has been inserted and is being propagated by the clone. Added in response to Lynn Crosby. A clone insert may be composed of many cloned regions.

coding_conserved_region [SO_0000332]

Coding region of sequence similarity by descent from a common ancestor.

coding_end [SO_0000327]

The last base to be translated into protein. It does not include the stop codon.

coding_exon [SO_0000195]

An exon whereby at least one base is part of a codon (here, ‘codon’ is inclusive of the stop_codon).

coding_region_of_exon [SO_0001215]

The region of an exon that encodes protein sequence. An exon containing either a start or stop codon will be partially coding and partially non-coding.
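
A minimal sketch of splitting an exon into its coding and non-coding parts by interval intersection, assuming 1-based inclusive genomic coordinates and a coding span running from coding_start to coding_end as defined in this section. The function name and coordinate convention are assumptions for illustration.

```python
from typing import List, Optional, Tuple

Interval = Tuple[int, int]  # (start, end), 1-based, inclusive


def split_exon(exon: Interval,
               coding_span: Interval) -> Tuple[Optional[Interval], List[Interval]]:
    """Return (coding part or None, list of non-coding parts) of an exon."""
    exon_start, exon_end = exon
    start = max(exon_start, coding_span[0])
    end = min(exon_end, coding_span[1])
    if start > end:  # no overlap: the whole exon is non-coding
        return None, [exon]
    noncoding = []
    if exon_start < start:
        noncoding.append((exon_start, start - 1))
    if end < exon_end:
        noncoding.append((end + 1, exon_end))
    return (start, end), noncoding


print(split_exon((100, 200), (150, 1000)))    # ((150, 200), [(100, 149)])
print(split_exon((1100, 1200), (150, 1000)))  # (None, [(1100, 1200)])
```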

coding_sequence_variant [SO_0001580]

A sequence variant that changes the coding sequence.

coding_start [SO_0000323]

The first base to be translated into protein.

coding_transcript_intron_variant [SO_0001969]

A transcript variant occurring within an intron of a coding transcript.

coding_transcript_variant [SO_0001968]

A transcript variant of a protein coding gene.

coding_transcript_with_retained_intron [SO_0002112]

A protein coding transcript containing a retained intron. Term added as part of collaboration with Gencode, adding biotypes used in annotation.

coding_variant_quality [SO_0001814]

An attribute of a coding genomic variant.

codon [SO_0000360]

A set of (usually) three nucleotide bases in a DNA or RNA sequence, which together code for a unique amino acid or the termination of translation and are contained within the CDS.
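
A minimal sketch of reading a sequence as non-overlapping triplet codons, assuming the standard genetic code: the 64-letter string lists amino acids in TCAG order for the first, second and third codon positions, with `*` marking a stop. Other translation tables, and redefined codons (see codon_redefined), would need a different table. Names are hypothetical.

```python
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Build the 64-entry standard codon table: index = 16*first + 4*second + third.
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(seq: str) -> str:
    """Translate DNA or RNA codon by codon; a trailing partial codon is ignored."""
    seq = seq.upper().replace("U", "T")
    usable = len(seq) - len(seq) % 3
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, usable, 3))


print(translate("ATGTGTTGCTAA"))  # MCC*
```

The two cysteine codons listed under the cysteine entry, TGT and TGC, both map to `C` in this table.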

codon_redefined [SO_0000882]

An attribute describing the alteration of codon meaning.

cointegrated_plasmid [SO_0001045]

An MGE region consisting of two fused plasmids resulting from a replicative transposition event.

common_variant [SO_0001767]

When a variant from the genomic sequence is commonly found in the general population.

compensatory_transcript_secondary_structure_variant [SO_0001597]

A secondary structure variant that compensates for the change made by a previous variant.

complex_3D_structural_variant [SO_0001600]

A sequence variant that changes the resulting polypeptide structure.

complex_change_of_translational_product_variant [SO_0001602]

A variant that changes the translational product with respect to the reference.

complex_chromosomal_rearrangement [SO_0002062]

A contiguous cluster of translocations, usually the result of a single catastrophic event such as chromothripsis or chromoanasynthesis.

complex_structural_alteration [SO_0001784]

A structural sequence alteration or rearrangement encompassing one or more genome fragments, with 4 or more breakpoints.

complex_substitution [SO_1000005]

When no simple or well defined DNA mutation event describes the observed DNA change, the keyword “complex” should be used. Usually there are multiple equally plausible explanations for the change.

complex_transcript_variant [SO_0001577]

A transcript variant with a complex INDEL: an insertion or deletion that spans an exon/intron border or a coding sequence/UTR border. EBI term: Complex InDel - Insertion or deletion that spans an exon/intron border or a coding sequence/UTR border.

compositionally_biased_region_of_peptide [SO_0001066]

Polypeptide region that is rich in a particular amino acid or homopolymeric and greater than three residues in length. Range.

compound_chromosome [SO_1000042]

A chromosome structure variant where a monocentric element is caused by the fusion of two chromosome arms.

compound_chromosome_arm [SO_0000060]

One arm of a compound chromosome. FLAG - this term should probably be a part_of rather than an is_a.

conformational_change_variant [SO_0001601]

A sequence variant in the CDS region that causes a conformational change in the resulting polypeptide sequence.

conformational_switch [SO_0001422]

A region of a polypeptide, involved in the transition from one conformational state to another. MM Young, K Kirshenbaum, KA Dill & S Highsmith. Predicting conformational switches in proteins. Protein Science, 1999, 8, 1752-64. K. Kirshenbaum, M.M. Young and S. Highsmith. Predicting Allosteric Switches in Myosins. Protein Science 8(9):1806-1815. 1999.

conjugative_transposon [SO_0000371]

A transposon that encodes functions required for conjugation.

consensus [SO_0000993]

A sequence produced from an alignment algorithm that uses multiple sequences as input. Term added Dec 06 to comply with mapping to MGED terms. It should be used to generate consensus regions. The specific cross-product terms they require are consensus_region and consensus_mRNA.

consensus_AFLP_fragment [SO_0001991]

A consensus AFLP fragment is an AFLP sequence produced from any alignment algorithm which uses assembled multiple AFLP sequences as input. Requested by Bayer Cropscience September, 2013.

consensus_gDNA [SO_0001931]

Genomic DNA sequence produced from some base calling or alignment algorithm which uses aligned or assembled multiple gDNA sequences as input. Requested by Bayer Cropscience November, 2012.

consensus_mRNA [SO_0000995]

An mRNA sequence produced from an alignment algorithm that uses multiple sequences as input. Do not obsolete without considering MGED mapping.

consensus_region [SO_0000994]

A region that has a known consensus sequence. Do not obsolete without considering MGED mapping.

conservative_amino_acid_substitution [SO_0001607]

A sequence variant of a codon causing the substitution of a similar amino acid for another in the resulting polypeptide.

conservative_inframe_deletion [SO_0001825]

An inframe decrease in CDS length that deletes one or more entire codons from the coding sequence but does not change any remaining codons.

conservative_inframe_insertion [SO_0001823]

An inframe increase in CDS length that inserts one or more codons into the coding sequence between existing codons.
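
The two conservative in-frame definitions can be approximated from the position and length of an indel within the CDS. A minimal sketch, assuming 0-based CDS offsets (the first deleted base, or the base before which the insertion is placed); the returned labels mirror the SO term names frameshift_variant, disruptive_inframe_deletion and disruptive_inframe_insertion, which are not defined in this excerpt. This positional test is a simplification: a sequence-aware check would compare the resulting codons, since in repetitive sequence an off-boundary indel can still leave every remaining codon unchanged.

```python
def classify_cds_indel(kind: str, offset: int, length: int) -> str:
    """Classify an insertion or deletion inside a CDS by position and length only.

    kind:   "insertion" or "deletion"
    offset: 0-based CDS offset of the first deleted base, or of the base
            before which the inserted bases are placed
    length: number of inserted or deleted bases
    """
    if kind not in ("insertion", "deletion") or offset < 0 or length <= 0:
        raise ValueError("invalid indel description")
    if length % 3 != 0:
        return "frameshift_variant"
    # Whole codons added or removed exactly at a codon boundary leave all
    # remaining codons intact; otherwise a codon spanning the edit is
    # typically altered.
    if offset % 3 == 0:
        return f"conservative_inframe_{kind}"
    return f"disruptive_inframe_{kind}"


print(classify_cds_indel("deletion", 3, 3))   # conservative_inframe_deletion
print(classify_cds_indel("deletion", 4, 3))   # disruptive_inframe_deletion
print(classify_cds_indel("insertion", 6, 6))  # conservative_inframe_insertion
print(classify_cds_indel("deletion", 3, 2))   # frameshift_variant
```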

conservative_missense_variant [SO_0001585]

A sequence variant whereby at least one base of a codon is changed resulting in a codon that encodes for a different but similar amino acid. These variants may or may not be deleterious.

conserved [SO_0000856]

A region that is similar or identical across more than one species.

conserved_intergenic_variant [SO_0002017]

A sequence variant located in a conserved intergenic region, between genes. Requested by Uma Paila (UVA) for snpEff.

conserved_intron_variant [SO_0002018]

A transcript variant occurring within a conserved region of an intron. Requested by Uma Paila (UVA) for snpEff.

conserved_region [SO_0000330]

Region of sequence similarity by descent from a common ancestor.

constitutive_promoter [SO_0002050]

A promoter that allows for continual transcription of a gene.

contig_collection [SO_0001462]

A collection of contigs. See tracker ID: 2138359.

contig_read [SO_0000476]

A DNA sequencer read which is part of a contig.

copy_number_change [SO_0001563]

A sequence variant where copies of a feature (CNV) are either increased or decreased.

copy_number_decrease [SO_0001912]

A sequence variant where copies of a feature are decreased relative to the reference.

copy_number_gain [SO_0001742]

A sequence alteration whereby the copy number of a given region is greater than the reference sequence.

copy_number_increase [SO_0001911]

A sequence variant where copies of a feature are increased relative to the reference.

copy_number_loss [SO_0001743]

A sequence alteration whereby the copy number of a given region is less than the reference sequence.

copy_number_variation [SO_0001019]

A variation that increases or decreases the copy number of a given region.

core_eukaryotic_promoter_element [SO_0001660]

An element that only exists within the promoter region of a eukaryotic gene.

core_promoter_element [SO_0002309]

An element that always exists within the promoter region of a gene. When multiple transcripts exist for a gene, the separate transcripts may have separate core_promoter_elements. Added by Dave to be consistent with other ontologies updated with GREEKC initiative.

cosmid [SO_0000156]

A cloning vector that is a hybrid of lambda phage and a plasmid; it can be propagated as a plasmid or packaged as a phage, since it retains the lambda cos sites. Paper: Evans GA et al. High efficiency vectors for cosmid microcloning and genomic analysis. Gene 1989; 79(1):9-20. This term is mapped to MGED. Do not obsolete without consulting MGED ontology.

CRE [SO_0001843]

MERGED DEFINITION: TARGET DEFINITION: A promoter element with consensus sequence TGACGTCA; bound by the ATF/CREB family of transcription factors. SOURCE DEFINITION: A promoter element that contains a core sequence TGACGT, bound by a protein complex that regulates transcription of genes encoding PKA pathway components. New synonym Atf1/Pcr1 recognition motif added in response to Antonia Lock GitHub Issue Request #437, PMID:15716492

CRISPR [SO_0001459]

Clustered Palindromic Repeats interspersed with bacteriophage-derived spacer sequences.

cross_genome_match [SO_0000177]

A nucleotide match against a sequence from another organism.

cryptic [SO_0000976]

A feature_attribute describing a feature that is not manifest under normal conditions.

cryptic_gene [SO_0001431]

A gene that is not transcribed under normal conditions and is not critical to normal cellular functioning.

cryptic_prophage [SO_0001007]

A remnant of an integrated prophage in the host genome or an “island” in the host genome that includes phage-like genes. This is not cryptic in the same sense as a cryptic gene or cryptic splice site.

cryptic_splice_acceptor [SO_0001570]

A sequence variant whereby a new splice site is created due to the activation of a new acceptor.

cryptic_splice_donor [SO_0001571]

A sequence variant whereby a new splice site is created due to the activation of a new donor.

cryptic_splice_site [SO_0001533]

A splice site located in a part of the transcript that is not normally spliced. Cryptic splice sites arise through mutation or transcriptional error.

cryptic_splice_site_variant [SO_0001569]

A sequence variant causing a new (functional) splice site.

cryptogene [SO_1001196]

A maxicircle gene so extensively edited that it cannot be matched to its edited mRNA sequence.

CSL_response_element [SO_0001839]

A promoter element with consensus sequence GTGRGAA, bound by CSL (CBF1/RBP-JK/Suppressor of Hairless/LAG-1) transcription factors.

CsrB_RsmB_RNA [SO_0000377]

An enterobacterial RNA that binds the CsrA protein. The CsrB RNAs contain a conserved motif CAGGXXG that is found in up to 18 copies and has been suggested to bind CsrA. The Csr regulatory system has a strong negative regulatory effect on glycogen biosynthesis, gluconeogenesis and glycogen catabolism and a positive regulatory effect on glycolysis. In other bacteria such as Erwinia carotovora, the RsmA protein has been shown to regulate the production of virulence determinants, such as extracellular enzymes. RsmA binds to RsmB regulatory RNA, which is also a member of this family.

ct_gene [SO_0000092]

A gene from chloroplast sequence.

CTCF_binding_site [SO_0001974]

A transcription factor binding site with consensus sequence CCGCGNGGNGGCAG, bound by CCCTC-binding factor.

CTG_start_codon [SO_1001273]

A non-canonical start codon of sequence CTG.

CuRE [SO_0001844]

A promoter element bound by copper ion-sensing transcription factors such as S. cerevisiae Mac1p or S. pombe Cuf1; the consensus sequence is HTHNNGCTGD (more specifically TTTGCKCR in budding yeast).

cyanelle_chromosome [SO_0000822]

A chromosome originating in a cyanelle.

cyanelle_gene [SO_0000094]

A gene from cyanelle sequence.

cyanelle_sequence [SO_0000746]

DNA belonging to the genome of a cyanelle, a photosynthetic plastid found in algae.

cyclic_translocation [SO_1000150]

A chromosomal translocation whereby three breaks occurred in three different chromosomes. The centric segment resulting from the first break listed is joined to the acentric segment resulting from the second, rather than the third.

cysteine [SO_0001447]

A polar amino acid encoded by the codons TGT and TGC. A placeholder for a cross product with ChEBI.

cysteine_tRNA_primary_transcript [SO_0000215]

A primary transcript encoding cysteinyl tRNA (SO:0000258).

cysteinyl_tRNA [SO_0000258]

A tRNA sequence that has a cysteine anticodon, and a 3’ cysteine binding region.

cytoplasmic_polypeptide_region [SO_0001073]

Polypeptide region that is localized inside the cytoplasm.

cytosolic_16S_rRNA [SO_0001000]

Cytosolic 16S rRNA is an RNA component of the small subunit of cytosolic ribosomes in prokaryotes. Renamed to cytosolic_16S_rRNA from rRNA_16S on 10 June 2021 as per restructuring of rRNA child terms. Updated definition to be consistent with format of other rRNA definitions. Request from EBI. See GitHub Issue #493.

cytosolic_23S_rRNA [SO_0001001]

Cytosolic 23S rRNA is an RNA component of the large subunit of cytosolic ribosomes in prokaryotes. Renamed from rRNA_23S to cytosolic_23S_rRNA on 27 May 2021 with the restructuring of rRNA child terms. Updated definition to be consistent with format of other rRNA definitions. Requested by EBI. See GitHub Issue #493.

cytosolic_5S_rRNA [SO_0000652]

Cytosolic 5S rRNA is an RNA component of the large subunit of cytosolic ribosomes in both prokaryotes and eukaryotes. Renamed from rRNA_5S to cytosolic_5S_rRNA on 27 May 2021 with the restructuring of rRNA child terms. Updated definition to be consistent with format of other rRNA definitions. Requested by EBI. See GitHub Issue #493.

cytosolic_LSU_rRNA [SO_0000651]

Cytosolic LSU rRNA is an RNA component of the large subunit of cytosolic ribosomes. Renamed to cytosolic_LSU_rRNA from large_subunit_rRNA on 10 June 2021 as per restructuring of rRNA child terms. Updated definition to be consistent with format of other rRNA definitions. Request from EBI. See GitHub Issue #493.

cytosolic_LSU_rRNA_gene [SO_0002361]

A gene that codes for cytosolic LSU rRNA. Adjusted hierarchy and names of rRNA_gene terms at the request of Steven Marygold GitHub Issue #513 (https://github.com/The-Sequence-Ontology/SO-Ontologies/issues/513).

cytosolic_rRNA [SO_0002343]

Cytosolic rRNA is an RNA component of the small or large subunits of cytosolic ribosomes. Added at the request of EBI. See GitHub Issue #493

cytosolic_rRNA_16S_gene [SO_0002237]

A gene which codes for 16S_rRNA, which functions as a component of the small subunit of the ribosome in prokaryotes. Added as per request by Antonia Lock GitHub issue #472 (https://github.com/The-Sequence-Ontology/SO-Ontologies/issues/472). Adjusted hierarchy and names of rRNA_gene terms at the request of Steven Marygold GitHub Issue #513 (https://github.com/The-Sequence-Ontology/SO-Ontologies/issues/513).

cytosolic_rRNA_18S_gene [SO_0002236]

A gene which codes for 18S_rRNA, which functions as a component of the small subunit of the ribosome in eukaryotes. Added as per request by Antonia Lock GitHub issue #472 (https://github.com/The-Sequence-Ontology/SO-Ontologies/issues/472). Adjusted hierarchy and names of rRNA_gene terms at the request of Steven Marygold GitHub Issue #513 (https://github.com/The-Sequence-Ontology/SO-Ontologies/issues/513).

cytosolic_rRNA_23S_gene [SO_0002243]

A gene which codes for 23S_rRNA, which functions as a component of the large subunit of the ribosome in prokaryotes. Added as per request by Antonia Lock GitHub issue #472 (https://github.com/The-Sequence-Ontology/SO-Ontologies/issues/472). Adjusted hierarchy and names of rRNA_gene terms at the request of Steven Marygold GitHub Issue #513 (https://github.com/The-Sequence-Ontology/SO-Ontologies/issues/513).

cytosolic_rRNA_25S_gene [SO_0002242]

A gene which codes for 25S_rRNA, which functions as a component of the large subunit of the ribosome in some eukaryotes. Added as per request by Antonia Lock GitHub issue #472 (https://github.com/The-Sequence-Ontology/SO-Ontologies/issues/472). Adjusted hierarchy and names of rRNA_gene terms at the request of Steven Marygold GitHub Issue #513 (https://github.com/The-Sequence-Ontology/SO-Ontologies/issues/513).

cytosolic_rRNA_28S_gene [SO_0002239]

A gene which codes for 28S_rRNA, which functions as a component of the large subunit of the ribosome in eukaryotes. Added as per request by Antonia Lock GitHub issue #472 (https://github.com/The-Sequence-Ontology/SO-Ontologies/issues/472). Adjusted hierarchy and names of rRNA_gene terms at the request of Steven Marygold GitHub Issue #513 (https://github.com/The-Sequence-Ontology/SO-Ontologies/issues/513).

cytosolic_rRNA_5_8S_gene [SO_0002240]

A gene which codes for 5_8S_rRNA (5.8S rRNA), which functions as a component of the large subunit of the ribosome in eukaryotes. Added as per request by Antonia Lock GitHub issue #472 (https://github.com/The-Sequence-Ontology/SO-Ontologies/issues/472). Adjusted hierarchy and names of rRNA_gene terms at the request of Steven Marygold GitHub Issue #513 (https://github.com/The-Sequence-Ontology/SO-Ontologies/issues/513).

cytosolic_rRNA_5S_gene [SO_0002238]

A gene which codes for 5S_rRNA, which is a portion of the large subunit of the ribosome in both eukaryotes and prokaryotes. Added as per request by Antonia Lock GitHub issue #472 (https://github.com/The-Sequence-Ontology/SO-Ontologies/issues/472). Adjusted hierarchy and names of rRNA_gene terms at the request of Steven Marygold GitHub Issue #513 (https://github.com/The-Sequence-Ontology/SO-Ontologies/issues/513).

cytosolic_rRNA_gene [SO_0002360]

A gene that codes for cytosolic rRNA. Adjusted hierarchy and names of rRNA_gene terms at the request of Steven Marygold, GitHub Issue #513 (https://github.com/The-Sequence-Ontology/SO-Ontologies/issues/513).

cytosolic_SSU_rRNA [SO_0000650]

Cytosolic SSU rRNA is an RNA component of the small subunit of cytosolic ribosomes. Renamed to cytosolic_SSU_rRNA from small_subunit_rRNA on 10 June 2021 as per restructuring of rRNA child terms. Updated definition to be consistent with format of other rRNA definitions. Request from EBI. See GitHub Issue #493.

cytosolic_SSU_rRNA_gene [SO_0002362]

A gene that codes for cytosolic SSU rRNA. Adjusted hierarchy and names of rRNA_gene terms at the request of Steven Marygold, GitHub Issue #513 (https://github.com/The-Sequence-Ontology/SO-Ontologies/issues/513).

D_cluster [SO_0000559]

Genomic DNA of immunoglobulin/T-cell receptor gene in germline configuration including more than one D-gene.

D_DJ_C_cluster [SO_0000504]

Genomic DNA of immunoglobulin/T-cell receptor gene in rearranged configuration including at least one D-gene, one DJ-gene and one C-gene.

D_DJ_cluster [SO_0000505]

Genomic DNA of immunoglobulin/T-cell receptor gene in rearranged configuration including at least one D-gene and one DJ-gene.

D_DJ_J_C_cluster [SO_0000506]

Genomic DNA of immunoglobulin/T-cell receptor gene in rearranged configuration including at least one D-gene, one DJ-gene, one J-gene and one C-gene.

D_DJ_J_cluster [SO_0000508]

Genomic DNA of immunoglobulin/T-cell receptor gene in rearranged configuration including at least one D-gene, one DJ-gene and one J-gene.

D_gene_recombination_feature [SO_0000492]

Recombination signal including D-heptamer, D-spacer and D-nonamer in 5’ of D-region of a D-gene or D-sequence.

D_gene_segment [SO_0000458]

Germline genomic DNA including D-region with 5’ UTR and 3’ UTR, also designated as D-segment.

D_J_C_cluster [SO_0000509]

Genomic DNA of immunoglobulin/T-cell receptor gene in germline configuration including at least one D-gene, one J-gene and one C-gene.

D_J_cluster [SO_0000560]

Genomic DNA of immunoglobulin/T-cell receptor gene in germline configuration including at least one D-gene and one J-gene.

DArT_marker [SO_0001646]

A genetic marker discovered using Diversity Arrays Technology (DArT).

databank_entry [SO_2000061]

The sequence referred to by an entry in a databank such as GenBank or SwissProt.

dCAPS_primer [SO_0001699]

A primer with one or more mismatches to the DNA template corresponding to a position within a restriction enzyme recognition site.

DCE [SO_0001664]

A discontinuous core element of RNA polymerase II transcribed genes, situated downstream of the TSS. It is composed of three subelements: SI, SII and SIII.

DCE_SI [SO_0001665]

A subelement of the DCE core promoter element, with consensus sequence CTTC.

DCE_SII [SO_0001666]

A subelement of the DCE core promoter element, with consensus sequence CTGT.

DCE_SIII [SO_0001667]

A subelement of the DCE core promoter element, with consensus sequence AGC.
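The three DCE consensus strings can be located with a plain substring scan. This is a minimal Python sketch, assuming the input is the sense-strand sequence that begins at the TSS (index 0 = position +1); the function name find_dce_subelements is hypothetical, and the sketch ignores the spacing between subelements, which a real DCE call would also need to consider.

```python
# Consensus strings for the DCE subelements, as defined above.
DCE_SUBELEMENTS = {"SI": "CTTC", "SII": "CTGT", "SIII": "AGC"}


def find_dce_subelements(downstream):
    """Return 0-based start offsets of each DCE subelement consensus.

    Assumes `downstream` is the sense strand starting at the TSS, so
    index 0 corresponds to +1. Spacing between subelements is not checked.
    """
    seq = downstream.upper()
    hits = {}
    for name, motif in DCE_SUBELEMENTS.items():
        # str.startswith(prefix, start) tests for the motif at offset i.
        hits[name] = [
            i for i in range(len(seq) - len(motif) + 1)
            if seq.startswith(motif, i)
        ]
    return hits


print(find_dce_subelements("ACTTCAAACTGTAAAGC"))
# {'SI': [1], 'SII': [8], 'SIII': [14]}
```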

DDB_box [SO_0001804]

A conserved polypeptide motif that mediates protein-protein interaction and defines adaptor proteins for DDB1/cullin 4 ubiquitin ligases. Note: PMID:18794354 describes the DDB box and includes many alignments, but does not derive a consensus sequence.

de_novo_variant [SO_0001781]

A variant arising in the offspring that is not found in either of the parents.

decayed_exon [SO_0000464]

A non-functional descendant of an exon. Does not have to be part of a pseudogene.

decreased_polyadenylation_variant [SO_0001803]

A transcript processing variant whereby polyadenylation of the encoded transcript is decreased with respect to the reference. Term requested by M. Dumontier, June 1 2011.

decreased_transcript_level_variant [SO_0001541]

A sequence variant that decreases the level of mature, spliced and processed RNA with respect to a reference sequence.

decreased_transcript_stability_variant [SO_0001547]

A sequence variant that decreases transcript stability with respect to a reference sequence.

decreased_transcription_rate_variant [SO_0001552]

A sequence variant that decreases the rate of transcription with respect to a reference sequence.

decreased_translational_product_level [SO_0001555]

A sequence variant which decreases the translational product level with respect to a reference sequence.

defective_conjugative_transposon [SO_0001049]

An island that contains genes for integration/excision and the gene and site for the initiation of intercellular transfer by conjugation. It can be complemented for transfer by a conjugative transposon.

deficient_interchromosomal_transposition [SO_0000063]

An interchromosomal transposition in which one of the four broken ends loses a segment before rejoining.

deficient_intrachromosomal_transposition [SO_0000062]

An intrachromosomal transposition in which one of the four broken ends loses a segment before rejoining.

deficient_inversion [SO_1000171]

A chromosomal deletion whereby three breaks occur in the same chromosome; one central region is lost, and the other is inverted.

deficient_translocation [SO_1000147]

A chromosomal deletion whereby a translocation occurs in which one of the four broken ends loses a segment before re-joining.

deletion [SO_0000159]

The point at which one or more contiguous nucleotides were excised.

deletion_breakpoint [SO_0001415]

The point within a chromosome where a deletion begins or ends.

deletion_junction [SO_0000687]

The space between two bases in a sequence which marks the position where a deletion has occurred.

delins [SO_1000032]

A sequence alteration that includes an insertion and a deletion, affecting 2 or more bases. A delins can have a different number of bases than the corresponding reference sequence. The term name was changed from indel to delins on 2/24/2019 to align with the HGVS nomenclature term for a deletion-insertion, because indel was causing confusion in the annotation community (GitHub issue 445). The HGVS nomenclature definition of deletion-insertion (delins) is a sequence change where, compared to a reference sequence, one or more nucleotides are replaced by one or more other nucleotides and which is not a substitution, inversion or conversion.
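Following the HGVS distinction above, a REF/ALT pair can be roughly sorted into substitution, deletion, insertion or delins after trimming bases shared at both ends. This Python sketch is a simplification under stated assumptions: classify_alteration and trim_shared_bases are hypothetical helpers, inputs are plain nucleotide strings, and the sketch does not detect inversions, duplications or conversions, which HGVS treats as separate change types.

```python
def trim_shared_bases(ref, alt):
    """Drop bases shared at the start, then at the end, of REF and ALT."""
    i = 0
    while i < min(len(ref), len(alt)) and ref[i] == alt[i]:
        i += 1
    ref, alt = ref[i:], alt[i:]
    j = 0
    while j < min(len(ref), len(alt)) and ref[-1 - j] == alt[-1 - j]:
        j += 1
    return ref[:len(ref) - j], alt[:len(alt) - j]


def classify_alteration(ref, alt):
    """Rough HGVS-style class for a REF/ALT pair.

    Assumption: no inversion, duplication or conversion detection, so an
    insertion that duplicates adjacent sequence is still called "insertion".
    """
    ref, alt = trim_shared_bases(ref.upper(), alt.upper())
    if not ref and not alt:
        return "no_change"
    if not alt:
        return "deletion"
    if not ref:
        return "insertion"
    if len(ref) == 1 and len(alt) == 1:
        return "substitution"
    # One or more bases replaced by one or more other bases.
    return "delins"


print(classify_alteration("CT", "GA"))  # delins
```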

designed_sequence [SO_0000546]

An oligonucleotide sequence that was designed by an experimenter and that may or may not correspond to any natural sequence.

destruction_box [SO_0001805]

A conserved polypeptide motif that can be recognized by both Fizzy/Cdc20- and FZR/Cdh1-activated anaphase-promoting complex/cyclosome (APC/C) and targets a protein for ubiquitination and subsequent degradation by the APC/C. The consensus sequence is RXXLXXXXN.
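The RXXLXXXXN consensus translates directly into a regular expression in which X is any residue. A minimal Python sketch (find_d_boxes is a hypothetical name): a lookahead is used so overlapping candidates are all reported, and a consensus match on its own does not show that the APC/C actually recognizes the site.

```python
import re

# R, any two residues, L, any four residues, N. The zero-width lookahead
# lets overlapping candidates be reported.
D_BOX = re.compile(r"(?=R..L....N)")


def find_d_boxes(protein):
    """Return 0-based start positions of RXXLXXXXN matches."""
    return [m.start() for m in D_BOX.finditer(protein.upper())]


print(find_d_boxes("MRAALAAAANK"))  # [1]
```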

dextrosynaptic_chromosome [SO_1000142]

An autosynaptic chromosome carrying the two right (D = dextro) telomeres. Corrected spelling from dexstrosynaptic_chromosome to dextrosynaptic_chromosome on April 14, 2020 in response to GitHub request #447.

dg_repeat [SO_0001898]

A repeat region which is part of the regional centromere outer repeat region. For the S. pombe project - requested by Val Wood.

dh_repeat [SO_0001899]

A repeat region which is part of the regional centromere outer repeat region. For the S. pombe project - requested by Val Wood.

DHU_loop [SO_0001176]

Non-base-paired sequence of nucleotide bases in tRNA. It contains several dihydrouracil residues.

dicistronic [SO_0000879]

An attribute describing a sequence that contains the code for two gene products.

dicistronic_mRNA [SO_0000716]

An mRNA that has the quality dicistronic.

dicistronic_primary_transcript [SO_1001197]

A primary transcript that has the quality dicistronic.

dicistronic_transcript [SO_0000079]

A transcript that is dicistronic.

dif_site [SO_0000949]

A site at which replicated bacterial circular chromosomes are decatenated by site-specific resolvase.

dihydrouridine [SO_0001228]

A modified RNA base in which the 5,6-dihydrouracil is bound to the ribose ring.

dinucleotide_repeat_microsatellite_feature [SO_0000290]

A region of a repeating dinucleotide sequence (two bases).
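A dinucleotide repeat can be located with a backreference regular expression. In this Python sketch the function name is hypothetical, the threshold of four repeat units (min_units) is an illustrative assumption rather than part of the SO definition, and homopolymers such as AAAA are excluded because the two bases of the unit must differ.

```python
import re


def find_dinucleotide_repeats(seq, min_units=4):
    """Return (start, end, unit) for runs of a repeated two-base unit."""
    # Group 1 is the two-base unit; group 2 is its first base. The negative
    # lookahead (?!\2) stops the second base equalling the first, so pure
    # homopolymer runs are not reported. \1{n,} requires further copies.
    pattern = re.compile(r"(([ACGT])(?!\2)[ACGT])\1{%d,}" % (min_units - 1))
    return [(m.start(), m.end(), m.group(1))
            for m in pattern.finditer(seq.upper())]


print(find_dinucleotide_repeats("GGCACACACATT"))  # [(2, 10, 'CA')]
```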

diplotype [SO_0001028]

A diplotype is a pair of haplotypes from a given individual. It is a genotype where the phase is known.

direct [SO_0001514]

A quality of an insertion where the insert is not in a cytologically inverted orientation.

direct_tandem_duplication [SO_1000039]

A tandem duplication where the individual regions are in the same orientation.

direction_attribute [SO_0001029]

The attribute of whether the sequence is in the same direction as a feature (forward) or in the opposite direction to a feature (reverse).

disabled_reading_frame [SO_0002048]

A reading frame that could encode a full-length protein but which contains obvious mid-sequence disablements (frameshifts or premature stop codons).

disease_associated_variant [SO_0001771]

A variant that has been found to be associated with disease.

disease_causing_variant [SO_0001772]

A variant that has been found to cause disease.

disruptive_inframe_deletion [SO_0001826]

An inframe decrease in CDS length that deletes bases from the coding sequence starting within an existing codon.

disruptive_inframe_insertion [SO_0001824]

An inframe increase in CDS length that inserts one or more codons into the coding sequence within an existing codon.
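Whether an inframe indel is disruptive depends on its length being a multiple of three and on whether it starts at a codon boundary. This Python sketch assumes 0-based CDS coordinates in which codons start at multiples of three; for an insertion, cds_start is taken to be the index of the base immediately after the insertion point. The function name is hypothetical, and it judges position only: it does not check whether the altered codons happen to encode the same amino acids.

```python
def classify_inframe_indel(kind, cds_start, length):
    """Classify a CDS indel by its length and codon position.

    kind: "deletion" or "insertion".
    cds_start: 0-based CDS index of the first deleted base, or (assumed
    convention) of the base just after an insertion point.
    """
    if kind not in ("deletion", "insertion"):
        raise ValueError("kind must be 'deletion' or 'insertion'")
    if length <= 0:
        raise ValueError("length must be positive")
    if length % 3 != 0:
        return "frameshift_variant"
    # Codons start at indices 0, 3, 6, ...; any other start is mid-codon.
    if cds_start % 3 != 0:
        return f"disruptive_inframe_{kind}"
    return f"conservative_inframe_{kind}"


print(classify_inframe_indel("deletion", 4, 3))  # disruptive_inframe_deletion
```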

distal_duplication [SO_0001928]

A duplication of the distal region of a chromosome. This term is used by Complete Genomics in the structural variant analysis files.

distal_promoter_element [SO_0001670]

A regulatory promoter element that is distal from the TSS.

distant_three_prime_recoding_signal [SO_1001287]

A recoding signal that is found many hundreds of nucleotides 3’ of a redefined stop codon.

DJ_C_cluster [SO_0000539]

Genomic DNA of immunoglobulin/T-cell receptor gene in rearranged configuration including at least one DJ-gene and one C-gene.

DJ_gene_segment [SO_0000572]

Genomic DNA of immunoglobulin/T-cell receptor gene in partially rearranged genomic DNA including D-J-region with 5’ UTR and 3’ UTR, also designated as D-J-segment.

DJ_J_C_cluster [SO_0000540]

Genomic DNA in rearranged configuration including at least one DJ-gene, one J-gene and one C-gene.

DJ_J_cluster [SO_0000485]

Genomic DNA of immunoglobulin/T-cell receptor gene in rearranged configuration including at least one DJ-gene and one J-gene.

DMv1_motif [SO_0001165]

A promoter motif with consensus sequence CARCCCT.

DMv2_motif [SO_0001161]

A sequence element characteristic of some RNA polymerase II promoters, usually located between -60 and -45 relative to the TSS. Consensus sequence is MKSYGGCARCGSYSS. Tends to co-occur with DMv3 (SO:0001160). Tends to not occur with DPE motif (SO:0000015) or MTE (SO:0001162).

DMv3_motif [SO_0001160]

A sequence element characteristic of some RNA polymerase II promoters, usually located between -30 and +15 relative to the TSS. Consensus sequence is KNNCAKCNCTRNY. Tends to co-occur with DMv2 (SO:0001161). Tends to not occur with DPE motif (SO:0000015) or MTE (SO:0001162).

DMv4_motif [SO_0001157]

A sequence element characteristic of some RNA polymerase II promoters, located immediately upstream of some TATA box elements with respect to the TSS (+1). Consensus sequence is YGGTCACACTR. Marked spatial preference within core promoter; tends to occur near the TSS, although not as tightly as INR (SO:0000014).

DMv5_motif [SO_0001159]

A sequence element characteristic of some RNA polymerase II promoters, usually located between -50 and -10 relative to the TSS. Consensus sequence is KTYRGTATWTTT. Tends to co-occur with DMv4 (SO:0001157). Tends to not occur with DPE motif (SO:0000015) or MTE (SO:0001162).
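The DMv consensus sequences above use IUPAC nucleotide ambiguity codes (R = A/G, Y = C/T, S = C/G, W = A/T, K = G/T, M = A/C, B = C/G/T, D = A/G/T, H = A/C/T, V = A/C/G, N = any). A minimal Python sketch for turning such a consensus into a regular expression and scanning one strand of a sequence; the helper names are hypothetical, and exact consensus matching is a much cruder test than a position-weight-matrix scan.

```python
import re

# IUPAC nucleotide ambiguity codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T", "U": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def iupac_to_regex(consensus):
    """Translate an IUPAC consensus string into a regex pattern string."""
    return "".join(IUPAC[base] for base in consensus.upper())


def scan_consensus(seq, consensus):
    """Return 0-based starts of consensus matches on the given strand only."""
    # The zero-width lookahead lets overlapping matches be reported.
    pattern = re.compile("(?=" + iupac_to_regex(consensus) + ")")
    dna = seq.upper().replace("U", "T")
    return [m.start() for m in pattern.finditer(dna)]


print(scan_consensus("TTCAGCCCTAA", "CARCCCT"))  # DMv1_motif -> [2]
```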

DNA [SO_0000352]

An attribute describing a sequence consisting of nucleobases bound to a repeating unit made of a 2-deoxy-D-ribose ring connected to a phosphate backbone.

DNA_aptamer [SO_0000032]

DNA molecules that have been selected from random pools based on their ability to bind other molecules.

DNA_binding_site [SO_0001429]

A binding site that, in the molecule, interacts selectively and non-covalently with DNA.

DNA_chromosome [SO_0000954]

Structural unit composed of a self-replicating DNA molecule.

DNA_constraint_sequence [SO_0001009]

A double-stranded DNA used to control macromolecular structure and function.

DNA_sequence_secondary_structure [SO_0000142]

A folded DNA sequence.

DNA_transposon [SO_0000182]

A transposon where the mechanism of transposition is via a DNA intermediate.

DNaseI_hypersensitive_site [SO_0000685]

DNA region representing open chromatin structure that is hypersensitive to digestion by DNase I.

DNAzyme [SO_0001012]

A DNA sequence with catalytic activity. Added by request from Colin Batchelor.

dominant_negative_variant [SO_0002052]

A variant where the mutated gene product adversely affects the other (wild type) gene product. Requested by Deanna Church.

double [SO_0000985]

When a nucleotide polymer has two strands that are reverse-complement to one another and pair together. Attributes added to describe the different kinds of replicon. SO workshop, September 2006.

double_stranded_cDNA [SO_0000758]

DNA synthesized from RNA by reverse transcriptase that has been copied by PCR to make it double stranded.

double_stranded_DNA_chromosome [SO_0000955]

Structural unit composed of a self-replicating, double-stranded DNA molecule.

double_stranded_RNA_chromosome [SO_0000965]

Structural unit composed of a self-replicating, double-stranded RNA molecule.

downstream_gene_variant [SO_0001632]

A sequence variant located 3’ of a gene. Different groups annotate up and downstream to different lengths. The subtypes are specific and are backed up with cross references.

downstream_transcript_variant [SO_0001987]

A feature variant, where the alteration occurs downstream of the transcript termination site. Requested by Graham Ritchie, EBI/Sanger.

DPE_motif [SO_0000015]

A sequence element characteristic of some RNA polymerase II promoters; positioned from +28 to +32 with respect to the TSS (+1). Experimental results suggest that the DPE acts in conjunction with the INR_motif to provide a binding site for TFIID in the absence of a TATA box to mediate transcription of TATA-less promoters. Consensus sequence (A|G)G(A|T)(C|T)(G|A|C). Binds TAF6, TAF9.
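Because the DPE is defined by both a consensus and a fixed position, a check only needs to test the five bases at +28 to +32. This Python sketch assumes the input string starts at the TSS, so +1 is index 0 and +28..+32 are indices 27..31 (promoter coordinates have no position 0); has_dpe is a hypothetical name.

```python
import re

# (A|G)G(A|T)(C|T)(G|A|C) written as regex character classes.
DPE_CONSENSUS = re.compile(r"[AG]G[AT][CT][GAC]")


def has_dpe(from_tss):
    """True if positions +28..+32 match the DPE consensus.

    Assumes from_tss[0] is the TSS (+1), so +28..+32 are indices 27..31.
    """
    window = from_tss.upper()[27:32]
    return DPE_CONSENSUS.fullmatch(window) is not None


print(has_dpe("A" * 27 + "GGACG" + "T" * 10))  # True
```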

DPE1_motif [SO_0001164]

A promoter motif with consensus sequence CGGACGT.

DRE [SO_0001845]

A promoter element with consensus sequence CGWGGWNGMM, bound by transcription factors related to RecA and found in promoters of genes expressed following several types of DNA damage or inhibition of DNA synthesis.

DRE_motif [SO_0001156]

A sequence element characteristic of some RNA polymerase II promoters, usually located between -10 and -60 relative to the TSS. Consensus sequence is WATCGATW. This consensus sequence was identified computationally using the MEME algorithm within core promoter sequences from -60 to +40, with an E value of 1.7e-183. Tends to co-occur with Motif 7. Tends to not occur with DPE motif (SO:0000015) or motif 10.

ds_DNA_viral_sequence [SO_0001198]

A ds_DNA_viral_sequence is a viral_sequence that is the sequence of a virus that exists as double-stranded DNA.

ds_oligo [SO_0000442]

A double-stranded oligonucleotide. This term is mapped to MGED. Do not obsolete without consulting MGED ontology.

ds_RNA_viral_sequence [SO_0001169]

A ds_RNA_viral_sequence is a viral_sequence that is the sequence of a virus that exists as double-stranded RNA.

DSR_motif [SO_0002005]

The determinant of selective removal (DSR) motif consists of repeats of U(U/C)AAAC. The motif targets meiotic transcripts for removal during mitosis via the exosome. Requested by Antonia Lock (Pombe).
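Because the DSR is described as repeats of the hexamer U(U/C)AAAC, a first-pass screen can count hexamer copies in a transcript. In this Python sketch count_dsr_hexamers is hypothetical and T is read as U so DNA input also works; how many copies, and within what window, make a functional DSR is left to the caller because the definition does not specify it.

```python
import re

# One copy of the DSR hexamer U(U/C)AAAC.
DSR_HEXAMER = re.compile(r"U[UC]AAAC")


def count_dsr_hexamers(seq):
    """Count non-overlapping DSR hexamers; T is treated as U."""
    rna = seq.upper().replace("T", "U")
    return len(DSR_HEXAMER.findall(rna))


print(count_dsr_hexamers("UUAAACGGUCAAACAA"))  # 2
```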

DsrA_RNA [SO_0000378]

DsrA RNA regulates both transcription, by overcoming transcriptional silencing by the nucleoid-associated H-NS protein, and translation, by promoting efficient translation of the stress sigma factor, RpoS. These two activities of DsrA can be separated by mutation: the first of three stem-loops of the 85 nucleotide RNA is necessary for RpoS translation but not for anti-H-NS action, while the second stem-loop is essential for antisilencing and less critical for RpoS translation. The third stem-loop, which behaves as a transcription terminator, can be substituted by the trp transcription terminator without loss of either DsrA function. The sequence of the first stem-loop of DsrA is complementary to the upstream leader portion of RpoS messenger RNA, suggesting that pairing of DsrA with the RpoS message might be important for translational regulation.

duplicated_pseudogene [SO_0001758]

A pseudogene that arose via gene duplication. Generally duplicated pseudogenes have the same structure as the original gene, including intron-exon structure and some regulatory sequence.

Duplication [SO_1000035]

One or more nucleotides are added between two adjacent nucleotides in the sequence; the inserted sequence derives from, or is identical in sequence to, nucleotides adjacent to insertion point.

duplication_attribute [SO_0001523]

An attribute of a duplication, which is an insertion which derives from, or is identical in sequence to, nucleotides present at a known location in the genome.

dye_terminator_read [SO_0001423]

A read produced by the dye terminator method of sequencing.

E_box_motif [SO_0001158]

A sequence element characteristic of some RNA polymerase II promoters, usually located between -60 and +1 relative to the TSS. Consensus sequence is AWCAGCTGWT. Tends to co-occur with DMv2 (SO:0001161). Tends to not occur with DPE motif (SO:0000015).

early_origin_of_replication [SO_0002140]

An origin of replication that initiates early in S phase.

edited [SO_0000116]

An attribute describing a sequence that is modified by editing.

edited_CDS [SO_0000935]

A CDS that is edited.

edited_mRNA [SO_0000929]

An mRNA that is edited.

edited_transcript [SO_0000873]

A transcript that is edited.

edited_transcript_by_A_to_I_substitution [SO_0000874]

A transcript that has been edited by A to I substitution.

edited_transcript_feature [SO_0000579]

A locatable feature on a transcript that is edited.

editing_block [SO_0000604]

Edited mRNA sequence mediated by a single guide RNA (SO:0000602).

editing_domain [SO_0000606]

Edited mRNA sequence mediated by two or more overlapping guide RNAs (SO:0000602).

editing_variant [SO_0001544]

A transcript processing variant whereby the process of editing is disrupted with respect to the reference.

elongated_in_frame_polypeptide_C_terminal [SO_0001612]

A sequence variant within the CDS that causes in frame elongation of the resulting polypeptide sequence at the C terminus.

elongated_in_frame_polypeptide_N_terminal_elongation [SO_0001614]

A sequence variant within the CDS that causes in frame elongation of the resulting polypeptide sequence at the N terminus.

elongated_out_of_frame_polypeptide_C_terminal [SO_0001613]

A sequence variant within the CDS that causes out of frame elongation of the resulting polypeptide sequence at the C terminus.

elongated_out_of_frame_polypeptide_N_terminal [SO_0001615]

A sequence variant within the CDS that causes out of frame elongation of the resulting polypeptide sequence at the N terminus.

elongated_polypeptide [SO_0001609]

An elongation of a polypeptide sequence deriving from a sequence variant extending the CDS.

elongated_polypeptide_C_terminal [SO_0001610]

An elongation of a polypeptide sequence at the C terminus deriving from a sequence variant extending the CDS.

elongated_polypeptide_N_terminal [SO_0001611]

An elongation of a polypeptide sequence at the N terminus deriving from a sequence variant extending the CDS.

encodes_1_polypeptide [SO_1001188]

A gene that is alternatively spliced but encodes only one polypeptide.

encodes_alternate_transcription_start_sites [SO_0001241]

A gene that has multiple possible transcription start sites.

encodes_alternately_spliced_transcripts [SO_0000463]

A gene that encodes more than one transcript.

encodes_different_polypeptides_different_stop [SO_1001190]

A gene that is alternatively spliced and encodes more than one polypeptide; the polypeptides have overlapping peptide sequences but use different stop codons.

encodes_disjoint_polypeptides [SO_1001192]

A gene that is alternatively spliced and encodes more than one polypeptide; the polypeptides do not have overlapping peptide sequences.

encodes_greater_than_1_polypeptide [SO_1001189]

A gene that is alternatively spliced and encodes more than one polypeptide.

encodes_overlapping_peptides [SO_1001195]

A gene that is alternatively spliced and encodes more than one polypeptide; the polypeptides have overlapping peptide sequences.

encodes_overlapping_peptides_different_start [SO_1001191]

A gene that is alternatively spliced and encodes more than one polypeptide; the polypeptides have overlapping peptide sequences but use different start codons.

encodes_overlapping_polypeptides_different_start_and_stop [SO_1001193]

A gene that is alternatively spliced and encodes more than one polypeptide; the polypeptides have overlapping peptide sequences but use different start and stop codons.

endogenous_retroviral_gene [SO_0000100]

A proviral gene with origin endogenous retrovirus.

endogenous_retroviral_sequence [SO_0000903]

Endogenous DNA sequences that are likely to have arisen from retroviruses.

Endogenous_Retrovirus_LTR_retrotransposon [SO_0002268]

Endogenous retrovirus (ERV) retrotransposons are abundant in the genomes of jawed vertebrates. Human ERVs (HERVs) are classified based on their homologies to animal retroviruses. Class I families are similar in sequence to mammalian Gammaretroviruses (type C) and Epsilonretroviruses (type E). Class II families show homology to mammalian Betaretroviruses (type B) and Deltaretroviruses (type D). F-Class III families are similar to foamy viruses. Added as per GitHub Issue Request #488 (https://github.com/The-Sequence-Ontology/SO-Ontologies/issues/488).

endonuclease_spliced_intron [SO_0001216]

An intron that is spliced via endonucleolytic cleavage and ligation rather than transesterification.

endosomal_localization_signal [SO_0001529]

A polypeptide region that targets a polypeptide to the endosome.

engineered [SO_0000783]

An attribute to describe a region that was modified in vitro.

engineered_episome [SO_0000779]

An episome that is engineered. Requested by Lynn Crosby Jan 2006.

engineered_foreign_gene [SO_0000281]

A gene that is engineered and foreign.

engineered_foreign_region [SO_0000805]

A region that is engineered and foreign.

engineered_foreign_repetitive_element [SO_0000293]

A repetitive element that is engineered and foreign.

engineered_foreign_transposable_element [SO_0000799]

A transposable_element that is engineered and foreign.

engineered_foreign_transposable_element_gene [SO_0000283]

A transposable_element gene that is engineered and foreign.

engineered_fusion_gene [SO_0000288]

A fusion gene that is engineered.

engineered_gene [SO_0000280]

A gene that is engineered.

engineered_insert [SO_0000915]

A clone insert that is engineered.

engineered_region [SO_0000804]

A region that is engineered.

engineered_rescue_region [SO_0000794]

A rescue region that is engineered.

engineered_tag [SO_0000807]

A tag that is engineered.

engineered_transposable_element [SO_0000798]

A transposable element (TE) that has been modified by manipulations in vitro.

enhancer [SO_0000165]

A cis-acting sequence that increases the utilization of (some) eukaryotic promoters, and can function in either orientation and in any location (upstream or downstream) relative to the promoter. An enhancer may participate in an enhanceosome (GO:0034206): a protein-DNA complex formed by the association of a distinct set of general and specific transcription factors with a region of enhancer DNA. The cooperative assembly of an enhanceosome confers specificity of transcriptional regulation. This comment is a placeholder should we start to make cross products with GO.

enhancer_binding_site [SO_0001461]

A binding site that, in the enhancer region of a nucleotide molecule, interacts selectively and non-covalently with polypeptide residues.

enhancer_bound_by_factor [SO_0000166]

An enhancer bound by a factor.

enhancer_trap_construct [SO_0001479]

An enhancer trap construct is a type of engineered plasmid which is designed to integrate into a genome and express a reporter when the expression from a basic minimal promoter is enhanced by genomic enhancer elements. Enhancer traps contain promoter elements and are not usually mutagenic.

enhancerRNA [SO_0001870]

A short ncRNA that is transcribed from an enhancer. May have a regulatory function.

enzymatic [SO_0001185]

An attribute describing the sequence of a transcript that has catalytic activity with or without an associated ribonucleoprotein. Do not use this for feature annotation. Use enzymatic_RNA (SO:0000372) instead.

enzymatic_RNA [SO_0000372]

An RNA sequence that has catalytic activity with or without an associated ribonucleoprotein. This was moved to be a child of transcript (SO:0000673) because some enzymatic RNA regions are part of primary transcripts and some are part of processed transcripts.

enzymatic_RNA_gene [SO_0002180]

A gene that encodes an enzymatic RNA.

epigenetically_modified [SO_0000133]

This attribute describes a gene where heritable changes other than those in the DNA sequence occur. These changes include: modification to the DNA (such as DNA methylation, the covalent modification of cytosine), and post-translational modification of histones.

epigenetically_modified_gene [SO_0000898]

A gene that is epigenetically modified.

epigenetically_modified_region [SO_0001720]

A biological DNA region implicated in epigenomic changes caused by mechanisms other than changes in the underlying DNA sequence. This includes nucleosomal histone post-translational modifications, nucleosome depletion to render DNA accessible, and post-replicational base modifications such as cytosine modification. Moved from is_a biological_region (SO:0001411) to is_a regulatory_region (SO:0005836) on 11 Feb 2021. GREEKC members pointed out that this would be a more appropriate location. See GitHub Issue #530. 11 Feb 2021 updated definition along with addition of epigenomically_modified_region (SO:0002332). Epigenetically modified region is now not inherited while epigenomically modified region is not annotated as inherited. See GitHub Issue #532 and issue #534.

episome [SO_0000768]

A plasmid that may integrate with a chromosome.

epoxyqueuosine [SO_0001318]

Epoxyqueuosine is a modified 7-deazaguanosine.

ER_retention_signal [SO_0001806]

A C-terminal tetrapeptide motif that mediates retention of a protein in (or retrieval to) the endoplasmic reticulum. In mammals the sequence is KDEL, and in fungi HDEL or DDEL.
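Because the signal is a fixed C-terminal tetrapeptide, checking for it only requires comparing the last four residues against the lineage-specific set listed above. A minimal Python sketch; has_er_retention_signal is a hypothetical name, the lineage keys are assumptions chosen for illustration, and variant retention signals beyond those named in the definition are not covered.

```python
# C-terminal tetrapeptides named in the definition above.
ER_RETENTION_SIGNALS = {
    "mammals": {"KDEL"},
    "fungi": {"HDEL", "DDEL"},
}


def has_er_retention_signal(protein, lineage):
    """True if the protein ends in a listed tetrapeptide for the lineage."""
    # Drop a trailing stop symbol ('*') before taking the last four residues.
    c_terminus = protein.upper().rstrip("*")[-4:]
    return c_terminus in ER_RETENTION_SIGNALS[lineage]


print(has_er_retention_signal("MAAAKDEL", "mammals"))  # True
```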

EST [SO_0000345]

A tag produced from a single sequencing read from a cDNA clone or PCR product; typically a few hundred base pairs long. This term is mapped to MGED. Do not obsolete without consulting MGED ontology.

EST_match [SO_0000668]

A match against an EST sequence.

eukaryotic_promoter [SO_0002221]

A regulatory_region including the Transcription Start Site (TSS) of a gene and serving as a platform for Pre-Initiation Complex (PIC) assembly, enabling transcription of a gene under certain conditions.

eukaryotic_terminator [SO_0000951]

A signal for RNA polymerase to terminate transcription.

exemplar [SO_0000864]

An attribute describing a sequence that is representative of a class of similar sequences.

exemplar_mRNA [SO_0000734]

An exemplar is a representative cDNA sequence for each gene. The exemplar approach is a method that usually involves some initial clustering into gene groups and the subsequent selection of a representative from each gene group. Added for the MO people.

exon_junction [SO_0000333]

The boundary between two exons in a processed transcript.

exon_loss_variant [SO_0001572]

A sequence variant whereby an exon is lost from the transcript.

exon_of_single_exon_gene [SO_0005845]

An exon that is the only exon in a gene.

exon_region [SO_0000852]

A region of an exon.

exon_variant [SO_0001791]

A sequence variant that changes exon sequence.

exonic_splice_enhancer [SO_0000683]

Exonic splicing enhancers (ESEs) facilitate exon definition by assisting in the recruitment of splicing factors to the adjacent intron.

exonic_splice_region_variant [SO_0002084]

A sequence variant in which a change has occurred within the exonic region of the splice site, 1-2 bases from the exon boundary.

exonic_splicing_silencer [SO_0002058]

An exonic splicing regulatory element that functions to recruit trans-acting splicing factors that suppress the transcription of the gene or genes they control. Requested by Javier Diez Perez.

experimental_feature_attribute [SO_0001684]

An attribute of an experimentally derived feature.

experimental_result_region [SO_0000703]

A region of sequence implicated in an experimental result.

experimentally_defined_binding_region [SO_0001696]

A region that has been implicated in binding although the exact coordinates of binding may be unknown.

experimentally_determined [SO_0000312]

Attribute to describe a feature that has been experimentally verified.

expressed_sequence_assembly [SO_0001428]

A sequence assembly derived from expressed sequences. From tracker [ 2372385 ] expressed_sequence_assembly.

expressed_sequence_match [SO_0000102]

A match to an EST or cDNA sequence.

extended_cis_splice_site [SO_0001993]

Intronic positions associated with cis-splicing. Contains the first and second positions immediately before the exon and the first, second and fifth positions immediately after. Added by Andy Menzies (Sanger).

extended_intronic_splice_region [SO_0001996]

Region of intronic sequence within 10 bases of an exon.

extended_intronic_splice_region_variant [SO_0001995]

A sequence variant occurring in the intron, within 10 bases of an exon. Added by Andy Menzies (Sanger).
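The two distance rules in exonic_splice_region_variant and extended_intronic_splice_region_variant can be expressed as a small classifier. This Python sketch assumes distance is counted from the exon-intron boundary with 1 meaning the base adjacent to it, and treats "within 10 bases" as inclusive; splice_region_terms is a hypothetical name, and it returns only these two terms, although other SO terms may also apply to the same variant.

```python
def splice_region_terms(in_exon, distance):
    """Return the SO terms above that match a variant's position.

    in_exon: True if the variant lies in the exon, False if intronic.
    distance: bases from the exon-intron boundary (1 = adjacent base).
    Assumption: both limits are inclusive.
    """
    if distance < 1:
        raise ValueError("distance must be at least 1")
    terms = []
    if in_exon and distance <= 2:
        terms.append("exonic_splice_region_variant")
    if not in_exon and distance <= 10:
        terms.append("extended_intronic_splice_region_variant")
    return terms


print(splice_region_terms(False, 7))  # ['extended_intronic_splice_region_variant']
```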

external_transcribed_spacer_region [SO_0000640]

Non-coding regions of DNA that precede the sequence that codes for the ribosomal RNA.

extrachromosomal_mobile_genetic_element [SO_0001038]

An MGE that is not integrated into the host chromosome.

extramembrane_polypeptide_region [SO_0001072]

Polypeptide region that is localized outside of a lipid bilayer. Range.

feature_ablation [SO_0001879]

A sequence variant, caused by an alteration of the genomic sequence, where the deletion is greater than the extent of the underlying genomic features. Created in conjunction with the EBI.

feature_amplification [SO_0001880]

A sequence variant, caused by an alteration of the genomic sequence, where the structural change, an amplification of sequence, is greater than the extent of the underlying genomic features. Created in conjunction with the EBI.

feature_attribute [SO_0000733]

An attribute describing a located_sequence_feature.

feature_elongation [SO_0001907]

A sequence variant that causes the extension of a genomic feature, with regard to the reference sequence.

feature_fusion [SO_0001882]

A sequence variant, caused by an alteration of the genomic sequence, where a deletion fuses genomic features. Created in conjunction with the EBI.

feature_translocation [SO_0001881]

A sequence variant, caused by an alteration of the genomic sequence, where the structural change, a translocation, is greater than the extent of the underlying genomic features. Created in conjunction with the EBI.

feature_truncation [SO_0001906]

A sequence variant that causes the reduction of a genomic feature, with regard to the reference sequence.

feature_variant [SO_0001878]

A sequence variant that falls entirely or partially within a genomic feature. Created in conjunction with the EBI.

fingerprint_map [SO_0001250]

A fingerprint_map is a physical map composed of restriction fragments.

finished_genome [SO_0001491]

The status of a whole genome sequence, with less than 1 error per 100,000 base pairs.
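The threshold is simple arithmetic. Here is a minimal sketch, assuming an estimated error count is already available. The helper name is hypothetical.

```python
def meets_finished_threshold(estimated_errors: int, length_bp: int) -> bool:
    """Return True if the error rate is below 1 error per 100,000 base pairs."""
    # errors / length < 1 / 100_000 is equivalent to errors * 100_000 < length,
    # which avoids floating-point rounding.
    return estimated_errors * 100_000 < length_bp
```

For a 4,600,000 bp genome, 40 estimated errors (4,000,000 < 4,600,000) meets the threshold. 46 errors, which is exactly 1 per 100,000, does not, because the definition requires fewer than 1.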

five_aminomethyl_two_thiouridine [SO_0001363]

5_aminomethyl_2_thiouridine is a modified uridine base feature.

five_carbamoylmethyl_two_prime_O_methyluridine [SO_0001368]

5_carbamoylmethyl_2_prime_O_methyluridine is a modified uridine base feature.

five_carbamoylmethyluridine [SO_0001367]

5_carbamoylmethyluridine is a modified uridine base feature.

five_carboxyhydroxymethyl_uridine [SO_0001358]

5_carboxyhydroxymethyl_uridine is a modified uridine base feature.

five_carboxyhydroxymethyl_uridine_methyl_ester [SO_0001359]

5_carboxyhydroxymethyl_uridine_methyl_ester is a modified uridine base feature.

five_carboxymethylaminomethyl_two_prime_O_methyluridine [SO_0001370]

5_carboxymethylaminomethyl_2_prime_O_methyluridine is a modified uridine base feature.

five_carboxymethylaminomethyl_two_thiouridine [SO_0001371]

5_carboxymethylaminomethyl_2_thiouridine is a modified uridine base feature.

five_carboxymethylaminomethyluridine [SO_0001369]

5_carboxymethylaminomethyluridine is a modified uridine base feature.

five_carboxymethyluridine [SO_0001374]

5_carboxymethyluridine is a modified uridine base feature.

five_formyl_two_prime_O_methylcytidine [SO_0001293]

5-formyl-2’-O-methylcytidine is a modified cytidine.

five_formylcytidine [SO_0001286]

5-formylcytidine is a modified cytidine.

five_hydroxymethylcytidine [SO_0001292]

5-hydroxymethylcytidine is a modified cytidine.

five_hydroxyuridine [SO_0001354]

5_hydroxyuridine is a modified uridine base feature.

five_isopentenylaminomethyl_two_prime_O_methyluridine [SO_0001382]

5_isopentenylaminomethyl_2_prime_O_methyluridine is a modified uridine base feature.

five_isopentenylaminomethyl_two_thiouridine [SO_0001381]

5_isopentenylaminomethyl_2_thiouridine is a modified uridine base feature.

five_isopentenylaminomethyl_uridine [SO_0001380]

5_isopentenylaminomethyl_uridine is a modified uridine base feature.

five_methoxycarbonylmethyl_two_prime_O_methyluridine [SO_0001361]

5_methoxycarbonylmethyl_2_prime_O_methyluridine is a modified uridine base feature.

five_methoxycarbonylmethyl_two_thiouridine [SO_0001362]

5_methoxycarbonylmethyl_2_thiouridine is a modified uridine base feature.

five_methoxycarbonylmethyluridine [SO_0001360]

5_methoxycarbonylmethyluridine is a modified uridine base feature.

five_methoxyuridine [SO_0001355]

5_methoxyuridine is a modified uridine base feature.

five_methyl_2_thiouridine [SO_0001351]

5_methyl_2_thiouridine is a modified uridine base feature.

five_methylaminomethyl_two_selenouridine [SO_0001366]

5_methylaminomethyl_2_selenouridine is a modified uridine base feature.

five_methylaminomethyl_two_thiouridine [SO_0001365]

5_methylaminomethyl_2_thiouridine is a modified uridine base feature.

five_methylaminomethyluridine [SO_0001364]

5_methylaminomethyluridine is a modified uridine base feature.

five_methylcytidine [SO_0001282]

5-methylcytidine is a modified cytidine.

five_methyldihydrouridine [SO_0001376]

5_methyldihydrouridine is a modified uridine base feature.

five_methyluridine [SO_0001344]

5_methyluridine is a modified uridine base feature.

five_prime_cis_splice_site [SO_0000163]

Intronic 2 bp region bordering the exon, at the 5’ edge of the intron. A splice_site that is downstream_adjacent_to exon and starts intron.

five_prime_clip [SO_0000555]

5’ most region of a precursor transcript that is clipped off during processing.

five_prime_coding_exon [SO_0000200]

The 5’ most coding exon.

five_prime_coding_exon_coding_region [SO_0000196]

The sequence of the five_prime_coding_exon that codes for protein.

five_prime_coding_exon_noncoding_region [SO_0000486]

The sequence of the 5’ exon preceding the start codon.

five_prime_D_heptamer [SO_0000496]

7 nucleotide recombination site (e.g. CACTGTG), part of a 5’ D-recombination signal sequence (SO:0000556) of an immunoglobulin/T-cell receptor gene.

five_prime_D_nonamer [SO_0000497]

9 nucleotide recombination site (e.g. GGTTTTTGT), part of a 5’ D-recombination signal sequence (SO:0000556) of an immunoglobulin/T-cell receptor gene.

five_prime_D_recombination_signal_sequence [SO_0000556]

Recombination signal of an immunoglobulin/T-cell receptor gene, including the 5’ D-nonamer (SO:0000497), 5’ D-spacer (SO:0000498), and 5’ D-heptamer (SO:0000496), located 5’ of the D-region of a D-gene or 5’ of the D-region of a DJ-gene.

five_prime_D_spacer [SO_0000498]

12 or 23 nucleotide spacer between the 5’ D-heptamer (SO:0000496) and 5’ D-nonamer (SO:0000497) of a 5’ D-recombination signal sequence (SO:0000556) of an immunoglobulin/T-cell receptor gene.

five_prime_EST [SO_0001208]

An EST read from the 5’ end of a transcript that usually codes for a protein. These regions tend to be conserved across species and do not change much within a gene family.

five_prime_five_prime_overlap [SO_0000074]

An attribute to describe a gene when the five prime region overlaps with another gene’s five prime region.

five_prime_flanking_region [SO_0001416]

A flanking region located five prime of a specific region.

five_prime_intron [SO_0000190]

An intron that is the most 5-prime in a given transcript.

five_prime_LTR [SO_0000425]

The long terminal repeat found at the five-prime end of the sequence to be inserted into the host genome.

five_prime_LTR_component [SO_0000850]

A component of the five-prime long terminal repeat.

five_prime_noncoding_exon [SO_0000445]

Non-coding exon in the 5’ UTR.

five_prime_open_reading_frame [SO_0000629]

An open reading frame found within the 5’ UTR that can be translated and stall the translation of the downstream open reading frame.

five_prime_recoding_site [SO_1001280]

The recoding stimulatory signal located upstream of the recoding site.

five_prime_restriction_enzyme_junction [SO_0001689]

The restriction enzyme cleavage junction on the 5’ strand of the nucleotide sequence.

five_prime_RST [SO_0001469]

A tag produced from a single sequencing read from a 5’-RACE product; typically a few hundred base pairs long.

five_prime_sticky_end_restriction_enzyme_cleavage_site [SO_0001975]

A restriction enzyme recognition site that, when cleaved, results in 5 prime overhangs. Requested by Jackie Quinn. The sticky restriction sites are different from junctions because they include the sequence that is cut, inclusive of the five prime junction and the three prime junction.
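To show how the positions of the two strand cuts determine the overhang, here is a small sketch. It assumes both cut positions are given as offsets into the top-strand recognition sequence. The function and that coordinate convention are illustrative assumptions. EcoRI (G^AATTC) leaves a 5 prime AATT overhang, and PstI (CTGCA^G) leaves a 3 prime TGCA overhang.

```python
def overhang(site: str, top_cut: int, bottom_cut: int) -> tuple[str, str]:
    """Classify the end left by cleavage of a recognition site.

    top_cut: cut offset on the top strand (5' to 3') within `site`.
    bottom_cut: cut offset on the bottom strand, expressed in the same
    top-strand coordinates (an assumed convention for this sketch).
    """
    if top_cut < bottom_cut:
        # Top strand is cut first, so the single-stranded tail has a 5' end.
        return "5_prime", site[top_cut:bottom_cut]
    if top_cut > bottom_cut:
        # Bottom strand is cut first, leaving a 3' single-stranded tail.
        return "3_prime", site[bottom_cut:top_cut]
    return "blunt", ""
```

For example, `overhang("GAATTC", 1, 5)` classifies EcoRI as a five_prime sticky-end site. An enzyme that cuts both strands at the same position, such as EcoRV (GAT^ATC), leaves a blunt end.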

five_prime_terminal_inverted_repeat [SO_0000420]

An inverted repeat (SO:0000294) occurring at the 5-prime termini of a DNA transposon.

five_prime_three_prime_overlap [SO_0000073]

An attribute to describe a gene when the five prime region overlaps with another gene’s 3’ region.

five_prime_UST [SO_0001466]

A UST located in the 5’ UTR of a protein-coding transcript.

five_prime_UTR_intron [SO_0000447]

An intron located in the 5’ UTR.

five_prime_UTR_premature_start_codon_location_variant [SO_0001990]

A 5’ UTR variant where a premature start codon is moved.

five_taurinomethyl_two_thiouridine [SO_0001379]

5_taurinomethyl_2_thiouridine is a modified uridine base feature.

five_taurinomethyluridine [SO_0001378]

5_taurinomethyluridine is a modified uridine base feature.

five_two_prime_O_dimethylcytidine [SO_0001287]

5,2’-O-dimethylcytidine is a modified cytidine.

five_two_prime_O_dimethyluridine [SO_0001346]

5_2_prime_O_dimethyluridine is a modified uridine base feature.

fixed_variant [SO_0001768]

When a variant has become fixed in the population so that it is now the only variant.

flanked [SO_0000357]

An attribute describing a region that is bounded on either side by a particular kind of region.

flanking_region [SO_0000239]

The sequences extending on either side of a specific region.

flanking_three_prime_quadruplet_recoding_signal [SO_1001281]

Four base pair sequence immediately downstream of the redefined region. The redefined region is a frameshift site. The quadruplet is 2 overlapping codons.

FLEX_element [SO_0001846]

A promoter element that has consensus sequence GTAAACAAACAAAM and contains a heptameric core GTAAACA, bound by transcription factors with a forkhead DNA-binding domain.

floxed_gene [SO_0000363]

A transgene that is floxed.

foldback_element [SO_0000238]

A transposable element with extensive secondary structure, characterized by large modular imperfect long inverted repeats.

foreign [SO_0000784]

An attribute to describe a region from another species.

foreign_gene [SO_0000285]

A gene that is foreign.

foreign_transposable_element [SO_0000720]

A transposable element that is foreign. Requested by Michael on 19 Nov 2004.

forkhead_motif [SO_0001847]

A promoter element with consensus sequence TTTRTTTACA, bound by transcription factors with a forkhead DNA-binding domain.

forward [SO_0001030]

Forward is an attribute of the feature, where the feature is in the 5’ to 3’ direction.

forward_primer [SO_0000121]

A single stranded oligo used for polymerase chain reaction. This term is mapped to MGED. Do not obsolete without consulting MGED ontology.

four_bp_start_codon [SO_1001269]

A non-canonical start codon with 4 base pairs.

four_demethylwyosine [SO_0001341]

4_demethylwyosine is a modified guanosine base feature.

four_thiouridine [SO_0001350]

4_thiouridine is a modified uridine base feature.

fragment_assembly [SO_0001249]

A fragment assembly is a genome assembly that orders overlapping fragments of the genome based on landmark sequences. The base pair distance between the landmarks is known allowing additivity of lengths.

fragmentary [SO_0000731]

An attribute to describe a feature that is incomplete. Term added because of request by MO people.

frame_restoring_variant [SO_0001591]

A sequence variant that reverts the sequence of a previous frameshift mutation back to the initial frame.

frameshift [SO_0000865]

An attribute describing a sequence that contains a mutation involving the deletion or insertion of one or more bases, where this number is not divisible by 3.

Frameshift [SO_0001589]

A sequence variant that causes a disruption of the translational reading frame, because the number of nucleotides inserted or deleted is not a multiple of three. EBI term: Frameshift variations - In coding sequence, resulting in a frameshift.
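The "not a multiple of three" rule can be checked from the net length change of a coding-sequence indel. Here is a minimal sketch, assuming `ref` and `alt` are the reference and alternate alleles of a single variant. The returned labels are illustrative strings, not SO term names.

```python
def indel_consequence(ref: str, alt: str) -> str:
    """Classify a coding-sequence indel by its net length change."""
    delta = len(alt) - len(ref)
    if delta == 0:
        return "no_length_change"
    # A net change that is not divisible by 3 shifts the downstream reading frame.
    if delta % 3 != 0:
        return "frameshift"
    return "inframe"
```

A single-base insertion (`"A"` to `"AT"`) is a frameshift, while a three-base deletion (`"ATGC"` to `"A"`) keeps the frame.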

frameshift_elongation [SO_0001909]

A frameshift variant that causes the translational reading frame to be extended relative to the reference feature.

frameshift_truncation [SO_0001910]

A frameshift variant that causes the translational reading frame to be shortened relative to the reference feature.

FRE [SO_0002046]

A FRE is an enhancer element necessary and sufficient to confer filamentation associated expression in S. cerevisiae. Requested by Rama, SGD.

free [SO_0001516]

The quality of a duplication where the new region exists independently of the original.

free_chromosome_arm [SO_0000065]

A chromosome structure variation whereby an arm exists as an individual chromosome element.

free_duplication [SO_1000144]

A chromosome structure variation whereby the duplicated sequences are carried as a free centric element.

free_ring_duplication [SO_1000145]

A ring chromosome which is a copy of another chromosome.

FRT_flanked [SO_0000361]

An attribute to describe sequence that is flanked by the FLP recombinase recognition site, FRT.

FRT_site [SO_0000350]

An inversion site found on the Saccharomyces cerevisiae 2 micron plasmid.

functional_candidate_gene [SO_0001869]

A candidate gene whose function has something in common biologically with the trait under investigation. Requested by Bayer Cropscience December, 2011.

functional_variant [SO_0001536]

A variant whereby the effect is evaluated with respect to a reference. Updated after request from Lea Starita, lea.starita@gmail.com from the NCBI.

functionally_abnormal [SO_0002218]

A sequence variant in which the function of a gene product is altered with respect to a reference. Added after request from Lea Starita, lea.starita@gmail.com from the NCBI Feb 2019.

fusion [SO_0000806]

The joining of two regions of DNA that are not normally together.

G_box [SO_0001980]

A regulatory promoter element identified in mutation experiments, with consensus sequence: CACGTG. Present in promoters, intergenic regions, coding regions, and introns. They are involved in gene expression responses to light and interact with G-box binding factor and I-box binding factor 1a. A plant specific region.
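The G-box consensus CACGTG is its own reverse complement, so a search of one strand also finds occurrences on the other. The sketch below checks that property. The helper names are illustrative. For contrast, the GAGA_motif consensus GAGAGCG is not palindromic.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    """Reverse complement of an unambiguous DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]

def is_palindromic(site: str) -> bool:
    """True if the site reads the same 5' to 3' on both strands."""
    return site.upper() == reverse_complement(site)
```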

G_to_A_transition [SO_1000016]

A transition of a guanine to an adenine.

G_to_C_transversion [SO_1000026]

A transversion from guanine to cytosine.

G_to_T_transversion [SO_1000027]

A transversion from guanine to thymine.
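Transitions exchange a purine for a purine or a pyrimidine for a pyrimidine, and transversions cross between the two classes. Here is a small sketch of that rule for single-base DNA substitutions. The function name is hypothetical.

```python
PURINES = frozenset("AG")
PYRIMIDINES = frozenset("CT")

def substitution_type(ref: str, alt: str) -> str:
    """Classify a single-base DNA substitution as a transition or a transversion."""
    ref, alt = ref.upper(), alt.upper()
    if ref == alt or not {ref, alt} <= PURINES | PYRIMIDINES:
        raise ValueError(f"not a DNA base substitution: {ref}>{alt}")
    # Staying within one chemical class is a transition.
    if {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES:
        return "transition"
    return "transversion"
```

Under this rule, G to A is a transition, and G to C and G to T are transversions, matching the three terms above.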

GAGA_motif [SO_0001166]

A non-directional promoter motif with consensus sequence GAGAGCG.

gain_of_function_variant [SO_0002053]

A sequence variant whereby new or enhanced function is conferred on the gene product.

galactosyl_queuosine [SO_0001319]

Galactosyl_queuosine is a modified 7-deazaguanosine.

gamma_turn [SO_0001138]

Gamma turns, defined for 3 residues i, i+1, i+2 if a hydrogen bond exists between residues i and i+2 and the phi and psi angles of residue i+1 fall within 40 degrees.

gamma_turn_classic [SO_0001139]

Gamma turns, defined for 3 residues i, i+1, i+2 if a hydrogen bond exists between residues i and i+2 and the phi and psi angles of residue i+1 fall within 40 degrees of phi(i+1) = 75.0 and psi(i+1) = -64.0.

gamma_turn_inverse [SO_0001140]

Gamma turns, defined for 3 residues i, i+1, i+2 if a hydrogen bond exists between residues i and i+2 and the phi and psi angles of residue i+1 fall within 40 degrees of phi(i+1) = -79.0 and psi(i+1) = 69.0.
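The classic and inverse definitions above can be applied as a simple angle test. This sketch reads "within 40 degrees" as each of phi and psi lying within ±40 degrees of the ideal values, with angular wrap-around. Hydrogen-bond detection between residues i and i+2 is assumed to be done elsewhere and passed in as a boolean. All names here are illustrative.

```python
# Ideal (phi, psi) of residue i+1, in degrees, from the definitions above.
GAMMA_TURN_IDEALS = {"classic": (75.0, -64.0), "inverse": (-79.0, 69.0)}
TOLERANCE_DEG = 40.0

def angle_diff(a: float, b: float) -> float:
    """Smallest absolute difference between two angles in degrees (handles wrap-around)."""
    d = (a - b) % 360.0
    return min(d, 360.0 - d)

def classify_gamma_turn(hbond_i_to_i2: bool, phi: float, psi: float):
    """Return 'classic', 'inverse', or None for residue i+1 with angles (phi, psi)."""
    if not hbond_i_to_i2:
        return None
    for kind, (phi0, psi0) in GAMMA_TURN_IDEALS.items():
        if angle_diff(phi, phi0) <= TOLERANCE_DEG and angle_diff(psi, psi0) <= TOLERANCE_DEG:
            return kind
    return None
```

The two ideal phi values are about 154 degrees apart, so a residue cannot satisfy both windows at once.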

gap [SO_0000730]

A gap of known length in the sequence. The unknown bases are filled in with Ns.
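Because gap bases are written as N, gap intervals can be recovered by scanning for runs of N. Here is a minimal sketch, assuming 0-based, half-open coordinates. The function name and `min_len` parameter are hypothetical.

```python
import re

def find_gaps(seq: str, min_len: int = 1) -> list[tuple[int, int]]:
    """Return 0-based, half-open (start, end) intervals of runs of N (or n)."""
    return [
        (m.start(), m.end())
        for m in re.finditer(r"[Nn]+", seq)
        if m.end() - m.start() >= min_len
    ]
```

`min_len` lets a caller ignore isolated ambiguous bases and keep only longer runs.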

GATA_box [SO_0001840]

A GATA transcription factor element containing the consensus sequence WGATAR (in which W indicates A/T and R indicates A/G). Changed to is_a SO:0001055 transcriptional_cis_regulatory_region from core_eukaryotic_promoter_element SO:0001660 after Ruth Lovering from GREEKC initiative pointed out that GATA boxes are frequently in enhancer regions, Dave Sant Aug 2020. Moved from is_a SO:0001055 transcriptional_cis_regulatory_region to SO:0000235 TF_binding_site after Colin Logie pointed out that this is a consensus sequence where transcription factors bind, GREEKC Jan 21, 2021.
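Consensus sequences written with IUPAC ambiguity codes, such as WGATAR here, GTAAACAAACAAAM (FLEX_element) and TTTRTTTACA (forkhead_motif), translate directly into regular expressions. The sketch below searches only the strand it is given. Searching the other strand would need a separate pass over the reverse complement. The helper names are illustrative.

```python
import re

# IUPAC nucleotide codes mapped to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "W": "[AT]", "S": "[CG]", "K": "[GT]", "M": "[AC]",
    "B": "[CGT]", "D": "[AGT]", "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}

def consensus_to_regex(consensus: str) -> str:
    """Translate an IUPAC consensus into a regular expression."""
    return "".join(IUPAC[base] for base in consensus.upper())

def find_motif(seq: str, consensus: str) -> list[int]:
    """0-based start positions of matches on the given strand; overlapping hits are kept."""
    # A zero-width lookahead lets finditer report overlapping matches.
    pattern = re.compile(f"(?=({consensus_to_regex(consensus)}))")
    return [m.start() for m in pattern.finditer(seq.upper())]
```

For example, `consensus_to_regex("WGATAR")` gives `[AT]GATA[AG]`, which matches both AGATAG and TGATAA.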

GC_rich_promoter_region [SO_0000173]

A conserved GC-rich region located upstream of the start point of eukaryotic transcription units which may occur in multiple copies or in either orientation; consensus=GGGCGG.

gene [SO_0000704]

A region (or regions) that includes all of the sequence elements necessary to encode a functional transcript. A gene may include regulatory regions, transcribed regions and/or other functional sequence regions. This term is mapped to MGED. Do not obsolete without consulting MGED ontology. A gene may be considered as a unit of inheritance. A gene is any ‘gene allele’ that produces a functional transcript (i.e., one capable of translation into a protein, or of independent functioning as an RNA), when encoded in the genome of some cell or virion.

gene_array [SO_0005851]

An array includes two or more genes, or two or more gene subarrays, contiguously arranged where the individual genes, or subarrays, are either identical in sequence, or essentially so. This would include, for example, a cluster of genes each encoding the major ribosomal RNAs and a cluster of histone gene subarrays.

gene_array_member [SO_0000081]

[gene_attribute; gene array member; gene_array_member]

gene_attribute [SO_0000401]

An attribute describing a gene.

gene_cassette_array [SO_0005854]

An array of non-functional genes whose members, when captured by recombination, form functional genes. This would include, for example, the arrays of non-functional VSG genes of Trypanosomes.

gene_cassette_member [SO_0005848]

A gene that is a member of a gene cassette, which is a mobile genetic element.

gene_component_region [SO_0000842]

A region of a gene that has a specific function.

gene_fragment [SO_0000997]

A portion of a gene that is not the complete gene. This term is mapped to MGED. Do not obsolete without consulting MGED ontology.

gene_fusion [SO_0001565]

A sequence variant whereby two genes have become joined.

gene_group [SO_0005855]

A collection of related genes.

gene_member_region [SO_0000831]

A region of a gene. A manufactured term used to allow the parts of a gene to have an is_a path to the root.

gene_rearranged_at_DNA_level [SO_0000138]

An epigenetically modified gene, rearranged at the DNA level.

gene_segment [SO_3000000]

A gene component region which acts as a recombinational unit of a gene whose functional form is generated through somatic recombination. Requested by tracker 2021594, July 2008, by Alex.

gene_silenced_by_DNA_methylation [SO_0000129]

A gene that is silenced by DNA methylation.

gene_silenced_by_DNA_modification [SO_0000128]

A gene that is silenced by DNA modification.

gene_silenced_by_histone_deacetylation [SO_0001227]

A gene that is silenced by histone deacetylation.

gene_silenced_by_histone_methylation [SO_0001226]

A gene that is silenced by histone methylation.

gene_silenced_by_histone_modification [SO_0001225]

A gene that is silenced by histone modification.

gene_silenced_by_RNA_interference [SO_0001224]

A gene that is silenced by RNA interference.

gene_subarray [SO_0005852]

A subarray is, by definition, a member of a gene array (SO:0005851); the members of a subarray may differ substantially in sequence, but are closely related in function. This would include, for example, a cluster of genes encoding different histones.

gene_subarray_member [SO_0005849]

A gene that is a member of a group of genes that are either regulated or transcribed together within a larger group of genes that are regulated or transcribed together.

gene_to_gene_feature [SO_0000067]

[gene_to_gene_feature; gene to gene feature; gene_attribute]

gene_trap_construct [SO_0001477]

A construct which is designed to integrate into a genome and produce a fusion transcript between exons of the gene into which it inserts and a reporter element in the construct. Gene traps contain a splice acceptor, do not contain promoter elements for the reporter, and are mutagenic. Gene traps may be bicistronic with the second cassette containing a promoter driving a selectable marker.

gene_variant [SO_0001564]

A sequence variant where the structure of the gene is changed.

gene_with_dicistronic_mRNA [SO_0000722]

A gene that encodes a polycistronic mRNA. Requested by MA Nov 19 2004.

gene_with_dicistronic_primary_transcript [SO_0000721]

A gene that encodes a dicistronic primary transcript. Requested by Michael, 19 Nov 2004.

gene_with_dicistronic_transcript [SO_0000692]

A gene that encodes a dicistronic transcript.

gene_with_edited_transcript [SO_0000548]

A gene that encodes a transcript that is edited.

gene_with_mRNA_recoded_by_translational_bypass [SO_0000711]

A gene with mRNA recoded by translational bypass.

gene_with_mRNA_with_frameshift [SO_0000455]

A gene that encodes an mRNA with a frameshift.

gene_with_non_canonical_start_codon [SO_0001739]

A gene with a start codon other than AUG. Requested by flybase, Dec 2010.

gene_with_polyadenylated_mRNA [SO_0000451]

A gene that encodes a polyadenylated mRNA.

gene_with_polycistronic_transcript [SO_0000690]

A gene that encodes a polycistronic transcript.

gene_with_recoded_mRNA [SO_0000693]

A gene that encodes an mRNA that is recoded.

gene_with_start_codon_CUG [SO_0001740]

A gene with a translational start codon of CUG. Requested by flybase, Dec 2010.

gene_with_stop_codon_read_through [SO_0000697]

A gene that encodes a transcript with stop codon readthrough.

gene_with_stop_codon_redefined_as_pyrrolysine [SO_0000698]

A gene encoding an mRNA that has the stop codon redefined as pyrrolysine.

gene_with_stop_codon_redefined_as_selenocysteine [SO_0000710]

A gene encoding an mRNA that has the stop codon redefined as selenocysteine.

gene_with_trans_spliced_transcript [SO_0000459]

A gene with a transcript that is trans-spliced.

gene_with_transcript_with_translational_frameshift [SO_0000712]

A gene encoding a transcript that has a translational frameshift.

genetic_marker [SO_0001645]

A measurable sequence feature that varies within a population.

genic_downstream_transcript_variant [SO_0002152]

A variant that falls downstream of a transcript, but within the genic region of the gene due to alternatively transcribed isoforms.

genic_upstream_transcript_variant [SO_0002153]

A variant that falls upstream of a transcript, but within the genic region of the gene due to alternatively transcribed isoforms.

genome [SO_0001026]

A genome is the sum of genetic material within a cell or virion. A genome is considered the complement of all heritable sequence features in a given cell or organism (chromosomal or extrachromosomal). This is typically a collection of >1 sequence molecules (e.g. chromosomes), but in some organisms (e.g. bacteria) it may be a single sequence macromolecule (e.g. a circular chromosome). For this reason ‘genome’ classifies under ‘sequence feature complement’.

genomic_clone [SO_0000040]

A clone of a DNA region of a genome.

genomic_DNA [SO_0000991]

DNA located in the genome and able to be transmitted to the offspring. This term is mapped to MGED. Do not obsolete without consulting MGED ontology.

genomic_DNA_read [SO_0001828]

A sequencer read of a genomic DNA substrate.

genomically_contaminated_cDNA_clone [SO_0000811]

A cDNA clone invalidated by genomic contamination.

germline_variant [SO_0001778]

A variant present in the embryo that is carried by every cell in the body.

glutamic_acid [SO_0001454]

A negatively charged, hydrophilic amino acid encoded by the codons GAA and GAG. A place holder for a cross product with chebi.

glutamic_acid_tRNA_primary_transcript [SO_0000216]

A primary transcript encoding glutamyl tRNA (SO:0000260).

glutamine [SO_0001448]

A polar, hydrophilic amino acid encoded by the codons CAA and CAG. A place holder for a cross product with chebi.

glutamine_tRNA_primary_transcript [SO_0000217]

A primary transcript encoding glutaminyl tRNA (SO:0000259).

glutaminyl_tRNA [SO_0000259]

A tRNA sequence that has a glutamine anticodon, and a 3’ glutamine binding region.

glutamyl_tRNA [SO_0000260]

A tRNA sequence that has a glutamic acid anticodon, and a 3’ glutamic acid binding region.

glycine [SO_0001443]

A non-polar, hydrophilic amino acid encoded by the codons GGN (GGT, GGC, GGA and GGG). A place holder for a cross product with chebi.
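Degenerate codons such as GGN can be expanded mechanically. In the same IUPAC notation, the glutamic acid codons GAA and GAG can be written GAR, and the glutamine codons CAA and CAG can be written CAR. Here is a small sketch that covers only the ambiguity codes needed here. The helper name is hypothetical.

```python
from itertools import product

# Subset of IUPAC nucleotide codes, mapped to the bases each one stands for.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "AG", "Y": "CT", "N": "ACGT"}

def expand_codon(degenerate: str) -> list[str]:
    """Expand a degenerate codon such as 'GGN' into its concrete codons."""
    return ["".join(bases) for bases in product(*(IUPAC[b] for b in degenerate.upper()))]
```

`expand_codon("GGN")` yields the four glycine codons listed above.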

glycine_tRNA_primary_transcript [SO_0000218]

A primary transcript encoding glycyl tRNA (SO:0000261).

glycyl_tRNA [SO_0000261]

A tRNA sequence that has a glycine anticodon, and a 3’ glycine binding region.

GNA [SO_0001192]

An attribute describing a sequence consisting of nucleobases attached to a repeating unit made of an acyclic three-carbon propylene glycol connected to a phosphate backbone. It has two enantiomeric forms, (R)-GNA and (S)-GNA. Do not use this term for feature annotation. Use GNA_oligo instead.

golden_path [SO_0000688]

A set of subregions selected from sequence contigs which when concatenated form a nonredundant linear sequence.
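Building a golden path amounts to slicing the chosen subregions out of their contigs and concatenating them in order. The sketch below uses a hypothetical, simplified tiling format of (contig name, start, end, strand) with 0-based, half-open coordinates. It is not an actual file format.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

def build_golden_path(contigs: dict[str, str], tiling: list[tuple[str, int, int, str]]) -> str:
    """Concatenate selected contig subregions into one nonredundant linear sequence.

    Hypothetical tiling format: (contig_name, start, end, strand), 0-based,
    half-open coordinates; strand is "+" or "-".
    """
    pieces = []
    for name, start, end, strand in tiling:
        piece = contigs[name][start:end]
        if strand == "-":
            # Subregions taken in reverse orientation are reverse-complemented.
            piece = piece.translate(COMPLEMENT)[::-1]
        pieces.append(piece)
    return "".join(pieces)
```

Choosing non-overlapping subregions is what keeps the result nonredundant. This sketch assumes the caller has already made that selection.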

golden_path_fragment [SO_0000468]

One of the pieces of sequence that make up a golden path.

gRNA_encoding [SO_0000979]

A non-protein_coding gene that encodes a guide_RNA.

gRNA_gene [SO_0001264]

A noncoding RNA that guides the insertion or deletion of uridine residues in mitochondrial mRNAs. This may also refer to synthetic RNAs used to guide DNA editing using the CRISPR/Cas9 system.

group_1_intron_homing_endonuclease_target_region [SO_0000354]

A region of intronic nucleotide sequence targeted by a nuclease enzyme.

group_II_intron [SO_0000603]

Group II introns are found in rRNA, tRNA and mRNA of organelles in fungi, plants and protists, and also in mRNA in bacteria. They are large self-splicing ribozymes and have 6 structural domains (usually designated dI to dVI). A subset of group II introns also encode essential splicing proteins in intronic ORFs. The length of these introns can therefore be up to 3kb. Splicing occurs in almost identical fashion to nuclear pre-mRNA splicing with two transesterification steps. The 2’ hydroxyl of a bulged adenosine in domain VI attacks the 5’ splice site, followed by nucleophilic attack on the 3’ splice site by the 3’ OH of the upstream exon. Protein machinery is required for splicing in vivo, and long range intron to intron and intron-exon interactions are important for splice site positioning. Group II introns are further sub-classified into groups IIA and IIB which differ in splice site consensus, distance of bulged A from 3’ splice site, some tertiary interactions, and intronic ORF phylogeny. GO:0000373.

group_IIA_intron [SO_0000381]

A group II intron that recognizes IBS1/EBS1 and IBS2/EBS2 for the 5-prime exon and gamma/gamma-prime for the 3-prime exon.

group_IIB_intron [SO_0000382]

A group II intron that recognizes IBS1/EBS1 and IBS2/EBS2 for the 5-prime exon and IBS3/EBS3 for the 3-prime exon.

GT_dinucleotide_repeat [SO_0001862]

A dinucleotide repeat region composed of GT repeating elements. paper:PMID:16043634.

GTT_trinucleotide_repeat [SO_0001863]

A trinucleotide repeat region composed of GTT repeating elements.
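Dinucleotide and trinucleotide repeat regions such as the GT and GTT repeats above can be located by matching consecutive copies of the repeating unit. Here is a minimal sketch. The minimum copy number of 3 is an arbitrary assumption for illustration, not part of either definition.

```python
import re

def find_tandem_repeats(seq: str, unit: str, min_copies: int = 3) -> list[tuple[int, int, int]]:
    """Return (start, end, copies) for runs of `unit` repeated at least `min_copies` times.

    Coordinates are 0-based, half-open. Runs are found left to right and do not overlap.
    """
    pattern = re.compile(f"(?:{re.escape(unit.upper())}){{{min_copies},}}")
    return [
        (m.start(), m.end(), (m.end() - m.start()) // len(unit))
        for m in pattern.finditer(seq.upper())
    ]
```

For example, `find_tandem_repeats("ACGTGTGTGTCA", "GT")` reports one run of four GT copies starting at offset 2.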

guide_RNA [SO_0000602]

A short 3’-uridylated RNA that can form a duplex (except for its post-transcriptionally added oligo_U tail (SO:0000609)) with a stretch of mature edited mRNA.

guide_RNA_region [SO_0000930]

A region of guide RNA.

H_ACA_box_snoRNA [SO_0000594]

Members of the box H/ACA family contain an ACA triplet, exactly 3 nt upstream from the 3’ end and an H-box in a hinge region that links two structurally similar functional domains of the molecule. Both boxes are important for snoRNA biosynthesis and function. A few box H/ACA snoRNAs are involved in rRNA processing; most others are known or predicted to participate in selection of uridine nucleosides in rRNA to be converted to pseudouridines. Site selection is mediated by direct base pairing of the snoRNA with rRNA through one or both targeting domains.

H_ACA_box_snoRNA_encoding [SO_0000608]

snoRNA that is associated with guiding pseudouridylation. It contains two short conserved sequence motifs: H box (ANANNA) and ACA (ACA).

H_ACA_box_snoRNA_primary_transcript [SO_0000596]

A primary transcript encoding a small nucleolar RNA of the box H/ACA family.

H_pseudoknot [SO_0000592]

A pseudoknot which contains two stems and at least two loops.

H2AK5_acetylation_site [SO_0001938]

A kind of histone modification site, whereby the 5th residue (a lysine), from the start of the H2A histone protein is acetylated.

H2AK9_acetylation_site [SO_0001944]

A kind of histone modification site, whereby the 9th residue (a lysine), from the start of the H2A histone protein is acetylated.

H2AZK11_acetylation_site [SO_0002147]

A kind of histone modification site, whereby the 11th residue (a lysine), from the start of the H2AZ histone protein is acetylated.

H2AZK13_acetylation_site [SO_0002148]

A kind of histone modification site, whereby the 13th residue (a lysine), from the start of the H2AZ histone protein is acetylated.

H2AZK15_acetylation_site [SO_0002149]

A kind of histone modification site, whereby the 15th residue (a lysine), from the start of the H2AZ histone protein is acetylated.

H2AZK4_acetylation_site [SO_0002145]

A kind of histone modification site, whereby the 4th residue (a lysine), from the start of the H2AZ histone protein is acetylated.

H2AZK7_acetylation_site [SO_0002146]

A kind of histone modification site, whereby the 7th residue (a lysine), from the start of the H2AZ histone protein is acetylated.

H2B_ubiquitination_site [SO_0001717]

A histone modification site on H2B where ubiquitin may be added.

H2BK12_acetylation_site [SO_0001937]

A kind of histone modification site, whereby the 12th residue (a lysine), from the start of the H2B protein is acetylated.

H2BK120_acetylation_site [SO_0001940]

A kind of histone modification site, whereby the 120th residue (a lysine), from the start of the H2B histone protein is acetylated.

H2BK15_acetylation_site [SO_0001946]

A kind of histone modification site, whereby the 15th residue (a lysine), from the start of the H2B histone protein is acetylated.

H2BK20_acetylation_site [SO_0001942]

A kind of histone modification site, whereby the 20th residue (a lysine), from the start of the H2B histone protein is acetylated.

H2BK5_monomethylation_site [SO_0001714]

A kind of histone modification site, whereby the 5th residue (a lysine), from the start of the H2B protein is methylated.

H3K14_acetylation_site [SO_0001704]

A kind of histone modification site, whereby the 14th residue (a lysine), from the start of the H3 histone protein is acetylated.

H3K18_acetylation_site [SO_0001718]

A kind of histone modification site, whereby the 18th residue (a lysine), from the start of the H3 histone protein is acetylated.

H3K20_trimethylation_site [SO_0001935]

A kind of histone modification site, whereby the 20th residue (a lysine), from the start of the H3 protein is tri-methylated.

H3K23_acetylation_site [SO_0001719]

A kind of histone modification site, whereby the 23rd residue (a lysine), from the start of the H3 histone protein is acetylated.

H3K23_dimethylation_site [SO_0001951]

A kind of histone modification site, whereby the 23rd residue (a lysine), from the start of the H3 protein is di-methylated.

H3K27_acetylation_site [SO_0002049]

A kind of histone modification site, whereby the 27th residue (a lysine), from the start of the H3 histone protein is acetylated. Requested by: Sagar Jain, Richard Scheuermann.

H3K27_dimethylation_site [SO_0001726]

A kind of histone modification site, whereby the 27th residue (a lysine), from the start of the H3 histone protein is di-methylated.

H3K27_methylation_site [SO_0001732]

A kind of histone modification site, whereby the 27th residue (a lysine), from the start of the H3 histone protein is methylated.

H3K27_monomethylation_site [SO_0001708]

A kind of histone modification site, whereby the 27th residue (a lysine), from the start of the H3 histone protein is mono-methylated.

H3K27_trimethylation_site [SO_0001709]

A kind of histone modification site, whereby the 27th residue (a lysine), from the start of the H3 histone protein is tri-methylated.

H3K36_acetylation_site [SO_0001936]

A kind of histone modification site, whereby the 36th residue (a lysine), from the start of the H3 histone protein is acetylated.

H3K36_dimethylation_site [SO_0001723]

A kind of histone modification site, whereby the 36th residue (a lysine), from the start of the H3 histone protein is dimethylated.

H3K36_methylation_site [SO_0001733]

A kind of histone modification site, whereby the 36th residue (a lysine), from the start of the H3 histone protein is methylated.

H3K36_monomethylation_site [SO_0001722]

A kind of histone modification site, whereby the 36th residue (a lysine), from the start of the H3 histone protein is mono-methylated.

H3K36_trimethylation_site [SO_0001724]

A kind of histone modification site, whereby the 36th residue (a lysine), from the start of the H3 histone protein is tri-methylated.

H3K4_acetylation_site [SO_0001943]

A kind of histone modification site, whereby the 4th residue (a lysine), from the start of the H3 histone protein is acetylated.

H3K4_dimethylation_site [SO_0001725]

A kind of histone modification site, whereby the 4th residue (a lysine), from the start of the H3 histone protein is di-methylated.

H3K4_methylation_site [SO_0001734]

A kind of histone modification site, whereby the 4th residue (a lysine), from the start of the H3 protein is methylated.

H3K4_monomethylation_site [SO_0001705]

A kind of histone modification site, whereby the 4th residue (a lysine) from the start of the H3 histone protein is mono-methylated.

H3K4_trimethylation [SO_0001706]

A kind of histone modification site, whereby the 4th residue (a lysine) from the start of the H3 histone protein is tri-methylated.

H3K56_acetylation_site [SO_0001945]

A kind of histone modification site, whereby the 56th residue (a lysine) from the start of the H3 histone protein is acetylated.

H3K79_dimethylation_site [SO_0001711]

A kind of histone modification site, whereby the 79th residue (a lysine) from the start of the H3 histone protein is di-methylated.

H3K79_methylation_site [SO_0001735]

A kind of histone modification site, whereby the 79th residue (a lysine) from the start of the H3 histone protein is methylated.

H3K79_monomethylation_site [SO_0001710]

A kind of histone modification site, whereby the 79th residue (a lysine) from the start of the H3 histone protein is mono-methylated.

H3K79_trimethylation_site [SO_0001712]

A kind of histone modification site, whereby the 79th residue (a lysine) from the start of the H3 histone protein is tri-methylated.

H3K9_acetylation_site [SO_0001703]

A kind of histone modification site, whereby the 9th residue (a lysine) from the start of the H3 histone protein is acetylated.

H3K9_dimethylation_site [SO_0001728]

A kind of histone modification site, whereby the 9th residue (a lysine) from the start of the H3 histone protein is di-methylated.

H3K9_methylation_site [SO_0001736]

A kind of histone modification site, whereby the 9th residue (a lysine) from the start of the H3 histone protein is methylated.

H3K9_monomethylation_site [SO_0001727]

A kind of histone modification site, whereby the 9th residue (a lysine) from the start of the H3 histone protein is mono-methylated.

H3K9_trimethylation_site [SO_0001707]

A kind of histone modification site, whereby the 9th residue (a lysine) from the start of the H3 histone protein is tri-methylated.

H3R2_dimethylation_site [SO_0001948]

A kind of histone modification site, whereby the 2nd residue (an arginine) from the start of the H3 histone protein is di-methylated.

H3R2_monomethylation_site [SO_0001947]

A kind of histone modification site, whereby the 2nd residue (an arginine) from the start of the H3 histone protein is mono-methylated.

H4K_acylation_region [SO_0001738]

A region of the H4 histone whereby multiple lysines are acylated.

H4K12_acetylation_site [SO_0001939]

A kind of histone modification site, whereby the 12th residue (a lysine) from the start of the H4 histone protein is acetylated.

H4K16_acetylation_site [SO_0001729]

A kind of histone modification site, whereby the 16th residue (a lysine) from the start of the H4 histone protein is acetylated.

H4K20_monomethylation_site [SO_0001713]

A kind of histone modification site, whereby the 20th residue (a lysine) from the start of the H4 histone protein is mono-methylated.

H4K4_trimethylation_site [SO_0001950]

A kind of histone modification site, whereby the 4th residue (a lysine) from the start of the H4 histone protein is tri-methylated.

H4K5_acetylation_site [SO_0001730]

A kind of histone modification site, whereby the 5th residue (a lysine) from the start of the H4 histone protein is acetylated.

H4K8_acetylation_site [SO_0001731]

A kind of histone modification site, whereby the 8th residue (a lysine) from the start of the H4 histone protein is acetylated.

H4K91_acetylation_site [SO_0001941]

A kind of histone modification site, whereby the 91st residue (a lysine) from the start of the H4 histone protein is acetylated.

H4R3_dimethylation_site [SO_0001949]

A kind of histone modification site, whereby the 3rd residue (an arginine) from the start of the H4 histone protein is di-methylated.

hammerhead_ribozyme [SO_0000380]

A small catalytic RNA motif that catalyzes a self-cleavage reaction. Its name comes from its secondary structure, which resembles a carpenter’s hammer. The hammerhead ribozyme is involved in the replication of some viroids and some satellite RNAs.

haplotype [SO_0001024]

A haplotype is one of a set of coexisting sequence variants of a haplotype block.

haplotype_block [SO_0000355]

A region of the genome which is co-inherited as the result of the lack of historic recombination within it.

helix_turn_helix [SO_0001081]

A motif comprising two helices separated by a turn.

heptamer_of_recombination_feature_of_vertebrate_immune_system_gene [SO_0000561]

Seven-nucleotide recombination site (e.g. CACAGTG), part of a V-gene, D-gene or J-gene recombination feature of an immunoglobulin or T-cell receptor gene.

heritable_phenotypic_marker [SO_0001500]

A biological_region characterized as a single heritable trait in a phenotype screen. The heritable phenotype may be mapped to a chromosome but generally has not been characterized to a specific gene locus.

HERV_deletion [SO_0002067]

A deletion of the HERV mobile element with respect to a reference.

hetero_compound_chromosome [SO_1000140]

A compound chromosome whereby two arms from different chromosomes are connected through the centromere of one of them.

high_identity_region [SO_0001502]

An experimental feature with high sequence identity to another sequence. Requested by tracker ID: 2902685.

high_quality_draft [SO_0001487]

The status of a whole genome sequence, where overall coverage represents at least 90 percent of the genome.

histidine [SO_0001452]

A positively charged, hydrophilic amino acid encoded by the codons CAT and CAC. A placeholder for a cross-product with ChEBI.

histidine_tRNA_primary_transcript [SO_0000219]

A primary transcript encoding histidyl tRNA (SO:0000262).

histidyl_tRNA [SO_0000262]

A tRNA sequence that has a histidine anticodon, and a 3’ histidine binding region.

histone_2A_acetylation_site [SO_0002142]

A histone 2A modification where the modification is the acetylation of the residue.

histone_2AZ_acetylation_site [SO_0002144]

A histone 2AZ modification where the modification is the acetylation of the residue.

histone_2B_acetylation_site [SO_0002143]

A histone 2B modification where the modification is the acetylation of the residue.

histone_3_acetylation_site [SO_0001973]

A histone 3 modification where the modification is the acetylation of the residue.

histone_4_acetylation_site [SO_0001972]

A histone 4 modification where the modification is the acetylation of the residue.

histone_acetylation_site [SO_0001702]

A histone modification site where the modification is the acetylation of the residue.

histone_acylation_region [SO_0001737]

A histone modification, whereby the histone protein is acylated at multiple sites in a region.

histone_binding_site [SO_0001383]

A binding site that, in the nucleotide molecule, interacts selectively and non-covalently with polypeptide residues of a histone.

histone_methylation_site [SO_0001701]

A histone modification site where the modification is the methylation of the residue.

histone_modification [SO_0001700]

Histone modification is a post-translationally modified region whereby residues of the histone protein are modified by methylation, acetylation, phosphorylation, ubiquitination, sumoylation, citrullination, or ADP-ribosylation.

histone_ubiqitination_site [SO_0001716]

A histone modification site where ubiquitin may be added.

homing_endonuclease_binding_site [SO_0001257]

The binding site (recognition site) of a homing endonuclease. The binding site is typically large.

homo_compound_chromosome [SO_1000138]

A compound chromosome whereby two copies of the same chromosomal arm are attached to a common centromere. The chromosome is diploid for the arm involved.

homol_D_box [SO_0001848]

A core promoter element that has the consensus sequence CAGTCACA (or its inverted form TGTGACTG), and plays the role of a TATA box in promoters that do not contain a canonical TATA sequence.

homol_E_box [SO_0001849]

A core promoter element that has the consensus sequence ACCCTACCCT (or its inverted form AGGGTAGGGT), and is found near the homol D box in some promoters that use a homol D box instead of a canonical TATA sequence.
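The “inverted form” quoted for both the homol D box and the homol E box is the reverse complement of the consensus, that is, the same element read 5’ to 3’ on the opposite strand. A minimal Python sketch to check this, assuming plain uppercase A/C/G/T strings; the helper name reverse_complement is illustrative, not part of the ontology:

```python
# Map each base to its Watson-Crick complement (uppercase A/C/G/T only).
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


homol_d_box = "CAGTCACA"
homol_e_box = "ACCCTACCCT"

print(reverse_complement(homol_d_box))  # TGTGACTG
print(reverse_complement(homol_e_box))  # AGGGTAGGGT
```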

homologous [SO_0000857]

Similarity due to common ancestry.

homologous_region [SO_0000853]

A region that is homologous to another region.

HSE [SO_0001850]

A promoter element that consists of at least three copies of the pentanucleotide NGAAN, bound by the heat shock transcription factor HSF.

hydrophobic_region_of_peptide [SO_0100013]

Hydrophobic regions are regions with a low affinity for water. Range.

hydroxywybutosine [SO_0001334]

Hydroxywybutosine is a modified guanosine base feature.

hypoploid [SO_0000056]

A kind of chromosome variation where the chromosome complement is not an exact multiple of the haploid number as some chromosomes are missing.

i_motif [SO_0001010]

A cytosine-rich domain whereby strands associate both inter- and intramolecularly at moderately acidic pH.

I-box [SO_0001982]

A plant regulatory promoter motif, composed of a highly conserved hexamer GATAAG (I-box core).

iDNA [SO_0000723]

Genomic sequence removed from the genome, as a normal event, by a process of recombination.

IG_C_gene [SO_0002123]

A constant (C) gene, a gene that codes the constant region of an immunoglobulin chain. These terms have been requested by Adam Frankish to support Gencode and Vega biotypes. The terms are defined according to IMGT.

IG_C_pseudogene [SO_0002100]

A pseudogenic constant region of an immunoglobulin gene which closely resembles a known functional immunoglobulin constant gene but in which the coding region has stop codons, frameshift mutations or a mutation that affects the initiation codon. These terms have been requested by Adam Frankish to support Gencode and Vega biotypes. The terms are defined according to IMGT.

IG_D_gene [SO_0002124]

A gene that rearranges at the DNA level and codes the diversity region of the variable domain of an immunoglobulin (IG) gene. These terms have been requested by Adam Frankish to support Gencode and Vega biotypes. The terms are defined according to IMGT.

IG_J_gene [SO_0002125]

A joining gene that rearranges at the DNA level and codes the joining region of the variable domain of an immunoglobulin chain. These terms have been requested by Adam Frankish to support Gencode and Vega biotypes. The terms are defined according to IMGT.

IG_J_pseudogene [SO_0002101]

A pseudogenic joining region which closely resembles a known functional immunoglobulin joining gene but in which the coding region has stop codons, frameshift mutations or a mutation that affects the initiation codon, and which rearranges at the DNA level and codes the joining region of the variable domain of an immunoglobulin chain. These terms have been requested by Adam Frankish to support Gencode and Vega biotypes. The terms are defined according to IMGT.

IG_V_gene [SO_0002126]

A variable gene that rearranges at the DNA level and codes the variable region of the variable domain of an immunoglobulin chain. These terms have been requested by Adam Frankish to support Gencode and Vega biotypes. The terms are defined according to IMGT.

IG_V_pseudogene [SO_0002102]

A pseudogenic variable region which closely resembles a known functional immunoglobulin variable gene but in which the coding region has stop codons, frameshift mutations or a mutation that affects the initiation codon, and which rearranges at the DNA level and codes the variable region of an immunoglobulin chain. These terms have been requested by Adam Frankish to support Gencode and Vega biotypes. The terms are defined according to IMGT.

immature_peptide_region [SO_0001063]

An immature_peptide_region is the extent of the peptide after it has been translated and before any processing occurs. Range.

immunoglobulin_gene [SO_0002122]

A germline immunoglobulin gene. These terms have been requested by Adam Frankish to support Gencode and Vega biotypes. The terms are defined according to IMGT.

immunoglobulin_pseudogene [SO_0002098]

A pseudogene derived from an immunoglobulin gene. These terms have been requested by Adam Frankish to support Gencode and Vega biotypes. The terms are defined according to IMGT.

immunoglobulin_region [SO_0001832]

A region of immunoglobulin sequence, either constant or variable.

improved_high_quality_draft [SO_0001488]

The status of a whole genome sequence, where additional work has been performed, using either manual or automated methods, such as gap resolution.

inactive_catalytic_site [SO_0001618]

A sequence variant that causes the inactivation of a catalytic site with respect to a reference sequence.

inactive_ligand_binding_site [SO_0001560]

A sequence variant that causes the inactivation of a ligand binding site with respect to a reference sequence.

incomplete_terminal_codon_variant [SO_0001626]

A sequence variant where at least one base of the final codon of an incompletely annotated transcript is changed. EBI term: Partial codon - Located within the final, incomplete codon of a transcript with a shortened coding sequence where the end is unknown.

incomplete_transcript_3UTR_variant [SO_0002076]

A sequence variant that intersects the 3’ UTR of an incompletely annotated transcript.

incomplete_transcript_5UTR_variant [SO_0002077]

A sequence variant that intersects the 5’ UTR of an incompletely annotated transcript.

incomplete_transcript_CDS [SO_0002081]

A sequence variant that intersects the coding regions of an incompletely annotated transcript.

incomplete_transcript_coding_splice_variant [SO_0002082]

A sequence variant that intersects the coding sequence near a splice region of an incompletely annotated transcript.

incomplete_transcript_exonic_variant [SO_0002080]

A sequence variant that intersects the exon of an incompletely annotated transcript.

incomplete_transcript_intronic_variant [SO_0002078]

A sequence variant that intersects the intron of an incompletely annotated transcript.

incomplete_transcript_splice_region_variant [SO_0002079]

A sequence variant that intersects the splice region of an incompletely annotated transcript.

incomplete_transcript_variant [SO_0002075]

A sequence variant that intersects an incompletely annotated transcript. This term maps to the ANNOVAR term ‘ncRNA’ (http://annovar.openbioinformatics.org/en/latest/user-guide/gene/). The ANNOVAR documentation (as of 11/23/15) describes it as ‘variant overlaps a transcript without coding annotation in the gene definition’, and the document clarifies further: ncRNA above refers to RNA without coding annotation. It does not mean that this is an RNA that will never be translated; it merely means that the user-selected gene annotation system was not able to give a coding sequence annotation. It could still code protein products and may have such annotations in future versions of gene annotation or in another gene annotation system. For example, BC039000 is regarded as ncRNA by ANNOVAR when using UCSC Known Gene annotation, but as a protein-coding gene when using ENSEMBL annotation. The comments section adds: ncRNA does NOT mean conventional non-coding RNA. It means an RNA without complete coding sequence, and it can be a coding RNA that is annotated incorrectly by RefSeq or other gene definition systems.

increased_polyadenylation_variant [SO_0001802]

A transcript processing variant whereby polyadenylation of the encoded transcript is increased with respect to the reference. Term requested by M. Dumontier, June 1 2011.

increased_transcript_level_variant [SO_0001542]

A sequence variant that increases the level of mature, spliced and processed RNA with respect to a reference sequence.

increased_transcript_stability_variant [SO_0001548]

A sequence variant that increases transcript stability with respect to a reference sequence.

increased_transcription_rate_variant [SO_0001551]

A sequence variant that increases the rate of transcription with respect to a reference sequence.

increased_translational_product_level [SO_0001556]

A sequence variant which increases the translational product level with respect to a reference sequence.

independently_known [SO_0000906]

Attribute to describe a feature that is independently known - not predicted.

inducible_promoter [SO_0002051]

A promoter whereby activity is induced by the presence or absence of biotic or abiotic factors.

inframe [SO_0001817]

An attribute describing a sequence that contains a mutation involving the deletion or insertion of one or more bases, where the number of bases is divisible by 3.
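The divisibility rule can be sketched as a one-line test on the net length change of an allele. This is a minimal Python sketch, assuming VCF-style REF/ALT allele strings as input; the function name is_inframe_indel is hypothetical and the check deliberately ignores splice effects and stop codons:

```python
def is_inframe_indel(ref: str, alt: str) -> bool:
    """Return True when REF -> ALT inserts or deletes a non-zero number of
    bases that is divisible by 3, so the downstream reading frame is kept.

    Hypothetical helper: it looks only at the net length change and ignores
    splice effects, stop codons and multi-allelic records.
    """
    net_change = abs(len(alt) - len(ref))
    return net_change > 0 and net_change % 3 == 0


print(is_inframe_indel("ACGT", "A"))  # True: 3 bases deleted
print(is_inframe_indel("A", "AT"))    # False: 1 base inserted (frameshift)
print(is_inframe_indel("A", "G"))     # False: substitution, no indel
```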

inframe_deletion [SO_0001822]

An inframe non-synonymous variant that deletes bases from the coding sequence.

inframe_indel [SO_0001820]

A coding sequence variant where the change does not alter the frame of the transcript.

inframe_variant [SO_0001650]

A sequence variant which does not cause a disruption of the translational reading frame.

Initiating Methionine [SO_0001582]

A codon variant that changes at least one base of the first codon of a transcript. This is being used to annotate changes to the first codon of a transcript when the first annotated codon is not methionine. A variant is predicted to change the first amino acid of a translation irrespective of whether the underlying codon is an AUG. As such, for transcripts with an incomplete CDS (where the sequence does not start with an AUG), it is still called.

inosine [SO_0001230]

A modified RNA base in which hypoxanthine is bound to the ribose ring. The free molecule is CHEBI:17596.

INR_motif [SO_0000014]

A sequence element characteristic of some RNA polymerase II promoters required for the correct positioning of the polymerase for the start of transcription. Overlaps the TSS. The mammalian consensus sequence is YYAN(T|A)YY; the Drosophila consensus sequence is TCA(G|T)t(T|C). In each the A is at position +1 with respect to the TSS. Functionally similar to the TATA box element. Binds TAF1 and TAF2.
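A consensus written with IUPAC ambiguity codes can be scanned for by expanding each code into a regular-expression character class. The sketch below, in Python, encodes the mammalian consensus YYAN(T|A)YY using the IUPAC code W for (T|A) and reports the +1 position of the A. The promoter string is a made-up example, and the small IUPAC table covers only the codes needed here:

```python
import re

# Subset of IUPAC nucleotide codes needed for the INR consensus.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "Y": "[CT]",    # pyrimidine
    "R": "[AG]",    # purine
    "W": "[AT]",    # weak (A or T)
    "N": "[ACGT]",  # any base
}


def consensus_to_regex(consensus: str) -> "re.Pattern[str]":
    """Compile an IUPAC consensus string into a regular expression."""
    return re.compile("".join(IUPAC[base] for base in consensus))


# Mammalian INR consensus YYAN(T|A)YY, with (T|A) written as the IUPAC code W.
MAMMALIAN_INR = consensus_to_regex("YYANWYY")

promoter = "GGGCTCAGTCCGGG"  # hypothetical example sequence
match = MAMMALIAN_INR.search(promoter)
if match:
    tss = match.start() + 2  # the A at +1 is the third base of the motif
    print(match.group(), tss)  # TCAGTCC 6
```

Only the given strand is scanned; a real promoter scan would also need strand handling and a scoring scheme rather than an exact consensus match.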

INR1_motif [SO_0001163]

A promoter motif with consensus sequence TCATTCG.

Insertion [SO_0000667]

The sequence of one or more nucleotides added between two adjacent nucleotides in the sequence.

insertion_attribute [SO_0001512]

A quality of a chromosomal insertion.

insertion_breakpoint [SO_0001414]

The point within a chromosome where an insertion begins or ends.

insertion_site [SO_0000366]

The junction where an insertion occurred.

insertional [SO_0001522]

When a translocation simply moves genetic material from one chromosome to another.

insertional_duplication [SO_1000154]

A chromosome duplication involving the insertion of a duplicated region (as opposed to a free duplication).

inside_intron [SO_0000069]

An attribute to describe a gene when it is located within the intron of another gene.

inside_intron_antiparallel [SO_0000070]

An attribute to describe a gene when it is located within the intron of another gene and on the opposite strand.

inside_intron_parallel [SO_0000071]

An attribute to describe a gene when it is located within the intron of another gene and on the same strand.

insulator_binding_site [SO_0001460]

A binding site that, in an insulator region of a nucleotide molecule, interacts selectively and non-covalently with polypeptide residues. See tracker ID 2060908.

integrated_mobile_genetic_element [SO_0001039]

An MGE that is integrated into the host chromosome.

integrated_plasmid [SO_0001040]

A plasmid sequence that is integrated within the host chromosome.

integration_excision_site [SO_0000946]

A region specifically recognised by a recombinase, which inserts or removes another region marked by a distinct cognate integration/excision site.

intein_containing [SO_0000729]

An attribute of protein-coding genes where the initial protein product contains an intein.

intein_encoding_region [SO_0002026]

The nucleotide sequence which encodes the intein portion of the precursor gene. Requested by Janos Demeter 2014.

interband [SO_0000450]

A light region between two darkly staining bands in a polytene chromosome.

interchromosomal [SO_0001511]

A change in chromosomes that occurs between two separate chromosomes.

interchromosomal_breakpoint [SO_0001873]

A rearrangement breakpoint between two different chromosomes.

interchromosomal_duplication [SO_0000457]

A chromosome duplication involving an insertion from another chromosome.

interchromosomal_mutation [SO_1000031]

A chromosomal structure variation whereby more than one chromosome is involved.

interchromosomal_translocation [SO_0002060]

A translocation where the regions involved are from different chromosomes.

interchromosomal_transposition [SO_1000155]

A chromosome structure variation whereby a transposition occurred between chromosomes.

intergenic_1kb_variant [SO_0002074]

A variant that falls in an intergenic region of 1 kb or less between two genes. This term is added to map to the Annovar annotation ‘upstream,downstream’.

intergenic_variant [SO_0001628]

A sequence variant located in the intergenic region, between genes. EBI term Intergenic variations - More than 5 kb either upstream or downstream of a transcript.

interior_coding_exon [SO_0000004]

A coding exon that is not the most 3-prime or the most 5-prime in a given transcript.

interior_exon [SO_0000201]

An exon that is bounded by 5’ and 3’ splice sites.

interior_intron [SO_0000191]

An intron that is not the most 3-prime or the most 5-prime in a given transcript.

intermediate [SO_0000933]

An attribute to describe a feature between stages of processing.

intermediate_element [SO_0001677]

A core promoter region of RNA polymerase III type 1 promoters.

internal_eliminated_sequence [SO_0000671]

A sequence eliminated from the genome of ciliates during nuclear differentiation.

internal_feature_elongation [SO_0001908]

A sequence variant that causes the extension of a genomic feature from within the feature rather than from the terminus of the feature, with regard to the reference sequence.

internal_guide_sequence [SO_0001016]

A purine-rich sequence in the group I introns which determines the locations of the splice sites in group I intron splicing and has catalytic activity.

internal_ribosome_entry_site [SO_0000243]

A sequence element that recruits a ribosomal subunit to an internal position in the mRNA for translation initiation.

internal_Shine_Dalgarno_sequence [SO_1001260]

A Shine-Dalgarno sequence that stimulates recoding through interactions with the anti-Shine-Dalgarno sequence in the RNA of small ribosomal subunits of translating ribosomes. The signal is only operative in Bacteria.

internal_transcribed_spacer_region [SO_0000639]

Non-coding regions of DNA sequence that separate genes coding for the 28S, 5.8S, and 18S ribosomal RNAs.

internal_UTR [SO_0000241]

A UTR bordered by the terminal and initial codons of two CDSs in a polycistronic transcript. Every UTR is either 5’, 3’ or internal.

intrachromosomal [SO_0001510]

A change in chromosomes that occurs between two sections of the same chromosome or between homologous chromosomes.

intrachromosomal_breakpoint [SO_0001874]

A rearrangement breakpoint within the same chromosome.

intrachromosomal_duplication [SO_1000038]

A duplication that occurred within a chromosome.

intrachromosomal_mutation [SO_1000028]

A chromosomal structure variation within a single chromosome.

intrachromosomal_translocation [SO_0002061]

A translocation where the regions involved are from the same chromosome.

intrachromosomal_transposition [SO_1000041]

A chromosome structure variation whereby a transposition occurred within a chromosome.

intragenic_variant [SO_0002011]

A variant that occurs within a gene but falls outside of all transcript features. This occurs when alternate transcripts of a gene do not share overlapping sequence. Requested by Pablo Cingolani, for use in SnpEff.

intramembrane_polypeptide_region [SO_0001075]

Polypeptide region present in the lipid bilayer.

intrinsically_unstructured_polypeptide_region [SO_0100003]

A region of polypeptide chain with high conformational flexibility.

introgressed_chromosome_region [SO_0000664]

A region of a chromosome that has been introduced by backcrossing with a separate species.

intron [SO_0000188]

A region of a primary transcript that is transcribed, but removed from within the transcript by splicing together the sequences (exons) on either side of it. This term is mapped to MGED. Do not obsolete without consulting MGED ontology.

intron_base_5 [SO_0001994]

Fifth intronic position after the exon-intron boundary, close to the 5’ edge of the intron.

intron_domain [SO_0001014]

An intronic region that has an attribute. Requested by Colin Batchelor, Feb 2007.

intron_gain_variant [SO_0001573]

A sequence variant whereby an intron is gained by the processed transcript; usually a result of an alteration of the donor or acceptor.

intron_variant [SO_0001627]

A transcript variant occurring within an intron. EBI term: Intronic variations - In intron.

intronic_regulatory_region [SO_0001492]

A regulatory region that is part of an intron. Moved from transcription_regulatory_region (SO:0001679) to transcriptional_cis_regulatory_region (SO:0001055) by Dave Sant on Feb 11, 2021 when transcription_regulatory_region was merged into transcriptional_cis_regulatory_region to be consistent with GO and reduce redundancy as part of the GREEKC consortium. See GitHub Issue #527.

intronic_splice_enhancer [SO_0000320]

Sequences within the intron that modulate splice site selection for some introns.

intronic_splicing_silencer [SO_0002056]

An intronic splicing regulatory element that functions to recruit trans-acting splicing factors that suppress the transcription of the gene or genes they control. Requested by Javier Diez Perez.

invalidated [SO_0000790]

An attribute describing a feature that is invalidated.

invalidated_by_chimeric_cDNA [SO_0000362]

A cDNA clone constructed from more than one mRNA. Usually an experimental artifact.

invalidated_by_genomic_contamination [SO_0000414]

An attribute to describe a feature that is invalidated due to genomic contamination.

invalidated_by_genomic_polyA_primed_cDNA [SO_0000415]

An attribute to describe a feature that is invalidated due to polyA priming.

invalidated_by_partial_processing [SO_0000416]

An attribute to describe a feature that is invalidated due to partial processing.

invalidated_cDNA_clone [SO_0000809]

A cDNA clone that is invalid.

inversion [SO_1000036]

A continuous nucleotide sequence is inverted in the same position.

inversion_attribute [SO_0001517]

When a region of a chromosome is changed to the reverse order without duplication or deletion.

inversion_breakpoint [SO_0001022]

The point within a chromosome where an inversion begins or ends.

inversion_cum_translocation [SO_1000148]

A chromosomal translocation whereby the first two breaks are in the same chromosome, and the region between them is rejoined in inverted order to the other side of the first break, such that both sides of break one are present on the same chromosome. The remaining free ends are joined as a translocation with those resulting from the third break.

inversion_derived_aneuploid_chromosome [SO_0000567]

A chromosome generated by recombination between two inversions; presumed to have a deficiency or duplication at each end of the inversion.

inversion_derived_bipartite_deficiency [SO_0000461]

A chromosomal deletion whereby a chromosome is generated by recombination between two inversions and has a deficiency at each end of the inversion.

inversion_derived_bipartite_duplication [SO_0000547]

A chromosome generated by recombination between two inversions; there is a duplication at each end of the inversion.

inversion_derived_deficiency_plus_aneuploid [SO_0000512]

A chromosomal deletion whereby a chromosome is generated by recombination between two inversions and has a deficiency at one end and is presumed to have a deficiency or duplication at the other end of the inversion.

inversion_derived_deficiency_plus_duplication [SO_0000465]

A chromosome deletion whereby a chromosome is generated by recombination between two inversions; there is a deficiency at one end of the inversion and a duplication at the other end of the inversion.

inversion_derived_duplication_plus_aneuploid [SO_0000549]

A chromosome generated by recombination between two inversions; has a duplication at one end and presumed to have a deficiency or duplication at the other end of the inversion.

inversion_site [SO_0000948]

A region specifically recognised by a recombinase, which inverts the region flanked by a pair of sites. A target region for site-specific inversion of a DNA region and which carries binding sites for a site-specific recombinase and accessory proteins as well as the site for specific cleavage by the recombinase.

inversion_site_part [SO_0001048]

A region located within an inversion site. A term created to allow the parts of an inversion site to have an is_a path back to the root.

inverted [SO_0001515]

A quality of an insertion where the insert is in a cytologically inverted orientation.

inverted_insertional_duplication [SO_1000153]

An insertional duplication where a copy of the segment between the first two breaks listed is inserted at the third break; the insertion is in cytologically inverted orientation with respect to its flanking segments.

inverted_interchromosomal_transposition [SO_1000156]

An interchromosomal transposition whereby a copy of the segment between the first two breaks listed is inserted at the third break; the insertion is in cytologically inverted orientation with respect to its flanking segment.

inverted_intrachromosomal_transposition [SO_1000158]

An intrachromosomal transposition whereby the segment between the first two breaks listed is removed and inserted at the third break; the insertion is in cytologically inverted orientation with respect to its flanking segments.

inverted_repeat [SO_0000294]

The sequence is complementarily repeated on the opposite strand. It is a palindrome, and it may or may not be hyphenated (interrupted by a spacer). Examples: GCTGATCAGC, or GCTGA-----TCAGC.
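In DNA terms, the palindrome property means a sequence equals its own reverse complement; for the hyphenated form, the right arm is the reverse complement of the left arm and the spacer is ignored. A minimal Python sketch checking both examples above, assuming uppercase A/C/G/T strings and that the two arms have already been located (the function names are illustrative):

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase A/C/G/T string."""
    return seq.translate(COMPLEMENT)[::-1]


def is_inverted_repeat(left_arm: str, right_arm: str) -> bool:
    """True if right_arm is the reverse complement of left_arm, so that the
    two arms read identically on opposite strands. Any spacer between the
    arms is not passed in and therefore plays no role."""
    return reverse_complement(left_arm) == right_arm


# Uninterrupted example: the whole sequence equals its own reverse complement.
print(reverse_complement("GCTGATCAGC") == "GCTGATCAGC")  # True
# Interrupted example GCTGA-----TCAGC, checked arm by arm.
print(is_inverted_repeat("GCTGA", "TCAGC"))  # True
```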

inverted_ring_chromosome [SO_0000439]

A ring chromosome is a chromosome whose arms have fused together to form a ring in an inverted fashion, often with the loss of the ends of the chromosome.

inverted_tandem_duplication [SO_1000040]

A tandem duplication where the individual regions are not in the same orientation.

IRLinv_site [SO_0001046]

Component of the inversion site located at the left of a region susceptible to site-specific inversion.

iron_repressed_GATA_element [SO_0001851]

A GATA promoter element with consensus sequence WGATAA, found in promoters of genes repressed in the presence of iron. The synonym IDP (GATA) is found in an annotation but has not been traced in the literature.

iron_responsive_element [SO_0001182]

A regulatory sequence found in the 5’ and 3’ UTRs of many mRNAs which encode iron-binding proteins. It has a hairpin structure and is recognized by trans-acting proteins known as iron-regulatory proteins.

IRRinv_site [SO_0001047]

Component of the inversion site located at the right of a region susceptible to site-specific inversion.

isoleucine [SO_0001438]

A non-polar, hydrophobic amino acid encoded by the codons ATH (ATT, ATC and ATA). A placeholder for a cross-product with ChEBI.

isoleucine_tRNA_primary_transcript [SO_0000220]

A primary transcript encoding isoleucyl tRNA (SO:0000263).

isoleucyl_tRNA [SO_0000263]

A tRNA sequence that has an isoleucine anticodon, and a 3’ isoleucine binding region.

isowyosine [SO_0001342]

Isowyosine is a modified guanosine base feature.

ISRE [SO_0001715]

An ISRE is a transcriptional cis regulatory region, containing the consensus region: YAGTTTC(A/T)YTTTYCC, responsible for increased transcription via interferon binding. Term requested via tracker (2981725) by Alan Ruttenberg, April 2010. It has been described as both an enhancer and a promoter, so the parent is the more general term. Moved from is_a SO:0001055 transcriptional_cis_regulatory_region to SO:0000235 TF_binding_site after Colin Logie pointed out that this is a consensus sequence where transcription factors bind, GREEKC Jan 21, 2021.

J_C_cluster [SO_0000511]

Genomic DNA of immunoglobulin/T-cell receptor gene in germline configuration including at least one J-gene and one C-gene.

J_cluster [SO_0000513]

Genomic DNA of immunoglobulin/T-cell receptor gene in germline configuration including more than one J-gene.

J_gene_recombination_feature [SO_0000302]

Recombination signal including J-heptamer, J-spacer and J-nonamer in 5’ of J-region of a J-gene or J-sequence.

J_gene_segment [SO_0000470]

Germline genomic DNA of an immunoglobulin/T-cell receptor gene including J-region with 5’ UTR (SO:0000204) and 3’ UTR (SO:0000205), also designated as J-segment.

J_heptamer [SO_0000515]

A 7-nucleotide recombination site (e.g. CACAGTG), part of a J-gene recombination feature of an immunoglobulin/T-cell receptor gene.

J_nonamer [SO_0000514]

A 9-nucleotide recombination site (e.g. GGTTTTTGT), part of a J-gene recombination feature of an immunoglobulin/T-cell receptor gene.

J_spacer [SO_0000517]

A 12- or 23-nucleotide spacer between the J-nonamer and the J-heptamer of a J-gene recombination feature of an immunoglobulin/T-cell receptor gene.
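Taken together, the J-nonamer, J-spacer and J-heptamer definitions describe a nonamer, a 12- or 23-nucleotide spacer and a heptamer. The sketch below searches for that arrangement and uses the example sequences from these definitions as exact-match defaults. This is an illustrative assumption. Real recombination signals are degenerate, and the definitions do not say which strand the examples are written on, so a practical scan would need mismatch tolerance and both strands. The function name `find_rss` is hypothetical.

```python
import re


def find_rss(seq: str, nonamer: str = "GGTTTTTGT",
             heptamer: str = "CACAGTG") -> list[tuple[int, int]]:
    """Find exact nonamer + (12 or 23 nt spacer) + heptamer arrangements.

    Returns sorted (0-based start of the nonamer, spacer length) tuples.
    Defaults are the example sequences from the J_nonamer and J_heptamer terms.
    """
    seq = seq.upper()
    hits = []
    for spacer in (12, 23):
        # The lookahead keeps overlapping candidates from being skipped.
        pattern = re.compile(f"(?={nonamer}[ACGT]{{{spacer}}}{heptamer})")
        hits.extend((m.start(), spacer) for m in pattern.finditer(seq))
    return sorted(hits)


example = "AA" + "GGTTTTTGT" + "A" * 12 + "CACAGTG" + "TT"
print(find_rss(example))  # [(2, 12)]
```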

junction [SO_0000699]

A sequence_feature with an extent of zero. A junction is a boundary between regions. A boundary has an extent of zero.

KEN_box [SO_0001807]

A conserved polypeptide motif that can be recognized by FZR/Cdh1-activated anaphase-promoting complex/cyclosome (APC/C) and targets a protein for ubiquitination and subsequent degradation by the APC/C. The consensus sequence is KENXXXN.
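In the consensus KENXXXN, X stands for any amino acid. The minimal Python sketch below reports candidate KEN boxes in a one-letter protein sequence. A consensus match is only a candidate and does not by itself show recognition by the APC/C. The names `KEN_BOX` and `find_ken_boxes` are illustrative.

```python
import re

# KENXXXN: lysine, glutamate, asparagine, any three residues, asparagine.
KEN_BOX = re.compile(r"(?=(KEN...N))")


def find_ken_boxes(protein: str) -> list[tuple[int, str]]:
    """Return (1-based position, motif) for each candidate KEN box."""
    return [(m.start() + 1, m.group(1)) for m in KEN_BOX.finditer(protein.upper())]


print(find_ken_boxes("MAKENLLSNQ"))  # [(3, 'KENLLSN')]
```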

kinetoplast_gene [SO_0000089]

A gene located in kinetoplast sequence.

kozak_sequence [SO_0001647]

A kind of ribosome entry site, specific to eukaryotic organisms, that overlaps part of both the 5’ UTR and the CDS.

L_box [SO_0001981]

An orientation-dependent regulatory promoter element, with consensus sequence TTGCACAN4TTGCACA, found in plants.

L1_LINE_retrotransposon [SO_0002272]

Long interspersed element-1 (LINE-1) elements are found in the human genome. A LINE-1 element contains ORF1 (open reading frame 1, including CC, coiled coil; RRM, RNA recognition motif; CTD, carboxyl-terminal domain) and ORF2 (including EN, endonuclease; RT, reverse transcriptase; C, cysteine-rich domain). The L1-encoded proteins (ORF1p and ORF2p) can mobilize nonautonomous retrotransposons, other noncoding RNAs, and messenger RNAs. Added as per GitHub Issue Request #488 (https://github.com/The-Sequence-Ontology/SO-Ontologies/issues/488).

laevosynaptic_chromosome [SO_1000143]

LS is an autosynaptic chromosome carrying the two left (L = levo) telomeres.

lambda_vector [SO_0000754]

The lambda bacteriophage is the vector for the linear lambda clone. The genes involved in the lysogenic pathway are removed from the viral DNA. Up to 25 kb of foreign DNA can then be inserted into the lambda genome.

lariat_intron [SO_0001958]

A kind of intron whereby the excision is driven by lariat formation. Requested by PomBase 3604508.

late_origin_of_replication [SO_0002141]

An origin of replication that initiates late in S phase.

left_handed_peptide_helix [SO_0001115]

A left handed helix is a region of peptide where the coiled conformation turns in an anticlockwise, left handed screw.

lethal_variant [SO_0001773]

A sequence variant where the mutated gene product does not allow for one or more basic functions necessary for survival.

leucine [SO_0001437]

A non-polar, hydrophobic amino acid encoded by the codons CTN (CTT, CTC, CTA and CTG), TTA and TTG. A placeholder for a cross-product with ChEBI.

leucine_tRNA_primary_transcript [SO_0000221]

A primary transcript encoding leucyl tRNA (SO:0000264).

leucoplast_chromosome [SO_0000823]

A chromosome with origin in a leucoplast.

leucoplast_gene [SO_0000095]

A plastid gene from leucoplast sequence.

leucoplast_sequence [SO_0000747]

DNA belonging to the genome of a leucoplast, a colorless plastid generally containing starch or oil.

leucyl_tRNA [SO_0000264]

A tRNA sequence that has a leucine anticodon, and a 3’ leucine binding region.

level_of_transcript_variant [SO_0001540]

A sequence variant which alters the level of a transcript.

ligand_binding_site [SO_0001657]

A binding site that, in the molecule, interacts selectively and non-covalently with a small molecule such as a drug or hormone.

ligation_based_read [SO_0001425]

A read produced by ligation-based sequencing technologies. An example of this kind of read is one produced by ABI SOLiD.

lincRNA [SO_0001463]

Long, intervening non-coding RNA. A transcript whose genomic coordinates do not overlap the start-to-end span of any coding gene or pseudogene on either strand.
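The lincRNA criterion reduces to an interval test. A transcript qualifies only if its span overlaps no coding gene or pseudogene, regardless of strand. The sketch below assumes 1-based, closed coordinates, as in GFF3. The `Feature` class and the function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    chrom: str
    start: int   # 1-based, inclusive
    end: int     # 1-based, inclusive
    strand: str  # "+" or "-"; deliberately ignored by the overlap test


def overlaps(a: Feature, b: Feature) -> bool:
    """True if two closed intervals on the same chromosome share at least one base."""
    return a.chrom == b.chrom and a.start <= b.end and b.start <= a.end


def is_intergenic(transcript: Feature, coding_loci: list[Feature]) -> bool:
    """True if the transcript overlaps no coding gene or pseudogene on either strand."""
    return not any(overlaps(transcript, locus) for locus in coding_loci)


coding_loci = [Feature("chr1", 100, 200, "+"), Feature("chr1", 500, 800, "-")]
print(is_intergenic(Feature("chr1", 250, 400, "+"), coding_loci))  # True
print(is_intergenic(Feature("chr1", 700, 900, "+"), coding_loci))  # False: overlaps a minus-strand gene
```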

lincRNA_gene [SO_0001641]

A gene that encodes a long, intervening non-coding RNA.

LINE_element [SO_0000194]

A dispersed repeat family with many copies, each from 1 to 6 kb long. New elements are generated by retroposition of a transcribed copy. Typically the LINE contains two ORFs, one of which encodes reverse transcriptase, and 3’ and 5’ direct repeats.

LINE1_deletion [SO_0002069]

A deletion of a LINE1 mobile element with respect to a reference.

LINE1_insertion [SO_0002064]

An insertion from the LINE1 family of mobile elements.

linear [SO_0000987]

A quality of a nucleotide polymer that has a 3’-terminal residue and a 5’-terminal residue. Attributes added to describe the different kinds of replicon. SO workshop, September 2006.

linear_double_stranded_DNA_chromosome [SO_0000957]

Structural unit composed of a self-replicating, double-stranded, linear DNA molecule.

linear_double_stranded_RNA_chromosome [SO_0000964]

Structural unit composed of a self-replicating, double-stranded, linear RNA molecule.

linear_single_stranded_DNA_chromosome [SO_0000959]

Structural unit composed of a self-replicating, single-stranded, linear DNA molecule.

linear_single_stranded_RNA_chromosome [SO_0000963]

Structural unit composed of a self-replicating, single-stranded, linear RNA molecule.

linkage_group [SO_0000018]

A group of loci that can be grouped in a linear order representing the different degrees of linkage among the genes concerned.

lipoprotein_signal_peptide [SO_0100009]

A peptide that acts as a signal for both membrane translocation and lipid attachment in prokaryotes.

LNA [SO_0001188]

An attribute describing a sequence consisting of nucleobases attached to a repeating unit made of ’locked’ deoxyribose rings connected to a phosphate backbone. The deoxyribose unit’s conformation is ’locked’ by a 2’-C,4’-C-oxymethylene link. Do not use this term for feature annotation. Use LNA_oligo (SO:0001189) instead.

lnc_RNA [SO_0001877]

A non-coding RNA over 200 nucleotides in length.

lncRNA_gene [SO_0002127]

A gene that encodes a long non-coding RNA. These terms have been requested by Adam Frankish to support Gencode and Vega biotypes.

lncRNA_primary_transcript [SO_0002035]

A primary transcript encoding a lncRNA.

lncRNA_with_retained_intron [SO_0002113]

A lncRNA transcript containing a retained intron. Term added as part of collaboration with Gencode, adding biotypes used in annotation.

long_terminal_repeat [SO_0000286]

A sequence directly repeated at both ends of a defined sequence, of the sort typically found in retroviruses.

loR [SO_0002033]

A short, non-coding transcript of loop-derived sequences encoded in precursor miRNA. MoRs are generated from miR hairpins that are longer and can produce two functional miRs per strand. They are called moRs because they are not located next to the loop and thus their biogenesis process is a little different, but functionally, they are supposed to act like miRs. The same applies to loRs, the loop fragments: they are generated differently from miRs or moRs, but if loaded into the RISC they are supposed to act the same way miRs do. Requested by Thomas Desvignes, Jan 2015.

loss_of_function_variant [SO_0002054]

A sequence variant whereby the gene product has diminished or abolished function.

loss_of_heterozygosity [SO_0001786]

A functional variant whereby the sequence alteration causes a loss of function of one allele of a gene.

low_complexity [SO_0001004]

When a sequence does not contain an equal distribution of all four possible nucleotide bases, or does not contain all four nucleotide bases.

low_complexity_region [SO_0001005]

A region where the DNA does not contain an equal distribution of all four possible nucleotides or does not contain all four nucleotides.
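One simple way to quantify how unevenly a sequence uses the four nucleotides is the Shannon entropy of its base composition. A check for bases that never occur complements it. This heuristic is chosen here for illustration only. SO does not define a numeric cutoff, and dedicated masking tools use their own scoring schemes.

```python
import math
from collections import Counter


def composition_entropy(seq: str) -> float:
    """Shannon entropy (bits) of A/C/G/T composition; 2.0 means all four are equally frequent."""
    counts = Counter(base for base in seq.upper() if base in "ACGT")
    total = sum(counts.values())
    if total == 0:
        return 0.0
    # Sum of p * log2(1/p) over the observed bases.
    return sum((n / total) * math.log2(total / n) for n in counts.values())


def missing_bases(seq: str) -> set[str]:
    """Return the nucleotides among A, C, G, T that never occur in seq."""
    return set("ACGT") - set(seq.upper())


print(composition_entropy("ACGTACGT"))        # 2.0
print(composition_entropy("ATATATAT"))        # 1.0
print(sorted(missing_bases("ATATATAT")))      # ['C', 'G']
```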

loxP_site [SO_0000346]

Cre recombination target sequence.

LTR_component [SO_0000848]

The long terminal repeat found at the ends of the sequence to be inserted into the host genome.

LTR_retrotransposon [SO_0000186]

A retrotransposon flanked by long terminal repeat sequences.

lysine [SO_0001450]

A positively charged, hydrophilic amino acid encoded by the codons AAA and AAG. A placeholder for a cross-product with ChEBI.

lysine_tRNA_primary_transcript [SO_0000222]

A primary transcript encoding lysyl tRNA (SO:0000265).

lysosomal_localization_signal [SO_0001530]

A polypeptide region that targets a polypeptide to the lysosome.

lysyl_tRNA [SO_0000265]

A tRNA sequence that has a lysine anticodon, and a 3’ lysine binding region.

macronuclear_chromosome [SO_0000824]

A chromosome originating in a macronucleus.

macronuclear_sequence [SO_0000083]

DNA belonging to the macronuclei of ciliates.

macronucleus_destined_segment [SO_0000672]

A sequence that is conserved, although rearranged relative to the micronucleus, in the macronucleus of a ciliate genome.

major_TSS [SO_0001238]

The transcription start site that is most frequently used for transcription of a gene.
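Given per-position TSS read counts, for example from a CAGE experiment, the split between this term and minor_TSS can be sketched as follows. The input format is an assumption. The definitions also do not say how to handle ties, and this sketch labels every tied top position as major. The function name `classify_tss` is hypothetical.

```python
def classify_tss(usage: dict[int, int]) -> dict[int, str]:
    """Label each TSS position 'major' (highest read count) or 'minor'.

    Positions tied for the highest count are all labeled 'major' in this sketch.
    """
    if not usage:
        return {}
    top = max(usage.values())
    return {pos: ("major" if n == top else "minor") for pos, n in usage.items()}


print(classify_tss({1000: 5, 1012: 42, 1030: 3}))  # {1000: 'minor', 1012: 'major', 1030: 'minor'}
```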

mannosyl_queuosine [SO_0001320]

Mannosyl_queuosine is a modified 7-deazaguanosine.

Mat2P [SO_0002157]

A gene cassette array containing H+ mating type specific information.

Mat3M [SO_0002158]

A gene cassette array containing H- mating type specific information.

match [SO_0000343]

A region of sequence, aligned to another sequence with some statistical significance, using an algorithm such as BLAST or SIM4.

match_part [SO_0000039]

A part of a match; for example, a high-scoring segment pair (HSP) from BLAST is a match_part.

maternal_uniparental_disomy [SO_0001745]

Uniparental disomy is a sequence_alteration where a diploid individual receives two copies of all or part of a chromosome from the mother and no copies of the same chromosome or region from the father.

maternal_variant [SO_0001775]

A variant in the genetic material inherited from the mother.

maternally_imprinted [SO_0000135]

The maternal copy of the gene is modified, rendering it transcriptionally silent.

maternally_imprinted_gene [SO_0000888]

A gene that is maternally_imprinted.

mathematically_defined_repeat [SO_0001642]

A mathematically defined repeat (MDR) is an experimental feature that is determined by querying overlapping oligomers of length k against a database of shotgun sequence data and identifying regions in the query sequence that exceed a statistically determined threshold of repetitiveness. Mathematically defined repeat regions are determined without regard to the biological origin of the repetitive region. The repeat units of an MDR are the overlapping oligomers of size k that were used for the query. Tools that can annotate mathematically defined repeats include Tallymer (Kurtz et al. 2008, BMC Genomics: 517) and RePS (Wang et al., Genome Res 12(5): 824-831).
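The MDR procedure can be sketched in a few lines. First count overlapping k-mers across the shotgun reads, then flag query positions whose k-mer exceeds a threshold. This is a toy version under stated assumptions. A Python `Counter` stands in for the indexed data structures that dedicated tools use. `threshold` is a fixed number supplied by the caller rather than a statistically derived one. Both function names are hypothetical.

```python
from collections import Counter


def kmer_counts(reads: list[str], k: int) -> Counter:
    """Count every overlapping k-mer across a collection of shotgun reads."""
    counts = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def repetitive_positions(query: str, counts: Counter, k: int, threshold: int) -> list[int]:
    """0-based start positions of query k-mers whose database count exceeds threshold."""
    query = query.upper()
    # Counter returns 0 for k-mers never seen in the reads.
    return [i for i in range(len(query) - k + 1) if counts[query[i:i + k]] > threshold]


counts = kmer_counts(["ACGTACGT", "ACGTTTTT"], k=4)
print(counts["ACGT"])                                             # 3
print(repetitive_positions("GGACGTGG", counts, k=4, threshold=2))  # [2]
```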

mating_type_M_box [SO_0001852]

A promoter element with consensus sequence ACAAT, found in promoters of mating type M-specific genes in fission yeast and bound by the transcription factor Mat1-Mc. Note that this should not be confused with the M-box that has consensus sequence CATGTG and is bound by bHLH transcription factors such as MITF.

mating_type_region [SO_0001789]

A specialized region in the genomes of some yeast and fungi, the genes of which regulate mating type.

mating_type_region_motif [SO_0001999]

DNA motif that is a component of a mating type region.

mating_type_region_replication_fork_barrier [SO_0002021]

A DNA motif that is found in eukaryotic rDNA repeats, and is a site of replication fork pausing. Requested by Midori Harris.

matrix_attachment_site [SO_0000036]

A DNA region that is required for the binding of chromatin to the nuclear matrix.

mature_miRNA_variant [SO_0001620]

A transcript variant located within the sequence of the mature miRNA. EBI term: Within mature miRNA - Located within a microRNA.

mature_protein_region [SO_0000419]

The polypeptide sequence that remains after the cleaved peptide regions have been removed from the immature peptide. The term mature peptide was merged with the BioSapiens term mature protein region, which was taken as the new name. Old def: The coding sequence for the mature or final peptide or protein product following post-translational modification.

mature_protein_region_of_CDS [SO_0002249]

A CDS region corresponding to a mature protein region of a polypeptide. Added as per request from GitHub Issue #484 (https://github.com/The-Sequence-Ontology/SO-Ontologies/issues/484)

mature_transcript_region [SO_0000834]

A region of a mature transcript. A manufactured term to collect together the parts of a mature transcript and give them an is_a path to the root.

maxicircle [SO_0000742]

A maxicircle is a replicon, part of a kinetoplast, that contains open reading frames and replicates via a rolling circle method.

maxicircle_gene [SO_0000654]

A mitochondrial gene located in a maxicircle.

MCB [SO_0001855]

A promoter element with consensus sequence ACGCGT, bound by the transcription factor complex MBF (MCB-binding factor) and found in promoters of genes expressed during the G1/S transition of the cell cycle.

meiotic_recombination_region [SO_0002155]

A genomic region in which there is an exchange of genetic material as a result of the repair of meiosis-specific double strand breaks that occur during meiotic prophase.

member_of_regulon [SO_1001217]

A gene that is a member of a group of genes that are either regulated or transcribed together.

membrane_peptide_loop [SO_0001076]

Polypeptide region localized within the lipid bilayer where both ends traverse the same membrane.

membrane_structure [SO_0001071]

Arrangement of the polypeptide with respect to the lipid bilayer. Range.

metabolic_island [SO_0000774]

A transmissible element containing genes involved in metabolism, analogous to the pathogenicity islands of Gram-negative bacteria. Genes for phenolic compound degradation in Pseudomonas putida are found on metabolic islands.

metal_binding_site [SO_0001656]

A binding site that, in the molecule, interacts selectively and non-covalently with metal ions. See GO:0046872 : metal ion binding.

methionine [SO_0001442]

A non-polar, hydrophobic amino acid encoded by the codon ATG. A placeholder for a cross-product with ChEBI.

methionine_tRNA_primary_transcript [SO_0000223]

A primary transcript encoding methionyl tRNA (SO:0000266).

methionyl_tRNA [SO_0000266]

A tRNA sequence that has a methionine anticodon, and a 3’ methionine binding region.

methylated_adenine [SO_0000161]

A modified base in which adenine has been methylated.

methylated_cytosine [SO_0000114]

A methylated deoxy-cytosine.

methylated_DNA_base_feature [SO_0000306]

A nucleotide modified by methylation.

methylation_guide_snoRNA [SO_0005841]

A snoRNA that specifies the site of 2’-O-ribose methylation in an RNA molecule by base pairing with a short sequence around the target residue. Has RNA 2’-O-ribose methylation guide activity (GO:0030561).

methylation_guide_snoRNA_primary_transcript [SO_0000580]

A primary transcript encoding a methylation guide small nucleolar RNA.

methylinosine [SO_0001233]

A modified RNA base in which methylhypoxanthine is bound to the ribose ring.

methylwyosine [SO_0001337]

Methylwyosine is a modified guanosine base feature.

microarray_oligo [SO_0000328]

A DNA sequence used experimentally to detect the presence or absence of a complementary nucleic acid.

micronuclear_chromosome [SO_0000825]

A chromosome originating in a micronucleus.

micronuclear_sequence [SO_0000084]

DNA belonging to the micronuclei of a cell.

mini_exon_donor_RNA [SO_0000635]

A primary transcript that donates the spliced leader to other mRNA.

mini_gene [SO_0000815]

By definition, minigenes are short open reading frames (ORFs), usually encoding approximately 9 to 20 amino acids, which are expressed in vivo (as distinct from being synthesized as peptide or protein ex vivo and subsequently injected). The in vivo synthesis confers a distinct advantage: the expressed sequences can enter both antigen presentation pathways, MHC I (inducing CD8+ T-cells, which are usually cytotoxic T-lymphocytes (CTL)) and MHC II (inducing CD4+ T-cells, usually ‘T-helpers’ (Th)), and can encounter B-cells, inducing antibody responses. Three main vector approaches have been used to deliver minigenes: viral vectors, bacterial vectors and plasmid DNA.

minicircle [SO_0000980]

A minicircle is a replicon, part of a kinetoplast, that encodes guide RNAs.

minicircle_gene [SO_0000975]

A gene found within a minicircle.

minor_TSS [SO_0001239]

A transcription start site that is not the most frequently used for transcription of a gene.

minus_1_frameshift [SO_0000866]

A frameshift caused by deleting one base.

minus_1_frameshift_variant [SO_0001592]

A sequence variant which causes a disruption of the translational reading frame, by shifting one base ahead.
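Deleting a single base changes how every downstream base is grouped into codons. Deleting three bases removes one codon and leaves the downstream frame intact. The small Python sketch below shows the regrouping. The helper names `codons` and `delete_bases` are hypothetical.

```python
def codons(seq: str) -> list[str]:
    """Split a coding sequence into complete codons, dropping any trailing partial codon."""
    return [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]


def delete_bases(seq: str, pos: int, n: int) -> str:
    """Delete n bases starting at 0-based position pos."""
    return seq[:pos] + seq[pos + n:]


cds = "ATGGCCAAGTTT"
print(codons(cds))                       # ['ATG', 'GCC', 'AAG', 'TTT']
print(codons(delete_bases(cds, 3, 1)))   # ['ATG', 'CCA', 'AGT']  frame shifted
print(codons(delete_bases(cds, 3, 3)))   # ['ATG', 'AAG', 'TTT']  frame preserved
```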

minus_1_translationally_frameshifted [SO_1001262]

An attribute describing a translational frameshift of -1.

minus_12_signal [SO_0001673]

A conserved region about 12-bp upstream of the start point of bacterial transcription units, involved with sigma factor 54.

minus_2_frameshift [SO_0000867]

A frameshift caused by deleting two bases.

minus_2_frameshift_variant [SO_0001593]

A sequence variant which causes a disruption of the translational reading frame, by shifting two bases forward.

minus_24_signal [SO_0001674]

A conserved region about 24-bp upstream of the start point of bacterial transcription units, involved with sigma factor 54.

minus_35_signal [SO_0000176]

A conserved hexamer about 35-bp upstream of the start point of bacterial transcription units; consensus=TTGACa or TGTTGACA. This region is associated with sigma factor 70. Changed from is_a SO:0000713 DNA_motif to is_a SO:0002312 core_prokaryotic_promoter_element in response to GREEKC Initiative Dave Sant Aug 2020. Changed from is_a SO:0002312 core_prokaryotic_promoter_element back to is_a SO:0000713 DNA_motif to be consistent with minus_12_signal and minus_24_signal on 12 July 2021.

miR_encoding_lncRNA_primary_transcript [SO_0002036]

A lncRNA primary transcript that also encodes pre-miR sequence that is processed to form functionally active miRNA.

miR_encoding_shRNA_primary_transcript [SO_0002039]

A shRNA primary transcript that also encodes pre-miR sequence that is processed to form functionally active miRNA.

miR_encoding_snoRNA_primary_transcript [SO_0002034]

A snoRNA primary transcript that also encodes pre-miR sequence that is processed to form functionally active miRNA.

miR_encoding_tRNA_primary_transcript [SO_0002037]

A tRNA primary transcript that also encodes pre-miR sequence that is processed to form functionally active miRNA.

miR_encoding_vaultRNA_primary_transcript [SO_0002041]

A vaultRNA primary transcript that also encodes pre-miR sequence that is processed to form functionally active miRNA.

miR_encoding_Y_RNA_primary_transcript [SO_0002043]

A Y-RNA primary transcript that also encodes pre-miR sequence that is processed to form functionally active miRNA.

miRNA_antiguide [SO_0001473]

A region of the pri-miRNA that base pairs with the guide to form the hairpin.

miRNA_encoding [SO_0000571]

A region that can be transcribed into a microRNA (miRNA).

miRNA_gene [SO_0001265]

A small noncoding RNA of approximately 22 nucleotides in length which may be involved in regulation of gene expression. Moved from ncRNA_gene to sncRNA_gene 27 April 2021 to be more consistent with the organization of the ncRNA branch of SO. Requested by FlyBase, moved by Dave Sant. See GitHub Issue #514.

miRNA_loop [SO_0001246]

The loop of the hairpin loop formed by folding of the pre-miRNA.

miRNA_primary_transcript [SO_0000647]

A primary transcript encoding a micro RNA.

miRNA_primary_transcript_region [SO_0001243]

A part of an miRNA primary_transcript.

miRNA_stem [SO_0001245]

The stem of the hairpin loop formed by folding of the pre-miRNA.

miRNA_target_site [SO_0000934]

A miRNA target site is a binding site where the molecule is a micro RNA.

miRtron [SO_0001034]

A de-branched intron which mimics the structure of pre-miRNA and enters the miRNA processing pathway without Drosha mediated cleavage. Ruby et al. Nature 448:83 describe a new class of miRNAs that are derived from de-branched introns.

missense_variant [SO_0001583]

A sequence variant that changes one or more bases, resulting in a different amino acid sequence but where the length is preserved. EBI term: Non-synonymous SNPs. SNPs that are located in the coding sequence and result in an amino acid change in the encoded peptide sequence. A change that causes a non_synonymous_codon can span more than three bases, for example a four-base substitution.
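To decide whether a length-preserving change is missense, translate the reference and alternate coding sequences and compare the amino acids. The sketch below assumes the standard genetic code (NCBI translation table 1) and an in-frame CDS string, and returns SO term names. It ignores splicing, alternative genetic codes and trailing partial codons. The function names are hypothetical.

```python
BASES = "TCAG"
# Standard genetic code (NCBI translation table 1); "*" marks stop codons.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}


def translate(cds: str) -> str:
    """Translate the complete codons of an in-frame DNA coding sequence."""
    cds = cds.upper()
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds) - 2, 3))


def classify_substitution(ref_cds: str, alt_cds: str) -> str:
    """Classify a length-preserving change by its effect on the encoded peptide."""
    if len(ref_cds) != len(alt_cds):
        raise ValueError("only length-preserving substitutions are handled here")
    changed = [(r, a) for r, a in zip(translate(ref_cds), translate(alt_cds)) if r != a]
    if not changed:
        return "synonymous_variant"
    if any(a == "*" for _, a in changed):
        return "stop_gained"
    if any(r == "*" for r, _ in changed):
        return "stop_lost"
    return "missense_variant"


print(classify_substitution("ATGGCCAAG", "ATGTTTCAG"))  # missense_variant (a four-base change)
```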

MITE [SO_0000338]

A highly repetitive and short (100-500 base pair) transposable element with terminal inverted repeats (TIR) and target site duplication (TSD). MITEs do not encode proteins.

mitochondrial_chromosome [SO_0000819]

A chromosome originating in a mitochondrion.

mitochondrial_contig [SO_0001921]

A contig of mitochondria derived sequences. Requested by Bayer Cropscience, October, 2012.

mitochondrial_DNA [SO_0001032]

DNA belonging to the genome of a mitochondrion. This term is used by MO.

mitochondrial_DNA_read [SO_0001929]

A sequencer read of a mitochondrial DNA sample. Requested by Bayer Cropscience, October, 2012.

mitochondrial_sequence [SO_0000737]

DNA belonging to the genome of a mitochondrion. This term is mapped to MGED. Do not obsolete without consulting MGED ontology.

mitochondrial_supercontig [SO_0001922]

A scaffold composed of mitochondrial contigs.

mitochondrial_targeting_signal [SO_0001808]

A polypeptide region that targets a polypeptide to the mitochondrion.

mitotic_recombination_region [SO_0002154]

A genomic region where there is an exchange of genetic material with another genomic region, occurring in somatic cells.

MNP [SO_0001013]

A multiple nucleotide polymorphism with alleles of common length > 1, for example AAA/TTT.

MNV [SO_0002007]

An MNV is a multiple nucleotide variant (substitution) in which the inserted sequence is the same length as the replaced sequence.
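Given REF and ALT allele strings, as in a VCF record (an assumption about the input), the SNV/MNV distinction comes down to comparing lengths after removing shared flanking bases. The sketch below trims the shared prefix first and then the shared suffix, which is one choice among several. The returned labels are SO term names, and the function names are hypothetical.

```python
def trim_shared(ref: str, alt: str) -> tuple[str, str]:
    """Strip bases shared at the start, then at the end, of both alleles."""
    while ref and alt and ref[0] == alt[0]:
        ref, alt = ref[1:], alt[1:]
    while ref and alt and ref[-1] == alt[-1]:
        ref, alt = ref[:-1], alt[:-1]
    return ref, alt


def classify_alleles(ref: str, alt: str) -> str:
    """Label a REF/ALT pair with an SO sequence_alteration term name."""
    ref, alt = trim_shared(ref.upper(), alt.upper())
    if not ref and not alt:
        raise ValueError("REF and ALT are identical")
    if len(ref) == len(alt):
        # Same length after trimming: a single- or multiple-nucleotide substitution.
        return "SNV" if len(ref) == 1 else "MNV"
    if not ref:
        return "insertion"
    if not alt:
        return "deletion"
    return "delins"


print(classify_alleles("AAA", "TTT"))  # MNV
print(classify_alleles("AAA", "ATA"))  # SNV, once the shared flanking A bases are trimmed
```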

mobile_element_deletion [SO_0002066]

A deletion of a mobile element when comparing a reference sequence (has mobile element) to an individual sequence (does not have mobile element).

mobile_element_insertion [SO_0001837]

A kind of insertion where the inserted sequence is a mobile element. Requested by the EBI.

mobile_genetic_element [SO_0001037]

A nucleotide region with either intra-genome or intracellular mobility, of varying length, which often carries the information necessary for transfer and recombination with the host genome.

mobile_intron [SO_0000666]

An intron (mitochondrial, chloroplast, nuclear or prokaryotic) that encodes a double strand sequence specific endonuclease allowing for mobility.

modified_adenine [SO_0001962]

A modified adenine DNA base feature.

modified_adenosine [SO_0001273]

A modified adenosine is an adenosine base feature that has been altered.

modified_amino_acid_feature [SO_0001385]

A post translationally modified amino acid feature.

modified_cytidine [SO_0001275]

A modified cytidine is a cytidine base feature which has been altered.

modified_cytosine [SO_0001963]

A modified cytosine DNA base feature.

modified_DNA_base [SO_0000305]

A modified nucleotide, i.e. a nucleotide other than A, T, C or G. Modified base:<modified_base>.

modified_glycine [SO_0001386]

A post translationally modified glycine amino acid feature.

modified_guanine [SO_0001964]

A modified guanine DNA base feature.

modified_guanosine [SO_0001276]

A guanosine base that has been modified.

modified_inosine [SO_0001274]

A modified inosine is an inosine base feature that has been altered.

modified_L_alanine [SO_0001387]

A post translationally modified alanine amino acid feature.

modified_L_arginine [SO_0001406]

A post translationally modified arginine amino acid feature.

modified_L_asparagine [SO_0001388]

A post translationally modified asparagine amino acid feature.

modified_L_aspartic_acid [SO_0001389]

A post translationally modified aspartic acid amino acid feature.

modified_L_cysteine [SO_0001390]

A post translationally modified cysteine amino acid feature.

modified_L_glutamic_acid [SO_0001391]

A post translationally modified glutamic acid amino acid feature.

modified_L_glutamine [SO_0001394]

A post translationally modified glutamine amino acid feature.

modified_L_histidine [SO_0001398]

A post translationally modified histidine amino acid feature.

modified_L_isoleucine [SO_0001396]

A post translationally modified isoleucine amino acid feature.

modified_L_leucine [SO_0001401]

A post translationally modified leucine amino acid feature.

modified_L_lysine [SO_0001400]

A post translationally modified lysine amino acid feature.

modified_L_methionine [SO_0001395]

A post translationally modified methionine amino acid feature.

modified_L_phenylalanine [SO_0001397]

A post translationally modified phenylalanine amino acid feature.

modified_L_proline [SO_0001404]

A post translationally modified proline amino acid feature.

modified_L_selenocysteine [SO_0001402]

A post translationally modified selenocysteine amino acid feature.

modified_L_serine [SO_0001399]

A post translationally modified serine amino acid feature.

modified_L_threonine [SO_0001392]

A post translationally modified threonine amino acid feature.

modified_L_tryptophan [SO_0001393]

A post translationally modified tryptophan amino acid feature.

modified_L_tyrosine [SO_0001405]

A post translationally modified tyrosine amino acid feature.

modified_L_valine [SO_0001403]

A post translationally modified valine amino acid feature.

modified_RNA_base_feature [SO_0000250]

A post-transcriptionally modified base.

modified_uridine [SO_0001277]

A uridine base that has been modified.

molecular_contact_region [SO_0100002]

A region that is involved in a contact with another molecule. Range.

monocistronic [SO_0000878]

An attribute describing a sequence that contains the code for one gene product.

monocistronic_mRNA [SO_0000633]

An mRNA with either a single protein product, or for which the regions encoding all its protein products overlap.

monocistronic_primary_transcript [SO_0000632]

A primary transcript encoding one gene product.

monocistronic_transcript [SO_0000665]

A transcript that is monocistronic.

monomeric_repeat [SO_0001934]

A repeat_region containing repeat_units of 1 bp that is repeated multiple times in tandem.
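Runs of a single repeated base (homopolymers) can be found with a regular expression that uses a backreference. The minimum run length of 5 is an arbitrary assumption and is not part of the definition. The function name `homopolymer_runs` is hypothetical.

```python
import re


def homopolymer_runs(seq: str, min_length: int = 5) -> list[tuple[int, int, str]]:
    """Return (0-based start, exclusive end, base) for single-base runs of at least min_length."""
    if min_length < 2:
        raise ValueError("a tandem repeat needs at least two copies of the unit")
    # ([ACGT]) captures one base; \1{n,} requires it to repeat at least n more times in tandem.
    pattern = re.compile(r"([ACGT])\1{%d,}" % (min_length - 1))
    return [(m.start(), m.end(), m.group(1)) for m in pattern.finditer(seq.upper())]


print(homopolymer_runs("ACGAAAAAATTC"))  # [(3, 9, 'A')]
```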

moR [SO_0002032]

A non-coding transcript encoded by sequences adjacent to the ends of the 5’ and 3’ miR-encoding sequences that abut the loop in precursor miRNA. MoRs are generated from miR hairpins that are longer and can produce two functional miRs per strand. They are called moRs because they are not located next to the loop and thus their biogenesis process is a little different, but functionally, they are supposed to act like miRs. The same applies to loRs, the loop fragments: they are generated differently from miRs or moRs, but if loaded into the RISC they are supposed to act the same way miRs do. Requested by Thomas Desvignes, Jan 2015.

morpholino_backbone [SO_0001183]

An attribute describing a sequence composed of nucleobases bound to a morpholino backbone. A morpholino backbone consists of morpholine (CHEBI:34856) rings connected by phosphorodiamidate linkages. Do not use this for feature annotation. Use morpholino_oligo (SO:0000034) instead.

morpholino_oligo [SO_0000034]

Morpholino oligos are synthesized from four different Morpholino subunits, each of which contains one of the four genetic bases (A, C, G, T) linked to a 6-membered morpholine ring. Eighteen to 25 subunits of these four subunit types are joined in a specific order by non-ionic phosphorodiamidate intersubunit linkages to give a Morpholino.

mRNA_attribute [SO_0000863]

An attribute describing an mRNA feature.

mRNA_contig [SO_0001829]

A contig composed of mRNA_reads. Requested by Bayer Cropscience June, 2011.

mRNA_read [SO_0001827]

A sequencer read of an mRNA substrate. Requested by Bayer Cropscience June, 2011.

mRNA_recoded_by_codon_redefinition [SO_1001265]

A recoded_mRNA that was modified by an alteration of codon meaning.

mRNA_recoded_by_translational_bypass [SO_1001264]

A recoded_mRNA where translation was suspended at a particular codon and resumed at a particular non-overlapping downstream codon.

mRNA_region [SO_0000836]

A region of an mRNA. This term was added to provide a grouping term for the region parts of mRNA, thus giving them an is_a path back to the root.

mRNA_with_frameshift [SO_0000108]

An mRNA with a frameshift.

mRNA_with_minus_1_frameshift [SO_0000282]

An mRNA with a minus 1 frameshift.

mRNA_with_minus_2_frameshift [SO_0000335]

An mRNA with a minus 2 frameshift.

mRNA_with_plus_1_frameshift [SO_0000321]

An mRNA with a plus 1 frameshift.

mRNA_with_plus_2_frameshift [SO_0000329]

An mRNA with a plus 2 frameshift.

mt_rRNA [SO_0002128]

Mitochondrial rRNA is an RNA component of the small or large subunits of mitochondrial ribosomes. Updated definition to be consistent with format of other rRNA definitions on 10 June 2021. Requested by EBI. See GitHub Issue #493.

mt_tRNA [SO_0002129]

Mitochondrial transfer RNA.

MTE [SO_0001162]

A sequence element characteristic of some RNA polymerase II promoters, usually located between +20 and +30 relative to the TSS. Consensus sequence is CSARCSSAACGS. Tends to co-occur with INR motif (SO:0000014). Tends to not occur with DPE motif (SO:0000015) or DMv5 (SO:0001159).

multiplexing_sequence_identifier [SO_0002023]

A nucleic acid tag, added in a ligation step of the library preparation process, that allows samples to be pooled into a multiplexed library while maintaining the ability to identify the individual source material.
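Demultiplexing reverses the pooling by assigning each read back to its sample through its identifier. The minimal sketch below assumes fixed-length uppercase barcodes at the start of each read and a Hamming-distance tolerance. The barcode table, sample names and the `max_mismatches` parameter are hypothetical, and real pipelines often take the index from a separate index read instead.

```python
from typing import Optional


def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def assign_sample(read: str, barcodes: dict[str, str], max_mismatches: int = 1) -> Optional[str]:
    """Return the sample whose barcode best matches the read start, or None if no unique match."""
    candidates = []
    for barcode, sample in barcodes.items():
        prefix = read[:len(barcode)].upper()
        if len(prefix) < len(barcode):
            continue  # read too short to contain this barcode
        distance = hamming(prefix, barcode)
        if distance <= max_mismatches:
            candidates.append((distance, sample))
    if not candidates:
        return None
    candidates.sort()
    # Refuse to guess when two samples match equally well.
    if len(candidates) > 1 and candidates[0][0] == candidates[1][0]:
        return None
    return candidates[0][1]


barcodes = {"ACGTAC": "sample_1", "TGCATG": "sample_2"}  # hypothetical barcode sheet
print(assign_sample("ACGTTCGGGG", barcodes))  # sample_1 (one mismatch tolerated)
print(assign_sample("GGGGGGAAAA", barcodes))  # None
```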

mutated_variant_site [SO_0001148]

Site which has been experimentally altered. Discrete.

N_region [SO_0001835]

Extra nucleotides inserted between rearranged immunoglobulin segments.

n_terminal_region [SO_0100014]

The amino-terminal positively-charged region of a signal peptide (approx 1-5 aa).

N2_2_prime_O_dimethylguanosine [SO_0001329]

N2_2prime_O_dimethylguanosine is a modified guanosine base feature.

N2_7_2prirme_O_trimethylguanosine [SO_0001343]

N2_7_2prirme_O_trimethylguanosine is a modified guanosine base feature.

N2_7_dimethylguanosine [SO_0001338]

N2_7_dimethylguanosine is a modified guanosine base feature.

N2_methylguanosine [SO_0001325]

N2_methylguanosine is a modified guanosine base feature.

N2_N2_2_prime_O_trimethylguanosine [SO_0001330]

N2_N2_2prime_O_trimethylguanosine is a modified guanosine base feature.

N2_N2_7_trimethylguanosine [SO_0001339]

N2_N2_7_trimethylguanosine is a modified guanosine base feature.

N2_N2_dimethylguanosine [SO_0001328]

N2_N2_dimethylguanosine is a modified guanosine base feature.

N4_2_prime_O_dimethylcytidine [SO_0001291]

N4,2’-O-dimethylcytidine is a modified cytidine.

N4_acetyl_2_prime_O_methylcytidine [SO_0001288]

N4-acetyl-2’-O-methylcytidine is a modified cytidine.

N4_acetylcytidine [SO_0001285]

N4-acetylcytidine is a modified cytidine.

N4_methylcytidine [SO_0001290]

N4-methylcytidine is a modified cytidine.

N4_N4_2_prime_O_trimethylcytidine [SO_0001294]

N4_N4_2_prime_O_trimethylcytidine is a modified cytidine.

N6_2_prime_O_dimethyladenosine [SO_0001312]

N6_2prime_O_dimethyladenosine is a modified adenosine.

N6_acetyladenosine [SO_0001315]

N6-acetyladenosine is a modified adenosine.

N6_cis_hydroxyisopentenyl_adenosine [SO_0001302]

N6-(cis-hydroxyisopentenyl)adenosine is a modified adenosine.

N6_glycinylcarbamoyladenosine [SO_0001304]

N6-glycinylcarbamoyladenosine is a modified adenosine.

N6_hydroxynorvalylcarbamoyladenosine [SO_0001308]

N6-hydroxynorvalylcarbamoyladenosine is a modified adenosine.

N6_isopentenyladenosine [SO_0001300]

N6-isopentenyladenosine is a modified adenosine.

N6_methyl_N6_threonylcarbamoyladenosine [SO_0001307]

N6-methyl-N6-threonylcarbamoyladenosine is a modified adenosine.

N6_methyladenine [SO_0001920]

An adenine methylated at the 6 nitrogen.

N6_methyladenosine [SO_0001297]

N6-methyladenosine is a modified adenosine.

N6_N6_2_prime_O_trimethyladenosine [SO_0001313]

N6,N6,2’-O-trimethyladenosine is a modified adenosine.

N6_N6_dimethyladenosine [SO_0001311]

N6,N6-dimethyladenosine is a modified adenosine.

N6_threonylcarbamoyladenosine [SO_0001305]

N6-threonylcarbamoyladenosine is a modified adenosine.

natural [SO_0000782]

An attribute describing a feature that occurs in nature.

natural_plasmid [SO_0001476]

A plasmid that occurs naturally.

natural_transposable_element [SO_0000797]

A transposable element (TE) that exists (or existed) in nature.

natural_variant_site [SO_0001147]

Describes the natural sequence variants due to polymorphisms, disease-associated mutations, RNA editing and variations between strains, isolates or cultivars. Discrete.

nc_conserved_region [SO_0000334]

Non-coding region of sequence similarity by descent from a common ancestor.

nc_primary_transcript [SO_0000483]

A primary transcript that is never translated into a protein.

ncRNA_gene [SO_0001263]

A gene that encodes a non-coding RNA.

NDM2_motif [SO_0001167]

A non-directional promoter motif with consensus sequence CGMYGYCR.
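Consensus strings such as CGMYGYCR use IUPAC nucleotide ambiguity codes (here M = A/C, Y = C/T, R = A/G). As a minimal sketch, assuming plain DNA input and scanning only the strand given, a consensus can be compiled into a regular expression and searched. The helper names `consensus_to_regex` and `find_motif` are hypothetical and not part of any SO tooling; because NDM motifs are non-directional, a real scan would also search the reverse complement, which is omitted here.

```python
import re

# IUPAC nucleotide ambiguity codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def consensus_to_regex(consensus: str) -> re.Pattern:
    """Compile an IUPAC consensus string (e.g. CGMYGYCR) into a regex."""
    return re.compile("".join(IUPAC[base] for base in consensus.upper()))


def find_motif(seq: str, consensus: str) -> list[int]:
    """Return 0-based start positions of all matches on the given strand only.

    The pattern is wrapped in a zero-width lookahead so that overlapping
    hits are reported as well.
    """
    pattern = consensus_to_regex(consensus)
    lookahead = re.compile("(?=" + pattern.pattern + ")")
    return [m.start() for m in lookahead.finditer(seq.upper())]


print(find_motif("TTCGACGTCATT", "CGMYGYCR"))  # [2]
```

The same helper works for the other consensus strings in this section, for example GNAACR (PCB) compiles to `G[ACGT]AAC[AG]`.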

NDM3_motif [SO_0001168]

A non-directional promoter motif with consensus sequence GAAAGCT.

negative_sense_ssRNA_viral_sequence [SO_0001200]

A ss_RNA_viral_sequence from a single-stranded RNA virus whose genome is complementary to mRNA and must be converted to positive-sense RNA by an RNA polymerase before translation.

negatively_autoregulated [SO_0000473]

The gene product is involved in its own transcriptional regulation where it decreases transcription.

negatively_autoregulated_gene [SO_0000891]

A gene that is negatively autoregulated.

nested_repeat [SO_0001649]

A repeat that is disrupted by the insertion of another element.

nested_tandem_repeat [SO_0001658]

A nested tandem repeat (NTR) is a nested repeat of two distinct tandem motifs interspersed with each other. Tracker ID: 3052459.

nested_transposon [SO_0001648]

A transposon that is disrupted by the insertion of another element.

NMD_polymorphic_pseudogene_transcript [SO_0002118]

A polymorphic pseudogene transcript that contains a CDS but has one or more splice junctions more than 50 bp downstream of the stop codon. The premature stop codon is not introduced, directly or indirectly, as a result of the variation, i.e. it must be present in both the protein_coding and pseudogenic alleles. Term added as part of collaboration with Gencode, adding biotypes used in annotation.

NMD_transcript [SO_0002114]

A protein coding transcript that contains a CDS but has one or more splice junctions more than 50 bp downstream of the stop codon, making it susceptible to nonsense-mediated decay. Term added as part of collaboration with Gencode, adding biotypes used in annotation.
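The 50 bp criterion can be expressed as a simple coordinate check. This sketch assumes spliced transcript coordinates, with each exon-exon junction recorded as the last base of the upstream exon and the stop codon represented by its last base; these conventions and the function name `is_nmd_candidate` are illustrative assumptions, not part of the SO definition.

```python
def is_nmd_candidate(stop_end: int, junctions: list[int], threshold: int = 50) -> bool:
    """Return True if any splice junction lies more than `threshold` bases
    downstream of the stop codon.

    stop_end:  transcript coordinate of the last base of the stop codon.
    junctions: transcript coordinates of exon-exon junctions, each given as
               the last base of the upstream exon (assumed convention).
    """
    # Junctions upstream of the stop codon give a negative distance and never count.
    return any(junction - stop_end > threshold for junction in junctions)


print(is_nmd_candidate(300, [120, 400]))  # True: a junction lies 100 bases downstream
print(is_nmd_candidate(300, [120, 340]))  # False: the downstream junction is only 40 bases away
```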

NMD_transcript_variant [SO_0001621]

A variant in a transcript that is the target of nonsense-mediated mRNA decay.

no_output [SO_0100010]

An experimental region where an analysis has been run and has not produced any annotation.

no_sequence_alteration [SO_0002073]

A position or feature within a sequence that is identical to the comparable position or feature of a specified reference sequence. This term was requested by the ClinVar data model group for use in the allele registry and related resources. That is, a sequence at a defined location that matches the reference assembly.

non_adjacent_residues [SO_0001083]

Indicates that two consecutive residues in a fragment sequence are not consecutive in the full-length protein and that there are a number of unsequenced residues between them.

non_allelic_homologous_recombination_region [SO_0002094]

A genomic region at a non-allelic position where exchange of genetic material happens as a result of homologous recombination.

non_AUG_initiated_uORF [SO_0002151]

A uORF beginning with a codon other than AUG.

non_canonical_five_prime_splice_site [SO_0000679]

A 5’ splice site which does not have the sequence “GT”.

non_canonical_start_codon [SO_0000680]

A start codon that is not the usual AUG sequence.

non_canonical_three_prime_splice_site [SO_0000678]

A 3’ splice site that does not have the sequence “AG”.
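The canonical dinucleotides referred to in the two splice-site definitions (GT at the 5’ end of an intron, AG at the 3’ end) can be checked directly. A minimal sketch, assuming the intron is supplied as sense-strand DNA that begins with the first intronic base and ends with the last one; the function name `classify_splice_sites` is hypothetical.

```python
def classify_splice_sites(intron: str) -> dict[str, str]:
    """Label the 5' (donor) and 3' (acceptor) splice sites of one intron."""
    intron = intron.upper()
    if len(intron) < 4:
        raise ValueError("intron too short to hold both splice-site dinucleotides")
    five_prime = "canonical" if intron.startswith("GT") else "non_canonical"
    three_prime = "canonical" if intron.endswith("AG") else "non_canonical"
    return {"five_prime_splice_site": five_prime, "three_prime_splice_site": three_prime}


print(classify_splice_sites("GTAAGTCTTTTCAG"))
# {'five_prime_splice_site': 'canonical', 'three_prime_splice_site': 'canonical'}
print(classify_splice_sites("GCAAGTCTTTTCAG"))
# {'five_prime_splice_site': 'non_canonical', 'three_prime_splice_site': 'canonical'}
```

The second call shows a GC-AG intron: because its 5’ end is not GT, the donor is reported as non-canonical under these definitions.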

non_coding_transcript_exon_variant [SO_0001792]

A sequence variant that changes non-coding exon sequence in a non-coding transcript.

non_coding_transcript_intron_variant [SO_0001970]

A transcript variant occurring within an intron of a non-coding transcript.

non_coding_transcript_splice_region_variant [SO_0002088]

A transcript variant occurring within the splice region (1-3 bases of the exon or 3-8 bases of the intron) of a non-coding transcript.
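The splice-region window (1-3 exonic bases, 3-8 intronic bases) can be written as a small predicate. This sketch assumes each base is described by which side of the exon-intron boundary it lies on and by its 1-based distance from that boundary; this coordinate convention and the name `in_splice_region` are assumptions made for illustration.

```python
def in_splice_region(side: str, offset: int) -> bool:
    """Return True if a base falls in the splice region (1-3 exonic, 3-8 intronic).

    side:   "exon" or "intron", i.e. which side of the exon-intron boundary
            the base lies on.
    offset: 1-based distance from the boundary (1 = the base adjacent to it).
    """
    if offset < 1:
        raise ValueError("offset must be >= 1")
    if side == "exon":
        return 1 <= offset <= 3
    if side == "intron":
        return 3 <= offset <= 8
    raise ValueError(f"unknown side: {side!r}")


print([in_splice_region("intron", k) for k in range(1, 10)])
# [False, False, True, True, True, True, True, True, False]
```

As written, intronic positions 1 and 2 fall outside this window.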

non_coding_transcript_variant [SO_0001619]

A transcript variant of a non-coding RNA gene, that is, a variant located within a gene that does not code for a protein.

non_conservative_amino_acid_substitution [SO_0001608]

A sequence variant of a codon causing the substitution of a non-conservative amino acid for another in the resulting polypeptide.

non_conservative_missense_variant [SO_0001586]

A sequence variant whereby at least one base of a codon is changed resulting in a codon that encodes for an amino acid with different biochemical properties.

non_cytoplasmic_polypeptide_region [SO_0001074]

Polypeptide region that is localized outside of a lipid bilayer and outside of the cytoplasm. This could be inside an organelle within the cell.

non_LTR_retrotransposon [SO_0000189]

A retrotransposon without long terminal repeat sequences.

non_LTR_retrotransposon_polymeric_tract [SO_0000433]

A polymeric tract, such as poly(dA), within a non_LTR_retrotransposon.

non_processed_pseudogene [SO_0001760]

A pseudogene that arose by means other than retrotransposition. A pseudogene created via genomic duplication of a functional protein-coding parent gene followed by accumulation of deleterious mutations.

non_protein_coding [SO_0000011]

A gene which can be transcribed, but will not be translated into a protein.

non_synonymous [SO_0001816]

A variant that leads to the change of an amino acid within the protein.

non_terminal_residue [SO_0001084]

The residue at an extremity of the sequence is not the terminal residue. Discrete.

non_transcribed_region [SO_0000183]

A region of the gene which is not transcribed.

nonamer_of_recombination_feature_of_vertebrate_immune_system_gene [SO_0000562]

Nine-nucleotide recombination site, part of V-gene, D-gene or J-gene recombination feature of an immunoglobulin or T-cell receptor gene.

noncoding_exon [SO_0000198]

An exon that does not contain any codons.

noncoding_region_of_exon [SO_0001214]

The maximal intersection of exon and UTR. An exon either containing but not starting with a start codon or containing but not ending with a stop codon will be partially coding and partially non-coding.

noncontiguous_finished [SO_0001490]

The status of a whole genome sequence, where the assembly is of high quality and closure approaches have been successful for most gaps, misassemblies and low-quality regions.

nonsynonymous_variant [SO_0001992]

A non-synonymous variant is an inframe, protein altering variant, resulting in a codon change.

novel_sequence_insertion [SO_0001838]

An insertion the sequence of which cannot be mapped to the reference genome. Requested by the NCBI.

NSD_transcript [SO_0002130]

A transcript that contains a CDS but has no stop codon before the polyA site is reached.

nuclear_chromosome [SO_0000828]

A chromosome originating in a nucleus.

nuclear_export_signal [SO_0001531]

A polypeptide region that targets a polypeptide to the cytoplasm.

nuclear_localization_signal [SO_0001528]

A polypeptide region that targets a polypeptide to the nucleus.

nuclear_mt_pseudogene [SO_0001044]

A nuclear pseudogene of either coding or non-coding mitochondria derived sequence. Definition change requested by Val, 3172757.

nuclear_rim_localization_signal [SO_0001534]

A polypeptide region that targets a polypeptide to the nuclear rim.

nuclear_sequence [SO_0000738]

DNA belonging to the nuclear genome of a cell. Moved from is_a SO:0000736 (organelle_sequence) when brought to our attention by GitHub issue #489.

nuclease_binding_site [SO_0000059]

A binding site that, in the nucleotide molecule, interacts selectively and non-covalently with polypeptide residues of a nuclease.

nuclease_hypersensitive_site [SO_0000322]

A region of nucleotide sequence targeted by a nuclease enzyme that is found cleaved more than would be expected by chance. Relationship to accessible_DNA_region added 11 Feb 2021. GREEKC pointed out that this is an assay-based term, but a biological term is needed for the accessible DNA. See GitHub Issue #531.

nuclease_sensitive_site [SO_0000684]

A region of nucleotide sequence targeted by a nuclease enzyme.

nucleic_acid [SO_0000348]

An attribute describing a sequence consisting of nucleobases bound to repeating units. The forms found in nature are deoxyribonucleic acid (DNA), where the repeating units are 2-deoxy-D-ribose rings connected to a phosphate backbone, and ribonucleic acid (RNA), where the repeating units are D-ribose rings connected to a phosphate backbone.

nucleomorph_gene [SO_0000097]

A gene from nucleomorph sequence.

nucleomorphic_chromosome [SO_0000829]

A chromosome originating in a nucleomorph.

nucleomorphic_sequence [SO_0000739]

DNA belonging to the genome of a plastid such as a chloroplast. The nucleomorph is the nucleus of the plastid.

nucleotide_binding_site [SO_0001655]

A binding site that, in the molecule, interacts selectively and non-covalently with nucleotide residues. See GO:0000166 : nucleotide binding.

nucleotide_cleavage_site [SO_0002204]

A point in nucleic acid where a cleavage event occurs.

nucleotide_match [SO_0000347]

A match against a nucleotide sequence.

nucleotide_motif [SO_0000714]

A region of nucleotide sequence corresponding to a known motif.

nucleotide_to_protein_binding_site [SO_0001654]

A binding site that, in the nucleotide molecule, interacts selectively and non-covalently with polypeptide residues.

null_mutation [SO_0002055]

A variant whereby the gene product is not functional or the gene product is not produced.

octamer_motif [SO_0001258]

A sequence element characteristic of some RNA polymerase II promoters with sequence ATTGCAT that binds POU-domain transcription factors. Nature. 1986 Oct 16-22;323(6089):640-3.

Okazaki_fragment [SO_0001985]

Any of the DNA segments produced by discontinuous synthesis of the lagging strand during DNA replication. Requested by Midori Harris, 2013.

oligo_U_tail [SO_0000609]

The string of non-encoded U’s at the 3’ end of a guide RNA (SO:0000602).

one_methyl_three_three_amino_three_carboxypropyl_pseudouridine [SO_0001373]

1-methyl-3-(3-amino-3-carboxypropyl)pseudouridine is a modified uridine base feature.

one_methyladenosine [SO_0001295]

1-methyladenosine is a modified adenosine.

one_methylguanosine [SO_0001324]

1-methylguanosine is a modified guanosine base feature.

one_methylinosine [SO_0001278]

1-methylinosine is a modified inosine.

one_methylpseudouridine [SO_0001347]

1-methylpseudouridine is a modified uridine base feature.

one_two_prime_O_dimethyladenosine [SO_0001314]

1,2’-O-dimethyladenosine is a modified adenosine.

one_two_prime_O_dimethylguanosine [SO_0001340]

1,2’-O-dimethylguanosine is a modified guanosine base feature.

one_two_prime_O_dimethylinosine [SO_0001279]

1,2’-O-dimethylinosine is a modified inosine.

open_chromatin_region [SO_0001747]

A DNA sequence that in the normal state of the chromosome corresponds to an unfolded, un-complexed stretch of double-stranded DNA. Requested by John Calley 3125900.

operon_member [SO_0000080]

A gene that is a member of an operon, which is a set of genes transcribed together as a unit.

ORF [SO_0000236]

The in-frame interval between the stop codons of a reading frame which, when read as sequential triplets, has the potential to encode a sequential string of amino acids. TER(NNN)nTER. The definition was modified by Rama. An ORF is defined by the sequence, whereas a CDS is defined according to whether a polypeptide is made. This term is mapped to MGED. Do not obsolete without consulting MGED ontology.
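Read literally, the TER(NNN)nTER pattern describes the codons lying between two consecutive in-frame stop codons. The sketch below assumes the standard genetic code stop codons (TAA, TAG, TGA), forward-strand DNA and a 0-based frame offset of 0, 1 or 2. It deliberately does not require a start codon, following the stop-to-stop reading of the definition rather than the behaviour of any particular ORF-finding tool; the name `stop_to_stop_orfs` is hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code (assumption)


def stop_to_stop_orfs(seq: str, frame: int, min_codons: int = 1) -> list[tuple[int, int]]:
    """Return 0-based, half-open (start, end) intervals of the codons lying
    between consecutive in-frame stop codons (TER(NNN)nTER) on the forward strand.
    """
    if frame not in (0, 1, 2):
        raise ValueError("frame must be 0, 1 or 2")
    seq = seq.upper()
    # Positions of every in-frame stop codon.
    stops = [i for i in range(frame, len(seq) - 2, 3) if seq[i:i + 3] in STOP_CODONS]
    orfs = []
    for left, right in zip(stops, stops[1:]):
        start, end = left + 3, right  # exclude both flanking stop codons
        if (end - start) // 3 >= min_codons:
            orfs.append((start, end))
    return orfs


print(stop_to_stop_orfs("TAAATGGCCTGA", frame=0))  # [(3, 9)]
```

Stretches before the first or after the last in-frame stop are not reported, because they are not bounded by TER on both sides.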

organelle_sequence [SO_0000736]

A sequence of DNA that originates from an organelle.

oriC [SO_0000953]

An origin of bacterial chromosome replication.

oriV [SO_0000952]

An origin of vegetative replication in plasmids and phages.

orphan [SO_0000910]

A gene whose predicted amino acid sequence is unsupported by any experimental evidence or by any match with any other known sequence.

orphan_CDS [SO_1001247]

A CDS whose predicted amino acid sequence is unsupported by any experimental evidence or by any match with any other known sequence.

orthologous [SO_0000858]

An attribute describing a kind of homology where divergence occurred after a speciation event.

outron [SO_0001475]

A region of a primary transcript that is removed via trans-splicing.

overlapping [SO_0000068]

An attribute describing a gene that has a sequence that overlaps the sequence of another gene.

overlapping_EST_set [SO_0001262]

A continuous experimental result region extending the length of multiple overlapping ESTs.

overlapping_feature_set [SO_0001261]

A continuous region of sequence composed of the overlapping of multiple sequence_features, which ultimately provides evidence for another sequence_feature. This feature was requested by Nicole, tracker id 1911479. It is required to gather evidence together for annotation. An example would be overlapping ESTs that support an mRNA.

P_TIR_transposon [SO_0001535]

A P-element is a DNA transposon responsible for hybrid dysgenesis. P elements in this terminal inverted repeat (TIR) transposon superfamily have 31 bp perfect TIRs and duplicate an 8 bp sequence upon insertion. They contain a transposase that may lack the DDE domain. Moved from under DNA_transposon (SO:0000182) by Dave Sant as per request from GitHub issue #488 on June 25, 2020.

PAC [SO_0000154]

P1-derived artificial chromosomes are DNA constructs derived from the DNA of P1 bacteriophage. They can carry large amounts (about 100-300 kilobases) of other sequences for a variety of bioengineering purposes. They are one type of vector used to clone DNA fragments (100- to 300-kb insert size; average, 150 kb) in Escherichia coli cells. This term is mapped to MGED. Do not obsolete without consulting MGED ontology. Drosophila melanogaster PACs carry an average insert size of 80 kb. The Drosophila PAC library represents a 6-fold coverage of the genome.

PAC_end [SO_0001480]

A region of sequence from the end of a PAC clone that may provide a highly specific marker.

paired_end_fragment [SO_0001790]

An assembly region that has been sequenced from both ends resulting in a read_pair (mate_pair).

paracentric [SO_0001519]

An inversion event that does not include the centromere.

paracentric_inversion [SO_1000047]

A chromosomal inversion that does not include the centromere.

parallel_beta_strand [SO_0001113]

A peptide region which is hydrogen bonded to another region of peptide running in the same direction (both running N-terminal to C-terminal). This orientation is slightly less stable than the antiparallel arrangement because it introduces nonplanarity in the inter-strand hydrogen bonding pattern. Hydrogen bonding occurs between every other C=O from one strand to every other N-H on the adjacent strand. In this case, if two atoms C-alpha (i) and C-alpha (j) are adjacent in two hydrogen-bonded beta strands, then they do not hydrogen bond to each other; rather, one residue forms hydrogen bonds to the residues that flank the other (but not vice versa). For example, residue i may form hydrogen bonds to residues j - 1 and j + 1; this is known as a wide pair of hydrogen bonds. By contrast, residue j may hydrogen-bond to different residues altogether, or to none at all. The dihedral angles (phi, psi) are about (-120 degrees, 115 degrees) in parallel sheets. Range.

paralogous [SO_0000859]

An attribute describing a kind of homology where divergence occurred after a duplication event.

partial_genomic_sequence_assembly [SO_0001876]

A partial DNA sequence assembly of a chromosome or full genome, which contains gaps that are filled with N’s. Requested by Bayer Cropscience January, 2012.
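Since the gaps in such an assembly are filled with N’s, locating them amounts to finding runs of N. A minimal sketch, assuming a plain sequence string in which N (or n) is the only gap character; the function name `find_gaps` and the `min_length` parameter are hypothetical.

```python
import re


def find_gaps(assembly: str, min_length: int = 1) -> list[tuple[int, int]]:
    """Return 0-based, half-open (start, end) intervals of N runs in a sequence."""
    return [
        (m.start(), m.end())
        for m in re.finditer(r"[Nn]+", assembly)
        if m.end() - m.start() >= min_length
    ]


print(find_gaps("ACGTNNNNACGTNA"))  # [(4, 8), (12, 13)]
```

Raising `min_length` is one way to ignore isolated ambiguous bases and keep only runs long enough to be treated as assembly gaps.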

partially_characterized_chromosomal_mutation [SO_1000175]

A chromosome structure variant that has not been characterized fully.

partially_processed_cDNA_clone [SO_0000813]

A cDNA clone invalidated by partial processing.

paternal_uniparental_disomy [SO_0001746]

Uniparental disomy is a sequence_alteration where a diploid individual receives two copies of all or part of a chromosome from the father and no copies of the same chromosome or region from the mother.

paternal_variant [SO_0001776]

A variant in the genetic material inherited from the father.

paternally_imprinted [SO_0000136]

The paternal copy of the gene is modified, rendering it transcriptionally silent.

paternally_imprinted_gene [SO_0000889]

A gene that is paternally imprinted.

pathogenic_island [SO_0000773]

Mobile genetic elements that contribute to rapid changes in virulence potential. They are present on the genomes of pathogenic strains but absent from the genomes of non-pathogenic members of the same or related species. Dobrindt U, Hochhut B, Hentschel U, Hacker J. Genomic islands in pathogenic and environmental microorganisms. Nature Reviews Microbiology 2, 414-424 (2004); doi:10.1038/nrmicro884.

PCB [SO_0001871]

A promoter element with consensus sequence GNAACR, bound by the transcription factor complex PBF (PCB-binding factor) and found in promoters of genes expressed during the M/G1 transition of the cell cycle.

PCR_product [SO_0000006]

A region amplified by a PCR reaction. This term is mapped to MGED. This term is now located in OBI, with the following ID OBI_0000406.

pedigree_specific_variant [SO_0001779]

A variant that is found only in individuals that belong to the same pedigree.

peptide_coil [SO_0100012]

Irregular, unstructured regions of a protein’s backbone, as distinct from the regular region (namely alpha helix and beta strand - characterised by specific patterns of main-chain hydrogen bonds).

peptide_collection [SO_0001501]

A collection of peptide sequences. Term requested via tracker ID: 2910829.

peptide_helix [SO_0001114]

A helix is a secondary_structure conformation where the peptide backbone forms a coil. Range.

peptide_localization_signal [SO_0001527]

A region of peptide sequence used to target the polypeptide molecule to a specific organelle.

peptidyl [SO_0001407]

An attribute describing the nature of a proteinaceous polymer, whereby the amino acid units are joined by peptide bonds.

pericentric [SO_0001518]

An inversion event that includes the centromere.

pericentric_inversion [SO_1000046]

A chromosomal inversion that includes the centromere.

peroxywybutosine [SO_0001333]

Peroxywybutosine is a modified guanosine base feature.

Phage_RNA_Polymerase_Promoter [SO_0001204]

A region (DNA) to which a bacteriophage RNA polymerase binds to begin transcription. The former parent, RNA_polymerase_promoter (SO:0001203), was merged with promoter (SO:0000167) in Aug 2020 as part of GREEKC.

phenylalanine [SO_0001441]

A non-polar, hydrophobic amino acid encoded by the codons TTT and TTC. A placeholder for a cross product with chebi.

phenylalanine_tRNA_primary_transcript [SO_0000224]

A primary transcript encoding phenylalanyl tRNA (SO:0000267).

phenylalanyl_tRNA [SO_0000267]

A tRNA sequence that has a phenylalanine anticodon, and a 3’ phenylalanine binding region.

pheromone_response_element [SO_0002045]

A PRE is a (yeast) TFBS with consensus site [TGAAAC(A/G)]. Requested by Rama, SGD.

phosphorylation_site [SO_0001811]

A post-translationally modified region in which residues of the protein are modified by phosphorylation.

PIP_box [SO_0001810]

A polypeptide region that mediates binding to PCNA. The consensus sequence is QXX(hh)XX(aa), where (h) denotes residues with moderately hydrophobic side chains and (a) denotes residues with highly hydrophobic aromatic side chains.
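A PIP box consensus can be scanned for with a regular expression once the residue classes are fixed. The definition does not list which residues count as moderately hydrophobic (h) or aromatic (a), so this sketch assumes the commonly cited canonical form Q-x-x-h-x-x-a-a with h = I, L or M and a = F or Y; both the residue sets and the one-residue reading of the h position are assumptions, and `find_pip_boxes` is a hypothetical name.

```python
import re

# Assumed residue classes (not given in the definition): h = I/L/M, a = F/Y.
# The zero-width lookahead lets overlapping candidates be reported.
PIP_BOX = re.compile(r"(?=(Q..[ILM]..[FY][FY]))")


def find_pip_boxes(protein: str) -> list[tuple[int, str]]:
    """Return (0-based start, matched 8-residue segment) for each PIP box candidate."""
    return [(m.start(), m.group(1)) for m in PIP_BOX.finditer(protein.upper())]


print(find_pip_boxes("AAQKTLSEFFAA"))  # [(2, 'QKTLSEFF')]
```

A sequence match is only a candidate; the definition describes a region that mediates PCNA binding, which a pattern search alone cannot establish.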

piRNA_gene [SO_0001638]

A gene that encodes a piwi-associated RNA. Moved from ncRNA_gene to sncRNA_gene 27 April 2021 to be more consistent with the organization of the ncRNA branch of SO. Requested by FlyBase, moved by Dave Sant. See GitHub Issue #514.

plasmid [SO_0000155]

A self-replicating (using the host’s cellular machinery), often circular nucleic acid molecule that is distinct from a chromosome in the organism. This term is mapped to MGED. Do not obsolete without consulting MGED ontology.

plasmid_gene [SO_0000098]

A gene from plasmid sequence.

plasmid_location [SO_0000749]

The location of DNA that has come from a plasmid sequence.

plastid_gene [SO_0000090]

A gene from plastid sequence.

plastid_sequence [SO_0000740]

DNA belonging to the genome of a plastid such as a chloroplast.

plus_1_frameshift [SO_0000868]

A frameshift caused by inserting one base.

plus_1_frameshift_variant [SO_0001594]

A sequence variant which causes a disruption of the translational reading frame, by shifting one base backward.

plus_1_translational_frameshift [SO_0001211]

The region of mRNA 1 base long that is skipped during the process of translational frameshifting (GO:0006452), causing the reading frame to be different.

plus_1_translationally_frameshifted [SO_1001263]

An attribute describing a translational frameshift of +1.

plus_2_frameshift_variant [SO_0001595]

A sequence variant which causes a disruption of the translational reading frame, by shifting two bases backward.

plus_2_framshift [SO_0000869]

A frameshift caused by inserting two bases.

plus_2_translational_frameshift [SO_0001212]

The region of mRNA 2 bases long that is skipped during the process of translational frameshifting (GO:0006452), causing the reading frame to be different.

PNA [SO_0001184]

An attribute describing a sequence composed of peptide nucleic acid (CHEBI:48021), a chemical consisting of nucleobases bound to a backbone composed of repeating N-(2-aminoethyl)-glycine units linked by peptide bonds. The purine and pyrimidine bases are linked to the backbone by methylene carbonyl bonds. Do not use this term for feature annotation. Use PNA_oligo (SO:0001011) instead.

PNA_oligo [SO_0001011]

Peptide nucleic acid is a chemical not known to occur naturally; it is artificially synthesized and used in some biological research and medical treatments. The PNA backbone is composed of repeating N-(2-aminoethyl)-glycine units linked by peptide bonds. The purine and pyrimidine bases are linked to the backbone by methylene carbonyl bonds.

point_centromere [SO_0001794]

A point centromere is a relatively small centromere (about 125 bp DNA) in discrete sequence, found in some yeasts, including S. cerevisiae.

point_mutation [SO_1000008]

A single nucleotide change which has occurred at the same position of a corresponding nucleotide in a reference sequence.

polinton [SO_0001170]

A kind of DNA transposon that populates the genomes of protists, fungi, and animals, characterized by a unique set of proteins necessary for their transposition, including a protein-primed DNA polymerase B, retroviral integrase, cysteine protease, and ATPase. Polintons are characterized by 6-bp target site duplications, terminal-inverted repeats that are several hundred nucleotides long, and 5’-AG and TC-3’ termini. Polintons exist as autonomous and nonautonomous elements.

polyA_primed_cDNA_clone [SO_0000812]

A cDNA clone invalidated by polyA priming.

polyA_sequence [SO_0000610]

Sequence of about 100 nucleotides of A added to the 3’ end of most eukaryotic mRNAs.

polyA_signal_sequence [SO_0000551]

The recognition sequence necessary for endonuclease cleavage of an RNA transcript that is followed by polyadenylation; consensus=AATAAA. Moved from transcription_regulatory_region (SO:0001679) to transcriptional_cis_regulatory_region (SO:0001055) by Dave Sant on Feb 11, 2021 when transcription_regulatory_region was merged into transcriptional_cis_regulatory_region to be consistent with GO and reduce redundancy as part of the GREEKC consortium. See GitHub Issue #527.

polyA_site [SO_0000553]

The site on an RNA transcript to which adenine residues will be added by post-transcriptional polyadenylation. The boundary between the UTR and the polyA sequence.

polyadenylated [SO_0000246]

An attribute describing the addition of a poly A tail to the 3’ end of a mRNA molecule.

polyadenylated_mRNA [SO_0000871]

An mRNA that is polyadenylated.

polyadenylation_variant [SO_0001545]

A sequence variant that changes polyadenylation with respect to a reference sequence.

polycistronic [SO_0000880]

An attribute describing a sequence that contains the code for more than one gene product.

polycistronic_primary_transcript [SO_0000631]

A primary transcript encoding for more than one gene product.

polycistronic_transcript [SO_0000078]

A transcript that is polycistronic.

polymer_attribute [SO_0000443]

An attribute to describe the kind of biological sequence.

polymerase_synthesis_read [SO_0001426]

A read produced by the polymerase-based sequencing-by-synthesis method. An example is a read produced by Illumina technology.

polymorphic_pseudogene [SO_0001841]

A pseudogene in the reference genome, though known to be intact in the genomes of other individuals of the same species. The annotation process has confirmed that the pseudogenisation event is not a genomic sequencing error. This term is used by Ensembl and Vega. A pseudogene owing to a SNP/DIP, whereas in other individuals/haplotypes/strains the gene is translated.

polymorphic_pseudogene_processed_transcript [SO_0002116]

A processed transcript that does not contain a CDS that fulfills annotation criteria and is not necessarily functionally non-coding. Term added as part of collaboration with Gencode, adding biotypes used in annotation.

polymorphic_pseudogene_with_retained_intron [SO_0002110]

A polymorphic pseudogene in the reference genome, containing a retained intron, known to be intact in the genomes of other individuals of the same species. The annotation process has confirmed that the pseudogenisation event is not a genomic sequencing error. Term added as part of collaboration with Gencode, adding biotypes used in annotation.

polymorphic_sequence_variant [SO_0001025]

A sequence variant that is segregating in one or more natural populations of a species.

polymorphic_variant [SO_0001766]

A variant that affects one of several possible alleles at that location, such as the major histocompatibility complex (MHC) genes.

polypeptide [SO_0000104]

A sequence of amino acids linked by peptide bonds which may lack appreciable tertiary structure and may not be liable to irreversible denaturation. This term is mapped to MGED. Do not obsolete without consulting MGED ontology. The term ‘protein’ was merged with ‘polypeptide’. Although ‘protein’ was a sequence_attribute and therefore meant to describe the quality rather than an actual feature, it was being used erroneously. It is replaced by ‘peptidyl’ as the polymer attribute.

polypeptide_binding_motif [SO_0100018]

A polypeptide binding motif is a short (up to 20 amino acids) polypeptide region of biological interest that contains one or more amino acids experimentally shown to bind to a ligand.

polypeptide_calcium_ion_contact_site [SO_0001094]

A binding site that, in the polypeptide molecule, interacts selectively and non-covalently with calcium ions. Residue involved in contact with calcium.

polypeptide_catalytic_motif [SO_0100019]

A polypeptide catalytic motif is a short (up to 20 amino acids) polypeptide region that contains one or more active site residues.

polypeptide_cobalt_ion_contact_site [SO_0001095]

A binding site that, in the polypeptide molecule, interacts selectively and non-covalently with cobalt ions.

polypeptide_conserved_motif [SO_0100017]

A conserved motif is a short (up to 20 amino acids) region of biological interest that is conserved in different proteins. They may or may not have functional or structural significance within the proteins in which they are found.

polypeptide_conserved_region [SO_0100021]

A subsection of sequence with biological interest that is conserved in different proteins. They may or may not have functional or structural significance within the proteins in which they are found.

polypeptide_copper_ion_contact_site [SO_0001096]

A binding site that, in the polypeptide molecule, interacts selectively and non-covalently with copper ions.

polypeptide_DNA_contact [SO_0100020]

A binding site that, in the polypeptide molecule, interacts selectively and non-covalently with DNA.

polypeptide_domain [SO_0000417]

A structurally or functionally defined protein region. In proteins with multiple domains, the combination of the domains determines the function of the protein. A region which has been shown to recur throughout evolution. Range. Old definition from before biosapiens: A region of a single polypeptide chain that folds into an independent unit and exhibits biological activity. A polypeptide chain may have multiple domains.

polypeptide_function_variant [SO_0001554]

A sequence variant which changes polypeptide functioning with respect to a reference sequence.

polypeptide_fusion [SO_0001616]

A sequence variant that causes a fusion of two polypeptide sequences.

polypeptide_gain_of_function_variant [SO_0001557]

A sequence variant which causes gain of polypeptide function with respect to a reference sequence.

polypeptide_iron_ion_contact_site [SO_0001097]

A binding site that, in the polypeptide molecule, interacts selectively and non-covalently with iron ions.

polypeptide_ligand_contact [SO_0001105]

Residues which interact with a ligand.

polypeptide_localization_variant [SO_0001558]

A sequence variant which changes the localization of a polypeptide with respect to a reference sequence.

polypeptide_loss_of_function_variant [SO_0001559]

A sequence variant that causes the loss of a polypeptide function with respect to a reference sequence.

polypeptide_magnesium_ion_contact_site [SO_0001098]

A binding site that, in the polypeptide molecule, interacts selectively and non-covalently with magnesium ions.

polypeptide_manganese_ion_contact_site [SO_0001099]

A binding site that, in the polypeptide molecule, interacts selectively and non-covalently with manganese ions.

polypeptide_metal_contact [SO_0001092]

A binding site that, in the polypeptide molecule, interacts selectively and non-covalently with metal ions. Residue is part of a binding site for a metal ion.

polypeptide_molybdenum_ion_contact_site [SO_0001100]

A binding site that, in the polypeptide molecule, interacts selectively and non-covalently with molybdenum ions.

polypeptide_motif [SO_0001067]

A sequence motif is a short (up to 20 amino acids) region of biological interest. Such motifs, although they are too short to constitute functional domains, share sequence similarities and are conserved in different proteins. They display a common function (protein-binding, subcellular location etc.). Range.

polypeptide_nest_left_right_motif [SO_0001121]

A motif of two consecutive residues with dihedral angles: Residue(i): +20 degrees < phi < +140 degrees, -40 degrees < psi < +90 degrees. Residue(i+1): -140 degrees < phi < -20 degrees, -90 degrees < psi < +40 degrees.

polypeptide_nest_motif [SO_0001120]

A motif of two consecutive residues with specific dihedral angles. A nest should not have proline at either residue. Nests frequently occur as parts of other motifs, such as Schellman loops.

polypeptide_nest_right_left_motif [SO_0001122]

A motif of two consecutive residues with dihedral angles: Residue(i): -140 degrees < phi < -20 degrees, -90 degrees < psi < +40 degrees. Residue(i+1): +20 degrees < phi < +140 degrees, -40 degrees < psi < +90 degrees.
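The two nest definitions above differ only in which residue falls in which phi/psi window. A minimal Python sketch of that check follows. It assumes backbone dihedral angles in degrees are already available for each residue (for example from a structure-analysis tool). The function names are hypothetical, the strict inequalities mirror the definitions, and the proline exclusion from polypeptide_nest_motif is applied as a simple filter.

```python
from typing import List, Optional, Tuple

Angles = Tuple[float, float]  # (phi, psi) in degrees


def _in_left_window(phi: float, psi: float) -> bool:
    # +20 < phi < +140 and -40 < psi < +90
    return 20 < phi < 140 and -40 < psi < 90


def _in_right_window(phi: float, psi: float) -> bool:
    # -140 < phi < -20 and -90 < psi < +40
    return -140 < phi < -20 and -90 < psi < 40


def classify_nest(res_i: Angles, res_next: Angles) -> Optional[str]:
    """Return the SO nest term matched by residues i and i+1, or None."""
    if _in_left_window(*res_i) and _in_right_window(*res_next):
        return "polypeptide_nest_left_right_motif"
    if _in_right_window(*res_i) and _in_left_window(*res_next):
        return "polypeptide_nest_right_left_motif"
    return None


def find_nests(residues: str, angles: List[Angles]) -> List[Tuple[int, str]]:
    """Scan consecutive residue pairs, skipping any pair that contains proline (P)."""
    if len(residues) != len(angles):
        raise ValueError("need one (phi, psi) pair per residue")
    hits = []
    for i in range(len(residues) - 1):
        if "P" in residues[i:i + 2].upper():
            continue
        label = classify_nest(angles[i], angles[i + 1])
        if label is not None:
            hits.append((i, label))
    return hits
```

Because the left and right phi windows do not overlap, a given residue pair can match at most one of the two nest types.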

polypeptide_nickel_ion_contact_site [SO_0001101]

A binding site that, in the polypeptide molecule, interacts selectively and non-covalently with nickel ions.

polypeptide_partial_loss_of_function [SO_0001561]

A sequence variant that causes some but not all loss of polypeptide function with respect to a reference sequence.

polypeptide_post_translational_processing_variant [SO_0001562]

A sequence variant that causes a change in post-translational processing of the peptide with respect to a reference sequence.

polypeptide_region [SO_0000839]

Biological sequence region that can be assigned to a specific subsequence of a polypeptide. Added to allow the polypeptide regions to have is_a paths back to the root.

polypeptide_repeat [SO_0001068]

A polypeptide_repeat is a single copy of an internal sequence repetition. Range.

polypeptide_secondary_structure [SO_0001078]

A region of peptide with secondary structure has hydrogen bonding along the peptide chain that causes a defined conformation of the chain. Biosapien term was secondary_structure.

polypeptide_sequence_variant [SO_0001603]

A sequence variant within the CDS that causes a change in the resulting polypeptide sequence.

polypeptide_sequencing_information [SO_0001082]

Incompatibility in the sequence due to some experimental problem. Range.

polypeptide_structural_region [SO_0001070]

Region of polypeptide with a given structural property. Range.

polypeptide_truncation [SO_0001617]

A sequence variant of the CDS that causes a truncation of the resulting polypeptide.

polypeptide_tungsten_ion_contact_site [SO_0001102]

A binding site that, in the polypeptide molecule, interacts selectively and non-covalently with tungsten ions.

polypeptide_turn_motif [SO_0001128]

A reversal in the direction of the backbone of a protein that is stabilized by a hydrogen bond between backbone NH and CO groups, involving no more than 4 amino acid residues. Range.

polypeptide_variation_site [SO_0001146]

A site of sequence variation (alteration). Alternative sequence due to naturally occurring events such as polymorphisms and alternative splicing or experimental methods such as site directed mutagenesis. For example, was a substitution natural or mutated as part of an experiment? This term is added to merge the biosapiens term sequence_variations.

polypeptide_zinc_ion_contact_site [SO_0001103]

A binding site that, in the polypeptide molecule, interacts selectively and non-covalently with zinc ions.

polypyrimidine_tract [SO_0000612]

The polypyrimidine tract is one of the cis-acting sequence elements directing intron removal in pre-mRNA splicing.

population_specific_variant [SO_0001780]

A variant found only within specific populations.

positional_candidate_gene [SO_0001868]

A candidate gene whose association with a trait is based on the gene’s location on a chromosome. Requested by Bayer Cropscience December, 2011.

positive_sense_ssRNA_viral_sequence [SO_0001201]

A positive_sense_RNA_viral_sequence is a ss_RNA_viral_sequence that is the sequence of a single stranded RNA virus that can be immediately translated by the host.

positively_autoregulated [SO_0000475]

The gene product is involved in its own transcriptional regulation, where it increases transcription.

positively_autoregulated_gene [SO_0000892]

A gene that is positively autoregulated.

possible_assembly_error [SO_0000702]

A region of sequence where there may have been an error in the assembly.

possible_base_call_error [SO_0000701]

A region of sequence where the validity of the base calling is questionable.

post_translationally_regulated [SO_0000130]

An attribute describing a gene that is regulated after it has been translated.

post_translationally_regulated_by_protein_modification [SO_0000469]

An attribute describing a gene sequence where the resulting protein is modified to regulate it.

post_translationally_regulated_by_protein_stability [SO_0000467]

An attribute describing a gene sequence where the resulting protein is regulated by the stability of the resulting protein.

post_translationally_regulated_gene [SO_0000890]

A gene that is post translationally regulated.

pre_edited_mRNA [SO_0000932]

A primary transcript that, at least in part, encodes one or more proteins, and that has not been edited.

pre_edited_region [SO_0000583]

The region of a transcript that will be edited.

pre_miRNA [SO_0001244]

The 60-70 nucleotide region remaining after Drosha processing of the primary transcript, which folds back upon itself to form a hairpin structure.

predicted_by_ab_initio_computation [SO_0000911]

An attribute describing a feature that is predicted by a computer program that did not rely on sequence similarity.

predicted_gene [SO_0000996]

A region of the genome that has been predicted to be a gene but has not been confirmed by laboratory experiments. This term is mapped to MGED. Do not obsolete without consulting MGED ontology.

predicted_transcript [SO_0002138]

A transcript feature that has been predicted but is not yet validated.

primary_transcript [SO_0000185]

A transcript that in its initial state requires modification to be functional.

primary_transcript_region [SO_0000835]

A part of a primary transcript. This term was added to provide a grouping term for the region parts of primary_transcript, thus giving them an is_a path back to the root.

primer [SO_0000112]

An oligo to which new deoxyribonucleotides can be added by DNA polymerase.

primer_match [SO_0001472]

A nucleotide match to a primer sequence.

priRNA [SO_0002022]

A small RNA molecule, 22-23 nt in size, that is the product of a longer RNA. The production of priRNAs is independent of Dicer and involves binding of RNA by Argonaute and trimming by Triman. In fission yeast, priRNAs trigger the establishment of heterochromatin. PriRNAs are primarily generated from centromeric transcripts (dg and dh repeats), but may also be produced from degradation products of primary transcripts.

processed_pseudogene [SO_0000043]

A pseudogene created via retrotransposition of the mRNA of a functional protein-coding parent gene, followed by the accumulation of deleterious mutations; it lacks introns and promoters and often includes a poly(A) tail. Please note that the synonym R psi M uses the spelled-out form of the Greek letter.

processed_transcript [SO_0001503]

A transcript for which no open reading frame has been identified and for which no other function has been determined. Ensembl and Vega also use this term name. Requested by Howard Deen of MGI.

prokaryotic_promoter [SO_0002222]

A regulatory_region essential for the specific initiation of transcription at a defined location in a DNA molecule, although this location might not be one single base. It is recognized by a specific RNA polymerase (RNAP) holoenzyme, and this recognition is not necessarily autonomous.

proline [SO_0001439]

A non-polar, hydrophobic amino acid encoded by the codons CCN (CCT, CCC, CCA and CCG). A placeholder for a cross product with ChEBI.
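The codon notation CCN uses the IUPAC ambiguity code N (any base). A small sketch of expanding such degenerate codons into concrete ones follows, assuming the standard IUPAC DNA ambiguity codes; `expand_degenerate` is a hypothetical helper, not part of any SO tooling.

```python
from itertools import product
from typing import List

# Standard IUPAC nucleotide ambiguity codes (DNA alphabet).
IUPAC_DNA = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
    "N": "ACGT",
}


def expand_degenerate(seq: str) -> List[str]:
    """Expand a degenerate sequence into every concrete sequence it stands for."""
    choices = [IUPAC_DNA[base] for base in seq.upper()]
    return ["".join(combo) for combo in product(*choices)]
```

For example, `expand_degenerate("CCN")` yields the four proline codons listed above (CCA, CCC, CCG, CCT).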

proline_tRNA_primary_transcript [SO_0000225]

A primary transcript encoding prolyl tRNA (SO:0000268).

prolyl_tRNA [SO_0000268]

A tRNA sequence that has a proline anticodon, and a 3’ proline binding region.

promoter_element [SO_0001659]

An element that can exist within the promoter region of a gene. Moved from is_a: SO:0001055 transcriptional_cis_regulatory_region as per request from GREEKC initiative in August 2020.

promoter_flanking_region [SO_0001952]

A region immediately adjacent to a promoter which may or may not contain transcription factor binding sites.

promoter_trap_construct [SO_0001478]

A construct which is designed to integrate into a genome and express a reporter when inserted in close proximity to a promoter element. Promoter traps typically do not contain promoter elements and are mutagenic.

propeptide [SO_0001062]

Part of a peptide chain which is cleaved off during the formation of the mature protein. Range.

propeptide_cleavage_site [SO_0001061]

The propeptide_cleavage_site is the arginine/lysine boundary on a propeptide where cleavage occurs. Discrete.

propeptide_region_of_CDS [SO_0002250]

A CDS region corresponding to a propeptide of a polypeptide. Added as per request from GitHub Issue #484 (https://github.com/The-Sequence-Ontology/SO-Ontologies/issues/484)

prophage [SO_0001006]

A phage genome after it has established in the host genome in a latent/immune state either as a plasmid or as an integrated “island”.

proplastid_gene [SO_0000096]

A gene from proplastid sequence.

proplastid_sequence [SO_0000748]

DNA belonging to the genome of a proplastid such as an immature chloroplast.

protease_site [SO_0001956]

A polypeptide_region that codes for a protease cleavage site.

protein_altering_variant [SO_0001818]

A sequence_variant which is predicted to change the protein encoded in the coding sequence.

protein_binding_site [SO_0000410]

A binding site that, in the molecule, interacts selectively and non-covalently with polypeptide molecules. See GO:0042277 : peptide binding.

protein_coding [SO_0000010]

A gene which, when transcribed, can be translated into a protein.

protein_coding_gene [SO_0001217]

A gene that codes for an RNA that can be translated into a protein.

protein_coding_primary_transcript [SO_0000120]

A primary transcript that, at least in part, encodes one or more proteins. May contain introns.

protein_hmm_match [SO_0001831]

A match to a protein HMM such as Pfam.

protein_match [SO_0000349]

A match against a protein sequence.

protein_protein_contact [SO_0001093]

A binding site that, in the protein molecule, interacts selectively and non-covalently with polypeptide residues.

protein_stability_element [SO_0001955]

A polypeptide region that provides structure to a protein and thereby affects the stability of the protein.

proviral_gene [SO_0000099]

A gene from proviral sequence.

proviral_location [SO_0000751]

The location of DNA that has come from a viral origin.

proviral_region [SO_0000113]

A viral sequence which has integrated into a host genome.

proximal_promoter_element [SO_0001668]

DNA segment that ranges from about -250 to -40 relative to +1 of the RNA transcription start site, where sequence-specific DNA-binding transcription factors bind, such as Sp1, CTF (CCAAT-binding transcription factor), and CBF (CCAAT-box binding factor).
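Offsets such as -250 to -40 are relative to the transcription start site (TSS) and have to be mapped onto genomic coordinates with the gene's strand in mind. The sketch below assumes 1-based genomic coordinates and the common convention that +1 is the TSS and there is no position 0; both the convention and the helper names are assumptions for illustration.

```python
from typing import Tuple


def relative_to_genomic(tss: int, offset: int, strand: str) -> int:
    """Map a TSS-relative offset (+1 = TSS, no position 0) to a 1-based genomic coordinate."""
    if offset == 0:
        raise ValueError("TSS-relative numbering has no position 0")
    # +1 is the TSS itself, -1 is the base immediately upstream of it.
    shift = offset - 1 if offset > 0 else offset
    if strand == "+":
        return tss + shift
    if strand == "-":
        # Upstream lies at higher genomic coordinates on the minus strand.
        return tss - shift
    raise ValueError(f"strand must be '+' or '-', got {strand!r}")


def relative_window(tss: int, start_offset: int, end_offset: int, strand: str) -> Tuple[int, int]:
    """Return the (low, high) genomic interval covered by two TSS-relative offsets."""
    a = relative_to_genomic(tss, start_offset, strand)
    b = relative_to_genomic(tss, end_offset, strand)
    return (min(a, b), max(a, b))
```

For a plus-strand gene with its TSS at position 1000, the -250 to -40 window maps to 750-960; on the minus strand it maps to 1040-1250.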

PSE_motif [SO_0000017]

A sequence element characteristic of the promoters of snRNA genes transcribed by RNA polymerase II or by RNA polymerase III. Located between -45 and -60 relative to the TSS. The human PSE_motif consensus sequence is TCACCNTNA(C|G)TNAAAAG(T|G). The basal transcription factor, snRNA-activating protein complex (SNAPc), binds the PSE_motif and is required for the transcription of both RNA polymerase II and III transcribed small-nuclear RNA genes.
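The human PSE_motif consensus can be turned directly into a regular expression by translating N to [ACGT] and each (X|Y) alternation to a character class. A minimal Python sketch that scans both strands follows, assuming the input is a plain DNA string; the function names are hypothetical, and a consensus hit is only a candidate site, not evidence of a functional PSE.

```python
import re
from typing import List, Tuple

# TCACCNTNA(C|G)TNAAAAG(T|G) with N -> [ACGT] and alternations -> character classes.
PSE_PATTERN = re.compile(r"TCACC[ACGT]T[ACGT]A[CG]T[ACGT]AAAAG[GT]")

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def find_pse_motifs(seq: str) -> List[Tuple[int, str, str]]:
    """Return (0-based start on the forward strand, strand, matched text) for each hit."""
    seq = seq.upper()
    hits = [(m.start(), "+", m.group()) for m in PSE_PATTERN.finditer(seq)]
    rc = reverse_complement(seq)
    for m in PSE_PATTERN.finditer(rc):
        # A match ending at m.end() in the reverse complement starts at
        # len(seq) - m.end() on the forward strand.
        hits.append((len(seq) - m.end(), "-", m.group()))
    return sorted(hits)
```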

pseudogene [SO_0000336]

A sequence that closely resembles a known functional gene, at another locus within a genome, that is non-functional as a consequence of (usually several) mutations that prevent either its transcription or translation (or both). In general, pseudogenes result from either reverse transcription of a transcript of their “normal” paralog (SO:0000043) (in which case the pseudogene typically lacks introns and includes a poly(A) tail) or from recombination (SO:0000044) (in which case the pseudogene is typically a tandem duplication of its “normal” paralog).

pseudogene_by_unequal_crossing_over [SO_0000044]

A pseudogene caused by unequal crossing over at recombination.

pseudogene_processed_transcript [SO_0002111]

A processed_transcript supported by EST and/or mRNA evidence that aligns unambiguously to a pseudogene locus (i.e. alignment to the pseudogene locus clearly better than alignment to parent locus). Term added as part of collaboration with Gencode, adding biotypes used in annotation.

pseudogenic_CDS [SO_0002087]

A non functional descendant of the coding portion of a coding transcript, part of a pseudogene.

pseudogenic_exon [SO_0000507]

A non functional descendant of an exon, part of a pseudogene. This is the analog of the exon of a functional gene. The term was requested by Rama - SGD to allow the annotation of the parts of a pseudogene. Non-functional is defined as either its transcription or translation (or both) are prevented due to one or more mutations.

pseudogenic_gene_segment [SO_0001741]

A gene segment which when incorporated by somatic recombination in the final gene transcript results in a nonfunctional product.

pseudogenic_region [SO_0000462]

A non-functional descendant of a functional entity.

pseudogenic_rRNA [SO_0000777]

A non-functional descendant of an rRNA. Added Jan 2006 to allow the annotation of pseudogenic rRNAs by FlyBase. Non-functional is defined as its transcription being prevented due to one or more mutations.

pseudogenic_transcript [SO_0000516]

A non functional descendant of a transcript, part of a pseudogene. This is the analog of the transcript of a functional gene. The term was requested by Rama - SGD to allow the annotation of the parts of a pseudogene. Non-functional is defined as either its transcription or translation (or both) are prevented due to one or more mutations.

pseudogenic_transcript_with_retained_intron [SO_0002115]

A transcript supported by EST and/or mRNA evidence that aligns unambiguously to the pseudogene locus; has retained intronic sequence compared to a reference transcript sequence. These terms have been requested by Adam Frankish to support Gencode and Vega biotypes.

pseudogenic_tRNA [SO_0000778]

A non-functional descendant of a tRNA. Added Jan 2006 to allow the annotation of pseudogenic tRNAs by FlyBase. Non-functional is defined as its transcription being prevented due to one or more mutations.

pseudoknot [SO_0000591]

A tertiary structure in RNA where nucleotides in a loop form base pairs with a region of RNA downstream of the loop.

pseudouridylation_guide_snoRNA [SO_0001187]

A snoRNA that specifies the site of pseudouridylation in an RNA molecule by base pairing with a short sequence around the target residue. Has RNA pseudouridylation guide activity (GO:0030558).

purine_to_pyrimidine_transversion [SO_1000023]

Change of a purine nucleotide, A or G, into a pyrimidine nucleotide, C or T.

purine_transition [SO_1000014]

A substitution of a purine, A or G, for another purine.

pyrimidine_to_purine_transversion [SO_1000018]

Change of a pyrimidine nucleotide, C or T, into a purine nucleotide, A or G.

pyrimidine_transition [SO_1000010]

A substitution of a pyrimidine, C or T, for another pyrimidine.
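The four substitution terms above partition single-base changes by purine/pyrimidine class. A minimal Python sketch follows, assuming single A/C/G/T alleles in either case (for example a REF/ALT pair taken from a variant file); the function name is hypothetical, and it returns the SO term names used in this section.

```python
PURINES = frozenset("AG")
PYRIMIDINES = frozenset("CT")
NUCLEOTIDES = PURINES | PYRIMIDINES


def classify_substitution(ref: str, alt: str) -> str:
    """Name the SO substitution class for a single-base change ref -> alt."""
    ref, alt = ref.upper(), alt.upper()
    if ref not in NUCLEOTIDES or alt not in NUCLEOTIDES:
        raise ValueError(f"expected single bases A/C/G/T, got {ref!r} -> {alt!r}")
    if ref == alt:
        raise ValueError("reference and alternate bases are identical")
    if ref in PURINES and alt in PURINES:
        return "purine_transition"
    if ref in PYRIMIDINES and alt in PYRIMIDINES:
        return "pyrimidine_transition"
    # Any remaining change crosses classes, i.e. it is a transversion.
    if ref in PURINES:
        return "purine_to_pyrimidine_transversion"
    return "pyrimidine_to_purine_transversion"
```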

pyrosequenced_read [SO_0001424]

A read produced by pyrosequencing technology. An example is a read produced by Roche 454 technology.

pyrrolysine [SO_0001456]

A relatively rare amino acid encoded by the codon UAG in some contexts, whereas UAG is a termination codon in other contexts. A place holder for a cross product with chebi.

pyrrolysine_loss [SO_0002010]

A sequence variant whereby at least one base of a codon encoding pyrrolysine is changed, resulting in a different encoded amino acid. Request from Uma Devi Paila, UVA. Variants in the sites of rare amino acids e.g. Selenocysteine. These are important impact terms since a loss of such rare amino acids may lead to a loss of function.

pyrrolysine_tRNA_primary_transcript [SO_0001178]

A primary transcript encoding pyrrolysyl tRNA (SO:0000766).

pyrrolysyl_tRNA [SO_0000766]

A tRNA sequence that has a pyrrolysine anticodon, and a 3’ pyrrolysine binding region.

QTL [SO_0000771]

A quantitative trait locus (QTL) is a polymorphic locus which contains alleles that differentially affect the expression of a continuously distributed phenotypic trait. Usually it is a marker described by statistical association to quantitative variation in the particular phenotypic trait that is thought to be controlled by the cumulative action of alleles at multiple loci. Added in response to a request by Simon Twigger, November 14th 2005.

quality_value [SO_0001686]

An experimental feature attribute that defines the quality of the feature in a quantitative way, such as a phred quality score.
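Phred quality scores relate to the estimated probability P that a base call is wrong through Q = -10 log10(P). A minimal sketch of converting in both directions, plus decoding a FASTQ quality string, follows. It assumes the Phred+33 (Sanger-style) offset, and the helper names are hypothetical.

```python
import math
from typing import List


def phred_to_error_prob(q: float) -> float:
    """Phred score Q -> probability that the base call is wrong: P = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p: float) -> float:
    """Inverse mapping: Q = -10 * log10(P)."""
    if not 0 < p <= 1:
        raise ValueError("error probability must be in (0, 1]")
    return -10 * math.log10(p)


def decode_phred33(quality: str) -> List[int]:
    """Decode a FASTQ quality string that uses the Phred+33 offset."""
    return [ord(ch) - 33 for ch in quality]
```

A score of 30, for instance, corresponds to an estimated error probability of 1 in 1000.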

quantitative_variant [SO_0001774]

A variant within a gene that contributes to a quantitative trait such as height or weight.

queuosine [SO_0001317]

Queuosine is a modified 7-deazaguanosine.

R_five_prime_LTR_region [SO_0000427]

The R segment of the five-prime long terminal repeat.

R_GNA [SO_0001194]

An attribute describing a GNA sequence in the (R)-GNA enantiomer. Do not use this term for feature annotation. Use R_GNA_oligo (SO:0001195) instead.

R_GNA_oligo [SO_0001195]

An oligo composed of (R)-GNA residues.

R_LTR_region [SO_0000423]

The R segment of the long terminal repeats.

R_three_prime_LTR_region [SO_0000430]

The R segment of the three-prime long terminal repeat.

random_sequence [SO_0000449]

A sequence of nucleotides or amino acids which, by design, has a “random” order of components, given a predetermined input frequency of these components.

RAPD [SO_0001481]

RAPD (random amplified polymorphic DNA) is a ‘PCR product’ where a sequence variant is identified through the use of PCR with random primers.

rare_amino_acid_variant [SO_0002008]

A sequence variant whereby at least one base of a codon encoding a rare amino acid is changed, resulting in a different encoded amino acid. Request from Uma Devi Paila, UVA. Variants in the sites of rare amino acids e.g. Selenocysteine. These are important impact terms since a loss of such rare amino acids may lead to a loss of function.

rare_variant [SO_0001765]

When a variant from the genomic sequence is rarely found in the general population. The threshold for ‘rare’ varies between studies.

rasiRNA [SO_0000454]

A 17-28 nt small interfering RNA derived from transcripts of repetitive elements.

rate_of_transcription_variant [SO_0001550]

A sequence variant that changes the rate of transcription with respect to a reference sequence.

rDNA_intergenic_spacer_element [SO_0001860]

A DNA motif that contains a core consensus sequence AGGTAAGGGTAATGCAC, is found in the intergenic regions of rDNA repeats, and is bound by an RNA polymerase I transcription termination factor (e.g. S. pombe Reb1). The S. pombe telomeric repeat consensus is TTAC(0-1)A(0-1)G(1-8). Page 208 of ISBN:978-0199638901

rDNA_replication_fork_barrier [SO_0001914]

A DNA motif that is found in eukaryotic rDNA repeats, and is a site of replication fork pausing. Requested by Midori - June 2012.

read [SO_0000150]

A sequence obtained from a single sequencing experiment. Typically a read is produced when a base calling program interprets information from a chromatogram trace file produced from a sequencing machine.

read_pair [SO_0000007]

One of a pair of sequencing reads in which the two members of the pair are related by originating at either end of a clone insert.

reading_frame [SO_0000717]

A nucleic acid sequence that when read as sequential triplets, has the potential of encoding a sequential string of amino acids. It need not contain the start or stop codon. This term was added after a request by SGD. August 2004. Modified after SO meeting in Cambridge to not include start or stop.
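Since a reading frame need not begin with a start codon or end with a stop codon, splitting a sequence into triplets at each of the three offsets is enough to enumerate the candidate frames. A minimal sketch follows; the function names are hypothetical, and the +1/+2/+3 and -1/-2/-3 frame labels follow one common convention that is an assumption here.

```python
from typing import Dict, List

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def frame_codons(seq: str, frame: int) -> List[str]:
    """Split seq into complete triplets starting at offset 0, 1 or 2.

    No start or stop codon is required; trailing bases that do not
    fill a whole triplet are dropped.
    """
    if frame not in (0, 1, 2):
        raise ValueError("frame must be 0, 1 or 2")
    seq = seq.upper()
    return [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]


def six_frames(seq: str) -> Dict[str, List[str]]:
    """Triplets for the three forward frames and the three reverse-complement frames."""
    rc = seq.upper().translate(_COMPLEMENT)[::-1]
    frames = {f"+{f + 1}": frame_codons(seq, f) for f in range(3)}
    frames.update({f"-{f + 1}": frame_codons(rc, f) for f in range(3)})
    return frames
```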

reagent [SO_0000695]

A sequence used in an experiment. Requested by Lynn Crosby, Jan 2006.

rearranged_at_DNA_level [SO_0000904]

An attribute to describe the sequence of a feature, where the DNA is rearranged.

rearrangement_region [SO_0001872]

A region of a chromosome where the chromosome has undergone a large structural rearrangement that altered the genome organization. There is no longer synteny to the reference genome. NCBI definition: An orphan rearrangement between chromosomal locations observed in isolation.

reciprocal [SO_0001521]

When translocation occurs between nonhomologous chromosomes and involves an equal exchange of genetic material.

reciprocal_chromosomal_translocation [SO_1000048]

A chromosomal translocation with two breaks; two chromosome segments have simply been exchanged.

recoded [SO_0000881]

An attribute describing an mRNA sequence that has been reprogrammed at translation, causing localized alterations.

recoded_by_translational_bypass [SO_0000886]

Recoded mRNA where a block of nucleotides is not translated.

recoded_codon [SO_0000145]

A codon that has been redefined at translation. The redefinition may be as a result of translational bypass, translational frameshifting or stop codon readthrough.

recoded_mRNA [SO_1001261]

The sequence of a mature mRNA transcript, modified before translation or during translation, usually by special cis-acting signals.

recoding_pseudoknot [SO_0000545]

The pseudoknots involved in recoding are unique in that, as they play their role as a structure, they are immediately unfolded and their now linear sequence serves as a template for decoding.

recoding_stimulatory_region [SO_1001268]

A site in an mRNA sequence that stimulates the recoding of a region in the same mRNA.

recombination_enhancer [SO_0002059]

A regulatory_region that promotes or induces the process of recombination.

recombination_feature [SO_0000298]

A feature where there has been exchange of genetic material in the event of mitosis or meiosis.

recombination_feature_of_rearranged_gene [SO_0000300]

A location where a gene is rearranged due to recombination during mitosis or meiosis.

recombination_hotspot [SO_0000339]

A region in a genome which promotes recombination.

recombination_regulatory_region [SO_0001681]

A regulatory region that is involved in the control of the process of recombination.

recombination_signal_sequence [SO_0001532]

A region recognized by a recombinase.

recombinationally_inverted_gene [SO_0000373]

A recombinationally rearranged gene by inversion.

recombinationally_rearranged [SO_0000940]

A gene that is recombinationally rearranged.

recombinationally_rearranged_gene [SO_0000456]

A gene that is recombinationally rearranged.

recombinationally_rearranged_vertebrate_immune_system_gene [SO_0000941]

A recombinationally rearranged gene of the vertebrate immune system.

recursive_splice_site [SO_0000998]

A recursive splice site is a splice site which subdivides a large intron. Recursive splicing is a mechanism that splices large introns by subdividing the intron at non-exonic elements and alternate exons.

reference genome sequence [SO_0001505]

A collection of sequences (often chromosomes) taken as the standard for a given organism and genome assembly.

region [SO_0000001]

A sequence_feature with an extent greater than zero. A nucleotide region is composed of bases and a polypeptide region is composed of amino acids.

regional_centromere [SO_0001795]

A regional centromere is a large modular centromere found in fission yeast and higher eukaryotes. It consists of a central core region flanked by inverted inner and outer repeat regions.

regional_centromere_central_core [SO_0001796]

A conserved region within the central region of a modular centromere, where the kinetochore is formed.

regional_centromere_inner_repeat_region [SO_0001798]

The inner inverted repeat region of a modular centromere and part of the central core surrounding a non-conserved central region. This region is adjacent to the central core, on each chromosome arm.

regional_centromere_outer_repeat_region [SO_0001799]

The heterochromatic outer repeat region of a modular centromere. These repeats exist in tandem arrays on both chromosome arms.

regional_centromere_outer_repeat_transcript [SO_0001905]

A transcript that is transcribed from the outer repeat region of a regional centromere.

regulated [SO_0000119]

An attribute to describe a sequence that is regulated.

regulatory_promoter_element [SO_0001678]

A promoter element that is not part of the core promoter, but provides the promoter with a specific regulatory region.

regulatory_region_ablation [SO_0001894]

A feature ablation whereby the deleted region includes a regulatory region. Created in conjunction with the EBI.

regulatory_region_amplification [SO_0001891]

A feature amplification of a region containing a regulatory region. Created in conjunction with the EBI.

regulatory_region_fusion [SO_0001887]

A feature fusion where the deletion brings together regulatory regions. Created in conjunction with the EBI.

regulatory_region_translocation [SO_0001884]

A feature translocation where the region contains a regulatory region. Created in conjunction with the EBI.

regulatory_region_variant [SO_0001566]

A sequence variant located within a regulatory region. EBI term: Regulatory region variations - In regulatory region annotated by Ensembl.

remark [SO_0000700]

A comment about the sequence.

repeat_component [SO_0000840]

A region of a repeated sequence. A term manufactured to group the parts of repeats, to give them an is_a path back to the root.

repeat_fragment [SO_0001050]

A portion of a repeat, interrupted by the insertion of another element. Requested by Chris Smith, and others at Flybase to help annotate nested repeats.

repeat_region [SO_0000657]

A region of sequence containing one or more repeat units.

repeat_unit [SO_0000726]

The simplest repeated component of a repeat region. A single repeat. Added to comply with the feature table.

replication_regulatory_region [SO_0001682]

A regulatory region that is involved in the control of the process of nucleotide replication.

replicon [SO_0001235]

A region containing at least one unique origin of replication and a unique termination site.

rescue [SO_0000814]

An attribute describing a region’s ability, when introduced to a mutant organism, to re-establish (rescue) a phenotype.

rescue_gene [SO_0000816]

A gene that rescues.

rescue_mini_gene [SO_0000795]

A mini_gene that rescues.

rescue_region [SO_0000411]

A region that rescues.

resolution_site [SO_0000947]

A region specifically recognized by a recombinase, which separates a physically contiguous circle of DNA into two physically separate circles.

restriction_enzyme_assembly_scar [SO_0001953]

A region of DNA sequence formed from the ligation of two sticky ends where the palindrome is broken and no longer comprises the recognition site and thus cannot be re-cut by the restriction enzymes used to create the sticky ends.

restriction_enzyme_binding_site [SO_0000061]

A binding site that, in the nucleotide molecule, interacts selectively and non-covalently with polypeptide residues of a restriction enzyme. A region of a molecule that binds to a restriction enzyme.

restriction_enzyme_cleavage_junction [SO_0001688]

The boundary at which a restriction enzyme breaks the nucleotide sequence.

restriction_enzyme_five_prime_single_strand_overhang [SO_0001932]

A terminal region of DNA sequence where the end of the region is not blunt ended and the exposed single strand terminates at the 5’ end.

restriction_enzyme_recognition_site [SO_0001687]

The nucleotide region (usually a palindrome) that is recognized by a restriction enzyme. This may or may not be equal to the restriction enzyme binding site.
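A recognition site is palindromic in the molecular-biology sense when it equals its own reverse complement. A short sketch of that check and of locating sites in a sequence follows; the helper names are hypothetical, and the input is assumed to contain only A/C/G/T in either case.

```python
from typing import List

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(_COMPLEMENT)[::-1]


def is_palindromic_site(site: str) -> bool:
    """True if the site reads the same 5'->3' on both strands."""
    return site.upper() == reverse_complement(site)


def find_sites(seq: str, site: str) -> List[int]:
    """0-based start of every occurrence of site on the given strand, overlaps included."""
    seq, site = seq.upper(), site.upper()
    positions = []
    start = seq.find(site)
    while start != -1:
        positions.append(start)
        start = seq.find(site, start + 1)
    return positions
```

For example, EcoRI's site GAATTC is palindromic, so scanning one strand with `find_sites` already reports every EcoRI site; a non-palindromic site would also need a scan of the reverse complement.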

restriction_enzyme_region [SO_0001954]

A region related to restriction enzyme function. Not a great term for annotation, but used to classify the various regions related to restriction enzymes.

restriction_enzyme_single_strand_overhang [SO_0001695]

A terminal region of DNA sequence where the end of the region is not blunt ended.

restriction_enzyme_three_prime_single_strand_overhang [SO_0001933]

A terminal region of DNA sequence where the end of the region is not blunt ended and the exposed single strand terminates at the 3’ end.
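The 5' and 3' overhang terms differ only in which strand's cut lies further 5' when both cuts are expressed in top-strand coordinates. The sketch below classifies the end produced by a cut inside the recognition site. The position convention and function name are assumptions for illustration, and enzymes that cut outside their site are deliberately out of scope.

```python
from typing import Tuple


def classify_overhang(site: str, top_cut: int, bottom_cut: int) -> Tuple[str, str]:
    """Describe the end produced when an enzyme cuts inside its recognition site.

    top_cut and bottom_cut are positions in top-strand coordinates: a value k
    means that strand is cleaved between site[k - 1] and site[k].
    Returns (kind, overhang written as top-strand sequence).
    """
    if not (0 <= top_cut <= len(site) and 0 <= bottom_cut <= len(site)):
        raise ValueError("this sketch only handles cuts inside the site")
    if top_cut == bottom_cut:
        return ("blunt", "")
    lo, hi = sorted((top_cut, bottom_cut))
    overhang = site[lo:hi].upper()
    if top_cut < bottom_cut:
        # Top-strand cut lies 5' of the bottom-strand cut: the exposed
        # single strand ends in a 5' terminus.
        return ("restriction_enzyme_five_prime_single_strand_overhang", overhang)
    # Bottom-strand cut lies 5' of the top-strand cut: the exposed
    # single strand ends in a 3' terminus.
    return ("restriction_enzyme_three_prime_single_strand_overhang", overhang)
```

With EcoRI (G^AATTC, top cut at 1, bottom cut at 5) it reports a 5' overhang of AATT; with PstI (CTGCA^G, top cut at 5, bottom cut at 1) it reports a 3' overhang of TGCA.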

retinoic_acid_responsive_element [SO_0001653]

A transcription factor binding site of variable direct repeats of the sequence PuGGTCA spaced by five nucleotides (DR5) found in the promoters of retinoic acid-responsive genes, to which retinoic acid receptors bind.
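Read as a pattern, the DR5 element is a PuGGTCA half-site, five arbitrary nucleotides, and a second PuGGTCA half-site in the same orientation (Pu = purine, A or G). A minimal sketch of a one-strand scan with Python's `re` module follows. The helper name is hypothetical, the lookahead lets overlapping candidates be reported, and real response elements often deviate from this exact consensus, so hits are only candidates.

```python
import re
from typing import List

# [AG]GGTCA half-site, 5-nt spacer, second [AG]GGTCA half-site (direct repeat).
DR5_PATTERN = re.compile(r"(?=([AG]GGTCA[ACGT]{5}[AG]GGTCA))")


def find_dr5(seq: str) -> List[int]:
    """0-based start positions of DR5 consensus matches on the given strand."""
    return [m.start() for m in DR5_PATTERN.finditer(seq.upper())]
```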

retrogene [SO_0001219]

A gene that has been produced as the product of a reverse transcriptase mediated event.

retron [SO_1001275]

Sequence coding for a short, single-stranded, DNA sequence via a retrotransposed RNA intermediate; characteristic of some microbial genomes.

reverse [SO_0001031]

Reverse is an attribute of the feature, where the feature is in the 3’ to 5’ direction. This attribute could also be applied to a primer.

reverse_Hoogsteen_base_pair [SO_0000501]

A type of non-canonical base-pairing.

reverse_primer [SO_0000132]

A single stranded oligo used for polymerase chain reaction. This term is mapped to MGED. Do not obsolete without consulting MGED ontology.

RH_map [SO_0001252]

A radiation hybrid map is a physical map.

rho_dependent_bacterial_terminator [SO_0000981]

A transcription terminator that is dependent upon Rho.

rho_independent_bacterial_terminator [SO_0000982]

A transcription terminator that is not dependent upon Rho. Rather, the mRNA contains a sequence that allows it to base-pair with itself and make a stem-loop structure.

ribonuclease_site [SO_0001977]

A region of a transcript encoding the cleavage site for a ribonuclease enzyme.

ribosome_entry_site [SO_0000139]

Region in mRNA where ribosome assembles.

riboswitch [SO_0000035]

A riboswitch is a part of an mRNA that can act as a direct sensor of small molecules to control its own expression. A riboswitch is a cis element in the 5’ end of an mRNA, that acts as a direct sensor of metabolites.

ribothymidine [SO_0001232]

A modified RNA base in which thymine is bound to the ribose ring. The free molecule is CHEBI:30832.

ribozymic [SO_0001186]

An attribute describing the sequence of a transcript that has catalytic activity even without an associated ribonucleoprotein. Do not use this for feature annotation. Use ribozyme (SO:0000374) instead.

right_handed_peptide_helix [SO_0001116]

A right handed helix is a region of peptide where the coiled conformation turns in a clockwise, right handed screw.

ring_chromosome [SO_1000045]

A ring chromosome is a chromosome whose arms have fused together to form a ring, often with the loss of the ends of the chromosome.

RNA [SO_0000356]

An attribute describing a sequence consisting of nucleobases bound to a repeating unit made of a D-ribose ring connected to a phosphate backbone. This term is mapped to MGED. Do not obsolete without consulting MGED ontology.

RNA_6S [SO_0000376]

A small (184-nt in E. coli) RNA that forms a hairpin type structure. 6S RNA associates with RNA polymerase in a highly specific manner. 6S RNA represses expression from a sigma70-dependent promoter during stationary phase.

RNA_aptamer [SO_0000033]

RNA molecules that have been selected from random pools based on their ability to bind other molecules.

RNA_chromosome [SO_0000961]

Structural unit composed of a self-replicating RNA molecule.

RNA_hook_turn [SO_0000027]

[RNA_hook_turn; RNA hook turn; RNA_junction_loop; hook-turn motif]

RNA_internal_loop [SO_0000020]

A region of double-stranded RNA where the bases do not conform to Watson-Crick base pairing. The loop is closed on both sides by canonical base pairing. If the interruption to base pairing occurs on one strand only, it is known as a bulge.

RNA_junction_loop [SO_0000026]

[RNA junction loop; RNA_junction_loop; RNA_motif]

RNA_motif [SO_0000715]

A motif that is active in RNA sequence.

RNA_polymerase_II_TATA_box [SO_0001661]

A TATA box core promoter of a gene transcribed by RNA polymerase II.

RNA_polymerase_III_TATA_box [SO_0001662]

A TATA box core promoter of a gene transcribed by RNA polymerase III.

RNA_sequence_secondary_structure [SO_0000122]

A folded RNA sequence.

RNA_stability_element [SO_0001979]

A motif that affects the stability of RNA.

RNAi_reagent [SO_0000337]

A double stranded RNA duplex, at least 20bp long, used experimentally to inhibit gene function by RNA interference.

RNApol_I_promoter [SO_0000169]

A DNA sequence in eukaryotic DNA to which RNA polymerase I binds, to begin transcription. The parent term RNA_polymerase_promoter (SO:0001203) was obsoleted in Aug 2020, so this term has been moved to eukaryotic_promoter (SO:0002221).

RNApol_II_core_promoter [SO_0001669]

The minimal portion of the promoter required to properly initiate transcription in RNA polymerase II transcribed genes.

RNApol_II_promoter [SO_0000170]

A DNA sequence in eukaryotic DNA to which RNA polymerase II binds, to begin transcription. The parent term RNA_polymerase_promoter (SO:0001203) was obsoleted in Aug 2020, so this term has been moved to eukaryotic_promoter (SO:0002221).

RNApol_III_promoter [SO_0000171]

A DNA sequence in eukaryotic DNA to which RNA polymerase III binds, to begin transcription. The parent term RNA_polymerase_promoter (SO:0001203) was obsoleted in Aug 2020, so this term has been moved to eukaryotic_promoter (SO:0002221).

RNApol_III_promoter_type_1 [SO_0000617]

This type of promoter recruits RNA pol III. This promoter is intragenic and includes an A box, an intermediate element, and a C box. It is well conserved in 5S rRNA promoters across species.

RNApol_III_promoter_type_2 [SO_0000618]

This type of promoter recruits RNA pol III to transcribe genes mainly for tRNA. This promoter is intragenic and includes an A box and a B box.

RNApol_III_promoter_type_3 [SO_0000621]

This type of promoter recruits RNA pol III to transcribe predominantly noncoding RNAs. This promoter contains a proximal sequence element (PSE) and a TATA box upstream of the gene that it regulates. Transcription can also be activated by a distal sequence element (DSE), which is located further upstream.

RNase_MRP_RNA [SO_0000385]

The RNA molecule essential for the catalytic activity of RNase MRP, an enzymatically active ribonucleoprotein with two distinct roles in eukaryotes. In mitochondria it plays a direct role in the initiation of mitochondrial DNA replication. In the nucleus it is involved in precursor rRNA processing, where it cleaves the internal transcribed spacer 1 between 18S and 5.8S rRNAs.

RNase_MRP_RNA_gene [SO_0001640]

A gene that encodes an RNase_MRP_RNA.

RNase_P_RNA [SO_0000386]

The RNA component of Ribonuclease P (RNase P), a ubiquitous endoribonuclease, found in archaea, bacteria and eukarya as well as chloroplasts and mitochondria. Its best characterized activity is the generation of mature 5 prime ends of tRNAs by cleaving the 5 prime leader elements of precursor-tRNAs. Cellular RNase Ps are ribonucleoproteins. RNA from bacterial RNase Ps retains its catalytic activity in the absence of the protein subunit, i.e. it is a ribozyme. Isolated eukaryotic and archaeal RNase P RNA has not been shown to retain its catalytic function, but is still essential for the catalytic activity of the holoenzyme. Although the archaeal and eukaryotic holoenzymes have a much greater protein content than the bacterial ones, the RNA cores from all the three lineages are homologous. Helices corresponding to P1, P2, P3, P4, and P10/11 are common to all cellular RNase P RNAs. Yet, there is considerable sequence variation, particularly among the eukaryotic RNAs.

RNase_P_RNA_gene [SO_0001639]

A gene that encodes an RNase P RNA.

RprA_RNA [SO_0000387]

Translational regulation of the stationary phase sigma factor RpoS is mediated by the formation of a double-stranded RNA stem-loop structure in the upstream region of the rpoS messenger RNA, occluding the translation initiation site. Clones carrying rprA (RpoS regulator RNA) increased the translation of RpoS. The rprA gene encodes a 106 nucleotide regulatory RNA. As with DsrA Rfam:RF00014, RprA is predicted to form three stem-loops. Thus, at least two small RNAs, DsrA and RprA, participate in the positive regulation of RpoS translation. Unlike DsrA, RprA does not have an extensive region of complementarity to the RpoS leader, leaving its mechanism of action unclear. RprA is non-essential.

RR_tract [SO_0000435]

A polypurine tract within an LTR_retrotransposon.

RRE_RNA [SO_0000388]

The Rev response element (RRE) is encoded within the HIV-env gene. Rev is an essential regulatory protein of HIV that binds an internal loop of the RRE, encouraging further Rev-RRE binding. This RNP complex is critical for mRNA export and hence for expression of the HIV structural proteins.

rRNA [SO_0000252]

rRNA is an RNA component of a ribosome that can provide both structural scaffolding and catalytic activity. Definition updated 10 June 2021 as part of restructuring rRNA terms and reforming definitions to have similar structures. Request from EBI. See GitHub Issue #493.

rRNA_25S [SO_0001002]

Cytosolic 25S rRNA is an RNA component of the large subunit of cytosolic ribosomes in most eukaryotes. Renamed from rRNA_5S to cytosolic_5S_rRNA on 27 May 2021 with the restructuring of rRNA child terms. Updated definition to be consistent with format of other rRNA definitions. Requested by EBI. See GitHub Issue #493.

rRNA_cleavage_RNA [SO_0005843]

An ncRNA that is part of a ribonucleoprotein that cleaves the primary pre-rRNA transcript in the process of producing mature rRNA molecules.

rRNA_cleavage_snoRNA_primary_transcript [SO_0000582]

A primary transcript encoding an rRNA cleavage snoRNA.

rRNA_encoding [SO_0000573]

A region that can be transcribed into a ribosomal RNA (rRNA).

rRNA_gene [SO_0001637]

A gene that encodes ribosomal RNA.

rRNA_large_subunit_primary_transcript [SO_0000325]

A primary transcript encoding a large ribosomal subunit RNA.

rRNA_primary_transcript [SO_0000209]

A primary transcript encoding a ribosomal RNA.

rRNA_primary_transcript_region [SO_0000838]

A region of an rRNA primary transcript. Added to allow transcribed_spacer_region to have a path to the root.

rRNA_small_subunit_primary_transcript [SO_0000255]

A primary transcript encoding a small ribosomal subunit RNA.

RST [SO_0001467]

A tag produced from a single sequencing read from a RACE product; typically a few hundred base pairs long.

RST_match [SO_0001471]

A match against an RST sequence.

S_GNA [SO_0001196]

An attribute describing a GNA sequence in the (S)-GNA enantiomer. Do not use this term for feature annotation. Use S_GNA_oligo (SO:0001197) instead.

S_GNA_oligo [SO_0001197]

An oligo composed of (S)-GNA residues.

S_region [SO_0001836]

The switch region of immunoglobulin heavy chains; it is involved in the rearrangement of heavy chain DNA leading to the expression of different immunoglobulin classes from the same B-cell.

SAGE_tag [SO_0000326]

A short diagnostic sequence tag, used in serial analysis of gene expression (SAGE), that allows the quantitative and simultaneous analysis of a large number of transcripts.

Sap1_recognition_motif [SO_0001864]

A DNA motif to which the S. pombe Sap1 protein binds. The consensus sequence is 5’-TARGCAGNTNYAACGMG-3’; it is found at the mating type locus, where it is important for mating type switching, and at replication fork barriers in rDNA repeats.
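
The consensus above is written with IUPAC ambiguity codes (R = A/G, Y = C/T, M = A/C, N = any base). The Python sketch below translates the consensus into a regular expression and reports where it occurs. It assumes a plain DNA string and scans only the strand it is given. The helper names are illustrative and do not come from an existing library.

```python
import re

# Standard IUPAC nucleotide ambiguity codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}

SAP1_CONSENSUS = "TARGCAGNTNYAACGMG"


def consensus_to_regex(consensus: str) -> str:
    """Turn an IUPAC consensus string into a regular-expression pattern."""
    return "".join(IUPAC[base] for base in consensus.upper())


def find_sites(seq: str, consensus: str = SAP1_CONSENSUS) -> list[int]:
    """Return 0-based start positions of matches on the given strand only.

    The lookahead lets overlapping matches be reported as well.
    """
    pattern = re.compile("(?=" + consensus_to_regex(consensus) + ")")
    return [m.start() for m in pattern.finditer(seq.upper())]


print(find_sites("CCTAAGCAGATACAACGCGTT"))  # [2]
```

Finding sites on the opposite strand would also require scanning the reverse complement. That step is left out here for brevity.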

sarcin_like_RNA_motif [SO_0000024]

A loop in ribosomal RNA containing the sites of attack for ricin and sarcin.

scaRNA [SO_0002095]

A ncRNA, specific to the Cajal body, that has been demonstrated to function as a guide RNA in the site-specific synthesis of 2’-O-ribose-methylated nucleotides and pseudouridines in the RNA polymerase II-transcribed U1, U2, U4 and U5 spliceosomal small nuclear RNAs (snRNAs). Moved from is_a ncRNA (SO:0000655) to is_a snoRNA (SO:0000275) as per request from FlyBase by Dave Sant 24 April 2021. See GitHub Issue #509.

schellmann_loop [SO_0001123]

A motif of six or seven consecutive residues that contains two H-bonds.

schellmann_loop_seven [SO_0001124]

Wild type: A motif of seven consecutive residues that contains two H-bonds in which: the main-chain CO of residue(i) is H-bonded to the main-chain NH of residue(i+6), and the main-chain CO of residue(i+1) is H-bonded to the main-chain NH of residue(i+5).

schellmann_loop_six [SO_0001125]

Common Type: A motif of six consecutive residues that contains two H-bonds in which: the main-chain CO of residue(i) is H-bonded to the main-chain NH of residue(i+5), and the main-chain CO of residue(i+1) is H-bonded to the main-chain NH of residue(i+4).
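
The two Schellmann loop variants differ only in their H-bond offsets, so they can be checked directly against a set of backbone H-bonds. The Python sketch below assumes the H-bonds have already been computed, for example from a structure file, and that each one is stored as a pair of residue indices. It only tests that the two required H-bonds are present and does not check for any additional ones. The representation and the function name are hypothetical.

```python
from typing import Optional, Set, Tuple

# Each H-bond is stored as (index of the residue whose main-chain CO is the
# acceptor, index of the residue whose main-chain NH is the donor).
# This representation is an assumption made for illustration.
HBond = Tuple[int, int]


def schellmann_loop_at(hbonds: Set[HBond], i: int) -> Optional[str]:
    """Return the Schellmann loop type starting at residue i, or None.

    seven: CO(i)->NH(i+6) and CO(i+1)->NH(i+5)
    six:   CO(i)->NH(i+5) and CO(i+1)->NH(i+4)
    """
    if (i, i + 6) in hbonds and (i + 1, i + 5) in hbonds:
        return "schellmann_loop_seven"
    if (i, i + 5) in hbonds and (i + 1, i + 4) in hbonds:
        return "schellmann_loop_six"
    return None


print(schellmann_loop_at({(10, 15), (11, 14)}, 10))  # schellmann_loop_six
```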

score [SO_0001685]

The score of an experimentally derived feature, such as a p-value.

scRNA [SO_0000013]

A small non-coding RNA sequence, present in the cytoplasm.

scRNA_encoding [SO_0000575]

A region that can be transcribed into a small cytoplasmic RNA (scRNA).

scRNA_gene [SO_0001266]

A small noncoding RNA that is generally found only in the cytoplasm.

scRNA_primary_transcript [SO_0000012]

The primary transcript of any one of several small cytoplasmic RNA molecules present in the cytoplasm and sometimes the nucleus of a eukaryote.

SECIS_element [SO_1001274]

The incorporation of selenocysteine into a protein sequence is directed by an in-frame UGA codon (usually a stop codon) within the coding region of the mRNA. Selenoprotein mRNAs contain a conserved secondary structure in the 3’ UTR that is required for the distinction of UGA stop from UGA selenocysteine. The selenocysteine insertion sequence (SECIS) is around 60 nt in length and adopts a hairpin structure which is sufficiently well-defined and conserved to act as a computational screen for selenoprotein genes.

selenocysteine [SO_0001455]

A relatively rare amino acid encoded by the codon UGA in some contexts, whereas UGA is a termination codon in other contexts. A place holder for a cross product with chebi.

selenocysteine_loss [SO_0002009]

A sequence variant whereby at least one base of a codon encoding selenocysteine is changed, resulting in a different encoded amino acid. Request from Uma Devi Paila, UVA. Variants in the sites of rare amino acids e.g. Selenocysteine. These are important impact terms since a loss of such rare amino acids may lead to a loss of function.

selenocysteine_tRNA_primary_transcript [SO_0005856]

A primary transcript encoding selenocysteinyl tRNA (SO:0005857).

selenocysteinyl_tRNA [SO_0005857]

A tRNA sequence that has a selenocysteine anticodon, and a 3’ selenocysteine binding region.

self_cleaving_ribozyme [SO_0002231]

An RNA that catalyzes its own cleavage. Added as per request by John T. Sexton GitHub issue #470 (https://github.com/The-Sequence-Ontology/SO-Ontologies/issues/470)

sense_intronic_ncRNA [SO_0002131]

A long non-coding transcript found within an intron of a coding or non-coding gene, with no overlap of exonic sequence.

sense_overlap_ncRNA [SO_0002132]

A long non-coding transcript that contains a protein coding gene within its intronic sequence on the same strand, with no overlap of exonic sequence.

sequence_alteration [SO_0001059]

A sequence_alteration is a sequence_feature whose extent is the deviation from another sequence. 1. A ‘sequence alteration’ is an allele whose sequence deviates in its entirety from that of other features found at the same genomic location (i.e. it deviates along its entire extent). In this sense, ‘sequence alterations’ represent the minimal extent an allele can take (i.e. that which is variable with some other feature along its entire sequence). An example is a SNP or an insertion. Alleles whose extent goes beyond the specific sequence that is known to be variable are not sequence alterations. These are alleles that represent alternate versions of some larger, named feature. The classic example here is a ‘gene allele’, which spans the extent of an entire gene and contains one or more sequence alterations (regions known to vary) as parts. 2. Sequence alterations are not necessarily ‘variant’ in the sense defined in GENO (i.e. being ‘variant with’ some reference sequence). In any comparison of alleles at a particular location, the choice of a ‘reference’ is context-dependent, as comparisons in other contexts might consider a different allele to be the reference. So while sequence alterations are usually considered ‘variant’ in the context in which they are considered, this variant status may not hold at all times. For this reason, the ‘sequence alteration’ class is not made an rdfs:subClassOf ‘variant allele’. For a particular instance of a sequence alteration, however, we may in some cases be able to rdf:type it as both a ‘variant allele’ and a ‘sequence alteration’, in situations where we can be confident that the feature will never be considered a reference. Examples include experimentally generated mutations in model organism genes that are created expressly to vary from an established reference. 3. Note that we consider novel features gained in a genome to be sequence alterations, including aneusomic chromosomes gained through a non-disjunction event during replication, or extrachromosomal replicons that become part of the heritable genome of a cell or organism. Merged with partially characterized change in nucleotide sequence.

sequence_assembly [SO_0000353]

A sequence of nucleotides that has been algorithmically derived from an alignment of two or more different sequences.

sequence_attribute [SO_0000400]

An attribute that describes a quality of a sequence.

sequence_collection [SO_0001260]

A collection of discontinuous sequences.

sequence_comparison [SO_0002072]

A position or feature where two sequences have been compared.

sequence_conflict [SO_0001085]

Different sources report differing sequences. Discrete.

sequence_difference [SO_0000413]

A region where the sequence differs from that of a specified sequence.

sequence_feature [SO_0000110]

Any extent of continuous biological sequence. A sequence feature is an extent of ‘located’ biological sequence, whose identity is determined by both its inherent sequence (ordering of monomeric units) and its position (start and end coordinates based on alignment with some reference). By contrast, ‘biological sequences’ are identified and distinguished only by their inherent sequence, and not by their position. Accordingly, the ‘ATG’ start codon in the coding DNA sequence of the human AKT gene is the same ‘sequence’ as the ‘ATG’ start codon in the human SHH gene, but these represent two distinct ‘sequence features’ by virtue of their different positions in the genome.

sequence_length_alteration [SO_0000248]

A kind of sequence alteration where the number of copies of a region varies across a population.

sequence_length_variant [SO_0002160]

A sequence variant that changes the length of one or more sequence features.

sequence_location [SO_0000735]

The location of a sequence.

sequence_rearrangement_feature [SO_0000669]

A feature where a segment of DNA has been rearranged from what it was in the parent cell.

sequence_secondary_structure [SO_0000002]

A folded sequence.

sequence_uncertainty [SO_0001086]

Describes the positions in a sequence where the authors are unsure about the sequence assignment.

sequence_variant [SO_0001060]

A sequence_variant is a non-exact copy of a sequence_feature or genome exhibiting one or more sequence_alterations.

sequencing_primer [SO_0000107]

A single stranded oligo used for polymerase chain reaction.

serine [SO_0001444]

A polar, hydrophilic amino acid encoded by the codons TCN (TCT, TCC, TCA, TCG), AGT and AGC. A place holder for a cross product with chebi.
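
The codon list above uses the IUPAC shorthand N for "any base". The Python sketch below expands such patterns into concrete codons. It uses only the small subset of IUPAC codes this example needs, and writing AGT/AGC as AGY is a convenience chosen here rather than taken from the definition.

```python
from itertools import product

# IUPAC nucleotide codes needed for this example (subset of the full table).
IUPAC_BASES = {"A": "A", "C": "C", "G": "G", "T": "T", "Y": "CT", "N": "ACGT"}


def expand_codon(pattern: str) -> set[str]:
    """Expand an IUPAC codon pattern such as 'TCN' into concrete codons."""
    return {"".join(bases) for bases in product(*(IUPAC_BASES[b] for b in pattern))}


# TCN covers TCT, TCC, TCA, TCG; AGY is shorthand for AGT and AGC.
SERINE_CODONS = expand_codon("TCN") | expand_codon("AGY")
print(sorted(SERINE_CODONS))  # ['AGC', 'AGT', 'TCA', 'TCC', 'TCG', 'TCT']
```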

serine_threonine_motif [SO_0001126]

A motif of five consecutive residues and two hydrogen bonds in which: residue(i) is Serine (S) or Threonine (T), the side-chain O of residue(i) is H-bonded to the main-chain NH of residue(i+2) or (i+3), and the main-chain CO group of residue(i) is H-bonded to the main-chain NH of residue(i+3) or (i+4).

serine_threonine_staple_motif [SO_0001127]

A motif of four or five consecutive residues and one H-bond in which: residue(i) is Serine (S) or Threonine (T), the side-chain OH of residue(i) is H-bonded to the main-chain CO of residue(i+3) or (i+4), Phi angles of residues(i+1), (i+2) and (i+3) are negative.

serine_threonine_turn [SO_0001141]

A motif of three consecutive residues and one H-bond in which: residue(i) is Serine (S) or Threonine (T), the side-chain O of residue(i) is H-bonded to the main-chain NH of residue(i+2).

serine_tRNA_primary_transcript [SO_0000226]

A primary transcript encoding seryl tRNA (SO:0000269).

seryl_tRNA [SO_0000269]

A tRNA sequence that has a serine anticodon, and a 3’ serine binding region.

seven_aminomethyl_seven_deazaguanosine [SO_0001322]

7-aminomethyl-7-deazaguanosine is a modified 7-deazaguanosine.

seven_cyano_seven_deazaguanosine [SO_0001321]

7-cyano-7-deazaguanosine is a modified 7-deazaguanosine.

seven_deazaguanosine [SO_0001316]

7-deazaguanosine is a modified guanosine.

seven_methylguanine [SO_0001231]

A modified RNA base in which guanine is methylated at the 7- position. The free molecule is CHEBI:2274.

seven_methylguanosine [SO_0001326]

7-methylguanosine is a modified guanosine base feature.

sgRNA [SO_0001998]

A small RNA oligo, typically about 20 bases, that guides the Cas nuclease to a target DNA sequence in the CRISPR/Cas mutagenesis method.

shadow_enhancer [SO_0001482]

An enhancer that drives the pattern of transcription and binds to the same TF as the primary enhancer, but is located in the intron of or on the far side of a neighboring gene.

short_tandem_repeat_variation [SO_0002096]

A variation that expands or contracts a tandem repeat with regard to a reference.
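
Whether a variant expands or contracts a tandem repeat can be decided by comparing copy numbers of the repeat unit in the reference and the variant sequence. The Python sketch below assumes the repeat begins at the same coordinate in both sequences, as it would with identical left flanks. It also assumes the repeat unit is known in advance. The function names are hypothetical.

```python
def repeat_copies(seq: str, start: int, unit: str) -> int:
    """Count consecutive, uninterrupted copies of `unit` beginning at `start`."""
    if not unit:
        raise ValueError("repeat unit must be non-empty")
    copies = 0
    pos = start
    while seq.startswith(unit, pos):
        copies += 1
        pos += len(unit)
    return copies


def tandem_repeat_change(ref: str, alt: str, start: int, unit: str) -> str:
    """Compare repeat copy number at the same start coordinate in ref and alt."""
    delta = repeat_copies(alt, start, unit) - repeat_copies(ref, start, unit)
    if delta > 0:
        return "expansion"
    if delta < 0:
        return "contraction"
    return "no change"


ref = "GGCAGCAGCAGTT"     # 3 x CAG starting at index 2
alt = "GGCAGCAGCAGCAGTT"  # 4 x CAG starting at index 2
print(tandem_repeat_change(ref, alt, 2, "CAG"))  # expansion
```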

SHP_box [SO_0002159]

A conserved Cdc48/p97 interaction motif with strict consensus sequence F[PI]GKG[TK][RK]LG[GT] and relaxed consensus sequence FXGKGX[RK]LG.
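
Both consensus sequences above map directly onto regular expressions, with X (any residue) written as `.`. The Python sketch below scans a protein string with the relaxed pattern and flags which hits also satisfy the strict one. This works because every strict match begins with a relaxed match. The function name is hypothetical, and overlapping hits are not reported.

```python
import re

# Strict: F[PI]GKG[TK][RK]LG[GT]; relaxed: FXGKGX[RK]LG, with X written as ".".
SHP_STRICT = re.compile(r"F[PI]GKG[TK][RK]LG[GT]")
SHP_RELAXED = re.compile(r"F.GKG.[RK]LG")


def find_shp_boxes(protein: str) -> list[tuple[int, str, bool]]:
    """Return (0-based start, relaxed match, also matches strict consensus)."""
    protein = protein.upper()
    return [
        (m.start(), m.group(), SHP_STRICT.match(protein, m.start()) is not None)
        for m in SHP_RELAXED.finditer(protein)
    ]


print(find_shp_boxes("MSTFPGKGTRLGGAA"))  # [(3, 'FPGKGTRLG', True)]
print(find_shp_boxes("AAFAGKGARLGS"))     # [(2, 'FAGKGARLG', False)]
```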

shRNA [SO_0002031]

A short hairpin RNA (shRNA) is an RNA transcript that makes a tight hairpin turn that can be used to silence target gene expression via RNA interference.

shRNA_primary_transcript [SO_0002038]

A primary transcript encoding an shRNA.

signal_anchor [SO_0001809]

A signal sequence that is not cleaved from the polypeptide. Anchors a Type II membrane protein to the membrane.

signal_peptide_region_of_CDS [SO_0002251]

A CDS region corresponding to a signal peptide of a polypeptide. Added as per request from GitHub Issue #484 (https://github.com/The-Sequence-Ontology/SO-Ontologies/issues/484)

signature [SO_0001978]

A region of sequence where developer information is encoded. Requested by Jackie Quinn for use in synthetic biology.

silenced [SO_0000893]

An attribute describing an epigenetic process where a gene is inactivated at transcriptional or translational level.

silenced_by_DNA_methylation [SO_0000895]

An attribute describing an epigenetic process where a gene is inactivated by DNA methylation, resulting in repression of transcription.

silenced_by_DNA_modification [SO_0000894]

An attribute describing an epigenetic process where a gene is inactivated by DNA modifications, resulting in repression of transcription.

silenced_by_histone_deacetylation [SO_0001223]

An attribute describing an epigenetic process where a gene is inactivated by histone deacetylation. Histone deacetylation is GO:0016573.

silenced_by_histone_methylation [SO_0001222]

An attribute describing an epigenetic process where a gene is inactivated by histone methylation. Histone methylation is GO:0016571.

silenced_by_histone_modification [SO_0001221]

An attribute describing an epigenetic process where a gene is inactivated by histone modification. Histone modification is GO:0016570.

silenced_by_RNA_interference [SO_0001220]

An attribute describing an epigenetic process where a gene is inactivated by RNA interference. RNA interference is GO:0016246.

silenced_gene [SO_0000127]

A gene that is silenced.

silencer [SO_0000625]

A regulatory region which, upon binding of transcription factors, suppresses the transcription of the gene or genes it controls.

silent_mating_type_cassette_array [SO_0001984]

A gene cassette array that corresponds to a silenced version of a mating type region.

single [SO_0000984]

When a nucleotide polymer has only one strand. Attributes added to describe the different kinds of replicon. SO workshop, September 2006.

single_strand_restriction_enzyme_cleavage_site [SO_0001694]

A restriction enzyme cleavage site whereby only one strand is cut.

single_stranded_cDNA [SO_0000757]

DNA synthesized from RNA by reverse transcriptase, single stranded.

single_stranded_DNA_chromosome [SO_0000956]

Structural unit composed of a self-replicating, single-stranded DNA molecule.

single_stranded_RNA_chromosome [SO_0000962]

Structural unit composed of a self-replicating, single-stranded RNA molecule.

site_specific_recombination_target_region [SO_0000342]

A region specifically recognised by a recombinase where recombination can occur during mitosis or meiosis.

SL1_acceptor_site [SO_0000708]

A trans_splicing_acceptor_site which appends the 22nt SL1 RNA leader sequence to the 5’ end of most mRNAs.

SL10_acceptor_site [SO_0001755]

A SL2_acceptor_site which appends the SL10 RNA leader sequence to the 5’ end of an mRNA. SL10 acceptor sites occur in genes in internal segments of polycistronic transcripts.

SL11_acceptor_site [SO_0001756]

A SL2_acceptor_site which appends the SL11 RNA leader sequence to the 5’ end of an mRNA. SL11 acceptor sites occur in genes in internal segments of polycistronic transcripts.

SL12_acceptor_site [SO_0001757]

A SL2_acceptor_site which appends the SL12 RNA leader sequence to the 5’ end of an mRNA. SL12 acceptor sites occur in genes in internal segments of polycistronic transcripts.

SL2_acceptor_site [SO_0000709]

A trans_splicing_acceptor_site which appends the 22nt SL2 RNA leader sequence to the 5’ end of mRNAs. SL2 acceptor sites occur in genes in internal segments of polycistronic transcripts.

SL3_acceptor_site [SO_0001748]

A SL2_acceptor_site which appends the SL3 RNA leader sequence to the 5’ end of an mRNA. SL3 acceptor sites occur in genes in internal segments of polycistronic transcripts.

SL4_acceptor_site [SO_0001749]

A SL2_acceptor_site which appends the SL4 RNA leader sequence to the 5’ end of an mRNA. SL4 acceptor sites occur in genes in internal segments of polycistronic transcripts.

SL5_acceptor_site [SO_0001750]

A SL2_acceptor_site which appends the SL5 RNA leader sequence to the 5’ end of an mRNA. SL5 acceptor sites occur in genes in internal segments of polycistronic transcripts.

SL6_acceptor_site [SO_0001751]

A SL2_acceptor_site which appends the SL6 RNA leader sequence to the 5’ end of an mRNA. SL6 acceptor sites occur in genes in internal segments of polycistronic transcripts.

SL7_acceptor_site [SO_0001752]

A SL2_acceptor_site which appends the SL7 RNA leader sequence to the 5’ end of an mRNA. SL7 acceptor sites occur in genes in internal segments of polycistronic transcripts.

SL8_acceptor_site [SO_0001753]

A SL2_acceptor_site which appends the SL8 RNA leader sequence to the 5’ end of an mRNA. SL8 acceptor sites occur in genes in internal segments of polycistronic transcripts.

SL9_acceptor_site [SO_0001754]

A SL2_acceptor_site which appends the SL9 RNA leader sequence to the 5’ end of an mRNA. SL9 acceptor sites occur in genes in internal segments of polycistronic transcripts.

small_regulatory_ncRNA [SO_0000370]

A non-coding RNA less than 200 nucleotides long, usually with a specific secondary structure, that acts to regulate gene expression. These include short ncRNAs such as piRNA, miRNA and siRNAs (among others).

smFISH_probe [SO_0001854]

An smFISH probe is a probe that binds RNA in a single-molecule fluorescence in situ hybridization (smFISH) experiment.

sncRNA [SO_0002247]

A non-coding RNA less than 200 nucleotides in length. Added as per request from GitHub Issue #485 (https://github.com/The-Sequence-Ontology/SO-Ontologies/issues/485)

sncRNA_gene [SO_0002342]

A ncRNA_gene (SO:0001263) that is less than 200 nucleotides in length. Added as a request from FlyBase to make the ncRNA_gene branch in SO mirror the ncRNA branch. See GitHub Issue #514

snoRNA_encoding [SO_0000578]

A region that can be transcribed into a small nucleolar RNA (snoRNA).

snoRNA_gene [SO_0001267]

A gene encoding a small noncoding RNA that participates in the processing or chemical modifications of many RNAs, including ribosomal RNAs and spliceosomal RNAs. Moved from ncRNA_gene to sncRNA_gene 27 April 2021 to be more consistent with the organization of the ncRNA branch of SO. Requested by FlyBase, moved by Dave Sant. See GitHub Issue #514.

snoRNA_primary_transcript [SO_0000232]

A primary transcript encoding one or more small nucleolar RNAs (SO:0000275). This definition was broadened 26 Jan 2021 to reflect that a single transcript can encode one or more snoRNAs. Brought to our attention by FlyBase. GitHub Issue #520 (https://github.com/The-Sequence-Ontology/SO-Ontologies/issues/520).

SNP [SO_0000694]

SNPs are single base pair positions in genomic DNA at which different sequence alternatives exist in normal individuals in some population(s), wherein the least frequent variant has an abundance of 1% or greater.
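
The frequency criterion in this definition, together with the broader SNV term below, can be expressed as a threshold on the least frequent allele. The Python sketch below assumes population allele counts are already available for a single-base site. It does not model the "normal individuals" condition. The function name and the output labels are illustrative.

```python
def classify_site(allele_counts: dict[str, int], threshold: float = 0.01) -> str:
    """Classify a single-base site from population allele counts.

    Any site with two or more observed alleles is treated as an SNV; it is
    additionally called a SNP when its least frequent allele reaches `threshold`.
    """
    total = sum(allele_counts.values())
    if total <= 0:
        raise ValueError("need at least one observed allele")
    observed = [count for count in allele_counts.values() if count > 0]
    if len(observed) < 2:
        return "monomorphic"
    minor_frequency = min(observed) / total
    return "SNP" if minor_frequency >= threshold else "SNV"


print(classify_site({"A": 990, "G": 10}))  # SNP (least frequent allele at 1%)
print(classify_site({"A": 995, "G": 5}))   # SNV (least frequent allele at 0.5%)
```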

snRNA_encoding [SO_0000623]

A region that can be transcribed into a small nuclear RNA (snRNA).

snRNA_gene [SO_0001268]

A gene that encodes a small nuclear RNA. Moved from ncRNA_gene to sncRNA_gene 27 April 2021 to be more consistent with the organization of the ncRNA branch of SO. Requested by FlyBase, moved by Dave Sant. See GitHub Issue #514.

snRNA_primary_transcript [SO_0000231]

A primary transcript encoding a small nuclear RNA (SO:0000274).

SNV [SO_0001483]

SNVs are single base pair positions in genomic DNA at which different sequence alternatives exist.

solo_LTR [SO_0001003]

A recombination product between the two LTRs of the same element. Requested by Hadi Quesneville January 2007.

somatic_variant [SO_0001777]

A variant that has arisen after splitting of the embryo, resulting in the variant being found in only some of the tissues or cells of the body.

sonicate_fragment [SO_0001253]

A DNA fragment generated by sonication. Sonication is a technique used to shear DNA into smaller fragments.

sORF [SO_0002028]

An open reading frame that encodes a peptide of less than 100 amino acids.
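
The 100-amino-acid cutoff can be applied during a simple ORF scan. The Python sketch below makes several simplifying assumptions: it scans only the forward strand, requires an ATG start, and uses the standard stop codons. It also does not report ATGs nested inside an ORF it has already found. The function name and these choices are illustrative, not a reference sORF caller.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_sorfs(seq: str, max_aa: int = 100) -> list[tuple[int, int, int]]:
    """Find ATG-initiated ORFs on the forward strand that encode < max_aa residues.

    Returns (start, end, peptide_length) tuples with 0-based, end-exclusive
    coordinates; `end` includes the stop codon.
    """
    seq = seq.upper()
    hits = []
    for frame in range(3):
        i = frame
        while i + 3 <= len(seq):
            if seq[i:i + 3] == "ATG":
                for j in range(i, len(seq) - 2, 3):
                    if seq[j:j + 3] in STOP_CODONS:
                        peptide_length = (j - i) // 3  # codons before the stop
                        if peptide_length < max_aa:
                            hits.append((i, j + 3, peptide_length))
                        i = j  # resume scanning after this stop codon
                        break
                else:
                    break  # no in-frame stop before the end of the sequence
            i += 3
    return sorted(hits)


print(find_sorfs("CCATGAAATTTTAGCC"))  # [(2, 14, 3)]
```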

SP6_RNA_Polymerase_Promoter [SO_0001205]

A region (DNA) to which the SP6 RNA polymerase binds, to begin transcription.

specific_recombination_site [SO_0000299]

A location where recombination occurs during mitosis or meiosis.

splice_acceptor_variant [SO_0001574]

A splice variant that changes the 2 base region at the 3’ end of an intron.

splice_donor_5th_base_variant [SO_0001787]

A sequence variant that causes a change at the 5th base pair after the start of the intron in the orientation of the transcript.

splice_donor_variant [SO_0001575]

A splice variant that changes the 2 base pair region at the 5’ end of an intron.

splice_enhancer [SO_0000344]

Region of a transcript that regulates splicing.

splice_junction [SO_0001421]

The boundary between an intron and an exon.

splice_region [SO_0001902]

A region surrounding a cis_splice site, either within 1-3 bases of the exon or 3-8 bases of the intron.

splice_region_variant [SO_0001630]

A sequence variant in which a change has occurred within the region of the splice site, either within 1-3 bases of the exon or 3-8 bases of the intron. EBI term: splice site - 1-3 bps into an exon or 3-8 bps into an intron.

splice_site [SO_0000162]

Consensus region of a primary transcript bordering a splice junction. A region that overlaps exactly 2 bases and is adjacent_to a splice_junction. With spliceosomal introns, the splice sites bind the spliceosomal machinery.

splice_site_variant [SO_0001629]

A sequence variant that changes the first two or last two bases of an intron, or the 5th base from the start of the intron in the orientation of the transcript. EBI term: essential splice site (in the first 2 or the last 2 base pairs of an intron). The 5th base is on the donor (5’) side of the intron. Updated to be in line with the Cancer Genome Project at the Sanger Institute.
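
The positional rules in the definitions of splice_donor_variant, splice_acceptor_variant, splice_donor_5th_base_variant and splice_region_variant can be combined into one lookup that returns the most specific term. The Python sketch below uses a hypothetical coordinate convention: a 1-based distance from the exon-intron junction, plus a flag for which side of the junction the base lies on. It is deliberately simplified. Real annotators such as snpEff may apply further rules and can report several terms for one variant.

```python
from typing import Optional


def classify_splice_variant(in_intron: bool, dist: int, end: str) -> Optional[str]:
    """Return the most specific splice-related SO term for a single-base change.

    in_intron: True if the changed base lies in the intron, False if in the exon.
    dist: 1-based distance from the exon-intron junction (1 = adjacent base).
    end: "donor" for the 5' end of the intron, "acceptor" for the 3' end.
    """
    if dist < 1:
        raise ValueError("dist is 1-based")
    if end not in ("donor", "acceptor"):
        raise ValueError("end must be 'donor' or 'acceptor'")
    if in_intron:
        if dist <= 2:  # first or last two intronic bases
            return "splice_donor_variant" if end == "donor" else "splice_acceptor_variant"
        if end == "donor" and dist == 5:
            return "splice_donor_5th_base_variant"
        if dist <= 8:  # 3-8 bases into the intron
            return "splice_region_variant"
        return None
    return "splice_region_variant" if dist <= 3 else None  # 1-3 bases into the exon


print(classify_splice_variant(True, 5, "donor"))  # splice_donor_5th_base_variant
```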

spliced_leader_RNA [SO_0000636]

Small nuclear RNAs that are incorporated into pre-mRNAs to replace the 5’ end in some eukaryotes.

spliceosomal_intron [SO_0000662]

An intron which is spliced by the spliceosome. GO:0000398.

spliceosomal_intron_region [SO_0000841]

A region within an intron. A term added to allow the parts of introns to have is_a paths to the root.

splicing_regulatory_region [SO_0001056]

A regulatory_region that modulates splicing. Moved from transcription_regulatory_region (SO:0001679) to transcriptional_cis_regulatory_region (SO:0001055) by Dave Sant on Feb 11, 2021 when transcription_regulatory_region was merged into transcriptional_cis_regulatory_region to be consistent with GO and reduce redundancy as part of the GREEKC consortium. See GitHub Issue #527.

splicing_variant [SO_0001568]

A sequence variant that changes the process of splicing.

SRP_RNA [SO_0000590]

The signal recognition particle (SRP) is a universally conserved ribonucleoprotein. It is involved in the co-translational targeting of proteins to membranes. The eukaryotic SRP consists of a 300-nucleotide 7S RNA and six proteins: SRPs 72, 68, 54, 19, 14, and 9. Archaeal SRP consists of a 7S RNA and homologues of the eukaryotic SRP19 and SRP54 proteins. In most eubacteria, the SRP consists of a 4.5S RNA and the Ffh protein (a homologue of the eukaryotic SRP54 protein). Eukaryotic and archaeal 7S RNAs have very similar secondary structures, with eight helical elements. These fold into the Alu and S domains, separated by a long linker region. Eubacterial SRP is generally a simpler structure, with the M domain of Ffh bound to a region of the 4.5S RNA that corresponds to helix 8 of the eukaryotic and archaeal SRP S domain. Some Gram-positive bacteria (e.g. Bacillus subtilis), however, have a larger SRP RNA that also has an Alu domain. The Alu domain is thought to mediate the peptide chain elongation retardation function of the SRP. The universally conserved helix which interacts with the SRP54/Ffh M domain mediates signal sequence recognition. In eukaryotes and archaea, the SRP19-helix 6 complex is thought to be involved in SRP assembly and stabilizes helix 8 for SRP54 binding.

SRP_RNA_encoding [SO_0000642]

A region that can be transcribed into a signal recognition particle RNA (SRP RNA).

SRP_RNA_gene [SO_0001269]

A noncoding RNA that binds to the ribosome to halt protein synthesis when the signal peptide is present.

SRP_RNA_primary_transcript [SO_0000589]

A primary transcript encoding a signal recognition particle RNA.

ss_oligo [SO_0000441]

A single stranded oligonucleotide. This term is mapped to MGED. Do not obsolete without consulting MGED ontology.

ss_RNA_viral_sequence [SO_0001199]

A ss_RNA_viral_sequence is a viral_sequence that is the sequence of a virus that exists as single stranded RNA.

st_turn_left_handed_type_one [SO_0001142]

The peptide twists in an anticlockwise, left handed manner. The dihedral angles for this turn are: Residue(i): -140 degrees < chi(1) -120 degrees < -20 degrees, -90 degrees psi +120 degrees < +40 degrees, residue(i+1): -140 degrees < phi < -20 degrees, -90 < psi < +40 degrees.

st_turn_left_handed_type_two [SO_0001143]

The peptide twists in an anticlockwise, left handed manner. The dihedral angles for this turn are: Residue(i): -140 degrees < chi(1) -120 degrees < -20 degrees, +80 degrees psi +120 degrees < +180 degrees, residue(i+1): +20 degrees < phi < +140 degrees, -40 < psi < +90 degrees.

st_turn_right_handed_type_one [SO_0001144]

The peptide twists in a clockwise, right handed manner. The dihedral angles for this turn are: Residue(i): -140 degrees < chi(1) -120 degrees < -20 degrees, -90 degrees psi +120 degrees < +40 degrees, residue(i+1): -140 degrees < phi < -20 degrees, -90 < psi < +40 degrees.

st_turn_right_handed_type_two [SO_0001145]

The peptide twists in a clockwise, right handed manner. The dihedral angles for this turn are: Residue(i): -140 degrees < chi(1) -120 degrees < -20 degrees, +80 degrees psi +120 degrees < +180 degrees, residue(i+1): +20 degrees < phi < +140 degrees, -40 < psi < +90 degrees.

standard_draft [SO_0001486]

The status of a whole genome sequence, where the data is minimally filtered or un-filtered, from any number of sequencing platforms, and is assembled into contigs. Genome sequence of this quality may harbour regions of poor quality and can be relatively incomplete.

start_lost [SO_0002012]

A codon variant that changes at least one base of the canonical start codon. Requested by Uma Devi Paila, UVA. This term should not be applied to incomplete transcripts.

start_retained_variant [SO_0002019]

A sequence variant where at least one base in the start codon is changed, but the start remains. Requested by Uma Paila as this term is annotated by snpEff. This would be used for non-AUG start codon annotation.

status [SO_0000905]

An attribute describing the status of a feature, based on the available evidence. This term is the hypernym of the status attributes and should not be used directly for annotation.

stem_loop [SO_0000313]

A double-helical region of nucleic acid formed by base-pairing between adjacent (inverted) complementary sequences.

sterol_regulatory_element [SO_0001861]

A 10-bp promoter element bound by sterol regulatory element binding proteins (SREBPs), found in promoters of genes involved in sterol metabolism. Many variants of the sequence ATCACCCCAC function as SREs.
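
Because many variants of ATCACCCCAC function as SREs, an exact-string search would miss most sites. The sketch below scans one strand for 10-bp windows within a chosen Hamming distance of the consensus; the function names and the one-mismatch default are illustrative assumptions, not an established SRE model.

```python
def hamming(a: str, b: str) -> int:
    """Count mismatched positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def find_near_matches(seq: str, consensus: str = "ATCACCCCAC", max_mismatches: int = 1):
    """Return (start, window, mismatches) for every window of `seq` within
    `max_mismatches` of `consensus`. Only the given strand is scanned."""
    seq = seq.upper()
    k = len(consensus)
    hits = []
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        d = hamming(window, consensus)
        if d <= max_mismatches:
            hits.append((i, window, d))
    return hits


print(find_near_matches("GGATCACGCCACTT"))  # [(2, 'ATCACGCCAC', 1)]
```

Scanning the reverse complement as well would be needed to find sites on the other strand.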

sticky_end_restriction_enzyme_cleavage_site [SO_0001692]

A site where restriction enzymes can cleave that will produce an overhang or ‘sticky end’.

stop_codon [SO_0000319]

In mRNA, a set of three nucleotides that indicates the end of information for protein synthesis.
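
In the standard genetic code the stop codons are UAA, UAG and UGA (TAA, TAG and TGA in DNA). A minimal sketch of locating the first in-frame stop codon, assuming the standard code and a known reading frame; the function name is hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code, DNA alphabet


def first_in_frame_stop(seq: str, frame: int = 0):
    """Return the 0-based offset of the first stop codon in `frame`, or None.
    RNA input (U) is converted to DNA (T) first."""
    s = seq.upper().replace("U", "T")
    for i in range(frame, len(s) - 2, 3):
        if s[i:i + 3] in STOP_CODONS:
            return i
    return None


print(first_in_frame_stop("AUGGCCUGA"))  # 6
```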

stop_codon_read_through [SO_0000883]

A stop codon redefined to be a new amino acid.

stop_codon_redefined_as_pyrrolysine [SO_0000884]

A stop codon redefined to be the new amino acid, pyrrolysine.

stop_codon_redefined_as_selenocysteine [SO_0000885]

A stop codon redefined to be the new amino acid, selenocysteine.

stop_codon_signal [SO_1001288]

A recoding stimulatory signal that is a stop codon and has an effect on the efficiency of recoding. This term does not include stop codons that are redefined. For example, a stop codon that partially overlaps a frameshifting site would be a stimulatory signal.

stop_gained [SO_0001587]

A sequence variant whereby at least one base of a codon is changed, resulting in a premature stop codon, leading to a shortened polypeptide. EBI term: Stop gained - In coding sequence, resulting in the gain of a stop codon (i.e. leading to a shortened peptide sequence).

stop_lost [SO_0001578]

A sequence variant where at least one base of the terminator codon (stop) is changed, resulting in an elongated transcript. EBI term: Stop lost - In coding sequence, resulting in the loss of a stop codon.

stop_retained_variant [SO_0001567]

A sequence variant where at least one base in the terminator codon is changed, but the terminator remains.
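
The stop-related terms above (stop_gained, stop_lost, stop_retained_variant) depend only on whether the reference and alternate codons are stop codons. The sketch below assumes the standard genetic code and a single in-frame codon pair; it deliberately does not separate synonymous from missense changes and reports those with the broader term coding_sequence_variant instead.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code, DNA alphabet


def classify_codon_change(ref_codon: str, alt_codon: str) -> str:
    """Map an in-frame ref -> alt codon change to an SO term name.

    Only stop-codon effects are resolved; all other changes are reported
    with the broader term coding_sequence_variant.
    """
    ref = ref_codon.upper().replace("U", "T")
    alt = alt_codon.upper().replace("U", "T")
    if ref == alt:
        raise ValueError("reference and alternate codons are identical")
    if ref in STOP_CODONS and alt in STOP_CODONS:
        return "stop_retained_variant"
    if ref in STOP_CODONS:
        return "stop_lost"
    if alt in STOP_CODONS:
        return "stop_gained"
    return "coding_sequence_variant"


print(classify_codon_change("TAA", "TAG"))  # stop_retained_variant
print(classify_codon_change("CAG", "TAG"))  # stop_gained
```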

strand_attribute [SO_0000983]

The attribute of how many strands are present in a nucleotide polymer. Attributes added to describe the different kinds of replicon. SO workshop, September 2006.

STREP_motif [SO_0001859]

A promoter element with consensus sequence CCCCTC, bound by the PKA-responsive zinc finger transcription factor Rst2.

stRNA_encoding [SO_0000656]

A region that can be transcribed into a small temporal RNA (stRNA). Found in roundworm development.

structural_alteration [SO_0001785]

An alteration of the genome that leads to a change in the structure of one or more chromosomes.

structural_interaction_variant [SO_0002093]

A variant that impacts the internal interactions of the resulting polypeptide structure. Requested by Pablo Cingolani, who computes it by looking at the PDB entry of a protein and marking amino acids that are within 3 Angstroms of each other but far apart in the amino acid sequence (e.g. more than 20 residues apart). The assumption is that, since they are very close in space, they must be “interacting” and thus important for protein structure.
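
The distance heuristic described above (close in space, far apart in sequence) can be sketched as a pairwise scan. This is a simplified illustration, not the requester's code: it assumes one representative atom per residue with coordinates already parsed from a PDB file, and the toy coordinates are hypothetical.

```python
import math
from itertools import combinations


def interacting_pairs(residues, max_dist=3.0, min_seq_sep=20):
    """Return residue-number pairs that are close in space but distant in sequence.

    residues: list of (residue_number, (x, y, z)) with one representative atom
    per residue (a simplification; a real PDB entry has many atoms per residue).
    """
    pairs = []
    for (i, a), (j, b) in combinations(residues, 2):
        far_in_sequence = abs(i - j) > min_seq_sep
        close_in_space = math.dist(a, b) <= max_dist
        if far_in_sequence and close_in_space:
            pairs.append((min(i, j), max(i, j)))
    return pairs


toy = [(1, (0.0, 0.0, 0.0)), (30, (1.0, 1.0, 1.0)),
       (5, (0.5, 0.0, 0.0)), (60, (10.0, 10.0, 10.0))]
print(interacting_pairs(toy))  # [(1, 30), (5, 30)]
```

A variant falling on either residue of a reported pair would then be a candidate structural_interaction_variant.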

STS [SO_0000331]

Short (typically a few hundred base pairs) DNA sequence that has a single occurrence in a genome and whose location and base sequence are known.

STS_map [SO_0001251]

An STS map is a physical map organized by the unique STS landmarks.

substitution [SO_1000002]

A sequence alteration where the length of the change in the variant is the same as that of the reference.
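
The substitution definition is a length test on the changed bases. The sketch below first trims bases shared at both ends (such as a VCF-style anchor base) and then compares lengths. The returned names are SO terms, but the trimming convention is an assumption, and the function does not left-normalize alleles in repetitive sequence.

```python
def classify_alteration(ref: str, alt: str) -> str:
    """Return substitution, insertion, deletion or delins for a ref/alt pair."""
    ref, alt = ref.upper(), alt.upper()
    if ref == alt:
        raise ValueError("reference and alternate alleles are identical")
    # Trim bases shared at the 5' end (e.g. a VCF-style anchor base) ...
    while ref and alt and ref[0] == alt[0]:
        ref, alt = ref[1:], alt[1:]
    # ... and at the 3' end.
    while ref and alt and ref[-1] == alt[-1]:
        ref, alt = ref[:-1], alt[:-1]
    if len(ref) == len(alt):
        return "substitution"  # same length as the reference
    if not ref:
        return "insertion"
    if not alt:
        return "deletion"
    return "delins"


for pair in [("A", "G"), ("A", "AT"), ("AT", "A"), ("ACG", "TTTT")]:
    print(pair, classify_alteration(*pair))
```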

subtelomere [SO_0001997]

A heterochromatic region of the chromosome, adjacent to the telomere (on the centromeric side), that contains repetitive DNA and sometimes genes, and is transcribed.

sugar_edge_base_pair [SO_0000030]

A type of non-canonical base-pairing.

supercontig [SO_0000148]

One or more contigs that have been ordered and oriented using end-read information. Contains gaps that are filled with N’s.
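
Because gaps in a supercontig are filled with N's, its component contigs can be recovered by splitting at N runs. A minimal sketch; the function name and the `min_gap` parameter are hypothetical.

```python
import re


def contigs_from_supercontig(scaffold: str, min_gap: int = 1):
    """Split a scaffold at runs of N at least `min_gap` long.

    Returns (start_offset, contig_sequence) tuples, 0-based on the scaffold.
    """
    gap = re.compile(r"[Nn]{%d,}" % min_gap)
    contigs = []
    start = 0
    for m in gap.finditer(scaffold):
        if m.start() > start:
            contigs.append((start, scaffold[start:m.start()]))
        start = m.end()
    if start < len(scaffold):
        contigs.append((start, scaffold[start:]))
    return contigs


print(contigs_from_supercontig("ACGTNNNNGGCCNTT"))
# [(0, 'ACGT'), (8, 'GGCC'), (13, 'TT')]
```

A single N may be an ambiguous base call rather than a gap; raising `min_gap` avoids splitting at such positions.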

supported_by_domain_match [SO_0000908]

An attribute to describe a feature that has been predicted using sequence similarity to a known domain.

supported_by_EST_or_cDNA [SO_0000909]

An attribute to describe a feature that has been predicted using sequence similarity to EST or cDNA data.

supported_by_sequence_similarity [SO_0000907]

An attribute to describe a feature that has been predicted using sequence similarity techniques.

SVA_deletion [SO_0002068]

A deletion of an SVA mobile element.

SVA_insertion [SO_0002065]

An insertion of sequence from the SVA family of mobile elements.

symbiosis_island [SO_0000776]

A transmissible element containing genes involved in symbiosis, analogous to the pathogenicity islands of Gram-negative bacteria. Nitrogen fixation in Rhizobiaceae species is encoded by symbiosis islands. Reference: Sullivan JT, Ronson CW. Evolution of rhizobia by acquisition of a 500-kb symbiosis island that integrates into a phe-tRNA gene. PNAS 1998 Apr 28;95(9):5145-5149.

symmetric_RNA_internal_loop [SO_0000025]

An internal RNA loop where the extent of the loop on both strands is the same size.

synonymous [SO_0001815]

A variant that does not lead to any change in the amino acid sequence.

syntenic_region [SO_0005858]

A region in which two or more pairs of homologous markers occur on the same chromosome in two or more species.

synthetic_oligo [SO_0001247]

An oligo composed of synthetic nucleotides.

synthetic_sequence [SO_0000351]

An attribute to describe a sequence of nucleotides, nucleotide analogs, or amino acids that has been designed by an experimenter and which may or may not correspond with any natural sequence.

T_cell_receptor_gene [SO_0002133]

A T-cell receptor germline gene. These terms have been requested by Adam Frankish to support Gencode and Vega biotypes. The terms are defined according to IMGT.

T_cell_receptor_pseudogene [SO_0002099]

A pseudogene derived from a T-cell receptor gene. These terms have been requested by Adam Frankish to support Gencode and Vega biotypes. The terms are defined according to IMGT.

T_loop [SO_0001177]

Non-base-paired sequence of three nucleotide bases in tRNA. It has sequence T-Psi-C.

T_to_A_transversion [SO_1000021]

A transversion from T to A.

T_to_C_transition [SO_1000013]

A transition of a thymine to a cytosine.

T_to_G_transversion [SO_1000022]

A transversion from T to G.

T3_RNA_Polymerase_Promoter [SO_0001206]

A DNA sequence to which the T3 RNA polymerase binds, to begin transcription.

T7_RNA_Polymerase_Promoter [SO_0001207]

A region (DNA) to which the T7 RNA polymerase binds, to begin transcription.
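
Locating a polymerase promoter in a construct is usually an exact search on both strands. The sketch assumes the commonly cited 18-nt T7 promoter sequence TAATACGACTCACTATAG, which is not given in this entry; substitute the exact sequence from your vector documentation. The same function would work for the T3 promoter given its sequence.

```python
T7_PROMOTER = "TAATACGACTCACTATAG"  # assumed consensus; check your vector's documentation

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (A, C, G, T, N only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def find_site_both_strands(seq: str, site: str):
    """Return (start, strand) for exact matches of `site` on either strand.

    Start is the 0-based offset on the given (+) strand in both cases.
    """
    seq, site = seq.upper(), site.upper()
    rc = reverse_complement(site)
    k = len(site)
    hits = []
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window == site:
            hits.append((i, "+"))
        elif window == rc:
            hits.append((i, "-"))
    return hits


print(find_site_both_strands("GG" + T7_PROMOTER + "CC", T7_PROMOTER))  # [(2, '+')]
```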

tag [SO_0000324]

A nucleotide sequence that may be used to identify a larger sequence.

tandem [SO_0001513]

An insertion that extends a tandem repeat.

tandem_duplication [SO_1000173]

A duplication consisting of 2 identical adjacent regions.

tandem_repeat [SO_0000705]

Two or more adjacent copies of a region (of length greater than 1).
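
A greedy scan for runs of adjacent copies illustrates the definition, including its requirement that the repeated region be longer than one base. This is a simplified sketch with a hypothetical function name; dedicated tools also handle imperfect copies and choose among repeat phases.

```python
def find_tandem_repeats(seq: str, unit_len: int, min_copies: int = 2):
    """Greedy scan for runs of at least `min_copies` adjacent copies of a unit.

    Returns (start, unit, copies). Each run is reported in the first phase
    encountered, so a CAG repeat may be reported as GCA or AGC.
    """
    if unit_len < 2:
        raise ValueError("the tandem_repeat definition requires a unit longer than 1")
    seq = seq.upper()
    results = []
    i = 0
    while i + unit_len <= len(seq):
        unit = seq[i:i + unit_len]
        copies, j = 1, i + unit_len
        while seq[j:j + unit_len] == unit:
            copies += 1
            j += unit_len
        if copies >= min_copies:
            results.append((i, unit, copies))
            i = j  # continue after the run
        else:
            i += 1
    return results


print(find_tandem_repeats("GGCAGCAGCATT", 3))  # [(1, 'GCA', 3)]
```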

target_site_duplication [SO_0000434]

A sequence of the target DNA that is duplicated when a transposable element or phage inserts; usually found at each end of the insertion.

targeting_vector [SO_0001644]

An engineered vector that is able to take part in homologous recombination in a host with the intent of introducing site specific genomic modifications.

tasiRNA [SO_0001800]

The sequence of a 21 nucleotide double stranded, polyadenylated non coding RNA, transcribed from the TAS gene.

tasiRNA_primary_transcript [SO_0001801]

A primary transcript encoding a tasiRNA.

TCS_element [SO_0002044]

A TCS element is a (yeast) transcription factor binding site, bound by the TEA DNA binding domain (DBD) of transcription factors. The consensus site is CATTCC or CATTCT. Requested by Rama - SGD.

TCT_motif [SO_0001959]

A cis-regulatory element with the conserved sequence YYC+1TTTYY, spanning -2 to +6 relative to the +1 TSS. It is present in most ribosomal protein genes in Drosophila and mammals but not in the yeast Saccharomyces cerevisiae. It resembles the initiator (TCAKTY in Drosophila) but is functionally distinct from it.
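
Consensus sequences such as the TCT motif are written in IUPAC codes (Y = C or T, K = G or T), and the "+1" marks the TSS base, so it is removed before searching (YYCTTTYY). A minimal sketch that converts IUPAC codes to a regular expression and reports overlapping matches on one strand; the helper names are hypothetical. The same helper applies to the TCS element consensus CATTCC/CATTCT (CATTCY).

```python
import re

# IUPAC nucleotide codes mapped to regex character classes
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def iupac_to_regex(pattern: str) -> str:
    return "".join(IUPAC[base] for base in pattern.upper())


def find_motif(seq: str, pattern: str):
    """Return (start, match) for every, possibly overlapping, match on the given strand."""
    rx = re.compile("(?=(%s))" % iupac_to_regex(pattern))
    return [(m.start(), m.group(1)) for m in rx.finditer(seq.upper())]


# TCT motif with the "+1" position marker removed: YYCTTTYY
print(find_motif("AACTCTTTCCGG", "YYCTTTYY"))  # [(2, 'CTCTTTCC')]
```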

telomerase_RNA_gene [SO_0001643]

A telomerase RNA gene is a non coding RNA gene the RNA product of which is a component of telomerase.

telomeric_repeat [SO_0001496]

The telomeric repeat is a repeat region, part of the chromosome, which in yeast is a G-rich terminal sequence of the form (TG(1-3))n or, more precisely, ((TG)(1-6)TG(2-3))n. The repeats are maintained by telomerase, and there are generally 300 ± 75 bp of TG(1-3) at a given end. Telomeric repeats function in completing chromosome replication and protecting the ends from degradation and end-to-end fusions. This term was requested 2009-10-16 by Michel Dumontier, tracker id 2880739.
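
Both repeat notations translate directly into regular expressions if TG(1-3) is read as a T followed by one to three Gs (the usual yeast TG1-3 convention; this reading is an assumption). The sketch scans only the G-rich strand; the complementary strand would show C(1-3)A repeats. Function and pattern names are hypothetical.

```python
import re

# (TG(1-3))n: a T followed by one to three Gs, repeated
TG1_3 = re.compile(r"(?:TG{1,3})+")
# ((TG)(1-6)TG(2-3))n: one to six TG dinucleotides, then T plus two or three Gs, repeated
TG_PRECISE = re.compile(r"(?:(?:TG){1,6}TG{2,3})+")


def longest_tract(seq: str, pattern=TG1_3) -> str:
    """Return the longest match of `pattern` on the given (G-rich) strand."""
    best = ""
    for m in pattern.finditer(seq.upper()):
        if len(m.group()) > len(best):
            best = m.group()
    return best


s = "AAATGGTGTGGGTGAAA"
print(longest_tract(s))              # TGGTGTGGGTG
print(longest_tract(s, TG_PRECISE))  # TGTGGG
```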

telomeric_transcript [SO_0001927]

A non-coding transcript derived from the transcription of the telomere.

template_region [SO_0000978]

A region of a guide_RNA that specifies the insertions and deletions of bases in the editing of a target mRNA.

terminal_inverted_repeat [SO_0000481]

An inverted repeat (SO:0000294) occurring at the termini of a DNA transposon.

terminal_inverted_repeat_element [SO_0000208]

A DNA transposable element defined as having termini with perfect, or nearly perfect short inverted repeats, generally 10 - 40 nucleotides long.

terminator_codon_variant [SO_0001590]

A sequence variant whereby at least one of the bases in the terminator codon is changed. The terminal codon may be the terminator, or in an incomplete transcript the last available codon.

terminator_of_type_2_RNApol_III_promoter [SO_0000615]

A terminator signal for RNA polymerase III transcription.

TERRA [SO_0001923]

A non-coding RNA transcript, derived from the transcription of the telomere. These transcripts contain G rich telomeric RNA repeats and RNA tracts corresponding to adjacent subtelomeric sequences. They are 100-9000 bases long. Telomeric transcription has been documented in mammals, birds, fish, plants and yeast. Requested by Antonia Lock, October 2012.

tetranucleotide_repeat_microsatellite_feature [SO_0000641]

A region of a repeating tetranucleotide sequence (four bases).

TF_binding_site [SO_0000235]

A DNA site where a transcription factor binds. Definition updated along with definitions in Mejia-Almonte et al. PMID:32665585. In August 2020, the relationship part_of SO:0000727 (CRM) was added in place of the previous CRM relationship has_part TF_binding_site, in response to requests from the GREEKC initiative. Moved from transcription_regulatory_region (SO:0001679) to transcriptional_cis_regulatory_region (SO:0001055) by Dave Sant on Feb 11, 2021, when transcription_regulatory_region was merged into transcriptional_cis_regulatory_region to be consistent with GO and reduce redundancy as part of the GREEKC consortium. See GitHub Issue #527.

TF_binding_site_variant [SO_0001782]

A sequence variant located within a transcription factor binding site.

TFBS_ablation [SO_0001895]

A feature ablation whereby the deleted region includes a transcription factor binding site. Created in conjunction with the EBI.

TFBS_amplification [SO_0001892]

A feature amplification of a region containing a transcription factor binding site. Created in conjunction with the EBI.

TFBS_fusion [SO_0001888]

A fusion where the deletion brings together transcription factor binding sites. Created in conjunction with the EBI.

TFBS_translocation [SO_0001885]

A feature translocation where the region contains a transcription factor binding site. Created in conjunction with the EBI.

three_methylcytidine [SO_0001281]

3-methylcytidine is a modified cytidine.

three_methylpseudouridine [SO_0001377]

3_methylpseudouridine is a modified uridine base feature.

three_methyluridine [SO_0001372]

3_methyluridine is a modified uridine base feature.

three_prime_cis_splice_site [SO_0000164]

Intronic 2 bp region bordering the exon, at the 3’ edge of the intron. A splice_site that is upstream_adjacent_to exon and finishes intron.
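
Most eukaryotic introns are GT-AG introns, so the 2-bp three_prime_cis_splice_site is usually AG. This is general background rather than part of this definition, and non-canonical introns (e.g. GC-AG) exist. A minimal sketch that extracts both splice-site dinucleotides from hypothetical intron coordinates:

```python
def splice_site_dinucleotides(seq: str, intron_start: int, intron_end: int):
    """Return (donor, acceptor, is_canonical_GT_AG) for an intron.

    intron_start/intron_end are 0-based, half-open coordinates on the
    transcribed strand. The last two intron bases correspond to the
    three_prime_cis_splice_site; the first two to the 5' splice site.
    """
    intron = seq[intron_start:intron_end].upper()
    if len(intron) < 4:
        raise ValueError("intron too short to hold both splice-site dinucleotides")
    donor, acceptor = intron[:2], intron[-2:]
    return donor, acceptor, (donor, acceptor) == ("GT", "AG")


print(splice_site_dinucleotides("CCCGTAAGTTTTCAGGGG", 3, 15))  # ('GT', 'AG', True)
```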

three_prime_clip [SO_0000557]

3’-most region of a precursor transcript that is clipped off during processing.

three_prime_coding_exon [SO_0000202]

The coding exon that is most 3-prime on a given transcript.

three_prime_coding_exon_coding_region [SO_0000197]

The sequence of the three_prime_coding_exon that codes for protein.

three_prime_coding_exon_noncoding_region [SO_0000484]

The sequence of the 3’ exon that is not coding.

three_prime_D_heptamer [SO_0000493]

7 nucleotide recombination site like CACAGTG, part of a 3’ D-recombination signal sequence of an immunoglobulin/T-cell receptor gene.

three_prime_D_nonamer [SO_0000494]

A 9 nucleotide recombination site (e.g. ACAAAAACC), part of a 3’ D-recombination signal sequence of an immunoglobulin/T-cell receptor gene.

three_prime_D_recombination_signal_sequence [SO_0000570]

Recombination signal of an immunoglobulin/T-cell receptor gene, including the 3’ D-heptamer (SO:0000493), 3’ D-spacer, and 3’ D-nonamer (SO:0000494) in 3’ of the D-region of a D-gene.

three_prime_D_spacer [SO_0000495]

A 12 or 23 nucleotide spacer between the 3’D-HEPTAMER and 3’D-NONAMER of a 3’D-RS.
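
The heptamer, the 12- or 23-nt spacer and the nonamer together make up the recombination signal sequence. The sketch below finds only exact matches to the example heptamer (CACAGTG) and nonamer (ACAAAAACC) given in the entries above. Real signals usually deviate from consensus, so practical tools score mismatches instead; the function name is hypothetical.

```python
import re

HEPTAMER = "CACAGTG"   # example heptamer from the three_prime_D_heptamer entry
NONAMER = "ACAAAAACC"  # example nonamer from the three_prime_D_nonamer entry


def find_consensus_rss(seq: str):
    """Return sorted (start, spacer_length) for exact heptamer-spacer-nonamer
    matches with a 12- or 23-nt spacer."""
    seq = seq.upper()
    hits = []
    for spacer in (12, 23):
        rx = re.compile("(?=%s[ACGT]{%d}%s)" % (HEPTAMER, spacer, NONAMER))
        hits.extend((m.start(), spacer) for m in rx.finditer(seq))
    return sorted(hits)


print(find_consensus_rss("GG" + HEPTAMER + "A" * 12 + NONAMER + "TT"))  # [(2, 12)]
```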

three_prime_EST [SO_0001209]

An EST read from the 3’ end of a transcript. Such reads are more likely to fall within non-coding or untranslated regions (UTRs).

three_prime_five_prime_overlap [SO_0000076]

An attribute to describe a gene when the 3’ region overlaps with another gene’s 5’ region.

three_prime_flanking_region [SO_0001417]

A flanking region located three prime of a specific region.

three_prime_intron [SO_0000192]

An intron that is the most 3-prime in a given transcript.

three_prime_LTR [SO_0000426]

The long terminal repeat found at the three-prime end of the sequence to be inserted into the host genome.

three_prime_LTR_component [SO_0000849]

A component of the three-prime long terminal repeat.

three_prime_noncoding_exon [SO_0000444]

Non-coding exon in the 3’ UTR.

three_prime_overlapping_ncrna [SO_0002120]

A transcript where ditag (digital gene expression profiling) and/or published experimental data strongly support the existence of short non-coding transcripts transcribed from the 3’ UTR. These terms have been requested by Adam Frankish to support Gencode and Vega biotypes.

three_prime_RACE_clone [SO_0001433]

A three prime RACE (Rapid Amplification of cDNA Ends) clone is a cDNA clone copied from the 3’ end of an mRNA (using a poly-dT primer to capture the polyA tail and a gene-specific or randomly primed 5’ primer), and spliced into a vector for propagation in a suitable host.

three_prime_recoding_site [SO_1001277]

The recoding stimulatory signal located downstream of the recoding site.

three_prime_repeat_recoding_signal [SO_1001286]

A recoding stimulatory signal, downstream sequence important for recoding that contains repetitive elements.

three_prime_restriction_enzyme_junction [SO_0001690]

The restriction enzyme cleavage junction on the 3’ strand of the nucleotide sequence.

three_prime_RST [SO_0001468]

A tag produced from a single sequencing read from a 3’-RACE product; typically a few hundred base pairs long.

three_prime_stem_loop_structure [SO_1001279]

A recoding stimulatory region, the stem-loop secondary structural element is downstream of the redefined region.

three_prime_sticky_end_restriction_enzyme_cleavage_site [SO_0001976]

A restriction enzyme recognition site that, when cleaved, results in 3 prime overhangs. Requested by Jackie Quinn. The sticky restriction sites are different from junctions because they include the sequence that is cut, inclusive of the five prime junction and the three prime junction.
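
Whether a cleavage site gives a 5' overhang, a 3' overhang or a blunt end follows from where each strand is cut. The sketch assumes a palindromic recognition site cut symmetrically on both strands (non-palindromic or Type IIS enzymes would need a separate bottom-strand cut position). The EcoRI, KpnI and EcoRV cut positions are standard examples, not taken from this document.

```python
def cut_overhang(site: str, top_cut: int):
    """Classify the ends left by cutting a palindromic recognition site.

    top_cut: number of site bases 5' of the cut on the top strand.
    For a palindromic site the bottom strand is cut at len(site) - top_cut
    (in top-strand coordinates).
    """
    n = len(site)
    bottom_cut = n - top_cut
    if top_cut < bottom_cut:
        return "5' overhang", site[top_cut:bottom_cut]
    if top_cut > bottom_cut:
        return "3' overhang", site[bottom_cut:top_cut]
    return "blunt", ""


print(cut_overhang("GAATTC", 1))  # EcoRI G^AATTC -> ("5' overhang", 'AATT')
print(cut_overhang("GGTACC", 5))  # KpnI GGTAC^C -> ("3' overhang", 'GTAC')
print(cut_overhang("GATATC", 3))  # EcoRV GAT^ATC -> ('blunt', '')
```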

three_prime_terminal_inverted_repeat [SO_0000421]

An inverted repeat (SO:0000294) occurring at the 3-prime termini of a DNA transposon.

three_prime_three_prime_overlap [SO_0000075]

An attribute to describe a gene when the 3’ region overlaps with another gene’s 3’ region.

three_prime_UST [SO_0001465]

A UST located in the 3’UTR of a protein-coding transcript.

three_prime_UTR [SO_0000205]

A region at the 3’ end of a mature transcript (following the stop codon) that is not translated into a protein.

three_prime_UTR_intron [SO_0000448]

An intron located in the 3’ UTR.

three_three_amino_three_carboxypropyl_uridine [SO_0001353]

3_3_amino_3_carboxypropyl_uridine is a modified uridine base feature.

three_two_prime_O_dimethyluridine [SO_0001375]

3_2prime_O_dimethyluridine is a modified uridine base feature.

threonine [SO_0001445]

A polar, hydrophilic amino acid encoded by the codons ACN (ACT, ACC, ACA and ACG). A placeholder for a cross product with ChEBI.
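
The codon notation ACN uses the IUPAC code N (any base). Expanding a degenerate codon into its concrete codons is a Cartesian product over the bases each code allows; a minimal sketch with a hypothetical function name:

```python
from itertools import product

# IUPAC nucleotide codes and the bases each stands for
IUPAC_BASES = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def expand_degenerate(codon: str):
    """List every concrete sequence matched by an IUPAC-degenerate sequence."""
    choices = [IUPAC_BASES[base] for base in codon.upper()]
    return ["".join(bases) for bases in product(*choices)]


print(expand_degenerate("ACN"))  # ['ACA', 'ACC', 'ACG', 'ACT']
```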

threonine_tRNA_primary_transcript [SO_0000227]

A primary transcript encoding threonyl tRNA (SO:0000270).

threonyl_tRNA [SO_0000270]

A tRNA sequence that has a threonine anticodon, and a 3’ threonine binding region.

tiling_path [SO_0000472]

A set of regions which overlap with minimal polymorphism to form a linear sequence.

tiling_path_clone [SO_0000480]

A clone which is part of a tiling path. A tiling path is a set of sequencing substrates, typically clones, which have been selected in order to efficiently cover a region of the genome in preparation for sequencing and assembly.
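
Selecting clones to "efficiently cover a region" can be framed as a minimum interval cover. The sketch uses the classic greedy algorithm, which minimizes the number of clones for a one-dimensional cover; it ignores practical criteria such as required overlap length or clone quality, and the clone names and coordinates are hypothetical.

```python
def minimal_tiling_path(clones, region_start: int, region_end: int):
    """Greedy minimum set of clones covering [region_start, region_end).

    clones: list of (name, start, end), half-open, 0-based. At each step,
    among clones starting at or before the covered frontier, pick the one
    that reaches furthest. Clones that merely abut the frontier are accepted.
    """
    ordered = sorted(clones, key=lambda c: c[1])
    chosen, covered, i = [], region_start, 0
    while covered < region_end:
        best = None
        while i < len(ordered) and ordered[i][1] <= covered:
            if best is None or ordered[i][2] > best[2]:
                best = ordered[i]
            i += 1
        if best is None or best[2] <= covered:
            raise ValueError(f"gap in clone coverage at position {covered}")
        chosen.append(best[0])
        covered = best[2]
    return chosen


clones = [("A", 0, 40), ("B", 10, 60), ("C", 30, 90), ("D", 55, 100), ("E", 85, 120)]
print(minimal_tiling_path(clones, 0, 100))  # ['A', 'C', 'E']
```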

tiling_path_fragment [SO_0000474]

A piece of sequence that makes up a tiling_path (SO:0000472).

tmRNA_acceptor_piece [SO_0000770]

The acceptor region of a two-piece tmRNA that when mature is charged at its 3’ end with alanine. The tmRNA gene undergoes circular permutation in some groups of bacteria; processing of the transcripts from such a gene leaves the mature tmRNA in two pieces, base-paired together. Added in response to Kelly Williams from Indiana. Date: Nov 2005.

tmRNA_coding_piece [SO_0000769]

The region of a two-piece tmRNA that bears the reading frame encoding the proteolysis tag. The tmRNA gene undergoes circular permutation in some groups of bacteria. Processing of the transcripts from such a gene leaves the mature tmRNA in two pieces, base-paired together. Added in response to comment from Kelly Williams from Indiana. Nov 2005.

tmRNA_encoding [SO_0000659]

A region that can be transcribed into a transfer-messenger RNA (tmRNA).

tmRNA_gene [SO_0001271]

A bacterial RNA with both tRNA and mRNA like properties. Moved from ncRNA_gene to sncRNA_gene 27 April 2021 to be more consistent with the organization of the ncRNA branch of SO. Requested by FlyBase, moved by Dave Sant. See GitHub Issue #514.

tmRNA_primary_transcript [SO_0000586]

A primary transcript encoding a tmRNA (SO:0000584).

tmRNA_region [SO_0000847]

A region of a tmRNA. This term was added to provide a grouping term for the region parts of tmRNA, thus giving them an is_a path back to the root.

TNA [SO_0001190]

An attribute describing a sequence consisting of nucleobases attached to a repeating unit made of threose rings connected to a phosphate backbone. Do not use this term for feature annotation. Use TNA_oligo (SO:0001191) instead.

tnaORF [SO_0002029]

A translated ORF encoded entirely within the antisense strand of a known protein coding gene.

topologically_defined_region [SO_0001412]

A DNA region within which self-interaction occurs more often than expected by chance because of DNA-looping.

topology_attribute [SO_0000986]

The attribute of whether a nucleotide polymer is linear or circular. Attributes added to describe the different kinds of replicon. SO workshop, September 2006.

TR_box [SO_0001858]

A promoter element with consensus sequence TTCTTTGTTY, bound by an HMG-box transcription factor such as S. pombe Ste11, and found in promoters of genes up-regulated early in meiosis.

TR_C_Gene [SO_0002134]

A constant (C) gene, a gene that codes the constant region of a T-cell receptor chain. These terms have been requested by Adam Frankish to support Gencode and Vega biotypes. The terms are defined according to IMGT.

TR_D_Gene [SO_0002135]

A gene that rearranges at the DNA level and codes the diversity region of the variable domain of a T-cell receptor chain. These terms have been requested by Adam Frankish to support Gencode and Vega biotypes. The terms are defined according to IMGT.

TR_J_Gene [SO_0002136]

A joining gene that rearranges at the DNA level and codes the joining region of the variable domain of a T-cell receptor chain. These terms have been requested by Adam Frankish to support Gencode and Vega biotypes. The terms are defined according to IMGT.

TR_J_pseudogene [SO_0002104]

A pseudogenic joining region that closely resembles a known functional T-cell receptor (TR) joining gene (which rearranges at the DNA level and codes the joining region of the variable domain of a T-cell receptor chain), but in which the coding region has stop codons, frameshift mutations or a mutation that affects the initiation codon. These terms have been requested by Adam Frankish to support Gencode and Vega biotypes. The terms are defined according to IMGT.

TR_V_Gene [SO_0002137]

A variable gene that rearranges at the DNA level and codes the variable region of the variable domain of a T-cell receptor chain. These terms have been requested by Adam Frankish to support Gencode and Vega biotypes. The terms are defined according to IMGT.

TR_V_pseudogene [SO_0002103]

A pseudogenic variable region that closely resembles a known functional T-cell receptor (TR) variable gene (which rearranges at the DNA level and codes the variable region of a T-cell receptor chain), but in which the coding region has stop codons, frameshift mutations or a mutation that affects the initiation codon. These terms have been requested by Adam Frankish to support Gencode and Vega biotypes. The terms are defined according to IMGT.

trans_splice_acceptor_site [SO_0000706]

The 3’ splice site of the acceptor primary transcript. This region contains a polypyrimidine tract and an AG dinucleotide in some organisms, and is UUUCAG in C. elegans.

trans_splice_donor_site [SO_0000707]

The 5’ splice site region of the donor RNA. SL RNA contains a donor site.

trans_splice_junction [SO_0001474]

The boundary between the spliced leader and the first exon of the mRNA.

trans_splice_site [SO_0001420]

Primary transcript region bordering trans-splice junction.

trans_spliced [SO_0000870]

An attribute describing transcript sequence that is created by splicing exons from different genes.

trans_spliced_mRNA [SO_0000872]

An mRNA that is trans-spliced.

trans_spliced_transcript [SO_0000479]

A transcript that is trans-spliced.

transcribed_cluster [SO_0001457]

A region defined by a set of transcribed sequences from the same gene or expressed pseudogene. This term was requested by Jeff Bowes, using the tracker, ID = 2594157.

transcribed_fragment [SO_0001418]

An experimental region, defined by a tiling array experiment to be transcribed at some level. Term requested by the modENCODE group.

transcribed_processed_pseudogene [SO_0002109]

A processed_pseudogene overlapped by locus-specific evidence of transcription. Term added as part of collaboration with Gencode, adding biotypes used in annotation.

transcribed_spacer_region [SO_0000638]

Part of an rRNA transcription unit that is transcribed but discarded during maturation, not giving rise to any part of rRNA.

transcribed_unitary_pseudogene [SO_0002108]

A species-specific unprocessed pseudogene without a parent gene, as it has an active orthologue in another species. Term added as part of collaboration with Gencode, adding biotypes used in annotation.

transcribed_unprocessed_pseudogene [SO_0002107]

An unprocessed pseudogene supported by locus-specific evidence of transcription. Term added as part of collaboration with Gencode, adding biotypes used in annotation.

transcript [SO_0000673]

An RNA synthesized on a DNA or RNA template by an RNA polymerase. Added relationship overlaps SO:0002300 unit_of_gene_expression with Mejia-Almonte et al. PMID:32665585 Aug 5, 2020.

transcript_ablation [SO_0001893]

A feature ablation whereby the deleted region includes a transcript feature. Created in conjunction with the EBI.

transcript_amplification [SO_0001889]

A feature amplification of a region containing a transcript. Created in conjunction with the EBI.

transcript_attribute [SO_0000237]

An attribute describing a transcript.

transcript_bound_by_nucleic_acid [SO_0000278]

A transcript that is bound by a nucleic acid. Formerly called transcript_by_bound_nucleic_acid.

transcript_bound_by_protein [SO_0000279]

A transcript that is bound by a protein. Formerly called transcript_by_bound_protein.

transcript_function_variant [SO_0001538]

A sequence variant which alters the functioning of a transcript with respect to a reference sequence.

transcript_fusion [SO_0001886]

A feature fusion where the deletion brings together transcript regions. Created in conjunction with the EBI.

transcript_processing_variant [SO_0001543]

A sequence variant that affects the post transcriptional processing of a transcript with respect to a reference sequence.

transcript_region [SO_0000833]

A region of a transcript. This term was added to provide a grouping term for the region parts of transcript, thus giving them an is_a path back to the root.

transcript_regulatory_region_fusion [SO_0001890]

A feature fusion where the deletion brings together a regulatory region and a transcript region. Created in conjunction with the EBI.

transcript_secondary_structure_variant [SO_0001596]

A sequence variant within a transcript that changes the secondary structure of the RNA product.

transcript_stability_variant [SO_0001546]

A variant that changes the stability of a transcript with respect to a reference sequence.

transcript_translocation [SO_0001883]

A feature translocation where the region contains a transcript. Created in conjunction with the EBI.

transcript_variant [SO_0001576]

A sequence variant that changes the structure of the transcript.

transcript_with_translational_frameshift [SO_0000118]

A transcript with a translational frameshift.

transcription_end_site [SO_0000616]

The base where transcription ends.

transcription_pause_site [SO_0002047]

Transcription pause sites are regions of a gene where RNA polymerase may pause during transcription. The functional role of pausing may be to facilitate factor recruitment, RNA folding, and synchronization with translation. Consensus transcription pause sites have been observed in E. coli. Moved from transcription_regulatory_region (SO:0001679) to transcriptional_cis_regulatory_region (SO:0001055) by Dave Sant on Feb 11, 2021, when transcription_regulatory_region was merged into transcriptional_cis_regulatory_region to be consistent with GO and reduce redundancy as part of the GREEKC consortium. See GitHub Issue #527.

transcription_start_cluster [SO_0001915]

A region defined by a cluster of experimentally determined transcription starting sites.

transcription_variant [SO_0001549]

A variant that alters the transcription of a transcript with respect to a reference sequence.

transcriptional_cis_regulatory_region [SO_0001055]

A regulatory_region that modulates the transcription of a gene or genes. Previous parent term transcription_regulatory_region (SO:0001067) has been merged with this term on 11 Feb 2021 as part of the GREEKC consortium. See GitHub Issue #527.

transcriptionally_constitutive [SO_0000124]

Expressed in relatively constant amounts without regard to cellular environmental conditions such as the concentration of a particular substrate.

transcriptionally_induced [SO_0000125]

An inducer molecule is required for transcription to occur.

transcriptionally_regulated [SO_0000123]

An attribute describing a gene that is regulated at transcription. By:<protein_id>.

transcriptionally_repressed [SO_0000126]

A repressor molecule is required for transcription to stop.

transgene [SO_0000902]

A gene that has been transferred naturally or by any of a number of genetic engineering techniques into a cell or organism where it is foreign (i.e. does not belong to the host genome). Transgenes can exist integrated into the host genome, or extra-chromosomally on replicons or transiently carried/expressed vectors. What matters is that they are active in the context of a foreign biological system (typically a cell or organism). Note that transgenes as defined here are not necessarily from a different taxon than that of the host genome. For example, a Mus musculus gene over-expressed from a chromosomally-integrated expression construct in a Mus musculus genome qualifies as a transgene because it is exogenous to the host genome.
On the relationship between ‘transgenic insertions’, ‘transgenes’, and ‘alleles’: transgenic insertions are sequence alterations composed of foreign/exogenous sequence. This sequence can be from the same species as the host cell or genome or from a different one; it is exogenous by virtue of being additional sequence inserted into the original host genome. A given transgenic insertion may create one or more transgenes when introduced into a host genome. The extent of a transgene spans all features needed to drive its expression in the host genome. In most cases a transgenic insertion completely contains one or more transgenes that are fully competent to drive expression in the host genome. In some cases, however, a transgenic insertion carries only part of the final transgene it creates and requires additional endogenous sequences in the vicinity of its insertion site to complete a functional gene (e.g. enhancer traps or gene traps). In addition to the transgenes they create upon genomic integration, transgenic insertions can create variant alleles by disrupting a known endogenous gene/locus.
Variant alleles are versions of a particular genomic feature (typically a gene) that are altered in their sequence relative to some reference. An insertion that disrupts an endogenous gene would be considered a ‘sequence alteration’ (sensu SO) which creates a ‘variant gene allele’. From the perspective of this disrupted gene, the origin or transgenic nature of the insertion is irrelevant; what matters is that the gene’s sequence has been altered to create an allele. For the purposes of modeling, any transgene(s) created when an endogenous gene is interrupted by an insertion is considered/modeled separately from the allele of the endogenous gene that is created by the insertion. The transgenic insertion, which is simply a sequence alteration in the host genome, is then linked to any transgenes that it contributes to, overlaps with, or contains. The model of the FlyBase example illustrates this approach.

transgenic [SO_0000781]

Attribute describing sequence that has been integrated with foreign sequence.

transgenic_insertion [SO_0001218]

An insertion that derives from another organism, via the use of recombinant DNA technology.

transgenic_transposable_element [SO_0000796]

TE that has been modified in vitro, including insertion of DNA derived from a source other than the originating TE. Modified as requested by Lynn - FB. May 2007.

transit_peptide [SO_0000725]

The transit_peptide is a short region at the N-terminus of the peptide that directs the protein to an organelle (chloroplast, mitochondrion, microbody or cyanelle). Added to bring SO in line with the EMBL, DDBJ, GenBank feature table. Old definition before biosapiens: The coding sequence for an N-terminal domain of a nuclear-encoded organellar protein. This domain is involved in post translational import of the protein into the organelle.

transit_peptide_region_of_CDS [SO_0002252]

CDS region corresponding to a transit peptide region of a polypeptide. Added as per request from GitHub Issue #484 (https://github.com/The-Sequence-Ontology/SO-Ontologies/issues/484)

transition [SO_1000009]

Change of a pyrimidine nucleotide, C or T, into another pyrimidine nucleotide, or change of a purine nucleotide, A or G, into another purine nucleotide.
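As a minimal sketch of this definition, a single-nucleotide change can be classified by checking whether both bases fall in the same chemical class. The function name is hypothetical, and "transversion" is used here for the complementary case (purine to pyrimidine or the reverse).

```python
PURINES = frozenset("AG")
PYRIMIDINES = frozenset("CT")


def classify_substitution(ref: str, alt: str) -> str:
    """Return 'transition' or 'transversion' for a single-base DNA change."""
    ref, alt = ref.upper(), alt.upper()
    bases = PURINES | PYRIMIDINES
    if ref not in bases or alt not in bases:
        raise ValueError(f"expected one of A, C, G, T; got {ref!r}>{alt!r}")
    if ref == alt:
        raise ValueError("ref and alt are identical; not a substitution")
    # A transition stays within purines or within pyrimidines.
    same_class = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
    return "transition" if same_class else "transversion"
```

For example, `classify_substitution("C", "T")` gives `"transition"`, while `classify_substitution("A", "C")` gives `"transversion"`.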

translated_nucleotide_match [SO_0000181]

A match against a translated sequence.

translated_processed_pseudogene [SO_0002105]

A processed pseudogene where there is evidence (mass spec data) suggesting that it is also translated. Term added as part of collaboration with Gencode, adding biotypes used in annotation.

translated_unprocessed_pseudogene [SO_0002106]

A non-processed pseudogene where there is evidence (mass spec data) suggesting that it is also translated. Term added as part of collaboration with Gencode, adding biotypes used in annotation.

translation_regulatory_region [SO_0001680]

A regulatory region that is involved in the control of the process of translation.

translational_product_function_variant [SO_0001539]

A sequence variant that affects the functioning of a translational product with respect to a reference sequence.

translational_product_level_variant [SO_0001553]

A functional variant that changes the translational product level with respect to a reference sequence.

translational_product_structure_variant [SO_0001598]

A sequence variant within the transcript that changes the structure of the translational product.

translationally_frameshifted [SO_0000887]

Recoding by frameshifting at a particular site.

translationally_regulated [SO_0000131]

An attribute describing a gene that is regulated as it is translated.

translationally_regulated_gene [SO_0000896]

A gene that is translationally regulated.

translocation [SO_0000199]

A region of nucleotide sequence that has translocated to a new position.

translocation_breakpoint [SO_0001413]

The point within a chromosome where a translocation begins or ends.

translocation_element [SO_0000686]

A chromosomal translocation whereby the chromosomes carrying non-homologous centromeres may be recovered independently. These chromosomes are described as translocation elements. This occurs for some translocations, particularly but not exclusively, reciprocal translocations.

translocaton_attribute [SO_0001520]

An attribute of a translocation, which is then a region of nucleotide sequence that has translocated to a new position. The observed adjacency of two previously separated regions.

transmembrane_helix [SO_0001812]

A region that traverses the lipid bilayer and adopts a helical secondary structure.

transmembrane_polypeptide_region [SO_0001077]

Polypeptide region traversing the lipid bilayer.

transposable_element_CDS [SO_0001896]

A CDS that is part of a transposable element.

transposable_element_flanking_region [SO_0000364]

The region of sequence surrounding a transposable element.

transposable_element_gene [SO_0000111]

A gene encoded within a transposable element. For example gag, int, env and pol are the transposable element genes of the TY element in yeast.

transposable_element_insertion_site [SO_0000368]

The junction in a genome where a transposable_element has inserted.

transposable_element_pseudogene [SO_0001897]

A pseudogene contained within a transposable element.

transposon_fragment [SO_0001054]

A portion of a transposon, interrupted by the insertion of another element.

trinucleotide_repeat_microsatellite_feature [SO_0000291]

A region of a repeating trinucleotide sequence (three bases).
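A minimal sketch of how such regions might be located with a regular expression backreference. The minimum copy number, the skipping of homopolymer units such as AAA, and the function name are illustrative assumptions, not SO criteria.

```python
import re
from typing import List, Tuple


def find_trinucleotide_repeats(seq: str, min_copies: int = 4) -> List[Tuple[int, int, str, int]]:
    """Return (start, end, unit, copies) for tandem runs of a 3-base unit.

    Coordinates are 0-based and half-open. Units made of a single base
    (e.g. AAA) are skipped, since those runs are mononucleotide repeats.
    """
    if min_copies < 2:
        raise ValueError("min_copies must be at least 2")
    # ([ACGT]{3}) captures one unit; \1{n,} requires n further exact copies.
    pattern = re.compile(r"([ACGT]{3})\1{%d,}" % (min_copies - 1))
    hits = []
    for m in pattern.finditer(seq.upper()):
        unit = m.group(1)
        if len(set(unit)) == 1:
            continue
        hits.append((m.start(), m.end(), unit, (m.end() - m.start()) // 3))
    return hits
```

For instance, `find_trinucleotide_repeats("TTCAGCAGCAGCAGTT")` reports a single CAG run of four copies starting at position 2.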

tRNA [SO_0000253]

Transfer RNA (tRNA) molecules are approximately 80 nucleotides in length. Their secondary structure includes four short double-helical elements and three loops (D, anti-codon, and T loops). Further hydrogen bonds mediate the characteristic L-shaped molecular structure. Transfer RNAs have two regions of fundamental functional importance: the anti-codon, which is responsible for specific mRNA codon recognition, and the 3’ end, to which the tRNA’s corresponding amino acid is attached (by aminoacyl-tRNA synthetases). Transfer RNAs cope with the degeneracy of the genetic code in two ways: having more than one tRNA (with a specific anti-codon) for a particular amino acid; and ‘wobble’ base-pairing, i.e. permitting non-standard base-pairing between the first (5’) anti-codon base and the third codon position. This term is mapped to MGED. Do not obsolete without consulting MGED ontology.
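The wobble mechanism can be sketched as a check of whether an anticodon can read a codon. This sketch assumes Crick’s classic wobble rules (5’ anticodon G reads C/U, U reads A/G, inosine I reads A/C/U, C reads G, A reads U) and ignores other modified bases; the function name is hypothetical. Both sequences are written 5’ to 3’ in the RNA alphabet, so pairing is antiparallel.

```python
# Strict pairs allowed at the first two codon positions: (anticodon base, codon base).
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}

# Crick's wobble rules: 5' anticodon base -> third codon bases it can read.
WOBBLE = {"G": {"C", "U"}, "U": {"A", "G"}, "I": {"A", "C", "U"}, "C": {"G"}, "A": {"U"}}


def anticodon_reads(anticodon: str, codon: str) -> bool:
    """True if an anticodon (5'->3', RNA, 'I' = inosine) can decode a codon (5'->3', RNA)."""
    anticodon, codon = anticodon.upper(), codon.upper()
    if len(anticodon) != 3 or len(codon) != 3:
        raise ValueError("anticodon and codon must each be three bases")
    # Antiparallel: anticodon 3' base pairs with codon base 1, middle with middle.
    if (anticodon[2], codon[0]) not in WATSON_CRICK:
        return False
    if (anticodon[1], codon[1]) not in WATSON_CRICK:
        return False
    # The 5' anticodon base pairs with codon base 3 and may wobble.
    return codon[2] in WOBBLE.get(anticodon[0], set())
```

Under these rules one tyrosine anticodon, GUA, reads both UAU and UAC, and an inosine-containing valine anticodon, IAC, reads GUU, GUC and GUA but not GUG.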

tRNA_encoding [SO_0000663]

A region that can be transcribed into a transfer RNA (tRNA).

tRNA_gene [SO_0001272]

A noncoding RNA that binds to a specific amino acid to allow that amino acid to be used by the ribosome during translation of RNA. Moved from ncRNA_gene to sncRNA_gene 27 April 2021 to be more consistent with the organization of the ncRNA branch of SO. Requested by FlyBase, moved by Dave Sant. See GitHub Issue #514.

tRNA_intron [SO_1001272]

An intron found in tRNA that is spliced via endonucleolytic cleavage and ligation rather than transesterification. Could be a cross product with Gene ontology, GO:0006388.

tRNA_primary_transcript [SO_0000210]

A primary transcript encoding a transfer RNA (SO:0000253).

tRNA_region [SO_0001172]

A region of a tRNA.

tryptophan [SO_0001440]

A non-polar, hydrophobic amino acid encoded by the codon TGG. A placeholder for a cross product with ChEBI.

tryptophan_tRNA_primary_transcript [SO_0000228]

A primary transcript encoding tryptophanyl tRNA (SO:0000271).

tryptophanyl_tRNA [SO_0000271]

A tRNA sequence that has a tryptophan anticodon, and a 3’ tryptophan binding region.

TSS [SO_0000315]

The first base where RNA polymerase begins to synthesize the RNA transcript. Added relationship is_a SO:0002309 core_promoter_element with the creation of core_promoter_element as part of GREEKC initiative August 2020 - Dave Sant.

two_methyladenosine [SO_0001296]

2_methyladenosine is a modified adenosine.

two_methylthio_N6_cis_hydroxyisopentenyl_adenosine [SO_0001303]

2_methylthio_N6_cis_hydroxyisopentenyl_adenosine is a modified adenosine.

two_methylthio_N6_hydroxynorvalyl_carbamoyladenosine [SO_0001309]

2_methylthio_N6_hydroxynorvalyl_carbamoyladenosine is a modified adenosine.

two_methylthio_N6_isopentenyladenosine [SO_0001301]

2_methylthio_N6_isopentenyladenosine is a modified adenosine.

two_methylthio_N6_methyladenosine [SO_0001299]

2_methylthio_N6_methyladenosine is a modified adenosine.

two_methylthio_N6_threonyl_carbamoyladenosine [SO_0001306]

2_methylthio_N6_threonyl_carbamoyladenosine is a modified adenosine.

two_prime_O_methyladenosine [SO_0001298]

2prime_O_methyladenosine is a modified adenosine.

two_prime_O_methylcytidine [SO_0001283]

2’-O-methylcytidine is a modified cytidine.

two_prime_O_methylguanosine [SO_0001327]

2prime_O_methylguanosine is a modified guanosine base feature.

two_prime_O_methylinosine [SO_0001280]

2’-O-methylinosine is a modified inosine.

two_prime_O_methylpseudouridine [SO_0001348]

2prime_O_methylpseudouridine is a modified uridine base feature.

two_prime_O_methyluridine [SO_0001345]

2prime_O_methyluridine is a modified uridine base feature.

two_prime_O_ribosyladenosine_phosphate [SO_0001310]

2prime_O_ribosyladenosine_phosphate is a modified adenosine.

two_prime_O_ribosylguanosine_phosphate [SO_0001331]

2prime_O_ribosylguanosine_phosphate is a modified guanosine base feature.

two_thio_two_prime_O_methyluridine [SO_0001352]

2_thio_2prime_O_methyluridine is a modified uridine base feature.

two_thiocytidine [SO_0001284]

2-thiocytidine is a modified cytidine.

two_thiouridine [SO_0001349]

2_thiouridine is a modified uridine base feature.

tyrosine [SO_0001446]

A polar, hydrophilic amino acid encoded by the codons TAT and TAC. A placeholder for a cross product with ChEBI.

tyrosine_tRNA_primary_transcript [SO_0000229]

A primary transcript encoding tyrosyl tRNA (SO:0000272).

tyrosyl_tRNA [SO_0000272]

A tRNA sequence that has a tyrosine anticodon, and a 3’ tyrosine binding region.

U_box [SO_0001788]

A U-box is a conserved T-rich region upstream of a retroviral polypurine tract that is involved in PPT primer creation during reverse transcription.

U12_intron [SO_0000295]

A type of spliceosomal intron spliced by the U12 spliceosome, which includes U11, U12, U4atac/U6atac and U5 snRNAs. May have either GT-AG or AT-AC 5’ and 3’ boundaries.

U14_snoRNA [SO_0000403]

U14 small nucleolar RNA (U14 snoRNA) is required for early cleavages of eukaryotic precursor rRNAs. In yeasts, this molecule possesses a stem-loop region (known as the Y-domain) which is essential for function. A similar structure, but with a different consensus sequence, is found in plants, but is absent in vertebrates. An evolutionarily conserved eukaryotic low molecular weight RNA capable of intermolecular hybridization with both homologous and heterologous 18S rRNA.

U14_snoRNA_primary_transcript [SO_0005837]

The primary transcript of an evolutionarily conserved eukaryotic low molecular weight RNA capable of intermolecular hybridization with both homologous and heterologous 18S rRNA.

U2_intron [SO_0000184]

A major type of spliceosomal intron spliced by the U2 spliceosome, which includes U1, U2, U4/U6 and U5 snRNAs. May have either GT-AG or AT-AC 5’ and 3’ boundaries.
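A minimal sketch that reports an intron’s terminal dinucleotides and whether they fall in the two canonical classes, GT-AG and AT-AC. The function name and the two-class set are assumptions for illustration. Because both U2-type and U12-type introns can use either class, the dinucleotides alone do not identify which spliceosome removes an intron.

```python
from typing import Tuple

# The two boundary classes assumed here to be canonical.
CANONICAL_BOUNDARIES = {"GT-AG", "AT-AC"}


def intron_boundary(intron: str) -> Tuple[str, bool]:
    """Return ('XX-YY', is_canonical) for a DNA intron sequence written 5'->3'."""
    s = intron.upper()
    if len(s) < 4:
        raise ValueError("intron too short to carry two boundary dinucleotides")
    pair = f"{s[:2]}-{s[-2:]}"
    return pair, pair in CANONICAL_BOUNDARIES
```

For example, `intron_boundary("GTAAGTTTTCAG")` returns `("GT-AG", True)`, and any other pair is reported as-is with `False`.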

U3_five_prime_LTR_region [SO_0000429]

The U3 segment of the five-prime long terminal repeat.

U3_LTR_region [SO_0000424]

The U3 segment of the long terminal repeats.

U3_snoRNA [SO_0001179]

U3 snoRNA is a member of the box C/D class of small nucleolar RNAs. The U3 snoRNA secondary structure is characterised by a small 5’ domain (with boxes A and A’), and a larger 3’ domain (with boxes B, C, C’, and D), the two domains being linked by a single-stranded hinge. Boxes B and C form the B/C motif, which appears to be exclusive to U3 snoRNAs, and boxes C’ and D form the C’/D motif. The latter is functionally similar to the C/D motifs found in other snoRNAs. The 5’ domain and the hinge region act as a pre-rRNA-binding domain. The 3’ domain has conserved protein-binding sites. Both the box B/C and box C’/D motifs are sufficient for nuclear retention of U3 snoRNA. The box C’/D motif is also necessary for nucleolar localization, stability and hypermethylation of U3 snoRNA. Both box B/C and C’/D motifs are involved in specific protein interactions and are necessary for the rRNA processing functions of U3 snoRNA. The definition is most of the old definition for snoRNA (SO:0000275).

U3_three_prime_LTR_region [SO_0000431]

The U3 segment of the three-prime long terminal repeat.

U4_snRNA [SO_0000393]

U4 small nuclear RNA (U4 snRNA) is a component of the major U2-dependent spliceosome. It forms a duplex with U6, and with each splicing round, it is displaced from U6 (and the spliceosome) in an ATP-dependent manner, allowing U6 to refold and create the active site for splicing catalysis. A recycling process involving protein Prp24 re-anneals U4 and U6.

U4atac_snRNA [SO_0000394]

An snRNA required for the splicing of the minor U12-dependent class of eukaryotic nuclear introns. It forms a base paired complex with U6atac_snRNA (SO:0000397).

U5_five_prime_LTR_region [SO_0000428]

The U5 segment of the five-prime long terminal repeat.

U5_LTR_region [SO_0000422]

The U5 segment of the long terminal repeats.

U5_three_prime_LTR_region [SO_0000432]

The U5 segment of the three-prime long terminal repeat.

U6_snRNA [SO_0000396]

U6 snRNA is a component of the spliceosome which is involved in splicing pre-mRNA. The putative secondary structure consensus base pairing is confined to a short 5’ stem loop, but U6 snRNA is thought to form extensive base-pair interactions with U4 snRNA.

U6atac_snRNA [SO_0000397]

U6atac_snRNA is an snRNA required for the splicing of the minor U12-dependent class of eukaryotic nuclear introns. It forms a base paired complex with U4atac_snRNA (SO:0000394).

UAA_stop_codon_signal [SO_1001283]

A stop codon signal for a UAA stop codon redefinition.

UAG_stop_codon_signal [SO_1001282]

A stop codon signal for a UAG stop codon redefinition.

UGA_stop_codon_signal [SO_1001285]

A stop codon signal for a UGA stop codon redefinition.

ultracontig [SO_0000719]

An ordered and oriented set of scaffolds based on somewhat weaker sets of inferential evidence such as one set of mate pair reads together with supporting evidence from ESTs or location of markers from SNP or microsatellite maps, or cytogenetic localization of contained markers.

unassigned_supercontig [SO_0001875]

A supercontig that has not been assigned to any ultracontig during a genome assembly project. Requested by Bayer Cropscience January, 2012.

uncharacterized_chromosomal_mutation [SO_1000170]

A chromosome structure variant that has not been characterized.

unconfirmed_transcript [SO_0002139]

This is used for non-spliced EST clusters that have polyA features. This category has been specifically created for the ENCODE project to highlight regions that could indicate the presence of protein coding genes that require experimental validation, either by 5’ RACE or RT-PCR to extend the transcripts, or by confirming expression of the putatively-encoded peptide with specific antibodies.

undermodified_hydroxywybutosine [SO_0001335]

Undermodified_hydroxywybutosine is a modified guanosine base feature.

unedited_region [SO_0000607]

The region of an edited transcript that will not be edited.

unidirectional_gene_fusion [SO_0002085]

A sequence variant whereby two genes on the same strand have become joined. Requested by SNPEFF team. Feb 2016.

unigene_cluster [SO_0001458]

A kind of transcribed_cluster defined by a set of transcribed sequences from a unique gene. This term was requested by Jeff Bowes, using the tracker, ID = 2594157.

uninverted_insertional_duplication [SO_1000152]

An insertional duplication where a copy of the segment between the first two breaks listed is inserted at the third break; the insertion is in cytologically the same orientation as its flanking segments.

uninverted_interchromosomal_transposition [SO_1000157]

An interchromosomal transposition where the segment between the first two breaks listed is removed and inserted at the third break; the insertion is in cytologically the same orientation as its flanking segments.

uninverted_intrachromosomal_transposition [SO_1000159]

An intrachromosomal transposition whereby the segment between the first two breaks listed is removed and inserted at the third break; the insertion is in cytologically the same orientation as its flanking segments.
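The three-break description can be sketched on a plain sequence string: the segment between the first two breaks is cut out and re-inserted, unchanged in orientation, at the third break. This assumes 0-based break coordinates on the original sequence and a third break outside the moved segment; the function name is hypothetical. An inverted transposition would instead insert the reverse complement of the segment.

```python
def uninverted_intrachromosomal_transposition(seq: str, b1: int, b2: int, b3: int) -> str:
    """Move seq[b1:b2] to break b3 (0-based, original coordinates), keeping orientation."""
    if not (0 <= b1 < b2 <= len(seq)) or not (0 <= b3 <= len(seq)):
        raise ValueError("breaks out of range")
    if b1 < b3 < b2:
        raise ValueError("the third break must lie outside the moved segment")
    segment = seq[b1:b2]
    remainder = seq[:b1] + seq[b2:]
    # Translate b3 into coordinates on the sequence with the segment removed.
    insert_at = b3 if b3 <= b1 else b3 - (b2 - b1)
    return remainder[:insert_at] + segment + remainder[insert_at:]
```

For example, moving `CCCC` (breaks 4 and 8) in `AAAACCCCGGGGTTTT` to break 12 gives `AAAAGGGGCCCCTTTT`.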

unique_variant [SO_0001764]

A physical quality which inheres to the variant by virtue of the number of instances of the variant within a population.

unit_of_gene_expression [SO_0002300]

Transcription units or transcribed coding sequences. Added as per Mejia-Almonte et al. PMID:32665585

unitary_pseudogene [SO_0001759]

A pseudogene, deactivated from its original state by mutation and fixed in a population, where the ortholog in a reference species such as mouse remains functional. This is different from a non-processed pseudogene because the gene was not duplicated. An example is the L-gulono-lactone oxidase pseudogene in primates.

unoriented_insertional_duplication [SO_1000160]

An insertional duplication where a copy of the segment between the first two breaks listed is inserted at the third break; the orientation of the insertion with respect to its flanking segments is not recorded. Flag - unknown in the definition.

unoriented_interchromosomal_transposition [SO_1000161]

An interchromosomal transposition whereby the segment between the first two breaks listed is removed and inserted at the third break; the orientation of the insertion with respect to its flanking segments is not recorded. FLAG - term describes an unknown.

unoriented_intrachromosomal_transposition [SO_1000162]

An intrachromosomal transposition whereby the segment between the first two breaks listed is removed and inserted at the third break; the orientation of the insertion with respect to its flanking segments is not recorded. FLAG - definition describes an unknown.

untranslated_region_polycistronic_mRNA [SO_0000242]

The untranslated sequence separating the ‘cistrons’ of multicistronic mRNA.

uORF [SO_0002027]

A short open reading frame that is found in the 5’ untranslated region of an mRNA and plays a role in translational regulation.
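A minimal sketch of locating candidate uORFs from sequence alone: scan the 5’ UTR for AUG codons and read in frame until the first stop, which may fall inside the main CDS. It assumes AUG-only starts and a known main CDS start offset; the function name is hypothetical. Whether a candidate actually plays a role in translational regulation cannot be inferred from the sequence.

```python
from typing import List, Tuple

STOP_CODONS = {"UAA", "UAG", "UGA"}


def find_uorfs(mrna: str, cds_start: int) -> List[Tuple[int, int]]:
    """Return (start, end) of AUG-initiated ORFs that start upstream of the main CDS.

    Coordinates are 0-based, half-open, and 'end' includes the stop codon.
    Candidates with no in-frame stop before the end of the mRNA are skipped.
    """
    mrna = mrna.upper().replace("T", "U")
    orfs = []
    # Only AUGs lying entirely before the main start codon are considered.
    for i in range(cds_start - 2):
        if mrna[i:i + 3] != "AUG":
            continue
        for j in range(i + 3, len(mrna) - 2, 3):
            if mrna[j:j + 3] in STOP_CODONS:
                orfs.append((i, j + 3))
                break
    return orfs
```

With a 13-base 5’ UTR `GGATGCCCTAAGG` followed by `ATGAAATAG`, the sketch reports one uORF spanning positions 2 to 11.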

UPD [SO_0001744]

Uniparental disomy is a sequence_alteration where a diploid individual receives two copies of all or part of a chromosome from one parent and no copies of the same chromosome or region from the other parent.

upstream_AUG_codon [SO_0000630]

A start codon upstream of the ORF.

upstream_gene_variant [SO_0001631]

A sequence variant located 5’ of a gene. Different groups annotate up and downstream to different lengths. The subtypes are specific and are backed up with cross references.
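Because the flanking distance varies between groups, a classifier needs the window as an explicit parameter and must respect strand, since 5’ lies at the higher coordinate on the minus strand. A minimal sketch, assuming 1-based inclusive gene coordinates and a caller-chosen window; the function name and signature are hypothetical.

```python
from typing import Optional


def classify_flank(pos: int, gene_start: int, gene_end: int, strand: str, window: int) -> Optional[str]:
    """Classify a variant position relative to a gene (1-based, inclusive coordinates).

    Returns 'upstream_gene_variant', 'downstream_gene_variant', or None when the
    position is inside the gene or farther than `window` bases from it.
    """
    if strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")
    if gene_start <= pos <= gene_end:
        return None
    left_of_gene = pos < gene_start
    distance = gene_start - pos if left_of_gene else pos - gene_end
    if distance > window:
        return None
    # On the plus strand 5' is to the left; on the minus strand it is to the right.
    is_upstream = left_of_gene if strand == "+" else not left_of_gene
    return "upstream_gene_variant" if is_upstream else "downstream_gene_variant"
```

For a plus-strand gene at 1000-2000 and a 500-base window, position 900 is upstream and 2100 is downstream; on the minus strand the labels swap.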

upstream_transcript_variant [SO_0001986]

A feature variant, where the alteration occurs upstream of the transcript TSS. Requested by Graham Ritchie, EBI/Sanger.

uridine_five_oxyacetic_acid [SO_0001356]

Uridine_5_oxyacetic_acid is a modified uridine base feature.

uridine_five_oxyacetic_acid_methyl_ester [SO_0001357]

Uridine_5_oxyacetic_acid_methyl_ester is a modified uridine base feature.

UST [SO_0001464]

An EST spanning part or all of the untranslated regions of a protein-coding transcript.

UST_match [SO_0001470]

A match against a UST sequence.

UTR [SO_0000203]

Messenger RNA sequences that are untranslated and lie five prime or three prime to sequences which are translated.

UTR_intron [SO_0000446]

Intron located in the untranslated region.

UTR_region [SO_0000837]

A region of UTR. This term is a grouping term to allow the parts of UTR to have an is_a path to the root.

UTR_variant [SO_0001622]

A transcript variant that is located within the UTR.

V_cluster [SO_0000526]

Genomic DNA of immunoglobulin/T-cell receptor gene in germline configuration including more than one V-gene.

V_D_DJ_C_cluster [SO_0000527]

Genomic DNA of immunoglobulin/T-cell receptor gene in rearranged configuration including at least one V-gene, one D-gene, one DJ-gene and one C-gene.

V_D_DJ_cluster [SO_0000528]

Genomic DNA of immunoglobulin/T-cell receptor gene in rearranged configuration including at least one V-gene, one D-gene and one DJ-gene.

V_D_DJ_J_C_cluster [SO_0000529]

Genomic DNA of immunoglobulin/T-cell receptor gene in rearranged configuration including at least one V-gene, one D-gene, one DJ-gene, one J-gene and one C-gene.

V_D_DJ_J_cluster [SO_0000530]

Genomic DNA of immunoglobulin/T-cell receptor gene in rearranged configuration including at least one V-gene, one D-gene, one DJ-gene and one J-gene.

V_D_J_C_cluster [SO_0000531]

Genomic DNA of immunoglobulin/T-cell receptor gene in germline configuration including at least one V-gene, one D-gene, one J-gene and one C-gene.

V_D_J_cluster [SO_0000532]

Genomic DNA of immunoglobulin/T-cell receptor gene in germline configuration including at least one V-gene, one D-gene and one J-gene.

V_DJ_C_cluster [SO_0000542]

Genomic DNA of immunoglobulin/T-cell receptor gene in rearranged configuration including at least one V-gene, one DJ-gene and one C-gene.

V_DJ_cluster [SO_0000518]

Genomic DNA of immunoglobulin/T-cell receptor gene in rearranged configuration including at least one V-gene and one DJ-gene.

V_DJ_J_C_cluster [SO_0000564]

Genomic DNA of immunoglobulin/T-cell receptor gene in rearranged configuration including at least one V-gene, one DJ-gene, one J-gene and one C-gene.

V_DJ_J_cluster [SO_0000519]

Genomic DNA of immunoglobulin/T-cell receptor gene in rearranged configuration including at least one V-gene, one DJ-gene and one J-gene.

V_gene_recombination_feature [SO_0000538]

Recombination signal including V-heptamer, V-spacer and V-nonamer in 3’ of V-region of a V-gene or V-sequence of an immunoglobulin/T-cell receptor gene.

V_gene_segment [SO_0000466]

Germline genomic DNA including L-part1, V-intron and V-exon, with the 5’ UTR and 3’ UTR.

V_heptamer [SO_0000533]

7 nucleotide recombination site (e.g. CACAGTG), part of V-gene recombination feature of an immunoglobulin/T-cell receptor gene.

V_J_C_cluster [SO_0000535]

Genomic DNA of immunoglobulin/T-cell receptor gene in germline configuration including at least one V-gene, one J-gene and one C-gene.

V_J_cluster [SO_0000534]

Genomic DNA of immunoglobulin/T-cell receptor gene in germline configuration including at least one V-gene and one J-gene.

V_nonamer [SO_0000536]

9 nucleotide recombination site (e.g. ACAAAAACC), part of V-gene recombination feature of an immunoglobulin/T-cell receptor gene.

V_region [SO_0001833]

The variable region of an immunoglobulin polypeptide sequence.

V_spacer [SO_0000537]

12 or 23 nucleotide spacer between the V-heptamer and the V-nonamer of a V-gene recombination feature of an immunoglobulin/T-cell receptor gene.
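The heptamer, spacer and nonamer definitions combine into a simple structural check of a V-gene recombination signal. This is a minimal sketch, assuming the example heptamer CACAGTG and nonamer ACAAAAACC quoted in the definitions above as reference sequences and an arbitrary mismatch tolerance; real signals vary, and the function names are hypothetical.

```python
from typing import Optional

HEPTAMER = "CACAGTG"   # example V-heptamer from the definition above
NONAMER = "ACAAAAACC"  # example V-nonamer from the definition above


def mismatches(a: str, b: str) -> int:
    """Count position-wise differences between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def parse_v_rss(rss: str, max_mismatches: int = 1) -> Optional[int]:
    """Check a heptamer + spacer + nonamer signal (5'->3'); return spacer length or None."""
    rss = rss.upper()
    spacer_len = len(rss) - len(HEPTAMER) - len(NONAMER)
    # The spacer must be 12 or 23 nucleotides long.
    if spacer_len not in (12, 23):
        return None
    heptamer, nonamer = rss[:len(HEPTAMER)], rss[-len(NONAMER):]
    if mismatches(heptamer, HEPTAMER) > max_mismatches:
        return None
    if mismatches(nonamer, NONAMER) > max_mismatches:
        return None
    return spacer_len
```

A 28-base signal with a 12-base spacer returns 12, a 39-base signal with a 23-base spacer returns 23, and any other spacer length returns None.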

V_VDJ_C_cluster [SO_0000520]

Genomic DNA of immunoglobulin/T-cell receptor gene in rearranged configuration including at least one V-gene, one VDJ-gene and one C-gene.

V_VDJ_cluster [SO_0000521]

Genomic DNA of immunoglobulin/T-cell receptor gene in rearranged configuration including at least one V-gene and one VDJ-gene.

V_VDJ_J_C_cluster [SO_0000565]

Genomic DNA of immunoglobulin/T-cell receptor gene in rearranged configuration including at least one V-gene, one VDJ-gene, one J-gene and one C-gene.

V_VDJ_J_cluster [SO_0000522]

Genomic DNA of immunoglobulin/T-cell receptor gene in rearranged configuration including at least one V-gene, one VDJ-gene and one J-gene.

V_VJ_C_cluster [SO_0000523]

Genomic DNA of immunoglobulin/T-cell receptor gene in rearranged configuration including at least one V-gene, one VJ-gene and one C-gene.

V_VJ_cluster [SO_0000524]

Genomic DNA of immunoglobulin/T-cell receptor gene in rearranged configuration including at least one V-gene and one VJ-gene.

V_VJ_J_C_cluster [SO_0000566]

Genomic DNA of immunoglobulin/T-cell receptor gene in rearranged configuration including at least one V-gene, one VJ-gene, one J-gene and one C-gene.

V_VJ_J_cluster [SO_0000525]

Genomic DNA of immunoglobulin/T-cell receptor gene in rearranged configuration including at least one V-gene, one VJ-gene and one J-gene.

vacuolar_sorting_signal [SO_0001813]

A polypeptide region that targets a polypeptide to the vacuole.

validated [SO_0000789]

An attribute to describe a feature that has been proven.

validated_cDNA_clone [SO_0000808]

A cDNA clone that has been validated.

valine [SO_0001436]

A non-polar, hydrophobic amino acid encoded by the codons GTN (GTT, GTC, GTA and GTG). A placeholder for a cross product with ChEBI.
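The shorthand GTN uses the IUPAC ambiguity code N (any base). A minimal sketch of expanding such codon patterns into concrete codons; the function name is hypothetical.

```python
from itertools import product
from typing import List

# IUPAC nucleotide ambiguity codes (DNA alphabet).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def expand_codon(pattern: str) -> List[str]:
    """Expand an IUPAC codon pattern such as 'GTN' into all concrete codons."""
    return ["".join(bases) for bases in product(*(IUPAC[b] for b in pattern.upper()))]
```

`expand_codon("GTN")` yields the four valine codons GTA, GTC, GTG and GTT, and `expand_codon("TAY")` yields the two tyrosine codons TAC and TAT.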

valine_tRNA_primary_transcript [SO_0000230]

A primary transcript encoding valyl tRNA (SO:0000273).

valyl_tRNA [SO_0000273]

A tRNA sequence that has a valine anticodon, and a 3’ valine binding region.

variant_collection [SO_0001507]

A collection of one or more sequences of an individual.

variant_frequency [SO_0001763]

A physical quality which inheres to the variant by virtue of the number of instances of the variant within a population.

variant_genome [SO_0001506]

A collection of sequences (often chromosomes) of an individual.

variant_origin [SO_0001762]

A quality inhering in a variant by virtue of its origin.

variant_phenotype [SO_0001769]

A quality inhering in a variant by virtue of its phenotype.

variant_quality [SO_0001761]

A dependent entity that inheres in a bearer, a sequence variant.

VAT [SO_0001537]

A sequence variant that changes one or more sequence features.

VAT [SO_0001821]

An inframe non-synonymous variant that inserts bases into the coding sequence.

vaultRNA_primary_transcript [SO_0002040]

A primary transcript encoding a vaultRNA.

VD_gene_segment [SO_0000510]

Genomic DNA of immunoglobulin/T-cell receptor gene in partially rearranged genomic DNA including L-part1, V-intron and V-D-exon, with the 5’ UTR (SO:0000204) and 3’ UTR (SO:0000205).

VDJ_C_cluster [SO_0000541]

Genomic DNA of immunoglobulin/T-cell receptor gene in rearranged configuration including at least one VDJ-gene and one C-gene.

VDJ_gene_segment [SO_0000574]

Rearranged genomic DNA of immunoglobulin/T-cell receptor gene including L-part1, V-intron and V-D-J-exon, with the 5’UTR (SO:0000204) and 3’UTR (SO:0000205).

VDJ_J_C_cluster [SO_0000487]

Genomic DNA of immunoglobulin/T-cell receptor gene in rearranged configuration including at least one VDJ-gene, one J-gene and one C-gene.

VDJ_J_cluster [SO_0000488]

Genomic DNA of immunoglobulin/T-cell receptor gene in rearranged configuration including at least one VDJ-gene and one J-gene.

vertebrate_immune_system_gene [SO_0002121]

The configuration of the IG and TR variable (V), diversity (D) and joining (J) germline genes before DNA rearrangements, with or without constant (C) genes in undefined configuration (germline, non-rearranged regions of the IG DNA loci). These terms have been requested by Adam Frankish to support Gencode and Vega biotypes. The terms are defined according to IMGT.

vertebrate_immune_system_gene_recombination_feature [SO_0000301]

A feature where recombination has occurred for the purpose of generating diversity in the immune system.

vertebrate_immune_system_gene_recombination_signal_feature [SO_0000939]

Feature used for the recombination of genomic material for the purpose of generating diversity of the immune system.

vertebrate_immune_system_gene_recombination_spacer [SO_0000563]

A 12 or 23 nucleotide spacer between two regions of an immunoglobulin/T-cell receptor gene that may be rearranged by recombinase.

vertebrate_immune_system_pseudogene [SO_0002097]

A pseudogene derived from a vertebrate immune system gene. These terms have been requested by Adam Frankish to support Gencode and Vega biotypes. The terms are defined according to IMGT.

vertebrate_immunoglobulin_T_cell_receptor_gene_cluster [SO_0000482]

Genomic DNA of immunoglobulin/T-cell receptor gene in germline configuration.

vertebrate_immunoglobulin_T_cell_receptor_rearranged_gene_cluster [SO_0000938]

Genomic DNA of immunoglobulin/T-cell receptor gene in rearranged configuration.

vertebrate_immunoglobulin_T_cell_receptor_rearranged_segment [SO_0000936]

Genomic DNA of immunoglobulin/T-cell receptor gene in partially rearranged genomic DNA.

vertebrate_immunoglobulin_T_cell_receptor_segment [SO_0000460]

Germline genomic DNA with the sequence for a V, D, C, or J portion of an immunoglobulin/T-cell receptor. I am using the term segment instead of gene here to avoid confusion with the region ‘gene’.

viral_promoter [SO_0002311]

A regulatory_region including the Transcription Start Site (TSS) of a viral gene.

viral_sequence [SO_0001041]

The region of nucleotide sequence of a virus, a submicroscopic particle that replicates by infecting a host cell. The definitions of the children of this term were revised December 2007 after discussion on song-devel. The resulting definitions are slightly unwieldy but hopefully more logically correct.

virtual_sequence [SO_0000499]

A continuous piece of sequence similar to the ‘virtual contig’ concept of the Ensembl database.

VJ_C_cluster [SO_0000489]

Genomic DNA of immunoglobulin/T-cell receptor gene in rearranged configuration including at least one VJ-gene and one C-gene.

VJ_gene_segment [SO_0000576]

Rearranged genomic DNA of immunoglobulin/T-cell receptor gene including L-part1, V-intron and V-J-exon, with the 5’UTR (SO:0000204) and 3’UTR (SO:0000205).

VJ_J_C_cluster [SO_0000490]

Genomic DNA of immunoglobulin/T-cell receptor gene in rearranged configuration including at least one VJ-gene, one J-gene and one C-gene.

VJ_J_cluster [SO_0000491]

Genomic DNA of immunoglobulin/T-cell receptor gene in rearranged configuration including at least one VJ-gene and one J-gene.

W_region [SO_0002024]

The leftmost segment of homology in the HML and MAT mating loci, but not present in HMR. Requested by Janos Demeter, SGD.

WC_base_pair [SO_0000029]

The canonical base pair, where two bases interact via WC edges, with glycosidic bonds oriented cis relative to the axis of orientation.

whole_genome_sequence_status [SO_0001499]

The status of whole genome sequence. This term and its children were added to SO in response to a tracker request by Patrick Chain. The paper ‘Genome Project Standards in a New Era of Sequencing’ (Science, October 9, 2009) addresses these terms.

wiki [SO_0000003]

G-quartets are unusual nucleic acid structures consisting of a planar arrangement where each guanine is hydrogen bonded by Hoogsteen pairing to another guanine in the quartet.

wiki [SO_0000005]

The many tandem repeats (identical or related) of a short basic repeating unit; many have a base composition or other property different from the genome average that allows them to be separated from the bulk (main band) genomic DNA.

wiki [SO_0000023]

The kink turn (K-turn) is an RNA structural motif that creates a sharp (~120 degree) bend between two continuous helices.

wiki [SO_0000037]

A DNA region that includes DNAse hypersensitive sites located near a gene that confers the high-level, position-independent, and copy number-dependent expression to that gene. Definition updated Nov 10 2020, Colin Logie from GREEKC helped us realize that LCRs can also be located 3’ to a gene.

wiki [SO_0000051]

A DNA sequence used experimentally to detect the presence or absence of a complementary nucleic acid.

wiki [SO_0000054]

A kind of chromosome variation where the chromosome complement is not an exact multiple of the haploid number.

wiki [SO_0000055]

A kind of chromosome variation where the chromosome complement is not an exact multiple of the haploid number as extra chromosomes are present.

wiki [SO_0000057]

A regulatory element of an operon to which activators or repressors bind, thereby affecting transcription of genes in that operon. Moved to transcriptional_cis_regulatory_region (SO:0001055) from gene_group_regulatory_region (SO:0000752) on 11 Feb 2021 when SO:0000752 was merged into SO:0001055. See GitHub Issue #529.

wiki [SO_0000077]

A region of sequence that is complementary to a sequence of messenger RNA.

wiki [SO_0000087]

A gene from nuclear sequence.

wiki [SO_0000088]

A gene located in mitochondrial sequence.

wiki [SO_0000101]

A transposon or insertion sequence. An element that can insert in a variety of DNA sequences.

wiki [SO_0000134]

Imprinted genes are epigenetically modified genes that are expressed monoallelically according to their parent of origin.

wiki [SO_0000141]

The sequence of DNA located at the end of the transcript that causes RNA polymerase to terminate transcription. Moved from transcription_regulatory_region (SO:0001679) to transcriptional_cis_regulatory_region (SO:0001055) by Dave Sant on Feb 11, 2021 when transcription_regulatory_region was merged into transcriptional_cis_regulatory_region to be consistent with GO and reduce redundancy as part of the GREEKC consortium. See GitHub Issue #527.

wiki [SO_0000147]

A region of the transcript sequence within a gene which is not removed from the primary RNA transcript by RNA splicing. This term is mapped to MGED. Do not obsolete without consulting MGED ontology.

wiki [SO_0000151]

A piece of DNA that has been inserted in a vector so that it can be propagated in a host bacterium or some other organism.

wiki [SO_0000157]

A plasmid which carries within its sequence a bacteriophage replication origin. When the host bacterium is infected with “helper” phage, a phagemid is replicated along with the phage DNA and packaged into phage capsids.

wiki [SO_0000158]

A cloning vector that utilizes the E. coli F factor. Birren BW et al. A human chromosome 22 fosmid resource: mapping and analysis of 96 clones. Genomics 1996.

wiki [SO_0000167]

A regulatory_region composed of the TSS(s) and binding sites for TF_complexes of the core transcription machinery. A region (DNA) to which RNA polymerase binds, to begin transcription. This term is mapped to MGED. Do not obsolete without consulting MGED ontology. The region on a DNA molecule involved in RNA polymerase binding to initiate transcription. Moved from is_a: SO:0001055 transcriptional_cis_regulatory_region as per request from GREEKC initiative in August 2020. Merged with RNA_polymerase_promoter (SO:0001203) Aug 2020. Moved up one level from is_a CRM (SO:0000727) to is_a transcriptional_cis_regulatory_region (SO:0001055) as part of the GREEKC work January 2021. Pascale Gaudet from Gene Ontology pointed out that CRM can be located upstream of the promoter and therefore cannot include the promoter.

wiki [SO_0000172]

Part of a conserved sequence located about 75-bp upstream of the start point of eukaryotic transcription units which may be involved in RNA polymerase binding; consensus=GG(C|T)CAATCT.

wiki [SO_0000174]

A conserved AT-rich septamer found about 25-bp before the start point of many eukaryotic RNA polymerase II transcript units; may be involved in positioning the enzyme for correct initiation; consensus=TATA(A|T)A(A|T). Binds TBP.
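As an illustration only, the consensus TATA(A|T)A(A|T) translates directly into a regular expression in which each (A|T) alternative becomes the character class [AT]. The sketch below assumes a plain DNA string, scans only the strand it is given, and reports every (possibly overlapping) 7-bp match; find_tata_boxes is a hypothetical helper, not part of SO. A consensus match by itself does not show that a site is a functional TATA box.

```python
import re

# TATA(A|T)A(A|T) from the definition; [AT] encodes "A or T".
# The lookahead (?=...) lets overlapping matches be reported.
_TATA = re.compile(r"(?=(TATA[AT]A[AT]))")


def find_tata_boxes(seq: str) -> list[tuple[int, str]]:
    """Return (0-based start, matched 7-mer) for each consensus match in seq."""
    return [(m.start(), m.group(1)) for m in _TATA.finditer(seq.upper())]


print(find_tata_boxes("gctataaaaggc"))  # [(2, 'TATAAAA')]
```

The same approach applies to the CAAT consensus GG(C|T)CAATCT given above, with (C|T) written as [CT].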

wiki [SO_0000175]

A conserved region about 10-bp upstream of the start point of bacterial transcription units which may be involved in binding RNA polymerase; consensus=TAtAaT. This region is associated with sigma factor 70. Changed from is_a SO:0000713 DNA_motif to is_a SO:0002312 core_prokaryotic_promoter_element by Dave Sant in Aug 2020 in response to the GREEKC Initiative. Changed from is_a SO:0002312 core_prokaryotic_promoter_element back to is_a SO:0000713 DNA_motif to be consistent with minus_12_signal and minus_24_signal on 12 July 2021.
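Bacterial -10 elements frequently deviate from the consensus, so an exact pattern match misses many of them. A minimal sketch, assuming the consensus TAtAaT is compared case-insensitively as TATAAT and that windows with at most one mismatch are accepted; the one-mismatch threshold and the function name find_minus_10_candidates are illustrative choices, not SO rules.

```python
CONSENSUS = "TATAAT"  # TAtAaT from the definition, compared case-insensitively


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def find_minus_10_candidates(seq: str, max_mismatches: int = 1) -> list[tuple[int, str, int]]:
    """Return (0-based start, window, mismatches) for windows close to the consensus."""
    seq = seq.upper()
    k = len(CONSENSUS)
    hits = []
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        d = hamming(window, CONSENSUS)
        if d <= max_mismatches:
            hits.append((i, window, d))
    return hits


print(find_minus_10_candidates("GGTATGATCC"))  # [(2, 'TATGAT', 1)]
```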

wiki [SO_0000178]

The DNA region of a group of adjacent genes whose transcription is coordinated on one or several mutually overlapping transcription units transcribed in the same direction and sharing at least one gene. This term is mapped to MGED. Do not obsolete without consulting MGED ontology. Definition updated on Aug 5 2020 per Mejia-Almonte et al., Redefining fundamental concepts of transcription initiation in prokaryotes.

wiki [SO_0000180]

A transposable element that is incorporated into a chromosome by a mechanism that requires reverse transcriptase.

wiki [SO_0000193]

A DNA fragment used as a reagent to detect polymorphic genomic loci by hybridizing against genomic DNA digested with a given restriction enzyme.

wiki [SO_0000204]

A region at the 5’ end of a mature transcript (preceding the initiation codon) that is not translated into a protein.

wiki [SO_0000206]

A repetitive element, a few hundred base pairs long, that is dispersed throughout the genome. A common human SINE is the Alu element.

wiki [SO_0000233]

A transcript which has undergone the necessary modifications, if any, for its function. In eukaryotes this includes, for example, processing of introns, cleavage, base modification, and modifications to the 5’ and/or the 3’ ends, other than addition of bases. In bacteria functional mRNAs are usually not modified. A processed transcript cannot contain introns.

wiki [SO_0000234]

Messenger RNA is the intermediate molecule between DNA and protein. It includes UTR and coding sequences. An mRNA does not contain introns, as it is a processed_transcript. The equivalent kind of primary_transcript is protein_coding_primary_transcript (SO:0000120), which may contain introns. This term is mapped to MGED. Do not obsolete without consulting MGED ontology.

wiki [SO_0000274]

A small nuclear RNA molecule involved in pre-mRNA splicing and processing. This term is mapped to MGED. Do not obsolete without consulting MGED ontology.

wiki [SO_0000275]

A snoRNA (small nucleolar RNA) is any one of a class of small RNAs that are associated with the eukaryotic nucleus as components of small nucleolar ribonucleoproteins. They participate in the processing or modifications of many RNAs, mostly ribosomal RNAs (rRNAs) though snoRNAs are also known to target other classes of RNA, including spliceosomal RNAs, tRNAs, and mRNAs via a stretch of sequence that is complementary to a sequence in the targeted RNA.

wiki [SO_0000276]

Small, ~22-nt RNA molecule that is the endogenous transcript of a miRNA gene (or the product of other non-coding RNA genes). Micro RNAs are produced from precursor molecules (SO:0000647) that can form local hairpin structures, which ordinarily are processed (usually via the Dicer pathway) such that a single miRNA molecule accumulates from one arm of a hairpin precursor molecule. Micro RNAs may trigger the cleavage of their target molecules or act as translational repressors.

wiki [SO_0000287]

A gene that is a fusion.

wiki [SO_0000289]

A repeat_region containing repeat_units of 2 to 10 bp repeated in tandem. A defined feature that includes any type of VNTR or SSLP locus.
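A minimal sketch of how perfect microsatellites matching this definition (a 2 to 10 bp unit repeated in tandem) could be located with a regular-expression backreference. The minimum of three copies and the helper name find_microsatellites are illustrative assumptions; SO does not set a copy-number threshold. Interrupted or imperfect repeats are not detected.

```python
import re

MIN_COPIES = 3  # illustrative threshold; SO does not set a minimum copy number
# A 2-10 bp unit followed by at least MIN_COPIES - 1 exact tandem copies of itself.
# At a given start position the lazy {2,10}? tries the shortest unit first (AC before ACAC).
_TANDEM = re.compile(r"([ACGT]{2,10}?)\1{%d,}" % (MIN_COPIES - 1))


def find_microsatellites(seq: str) -> list[tuple[int, str, int]]:
    """Return (0-based start, repeat unit, copy number) for perfect tandem repeats."""
    hits = []
    for m in _TANDEM.finditer(seq.upper()):
        unit = m.group(1)
        if len(set(unit)) == 1:  # skip homopolymer runs such as AAAAAA
            continue
        hits.append((m.start(), unit, len(m.group(0)) // len(unit)))
    return hits


print(find_microsatellites("GGACACACACGG"))  # [(2, 'AC', 4)]
```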

wiki [SO_0000296]

A region of nucleic acid from which replication initiates; includes sequences that are recognized by replication proteins, the site from which the first separation of complementary strands occurs, and specific replication start sites.

wiki [SO_0000297]

Displacement loop; a region within mitochondrial DNA in which a short stretch of RNA is paired with one strand of DNA, displacing the original partner DNA strand in this region; also used to describe the displacement of a region of one strand of duplex DNA by a single stranded invader in the reaction catalyzed by RecA protein. Moved from is_a: SO:0000296 origin_of_replication to is_a: SO:0001411 biological_region after Terrence Murphy (INSDC) pointed out that the D loop can also refer to a loop in DNA repair, which is not an origin of replication. See GitHub Issue #417 (https://github.com/The-Sequence-Ontology/SO-Ontologies/issues/417)

wiki [SO_0000307]

Regions of a few hundred to a few thousand bases in vertebrate genomes that are relatively GC and CpG rich; they are typically unmethylated and often found near the 5’ ends of genes.

wiki [SO_0000314]

A repeat where the same sequence is repeated in the same direction. Example: GCTGA-followed by-GCTGA.

wiki [SO_0000318]

First codon to be translated by a ribosome.

wiki [SO_0000340]

Structural unit composed of a nucleic acid molecule which controls its own replication through the interaction of specific proteins at one or more origins of replication. A complete chromosome sequence. This term is mapped to MGED. Do not obsolete without consulting MGED ontology.

wiki [SO_0000359]

An attribute describing sequence that is flanked by Lox-P sites.

wiki [SO_0000365]

A region encoding an integrase which acts at a site adjacent to it (attI_site) to insert DNA which must include but is not limited to an attC_site.

wiki [SO_0000374]

An RNA with catalytic activity.

wiki [SO_0000375]

Cytosolic 5.8S rRNA is an RNA component of the large subunit of cytosolic ribosomes in eukaryotes. Dave Sant removed ‘5_8S rRNA is also found in archaea.’ from definition due to lack of references mentioning this on 1 Feb 2021. See GitHub Issue #505. Renamed from rRNA_5_8S to cytosolic_5_8S_rRNA on 10 June 2021 with the restructuring of rRNA child terms. Updated definition to be consistent with format of other rRNA definitions. Requested by EBI. See GitHub Issue #493.

wiki [SO_0000379]

A small untranslated RNA involved in expression of the dipeptide and oligopeptide transport systems in Escherichia coli.

wiki [SO_0000383]

A non-translated 93 nt antisense RNA that binds its target ompF mRNA and regulates ompF expression by inhibiting translation and inducing degradation of the message.

wiki [SO_0000384]

A small untranslated RNA which is induced in response to oxidative stress in Escherichia coli. Acts as a global regulator to activate or repress the expression of as many as 40 genes, including the fhlA-encoded transcriptional activator and the rpoS-encoded sigma(s) subunit of RNA polymerase. OxyS is bound by the Hfq protein, which increases the OxyS RNA interaction with its target messages.

wiki [SO_0000389]

A 109-nucleotide RNA of E. coli that seems to have a regulatory role on the galactose operon. Changes in Spot 42 levels are implicated in affecting DNA polymerase I levels.

wiki [SO_0000390]

The RNA component of telomerase, a reverse transcriptase that synthesizes telomeric DNA.

wiki [SO_0000391]

U1 is a small nuclear RNA (snRNA) component of the spliceosome (involved in pre-mRNA splicing). Its 5’ end forms complementary base pairs with the 5’ splice junction, thus defining the 5’ donor site of an intron. There are significant differences in sequence and secondary structure between metazoan and yeast U1 snRNAs, the latter being much longer (568 nucleotides as compared to 164 nucleotides in human). Nevertheless, secondary structure predictions suggest that all U1 snRNAs share a ‘common core’ consisting of helices I, II, the proximal region of III, and IV.

wiki [SO_0000392]

U2 is a small nuclear RNA (snRNA) component of the spliceosome (involved in pre-mRNA splicing). Complementary binding between U2 snRNA (in an area lying towards the 5’ end but 3’ to hairpin I) and the branchpoint sequence (BPS) of the intron results in the bulging out of an unpaired adenine, on the BPS, which initiates a nucleophilic attack at the intronic 5’ splice site, thus starting the first of two transesterification reactions that mediate splicing.

wiki [SO_0000395]

U5 RNA is a component of both types of known spliceosome. The precise function of this molecule is unknown, though it is known that the 5’ loop is required for splice site selection and p220 binding, and that both the 3’ stem-loop and the Sm site are important for Sm protein binding and cap methylation.

wiki [SO_0000398]

U11 snRNA plays a role in splicing of the minor U12-dependent class of eukaryotic nuclear introns. Similar to U1 snRNA in the major class spliceosome, it base pairs to the conserved 5’ splice site sequence.

wiki [SO_0000399]

The U12 small nuclear RNA (snRNA), together with U4atac/U6atac, U5, and U11 snRNAs and associated proteins, forms a spliceosome that cleaves a divergent class of low-abundance pre-mRNA introns.

wiki [SO_0000404]

A family of RNAs found as part of the enigmatic vault ribonucleoprotein complex. The complex consists of a major vault protein (MVP), two minor vault proteins (VPARP and TEP1), and several small untranslated RNA molecules. It has been suggested that the vault complex is involved in drug resistance.

wiki [SO_0000405]

Y RNAs are components of the Ro ribonucleoprotein particle (Ro RNP), in association with Ro60 and La proteins. The Y RNAs and Ro60 and La proteins are well conserved, but the function of the Ro RNP is not known. In humans the RNA component can be one of four small RNAs: hY1, hY3, hY4 and hY5. These small RNAs are predicted to fold into a conserved secondary structure containing three stem structures. The largest of the four, hY1, contains an additional hairpin.

wiki [SO_0000406]

An intron within an intron. Twintrons are group II or III introns, into which another group II or III intron has been transposed.

wiki [SO_0000407]

Cytosolic 18S rRNA is an RNA component of the small subunit of cytosolic ribosomes in eukaryotes. Renamed to cytosolic_18S_rRNA from rRNA_18S on 10 June 2021 as per restructuring of rRNA child terms. Updated definition to be consistent with format of other rRNA definitions. Request from EBI. See GitHub Issue #493.

wiki [SO_0000412]

A region of polynucleotide sequence produced by digestion with a restriction endonuclease.

wiki [SO_0000418]

The signal_peptide is a short region of the peptide located at the N-terminus that directs the protein to be secreted or to become part of membrane components. Old definition (before BioSapiens): The sequence for an N-terminal domain of a secreted protein; this domain is involved in attaching nascent polypeptide to the membrane leader sequence.

wiki [SO_0000440]

A replicon that has been modified to act as a vector for foreign sequence. This term is mapped to MGED. Do not obsolete without consulting MGED ontology.

wiki [SO_0000500]

A type of non-canonical base-pairing. This is less energetically favourable than Watson-Crick base pairing. Hoogsteen GC base pairs only have two hydrogen bonds.

wiki [SO_0000544]

A rolling circle transposon. Autonomous helitrons encode a 5’-to-3’ DNA helicase and nuclease/ligase similar to those encoded by known rolling-circle replicons.

wiki [SO_0000552]

A region in the 5’ UTR that pairs with the 16S rRNA during formation of the preinitiation complex. Not found in eukaryotic sequence.

wiki [SO_0000569]

An attribute of a feature that occurred as the product of a reverse transcriptase mediated event. GO:0003964 RNA-directed DNA polymerase activity.

wiki [SO_0000584]

A tmRNA liberates an mRNA from a stalled ribosome. To accomplish this, part of the tmRNA is used as a reading frame that ends in a translation stop signal. The broken mRNA is replaced in the ribosome by the tmRNA, and translation of the tmRNA leads to addition of a proteolysis tag to the incomplete protein, enabling recognition by a protease. Recently a number of permuted tmRNA genes have been found encoded in two parts. TmRNAs have been identified in eubacteria and some chloroplasts but are absent from archaeal and eukaryotic nuclear genomes.

wiki [SO_0000587]

Group I catalytic introns are large self-splicing ribozymes. They catalyze their own excision from mRNA, tRNA and rRNA precursors in a wide range of organisms. The core secondary structure consists of 9 paired regions (P1-P9). These fold to essentially two domains, the P4-P6 domain (formed from the stacking of P5, P4, P6 and P6a helices) and the P3-P9 domain (formed from the P8, P3, P7 and P9 helices). Group I catalytic introns often have long ORFs inserted in loop regions. GO:0000372.

wiki [SO_0000605]

A region containing or overlapping no genes that is bounded on either side by a gene, or bounded by a gene and the end of the chromosome. This term is mapped to MGED. Do not obsolete without consulting MGED ontology.

wiki [SO_0000619]

A variably distant linear promoter region recognized by TFIIIC, with consensus sequence TGGCnnAGTGG. Binds TFIIIC.

wiki [SO_0000624]

A specific structure at the end of a linear chromosome, required for the integrity and maintenance of the end.

wiki [SO_0000627]

A regulatory region that 1) when located between a CRM and a gene’s promoter prevents the CRM from modulating that gene’s expression and 2) acts as a chromatin boundary element or barrier that can block the encroachment of condensed chromatin from an adjacent region. Moved from is_a: SO:0001055 transcriptional_cis_regulatory_region as per request from GREEKC initiative in August 2020.

wiki [SO_0000634]

An mRNA that encodes multiple proteins from at least two non-overlapping regions.

wiki [SO_0000643]

A repeat region containing tandemly repeated sequences having a unit length of 10 to 40 bp.

wiki [SO_0000644]

Antisense RNA is RNA that is transcribed from the coding, rather than the template, strand of DNA. It is therefore complementary to mRNA.
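Using the coding strand as the template yields an RNA complementary to the mRNA, which is what the definition describes. A minimal sketch, assuming plain A/C/G/U (or A/C/G/T) strings without ambiguity codes and with every strand written 5' to 3'; the helper names antisense_of and transcribe are hypothetical.

```python
_RNA_PAIR = str.maketrans("ACGU", "UGCA")         # RNA base -> complementary RNA base
_DNA_TO_RNA_PAIR = str.maketrans("ACGT", "UGCA")  # DNA template base -> RNA base laid down


def antisense_of(mrna: str) -> str:
    """RNA complementary to mrna; input and output both written 5'->3'."""
    return mrna.upper().translate(_RNA_PAIR)[::-1]


def transcribe(template_dna: str) -> str:
    """RNA synthesized on template_dna (5'->3'), returned 5'->3'."""
    return template_dna.upper().translate(_DNA_TO_RNA_PAIR)[::-1]


mrna = "AUGGCU"                                    # its coding strand reads ATGGCT
print(antisense_of(mrna))                          # AGCCAU
print(transcribe("ATGGCT") == antisense_of(mrna))  # True
```

Reversing after complementing keeps the antisense strand in the conventional 5' to 3' orientation, since the two strands pair antiparallel.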

wiki [SO_0000646]

A small RNA molecule that is the product of a longer exogenous or endogenous dsRNA, which is either a bimolecular duplex or very long hairpin, processed (via the Dicer pathway) such that numerous siRNAs accumulate from both strands of the dsRNA. siRNAs trigger the cleavage of their target molecules.

wiki [SO_0000653]

Cytosolic 28S rRNA is an RNA component of the large subunit of cytosolic ribosomes in metazoan eukaryotes. Renamed from rRNA_28S to cytosolic_28S_rRNA on 27 May 2021 with the restructuring of rRNA child terms. Updated definition to be consistent with format of other rRNA definitions. Requested by EBI. See GitHub Issue #493.

wiki [SO_0000655]

An RNA transcript that does not encode a protein; rather, the RNA molecule is the gene product. A ncRNA is a processed_transcript, so it may not contain parts such as transcribed_spacer_regions that are removed in the act of processing. For the corresponding primary_transcripts, please see term SO:0000483 nc_primary_transcript.

wiki [SO_0000658]

A repeat that is located at dispersed sites in the genome.

wiki [SO_0000696]

A short oligonucleotide sequence, on the order of tens of bases long; either single or double stranded.

wiki [SO_0000713]

A motif that is active in the DNA form of the sequence.

wiki [SO_0000724]

A region of a DNA molecule where transfer is initiated during the process of conjugation or mobilization.

wiki [SO_0000728]

A region of a peptide that is able to excise itself and rejoin the remaining portions with a peptide bond. Intein-mediated protein splicing occurs after mRNA has been translated into a protein.

wiki [SO_0000732]

An attribute describing an unverified region.

wiki [SO_0000741]

A kinetoplast is an interlocked network of thousands of minicircles and tens of maxicircles, located near the base of the flagellum of some protozoan species.

wiki [SO_0000755]

A plasmid that has been generated to act as a vector for foreign sequence.

wiki [SO_0000756]

DNA synthesized by reverse transcriptase using RNA as a template.

wiki [SO_0000772]

A genomic island is an integrated mobile genetic element characterized by large size (over 10 kb) and by features that suggest a foreign origin. These can include a nucleotide distribution (oligonucleotide signature, CG content, etc.) that differs from the bulk of the chromosome, and/or genes suggesting DNA mobility. Genomic islands are transmissible elements.

wiki [SO_0000817]

An attribute describing sequence with the genotype found in nature and/or standard laboratory stock.

wiki [SO_0000854]

A homologous_region that is paralogous to another region. A term to be used in conjunction with the paralogous_to relationship.

wiki [SO_0000855]

A homologous_region that is orthologous to another region. This term should be used in conjunction with the similarity relationships defined in SO.

wiki [SO_0000860]

Attribute describing sequence regions occurring in the same order on chromosomes of different species.

wiki [SO_0000973]

A terminal_inverted_repeat_element that is bacterial and only encodes the functions required for its transposition between these inverted repeats.

wiki [SO_0001008]

A base-paired stem with loop of 4 non-hydrogen bonded nucleotides.

wiki [SO_0001015]

A type of non-canonical base pairing, most commonly between G and U, which is important for the secondary structure of RNAs. It has similar thermodynamic stability to the Watson-Crick pairing. Wobble base pairs only have two hydrogen bonds. Other wobble base pair possibilities are I-A, I-U and I-C.
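The pairings named in this definition can be encoded as unordered base sets. A minimal sketch, assuming single-letter RNA bases with I for inosine; classify_pair is a hypothetical helper, and it only labels the pair type, without modelling geometry or stability.

```python
WATSON_CRICK = {frozenset("GC"), frozenset("AU")}
# G-U plus the inosine pairs listed in the definition (I = inosine).
WOBBLE = {frozenset("GU"), frozenset("IA"), frozenset("IU"), frozenset("IC")}


def classify_pair(a: str, b: str) -> str:
    """Classify an RNA base pair as 'Watson-Crick', 'wobble' or 'other'."""
    pair = frozenset((a.upper(), b.upper()))  # unordered: G-U and U-G are the same pair
    if pair in WATSON_CRICK:
        return "Watson-Crick"
    if pair in WOBBLE:
        return "wobble"
    return "other"


print(classify_pair("G", "U"))  # wobble
```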

wiki [SO_0001017]

A sequence variant that does not affect protein function. Silent mutations may occur in genic (CDS, UTR, intron, etc.) and intergenic regions. Silent mutations may have effects on processes such as splicing and regulation. Added in March 2007 after a meeting with PharmGKB. Although this term is in common usage, it is better to annotate with the most specific term possible, such as synonymous codon, intron variant, etc.

wiki [SO_0001018]

A binding site that, in the molecule, interacts selectively and non-covalently with antibodies, B cells or T cells. Requested by Trish Whetzel.

wiki [SO_0001027]

A genotype is a variant genome, complete or incomplete.

wiki [SO_0001035]

A small non coding RNA, part of a silencing system that prevents the spreading of selfish genetic elements.

wiki [SO_0001042]

The nucleotide sequence of a virus that infects bacteria.

wiki [SO_0001079]

A motif is a three-dimensional structural element within the chain, which appears also in a variety of other molecules. Unlike a domain, a motif does not need to form a stable globular unit.

wiki [SO_0001080]

A coiled coil is a structural motif in proteins, in which alpha-helices are coiled together like the strands of a rope. Range.

wiki [SO_0001089]

A region where a transformation occurs in a protein after it has been synthesized. This may regulate, stabilize, crosslink or introduce new chemical functionalities in the protein. Discrete.

wiki [SO_0001111]

A beta strand describes a single length of polypeptide chain that forms part of a beta sheet. A single continuous stretch of amino acids adopting an extended conformation, with hydrogen bonds between its backbone N-H groups and the C=O groups of another part of the peptide. This forms a secondary protein structure in which two or more extended polypeptide regions are hydrogen-bonded to one another in a planar array. Range.

wiki [SO_0001117]

The helix has 3.6 residues per turn which corresponds to a translation of 1.5 angstroms (= 0.15 nm) along the helical axis. Every backbone N-H group donates a hydrogen bond to the backbone C=O group of the amino acid four residues earlier. Range.

wiki [SO_0001118]

The pi helix has 4.1 residues per turn and a translation of 1.15 angstroms (= 0.115 nm) along the helical axis. The N-H group of an amino acid forms a hydrogen bond with the C=O group of the amino acid five residues earlier. Range.

wiki [SO_0001119]

The 3-10 helix has 3 residues per turn with a translation of 2.0 angstroms (=0.2 nm) along the helical axis. The N-H group of an amino acid forms a hydrogen bond with the C=O group of the amino acid three residues earlier. Range.
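The parameters in the alpha, pi and 3-10 helix definitions combine by simple arithmetic: the axial distance per full turn (the pitch) is residues per turn times the translation per residue, which gives 5.4, 4.715 and 6.0 angstroms respectively. A minimal sketch, assuming ideal helices with exactly the values quoted in the definitions; the table and helper names are hypothetical.

```python
# Values copied from the helix definitions:
# (residues per turn, translation per residue in angstroms, H-bond partner offset)
HELICES = {
    "alpha": (3.6, 1.5, 4),
    "pi": (4.1, 1.15, 5),
    "3-10": (3.0, 2.0, 3),
}


def pitch(helix: str) -> float:
    """Axial distance (angstroms) covered by one full turn of an ideal helix."""
    residues_per_turn, rise, _ = HELICES[helix]
    return residues_per_turn * rise


def helix_length(helix: str, n_residues: int) -> float:
    """Approximate axial length (angstroms) of an ideal helix of n residues."""
    return n_residues * HELICES[helix][1]


def hbond_partner(helix: str, i: int) -> int:
    """Index of the residue whose C=O accepts a hydrogen bond from the N-H of residue i."""
    return i - HELICES[helix][2]


for name in HELICES:
    print(name, round(pitch(name), 3))
```

For example, an 18-residue ideal alpha helix spans 18 × 1.5 = 27 angstroms, or 18 / 3.6 = 5 turns.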

wiki [SO_0001180]

A cis-acting element found in the 3’ UTR of some mRNA which is rich in AUUUA pentamers. Messenger RNAs bearing multiple AU-rich elements are often unstable.
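Because the definition is phrased in terms of AUUUA pentamers, a first screen of a 3' UTR can simply count them. A minimal sketch, assuming the input is the 3' UTR sequence alone (DNA letters are converted to RNA); the definition does not say how many copies count as "multiple", so the hypothetical helper only counts and leaves the threshold to the caller.

```python
import re


def count_auuua(utr3: str) -> int:
    """Count AUUUA pentamers in a 3' UTR, including overlapping copies.

    DNA input (T instead of U) is accepted and converted to RNA letters.
    """
    rna = utr3.upper().replace("T", "U")
    return len(re.findall(r"(?=AUUUA)", rna))  # zero-width lookahead catches overlaps


print(count_auuua("GCAUUUAUUUAGC"))  # 2: copies at positions 2 and 6 share one A
```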

wiki [SO_0001189]

An oligo composed of LNA residues.

wiki [SO_0001191]

An oligo composed of TNA residues.

wiki [SO_0001193]

An oligo composed of GNA residues.

wiki [SO_0001210]

The region of mRNA (not divisible by 3 bases) that is skipped or added during the process of translational frameshifting (GO:0006452), causing the reading frame to be different. Added synonym ‘ribosomal_slippage’ on Feb 1, 2021, a term in INSDC and GenBank. See GitHub Issue #522.
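The "not divisible by 3" condition is what makes the reading frame change: skipping or re-reading a region shifts the frame by its length modulo 3. A simplified sketch under stated assumptions: the ribosome starts in frame 0, codons_with_skip handles only skipped (not re-read) regions starting on a codon boundary, and both helper names are hypothetical.

```python
def frame_shift(region_length: int, skipped: bool = True) -> int:
    """Net reading-frame change as an offset in {0, 1, 2}, modulo 3.

    A skipped region moves the ribosome forward by region_length nt;
    an added (re-read) region moves it back. 0 means the frame is unchanged.
    """
    delta = region_length if skipped else -region_length
    return delta % 3


def codons_with_skip(mrna: str, skip_at: int, skip_len: int) -> list[str]:
    """Codons read from position 0, with skip_len nt skipped at skip_at."""
    if skip_at % 3:
        raise ValueError("skip_at must fall on a codon boundary")
    before = [mrna[i:i + 3] for i in range(0, skip_at, 3)]
    rest = mrna[skip_at + skip_len:]
    after = [rest[i:i + 3] for i in range(0, len(rest) - 2, 3)]  # drop incomplete tail
    return before + after


# In frame 0 this message reads AUG UUU UGA CCA (UGA is a stop codon);
# skipping the single U at position 6 shifts the frame and bypasses it.
print(codons_with_skip("AUGUUUUGACCA", 6, 1))  # ['AUG', 'UUU', 'GAC']
```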

wiki [SO_0001213]

Group III introns are introns found in the mRNA of the plastids of euglenoid protists. They are spliced by a two step transesterification with bulged adenosine as initiating nucleophile. GO:0000374.

wiki [SO_0001229]

A modified RNA base in which the 5 position (C5) of the uracil is bound to the ribose ring instead of the 1 position (N1). The free molecule is CHEBI:17802.

wiki [SO_0001234]

An attribute describing a feature that has either intra-genome or intracellular mobility.

wiki [SO_0001236]

A base is a sequence feature that corresponds to a single unit of a nucleotide polymer.

wiki [SO_0001237]

A sequence feature that corresponds to a single amino acid residue in a polypeptide. Probably in the future this will cross reference to Chebi.

wiki [SO_0001248]

A region of the genome of known length that is composed by ordering and aligning two or more different regions.

wiki [SO_0001254]

A kind of chromosome variation where the chromosome complement is an exact multiple of the haploid number and is greater than the diploid number.
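The definition reduces to two arithmetic checks: the chromosome count must be an exact multiple of the haploid number, and that multiple must exceed two. A minimal sketch, assuming the haploid number is the size of one basic chromosome set; classify_ploidy is a hypothetical helper and cannot distinguish autopolyploids from allopolyploids, which requires knowing where the sets came from.

```python
def classify_ploidy(chromosome_count: int, haploid_number: int) -> str:
    """Classify a chromosome complement relative to the haploid number."""
    if chromosome_count <= 0 or haploid_number <= 0:
        raise ValueError("counts must be positive")
    if chromosome_count % haploid_number:
        return "aneuploid"  # not an exact multiple of the haploid number
    sets = chromosome_count // haploid_number
    if sets == 1:
        return "haploid"
    if sets == 2:
        return "diploid"
    return f"polyploid, {sets} sets"  # exact multiple greater than diploid


print(classify_ploidy(42, 7))   # polyploid, 6 sets
print(classify_ploidy(47, 23))  # aneuploid
```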

wiki [SO_0001255]

A polyploid where the multiple chromosome set was derived from the same organism.

wiki [SO_0001289]

Lysidine is a modified cytidine.

wiki [SO_0001819]

A sequence variant where there is no resulting change to the encoded amino acid. EBI term: Synonymous SNPs - In coding sequence, not resulting in an amino acid change (i.e. silent mutation). This term is sometimes used synonymously with the more general term ‘silent mutation’, although a silent mutation may occur in non-coding sequence. The best practice is to annotate to the most specific term.
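Whether a single-base change is synonymous can be checked by translating the reference and altered codons. A minimal sketch, assuming the standard genetic code (NCBI translation table 1) and a change confined to one codon; the labels are informal, and the helper cannot see effects on splicing or regulation that a "silent" change may still have.

```python
BASES = "TCAG"
# Standard genetic code (NCBI table 1); codons ordered TTT, TTC, TTA, ..., GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def classify_codon_change(ref_codon: str, pos: int, alt_base: str) -> str:
    """Classify a single-base change at pos (0-2) within one codon."""
    ref = ref_codon.upper().replace("U", "T")
    alt = ref[:pos] + alt_base.upper().replace("U", "T") + ref[pos + 1:]
    ref_aa, alt_aa = CODON_TABLE[ref], CODON_TABLE[alt]
    if ref_aa == alt_aa:
        return "stop retained" if ref_aa == "*" else "synonymous"
    if alt_aa == "*":
        return "stop gained"
    if ref_aa == "*":
        return "stop lost"
    return "missense"


print(classify_codon_change("CTG", 0, "T"))  # synonymous (CTG and TTG both encode Leu)
```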

wiki [SO_0001830]

A PCR product obtained by applying the AFLP technique, based on a restriction enzyme digestion of genomic DNA and an amplification of the resulting fragments. Requested by Bayer Cropscience June, 2011.

wiki [SO_0005836]

A region of sequence that is involved in the control of a biological process.

wiki [SO_0005850]

Non-covalent primer binding site for initiation of replication, transcription, or reverse transcription.

wiki [SO_0005853]

A gene that can be substituted for a related gene at a different site in the genome. This would include, for example, the mating type gene cassettes of S. cerevisiae. Gene cassettes usually exist as linear sequences as part of a larger DNA molecule, such as a chromosome or plasmid.

wiki [SO_1000017]

Change of a pyrimidine nucleotide, C or T, into a purine nucleotide, A or G, or vice versa.
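The definition amounts to a class check: a substitution is a transversion when it crosses between the purines (A, G) and the pyrimidines (C, T), and a transition when it stays within one class. A minimal sketch, assuming single uppercase or lowercase DNA bases; substitution_type is a hypothetical helper.

```python
PURINES = frozenset("AG")
PYRIMIDINES = frozenset("CT")


def substitution_type(ref: str, alt: str) -> str:
    """Return 'transition' or 'transversion' for a single-base DNA substitution."""
    ref, alt = ref.upper(), alt.upper()
    if len(ref) != 1 or len(alt) != 1 or ref not in "ACGT" or alt not in "ACGT":
        raise ValueError("expected single DNA bases A, C, G or T")
    if ref == alt:
        raise ValueError("ref and alt are identical")
    same_class = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


print(substitution_type("C", "A"))  # transversion
```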

wiki [SO_1000030]

An intrachromosomal mutation where a region of the chromosome is inverted with respect to wild type.

wiki [SO_1000037]

An extra chromosome.

wiki [SO_1000043]

A non reciprocal translocation whereby the participating chromosomes break at their centromeres and the long arms fuse to form a single chromosome with a single centromere.

wiki [SO_1000044]

A chromosomal mutation. Rearrangements that alter the pairing of telomeres are classified as translocations.

wiki [SO_1001284]

A set of units of gene expression directly regulated by a common set of one or more regulatory gene products. Definition updated with Mejia-Almonte et al. PMID:32665585 on Aug 5, 2020. Added relationship has_part SO:0002300.

WIKI [SO_0000207]

SSLP are a kind of sequence alteration where the number of repeated sequences in intergenic regions may differ.

wikipedia [SO_0001683]

A sequence motif is a nucleotide or amino-acid sequence pattern that may have biological significance.

wild_type_rescue_gene [SO_0000818]

A gene that rescues.

wybutosine [SO_0001332]

Wybutosine is a modified guanosine base feature.

wyosine [SO_0001336]

Wyosine is a modified guanosine base feature.

X_element [SO_0001497]

The X element is a conserved region, of the telomere, of ~475 bp that contains an ARS sequence and in most cases an Abf1p binding site. Possible functions include roles in chromosomal segregation, maintenance of chromosome stability, recombinational sequestering, or as a barrier to transcriptional silencing. This term was requested 2009-10-16 by Michel Dumontier, tracker id 2880747. From Janos Demeter: The only region shared by all chromosome ends, the X element core sequence is a small conserved element (~475 bp) that contains an ARS sequence and in most cases an Abf1p binding site. Between these is a GC-rich region nearly identical to the meiosis-specific regulatory sequence URS1.

X_element_combinatorial_repeat [SO_0001484]

An X element combinatorial repeat is a repeat region located between the X element and the telomere or adjacent Y’ element. X element combinatorial repeats contain Tbf1p binding sites, and possible functions include a role in telomerase-independent telomere maintenance via recombination or as a barrier against transcriptional silencing. These are usually present as a combination of one or more of several types of smaller elements (designated A, B, C, or D). This term was requested 2009-10-16 by Michel Dumontier, tracker id 2880747.

X_region [SO_0002030]

One of two segments of homology found at all three mating loci (HML, MAT and HMR).

Y_prime_element [SO_0001485]

A Y’ element is a repeat region (SO:0000657) located adjacent to telomeric repeats or X element combinatorial repeats, either as a single copy or tandem repeat of two to four copies. This term was requested 2009-10-16 by Michel Dumontier, tracker id 2880747.

Y_region [SO_0002001]

A segment of non-homology between a and alpha mating alleles, found at all three mating loci (HML, MAT, and HMR); it has two forms (Ya and Yalpha). Requested by Janos Demeter, SGD.

Y_RNA_primary_transcript [SO_0002042]

A primary transcript encoding a Y-RNA.

YAC [SO_0000152]

Yeast Artificial Chromosome, a vector constructed from the telomeric, centromeric, and replication origin sequences needed for replication in yeast cells. This term is mapped to MGED. Do not obsolete without consulting MGED ontology.

YAC_end [SO_0001498]

A region of sequence from the end of a YAC clone that may provide a highly specific marker.

Z1_region [SO_0002002]

A mating type region motif, one of two segments of homology found at all three mating loci (HML, MAT, and HMR). Requested by Janos Demeter, SGD.

Z2_region [SO_0002003]

A mating type region motif, the rightmost segment of homology in the HML and MAT mating loci (not present in HMR). Requested by Janos Demeter, SGD.

zinc_finger_binding_site [SO_0001971]

A binding site to which a polypeptide will bind with a zinc finger motif, which is characterized by requiring one or more zinc (Zn2+) ions for stabilized folding.

zinc_repressed_element [SO_0002006]

A promoter element that has the consensus sequence GNMGATC and is found in promoters of genes repressed in the presence of zinc. This element is bound by Loz1 in S. pombe. The paper does not name the element. This term was requested by Midori Harris, for Pombe.
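The consensus GNMGATC uses IUPAC ambiguity codes (N = any base, M = A or C), which map directly onto regular-expression character classes. A minimal sketch, assuming a plain DNA string and a scan of one strand only; the helper names iupac_to_regex and find_motif are hypothetical.

```python
import re

# IUPAC nucleotide ambiguity codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def iupac_to_regex(motif: str) -> str:
    """Translate an IUPAC-coded motif into an equivalent regex pattern."""
    return "".join(IUPAC[c] for c in motif.upper())


def find_motif(seq: str, motif: str) -> list[tuple[int, str]]:
    """(0-based start, matched text) for each, possibly overlapping, motif match."""
    pattern = re.compile("(?=(" + iupac_to_regex(motif) + "))")
    return [(m.start(), m.group(1)) for m in pattern.finditer(seq.upper())]


print(iupac_to_regex("GNMGATC"))             # G[ACGT][AC]GATC
print(find_motif("TTGCAGATCAA", "GNMGATC"))  # [(2, 'GCAGATC')]
```

The same helper accepts other consensus strings in this vocabulary that use IUPAC letters, such as the B box consensus TGGCnnAGTGG.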


Last modified January 5, 2022: adding IAO (3e12bef)