Protocols of the GlycoGene Library Project
In silico searching of glycosyltransferase genes for GGDB
A bioinformatics system was developed for the identification and in silico cloning of human glycogenes. Candidate glycosyltransferases are identified based on several parameters. Information on the cloned genes discovered and identified under this protocol is stored in GGDB.
- STEP 1 EST sequences are assembled with Phrap to annotate open reading frames (ORFs).
The gene region of each ORF is predicted with the GENSCAN software.
- STEP 2 The likelihood that a candidate is a glycogene is evaluated from scores derived from the following procedures.
- A motif search is executed to identify candidate sequences that possess the motifs of each glycosyltransferase subfamily. The motifs are determined using information and tools from MEME/MAST, Pfam and PROSITE.
- The transmembrane domain, which is located near the N-terminal end, is identified as a region that contains 18-22 hydrophobic amino acid residues.
- The DXD (DXH) motif, which is required for divalent cation binding, is identified. The DXD motif interacts with the phosphate groups of sugar-nucleotide donor substrates through coordination of a divalent cation such as Mn2+. However, some glycosyltransferases that require no cation for their activity have no DXD (DXH) motif.
- The stem region, which is located between the transmembrane domain and the catalytic domain, is identified. The stem region is rich in proline, serine and threonine residues.
- The positions of cysteine residues that are conserved within each subfamily are identified in the sequences.
- The profile hidden Markov model (HMM) method is used to cluster glycosyltransferase families.
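
The sequence-level checks in STEP 2 can be illustrated with a minimal Python sketch. It is not the GGDB pipeline: the hydrophobic residue set, the 60-residue N-terminal search window, the tolerance of two non-hydrophobic residues, the 40-residue stem length, the 0.3 proline/serine/threonine threshold and the equal-weight score are all assumptions chosen for illustration, and the function names are hypothetical. The subfamily motif search (MEME/MAST, Pfam, PROSITE) and the profile HMM clustering are not reproduced here.

```python
import re

# Assumed hydrophobic residue set; the protocol does not specify one.
HYDROPHOBIC = set("AILMFVW")

# D-X-D or D-X-H, written as a lookahead so overlapping motifs are all reported.
DXD_PATTERN = re.compile(r"(?=D.[DH])")


def find_tm_segment(seq, n_term_window=60, min_len=18, max_non_hydrophobic=2):
    """Return (start, end) of the first mostly hydrophobic window of
    min_len residues that lies within the N-terminal region, or None."""
    limit = min(len(seq), n_term_window)
    for start in range(0, limit - min_len + 1):
        window = seq[start:start + min_len]
        misses = sum(1 for aa in window if aa not in HYDROPHOBIC)
        if misses <= max_non_hydrophobic:
            return start, start + min_len
    return None


def find_dxd_motifs(seq, start=0):
    """Return 0-based start positions of DXD/DXH motifs at or after start."""
    return [m.start() for m in DXD_PATTERN.finditer(seq, start)]


def pst_fraction(region):
    """Fraction of proline, serine and threonine residues in a region."""
    if not region:
        return 0.0
    return sum(region.count(aa) for aa in "PST") / len(region)


def score_candidate(seq, stem_len=40, pst_threshold=0.3):
    """Collect the STEP 2 sequence features and a toy equal-weight score."""
    seq = seq.upper()
    tm = find_tm_segment(seq)
    tm_end = tm[1] if tm else 0
    # The stem is taken as the stretch directly after the TM window (assumption).
    stem = seq[tm_end:tm_end + stem_len] if tm else ""
    features = {
        "tm_segment": tm,
        "dxd_positions": find_dxd_motifs(seq, tm_end),
        "stem_pst_fraction": pst_fraction(stem),
        "cysteine_positions": [i for i, aa in enumerate(seq) if aa == "C"],
    }
    features["score"] = (
        int(tm is not None)
        + int(bool(features["dxd_positions"]))
        + int(features["stem_pst_fraction"] >= pst_threshold)
    )
    return features


if __name__ == "__main__":
    # Synthetic toy sequence, not a real glycosyltransferase.
    toy = "MDRKRSE" + "L" * 20 + "PST" * 14 + "GADVDCKK"
    print(score_candidate(toy))
```

A sequence that lacks a DXD (DXH) motif simply scores lower in this sketch rather than being rejected, which matches the note above that some glycosyltransferases require no cation. The cysteine positions are only listed, since comparing them requires an alignment against known subfamily members.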