Gene superfamilies

Conotoxins, the disulfide-rich conopeptides, are classified according to three schemes: the similarity between the ER signal sequences of conotoxin precursors (gene superfamilies), the cysteine patterns of the mature peptide regions (cysteine frameworks), and the specificity for pharmacological targets (pharmacological families). This page provides a brief introduction to the gene superfamilies and lists the gene superfamilies used in ConoServer. The two other classification schemes are detailed in separate ConoServer pages. A more comprehensive discussion of conopeptide classification schemes can be found in Kaas et al. Toxicon 2010 [1].

Conopeptides are expressed as precursor proteins that are processed into mature peptide toxins in the endoplasmic reticulum (ER) and the Golgi apparatus. The classical organization of a conopeptide precursor is shown in Figure 1. During maturation, the ER signal sequence is cleaved first, followed by the N- and C-terminal pro-regions, and some amino acids can be post-translationally modified (see amino acid post-translational modifications).

Figure 1: Conopeptide protein precursor organization. The organization is exemplified using the sequence of the SmIVA precursor (P00021).
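The regions shown in Figure 1 map naturally onto a small data structure. The sketch below is a minimal illustration that assumes the region boundaries are already known (for example from an existing annotation); it does not predict cleavage sites. The class name `ConopeptidePrecursor` and the toy sequence are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ConopeptidePrecursor:
    """Hypothetical container for a precursor split into the regions of Figure 1.

    Boundaries are 0-based, end-exclusive indices supplied by the caller
    (assumption: they come from an existing annotation, they are not predicted).
    """

    sequence: str
    signal_end: int  # end of the ER signal sequence
    pro_end: int  # end of the N-terminal pro-region
    mature_end: int  # end of the mature peptide region

    def __post_init__(self) -> None:
        # Regions must appear in order: signal, N-pro, mature, C-pro
        if not (0 <= self.signal_end <= self.pro_end <= self.mature_end <= len(self.sequence)):
            raise ValueError("region boundaries must be ordered and lie within the sequence")

    @property
    def signal(self) -> str:
        return self.sequence[: self.signal_end]

    @property
    def n_pro(self) -> str:
        return self.sequence[self.signal_end : self.pro_end]

    @property
    def mature(self) -> str:
        return self.sequence[self.pro_end : self.mature_end]

    @property
    def c_pro(self) -> str:
        # Empty string when the precursor has no C-terminal pro-region
        return self.sequence[self.mature_end :]

    def processed(self) -> str:
        """Mimic maturation: cleave the signal and both pro-regions.

        Post-translational modifications are not modelled.
        """
        return self.mature


# Toy, made-up precursor (not a real conopeptide sequence)
toy = ConopeptidePrecursor("MLLLLLLLLL" "EEEEER" "CCAAACAAAC" "GK", 10, 16, 26)
print(toy.signal, toy.n_pro, toy.mature, toy.c_pro)
```

Keeping the boundaries as explicit indices makes it easy to check that the four regions always concatenate back to the full precursor.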

The different regions of conopeptide precursors (Figure 1) have been shown to evolve at different rates [2]. The mature peptide region is highly diverse, consistent with the great variety of conopeptides, whereas the ER signal sequence is more conserved. Comparing conopeptide ER signal sequences made it possible to define several groups of higher sequence similarity, the gene superfamilies. Figure 2 shows a clustering analysis of the ER signal sequences in ConoServer together with the superfamily assignments. With a cut-off of 35% sequence identity, most superfamilies are well separated. The only exception is the single member of the Y-superfamily, which shares around 40% identity with some members of the M-superfamily.
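The 35% cut-off is expressed as percentage identity between two aligned signal sequences. The sketch below computes it from a pair of sequences that are already aligned; the alignment step itself is not reproduced. Identity can be normalised in several ways, and counting only columns where neither sequence has a gap is an assumption here. The function names and toy sequences are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percentage of identical residues over gap-free aligned columns."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = identical = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == gap or y == gap:
            continue  # ignore columns where either sequence has a gap
        compared += 1
        if x == y:
            identical += 1
    return 100.0 * identical / compared if compared else 0.0


SUPERFAMILY_CUTOFF = 35.0  # percent identity quoted for Figure 2


def above_cutoff(aligned_a: str, aligned_b: str, cutoff: float = SUPERFAMILY_CUTOFF) -> bool:
    """True when two aligned signal sequences reach the identity cut-off."""
    return percent_identity(aligned_a, aligned_b) >= cutoff


# Toy aligned sequences (made up for illustration)
print(percent_identity("MKLLLTAVAA", "MKLLLTGGGG"))  # 6 identical of 10 columns
print(above_cutoff("MKLLLTAVAA", "MKLLLTGGGG"))
```

A pairwise check on its own does not define a superfamily; in Figure 2 the cut-off is presumably applied to the clustering tree, so group membership also depends on the linkage used.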

Figure 2: Clustering analysis of ER signal sequences found in ConoServer and identification of gene superfamilies. The percentage identity between ER signal sequences was measured on a global alignment performed using ClustalW. The resulting dissimilarity matrix was then submitted to the hierarchical clustering algorithm hclust implemented in the statistical program R. The gene superfamilies used in ConoServer are highlighted in different colors and are defined in Table 1. A high-resolution version of the figure (986 Kb) shows the ConoServer protein identifier of each signal sequence. This analysis was carried out with the data available in ConoServer on 25/08/2010 and is not automatically updated.
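The clustering step can be approximated in pure Python. This is a minimal sketch, not the original R analysis: it assumes that dissimilarity is 1 minus fractional identity and that complete linkage was used (complete linkage is the default `method` of R's `hclust`, but the caption does not state which linkage produced Figure 2). Under these assumptions, cutting the tree at a dissimilarity of 0.65 corresponds to the 35% identity cut-off. The function names and the identity matrix are hypothetical.

```python
from itertools import combinations


def identity_to_dissimilarity(identity):
    """Convert a matrix of percent identities into dissimilarities in [0, 1]."""
    return [[1.0 - value / 100.0 for value in row] for row in identity]


def complete_linkage_clusters(dist, cut_height):
    """Agglomerate with complete linkage, stopping at cut_height.

    Complete linkage is monotone, so stopping before the first merge above
    cut_height gives the same groups as cutting the full tree at that height.
    """
    clusters = [[i] for i in range(len(dist))]

    def linkage(c1, c2):
        # Complete linkage: largest dissimilarity between any two members
        return max(dist[i][j] for i in c1 for j in c2)

    while len(clusters) > 1:
        (a, b), height = min(
            (((a, b), linkage(clusters[a], clusters[b]))
             for a, b in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if height > cut_height:
            break  # every remaining merge lies above the cut
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is unaffected
    return [sorted(c) for c in clusters]


# Hypothetical percent identities between four signal sequences
identity = [
    [100, 80, 20, 25],
    [80, 100, 30, 22],
    [20, 30, 100, 70],
    [25, 22, 70, 100],
]
dist = identity_to_dissimilarity(identity)
print(complete_linkage_clusters(dist, cut_height=1.0 - 35.0 / 100.0))
```

The naive search over all cluster pairs is cubic in the number of sequences, which is acceptable for illustration but not for a full database; a production analysis would rely on an established implementation such as `hclust`.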

Table 1 provides the definitions of the 30 published gene superfamilies used in ConoServer. The relationships between the gene superfamilies and the other classification schemes, the cysteine frameworks and the pharmacological families, are complex. Up-to-date statistics on these relationships can be found in the ConoServer statistics pages and in Kaas et al. Toxicon 2010 [1]. The gene superfamily classification was recently extended to disulfide-poor conopeptides [3], and the B and C gene superfamilies have been introduced in ConoServer.

Table 1: Gene superfamilies with published references used in ConoServer. This table is automatically generated and is therefore kept up to date with the content of ConoServer. The second column shows the mature peptide cysteine frameworks found in each gene superfamily. The third column gives the number of protein precursors in each gene superfamily. Temporary names have been created in ConoServer for groups of conopeptide ER signal sequences that do not yet have published names; these temporary names are listed in Table 2.
| Gene superfamily | Cysteine frameworks | # protein precursors | Reference |
|---|---|---|---|
| A | I, II, IV, VI/VII, XIV, XXII | 347 | Santos,A.D. et al. (2004) J. Biol. Chem. 279:17596-17606 |
| B1 | | 66 | Puillandre,N. et al. (2012) J. Mol. Evol. 74:297-309 |
| B2 | VIII | 25 | Dutertre,S. et al. (2013) Mol. Cell Proteomics 12:312-329 |
| B3 | XXIV | 1 | Luo,S. et al. (2013) PLoS ONE 8 |
| C | | 8 | Puillandre,N. et al. (2012) J. Mol. Evol. 74:297-309 |
| D | IV, XIV, XV, XX, XXIV, XXVIII | 122 | Loughnan,M.L. et al. (2009) Biochemistry 48:3717-3729 |
| E | XXII | 9 | Dutertre,S. et al. (2013) Mol. Cell Proteomics 12:312-329 |
| F | | 18 | Dutertre,S. et al. (2013) Mol. Cell Proteomics 12:312-329 |
| G | XIII | 1 | Aguilar,M.B. et al. (2013) Peptides 41:17-20 |
| H | VI/VII | 21 | Dutertre,S. et al. (2013) Mol. Cell Proteomics 12:312-329 |
| I1 | VI/VII, XI, XXII | 33 | Jimenez,E.C. et al. (2003) J. Neurochem. 85:610-621 |
| I2 | VI/VII, XI, XII, XIII, XIV | 85 | Buczek,O. et al. (2005) FEBS J. 272:4178-4188 |
| I3 | VI/VII, XI | 16 | Yuan,D.D. et al. (2009) Peptides 30:861-865 |
| J | XIV | 47 | Imperial,J.S. et al. (2006) Biochemistry 45:8331-8340 |
| K | XXIII | 6 | Ye,M. et al. (2012) J. Biol. Chem. 287:14973-14983 |
| L | XIV, XXIV | 28 | Peng,C. et al. (2006) Peptides 27:2174-2181 |
| M | I, II, III, IV, VI/VII, IX, XIV, XVI, XXXII | 650 | Corpuz,G.P. et al. (2005) Biochemistry 44:8176-8186 |
| N | XV | 5 | Dutertre,S. et al. (2013) Mol. Cell Proteomics 12:312-329 |
| O1 | I, VI/VII, IX, XII, XIV, XVI, XXIX | 709 | McIntosh,J.M. et al. (1995) J. Biol. Chem. 270:16796-16802 |
| O2 | I, VI/VII, XII, XIV, XV, XVI | 192 | Zhangsun et al. (2006) Chem. Biol. Drug Des. 68:256-265 |
| O3 | VI/VII, XVI | 61 | Zhangsun et al. (2006) Chem. Biol. Drug Des. 68:256-265 |
| P | IX, XIV | 22 | Lirazan,M.B. et al. (2000) Biochemistry 39:1583-1588 |
| Q | VI/VII, XVI | 23 | Lu,A. et al. (2014) Mol. Cell Proteomics 13:105-118 |
| R | XIV | 8 | |
| S | VIII, XXXIII | 33 | Liu,L. et al. (2008) Toxicon 51:1331-1337 |
| T | I, V, X, XVI | 280 | Walker,C.S. et al. (1999) J. Biol. Chem. 274:30664-30671 |
| U | VI/VII | 9 | Robinson,S.D. and Norton,R.S. (2014) Mar. Drugs 12:6058-6101 |
| V | XV | 2 | Peng,C. et al. (2008) Peptides 29:985-991 |
| Y | VI/VII, XVII | 4 | Yuan,D.D. et al. (2008) Peptides 29:1521-1525 |
| conodipine | | 5 | Möller et al. (2019) Mol. Cell Proteomics 18:876-891 |
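The many-to-many relationship between gene superfamilies and cysteine frameworks can be made explicit by inverting Table 1. The sketch below hard-codes the frameworks listed in the table above as a static snapshot (the live table is regenerated from ConoServer and may change); the names `TABLE1_FRAMEWORKS` and `superfamilies_by_framework` are hypothetical.

```python
from collections import defaultdict

# Cysteine frameworks per gene superfamily, copied from Table 1 (static snapshot).
# An empty list means no framework is listed for that superfamily.
TABLE1_FRAMEWORKS = {
    "A": ["I", "II", "IV", "VI/VII", "XIV", "XXII"],
    "B1": [],
    "B2": ["VIII"],
    "B3": ["XXIV"],
    "C": [],
    "D": ["IV", "XIV", "XV", "XX", "XXIV", "XXVIII"],
    "E": ["XXII"],
    "F": [],
    "G": ["XIII"],
    "H": ["VI/VII"],
    "I1": ["VI/VII", "XI", "XXII"],
    "I2": ["VI/VII", "XI", "XII", "XIII", "XIV"],
    "I3": ["VI/VII", "XI"],
    "J": ["XIV"],
    "K": ["XXIII"],
    "L": ["XIV", "XXIV"],
    "M": ["I", "II", "III", "IV", "VI/VII", "IX", "XIV", "XVI", "XXXII"],
    "N": ["XV"],
    "O1": ["I", "VI/VII", "IX", "XII", "XIV", "XVI", "XXIX"],
    "O2": ["I", "VI/VII", "XII", "XIV", "XV", "XVI"],
    "O3": ["VI/VII", "XVI"],
    "P": ["IX", "XIV"],
    "Q": ["VI/VII", "XVI"],
    "R": ["XIV"],
    "S": ["VIII", "XXXIII"],
    "T": ["I", "V", "X", "XVI"],
    "U": ["VI/VII"],
    "V": ["XV"],
    "Y": ["VI/VII", "XVII"],
    "conodipine": [],
}


def superfamilies_by_framework(table):
    """Invert superfamily -> frameworks into framework -> superfamilies."""
    index = defaultdict(list)
    for superfamily, frameworks in table.items():
        for framework in frameworks:
            index[framework].append(superfamily)
    return dict(index)


by_framework = superfamilies_by_framework(TABLE1_FRAMEWORKS)
print(by_framework["XIV"])
```

In this snapshot, framework VI/VII occurs in 12 superfamilies while several superfamilies contain many frameworks, which is why neither scheme can be derived from the other.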

Phylogenetic analyses have classified cone snails into different groups, or clades, according to the homology of their 16S rRNA sequences [4]. One clade, named "Early", is highly divergent from the others. In a recent study [5], a number of conopeptide precursors were sequenced from Conus californicus, a member of the Early clade, and these conopeptides do not correspond to any previously identified superfamily. Table 2 lists the temporary names that have been introduced in ConoServer to designate these superfamilies. The clustering analysis shown in Figure 2 clearly shows that these new superfamilies are distinct. These names are only temporary and are likely to change once a definitive nomenclature is published in a peer-reviewed journal.

Table 2: Temporary gene superfamily names introduced in ConoServer to designate superfamilies recently identified in species of the early divergent clade. This table is automatically generated and is therefore kept up to date with the content of ConoServer. The conopeptide precursor sequences associated with these superfamilies were mainly identified in the study of Biggs et al. 2010 [5]. The second column shows the mature peptide cysteine frameworks found in each gene superfamily. The third column gives the number of protein precursors in each gene superfamily.
| Gene superfamily | Cysteine frameworks | # protein precursors |
|---|---|---|
| Divergent M---L-LTVA | VI/VII, IX, XIV | 11 |
| Divergent MKFPLLFISL | VI/VII | 1 |
| Divergent MKLCVVIVLL | XIV | 3 |
| Divergent MKLLLTLLLG | VIII | 2 |
| Divergent MKVAVVLLVS | XIV | 1 |
| Divergent MRCLSIFVLL | XVI | 2 |
| Divergent MRFLHFLIVA | VI/VII | 1 |
| Divergent MRFYIGLMAA | I, V | 3 |
| Divergent MSKLVILAVL | IX | 1 |
| Divergent MSTLGMTLL- | IX, XV, XIX, XXII | 9 |
| Divergent MTAKATLLVL | XIV | 1 |
| Divergent MTFLLLLVSV | IX | 1 |
| Divergent MTLTFLLVVA | VI/VII | 1 |
| G2 | IX, XXVII | 20 |
| Insulin | | 34 |

[1] Kaas,Q. et al. (2010) Toxicon 55:1491-1509
[2] Woodward,S.R. et al. (1990) EMBO J. 9:1015-1020
[3] Puillandre,N. et al. (2012) J. Mol. Evol. 74:297-309
[4] Espiritu,D.J. et al. (2001) Toxicon 39:1899-1916
[5] Biggs,J.S. et al. (2010) Mol. Phylogenet. Evol. 56:1-12