PTGL Home

PTGL Documentation




What is PTGL?

PTGL is a web-based database application for protein topologies. To obtain a mathematically unique description of protein topology, the secondary structure topology of a protein is described with methods of applied graph theory. The Protein graph is defined as an undirected labelled graph on three description levels, according to the secondary structure elements (SSEs) considered: the Alpha graph, the Beta graph, and the Alpha-Beta graph. The connected components of the Protein graph form Folding graphs, so a Protein graph can consist of one or more Folding graphs. The three graph types were defined for each protein of the PDB, and for each graph type there are four linear notations with corresponding graphic representations. PTGL stores all Folding graphs, all SSEs, and additional protein information for every protein structure in the PDB that has SSEs defined according to DSSP, is not an NMR structure, has a resolution of less than 3.5 Å, and has a sequence length of at least 20 amino acids. The database enables the user to search for the topology of a given protein, or for certain topologies and subtopologies using the linear notations. Additionally, PDB sequences can be searched for sequence similarity.
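The four inclusion criteria above can be expressed as a simple filter. The following Python sketch is illustrative only: the `PdbEntry` record and its field names are hypothetical (they are not part of PTGL or the PDB file format), and excluding entries without a reported resolution is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PdbEntry:
    # Hypothetical record type; field names are illustrative only.
    pdb_id: str
    has_dssp_sses: bool          # True if DSSP assigns secondary structure elements
    method: str                  # experimental method, e.g. "X-RAY DIFFRACTION"
    resolution: Optional[float]  # in Å; None if no resolution is reported
    sequence_length: int         # number of amino acids


def is_included(entry: PdbEntry) -> bool:
    """Return True if the entry meets the inclusion criteria described above."""
    if not entry.has_dssp_sses:
        return False
    if "NMR" in entry.method.upper():
        return False
    # Assumption: entries without a reported resolution are excluded.
    if entry.resolution is None or entry.resolution >= 3.5:
        return False
    return entry.sequence_length >= 20


if __name__ == "__main__":
    # Hypothetical entries, not real PDB data.
    print(is_included(PdbEntry("demo1", True, "X-RAY DIFFRACTION", 2.0, 150)))
    print(is_included(PdbEntry("demo2", True, "SOLUTION NMR", None, 150)))
```

The first call prints True and the second prints False, because NMR structures are excluded.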

Protein Graphs

The SSEs are defined from PDB structure data according to the assignment of the DSSP algorithm, with some modifications. The spatial contacts between the SSEs are then computed according to Koch et al. This information forms the basis for describing protein structures as graphs.

A Protein graph is defined as a labelled undirected graph. Its vertices correspond to the SSEs, helices and strands, and its edges represent spatial adjacencies between SSEs. These adjacencies are defined through contacts between SSEs. Depending on the type of atoms forming a contact, there are backbone-backbone contacts, sidechain-sidechain contacts, and sidechain-backbone contacts. Two vertices are connected if there are at least two backbone-backbone contacts, at least two sidechain-backbone contacts, or at least three sidechain-sidechain contacts.

The vertices of the Protein graph are numbered in the order in which the SSEs occur in the sequence, from the N- to the C-terminus. Relative to this direction, two connected, spatially neighbouring SSEs can have a parallel (p), antiparallel (a), or mixed (m) neighbourhood. If only the helix or only the strand topology is of interest, the graph model allows the other SSE type to be excluded. Depending on the SSE type of interest, the Protein graph is therefore defined as the Alpha, Beta, or Alpha-Beta graph (see the examples below).

In the drawings, SSEs are arranged as red circles (helices) or black squares (strands) on a straight line according to their sequential order from the N- to the C-terminus. Spatial neighbourhoods are drawn as arcs between SSEs, coloured according to their label: red for parallel, green for mixed, and blue for antiparallel neighbourhoods.
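The edge rule above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the contact counts and the neighbourhood labels ("p", "a", "m") are taken as precomputed input, since deriving them from atomic coordinates is not shown, and the input layout is hypothetical.

```python
from typing import Dict, Tuple

# Numbers of backbone-backbone, sidechain-backbone and sidechain-sidechain contacts
ContactCounts = Tuple[int, int, int]


def sses_in_contact(counts: ContactCounts) -> bool:
    """Edge rule from the text: at least 2 bb-bb, 2 sc-bb, or 3 sc-sc contacts."""
    bb_bb, sc_bb, sc_sc = counts
    return bb_bb >= 2 or sc_bb >= 2 or sc_sc >= 3


def build_protein_graph(
    contacts: Dict[Tuple[int, int], Tuple[ContactCounts, str]],
) -> Dict[Tuple[int, int], str]:
    """Map SSE pairs (1-based sequence numbers) to edge labels "p", "a" or "m".

    `contacts` holds, for each SSE pair, the contact counts and a precomputed
    neighbourhood label (hypothetical input format).
    """
    edges: Dict[Tuple[int, int], str] = {}
    for (i, j), (counts, label) in contacts.items():
        if sses_in_contact(counts):
            edges[(min(i, j), max(i, j))] = label
    return edges


if __name__ == "__main__":
    # Hypothetical contact data for four SSEs.
    contacts = {
        (1, 2): ((0, 2, 1), "m"),
        (2, 3): ((3, 0, 0), "a"),
        (1, 4): ((0, 0, 2), "p"),  # only 2 sidechain-sidechain contacts: no edge
        (3, 4): ((0, 0, 4), "m"),
    }
    print(build_protein_graph(contacts))
```

With this data the pair (1, 4) is dropped, and the result is {(1, 2): 'm', (2, 3): 'a', (3, 4): 'm'}.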

Alpha-Beta-Graph

The Alpha-Beta-Graph of the protein 1TIM chain A consisting of 21 SSEs (13 helices and 8 strands).

Alpha-Graph

The Alpha-Graph of the protein 1TIM chain A consisting only of 13 helices.

Beta-Graph

The Beta-Graph of the protein 1TIM chain A consisting only of 8 strands.
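Because the Alpha and Beta graphs keep only one SSE type, they can be derived from the Alpha-Beta graph by dropping the vertices of the other type together with every edge that touches them. The Python sketch below assumes this derivation, keeps the Alpha-Beta numbering of the SSEs (whether PTGL renumbers the vertices is not stated here), returns only the edges, and uses made-up data rather than the 1TIM graphs.

```python
from typing import Dict, List, Set, Tuple

Edges = Dict[Tuple[int, int], str]


def restrict(sse_types: List[str], edges: Edges, keep: Set[str]) -> Edges:
    """Keep only edges whose two SSEs both have a type in `keep`.

    sse_types[k - 1] is the type ("H" or "E") of SSE number k.
    """
    return {
        (i, j): label
        for (i, j), label in edges.items()
        if sse_types[i - 1] in keep and sse_types[j - 1] in keep
    }


if __name__ == "__main__":
    # Hypothetical Alpha-Beta graph with five SSEs (not the 1TIM data).
    sse_types = ["E", "H", "E", "H", "E"]
    alpha_beta = {(1, 2): "m", (1, 3): "p", (2, 4): "a", (3, 5): "p", (4, 5): "m"}
    print(restrict(sse_types, alpha_beta, {"H"}))  # Alpha graph edges
    print(restrict(sse_types, alpha_beta, {"E"}))  # Beta graph edges
```

The Alpha graph keeps only the helix-helix edge (2, 4), and the Beta graph keeps the strand-strand edges (1, 3) and (3, 5).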

Folding Graphs

A connected component of the Protein graph is called a Folding graph. Folding graphs are denoted by capital letters in alphabetical order according to their occurrence in the sequence, beginning at the N-terminus.
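Finding Folding graphs amounts to computing connected components and labelling them in sequence order. A minimal Python sketch, assuming the Protein graph is given as SSE numbers 1..n plus a list of undirected edges (a hypothetical input format); the use of breadth-first search and the limit of 26 letters are simplifications of this sketch, not documented PTGL behaviour.

```python
from collections import deque
from string import ascii_uppercase
from typing import Dict, List, Set, Tuple


def folding_graphs(n_sses: int, edges: List[Tuple[int, int]]) -> Dict[str, List[int]]:
    """Split a Protein graph with SSEs 1..n_sses into lettered connected components."""
    adjacency: Dict[int, Set[int]] = {v: set() for v in range(1, n_sses + 1)}
    for i, j in edges:
        adjacency[i].add(j)
        adjacency[j].add(i)

    seen: Set[int] = set()
    components: Dict[str, List[int]] = {}
    # Scanning start vertices from the N-terminus finds each component at its
    # first SSE, so the letters follow the order of occurrence in the sequence.
    for start in range(1, n_sses + 1):
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        members: List[int] = []
        while queue:  # breadth-first search
            v = queue.popleft()
            members.append(v)
            for w in adjacency[v]:
                if w not in seen:
                    seen.add(w)
                    queue.append(w)
        # Simplification: supports at most 26 Folding graphs.
        components[ascii_uppercase[len(components)]] = sorted(members)
    return components


if __name__ == "__main__":
    # Hypothetical Protein graph: SSE 6 has no contacts and forms its own Folding graph.
    print(folding_graphs(6, [(1, 2), (2, 4), (3, 5)]))
```

This prints {'A': [1, 2, 4], 'B': [3, 5], 'C': [6]}. A single SSE without any edges becomes a Folding graph of its own, as with the single-helix Folding graphs of 1BEC.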

Protein graphs are built from one or more Folding graphs. Below is a schematic representation of the antigen receptor protein 1BEC (figure from the Jena Library of Biological Macromolecules), with helices coloured red and strands blue. 1BEC is a membrane protein that detects foreign molecules at the cell surface. It has two domains, represented by Folding graphs A and E, which are built mainly from strands. The protein consists of a single chain, A, and has six Folding graphs: two large ones (1BEC_A and 1BEC_E) and four (1BEC_B, 1BEC_C, 1BEC_D, and 1BEC_F) that each consist of a single helix (see the Protein graph of 1BEC: helices 9, 11, 14, and 22). Folding graphs consisting of only one SSE are found mostly at the protein surface rather than in the protein core.

Especially in Folding graphs containing beta sheets, the maximal vertex degree is often not larger than two. We therefore distinguish between bifurcated and non-bifurcated topological structures: a Protein graph or a Folding graph is called bifurcated if any of its vertices has a degree greater than 2; otherwise, the graph is non-bifurcated.

3D structure of 1BEC

Alpha-Beta Protein graph of 1BEC

Alpha-Beta Folding graph A of 1BEC

Alpha-Beta Folding graph B of 1BEC
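The bifurcation criterion defined above reduces to a check on vertex degrees. A minimal Python sketch, assuming the graph is given as a list of undirected edges between SSE numbers (a hypothetical input format):

```python
from collections import Counter
from typing import Iterable, Tuple


def is_bifurcated(edges: Iterable[Tuple[int, int]]) -> bool:
    """True if any vertex has a degree greater than 2."""
    degree: Counter = Counter()
    for i, j in edges:
        degree[i] += 1
        degree[j] += 1
    return any(d > 2 for d in degree.values())


if __name__ == "__main__":
    print(is_bifurcated([(1, 2), (2, 3), (3, 4)]))  # a chain of four SSEs
    print(is_bifurcated([(1, 2), (1, 3), (1, 4)]))  # SSE 1 has three neighbours
```

The chain is non-bifurcated (False), while the second graph is bifurcated (True) because SSE 1 has degree 3.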

Linear Notations

A notation serves as a unique, canonical, and linear description and classification of structures. The notations for Folding graphs reflect the nature of a protein as a linear sequence of amino acids and describe the arrangement of SSEs correctly and completely.

There are two ways of representing Protein graphs: the SSEs can be ordered on a line either according to their occurrence in the sequence or according to their arrangement in space. In the first case, which covers the adjacent (ADJ), reduced (RED), and sequence (SEQ) notations, SSEs are ordered as points on a straight line according to their sequential order from the N- to the C-terminus.

It is difficult to draw the spatial arrangement of the SSEs on a straight line, because in most proteins SSEs have more than two spatial neighbours. Therefore, the second description type, the key notation (KEY), can be drawn only for non-bifurcated Folding graphs. Helices and strands are represented by cylinders and arrows, respectively, and the sequential neighbourhood is described by arcs between the arrows and cylinders.

The notations are enclosed in different brackets: [] denotes non-bifurcated Folding graphs, {} bifurcated Folding graphs, and () barrel structures.
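The bracket choice can be sketched as follows. This Python example rests on two assumptions that the text does not state explicitly: that a barrel corresponds to a non-bifurcated Folding graph whose SSEs form a single closed ring, and that the bifurcation check takes precedence. The input is assumed to be one connected Folding graph without duplicate edges.

```python
from collections import Counter
from typing import List, Tuple


def notation_brackets(n_vertices: int, edges: List[Tuple[int, int]]) -> Tuple[str, str]:
    """Choose the bracket pair for a connected Folding graph (sketch)."""
    degree = Counter(v for edge in edges for v in edge)
    if any(d > 2 for d in degree.values()):
        return "{", "}"  # bifurcated
    # A connected graph with maximal degree 2 is either a chain or a closed ring;
    # it is a ring exactly when it has as many edges as vertices.
    if n_vertices >= 3 and len(edges) == n_vertices:
        return "(", ")"  # assumed here to be a barrel
    return "[", "]"  # non-bifurcated chain


if __name__ == "__main__":
    print(notation_brackets(3, [(1, 2), (2, 3)]))
    print(notation_brackets(4, [(1, 2), (2, 3), (3, 4), (4, 1)]))
    print(notation_brackets(4, [(1, 2), (1, 3), (1, 4)]))
```

The three calls return the pairs for a chain, a ring, and a bifurcated graph, in that order.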

The Adjacent and Reduced Notations

The adjacent (ADJ) notation of a Folding graph takes all vertices of the Protein graph into account. The SSEs of the Folding graph are ordered according to their occurrence in the sequence. Beginning with the first SSE and following the spatial neighbourhoods, the sequential distances are noted, each followed by the neighbourhood type.

The reduced (RED) notation is built in the same way as the ADJ notation, but only the SSEs of the considered Folding graph are counted. Below are the ADJ and RED notations of the Beta Folding graph E of human alpha-thrombin chain B (1D3T). The beta sheet consists of six strands arranged both in parallel, with one additional mixed edge to helix 12.

ADJ Notation



RED Notation
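The difference between ADJ and RED lies only in how the SSEs are counted. The Python sketch below illustrates this for a simple non-bifurcated Folding graph under several assumptions: the traversal order along the spatial neighbourhoods is supplied as input, the sequential distances are written as signed differences, and the comma-separated layout inside [] is a hypothetical format, not necessarily PTGL's exact output.

```python
from typing import List


def linear_notation(path: List[int], edge_types: List[str], reduced: bool = False) -> str:
    """Sketch of the ADJ (reduced=False) and RED (reduced=True) notations.

    path: SSE numbers in Protein-graph numbering, in the order in which the spatial
    neighbourhoods are followed, starting with the first SSE of the Folding graph.
    edge_types: "p", "a" or "m" for each consecutive pair in path.
    """
    if len(edge_types) != len(path) - 1:
        raise ValueError("expected one edge type per consecutive pair of SSEs")
    if reduced:
        # RED: renumber so that only the SSEs of this Folding graph are counted.
        rank = {sse: pos for pos, sse in enumerate(sorted(path), start=1)}
        path = [rank[sse] for sse in path]
    steps = [f"{b - a}{t}" for a, b, t in zip(path, path[1:], edge_types)]
    return "[" + ",".join(steps) + "]"


if __name__ == "__main__":
    # Hypothetical Folding graph made of SSEs 3, 5, 7 and 9 of the Protein graph.
    path, types = [3, 5, 9, 7], ["a", "a", "p"]
    print(linear_notation(path, types))                # ADJ: [2a,4a,-2p]
    print(linear_notation(path, types, reduced=True))  # RED: [1a,2a,-1p]
```

Because SSEs 4, 6 and 8 belong to other Folding graphs, the ADJ distances are larger than the RED distances for the same path.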



KEY Notation

The KEY notation closely resembles the topology diagrams used by biologists, e.g. Brändén and Tooze (1999), in which strands are drawn as arrows and helices as cylinders. As in the RED notation, only the SSEs of the considered Folding graph are taken into account. The SSEs are ordered spatially and connected in sequential order. Beginning with the first SSE in the sequence and following the sequential edges, the spatial distances are noted; in Alpha-Beta graphs, each distance is followed by the type of the SSE, h for a helix and e for a strand. If two connected SSEs are arranged in parallel, an x is noted (Richardson, 1977): in this case, the protein chain has to cross the sheet to reach its other side (cross-over connection). Antiparallel arrangements are called same-end connections and are more stable (Chothia and Finkelstein, 1990). Mixed arrangements are also treated as same-end connections. The notation starts with the type of the first SSE.

The example below shows the KEY notation of Alpha-Beta Folding graph B of chain B of the histocompatibility antigen 1IEB. The Folding graph consists of 3 helices and 4 strands. Its topology exhibits one cross-over connection, from helix 6 to helix 7, and forms an Alpha-Beta barrel structure.

KEY Notation
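The KEY rules above can be sketched for a non-bifurcated Folding graph. This Python example is a simplified illustration, not PTGL's algorithm: each SSE is assumed to come with a spatial position in the drawing and a direction (+1 or -1), two consecutive SSEs with the same direction are treated as parallel (cross-over), mixed arrangements are not modelled, and the order "distance, type, x" inside [] is a hypothetical layout.

```python
from typing import List, Tuple

# One SSE in sequence order: (type "h" or "e", spatial position, direction +1 or -1)
Sse = Tuple[str, int, int]


def key_notation(sses: List[Sse], alpha_beta: bool = True) -> str:
    """Sketch of a KEY-style string for a non-bifurcated Folding graph."""
    items = [sses[0][0]]  # the notation starts with the type of the first SSE
    for (_, pos0, dir0), (type1, pos1, dir1) in zip(sses, sses[1:]):
        step = str(pos1 - pos0)  # spatial distance in the drawing
        if alpha_beta:
            step += type1  # the SSE type is written for Alpha-Beta graphs
        if dir0 == dir1:
            step += "x"  # same direction = parallel = cross-over connection
        items.append(step)
    return "[" + ",".join(items) + "]"


if __name__ == "__main__":
    # Hypothetical Folding graph: three strands followed by one helix.
    example = [("e", 0, 1), ("e", 1, -1), ("e", 3, -1), ("h", 2, 1)]
    print(key_notation(example))  # [e,1e,2ex,-1h]
```

Only the second connection joins two SSEs pointing in the same direction, so it is the only one marked with x.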



SEQ Notation

This notation is built in the same way as the ADJ notation, but the sequential differences are counted. Although the SEQ notation is trivial, it can be useful: searching for ψ-loops, for example, requires a special SEQ notation.




Linking PTGL

You can link PTGL in two ways:

1. Link to a specific PDB ID, chain ID, graph type, and notation type. For example, for PDB ID 1g3e, chain ID A, graph type Alpha-Beta, and notation type KEY, the link is:
http://ptgl.zib.de/cgi-bin/showpict.pl?topology=z&rep=3&protlist=1g3eA

The encoding is as follows:

parameter   allowed values and meaning
topology    z = Alpha-Beta, a = Alpha, b = Beta
rep         1 = ADJ, 2 = RED, 3 = KEY, 4 = SEQ
protlist    <pdb-id><chain-id>, e.g. 1g3eA

2. If you only have the PDB ID, you can link as follows:
http://ptgl.zib.de/cgi-bin/query1.pl?Field=1&pdbid=1g3e
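Both link types can be generated programmatically from the encoding above. The following Python sketch uses only the standard library; the function and constant names are hypothetical, and the value Field=1 is copied from the example link without further documentation of its meaning.

```python
from urllib.parse import urlencode

BASE_URL = "http://ptgl.zib.de/cgi-bin/"
GRAPH_TYPES = {"Alpha-Beta": "z", "Alpha": "a", "Beta": "b"}
NOTATIONS = {"ADJ": 1, "RED": 2, "KEY": 3, "SEQ": 4}


def topology_link(pdb_id: str, chain_id: str, graph_type: str, notation: str) -> str:
    """Link type 1: a specific chain, graph type and notation type."""
    query = urlencode({
        "topology": GRAPH_TYPES[graph_type],
        "rep": NOTATIONS[notation],
        "protlist": pdb_id + chain_id,
    })
    return BASE_URL + "showpict.pl?" + query


def pdb_link(pdb_id: str) -> str:
    """Link type 2: only the PDB ID is known."""
    return BASE_URL + "query1.pl?" + urlencode({"Field": 1, "pdbid": pdb_id})


if __name__ == "__main__":
    print(topology_link("1g3e", "A", "Alpha-Beta", "KEY"))
    print(pdb_link("1g3e"))
```

For the example protein, the two calls reproduce the two links shown above.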