常用数据分析方法

常用的分析数据方法

Molecular Dynamics Hands-on Session II

Tutorial on MD simulations using Gromacs 3.3.3

Tsjerk A. Wassenaar

Analysis of data from molecular dynamics simulation

When the simulation has finished, it is time to continue with the analysis of the data. The analysis comprises of two stages. First, it is necessary to perform some standard checks to assess the quality of the simulations. If the results from these analyses show that the simulations are fine, then the analyses required to answer the research question asked can be performed.

As a side note, many of the analysis tools produce graphs in the form of .xvg files. these files can be viewed with the program xmgrace, but they can also be viewed in a terminal using the python script xvg2ascii.py.

Quality assurance

The quality assurance (QA) involves tests for the convergence of thermodynamic parameters, such as the temperature, the pressure, the potential and the kinetic energy. Convergence is also checked in terms of the structure, through the root mean square deviation (RMSD) against the starting structure and the average structure. Next to that, it has to be checked that there have not been interactions between adjacent periodic images, as such interactions could lead to unphysical effects. A final QA test involves the calculation of the root mean square fluctuations of atoms, which can be compared to crystallographic b-factors.

Convergence of energy terms

We start of with the extraction of some thermodynamic parameters from the energy file. The following properties will be investigated: temperature, pressure, potential energy, kinetic energy, unit cell volume, density, and the box dimensions. Energy analysis is performed with the tool g_energy. This program reads in the energy file, which is produced during the simulation (1LW9-MD.edr). g_energy will ask for a term to extract from the energy file and will produce a graph from that. Give the following command to run g_energy:

g_energy -f 1LW9-MD.edr -o temperature.xvg

This will give a list of the energy and related terms which are stored in the .edr energy file. In our case, there are 68 terms in the energy file, each of which may be extracted and graphed. The first nine entries correspond to different energy terms in the force field. Also note the entries from 47 onward, which list the terms split over the groups Protein and Non-Protein, including the interactions between the two. Type "temperature" or "14" and press Return twice. Have a look at the graph with the program xmgrace, and see how the temperature converges to the value specified (300 K). Also note the fluctuations in the temperature:

xmgrace temperature.xvg

Referencing the energy terms by name facilitates automatic processing of the energy file. Using 'echo' and a pipe ("|") to redirect output from one program to the other, the query from g_energy can be answered automatically. Copy and paste or type the following lines to

extract the other terms:

echo pressure 0 | g_energy -f 1LW9-MD.edr -o pressure.xvg

echo potential kinetic 0 | g_energy -f 1LW9-MD.edr -o energy.xvg

echo volume 0 | g_energy -f 1LW9-MD.edr -o volume.xvg

echo density 0 | g_energy -f 1LW9-MD.edr -o density.xvg

echo box-x box-y box-z 0 | g_energy -f 1LW9-MD.edr -o box.xvg

Have a look at each of these files and see how the values converge. If the values have not converged, this indicates that the simulation has not yet reached thermal equilibrium and that it should be extended before doing further analysis. Moreover, the period over which equilibration takes place should not be included in the analysis. Here, for the sake of simplicity, we will neglect these considerations and use the results from the simulation as is.

Minimum distances between periodic images

One of the most important things to check in terms of quality assurance is that there have not been direct interactions between periodic images. Since periodic images are identical, such interactions are unphysical self-interactions. Imagine that a protein with a dipole would have direct interactions. Then the attraction between the two ends of the same molecule over the periodic boundary would affect and invalidate the results relating to the native behaviour of the protein. To verify that such interactions have not taken place, we calculate the minimal distance between periodic images at each time. This is done using g_mindist.

g_mindist -f 1LW9-MD.xtc -s 1LW9-MD.tpr -od minimal-periodic-distance.xvg -pi

Root mean square fluctuations

Next to inspection of energies and such, convergence of the simulation towards equilibrium should also be inspected from the relaxation of the structure. Usually, such relaxation is only considered in terms of the Cartesian distance from the structure to a reference structure, e.g. the crystal structure. This distance is termed the root mean square deviation (RMSD). However, it is recommended also to investigate the relaxation towards the average structure, i.e. the RMSD with respect to the average. The reasons for this will be set out in the next paragraph. But to calculate the RMSD against the average structure, requires first to obtain the average. This structure can be obtained as a side product from calculation the root mean square fluctuations (RMSF). The RMSF captures, for each atom, the fluctuation about its average position. This gives insight into the flexibility of regions of the protein and corresponds to the crystallographic b-factors (temperature factors). Usually, one would expect similar profiles for the RMSF and the b-factors and this can be used to investigate whether the simulation results are in accordance with the crystal structure. The RMSF (and the average structure) are calculated with g_rmsf. The -oq options allows to calculate b-factors and add them to a reference structure:

g_rmsf -f 1LW9-MD.xtc -s 1LW9-MD.tpr -o rmsf-all-atom.xvg -ox average.pdb -oq b

factors.pdb

Have a look at the graph of the RMSF with xmgrace and identify the flexible and rigid regions. Also view the two pdb files. Color the structure bfactors.pdb according to b-factors and inspect the flexible regions. The average structure is an unphysical structure. Have a look at some of the side chains, and notice the effect of averaging over conformations.

Convergence of RMSD

Now that we also have the average structure, we can calculate the RMSD. The RMSD is commonly used as an indicator of convergence of the structure towards an equilibrium state. As mentioned above, the RMSD is merely a distance measure. The RMSD is calculated using the program g_rms. First calculate the RMSD for all protein atoms, using the starting structure as a reference:

g_rms -f 1LW9-MD.xtc -s 1LW9-MD.tpr -o rmsd-all-atom-vs-start.xvg

Have a look at the graph, and note at what time the RMSD levels off and at which value. Then calculate the RMSD again, but now only selecting the backbone atoms:

g_rms -f 1LW9-MD.xtc -s 1LW9-MD.tpr -o rmsd-backbone-vs-start.xvg

This time the RMSD settles at a lower value, which is the result of excluding the, often flexible, side chain atoms. In both cases the RMSD increases to a plateau value. This means that the structure of the protein reaches a certain distance from the reference structure and then keeps that distance more or less. However, with increasing distance, the amount of conformations available also increases. This means that, although the RMSD has reached a plateau value, the structure may still be progressing towards its equilibrium state. For this reason, it is advisable also to check the convergence towards the average structure:

g_rms -f 1LW9-MD.xtc -s average.pdb -o rmsd-all-atom-vs-average.xvg

g_rms -f 1LW9-MD.xtc -s average.pdb -o rmsd-backbone-vs-average.xvg

Compare the resulting graphs to the previous ones. Note at which point the RMSD values level off.

Convergence of radius of gyration

As a final part of the QA, we calculate the radius of gyration. This measure gives an indication of the shape of the molecule at each time. The radius of gyration compares to the experimentally obtainable hyrdodynamic radius. It can be calculated using g_gyrate. This program will also give the individual components, which correspond to the eigenvalues of the matrix of inertia. This means that the first individual component corresponds to the longest axis of the molecule, while the last corresponds to the smallest. In effect, the three axes give a global indication of the shape of the molecule. Issue the command

g_gyrate -f 1LW9-MD.xtc -s 1LW9-MD.tpr -o radius-of-gyration.xvg

Have a look at the radius of gyration and the individual components and note how each of these progress to an equilibrium value.

Structural analysis: properties derived from configurations

Having assured that the simulation has converged to an equilibrium state, it is time to do some real analysis. Analysis

of simulation data can be divided into several categories. The first category comprises of the interpretation of single configurations according to some metric to obtain a value, or a number of values, for each time point. The RMSD and the radius of gyration are examples. Such properties can be called configurational, dependent or instantaneous properties. Next to that, one can analyze processes in the time domain, e.g. through averaging, as (auto)correlations or fluctutations. In this section, a number of common analyses will be performed, each of which yields a time series of values directly derived from the trajectory (the coordinates over time).

Solvent accessible surface area

One property which can be of interest is the surface area of the protein which is accessible to solvent, commonly referred to as the solvent accessible surface (SAS) or the solvent accessible surface area (SASA). This can be further divided into a hydrophilic SAS and a hydrophobic SAS. In addition, the SAS can be used together with some empirical parameters to obtain an estimate for the free energy of solvation. All four of these parameters are calculated by the program g_sas. This program also allows to calculate the average SAS over time per residue and/or per atom. Issue the following command, specifying Protein both for the group to calculate the SAS for and for the output group, and have a look at the output files.

g_sas -f 1LW9-MD.xtc -s 1LW9-MD.tpr -o solvent-accessible-surface.xvg -oa atomic-sas.xvg -or residue-sas.xvg

Hydrogen bonds

Another property which can be informative is the number of hydrogen bonds, either internally (Protein - Protein) or between the protein and the surrounding solvent. The presence or not of a hydrogen bond is inferred from the distance between a donor-H - acceptor pair and the donor - H - acceptor angle. To calculate the hydrogen bonds issue the following commands, and have a look at the output files using xmgrace:

g_hbond -f 1LW9-MD.xtc -s 1LW9-MD.tpr -num hydrogen-bonds-intra-protein.xvg

g_hbond -f 1LW9-MD.xtc -s 1LW9-MD.tpr -num hydrogen-bonds-protein-water.xvg

Salt bridges

Besides hydrogen bonds, proteins also often form salt bridges between oppositely charged residues. These can have an important stabilizing effect on the structure of the protein, in particular when they are embedded in a hydrophobic environment, such as the core of the protein. The existence of salt bridges can be investigated with g_saltbr. If requested (through the flag -sep) This program will generate an output file for every pair of oppositely charged residues which at some point in the trajectory are within a certain cut off distance from each other (here 1.0 nm, specified through the option -t). Execute the following command and look at the output generated.

g_saltbr -f 1LW9-MD.xtc -s 1LW9-MD.tpr -t 1.0 -sep

Secondary structure

Among the most common parameters to judge protein structure is the assignment of secondary structure

elements, such as α-helices and β-sheets. One such assignment is provided by dssp. This program, which is not part of the gromacs distribution, but can be obtained at the CMBI, Radboud University. Gromacs does provide an interface to dssp, to allow the calculation of secondary structure for each frame of a trajectory. To use this, first set an environment variable DSSP, which points to the location of the program. Then run do_dssp as follows:

setenv DSSP /home/coursead/Wassenaar/MD/dssp

do_dssp -f 1LW9-MD.xtc -s 1LW9-MD.tpr -o secondary-structure.xpm -sc secondary-structure.xvg

The file secondary-structure.xvg contains a time series, listing the numbers of residues associated with each type of secondary structure per frame. More detailed information is in the .xpm file, which gives a color coded assignment of the secondary structure per residue over time. The .xpm file can be viewed with e.g. the Gimp, but some useful metadata can be added with the gromacs tool xpm2ps and the result can be viewed with gview or a similar program:

xpm2ps -f secondary-structure.xpm -o secondary-structure.eps

Ramachandran (phi/psi) plots

The phi and psi torsion angles of the protein backbone are two parameters which are useful to gain insight into the structural properties of a protein. The plot of phi against psi is called a Ramachandran plot and certain regions of this plot are characteristic of secondary structure elements or of amino acids. Other regions are considered forbidden (inaccessible). Projection of the phi/psi angles over time may give insight in structural transitions. These angles can be calculated by the program g_rama, albeit in a bit of a crude way, namely put all together in a single file. To investigate single residues, the linux progam 'grep' can be used to select them out of the plot.

g_rama -f 1LW9-MD.xtc -s 1LW9-MD.tpr -o ramachandran.xvg

Analysis of dynamics and time-averaged properties

Analysis of interatomic distances, NOE's from simulations

Relaxation and order parameters

Configurational principal component analysis

The use of principal component analysis in molecular dynamics focuses on revealing the structure of atomic fluctuations.

An often used, but often little understood, method of analysis is principal component analysis (PCA) of the trajectory. This method, which is sometimes also referred to as 'essential dynamics' (ED), aims to identify large scale collective motions of atoms and thus reveal the structures underlying the atomic fluctuations. The fluctuations of particles in a molecular dynamics simulation are by definition correlated due to interactions between the particles. The degree of correlation will vary and notably particles which are directly connected through bonds or lie in the vicinity of each other will move in a concerted manner. The correlations between the motions of the particles give rise to structure in the total fluctuations in the system and for a macromolecule this structure

is often directly related to its function or (bio)physical properties. Thus, the study of the structure of the atomic fluctuations can give valuable insight in the behaviour of such macromolecules. However, it does require a certain level of understanding of linear algebra methods and multivariate statistics to interpret the results and identify the shortcomings of the method. In particular, the aim of principal component analysis is to describe the original data in terms of new variables which are linear combinations of the original ones. This is also the most important problem with PCA: it only offers an interpretation in terms of linear relationships between atomic motions.

The first step in PCA is the construction of the covariance matrix, which captures the degree of collinearity of atomic motions for each pair of atoms. The covariance matrix is, by definition, a symmetric matrix. This matrix is subsequently diagonalized, yielding a matrix of eigenvectors and a diagonal matrix of eigenvalues. Each of the eigenvectors describes a collective motion of particles, where the values of the vector indicate how much the corresponding atom participates in the motion. The associated eigenvalue gives equals the sum of the fluctuation described by the collective motion per atom, and thus is a measure for the total motility associated with an eigenvector. Usually most of the motion in the system (>90%) is described by less than 10 eigenvectors or principal components.

The construction and diagonalization of the covariance matrix can be performed with the program g_covar. Issue the following command to perform the analysis:

g_covar -s 1LW9-MD.tpr -f 1LW9-MD.xtc -o eigenvalues.xvg -v eigenvectors.trr -ascii covariances.dat

Select the backbone when asked for a selection. Constructing and diagonalizing the covariance matrix requires some time. When the program has finished, have a look at the plot of the eigenvalues and calculate how much of the total motility is explained by the first ten eigenvectors.

常用的分析数据方法

Molecular Dynamics Hands-on Session II

Tutorial on MD simulations using Gromacs 3.3.3

Tsjerk A. Wassenaar

Analysis of data from molecular dynamics simulation

When the simulation has finished, it is time to continue with the analysis of the data. The analysis comprises of two stages. First, it is necessary to perform some standard checks to assess the quality of the simulations. If the results from these analyses show that the simulations are fine, then the analyses required to answer the research question asked can be performed.

As a side note, many of the analysis tools produce graphs in the form of .xvg files. these files can be viewed with the program xmgrace, but they can also be viewed in a terminal using the python script xvg2ascii.py.

Quality assurance

The quality assurance (QA) involves tests for the convergence of thermodynamic parameters, such as the temperature, the pressure, the potential and the kinetic energy. Convergence is also checked in terms of the structure, through the root mean square deviation (RMSD) against the starting structure and the average structure. Next to that, it has to be checked that there have not been interactions between adjacent periodic images, as such interactions could lead to unphysical effects. A final QA test involves the calculation of the root mean square fluctuations of atoms, which can be compared to crystallographic b-factors.

Convergence of energy terms

We start of with the extraction of some thermodynamic parameters from the energy file. The following properties will be investigated: temperature, pressure, potential energy, kinetic energy, unit cell volume, density, and the box dimensions. Energy analysis is performed with the tool g_energy. This program reads in the energy file, which is produced during the simulation (1LW9-MD.edr). g_energy will ask for a term to extract from the energy file and will produce a graph from that. Give the following command to run g_energy:

g_energy -f 1LW9-MD.edr -o temperature.xvg

This will give a list of the energy and related terms which are stored in the .edr energy file. In our case, there are 68 terms in the energy file, each of which may be extracted and graphed. The first nine entries correspond to different energy terms in the force field. Also note the entries from 47 onward, which list the terms split over the groups Protein and Non-Protein, including the interactions between the two. Type "temperature" or "14" and press Return twice. Have a look at the graph with the program xmgrace, and see how the temperature converges to the value specified (300 K). Also note the fluctuations in the temperature:

xmgrace temperature.xvg

Referencing the energy terms by name facilitates automatic processing of the energy file. Using 'echo' and a pipe ("|") to redirect output from one program to the other, the query from g_energy can be answered automatically. Copy and paste or type the following lines to

extract the other terms:

echo pressure 0 | g_energy -f 1LW9-MD.edr -o pressure.xvg

echo potential kinetic 0 | g_energy -f 1LW9-MD.edr -o energy.xvg

echo volume 0 | g_energy -f 1LW9-MD.edr -o volume.xvg

echo density 0 | g_energy -f 1LW9-MD.edr -o density.xvg

echo box-x box-y box-z 0 | g_energy -f 1LW9-MD.edr -o box.xvg

Have a look at each of these files and see how the values converge. If the values have not converged, this indicates that the simulation has not yet reached thermal equilibrium and that it should be extended before doing further analysis. Moreover, the period over which equilibration takes place should not be included in the analysis. Here, for the sake of simplicity, we will neglect these considerations and use the results from the simulation as is.

Minimum distances between periodic images

One of the most important things to check in terms of quality assurance is that there have not been direct interactions between periodic images. Since periodic images are identical, such interactions are unphysical self-interactions. Imagine that a protein with a dipole would have direct interactions. Then the attraction between the two ends of the same molecule over the periodic boundary would affect and invalidate the results relating to the native behaviour of the protein. To verify that such interactions have not taken place, we calculate the minimal distance between periodic images at each time. This is done using g_mindist.

g_mindist -f 1LW9-MD.xtc -s 1LW9-MD.tpr -od minimal-periodic-distance.xvg -pi

Root mean square fluctuations

Next to inspection of energies and such, convergence of the simulation towards equilibrium should also be inspected from the relaxation of the structure. Usually, such relaxation is only considered in terms of the Cartesian distance from the structure to a reference structure, e.g. the crystal structure. This distance is termed the root mean square deviation (RMSD). However, it is recommended also to investigate the relaxation towards the average structure, i.e. the RMSD with respect to the average. The reasons for this will be set out in the next paragraph. But to calculate the RMSD against the average structure, requires first to obtain the average. This structure can be obtained as a side product from calculation the root mean square fluctuations (RMSF). The RMSF captures, for each atom, the fluctuation about its average position. This gives insight into the flexibility of regions of the protein and corresponds to the crystallographic b-factors (temperature factors). Usually, one would expect similar profiles for the RMSF and the b-factors and this can be used to investigate whether the simulation results are in accordance with the crystal structure. The RMSF (and the average structure) are calculated with g_rmsf. The -oq options allows to calculate b-factors and add them to a reference structure:

g_rmsf -f 1LW9-MD.xtc -s 1LW9-MD.tpr -o rmsf-all-atom.xvg -ox average.pdb -oq b

factors.pdb

Have a look at the graph of the RMSF with xmgrace and identify the flexible and rigid regions. Also view the two pdb files. Color the structure bfactors.pdb according to b-factors and inspect the flexible regions. The average structure is an unphysical structure. Have a look at some of the side chains, and notice the effect of averaging over conformations.

Convergence of RMSD

Now that we also have the average structure, we can calculate the RMSD. The RMSD is commonly used as an indicator of convergence of the structure towards an equilibrium state. As mentioned above, the RMSD is merely a distance measure. The RMSD is calculated using the program g_rms. First calculate the RMSD for all protein atoms, using the starting structure as a reference:

g_rms -f 1LW9-MD.xtc -s 1LW9-MD.tpr -o rmsd-all-atom-vs-start.xvg

Have a look at the graph, and note at what time the RMSD levels off and at which value. Then calculate the RMSD again, but now only selecting the backbone atoms:

g_rms -f 1LW9-MD.xtc -s 1LW9-MD.tpr -o rmsd-backbone-vs-start.xvg

This time the RMSD settles at a lower value, which is the result of excluding the, often flexible, side chain atoms. In both cases the RMSD increases to a plateau value. This means that the structure of the protein reaches a certain distance from the reference structure and then keeps that distance more or less. However, with increasing distance, the amount of conformations available also increases. This means that, although the RMSD has reached a plateau value, the structure may still be progressing towards its equilibrium state. For this reason, it is advisable also to check the convergence towards the average structure:

g_rms -f 1LW9-MD.xtc -s average.pdb -o rmsd-all-atom-vs-average.xvg

g_rms -f 1LW9-MD.xtc -s average.pdb -o rmsd-backbone-vs-average.xvg

Compare the resulting graphs to the previous ones. Note at which point the RMSD values level off.

Convergence of radius of gyration

As a final part of the QA, we calculate the radius of gyration. This measure gives an indication of the shape of the molecule at each time. The radius of gyration compares to the experimentally obtainable hyrdodynamic radius. It can be calculated using g_gyrate. This program will also give the individual components, which correspond to the eigenvalues of the matrix of inertia. This means that the first individual component corresponds to the longest axis of the molecule, while the last corresponds to the smallest. In effect, the three axes give a global indication of the shape of the molecule. Issue the command

g_gyrate -f 1LW9-MD.xtc -s 1LW9-MD.tpr -o radius-of-gyration.xvg

Have a look at the radius of gyration and the individual components and note how each of these progress to an equilibrium value.

Structural analysis: properties derived from configurations

Having assured that the simulation has converged to an equilibrium state, it is time to do some real analysis. Analysis

of simulation data can be divided into several categories. The first category comprises of the interpretation of single configurations according to some metric to obtain a value, or a number of values, for each time point. The RMSD and the radius of gyration are examples. Such properties can be called configurational, dependent or instantaneous properties. Next to that, one can analyze processes in the time domain, e.g. through averaging, as (auto)correlations or fluctutations. In this section, a number of common analyses will be performed, each of which yields a time series of values directly derived from the trajectory (the coordinates over time).

Solvent accessible surface area

One property which can be of interest is the surface area of the protein which is accessible to solvent, commonly referred to as the solvent accessible surface (SAS) or the solvent accessible surface area (SASA). This can be further divided into a hydrophilic SAS and a hydrophobic SAS. In addition, the SAS can be used together with some empirical parameters to obtain an estimate for the free energy of solvation. All four of these parameters are calculated by the program g_sas. This program also allows to calculate the average SAS over time per residue and/or per atom. Issue the following command, specifying Protein both for the group to calculate the SAS for and for the output group, and have a look at the output files.

g_sas -f 1LW9-MD.xtc -s 1LW9-MD.tpr -o solvent-accessible-surface.xvg -oa atomic-sas.xvg -or residue-sas.xvg

Hydrogen bonds

Another property which can be informative is the number of hydrogen bonds, either internally (Protein - Protein) or between the protein and the surrounding solvent. The presence or not of a hydrogen bond is inferred from the distance between a donor-H - acceptor pair and the donor - H - acceptor angle. To calculate the hydrogen bonds issue the following commands, and have a look at the output files using xmgrace:

g_hbond -f 1LW9-MD.xtc -s 1LW9-MD.tpr -num hydrogen-bonds-intra-protein.xvg

g_hbond -f 1LW9-MD.xtc -s 1LW9-MD.tpr -num hydrogen-bonds-protein-water.xvg

Salt bridges

Besides hydrogen bonds, proteins also often form salt bridges between oppositely charged residues. These can have an important stabilizing effect on the structure of the protein, in particular when they are embedded in a hydrophobic environment, such as the core of the protein. The existence of salt bridges can be investigated with g_saltbr. If requested (through the flag -sep) This program will generate an output file for every pair of oppositely charged residues which at some point in the trajectory are within a certain cut off distance from each other (here 1.0 nm, specified through the option -t). Execute the following command and look at the output generated.

g_saltbr -f 1LW9-MD.xtc -s 1LW9-MD.tpr -t 1.0 -sep

Secondary structure

Among the most common parameters to judge protein structure is the assignment of secondary structure

elements, such as α-helices and β-sheets. One such assignment is provided by dssp. This program, which is not part of the gromacs distribution, but can be obtained at the CMBI, Radboud University. Gromacs does provide an interface to dssp, to allow the calculation of secondary structure for each frame of a trajectory. To use this, first set an environment variable DSSP, which points to the location of the program. Then run do_dssp as follows:

setenv DSSP /home/coursead/Wassenaar/MD/dssp

do_dssp -f 1LW9-MD.xtc -s 1LW9-MD.tpr -o secondary-structure.xpm -sc secondary-structure.xvg

The file secondary-structure.xvg contains a time series, listing the numbers of residues associated with each type of secondary structure per frame. More detailed information is in the .xpm file, which gives a color coded assignment of the secondary structure per residue over time. The .xpm file can be viewed with e.g. the Gimp, but some useful metadata can be added with the gromacs tool xpm2ps and the result can be viewed with gview or a similar program:

xpm2ps -f secondary-structure.xpm -o secondary-structure.eps

Ramachandran (phi/psi) plots

The phi and psi torsion angles of the protein backbone are two parameters which are useful to gain insight into the structural properties of a protein. The plot of phi against psi is called a Ramachandran plot and certain regions of this plot are characteristic of secondary structure elements or of amino acids. Other regions are considered forbidden (inaccessible). Projection of the phi/psi angles over time may give insight in structural transitions. These angles can be calculated by the program g_rama, albeit in a bit of a crude way, namely put all together in a single file. To investigate single residues, the linux progam 'grep' can be used to select them out of the plot.

g_rama -f 1LW9-MD.xtc -s 1LW9-MD.tpr -o ramachandran.xvg

Analysis of dynamics and time-averaged properties

Analysis of interatomic distances, NOE's from simulations

Relaxation and order parameters

Configurational principal component analysis

The use of principal component analysis in molecular dynamics focuses on revealing the structure of atomic fluctuations.

An often used, but often little understood, method of analysis is principal component analysis (PCA) of the trajectory. This method, which is sometimes also referred to as 'essential dynamics' (ED), aims to identify large scale collective motions of atoms and thus reveal the structures underlying the atomic fluctuations. The fluctuations of particles in a molecular dynamics simulation are by definition correlated due to interactions between the particles. The degree of correlation will vary and notably particles which are directly connected through bonds or lie in the vicinity of each other will move in a concerted manner. The correlations between the motions of the particles give rise to structure in the total fluctuations in the system and for a macromolecule this structure

is often directly related to its function or (bio)physical properties. Thus, the study of the structure of the atomic fluctuations can give valuable insight in the behaviour of such macromolecules. However, it does require a certain level of understanding of linear algebra methods and multivariate statistics to interpret the results and identify the shortcomings of the method. In particular, the aim of principal component analysis is to describe the original data in terms of new variables which are linear combinations of the original ones. This is also the most important problem with PCA: it only offers an interpretation in terms of linear relationships between atomic motions.

The first step in PCA is the construction of the covariance matrix, which captures the degree of collinearity of atomic motions for each pair of atoms. The covariance matrix is, by definition, a symmetric matrix. This matrix is subsequently diagonalized, yielding a matrix of eigenvectors and a diagonal matrix of eigenvalues. Each of the eigenvectors describes a collective motion of particles, where the values of the vector indicate how much the corresponding atom participates in the motion. The associated eigenvalue gives equals the sum of the fluctuation described by the collective motion per atom, and thus is a measure for the total motility associated with an eigenvector. Usually most of the motion in the system (>90%) is described by less than 10 eigenvectors or principal components.

The construction and diagonalization of the covariance matrix can be performed with the program g_covar. Issue the following command to perform the analysis:

g_covar -s 1LW9-MD.tpr -f 1LW9-MD.xtc -o eigenvalues.xvg -v eigenvectors.trr -ascii covariances.dat

Select the backbone when asked for a selection. Constructing and diagonalizing the covariance matrix requires some time. When the program has finished, have a look at the plot of the eigenvalues and calculate how much of the total motility is explained by the first ten eigenvectors.


相关内容

  • 网络营销常用的数据分析方法
  • www.guojing.net _________ 网络营销常用的数据分析方法 网络营销常用的数据分析方法 , 首先数据分析在SEM 中是最为基础的技能,说得简单点,数据分析就是为了发现问题,并为解决问题提供数据参考. 网络营销常用的数据分析方法 , 有经验的SEMer 都知道,尽信数据则不如无数据 ...

  • 孟莉附件7[1].22新入职护士规范化培训大纲
  • 附件 新入职护士规范化培训大纲(征求意见稿) 一.适用范围 各级各类综合医院,其他医疗卫生机构参照执行. 二.培训目标 根据<护士条例>等,结合推进优质护理服务工作要求,开展新入职护士的规范化培训.通过培训,使新入职护士掌握从事临床护理工作的基础理论.基本知识和基本技能:具备良好的职业道 ...

  • 软考-信息技术处理员考试大纲
  • 信息处理技术员考试大纲 一.考 试 说 明 1.考试要求: (1) 了解信息技术的基本概念; (2) 熟悉计算机的组成.各主要部件的功能和性能指标; (3) 了解计算机网络与多媒体基础知识; (4) 熟悉信息处理常用设备; (5) 熟悉计算机系统安装和维护的基本知识; (6) 熟悉计算机信息处理的基 ...

  • 无梁楼盖的计算方法
  • [摘 要]近些年来,随着科学技术的不断向前发展,多种科学技术手段已经逐渐的运用到了各种建筑的建设过程当中.其中,无梁楼盖就是一种比较新型的建筑方式,这种建筑方式具有多个方面的有点,因此被大量的运用到了建筑的建设过程当中.但是,无梁楼盖也存在着受力复杂方面的特点.为了保证无梁楼盖的建设质量,无梁楼盖在 ...

  • 卫生统计学教学大纲
  • 卫生统计学教学大纲 (供预防医学专业五年制本科生使用) 前言 卫生统计学是研究居民健康状况以及卫生服务领域中数据的收集.整理和分析的一门科学.本课程的教学目的是为学生在校学习专业课程,毕业后从事公共卫生领域的研究和实际工作,打下必要的卫生统计学基础.在学习本课程时,应注意掌握卫生统计学的基本理论.基 ...

  • 数字化设计手册(软件版)
  • 制造业信息化最佳平台和产品开发设计途径 最前沿的制造业信息化资源库 最重要的企业核心竞争力加速提升平台 最权威的开发研制机构阵容 最专业的制造业最新科研成果与技术 最有效的制造业产品开发设计途径 最广泛的制造业信息化应用支撑系统 试与争锋,舍我其谁.数字手册系列软件,震撼上市,强烈推荐!! 什么是制 ...

  • 机电一体化专业课程大纲总汇
  • 机电一体化专业课程教学大纲 目 录 <机械制图与计算机绘图>教学大纲 .............................................. 2 <机械制图与计算机绘图>教学大纲 ..................................... ...

  • 机械设计手册
  • <机械设计手册(软件网络版)>V3.0是一种面向机械制造企业的综合通用资源和设计技术支持环境平台,它包括了通用基础信息资源.通用标准的技术数据资源.常用机械零部件设计计算软件.气动液压设计资源和方法库.常用工程计算器以及机械工程英汉辞典等分系统组成.在基础标准资源数据库方面的主要内容是: ...

  • 电子测量实训报告
  • 一. 实训目的(1) 熟悉常用电子仪器的功能及使用方法。(2) 掌握常用电子仪器的工作原理。(3) 掌握常用电子仪器附加功能的使用。(4) 熟练使用常用电子仪器进行数据测量。(5) 掌握常用电子元器件的测量方法,掌握电子元器件的焊接技巧和装配工艺;学会 使用万用表、示波器、毫伏表、频率计、 信号发生 ...

  • 09模具与数控技术应用专业(3+2)教学计划
  • 09模具与数控技术应用专业(3+2)教学计划 一.招生对象与学制 本专业招收初中毕业生或具有同等学力者,学制5年. 二.培养目标与业务范围 (一) 培养目标 本专业培养德.智.体.美等全面发展,具有一定的基础理论和专业知识.创新精神和较强实践能力,掌握模具设计与制造专业必需的基础理论.专业知识.先进 ...