Walkthrough on PCA through the command line


Revision as of 15:07, 3 April 2020

PCA computations through the command line are governed by PCA workflow objects. We describe here how to create and handle them:

Creation of a synthetic data set

dtutorial ttest128 -M 64 -N 64 -linear_tags 1 -tight 1

This generates a set of 128 particles where 64 are slightly closer than the other 64. The particle subtomograms are randomly oriented, but the alignment parameters are known.

Creation of a workflow

Input elements

The inputs of a PCA workflow are:

  • a set of particles (called data container in this article)
  • a table that expresses the alignment
  • a mask that indicates the area of each aligned particle that will be taken into account during the classification procedure.


dataFolder = 'ttest128/data';


tableFile  = 'ttest128/real.tbl';


We create a cylindrical mask with the dimensions of the particles (40 pixels):

 mask = dcylinder([20,20],40);


We decide a name for the workflow itself, for instance

name = 'classtest128';

Now we are ready to create the workflow:

 wb = dpkpca.new(name,'t',tableFile,'d',dataFolder,'m',mask);

This creates a workflow object (arbitrarily called wb in the workspace during the current session). It also creates a folder called classtest128.PCA where results will be stored as they are produced.

Mathematical parameters

The main parameters that can be chosen in this area are:

  • bandpass
  • symmetry
  • binning level (to accelerate the computations)
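To illustrate why binning accelerates the computations, the sketch below bins a synthetic volume by averaging blocks of 2x2x2 voxels in plain MATLAB. It assumes that one binning level halves each side length. The variable names are hypothetical and this is not Dynamo's own binning code.

```matlab
% Sketch: one level of binning on a synthetic volume (not Dynamo code).
rng(5);                              % reproducible random numbers
v = randn(8, 8, 8);                  % stand-in for a particle (hypothetical size)

% Split every axis into (position inside block, block index).
b = reshape(v, 2, 4, 2, 4, 2, 4);

% Average over the three "inside block" dimensions.
binned = squeeze(mean(mean(mean(b, 1), 3), 5));

% Each binned voxel replaces 8 original voxels.
reduction = numel(v) / numel(binned);
fprintf('Voxels per particle reduced by a factor of %d\n', reduction);
```

Every pairwise correlation then operates on eight times fewer voxels, at the cost of discarding the highest spatial frequencies.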

Computational parameters

The main burden of the PCA computation is the creation of the cross correlation matrix.
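A minimal sketch of what such a matrix contains, written in plain MATLAB on synthetic data. It ignores the alignment and the missing wedge compensation that a real computation on subtomograms has to handle, and all sizes and variable names are hypothetical.

```matlab
% Sketch: pairwise normalized cross-correlation of masked particles.
% Synthetic data only; none of this is Dynamo code.
rng(0);                          % reproducible random numbers
N = 8;                           % number of particles (hypothetical)
L = 10;                          % side length in voxels (hypothetical)
mask = true(L, L, L);            % trivial mask: every voxel is used
particles = randn(L, L, L, N);   % stand-in for aligned subtomograms

% Flatten each particle into a column holding only the masked voxels.
X = reshape(particles, [], N);
X = X(mask(:), :);

% Normalize each column to zero mean and unit norm.
X = X - mean(X, 1);
X = X ./ sqrt(sum(X.^2, 1));

% With this normalization the correlation matrix is a plain product.
ccmatrix = X' * X;

% The number of distinct pairs grows quadratically with N.
numPairs = N * (N - 1) / 2;
fprintf('%d particles -> %d distinct pairs\n', N, numPairs);
```

The quadratic growth in the number of pairs is the reason this step dominates the running time for large data sets.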

Computing device

PCA computations can be run on GPUs or on CPUs, in both cases in parallel.

Size of parallel blocks



In this workflow we run the steps one by one to discuss them. In real workflows, you can use the run methods to just launch all steps sequentially.



Correlation matrix

All pairwise correlations are computed in blocks, as described above.
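The idea of a block-wise computation can be sketched in plain MATLAB as follows. The block size, the synthetic data and the loop layout are assumptions made for illustration; the way Dynamo actually schedules its blocks is not shown here.

```matlab
% Sketch: fill a correlation matrix block by block (not Dynamo code).
rng(1);
N = 10;                % number of particles (hypothetical)
V = 50;                % masked voxels per particle (hypothetical)
blockSize = 4;         % particles per block (hypothetical)

X = randn(V, N);
X = X - mean(X, 1);                    % zero mean per particle
X = X ./ sqrt(sum(X.^2, 1));           % unit norm per particle

starts = 1:blockSize:N;                % first particle of each block
ccmatrix = zeros(N);
numBlocks = 0;
for a = 1:numel(starts)
    rows = starts(a):min(starts(a) + blockSize - 1, N);
    for b = a:numel(starts)            % upper triangle of blocks only
        cols = starts(b):min(starts(b) + blockSize - 1, N);
        block = X(:, rows)' * X(:, cols);   % one independent unit of work
        ccmatrix(rows, cols) = block;
        ccmatrix(cols, rows) = block';      % symmetry fills the mirror block
        numBlocks = numBlocks + 1;
    end
end
```

Each block only needs the particles it refers to, so blocks can be handed to different workers, and the block size bounds how many particles are held in memory at once.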



The correlation matrix is diagonalised. The eigenvectors are used to express the particles as combinations of weights.


These weights are ordered in descending order relative to their impact on the variance of the set. Ideally, a particle should be represented by its first few components on this basis. The weights are stored in a regular Dynamo table. The first eigencomponent of a particle goes into column 41.
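The diagonalisation and the ordering of the weights can be sketched in plain MATLAB on a synthetic correlation matrix. Only the use of column 41 for the first eigencomponent comes from the text above; the sizes, the variable names and the scaling of the weights are assumptions.

```matlab
% Sketch: diagonalise a correlation matrix and store the weights
% in a table-like array (synthetic data, not Dynamo code).
rng(2);
N = 12;  V = 40;                       % hypothetical sizes
X = randn(V, N);
X = X - mean(X, 1);
X = X ./ sqrt(sum(X.^2, 1));
ccmatrix = X' * X;
ccmatrix = (ccmatrix + ccmatrix') / 2; % enforce exact symmetry

% Eigen-decomposition, sorted by descending eigenvalue.
[vecs, vals] = eig(ccmatrix);
[vals, order] = sort(diag(vals), 'descend');
vecs = vecs(:, order);

% Row i holds the weights of particle i on the leading eigenvectors.
numComponents = 5;
eigencomponents = vecs(:, 1:numComponents);

% First eigencomponent goes into column 41, the next ones follow.
tbl = zeros(N, 40 + numComponents);
tbl(:, 41:40 + numComponents) = eigencomponents;

% Fraction of the total variance carried by each eigenvector.
explained = vals / sum(vals);
```

Inspecting `explained` gives a quick indication of how many components are worth keeping for classification.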


The eigenvectors are expressed as three-dimensional volumes.
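One way to picture this step is as a weighted sum of particles, with the weights taken from an eigenvector of the correlation matrix. The plain MATLAB sketch below does this on synthetic data; the sizes, the names and the unit-norm normalisation are assumptions and may differ from what Dynamo does internally.

```matlab
% Sketch: build eigenvolumes as weighted sums of particles
% (synthetic data, not Dynamo code).
rng(3);
L = 6;  N = 10;                        % hypothetical sizes
particles = randn(L, L, L, N);         % stand-in for aligned subtomograms

X = reshape(particles, [], N);         % one particle per column
X = X - mean(X, 1);                    % zero mean per particle
C = X' * X;
C = (C + C') / 2;                      % enforce exact symmetry

[vecs, vals] = eig(C);
[~, order] = sort(diag(vals), 'descend');
vecs = vecs(:, order);

k = 3;                                 % number of eigenvolumes to build
eigenvolumes = cell(1, k);
for j = 1:k
    v = X * vecs(:, j);                % weighted sum of all particles
    v = v / norm(v);                   % normalise to unit norm
    eigenvolumes{j} = reshape(v, L, L, L);
end
```

Volumes built this way are mutually orthogonal, which is why each one can be read as an independent mode of variation of the data set.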


TSNE reduction

TSNE remaps the particles into 2D maps which can be visualised and operated interactively.



Computed elements have been stored in the workflow folder. Some of them can be directly accessed through workflow tools.

Correlation matrix


figure;dshow(cmm);h=gca();h.YDir = 'reverse';


Series of plots

To check all the eigencomponents, it is a good idea to do some scripting. The script below uses a handy Dynamo trick to create several plots in the same figure.

 gui = mbgraph.montage();
for i=1:10
    % gui.gca captures the

Series of histograms



Correlation of tilts

It is a good idea to check if some eigenvolumes correlate strongly with the tilt.


In this plot, each point represents a particle in your data set. We see that in this particular experiment, eigencomponent 3 seems to have been "corrupted by the missing wedge".
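The same check can be done numerically. The sketch below builds a synthetic table in which eigencomponent 3 depends on the tilt, and then computes the correlation coefficient between the tilt and each eigencomponent. It assumes that the tilt angle sits in column 8 of the table and that eigencomponents start at column 41; the data and the variable names are hypothetical.

```matlab
% Sketch: correlation of each eigencomponent with the tilt angle
% (synthetic table, not real Dynamo output).
rng(4);
N = 128;  numComponents = 5;
tbl = zeros(N, 40 + numComponents);
tbl(:, 8) = 180 * rand(N, 1);                    % tilt in degrees (column 8 assumed)
tbl(:, 41:40 + numComponents) = randn(N, numComponents);
tbl(:, 43) = tbl(:, 43) + 0.05 * tbl(:, 8);      % component 3 made tilt dependent

tilt = tbl(:, 8);
r = zeros(1, numComponents);
for k = 1:numComponents
    c = corrcoef(tilt, tbl(:, 40 + k));          % 2x2 correlation matrix
    r(k) = c(1, 2);
end

% Component with the strongest dependence on the tilt.
[~, worst] = max(abs(r));
fprintf('Eigencomponent %d correlates most strongly with tilt\n', worst);
```

A component with a large absolute correlation is a candidate for exclusion from the classification, since it probably reflects orientation rather than structural differences.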

TSNE reduction

general tool for exploring tables
scatter tab tuned to show cobehaviour of first two eigencomponents
scatterplot eigencomponents
right click to get the options to manually select particles through a lasso tool
lasso particles
right click on the lasso to get options on that subset of particles
first eigencomponent of each particle
sliding montage on the distribution of several eigencomponents
sliding montage of sets of eigencomponents
eigenvolumes after normalization
correlation of eigencomponents with tilt
right click on each point to access the particle
tsne clustering
right click on the axes for an automated clustering
automated clustering
right click on the axes to select a set of particles
use the lasso tool to select a group
lassoed particles can be averaged together
average of particles inside lasso
a second lasso can be created
average all manually selected clusters
dmapview on opening
showing several volumes in dmapview
corresponding slices of
you can select a single slice for depiction
use keys 1 and 2 to set anchors
right click one anchor to show an intensity profile
intensity profile