Difference between revisions of "Walkthrough on PCA through the command line"

From Dynamo
Jump to navigation Jump to search
 
(26 intermediate revisions by 2 users not shown)
Line 1: Line 1:
[[PCA]] computations through the command line are governed through ''PCA workflow'' objects. We describe here how to create and handle them:
+
[[Principal component analysis|PCA]] computations through the command line are governed through ''PCA workflow'' objects. We describe here how to create and handle them:
  
 
= Creation of a synthetic data set=
 
= Creation of a synthetic data set=
  
  [[dtutorial ttest128 -M 64 -N 64 -linear_tags 1 -tight 1]]
+
  <tt>dtutorial ttest128 -M 64 -N 64 -linear_tags 1 -tight 1</tt>
  
This generates a set of 128 particles where 64 are slightly closer than the other 64. The particle subtomogram are randomly oriented, but the alignment parameters are known.  
+
This generates a set of 128 particles where 64 of the, are slightly smaller than the other 64. The particle subtomogram are randomly oriented, but the alignment parameters are known.
  
 
= Creation of a workflow=
 
= Creation of a workflow=
Line 14: Line 14:
 
* a table that expreses the alignment
 
* a table that expreses the alignment
 
* a mask that indicates the area of each alignment particle that will be taken into account during the classification procedure.
 
* a mask that indicates the area of each alignment particle that will be taken into account during the classification procedure.
 +
All theree have been generated by the <tt>dtutorial</tt> command.
 
=== Data ===
 
=== Data ===
  [[dataFolder = 'ttest128/data';]]
+
  <tt>dataFolder = 'ttest128/data';</tt>
 +
 
 
=== Table ===
 
=== Table ===
  [[tableFile  = 'ttest128/real.tbl';]]
+
  <tt>tableFile  = 'ttest128/real.tbl';</tt>
 +
 
 
=== Mask ===
 
=== Mask ===
 
We create a cylindrical mask with the dimensions of the particles (40 pixels)
 
We create a cylindrical mask with the dimensions of the particles (40 pixels)
mask = dcylinder([20,20],40);
+
<tt>mask = dcylinder([20,20],40);</tt>
 
===Syntax ===
 
===Syntax ===
 
We decide a name for the workflow itself, for instance
 
We decide a name for the workflow itself, for instance
  [[name = 'classtest128';]]
+
  <tt>name = 'classtest128';</tt>
 
Now we are ready to create the workflow:
 
Now we are ready to create the workflow:
   [[wb = dpkpca.new(name,'t',tableFile,'d',dataFolder,'m',mask);]]
+
   <tt>wb = dpkpca.new(name,'t',tableFile,'d',dataFolder,'m',mask);</tt>
This creates an workflow object (arbitrarily called wb in the workspace during the current session). It also creates a folder called classtest128.PCA where results will be stored as they are produced.
+
This creates an workflow object (arbitrarily called <tt>wb</tt>) in the workspace during the current session). It also creates a folder called classtest128.PCA where results will be stored as they are produced.
  
 
== Mathematical parameters ==
 
== Mathematical parameters ==
 
The main parameters that can be chosen in this area are:
 
The main parameters that can be chosen in this area are:
 
* bandpass
 
* bandpass
 +
<tt>wb.setBand([0,0.5,2]); % in Nyquist.  % Third parameter is the smoothing factor in pixels</tt>
 
* symmetry
 
* symmetry
* binning level (to accelerate the computations)
+
<tt>wb.setSym('c8'); </tt>
 
+
* binning level (to accelerate the computations);
 +
<tt>wb.settings.general.bin.value = 0;</tt>
  
 
==Computational parameters==
 
==Computational parameters==
 
The main burden of the PCA computation is the creation of the cross correlation matrix.
 
The main burden of the PCA computation is the creation of the cross correlation matrix.
 
=== Computing device ===
 
=== Computing device ===
PCA computations can be run on GPUs of on CPUs, in both cases in parallel.
+
PCA computations can be run on GPUs of on CPUs, in both cases in parallel. In each block, two subsets of (prealigned) particles are read from disk.
 +
 
 +
To use all the available cores in your local machine, write:
 +
<tt>wb.settings.computing.cores.value = '*';</tt>
 +
 
 +
==== GPU computing ====
 +
By default, the ccmatrix computation will run on the CPUs. To force computation in the GPUs you need to active them first:
 +
<tt>wb.settings.computing.useGpus.value = true;</tt>
 +
and the specify the device numbers to use:
 +
<tt>wb.settings.computing.gpus.value = [0:7];</tt>
 +
In this example, you would be using 8 GPUs, marked by the systems with numbers 0 to 7.
 +
 
 +
Take into account that your system needs an independent CPU for each used GPU.
 +
 
 
=== Size of parallel blocks ===
 
=== Size of parallel blocks ===
The  
+
The size of the blocks is controlled by the parameter <tt>batch</tt>. You can tune it with:
 +
<tt>wb.setBatch(100);</tt>
 +
meaning that each block executed in parallel will preload two sets of 100 (prealigned) particles into memory, and then proceed to cross correlate all possible pairs. Ideally, you should tune your <tt>batch</tt> parameters l to have as many blocks as processing units (CPU cores or GPUs).
 +
IThis might not be possible, as it might crowd the memory of your device. Thus, in real applications you should aim at using the <tt>batch</tt> parameter to enforce a tradeoff between performance and  memory usage suited to your system.
 +
 
 +
If you get an error when running <tt>wb.setBatch(n);</tt> this is a bug. Try unfolding the project first (<tt>wb.unfold()</tt>), setting the batch number then unfolding again before running calculations.
 +
 
 +
==== Utilities to set the batch number ====
 +
You can get some guidance for the selection of the batch number with
 +
<tt>wb.getBlockSize()</tt>
 +
which will return the size of each block (in Gibabytes) for the current batch parameter.
 +
This number depends of the mask, as the block contains in memory only the part of each particle that is inside the mask.
 +
 
 +
== Unfolding==
 +
Before running the workflow, it needs to be ''unfolded''. This operation checks the mutual coherence of the different elements of the workflow, and prepares the file system for runtime.
 +
 
 +
<tt>wb.unfold();</tt>
  
 
= Running=
 
= Running=
 +
 
In this workflow we run the steps one by one to discuss them. In real workflows, you can use the <tt>run</tt> methods to just launch all steps sequentially.
 
In this workflow we run the steps one by one to discuss them. In real workflows, you can use the <tt>run</tt> methods to just launch all steps sequentially.
  
Line 55: Line 90:
 
The correlation matrix is diagonalised. The eigenvectors are used to expressed as  the particles as combinations of weights.
 
The correlation matrix is diagonalised. The eigenvectors are used to expressed as  the particles as combinations of weights.
 
  <tt>wb.steps.items.eigentable.compute(); </tt>
 
  <tt>wb.steps.items.eigentable.compute(); </tt>
These weights are ordered in descending order relative to their impact on the variance of the set, ideally  a particle should be represented by its few components on this basis. The weights are stored in a regula ''Dynamo'' table. First eigencomponent of a particle goes into column 41.
+
These weights are ordered in descending order relative to their impact on the variance of the set, ideally  a particle should be represented by its few components on this basis. The weights are stored in a regular ''Dynamo'' table. First eigencomponent of a particle goes into column 41.
  
 
== Eigenvolumes ==
 
== Eigenvolumes ==
Line 63: Line 98:
 
== TSNE reduction ==
 
== TSNE reduction ==
 
TSNE remaps the particles into 2D maps which can be visualised and operated interactively.
 
TSNE remaps the particles into 2D maps which can be visualised and operated interactively.
  <tt>wb.steps.items.tsene.compute(); </tt>
+
  <tt>wb.steps.items.tsne.compute(); </tt>
  
 
=Visualization=
 
=Visualization=
Line 70: Line 105:
  
 
== Correlation matrix==
 
== Correlation matrix==
 +
We can visualise the cross correlation matrix
  
 
<tt>m=wb.getCCMatrix();</tt>
 
<tt>m=wb.getCCMatrix();</tt>
  
<tt>figure;dshow(cmm);h=gca();h.YDir = 'reverse'; </tt>
+
<tt>figure;dshow(m);h=gca();</tt>
 +
<tt>h.YDir = 'reverse'; </tt>
  
 
== Eigencomponents ==
 
== Eigencomponents ==
 +
 +
=== Predefined exploring  tools===
 +
You can use a general browser for ''Dynamo'' tables:
 +
<tt>wb.exploreGUI;</tt>
 +
[[ File:walkthroughPCACommandLine_generalToolForExploringTables.png |thumb|center| 600px|general tool for exploring tables]]
 +
 +
Advanced users: This is just a wrapper to the function <tt>dpktbl.gui</tt> applied on the eigentable produced by the workflow.
 +
 +
For instance, you can check how two eigencomponents relate to each other:
 +
 +
[[ File:walkthroughPCACommandLine_scatterTabTunedToShowCobehaviourOfFirstTwoEigencomponents.png |thumb|center| 600px|scatter tab tuned to show co-behaviour of first two eigencomponents]]
 +
 +
Pressing the [Scatter 2D] button will create this interactive plot
 +
[[ File:walkthroughPCACommandLine_scatterplotEigencomponents.png |thumb|center| 600px|scatterplot eigencomponents]]
 +
Where each point represents a different particle.
 +
Right-click on it to create a "lasso", a tool to hand-draw sets of particles.
 +
[[ File:walkthroughPCACommandLine_rightClickToGetTheOptionsToManuallySelectParticlesThroughALassoTool.png |thumb|center| 600px|right click to get the options to manually select particles through a lasso tool]]
 +
[[ File:walkthroughPCACommandLine_lassoParticles.png |thumb|center| 200px|lasso particles]]
 +
 +
Right clicking on the "lassoed" particles give you the option of saving the information on the selected set of particles.
 +
[[ File:walkthroughPCACommandLine_rightClickOnTheLassoToGetOptionsOnThatSubsetOfParticles.png |thumb|center| 600px|right click on the lasso to get options on that subset of particles]]
 +
 +
=== Custom approach===
 +
You can use your own methods to visualize the eigencomponents. They can be accessed through:
 +
<tt>m = wb.getEigencomponents();</tt>
 +
will produce a matrix <tt>m</tt> where each column represents an eigenvector and each row a particle. Thus, to see how a particular eigencomponent distributes among the particles, you can just write:
 +
 +
<tt>plot(m(:,i),'.');</tt>
 +
 +
[[ File:walkthroughPCACommandLine_firstEigencomponentOfEachParticle.png |thumb|center| 600px|first eigencomponent of each particle]]
  
 
=== Series of plots===
 
=== Series of plots===
Line 88: Line 155:
 
gui.shg(); </nowiki>
 
gui.shg(); </nowiki>
  
 +
this will produce ten plots (as i=1:10) collected in a single GUI. Each plot is called a "frame", and you can view them sequentially or in sets, just play with the layout given by the rows and columns, and use the [Refresh] icon on the top left of the GUI.
 +
 +
[[ File:walkthroughPCACommandLine_slidingMontageOnTheDistributionOfSeveralEigencomponent.png |thumb|center| 600px|sliding montage on the distribution of several eigencomponent]]
 
=== Series of histograms ===
 
=== Series of histograms ===
 
+
[[ File:walkthroughPCACommandLine_slidingMontageOfSetsOfEigencomponent.png |thumb|center| 600px|sliding montage of sets of eigencomponent]]
 
== Eigenvolumes ==
 
== Eigenvolumes ==
  
 
  <tt>eigSet=wb.getEigenvolume(1:30);</tt>
 
  <tt>eigSet=wb.getEigenvolume(1:30);</tt>
 +
This creates a cell array (arbitrarily called <tt>eigSet</tt>). We can visualise it through:
  
 
  <tt>mbvol.groups.montage(eigSet); </tt>
 
  <tt>mbvol.groups.montage(eigSet); </tt>
 +
[[ File:walkthroughPCACommandLine_eigenvolumes.png |thumb|center| 600px|eigenvolumes ]]
 +
 +
This plot is showing the true relative intensity of the eigenvolumes. In order to compare them, we can show the normalised eigenvolumes instead:
  
 
  <tt>mbvol.groups.montage(dynamo_normalize_roi(eigSet)); </tt>
 
  <tt>mbvol.groups.montage(dynamo_normalize_roi(eigSet)); </tt>
 
+
[[ File:walkthroughPCACommandLine_eigenvolumesAfterNomrmalization.png |thumb|center| 600px|eigenvolumes after nomrmalization]]
 
==Correlation of tilts==
 
==Correlation of tilts==
 
It is a good idea to check if some eigenvolumes correlate strongly with the tilt.  
 
It is a good idea to check if some eigenvolumes correlate strongly with the tilt.  
 
  <tt>wb.show.correlationEigenvectorTilt(1:10) </tt>
 
  <tt>wb.show.correlationEigenvectorTilt(1:10) </tt>
 +
[[ File:walkthroughPCACommandLine_correlationOfEigencomponensWithTilt.png |thumb|center| 600px|correlation of eigencomponens with tilt]]
 +
 +
Remember that each particle is accessible through right clicking on it.
 +
[[ File:walkthroughPCACommandLine_rightClickOnEachPointToAccessTheParticle.png |thumb|center| 600px|right click on each point to access the particle]]
 +
 
In this plot, each point represents a particle in your data set. We see that in this particular experiment, eigencomponent 3 seems to have been "corrupted by the missing wedge"
 
In this plot, each point represents a particle in your data set. We see that in this particular experiment, eigencomponent 3 seems to have been "corrupted by the missing wedge"
  
 
==TSNE reduction==
 
==TSNE reduction==
  
 +
We create on the fly the TSNE reduction for eigencomponents:
 +
<tt>wb.show.tsne([1,2,4,5]);</tt>
 +
will produce the graphics:
 +
[[ File:walkthroughPCACommandLine_tsneClustering.png |thumb|center| 600px|tsne clustering]]
 +
TSNE has created a bidimensional proximity map for the 4-dimensional distribution induced by the 4 selected eigenvectors. Note that you can enter a cell array of several sets of indices. You could then navigate through different indices to check the shape of the TSNE reduction for different selections of eigenvolumes:
 +
<tt># wb.show.tsne({[1,2,4,5] , [1:6]});</tt>
  
[[ File:walkthroughPCACommandLine_generalToolForExploringTables.png |thumb|center| 600px|general tool for exploring tables]]
+
=== Automated clustering ===
[[ File:walkthroughPCACommandLine_scatterTabTunedToShowCobehaviourOfFirstTwoEigencomponents.png |thumb|center| 600px|scatter tab tuned to show cobehaviour of first two eigencomponents]]
+
Right clicking on a plot you get an option to automatically segment it using the <tt>dbscan</tt> method
[[ File:walkthroughPCACommandLine_scatterplotEigencomponents.png |thumb|center| 600px|scatterplot eigencomponents]]
+
[[ File:walkthroughPCACommandLine_rightClickOnTheAxesForAnAutomatedClustering.png |thumb|center| 200px|right click on the axes for an automated clustering]]
[[ File:walkthroughPCACommandLine_rightClickToGetTheOptionsToManuallySelectParticlesThroughALassoTool.png |thumb|center| 600px|right click to get the options to manually select particles through a lasso tool]]
+
This functionality is intended to let you get a feeling of the results you would get if you would use  <tt>dbscan</tt> in an algorithm.
[[ File:walkthroughPCACommandLine_lassoParticles.png |thumb|center| 600px|lasso particles]]
 
[[ File:walkthroughPCACommandLine_rightClickOnTheLassoToGetOptionsOnThatSubsetOfParticles.png |thumb|center| 600px|right click on the lasso to get options on that subset of particles]]
 
[[ File:walkthroughPCACommandLine_firstEigencomponentOfEachParticle.png |thumb|center| 600px|first eigencomponent of each particle]]
 
[[ File:walkthroughPCACommandLine_slidingMontageOnTheDistributionOfSeveralEigencomponent.png |thumb|center| 600px|sliding montage on the distribution of several eigencomponent]]
 
[[ File:walkthroughPCACommandLine_slidingMontageOfSetsOfEigencomponent.png |thumb|center| 600px|sliding montage of sets of eigencomponent]]
 
[[ File:walkthroughPCACommandLine_eigenvolumes.png |thumb|center| 600px|eigenvolumes ]]
 
[[ File:walkthroughPCACommandLine_eigenvolumesAfterNomrmalization.png |thumb|center| 600px|eigenvolumes after nomrmalization]]
 
[[ File:walkthroughPCACommandLine_correlationOfEigencomponensWithTilt.png |thumb|center| 600px|correlation of eigencomponens with tilt]]
 
[[ File:walkthroughPCACommandLine_rightClickOnEachPointToAccessTheParticle.png |thumb|center| 600px|right click on each point to access the particle]]
 
[[ File:walkthroughPCACommandLine_tsneClustering.png |thumb|center| 600px|tsne clustering]]
 
[[ File:walkthroughPCACommandLine_rightClickOnTheAxesForAnAutomatedClustering.png |thumb|center| 600px|right click on the axes for an automated clustering]]
 
 
[[ File:walkthroughPCACommandLine_automatedClustering.png |thumb|center| 600px|automated clustering]]
 
[[ File:walkthroughPCACommandLine_automatedClustering.png |thumb|center| 600px|automated clustering]]
[[ File:walkthroughPCACommandLine_rightClickOnTheAxesToSelectASetOfParticles.png |thumb|center| 600px|right click on the axes to select a set of particles]]
+
The results are reasonable, but inferior to what human inspection would yield.
 +
 
 +
=== Manual clustering ===
 +
We first right click on the axis to get a lasso tool.
 +
[[ File:walkthroughPCACommandLine_rightClickOnTheAxesToSelectASetOfParticles.png |thumb|center| 200px|right click on the axes to select a set of particles]]
 +
Remember that you will have a different plot (depending on the random seed used in TSNE). However, you will probably have at least a well defined cluster.
 
[[ File:walkthroughPCACommandLine_useTheLassoToolToSelectAGroup.png |thumb|center| 600px|use the lasso tool to select a group]]
 
[[ File:walkthroughPCACommandLine_useTheLassoToolToSelectAGroup.png |thumb|center| 600px|use the lasso tool to select a group]]
 
[[ File:walkthroughPCACommandLine_lassoedParticlesCanBeAveragedTogether.png |thumb|center| 600px|lassoed particles can be averaged together]]
 
[[ File:walkthroughPCACommandLine_lassoedParticlesCanBeAveragedTogether.png |thumb|center| 600px|lassoed particles can be averaged together]]
 
[[ File:walkthroughPCACommandLine_averageOfParticlesInsideLasso.png |thumb|center| 600px|average of particles inside lasso]]
 
[[ File:walkthroughPCACommandLine_averageOfParticlesInsideLasso.png |thumb|center| 600px|average of particles inside lasso]]
[[ File:walkthroughPCACommandLine_aSecondLassoCanBeCreated.png |thumb|center| 600px|a second lasso can be created]]
+
[[ File:walkthroughPCACommandLine_aSecondLassoCanBeCreated.png |thumb|center| 100px|a second lasso can be created]]
 +
 
 +
After delimiting the second subset, we can create the class averages corresponding to this manual selection.
 
[[ File:walkthroughPCACommandLine_averageAllManuallySelectedClusters.png |thumb|center| 600px|average all manually selected clusters]]
 
[[ File:walkthroughPCACommandLine_averageAllManuallySelectedClusters.png |thumb|center| 600px|average all manually selected clusters]]
 +
 +
On opening, <tt>dmapview</tt> will show all slices of a single volume:
 
[[ File:walkthroughPCACommandLine_dmapviewOnOpening.png |thumb|center| 600px|dmapview on opening]]
 
[[ File:walkthroughPCACommandLine_dmapviewOnOpening.png |thumb|center| 600px|dmapview on opening]]
[[ File:walkthroughPCACommandLine_showingSeveralVolumesInDmapview.png |thumb|center| 600px|showing several volumes in dmapview]]
+
Use the "simultaneous view" icon in the toolbar to show the two volumes together.
 +
[[ File:walkthroughPCACommandLine_showingSeveralVolumesInDmapview.png |thumb|center| 100px|showing several volumes in dmapview]]
 
[[ File:walkthroughPCACommandLine_correspondigSlicesOf.png |thumb|center| 600px|correspondig slices of  ]]
 
[[ File:walkthroughPCACommandLine_correspondigSlicesOf.png |thumb|center| 600px|correspondig slices of  ]]
 
[[ File:walkthroughPCACommandLine_youCanSelectASingleSliceForDepiction.png |thumb|center| 600px|you can select a single slice for depiction]]
 
[[ File:walkthroughPCACommandLine_youCanSelectASingleSliceForDepiction.png |thumb|center| 600px|you can select a single slice for depiction]]
Line 135: Line 219:
 
[[ File:walkthroughPCACommandLine_rightClickOneAnchorToShowAnIntensityProfile.png |thumb|center| 600px|right click one anchor to show an intensity profile]]
 
[[ File:walkthroughPCACommandLine_rightClickOneAnchorToShowAnIntensityProfile.png |thumb|center| 600px|right click one anchor to show an intensity profile]]
 
[[ File:walkthroughPCACommandLine_intensityProfile.png |thumb|center| 600px|intensity profile]]
 
[[ File:walkthroughPCACommandLine_intensityProfile.png |thumb|center| 600px|intensity profile]]
 +
 +
=Manipulation of PCA workflows=
 +
==Reading workflows==
 +
The workflow object can be recreated by reading the workflow folder.
 +
 +
<tt>wb2 = dread('classtest128');</tt>
 +
 +
Note that we can name the workflow with an arbitrary variable name.
 +
The workflow needs to be unfolded before being used:
 +
<tt>wb2.unfold();</tt>
 +
This concerns not only recomputation of selected steps, but also use of graphical exploration options.
 +
 +
==Workflow GUIs ==
 +
 +
Execution of a PCA workflow can be controlled graphically through:
 +
<tt>wb.workflowGUI()</tt>
 +
 +
Overview and edition of just the workflow parameters can be accessed through:
 +
<tt>wb.settings.editGUI</tt>

Latest revision as of 22:58, 8 February 2021

PCA computations through the command line are governed through PCA workflow objects. We describe here how to create and handle them:

Creation of a synthetic data set

dtutorial ttest128 -M 64 -N 64 -linear_tags 1 -tight 1

This generates a set of 128 particles where 64 of the, are slightly smaller than the other 64. The particle subtomogram are randomly oriented, but the alignment parameters are known.

Creation of a workflow

Input elements

The input of a PCA workflow are:

  • a set of particles (called data container in this article)
  • a table that expreses the alignment
  • a mask that indicates the area of each alignment particle that will be taken into account during the classification procedure.

All theree have been generated by the dtutorial command.

Data

dataFolder = 'ttest128/data';

Table

tableFile  = 'ttest128/real.tbl';

Mask

We create a cylindrical mask with the dimensions of the particles (40 pixels)

mask = dcylinder([20,20],40);

Syntax

We decide a name for the workflow itself, for instance

name = 'classtest128';

Now we are ready to create the workflow:

 wb = dpkpca.new(name,'t',tableFile,'d',dataFolder,'m',mask);

This creates an workflow object (arbitrarily called wb) in the workspace during the current session). It also creates a folder called classtest128.PCA where results will be stored as they are produced.

Mathematical parameters

The main parameters that can be chosen in this area are:

  • bandpass
wb.setBand([0,0.5,2]); % in Nyquist.  % Third parameter is the smoothing factor in pixels
  • symmetry
wb.setSym('c8'); 
  • binning level (to accelerate the computations);
wb.settings.general.bin.value = 0;

Computational parameters

The main burden of the PCA computation is the creation of the cross correlation matrix.

Computing device

PCA computations can be run on GPUs of on CPUs, in both cases in parallel. In each block, two subsets of (prealigned) particles are read from disk.

To use all the available cores in your local machine, write:

wb.settings.computing.cores.value = '*';

GPU computing

By default, the ccmatrix computation will run on the CPUs. To force computation in the GPUs you need to active them first:

wb.settings.computing.useGpus.value = true;

and the specify the device numbers to use:

wb.settings.computing.gpus.value = [0:7];

In this example, you would be using 8 GPUs, marked by the systems with numbers 0 to 7.

Take into account that your system needs an independent CPU for each used GPU.

Size of parallel blocks

The size of the blocks is controlled by the parameter batch. You can tune it with:

wb.setBatch(100);

meaning that each block executed in parallel will preload two sets of 100 (prealigned) particles into memory, and then proceed to cross correlate all possible pairs. Ideally, you should tune your batch parameters l to have as many blocks as processing units (CPU cores or GPUs). IThis might not be possible, as it might crowd the memory of your device. Thus, in real applications you should aim at using the batch parameter to enforce a tradeoff between performance and memory usage suited to your system.

If you get an error when running wb.setBatch(n); this is a bug. Try unfolding the project first (wb.unfold()), setting the batch number then unfolding again before running calculations.

Utilities to set the batch number

You can get some guidance for the selection of the batch number with

wb.getBlockSize()

which will return the size of each block (in Gibabytes) for the current batch parameter. This number depends of the mask, as the block contains in memory only the part of each particle that is inside the mask.

Unfolding

Before running the workflow, it needs to be unfolded. This operation checks the mutual coherence of the different elements of the workflow, and prepares the file system for runtime.

wb.unfold();

Running

In this workflow we run the steps one by one to discuss them. In real workflows, you can use the run methods to just launch all steps sequentially.

Prealigning

wb.steps.items.prealign.compute(); 

Correlation matrix

All pairs of correlations are computed in blocks, as described above

wb.steps.items.ccmatrix.compute(); 

Eigentable

The correlation matrix is diagonalised. The eigenvectors are used to expressed as the particles as combinations of weights.

wb.steps.items.eigentable.compute(); 

These weights are ordered in descending order relative to their impact on the variance of the set, ideally a particle should be represented by its few components on this basis. The weights are stored in a regular Dynamo table. First eigencomponent of a particle goes into column 41.

Eigenvolumes

The eigenvectors are expressed as three=dimensional volumes.

wb.steps.items.eigenvolumes.compute(); 

TSNE reduction

TSNE remaps the particles into 2D maps which can be visualised and operated interactively.

wb.steps.items.tsne.compute(); 

Visualization

Computed elements have been stored in the workflow folder. Some of them () can be directly access through workflow tools.

Correlation matrix

We can visualise the cross correlation matrix

m=wb.getCCMatrix();

figure;dshow(m);h=gca(); h.YDir = 'reverse';

Eigencomponents

Predefined exploring tools

You can use a general browser for Dynamo tables:

wb.exploreGUI;
general tool for exploring tables

Advanced users: This is just a wrapper to the function dpktbl.gui applied on the eigentable produced by the workflow.

For instance, you can check how two eigencomponents relate to each other:

scatter tab tuned to show co-behaviour of first two eigencomponents

Pressing the [Scatter 2D] button will create this interactive plot

scatterplot eigencomponents

Where each point represents a different particle. Right-click on it to create a "lasso", a tool to hand-draw sets of particles.

right click to get the options to manually select particles through a lasso tool
lasso particles

Right clicking on the "lassoed" particles give you the option of saving the information on the selected set of particles.

right click on the lasso to get options on that subset of particles

Custom approach

You can use your own methods to visualize the eigencomponents. They can be accessed through:

m = wb.getEigencomponents();

will produce a matrix m where each column represents an eigenvector and each row a particle. Thus, to see how a particular eigencomponent distributes among the particles, you can just write:

plot(m(:,i),'.'); 
first eigencomponent of each particle

Series of plots

To check all the eigencomponents, it is a good idea to do some scripting. The script below uses a handy Dynamo trick to create several plots in the same figure.

 gui = mbgraph.montage();
for i=1:10
    plot(m(:,i),'.','Parent',gui.gca);
    % gui.gca captures the
      gui.step;
end
gui.first();
gui.shg(); 

this will produce ten plots (as i=1:10) collected in a single GUI. Each plot is called a "frame", and you can view them sequentially or in sets, just play with the layout given by the rows and columns, and use the [Refresh] icon on the top left of the GUI.

sliding montage on the distribution of several eigencomponent

Series of histograms

sliding montage of sets of eigencomponent

Eigenvolumes

eigSet=wb.getEigenvolume(1:30);

This creates a cell array (arbitrarily called eigSet). We can visualise it through:

mbvol.groups.montage(eigSet); 
eigenvolumes

This plot is showing the true relative intensity of the eigenvolumes. In order to compare them, we can show the normalised eigenvolumes instead:

mbvol.groups.montage(dynamo_normalize_roi(eigSet)); 
eigenvolumes after nomrmalization

Correlation of tilts

It is a good idea to check if some eigenvolumes correlate strongly with the tilt.

wb.show.correlationEigenvectorTilt(1:10) 
correlation of eigencomponens with tilt

Remember that each particle is accessible through right clicking on it.

right click on each point to access the particle

In this plot, each point represents a particle in your data set. We see that in this particular experiment, eigencomponent 3 seems to have been "corrupted by the missing wedge"

TSNE reduction

We create on the fly the TSNE reduction for eigencomponents:

wb.show.tsne([1,2,4,5]);

will produce the graphics:

tsne clustering

TSNE has created a bidimensional proximity map for the 4-dimensional distribution induced by the 4 selected eigenvectors. Note that you can enter a cell array of several sets of indices. You could then navigate through different indices to check the shape of the TSNE reduction for different selections of eigenvolumes:

# wb.show.tsne({[1,2,4,5] , [1:6]});

Automated clustering

Right clicking on a plot you get an option to automatically segment it using the dbscan method

right click on the axes for an automated clustering

This functionality is intended to let you get a feeling of the results you would get if you would use dbscan in an algorithm.

automated clustering

The results are reasonable, but inferior to what human inspection would yield.

Manual clustering

We first right click on the axis to get a lasso tool.

right click on the axes to select a set of particles

Remember that you will have a different plot (depending on the random seed used in TSNE). However, you will probably have at least a well defined cluster.

use the lasso tool to select a group
lassoed particles can be averaged together
average of particles inside lasso
a second lasso can be created

After delimiting the second subset, we can create the class averages corresponding to this manual selection.

average all manually selected clusters

On opening, dmapview will show all slices of a single volume:

dmapview on opening

Use the "simultaneous view" icon in the toolbar to show the two volumes together.

showing several volumes in dmapview
correspondig slices of
you can select a single slice for depiction
use keys 1 and 2 to set anchors
right click one anchor to show an intensity profile
intensity profile

Manipulation of PCA workflows

Reading workflows

The workflow object can be recreated by reading the workflow folder.

wb2 = dread('classtest128');

Note that we can name the workflow with an arbitrary variable name. The workflow needs to be unfolded before being used:

wb2.unfold();

This concerns not only recomputation of selected steps, but also use of graphical exploration options.

Workflow GUIs

Execution of a PCA workflow can be controlled graphically through:

wb.workflowGUI()

Overview and edition of just the workflow parameters can be accessed through:

wb.settings.editGUI