Walkthrough on PCA through the command line

From Dynamo
Revision as of 18:20, 28 March 2020 by Daniel Castaño

PCA computations through the command line are governed through PCA workflow objects. We describe here how to create and handle them:

Creation of a synthetic data set

dtutorial ttest128 -M 64 -N 64 -linear_tags 1 -tight 1

This generates a set of 128 particles where 64 are slightly closer than the other 64. The particle subtomograms are randomly oriented, but their alignment parameters are known.

Creation of a workflow

Input elements

The inputs of a PCA workflow are:

  • a set of particles (called data container in this article)
  • a table that expresses the alignment
  • a mask that indicates the area of each aligned particle that will be taken into account during the classification procedure.

Data

dataFolder = 'ttest128/data';

Table

tableFile  = 'ttest128/real.tbl';

Mask

We create a cylindrical mask with the dimensions of the particles (40 pixels):

mask = dcylinder([20,20],40);

Syntax

We decide a name for the workflow itself, for instance

name = 'classtest128';

Now we are ready to create the workflow:

 wb = dpkpca.new(name,'t',tableFile,'d',dataFolder,'m',mask);

This creates a workflow object (arbitrarily called wb in the workspace during the current session). It also creates a folder called classtest128.PCA where results will be stored as they are produced.
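The steps so far can be collected into one short script. This is a minimal sketch that only reuses the commands and file names shown in this walkthrough; it assumes that Dynamo is already loaded in the MATLAB session and that the script runs in the folder where ttest128 should be created. The final check on the results folder is an illustrative addition that uses the standard MATLAB function exist.

```matlab
% Assumes Dynamo is loaded in the current MATLAB session.

% Synthetic data set: 128 particles in two groups of 64.
dtutorial ttest128 -M 64 -N 64 -linear_tags 1 -tight 1

% Inputs of the PCA workflow.
dataFolder = 'ttest128/data';        % particle data container
tableFile  = 'ttest128/real.tbl';    % table with the known alignment
mask       = dcylinder([20,20],40);  % cylindrical mask, 40 pixel box

% Name of the workflow and creation of the workflow object.
name = 'classtest128';
wb   = dpkpca.new(name,'t',tableFile,'d',dataFolder,'m',mask);

% Illustrative check (plain MATLAB): the results folder should now exist.
resultsFolder = [name '.PCA'];
if exist(resultsFolder,'dir') ~= 7
    warning('Expected results folder %s was not found.', resultsFolder);
end
```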

Mathematical parameters

The main parameters that can be chosen in this area are:

  • bandpass
  • symmetry
  • binning level (to accelerate the computations)


Computational parameters

The main burden of the PCA computation is the creation of the cross correlation matrix.
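To see why this step dominates the cost, it helps to look at what a cross correlation matrix contains: one entry for every pair of particles. For the 128 particles of the synthetic data set this is a 128 × 128 matrix with 8128 distinct pairs, and the number of pairs grows quadratically with the number of particles. The following sketch illustrates the general idea in plain MATLAB on small random volumes. It is not Dynamo's implementation: the particle array, the box size and the spherical mask are hypothetical stand-ins, and no Dynamo function is called. It assumes a MATLAB release with implicit expansion (R2016b or later).

```matlab
% Illustrative only: plain MATLAB, no Dynamo calls.
% Synthetic stand-ins for aligned particles (hypothetical sizes).
rng(0);                              % reproducible random numbers
nParticles = 6;
boxSize    = 8;
particles  = randn(boxSize, boxSize, boxSize, nParticles);

% Hypothetical spherical mask selecting the voxels to compare.
[x, y, z] = ndgrid(1:boxSize);
center    = (boxSize + 1) / 2;
maskIdx   = (x - center).^2 + (y - center).^2 + (z - center).^2 <= (boxSize/2)^2;

% One column per particle, restricted to the masked voxels.
X = reshape(particles, [], nParticles);
X = X(maskIdx(:), :);

% Normalize each column to zero mean and unit norm under the mask.
X = X - mean(X, 1);
X = X ./ sqrt(sum(X.^2, 1));

% Pairwise normalized cross correlation: nParticles x nParticles.
ccMatrix = X' * X;

% Eigenvectors of the symmetrized matrix, sorted by decreasing eigenvalue.
[eigVecs, eigVals] = eig((ccMatrix + ccMatrix') / 2, 'vector');
[eigVals, order]   = sort(eigVals, 'descend');
eigVecs            = eigVecs(:, order);
```

In this sketch each particle is compared with every other particle inside the mask, which is why the choice of mask and the number of particles directly determine the amount of work.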

Computing device

PCA computations can be run on GPUs or on CPUs, in both cases in parallel.

Size of parallel blocks