Principal component analysis
In general, a Principal Component Analysis (PCA) analyzes a data set to discover a set of coordinates that capture its most representative features. The term "PCA classification" is often used loosely, but PCA is not itself a classification method: classification is performed on the features extracted through PCA.
In Dynamo, PCA is the process of finding a reduced set of "eigenvolumes" that allow each particle in the data set to be represented approximately as a combination of these eigenvolumes. With this representation, a generic particle can be described by the contribution of each eigenvolume to the particle, i.e., by a set of "eigencomponents", normally not many more than 20 in number.
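The representation described above can be written compactly. The notation here is an assumption introduced for illustration, not Dynamo's own: x_i is particle i, e_k is the k-th eigenvolume, c_{ik} is the corresponding eigencomponent, and K is the number of eigenvolumes retained. The projection formula holds when the eigenvolumes are orthonormal.

```latex
x_i \;\approx\; \sum_{k=1}^{K} c_{ik}\, e_k ,
\qquad
c_{ik} = \langle x_i , e_k \rangle
```

Each particle is then summarized by the vector (c_{i1}, ..., c_{iK}) instead of by all of its voxels.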
Once the particles are represented by small sets of scalars, they can be classified with standard methods like k-means.
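As an illustration of this last step, here is a minimal sketch of k-means (Lloyd's algorithm) applied to per-particle eigencomponent vectors. It is not Dynamo's implementation: the function names, the farthest-first initialization (chosen only to make the run deterministic) and the toy eigencomponent values are all assumptions. It uses plain Python to stay dependency-free; a numerical library would normally be used on real data.

```python
def sqdist(p, q):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(p, q))


def kmeans(points, k, max_iter=100):
    """Plain Lloyd's k-means; returns (labels, centers)."""
    # Farthest-first initialization (an assumption, used for repeatability).
    centers = [list(points[0])]
    while len(centers) < k:
        far = max(points, key=lambda p: min(sqdist(p, c) for c in centers))
        centers.append(list(far))

    labels = None
    for _ in range(max_iter):
        # Assignment step: each particle goes to its nearest center.
        new_labels = [
            min(range(k), key=lambda c: sqdist(p, centers[c])) for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable: converged
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers


# Hypothetical eigencomponents: six particles, two components each.
eigencomponents = [
    [0.0, 0.1], [0.2, 0.0], [0.1, 0.2],
    [5.0, 5.1], [5.2, 4.9], [4.9, 5.0],
]
labels, centers = kmeans(eigencomponents, k=2)
print(labels)  # [0, 0, 0, 1, 1, 1]
```

Because each particle is reduced to a handful of scalars, the clustering itself is cheap compared with the steps that produce the eigencomponents.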
However you control the classification project, a PCA-based classification operationally requires completing these steps:
- Selecting the input: a data folder, a table, and a mask
- Computing a cross-correlation matrix
- Computing the eigenvalues, eigenvolumes and eigencomponents
- Using the eigencomponents to create a classification.
PCA is computed on a set of aligned particles. Thus, you need a data folder and a table that describes the alignment. In the most common case, you want to focus the classification on a region of the box, so you also need a classification mask.
Additionally, some fine-tuning parameters can be passed: particles can be symmetrized, resized or bandpassed.
Computation of cross-correlation matrix
All the aligned particles are compared to each other through cross-correlation. This produces an N×N matrix for a set of N particles. This is typically the most time-consuming part of the PCA workflow.
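The pairwise comparison can be sketched as follows. This is an illustrative sketch only, not Dynamo's code: it assumes particles are already aligned and flattened to lists of voxel values, that the mask is a list of 0/1 flags of the same length, and that the similarity measure is a mean-subtracted, normalized cross-correlation inside the mask. The names `normalized_cc` and `cc_matrix` are hypothetical.

```python
import math


def normalized_cc(a, b, mask):
    """Normalized cross-correlation of two flattened volumes inside a mask."""
    # Keep only the voxels selected by the mask.
    xa = [v for v, m in zip(a, mask) if m]
    xb = [v for v, m in zip(b, mask) if m]
    # Subtract the mean computed inside the mask.
    mean_a = sum(xa) / len(xa)
    mean_b = sum(xb) / len(xb)
    da = [v - mean_a for v in xa]
    db = [v - mean_b for v in xb]
    num = sum(p * q for p, q in zip(da, db))
    den = math.sqrt(sum(p * p for p in da) * sum(q * q for q in db))
    return num / den if den > 0 else 0.0


def cc_matrix(particles, mask):
    """Symmetric N x N matrix of pairwise cross-correlations."""
    n = len(particles)
    cc = [[0.0] * n for _ in range(n)]
    for i in range(n):
        # The matrix is symmetric, so only the upper triangle is computed.
        for j in range(i, n):
            cc[i][j] = cc[j][i] = normalized_cc(particles[i], particles[j], mask)
    return cc


# Toy data: three "particles" of five voxels; the last voxel is masked out.
particles = [
    [1.0, 2.0, 3.0, 4.0, 0.0],
    [2.0, 4.0, 6.0, 8.0, 9.0],
    [4.0, 3.0, 2.0, 1.0, 5.0],
]
mask = [1, 1, 1, 1, 0]
cc = cc_matrix(particles, mask)
```

The double loop makes the cost visible: the number of comparisons grows as N(N+1)/2, and each comparison touches every voxel inside the mask, which is why this step dominates the run time for large data sets.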
Computation of PCA
The cross-correlation matrix is diagonalized, producing the eigenvalues and eigenvectors from which the eigenvolumes and eigencomponents are derived.
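To make the diagonalization concrete, the sketch below extracts the leading eigenpairs of a small symmetric matrix by power iteration with deflation. This is a generic textbook technique chosen to keep the example dependency-free; it is not a description of how Dynamo diagonalizes the matrix. The scaling used in `eigencomponents` follows the standard Gram-matrix form of PCA (entry i of a unit eigenvector, scaled by the square root of its eigenvalue) and is likewise an assumption, as are all function names and the toy matrix.

```python
import math


def matvec(m, v):
    """Matrix-vector product for a list-of-lists matrix."""
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]


def top_eigenpairs(matrix, k, iterations=500):
    """Leading k eigenpairs of a symmetric positive semidefinite matrix."""
    n = len(matrix)
    work = [row[:] for row in matrix]  # copy, since deflation modifies it
    pairs = []
    for _ in range(k):
        v = [1.0 + 0.1 * i for i in range(n)]  # deterministic start vector
        for _step in range(iterations):
            w = matvec(work, v)
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        # Rayleigh quotient of the unit vector v gives the eigenvalue.
        lam = sum(a * b for a, b in zip(v, matvec(work, v)))
        pairs.append((lam, v))
        # Deflation: remove the component just found from the matrix.
        for i in range(n):
            for j in range(n):
                work[i][j] -= lam * v[i] * v[j]
    return pairs


def eigencomponents(pairs):
    """Row i holds the coordinates of particle i along each component."""
    n = len(pairs[0][1])
    return [[math.sqrt(max(lam, 0.0)) * v[i] for lam, v in pairs]
            for i in range(n)]


# Toy symmetric matrix standing in for a cross-correlation matrix.
toy = [[2.0, 1.0], [1.0, 2.0]]
pairs = top_eigenpairs(toy, k=2)
components = eigencomponents(pairs)
eigenvalues = [lam for lam, _ in pairs]  # approximately [3.0, 1.0]
```

For real data sets a library eigensolver for symmetric matrices would be the usual choice; the explicit loops here only show what "diagonalizing" the matrix yields.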
GUIs for PCA classification
There are two GUIs available to cover the pipeline through a classification project: dynamo_ccmatrix_project_manager
PCA classification through the command line
There are some PDF tutorials available inside the Dynamo distribution:
- General introduction to PCA based classification.
- Command line classification.