Difference between revisions of "Principal component analysis"
Revision as of 09:08, 19 April 2016
In general, a Principal Component Analysis (PCA) analyzes a data set to find a set of coordinates that capture its most representative features. The term "PCA classification" is often used loosely: PCA is not itself a classification method; classification is performed on the features extracted through PCA.
In Dynamo, PCA is the process of finding a reduced set of "eigenvolumes" such that each particle in the data set can be approximately represented as a combination of them. With this representation, a generic particle is described by the contribution of each eigenvolume to the particle, i.e., by a set of "eigencomponents", normally not many more than 20.
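As a sketch of what this representation means, the approximation can be written as below. The notation is an assumption introduced here for illustration, not Dynamo's own: x_i is the i-th aligned particle, e_k the k-th eigenvolume, c_{ik} the corresponding eigencomponent, and K the number of eigenvolumes kept. The projection formula on the right holds only if the eigenvolumes are orthonormal.

```latex
x_i \;\approx\; \sum_{k=1}^{K} c_{ik}\, e_k ,
\qquad
c_{ik} = \langle x_i , e_k \rangle
\quad \text{(assuming } \langle e_k , e_l \rangle = \delta_{kl} \text{)}
```

Under these assumptions, each particle is reduced from a full volume to the K scalars c_{i1}, ..., c_{iK}.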
Once the particles are represented by small sets of scalars, they can be classified with standard methods such as k-means.
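To illustrate this last point, here is a minimal k-means sketch in plain Python, operating on a few made-up eigencomponent vectors. It is not Dynamo code: the function name `kmeans`, the toy values and the choice of the first k points as initial centers are all assumptions made for the example.

```python
def kmeans(points, k, iterations=20):
    """Plain Lloyd's k-means on lists of floats.

    Assumption for this sketch: the first k points are the initial centers.
    """
    centers = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest center.
        for i, p in enumerate(points):
            labels[i] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])),
            )
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers


# Hypothetical eigencomponents: one row per particle, one column per eigenvolume.
eigencomponents = [
    [0.9, 0.1],
    [1.1, -0.1],
    [-1.0, 0.2],
    [-0.8, 0.0],
]
labels, centers = kmeans(eigencomponents, k=2)
```

With these toy values the first two particles end up in one class and the last two in the other; the numeric class labels themselves are arbitrary.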
Operative steps
PCA classifications are most easily handled through classification projects. These projects can be controlled through GUIs or the command line.
Whichever way you control the classification project, a PCA-based classification requires the completion of these steps:
- Selecting the input: a data folder, a table and a mask
- Computing a cross-correlation matrix
- Computing the eigenvalues, eigenvolumes and eigencomponents
- Using the eigencomponents to create a classification
Input
PCA is computed on a set of aligned particles. Thus, you need a data folder and a table that describes the alignment. In the most common case, you want to focus the classification on a region of the box, which requires a classification mask.
Additionally, some fine-tuning parameters can be passed: particles can be symmetrized, resized or bandpassed.
Computation of cross-correlation matrix
All the aligned particles are compared to each other through cross-correlation. This produces an NxN matrix for a set of N particles. This is typically the most time-consuming part of the PCA workflow.
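The following sketch shows the shape of this computation in plain Python, assuming particles are already aligned and flattened into lists of voxel values, and that the mask is a list of 0/1 flags. The helper names `normalized_cc` and `cc_matrix` are hypothetical and the normalization chosen here is an assumption; Dynamo's actual implementation may differ.

```python
import math


def normalized_cc(a, b, mask):
    """Normalized cross-correlation of two flattened particles inside a mask.

    Assumes neither particle is constant inside the mask (nonzero variance).
    """
    va = [x for x, m in zip(a, mask) if m]
    vb = [x for x, m in zip(b, mask) if m]
    mean_a = sum(va) / len(va)
    mean_b = sum(vb) / len(vb)
    da = [x - mean_a for x in va]
    db = [x - mean_b for x in vb]
    norm = math.sqrt(sum(x * x for x in da) * sum(x * x for x in db))
    return sum(x * y for x, y in zip(da, db)) / norm


def cc_matrix(particles, mask):
    """Symmetric N x N matrix of pairwise cross-correlations."""
    n = len(particles)
    ccm = [[1.0] * n for _ in range(n)]
    for i in range(n):
        # Only the upper triangle is computed; the matrix is symmetric.
        for j in range(i + 1, n):
            ccm[i][j] = ccm[j][i] = normalized_cc(particles[i], particles[j], mask)
    return ccm


# Toy data: three "particles" of four voxels; the last voxel is masked out.
particles = [
    [1.0, 2.0, 3.0, 9.0],
    [2.0, 4.0, 6.0, -5.0],
    [3.0, 2.0, 1.0, 0.0],
]
mask = [1, 1, 1, 0]
ccm = cc_matrix(particles, mask)
```

Even when symmetry is exploited, N particles require N(N-1)/2 pairwise comparisons, which is why this step dominates the running time for large data sets.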
Computation of PCA
The cross-correlation matrix is diagonalized, producing the eigenvalues and eigenvectors from which the eigenvolumes and eigencomponents are computed.
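As a rough illustration of the diagonalization step, the sketch below extracts only the leading eigenvalue and eigenvector of a small symmetric matrix using power iteration. Power iteration is a stand-in chosen to keep the example dependency-free; it is not claimed to be the method Dynamo uses, and a real workflow would rely on a full eigensolver from a numerical library. The matrix values and function names are invented for the example.

```python
import math


def mat_vec(matrix, vec):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vec)) for row in matrix]


def leading_eigenpair(matrix, iterations=200):
    """Power iteration: eigenvalue of largest magnitude and its unit eigenvector.

    Assumes the start vector (all ones) is not orthogonal to that eigenvector.
    """
    vec = [1.0] * len(matrix)
    for _ in range(iterations):
        product = mat_vec(matrix, vec)
        norm = math.sqrt(sum(x * x for x in product))
        vec = [x / norm for x in product]
    # Rayleigh quotient of the converged unit vector gives the eigenvalue.
    eigenvalue = sum(v * w for v, w in zip(vec, mat_vec(matrix, vec)))
    return eigenvalue, vec


# Hypothetical 3 x 3 cross-correlation matrix (symmetric, unit diagonal).
ccm = [
    [1.0, 0.8, -0.6],
    [0.8, 1.0, -0.5],
    [-0.6, -0.5, 1.0],
]
eigenvalue, eigenvector = leading_eigenpair(ccm)
```

In this toy matrix the first two particles correlate with each other and anti-correlate with the third, so the leading eigenvector assigns the third particle the opposite sign from the first two.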
GUIs for PCA classification
There are two GUIs available that cover the pipeline through a classification project, among them dynamo_ccmatrix_project_manager.
PCA classification through the command line
Tutorials
There are some PDF tutorials available inside the Dynamo distribution:
- General introduction to PCA based classification.
- Command line classification.