Difference between revisions of "Principal component analysis"
Revision as of 09:53, 19 April 2016
In general, a Principal Component Analysis (PCA) aims at analyzing a data set and discovering a set of coordinates that capture the most representative features of said data. Often the term PCA classification is loosely used. PCA is not a classification method: classification itself is performed on the features extracted through PCA.
In Dynamo, PCA is the process of finding a reduced set of "eigenvolumes" such that each particle in the data set can be approximately represented as a combination of these eigenvolumes. With this representation, a generic particle is described by the contribution of each "eigenvolume" to the particle, i.e., by a set of "eigencomponents", normally not many more than 20.
Once the particles are represented by small sets of scalars, they can be classified with standard methods like k-means.
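As an illustration of this last step, the following minimal Python sketch classifies particles from their eigencomponents with a plain k-means. It is not Dynamo code: the function `kmeans`, the farthest-first initialization and the synthetic eigencomponents are assumptions made for the example.

```python
import numpy as np

def kmeans(points, k, iterations=50):
    """Plain Lloyd's k-means with a deterministic farthest-first initialization."""
    centers = [points[0]]
    for _ in range(1, k):
        # Next center: the point farthest from all centers chosen so far.
        d = np.min([np.linalg.norm(points - c, axis=1) for c in centers], axis=0)
        centers.append(points[np.argmax(d)])
    centers = np.array(centers, dtype=float)
    for _ in range(iterations):
        # Assign each particle to its nearest center.
        dist = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        # Move each center to the mean of its class (keep it if the class is empty).
        new_centers = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

rng = np.random.default_rng(4)
# Synthetic eigencomponents (assumption): 2 groups of 30 particles, 5 components each.
group_a = rng.normal(loc=0.0, scale=0.1, size=(30, 5))
group_b = rng.normal(loc=1.0, scale=0.1, size=(30, 5))
eigencomponents = np.vstack([group_a, group_b])

labels, centers = kmeans(eigencomponents, k=2)
print(labels)
```

Each particle ends up with one class label, computed from a handful of scalars instead of from full volumes.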
Operative steps
PCA classifications are most easily handled through classification projects. These projects can be controlled through GUIs or the command line.
Whichever way you control the classification project, a PCA-based classification requires the completion of these steps:
- Selecting the input
- a data folder, a table, a mask
- Computing a cross-correlation matrix
- Computing the eigenvalues, eigenvolumes and eigencomponents
- Using the eigencomponents to create a classification.
Input
PCA is computed on a set of aligned particles. Thus, you need a data folder and a table that describes the alignment. In the most common case, you want to focus the classification on a region of the box, so you also need a classification mask.
Additionally, there are some fine tuning parameters that can be passed: particles can be symmetrized, resized or bandpassed.
Computation of cross-correlation matrix
- Main article: Cross correlation matrix
All the aligned particles are compared to each other through cross correlation. This produces an NxN matrix for a set of N particles. This is typically the most time-consuming part of the PCA workflow.
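The idea can be sketched in a few lines of Python with NumPy. This is a generic illustration, not the Dynamo implementation: the function name `ccmatrix`, the normalization (zero mean and unit norm inside the mask) and the synthetic particles are assumptions.

```python
import numpy as np

def ccmatrix(particles, mask):
    """Normalized cross correlation between all pairs of aligned particles.

    particles: array of shape (N, nx, ny, nz), assumed already aligned.
    mask: boolean array of shape (nx, ny, nz); only voxels inside it are compared.
    """
    # Keep only the voxels inside the mask: one row per particle.
    x = particles[:, mask].astype(float)
    # Normalize each particle to zero mean and unit norm inside the mask.
    x -= x.mean(axis=1, keepdims=True)
    x /= np.linalg.norm(x, axis=1, keepdims=True)
    # Entry (i, j) is the correlation coefficient of particles i and j.
    return x @ x.T

rng = np.random.default_rng(0)
particles = rng.normal(size=(6, 8, 8, 8))  # synthetic stand-in for 6 particles
mask = np.zeros((8, 8, 8), dtype=bool)
mask[2:6, 2:6, 2:6] = True                 # cubic region of interest
cc = ccmatrix(particles, mask)
print(cc.shape)
```

The result is symmetric with ones on the diagonal, so only half of the N(N-1) off-diagonal comparisons are independent; the cost still grows quadratically with the number of particles.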
Computation of PCA
Eigenvalues
The cross-correlation matrix is diagonalized, producing a set of eigenvalues which should decay to zero (the slower the decay, the more eigenvolumes will be relevant). This computation is very fast.
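A minimal NumPy sketch of this diagonalization follows. The synthetic data and the variable names are assumptions for illustration; `numpy.linalg.eigh` is used because the matrix is symmetric.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=(10, 50))            # 10 synthetic "particles", 50 masked voxels
x -= x.mean(axis=1, keepdims=True)
x /= np.linalg.norm(x, axis=1, keepdims=True)
cc = x @ x.T                              # symmetric 10 x 10 cross-correlation matrix

# eigh handles symmetric matrices and returns eigenvalues in ascending order.
eigenvalues, eigenvectors = np.linalg.eigh(cc)
order = np.argsort(eigenvalues)[::-1]     # reorder so the largest comes first
eigenvalues = eigenvalues[order]
eigenvectors = eigenvectors[:, order]

# Cumulative fraction of the total captured by the leading eigenvalues.
explained = np.cumsum(eigenvalues) / eigenvalues.sum()
print(np.round(explained, 3))
```

Inspecting `explained` is one way to judge how many eigenvolumes are worth keeping: a slow rise corresponds to a slow decay of the eigenvalues.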
Eigenvolumes
Each eigenvalue has an associated eigenvector. Eigenvectors are called eigenvolumes in this context. Note that they are only defined inside the classification mask attached to the classification.
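One standard way to obtain volumes from the eigenvectors of an NxN matrix is to use each eigenvector as a set of weights for combining the masked particles. The sketch below assumes this construction; whether Dynamo proceeds exactly this way is not stated here, and all names and data are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
mask = np.zeros((6, 6, 6), dtype=bool)
mask[1:5, 1:5, 1:5] = True                 # classification mask (assumption)
x = rng.normal(size=(8, int(mask.sum())))  # 8 synthetic particles, masked voxels only
x -= x.mean(axis=1, keepdims=True)

cc = x @ x.T
eigenvalues, eigenvectors = np.linalg.eigh(cc)
order = np.argsort(eigenvalues)[::-1][:3]  # keep the 3 leading eigenpairs
lam, v = eigenvalues[order], eigenvectors[:, order]

# Weighted sums of particles; dividing by sqrt(lam) gives unit-norm columns.
u = (x.T @ v) / np.sqrt(lam)               # shape (voxels in mask, 3)

# Place the values back into full boxes; voxels outside the mask stay at zero.
eigenvolumes = np.zeros((3,) + mask.shape)
eigenvolumes[:, mask] = u.T
print(eigenvolumes.shape)
```

The resulting eigenvolumes are orthonormal inside the mask and carry no information outside it.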
Eigencomponents
- Main article: Eigentable
Each particle is compared to each eigenvolume. This is also a time-consuming step, although much less intensive than the computation of the ccmatrix.
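In linear algebra terms, the comparison amounts to one scalar product per (particle, eigenvolume) pair. The following sketch assumes an orthonormal set of eigenvolumes, obtained here through an SVD purely for brevity; the data and names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.normal(size=(12, 40))        # 12 synthetic particles, 40 masked voxels

# Orthonormal basis for illustration: right singular vectors of the data.
_, _, vt = np.linalg.svd(x, full_matrices=False)
k = 4
eigenvolumes = vt[:k]                 # shape (k, voxels)

# One scalar per (particle, eigenvolume) pair.
eigencomponents = x @ eigenvolumes.T  # shape (12, k)

# Each particle is approximated by a combination of the k eigenvolumes.
approximation = eigencomponents @ eigenvolumes
error = np.linalg.norm(x - approximation) / np.linalg.norm(x)
print(eigencomponents.shape, round(float(error), 3))
```

The work is proportional to N times the number of eigenvolumes, instead of N squared, which is why this step is lighter than the ccmatrix.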
GUIs for PCA classification
There are two GUIs available to cover the pipeline through a classification project:
- dynamo_ccmatrix_project_manager
- for setting up the project and computing the ccmatrix
- dynamo_ccmatrix_analysis
- to use a previously computed ccmatrix. Computes a PCA on it, and allows running different classification experiments on the result of the PCA.
In the general case, you will use dynamo_ccmatrix_project_manager to quickly set up a project for ccmatrix computation, put the project to run, go for coffee, lunch or sleep (depending on the number of particles), and then come back to the office to use the result of the project (a ccmatrix) to define a PCA by interacting with the dynamo_ccmatrix_analysis GUI.
dynamo_ccmatrix_project_manager
It is a rather general tool to define ccmatrix projects in different situations (from scratch, deriving them from other projects, etc.).
Creating a project from scratch
We will work with the Derived ccmatrix project panel. This panel contains the settings for the project to be created in a session of dynamo_ccmatrix_project_manager (the source project panel is used when you want to apply PCA on the results of an alignment project).
Enter the name of the project in the project field, and fill the fields for data, table and mask.
Optionally, additional numerical parameters can be chosen in the Actions panel: symmetrization, bandpassing or resizing (to increase speed).
Preparing a project for execution
We are back in the Derived ccmatrix project panel. Here, you have to decide in which environment to execute the project (Matlab or standalone) and, in each case, how many cores to use. GPU computations are not available for classification projects. For big matrices (i.e. big data sets), you'll need to tune the batch parameter, which controls how many particles are kept in memory at any given time.
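The role of a batch size can be illustrated with a generic blocked computation, where only two blocks of particles are held in memory at once. This is a sketch of the general technique, not of how Dynamo implements its batch parameter; `ccmatrix_in_batches` and `load_block` are hypothetical names.

```python
import numpy as np

def ccmatrix_in_batches(load_block, n, batch):
    """Fill an n x n correlation matrix while holding at most two blocks in memory.

    load_block(start, stop) is a hypothetical callback returning the normalized,
    masked particles start..stop-1 as rows of a 2D array.
    """
    cc = np.empty((n, n))
    for i in range(0, n, batch):
        a = load_block(i, min(i + batch, n))
        for j in range(i, n, batch):
            b = a if j == i else load_block(j, min(j + batch, n))
            block = a @ b.T
            # The matrix is symmetric, so each block is used twice.
            cc[i:i + batch, j:j + batch] = block
            cc[j:j + batch, i:i + batch] = block.T
    return cc

rng = np.random.default_rng(5)
data = rng.normal(size=(10, 30))     # synthetic particles (assumption)
data -= data.mean(axis=1, keepdims=True)
data /= np.linalg.norm(data, axis=1, keepdims=True)

cc_batched = ccmatrix_in_batches(lambda s, e: data[s:e], n=10, batch=4)
print(np.allclose(cc_batched, data @ data.T))
```

A smaller batch lowers the memory footprint at the price of loading each block more often.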
dynamo_ccmatrix_analysis
This GUI can be invoked directly from dynamo_ccmatrix_project_manager, or opened directly on an existing project.
Computing a PCA
You need to check if an Xmatrix is available. If not, just ask Dynamo to compute one.
PCA classification through the command line
This is explained in the tutorial below: XX
Tutorials
There are some PDF tutorials available inside the Dynamo distribution:
- General introduction to PCA based classification. XX
- Command line classification. XX