# Difference between revisions of "Principal component analysis"


## Revision as of 10:08, 19 April 2016

In general, a Principal Component Analysis (PCA) aims at analyzing a data set and discovering a set of coordinates that capture the most representative features of said data. Often the term *PCA classification* is loosely used. PCA is not a classification method: classification itself is performed on the features extracted through PCA.

In *Dynamo*, PCA is the process of finding a reduced set of "eigenvolumes" that allow each particle in the data set to be represented approximately as a combination of these eigenvolumes. With this representation, a generic particle can be described by the contribution of each eigenvolume to the particle, i.e., by a set of "eigencomponents", normally not many more than 20.

Once the particles are represented by small sets of scalars, they can be classified with standard methods like k-means.
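As a compact way of reading the paragraph above, the representation can be written as a truncated linear combination. The symbols are introduced here for illustration only and are not part of the original page: $p_i$ is the $i$-th aligned particle, $e_k$ the $k$-th eigenvolume, $c_{ik}$ the corresponding eigencomponent, and $K$ the number of eigenvolumes kept.

$$
p_i \approx \sum_{k=1}^{K} c_{ik}\, e_k
$$

Under this notation, each particle is summarized by the $K$ scalars $(c_{i1}, \dots, c_{iK})$, and it is on these scalars that the classification operates.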
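To illustrate the last point, here is a minimal, dependency-free sketch of k-means applied to short vectors of eigencomponents. It is not *Dynamo* code: the function name `kmeans`, its parameters and the numeric values in `eigencomponents` are all hypothetical and chosen only for illustration.

```python
import random

def kmeans(points, k, iterations=50, seed=0):
    """Cluster rows of `points` (lists of floats) into k classes; return one label per row."""
    rng = random.Random(seed)
    # Initialize the class centers with k distinct rows chosen at random.
    centers = [list(p) for p in rng.sample(points, k)]
    labels = [-1] * len(points)
    for _ in range(iterations):
        # Assignment step: each point goes to the nearest center (squared Euclidean distance).
        new_labels = [
            min(range(k), key=lambda c: sum((x - m) ** 2 for x, m in zip(p, centers[c])))
            for p in points
        ]
        # Update step: each center becomes the mean of the points assigned to it.
        for c in range(k):
            members = [p for p, label in zip(points, new_labels) if label == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
        if new_labels == labels:
            break  # assignments are stable, so the iteration has converged
        labels = new_labels
    return labels

# Hypothetical eigencomponents: six particles, two eigencomponents each.
eigencomponents = [
    [0.1, 0.0], [0.0, 0.2], [0.2, 0.1],
    [5.0, 5.1], [5.2, 4.9], [4.9, 5.0],
]
labels = kmeans(eigencomponents, k=2)
```

In practice each particle would carry up to about 20 eigencomponents rather than two, but the procedure is the same: only the small vectors of scalars are clustered, never the full volumes.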


## Operative steps

PCA classifications are most easily handled through *classification projects*. These projects can be controlled through GUIs or the command line.

Whichever way you control the classification project, a PCA-based classification requires completing these steps:

- Selecting the input: a data folder, a table and a mask
- Computing a cross-correlation matrix
- Computing the eigenvalues, eigenvolumes and eigencomponents
- Using the eigencomponents to create a classification

### Input

PCA is computed on a set of aligned particles. Thus, you need a data folder and a table that describes the alignment. In the most common case, you want to focus the classification on a region of the box, so you also need a classification mask.

Additionally, some fine-tuning parameters can be passed: particles can be symmetrized, resized or bandpassed.

### Computation of cross-correlation matrix

All the aligned particles are compared to each other through cross-correlation. This produces an NxN matrix for a set of N particles. This is typically the most time-consuming part of the PCA workflow.
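The pairwise comparison can be sketched as follows. This is an illustration under assumptions, not the *Dynamo* implementation: particles are taken to be already aligned and flattened into lists of voxel values, the similarity is assumed to be a normalized cross-correlation restricted to the voxels inside the mask, and the names `normalized_cross_correlation` and `ccmatrix` are hypothetical.

```python
import math

def normalized_cross_correlation(a, b, mask):
    """Normalized cross-correlation of two flattened particles, using only voxels inside the mask."""
    xs = [x for x, m in zip(a, mask) if m]
    ys = [y for y, m in zip(b, mask) if m]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = math.sqrt(
        sum((x - mean_x) ** 2 for x in xs) * sum((y - mean_y) ** 2 for y in ys)
    )
    return num / den if den > 0 else 0.0

def ccmatrix(particles, mask):
    """Build the symmetric N x N matrix of pairwise cross-correlations."""
    n = len(particles)
    cc = [[1.0] * n for _ in range(n)]  # each particle correlates perfectly with itself
    for i in range(n):
        for j in range(i + 1, n):
            # The matrix is symmetric, so each pair is computed only once.
            cc[i][j] = cc[j][i] = normalized_cross_correlation(
                particles[i], particles[j], mask
            )
    return cc

# Toy data: three "particles" of four voxels each, with a mask that keeps every voxel.
particles = [[1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0], [4.0, 3.0, 2.0, 1.0]]
mask = [1, 1, 1, 1]
cc = ccmatrix(particles, mask)
```

Even with the symmetry exploited, N particles require N(N-1)/2 comparisons, each touching every voxel inside the mask, which is consistent with this step dominating the running time.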

### Computation of PCA

The cross-correlation matrix is diagonalized, producing eigenvalues.
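As a rough illustration of what diagonalizing a symmetric matrix involves, the sketch below extracts the leading eigenvalues and eigenvectors by power iteration with deflation. This is a textbook technique swapped in for illustration; the page does not state which algorithm *Dynamo* uses. The function `top_eigenpairs` and the toy matrix are hypothetical, and the method assumes the leading eigenvalues are positive and well separated.

```python
import math

def top_eigenpairs(matrix, count, iterations=500):
    """Return `count` (eigenvalue, eigenvector) pairs of a symmetric matrix, largest first."""
    n = len(matrix)
    work = [row[:] for row in matrix]  # working copy that is deflated in place
    pairs = []
    for _ in range(count):
        # Non-uniform start vector, to avoid being exactly orthogonal to an eigenvector.
        v = [float(i + 1) for i in range(n)]
        for _ in range(iterations):
            w = [sum(a * x for a, x in zip(row, v)) for row in work]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        norm_v = math.sqrt(sum(x * x for x in v))
        v = [x / norm_v for x in v]
        # Rayleigh quotient: eigenvalue estimate for the converged unit vector.
        w = [sum(a * x for a, x in zip(row, v)) for row in work]
        eigenvalue = sum(x * y for x, y in zip(v, w))
        pairs.append((eigenvalue, v))
        # Deflation: remove the component just found so the next pass finds the next one.
        for i in range(n):
            for j in range(n):
                work[i][j] -= eigenvalue * v[i] * v[j]
    return pairs

# Toy 3 x 3 cross-correlation matrix: particles 0 and 1 are similar, particle 2 differs.
cc = [
    [1.0, 0.9, 0.1],
    [0.9, 1.0, 0.1],
    [0.1, 0.1, 1.0],
]
pairs = top_eigenpairs(cc, count=3)
eigenvalues = [value for value, _ in pairs]
```

Each eigenvector has one entry per particle, so the first few eigenvectors already give a small set of scalars per particle. How exactly these relate to the eigenvolumes and eigencomponents stored by *Dynamo* is not specified here and should be checked against the *Dynamo* documentation.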

## GUIs for PCA classification

There are two GUIs available to cover the pipeline through a classification project:
`dynamo_ccmatrix_project_manager`

## PCA classification through the command line

## Tutorials

There are some PDF tutorials available inside the *Dynamo* distribution:

- General introduction to PCA based classification.
- Command line classification.