Difference between revisions of "Principal component analysis"
Revision as of 08:39, 19 April 2016
In general, Principal Component Analysis (PCA) analyzes a data set to find a set of coordinates that capture its most representative features. The term PCA classification is often used, although PCA is not itself a classification method: the classification is performed on the features extracted through PCA.
In Dynamo, PCA is the process of finding a reduced set of "eigenvolumes" that allow each particle in the data set to be approximated as a combination of these eigenvolumes. With this representation, a generic particle can be described by the contribution of each eigenvolume to the particle, i.e., by a set of "eigencomponents", normally not many more than 20.
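The representation just described can be written compactly. The notation below is illustrative and not taken from Dynamo: $p_i$ is particle $i$, $e_k$ is the $k$-th eigenvolume, $K$ is the number of eigenvolumes kept, and the eigenvolumes are assumed to be orthonormal.

```latex
p_i \approx \sum_{k=1}^{K} c_{ik}\, e_k, \qquad c_{ik} = \langle p_i, e_k \rangle
```

Under that assumption, the scalars $c_{i1}, \dots, c_{iK}$ are the eigencomponents of particle $i$.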
Once the particles are represented by small sets of scalars, they can be classified with standard methods such as k-means.
Operatively, this entails:
- Selecting the input:
  - a data folder
  - a table
  - a mask
- Computing a cross-correlation matrix
  - This is typically the most time-consuming part, as it involves comparing every particle in the data folder against every other particle.
- Computing the eigenvalues, eigenvolumes and eigencomponents
- Using the eigencomponents to create a classification
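
The steps above can be sketched in a few lines of NumPy. This is a minimal illustration of the general technique, not Dynamo's implementation or API: the function names `pca_classify` and `kmeans`, their parameters, the per-particle normalisation, and the assumption that the particles are already aligned and loaded into an array (rather than read from a data folder and table) are all hypothetical.

```python
import numpy as np


def kmeans(points, n_classes, n_iter=50):
    """Plain Lloyd k-means with deterministic farthest-first initialisation."""
    centers = [points[0]]
    for _ in range(1, n_classes):
        # Distance of every point to its nearest already-chosen center.
        d = np.min([np.linalg.norm(points - c, axis=1) for c in centers], axis=0)
        centers.append(points[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(n_iter):
        dist = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = np.argmin(dist, axis=1)
        # Keep the old center if a class ends up empty.
        new_centers = np.array([
            points[labels == k].mean(axis=0) if np.any(labels == k) else centers[k]
            for k in range(n_classes)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


def pca_classify(particles, mask, n_eigs=3, n_classes=2):
    """particles: (N, X, Y, Z) array; mask: boolean (X, Y, Z) array."""
    # 1. Input: keep only the voxels inside the mask, one row per particle.
    data = particles[:, mask].astype(float)
    # Assumed normalisation: zero mean and unit norm inside the mask,
    # so that dot products are normalised cross-correlations.
    data -= data.mean(axis=1, keepdims=True)
    data /= np.linalg.norm(data, axis=1, keepdims=True)

    # 2. Cross-correlation matrix: every particle against every particle (N x N).
    ccmatrix = data @ data.T

    # 3. Eigenvalues and eigenvectors, largest eigenvalues first.
    eigvals, eigvecs = np.linalg.eigh(ccmatrix)
    order = np.argsort(eigvals)[::-1][:n_eigs]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Eigenvolumes: unit-norm linear combinations of the masked particles.
    eigenvolumes = (eigvecs.T @ data) / np.sqrt(eigvals)[:, None]
    # Eigencomponents: projection of each particle onto each eigenvolume.
    eigencomponents = data @ eigenvolumes.T

    # 4. Classification on the eigencomponents.
    labels = kmeans(eigencomponents, n_classes)
    return eigvals, eigenvolumes, eigencomponents, labels
```

The cross-correlation matrix costs on the order of N² particle comparisons, which is consistent with it being the expensive step, whereas the eigendecomposition and k-means operate only on the N x N matrix and the N x `n_eigs` eigencomponents.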