Cropping unbinned particles after having worked with binned particles
Work in progress.
You will perform many operations on binned data before moving back to your full resolution data.
Basic approach
The simplest approach for handling the transitions between binned and unbinned data consists of:
- Keeping one catalogue for cropping particles at full resolution.
- Creating binned versions of the tomograms for visualization purposes.
- When particles need to be binned to ease computations, passing a binning order to the alignment or classification project.
This will work in many situations. Sometimes, however, cropping particles at full resolution can be tedious or prohibitively expensive. When using seed oversampling, for instance, most cropped particles will be excluded before the analysis reaches the point where full resolution pixels are needed.
In such cases, a different strategy can be used. One can keep a second catalogue (a "binned catalogue") whose main entries are binned versions of the tomograms in the "unbinned catalogue". After operating with particles cropped from the binned catalogue, they can be re-extracted from the unbinned catalogue using the catalogue metadata. The key requirement is to make sure that both versions (binned and unbinned) of the same tomogram have the same index in both catalogues. This requirement should be met automatically.
Walkthrough on synthetic data
Creating a data set
This part of the walkthrough serves only to create a synthetic data set, comprising 3 full sized tomograms, each one already containing a model. Models are simple clouds of points.
Create the catalogue at full size (let us assume 1 a.u., arbitrary unit, per pixel):
dctutorial full -n 3 -cc withmodels
In real life, you are advised to always have a catalogue that links the "big", full resolution tomograms.
Let Dynamo create a 1x binned version of each tomogram.
dynamo_catalogue_bin full_withmodels 1 -ss 256
-ss 256 defines the size of the chunks that are loaded from disk at once. This parameter does not matter for small tomograms like the ones here. Each newly created tomogram will have a pixel size of 2 a.u.
Now, we create a new catalogue where the 1x binned versions are the main entries (meaning that the particles will be cropped from the binned versions). For this, we first extract the names of the tomograms in the unbinned catalogue.
dcm -c full_withmodels -l t -ws o
This creates a cell array in the results field of the variable o, which should have been generated in the workspace. We use it to create a list with the names of the binned tomograms through some easy string handling; we just insert the string _CatBinned1 before the .mrc extension of each tomogram name in the "unbinned catalogue".
binnedFiles=strrep((o.results),'.mrc','_CatBinned1.mrc');
A good policy is to check that the file names we generated with this string manipulation correspond to files that actually exist:
isfile(binnedFiles)
should produce an array of 3 "trues", i.e.:
ans =

  1×3 logical array

   1   1   1
To create the new catalogue, we need a text file which is just a list of filenames:
mbio.writeas.textlines(binnedFiles,'binnedFileList.vll');
which is used to create our binned catalogue:

dcm -create binned1 -fromvll binnedFileList.vll
Now we import the models residing in the "unbinned catalogue". Note that in "real life" you do not necessarily need to go through this step; our assumption was that you just want to work with binned tomograms during the pipeline and jump back to unbinned particles at the very end of the road.
To import the models into the binned catalogue, we need to write a small script that loops on the preexisting full resolution models (as they lie in the full catalogue) and then creates corresponding models in the binned catalogue. Loops are not easily handled from the command line, therefore we create a text file: a script that will run this task. The Matlab command
edit scaleModels
will open an editor where you can write:
nVolumes = 3;
myCatalogue = dread('binned1.ctlg');
lines = {}; % initialize a cell array
for i=1:nVolumes
    lines{end+1} = ['# just a comment'];
    lines{end+1} = myCatalogue.volumes{i}.fullFileName();
    % get the models in this volume
    modelsInVolume = myCatalogue.volumes{i}.modelFiles();
    for imo=1:length(modelsInVolume)
        lines{end+1} = ['> ',modelsInVolume{imo}];
    end
    lines{end+1} = ' ';
end
mbio.writeas.textlines(lines,'binned.vll');
We run the script just by writing its name in the command line:
scaleModels;
Let us check that the new catalogue has indeed 3 models:
dcmodels binned1
Cropping particles from the binned tomogram
We need to create a volume list file ("vll"): a text file that expresses which models come from which tomogram. The format is as follows:
tomogram 1
> model file 1 in tomogram 1
> model file 2 in tomogram 1
tomogram 2
> model file 1 in tomogram 2
> model file 2 in tomogram 2
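In our synthetic case, such a file could look like the following (the tomogram and model file names below are purely illustrative; the actual names depend on where your catalogue stores them):

```
/data/tomo_001_CatBinned1.mrc
> /data/models/tomo_001/model_1.omd
/data/tomo_002_CatBinned1.mrc
> /data/models/tomo_002/model_1.omd
/data/tomo_003_CatBinned1.mrc
> /data/models/tomo_003/model_1.omd
```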
Again, as this operation involves loops, we write an ad hoc script:
edit createCroppingVll

and fill it with the text:

nVolumes = 3;
myCatalogue = dread('binned1.ctlg');
lines = {}; % initialize a cell array
for i=1:nVolumes
    lines{end+1} = ['# just a comment'];
    lines{end+1} = myCatalogue.volumes{i}.fullFileName();
    % get the models in this volume
    modelsInVolume = myCatalogue.volumes{i}.modelFiles();
    for imo=1:length(modelsInVolume)
        lines{end+1} = ['> ',modelsInVolume{imo}];
    end
    lines{end+1} = ' ';
end
mbio.writeas.textlines(lines,'binned.vll');
We then run the created script:
createCroppingVll();
The created vll file can be used directly with dtcrop:
dtcrop binned.vll reorder targetData 24;
Here, the 'reorder' string passed as second argument tells Dynamo to parse the contents of the vll and to use a new set of tags on the created particles.
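If you want to verify the retagging from the command line, the following sketch should work (it assumes the Dynamo convention that column 1 of a table holds the particle tag):

```matlab
t = dread('targetData/crop.tbl');   % dtcrop writes the cropping table inside the target folder
disp(t(:,1)');                      % with 'reorder', tags should run sequentially: 1, 2, 3, ...
```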
To check the results
ddbrowse -d targetData -t targetData/crop.tbl
Cropping particles from the full resolution tomogram
Ok, now the actual "meat", i.e., the simulation of how we should proceed in real conditions. How do we crop the particles from the unbinned tomograms using as input the table collected on the binned particles? Remember, we don't want to use the metadata from the full size tomograms.
The simplest approach is to use dtcrop from the command line, passing to it a table and a text file that details the locations of the tomograms indicated in this table.
Upscale the coordinates in the table
We first upscale the (till now binned) table:
tUpscale = dynamo_table_rescale('targetData/crop.tbl','factor',2);
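This scales the geometry of the table by a factor of 2. Conceptually, it amounts to something like the following sketch (assuming the Dynamo table convention where columns 4 to 6 carry the shifts and columns 24 to 26 the particle coordinates; for real work use dynamo_table_rescale itself, which takes care of all relevant columns):

```matlab
t = dread('targetData/crop.tbl');        % binned table, read as a plain matrix
t(:,[4:6,24:26]) = 2*t(:,[4:6,24:26]);   % multiply shifts and coordinates by the binning factor
```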
Tomogram map file
Then we create a text file that indicates which tomogram file corresponds to which tomogram number in the table (remember that tomogram numbers are represented in the index entry in column 20).
The text file is formatted as
1 tomogramFileForIndex1
2 tomogramFileForIndex2
We again write the commands we need in a separate script:
edit createTomogramIndexMapFile
where you can write the loop on the file names.
binnedCatalogue = dread('binned1.ctlg');
linesMap = {};
% one entry per tomogram index actually present in the table
foundIndices = unique(tUpscale(:,20));
for i=1:length(foundIndices)
    thisIndex = foundIndices(i);
    fileBinned = binnedCatalogue.volumes{thisIndex}.fullFileName();
    fileUnbinned = strrep(fileBinned,'_CatBinned1','');
    if ~isfile(fileUnbinned)
        error('Hm... the corresponding unbinned file cannot be found: %s',fileUnbinned);
    end
    linesMap{end+1} = [num2str(thisIndex),' ',fileUnbinned];
end
% writes the actual file
mbio.writeas.textlines(linesMap,'mapFullTomograms.doc');
We run the newly created script:
createTomogramIndexMapFile();
Actual cropping
Now we crop the particles using the newly created file mapFullTomograms.doc:
dtcrop('mapFullTomograms.doc',tUpscale,'targetDataFull', 48);
Now we check the results:
ddbrowse -d targetDataFull -t targetDataFull/crop.tbl