Evaluation of Aldolase Micrographs from EMPIAR-10379
This tutorial shows how to analyze 1,118 motion-corrected micrographs of rabbit muscle aldolase from EMPIAR-10379.
1. Prepare Input Data
First, we need to create the metadata table that prismPYP uses for training and embedding generation. It holds microscope parameters, CTF statistics, and motion information for every micrograph in the dataset.
In general, you can build metadata using either nextPYP preprocessing outputs or cryoSPARC outputs, but for this example we will use pre-calculated results.
🧪 Download the Test Data
We will download the example_data.tar.gz archive from Zenodo, which contains micrograph and power-spectrum images plus all the necessary metadata:
This command extracts the data into an example_data/ folder containing:
example_data/
├── pkl/    # metadata from nextPYP preprocessing
├── webp/   # 512×512 images (micrographs + power spectra)
├── J7_exposures_accepted_exported.cs
├── sp-preprocessing-*.micrographs
└── .pyp_config.toml
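As a quick sanity check after extraction, the layout above can be verified with a short script. This is a minimal sketch: the helper name `missing_entries` is hypothetical, and it assumes the archive was extracted into `example_data/` in the current working directory.

```python
from pathlib import Path

# Entries listed in the directory tree above; one of them is a glob pattern.
EXPECTED = [
    "pkl",
    "webp",
    "J7_exposures_accepted_exported.cs",
    "sp-preprocessing-*.micrographs",
    ".pyp_config.toml",
]


def missing_entries(root, expected=EXPECTED):
    """Return the expected names or patterns that have no match under root."""
    root = Path(root)
    return [pattern for pattern in expected if not any(root.glob(pattern))]


if __name__ == "__main__":
    missing = missing_entries("example_data")
    print("all expected entries found" if not missing else f"missing: {missing}")
```

An empty result means every entry from the listing was found; anything else names the entries to re-download or re-extract.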
📦 Intermediate Results
The Zenodo entry also contains the following files:
- model_weights.tar.gz: Trained model weights for the real-domain (real_model_best.pth.tar) and Fourier-domain (fft_model_best.pth.tar) inputs.
- fft_good_export.parquet: Data points that have high-quality features in the Fourier domain.
- real_good_export.parquet: Data points that have high-quality features in the real domain.
Taking the intersection of fft_good_export.parquet and real_good_export.parquet yields the 862 high-quality micrographs that we used to obtain a 2.9 Å structure of aldolase.
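The intersection itself can be sketched in a few lines of Python. This is an illustration under assumptions, not the authors' procedure: the function names are hypothetical, and the column name `micrograph_name` is a guess at how the Parquet exports identify each micrograph, so inspect the files' columns before relying on it.

```python
def common_micrographs(real_names, fft_names):
    """Return the sorted names that appear in both collections."""
    return sorted(set(real_names) & set(fft_names))


def load_names(parquet_path, column="micrograph_name"):
    """Read one column of micrograph identifiers from a Parquet file.

    ASSUMPTION: the export contains a column that identifies each micrograph.
    The default column name is a guess; check the file's actual columns.
    """
    # Imported lazily so the set logic above works without pandas installed.
    # Reading Parquet requires pandas plus pyarrow or fastparquet.
    import pandas as pd

    return pd.read_parquet(parquet_path, columns=[column])[column].tolist()
```

If the column assumption holds, `common_micrographs(load_names("real_good_export.parquet"), load_names("fft_good_export.parquet"))` should return the names of the micrographs that pass in both domains.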
2. Build Metadata Table
Before starting, create directories to store the metadata and all generated outputs:
- Run the following command to assemble metadata from nextPYP preprocessing results:
You can omit --cryosparc-path if you do not need relative ice thickness visualization.
To build metadata directly from cryoSPARC outputs, you'll need data from the Import, Manually Curate Exposures (with the outputs of Patch CTF Estimation as the input micrographs), and CTFFIND4 jobs.
For the test dataset (EMPIAR-10379), the deposited data already contains motion-corrected micrographs, so you can skip motion correction.
- Export the outputs of the following jobs and note their locations:
  - Import Micrographs → J1
  - Patch CTF Estimation → J2
  - CTFFIND4 → J3
  - Manually Curate Exposures → J4
  - cryoSPARC project directory → /cryosparc/output/dir
- Build the metadata table:
prismpyp metadata_cryosparc \
  --imported-dir "/cryosparc/output/dir/J1/imported" \
  --patch-ctf-file "/cryosparc/output/dir/exports/groups/J4_exposures_accepted/J4_exposures_accepted_exported.cs" \
  --ctffind-dir "/cryosparc/output/dir/J3/ctffind_output" \
  --ctffind-file "/cryosparc/output/dir/exports/groups/J3_exposures_success/J3_exposures_success_exported.cs" \
  --output-dir metadata
Depending on how many micrographs you have, this process may take several minutes to run.
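Because every path in the command follows from the project directory and the job IDs, the argument list can be assembled programmatically. The sketch below uses only the flags shown in the example command; the helper name `metadata_cryosparc_args` is hypothetical, and it assumes the CTFFIND4 export sits under `exports/groups/` in the same way as the curated exposures export.

```python
import shlex
from pathlib import PurePosixPath


def metadata_cryosparc_args(project_dir, import_job, ctffind_job, curate_job,
                            output_dir="metadata"):
    """Assemble the argument list for `prismpyp metadata_cryosparc`.

    The path layout mirrors the example command; adjust it if your cryoSPARC
    exports live elsewhere.
    """
    project = PurePosixPath(project_dir)
    groups = project / "exports" / "groups"
    accepted = f"{curate_job}_exposures_accepted"
    success = f"{ctffind_job}_exposures_success"
    return [
        "prismpyp", "metadata_cryosparc",
        "--imported-dir", str(project / import_job / "imported"),
        "--patch-ctf-file", str(groups / accepted / f"{accepted}_exported.cs"),
        "--ctffind-dir", str(project / ctffind_job / "ctffind_output"),
        "--ctffind-file", str(groups / success / f"{success}_exported.cs"),
        "--output-dir", output_dir,
    ]


if __name__ == "__main__":
    args = metadata_cryosparc_args("/cryosparc/output/dir", "J1", "J3", "J4")
    # Print the command for review; to execute it, pass `args` to
    # subprocess.run(args, check=True).
    print(shlex.join(args))
```

Building the list rather than a single string keeps paths with spaces intact when the command is eventually run through `subprocess`.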
3. Generated Outputs
The metadata-building command will produce a file named micrograph_metadata.csv, containing:
| Column | Description |
|---|---|
| micrograph_name | Name of each micrograph |
| rel_ice_thickness | Relative ice thickness (if --cryosparc-path was provided) |
| ctf_fit | CTF fit correlation coefficient |
| est_resolution | Estimated resolution in Å |
| avg_motion | Average beam-induced motion |
| num_particles | Number of picked particles |
| mean_defocus | Mean defocus (Å) |
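To get a feel for the table, it can be loaded with Python's standard `csv` module and filtered on one of the columns above. This is a minimal sketch: it assumes the file is comma-delimited with a header row matching the column names, the helper names are hypothetical, and the sample rows and the 4.0 Å cutoff are made up for illustration.

```python
import csv
import io


def load_metadata(source):
    """Parse micrograph_metadata.csv content from an open text stream."""
    return list(csv.DictReader(source))


def filter_by_resolution(rows, max_resolution):
    """Keep rows whose est_resolution (in Å) is at or below max_resolution."""
    return [row for row in rows if float(row["est_resolution"]) <= max_resolution]


if __name__ == "__main__":
    # Tiny in-memory stand-in with made-up values; replace it with
    # open("metadata/micrograph_metadata.csv", newline="") for real data.
    sample = io.StringIO(
        "micrograph_name,ctf_fit,est_resolution\n"
        "mic_001,0.95,3.2\n"
        "mic_002,0.60,6.8\n"
    )
    rows = load_metadata(sample)
    kept = filter_by_resolution(rows, max_resolution=4.0)
    print([row["micrograph_name"] for row in kept])
```

The same pattern extends to the other numeric columns, such as ctf_fit or avg_motion, by swapping the key and the comparison.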
In addition, the following files are generated:
- pixel_size.txt: microscope pixel size for this dataset
- all_micrographs_list.micrographs: list of all micrographs (without extensions)
- webp/: directory of .webp images for both micrographs and their CTFFIND4-derived power spectra