Pattern mining (MiLoPYP)

MiLoPYP can be used to map the contents of a set of tomograms, with the goal of identifying targets of interest for sub-tomogram averaging as described in Huang et al. (2024).

The MiLoPYP workflow in nextPYP consists of two steps and is implemented using four blocks:

  1. Pattern mining uses the MiLoPYP (train) and MiLoPYP (eval) blocks

  2. Position refinement uses the Particle-Picking (train) and Particle-Picking (eval) blocks

Here is an example of how the workflow looks in the project view (MiLoPYP blocks are highlighted in blue):

MiLoPYP workflow

Pre-requisites

Visualization

To analyze the results of MiLoPYP interactively, you need to install and run Arize Phoenix, either remotely or on your local machine.

For a local installation on macOS, for example, follow these steps:

  1. Download and install miniconda following these instructions

  2. Activate the miniconda installation, create a new conda environment and install Phoenix:

source ${INSTALLATION_PATH}/miniconda3/bin/activate
conda create -n "phoenix" python=3.8 -y
conda activate phoenix
conda install -c conda-forge arize-phoenix==0.0.28 pandas -y
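If you installed Phoenix on a remote machine instead, you can forward its ports to your laptop with SSH. This is an illustrative sketch, not part of the MiLoPYP workflow: `user@remote-host` is a placeholder, and the port numbers should match the Phoenix UI port printed at startup and the image-server port you choose (7000 by default, see below):

```shell
# Forward the Phoenix UI port and the image-server port from a remote host
# (user@remote-host and both port numbers are placeholders; adjust to your setup)
ssh -N -L 57534:localhost:57534 -L 7000:localhost:7000 user@remote-host
```

Leave the tunnel running while you work with Phoenix in your local browser.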

Data pre-processing

Since MiLoPYP operates on reconstructed tomograms, you first need to pre-process your tilt-series using the Pre-processing block (see the tomography and classification tutorials for examples of how to do this).

Pattern mining (training)

To train the mining/exploration module:

  1. Click on Tomograms (output of the Pre-processing block) and select MiLoPYP (train)

  2. Set the training parameters as needed

  3. (optional) If you want to train MiLoPYP on a subset of tomograms from your dataset, create a Filter in the Pre-processing block and select its name from the Filter tomograms dropdown menu at the top of the form. For datasets with many tomograms, doing this will considerably speed up training

  4. Click Save, Run, and Start Run for 1 block

  5. Once the run completes, navigate to the MiLoPYP (train) block to monitor the training metrics

Pattern mining (evaluation)

The trained model can now be evaluated to visualize the results:

  1. Click on MiLoPYP model (output of the MiLoPYP (train) block) and select MiLoPYP (eval)

  2. Select the trained model from the block upstream (extension *.pth), for example, model_last_contrastive.pth. The models are saved in sub-folders named with the date and time of training: YYYYMMDD_HHMMSS

  3. Click Save, Run, and Start Run for 1 block

  4. Once the run completes, navigate to the MiLoPYP (eval) block to visualize the embedding and the cluster labels

MiLoPYP evaluation

Target selection

There are two ways to select target positions to train the refinement module:

Option A: Manual cluster selection

This option requires specifying a comma-separated list of cluster numbers as displayed in the Class Labels panel, and can be done directly within nextPYP (no external tools needed)

Option B: Interactive target selection

This option requires running Arize Phoenix to interactively select locations of interest:

  • Navigate to the MiLoPYP (eval) block, go to the Mapping tab, and download the file *_milo.tbz by clicking on the gray/green download badge

  • Open a terminal on your local machine, decompress the *_milo.tbz file, and run Phoenix:

cd $WORK_DIRECTORY
tar xvfz *_milo.tbz
conda activate phoenix
curl https://raw.githubusercontent.com/nextpyp/cet_pick/main/cet_pick/phoenix_visualization.py -o phoenix_visualization.py
python phoenix_visualization.py --input interactive_info_parquet.gzip

If everything went well, you should see an output like this:

    name           coord                                         embeddings  label                             image
0  TS_43   [299, 57, 96]  [-0.006966044, 0.014659109, -0.020045772, 0.00...     29  http://localhost:7000/imgs/0.png
1  TS_43  [421, 145, 87]  [-0.024671286, 0.0323345, -0.06243068, 0.02977...     53  http://localhost:7000/imgs/1.png
2  TS_43  [57, 267, 124]  [-0.016118556, 0.021317916, -0.044905104, 0.01...     29  http://localhost:7000/imgs/2.png
3  TS_43  [288, 61, 104]  [-0.015271036, 0.024842143, -0.028918939, 0.00...     29  http://localhost:7000/imgs/3.png
4  TS_43   [278, 71, 98]  [-0.022570543, 0.034957167, -0.03830565, 0.016...     29  http://localhost:7000/imgs/4.png
🌍 To view the Phoenix app in your browser, visit http://localhost:57534/
📺 To view the Phoenix app in a notebook, run `px.active_session().view()`
📖 For more information on how to use Phoenix, check out https://docs.arize.com/phoenix

In another shell (in the same directory), activate the phoenix conda environment and start the image server:

conda activate phoenix
cd $WORK_DIRECTORY
python -m http.server 7000
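To confirm the image server is reachable before opening Phoenix, you can optionally request one of the thumbnails it serves (an optional check; adjust the port if you picked a different one):

```shell
# Request a thumbnail from the image server; an HTTP 200 response means
# Phoenix will be able to load the images referenced in the parquet table
curl -s -o /dev/null -w "%{http_code}\n" http://localhost:7000/imgs/0.png
```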

With Phoenix now running:

  • Open a browser and visit the URL displayed above, for example: http://localhost:57534/

  • Under Embeddings, click on image_embedding to visualize the results. Clicking on a point in the cloud will show the associated image in the bottom panel. You can also select a cluster of points using the left side bar (the corresponding image gallery will be shown at the bottom of the page)

  • Select the points or clusters of interest using the Select tool

  • Export your selection using the Export button and Download the results as a .parquet file

Note

By default, the image URLs written by phoenix_visualization.py point to port 7000, so the image server must run on that port. If port 7000 is not available on your computer, you can specify a custom one using phoenix_visualization.py’s --port option, for example, phoenix_visualization.py --input interactive_info_parquet.gzip --port 8000. In this case, you will need to start the image server on the same port, for example, python -m http.server 8000.
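If you are not sure which ports are free, you can ask the operating system for an unused one and pass the same number to both commands. This is a convenience sketch, not part of MiLoPYP; run the two servers in separate shells if you prefer, as in the steps above:

```shell
# Ask the OS for an unused TCP port, then use that same number both for the
# image URLs written by phoenix_visualization.py and for the image server
PORT=$(python -c "import socket; s=socket.socket(); s.bind(('', 0)); print(s.getsockname()[1]); s.close()")
echo "Serving images on port $PORT"
python phoenix_visualization.py --input interactive_info_parquet.gzip --port "$PORT" &
python -m http.server "$PORT"
```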

  • Go back to nextPYP and navigate to the MiLoPYP (eval) block

  • Click on the Upload button, browse to the location of the .parquet file you exported from Phoenix, and upload the file

Note

Currently, the uploaded file is always renamed to particles.parquet on the remote server. If a file with that name already exists, it will be overwritten with the new file

Particle refinement (training)

Now that we have identified our targets of interest, we will use them to train the refinement module:

  • Click on MiLoPYP Particles (output of the MiLoPYP (eval) block) and select Particle-Picking (train)

  • Option A: From the Coordinates for training menu select “class labels from MiLoPYP” and specify a comma-separated list of classes using the class IDs displayed in the Class Labels panel

  • Option B: From the Coordinates for training menu select “parquet file from MiLoPYP”, and specify the location of the .parquet file you uploaded in the previous step: particles.parquet

  • Set parameters for training as needed

  • Click Save, Run, and Start Run for 1 block

  • Once the run completes, navigate to the Particle-Picking (train) block to inspect the training metrics

Particle refinement (evaluation)

The last step is to evaluate the model and obtain the final particle positions on all tomograms in the dataset:

  1. Click on Particles Model (output of the Particle-Picking (train) block) and select Particle-Picking (eval)

  2. Select the location of the Trained model (*.pth) using the file browser. The models are saved in sub-folders named with the date and time of training: YYYYMMDD_HHMMSS

  3. Set parameters for evaluation as needed

  4. Click Save, Run, and Start Run for 1 block

  5. Once the run completes, navigate to the Particle-Picking (eval) block to inspect the particle picking results

The resulting set of particles can be used for 3D refinement using the Particle refinement block (see the tomography and classification tutorials for examples of how to do this).

Tip

  • To detect particles distributed along fibers or tubules, select Fiber mode. This will group neighboring particles, fit a smooth trajectory to them, and re-sample positions along the fitted curve