PALM Tracer Python specifications

Description

The goal of the project is to port PALM Tracer, currently available as a Metamorph plugin.

It was developed in C++ and will be ported to a Python environment. The porting process will be “intelligent”, meaning it will not be limited to a simple copy-paste of the existing plugin.

A preliminary study will be conducted to define the elements required for this new version as well as the necessary modifications to existing functionalities.

The project will be released under the GNU General Public License 3.0 (subject to patent-related constraints).

Technical specifications

The development environment will be based on Python, using the Napari library for the user interface. During the first phase, most functionalities will rely on existing C++ DLLs already used in the original plugin. A performance study will be carried out to compare this approach with a fully Python-based process. The development may be divided into four main stages:

  1. Provide the complete set of PALM Tracer offline processing tools for internal use. In parallel, new offline processing methods will be added.

  2. Consider an external deployment, either during or at the end of the first stage. Napari specifications (in the case of an official release) and licensing constraints will have to be addressed.

  3. Add various online (pseudo real-time) processing methods.

  4. Integrate an acquisition module in order to fully replace the Metamorph software.

Version control will be handled using Git with GitHub, and continuous integration will rely on GitHub Actions. Example files will be added to the repository in order to test the different processing steps and to provide usage examples for new users.
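The example files in the repository can back automated tests from the start. A minimal pytest-style sketch is shown below; `detect_molecules` is a hypothetical placeholder for the future PALM Tracer Python API, and a simulated stack stands in for a repository example file:

```python
# Sketch of a unit test for the future localization API.
# `detect_molecules` is a placeholder, not the real implementation.
import numpy as np

def detect_molecules(stack, threshold):
    """Placeholder detector: return (y, x) of pixels whose maximum
    intensity over all frames exceeds the threshold."""
    ys, xs = np.nonzero(stack.max(axis=0) > threshold)
    return list(zip(ys.tolist(), xs.tolist()))

def test_detect_molecules_finds_bright_spot():
    stack = np.zeros((10, 32, 32), dtype=np.uint16)  # 10 frames, 32x32 px
    stack[:, 16, 16] = 1000                          # one bright molecule
    detections = detect_molecules(stack, threshold=500)
    assert detections == [(16, 16)]
```

Such tests would run under GitHub Actions on every push, using the example files committed to the repository as fixtures.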

Requirements and features

Feature development status

Project management

  • Versioning (the size and modular structure of the project require proper version control)

  • Continuous integration (continuous integration will help ensure the overall health of the project)

  • Unit tests

  • Integration tests

  • Documentation generation (code and user manual)

  • Language files (secondary, but enabling a multilingual interface)

    • Allow the community to easily add new languages.

  • Settings file automatically loaded (or created if missing), allowing some options to be pre-filled depending on the user.

    • Define whether the file is saved incrementally as changes occur or only upon closing.

Core features

  • Data file input/output:

    • Streaming MetaSeries Format (.smf) (input/output)

    • MetaMorph Stack File (.stk) (input/output)

    • MetaSeries single/multi-plane TIFF (.tif) (input/output)

    • Portable Network Graphics (.png) (output)

  • Pipeline file input/output: (unless a widely adopted specific format is used, such as JSON)

    • Provide a conversion tool for PALM Tracer Metamorph pipelines to PALM Tracer Python.

  • Pre-processing file input/output: (CSV is preferred for future analyses; the current header can be stored separately)

    • Provide a conversion tool for PALM Tracer Metamorph localization files to PALM Tracer Python (this mainly consists of removing the header and converting to CSV).

  • Simulated data generation tool (Sample Maker) (for basic tests as well as more complex scenarios)

  • Algorithm optimization (pay attention to library updates)

    • Evaluate the performance of the C++ DLLs (≈2010) compared to the Python code.

    • Validate the Python interfacing for multithreading and GPU porting.

  • Batch mode (to run a pipeline on a directory, with or without its subdirectories).

    • Ability to launch batch processing from the command line or, for beginners, via a dedicated batch-processing option in the interface.

  • Automatic file generation (optionally enabled) during pipeline execution (pipeline copy, text file with hardware configuration, execution time, resource usage, number of detected molecules, etc.).

  • Ability to cut a file (select frames from N to M). During acquisition, issues may have occurred on some frames, which should therefore be removed from the analysis.
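Frame cutting reduces to slicing the stack along its first axis; the in-memory (planes, height, width) layout and the optional bad-frame list are assumptions for illustration:

```python
# Sketch of frame cutting: keep frames n_start..n_end (inclusive) and
# drop individually flagged bad frames. Layout is an assumption.
import numpy as np

def cut_frames(stack, n_start, n_end, bad_frames=()):
    """stack: (planes, height, width) array; frame indices are 0-based."""
    kept = [i for i in range(n_start, n_end + 1) if i not in set(bad_frames)]
    return stack[kept]
```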

Visualization (depending on Napari limitations)

  • Drag-and-drop image loading (basic but not trivial depending on the GUI)

  • Image display with zoom percentage.

  • Display of a shrinkable histogram.

  • Proposal to include a checkbox for automatic image shrinking (default: clipping 0.5% at the bright end of the histogram, which appears consistent with the provided example images, but remains configurable).

  • Display of selectable Look-Up Tables: Monochrome, Pseudo Color, Gold, Custom (are additional LUTs necessary or excessive?).

  • Thresholding option.

  • Resampling option.

  • Play/pause of individual frames.
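The proposed automatic shrinking could be computed as a percentile clip on the image histogram; the 0.5% default is taken from the proposal above, and everything else (function name, return convention) is illustrative:

```python
# Sketch of automatic contrast limits: clip a configurable percentage
# at the bright end of the histogram.
import numpy as np

def auto_contrast_limits(image, clip_percent=0.5):
    """Return (low, high) display limits, clipping clip_percent of the
    brightest pixel values."""
    low = float(image.min())
    high = float(np.percentile(image, 100.0 - clip_percent))
    return low, high
```

The returned pair could then be passed to the viewer's contrast-limit setting when the checkbox is enabled.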

Processing

  • Acquisition options (pixel size, exposure, etc.).

  • 2D/3D localization
    • 2D localization preview

    • GPU support (on/off); arguably the GPU should simply be used whenever it is available, rather than asking the user.

    • Automatic thresholding with spin box adjustment.

    • ROI size

    • Watershed (on/off)

    • Gaussian fitting (define options; a dedicated documentation section may describe the implementation details).

    • 3D-related options?

  • Tracking
    • Maximum distance (units to be clearly defined: pixels, µm, or others?).

    • Minimum length (clearly define whether this refers to a number of frames; if a molecule is detected in fewer frames than the minimum, is it simply removed from tracking? In that case, the minimum length may be better applied during HR image generation, to preserve as much information as possible during pre-processing).

    • Drift correction

  • Denoising using neural networks (Abdel’s method)

  • ROI map (requested by Laetitia for Abdel’s test)
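The tracking parameters above (maximum distance, minimum length in frames) can be illustrated with a deliberately naive greedy nearest-neighbour linker. This is a sketch for discussing units and semantics, not the algorithm that will actually be used:

```python
# Naive frame-to-frame linking with a maximum linking distance and a
# minimum track length (in frames). Greedy matching is illustrative only.
import numpy as np

def link_tracks(frames, max_distance, min_length):
    """frames: list of per-frame lists of (x, y) positions.
    Returns tracks as lists of (frame_index, (x, y))."""
    tracks = [[(0, tuple(p))] for p in frames[0]]
    open_tracks = list(range(len(tracks)))
    for t, pts in enumerate(frames[1:], start=1):
        unused = list(range(len(pts)))
        next_open = []
        for ti in open_tracks:
            if not unused:
                continue
            last = np.array(tracks[ti][-1][1])
            d = [np.linalg.norm(np.array(pts[j]) - last) for j in unused]
            k = int(np.argmin(d))
            if d[k] <= max_distance:           # link only within max_distance
                j = unused.pop(k)
                tracks[ti].append((t, tuple(pts[j])))
                next_open.append(ti)
        open_tracks = next_open
        for j in unused:                       # unmatched points start new tracks
            tracks.append([(t, tuple(pts[j]))])
            open_tracks.append(len(tracks) - 1)
    return [tr for tr in tracks if len(tr) >= min_length]
```

Writing the sketch makes the open questions concrete: `max_distance` must have a declared unit (pixels or µm), and `min_length` is naturally a number of frames.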

Filtering

High-resolution processing

  • Localization

    • Indicate whether localization files are detected, or disable the tab until directories are available.

    • Drift option if pre-processing has been performed.

    • Filtering options.

    • Zoom level (limited to powers of two? This appears to be the case, but is hidden).

    • Selected channel (light intensity or other).

  • Tracking

    • Indicate whether tracking files are detected, or disable the tab until directories are available.

    • Drift option if pre-processing has been performed.

    • Filtering options.

    • Zoom level.

    • Selected channel (trajectory, velocity, or other).

Outputs

During pre-processing and high-resolution image generation, a subdirectory will be created following this structure:

My_file.tif
My_file_PALM_Tracer
        |-Meta_Timestamp.txt
        |-Localization_Timestamp.csv
        |-Tracking_Timestamp.csv
        |-Drift_Timestamp.csv
        |-My_file_Localization_Timestamp.png
        |-My_file_Tracking_Timestamp.png
        |-Pipeline_Timestamp.json
        |-log_Timestamp.log
  • Meta: File containing the headers of the previous files (Width, Height, nb_Planes, Pixel_Size (µm), Frame_Duration (s)), along with additional information such as date, hardware configuration, acquisition parameters (pixel size, exposure, etc.), and the PALM Tracer version.

  • Localization, Tracking, Drift: coordinate tables.

  • Image files: Different formats may be used if more conventional; files are saved automatically at the end of each processing step.

  • Pipeline: File continuously updated according to the executed processes. It contains five sections (General, Processing, Localization, Tracking, Filtering) with the latest options used and the date of the last run. It allows traceability and can be loaded as a batch pipeline.

  • Log: Log file (time, resources, number of molecules, etc.).

Test protocol

Simulated data: The protocol for generating a dataset will be similar to the one described in Adel Kechkar's thesis (Kechkar, 2013):

  • Controlled simulated image

    • Stripe / square / 3D sun pattern images

    • To be defined, but an image containing a set of known issues could also be considered.

  • A range of variants based on particle density and SNR.

    • To be defined, but possible densities include 0.1, 0.25, 0.5, 0.75, and 1 molecule/µm².

    • To be defined, but possible SNR values include 2, 4, 6, 8, and 10.

    • This results in 25 (5×5) combinations of the same simulated image.
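The 5×5 variant grid follows directly from the candidate values (both lists are still to be confirmed):

```python
# Enumerate every (density, SNR) combination of the candidate values.
from itertools import product

densities = [0.1, 0.25, 0.5, 0.75, 1.0]   # molecules/µm²
snrs = [2, 4, 6, 8, 10]
variants = [{"density": d, "snr": s} for d, s in product(densities, snrs)]
# 5 × 5 = 25 variants of the same simulated image
```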

Report: A report will be generated during the execution of a test pipeline and directly integrated into the automatically generated online documentation. The following information will be collected:

  • Hardware configuration

  • Input file

  • Definition of variants (density, SNR, methods)

  • Localization accuracy

  • Simplified confusion matrix: number of molecules, false positives, and false negatives.

  • Total and detailed execution time

  • Maximum memory usage

  • Maximum number of CPU cores used
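The simplified confusion matrix can be computed by matching detections to ground-truth positions within a tolerance. The greedy matching strategy and the tolerance parameter below are illustrative choices, not a prescribed method:

```python
# Sketch of the simplified confusion matrix: greedily match each
# ground-truth molecule to its nearest unclaimed detection.
import numpy as np

def confusion_counts(truth, detected, tolerance=1.0):
    """truth, detected: sequences of (x, y); returns (TP, FP, FN)."""
    remaining = [np.asarray(p, float) for p in detected]
    tp = 0
    for t in truth:
        if not remaining:
            break
        t = np.asarray(t, float)
        d = [np.linalg.norm(r - t) for r in remaining]
        k = int(np.argmin(d))
        if d[k] <= tolerance:       # matched within tolerance: true positive
            remaining.pop(k)
            tp += 1
    fp = len(detected) - tp          # detections with no ground truth
    fn = len(truth) - tp             # ground-truth molecules missed
    return tp, fp, fn
```

Localization accuracy could be reported alongside, as the mean distance over the matched pairs.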

Different processes: When different processes lead to similar results, a comparative diagram of the two methods will be generated. A dedicated tool could be developed to generate plots and reports from multiple input reports.

Documentation

The documentation will be divided into several parts, generated dynamically or statically depending on the case. The different sections will include:

  1. Overview (README or similar)

  2. User guide, to be written from the very start of the project, covering installation and usage. This includes instructions for installing Python, to accommodate users with varying levels of computer expertise.

  3. License

  4. Code documentation (API)

  5. Several pages explaining different complex processes.

  6. Results of continuous integration tests. One part may be static, presenting tests on different machines and configurations, while another part will depend on the CI system, requiring CI specifications to be reported.

Glossary

  • Development environment: In this context, the development environment refers to the programming languages, libraries, and operating systems used.

  • Programming language: A programming language is a means of writing source code before it is processed by a machine.

  • Library: In software development, a library is a collection of already developed functions (code) that can be reused.

  • DLL (Dynamic Link Library): A set of functions (code) stored in machine language and preloaded at program startup. The source code is not necessarily available.

  • Offline computation: A computation that takes a non-negligible amount of time to complete (i.e., not instantaneous).

  • Online computation: A computation performed in pseudo real-time, almost instantaneously and without noticeable waiting time for the user.

  • Versioning: Versioning makes it possible to keep track of all changes made to files, ensuring traceability and allowing easy switching between versions.

  • Continuous integration (CI): Continuous Integration is an automated routine used to verify that the code remains functional. This may involve compiling the program, running it, checking code quality, executing tests, analyzing the code, managing memory, and more. It can be triggered on every update, daily, weekly, or on demand.

  • ROI (Region Of Interest): A selected area of interest used to retrieve a set of pixels around a given point.