File Formats

These notes cover the imaging data you’ll load into Caiman for analysis, typically 2D (occasionally 3D) data over time. We support several formats, some with caveats. If you need another format, or find a caveat particularly burdensome, reach out: if it’s easy we may be able to adjust the software, or otherwise suggest how to convert your files for use with Caiman.

If you are comfortable programming, you will find the code to load the movies in caiman/base/movies.py, spread over a few functions:

  • load()

  • load_iter()

  • get_file_size()

If you want to implement your own handlers with a fork of our codebase, or submit a PR to us, start there. Caiman converts files from these formats into its own internal formats during motion correction.

Notes for specific formats follow:

Matlab (*.mat)

Caiman may be able to handle Matlab’s .mat files, although this is not a preferred format. We rely on the scipy.io.matlab functions to read files of this type, except for certain versions of the format that are actually HDF5 files.
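To check which kind of .mat file you have before loading it, a minimal sketch follows. It assumes that HDF5-based files (Matlab's v7.3 format) begin with the text "MATLAB 7.3 MAT-file" while older files begin with "MATLAB 5.0 MAT-file". The helper name mat_file_flavor is hypothetical and is not part of Caiman.

```python
import os
import tempfile


def mat_file_flavor(path):
    """Classify a .mat file by the text at the start of its header.

    Hypothetical helper: the header prefixes are an assumption about how
    Matlab labels its files, not a check that Caiman performs this way.
    """
    with open(path, "rb") as fh:
        header = fh.read(128)
    if header.startswith(b"MATLAB 7.3 MAT-file"):
        return "hdf5-based"
    if header.startswith(b"MATLAB 5.0 MAT-file"):
        return "classic"
    return "unknown"


# Demonstration on two synthetic headers (not real recordings).
with tempfile.TemporaryDirectory() as tmp:
    flavors = {}
    for name, prefix in [("new.mat", b"MATLAB 7.3 MAT-file"),
                         ("old.mat", b"MATLAB 5.0 MAT-file")]:
        path = os.path.join(tmp, name)
        with open(path, "wb") as fh:
            fh.write(prefix.ljust(128, b" "))
        flavors[name] = mat_file_flavor(path)
```

A file reported as "unknown" may be damaged or may not be a .mat file at all, so treat the result as a hint rather than a guarantee.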

TIFF (*.tiff, *.tif, *.btf)

Caiman can handle many TIFF files, but TIFF is a very flexible file format and we do not handle every variant. Because calcium imaging data has a time element, there are many ways that files might be saved in this format (series, pages, and others), and we don’t anticipate all of them.

AVI/MKV (*.avi, *.mkv)

Caiman can handle many AVI or MKV files, both of which are common video formats often generated by microscopes. AVI is a flexible container format that can use a number of different codecs for video encoding. We depend on libraries available in the environment or on the system to decode them, so Caiman’s capabilities here differ depending on the operating system it runs on. We rely on OpenCV to read these formats, and try to fall back automatically to PIMS (using its PyAV backend) if OpenCV fails. You can force use of this fallback by setting the CAIMAN_LOAD_AVI_FORCE_FALLBACK environment variable.
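One way to set that variable from inside a Python session is sketched below. The value "1" is an assumption of this sketch: the text above only says the variable must be set, not what it should contain. Set it before asking Caiman to load the file.

```python
import os

# Assumption: the variable only needs to be set; "1" is an arbitrary value.
# Set it before Caiman loads the file so the loader can see it.
os.environ["CAIMAN_LOAD_AVI_FORCE_FALLBACK"] = "1"

# Confirm the variable is now visible to this process.
fallback_forced = "CAIMAN_LOAD_AVI_FORCE_FALLBACK" in os.environ
```

Setting the variable in your shell before launching Python has the same effect and also covers any child processes.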

Numpy arrays (*.npy, *.npz)

Caiman can load files in numpy’s native array file format. Such files will likely have been produced by other software (that you found or wrote) that preprocessed your original data and saved it. That software was probably written in Python, because little software outside Python writes this Python-specific format. The format is fast and easy to work with.
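As an illustration of how such a file might be produced, the sketch below writes a synthetic movie to a .npy file and reads it back with numpy alone. The (frames, height, width) axis order and the file name are assumptions of this example. Check the loader in movies.py for the layout Caiman expects.

```python
import os
import tempfile

import numpy as np

# Synthetic movie: 10 frames of 64 x 48 pixels, with time as the first axis.
# The (frames, height, width) ordering is an assumption of this sketch.
movie = np.random.default_rng(0).random((10, 64, 48)).astype(np.float32)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example_movie.npy")
    np.save(path, movie)      # write the array in numpy's native format
    restored = np.load(path)  # read it back fully into memory
```

The round trip preserves both the shape and the dtype, which is part of what makes the format convenient for preprocessed data.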

HDF5/N5/zarr (*.h5, *.hdf5, *.n5, *.zarr)

Caiman can easily work with files using these container formats for grid data. In many ways they’re ideal: the format is regular, parsing them is easy, a variety of programming languages can work with them, and reads/writes are very fast. HDF5 stores are single files; N5 and zarr stores are directories with a hierarchy beneath, which can make moving them between hosts annoying (but their performance is even better). Because all these formats can hold multiple datasets inside a single store, if more than one dataset is present you’ll need to tell Caiman which dataset you want to work with when you first start processing the data (using the var_name_hdf5 flag to cm.load() and related functions). If only one dataset is present, Caiman defaults to it and doesn’t need to be told.
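The dataset-selection rule described above can be sketched in plain Python. This is an illustration of the rule as stated in the text, not Caiman's actual code. The function name choose_dataset and the dataset names are hypothetical. In practice the list of names could come from a library such as h5py (for example, the keys of an open file).

```python
def choose_dataset(available, requested=None):
    """Pick a dataset name using the rule described in the text.

    Hypothetical helper, not part of Caiman: `available` is the list of
    dataset names in the store, `requested` plays the role of the
    var_name_hdf5 argument.
    """
    names = list(available)
    if requested is not None:
        if requested not in names:
            raise KeyError(f"dataset {requested!r} not found in {names}")
        return requested
    if len(names) == 1:
        # Only one dataset present: no need to be told which one to use.
        return names[0]
    raise ValueError(f"several datasets present {names}; name one explicitly")


# A single-dataset store needs no hint; a multi-dataset store does.
single = choose_dataset(["mov"])
explicit = choose_dataset(["raw", "corrected"], requested="corrected")
try:
    choose_dataset(["raw", "corrected"])
    ambiguous_rejected = False
except ValueError:
    ambiguous_rejected = True
```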

Memmapped Numpy Array (*.mmap)

Caiman uses this data format internally, and you can use it with Caiman too. You are unlikely to come across it elsewhere, and we don’t recommend generating it outside of Caiman (we may also eventually change or remove this format).

Scanbox Format (*.sbx)

This is a common format, used by the Scanbox application (written in Matlab). It is a variant of the .mat file (above) with a particular internal structure and interpretation of values. If you need to use it, Caiman may be able to load it, but we don’t recommend you use this format otherwise. The scipy.io.loadmat() function (with a lot of supporting code) is used to support this.

SIMA (*.sima)

Support for this format was removed in an earlier version of Caiman.

Zipfile full of images (*.zip)

The default loading functions don’t support this format. To use it, read the documentation for from_zip_file_to_movie() in movies.py.
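For quick triage of a folder of recordings, the extensions from the section headings above can be collected into a lookup table. This is a convenience sketch only: the names FORMAT_BY_EXTENSION and describe_format are hypothetical, the labels are taken from the headings, and this is not how Caiman itself dispatches on file type.

```python
from pathlib import Path

# Extensions taken from the section headings above; the labels are
# illustrative and are not identifiers used by Caiman.
FORMAT_BY_EXTENSION = {
    ".mat": "Matlab",
    ".tif": "TIFF", ".tiff": "TIFF", ".btf": "TIFF",
    ".avi": "AVI/MKV", ".mkv": "AVI/MKV",
    ".npy": "Numpy array", ".npz": "Numpy array",
    ".h5": "HDF5/N5/zarr", ".hdf5": "HDF5/N5/zarr",
    ".n5": "HDF5/N5/zarr", ".zarr": "HDF5/N5/zarr",
    ".mmap": "Memmapped Numpy array",
    ".sbx": "Scanbox",
    ".zip": "Zipfile of images (not handled by the default functions)",
}


def describe_format(path):
    """Return the format label for a path, or None if it is not listed."""
    return FORMAT_BY_EXTENSION.get(Path(path).suffix.lower())


tiff_label = describe_format("session1.TIF")   # extension match ignores case
sima_label = describe_format("old_data.sima")  # SIMA support was removed
```

A None result means the extension is not among those listed here. As the notes above explain, a listed extension still does not guarantee that every variant of that format will load.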

File Formats: Addenda

Conversion tools

  • ImageJ - A powerful tool (with a steep learning curve) that can visualise and convert a number of formats

  • TiffIt - https://github.com/EricThomson/tiffit - Can convert incompatible TIFF files into compatible TIFF files

  • ImageMagick - A suite of command-line tools for image manipulation

Adding new types

Qualifiers for other formats we might consider adding on request:

  • Grid data formats or movie formats are more likely than formats designed to store single images

  • Support for storing large enough datasets to be usable by Caiman

  • Lossless data compression is preferable over lossy data compression

  • It helps if support for the format either is already available with a library we depend on, or is addable using a library that’s available on conda-forge with few dependencies

  • Easy licensing

  • No oddities about bit-depth

  • If you’re really locked into the format (because your microscope natively produces it)

We are more likely to add support for reading a variety of formats than for writing them.