Migrating from anndata to anndataR
Why migrate?
The anndata R package (this package) is superseded by anndataR, available on
Bioconductor.
| Feature | anndata (CRAN) | anndataR (Bioconductor) |
|---|---|---|
| Python dependency | Required | Optional (only needed for ReticulateAnnData) |
| h5ad I/O | Via Python | Native R via rhdf5 (preferred), or via Python reticulate |
| In-memory backend | Python-backed | Native R (InMemoryAnnData) |
| HDF5-backed backend | Python-backed | Native R (HDF5AnnData) |
| Reticulate-backed backend | Always | Optional (ReticulateAnnData) |
| Seurat interop | Not supported | as_Seurat() / as_AnnData() |
| SingleCellExperiment interop | Not supported | as_SingleCellExperiment() / as_AnnData() |
| Distribution | CRAN | Bioconductor |
The preferred backends (InMemoryAnnData,
HDF5AnnData) require no Python. For users who already have
Python anndata installed, the
ReticulateAnnData backend offers a low-friction starting
point; see below.
Installation
if (!requireNamespace("BiocManager", quietly = TRUE)) {
  install.packages("BiocManager")
}
BiocManager::install("anndataR")
# Required for native h5ad reading/writing:
BiocManager::install("rhdf5")
# Optional: SingleCellExperiment conversion
BiocManager::install("SingleCellExperiment")
# Optional: Seurat conversion
install.packages("SeuratObject")
Side-by-side migration guide
Creating an AnnData object
The AnnData() constructor is identical in both packages.
The only difference is that anndataR returns an
InMemoryAnnData (a pure-R object) whereas
anndata returned an AnnDataR6 wrapping a
Python object.
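As a minimal sketch of the constructor call described above, the snippet below builds a small object from toy data. The matrix values and the `cell*` / `gene*` names are made up for illustration, and the call is guarded so that nothing happens if anndataR is not installed.

```r
# Toy inputs built with base R only; all values are invented for illustration.
counts <- matrix(1:15, nrow = 3L, ncol = 5L)
obs <- data.frame(group = c("a", "b", "a"), row.names = paste0("cell", 1:3))
var <- data.frame(type = rep("gene", 5L), row.names = paste0("gene", 1:5))

# Guarded so the snippet is a no-op when anndataR is not installed.
if (requireNamespace("anndataR", quietly = TRUE)) {
  # Observation and variable names are taken from the row names of obs and var.
  ad <- anndataR::AnnData(X = counts, obs = obs, var = var)
  print(class(ad)) # should include "InMemoryAnnData"
  print(dim(ad))   # observations x variables
}
```

Existing scripts that pass the same `X`, `obs` and `var` arguments to the old package should need few changes beyond loading anndataR instead, though this is worth checking against your own code.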
Reading and writing h5ad files
The function signatures are identical. anndataR reads
natively without Python.
# Reading
ad <- read_h5ad("path/to/file.h5ad") # InMemoryAnnData (default)
ad <- read_h5ad("path/to/file.h5ad", as = "HDF5AnnData") # disk-backed, low memory
sce <- read_h5ad("path/to/file.h5ad", as = "SingleCellExperiment")
obj <- read_h5ad("path/to/file.h5ad", as = "Seurat")
# Writing (works identically to anndata)
write_h5ad(ad, "path/to/output.h5ad")
Slot access and subsetting
Both packages use the same $ notation and bracket
subsetting syntax.
ad$X; ad$obs; ad$var; ad$obsm; ad$varm; ad$obsp; ad$varp; ad$layers; ad$uns
# Subsetting returns an AnnDataView (native R) rather than a Python-backed view
subset <- ad[1:5, ]
subset <- ad[, c("var1", "var2")]
subset <- ad[ad$obs$group == "a", ]
concrete <- subset$as_InMemoryAnnData() # materialise to a concrete object
Interoperability (new in anndataR)
# AnnData <-> SingleCellExperiment
sce <- ad$as_SingleCellExperiment()
ad <- as_AnnData(sce)
# AnnData <-> Seurat
obj <- ad$as_Seurat()
ad <- as_AnnData(obj)
Using ReticulateAnnData as a stepping stone
anndataR’s ReticulateAnnData backend wraps
a Python anndata.AnnData object via reticulate,
implementing the same AbstractAnnData interface as the
native backends. This is the closest equivalent to the old
anndata R package: you can call Python tools
(e.g. scanpy) on the object while also using all
anndataR slot accessors from R.
library(reticulate)
library(anndataR)
# Make scanpy available in the same Python environment that reticulate uses
py_require("scanpy")
sc <- import("scanpy")
# Read data with scanpy
# Note: anndataR will automatically wrap the resulting Python object in a ReticulateAnnData
url <- "https://cf.10xgenomics.com/samples/cell-exp/6.0.0/SC3_v3_NextGem_DI_CellPlex_CSP_DTC_Sorted_30K_Squamous_Cell_Carcinoma/SC3_v3_NextGem_DI_CellPlex_CSP_DTC_Sorted_30K_Squamous_Cell_Carcinoma_count_sample_feature_bc_matrix.h5"
ad <- sc$read_10x_h5("dataset.h5", backup_url = url)
# Use anndataR slot accessors from R
dim(ad)
head(ad$obs)
rowMeans(ad$X[1:10, ])
# Pass back to scanpy for preprocessing
# Note: anndataR automatically unwraps the Python object when calling scanpy functions.
sc$pp$filter_cells(ad, min_genes = 200L)
sc$pp$filter_genes(ad, min_cells = 3L)
sc$pp$normalize_per_cell(ad)
sc$pp$log1p(ad)
# Write from R when done
write_h5ad(ad, "path/to/output.h5ad")
# Migrate to a different anndataR backend
ad_mem <- ad$as_InMemoryAnnData()
# ad_hdf5 <- ad$as_HDF5AnnData(path = "path/to/disk_backed.h5ad")
# sce <- ad$as_SingleCellExperiment()
# seu <- ad$as_Seurat()
Getting help with anndataR
- Documentation: https://anndataR.scverse.org
- Vignettes: browseVignettes("anndataR")
- Issue tracker: https://github.com/scverse/anndataR/issues