vignettes/introduction.Rmd
introduction.Rmd
SPIAT (Spatial Image Analysis of Tissues) is an R package with a suite of data processing, quality control, visualization, data handling and data analysis tools (Yang et al. 2020). SPIAT is directly compatible with Opal multiplex immunohistochemistry images analysed through the HALO and InForm analysis software, but its intuitive implementation allows use with a diversity of platforms.
The Opal multiplex immunohistochemistry staining protocol enables 6-8 tissue markers to be used simultaneously on a single slide. It is suitable for use with formalin fixed paraffin embedded (FFPE) tissue sections, so it is also of relevance for clinical practice. The fluorescence of each marker is measured through imaging, and each cell is assigned an X,Y coordinate of its location. In effect this provides single-cell resolution. The fluorescence intensity of individual markers are then combined to identify the phenotype of the cells in the tissue.
SPIAT includes novel algorithms for the identification of cell clusters, cell margins and cell gradients, the calculation of neighbourhood proportions, and algorithms for the prediction of cell phenotypes in tissue images. SPIAT also includes speedy implementations of the calculation of cell distances and detection of cell communities. An overview of the functions available is shown in the figure below.
First we load the SPIAT library.
library(SPIAT)
We can use format_image_to_sce
to read in and format raw data for SPIAT.format_image_to_sce
creates a SingleCellExperiment object. It requires at least 4 arguments:
format
: “INFORM,” “HALO,” “Visium,” “CODEX” “cellprolifer, or”general"image
: path to the raw InForm or HALO image data filemarkers
: names of markers used in the OPAL staining. These must be in the same order as the marker columns in the input file, and for InForm must match the marker name used in the input file. One of the markers must be DAPI.locations
: locations of the markers, either Nucleus, Cytoplasm or Membrane. These must be in the order of the markers. The locations are used to auto-detect the intensity (and dye) columns. Alternatively, the column names can be specified, see below for more information.The simplest way to read in both InForm and HALO files, is to specify the locations of the markers as shown below. This will automatically detect the appropriate intensity (and dye) columns to use. Here we use a tiny InForm file that’s part of the SPIAT package for raw_inform_data
but you would replace system.file
with /path/to/your/input/file
. The tiny file is just for demonstrating importing a raw data file, later in the Example data section we load a larger preformatted dataset. For information on the InForm and HALO formats see below. For “Visium,” “CODEX” or “cellprolifer see the help (?format_image_to_sce
). Format”general" allows you to input a matrix of intensities (intensity_matrix
), and a vector of phenotypes
, cell phenotypes in the same order in which they appear in intensity_matrix. They should be of the form “CD3,CD8,” as opposed to cell names (e.g. “cytotoxic T cells”), as it does some matching with the marker names. If no phenotypes are available, then a vector of NA can be used as input. For general format, you also need to provide a vector with the X coordinates of the cells and one with the Y (coord_x
and coord_y
). The cells must be in the same order as in the intensity_matrix. If you have Xmin and Xmax columns (and Ymin, Ymax) you can calculate the average and input that.
raw_inform_data <- system.file("extdata", "tiny_inform.txt.gz", package = "SPIAT")
markers <- c("DAPI", "CD3", "PDL-1", "CD4", "CD8", "AMACR")
locations <- c("Nucleus", "Cytoplasm", "Membrane", "Cytoplasm", "Cytoplasm", "Cytoplasm")
formatted_image <- format_image_to_sce(
format="INFORM",
path=raw_inform_data,
markers=markers,
locations=locations)
If for some reason you don’t want to specify the locations
, you can specify the intensity columns with intensity_columns_interest
as shown below. This will give the same result as specifying the locations above.
raw_inform_data <- system.file("extdata", "tiny_inform.txt.gz", package = "SPIAT")
markers <- c("DAPI", "CD3", "PDL-1", "CD4", "CD8", "AMACR")
intensity_columns_interest <- c(
"Nucleus DAPI (DAPI) Mean (Normalized Counts, Total Weighting)",
"Cytoplasm CD3 (Opal 520) Mean (Normalized Counts, Total Weighting)",
"Membrane PDL-1 (Opal 540) Mean (Normalized Counts, Total Weighting)",
"Cytoplasm CD4 (Opal 620) Mean (Normalized Counts, Total Weighting)",
"Cytoplasm CD8 (Opal 650) Mean (Normalized Counts, Total Weighting)",
"Cytoplasm AMACR (Opal 690) Mean (Normalized Counts, Total Weighting)"
)
formatted_image <- format_image_to_sce(
format="INFORM",
path=raw_inform_data,
markers=markers,
intensity_columns_interest=intensity_columns_interest)
format_image_to_sce
uses the Cell X Position and Cell Y Position columns in the InForm raw data.
format_image_to_sce
uses the Phenotype column in the InForm raw data. The phenotype of a cell can be a single marker, for example, “CD3” (T cell), or a combination of markers, such as “CD3,CD4” (helper T cell). “OTHER” means a cell was positive for DAPI but no other marker. The phenotypes must be based on the markers (e.g. CD3,CD4), rather than names of cells (e.g. cytotoxic T cells). The names of the cells can be added later using the define_celltypes
function.
For HALO you can also just specify the locations
to auto-detect the columns as shown above. But if you want to specify the columns instead, you can do so with intensity_columns_interest
, as shown in the example below. Note that then you also must specify the dye_columns_interest
.
format_image_to_sce
requires the HALO dye columns to assign the phenotype. If not using locations
to auto-detect, these can be specified with dye_columns_interest
. These columns have the marker status (1 or 0 for whether the cell is positive or negative for the marker) and these are used to create the phenotype column. For example, if HALO has assigned a cell a marker status of 1 for CD3 and 1 for CD4, SPIAT will give it the Phenotype “CD3,CD4.” Cells that have a marker status of 1 for DAPI and no other marker, are given the phenotype “OTHER.”
raw_halo_data <- "path/to/halo/file"
markers <- c("DAPI", "CD3", "PDL-1", "CD4", "CD8", "AMACR")
intensity_columns_interest <- c("Dye 1 Nucleus Intensity",
"Dye 2 Cytoplasm Intensity",
"Dye 3 Membrane Intensity",
"Dye 4 Cytoplasm Intensity",
"Dye 5 Cytoplasm Intensity",
"Dye 6 Cytoplasm Intensity")
dye_columns_interest <- c("Dye 1 Positive Nucleus",
"Dye 2 Positive Cytoplasm",
"Dye 3 Positive Membrane",
"Dye 4 Positive Cytoplasm",
"Dye 5 Positive Cytoplasm",
"Dye 6 Positive Cytoplasm")
formatted_image <- format_image_to_sce(
format="HALO",
path=raw_halo_data,
markers=markers,
intensity_columns_interest=intensity_columns_interest,
dye_columns_interest=dye_columns_interest
)
format_image_to_sce
takes the average of the HALO X min and X max columns for each cell to create the Cell.X.Position column. It takes the average of the Y min and Y max to create the Cell.Y.Position column.
In the case of large images, or images where there are two independent tissue sections, it is recommended to split images into sections defined by the user. This can be performed with image_splitter
after format_image_to_sce
. It can also plot the different combinations of markers within the image segments.
split_image <- image_splitter(formatted_image, number_of_splits=3, plot = FALSE)
In this vignette we will use an InForm data file that’s already been formatted for SPIAT with format_image_to_sce
, which we can load with data
.
data("formatted_image")
This is SingleCellExperiment format.
class(formatted_image)
## [1] "SingleCellExperiment"
## attr(,"package")
## [1] "SingleCellExperiment"
This example data has 6 markers and 8419 cells.
dim(formatted_image)
## [1] 6 8419
assay
stores the intensity level of every marker (rows) for every cell (columns).
# take a look at first 5 columns
assay(formatted_image)[, 1:5]
## Cell_1 Cell_2 Cell_3 Cell_4 Cell_5
## DAPI 14.557 17.588 21.262 18.951 18.631
## CD3 0.169 0.229 0.206 0.226 0.212
## PDL-1 0.274 0.117 0.103 0.126 0.091
## CD4 1.655 1.188 0.924 1.367 1.266
## CD8 0.096 0.087 0.053 0.037 0.046
## AMACR 0.054 2.074 1.521 2.462 1.968
colData
stores the phenotype, x and y coordinates, and the cell properties.
# take a look at first 5 rows
colData(formatted_image)[1:5, ]
## DataFrame with 5 rows and 8 columns
## Phenotype Cell.X.Position Cell.Y.Position Cell.Area Nucleus.Area
## <character> <numeric> <numeric> <numeric> <numeric>
## Cell_1 OTHER 82 30 477 160
## Cell_2 AMACR 171 22 464 177
## Cell_3 AMACR 184 38 553 212
## Cell_4 AMACR 201 52 462 239
## Cell_5 AMACR 219 63 876 451
## Nucleus.Compactness Nucleus.Axis.Ratio Cell.Axis.Ratio
## <numeric> <numeric> <numeric>
## Cell_1 0.52 2.05 1.48
## Cell_2 0.54 1.84 1.00
## Cell_3 0.51 1.92 1.21
## Cell_4 0.53 1.47 1.09
## Cell_5 0.53 1.19 1.34
We can check what phenotypes are present with print_column
, which will print any specified column in the data.
print_column(formatted_image, column = "Phenotype")
## [1] "OTHER" "AMACR" "CD3,CD4" "CD3,CD8" "CD3" "PDL-1"
The phenotypes in this example data can be interpreted as follows:
AMACR identifies which cells in the image are prostate cancer cells. CD3,CD4, CD3,CD8, CD3 which types of T cells (immune cells) are present, and PDL-1 is used to help determine if the cancer may benefit from immunotherapy treatment.
SPIAT can predict cell phenotypes using the marker intensity levels with predict_phenotypes
. This can be used to check the phenotypes that have been assigned by InForm and HALO. It can also potentially be used to automate the manual phenotypying performed with InForm/HALO. predict_phenotypes
produces a density plot that shows the cutoff for calling a cell positive for a marker. It also prints to the console the number of true positives (TP), true negatives (TN), false positives (FP) and false negatives (FN). It returns a table containing the phenotypes predicted by SPIAT and the actual phenotypes from InForm/HALO. This can help show if we need to go back to Inform or HALO to refine the phenotyping. If you don’t have a tumour_marker
you can use a marker that is present in many cells, that has at least one peak more than the main marker peak, a background peak, required by the algorithm. Of note, this algorithm does not take into account cell shape or size, so if these are required for phenotyping, manual phenotyping with HALO or InForm is recommended.
predicted_image <- predict_phenotypes(formatted_image,
thresholds = NULL,
tumour_marker = "AMACR",
baseline_markers = c("CD3", "CD4", "CD8"),
reference_phenotypes = FALSE)
## [1] "DAPI threshold intensity: 31.0120578435792"
## [1] "CD3 threshold intensity: 0.585067686263291"
## Warning: Removed 6 rows containing non-finite values (stat_density).
## [1] "PDL-1 threshold intensity: 0.41961026013577"
## [1] "CD4 threshold intensity: 2.94518565168331"
## Warning: Removed 6 rows containing non-finite values (stat_density).
## [1] "CD8 threshold intensity: 0.490401084591211"
## Warning: Removed 6 rows containing non-finite values (stat_density).
## [1] "AMACR threshold intensity: 0.560351632207539"
## Warning: Removed 6 rows containing non-finite values (stat_density).
We can use marker_prediction_plot
to plot the predicted cell phenotypes and the ones obtained using HALO or InForm, for comparison.
#marker_prediction_plot(predicted_image, marker="CD3")
The plot shows CD3+ cells in the tissue. On the left are the CD3+ cells defined by InForm and on the right are the CD3+ cells predicted using SPIAT.
Specifying each Phenotype can be tedious if the Phenotype consists of multiple markers. To make it easier, SPIAT define_celltypes
can be used to add a column indicating the cell type. By default the column is called Cell.Type. The difference between Phenotype and Cell.Type is that phenotype represents all the markers present on the cell but Cell.Type is the identity of the cell.
formatted_image <- define_celltypes(formatted_image, phenotypes = c("AMACR", "CD3,CD4", "CD3,CD8", "CD3", "PDL-1"), names = c("Tumour", "T_helper", "T_cyto", "T_other", "PDL1"))
## [1] "AMACR" "CD3,CD4" "CD3,CD8" "CD3" "PDL-1"
## [1] "Tumour" "T_helper" "T_cyto" "T_other" "PDL1"
Incorrect or unreliable phenotyping of cells may also be identified through comparing the marker level for cells labelled positive and negative for a marker. Cells positive for a marker should have high levels of the marker. An unclear separation of marker intensities between positive and negative cells would suggest phenotypes have not been accurately assigned. We can use marker_intensity_boxplot
to produce a boxplot for cells phenotyped as being positive or negative for a marker.
marker_intensity_boxplot(formatted_image, "CD3")
Note that some negative cells will have high marker intensity, and vice versa. This is because HALO and InForm use machine learning to determine positive cells, and not a strict threshold, and also take into account properties such as cell shape, nucleus size etc. .
Uneven marker staining or high background intensity can be identified with plot_cell_marker_levels
. This produces a scatter plot of the intensity of a marker in each cell. Cells that were not phenotyped as being positive for the particular marker are excluded.
plot_cell_marker_levels(formatted_image, "CD3")
For large images, there is also the option of ‘blurring’ the image, where the image is split into multiple small areas, and marker intensities are averaged within each. The image is blurred based on the num_splits
parameter.
plot_marker_level_heatmap(formatted_image, num_splits = 100, "AMACR")
## Identifying incorrect phenotypes
We may see incorrect phenotypes present in the input data. For example, cells might be incorrectly assigned a phenotype such as “AMACR,CD3,” both a tumour cell (AMACR) and an immune cell phenotype (CD3), which is biologically implausible. Incorrect cell phenotypes may be present due to low cell segmentation quality, antibody ‘bleeding’ from one cell to another or inadequate marker thresholding.
We can use marker_permutation
to help identify if incorrect cell phenotypes are present. This permutes the marker labels of cells to create a null distribution, and then calculates the empirical p-value of whether an image is enriched or depleted in a particular combination of markers. A low P value for Depletion.p suggests the phenotype is unreliable. However, it is not absolute and it is recommended to review the results.
# permute marker labels of cells
sig <- marker_permutation(formatted_image, num_iter = 100)
# sort by Observed_cell_number
sig.sorted <- sig[order(-sig$Observed_cell_number), ]
# take a look
head(sig.sorted)
## Observed_cell_number Percentage_of_iterations_where_present
## AMACR 4446 100
## CD3,CD4 512 100
## CD3,CD8 138 99
## CD3 19 100
## PDL-1 4 89
## CD4 0 100
## Average_bootstrap_cell_number Enrichment.p Depletion.p
## AMACR 3777.30 0.01 1.00
## CD3,CD4 19.39 0.01 1.00
## CD3,CD8 4.71 0.01 1.00
## CD3 292.05 1.00 0.01
## PDL-1 1.60 0.01 1.00
## CD4 215.85 1.00 0.01
If you identify incorrect phenotypes or have any you want to exclude you can do that with select_phenotypes
.
data_subset <- select_phenotypes(formatted_image, keep=TRUE,
phenotypes = c("AMACR",
"CD3,CD8",
"PDL-1"))
print_column(data_subset, column = "Phenotype")
## [1] "AMACR" "CD3,CD8" "PDL-1"
In this vignette we will work with all the original phenotypes present in formatted_image
.
We can obtain the number and proportion of each cell type with calculate_cell_proportions
. We can use reference_celltypes
to specify cell types to use as the reference. For example, “Total” will calculate the proportion of each cell type against all cells. We can exclude any cell types that are not of interest e.g. “Undefined” with celltypes_to_exclude
.
p_cells <- calculate_cell_proportions(formatted_image, reference_celltypes=c("Total"), column="Cell.Type")
p_cells
## Cell_type Number_of_celltype Proportion Percentage Proportion_name Reference
## 6 Undefined 3299 Inf Inf Undefined/Custom Total
## 5 Tumour 4446 Inf Inf Tumour/Custom Total
## 4 T_other 19 Inf Inf T_other/Custom Total
## 3 T_helper 513 Inf Inf T_helper/Custom Total
## 2 T_cyto 138 Inf Inf T_cyto/Custom Total
## 1 PDL1 4 Inf Inf PDL1/Custom Total
To visualize the cell type proportions as barplots we can use plot_cell_percentages
.
plot_cell_percentages(p_cells)
In addition to the marker level tissue plots for QC, SPIAT has other methods for visualizing markers and phenotypes in tissues.
We can visualize a selected marker in 3D with marker_surface_plot
. The image is blurred based on the num_splits
parameter.
marker_surface_plot(formatted_image, num_splits=15, marker="CD3")
To visualize multiple markers in 3D in a single plot we can use marker_surface_plot_stack
. This shows normalized intensity level of specified markers and enables the identification of co-occurring and mutually exclusive markers.
marker_surface_plot_stack(formatted_image, num_splits=15, markers=c("AMACR", "CD3"))
The stacked surface plots of the AMACR (tumour cell) and CD3 (T cell) markers in this prostate tissue shows how AMACR and CD3 are mutually exclusive as the peaks and valleys are opposite.
We can see the location of all cell types (or any column in the data) in the tissue with plot_cell_basic
. Each dot in the plot corresponds to a cell and cells are coloured by cell type. Any cell types present in the data but not in the cell types of interest will be put in the category “OTHER” and coloured lightgrey.
my_colors <- c("darkgrey", "skyBlue", "darkcyan", "white", "orange")
plot_cell_basic(formatted_image, c("Tumour", "T_helper", "T_cyto", "T_other", "PDL1"), my_colors, "Cell.Type")
We can calculate the distances between two cell types (cell type A and cell type B) by identifying the closest cell of cell type B to each of the cells of cell type A with calculate_all_distances_between_cell_types
.
distances <- calculate_all_distances_between_cell_types(formatted_image, cell_types_of_interest = c("T_helper", "T_cyto"), column="Cell.Type")
This creates a distribution of minimum distances between phenotypes A and B, which can be visualized as a violin plot with plot_cell_distances_violin
. Visualization of this distribution often reveals whether pairs of cells are evenly spaced across the image, or whether there are clusters of pairs of phenotypes.
plot_cell_distances_violin(distances)
From this plot we can see that T_helper cells seem to be more closely interacting with each other, as there are shorter distances between those cells, than between T_cyto cells or between T_helper and T_cyto cells.
We can also calculate summary statistics for the distances between each combination of cell types, the mean, median and standard deviation, with calculate_summary_distances_between_cell_types
.
summary_distances <- calculate_summary_distances_between_cell_types(formatted_image, column = "Cell.Type")
## [1] "All markers are used in pair-wise distance calculation: "
## [1] "Undefined" "Tumour" "T_helper" "T_cyto" "T_other" "PDL1"
These cell distances can then be visualized as a heatmap with plot_distance_heatmap
.
plot_distance_heatmap(summary_distances)
This plot shows that Tumour cells are interacting most closely with T_helper, followed by T_cyto, while PDL1 cells are more distant.
With SPIAT we can quantify how much two cell types are mixing, and thus potentially interacting, through a variety of ways.
To determine whether two cell types are interacting with another, or if there is repulsion between them, SPIAT uses the concept of gradients. The assumption is that, if marker A intensity is high close to cells of marker B, and this value decreases as the distance from B increases, it suggests the cells are close and interacting. Conversely, if we see that marker A intensity is low when close to cells of marker B but increases with distance, it suggests there is repulsion between the two cell types.
plot_average_intensity
calculates the average intensity of a target marker for a number of user-supplied radii values. It plots the intensity level at each specified radius as a line graph. The radius unit is pixels.
plot_average_intensity(formatted_image, reference_marker="CD8", target_marker="CD4", c(30, 35, 40, 45, 50, 75, 100))
This plot shows that high levels of CD8 were observed in cells near CD4 cells and these levels decreased at larger radii. This suggests CD4 and CD8 cells may be closely interacting in this tissue.
We can use average_marker_intensity_within_radius
to calculate the average intensity of the target_marker within a radius from the cells positive for the reference marker. Note that it pools all cells with the target marker that are within the specific radius of any reference cell. Results represent the average intensities within a radius, but do not correspond to metrics for each cell. This function can also be used to help identify radii values for plot_average_intensity
.
average_marker_intensity_within_radius(formatted_image,
reference_marker ="CD8",
target_marker = "CD4",
radius=30)
## [1] 10.13677
We can calculate the average percentage of cells of one cell type (target) within a radius of another cell type (reference).
average_percentage_of_cells_within_radius(formatted_image, reference_phenotypes = "T_helper", target_phenotypes = "T_cyto", radius=100,
column="Cell.Type")
## [1] 5.613999
This score was originally defined as the number of immune-tumor interactions divided by the number of immune-immune interactions (Keren et al. 2018). SPIAT generalizes this method to allow calculation of any two cell types. mixing_score_summary
returns the mixing score between a reference cell type and a target cell type. This mixing score is defined as the number of target-reference interactions/number of reference-reference interactions within a specified radius. The higher the score the greater the mixing of the two cell types. The normalised score is normalised for the number of target and reference cells in the image.
mixing_score_summary(formatted_image, reference_marker = "T_helper", target_marker = "T_cyto", radius=100, column="Cell.Type")
## Reference Target Number_of_reference_cells Number_of_target_cells
## 2 T_helper T_cyto 513 138
## Reference_target_interaction Reference_reference_interaction Mixing_score
## 2 1646 8082 0.2036625
## Normalised_mixing_score
## 2 0.7570931
K-cross function is similar to mixing score, but calculates the score across all possible distances. It also compares the behaviour of the image with an image that has randomly distributed points of the same number. There are 4 patterns that can be distinguished from K-cross function, as illustrated in the plots below (generated here).
Here, the black line represents our image, the red line represents a randomly distibuted point pattern.
We can calculate the cross K-function using SPIAT.
df_cross <- bivariate_dependence(formatted_image, method = "Kcross", phenotypes = c("T_helper","T_cyto"), column ="Cell.Type")
The plot tells us that the T_cyto and T_helper form one mixed cluster, as it looks like the 3rd plot above.
We can calculate the area under the curve (AUC) of the cross K-function. This tells us the two types of cells are:
AUC_of_cross_function(df_cross)
## [1] 146677210
The AUC score is very positive so this tells us that the two types of cells form mixed clusters, similar to the plot.
If the image contains tumour, it may be a clear tumour structure or unclear (dispersed). The function average_nearest_neighbor_index
can inform us whether there is a clear tumour structure (clustered) or unclear (dispersed), along with a P value for the estimate.
average_nearest_neighbor_index(formatted_image, reference_cell="Tumour", column="Cell.Type")
## $pattern
## [1] "Clustered"
##
## $`p-value`
## [1] 1.132853e-19
When there is a clear tumour region in the image, we define four tumour structures: “Inside,” “Outside,” “Infiltration front” and “Invasive front,” as shown in the figure below. Why are we interested in defining such structure? For some types of cancer, there can be small clusters of tumour cells, like globes. We may be interested in not only whether immune cells infiltrate into the whole tumour region but also whether the immune cells infiltrate into the small globes. The identification of the tumour structure can be achieved by calculating the distance from each cell to the tumour boundary, after the identification of bordering cells.
We can identify borders with identify_bordering_cells
. This uses the alpha hull method (Pateiro-Lopez, Rodriguez-Casal, and. 2019) via the hull2spatial package. Here we use tumour cells (AMACR) as the reference to identify the bordering cells but any cell type can be used.
formatted_border <- identify_bordering_cells(formatted_image, reference_cell = "Tumour", column="Cell.Type")
## [1] "The alpha of Polygon is: 85.9125"
We then calculate the distance of each cell to the borders.
formatted_distance <- calculate_min_distance_to_borders(formatted_border)
We define the area within a 5-cell distance to the boundary as “infiltration front” (inside) and “invasive front” (outside). The areas located more than a 5 cell-distance away from the boundary are defined as “in” (inside) and “out” (outside).
names_of_immune_cells <- c("T_helper", "T_cyto")
formatted_structure <- define_structure(formatted_distance, names_of_immune_cells = names_of_immune_cells,
n_margin_layers = 5)
## [1] "How many layers of cells in the external/internal margin: 5"
## [1] "The width of internal/external margin: 79.6210533312122"
categories <- print_column(formatted_structure, "Structure")
## [1] "External.margin" "Border" "Internal.margin"
## [4] "Stromal.immune" "Outside" "External.margin.immune"
## [7] "Inside" "Internal.margin.immune" "Infiltrated.immune"
We can plot and colour these structure categories.
my_colors <- c("yellow", "darkblue", "darkgreen", "black", "purple", "orange", "darkgreen", "pink", "green")
plot_cell_basic(formatted_structure, categories,
my_colors,
column = "Structure")
We can calculate the proportion of immune cells in the tumour structure.
immune_proportions <- calculate_proportions_of_immune_cells_in_structure(formatted_structure, immune_cells = names_of_immune_cells, column="Cell.Type")
immune_proportions
## Cell.Type2 Relative_to P.Infiltrated.Immune P.Internal.Margin.Immune
## 1 T_helper To_all_cells 0.010325346 0.016913319
## 2 T_cyto To_all_cells 0.003117086 0.007188161
## 3 T_helper To_immune_cells 0.768115942 0.701754386
## 4 T_cyto To_immune_cells 0.231884058 0.298245614
## P.External.Margin.Immune P.Stromal.Immune
## 1 0.12205567 0.10551724
## 2 0.04068522 0.02310345
## 3 0.75000000 0.82037534
## 4 0.25000000 0.17962466
We can calculate summaries of the distances for immune cells in the tumour structure.
immune_distances <- calculate_summary_distances_of_immune_cells_to_borders(formatted_structure, names_of_immune_cells, "Cell.Type")
immune_distances
## Cell.Type Area Min_d Max_d Mean_d Median_d St.dev_d
## 1 All_immune Tumor_area 13.00000 336.7432 98.97223 82.82800 69.53922
## 2 All_immune Stroma 14.31782 558.1684 154.49817 131.04579 102.04069
## 3 T_helper Tumor_area 13.00000 336.7432 103.04387 84.20214 72.87082
## 4 T_helper Stroma 15.13275 558.1684 156.44987 135.19738 102.17890
## 5 T_cyto Tumor_area 17.69181 288.8910 87.49760 73.23933 58.64129
## 6 T_cyto Stroma 14.31782 468.3289 146.69138 121.16518 101.59620
We can determine if there are aggregates (or a lack of aggregates) of a particular phenotype or combination of cell types. SPIAT can be used to identify two types of cell aggregates, clusters and communities.
Clusters are cell aggregates composed of non-tumour cells, often by immune cells. For the detection of clusters, identify_cell_clusters
only considers cell types of interest defined by the user. Euclidean distances between cells are calculated, and pairs of cells with a distance less than a threshold are considered to be ‘interacting,’ with the rest being ‘non-interacting.’ Hierarchical clustering is then used to separate the clusters. We need to specify a radius. The radius corresponds to how lenient we are in defining clusters. A bigger radius would mean we are more likely to merge individual clusters. The radius is chosen based on what looks ‘reasonable.’ Cells not assigned to clusters (“free” cells) are assigned to Cluster_NA in the output table.
clusters <- identify_cell_clusters(formatted_image, cell_types_of_interest = c("T_helper", "T_cyto"), radius = 100, column = "Cell.Type")
This plot shows clusters of CD3,CD4 and CD3,CD8 cells. Each number and colour corresponds to a distinct cluster. Black cells correspond to ‘free,’ un-clustered cells.
Users are recommended to test out different thresholds and then visualize the clustering results. But SPIAT also has average_minimum_distance
, which calculates the average minimum distance between all cells in an image, and can be used as a starting point.
average_minimum_distance(formatted_image)
## [1] 15.92421
Communities correspond to micro-niches or micro-ecosystems of cells that are located close to each other (Jackson et al. 2020). The main distinction between clusters and communities is that the algorithm for the detection of communities used by identify_cell_communities
does not take into account cell phenotype. Therefore, communities often consist of a combination of different cell types, including tumour cells. dbscan is used as the clustering algorithm to detect communities (Hahsler, Piekenbrock, and Doran 2019). Cells without a phenotype are excluded.
communities <- identify_cell_communities(formatted_image, radius=100)
This plot shows communities of cells in the tissue. Each number and colour corresponds to a distinct community.
We can visualize the cell composition of clusters or communities. This allows us to determine whether there are regions where cells of different types are interacting, or whether there is little mixing between celltypes. To do this, we can use composition_of_clusters_and_communities
to obtain the percentages of cells with a specific marker within each cluster and the number of cells in the cluster.
clusters_vis <- composition_of_clusters_and_communities(clusters, type_of_aggregate = "Cluster", column = "Cell.Type")
clusters_vis <- clusters_vis[clusters_vis$Total_number_of_cells >=5,]
Then we can use plot_composition_heatmap
to produce a heatmap showing the marker percentages within each cluster.
plot_composition_heatmap(clusters_vis, type_of_aggregate = "Cluster", column="Cell.Type")
This plot shows that most clusters are composed of a mixture of T_helper and T_cyto cells and some are more dominated by T_helper than others.
We can visualize the composition of the communities in a similiar way.
communities_vis <- composition_of_clusters_and_communities(communities, type_of_aggregate="Community", column = "Cell.Type")
plot_composition_heatmap(communities_vis, type_of_aggregate = "Community", column="Cell.Type")
This plot shows that several communities detected have a combination of tumour (AMACR) and immune cells while some are dominated by tumour or immune cells.
## R version 4.1.0 (2021-05-18)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 20.04.2 LTS
##
## Matrix products: default
## BLAS/LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.8.so
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=C
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] parallel stats4 stats graphics grDevices utils datasets
## [8] methods base
##
## other attached packages:
## [1] SPIAT_0.4 SingleCellExperiment_1.14.1
## [3] SummarizedExperiment_1.22.0 Biobase_2.52.0
## [5] GenomicRanges_1.44.0 GenomeInfoDb_1.28.4
## [7] IRanges_2.26.0 S4Vectors_0.30.1
## [9] BiocGenerics_0.38.0 MatrixGenerics_1.4.3
## [11] matrixStats_0.61.0 BiocStyle_2.20.2
##
## loaded via a namespace (and not attached):
## [1] scattermore_0.7 R.methodsS3_1.8.1 SeuratObject_4.0.2
## [4] ragg_1.1.3 tidyr_1.1.3 ggplot2_3.3.5
## [7] bit64_4.0.5 knitr_1.34 irlba_2.3.3
## [10] DelayedArray_0.18.0 R.utils_2.11.0 data.table_1.14.0
## [13] rpart_4.1-15 RCurl_1.98-1.5 doParallel_1.0.16
## [16] generics_0.1.0 cowplot_1.1.1 RANN_2.6.1
## [19] future_1.22.1 bit_4.0.4 tzdb_0.1.2
## [22] Rphenograph_0.99.1 spatstat.data_2.1-0 lubridate_1.7.10
## [25] httpuv_1.6.3 xfun_0.26 hms_1.1.1
## [28] jquerylib_0.1.4 evaluate_0.14 promises_1.2.0.1
## [31] fansi_0.5.0 readxl_1.3.1 igraph_1.2.6
## [34] htmlwidgets_1.5.4 xROI_0.9.20 apcluster_1.4.8
## [37] spatstat.geom_2.2-2 purrr_0.3.4 ellipsis_0.3.2
## [40] crosstalk_1.1.1 dplyr_1.0.7 ggpubr_0.4.0
## [43] backports_1.2.1 bookdown_0.24 moments_0.14
## [46] deldir_0.2-10 vctrs_0.3.8 Cairo_1.5-12.2
## [49] ROCR_1.0-11 abind_1.4-5 cachem_1.0.6
## [52] rgdal_1.5-27 vroom_1.5.5 sctransform_0.3.2
## [55] mmand_1.6.1 dittoSeq_1.4.3 goftest_1.2-2
## [58] cluster_2.1.2 splancs_2.01-42 lazyeval_0.2.2
## [61] crayon_1.4.1 pkgconfig_2.0.3 labeling_0.4.2
## [64] nlme_3.1-153 rlang_0.4.11 globals_0.14.0
## [67] lifecycle_1.0.1 miniUI_0.1.1.1 colourpicker_1.1.0
## [70] sgeostat_1.0-27 dbscan_1.1-8 cellranger_1.1.0
## [73] rprojroot_2.0.2 polyclip_1.10-0 lmtest_0.9-38
## [76] tiff_0.1-8 Matrix_1.3-4 divo_1.0.1
## [79] raster_3.4-13 carData_3.0-4 zoo_1.8-9
## [82] ggridges_0.5.3 GlobalOptions_0.1.2 pheatmap_1.0.12
## [85] png_0.1-7 viridisLite_0.4.0 rjson_0.2.20
## [88] bitops_1.0-7 R.oo_1.24.0 KernSmooth_2.23-20
## [91] shape_1.4.6 stringr_1.4.0 parallelly_1.28.1
## [94] jpeg_0.1-9 rstatix_0.7.0 ggsignif_0.6.3
## [97] tripack_1.3-9.1 scales_1.1.1 memoise_2.0.0
## [100] magrittr_2.0.1 plyr_1.8.6 ica_1.0-2
## [103] zlibbioc_1.38.0 compiler_4.1.0 RColorBrewer_1.1-2
## [106] clue_0.3-59 fitdistrplus_1.1-5 cli_3.0.1
## [109] XVector_0.32.0 listenv_0.8.0 patchwork_1.1.1
## [112] pbapply_1.5-0 MASS_7.3-54 mgcv_1.8-37
## [115] tidyselect_1.1.1 stringi_1.7.4 forcats_0.5.1
## [118] textshaping_0.3.5 hull2spatial_0.1.0 highr_0.9
## [121] yaml_2.2.1 ggrepel_0.9.1 grid_4.1.0
## [124] sass_0.4.0 spatstat.linnet_2.3-0 tools_4.1.0
## [127] future.apply_1.8.1 rio_0.5.27 circlize_0.4.13
## [130] foreach_1.5.1 foreign_0.8-81 gridExtra_2.3
## [133] alphahull_2.2 farver_2.1.0 Rtsne_0.15
## [136] digest_0.6.28 BiocManager_1.30.16 shiny_1.7.0
## [139] pracma_2.3.3 Rcpp_1.0.7 car_3.0-11
## [142] broom_0.7.9 later_1.3.0 RcppAnnoy_0.0.19
## [145] httr_1.4.2 ComplexHeatmap_2.8.0 colorspace_2.0-2
## [148] fs_1.5.0 tensor_1.5 reticulate_1.22
## [151] splines_4.1.0 uwot_0.1.10 spatstat.utils_2.2-0
## [154] pkgdown_1.6.1 sp_1.4-5 plotly_4.9.4.1
## [157] systemfonts_1.0.2 xtable_1.8-4 jsonlite_1.7.2
## [160] spatstat_2.2-0 R6_2.5.1 pillar_1.6.3
## [163] htmltools_0.5.2 mime_0.11 glue_1.4.2
## [166] fastmap_1.1.0 codetools_0.2-18 utf8_1.2.2
## [169] lattice_0.20-45 bslib_0.3.0 spatstat.sparse_2.0-0
## [172] tibble_3.1.4 curl_4.3.2 leiden_0.3.9
## [175] gtools_3.9.2 zip_2.2.0 shinyjs_2.0.0
## [178] openxlsx_4.2.4 survival_3.2-13 rmarkdown_2.11
## [181] desc_1.3.0 munsell_0.5.0 GetoptLong_1.0.5
## [184] GenomeInfoDbData_1.2.6 iterators_1.0.13 haven_2.4.3
## [187] reshape2_1.4.4 gtable_0.3.0 spatstat.core_2.3-0
## [190] Seurat_4.0.4