Olink® Analyze is an R package that provides a versatile toolbox to enable fast and easy handling of Olink® NPX data for your proteomics research. Olink® Analyze provides functions for using Olink data, including functions for importing Olink® NPX datasets, as well as quality control (QC) plot functions and functions for various statistical tests. This package is meant to provide a convenient pipeline for your Olink NPX data analysis.
Note: Starting with OlinkAnalyze v5.0, detailed analysis workflow vignettes have been moved to the new OlinkAnalyzeVignettes package, which will be published on CRAN soon. This vignette provides an overview of the main functions in OlinkAnalyze and introduces the new v5.0 preprocessing functions check_npx and clean_npx.
Preprocessing
Statistical analysis
Visualization
Sample datasets
The package contains two test data files named npx_data1 and npx_data2. These are synthetic datasets that resemble Olink® data accompanied by clinical variables. Olink® data that is delivered in long format, or imported with the function read_NPX (which converts the data into long format), contains the following columns:
Note: There are 5 additional variables in the sample datasets npx_data1 and npx_data2 that include clinical or other information, namely: Subject <chr>, Treatment <chr>, Site <chr>, Time <chr>, Project <chr>.
The columns found in an Olink data set may vary based on the version and product.
The read_NPX function imports an NPX file into a tidy format to work with in R. This function supports Olink® NPX files generated by Olink® data software in CSV, Excel, and Parquet formats. No prior alterations to the NPX output file should be made for this function to work as expected.
A tibble in long format containing:
In order to import multiple NPX data files at once, the read_NPX function can be used in combination with the list.files, lapply and dplyr::bind_rows functions, as seen below. The pattern argument of the list.files function specifies the NPX file format (.csv, .xlsx, .parquet, or any combination of these). This method requires that all NPX files are stored in the same folder and have identical column names. No prior alterations to the NPX output file should be made for this method to work as expected.
# Read in multiple NPX files in .csv format
data <- list.files(
path = "path/to/dir/with/NPX/files",
pattern = "csv$",
full.names = TRUE
) |>
lapply(FUN = function(x) {
df_tmp <- OlinkAnalyze::read_NPX(x) |>
# Optionally add additional columns to add file identifiers
dplyr::mutate(File = x)
return(df_tmp)
}) |>
# optional to return a single data frame of all files instead of a list of dfs
dplyr::bind_rows()
# Read in multiple NPX files in .parquet format
data <- list.files(
path = "path/to/dir/with/NPX/files",
pattern = "parquet$",
full.names = TRUE
) |>
lapply(
OlinkAnalyze::read_NPX
) |>
dplyr::bind_rows()
# Read in multiple NPX files in either format
data <- list.files(
path = "path/to/dir/with/NPX/files",
pattern = "parquet$|csv$",
full.names = TRUE
) |>
lapply(
OlinkAnalyze::read_NPX
) |>
dplyr::bind_rows()
The check_npx function performs various quality and
format checks on NPX data imported with read_npx. It is
recommended to run this function after reading in NPX data and before
downstream analysis. The result can be passed as the
check_log argument to clean_npx and all
downstream OlinkAnalyze functions, allowing each function to skip its
own internal check and improve performance.
A named list with the following elements:
The clean_npx function cleans an NPX data frame by
applying a series of filtering and conversion steps. It removes invalid
or problematic assays and samples identified by check_npx,
and optionally converts column data types. Passing the output of
check_npx via the check_log argument avoids
re-running the internal checks and improves performance.
The check_log argument takes the output of check_npx. If NULL, check_npx is run internally.
A tibble (or ArrowObject) in long format containing the cleaned NPX data, with invalid assays, control samples, QC-failing samples, and problematic entries removed according to the chosen arguments.
Note: We recommend running check_npx
once again after cleaning the data to confirm that all issues have been
resolved and that the data is ready for downstream analysis.
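The check, clean, re-check sequence described above can be sketched as follows. This is a minimal sketch, assuming OlinkAnalyze v5.0 or later is installed and that the bundled npx_data1 dataset is a suitable input. The wrapper name preprocess_npx is hypothetical, and the data frame is passed to clean_npx positionally because the name of its first argument is an assumption.

```r
# Hypothetical wrapper around the check -> clean -> re-check sequence.
preprocess_npx <- function(df) {
  # 1. Run the format and quality checks once on the imported data.
  check_log_raw <- OlinkAnalyze::check_npx(df = df)

  # 2. Clean the data, reusing the check results instead of re-running them.
  #    The data frame is passed positionally (argument name assumed unknown).
  df_clean <- OlinkAnalyze::clean_npx(df, check_log = check_log_raw)

  # 3. Re-check the cleaned data so that downstream functions can reuse it.
  check_log_clean <- OlinkAnalyze::check_npx(df = df_clean)

  list(npx_clean = df_clean, check_npx_clean = check_log_clean)
}

# Only run when a sufficiently recent OlinkAnalyze is available.
if (requireNamespace("OlinkAnalyze", quietly = TRUE) &&
    utils::packageVersion("OlinkAnalyze") >= "5.0.0") {
  prep <- preprocess_npx(OlinkAnalyze::npx_data1)
  npx_clean <- prep$npx_clean
  check_npx_clean <- prep$check_npx_clean
}
```

The resulting npx_clean and check_npx_clean objects correspond to the names used in the function examples throughout this vignette.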
The olink_plate_randomizer function randomly assigns
samples to plate wells, with the option to keep all samples from the
same individual on the same plate. Olink® does not recommend forcing
balance based on other clinical variables.
For more information on plate randomization, consult the Plate Randomization Vignette in OlinkAnalyzeVignettes.
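To illustrate the idea of keeping an individual's samples together, here is a minimal base R sketch. It is not the algorithm used by olink_plate_randomizer; the toy manifest, the plate_size value, and the column names Plate and Well are all assumptions made for illustration.

```r
set.seed(42)

# Toy manifest: 30 subjects with 1 to 3 samples each (hypothetical data).
n_per_subject <- sample(1:3, size = 30, replace = TRUE)
manifest <- data.frame(
  SubjectID = rep(sprintf("S%02d", 1:30), times = n_per_subject)
)
manifest$SampleID <- sprintf(
  "%s_T%d", manifest$SubjectID, sequence(n_per_subject)
)

plate_size <- 40L # hypothetical number of sample wells per plate

# Shuffle the subjects, then fill plates subject by subject so that
# all samples from one subject always land on the same plate.
subject_order <- sample(unique(manifest$SubjectID))
subject_sizes <- vapply(
  subject_order,
  function(s) sum(manifest$SubjectID == s),
  integer(1)
)

plate_of_subject <- integer(length(subject_order))
names(plate_of_subject) <- subject_order
current_plate <- 1L
used_wells <- 0L
for (i in seq_along(subject_order)) {
  n_i <- subject_sizes[[i]]
  if (used_wells + n_i > plate_size) {
    # Subject does not fit on the current plate: start a new one.
    current_plate <- current_plate + 1L
    used_wells <- 0L
  }
  plate_of_subject[i] <- current_plate
  used_wells <- used_wells + n_i
}
manifest$Plate <- unname(plate_of_subject[manifest$SubjectID])

# Randomize the well position of each sample within its plate.
manifest$Well <- stats::ave(
  seq_len(nrow(manifest)),
  manifest$Plate,
  FUN = function(idx) sample(length(idx))
)

table(manifest$Plate)
```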
The bridge selection function selects a number of bridge samples based on the input data. Bridge samples are used to normalize two data frames or projects that have been run at different time points, where a batch effect is therefore expected. The function selects samples that have good detectability (if applicable), pass quality control, and cover a wide range of data points.
For more information on bridge sample selection, consult the Introduction to bridging Olink® NPX datasets tutorial in OlinkAnalyzeVignettes.
The olink_normalization function normalizes NPX values between two different datasets, or normalizes one Olink® dataset to a set of reference medians.
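As an illustration of what normalizing to reference medians means, the base R sketch below shifts each assay so that its median matches a reference value. It is a simplified sketch on toy data, not the implementation in OlinkAnalyze; the column names Reference_NPX, Assay_median, Adj_factor, and NPX_normalized are assumptions.

```r
set.seed(7)

# Toy long-format data: 8 samples measured on 2 assays (hypothetical values).
npx_long <- data.frame(
  SampleID = rep(sprintf("Sample_%02d", 1:8), times = 2),
  OlinkID = rep(c("OID_A", "OID_B"), each = 8),
  NPX = c(rnorm(8, mean = 5), rnorm(8, mean = 2))
)

# Hypothetical reference medians, one per assay.
reference_medians <- data.frame(
  OlinkID = c("OID_A", "OID_B"),
  Reference_NPX = c(4.5, 2.5)
)

# Per-assay adjustment factor: reference median minus observed median.
assay_medians <- stats::aggregate(
  NPX ~ OlinkID, data = npx_long, FUN = stats::median
)
names(assay_medians)[2] <- "Assay_median"
adj <- merge(assay_medians, reference_medians, by = "OlinkID")
adj$Adj_factor <- adj$Reference_NPX - adj$Assay_median

# Apply the same shift to every sample of the corresponding assay.
npx_norm <- merge(npx_long, adj[c("OlinkID", "Adj_factor")], by = "OlinkID")
npx_norm$NPX_normalized <- npx_norm$NPX + npx_norm$Adj_factor

# After the shift, each assay median equals its reference median.
tapply(npx_norm$NPX_normalized, npx_norm$OlinkID, stats::median)
```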
The function handles four different types of normalization:
The olink_lod function adds LOD information to an Explore HT or Explore 3072 NPX data frame. LOD can be calculated from an Explore dataset’s negative controls, taken from predetermined fixed LOD values (which can be downloaded from the Document Download Center at olink.com), or obtained using both methods. By default, LOD is calculated from the negative controls. If an NPX file is intensity normalized, both intensity normalized and PC normalized LODs are provided.
For more information on calculating LOD, consult the Calculating LOD from Olink® Explore data tutorial in OlinkAnalyzeVignettes.
The olink_ttest function performs a Welch 2-sample
t-test or paired t-test at confidence level 0.95 for every protein (by
OlinkID) for a given grouping variable. It corrects for multiple testing
using the Benjamini-Hochberg method (“fdr”). Each adjusted p-value is
then compared against a significance threshold of 0.05. The resulting
t-test table is arranged by ascending p-values.
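The per-protein logic described above can be sketched in base R as follows. This is an illustration on toy data, not the olink_ttest implementation; the toy assay IDs and the "Non-significant" label are assumptions.

```r
set.seed(1)

# Toy data: 3 assays, 10 samples per treatment group (hypothetical values).
toy <- data.frame(
  OlinkID = rep(c("OID_A", "OID_B", "OID_C"), each = 20),
  Treatment = rep(rep(c("Treated", "Untreated"), each = 10), times = 3),
  NPX = rnorm(60)
)
# Add a real group difference to one assay.
is_shifted <- toy$OlinkID == "OID_A" & toy$Treatment == "Treated"
toy$NPX[is_shifted] <- toy$NPX[is_shifted] + 3

# One Welch two-sample t-test per assay.
ttest_table <- do.call(rbind, lapply(split(toy, toy$OlinkID), function(d) {
  tt <- stats::t.test(
    NPX ~ Treatment, data = d, var.equal = FALSE, conf.level = 0.95
  )
  data.frame(
    OlinkID = d$OlinkID[1],
    estimate = unname(tt$estimate[1] - tt$estimate[2]),
    p.value = tt$p.value
  )
}))

# Benjamini-Hochberg adjustment across assays, then thresholding at 0.05.
ttest_table$Adjusted_pval <- stats::p.adjust(ttest_table$p.value, method = "fdr")
ttest_table$Threshold <- ifelse(
  ttest_table$Adjusted_pval < 0.05, "Significant", "Non-significant"
)

# Arrange by ascending p-value.
ttest_table <- ttest_table[order(ttest_table$p.value), ]
ttest_table
```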
The check_log argument takes the output of check_npx. If NULL, check_npx is run internally.
A tibble with the following columns:
The olink_wilcox function performs a 2-sample
Mann-Whitney U test or, for paired data, a Wilcoxon signed-rank test at
confidence level 0.95 for every protein (by OlinkID) for a given grouping
variable. It corrects for multiple testing using the Benjamini-Hochberg
method (“fdr”). Each adjusted p-value is then compared against a
significance threshold of 0.05. The resulting table is arranged by
ascending p-values.
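For a single assay, the difference between the unpaired and paired variants can be sketched with stats::wilcox.test. The before and after vectors are hypothetical toy measurements for the same 12 subjects, and this is only an illustration of the underlying tests, not of olink_wilcox itself.

```r
set.seed(2)

# Hypothetical NPX values for one assay, measured twice in 12 subjects.
before <- rnorm(12, mean = 4)
after <- before + rnorm(12, mean = 0.8, sd = 0.5)

# Unpaired: Mann-Whitney U test (Wilcoxon rank sum test).
unpaired <- stats::wilcox.test(
  x = after, y = before, paired = FALSE, conf.level = 0.95
)

# Paired: Wilcoxon signed-rank test on the within-subject differences.
paired <- stats::wilcox.test(
  x = after, y = before, paired = TRUE, conf.level = 0.95
)

c(unpaired = unpaired$p.value, paired = paired$p.value)
```

When the same subjects are measured in both groups, the paired variant uses the within-subject differences and is usually the more appropriate choice.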
The check_log argument takes the output of check_npx. If NULL, check_npx is run internally.
A tibble with the following columns:
The olink_anova function performs an ANOVA F-test for
each assay (by OlinkID) using Type III sum of squares. The function
handles both factor and numerical variables, and/or confounding
factors.
Samples with missing variable information or factor levels are excluded from the analysis. Character columns in the input data frame are converted to factors.
Control samples and control assays should be removed before using this function.
Crossed/interaction analysis, i.e. A*B formula notation, is inferred from the variable argument in the following cases:
For covariates, crossed analyses need to be specified explicitly, i.e. two main effects will not be expanded with a c("A", "B") notation. Main effects present in the variable take precedence.
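As a reminder of what the A*B formula notation means in R, the sketch below expands crossed and main-effects-only formulas with stats::terms. It only illustrates R formula semantics; how olink_anova builds its formula internally is not shown here, and the variable names are taken from the sample datasets for illustration.

```r
# Crossed notation expands to both main effects plus their interaction.
crossed <- stats::terms(NPX ~ Site * Time)
attr(crossed, "term.labels")
# "Site" "Time" "Site:Time"

# Main effects only: no interaction term is added.
main_only <- stats::terms(NPX ~ Site + Time)
attr(main_only, "term.labels")
# "Site" "Time"

# A covariate added as a main effect next to a crossed pair of variables.
with_covariate <- stats::terms(NPX ~ Site * Time + Treatment)
attr(with_covariate, "term.labels")
```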
Adjusted p-values are calculated using the Benjamini & Hochberg (1995) method (“fdr”). The threshold is determined by logic evaluation of Adjusted_pval < 0.05. Covariates are not included in the p-value adjustment.
The check_log argument takes the output of check_npx. If NULL, check_npx is run internally.
# One-way ANOVA, no covariates
anova_results_oneway <- OlinkAnalyze::olink_anova(
df = npx_clean,
variable = "Site",
check_log = check_npx_clean
)
# Two-way ANOVA, no covariates
anova_results_twoway <- OlinkAnalyze::olink_anova(
df = npx_clean,
variable = c("Site", "Time"),
check_log = check_npx_clean
)
# One-way ANOVA with Treatment as a covariate
anova_results_oneway <- OlinkAnalyze::olink_anova(
df = npx_clean,
variable = "Site",
covariates = "Treatment",
check_log = check_npx_clean
)
A tibble with the following columns:
olink_anova_posthoc performs a post-hoc ANOVA test with
Tukey p-value adjustment per assay (by OlinkID) at confidence level
0.95.
The function handles both factor and numerical variables and/or covariates. The post-hoc test for a numerical variable compares the difference in means of the outcome variable (default: NPX) for 1 standard deviation (SD) difference in the numerical variable, e.g. mean NPX at mean (numerical variable) versus mean NPX at mean (numerical variable) + 1*SD (numerical variable).
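The contrast for a numerical variable can be illustrated with a simple linear model in base R. This is a sketch on toy data, not the olink_anova_posthoc implementation; the variable age and the simulated values are hypothetical.

```r
set.seed(3)

# Hypothetical numerical variable and NPX values for one assay.
age <- rnorm(50, mean = 60, sd = 8)
npx <- 2 + 0.05 * age + rnorm(50, sd = 0.5)

fit <- stats::lm(npx ~ age)

# Predicted mean NPX at mean(age) and at mean(age) + 1 * SD(age).
new_points <- data.frame(age = c(mean(age), mean(age) + stats::sd(age)))
pred <- stats::predict(fit, newdata = new_points)

# The contrast is the difference between the two predictions ...
contrast <- unname(pred[2] - pred[1])

# ... which in a linear model equals the slope times one SD of the variable.
slope_times_sd <- unname(stats::coef(fit)["age"] * stats::sd(age))
all.equal(contrast, slope_times_sd)
```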
Control samples and control assays (AssayType is not “assay”, or Assay contains “control” or “ctrl”) should be removed before using this function.
The check_log argument takes the output of check_npx. If NULL, check_npx is run internally.
# calculate the p-value for the ANOVA
anova_results_oneway <- OlinkAnalyze::olink_anova(
df = npx_clean,
variable = "Site",
check_log = check_npx_clean
)
# extracting the significant proteins
anova_results_oneway_sign <- anova_results_oneway |>
dplyr::filter(
.data[["Threshold"]] == "Significant"
) |>
dplyr::pull(
.data[["OlinkID"]]
)
anova_posthoc_oneway_results <- OlinkAnalyze::olink_anova_posthoc(
df = npx_clean,
olinkid_list = anova_results_oneway_sign,
variable = "Site",
effect = "Site",
check_log = check_npx_clean
)
A tibble with the following columns:
The olink_lmer function fits a linear mixed effects model for
every protein (by OlinkID) in every panel. The function handles both
factor and numerical variables and/or covariates.
Samples with missing variable information or factor levels are excluded from the analysis. Character columns in the input data frame are converted to factors.
Crossed/interaction analysis, i.e. A*B formula notation, is inferred from the variable argument in the following cases:
For covariates, crossed analyses need to be specified explicitly, i.e. two main effects will not be expanded with a c("A", "B") notation. Main effects present in the variable take precedence.
Adjusted p-values are calculated using the Benjamini & Hochberg (1995) method (“fdr”). The threshold is determined by logic evaluation of Adjusted_pval < 0.05. Covariates are not included in the p-value adjustment.
The check_log argument takes the output of check_npx. If NULL, check_npx is run internally.
# Linear mixed model with one variable.
lmer_results_oneway <- OlinkAnalyze::olink_lmer(
df = npx_clean,
variable = "Site",
random = "Subject",
check_log = check_npx_clean
)
# Linear mixed model with two variables.
lmer_results_twoway <- OlinkAnalyze::olink_lmer(
df = npx_clean,
variable = c("Site", "Treatment"),
random = "Subject",
check_log = check_npx_clean
)
A tibble with the following columns:
The olink_lmer_posthoc function is similar to olink_lmer but performs a post-hoc analysis based on a linear mixed effects model. The function handles both factor and numerical variables and/or covariates. Differences in estimated marginal means are calculated for all pairwise levels of a given output variable. Degrees of freedom are estimated using Satterthwaite’s approximation. The post-hoc test for a numerical variable compares the difference in means of the outcome variable (default: NPX) for 1 standard deviation difference in the numerical variable, e.g. mean NPX at mean(numerical variable) versus mean NPX at mean(numerical variable) + 1*SD(numerical variable). The output tibble is arranged by ascending adjusted p-values.
The check_log argument takes the output of check_npx. If NULL, check_npx is run internally.
# Linear mixed model with two variables.
lmer_results_twoway <- OlinkAnalyze::olink_lmer(
df = npx_clean,
variable = c("Site", "Treatment"),
random = "Subject",
check_log = check_npx_clean
)
# extracting the significant proteins
lmer_results_twoway_sign <- lmer_results_twoway |>
dplyr::filter(
.data[["Threshold"]] == "Significant" &
.data[["term"]] == "Treatment"
) |>
dplyr::pull(
.data[["OlinkID"]]
)
# performing post-hoc analysis
lmer_posthoc_twoway_results <- OlinkAnalyze::olink_lmer_posthoc(
df = npx_clean,
olinkid_list = lmer_results_twoway_sign,
variable = c("Site", "Treatment"),
random = "Subject",
effect = "Treatment",
check_log = check_npx_clean
)
A tibble with the following columns:
Many other statistical functions can be found within Olink Analyze, including:
To learn more about these functions, consult their help documentation
using the help() function.
The olink_pathway_enrichment function can be used to
perform Gene Set Enrichment Analysis (GSEA) or Over-representation
Analysis (ORA) using MSigDB, Reactome, KEGG, or GO. MSigDB includes
curated gene sets (C2) and ontology gene sets (C5), which encompass
Reactome, KEGG, and GO. This function performs enrichment using the
gsea or enrich functions from the clusterProfiler package from
Bioconductor. The function uses the estimate from a previous statistical
analysis for one contrast for all proteins. MSigDB is subset if ontology
is KEGG, GO, or Reactome. test_results must contain estimates for all
assays. Post-hoc results can be used but should be filtered for one
contrast to improve interpretability.
Alternative statistical results can be used as input as long as they include the columns “OlinkID”, “Assay”, and “estimate”. A column named “Adjusted_pval” is also needed for ORA. Any statistical results that contain one estimate per protein will work as long as the estimates are comparable to each other.
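The column requirement above can be checked before calling the enrichment function. The helpers required_columns and missing_columns below are hypothetical and not part of OlinkAnalyze, the IDs in custom_results are placeholders, and the adjusted p-value column is assumed to be named Adjusted_pval, as in the olink_anova description.

```r
# Hypothetical helper: columns needed for each enrichment method.
required_columns <- function(method = c("GSEA", "ORA")) {
  method <- match.arg(method)
  cols <- c("OlinkID", "Assay", "estimate")
  if (method == "ORA") {
    cols <- c(cols, "Adjusted_pval")
  }
  cols
}

# Hypothetical helper: which required columns are absent from the results.
missing_columns <- function(test_results, method = "GSEA") {
  setdiff(required_columns(method), names(test_results))
}

# Custom statistical results with one estimate per protein (placeholder data).
custom_results <- data.frame(
  OlinkID = c("OID00001", "OID00002"),
  Assay = c("Protein1", "Protein2"),
  estimate = c(0.8, -0.3)
)

missing_columns(custom_results, method = "GSEA")
# character(0)
missing_columns(custom_results, method = "ORA")
# "Adjusted_pval"
```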
# t-test on the cleaned data (npx_clean, as used in the other examples)
ttest_results <- OlinkAnalyze::olink_ttest(
  df = npx_clean,
  variable = "Treatment",
  alternative = "two.sided",
  check_log = check_npx_clean
)
# GSEA (default method)
gsea_results <- OlinkAnalyze::olink_pathway_enrichment(
  df = npx_clean,
  test_results = ttest_results
)
# ORA
ora_results <- OlinkAnalyze::olink_pathway_enrichment(
  df = npx_clean,
  test_results = ttest_results,
  method = "ORA"
)
A data frame of enrichment results. Columns for ORA include:
Columns for GSEA:
Generates a PCA projection of all samples from NPX data along two principal components (default PC2 vs. PC1), colored by the variable specified by color_g (default QC_Warning) and including the percentage of explained variance. By default, the values are scaled and centered in the PCA, and proteins with missing NPX values are removed from the corresponding assay(s). Unique sample names are required. Imputation by the median value is done for assays with missingness <10% for multi-plate projects and <5% for single plate projects.
More information about olink_pca() can be found in the
Outlier Exclusion Vignette in
OlinkAnalyzeVignettes.
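The preprocessing and projection steps can be sketched with stats::prcomp in base R. This is a simplified sketch on a toy wide matrix, not the implementation in OlinkAnalyze. It assumes that assays at or above the missingness threshold are dropped and the remaining missing values are replaced by the assay median; the threshold of 10% corresponds to the multi-plate case mentioned above.

```r
set.seed(4)

# Toy wide matrix: 40 samples x 6 assays (hypothetical NPX values).
n_samples <- 40
npx_wide <- matrix(
  rnorm(n_samples * 6),
  nrow = n_samples,
  dimnames = list(
    sprintf("Sample_%02d", seq_len(n_samples)),
    sprintf("OID_%d", 1:6)
  )
)
npx_wide[1:2, "OID_2"] <- NA  # 5% missing: imputed below
npx_wide[1:10, "OID_5"] <- NA # 25% missing: dropped below

max_missing <- 0.10 # multi-plate threshold

# Drop assays with too many missing values.
missingness <- colMeans(is.na(npx_wide))
kept <- npx_wide[, missingness < max_missing, drop = FALSE]

# Impute the remaining missing values with the assay median.
imputed <- apply(kept, 2, function(x) {
  x[is.na(x)] <- stats::median(x, na.rm = TRUE)
  x
})

# Centered and scaled PCA, with percentage of explained variance.
pca <- stats::prcomp(imputed, center = TRUE, scale. = TRUE)
explained <- 100 * pca$sdev^2 / sum(pca$sdev^2)

# Sample coordinates on the first two principal components.
scores <- as.data.frame(pca$x[, 1:2])
round(explained[1:2], 1)
```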
Computes a manifold approximation and projection and plots the two specified components. Unique sample names are required and imputation by the median is done for assays with missingness <10% for multi-plate projects and <5% for single plate projects.
The arguments outlierDefX and outlierDefY can be used to identify outliers in the UMAP results. Sample outliers will be labelled.
NOTE: UMAP is a non-linear data transformation that might not accurately preserve the properties of the data. Distances in the UMAP plane should therefore be interpreted with caution.
The check_log argument takes the output of check_npx. If NULL, check_npx is run internally.
OlinkAnalyze::olink_umap_plot(
df = npx_clean,
color_g = "QC_Warning",
byPanel = TRUE,
check_log = check_npx_clean
)
A list of objects of class ggplot (silently returned). Plots
are also printed unless option quiet = TRUE is set.
The olink_boxplot function is used to generate boxplots
of NPX values stratified on a variable for a given list of proteins. In
order to annotate the plot with ANOVA posthoc analysis results
(i.e. include statistical asterisks in the plot), control samples and
control assays should be removed from the data.
The check_log argument takes the output of check_npx. If NULL, check_npx is run internally.
plot <- npx_clean |>
dplyr::filter(
!is.na(.data[["Site"]])
) |> # removing missing values which exist for Site
OlinkAnalyze::olink_boxplot(
variable = "Site",
olinkid_list = c("OID00488", "OID01276"),
number_of_proteins_per_plot = 2L,
check_log = check_npx_clean
)
plot[[1L]]
anova_posthoc_results <- npx_clean |>
OlinkAnalyze::olink_anova_posthoc(
olinkid_list = c("OID00488", "OID01276"),
variable = "Site",
effect = "Site",
check_log = check_npx_clean
)
plot2 <- npx_clean |>
stats::na.omit() |> # removing rows with missing values, which exist for Site
OlinkAnalyze::olink_boxplot(
variable = "Site",
olinkid_list = c("OID00488", "OID01276"),
number_of_proteins_per_plot = 2L,
posthoc_results = anova_posthoc_results,
check_log = check_npx_clean
)
plot2[[1L]]
A list of objects of class ggplot.
Note: Plots will not appear in the Plots pane of RStudio unless the result is assigned to a variable and then printed (see the sample code above).
The olink_dist_plot function generates boxplots of NPX values for each sample, faceted by Olink panel. This is used as an initial QC step to identify potential outliers.
More information about olink_dist_plot() can be found in
the Outlier Exclusion Vignette in
OlinkAnalyzeVignettes.
The olink_lmer_plot function generates a point-range plot for a given list of proteins based on a linear mixed effects model. The points illustrate the mean NPX level for each group and the error bars illustrate 95% confidence intervals. Facets are labeled by the protein name and corresponding OlinkID for the protein.
The check_log argument takes the output of check_npx. If NULL, check_npx is run internally.
plot <- OlinkAnalyze::olink_lmer_plot(
df = npx_clean,
olinkid_list = c("OID01216", "OID01217"),
variable = c("Site", "Treatment"),
x_axis_variable = "Site",
col_variable = "Treatment",
random = "Subject",
check_log = check_npx_clean
)
plot[[1L]]
A list of objects of class ggplot.
Note: Plots will not appear in the Plots pane of RStudio unless the result is assigned to a variable and then printed (see the sample code above).
The olink_pathway_heatmap function generates a heatmap of proteins related to pathways using the enrichment results from the olink_pathway_enrichment function. Either the top terms or the terms containing a certain keyword can be visualized. For each term, the proteins in the test_results data frame that are related to that term are visualized by their estimate. This visualization can be used to determine how many proteins of interest are involved in a particular pathway and in which direction their estimates point.
OlinkAnalyze::olink_pathway_heatmap(
enrich_results = ora_results,
test_results = ttest_results,
method = "ORA",
keyword = "immune"
)
A heatmap as a ggplot object.
The olink_pathway_visualization function generates a bar
graph of the top terms or terms related to a certain keyword for results
from the olink_pathway_enrichment function. The bar
represents either the normalized enrichment score (NES) for GSEA results
or counts (number of proteins) for ORA results colored by adjusted
p-value. Pathways are ordered by unadjusted p-value. The ORA
visualization also contains the number of proteins out of the total
proteins in that pathway as a ratio after the bar.
A bar graph as a ggplot object.
The olink_qc_plot function generates a plot faceted by Panel, plotting IQR vs. median for all samples. This is a good first check to find out if any samples have a tendency to be classified as outliers. Horizontal dashed lines indicate +/-3 standard deviations from the mean IQR. Vertical dashed lines indicate +/-3 standard deviations from the mean sample median.
More information about olink_qc_plot() can be found in
the Outlier Exclusion Vignette in
OlinkAnalyzeVignettes.
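The quantities shown in this plot can be sketched in base R for a single panel. This is an illustration on toy data, not the olink_qc_plot implementation; the column names sample_median, IQR, and Outlier are assumptions.

```r
set.seed(5)

# Toy panel: 30 samples x 20 assays in long format (hypothetical values).
npx_panel <- data.frame(
  SampleID = rep(sprintf("Sample_%02d", 1:30), each = 20),
  NPX = rnorm(600, mean = 5)
)
# Shift one sample upwards so that it behaves like an outlier.
shifted <- npx_panel$SampleID == "Sample_07"
npx_panel$NPX[shifted] <- npx_panel$NPX[shifted] + 4

# Per-sample median and IQR of NPX.
sample_stats <- do.call(
  rbind,
  lapply(split(npx_panel, npx_panel$SampleID), function(d) {
    data.frame(
      SampleID = d$SampleID[1],
      sample_median = stats::median(d$NPX),
      IQR = stats::IQR(d$NPX)
    )
  })
)

# Dashed lines: mean +/- 3 standard deviations of each statistic.
median_limits <- mean(sample_stats$sample_median) +
  c(-3, 3) * stats::sd(sample_stats$sample_median)
iqr_limits <- mean(sample_stats$IQR) +
  c(-3, 3) * stats::sd(sample_stats$IQR)

# Flag samples that fall outside either pair of limits.
sample_stats$Outlier <-
  sample_stats$sample_median < median_limits[1] |
  sample_stats$sample_median > median_limits[2] |
  sample_stats$IQR < iqr_limits[1] |
  sample_stats$IQR > iqr_limits[2]

sample_stats[sample_stats$Outlier, ]
```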
The olink_heatmap_plot function generates a heatmap for all samples and proteins. By default, the heatmap centers and scales NPX across all proteins and clusters samples and proteins using a dendrogram. Unique sample names are required.
The grouping variable(s) are annotated and colored on the left side of the heatmap.
The check_log argument takes the output of check_npx. If NULL, check_npx is run internally.
first10 <- npx_clean |>
dplyr::pull(
.data[["OlinkID"]]
) |>
unique() |>
utils::head(n = 10L)
first15samples <- npx_clean |>
dplyr::pull(
.data[["SampleID"]]
) |>
unique() |>
utils::head(n = 15L)
npx_data_small <- npx_clean |>
dplyr::filter(
.data[["OlinkID"]] %in% .env[["first10"]]
) |>
dplyr::filter(
.data[["SampleID"]] %in% .env[["first15samples"]]
)
OlinkAnalyze::olink_heatmap_plot(
df = npx_data_small,
variable_row_list = "Treatment",
check_log = check_npx_clean
)
An object of class ggplot.
The olink_volcano_plot function generates a volcano plot
using results from the olink_ttest function. The estimated difference is
shown on the x-axis and -log10(p-value) on the y-axis. The
horizontal dotted line indicates p-value = 0.05. Dots are colored based
on significance following Benjamini-Hochberg adjustment with a p-value
cutoff of 0.05. Significant assays after adjustment can optionally be
annotated by OlinkID.
# perform t-test
ttest_results <- OlinkAnalyze::olink_ttest(
df = npx_clean,
variable = "Treatment",
check_log = check_npx_clean
)
# select names of proteins to show
top_10_name <- ttest_results |>
dplyr::slice_head(
n = 10L
) |>
dplyr::pull(
.data[["OlinkID"]]
)
# volcano plot
OlinkAnalyze::olink_volcano_plot(
p.val_tbl = ttest_results,
x_lab = "Treatment",
olinkid_list = top_10_name
)
An object of class ggplot.
This function sets a coherent plot theme for plots by adding it to a ggplot object. It is mainly used for aesthetic reasons.
OlinkAnalyze::npx_data1 |>
dplyr::filter(
!is.na(.data[["Treatment"]])
) |>
dplyr::filter(
.data[["OlinkID"]] == "OID01216"
) |>
ggplot2::ggplot(
ggplot2::aes(
x = .data[["Treatment"]],
y = .data[["NPX"]],
fill = .data[["Treatment"]]
)
) +
ggplot2::geom_boxplot() +
OlinkAnalyze::set_plot_theme()
These functions set a coherent coloring theme for the plots by adding it to a ggplot object. They are mainly used for aesthetic reasons.
OlinkAnalyze::npx_data1 |>
dplyr::filter(
!is.na(.data[["Treatment"]])
) |>
dplyr::filter(
.data[["OlinkID"]] == "OID01216"
) |>
ggplot2::ggplot(
mapping = ggplot2::aes(
x = .data[["Treatment"]],
y = .data[["NPX"]],
fill = .data[["Treatment"]]
)
) +
ggplot2::geom_boxplot() +
OlinkAnalyze::set_plot_theme() +
OlinkAnalyze::olink_fill_discrete()
The olink_bridgeability_plot function generates a series
of plots on a per-assay basis for a data frame generated from
between-product bridging. The coloration of the figure headers indicates
whether that assay has been defined as bridgeable or not bridgeable. The
correlation plot, violin plot, and bar chart figures illustrate the
three criteria for determining whether an assay is bridgeable. For
assays determined to be bridgeable, the ECDF curve and corresponding KS
statistic are used to determine which normalization approach (median
centering or quantile smoothing) is most suitable for between-product
normalization. For more information on the between-product bridging
methodology and bridgeability criteria, consult the Bridging
across NGS-based Olink® products
Tutorial in OlinkAnalyzeVignettes.
The check_log argument takes the output of check_npx. If NULL, check_npx is run internally.
npx_ht <- data_exploreht |>
dplyr::filter(
.data[["SampleType"]] == "SAMPLE"
) |>
dplyr::mutate(
Project = "data1"
)
check_npx_ht <- OlinkAnalyze::check_npx(
df = npx_ht
)
npx_3072 <- data_explore3072 |>
dplyr::filter(
.data[["SampleType"]] == "SAMPLE"
) |>
dplyr::mutate(
Project = "data2"
)
check_npx_3072 <- OlinkAnalyze::check_npx(
df = npx_3072
)
overlapping_samples <- unique(
intersect(
x = npx_ht |> dplyr::distinct(.data[["SampleID"]]) |> dplyr::pull(),
y = npx_3072 |> dplyr::distinct(.data[["SampleID"]]) |> dplyr::pull()
)
)
npx_br_data <- OlinkAnalyze::olink_normalization(
df1 = npx_ht,
df2 = npx_3072,
overlapping_samples_df1 = overlapping_samples,
df1_project_nr = "Explore HT",
df2_project_nr = "Explore 3072",
reference_project = "Explore HT",
format = FALSE,
df1_check_log = check_npx_ht,
df2_check_log = check_npx_3072
)
check_npx_br_data <- OlinkAnalyze::check_npx(
df = npx_br_data
)
npx_br_data_bridgeable_plt <- OlinkAnalyze::olink_bridgeability_plot(
df = npx_br_data,
median_counts_threshold = 150L,
min_count = 10L,
check_log = check_npx_br_data
)
npx_br_data_bridgeable_plt[[1L]]
A list of objects of class ggplot.
We are always happy to help. Email us with any questions:
biostat@olink.com for statistical services and general stats questions
support@olink.com for Olink lab product and technical support
info@olink.com for more information
© 2025 Olink Proteomics AB, part of Thermo Fisher Scientific.
Olink products and services are For Research Use Only. Not for use in diagnostic procedures.
All information in this document is subject to change without notice. This document is not intended to convey any warranties, representations and/or recommendations of any kind, unless such warranties, representations and/or recommendations are explicitly stated.
Olink assumes no liability arising from a prospective reader’s actions based on this document.
OLINK, NPX, PEA, PROXIMITY EXTENSION, INSIGHT and the Olink logotype are trademarks registered, or pending registration, by Olink Proteomics AB. All third-party trademarks are the property of their respective owners.
Olink products and assay methods are covered by several patents and patent applications https://www.olink.com/patents/.