Olink® Analyze Vignette

Olink DS team

2026-03-28

Olink® Analyze is an R package that provides a versatile toolbox for fast and easy handling of Olink® NPX data in proteomics research. It includes functions for importing Olink® NPX datasets, quality control (QC) plot functions, and functions for various statistical tests. The package is meant to provide a convenient pipeline for your Olink® NPX data analysis.

Note: Starting with OlinkAnalyze v5.0, detailed analysis workflow vignettes have been moved to the new OlinkAnalyzeVignettes package, which will be published on CRAN soon. This vignette provides an overview of the main functions in OlinkAnalyze and introduces the new v5.0 preprocessing functions check_npx and clean_npx.

Installation

You can install Olink® Analyze from CRAN.

install.packages("OlinkAnalyze")

List of functions

Preprocessing

Statistical analysis

Visualization

Sample datasets

Usage

Preprocessing

Read NPX data (read_NPX)

The read_NPX function imports an NPX file into a tidy format to work with in R. It supports Olink® NPX files generated by Olink® data software in CSV, Excel, and Parquet formats. Do not modify the NPX output file before importing it; otherwise the function may not work as expected.

Function arguments

  • filename: Path to the NPX output file.
data <- OlinkAnalyze::read_NPX("~/NPX_file_location.xlsx")

Function output

A tibble in long format containing:

  • SampleID: Sample names or IDs.
  • Index: Unique number for each sample, used to distinguish samples with non-unique SampleIDs.
  • OlinkID: Unique ID assigned by Olink to each assay. If an assay is included in more than one panel, it has a different OlinkID in each.
  • UniProt: UniProt ID.
  • Assay: Common gene name for the assay.
  • MissingFreq: Missing frequency for the OlinkID, i.e. the fraction of samples with an NPX value below the limit of detection (LOD).
  • Panel: Olink Panel the samples were run on. Read more about Olink Panels here: https://olink.com/products/compare
  • Panel_Version: Version of the panel. A new panel version might include some different or improved assays.
  • PlateID: Name of the plate.
  • QC_Warning: Indicates whether the sample passed Olink QC. More information about Olink quality control metrics can be found in our FAQ (search “Quality control”).
  • LOD: Limit of detection (LOD) is the minimum level of an individual protein that can be measured. LOD is defined as 3 times the standard deviation over background.
  • NPX: Normalized Protein eXpression, Olink’s unit of protein expression level on a log2 scale. Most functions in this package use NPX values for their calculations. Read more about NPX in the Olink FAQ (search “What is NPX?”) or in Olink’s Data normalization and standardization white paper.
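MissingFreq and LOD are produced by Olink software. To make the MissingFreq definition above concrete, the sketch below computes the fraction of samples below LOD per OlinkID on a small, made-up data frame in the long format described. The data and object names (npx_long, missing_freq) are hypothetical, and the software's own calculation may differ in details such as the handling of NA values.

```r
# Hypothetical toy data in the long format returned by read_NPX (made-up values)
npx_long <- data.frame(
  SampleID = rep(c("S1", "S2", "S3", "S4"), times = 2),
  OlinkID  = rep(c("OID00001", "OID00002"), each = 4),
  LOD      = rep(c(1.5, 0.8), each = 4),
  NPX      = c(1.2, 2.0, 3.1, 1.4, 0.5, 0.9, 1.1, 2.2)
)

# TRUE when a sample's NPX value is below the assay's LOD
below_lod <- npx_long$NPX < npx_long$LOD

# Fraction of samples below LOD per assay, i.e. what MissingFreq summarises
missing_freq <- tapply(below_lod, npx_long$OlinkID, mean)
missing_freq  # 0.50 for OID00001 and 0.25 for OID00002
```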

Read multiple NPX data files (read_NPX)

To import multiple NPX data files at once, combine read_NPX with the list.files, lapply, and dplyr::bind_rows functions, as shown below. The pattern argument of list.files selects the NPX file format (.csv, .xlsx, .parquet, or any combination of these). This method requires that all NPX files are stored in the same folder and have identical column names. Do not modify the NPX output files before importing them; otherwise this method may not work as expected.

# Read in multiple NPX files in .csv format
data <- list.files(
  path = "path/to/dir/with/NPX/files",
  pattern = "\\.csv$",
  full.names = TRUE
) |>
  lapply(FUN = function(x) {
    OlinkAnalyze::read_NPX(x) |>
      # Optional: add a column identifying the source file
      dplyr::mutate(File = x)
  }) |>
  # Optional: combine the list of data frames into a single data frame
  dplyr::bind_rows()

# Read in multiple NPX files in .parquet format
data <- list.files(
  path = "path/to/dir/with/NPX/files",
  pattern = "\\.parquet$",
  full.names = TRUE
) |>
  lapply(OlinkAnalyze::read_NPX) |>
  dplyr::bind_rows()

# Read in multiple NPX files in either format
data <- list.files(
  path = "path/to/dir/with/NPX/files",
  pattern = "\\.(parquet|csv)$",
  full.names = TRUE
) |>
  lapply(OlinkAnalyze::read_NPX) |>
  dplyr::bind_rows()
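Because this approach assumes that all files share the same column names, it can help to verify this before binding. The helper check_same_columns below is hypothetical (it is not part of OlinkAnalyze); it compares the set of column names, ignoring order, across a list of data frames such as the one returned by lapply before dplyr::bind_rows.

```r
# Hypothetical helper (not part of OlinkAnalyze): stop if the data frames in
# df_list do not all share the same set of column names
check_same_columns <- function(df_list) {
  ref <- names(df_list[[1]])
  same <- vapply(
    df_list,
    function(df) setequal(names(df), ref),
    logical(1)
  )
  if (!all(same)) {
    stop(
      "Data frames with differing column names at positions: ",
      paste(which(!same), collapse = ", ")
    )
  }
  invisible(TRUE)
}

# Usage with toy data frames (made-up columns and values)
a <- data.frame(SampleID = "S1", NPX = 1.2)
b <- data.frame(NPX = 0.4, SampleID = "S2")
check_same_columns(list(a, b))  # passes: same columns, different order
```

With real files, the helper would be applied to the list produced by lapply(..., OlinkAnalyze::read_NPX) before calling dplyr::bind_rows.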

Check NPX data quality (check_npx)

The check_npx function performs various quality and format checks on NPX data imported with read_npx. It is recommended to run this function after reading in NPX data and before downstream analysis. The result can be passed as the check_log argument to clean_npx and all downstream OlinkAnalyze functions, allowing each function to skip its own internal check and improve performance.

Function arguments

  • df: NPX data frame in long format (as returned by read_npx).
  • preferred_names: Optional named character vector to resolve column name ambiguities or to map custom column names to internally expected ones.
# Check NPX data quality and format
check_npx_result <- OlinkAnalyze::check_npx(
  df = OlinkAnalyze::npx_data1
)

Function output

A named list with the following elements:

  • col_names <list>: Column names from the input data frame to be used in downstream analyses.
  • oid_invalid <chr>: OlinkID values that do not follow the expected format (OID#####).
  • assay_na <chr>: OlinkIDs of assays where all samples have NA quantification values.
  • sample_id_dups <chr>: Duplicate SampleID values detected in the data.
  • sample_id_na <chr>: SampleIDs of samples with NA quantification values for all assays.
  • col_class <data.frame>: Columns with incorrect data types, including the column key, column name, detected type, and expected type.
  • assay_qc <chr>: OlinkIDs of assays with at least one assay QC warning.
  • non_unique_uniprot <chr>: OlinkIDs mapped to more than one UniProt ID.
  • darid_invalid <data.frame>: Invalid combinations of DataAnalysisRefID and PanelDataArchiveVersion.
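To get a quick overview of what check_npx flagged, one option is to count the entries in each element of the returned list. The helper summarise_check_log below is hypothetical, and the mock list only imitates the element names documented above with made-up values; it assumes each element is a vector or a data frame, as listed.

```r
# Hypothetical helper (not part of OlinkAnalyze): number of flagged entries
# per check, skipping col_names, which holds column mappings rather than issues
summarise_check_log <- function(check_log) {
  checks <- check_log[names(check_log) != "col_names"]
  vapply(
    checks,
    function(x) if (is.data.frame(x)) nrow(x) else length(x),
    integer(1)
  )
}

# Mock object imitating the documented structure (made-up values)
mock_log <- list(
  col_names = list(sample_id = "SampleID", olink_id = "OlinkID"),
  oid_invalid = character(0),
  assay_na = "OID00002",
  sample_id_dups = c("S7", "S9"),
  sample_id_na = character(0),
  col_class = data.frame(),
  assay_qc = character(0),
  non_unique_uniprot = character(0),
  darid_invalid = data.frame()
)
summarise_check_log(mock_log)
```

On real data, the same call would be summarise_check_log(check_npx_result), assuming the object is a plain named list as documented.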

Clean NPX data (clean_npx)

The clean_npx function cleans an NPX data frame by applying a series of filtering and conversion steps. It removes invalid or problematic assays and samples identified by check_npx, and optionally converts column data types. Passing the output of check_npx via the check_log argument avoids re-running the internal checks and improves performance.

Function arguments

  • df: NPX data frame in long format as returned by read_npx.
  • check_log: Named list returned by check_npx. If NULL, check_npx is run internally.
  • remove_assay_na: Logical. Remove assays where all samples have NA values. Default: TRUE.
  • remove_invalid_oid: Logical. Remove assays with invalid OlinkIDs. Default: TRUE.
  • remove_dup_sample_id: Logical. Remove samples with duplicate IDs. Default: TRUE.
  • remove_control_assay: Logical. Remove internal control assays. Default: TRUE.
  • remove_control_sample: Logical. Remove external control samples based on SampleType. Default: TRUE.
  • remove_qc_warning: Logical. Remove samples with QC status ‘FAIL’. Default: TRUE.
  • remove_assay_warning: Logical. Remove assays flagged with assay warnings. Default: TRUE.
  • control_sample_ids: Character vector of additional SampleIDs to remove. Default: NULL.
  • convert_df_cols: Logical. Convert columns to their expected data types. Default: TRUE.
  • convert_nonunique_uniprot: Logical. Resolve non-unique OlinkID–UniProt mappings. Default: TRUE.
  • verbose: Logical. Print progress messages. Default: FALSE.
# Clean the NPX data using the check_npx output
npx_clean <- OlinkAnalyze::clean_npx(
  df = OlinkAnalyze::npx_data1,
  check_log = check_npx_result
)

Function output

A tibble (or ArrowObject) in long format containing the cleaned NPX data, with invalid assays, control samples, QC-failing samples, and problematic entries removed according to the chosen arguments.
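To illustrate two of the filtering steps described above (remove_assay_na and control_sample_ids), here is a minimal base R sketch on made-up data. It is not how clean_npx is implemented, and the helper name drop_na_assays_and_controls is hypothetical; in practice, use clean_npx.

```r
# Hypothetical helper (NOT clean_npx's implementation): drop assays whose NPX
# is NA in every sample, and drop user-specified control SampleIDs
drop_na_assays_and_controls <- function(df, control_sample_ids = NULL) {
  # TRUE per OlinkID when all of its NPX values are NA
  all_na <- tapply(is.na(df$NPX), df$OlinkID, all)
  keep_assay <- !df$OlinkID %in% names(all_na)[all_na]
  keep_sample <- !df$SampleID %in% control_sample_ids
  df[keep_assay & keep_sample, , drop = FALSE]
}

# Toy long-format data (made-up values): OID00002 is NA in every sample
npx <- data.frame(
  SampleID = rep(c("S1", "S2", "CTRL1"), times = 2),
  OlinkID = rep(c("OID00001", "OID00002"), each = 3),
  NPX = c(1.2, 2.3, 0.4, NA, NA, NA)
)

cleaned <- drop_na_assays_and_controls(npx, control_sample_ids = "CTRL1")
cleaned  # rows for S1 and S2 on OID00001 only
```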

Note: We recommend running check_npx again after cleaning the data to confirm that all issues have been resolved and that the data are ready for downstream analysis.

# Re-check the cleaned NPX data
check_npx_clean <- OlinkAnalyze::check_npx(
  df = npx_clean
)

Statistical analysis

Post-hoc ANOVA analysis (olink_anova_posthoc)

olink_anova_posthoc performs a post-hoc ANOVA test with Tukey p-value adjustment per assay (by OlinkID) at confidence level 0.95.
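As a rough illustration of a Tukey-adjusted post-hoc comparison for a single assay, the sketch below fits a one-way ANOVA with base R's aov and applies TukeyHSD at confidence level 0.95 on simulated data. This is a base R stand-in; olink_anova_posthoc repeats such a comparison per OlinkID and may compute it differently internally.

```r
set.seed(42)

# Simulated NPX values for one assay measured at three hypothetical sites
toy <- data.frame(
  Site = factor(rep(c("A", "B", "C"), each = 5)),
  NPX = c(rnorm(5, mean = 3), rnorm(5, mean = 3.5), rnorm(5, mean = 5))
)

# One-way ANOVA followed by Tukey's honest significant difference test
fit <- aov(NPX ~ Site, data = toy)
tukey <- TukeyHSD(fit, which = "Site", conf.level = 0.95)

# One row per pairwise contrast (B-A, C-A, C-B) with columns
# diff, lwr, upr and p adj
tukey$Site
```

The diff, lwr, upr and p adj columns play roughly the role of estimate, conf.low, conf.high and Adjusted_pval in the function output.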

The function handles both factor and numerical variables and covariates. For a numerical variable, the post-hoc test compares the difference in means of the outcome variable (default: NPX) for a 1 standard deviation (SD) difference in the numerical variable, i.e. mean NPX at mean(numerical variable) versus mean NPX at mean(numerical variable) + 1*SD(numerical variable).
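For a numerical variable, the quantity described above can be illustrated with a simple linear model: the predicted difference between mean(x) + 1*SD(x) and mean(x). In a model that is linear in x, this equals the slope times SD(x). The sketch uses base R lm and predict on simulated data with a hypothetical variable age; it is not the internal implementation of olink_anova_posthoc.

```r
set.seed(1)

# Simulated NPX values for one assay and a hypothetical numerical variable
toy <- data.frame(age = c(30, 35, 42, 50, 58, 61, 67, 70))
toy$NPX <- 2 + 0.05 * toy$age + rnorm(nrow(toy), sd = 0.1)

fit <- lm(NPX ~ age, data = toy)

# Predict NPX at mean(age) and at mean(age) + 1 * SD(age)
at <- data.frame(age = mean(toy$age) + c(0, 1) * sd(toy$age))
pred <- predict(fit, newdata = at)
diff_1sd <- unname(pred[2] - pred[1])

# For a term that is linear in age, this equals slope * SD(age)
slope_times_sd <- unname(coef(fit)["age"]) * sd(toy$age)
all.equal(diff_1sd, slope_times_sd)  # TRUE
```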

Remove control samples and control assays (AssayType is not “assay”, or Assay contains “control” or “ctrl”) before using this function, for example with clean_npx.
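The control-assay rule above can be written as a small predicate. The sketch below is a hypothetical base R helper (is_control_assay) applied to made-up data; the AssayType values other than “assay” and the case-insensitive name match are assumptions. With real data, clean_npx with remove_control_assay = TRUE and remove_control_sample = TRUE is the package's own route.

```r
# Hypothetical toy data (made-up values); AssayType values other than
# "assay" are assumptions for illustration
npx <- data.frame(
  Assay = c("IL6", "Incubation control 1", "TNF", "Ext ctrl"),
  AssayType = c("assay", "inc_ctrl", "assay", "assay"),
  NPX = c(3.1, 0.2, 4.5, 1.1)
)

# TRUE for control assays: AssayType is not "assay" (when that column exists),
# or the Assay name contains "control" or "ctrl" (case-insensitive here)
is_control_assay <- function(df) {
  by_name <- grepl("control|ctrl", df$Assay, ignore.case = TRUE)
  by_type <- if ("AssayType" %in% names(df)) !df$AssayType %in% "assay" else FALSE
  by_name | by_type
}

npx_assays_only <- npx[!is_control_assay(npx), ]
npx_assays_only$Assay  # "IL6" "TNF"
```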

Function arguments

  • df: NPX data frame in long format. It should minimally contain protein name (Assay), OlinkID, UniProt, Panel, and an outcome factor with at least 3 levels.
  • olinkid_list: Character vector of OlinkIDs on which to perform the post-hoc analysis. If not specified, all assays in df are used.
  • variable: Single character value or character vector. A single value should name a column in df. If length > 1, the included variable names are used in crossed analyses. The notations ‘:’ and ‘*’ are also accepted.
  • covariates: Single character value or character vector of confounding factors to include in the analysis. Default: NULL. A single value should name a column in df. The notations ‘:’ and ‘*’ are also accepted, but crossed analysis will not be inferred from main effects.
  • outcome: Name of the column from df that contains the dependent variable. Default: NPX.
  • effect: Term on which to perform the post-hoc analysis. Character vector. Must be a subset of, or identical to, variable.
  • mean_return: Logical. If TRUE, returns the mean of each factor level rather than the difference in means (default). Note that no p-value is returned and no adjustment is performed when mean_return = TRUE.
  • verbose: Logical. Whether to print information about removed samples, factor conversion, and the final model formula to the console. Default: TRUE.
  • check_log: Named list returned by check_npx. If NULL, check_npx is run internally.
# calculate the p-value for the ANOVA
anova_results_oneway <- OlinkAnalyze::olink_anova(
  df = npx_clean,
  variable = "Site",
  check_log = check_npx_clean
)

# extracting the significant proteins
anova_results_oneway_sign <- anova_results_oneway |>
  dplyr::filter(
    .data[["Threshold"]] == "Significant"
  ) |>
  dplyr::pull(
    .data[["OlinkID"]]
  )

anova_posthoc_oneway_results <- OlinkAnalyze::olink_anova_posthoc(
  df = npx_clean,
  olinkid_list = anova_results_oneway_sign,
  variable = "Site",
  effect = "Site",
  check_log = check_npx_clean
)

Function output

A tibble with the following columns:

  • Assay <chr>: Assay name.
  • OlinkID <chr>: Unique Olink ID.
  • UniProt <chr>: UniProt ID.
  • Panel <chr>: Olink Panel.
  • term <chr>: Name of the variable used for the p-value calculation. A “:” between variables indicates an interaction between them.
  • contrast <chr>: Levels (of term) that are compared.
  • estimate <dbl>: Difference in mean NPX between the levels compared in contrast.
  • conf.low <dbl>: Lower bound of the confidence interval for the estimate.
  • conf.high <dbl>: Upper bound of the confidence interval for the estimate.
  • Adjusted_pval <dbl>: Adjusted p-value for the test (Tukey adjustment).
  • Threshold <chr>: Indicates whether the assay is significant (adjusted p-value < 0.05).
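A common next step is to keep only the significant contrasts and sort them. The sketch below uses a few made-up rows that imitate some of the output columns listed above; the label “Non-significant” for non-significant rows is an assumption. Because NPX is on a log2 scale, 2^estimate converts a difference in mean NPX into a fold change.

```r
# Hypothetical rows imitating a few post-hoc output columns (made-up values;
# the "Non-significant" label is an assumption)
posthoc <- data.frame(
  OlinkID = c("OID00001", "OID00001", "OID00002"),
  contrast = c("B - A", "C - A", "B - A"),
  estimate = c(0.4, 1.3, -0.2),
  Adjusted_pval = c(0.20, 0.001, 0.03),
  Threshold = c("Non-significant", "Significant", "Significant")
)

# Keep significant contrasts, most significant first
sig <- posthoc[posthoc$Threshold == "Significant", ]
sig <- sig[order(sig$Adjusted_pval), ]

# NPX is on a log2 scale, so 2^estimate is the fold change on the linear scale
sig$fold_change <- 2^sig$estimate
sig
```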

Post-hoc linear mixed effects model analysis (olink_lmer_posthoc)

The olink_lmer_posthoc function is similar to olink_lmer but performs a post-hoc analysis based on a linear mixed effects model. The function handles both factor and numerical variables and covariates. Differences in estimated marginal means are calculated for all pairs of levels of the variable given in effect. Degrees of freedom are estimated using Satterthwaite’s approximation. For a numerical variable, the post-hoc test compares the difference in means of the outcome variable (default: NPX) for a 1 standard deviation difference in the numerical variable, i.e. mean NPX at mean(numerical variable) versus mean NPX at mean(numerical variable) + 1*SD(numerical variable). The output tibble is arranged by ascending adjusted p-values.
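The same kind of computation (estimated marginal means from a linear mixed model, all pairwise differences, Satterthwaite degrees of freedom) can be sketched for one simulated assay with the lmerTest and emmeans packages. The packages, the Tukey adjustment and the data are assumptions made for illustration; this is not claimed to be how olink_lmer_posthoc is implemented.

```r
# Requires the lmerTest and emmeans packages
library(lmerTest)  # also attaches lme4; provides lmer() with Satterthwaite support
library(emmeans)

set.seed(7)

# Simulated data: 6 hypothetical subjects, each measured under 3 treatments
toy <- data.frame(
  Subject = factor(rep(paste0("P", 1:6), each = 3)),
  Treatment = factor(rep(c("Placebo", "DrugA", "DrugB"), times = 6))
)
subject_effect <- rnorm(6, sd = 0.5)
treatment_effect <- c(Placebo = 0, DrugA = 0.4, DrugB = 1)
toy$NPX <- 3 +
  subject_effect[as.integer(toy$Subject)] +
  unname(treatment_effect[as.character(toy$Treatment)]) +
  rnorm(nrow(toy), sd = 0.2)

# Random intercept per subject, fixed effect of Treatment
fit <- lmer(NPX ~ Treatment + (1 | Subject), data = toy)

# Estimated marginal means with Satterthwaite degrees of freedom,
# then all pairwise differences between Treatment levels
emm <- emmeans(fit, ~ Treatment, lmer.df = "satterthwaite")
contrasts_df <- as.data.frame(pairs(emm, adjust = "tukey"))
contrasts_df
```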

Function arguments

  • df: NPX data frame in long format. It should minimally contain protein name (Assay), OlinkID, UniProt, Panel, 1-2 variables with at least 2 levels, and a subject ID.
  • variable: Single character value or character vector. A single value should name a column in df. If length > 1, the included variable names are used in crossed analyses. The notations ‘:’ and ‘*’ are also accepted.
  • olinkid_list: Character vector of OlinkIDs on which to perform the post-hoc analysis. If not specified, all assays in df are used.
  • effect: Term on which to perform the post-hoc analysis. Character vector. Must be a subset of, or identical to, variable.
  • outcome: Name of the column from df that contains the dependent variable. Default: NPX.
  • random: Single character value or character vector naming the random effects.
  • covariates: Single character value or character vector of confounding factors to include in the analysis. Default: NULL. A single value should name a column in df. The notations ‘:’ and ‘*’ are also accepted, but crossed analysis will not be inferred from main effects.
  • mean_return: Logical. If TRUE, returns the mean of each factor level rather than the difference in means (default). Note that no p-value is returned and no adjustment is performed when mean_return = TRUE.
  • verbose: Logical. Whether to print information about removed samples, factor conversion, and the final model formula to the console. Default: TRUE.
  • check_log: Named list returned by check_npx. If NULL, check_npx is run internally.
# Linear mixed model with two variables.
lmer_results_twoway <- OlinkAnalyze::olink_lmer(
  df = npx_clean,
  variable = c("Site", "Treatment"),
  random = "Subject",
  check_log = check_npx_clean
)

# extracting the significant proteins
lmer_results_twoway_sign <- lmer_results_twoway |>
  dplyr::filter(
    .data[["Threshold"]] == "Significant" &
      .data[["term"]] == "Treatment"
  ) |>
  dplyr::pull(
    .data[["OlinkID"]]
  )

# performing post-hoc analysis
lmer_posthoc_twoway_results <- OlinkAnalyze::olink_lmer_posthoc(
  df = npx_clean,
  olinkid_list = lmer_results_twoway_sign,
  variable = c("Site", "Treatment"),
  random = "Subject",
  effect = "Treatment",
  check_log = check_npx_clean
)

Function output

A tibble with the following columns:

  • Assay <chr>: Assay name.
  • OlinkID <chr>: Unique Olink ID.
  • UniProt <chr>: UniProt ID.
  • Panel <chr>: Olink Panel.
  • term <chr>: Name of the variable used for the p-value calculation. A “:” between variables indicates an interaction between them.
  • contrast <chr>: Levels (of term) that are compared.
  • estimate <dbl>: Difference in mean NPX between the levels compared in contrast.
  • conf.low <dbl>: Lower bound of the confidence interval for the estimate.
  • conf.high <dbl>: Upper bound of the confidence interval for the estimate.
  • Adjusted_pval <dbl>: Adjusted p-value for the test (Tukey adjustment).
  • Threshold <chr>: Indicates whether the assay is significant (adjusted p-value < 0.05).

Additional Statistical Tests

Many other statistical functions can be found within Olink Analyze, including:

To learn more about these functions, consult their help documentation using the help() function.

Exploratory analysis

Visualization

Theming function (set_plot_theme)

The set_plot_theme function applies a consistent plot theme when added to a ggplot object. It is mainly used for aesthetic purposes.

OlinkAnalyze::npx_data1 |>
  dplyr::filter(
    !is.na(.data[["Treatment"]])
  ) |>
  dplyr::filter(
    .data[["OlinkID"]] == "OID01216"
  ) |>
  ggplot2::ggplot(
    ggplot2::aes(
      x = .data[["Treatment"]],
      y = .data[["NPX"]],
      fill = .data[["Treatment"]]
    )
  ) +
  ggplot2::geom_boxplot() +
  OlinkAnalyze::set_plot_theme()
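set_plot_theme is added to a plot like any ggplot2 theme. The sketch below shows that general pattern with a hypothetical helper, my_plot_theme, built from ggplot2's theme_bw and theme on the built-in mtcars dataset; it does not reproduce the actual Olink theme.

```r
library(ggplot2)

# Hypothetical theme helper (not the set_plot_theme implementation):
# a complete ggplot2 theme that can be added to a plot with +
my_plot_theme <- function(base_size = 12) {
  theme_bw(base_size = base_size) +
    theme(
      legend.position = "bottom",
      panel.grid.minor = element_blank()
    )
}

# Usage on the built-in mtcars dataset
p <- ggplot(mtcars, aes(x = factor(cyl), y = mpg, fill = factor(cyl))) +
  geom_boxplot() +
  my_plot_theme()
p
```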

Contact Us

We are always happy to help. Email us with any questions: