---
title: "Getting started with rcloner"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Getting started with rcloner}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

```{r setup}
library(rcloner)
has_rclone <- rclone_available()
if (!has_rclone) {
  message("rclone is not installed on this system. ",
          "Code chunks that require rclone are skipped. ",
          "Install with install_rclone().")
}
```

## Overview

`rcloner` provides an R interface to [rclone](https://rclone.org), a command-line program that supports over 40 cloud storage backends, including:

- **S3-compatible** stores: Amazon S3, MinIO, Ceph, Cloudflare R2, Backblaze B2, …
- **Google Cloud Storage** and Google Drive
- **Azure Blob Storage**
- **Dropbox**, **OneDrive**, **Box**, and many others

All file operations (copy, sync, list, move, delete, …) use the same consistent interface regardless of the storage backend.

## Installation

Install from CRAN:

```r
install.packages("rcloner")
```

Or the development version from GitHub:

```r
# install.packages("pak")
pak::pak("boettiger-lab/rcloner")
```

### Installing the rclone binary

`rcloner` automatically locates a system-installed rclone binary. If rclone is not already on your `PATH`, install it with:

```{r install, eval=FALSE}
install_rclone()
```

This downloads the appropriate pre-built binary for your operating system and architecture and stores it in a user-writable directory, so no system privileges are required.

Check the installed version:

```{r version, eval = has_rclone}
rclone_version()
```

## Configuring a remote

`rcloner` manages cloud storage credentials through *remotes*: named configurations stored in rclone's config file.
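Because `rcloner` drives the rclone binary, you can also define a remote without touching the config file: rclone reads environment variables of the form `RCLONE_CONFIG_<NAME>_<OPTION>` in addition to its config file. A minimal sketch (the remote name `envs3` is illustrative; this assumes your AWS credentials are already in the usual environment variables):

```{r env-config, eval=FALSE}
# Define an S3 remote named "envs3" purely via environment variables.
# rclone picks these up in addition to entries in its config file.
Sys.setenv(
  RCLONE_CONFIG_ENVS3_TYPE = "s3",
  RCLONE_CONFIG_ENVS3_PROVIDER = "AWS",
  RCLONE_CONFIG_ENVS3_ACCESS_KEY_ID = Sys.getenv("AWS_ACCESS_KEY_ID"),
  RCLONE_CONFIG_ENVS3_SECRET_ACCESS_KEY = Sys.getenv("AWS_SECRET_ACCESS_KEY")
)
# The remote can then be used like any configured remote, e.g. "envs3:my-bucket"
```

This can be convenient in CI jobs or containers where writing a config file is awkward. For interactive use, the persistent remotes created below with `rclone_config_create()` are usually simpler.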
### Amazon S3

```{r s3-config, eval=FALSE}
rclone_config_create(
  "aws",
  type = "s3",
  provider = "AWS",
  access_key_id = Sys.getenv("AWS_ACCESS_KEY_ID"),
  secret_access_key = Sys.getenv("AWS_SECRET_ACCESS_KEY"),
  region = "us-east-1"
)
```

### MinIO / S3-compatible

```{r minio-config, eval=FALSE}
rclone_config_create(
  "minio",
  type = "s3",
  provider = "Minio",
  access_key_id = Sys.getenv("MINIO_ACCESS_KEY"),
  secret_access_key = Sys.getenv("MINIO_SECRET_KEY"),
  endpoint = "https://minio.example.com"
)
```

### Listing configured remotes

```{r listremotes, eval=FALSE}
rclone_listremotes()
```

## Listing objects

`rclone_ls()` returns a data frame of objects at a given path.

### Local paths (no credentials needed)

```{r ls-local, eval = has_rclone}
# List a local directory
rclone_ls(tempdir(), files_only = TRUE)
```

### Remote paths

```{r ls-remote, eval=FALSE}
# List a bucket on a configured remote
rclone_ls("aws:my-bucket")

# Recursive listing
rclone_ls("aws:my-bucket/data/", recursive = TRUE)

# Directories only
rclone_lsd("aws:my-bucket")
```

`rclone_ls()` parses `rclone lsjson` output and returns a data frame with columns `Path`, `Name`, `Size`, `MimeType`, `ModTime`, and `IsDir`.

## Copying and syncing files

### Copy

`rclone_copy()` copies files from source to destination, skipping files that are already identical on both sides. It never deletes destination files.
```{r copy-local, eval = has_rclone}
src <- tempfile()
dest <- tempfile()
dir.create(src)
dir.create(dest)

writeLines("hello from rcloner", file.path(src, "readme.txt"))
rclone_copy(src, dest)
list.files(dest)
```

```{r cleanup-copy, echo = FALSE, eval = has_rclone}
unlink(src, recursive = TRUE)
unlink(dest, recursive = TRUE)
```

### Copy to/from the cloud

```{r copy-cloud, eval=FALSE}
# Upload a local directory to S3
rclone_copy("/local/data", "aws:my-bucket/data")

# Download a file from S3
rclone_copy("aws:my-bucket/report.csv", "/local/downloads/")

# Copy a URL directly to cloud storage (no local intermediate)
rclone_copyurl(
  "https://raw.githubusercontent.com/tidyverse/readr/main/inst/extdata/mtcars.csv",
  "aws:my-bucket/mtcars.csv"
)
```

### Sync

`rclone_sync()` makes the destination *identical* to the source, deleting destination files that are not in the source. Use with care.

```{r sync, eval=FALSE}
rclone_sync("aws:my-bucket/data", "/local/backup")
```

### Move

`rclone_move()` copies files and then deletes the source.
```{r move, eval=FALSE}
rclone_move("aws:staging/file.csv", "aws:archive/2024/file.csv")
```

## Other file operations

```{r ops, eval=FALSE}
# Read a remote file into R
contents <- rclone_cat("aws:my-bucket/config.yaml")

# Get metadata for an object
rclone_stat("aws:my-bucket/data.csv")

# Total size of a path
rclone_size("aws:my-bucket")

# Create a bucket/directory
rclone_mkdir("aws:new-bucket")

# Delete files (keeps directories)
rclone_delete("aws:my-bucket/old-data/")

# Remove a path and all its contents
rclone_purge("aws:my-bucket/scratch")

# Generate a public link (where supported)
rclone_link("aws:my-bucket/report.html")

# Get storage quota info
rclone_about("aws:")
```

## Using the low-level rclone() wrapper

Every rclone subcommand is accessible via the `rclone()` function, which accepts a character vector of arguments:

```{r lowlevel, eval=FALSE}
# Equivalent to: rclone version
rclone("version")

# Run any rclone command
rclone(c("check", "aws:bucket", "/local/backup", "--one-way"))
```

## Migrating from minioclient

If you are migrating from the `minioclient` package, the function mapping is:

| `minioclient`       | `rcloner`                |
|---------------------|--------------------------|
| `mc_alias_set()`    | `rclone_config_create()` |
| `mc_cp()`           | `rclone_copy()`          |
| `mc_mv()`           | `rclone_move()`          |
| `mc_mirror()`       | `rclone_sync()`          |
| `mc_ls()`           | `rclone_ls()`            |
| `mc_cat()`          | `rclone_cat()`           |
| `mc_mb()`           | `rclone_mkdir()`         |
| `mc_rb()`           | `rclone_purge()`         |
| `mc_rm()`           | `rclone_delete()`        |
| `mc_du()`           | `rclone_size()`          |
| `mc_stat()`         | `rclone_stat()`          |
| `mc()`              | `rclone()`               |

The main difference is that `rcloner` uses *remotes* (e.g. `"aws:bucket"`) rather than *aliases* (e.g. `"alias/bucket"`). Remote configuration is done with `rclone_config_create()` instead of `mc_alias_set()`.
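To make the remote-vs-alias difference concrete, here is one hypothetical upload written both ways (the alias `play`, remote `aws`, and bucket name are illustrative, not part of any real configuration):

```{r migrate-example, eval=FALSE}
# minioclient style: an alias created with mc_alias_set(), addressed
# with a slash-separated "alias/bucket/path":
# mc_cp("results.csv", "play/my-bucket/results.csv")

# rcloner style: a remote created with rclone_config_create(), addressed
# with a colon-separated "remote:bucket/path":
rclone_copy("results.csv", "aws:my-bucket/")
```

Apart from the path syntax, arguments generally carry over directly, so most migrations amount to renaming the function calls per the table above.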