---
title: "How a5R stores cell IDs without strings"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{How a5R stores cell IDs without strings}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

## The problem

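An A5 cell ID is a 64-bit unsigned integer (`u64`).
R has no native `u64` type. Its integers are 32-bit signed, covering
`-(2^31 - 1)` to `2^31 - 1` (the remaining bit pattern, `-2^31`, is reserved
for `NA_integer_`), and its doubles are 64-bit IEEE 754 floating point. A
`double` represents every integer exactly only up to `2^53`, while a `u64`
can go up to `2^64 - 1`, so storing a cell ID in a `double` can silently drop
its low-order bits.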
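A minimal Rust sketch of the precision problem (the helper name is hypothetical, for illustration only). Casting a `u64` to `f64`, the same format as an R `double`, keeps integers up to `2^53` intact but rounds larger values to the nearest representable double:

```rust
/// Hypothetical helper: round-trip a u64 through f64 (an R double).
/// u64 -> f64 rounds to the nearest representable double (ties to even);
/// f64 -> u64 then converts that already-integral value back exactly.
fn round_trip_through_f64(x: u64) -> u64 {
    x as f64 as u64
}
```

For example, `round_trip_through_f64(0x0800_0000_0000_0006)` returns `0x0800_0000_0000_0000`. Near `2^59`, adjacent doubles are 128 apart, so the trailing `6` is rounded away and the result is a different cell.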

The obvious workaround is to store cell IDs as hex strings
(`"0800000000000006"`). This works, but every trip across the R--Rust
boundary then requires hex parsing on the way in and hex formatting on the
way out. That O(n) string parsing and allocation dominates the cost of
lightweight operations such as `a5_get_resolution()` or
`a5_cell_to_parent()`.
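For contrast, here is a minimal Rust sketch of what a string-based boundary would have to do on every call. The function names are hypothetical and this is not a5R's code: one parse per element going in, one freshly allocated `String` per element coming out.

```rust
use std::num::ParseIntError;

/// Hypothetical string path, inbound: one from_str_radix parse per element.
fn hex_to_ids(hex: &[String]) -> Result<Vec<u64>, ParseIntError> {
    hex.iter().map(|s| u64::from_str_radix(s, 16)).collect()
}

/// Hypothetical string path, outbound: one new String per element,
/// zero-padded to 16 lowercase hex digits.
fn ids_to_hex(ids: &[u64]) -> Vec<String> {
    ids.iter().map(|id| format!("{:016x}", id)).collect()
}
```

Even a trivial operation like reading a cell's resolution would pay for both conversions.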

## The solution: eight raw-byte fields

A `u64` is exactly 8 bytes. We store each byte of the little-endian
representation as a separate `raw` vector field in a vctrs record type:

```
cell_id (u64):  0x0800000000000006

little-endian bytes:
  b1 = 0x06, b2 = 0x00, b3 = 0x00, b4 = 0x00,
  b5 = 0x00, b6 = 0x00, b7 = 0x00, b8 = 0x08
```
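The layout above can be checked with a short Rust sketch (helper names are hypothetical). Byte `k` of the little-endian representation is the value shifted right by `8 * k` bits, so `b1` holds the least significant byte and `b8` the most significant. The result matches the standard library's `u64::to_le_bytes()`.

```rust
/// Byte k of the little-endian layout (k = 0 is b1, k = 7 is b8),
/// extracted by shifting instead of calling to_le_bytes().
fn le_byte(cell: u64, k: u32) -> u8 {
    ((cell >> (8 * k)) & 0xFF) as u8
}

/// All eight bytes in field order b1..b8; identical to cell.to_le_bytes().
fn le_bytes(cell: u64) -> [u8; 8] {
    let mut out = [0u8; 8];
    for k in 0..8u32 {
        out[k as usize] = le_byte(cell, k);
    }
    out
}
```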

This is lossless: the eight bytes hold exactly the same bits as the original
`u64`, split across eight `raw` vectors (each one contiguous in memory). There
is no precision loss and no special-case handling. On the Rust side, element
`i` is rebuilt by gathering byte `i` from each of the eight field slices and
making a single `u64::from_le_bytes()` call. Because the record holds only
plain R vectors and no external pointers, an `a5_cell` object can be written
to disk with `saveRDS()` or `save()` and read back without any custom
serialization.
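A minimal Rust sketch of that per-element reconstruction, assuming the eight fields have already been borrowed from R as byte slices in order `b1`..`b8`. The helper name `cell_at` is hypothetical, and how a5R actually obtains the slices from R is not shown here.

```rust
/// Hypothetical helper: rebuild element `i` of an a5_cell record from the
/// eight raw field slices, ordered b1..b8 (least to most significant byte).
fn cell_at(fields: &[&[u8]; 8], i: usize) -> u64 {
    let mut bytes = [0u8; 8];
    // Copy byte i of each field into its little-endian position.
    for (slot, field) in bytes.iter_mut().zip(fields.iter()) {
        *slot = field[i];
    }
    u64::from_le_bytes(bytes)
}
```

The slices are only read, never copied as a whole, so the cost per element is eight byte loads and one `from_le_bytes()`.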

## R-side: a vctrs record type

On the R side, `a5_cell` is a **vctrs record** (`vctrs::new_rcrd()`) with
eight fields (`b1` through `b8`):

```{r}
library(a5R)
cell <- a5_lonlat_to_cell(-3.19, 55.95, resolution = 10)
vctrs::field(cell, "b1")
vctrs::field(cell, "b8")
```

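Each field is a plain `raw` vector: one contiguous block of memory with
no per-element overhead. Subsetting and combining are handled by vctrs.
Because `raw` vectors have no `NA` value of their own, missing cells are
encoded with a sentinel ID instead (see the NA handling section below).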
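In the other direction, results computed in Rust have to be split back into one byte column per field before they become an `a5_cell` record. Below is a minimal sketch of that "struct of arrays" transposition. The name `ids_to_fields` is hypothetical, and copying the columns into R `raw` vectors is left out because it depends on the R binding layer.

```rust
/// Hypothetical helper: split cell IDs into eight byte columns,
/// one per record field (b1..b8), ready to be copied into R raw vectors.
fn ids_to_fields(ids: &[u64]) -> [Vec<u8>; 8] {
    // [Vec<u8>; 8] implements Default: eight empty vectors.
    let mut fields: [Vec<u8>; 8] = Default::default();
    for field in fields.iter_mut() {
        field.reserve(ids.len());
    }
    for &id in ids {
        let bytes = id.to_le_bytes();
        // Byte k of each ID goes to column k (b1 = least significant).
        for (field, &byte) in fields.iter_mut().zip(bytes.iter()) {
            field.push(byte);
        }
    }
    fields
}
```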

Hex strings are only produced on demand:

```{r}
# Display calls format(), which converts to hex for readability
cell

# Explicit conversion
a5_u64_to_hex(cell)

# Round-trip from hex
a5_cell("0800000000000006")
```

## Why this matters

Compare memory for one million cells:

```{r}
set.seed(42)
cells <- a5_lonlat_to_cell(
  runif(1e6, -180, 180),
  runif(1e6, -80, 80),
  resolution = 10
)

# rcrd: eight contiguous raw vectors (8 × 1 byte × 1M ≈ 7.6 MB)
format(object.size(cells), units = "MB")

# equivalent hex strings are ~81 MB: an 8-byte pointer per element plus
# a CHARSXP (header + 16 padded chars, ~80 bytes) per unique string
hex <- a5_u64_to_hex(cells)
format(object.size(hex), units = "MB")
```

## NA handling

The top 6 bits of an A5 cell ID encode one of 60 "quintants" (values 0–59).
Quintant 63 (binary `111111`) never occurs in a valid ID, so
`0xFC00000000000000` (quintant 63, all other bits zero) serves as the
sentinel for `NA`. In little-endian order the last byte, `b8`, is `0xFC`.
A valid ID's top byte is at most `0xEF` (quintant 59), so checking that
single byte is enough to detect `NA`.
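A minimal Rust sketch of the sentinel scheme, assuming a per-element byte array as input. The names `NA_SENTINEL`, `decode`, and `encode` are hypothetical, and a5R's actual check may be written differently.

```rust
/// Sentinel for NA: quintant 63 (top six bits 111111), all other bits zero.
const NA_SENTINEL: u64 = 0xFC00_0000_0000_0000;

/// Decode one element's bytes (b1..b8). Only b8 is inspected, because a
/// valid ID (quintant <= 59) never has a top byte above 0xEF.
fn decode(bytes: [u8; 8]) -> Option<u64> {
    if bytes[7] == 0xFC {
        None
    } else {
        Some(u64::from_le_bytes(bytes))
    }
}

/// Encode an optional ID back to bytes, writing the sentinel for None.
fn encode(id: Option<u64>) -> [u8; 8] {
    id.unwrap_or(NA_SENTINEL).to_le_bytes()
}
```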

On the Rust side, the sentinel is detected and mapped to `None`. Standard
R idioms work as expected:

```{r}
cells_with_na <- a5_cell(c("0800000000000006", NA))
is.na(cells_with_na)
```

## Summary

| Aspect | Hex strings | Raw bytes |
|--------|------------|-----------|
| R type | `character` vector | `vctrs_rcrd` (eight `raw` fields) |
| Memory (1M cells) | ~81 MB | ~7.6 MB |
| R-Rust crossing | O(n) hex parse/format | Zero-copy byte access |
| Human-readable | Always | On `format()` / `print()` |
| Lossless | Yes | Yes (exact byte representation) |