
Basics

Note: you should read vignette("altrepr") before this one.

Another built-in ALTREP class in R is the “Deferred String”, which R instantiates when you convert a double vector to a character vector¹:

library(altrepr)
x <- as.character(1)
is_altrep(x)
#> [1] TRUE
alt_classname(x)
#> [1] "deferred_string"

Or when you convert an integer vector to a character vector:

1L |> as.character() |> alt_classname()
#> [1] "deferred_string"

Interestingly, the same is not true of logical vectors:

TRUE |> as.character() |> is_altrep()
#> [1] FALSE
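If altrepr isn't installed, a rough equivalent of these checks can be built from base R's internal object inspector, `.Internal(inspect())`, which prints the same `<deferred string conversion>` marker that appears in the alt_inspect() output further down. This is a sketch under assumptions: `is_deferred_string()` is a hypothetical helper, and it relies on the inspector's printed text, which is undocumented and may change between R versions.

```r
# Hypothetical helper: detect a deferred string by scanning the text printed by
# R's internal object inspector. That output format is not a stable API.
is_deferred_string <- function(v) {
  out <- utils::capture.output(.Internal(inspect(v)))
  any(grepl("<deferred string conversion>", out, fixed = TRUE))
}

is_deferred_string(as.character(1))     # from a double
#> [1] TRUE
is_deferred_string(as.character(1L))    # from an integer
#> [1] TRUE
is_deferred_string(as.character(TRUE))  # from a logical
#> [1] FALSE
```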

Memory Saving

The purpose of this deferred conversion is probably to save memory: in most cases it’s smaller to store the original numeric vector as-is and convert entries to character only on demand.

x <- as.character(1:10)
lobstr::obj_size(x)
#> 1.32 kB

As with compact sequences, we can convert the ALTREP object to the standard representation using []:

y <- x[]
is_altrep(y)
#> [1] FALSE
lobstr::obj_size(y)
#> 736 B

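At 10 entries, the deferred ALTREP representation is actually a bit larger than the standard representation, but for longer vectors it is vastly more efficient. Because the deferred string here wraps a compact integer sequence, its reported size doesn’t grow with the length of the vector at all: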

x <- as.character(1:1000)
lobstr::obj_size(x)
#> 1.32 kB
x[] |> lobstr::obj_size()
#> 64.05 kB
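The saving follows from how R stores each type: a double vector needs 8 bytes per element, while a character vector needs a pointer per element plus a separate string object (a CHARSXP) for every distinct value. As a rough base-R illustration, independent of altrepr and lobstr, the sketch below compares `utils::object.size()` for an ordinary double vector and an ordinary character copy of it. The exact ratio depends on the platform and R version, so no figure is shown.

```r
# Compare an ordinary double vector with an ordinary character vector holding
# the same values. utils::object.size() gives only a rough estimate.
set.seed(1)
nums <- runif(1000)   # 1000 doubles: 8 bytes each, plus a vector header
strs <- paste0(nums)  # paste0() returns a regular (non-deferred) character vector
num_size <- as.numeric(utils::object.size(nums))
str_size <- as.numeric(utils::object.size(strs))
str_size / num_size   # how many times larger the character version is
```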

Data

The first data slot (data1) of a deferred string is basically just a copy of the original numeric vector that the deferred string was initialised from². Unfortunately, for some reason (a bug?), we can’t interrogate this vector directly without crashing the R interpreter.

The best we can do is inspect the deferred string:

x <- as.character(1:5)
alt_inspect(x)
#> @55fc44312200 16 STRSXP g0c0 [REF(3)]   <deferred string conversion>
#>   @55fc443122a8 13 INTSXP g0c0 [REF(65535)]  1 : 5 (compact)

The top line shows that x itself is a deferred string, and the second line shows that it wraps an integer vector (INTSXP), as explained above. That integer vector is itself a compact sequence, as the “(compact)” label indicates.
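The same two-line structure can be seen with base R alone. As a hedged sketch, the hypothetical helper `wrapped_line()` below returns the second line of `.Internal(inspect())` output, which for a deferred string describes the wrapped numeric vector. It assumes the inspector's undocumented text layout stays as it is in current R versions.

```r
# Hypothetical helper: return the line of inspector output that describes the
# numeric vector wrapped by a deferred string (the second printed line).
wrapped_line <- function(v) {
  out <- utils::capture.output(.Internal(inspect(v)))
  out[2]
}

wrapped_line(as.character(1:5))          # an INTSXP line (a compact sequence)
wrapped_line(as.character(c(1.5, 2.5)))  # a REALSXP line
```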

data2 is a bit easier to interrogate. It’s simply a character vector that caches the “true” results of the conversion. At this point it is still NULL:

alt_data2(x)
#> NULL

According to the C code, each entry of data2 starts out as NULL and is “expanded” to its final value on demand, only when it is needed³. However, this is difficult to demonstrate, because accessing single elements doesn’t seem to expand them as we might expect.
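The on-demand expansion described above can be sketched in plain R. This is a toy model under stated assumptions, not R's actual C implementation: `toy_deferred()` and `toy_elt()` are hypothetical helpers, an environment stands in for the ALTREP object so it can be modified in place, and a list stands in for data2 because an R character vector can't hold NULL entries.

```r
# Toy model of a deferred string (hypothetical helpers, not part of R or altrepr).
toy_deferred <- function(v) {
  obj <- new.env()
  obj$data1 <- v     # the original numeric vector
  obj$data2 <- NULL  # the cache, allocated on first access
  obj
}

# Return element i, converting it only if it hasn't been converted yet.
toy_elt <- function(obj, i) {
  if (is.null(obj$data2)) {
    # One slot per element; NULL marks "not yet expanded".
    obj$data2 <- vector("list", length(obj$data1))
  }
  if (is.null(obj$data2[[i]])) {
    obj$data2[[i]] <- as.character(obj$data1[[i]])
  }
  obj$data2[[i]]
}

toy <- toy_deferred(1:5)
toy_elt(toy, 3)
#> [1] "3"
# Only the third slot has been filled in:
vapply(toy$data2, is.null, logical(1))
#> [1]  TRUE  TRUE FALSE  TRUE  TRUE
```

In this model only the requested slot is ever converted, which is where a memory saving would come from.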

It is easy, though, to see how replacing the character vector in data2 changes the elements that x reports:

set_alt_data2(x, LETTERS[1:5])
x
#> [1] "A" "B" "C" "D" "E"

Compact and Expanded

Like compact sequences, deferred strings are considered to be either compact or expanded. We can expand the string and check its state using the corresponding utility functions, such as deferred_is_expanded():

x <- as.character(1:5)
deferred_is_expanded(x)
#> [1] FALSE

However, unlike compact sequences, what decides whether a deferred string is expanded isn’t the amount of data in data2. It’s actually the absence of data in data1: setting data1 to NULL is enough for R to treat the string as expanded.

x <- as.character(1:5)
set_alt_data1(x, NULL)
alt_inspect(x)
#> @55fc427853d8 16 STRSXP g0c0 [REF(4)]   <expanded string conversion>
#>   @55fc3e8dde80 00 NILSXP g1c0 [MARK,REF(65535)]
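The rule that expansion is decided by data1, not data2, can also be modelled in plain R. As far as we can tell from R's C source, a full expansion converts every entry into data2 and then sets data1 to NULL; the toy below mimics that under the same assumptions as any sketch here: `toy_deferred()`, `toy_expand()` and `toy_is_expanded()` are hypothetical helpers, and an environment stands in for the ALTREP object.

```r
# Toy model of expanding a deferred string (hypothetical helpers, not part of R
# or altrepr). An environment stands in for the ALTREP object.
toy_deferred <- function(v) {
  obj <- new.env()
  obj$data1 <- v     # the original numeric vector; NULL once expanded
  obj$data2 <- NULL  # the converted strings
  obj
}

# Convert every entry into data2, then drop data1. Simplification: any entries
# that were already cached are simply recomputed.
toy_expand <- function(obj) {
  if (!is.null(obj$data1)) {
    obj$data2 <- vapply(obj$data1, as.character, character(1))
    obj$data1 <- NULL
  }
  invisible(obj)
}

# As described above for real deferred strings, "expanded" depends only on
# data1 being absent.
toy_is_expanded <- function(obj) is.null(obj$data1)

toy <- toy_deferred(1:3)
toy_is_expanded(toy)
#> [1] FALSE
toy_expand(toy)
toy_is_expanded(toy)
#> [1] TRUE
toy$data2
#> [1] "1" "2" "3"

# Dropping data1 alone (like set_alt_data1(x, NULL)) also counts as expanded,
# even though nothing was ever converted into data2.
toy2 <- toy_deferred(4:6)
toy2$data1 <- NULL
toy_is_expanded(toy2)
#> [1] TRUE
```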