
I have multiple strings (so-called DOIs) like this:

doi1 <- "10.1057/bp.2009.9"
doi2 <- "10.1057/bp.2015.4"
doi3 <- "10.1057/bp.2008.12"

How do I best extract the common beginnings of the strings?

The correct output should be 10.1057/bp.20.

(My first guess was to use identical(), but that function can only compare two whole strings, not their prefixes.)

Konrad Rudolph
anpami

1 Answer


The Bioconductor package ‘Biobase’ has this implemented as lcPrefix.

But implementing this oneself isn’t hard; here’s another quick and dirty version (careful, this was only tested on a handful of cases):

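find_longest_prefix = function (strings) {
    stopifnot(is.character(strings) && length(strings) > 0L)

    # Longest prefix length confirmed so far; stays 0 if strings[1L] is empty,
    # in which case the loop body never runs.
    prefix_len = 0L
    for (len in seq_len(nchar(strings[1L]))) {
        prefixes = substr(strings, 1L, len)
        # Stop at the first length where some prefix differs from the first string's.
        if (! all(prefixes == prefixes[1L])) break
        prefix_len = len
    }
    substr(strings[1L], 1L, prefix_len)
}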
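For comparison, here is a vectorised sketch of the same idea, applied to the three DOIs from the question. The name common_prefix is made up for this illustration (it is not part of Biobase or base R), and like the version above it has only been checked on a few inputs:

```r
# Hypothetical helper for illustration: longest common prefix of a
# character vector, computed column-wise on a character matrix.
common_prefix <- function(strings) {
    stopifnot(is.character(strings), length(strings) > 0L)

    # Only the first min(nchar) characters can be shared by every string.
    max_len <- min(nchar(strings))
    if (max_len == 0L) return("")

    # One row per string, one column per character position.
    chars <- do.call(rbind, strsplit(substr(strings, 1L, max_len), ""))

    # A column matches when every row agrees with the first row.
    matches <- apply(chars, 2L, function(col) all(col == col[1L]))

    # cumprod() turns everything after the first mismatch into 0,
    # so the sum is the length of the leading run of matches.
    prefix_len <- sum(cumprod(matches))
    substr(strings[1L], 1L, prefix_len)
}

dois <- c("10.1057/bp.2009.9", "10.1057/bp.2015.4", "10.1057/bp.2008.12")
result <- common_prefix(dois)
print(result)  # "10.1057/bp.20"
```

This trades the explicit loop for a matrix of single characters, which is convenient for short strings such as DOIs but allocates more memory on long inputs.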
Konrad Rudolph