12

Now I am using such a method:

let x_rev = new string(x.Reverse().ToArray())
moinudin
  • 134,091
  • 45
  • 190
  • 216
AndrewShmig
  • 4,843
  • 6
  • 39
  • 68

5 Answers5

12

Here's some code based on Timwi's comment on Nate's answer. There are single logical letters (as displayed on screen) that are made up of more than one actual character. Reversing the order of the characters turns these letters into gibberish.

Timwi helpfully points out that the framework provides a TextElementEnumerator that works in terms of logical text elements rather than characters, and handles these multi-character letters correctly. I hadn't heard of this class before, so I wrote some code that uses a TextElementEnumerator to reverse a string correctly, and compare the results to a naive string reversal.

open System
open System.Globalization

// five characters that render as three letters: "àÆ"
let myString = "\uD800\uDC00\u0061\u0300\u00C6"

// naive reversal scrambles the string: "Æ̀a��"
// (note that while this produces the wrong results, 
//  it probably is faster on large strings than using
//  LINQ extension methods, which are also wrong)
let naive = String(myString.ToCharArray() |> Array.rev)

// use a TextElementEnumerator to create a seq<string> 
// that handles multi-character text elements properly
let elements = 
    seq {
        let tee = StringInfo.GetTextElementEnumerator(myString)
        while tee.MoveNext() do 
            yield tee.GetTextElement() 
    }

// properly reversed: "Æà"
let reversed = elements |> Array.ofSeq |> Array.rev |> String.concat ""
Joel Mueller
  • 28,324
  • 9
  • 63
  • 88
  • thanks for the second method... its the most beautiful and simple way I think: let naive = String(myString.ToCharArray() |> Array.rev) – AndrewShmig Dec 29 '10 at 19:53
  • 1
    check out my latest solution, I found another way of avoiding the `TextElementEnumerator` which this time doesn't result in a performance hit. – Stephen Swensen Jan 01 '11 at 21:37
6

My answer is based on @Joel's answer which was in turn based on @Timwi's answer. I present it as the most beautiful and simple correct answer, though certainly not the best performing (the fold using + kills that; but the main beautifying improvement is using ParseCombiningCharacters and GetNextTextElement instead of that sanity testing TextElementEnumerator. And adding Reverse as an extension of String is nice too):

open System
open System.Globalization

type String with
    member self.Reverse() = 
        StringInfo.ParseCombiningCharacters(self)
        |> Seq.map (fun i -> StringInfo.GetNextTextElement(self, i))
        |> Seq.fold (fun acc s -> s + acc ) ""

Usage:

> "\uD800\uDC00\u0061\u0300\u00C6".Reverse();;
val it : string = "Æà"

Edit:

I dreamed up this novel variation as well on the car-ride home, which likely performs better since we use String.concat. The type extension is omitted:

let rev str =
    StringInfo.ParseCombiningCharacters(str) 
    |> Array.rev
    |> Seq.map (fun i -> StringInfo.GetNextTextElement(str, i))
    |> String.concat ""

Edit (best solution so far):

This solution uses yet another StringInfo method for enumerating text elements which again avoids using the unpleasant to work with TextElementEnumerator but doesn't result in twice as many calls to the internal StringInfo.GetCurrentTextElementLen like the previous solution. I also use in-place array reversal this time which results in a notable performance improvement.

let rev str =
    let si = StringInfo(str)
    let teArr = Array.init si.LengthInTextElements (fun i -> si.SubstringByTextElements(i,1))
    Array.Reverse(teArr) //in-place reversal better performance than Array.rev
    String.Join("", teArr)

The above solution is basically equivalent to the following (which I worked up in hopes that we could squeak out a bit more performance, but I can measure no significant difference):

let rev str =
    let ccIndices = StringInfo.ParseCombiningCharacters(str)
    let teArr = 
        Array.init 
            ccIndices.Length 
            (fun i -> 
                if i = ccIndices.Length-1 then
                    str.Substring(i)
                else
                    let startIndex = ccIndices.[i]
                    str.Substring(startIndex, ccIndices.[i+1] - startIndex))
    Array.Reverse(teArr) //in-place reversal better performance than Array.rev
    String.Join("", teArr)
Stephen Swensen
  • 22,107
  • 9
  • 81
  • 136
  • Your second version is very clever! My cursory tests show no significant performance difference between that version and one that uses `TextElementEnumerator` with a sequence expression. I think I prefer your approach. – Joel Mueller Dec 30 '10 at 01:10
  • I took a look with reflector, and I believe that your approach ends up calling the internal method `StringInfo.GetCurrentTextElementLen` twice as many times as `TextElementEnumerator` does. If performance is critical and strings are large, some variation using the enumerator will probably do best. – Joel Mueller Dec 30 '10 at 01:19
  • Hi @Joel, thanks for reflecting that, I was curious how repeated calls to `GetNextTextElement` would compare to using the `TextElementEnumerator`. It's just as well for `TextElementEnumerator` to perform better, creating a `String` extension member, perhaps a getter property called `TextElements`, which returns a sequence like your `elements` implementation would be easiest to work with in the long run. – Stephen Swensen Dec 30 '10 at 01:32
  • That's at least quadratic time when it could be linear though, right? – J D Dec 31 '10 at 16:14
  • @Jon, I don't think so: the internal calls to `StringInfo.GetCurrentTextElementLen` appear to be O(1) therefore both @Joel's and my optimal solutions are linear. I performed some benchmarks with Unicode strings of (char) length 4, 8, 12, and 16 million which show linear time. – Stephen Swensen Dec 31 '10 at 17:30
  • I mean the `+` in the `fold` in the first version. For 10k, 20k, 40k and 80k strings I get 0.46s, 1.45s, 4.22s and 146s. – J D Dec 31 '10 at 22:39
  • @Jon, ah yeah, definitely. That was just a toy solution, I would never use it in production :). I was thinking of using a `StringBuilder` in the fold with Inserts at 0, but I'm not sure that would be much better since the internal array would need to be rebuilt each time. Alas, some variation using an in-place array reversal would be best here. – Stephen Swensen Jan 01 '11 at 01:25
5

I can't believe that nobody here is providing a generic solution to this!

Generic reverse with O(n) runtime.

Then, just use:

 let rec revAcc xs acc =
    match xs with
    | [] -> acc
    | h::t -> revAcc t (h::acc)

 let rev xs =
    match xs with
    | [] -> xs
    | [_] -> xs
    | h1::h2::t -> revAcc t [h2;h1] 

 let newValues = 
    values
    |> Seq.toList 
    |> rev
    |> List.toSeq

 newValues

That's what F# is all about!

Community
  • 1
  • 1
devinbost
  • 4,658
  • 2
  • 44
  • 57
2

If what you are doing is from MSDN on Enumerable.Reverse() then you've probably got the most simple solution.

If you're not using .NET 3.5 (read LINQ (not sure if F# was around before then anyway)) you could use the Array.Reverse() method, however, the resulting code is very much the same.

Suffice it to say, what you have is the most elegant way I can come up with to reverse a string, I have used Enumerable.Reverse() many times to reverse order of strings in my projects. Obviously, if the String constructor took an IEnumerable<Char> we could skip the .ToArray() bit which would in my opinion make the code a bit better, but as it stands, an extra .ToArray() isn't all that bad.

If you really wanted, you could write an extension method in C# and add a reference to that library in your F# project, that C# extension method would look something like this:

public static String ReverseString(this String toReverse)
{
    return new String(toReverse.Reverse().ToArray());
}

That adds an extra dependency who's only real benefit is making your F# code a bit more simple, if you're reversing strings all over the place, it might be worth it, otherwise, I'd just wrap up what you've got in a normal F# method and use it that way.

Although, someone much smarter than I may have a more beautiful way to do it.

Nate
  • 30,286
  • 23
  • 113
  • 184
  • 3
    Observe that this is actually subtly wrong: strings in .NET are encoded in UTF-16, meaning that if you have any escape sequences (known as surrogate pairs) your reversed string will be incorrect. – Roman Starkov Dec 29 '10 at 17:28
  • @romkyns Thats true, though for typical use cases, this is good enough, provided you know that it wont work when you encounter surrogate pairs, or characters that cannot be represented using a single byte/char. – Nate Dec 29 '10 at 17:33
  • 2
    That depends, yeah. It's because ASCII was deemed "good enough" that lots of people outside the US still suffer poor language support. With UTF-16 it's worse because this time the majority of the 1st world is just fine, and programs break only in obscure countries. So yeah, it's good enough for an internal app, but doesn't cut it for something like an open-source media player. – Roman Starkov Dec 29 '10 at 17:35
  • I wouldn't worry about surrogate pairs, until somebody files a bug report, but I digress. Point is that my answer works exactly like the OPs code, so I made the assumption they are also not worried about surrogate pairs. – Nate Dec 29 '10 at 17:37
  • Fair enough, I appreciate when answerers don't question too much when I ask how to do something weird :) – Roman Starkov Dec 29 '10 at 17:40
  • 3
    @Nate, @romkyns: Actually, surrogate pairs are not your only problem. A much more common case that this would break is combining character sequences. Fortunately, there is [TextElementEnumerator](http://msdn.microsoft.com/en-us/library/system.globalization.textelementenumerator.aspx) to help you out, which takes care of that *and* surrogate pairs. Please use it! – Timwi Dec 29 '10 at 17:46
  • @Timwi makes a very good point, but if you're going to use this approach, it's much faster to use `myString.ToCharArray() |> Array.rev`. The ToCharArray() method simply returns the internal character array inside the string object, so it's very fast, and reversing an array is faster than reversing a sequence, as the sequence must first be turned into an array, and that's a waste of time when we can start with an array instead. – Joel Mueller Dec 29 '10 at 18:28
  • 2
    @romkyns: Comply or reinvent. Most developers will just reverse the characters. If unicode breaks then the fault lies with unicode for all practical purposes. You cannot expect most developers to learn about unicode's obscure-language-specific quirks. Think about the ROI for them... – J D Dec 31 '10 at 16:24
  • @Jon Perfect abstractions are great, but come on! They are a dream that has never existed! Is it also NTFS junctions' fault that a naive recursive file search might never end?... – Roman Starkov Dec 31 '10 at 17:17
  • 1
    @romkyns: I share your ideals but you're fighting a losing battle trying to get ordinary devs to support extra-ordinary features for no significant recompense. I think it is (literally) more likely that another billion people will learn English before most devs start handling multi-character Unicode glyphs correctly. – J D Dec 31 '10 at 23:05
  • @Jon, @romkyns: just a thought on your recent comments: most devs *shouldn't have to* think about multi-character Unicode glyphs. Many developers may need to reverse a string, only one has to think about how to do it correctly and efficiently. Designing and implementing libraries and frameworks is a different kind of development than building business applications, for example. – Stephen Swensen Jan 01 '11 at 19:16
  • @Stephen: Absolutely. The representation must make obscure language features as compliant with the mainstream as possible if they are going to be handled correctly by most real code. – J D Jan 01 '11 at 20:21
1

Combining the best of the previous answers, with a little update:

module String =
  open System.Globalization
  let rev s =
    seq {
      let rator = StringInfo.GetTextElementEnumerator(s)
      while rator.MoveNext() do
        yield rator.GetTextElement()
    }
    |> Array.ofSeq
    |> Array.rev
    |> String.concat ""

String.rev "\uD800\uDC00\u0061\u0300\u00C6"
Markus Jarderot
  • 86,735
  • 21
  • 136
  • 138