Bitwise OR on strings for large strings in c#

Question

I have two strings(with 1's and 0's) of equal lengths(<=500) and would like to apply Logical OR on these strings.

How should i approach on this. I'm working with c#.

When i consider the obvious solution, reading each char and applying OR | on them, I have to deal with apx, 250000 strings each with 500 length. this would kill my performance.

Performance is my main concern.

Thanks in advance!

What have you tried? What's wrong with the most obvious solution (read each character, OR it, and write it to a new string)? — Rotem, Mar 06 '16 at 12:02
How do you end up with strings in first place? If you're seeking performance, it would be good to consider whether you can use another data structure — Kevin Gosse, Mar 06 '16 at 12:25
@Rotem, Yes, i did the most obvious thing but my concern was performance. — BetterLateThanNever, Mar 06 '16 at 12:29
@Reddy For future reference, that is the kind of info you need to put in your questions. It would save currently 3 people from wasting time on giving you an answer you already have. — Rotem, Mar 06 '16 at 12:30
@Reddy Also, if the question is about performance and you're reading ~250MB from file, that code may also be relevant. It's possible the bottleneck is not where you assume it is. — Rotem, Mar 06 '16 at 12:41
It might be faster to read and manipulate the files as binary. Which character set and encoding do they use? UTF-8? Do they have characters other than 0 and 1? CR LR perhaps? How do you read them now? Where does your result go? A similar file? — Tom Blodget, Mar 06 '16 at 16:50
@TomBlodget Yes, UTF-8 is used. Only 0 and 1 are expected. read from a text file and write to a text file. — BetterLateThanNever, Mar 06 '16 at 19:48
You have, for example, 250000 files with 500 bytes each. In an uncontrolled test, it takes over 1/2 hour just to read such files from one directory! — Tom Blodget, Mar 09 '16 at 00:13

NoName · Answer 1 · 2016-03-06T12:54:28.540

4

This is fastest way:

string x="";
string y="";
StringBuilder sb = new StringBuilder(x.Length);
for (int i = 0; i < x.Length;i++ )
{
    sb.Append(x[i] == '1' || y[i] == '1' ? '1' : '0');
}
string result = sb.ToString();

edited Mar 06 '16 at 12:54

answered Mar 06 '16 at 12:06

NoName

7,940
13
56
108

Thank you for the code snippet. Can i use "|" logical OR by any chance? – BetterLateThanNever Mar 06 '16 at 12:12
@Reddy logical or || is defined for booleans not for characters – Giorgi Moniava Mar 06 '16 at 12:14
@Reddy `|` use only for bolean and bitwise number, not for character and string. – NoName Mar 06 '16 at 12:16
@Reddy then you can convert to number or bool first then apply bitwise, but it is much slower than above code. – NoName Mar 06 '16 at 12:18
I’m confused. The logic in this answer does implement a logical OR on characters. You cannot use the `|` operator here, but assuming that those characters are either `0` or `1`, the logic reflects it properly. – poke Mar 06 '16 at 12:34
@Reddy if you have strings and you cant change that i believe this is the fastest way possible – M.kazem Akhgary Mar 06 '16 at 12:38
1

@M.kazemAkhgary Aside from this answer + parallel processing of the strings. – Rotem Mar 06 '16 at 12:40
@M.kazemAkhgary I just change from `&&` to `||`, make it even faster. – NoName Mar 06 '16 at 12:55
@Rotem I just change from `&&` to `||`, make it even faster. – NoName Mar 06 '16 at 12:56
1

@Sakura That depends if the input contains more `0` or more `1`. If both strings are all zeroes, using `||` would be slower. – Rotem Mar 06 '16 at 13:05
@Sakura Also, from my tests, using a `char[]` is faster than using a `StringBuilder` in this case. – Rotem Mar 06 '16 at 13:12
@Rotem that info suprise me! Can you give a good link about that? – NoName Mar 06 '16 at 13:32
@Sakura https://blogs.msdn.microsoft.com/cisg/2008/09/09/performance-analysis-reveals-char-array-is-better-than-stringbuilder/ – Rotem Mar 06 '16 at 13:36
@Rotem sorry, I mean the info: `That depends if the input contains more 0 or more 1. If both strings are all zeroes, using || would be slower` – NoName Mar 06 '16 at 13:38
Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/105484/discussion-between-rotem-and-sakura). – Rotem Mar 06 '16 at 13:40

Gediminas Masaitis · Answer 2 · 2016-03-06T13:21:25.297

Since it was mentioned that speed is a big factor, it would be best to use bit-wise operations.

Take a look at an ASCII table:

The character '0' is 0x30, or 00110000 in binary.
The character '1' is 0x31, or 00110001 in binary.

Only the last bit of the character is different. As such - we can safely say that performing a bitwise OR on the characters themselves will produce the correct character.

Another important thing we can do is do to optimize speed is to use a StringBuilder, initialized to the initial capacity of our string. Or even better: we can reuse our StringBuilder for multiple operations, although we have to ensure the StringBuilder has enough capacity.

With those optimizations considered, we can make this method:

string BinaryStringBitwiseOR(string a, string b, StringBuilder stringBuilder = null)
{
    if (a.Length != b.Length)
    {
        throw new ArgumentException("The length of given string parameters didn't match");
    }

    if (stringBuilder == null)
    {
        stringBuilder = new StringBuilder(a.Length);
    }
    else
    {
        stringBuilder.Clear().EnsureCapacity(a.Length);
    }

    for (int i = 0; i < a.Length; i++)
    {
        stringBuilder.Append((char)(a[i] | b[i]));
    }
    return stringBuilder.ToString();
}

Note that this will work for all bit-wise operations you would like to perform on your strings, you only have to modify the | operator.

A `string` is a counted sequence of Unicode/UTF-16 code units (`char`) so you'd better look at a [Unicode table](http://www.unicode.org/charts/nameslist/index.html) and understand the UTF-16 encoding instead of looking at an ASCII table. In this case, the Unicode codepoints "0" and "1" each have one UTF-16 code unit, [U+0030](http://www.fileformat.info/info/unicode/char/0030/index.htm) -> 0x0030 and [U+0031](http://www.fileformat.info/info/unicode/char/0031/index.htm) -> 0x0031. — Tom Blodget, Mar 06 '16 at 16:46

Giorgi Moniava · Answer 3 · 2016-03-06T13:28:21.353

2

I have two strings(with 1's and 0's) of equal lengths(<=500) and would like to apply Logical OR on these strings.

You can write a custom logical OR operator or function which takes two characters as input and produces result (e.g. if at least one of input character is '1' return '1' - otherwise return '0'). Apply this function to each character in your strings.

You can also look at this approach. You'd first need to convert each character to boolean (e.g. '1' corresponds to true), perform OR operation between two boolean values, convert back result to character '0' or '1' - depending if result of logical OR was false or true respectively. Then just append each result of this operation to each other.

edited Mar 06 '16 at 13:28

answered Mar 06 '16 at 12:02

Giorgi Moniava

27,046
9
53
90

I have to deal with apx, 250000 strings each with 500 length. this would kill my performance. – BetterLateThanNever Mar 06 '16 at 12:05
3

@Reddy You should edit your original question and include that information - It's very important to know that you need performance and you're going to run this 250,000 times instead of just one. – Gediminas Masaitis Mar 06 '16 at 12:31
1

@Reddy can you change your data structure? if you can and want performance dont use string. use `int` array with bit-mask algorithm or use `BitArray` – M.kazem Akhgary Mar 06 '16 at 12:33

score 2 · Answer 4 · answered Mar 06 '16 at 13:19

2

I've found this to be faster than all proposed solutions. It combines elements from @Gediminas and @Sakura's answers, but uses a pre-initialized char[] rather than a StringBuilder.

While StringBuilder is efficient at memory management, each Append operation requires some bookkeeping of the marker, and performs more actions than only an index into an array.

string x = ...
string y = ...
char[] c = new char[x.Length];
for (int i = 0; i < x.Length; i++)
{
    c[i] = (char)(x[i] | y[i]);
}
string result = new string(c);

answered Mar 06 '16 at 13:19

Rotem

21,452
6
62
109

Yes, this is the best. What about unsafe code? can it be faster? – NoName Mar 06 '16 at 13:23
From my test, unsafe code can be faster if we don't need to convert back to `string` in the end. Otherwise it is slower, with the unamanged -> managed conversion to `string` taking up half the time. – Rotem Mar 06 '16 at 13:40
From one of asker's comment, he read string from file. For specific case, he may read directly to byte array and perfom bitwise on them. – NoName Mar 06 '16 at 14:09

m1o2 · Answer 5 · 2016-03-06T12:27:55.067

1

You can use a Linq query to zip and then aggregate the results:

var a = "110010";    
var b = "001110";  
var result = a.Zip(b, (i, j) => i == '1' || j == '1' ? '1' : '0')
              .Select(i => i + "").Aggregate((i, j) => i + j);

Basically, the Zip extension method, takes two sequences and apply an action on each corresponding elements of the two sequences. Then I use Select to cast from char to String and finally I aggregate the results from a sequence of strings (of "0" and "1") to a String.

edited Mar 06 '16 at 12:27

answered Mar 06 '16 at 12:20

m1o2

1,549
2
17
27

Bitwise need speed. This way is slow. – NoName Mar 06 '16 at 12:21
@Sakura Do you need bitwise operations on the bytes level of the chars? Well, I'm not sure it will be straight forward since you have to deal with Unicode/Ascii/UTF-8, UTF-16, etc... Are you using a specific formatting/culture? – m1o2 Mar 06 '16 at 12:25
Do we have some kind of bit wise operation on string ?? – BetterLateThanNever Mar 06 '16 at 12:33
@Reddy well, we can't perform bit wise operations on a whole string, but we can on each character of a string. Each character is represented by one or more bytes (depending on the encoding - ASCII/Unicode and then whether it is UTF-8, UTF-16, etc...). So in theory we could perform bit wise operations on the character's bytes. But then we should know the exact encoding so we could mask only the relevant bits that represents the character (the others are for the encoding). – m1o2 Mar 06 '16 at 22:00

Bitwise OR on strings for large strings in c#

5 Answers5