
I have a tchar* with the string The system time has changed to ‎2018‎ - ‎09‎ - ‎06T15:13 : 52.257364700Z from ‎2018‎ - ‎09‎ - ‎06T15 : 13 : 52.257364700Z.

When I paste that string here, I see extra characters around the date values, and when I print it with wprintf I get question marks in those positions.

Is there a way to iterate through the TCHAR* and remove the non-ASCII characters?

int main() {
    const TCHAR *pText = _T("The system time has changed to ‎2018‎ - ‎09‎ - ‎06T15:13 : 52.257364700Z from ‎2018‎ - ‎09‎ - ‎06T15 : 13 : 52.257364700Z.");
    TCHAR* temp;
    temp = removet((TCHAR*)pText, _tcslen(pText)); 

    wprintf(_T("%s"), temp);
}

TCHAR* removet(TCHAR* text, int len) {
    int offset = 0;
    for (int i = 0; text[i] != 0; ++i) {

        if (text[i] > 127) {
            offset++;
        }
        if (!((i + offset) > len)) {
            wprintf(_T("%d"), i +offset);
            text[i] = text[i + offset];
        }
   }
   return text;
}

Corrected code:

int main() {
    const TCHAR *pText = _T("The system time has changed to ‎2018‎ - ‎09‎ - ‎06T15:13 : 52.257364700Z from ‎2018‎ - ‎09‎ - ‎06T15 : 13 : 52.257364700Z.");
    TCHAR* temp;
    temp = removet((TCHAR*)pText, _tcslen(pText)); 

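    _tprintf(_T("%s"), temp);  // matches TCHAR in both Unicode and ANSI builds
    delete[] temp;             // removet() allocates the result with new[]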
}

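// Returns a newly allocated copy of text without the characters above 127.
// The caller owns the result and must release it with delete[].
TCHAR* removet(const TCHAR* text, size_t len) {
    TCHAR* str2 = new TCHAR[len + 1];
    size_t out = 0;  // next write position in str2
    for (size_t i = 0; i < len; ++i) {
        // The cast makes a signed char (ANSI build) compare by its byte value.
        if (static_cast<unsigned>(text[i]) <= 127) {
            str2[out++] = text[i];
        }
    }
    str2[out] = 0;  // terminate at the shortened length
    return str2;
}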
  • Remember that pure ASCII is a *seven* bit character set. If the string is encoded in UTF-8 (which is backwards compatible with ASCII) then the "special" characters should have their high (eighth) bit set. – Some programmer dude Sep 06 '18 at 16:05
  • @Someprogrammerdude the reference to `tchar` and `wPrintf` suggests that it's UTF-16 which would make this problem trivial. Simply remove every character that's outside the range 1-127. – Mark Ransom Sep 06 '18 at 16:07
  • When I tried to iterate through the tchar* I got exceptions, though. How can I iterate through it? @MarkRansom – Thomas A. Bosler Sep 06 '18 at 16:21
  • You don't show any code so it's impossible to see what you're doing wrong. You know that you're supposed to stop iterating when you find a character with a zero value, right? – Mark Ransom Sep 06 '18 at 16:24
  • @MarkRansom I added code. I get an exception on text[i] = text[i + offset]; – Thomas A. Bosler Sep 06 '18 at 16:49
  • Your problem is that when `i` increases to the maximum, `i+offset` will be out of bounds. I see you check for that but you need to use `>=` instead of `>`. – Mark Ransom Sep 06 '18 at 16:54
  • Unrelated: `TCHAR` and friends were a portability hack to make it easier to migrate from the DOS-based early Windows versions to the NT-based Windows versions used today. Unless you have to support the likes of Windows 98, try to avoid `TCHAR` and use wide characters exclusively. – user4581301 Sep 06 '18 at 17:01
  • @MarkRansom more importantly, the code is trying to modify **read-only** data, since it is passing a pointer to a *string literal* to the `removet()` function. In order for the code to work, the data needs to be copied first, and then the copy can be modified – Remy Lebeau Sep 06 '18 at 17:07
  • @RemyLebeau that's the second time today you've caught me up, I must be asleep. Thanks. – Mark Ransom Sep 06 '18 at 17:31
  • "modify read-only data". Amp up the assistance that the compiler gives through warnings. [`/Wall`](https://msdn.microsoft.com/en-us/library/thxezb7y.aspx) then suppress individual ones as needed. – Tom Blodget Sep 06 '18 at 22:33
  • Got it to work. Thank you guys for helping! – Thomas A. Bosler Sep 07 '18 at 13:15

1 Answer


If you were using std::string rather than raw character arrays this would be easier, but you can still use some C++ features:

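#include <tchar.h>
#include <algorithm>
#include <cstdio>

int main()
{
    TCHAR test[100];  // writable buffer; a string literal must not be modified
    _tcscpy_s(test, 100, _T("test string 1235"));
    // remove_if moves the kept characters to the front and returns the new end.
    // 127 (DEL) is still ASCII, so only values above 127 are removed; the cast
    // makes a signed char (ANSI build) compare by its byte value.
    TCHAR* end = std::remove_if(test, test + _tcslen(test),
        [](TCHAR ch) { return static_cast<unsigned>(ch) > 127; });
    *end = _T('\0');
    _tprintf(_T("%s\n"), test);  // std::cout would not print a wide TCHAR string
}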

And using std::basic_string:

#include <tchar.h>
#include <algorithm>
#include <cstdio>
#include <string>

int main()
{
    std::basic_string<TCHAR> test = _T("test string 1235");
    // Erase-remove idiom: remove_if compacts the kept characters, erase trims the tail.
    auto end = std::remove_if(test.begin(), test.end(),
        [](TCHAR ch) { return static_cast<unsigned>(ch) > 127; });
    test.erase(end, test.end());
    _tprintf(_T("%s\n"), test.c_str());
}
Alan Birtles