I need a printable character that cannot occur in mobile SMS messages. The reason is that I have a file containing a bunch of data, and one of the data fields is SMS text. It is dummy data, of course.

I need to extract this field. The tool I am using asks for a field separator, which it uses to split the fields when writing a CSV file. It uses a comma as the default field separator.

Now the problem is that whenever a comma character occurs in SMS text, it separates the rest of the SMS text and makes it a separate field.

So my question is: how do I find a single character that I can use as a field separator in this case?

Shy
  • The CSV file format has a way so you can embed field delimiters as characters within fields: quote the field. `"foo, bar",42,baz,"another,comma"`. That is the proper way to do it, don't look for characters which may not be in use. – deceze Aug 31 '18 at 14:31
  • @deceze Tried that. Then whenever a `"` occurs in the text of the SMS, it considers the rest of it as a separate field – Shy Aug 31 '18 at 14:33
  • Not if you parse the CSV properly with a proper CSV parser…!? – deceze Aug 31 '18 at 14:34
  • @deceze Really? Can you give me an example of a proper CSV parser? Because I am wondering how any parser would know whether the `"` is the enclosing `"` or a part of the SMS text somebody typed. – Shy Aug 31 '18 at 14:36
  • Of course there's a way to escape `"` inside of `"` so you can use `"` *and* `,` as part of the string: https://en.wikipedia.org/wiki/Comma-separated_values#Basic_rules. You just need to *encode* it properly and then *parse* it properly. – deceze Aug 31 '18 at 14:38
  • @deceze Hey, I have no control over the input data. The SMS messages come as they are, with embedded double quotes. The input is in `pcap` format. I am using the `tshark` utility to extract fields from it and export them into the CSV format. If I had control over the input, I would sanitize it by removing problematic characters. – Shy Aug 31 '18 at 14:52
  • So the input is in a format which cannot be parsed unambiguously…?! – deceze Aug 31 '18 at 15:05
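
The quoting rule described in the comments above can be illustrated with a minimal sketch using Python's standard `csv` module. The sample rows and field layout are made up for illustration; the point is only that a field containing both `,` and `"` survives a write/read round trip when every field is quoted and embedded quotes are doubled.

```python
import csv
import io

# Hypothetical sample rows: id, SMS text, date.
rows = [
    ["1001", 'He said "hi", then left', "2018-08-31"],
    ["1002", "plain text", "2018-08-31"],
]

# Write with every field quoted; the csv module doubles embedded quotes.
buf = io.StringIO(newline="")
writer = csv.writer(buf, quoting=csv.QUOTE_ALL)
writer.writerows(rows)
encoded = buf.getvalue()
print(encoded)

# Read it back with a real CSV parser instead of splitting on commas.
parsed = list(csv.reader(io.StringIO(encoded, newline="")))
print(parsed[0][1])
```

This only helps if the tool that writes the CSV applies the same escaping; a writer that wraps fields in quotes without doubling the embedded ones produces output that no parser can read unambiguously.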

2 Answers

I think you can encode the text using Base64 before sending the SMS, and then decode it after receiving it. See: https://en.wikipedia.org/wiki/Base64.
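
As a rough sketch of that idea, assuming you control the step that writes the field (which may not be the case here), the SMS text can be Base64-encoded so that it never contains a comma or a double quote. The helper names `encode_field` and `decode_field` and the sample record are hypothetical.

```python
import base64


def encode_field(text: str) -> str:
    """Return the text as Base64 (alphabet A-Z, a-z, 0-9, +, /, =)."""
    return base64.b64encode(text.encode("utf-8")).decode("ascii")


def decode_field(token: str) -> str:
    """Reverse encode_field."""
    return base64.b64decode(token).decode("utf-8")


sms = 'Meet at 5, bring "the docs"'
token = encode_field(sms)

# The encoded field contains no separator or quote characters,
# so a plain split on "," is safe for this record.
line = ",".join(["1001", token, "2018-08-31"])
fields = line.split(",")
print(decode_field(fields[1]))
```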

fixeria

You may want to have a look at the GSM charset spec (GSM 03.38). Be aware of the 7-bit / 8-bit encodings and of how different (human) languages are encoded.
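
A minimal sketch of how such a check might look: the tables below are a hand transcription of the GSM 03.38 default 7-bit alphabet and its extension table, so verify them against the spec before relying on them. The function name `in_gsm_alphabet` is hypothetical. Note that this only covers messages sent in the 7-bit encoding; a message sent as UCS-2 can contain any character, so a separator chosen this way is not guaranteed to be safe.

```python
# GSM 03.38 default 7-bit alphabet (basic table), transcribed by hand
# for this sketch; check it against the spec before relying on it.
GSM_BASIC = (
    "@£$¥èéùìòÇ\nØø\rÅåΔ_ΦΓΛΩΠΨΣΘΞ\x1bÆæßÉ"
    " !\"#¤%&'()*+,-./0123456789:;<=>?"
    "¡ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÑÜ§"
    "¿abcdefghijklmnopqrstuvwxyzäöñüà"
)
# Characters reachable through the escape (extension) table.
GSM_EXTENSION = "\f^{}\\[~]|€"
GSM_CHARS = set(GSM_BASIC) | set(GSM_EXTENSION)


def in_gsm_alphabet(ch: str) -> bool:
    """True if ch can appear in a 7-bit encoded SMS (per the tables above)."""
    return ch in GSM_CHARS


# Printable ASCII characters that the 7-bit alphabet cannot represent.
candidates = [chr(c) for c in range(0x21, 0x7F) if not in_gsm_alphabet(chr(c))]
print(candidates)
```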

mszmurlo