In this instance I have a cell array of lat/long coordinates that I am reading from file as strings with format:

x = {'27° 57'' 21.4" N', '7° 34'' 11.1" W'}

where the ° is actually a degree symbol (U+00B0).

I want to use strsplit() or some equivalent to get out the numerical components, but I don't know how to specify the degree symbol as a delimiter.

I'm hesitant to simply split at the ',' and index out the number, since as demonstrated above I don't know how many digits to expect.

I found elsewhere on the site the following suggestion:

x = regexp(split{1}, '\D+', 'split')

however this also separates the integer and decimal components of the decimal numbers.

Is there a strsplit() option, or some other equivalent I could use?

4 Answers

You can copy-paste the degree symbol from your data file to your M-file script. MATLAB fully supports Unicode characters in its strings. For example:

strsplit(str, {'°','"',''''})

to split the string at the three symbols.
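
As a minimal sketch of what that split produces on one of the sample strings (the variable names parts, nums and hemi are illustrative, not from the question): each piece keeps the spaces around it, so the pieces are trimmed before conversion.

```matlab
str = '27° 57'' 21.4" N';

% Split at the degree, minute (') and second (") symbols.
parts = strsplit(str, {'°', '"', ''''});

% Each piece still carries its surrounding spaces, so trim before converting.
parts = strtrim(parts);            % {'27', '57', '21.4', 'N'}
nums  = str2double(parts(1:3));    % [27 57 21.4]
hemi  = parts{4};                  % 'N'
```

Because the split is driven by the symbols rather than by character positions, the number of digits in each component does not matter.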

Alternatively, you could use sscanf (or fscanf if reading directly from file) to parse the string:

str = '27° 57'' 21.4"';
dot( sscanf(str, '%f° %f'' %f"'), [1, 1/60, 1/3600] );
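
The same idea might be applied to the whole cell array from the question along these lines. This is only a sketch: the loop, the names fmt and dd, and the choice to strip the hemisphere letter before scanning are assumptions, and the result is the unsigned magnitude in decimal degrees.

```matlab
x = {'27° 57'' 21.4" N', '7° 34'' 11.1" W'};

fmt = '%f° %f'' %f"';          % same format string as above
dd  = zeros(size(x));          % unsigned decimal degrees, one per entry

for k = 1:numel(x)
    s = strtrim(x{k});
    v = sscanf(s(1:end-1), fmt);              % drop the trailing N/S/E/W letter
    dd(k) = dot(v(1:3), [1, 1/60, 1/3600]);   % degrees + minutes/60 + seconds/3600
end
```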
Cris Luengo
  • Minor technical niggle: Matlab doesn't fully support Unicode characters, only characters in the Basic Multilingual Plane that can be represented with a single UCS-2 code point. Characters like emoji that require UTF-16 surrogate pairs are not supported. (I'm not saying this to be pedantic; I actually want to use emoji sometimes and run into this as a real use case.) – Andrew Janke Jan 28 '20 at 03:48
  • @AndrewJanke: I was not aware of that, thanks for the clarification! – Cris Luengo Jan 28 '20 at 09:46

The easiest solution is to copy-paste the Unicode character into your MATLAB editor, as Cris suggested.

You can get these characters readily from the internet, or from the Windows Character Map.

You can also use unicode2native and native2unicode if you want to work with byte values in your system's native character encoding.

% Get the byte value of '°' in the native encoding
>> unicode2native('°')
ans = uint8(176)

% Check the symbol for a given byte value
>> native2unicode(176)
ans = '°'

So

>> strsplit( 'Water freezes at 0°C', native2unicode(176) )
ans =
  1×2 cell array
  {'Water freezes at 0'}    {'C'}

If you want to avoid unicode2native, you can get the same value by applying hex2dec to the hex code you already know:

hex2dec('00B0') % = 176
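
A variant that does not go through the native encoding is to build the delimiter with char, which treats its argument as a code point rather than a byte. The following is a sketch under that assumption; the names deg, txt and parts are illustrative.

```matlab
deg   = char(hex2dec('00B0'));             % code point U+00B0 -> '°'
txt   = ['Water freezes at 0' deg 'C'];    % build the test string without typing the symbol
parts = strsplit(txt, deg);                % {'Water freezes at 0', 'C'}
```

This also keeps the script free of non-ASCII characters, which can help if the file is saved or shared in a different encoding.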
Wolfie
  • Nice! I didn't know about that one. – Cris Luengo Jan 27 '20 at 16:07
  • Cheers Wolfie, upvoted for the 'native2unicode' addition. I've marked Cris' answer as the official answer as he was first to respond to the original question, but if I could I would mark this also. – TheSuperLemming Jan 27 '20 at 16:55
  • Usage note: saying `hex2dec('00B0')` might be misleading: if those first two digits are anything but zeros, it's going to give the wrong answer (because inputs to `native2unicode` must be in the range 0-255; they're treated as bytes). `char(hex2dec('xxxx'))` will work on a broader range of values, because it does a Unicode code point value conversion instead of a UTF-8 encoding conversion. – Andrew Janke Jan 28 '20 at 04:48

You can also improve your regular expression in order to catch the decimal part:

x = {'27° 57'' 21.4" N', '7° 34'' 11.1" W'}
x = regexp(x, '\d+\.?\d?', 'match') 
x{:}

Result:

ans =
{
  [1,1] = 27
  [1,2] = 57
  [1,3] = 21.4
}

ans =
{
  [1,1] = 7
  [1,2] = 34
  [1,3] = 11.1
}

Where \d+\.?\d? means:

\d+  : one or more digits
% followed by
\.?  : zero or one decimal point
% followed by
\d?  : zero or one digit (so only the first decimal digit is captured)
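
If the seconds can carry more than one decimal digit, the pattern above would split a value such as 21.45 into 21.4 and 5. A slightly more general pattern, sketched here with hypothetical inputs that are not from the question, makes the whole fractional part an optional group:

```matlab
% Hypothetical inputs: one with two decimal digits, one with none.
x = {'27° 57'' 21.45" N', '7° 34'' 11" W'};

% \d+       : one or more digits
% (\.\d+)?  : optionally, a decimal point followed by one or more digits
tok = regexp(x, '\d+(\.\d+)?', 'match');

% Convert each cell of matched strings to a numeric row vector.
nums = cellfun(@str2double, tok, 'UniformOutput', false);
```

Here nums{1} holds the three components of the first coordinate and nums{2} those of the second.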
obchardon

Consider using split and double with string:

>> x = {'27° 57'' 21.4" N'; '7° 34'' 11.1" W'};    
>> x = string(x)

x = 

  2×1 string array

    "27° 57' 21.4" N"
    "7° 34' 11.1" W"

>> x = split(x,["° " "' " '" '])

x = 

  2×4 string array

    "27"    "57"    "21.4"    "N"
    "7"     "34"    "11.1"    "W"

>> double(x(:,1:3))

ans =

   27.0000   57.0000   21.4000
    7.0000   34.0000   11.1000
matlabbit