1

In Ruby, I'm reading an .ifc file to get some information, but I can't decode it. For example, the file content:

"'S\X2\00E9\X0\jour/Cuisine'"

should be:

"'Séjour/Cuisine'"

I'm trying to encode it with:

  • puts ifcFileLine.encode("Windows-1252")
  • puts ifcFileLine.encode("ISO-8859-1")
  • puts ifcFileLine.encode("ISO-8859-5")
  • puts ifcFileLine.encode("iso-8859-1").force_encoding("utf-8")

But nothing gives me what I need.

the Tin Man
Denis Bolomier
  • 1
    IFC encoding is described here: http://www.buildingsmart-tech.org/implementation/get-started/string-encoding – Denis Bolomier Apr 14 '17 at 18:47
  • Where are you getting `ifcFileLine` from? Rails? – Sagar Pandya Apr 14 '17 at 18:54
  • The OP is not using Rails. – the Tin Man Apr 14 '17 at 19:21
  • 1
    @DenisBolomier according to the docs, every 4 characters between ``\X2\`` and ``\X0\`` represent a unicode codepoint, i.e. `00E9` is [U+00E9](http://unicode.org/cldr/utility/character.jsp?a=00e9). You cannot decode this format using Ruby's built-in encoding methods because it is not a standard character encoding but a wrapper for various encodings. Maybe there's a gem. – Stefan Apr 14 '17 at 19:26
  • 'ifcFileLine' is from my code. Thanks for your answer Stefan !! – Denis Bolomier Apr 14 '17 at 19:44

2 Answers

3

I don't know anything about IFC, but based solely on the page Denis linked to and your example input, this works:

ESCAPE_SEQUENCE_EXPR = /\\X2\\(.*?)\\X0\\/

def decode_ifc(str)
  str.gsub(ESCAPE_SEQUENCE_EXPR) do
    $1.gsub(/..../) { $&.to_i(16).chr(Encoding::UTF_8) }    
  end
end

str = 'S\X2\00E9\X0\jour/Cuisine'
puts "Input:", str
puts "Output:", decode_ifc(str)

All this code does is replace every sequence of four characters (/..../) between the delimiters, which will each be a Unicode code point in hexadecimal, with the corresponding Unicode character.

Note that this code handles only this specific encoding. A quick glance at the implementation guide shows other encodings, including an \X4 directive for Unicode characters outside the Basic Multilingual Plane. This ought to get you started, though.

See it on eval.in: https://eval.in/776980
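The \X4\ directive mentioned above could probably be handled in the same style. The following is only a sketch: `hex_to_utf8` and `decode_ifc_extended` are hypothetical names, and it assumes that \X4\ uses eight hexadecimal digits per code point and is closed by \X0\ just like \X2\. It also assumes the input string is already tagged as UTF-8 (or plain ASCII).

```ruby
# Hypothetical helpers, not part of any IFC library.
X2_EXPR = /\\X2\\((?:\h{4})+)\\X0\\/
X4_EXPR = /\\X4\\((?:\h{8})+)\\X0\\/

# Split a run of hex digits into groups of `width` digits and turn each
# group into the Unicode character with that code point.
def hex_to_utf8(hex_run, width)
  hex_run.scan(/\h{#{width}}/).map { |hex| hex.to_i(16) }.pack("U*")
end

def decode_ifc_extended(str)
  str
    .gsub(X2_EXPR) { hex_to_utf8($1, 4) }  # assumed: 4 hex digits per code point
    .gsub(X4_EXPR) { hex_to_utf8($1, 8) }  # assumed: 8 hex digits per code point
end

puts decode_ifc_extended('S\X2\00E9\X0\jour/Cuisine')  # prints "Séjour/Cuisine"
```

`Array#pack("U*")` builds a UTF-8 string directly from the code points, so characters outside the Basic Multilingual Plane need no special handling.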

Jordan Running
-2

In case anyone is interested, here is Python code that decodes three of the IFC encodings: \X\, \X2\ and \S\

    import re
    
    def decodeIfc(txt):
        # Raw strings keep the backslashes in the patterns readable;
        # each "\\" in a pattern matches one literal backslash.
        txt = re.sub(r'\\X2\\((?:[0-9A-F]{4})+)\\X0\\', decodeIfcX2, txt)
        txt = re.sub(r'\\S\\(.)', decodeIfcS, txt)
        txt = re.sub(r'\\X\\([0-9A-F]{2})', decodeIfcX, txt)
        return txt
    
    def decodeIfcX2(match):
        # X2 encodes each character as a group of 4 hexadecimal digits.
        return ''.join(chr(int(x, 16)) for x in re.findall('[0-9A-F]{4}', match.group(1)))
    
    def decodeIfcS(match):
        return chr(ord(match.group(1))+128)
    
    def decodeIfcX(match):
        # Some IFC files were written on old Macs, which used the MacRoman encoding.
        num = int(match.group(1), 16)
        if num <= 127 or num >= 160:
            return chr(num)
        else:
            return bytes.fromhex(match.group(1)).decode("macroman")
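
Since the question is about Ruby, the \S\ and \X\ rules above might be sketched in Ruby roughly as follows. `decode_ifc_8bit` is a hypothetical name. The sketch assumes that \S\ adds 128 to the code of the following character, and that \X\ followed by two hexadecimal digits stands for the code point U+00hh. It leaves out the MacRoman fallback used in the Python version.

```ruby
# Hypothetical helper: decodes only the \S\ and \X\ directives.
def decode_ifc_8bit(str)
  str
    # \S\c : assumed to mean "character c shifted up by 128"
    .gsub(/\\S\\(.)/) { ($1.ord + 128).chr(Encoding::UTF_8) }
    # \X\hh : assumed to mean the code point U+00hh (no MacRoman fallback)
    .gsub(/\\X\\(\h{2})/) { $1.to_i(16).chr(Encoding::UTF_8) }
end

puts decode_ifc_8bit('S\S\ijour/Cuisine')   # "i" is 105, and 105 + 128 = 233 (é)
puts decode_ifc_8bit('S\X\E9jour/Cuisine')
```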
  • Hi, I'm new here, this was my first post on Stack Overflow... Why did I get two bad votes? Is it because the problem was already solved? Is my code not good? Is it because it's not in Ruby? The question was old and I had the same problem, so I thought it could help someone. – user19800562 Aug 23 '22 at 13:26
  • I think both, try not to add answers for the sake of adding them. And a solution in another language cannot always be converted into a solution, especially when there is already an answer. – Viktor Ivliiev Aug 28 '22 at 09:40