How to decode IFC using Ruby

Question

In Ruby, I'm reading an .ifc file to get some information, but I can't decode it. For example, the file content:

"'S\X2\00E9\X0\jour/Cuisine'"

should be:

"'Séjour/Cuisine'"

I'm trying to encode it with:

puts ifcFileLine.encode("Windows-1252")
puts ifcFileLine.encode("ISO-8859-1")
puts ifcFileLine.encode("ISO-8859-5")
puts ifcFileLine.encode("iso-8859-1").force_encoding("utf-8")

But nothing gives me what I need.

IFC encoding is described here: http://www.buildingsmart-tech.org/implementation/get-started/string-encoding – Denis Bolomier, Apr 14 '17 at 18:47
@DenisBolomier According to the docs, every 4 characters between `\X2\` and `\X0\` represent a Unicode code point, i.e. `00E9` is [U+00E9](http://unicode.org/cldr/utility/character.jsp?a=00e9). You cannot decode this format using Ruby's built-in encoding methods because it is not a standard character encoding but a wrapper for various encodings. Maybe there's a gem. – Stefan, Apr 14 '17 at 19:26
'ifcFileLine' is from my code. Thanks for your answer, Stefan! – Denis Bolomier, Apr 14 '17 at 19:44
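
To illustrate Stefan's point, here is a small check (a sketch, using the sample line from the question) of why the `encode` attempts change nothing: the escape sequence is written with plain ASCII characters, so transcoding the string leaves every byte as it was.

```ruby
line = 'S\X2\00E9\X0\jour/Cuisine'

# The escape sequence is spelled out with ASCII characters only.
puts line.ascii_only?  # true

# Transcoding ASCII-only text to these encodings leaves the bytes unchanged.
%w[Windows-1252 ISO-8859-1 ISO-8859-5].each do |name|
  puts "#{name}: #{line.encode(name).bytes == line.bytes}"
end
```

The `é` only appears once the hex digits between the delimiters are interpreted as a code point, which is a parsing step rather than a change of encoding.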

Jordan Running · Accepted Answer · 2018-04-12T01:24:48.370

3

I don't know anything about IFC, but based solely on the page Denis linked to and your example input, this works:

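# Matches one \X2\...\X0\ escape sequence and captures the hex digits in between.
ESCAPE_SEQUENCE_EXPR = /\\X2\\(.*?)\\X0\\/

def decode_ifc(str)
  str.gsub(ESCAPE_SEQUENCE_EXPR) do
    # Each group of four hex digits is one Unicode code point.
    $1.gsub(/..../) { $&.to_i(16).chr(Encoding::UTF_8) }
  end
end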

str = 'S\X2\00E9\X0\jour/Cuisine'
puts "Input:", str
puts "Output:", decode_ifc(str)

All this code does is replace every sequence of four characters (/..../) between the delimiters, which will each be a Unicode code point in hexadecimal, with the corresponding Unicode character.
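
To make that replacement step concrete, here is the conversion for a single four-digit group, followed by the same block applied to two groups in a row. The variable names are only for illustration.

```ruby
hex = "00E9"

# Hexadecimal text -> integer code point.
code_point = hex.to_i(16)
puts code_point  # 233

# Integer code point -> UTF-8 character.
char = code_point.chr(Encoding::UTF_8)
puts char  # é

# The same conversion applied to every group of four characters.
two_chars = "00E900E8".gsub(/..../) { $&.to_i(16).chr(Encoding::UTF_8) }
puts two_chars  # éè
```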

Note that this code handles only this specific encoding. A quick glance at the implementation guide shows other encodings, including an \X4 directive for Unicode characters outside the Basic Multilingual Plane. This ought to get you started, though.
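
As a rough sketch of how the other directives might be covered, the version below handles four escape forms in a single pass. It assumes, based on my reading of the implementation guide rather than on the answer above, that `\X4\` uses eight hex digits per character, that `\X\` is followed by exactly two hex digits, and that `\S\` means "the next character plus 128". The names `IFC_ESCAPE`, `hex_to_text` and `decode_ifc_extended` are hypothetical, and code page directives such as `\PA\` and escaped backslashes are not handled.

```ruby
# Hypothetical pattern covering four escape forms (see the assumptions above).
IFC_ESCAPE = /
  \\X2\\(?<x2>(?:[0-9A-F]{4})+)\\X0\\ |
  \\X4\\(?<x4>(?:[0-9A-F]{8})+)\\X0\\ |
  \\X\\(?<x>[0-9A-F]{2})              |
  \\S\\(?<s>.)
/x

# Splits a run of hex digits into groups of `width` digits and converts
# each group to the UTF-8 character with that code point.
def hex_to_text(hex, width)
  hex.scan(/.{#{width}}/).map { |h| h.hex.chr(Encoding::UTF_8) }.join
end

def decode_ifc_extended(str)
  str.gsub(IFC_ESCAPE) do
    m = Regexp.last_match
    if m[:x2]
      hex_to_text(m[:x2], 4)
    elsif m[:x4]
      hex_to_text(m[:x4], 8)
    elsif m[:x]
      m[:x].hex.chr(Encoding::UTF_8)
    else
      # Assumed rule for \S\: code of the next character plus 128.
      (m[:s].ord + 128).chr(Encoding::UTF_8)
    end
  end
end

puts decode_ifc_extended('S\X2\00E9\X0\jour/Cuisine')  # Séjour/Cuisine
```

Matching all forms with one alternation means text produced by one replacement is never scanned again, so a decoded backslash cannot be mistaken for the start of another escape.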

See it on eval.in: https://eval.in/776980

edited Apr 12 '18 at 01:24

answered Apr 15 '17 at 20:49

Jordan Running


@DenisBolomier Glad to help. If my answer solved your problem (and no one posts a better answer), please mark it as accepted. – Jordan Running Apr 17 '17 at 15:33
I managed to accept your answer! – Denis Bolomier Apr 18 '17 at 07:06
Can anyone please tell me how to implement this in JavaScript? – Malith Prasanna Rathnasena Mar 28 '22 at 01:18

user19800562 · Answer 2 · 2022-08-19T08:59:42.053

If anyone is interested, here is some Python code that decodes three of the IFC encodings: \X\, \X2\ and \S\

    import re
    
    def decodeIfc(txt):
        # In regex "\" is hard to manage in Python... I use this workaround
        txt = txt.replace('\\', 'µµµ')
        txt = re.sub('µµµX2µµµ((?:[0-9A-F]{4})+)µµµX0µµµ', decodeIfcX2, txt)
        txt = re.sub('µµµSµµµ(.)', decodeIfcS, txt)
        txt = re.sub('µµµXµµµ([0-9A-F]{2})', decodeIfcX, txt)
        txt = txt.replace('µµµ','\\')
        return txt
    
    def decodeIfcX2(match):
        # \X2\ encodes each character as a group of 4 hexadecimal digits.
        return ''.join(chr(int(code, 16)) for code in re.findall('[0-9A-F]{4}', match.group(1)))
    
    def decodeIfcS(match):
        return chr(ord(match.group(1))+128)
    
    def decodeIfcX(match):
        # Some IFC files were written on old Macs, which used the MacRoman encoding.
        num = int(match.group(1), 16)
        if num <= 127 or num >= 160:
            return chr(num)
        else:
            return bytes.fromhex(match.group(1)).decode("macroman")
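
For readers following the Ruby side of this question, the MacRoman fallback in `decodeIfcX` could look roughly like this. It is a sketch only: `decode_ifc_x_byte` is a hypothetical helper name, and treating values from 128 to 159 as MacRoman is the heuristic from the Python code above, not something the IFC format defines.

```ruby
# Decodes the two hex digits that follow \X\ (hypothetical helper).
def decode_ifc_x_byte(hex)
  num = hex.hex
  if num <= 127 || num >= 160
    # Treat the value directly as a Unicode code point.
    num.chr(Encoding::UTF_8)
  else
    # Heuristic from the Python version: interpret the byte as MacRoman.
    num.chr(Encoding::BINARY).force_encoding("macRoman").encode(Encoding::UTF_8)
  end
end

puts decode_ifc_x_byte("E9")  # é (code point U+00E9)
puts decode_ifc_x_byte("8E")  # é (byte 0x8E in MacRoman)
```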

Hi, I'm new here and this was my first post on Stack Overflow. Why did I get two bad votes? Is it because the problem was already solved? Is my code not good? Is it because it's not in Ruby? The question was old and I had the same problem, so I thought it could help someone. – user19800562, Aug 23 '22 at 13:26
I think both. Try not to add answers just for the sake of adding them. Also, a solution in another language cannot always be converted into a usable solution, especially when there is already an answer. – Viktor Ivliiev, Aug 28 '22 at 09:40
