I'm trying to load pre-trained word embeddings for Arabic (Mazajak embeddings: http://mazajak.inf.ed.ac.uk:8000/). The embeddings file has no file extension and I'm struggling to load it. What is the usual process for loading embeddings like these?
I've tried reading the file with: with open("get_sg250", encoding=encoding) as file: file.readlines()
using several different encodings, but none of them gives a sensible result. utf-8 does not work at all, and windows-1256 gives gibberish:
e.g.
['8917028 300\n',
'</s> Hل®:0\x16ء:؟X§؛R8ڈ؛\xa0سî9K\u200fƒ::m¤9¼»“8¤p\u200c؛tعA:UU¾؛“_ع9‚Nƒ¹®G§¹قفگ؛ww$؛\u200eba:\x14.„:R¸پ:0–\x0b:–ü\x06:×#¦؛Yٍ²؛m ظ:{\x14¦:µ\x01‡:ه\x17S¹Yr¯:j\x03-¹ff€9×£P¸\n',
'W‚؛UUه9¼»é¹""§؛\u200c¶د:UU؟:\u200eb؟¹{\x14\u200d¸,ù19ïî\u200d؛ئ\x12¯؛\x00\x00ا:\u200c6°7A§a؛ذé„؛ذi†؛®G\x14:حجŒ8\x03\u200cè9ه\x17¸؛ق]¦؛ڈآ5¸قفا9حج^:\x00€ٹ؛q=²:\x00\x00¢9\x14®أ9×£T¹لz‚:\x1bèG؛®G7؛ڑ™<:m\xa0ƒ¹""´9\x14®\x1d:"¢²؛®G-؛ڑ™~:±ن¸:\x18ث«:¸\x1e…؛`,8؛Hل\u200d¹±ن.:\x1f…¥؛لْ‚:ڑ™s:R¸\x0b؛ئ’\x07؛0–C؛ڈآ¸:ذéھ:ة/خ¹A\'¸:ڑ™ز:m\xa0\x1e:è´ظ::ي‡؛\n',
'×\x05؛Œ%8؛ش\x06~؛أُu:\x00\x00\n',
":‰ˆ\x149\x14®?؛ِ(\x05:«ھ…:)\\‡833G:Haط؛\x1f…¼:¼»'9\x00\x00 ؛=\n",
'6؛R¸‚¹¼;€؛\x1bè¾؛\x1bèw؛قف؛:A§\x1a؛""j؛K~J:Hل\x14؛ىرد:\u200c6\x0c؛–|ب؛‚Nm:cةد·:mک؛‰ˆھ9\x00\x00ü9DD(¹ذi\x1f:ذé¬؛,ù™9¼»\x1e:wwƒ؛\x03\u200cF87ذ©·×£Q؛\x1f…w؛ئ\x12ح؛\x00\x00\x007ٍ‹U8\x0etZ6“ك«؛cةط؛Haد؛–ü¼؛33?¹Œ%َ9أُخ9=\n',
'‹؛ق]ع:ڈآ/؛0–ق¹¤pُ¹Dؤخ:¤p¤؛\x1bèت9\u200ebé¹ùE‹:–üb7=ٹ؛:؟Xv؛×£c؛ِ(·؛è4\xa0؛cة‹؛0\x16ˆ؛ئ’U:""#؛ة/j:R8،:أُى9ذé€:ىQX:\x1f…L:""›؛K\u200f•؛ڈآں؛‰ˆ8¸ww´:""o؛è´…؛\n',
'W·؛¤pگ:{”¶؛\x0etJ¹\u200eb>:ùإة؛`¬أ؛ِ(ü9K\u200f™:‚N؛:لz;:ِ(ٹ:Œ¥ˆ؛§\n',
'ں؛ِ¨\xad:ڑ™q؛\u200c6\x19:×£H9¤p\x1c:\x03\u200cخ¹–üٹ8UU\x13؛Hلؤ¹è´ء؛ïnژ؛®Gک:è´¯9\x0etN؛O\x1b\x0b؛\x00\x00Z:\n',
'Wڑ؛""J؛؟طخ:\x03\u200c¹:لْ¬؛\u200c6ک9ڑ™D؛\x1bèT8ق]ƒ:¼»س:0–-:~±³:,y‰:è´،¸jƒأ:m\xa0]:A\'د:j\x03\x15؛Haد:""½:wwù¹ه\x17ء؛×#س:&؟œ9×£5؛Hلz¹\\ڈ€¹)\\¨؛O\x1bْ¹ه\x17\x1b¹ڈB×؛\x03\u200c™؛ىQز¹لz¤¹ذi\x1c:\\ڈژ9ùإV¹R¸€:ùإü9ww?9‰\x08\u200d:~±ؤ¹‚Nù¹‰ˆ\x10¹UUn؛\x11\x11ƒ؛ٍ‹چ8‰ˆ½:\x1bèî¹O\x1bè¶`¬´؛=\n',
'¢:\n',
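For reference, the encoding attempts were roughly equivalent to the sketch below. The helper name try_encodings, the particular list of encodings, and the line limit are only illustrative (not exactly what I ran), and it reads just the first few lines instead of the whole file.

```python
from itertools import islice


def try_encodings(path, encodings=("utf-8", "windows-1256"), max_lines=5):
    """Hypothetical helper: open `path` as text with each encoding and
    record whether the first few lines could be decoded."""
    results = {}
    for encoding in encodings:
        try:
            with open(path, encoding=encoding) as file:
                # Only pull the first few lines; the full file is very large.
                lines = list(islice(file, max_lines))
            results[encoding] = f"decoded {len(lines)} lines"
        except UnicodeError as error:
            # UnicodeDecodeError is a subclass of UnicodeError.
            results[encoding] = f"failed: {type(error).__name__}"
    return results


# Example call (assumes the downloaded file is in the working directory):
# print(try_encodings("get_sg250"))
```

An encoding that decodes without raising is not necessarily the right one: a single-byte codec such as windows-1256 will accept almost any byte sequence, which is how I ended up with the output above.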
I've also tried loading the file with pickle, but that doesn't work either.
Any suggestions on what I could try out?