
I'm encountering a very mysterious error that occurs intermittently whilst I'm working within a virtualenv.

Although it has now happened to me 3-4 times, I'm finding it frustratingly difficult to find the conditions that will reproduce the problem. It has occurred whilst executing totally different pieces of code, and the same piece of code that raised the error may have been executed without problems many times both before and after the error. It's therefore currently impractical for me to systematically narrow down which libraries are causing the problem, so I apologise in advance if my description seems vague or incomplete.

Symptoms

During an IPython session I occasionally encounter a relocation error, which results in a load of Unicode gibberish being dumped into my terminal. Here is some terminal output from the most recent occurrence:

In [435]: figs = clustering.make_plots(d[~d.dark_reared], which='dcentre_dangle')
Sliding median, window=10.00, 10116 x-values
     --> Completed: 00:05.17                                               
[Parallel(n_jobs=-1)]: Done   1 jobs       | elapsed:    4.8s
[Parallel(n_jobs=-1)]: Done   3 out of   8 | elapsed:    4.9s remaining:    8.1s
[Parallel(n_jobs=-1)]: Done   8 out of   8 | elapsed:    5.3s finished
/home/alistair/.venvs/rfmap/bin/python/home/alistair/.venvs/rfmap/bin/python: relocation error: /home/alistair/.venvs/rfmap/bin/python: symbol Øv�l��⎼�Ø┴�┌��⎼�Ø┴�┌��⎼�Ø┴�┌��⎼�é����▒└X⎽�▮���\������┴��6�⎼�Ø┴�┌��⎼�Ø┴�┌��⎼�Ø┴�┌��⎼�F��ޚ��7�┴�┌�
�⎼R�A?B�!���≤�├_←▮�┐O��°e�ǐ�┤!�=�'!F─┤�
                                       <b3±_�┴����⎼�Ø┴�┌��⎼�Ø┴�┌��⎼�Ø┴�┌��⎼�Ø┴�┌��⎼�é����▒└X⎽�▮���\������┴��6�⎼�Ø┴�┌��⎼�Ø┴�┌��⎼�Ø┴�┌��⎼�F��ޚ��7�┴�┌��⎼R�A?B�!���≤�├_←▮�┐O��°e�ǐ�┤!�=�'!F─┤�
                                                                                                                                                                                           <b3±_�┴����⎼�Ø┴�┌��⎼�Ø┴�┌��⎼�Ø┴�┌��⎼�Ø┴�┌��⎼�é����▒└
X⎽�▮���\������┴��6�⎼�Ø┴�┌��⎼�Ø┴�┌��⎼�Ø┴�┌��⎼�F��ޚ��7�┴�┌��⎼R�A?B�!���≤�├_←▮�┐O��°e�ǐ�┤!�=�'!F─┤�

... and it continues like this for several pages. In this case the error did not actually kill my IPython session, although my prompt was now completely messed up:

I┼ [436]: ⎼e┌⎺▒d(c┌┤⎽├e⎼☃┼±)

Presumably some binary data containing a control character was dumped into the terminal along with the error message, hence all the gibberish.

Now whenever I try to launch Python from within the same virtualenv, I see a similar relocation error:

(rfmap)alistair@MAGICPAVINGSLAB:~/src/python/rfmap_pipeline⟫ python
python: relocation error: python: symbol v�l��⎼�Ø┴�┌��⎼�Ø┴�┌��⎼�Ø┴�┌��⎼�é����▒└X⎽�▮���\������┴��6�⎼�Ø┴�┌��⎼�Ø┴�┌��⎼�Ø┴�┌��⎼�F��ޚ��7�┴�┌��⎼R�A?B�!���≤�├_←▮�┐O��°e�ǐ�┤!�=�'!F─┤�
                                                                                                                                                                               <b3±_�┴����⎼�Ø┴�┌��⎼�Ø┴�┌��⎼�Ø┴�┌��⎼�Ø┴�┌��⎼�Ø┴�┌��⎼�Ø┴�┌��⎼�Ø┴�┌��⎼�Ø┴�┌��⎼�Ø┴�┌��⎼�Ø┴�┌��⎼�Ø┴�┌��⎼�2≥��^Y�(߰⎻≠─�3�C�*��P≤U┼☃c⎺deUCS4_T⎼▒┼⎽┌▒├eC▒▒⎼└▒⎻← ┴e⎼⎽☃⎺┼ GLIBC_2↓2↓5 ┼⎺├ de°☃┼ed ☃┼ °☃┌e ┌☃bc↓⎽⎺↓6 ┬☃├▒ ┌☃┼┐ ├☃└e ⎼e°e⎼e┼ce
(⎼°└▒⎻)127 ▒┌☃⎽├▒☃⎼@MAGICPAVINGSLAB:·/⎽⎼c/⎻≤├▒⎺┼/⎼°└▒⎻_⎻☃⎻e┌☃┼e⟫ 

If I deactivate this virtualenv or switch to a different one, Python works correctly.

Recovery

The first few times this happened I simply deleted my virtualenv and rebuilt it from scratch. I eventually discovered through trial and error that I could also recover a broken virtualenv by re-initializing it, i.e.

~$ virtualenv $VIRTUAL_ENV

which creates a new Python binary in $VIRTUAL_ENV/bin (as well as giving me new copies of pip and setuptools).

Possible causes?

The fact that reinitializing the virtualenv fixed the problem led me to believe that the Python binary itself was somehow being corrupted. Although I see no difference in file size between a healthy and a corrupted copy, their MD5 hashes do indeed differ.
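For anyone wanting to repeat the size and hash check described above, here is a minimal Python sketch. The helper names `file_digest` and `compare_binaries` are made up for illustration, and the example paths are hypothetical; it assumes a healthy copy of the interpreter was saved somewhere before the corruption occurred.

```python
import hashlib
import os


def file_digest(path, algorithm="md5", chunk_size=65536):
    """Return the hex digest of a file, reading it in chunks."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def compare_binaries(good_path, suspect_path):
    """Report whether two files match in size and in content hash."""
    return {
        "same_size": os.path.getsize(good_path) == os.path.getsize(suspect_path),
        "same_hash": file_digest(good_path) == file_digest(suspect_path),
    }


# Hypothetical usage (both paths are placeholders):
# compare_binaries("/tmp/python.good", "/home/me/.venvs/myenv/bin/python")
```

A result of `same_size` true and `same_hash` false would match what I observed: the bytes were overwritten in place rather than the file being truncated or extended.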

Has anyone else encountered this sort of error before? What could possibly cause the Python binary to become corrupted? The machine in question has otherwise been perfectly stable - I have no reason to suspect memory or disk errors, and I can't find anything suspicious in the system logs.

Here are some version details that might be relevant:

  • Ubuntu 15.04 (3.19.0-21-generic)
  • python 2.7.9
  • virtualenv 1.11.6
  • virtualenvwrapper 4.3.1

Update:

I tried diffing the corrupted and non-corrupted binary files and found a block of bytes between 0xD000 and 0xDFF0 which had been overwritten by a repeating pattern:

Working copy:

0000 CFC0: 65 74 67 72 6E 61 6D 00  67 65 74 67 72 67 69 64  etgrnam. getgrgid
0000 CFD0: 00 66 73 79 6E 63 00 67  65 74 68 6F 73 74 62 79  .fsync.g ethostby
0000 CFE0: 61 64 64 72 5F 72 00 5F  5F 68 5F 65 72 72 6E 6F  addr_r._ _h_errno
0000 CFF0: 5F 6C 6F 63 61 74 69 6F  6E 00 68 73 74 72 65 72  _locatio n.hstrer
0000 D000: 72 6F 72 00 67 65 74 68  6F 73 74 6E 61 6D 65 00  ror.geth ostname.
0000 D010: 67 65 74 70 77 6E 61 6D  00 73 65 74 75 69 64 00  getpwnam .setuid.
0000 D020: 75 74 69 6D 65 73 00 75  74 69 6D 65 00 73 79 73  utimes.u time.sys
0000 D030: 74 65 6D 00 73 74 72 63  6F 6C 6C 00 77 63 73 63  tem.strc oll.wcsc
0000 D040: 6F 6C 6C 00 61 73 63 74  69 6D 65 00 73 6F 63 6B  oll.asct ime.sock
0000 D050: 65 74 70 61 69 72 00 61  63 63 65 73 73 00 74 65  etpair.a ccess.te
0000 D060: 6D 70 6E 61 6D 00 74 6D  70 66 69 6C 65 36 34 00  mpnam.tm pfile64.
0000 D070: 74 6D 70 6E 61 6D 5F 72  00 66 63 68 64 69 72 00  tmpnam_r .fchdir.
0000 D080: 66 63 68 6D 6F 64 00 66  63 68 6F 77 6E 00 66 64  fchmod.f chown.fd
0000 D090: 61 74 61 73 79 6E 63 00  66 70 61 74 68 63 6F 6E  atasync. fpathcon
0000 D0A0: 66 00 66 73 74 61 74 76  66 73 36 34 00 74 63 67  f.fstatv fs64.tcg
0000 D0B0: 65 74 70 67 72 70 00 74  63 73 65 74 70 67 72 70  etpgrp.t csetpgrp
0000 D0C0: 00 74 74 79 6E 61 6D 65  00 73 65 74 65 67 69 64  .ttyname .setegid
0000 D0D0: 00 73 65 74 65 75 69 64  00 73 65 74 67 69 64 00  .seteuid .setgid.
0000 D0E0: 63 74 65 72 6D 69 64 00  67 65 74 6C 6F 61 64 61  ctermid. getloada
0000 D0F0: 76 67 00 67 65 74 67 72  6F 75 70 73 00 67 65 74  vg.getgr oups.get
0000 D100: 70 70 69 64 00 63 6F 6E  66 73 74 72 00 67 65 74  ppid.con fstr.get
0000 D110: 72 65 73 67 69 64 00 67  65 74 72 65 73 75 69 64  resgid.g etresuid
0000 D120: 00 69 6E 69 74 67 72 6F  75 70 73 00 67 65 74 70  .initgro ups.getp
0000 D130: 77 75 69 64 00 6C 63 68  6F 77 6E 00 73 65 74 72  wuid.lch own.setr
0000 D140: 65 73 67 69 64 00 73 65  74 72 65 73 75 69 64 00  esgid.se tresuid.
0000 D150: 61 6C 61 72 6D 00 73 65  74 70 77 65 6E 74 00 67  alarm.se tpwent.g
0000 D160: 65 74 70 77 65 6E 74 00  65 6E 64 70 77 65 6E 74  etpwent. endpwent
...

Broken copy:

0000 CFC0: 65 74 67 72 6E 61 6D 00  67 65 74 67 72 67 69 64  etgrnam. getgrgid
0000 CFD0: 00 66 73 79 6E 63 00 67  65 74 68 6F 73 74 62 79  .fsync.g ethostby
0000 CFE0: 61 64 64 72 5F 72 00 5F  5F 68 5F 65 72 72 6E 6F  addr_r._ _h_errno
0000 CFF0: 5F 6C 6F 63 61 74 69 6F  6E 00 68 73 74 72 65 72  _locatio n.hstrer
0000 D000: BE 4A A4 0A 2A 17 AE 18  F9 91 B8 BF 27 5A D7 C9  .J..*... ....'Z..
0000 D010: 98 E5 4B E2 CB 76 84 D3  55 6C BB 8D 0E 72 A2 C3  ..K..v.. Ul...r..
0000 D020: 98 E5 4B E2 CB 76 84 D3  55 6C BB 8D 0E 72 A2 C3  ..K..v.. Ul...r..
0000 D030: 98 E5 4B E2 CB 76 84 D3  55 6C BB 8D 0E 72 A2 C3  ..K..v.. Ul...r..
0000 D040: 98 E5 4B E2 CB 76 84 D3  55 6C BB 8D 0E 72 A2 C3  ..K..v.. Ul...r..
0000 D050: 98 E5 4B E2 CB 76 84 D3  55 6C BB 8D 0E 72 A2 C3  ..K..v.. Ul...r..
0000 D060: 98 E5 4B E2 CB 76 84 D3  55 6C BB 8D 0E 72 A2 C3  ..K..v.. Ul...r..
0000 D070: 98 E5 4B E2 CB 76 84 D3  55 6C BB 8D 0E 72 A2 C3  ..K..v.. Ul...r..
0000 D080: A9 F9 A9 BE 8D 68 6D 58  73 1C 16 C2 E3 95 30 B4  .....hmX s.....0.
0000 D090: 95 96 D3 E0 EA 01 5C B7  CC CE 9C 86 AA AB E9 31  ......\. .......1
0000 D0A0: 98 E5 4B E2 CB 76 84 D3  55 BC 36 85 0E 72 A2 C3  ..K..v.. U.6..r..
0000 D0B0: 98 E5 4B E2 CB 76 84 D3  55 6C BB 8D 0E 72 A2 C3  ..K..v.. Ul...r..
0000 D0C0: 98 E5 4B E2 CB 76 84 D3  55 6C BB 8D 0E 72 A2 C3  ..K..v.. Ul...r..
0000 D0D0: 98 E5 4B E2 CB 76 84 D3  55 6C BB 8D 0E 72 A2 C3  ..K..v.. Ul...r..
0000 D0E0: EE C9 C4 46 D9 4F A6 04  D6 CB BF DE 9A 93 B4 37  ...F.O.. .......7
0000 D0F0: 98 E5 4B E2 CB 76 84 D3  55 6C BB 8D 0E 72 52 FC  ..K..v.. Ul...rR.
0000 D100: CA 59 41 3F 42 94 21 9B  89 80 79 9B 74 D3 1F 5F  .YA?B.!. ..y.t.._
0000 D110: 2C 30 AE 6B 1A 4F F7 B3  14 66 65 BE C7 90 BE 75  ,0.k.O.. .fe....u
0000 D120: 21 8D 3D 1C B6 27 E0 49  15 F3 98 35 21 46 E9 AE  !.=..'.I ...5!F..
0000 D130: D6 71 75 B4 07 0B 3C 62  33 D2 DE 02 67 5F C3 D3  .qu...<b 3...g_..
0000 D140: 98 E5 4B E2 CB 76 84 D3  2A 93 BB 8D 0E 72 A2 C3  ..K..v.. *....r..
0000 D150: 98 E5 4B E2 CB 76 84 D3  55 6C BB 8D 0E 72 A2 C3  ..K..v.. Ul...r..
0000 D160: 98 E5 4B E2 CB 76 84 D3  55 6C BB 8D 0E 72 A2 C3  ..K..v.. Ul...r..
...

I don't really know enough to interpret this any further, but it seems like the location and the pattern might be clues.
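In case it helps others look for the same signature, below is a rough Python sketch of how the overwritten region and the repeating pattern could be located programmatically. The function names are made up for illustration, and the sketch assumes both files are the same length and small enough to read into memory.

```python
from collections import Counter


def differing_ranges(good, bad):
    """Return inclusive (start, end) offsets of each run of differing bytes.

    Both arguments are bytes objects of equal length.
    """
    if len(good) != len(bad):
        raise ValueError("inputs must have the same length")
    ranges = []
    start = None
    for offset, (a, b) in enumerate(zip(good, bad)):
        if a != b:
            if start is None:
                start = offset  # a new run of differences begins here
        elif start is not None:
            ranges.append((start, offset - 1))
            start = None
    if start is not None:
        ranges.append((start, len(good) - 1))
    return ranges


def most_common_block(data, block_size=16):
    """Return (block, count) for the most frequent aligned block in data."""
    blocks = Counter(
        data[i:i + block_size]
        for i in range(0, len(data) - block_size + 1, block_size)
    )
    if not blocks:
        return None, 0
    return blocks.most_common(1)[0]


# Hypothetical usage, with `good` and `bad` read from the two binaries:
# for start, end in differing_ranges(good, bad):
#     print(hex(start), hex(end), most_common_block(bad[start:end + 1]))
```

A corrupted byte that happens to equal the original splits a run in two, so neighbouring ranges may need merging by eye. The reported offsets also make it easy to check whether the damage lines up with a 4096-byte (0x1000) boundary.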

ali_m
  • Did you try to run a recursive diff on your virtualenv after reinit and after corruption? – yohann.martineau Jun 29 '15 at 21:47
  • @yohann.martineau No, unfortunately I only looked at the Python binary. Next time the error occurs I'll grab a full copy of the broken virtualenv so that I can diff it. – ali_m Jun 29 '15 at 21:49
  • I wonder whether the fact that you are using Ubuntu 15.04 is causing any of your problems. – Padraic Cunningham Jun 29 '15 at 21:58
  • @PadraicCunningham Why? Is there some particular reason to suspect the OS? – ali_m Jun 29 '15 at 22:00
  • Have you tried updating virtualenv? Your version was released in May 2014. – Padraic Cunningham Jun 29 '15 at 22:13
  • @PadraicCunningham I can certainly try that – ali_m Jun 29 '15 at 22:19
  • @ali_m, it might be a long shot, but at least it will rule that possibility out. – Padraic Cunningham Jun 29 '15 at 22:20
  • It could be hardware problems too. Drive going bad, or memory going bad. I've seen some pretty crazy stuff in both those situations. – John Szakmeister Nov 19 '15 at 17:00
  • @jszakmeister I think that's unlikely - the error occurred on a laptop with a fairly new SSD that has never shown any signs of failure before or since (I ran some SMART tests and a memory diagnostic for good measure). In fact, since posting this question I have not experienced the error again. I can't really pinpoint anything in particular that might have gotten rid of the error, but I did update my version of virtualenv. I'll leave the question open in case anyone else comes across the same problem. – ali_m Nov 19 '15 at 17:06
  • All I can say is that I saw a similar style of error when some memory went bad. After doing some testing, it was a single bit in a stick of memory. *shrug* Doubt it or not, crazy stuff can happen. – John Szakmeister Nov 19 '15 at 22:42
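The recursive diff suggested in the comments could be approximated with a short Python sketch like the one below. The function names `tree_digests` and `changed_files` are made up for illustration, and the sketch assumes a copy of the virtualenv directory was taken while it was still healthy. Symlinks are skipped, since they point at files outside the virtualenv.

```python
import hashlib
import os


def tree_digests(root):
    """Map each regular file under root (by relative path) to its SHA-256 digest."""
    digests = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            if os.path.islink(full) or not os.path.isfile(full):
                continue  # skip symlinks and special files
            digest = hashlib.sha256()
            with open(full, "rb") as handle:
                for chunk in iter(lambda: handle.read(65536), b""):
                    digest.update(chunk)
            digests[os.path.relpath(full, root)] = digest.hexdigest()
    return digests


def changed_files(before, after):
    """Return sorted paths whose digest differs or that exist on one side only."""
    return sorted(
        path for path in set(before) | set(after)
        if before.get(path) != after.get(path)
    )


# Hypothetical usage (both directories are placeholders):
# changed_files(tree_digests("/tmp/venv_healthy"), tree_digests("/tmp/venv_broken"))
```

If only `bin/python` shows up in the result, the corruption is confined to the interpreter binary; any other changed paths would point to a wider problem.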

0 Answers