3

I have a text file that has the following data:

5298    10036   4   360 8
6128    11947   2   385 7
9472    18930   0   233 4
5056    9790    1   293 6

I read this file using the following code:

file1 = open("test.txt","r")
lines = file1.readlines()       
BF=[map(float, line.split()) for line in lines]

This gives me the following error:

could not convert string to float: ÿþ5

Why do I see this error?

Update:

print lines 

shows:

['\xff\xfe5\x002\x009\x008\x00\t\x001\x000\x000\x003\x006\x00\t\x004\x00\t\x003\x006\x000\x00\t\x008\x00\r\x00\n', '\x006\x001\x002\x008\x00\t\x001\x001\x009\x004\x007\x00\t\x002\x00\t\x003\x008\x005\x00\t\x007\x00\r\x00\n', '\x009\x004\x007\x002\x00\t\x001\x008\x009\x003\x000\x00\t\x000\x00\t\x002\x003\x003\x00\t\x004\x00\r\x00\n', '\x005\x000\x005\x006\x00\t\x009\x007\x009\x000\x00\t\x001\x00\t\x002\x009\x003\x00\t\x006\x00\r\x00\n', '\x001\x005\x000\x006\x004\x00\t\x003\x000\x001\x006\x000\x00\t\x001\x00\t\x003\x001\x002\x00\t\x008\x00']
Abhinav Kumar
  • I think you have utf-8 BOM, try `file1 = open("test.txt","r", "utf-8")` – EdChum Mar 24 '15 at 11:08
  • I cannot reproduce this error with that file content. Are you sure you don't have any other non-numeric character in your file? – lapinkoira Mar 24 '15 at 11:08
  • You see that because you have the text `ÿþ5` somewhere in the file and it can't be parsed to a float. You should change the aggregation to a loop and handle such cases with `try`/`except`. – Klaus D. Mar 24 '15 at 11:09
  • Actually I think it's utf-16, try `file1 = open("test.txt","r", "utf-16")` – EdChum Mar 24 '15 at 11:10

3 Answers

8

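Your file starts with a UTF-16 (little-endian) BOM: the bytes 0xFF 0xFE, which are displayed as ÿþ. That is also why every digit in your `print lines` output is followed by `\x00`. You need to pass the encoding when you open the file.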

file1 = open("test.txt","r", encoding = "utf-16")

As you are using Python 2, where the built-in `open` has no `encoding` argument, you could try this:

import io
file1 = io.open("test.txt","r", encoding = "utf-16")
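
To check the diagnosis end to end, here is a minimal sketch. It assumes a small sample file written to a temporary directory rather than your real `test.txt`, and the helper name `read_float_rows` is made up for illustration. It confirms the file starts with a UTF-16 BOM, then decodes and parses it with `io.open`:

```python
import codecs
import io
import os
import tempfile


def read_float_rows(path, encoding="utf-16"):
    """Hypothetical helper: decode the file, then parse each line into floats."""
    with io.open(path, "r", encoding=encoding) as handle:
        # Skip blank lines so float() never sees an empty row.
        return [[float(tok) for tok in line.split()]
                for line in handle if line.strip()]


# Assumption: a two-line sample standing in for the real test.txt.
sample = u"5298\t10036\t4\t360\t8\r\n6128\t11947\t2\t385\t7\r\n"
tmp_path = os.path.join(tempfile.mkdtemp(), "test.txt")
with io.open(tmp_path, "w", encoding="utf-16", newline="") as handle:
    handle.write(sample)  # the "utf-16" codec writes a BOM first

# Read the first two raw bytes to see whether a UTF-16 BOM is present.
with open(tmp_path, "rb") as handle:
    raw_prefix = handle.read(2)
has_utf16_bom = raw_prefix in (codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)

rows = read_float_rows(tmp_path)
print(has_utf16_bom, rows)
```

The `"utf-16"` codec reads the BOM to pick the byte order and drops it from the decoded text, so `float` only ever sees the digits.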
EdChum
  • Hi, this gives me the following error: TypeError: 'encoding' is an invalid keyword argument for this function – Abhinav Kumar Mar 24 '15 at 11:16
  • Try this: `file1 = open("test.txt","r", "utf-16")` – EdChum Mar 24 '15 at 11:16
  • What version python are you using? – EdChum Mar 24 '15 at 11:17
  • Try `import io; file1 = io.open("test.txt","r", encoding = "utf-16")` – EdChum Mar 24 '15 at 11:18
  • Python version: 2.7.6 | 32-bit. Tried io: it reads the file, but when I print the "lines", I get the following: [u'5298\t10036\t4\t360\t8\n', u'6128\t11947\t2\t385\t7\n', u'9472\t18930\t0\t233\t4\n', u'5056\t9790\t1\t293\t6\n', u'15064\t30160\t1\t312\t8'] – Abhinav Kumar Mar 24 '15 at 11:21
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/73657/discussion-between-abhinav-kumar-and-edchum). – Abhinav Kumar Mar 24 '15 at 11:25
  • Late to the game, but this is the perfect answer I needed! Was messing around with type conversions for ~30 minutes before stumbling across this. – Mark Moretto May 07 '19 at 11:21
1
import io
file1 = io.open("test.txt","r",encoding='utf-16')
lines = file1.readlines()
BF=[map(float, line.split()) for line in lines]
print BF

Result:

[[5298.0, 10036.0, 4.0, 360.0, 8.0], [6128.0, 11947.0, 2.0, 385.0, 7.0], [9472.0, 18930.0, 0.0, 233.0, 4.0], [5056.0, 9790.0, 1.0, 293.0, 6.0]]
Aaron
1

There could be a line break included at the end of each line. Why don't you print line.split() for each line in lines, just to confirm whether the numbers split correctly?
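
A quick way to run that check is sketched below. It assumes one line of the file rebuilt in memory as UTF-16 little-endian bytes (not read from the real file), and the variable names are illustrative. Printing the tokens shows that the line break is not the issue, since `split()` already discards it; the BOM and the interleaved `\x00` bytes are what `float` cannot parse:

```python
import codecs

# Assumption: one line of the file, rebuilt as raw UTF-16-LE bytes with a BOM.
raw_line = codecs.BOM_UTF16_LE + u"5298\t10036\t4\t360\t8\r\n".encode("utf-16-le")

# Splitting the undecoded bytes keeps the BOM and the NUL bytes in each token.
raw_tokens = raw_line.split()
print(raw_tokens)

# Decoding first removes the BOM and yields clean numeric tokens.
clean_tokens = raw_line.decode("utf-16").split()
print(clean_tokens)
```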

fazkan