Read txt file with string and number columns

Question

I have a tab-separated file (city-data.txt):

Alabama Montgomery  32.361538   -86.279118
Alaska  Juneau  58.301935   -134.41974

Is it possible to read the first two columns as strings and the last two as floats?

My output should look like this:

[(Alabama,Montgomery,32.36,-86.28),
 (Alaska,Juneau,58.30,-134.42)]

I tried:

mylist2 = np.genfromtxt(r'city-data.txt', delimiter='\t', dtype=("<S15", "<S15", float, float)).tolist()

Which gives me the first two columns as bytes:

[(b'Alabama', b'Montgomery', 32.361538, -86.279118),
 (b'Alaska', b'Juneau', 58.301935, -134.41974)]

I also tried:

with open('city-data.txt') as f:
    mylist = [tuple(i.strip().split('\t')) for i in f]

Which gives me all columns as strings:

[('Alabama', 'Montgomery', '32.361538', '-86.279118'),
 ('Alaska', 'Juneau', '58.301935', '-134.41974')]

I can't come up with any idea how to implement what I need...

What's wrong with the latter method? Just convert everything to np.float64 as needed. — Jon, Feb 13 '18 at 16:47
Also, for beginners, a better interface may be [pandas](https://pandas.pydata.org/pandas-docs/stable/io.html) instead of numpy. — Jon, Feb 13 '18 at 16:49
Have you tried adding to your second solution by converting the last two items to floats? `for item in line: if item.isdigit(): item = float(item); append item to new container.` What is the question? — wwii, Feb 13 '18 at 16:50
In Py3 the default string type is unicode, which `numpy` labels with `U`. `b'one'` is a bytestring, a `S` dtype, which is the default in Py2. — hpaulj, Feb 13 '18 at 17:29
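
What the comments above suggest can be sketched roughly as follows: keep the plain-Python approach from the question and convert only the last two fields. This is a minimal sketch, assuming every line has exactly four tab-separated fields; `parse_line` is a hypothetical helper name, and `io.StringIO` stands in for the real file so the snippet runs on its own.

```python
import io

# Stand-in for open('city-data.txt'); the two rows are the ones from the question.
sample = (
    "Alabama\tMontgomery\t32.361538\t-86.279118\n"
    "Alaska\tJuneau\t58.301935\t-134.41974\n"
)

def parse_line(line):
    """Split one tab-separated line: keep two strings, convert two floats."""
    # Assumption: exactly four fields per line; anything else raises ValueError.
    state, city, lat, lon = line.rstrip("\n").split("\t")
    return (state, city, round(float(lat), 2), round(float(lon), 2))

with io.StringIO(sample) as f:
    # Skip blank lines so a trailing newline does not break the unpacking.
    mylist = [parse_line(line) for line in f if line.strip()]

print(mylist)
```

Note that a float does not keep trailing zeros, so 58.30 is displayed as 58.3; if two decimals must always show, format the value when printing (for example with `format(x, '.2f')`) rather than when parsing.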

pault · Answer 1 · 2018-02-13T17:02:39.610

You can use pandas read_csv to read the contents of the file into a dataframe. Then convert the rows to a list as you specified using df.values.tolist().

Example:

import pandas as pd

df = pd.read_csv(filename, sep="\t", header=None)

print(df.values.tolist())
#[['Alabama', 'Montgomery', 32.361538, -86.27911800000001],
# ['Alaska', 'Juneau', 58.301935, -134.41974]]

If you need them as tuples, just use map() (wrapped in list() on Python 3, where map() returns an iterator):

print(list(map(tuple, df.values.tolist())))
#[('Alabama', 'Montgomery', 32.361538, -86.27911800000001),
# ('Alaska', 'Juneau', 58.301935, -134.41974)]

Edit

If you want to use numpy, this slight modification to your existing code should work. Change the dtype for the text fields to "O":

mylist2 = np.genfromtxt(filename, delimiter='\t', dtype=("O", "O", float, float)).tolist()
#[('Alabama', 'Montgomery', 32.361538, -86.279118),
# ('Alaska', 'Juneau', 58.301935, -134.41974)]

Thank you! It seems much easier. Do you know how to get rid of the quotes in the first two columns? — elenaby, Feb 13 '18 at 17:07
@elenaby I'm not sure what you mean by quotes. The first two columns are strings so the quotes will be there when you display them. Also try out my recent edit which shows you how to use your existing `numpy` code. Perhaps you will find that easier to implement. — pault, Feb 13 '18 at 17:09

score 3 · Accepted Answer · answered Feb 13 '18 at 17:21

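Another option is to use the 'U' dtype, which stands for unicode. The number after it is the maximum string length ('U10' holds up to 10 characters), so choose one large enough for your longest value.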

>>> import numpy as np
>>> mylist = np.genfromtxt('city-data.txt', delimiter='\t', dtype=('U10','U10',float,float)).tolist()
>>> mylist
[('Alabama', 'Montgomery', 32.361538, -86.279118), ('Alaska', 'Juneau', 58.301935, -134.41974)]

score 1 · Answer 3 · answered Feb 13 '18 at 17:25

After you split a line, create a new line by trying to convert the items to floats then append the new line to the final container.

import io
from pprint import pprint

s = '''Alabama Montgomery  32.361538   -86.279118
Alaska  Juneau  58.301935   -134.41974'''
f = io.StringIO(s)
stuff = []
for line in f:
    line = line.strip()
    line = line.split()
    new_line = []
    for item in line:
        try:
            item = float(item)
        except ValueError:
            # not a number: keep the original string
            pass
        new_line.append(item)
    #print(f'line:{line}, new_line:{new_line}')
    stuff.append(new_line)
pprint(stuff)
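
One caveat with `line.split()` is that it splits on any whitespace, so a city name containing a space would be broken into two items. A possible variant, assuming the real file is tab-separated as the question states, is to let the standard `csv` module split on tabs only. This is a sketch rather than a drop-in replacement: `to_number` is a hypothetical helper, and the second sample row is illustrative data added here to show a two-word city.

```python
import csv
import io

# Illustrative data: the second row is a made-up example of a two-word
# city name; fields are separated by real tab characters.
sample = (
    "Alabama\tMontgomery\t32.361538\t-86.279118\n"
    "Arkansas\tLittle Rock\t34.736009\t-92.331122\n"
)

def to_number(item):
    """Return item as a float if it parses as one, otherwise unchanged."""
    try:
        return float(item)
    except ValueError:
        return item

# newline="" follows the csv module's recommendation for its input streams.
f = io.StringIO(sample, newline="")
rows = [tuple(to_number(item) for item in row)
        for row in csv.reader(f, delimiter="\t")]
print(rows)
```

With a real file, `open('city-data.txt', newline='')` would take the place of the `io.StringIO` object.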
