
I have a tab-separated file (city-data.txt):

Alabama Montgomery  32.361538   -86.279118
Alaska  Juneau  58.301935   -134.41974

Is it possible to read the first two columns as strings and the last two as floats?

My output should look like this:

[(Alabama,Montgomery,32.36,-86.28),
 (Alaska,Juneau,58.30,-134.42)]

I tried:

mylist2 = np.genfromtxt(r'city-data.txt', delimiter='\t', dtype=("<S15", "<S15", float, float)).tolist()

This gives me the first two columns as bytes:

[(b'Alabama', b'Montgomery', 32.361538, -86.279118),
 (b'Alaska', b'Juneau', 58.301935, -134.41974)]

I also tried:

with open('city-data.txt') as f:
    mylist = [tuple(i.strip().split('\t')) for i in f]

This gives me all columns as strings:

[('Alabama', 'Montgomery', '32.361538', '-86.279118'),
 ('Alaska', 'Juneau', '58.301935', '-134.41974')]

I can't come up with any idea how to implement what I need...

martineau
elenaby
  • What's wrong with the latter method? Just convert everything to np.float64 as needed. – Jon Feb 13 '18 at 16:47
  • Also, for beginners, a better interface may be [pandas](https://pandas.pydata.org/pandas-docs/stable/io.html) instead of numpy. – Jon Feb 13 '18 at 16:49
  • just read everything in and then convert as needed – SuperStew Feb 13 '18 at 16:49
  • Have you tried adding to your second solution by converting the last two items to floats? `for item in line: if item.isdigit(): item = float(item); append item to new container.` What is the question? – wwii Feb 13 '18 at 16:50
  • In Py3 the default string type is unicode, which `numpy` labels with `U`. `b'one'` is a bytestring, an `S` dtype, which is the default in Py2. – hpaulj Feb 13 '18 at 17:29
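The "read everything in, then convert as needed" suggestion from the comments could look roughly like the sketch below. It assumes every line has exactly four tab-separated fields; `io.StringIO` and the `sample` string are stand-ins for the real file so the snippet runs on its own.

```python
import io

# Stand-in for open('city-data.txt'); the sample rows use real tab separators.
sample = (
    "Alabama\tMontgomery\t32.361538\t-86.279118\n"
    "Alaska\tJuneau\t58.301935\t-134.41974\n"
)

rows = []
with io.StringIO(sample) as f:
    for line in f:
        # Assumes exactly four fields per line; a malformed line raises ValueError.
        state, city, lat, lon = line.rstrip("\n").split("\t")
        # Convert by position: names stay strings, coordinates become floats.
        rows.append((state, city, float(lat), float(lon)))

print(rows)
```

Converting by column position, rather than testing each field with `isdigit()`, also handles negative numbers and decimal points, for which `isdigit()` returns False.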

3 Answers


You can use pandas read_csv to read the contents of the file into a dataframe. Then convert the rows to a list as you specified using df.values.tolist().

Example:

import pandas as pd

df = pd.read_csv(filename, sep="\t", header=None)

print(df.values.tolist())
#[['Alabama', 'Montgomery', 32.361538, -86.27911800000001],
# ['Alaska', 'Juneau', 58.301935, -134.41974]]

If you need them as tuples, just use map():

print(list(map(tuple, df.values.tolist())))  # list() is needed in Python 3, where map() is lazy
#[('Alabama', 'Montgomery', 32.361538, -86.27911800000001),
# ('Alaska', 'Juneau', 58.301935, -134.41974)]

Edit

If you want to use numpy, this slight modification to your existing code should work. Change the dtype for the text fields to "O":

mylist2 = np.genfromtxt(filename, delimiter='\t', dtype=("O", "O", float, float)).tolist()
#[('Alabama', 'Montgomery', 32.361538, -86.279118),
# ('Alaska', 'Juneau', 58.301935, -134.41974)]
pault
  • Thank you! It seems much easier. Do you know how to get rid of the quotes in the first two columns? – elenaby Feb 13 '18 at 17:07
  • @elenaby I'm not sure what you mean by quotes. The first two columns are strings so the quotes will be there when you display them. Also try out my recent edit which shows you how to use your existing `numpy` code. Perhaps you will find that easier to implement. – pault Feb 13 '18 at 17:09
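On the quotes: they belong to how Python displays a string inside a list or tuple, not to the data itself. If the goal is only to print text shaped like the output in the question, one possible sketch is to format each row by hand. The helper name `format_row` is made up for this illustration, and the two-decimal rounding is taken from the sample output in the question.

```python
rows = [
    ("Alabama", "Montgomery", 32.361538, -86.279118),
    ("Alaska", "Juneau", 58.301935, -134.41974),
]

def format_row(row):
    """Render one (state, city, lat, lon) tuple without quotes, floats to 2 decimals."""
    state, city, lat, lon = row
    return f"({state},{city},{lat:.2f},{lon:.2f})"

# Join the rows so the result mirrors the layout shown in the question.
formatted = "[" + ",\n ".join(format_row(r) for r in rows) + "]"
print(formatted)
```

The result is a plain string for display; the underlying `rows` list still holds `str` and `float` values.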

Another option is to use the 'U' dtype, which stands for unicode.

>>> import numpy as np
>>> mylist = np.genfromtxt('city-data.txt', delimiter='\t', dtype=('U10','U10',float,float)).tolist()
>>> mylist
[('Alabama', 'Montgomery', 32.361538, -86.279118), ('Alaska', 'Juneau', 58.301935, -134.41974)]
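If the bytes result from the `S` dtype attempt is already in hand, it can also be decoded after the fact instead of re-reading the file. This is a minimal sketch that assumes the text is UTF-8 (the sample is plain ASCII); `byte_rows` simply repeats the output shown in the question.

```python
# Output of the "<S15" attempt from the question, copied here as input.
byte_rows = [
    (b'Alabama', b'Montgomery', 32.361538, -86.279118),
    (b'Alaska', b'Juneau', 58.301935, -134.41974),
]

# Decode only the bytes fields; floats pass through unchanged.
# The "utf-8" encoding is an assumption about the file.
decoded = [
    tuple(v.decode("utf-8") if isinstance(v, bytes) else v for v in row)
    for row in byte_rows
]
print(decoded)
```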
Bill Bell

After you split a line, build a new line by trying to convert each item to a float, then append the new line to the final container.

import io
from pprint import pprint

s = '''Alabama Montgomery  32.361538   -86.279118
Alaska  Juneau  58.301935   -134.41974'''
f = io.StringIO(s)
stuff = []
for line in f:
    line = line.strip()
    line = line.split()
    new_line = []
    for item in line:
        try:
            item = float(item)
        except ValueError:
            # Not numeric (e.g. a state or city name): keep the string as-is.
            pass
        new_line.append(item)
    stuff.append(new_line)
pprint(stuff)
wwii