0

I have a csv file that looks like this:

name1, id1, email1, uID1
name2, id2, email2, uID2
name3, id3, email3, uID3
name4, id4, email4, uID4
name5, id5, email5, uID5
name6, id6, email6, uID6

I want to grab a single random email from this file. For example, I want email4 and only email4. How do I read that in? I don't want name4, id4, and uID4 with it, just email4.

Note: I am writing a method to do this and want to return email4 not print it.

I have seen lots of info on how to get an entire row or an entire column, but not how to get one piece of a row. How do I do this?

I have looked through and tried all the options in this thread: How can I get a specific field of a csv file? None of the answers worked for me, so new solutions or fixes to those solutions would be great!

Here is where I am now:

num = random.randint(1,11)
with open('Accounts_details.csv', 'rb') as f:
    reader = csv.reader(f)
    reader = list(reader)
    text = reader[num][2]
    print(text)

And this throws an error:

 reader = list(reader)
_csv.Error: iterator should return strings, not bytes (did you open the file in text mode?)

3 Answers

0

pandas is probably the easiest way to work with this. It's not clear from your post whether you want a specific field or a random one, but both are pretty straightforward.

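import pandas as pd

# Assumes the CSV has a header row that names the third column 'email'.
df = pd.read_csv('Accounts_details.csv', skipinitialspace=True)

# iloc is zero-based, so 4 selects the fifth data row.
print(df.iloc[4]['email'])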

You can either generate the row index (the 4 above) randomly and use it with `iloc`, or, if you want several random rows, use the `sample` method that pandas provides (for example `df['email'].sample(3)`).
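As a rough sketch of both options for a headerless file like the one in the question: the column names in `COLUMNS` and the helpers `email_at` and `random_email` are made up here for illustration, and the in-memory `SAMPLE` text stands in for the real file.

```python
import io

import pandas as pd

# Sample data copied from the question; it stands in for the real CSV file.
SAMPLE = """name1, id1, email1, uID1
name2, id2, email2, uID2
name3, id3, email3, uID3
name4, id4, email4, uID4
name5, id5, email5, uID5
name6, id6, email6, uID6
"""

# Hypothetical column names, since the file in the question has no header row.
COLUMNS = ["name", "id", "email", "uID"]


def load(source):
    """Read a headerless CSV and strip the blank that follows each comma."""
    return pd.read_csv(source, header=None, names=COLUMNS, skipinitialspace=True)


def email_at(source, row):
    """Return the email at a zero-based row position."""
    return load(source).iloc[row]["email"]


def random_email(source):
    """Return one randomly chosen value from the 'email' column."""
    # Series.sample(1) picks one row at random; .iloc[0] unwraps the value.
    return load(source)["email"].sample(1).iloc[0]


print(email_at(io.StringIO(SAMPLE), 3))   # email4
print(random_email(io.StringIO(SAMPLE)))  # one of email1 .. email6
```

Both helpers return the value instead of printing it, so they can be called from another method; passing a file path instead of the `io.StringIO` object works the same way.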

Slater Victoroff
  • I want it to be random, but I already have a function for choosing a random number somewhere else. Is the 'email' part the header? If not, what does it point to? – HoldenMalinchock Jul 31 '17 at 19:21
  • @HoldenMalinchock yup, the `email` is just the header. – Slater Victoroff Jul 31 '17 at 19:22
  • What do I do if I don't have headers? My csv file looks just like the one above, just with real info. – HoldenMalinchock Jul 31 '17 at 19:23
  • Do I need to go and create headers for it? – HoldenMalinchock Jul 31 '17 at 19:23
  • @HoldenMalinchock You should. Otherwise you can index by the column position, but that's bad for a lot of reasons, mostly because introducing magic numbers into code is never a good call. – Slater Victoroff Jul 31 '17 at 19:25
  • I went and created headers in my csv file and put in your code and I am getting this error: File "pandas\_libs\tslib.pyx", line 923, in pandas._libs.tslib.get_value_box (pandas\_libs\tslib.c:18843) File "pandas\_libs\tslib.pyx", line 932, in pandas._libs.tslib.get_value_box (pandas\_libs\tslib.c:18477) TypeError: 'str' object cannot be interpreted as an integer – HoldenMalinchock Jul 31 '17 at 19:44
  • @HoldenMalinchock That seems to be an error in your data file. Usually that error gets thrown when your csv is not properly formatted. I tested this on the example csv that you included with your question and it worked just fine. – Slater Victoroff Jul 31 '17 at 20:10
  • I got it another way since I could not find the problems in mine. – HoldenMalinchock Jul 31 '17 at 20:50
0

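You opened the file in binary mode ('rb'). In Python 3, csv.reader expects a file opened in text mode, which is why it complains about getting bytes instead of strings.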

This:

with open('Accounts_details.csv', 'rb') as f:  

should be this:

with open('Accounts_details.csv', 'r') as f:  
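
With that fix in place, a minimal sketch of a method that returns a random email (rather than printing it) could look like the following. The function name `get_random_email` and its `column` parameter are illustrative and not from the original post; the sketch assumes Python 3 and that the email is the third field of every row.

```python
import csv
import random


def get_random_email(path, column=2):
    """Return the value in `column` from a randomly chosen row of the CSV at `path`."""
    # Text mode is required in Python 3; newline='' is what the csv docs recommend.
    with open(path, "r", newline="") as f:
        # skipinitialspace drops the blank after each comma in the sample data;
        # the `if row` filter skips empty lines.
        rows = [row for row in csv.reader(f, skipinitialspace=True) if row]
    # random.choice cannot index past the last row, unlike a hard-coded randint range.
    return random.choice(rows)[column]
```

Calling `get_random_email('Accounts_details.csv')` would then hand back a single string such as email4. If the random row number already comes from elsewhere, `rows[num][column]` can replace the `random.choice` line, as long as `num` stays between 0 and `len(rows) - 1`.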
Pang
SagemOg
-1

You could use the NumPy library. Full example:

import io
import numpy as np

test = """name1, id1, email1, uID1
name2, id2, email2, uID2
name3, id3, email3, uID3
name4, id4, email4, uID4
name5, id5, email5, uID5
name6, id6, email6, uID6"""

with open("test.txt", "w") as f:
    f.write(test)

data = np.genfromtxt("test.txt", delimiter=",", dtype='unicode', autostrip=True)
# np.random.choice(data[:, 2])  <-- random email
data[:, 2][4]  # <-- email at row index 4 (zero-based), i.e. email5

Update: load-time comparison between numpy and pandas

%timeit np.genfromtxt("test.txt", delimiter=",", dtype='unicode', autostrip=True)
# 1000 loops, best of 3: 404 µs per loop

%timeit pd.read_csv("test.txt", header=None, skipinitialspace=True)
# 1000 loops, best of 3: 954 µs per loop
Anton vBR
  • Numpy is part of pandas? AKA I don't need to install it? – HoldenMalinchock Jul 31 '17 at 19:49
  • @HoldenMalinchock Nope, sorry, that was a typo. I am timing the two alternatives. – Anton vBR Jul 31 '17 at 19:51
  • Also, can I see it without the random choice method? Say I had the random number already and just plugged it in. Also, I am using a csv file, not a text file, so genfromtxt is not recognized. – HoldenMalinchock Jul 31 '17 at 19:52
  • @HoldenMalinchock A csv file is a text file (with a *.csv ending), so just use genfromtxt("filename.csv"..). If you already have the index, plug it in: `data[:,2]` is the column with all the emails. – Anton vBR Jul 31 '17 at 20:00
  • @AntonvBR very deceptive benchmark. You aren't doing the same thing in those two code snippets at all. You should A) decouple from IO and B) have the two snippets do the same thing. – Slater Victoroff Jul 31 '17 at 20:08
  • @AntonvBR The IO is happening within your timeit loop. `np.random.choice(data[:,2])`, and `df["email"].sample(1).values[0]` are not doing the same thing. You can also use `np.random.choice` directly on a pandas dataframe. – Slater Victoroff Jul 31 '17 at 20:13
  • @SlaterTyranus Those were the options presented. I do, however, think you are right and have therefore adjusted my answer to show only the loading times. – Anton vBR Jul 31 '17 at 20:16
  • @AntonvBR, you should also remove the `names` kwarg and the `skipinitialspace` kwarg, as this functionality isn't being done on the numpy side at all. Similarly, on the `numpy` side you should remove the unicode dtype (or pass it into pandas) and the `autostrip`. – Slater Victoroff Jul 31 '17 at 20:17
  • @SlaterTyranus These are necessary things to pass or you pass header=None. I am not comparing numpy vs pandas in general but for this specific task. – Anton vBR Jul 31 '17 at 20:20
  • @AntonvBR but you're asking pandas to associate additional metadata with your dataframe, and removing a type inference for numpy by directly passing in the dtype. You should in fact pass `header=None` into `pandas`, as the docs state very clearly. – Slater Victoroff Jul 31 '17 at 20:23