1

Suppose a question is asked in Stackoverflow and example data is given, what is the best way to import the example data

In order to explain this question which I am asking, let me take to a real StackOverflow post.

This post has given the following data to solve the query.


    Access Type     host/IP address Count
0   Authorized      206.196.21.129  23
1   Authorized      207.30.238.8    46
2   Authorized      208.62.55.75    23
3   Authorized      216.12.111.241  23
4   Authorized      63.197.98.106   23
5   Authorized      67.95.49.172    23
6   Unauthorized    207.243.167.114 23
7   Unauthorized    209.152.168.249 10
8   Unauthorized    65.166.159.14   10
9   Unauthorized    68.143.156.89   10

Now what will be the best way to import the data into Python

The methodology should be in such a way that the import is done into the python environment in a single step.

I am aware that, I can copy-paste the data into a spreadsheet -> then save it as CSV file and then import it into Python. But I need a more elegant way.

Other solution as suggested by one developer,

import pandas as pd ;

df = pd.read_clipboard()

pd.read_clipboard(), is not a universal solution across all operating system. My system (Ubuntu 19.10), I did a clipboard copying and then the result came like this. The column names were wrongly imported.

enter image description here

This post deals with R solution, but I am interested in a solution for Python

Praveen Kumar-M
  • 223
  • 2
  • 10
  • To be honest, the original post should have added the data in a machine-readable format, e.g. CSV, not the human-readable table output you get by printing a dataframe. – AKX May 10 '20 at 12:14
  • You are right. It will be best sometimes to give as CSV file. But CSV file also poses risk sometimes of sending malicious file concealed as CSV. Hence posting a small part of data (current practice) seems better for transparency – Praveen Kumar-M May 12 '20 at 07:18
  • I meant pasting a CSV-formatted excerpt in the post, not an attachment. – AKX May 12 '20 at 12:41

3 Answers3

1

Edited answer: this is slightly more work, but it's another viable option in my opinion.

from io import StringIO
import pandas as pd

s = StringIO("""    Access Type     host/IP address Count
0   Authorized      206.196.21.129  23
1   Authorized      207.30.238.8    46
2   Authorized      208.62.55.75    23
3   Authorized      216.12.111.241  23
4   Authorized      63.197.98.106   23
5   Authorized      67.95.49.172    23
6   Unauthorized    207.243.167.114 23
7   Unauthorized    209.152.168.249 10
8   Unauthorized    65.166.159.14   10
9   Unauthorized    68.143.156.89   10""")

df = pd.read_fwf(s,index_col=0,infer_nrows=10)

Result:

In [175]: df
Out[175]: 
    Access Type  host/IP address  Count
0    Authorized   206.196.21.129     23
1    Authorized     207.30.238.8     46
2    Authorized     208.62.55.75     23
3    Authorized   216.12.111.241     23
4    Authorized    63.197.98.106     23
5    Authorized     67.95.49.172     23
6  Unauthorized  207.243.167.114     23
7  Unauthorized  209.152.168.249     10
8  Unauthorized    65.166.159.14     10
9  Unauthorized    68.143.156.89     10
mechanical_meat
  • 163,903
  • 24
  • 228
  • 223
  • And then you'll find that the first column, "Access Type", has been split into two columns, as has "Host/IP address"... – AKX May 10 '20 at 11:46
  • This might work better than read_clipboard() as this will give strong control over import. We can also easily correct the column names before clicking "Enter" button. – Praveen Kumar-M May 10 '20 at 13:21
1

As others have pointed out, The best solution seems to be pd.read_clipboard() In order not to get messed up columns, just replace the spaces in the column names with _ and then to get the actual columns simply do:

df.columns = [name.replace('_',' ') for name in df.columns.values]

So copy the input when you replaced spaces with _ : input:

    Access_Type     host/IP_address Count
0   Authorized      206.196.21.129  23
1   Authorized      207.30.238.8    46
2   Authorized      208.62.55.75    23
3   Authorized      216.12.111.241  23
4   Authorized      63.197.98.106   23
5   Authorized      67.95.49.172    23
6   Unauthorized    207.243.167.114 23
7   Unauthorized    209.152.168.249 10
8   Unauthorized    65.166.159.14   10
9   Unauthorized    68.143.156.89   10

run the code:

import pandas as pd 
df = pd.read_clipboard()
df.columns = [name.replace('_',' ') for name in df.columns.values]

and you'll get:

Output:

    Access Type  host/IP address  Count
0    Authorized   206.196.21.129     23
1    Authorized     207.30.238.8     46
2    Authorized     208.62.55.75     23
3    Authorized   216.12.111.241     23
4    Authorized    63.197.98.106     23
5    Authorized     67.95.49.172     23
6  Unauthorized  207.243.167.114     23
7  Unauthorized  209.152.168.249     10
8  Unauthorized    65.166.159.14     10
9  Unauthorized    68.143.156.89     10
Hossein
  • 24,202
  • 35
  • 119
  • 224
0

Try to use pandas function read_clipboard()

This method is as simple as you want: it reads copy-pasted tabular data and parses it into a Data Frame.

By the documentation this method read text from clipboard and pass to read_csv.

You can find more information in this blog post

EDIT: I think that your problem didn't cause by your OS, it could be because of the white spaces in the first line of the data.

Yanirmr
  • 923
  • 8
  • 25