-4

I am opening a txt file with pandas, and wherever there should be a column break in the file, a \t appears instead.

I am reading in the file like this:

df=pd.read_csv(r'file.txt')

The dataframe looks like this:

1           Band 1\t 0.428944\t0.843916\t0.689923\t0                    
2           Band 2\t-0.000000\t0.689320\t0.513170\t0                   
3           Band 3\t 0.336438\t0.743478\t0.592622\t0                    
4           Band 4\t 0.313259\t0.678561\t0.525667\t0                     
5           Band 5\t 0.374522\t0.746828\t0.583513\t0

and I want it to look like this:

1           Band 1   0.428944  0.843916  0.689923                     
2           Band 2  -0.000000  0.689320  0.513170                  
3           Band 3   0.336438  0.743478  0.592622                    
4           Band 4   0.313259  0.678561  0.525667                    
5           Band 5   0.374522  0.746828  0.583513

I am new to using txt files in Python. Do I perhaps have to set a delimiter of some sort?

Using print(repr(open(r'D:\Sheyenne\Statistics\NDVI_allotment\Text\A_Annex2.txt').read(42))) returns:

'\n\n     Band 1\t 0.428944\t0.843916\t0.689923\t

EDIT:

The dataframes I originally posted are simplified; in reality there are more columns of data.

print(repr(open(r'D:\Sheyenne\Statistics\NDVI_allotment\Text\A_Annex2.csv').read(500)))

returns:

'\nBasic Stats\t      Min\t     Max\t    Mean\t   Stdev\t  Num\tEigenvalue\n     Band 1\t 0.428944\t0.843916\t0.689923\t0.052534\t    1\t  0.229509\n     Band 2\t-0.000000\t0.689320\t0.513170\t0.048885\t    2\t  0.119217\n     Band 3\t 0.336438\t0.743478\t0.592622\t0.052544\t    3\t  0.059111\n     Band 4\t 0.313259\t0.678561\t0.525667\t0.048047\t    4\t  0.051338\n     Band 5\t 0.374522\t0.746828\t0.583513\t0.055989\t    5\t  0.027913\n     Band 6\t-0.000000\t0.749325\t0.330068\t0.314351\t    6\t  0.022561\n     Band 7\t-0.000000\t0.819288\t0.6001'
Stefano Potter
  • Could you show us a sample of the file too please? `print(repr(open('file.txt').read(100)))` would be helpful here. – Martijn Pieters Aug 24 '15 at 17:39
  • I Googled your question's title and came up with a few helpful results, like [this one](http://stackoverflow.com/questions/2585337/how-to-use-tab-space-while-writing-in-text-file) (Java, but still relevant). – TigerhawkT3 Aug 24 '15 at 17:55
  • @Martijn Pieters, I'm sorry but what do you mean by a sample? The first block of code I showed is a sample of what it looks like, do you mean something different? – Stefano Potter Aug 24 '15 at 17:59
  • I mean something different; I'd like to see the raw data. I gave you a Python command that would produce the first 100 characters from the file. – Martijn Pieters Aug 24 '15 at 18:11
  • That returns `'Filename: F:\\Sheyenne\\Atmospherically Corrected Landsat\\Indices\\Main\\NDVI\\NDVI_stack\nROI: EVF: Layer'`. But that is only the first line of the txt file – Stefano Potter Aug 24 '15 at 18:31
  • The [docs](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_csv.html#pandas.read_csv) state that the default separator is a comma. Your file contains tabs and no commas, so you can try `df = pd.read_csv(r'file.txt', sep='\t')`, as Martijn has already answered. That should load your csv correctly; if it does not, you have additional formatting issues. I suggest posting a link to your file, or editing your question and pasting the exact input text, to avoid ambiguity – EdChum Aug 24 '15 at 19:12
  • @StefanoPotter: please add that to your question, although if you could increase the number of characters that would be greatly appreciated. – Martijn Pieters Aug 24 '15 at 19:18
  • @Martijn Pieters I changed it so it now contains data from the frame. I had just omitted the first two lines of the file in my example above for simplicity – Stefano Potter Aug 24 '15 at 19:50
  • Thanks for adding raw file data; I note that it is not 100 characters however. Still, there is enough info there that we can work with, I think. – Martijn Pieters Aug 24 '15 at 20:41
  • yes, there are four more columns of data that I am going to end up removing from the file anyways, I can add the full 100 characters if you think it would help though – Stefano Potter Aug 24 '15 at 21:19
  • I added 500 characters of data from the original file in addition to the 42 characters I initially posted – Stefano Potter Aug 24 '15 at 21:25

3 Answers

9

It is a tab character. pandas.read_csv() splits on commas by default and does not detect the delimiter on its own, so the tab-separated fields in your file were read as a single column.

You could try and specify it explicitly with the sep argument:

df = pd.read_csv(r'file.txt', sep='\t')

or you could set the delim_whitespace argument to True for general whitespace-as-delimiter support:

df = pd.read_csv(r'file.txt', delim_whitespace=True)

From your sample it looks like you have extra empty lines, as well as spaces after the delimiter, so perhaps you need to have the reader skip those:

df = pd.read_csv(r'file.txt', sep='\t',
                 skipinitialspace=True, skip_blank_lines=True)
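
As a minimal sketch, the call above can be tried on the raw sample quoted in the question. The text is passed through io.StringIO instead of a file path, which is an assumption made only to keep the example self-contained; the column names and values are copied from the question's repr() output.

```python
import io

import pandas as pd

# First rows of the raw text shown in the question (tabs written as \t).
raw = (
    "\n"
    "Basic Stats\t      Min\t     Max\t    Mean\t   Stdev\t  Num\tEigenvalue\n"
    "     Band 1\t 0.428944\t0.843916\t0.689923\t0.052534\t    1\t  0.229509\n"
    "     Band 2\t-0.000000\t0.689320\t0.513170\t0.048885\t    2\t  0.119217\n"
    "     Band 3\t 0.336438\t0.743478\t0.592622\t0.052544\t    3\t  0.059111\n"
)

# sep='\t' splits on tabs only; skipinitialspace=True drops the padding
# spaces that follow each tab; blank lines are skipped (the default).
df = pd.read_csv(io.StringIO(raw), sep="\t", skipinitialspace=True)

print(df.columns.tolist())
print(df)
```

Because only tabs are treated as separators here, the space inside "Band 1" stays within a single field.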

See the documentation on handling CSV files.
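
The CParserError reported in the comments (Expected 1 fields in line 4, saw 7) is what one would expect if the file starts with single-field lines before the table, as the Filename: and ROI: lines quoted in the comments suggest. One possible way to handle that, sketched here, is to locate the header line and pass its index as skiprows. The two preamble lines below are hypothetical stand-ins, and matching on "Basic Stats" assumes the header always starts with that label.

```python
import io

import pandas as pd

# Hypothetical preamble lines followed by rows copied from the question.
raw = (
    "Filename: example\n"
    "ROI: example\n"
    "Basic Stats\t      Min\t     Max\t    Mean\t   Stdev\t  Num\tEigenvalue\n"
    "     Band 1\t 0.428944\t0.843916\t0.689923\t0.052534\t    1\t  0.229509\n"
    "     Band 2\t-0.000000\t0.689320\t0.513170\t0.048885\t    2\t  0.119217\n"
)

# Find the 0-based index of the header line (assumed to start with "Basic Stats").
lines = raw.splitlines()
header_row = next(i for i, line in enumerate(lines) if line.startswith("Basic Stats"))

# Skip everything before the header, then parse the tab-separated table.
df = pd.read_csv(
    io.StringIO(raw),
    sep="\t",
    skiprows=header_row,
    skipinitialspace=True,
)

print(df)
```

If the number of preamble lines is fixed and known, passing that number directly as skiprows would be simpler than searching for the header.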

Martijn Pieters
  • Note that `delim_whitespace=True` would likely cause problems with the OP's data (the space in "Band 1"). – DSM Aug 24 '15 at 17:47
  • @DSM: Yup, unless `read_csv` can handle quoting like the Python `csv` module can *and* that first column is using quoting (which may not be the case since the `\t` is included in that column). – Martijn Pieters Aug 24 '15 at 17:48
  • @DSM: at any rate, without a sample of the file we are all just guessing here anyway. I asked for one. – Martijn Pieters Aug 24 '15 at 17:48
  • using `sep='\t'` returns the error `CParserError: Error tokenizing data. C error: Expected 1 fields in line 4, saw 7 ` – Stefano Potter Aug 24 '15 at 19:17
  • @StefanoPotter: your 100 character sample doesn't contain any tabs, but then again it doesn't contain any of the data in your frame output either. – Martijn Pieters Aug 24 '15 at 19:19
2

\ is an escape character. It alters the meaning of the character that follows it. In the case of \t, the pair represents a single tab character. https://en.wikipedia.org/wiki/Escape_character
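
A short Python sketch of the point above: the two-character sequence \t in source code or in repr() output stands for one tab character in the actual string. The sample line is taken from the question; the variable names are illustrative only.

```python
# "\t" in a normal string literal is one character: the tab (code point 9).
tab = "\t"
line = "Band 1\t 0.428944"

print(len(tab))    # length of the tab string
print(ord(tab))    # its code point
print(repr(line))  # repr() shows the tab as the escape sequence \t
print(line)        # print() renders the tab as whitespace

# Splitting on the tab recovers the two fields.
fields = line.split("\t")
print(fields)

# A raw string keeps the backslash and the letter t as two characters.
raw_text = r"\t"
print(len(raw_text))
```

This is why the dataframe in the question displays \t inside the values: the tabs were never used as separators, so they remained part of each string.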

Spirine
1

\t is the escape sequence for a <tab> character.

Josh J
  • I guess answering the question "What does \t represent in txt file?" is worthy of a down vote today. – Josh J Aug 24 '15 at 19:52
  • Or maybe just 1 upvote. (But just one) P.S. I see adding the Wikipedia link is worth 10xp :) – Reg Aug 24 '15 at 19:55