
I'm new to R and I'm having trouble understanding what all the parameters in read.table() do. I have a text file with a header, and roughly 50 rows. Columns are separated by tabs. I did the following.

data <- read.table("/Accounts/changy/Desktop/GreekProject/outputWithoutQuantity.txt",header=TRUE,sep="\t", quote = "")

Now I want to create a matrix, but omit the header (first row). Also, read.table generates numbered rows for each of my already numbered rows, and I don't want my matrix to be numbered at all, so I would need to omit the first two columns as well. Can anyone point me in the right direction? I know

matrixData <- as.data.frame.matrix(data)

does it, but it doesn't format the rows and columns as I would like. Thanks for any help in advance, from a complete beginner to R!

Here's a snapshot of my data set, as requested.

http://postimg.org/image/b7h97rd7d/

Yuen Hsi
  • how about `as.matrix(data, ncol = ncol(data))` – Ben Aug 19 '13 at 15:26
  • Could you please post a snapshot of your dataset? – Mayou Aug 19 '13 at 15:28
  • my dataset is too large; but i'll give it a try. http://postimg.org/image/b7h97rd7d/ the first 9 lines are supposed to be 1 line; they collectively form the first row. – Yuen Hsi Aug 19 '13 at 15:31
  • Just the first 2-3 rows will do. I just need to see the variable types you have in your dataset, and the set up in the text file. If you could please provide a snapshot of `data`, as well as a snapshot of your text file (a print screen will do)! – Mayou Aug 19 '13 at 15:33
  • Have you tried converting the file to `csv`? It is easily done by just changing the extension from `.txt` to `.csv` in Windows Explorer. How does your data look then? – Mayou Aug 19 '13 at 15:36
  • `header=FALSE, skip=1` and `data.matrix()` should help – baptiste Aug 19 '13 at 15:37
  • @mariam I could, but I'm actually loading the data I showed you from a text file generated by Excel, and I have some back-end Java code that reformats my data. Changing it to csv is still simple: I just have to change the delimiters in my Java code from "\t" to ",". I could do that, but would it ultimately help me get the data into a matrix in the desired format? – Yuen Hsi Aug 19 '13 at 15:38
  • @baptiste that still shows the first two columns, and I think the column names are replaced by V[n], but are not removed. :/ – Yuen Hsi Aug 19 '13 at 15:40
  • column names are just an attribute in the resulting matrix, it shouldn't matter. You can remove them with `colnames(m) = NULL`. Same with row names. – baptiste Aug 19 '13 at 15:47
  • @baptiste got it, now it just displays NA. One last question: is there a way to remove the first column? It was helpful to have entry numbers in the text file, but I'm trying to use SVD to fill in missing data, so I don't need the row identifiers. My matrix is currently 45 by x, and I would like to make it 45 by (x-1), omitting the first (leftmost) column. – Yuen Hsi Aug 19 '13 at 15:52
  • Check my answer below. To get rid of your first column ("personID"), you do the following: `data.mat = as.matrix(data.txt[,-1]); dimnames(data.mat) <- list(rep("", dim(data.mat)[1]), rep("", dim(data.mat)[2]))` – Mayou Aug 19 '13 at 15:57
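
The suggestions in these comments (`header=FALSE, skip=1`, `data.matrix()`, dropping the first column, clearing the names) could be combined roughly as follows. This is only a sketch: the temporary file and its column names (`personID`, `q1`, `q2`) are made up for illustration and stand in for the real tab-separated file.

```r
# Hypothetical tab-separated file: a header line, then an ID column
# followed by numeric columns (one value missing).
path <- tempfile(fileext = ".txt")
writeLines(c("personID\tq1\tq2",
             "1\t0\t1",
             "2\t1\tNA",
             "3\t0\t0"), path)

# skip = 1 discards the header line; header = FALSE tells read.table
# not to treat the next line as column names.
raw <- read.table(path, header = FALSE, skip = 1, sep = "\t", quote = "")

# data.matrix() converts the data frame to a numeric matrix.
# [, -1] drops the leftmost (ID) column; drop = FALSE keeps the result
# a matrix even if only one column remains.
m <- data.matrix(raw)[, -1, drop = FALSE]

# Remove the leftover V2, V3, ... column names.
dimnames(m) <- NULL

print(dim(m))
```

The row numbers that appear when a data frame is printed are row names, not a column, so only the ID column that exists in the file itself needs to be dropped.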

1 Answer


Here is a suggestion. Does it work as you wish?

 ## Test dataset
 data = data.frame(col1 = c(1,2,3,4), col2 = c(0,0, 1, 0), col3 = c(1,0,0,1))
 write.table(data, "data.txt", row.names = FALSE)
 data.txt = read.table("data.txt", header = TRUE)

 data.mat = as.matrix(data.txt[,-1])  # drops the leftmost column (as you requested)
 dimnames(data.mat) <- list(rep("", dim(data.mat)[1]), rep("", dim(data.mat)[2]))  # blanks out row and column names

This would be the output

0 1
0 0
1 0
0 1

instead of:

1 0 1
2 0 0
3 1 0
4 0 1
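
As a quick check of what the negative index does, the sketch below rebuilds the same test data frame (under the assumed name `df`) and compares the dimensions with and without the first column. The names `full` and `trimmed` are introduced here only for illustration.

```r
# Same values as the test dataset above.
df <- data.frame(col1 = c(1, 2, 3, 4),
                 col2 = c(0, 0, 1, 0),
                 col3 = c(1, 0, 0, 1))

full    <- as.matrix(df)        # all three columns
trimmed <- as.matrix(df[, -1])  # negative index: every column except the first

print(dim(full))      # 4 rows, 3 columns
print(dim(trimmed))   # 4 rows, 2 columns
print(colnames(trimmed))
```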
Mayou
  • I think that's almost what I'm trying to do. The matrix you generated is, however, 4 by 3, with the row and column names omitted. Since I'm trying to remove the first column, the resulting matrix I would like to generate in my case is actually 4 by 2, with the first vector (c(1,2,3,4)) removed. Thanks for your help and patience, have an upvote! – Yuen Hsi Aug 19 '13 at 16:00
  • Well, I just modified the answer to exclude the first vector. Try again the modified version above. (with `data.mat = as.matrix(data.txt[,-1])`). Note the "-1" – Mayou Aug 19 '13 at 16:01
  • That's perfect. Thank you! Does your last function, dimnames, just modify data.mat so to remove the header columns and rows? – Yuen Hsi Aug 19 '13 at 16:04
  • Yes, all it does is remove the dimension names, i.e. the rownames and colnames of `data.mat`. (Technically, it replaces them with a list of two character vectors of "", one of length nrow(mat) for the rows and one of length ncol(mat) for the columns.) – Mayou Aug 19 '13 at 16:06
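
To make the distinction in that last comment concrete, here is a small sketch comparing blank dimension names with no dimension names at all. The matrix `m` and the names `blank` and `bare` are assumptions made up for this illustration.

```r
# A 4 by 2 matrix with column names, like the one left after dropping col1.
m <- matrix(c(0, 0, 1, 0, 1, 0, 0, 1), nrow = 4,
            dimnames = list(NULL, c("col2", "col3")))

# Replace the names with empty strings, as in the answer.
blank <- m
dimnames(blank) <- list(rep("", nrow(blank)), rep("", ncol(blank)))

# Remove the names entirely; unname(m) has the same effect.
bare <- m
dimnames(bare) <- NULL

str(dimnames(blank))            # a list of two character vectors of ""
print(is.null(dimnames(bare)))  # TRUE: nothing is stored at all
```

Both versions hold the same numbers, so computations such as SVD are unaffected. The difference is only in printing: `blank` prints without any labels, while `bare` prints positional labels such as `[1,]`, which are display formatting and not part of the data.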