
I have SPSS data that I have to migrate to R. The data set is large, with 202 columns and thousands of rows:

v1 v2    v3     v4 v5
1  USA   Male   21 Married
2  INDIA Female 54 Single
3  CHILE Male   33 Divorced ...and so on...

The data file contains variable labels "Identification No", "Country of origin", "Gender", "(Current) Year", "Marital Status - Candidate"

I read my data from SPSS with the following command:

library(foreign)  # provides read.spss
data <- read.spss("file.sav", to.data.frame=TRUE, reencode='utf-8')

The column names are read as v1, v2, v3, v4, etc., but I want the variable labels as the column names of my data frame. I used the following commands to find the variable labels and set them as names:

vname <- attr(data, "variable.labels")
vl <- character(length(vname))  # initialise vl before filling it
for (i in seq_along(vname)) { vl[i] <- vname[[i]] }
names(data) <- vl

Now the problem is that I have to address a column as data$"Identification No", which is not very nice. I want to remove the quotation marks around the column names. How can I do that?

Prabhu
  • I doubt you really have quotation marks around your column names. It is just how R represents character values IMO. The problem with your colnames is that they contain spaces and `(` – David Arenburg Sep 21 '14 at 19:49
  • It's actually more the case that all calls to "$" do have implied quotes around the second argument but their printing is suppressed. The syntactic sugar of the "$" function obscures what is really happening. "$" is really "[[" with non-standard evaluation of the expression that follows. Everyone should take a moment to read the relevant section in the Details of `?'[['` – IRTFM Sep 21 '14 at 22:01
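
The comment above about `$` can be checked on a small data frame. This is a minimal sketch with made-up values: the data frame `df` is hypothetical, built from the first rows shown in the question. It only illustrates that the quoted form, the backticked form, and `[[` with a string all reach the same column.

```r
# Hypothetical two-column data frame; the labels from the question are used as names.
df <- data.frame(c(1, 2, 3), c("USA", "INDIA", "CHILE"),
                 stringsAsFactors = FALSE)
names(df) <- c("Identification No", "Country of origin")

a  <- df$"Identification No"     # quoted name after $
b  <- df$`Identification No`     # backticked name after $
c3 <- df[["Identification No"]]  # explicit [[ with a character string

# TRUE when all three forms return the same vector
same <- identical(a, b) && identical(b, c3)
print(same)
```

Either quotes or backticks are needed here because the name contains a space; only the delimiter changes, not the lookup.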

3 Answers

You can't. An unquoted space separates tokens, so the parser cannot read a name containing one as a single symbol.

One option is to change the names to ones without spaces. The make.names function does that:

> N = c("foo","bar baz","bar baz")
> make.names(N)
[1] "foo"     "bar.baz" "bar.baz"

You might want to make sure you have unique names:

> make.names(N, unique=TRUE)
[1] "foo"       "bar.baz"   "bar.baz.1"
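
Applied to the five labels quoted in the question, the same call can be sketched as follows. The data frame `df` and its values are made up for illustration; only the labels come from the question. Per the `make.names` documentation, invalid characters become `.` and a name that does not start with a letter gets an `X` prepended, so the result is printed rather than assumed here.

```r
# Labels as given in the question.
labels <- c("Identification No", "Country of origin", "Gender",
            "(Current) Year", "Marital Status - Candidate")

# Convert to syntactically valid, unique names.
clean <- make.names(labels, unique = TRUE)
print(clean)

# Hypothetical data frame with two rows taken from the question's sample.
df <- data.frame(1:2, c("USA", "INDIA"), c("Male", "Female"),
                 c(21, 54), c("Married", "Single"),
                 stringsAsFactors = FALSE)
names(df) <- clean

# Columns can now be reached with $ and no quotes.
gender <- df$Gender
print(gender)
```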
Spacedman

The quotation marks were there because the names had spaces in them. print(vl, quote=FALSE) displays the text without quotation marks, but I still had to use quotation marks to refer to a name as a single variable. Without them, the spaces would break up the variable name.

This can be solved by removing the spaces from the names. I did that by deleting all spaces with gsub:

vl<-gsub(" ","",vl)
names(data)<-vl

Now most of the column names can be accessed without quotation marks, but names containing other punctuation marks still have to be quoted.
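
One way to see which names still need quoting after the spaces are removed is to compare each name with what `make.names` would turn it into: a name that `make.names` leaves unchanged is already syntactically valid. This is a rough sketch using the labels from the question; the variable `needs_quotes` is introduced here purely for illustration.

```r
# Labels as given in the question.
labels <- c("Identification No", "Country of origin", "Gender",
            "(Current) Year", "Marital Status - Candidate")

# Remove spaces only, as in the answer above.
vl <- gsub(" ", "", labels)

# A name is syntactic if make.names() does not need to alter it.
needs_quotes <- vl != make.names(vl)

# Names that still cannot follow $ without quotes or backticks.
print(vl[needs_quotes])
```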

The solution by Spacedman also worked fine and seems easier to use:

make.names(vl, unique=TRUE)

But I liked the solution by David Arenburg.

gsub("[ [:punct:]]", "" , vl)

It removed all spaces and punctuation marks and left the column names clean.
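
A caveat worth checking with this approach is that stripping characters can make two different labels collapse into the same name. The sketch below uses the labels from the question plus two hypothetical labels (`"Year (Current)"` and `"Year - Current"`, invented here to force a collision), and falls back on base R's `make.unique` to disambiguate.

```r
# Labels from the question: stripping spaces and punctuation.
labels <- c("Identification No", "Country of origin", "Gender",
            "(Current) Year", "Marital Status - Candidate")
vl <- gsub("[ [:punct:]]", "", labels)
print(vl)

# Hypothetical labels that differ only in punctuation, so they collide.
clash <- c("Year (Current)", "Year - Current")
stripped <- gsub("[ [:punct:]]", "", clash)

# anyDuplicated() returns 0 when all names are distinct.
has_dups <- anyDuplicated(stripped) > 0

# make.unique() appends .1, .2, ... to repeated names.
fixed <- make.unique(stripped)
print(fixed)
```

With 202 columns, running `anyDuplicated` on the cleaned names before assigning them is a cheap safeguard.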

Prabhu

Spaces are okay in data.table column names without much fuss. But, no, there's no way to avoid using quotation marks for the reason Spacedman gave: spaces break up the syntax.

require(data.table)
DT <- data.table(a = c(1,1), "bc D" = c(2,3))

# three identical results:
DT[['bc D']]
DT$bc
DT[,`bc D`]

Okay, so partial matching with $ (which also works with data.frames) gets you out of using quotes. But it will bring trouble if you get it wrong.
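
The kind of trouble meant here can be sketched with a plain data frame. The column names `"bc D"` and `"bc E"` are made up for illustration: once two names share the prefix, `$bc` is ambiguous and returns `NULL` instead of raising an error.

```r
# Unique prefix: partial matching with $ finds the column.
df1 <- data.frame(a = c(1, 1), "bc D" = c(2, 3), check.names = FALSE)
unique_hit <- df1$bc

# Ambiguous prefix: two columns start with "bc", so $bc matches neither.
df2 <- data.frame("bc D" = c(2, 3), "bc E" = c(4, 5), check.names = FALSE)
ambiguous_hit <- df2$bc

print(unique_hit)
print(is.null(ambiguous_hit))
```

Setting `options(warnPartialMatchDollar = TRUE)` makes R warn whenever `$` falls back on partial matching, which helps catch accidental uses.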

Frank
  • `DT$"bc D"` and `DT$bc` do give identical results. I didn't know that until you pointed it out. But in my case I don't want to use quotation marks, and using `$` without them does not point to a unique column. – Prabhu Sep 21 '14 at 21:21