0

I have an xlsx file that contains let's say 4 columns:

Thingie1
Thingie1
Thingie2
Thingie2

(I know it's stupid... it's because I am combining data from 2 different databases, and the 1 and 2 denote a difference between survey 1 and survey 2 and chose to dump out into a xlsx file.

When I import this into R using the xlsx package, it changes the column titles to:

Thingie1
Thingie3
Thingie2
Thingie4

Obviously my real-world example is much more complex and it then becomes very difficult to identify what is what. I would like to have something like:

Thingie1
Thingie11
Thingie2
Thingie21

Now, I can resolve this by opening the file in excel and changing the columns there... so I can solve my immediate problem, but I hate breaking my workflow... would there be some way to import this data in a better way into R itself?

Amit Kohli
  • 2,860
  • 2
  • 24
  • 44
  • It appears that read.xlsx() is changing the column names before calling data.frame() since data.frame() would change them to "Thingie1", "Thingie1.1" which is pretty close to what you want. If the column names are the same for all of the files, just create a vector of the names you want: cols <- c("Thingie1", "Thingie11", . . . ) and then use names(yourdataframe) <- cols. – dcarlson Aug 02 '16 at 15:29

1 Answers1

1

I think you should read the first line (containing column names) alone of your file with header=FALSE for proper naming of columns. Then use make.unique function upon it. Later read the whole file with header=TRUE and set the column names using colnames function.

x <- c("Thingie1", "Thingie1", "Thingie2", "Thingie2")
x
#[1] "Thingie1" "Thingie1" "Thingie2" "Thingie2"
make.unique(x)
#[1] "Thingie1"   "Thingie1.1" "Thingie2"   "Thingie2.1"
user2100721
  • 3,557
  • 2
  • 20
  • 29