I am parsing a CSV file with multiple columns. The number of columns is not fixed; it varies from 5 to 10. I need to recreate a data.frame from these columns inside a function. Does R have anything like Ruby's variable-length arguments (*args)? If not, how can I achieve this? I searched a bit and found that if I have column names such as

col1
col2

I can use:

list <- ls(pat="^col\\d$")

and pass this list as an argument to a function, but it will pass just column names, as characters, not the values these column names are carrying.

Any suggestions?

Edit: I am parsing a file from a RoR app and using the RinRuby gem to call R functions. So I parse the CSV in Ruby and pass the contents of each column to R as a separate variable. In R, I then need to create a data.frame, so the data is not a data frame to begin with. In the method cal_norm below, I assign variables in R in a loop, with names col1, col2, col3, and so on.

Here is the Rails code:

class UploadsController < ApplicationController

  attr_accessor :calib_data, :calib_data_transpose, :inten_data, :pr_list

  def index
    @uploads = Upload.all

    @upload = Upload.new

    respond_to do |format|
      format.html
      format.json { render json: @uploads }
    end
  end

  def create
    @upload = Upload.new(params[:upload])

    directory = "public/"
    io_calib = params[:upload][:calib]
    io_inten = params[:upload][:inten]

    name_calib = io_calib.original_filename
    name_inten = io_inten.original_filename
    calib_path = File.join(directory, "calibs", name_calib)
    inten_path = File.join(directory, "intens", name_inten)

    respond_to do |format|
      if @upload.save
        @calib_data, @calib_data_transpose = import(calib_path)
        @inten_data = import_ori(inten_path)
        # probe list of the uploaded file
        @probe_list = calib_data_transpose[0]
        logger.debug @probe_list.to_s
        flash[:notice] = "Files were successfully uploaded!!"
        format.html
        #format.js #{ render json: @upload, status: :created, location: @upload }
      else
        flash[:notice] = "Error in uploading!!"
        format.html { render action: "index" }
        format.json { render json: @upload.errors, status: :unprocessable_entity }
      end
    end
  end

  def cal_norm
    # ajax request
    data = params['data'].split(',')

    for i in 0..@calib_data_transpose.length - 1
      R.assign "col#{i}", @calib_data_transpose[i]
    end

    R.assign "cells", @inten_data
    R.assign "pr", data
    R.eval <<-EOF

# make sure to convert them in character and numeric vectors

#match the selected pr in the table

#convert the found row of values from data.frame to numeric

#divide each column of the table by the respective pr values and create a new table repeat it with different pr.

#make a new table with the ce count and different probe normalization and calculate  for individual pr

#finally return a data.frame with pr names and cell counts

#return individual columns as an array not in the form of matrix/data.frame

EOF

  end

  def import(file_path)
    array = import_ori(file_path)
    array_splitted = array.map { |a| a.split(",") }
    array_transpose = array_splitted.transpose
    return array_splitted, array_transpose
  end

  def import_ori(file_path)
    string = IO.read(file_path)
    array = string.split("\n")
    array.shift
    return array
  end

end
JstRoRR
  • I don't understand the question. `read.csv` returns a data.frame. – Roland Apr 14 '14 at 13:06
  • Me neither. You have one CSV with anything from 5 to 10 items in each line? How are you going to put that in a rectangular data frame? You can pad it out with NA markers using the `fill` parameter to `read.csv`. Otherwise... what? – Spacedman Apr 14 '14 at 13:17
  • Just read your csv using `read.csv` or `read.table`. Both functions don't care how many columns your csv has. Maybe you could edit your question to be a little more explicit, as in what exactly you mean by "recreats a data.frame with these columns". – Stephan Kolassa Apr 14 '14 at 13:18
  • sorry I did not mention one thing, I am editing my question. – JstRoRR Apr 14 '14 at 13:38
  • Question Edited. Please check the bottom lines. I guess this was important. – JstRoRR Apr 14 '14 at 13:41
  • waoo.. downvote for just some missing part...!! – JstRoRR Apr 14 '14 at 13:44
  • I think you need to post an example of what is coming in from Ruby - otherwise it is too hard to figure out what is going on – John Paul Apr 14 '14 at 13:53
  • John, I added the codes. – JstRoRR Apr 14 '14 at 14:09
  • @JstRoRR That's only useful to the small subset of SO contributers that know both Ruby and R. Can't you show what R is getting? E.g., do you pass a bunch of vectors in a loop and want to add them to a data.frame sequentially? `cbind.data.frame` might be useful. Maybe put them all in a list and use `do.call(cbind.data.frame, listOfVectors)`. – Roland Apr 14 '14 at 14:12
  • Roland, this is what I Have so far..people here asked me to put complete code so I posted, and that is why initially I just abstracted the question keeping rails stuff aside. Goshhhh newaz. – JstRoRR Apr 14 '14 at 14:22
  • I mentioned in the question, in the method cal_norm, I am assigning some ruby variables to R. Actully what you said is right, I want to pass a bunch of vectors in a loop (looping just to create a set of similar variables names differing just in number like col1, col2...) and add them to a data.frame. I tried do.call but again , How R would know the number of variables I passed. I think I need to check RinRuby if I can pass an array to R instead of just variables. – JstRoRR Apr 14 '14 at 14:25
  • Roland, I cant create listOfVectors as I do not know how many columns the file is going to have. This is the actual issue I am facing... – JstRoRR Apr 14 '14 at 14:27
  • @JstRoRR, Just as an aside, last month our flight had an emergency case where a woman complained that her 87 year old husband was suddenly very drowsy, luckily we had few doctors on board and they tried to wake him up, after 20 mins of delay and drama, to our utter surprise the old geezer said "I have taken 2 sleeping pills", the relieved wife exclaimed "You fool!, why didn't you tell me about it!". End of story. No offence to you but **details are very important** :D – Silence Dogood Apr 14 '14 at 15:08
  • @Vivek: thanks for the reply..guess you are my darling angel who luckily knows ruby and R ... by the way I just checked RinRuby and I can pass arrays to R from Ruby and not just variables. So I guess iterating over an object is cool as I need not to worry about number of columns a file has. I tried that and it worked. But many thanks indeed :) – JstRoRR Apr 14 '14 at 15:25
  • @JstRoRR, thank you for the kind words. It's just the simple logic of Ruby that even an total n00b like me can make sense out of it, kudos to the designers! and I am glad that you found an elegant way than the hackish solution we were exploring! – Silence Dogood Apr 14 '14 at 15:35
  • @JstRoRR, could you please answer your question with a new post and close it so that it helps anyone who stumbles on the same issue – Silence Dogood Apr 14 '14 at 15:46

1 Answer

Post updated question:

I am an utter newbie at Ruby, but I found this example here: col wise data

Here, column-wise data is read into col_data; the 0 is the column index (I have no Ruby available for testing :( )

require 'csv'
col_data = []
CSV.foreach(filename) {|row| col_data << row[0]}
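
If every column is needed rather than just one, the file can be parsed once and the rows transposed into columns, which works for any number of columns. This is only a minimal sketch: it assumes a rectangular CSV with a header row, and `csv_text` is a made-up inline sample standing in for the uploaded file.

```ruby
require 'csv'

# Hypothetical sample standing in for the uploaded file
csv_text = <<~EOS
  probe,s1,s2
  p1,1.5,2.5
  p2,3.0,4.0
EOS

rows = CSV.parse(csv_text)  # array of rows, each an array of strings
rows.shift                  # drop the header row, as import_ori does
columns = rows.transpose    # one array per column, however many there are

columns.each_with_index do |column, i|
  puts "col#{i + 1}: #{column.inspect}"
end
```

Note that the values stay strings here, so numeric columns would still need converting on one side or the other.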

Assign the column data to variables col1...coln, and create a counter for the number of columns (syntax might not be 100% correct)

@calib_data_transpose.each_with_index do |column, i|
  # R indexing starts at 1, so name the variables col1..coln
  R.assign "col#{i + 1}", column
end

# Tell R how many columns were assigned
R.col_count = @calib_data_transpose.length

Once col1..coln are created, combine the column data one index at a time, starting at i = 1. The result will be a data.frame with the columns in the order col1...coln.

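R.eval <<-EOF

# Build df one column at a time from col1 up to the col_count-th variable
for (i in seq_len(col_count)) {
  column <- get(paste0("col", i))  # look the variable up by its name
  if (i == 1) {
    df <- data.frame(column)
  } else {
    df <- cbind(df, column)
  }
  names(df)[i] <- paste0("col", i)
}

EOF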
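
Since the Ruby side already knows how many columns it assigned, another option is to generate the R statement in Ruby, so R needs no counter at all. The sketch below assumes the variables in R are named `col1` to `coln`; `r_data_frame_call` is a hypothetical helper name (not part of RinRuby), and `stringsAsFactors = FALSE` is my own choice to keep strings as character vectors.

```ruby
# Build the R statement that combines col1..coln into one data.frame.
# r_data_frame_call is a hypothetical helper, not a RinRuby method.
def r_data_frame_call(column_count, target = "df")
  raise ArgumentError, "need at least one column" if column_count < 1

  names = (1..column_count).map { |i| "col#{i}" }
  "#{target} <- data.frame(#{names.join(', ')}, stringsAsFactors = FALSE)"
end

puts r_data_frame_call(3)
# prints: df <- data.frame(col1, col2, col3, stringsAsFactors = FALSE)
```

The resulting string could then be passed to `R.eval` after the `R.assign` loop. `data.frame` names the columns after the variables it is given, so the result would have columns col1, col2, and so on.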

Let us know if this helps...


Not very relevant to updated question anymore but retaining it for posterity.

Subset data.frame for a given pattern

As Roland stated above, read.csv will read the entire file. Since you wish to control which columns are retained in the data.frame, you could do the following:

Using data(mtcars) as sample data.frame

Code:

Read in the data:

> data(mtcars)
> head(mtcars)
                   mpg cyl disp  hp drat    wt  qsec vs am gear carb
Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1

Subset the data for some condition, say columns whose names begin with the letter 'c'

> head(mtcars[,grep("^c",colnames(mtcars))])
                   cyl carb
Mazda RX4           6    4
Mazda RX4 Wag       6    4
Datsun 710          4    1
Hornet 4 Drive      6    1
Hornet Sportabout   8    2
Valiant             6    1

Here '^c' plays the same role as the pattern pat="^col\\d$" from your question. You can substitute any regular expression of your choice for '^c', e.g. '^col'. The '^c' pattern matches any name beginning with the letter 'c'; to match at the end of the string instead, use 'c$'.
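
One caveat about the pattern from the question: `\d` matches exactly one digit, so `^col\d$` would miss `col10` when a file has ten columns. Here is a minimal sketch in Ruby, whose regex syntax treats `^`, `$` and `\d` the same way for single-line names like these (the list of names is made up for illustration):

```ruby
# Hypothetical variable names, including a two-digit column
names = %w[col1 col2 col10 color cells]

single_digit = names.grep(/^col\d$/)   # exactly one digit after "col"
any_digits   = names.grep(/^col\d+$/)  # one or more digits after "col"

puts single_digit.inspect  # ["col1", "col2"]
puts any_digits.inspect    # ["col1", "col2", "col10"]
```

In R the second pattern would be written as the string "^col\\d+$".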

Silence Dogood