
I have written a script that reads a file from disk, checks the values in it, and writes 3 other files to disk. Unfortunately, something that seemed very straightforward has turned into a headache. The code is:

Arqcodnegs ="result/lista_de_codnegs.txt"
dirout   = "./result/"
Codnegs_fornecidos = c("ABC", "A1B2", "PETR3")

Verifica_codneg = function (Codnegs_fornecidos, Arqcodnegs) {

  if (!file.exists(Arqcodnegs)) {
    stop("Falta arquivo lista_de_codnegs.txt")
  }

  Codnegs_lidos = read.table(Arqcodnegs,header=FALSE, sep='\t', quote='\"', stringsAsFactors=TRUE)

  Codnegs_negativos = c(setdiff (Codnegs_fornecidos, Arqcodnegs))

  Codnegs_positivos = c(intersect (Codnegs_fornecidos, Arqcodnegs))

  write.table(Codnegs_lidos, paste(dirout, "lista_de_codnegs_lidos.txt", sep=''), col.names=FALSE, row.names=FALSE, sep='\t')

  write.table(Codnegs_negativos, paste(dirout, "lista_de_codnegs_negativos.txt", sep=''), col.names=FALSE, row.names=FALSE, sep='\t')

  write.table(Codnegs_positivos, paste(dirout, "lista_de_codnegs_positivos.txt", sep=''), col.names=FALSE, row.names=FALSE, sep='\t')

}

The file "lista_de_codnegs.txt" has the following values in it:

"PDGR3" "PETR3" "PETR4"

As expected, the file "lista_de_codnegs_lidos.txt" returns the appropriate values in one single column, meaning "PDGR3", "PETR3" and "PETR4".

But, the main problems are:

  1. The file "lista_de_codnegs_negativos.txt" returns "ABC", "A1B2", "PETR3", but it should have returned only "ABC" and "A1B2".

  2. The file "lista_de_codnegs_positivos.txt" returns no values, but it should have returned "PETR3".

What am I doing wrong?

marc_s
  • You are comparing to `Arqcodnegs`, which contains none of the values from `Codnegs_fornecidos`. So, expected behaviour. What do you really want to compare it to? colnames of codnegs_lidos? – Heroka Sep 15 '15 at 15:52
  • I want to check if Codnegs_fornecidos can be found in the file Arqcodnegs. The Codnegs_fornecidos found are written in the file Codnegs_positivos, and the ones not found are written in the file Codnegs_negativos. – Newbie1971 Sep 15 '15 at 16:08
  • You could do `codnegs_negativos=setdiff(codnegs_fornecidos, codnegs_lidos)`, but depends on what your data looks like. – Heroka Sep 15 '15 at 16:14
  • Yes, I could compare Codnegs_fornecidos with Codnegs_lidos, as you have indicated. However, I prefer to compare with Arqcodnegs because I use Arqcodnegs as a form of control, and check for discrepancies later. – Newbie1971 Sep 15 '15 at 16:25
  • @Newbie1971 You understand that you're comparing against `Arqcodnegs ="result/lista_de_codnegs.txt"`, i.e. a character string, and not against the content of the file loaded into your `Codnegs_lidos` variable? Please confirm this, and if you changed your code, please [edit] your question – Tensibai Sep 15 '15 at 16:35
  • @Tensibai I want to compare Codnegs_fornecidos = c("ABC", "A1B2", "PETR3") with the content of the file Arqcodnegs ="result/lista_de_codnegs.txt" and generate 3 output files. – Newbie1971 Sep 15 '15 at 17:00
  • If you're working with the contents of a file, you need to read it. Right now, all you have is the address of the file. You did read it, but stored it somewhere else. – Heroka Sep 15 '15 at 17:24
  • @Heroka I am reading the file by: Codnegs_lidos = read.table(Arqcodnegs,header=FALSE, sep='\t', quote='\"', stringsAsFactors=TRUE) – Newbie1971 Sep 15 '15 at 19:08
  • Yes you are, but the file contents (which you apparently want to do something with) are stored under another variable. – Heroka Sep 15 '15 at 21:05
  • @Heroka so, what is your suggestion? – Newbie1971 Sep 16 '15 at 14:59
  • @Newbie compare things to something you actually want to compare them to, and not to the address. So for this example you should probably compare to Codnegs_lidos. – Heroka Sep 16 '15 at 15:02

1 Answer


Your issue is in the two lines below, as mentioned by @Heroka. `Arqcodnegs` is a character string holding a file path, not the contents of the file, so `setdiff()` and `intersect()` compare `Codnegs_fornecidos` against that single string. Change `Arqcodnegs` to `Codnegs_lidos` in both calls and it will function correctly.

  Codnegs_negativos = c(setdiff (Codnegs_fornecidos, Arqcodnegs))

  Codnegs_positivos = c(intersect (Codnegs_fornecidos, Arqcodnegs))
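To see why those two calls classify every code as "negative", here is a minimal sketch. It uses only the values from the question and reads nothing from disk.

```r
# The path is just a one-element character vector, not the file's contents.
Arqcodnegs <- "result/lista_de_codnegs.txt"
Codnegs_fornecidos <- c("ABC", "A1B2", "PETR3")

# The only thing on the right-hand side is the path string itself,
# so none of the supplied codes can ever match it.
negativos <- setdiff(Codnegs_fornecidos, Arqcodnegs)
positivos <- intersect(Codnegs_fornecidos, Arqcodnegs)

print(negativos)  # all three supplied codes
print(positivos)  # character(0)
```

This reproduces the two symptoms described in the question: everything ends up in the negative list and the positive list is empty.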

**EDIT:** This code should get what you're after.

Arqcodnegs ="result/lista_de_codnegs.txt"
dirout   = "./results/"
dir.create(dirout)
Codnegs_fornecidos = c("ABC", "A1B2", "PETR3")

Verifica_codneg = function (Codnegs_fornecidos, Arqcodnegs) {


  Codnegs_lidos = read.table(Arqcodnegs,header=FALSE, sep='\t', quote='\"', stringsAsFactors=TRUE)

  Codnegs_negativos = c(setdiff (Codnegs_fornecidos, Codnegs_lidos))

  Codnegs_positivos = c(intersect (Codnegs_fornecidos, Codnegs_lidos))

  write.table(Codnegs_lidos, paste(dirout, "lista_de_codnegs_lidos.txt", sep=''), col.names=FALSE, row.names=FALSE, sep='\t')

  write.table(Codnegs_negativos, paste(dirout, "lista_de_codnegs_negativos.txt", sep=''), col.names=FALSE, row.names=FALSE, sep='\t')

  write.table(Codnegs_positivos, paste(dirout, "lista_de_codnegs_positivos.txt", sep=''), col.names=FALSE, row.names=FALSE, sep='\t')

}

Verifica_codneg(Codnegs_fornecidos = Codnegs_fornecidos, Arqcodnegs = Arqcodnegs)
Badger
  • I have changed as pointed out, and it is not returning the appropriate answer. What I got is: > Codnegs_positivos character(0) > Codnegs_negativos [1] "result/lista_de_codnegs.txt" > – Newbie1971 Sep 15 '15 at 16:13
  • The character string, within the context of the question, is a destination. I do see where you are coming from and will adjust my wording, as technically it is simply a character string. Thank you. – Badger Sep 15 '15 at 16:40
  • @HoneyDippedBadger I tried the code you provided above, and it returned: > Codnegs_negativos [1] "ABC" "A1B2" "PETR3" > Codnegs_positivos list() The expected answers would be Codnegs_negativos "ABC" "A1B2" and Codnegs_positivos "PETR3" – Newbie1971 Sep 15 '15 at 16:57
  • @Tensibai I am sorry, but I am not understanding your comment above. – Newbie1971 Sep 15 '15 at 17:01
  • Very strange, as I am getting what you are looking for. – Badger Sep 15 '15 at 17:05
  • @HoneyDippedBadger I ended RStudio, tried a new script with the code you provided and also a new file "lista_de_codnegs.txt", and I am still having the same results as before. No clue why we are getting different results.... – Newbie1971 Sep 15 '15 at 19:06
  • @HoneyDippedBadger I have tried everything again, on a different computer, and the results are still the same: > Codnegs_lidos V1 1 PDGR3 2 PETR3 3 PETR4 > Codnegs_negativos [1] "ABC" "A1B2" "PETR3" > Codnegs_positivos list() I don't know what could be wrong. – Newbie1971 Sep 16 '15 at 14:36
  • Try using just this: `Arqcodnegs = c("PDGR3", "PETR3" , "PETR4") Codnegs_fornecidos = c("ABC", "A1B2", "PETR3") Codnegs_lidos = Arqcodnegs Codnegs_negativos = c(setdiff (Codnegs_fornecidos, Codnegs_lidos)) Codnegs_positivos = c(intersect (Codnegs_fornecidos, Codnegs_lidos))` Then work it back to your original intent. – Badger Sep 16 '15 at 14:39
  • @HoneyDippedBadger now it worked! It seems the problem is related to the read.table function. Any idea how to work around this? > Arqcodnegs [1] "PDGR3" "PETR3" "PETR4" > Codnegs_lidos [1] "PDGR3" "PETR3" "PETR4" > Codnegs_negativos [1] "ABC" "A1B2" > Codnegs_positivos [1] "PETR3" – Newbie1971 Sep 16 '15 at 15:05
  • @Newbie1971 Is your data separated by a tab or a comma? "\t" indicates a tab, if they are not a tab then you would have all the items in a single column and row, resulting in the wrong output. Glad it worked :) – Badger Sep 16 '15 at 15:35
  • @HoneyDippedBadger By tab. – Newbie1971 Sep 16 '15 at 16:00
  • @HoneyDippedBadger I found the solution. Just replace the line `Codnegs_lidos = read.table(Arqcodnegs,header=FALSE, sep='\t', quote='\"', stringsAsFactors=TRUE)` with `Codnegs_lidos = as.matrix(read.table(Arqcodnegs,header=FALSE, sep='\t', quote='\"', stringsAsFactors=TRUE))`. – Newbie1971 Sep 16 '15 at 19:17
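
For anyone landing here later, a note on why the `as.matrix()` workaround helps. `read.table()` returns a data frame, which is a list of columns, so `setdiff()` and `intersect()` compare each supplied code against the column as a whole rather than against its elements. `as.matrix()` turns the data frame into a character matrix, which the set functions treat as a plain vector. A more direct alternative is to extract the column yourself. The sketch below assumes one quoted code per line (as the printed `Codnegs_lidos` in the comments suggests) and uses a temporary file as a hypothetical stand-in for `lista_de_codnegs.txt`.

```r
# Hypothetical stand-in for lista_de_codnegs.txt: one quoted code per line.
arq <- tempfile(fileext = ".txt")
writeLines(c('"PDGR3"', '"PETR3"', '"PETR4"'), arq)

Codnegs_fornecidos <- c("ABC", "A1B2", "PETR3")

# read.table() gives a data frame with a single column, V1.
lidos_df <- read.table(arq, header = FALSE, sep = "\t", quote = "\"",
                       stringsAsFactors = FALSE)

# Pull out the first column as a plain character vector before comparing.
lidos <- as.character(lidos_df[[1]])

negativos <- setdiff(Codnegs_fornecidos, lidos)
positivos <- intersect(Codnegs_fornecidos, lidos)

print(negativos)  # "ABC" "A1B2"
print(positivos)  # "PETR3"
```

The `as.character()` call also keeps this working if the column was read as a factor, as it would be with `stringsAsFactors=TRUE` in the original script.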