
I have a water balance text file whose first 20 lines are not needed for the analysis. Then there is a line of column names, which I want to keep, followed by a line of units and a line of hyphens, which I want to ignore. The data I want starts right after that. There are 17 lines of unnecessary data before the column-names line, and the file generally looks like this:

Unnecessary lines
Unnecessary lines
Unnecessary lines
Unnecessary lines
Unnecessary lines
Unnecessary lines
---------------------------------------
Column_names Column_names Column_names
unit         unit         unit
---------------------------------------
Data Data Data
Data Data Data
Data Data Data
Data Data Data

First, I thought I would use read.table, skip the lines above the column names, and then just delete the rows with hyphens below them, but I always got this error: "Error in scan(file = file, what = what, sep = sep, quote = quote, dec = dec, : line 1 did not have 20 elements"

So far I have tried:

data1 <- read.table("2_wat.txt", skip = 17, sep = '\t')

If I do the following, I get the data but lose the column names:

data1 <- read.table("2_wat.txt", skip = 22)

If anyone has suggestions, I would greatly appreciate your help.

marc_s
Samrat

2 Answers


An easy way would be to set the column names by hand, for example:

data1 <- read.table("2_wat.txt", skip = 22, col.names=c("col1", "col2", "col3"))

Alternatively, read the file twice: once for the header line and once for the data:

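# Read only the header line; as.is = TRUE keeps the names as character
tt <- read.table("2_wat.txt", skip = 17, sep = '\t', nrows = 1, as.is = TRUE)
# Read the data rows (skip = 22 as in the question)
data1 <- read.table("2_wat.txt", skip = 22)
colnames(data1) <- unlist(tt)  # flatten the one-row data frame to a character vector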
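
As a variation on the two-pass idea above, the header line can be read with scan(), which returns a plain character vector instead of a one-row data frame. This is only a minimal sketch on a small made-up file: the column names Day, Precip and Runoff and the line positions 4 and 7 are illustrative assumptions, not values taken from the asker's file.

```r
# Hypothetical sample file mimicking the layout in the question
path <- tempfile(fileext = ".txt")
writeLines(c(
  "Unnecessary line",
  "Unnecessary line",
  "-----------------",
  "Day Precip Runoff",
  "-   mm     mm",
  "-----------------",
  "1 2.5 0.3",
  "2 0.0 0.1",
  "3 7.1 1.2"
), path)

header_line <- 4  # line holding the column names (assumed position)
data_start  <- 7  # first line of actual data (assumed position)

# scan() splits the single header line on whitespace and
# returns a character vector
header <- scan(path, what = character(), skip = header_line - 1,
               nlines = 1, quiet = TRUE)

# Read the data rows and attach the names in the same call
data1 <- read.table(path, skip = data_start - 1, col.names = header)
str(data1)
```

For a real file, header_line and data_start would have to be set to wherever the column names and the first data row actually sit.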
GKi
  • Hello, thank you so much for your help. The first approach sounds easy too, but I have many columns, so I am looking for a simpler process. The second approach looks easy and each line of code ran fine, but colnames(data1) <- tt didn't bring the expected result. It only returned columns with names like 1.1 1.2 1.3 1.4 ..... – Samrat Jul 24 '19 at 07:15
  • @Samrat Maybe then the header is not in line 17? I updated the answer and included `as.is`. – GKi Jul 24 '19 at 07:21
  • Thank you very much, that also worked out very well. I very much appreciate you taking the time to help me :) – Samrat Jul 25 '19 at 16:51

It may be easier to read all lines into a character vector using readLines.

Then you can treat each element separately.

# prepare data
txt_path <- tempfile(fileext = ".txt")
con <- file(txt_path)
txt <-"Unnecessary lines
Unnecessary lines
Unnecessary lines
Unnecessary lines
Unnecessary lines
Unnecessary lines
---------------------------------------
Column_names Column_names Column_names
unit         unit         unit
---------------------------------------
Data Data Data
Data Data Data
Data Data Data
Data Data Data"
writeLines(txt, con)
close(con)

# read txt file line by line, it returns a vector of characters
txt_vec <- readLines(con = txt_path)

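# split the header line on runs of spaces; make.unique() de-duplicates repeated names
headers <- make.unique(unlist(strsplit(txt_vec[8], " +")))

# each data line becomes one row: bind the split lines row-wise
out <- as.data.frame(do.call(rbind, strsplit(txt_vec[11:14], " +")))
names(out) <- headers
> print(out)

  Column_names Column_names.1 Column_names.2
1         Data           Data           Data
2         Data           Data           Data
3         Data           Data           Data
4         Data           Data           Data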
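
The line positions above are hard-coded. If the hyphen lines are the only lines in the file made up entirely of hyphens, they could be located with grep() instead. The following is a rough sketch under that assumption, with made-up column names and values. It also hands the data lines to read.table(text = ...), so that numeric columns are converted automatically.

```r
# Hypothetical sample text mimicking the layout in the question
txt_vec <- c(
  "Unnecessary line",
  "Unnecessary line",
  "-----------------",
  "Day Precip Runoff",
  "-   mm     mm",
  "-----------------",
  "1 2.5 0.3",
  "2 0.0 0.1"
)

# Lines consisting only of hyphens mark the header block
rules <- grep("^-+$", txt_vec)
stopifnot(length(rules) >= 2)

# Assumption: names sit right after the first rule,
# data starts right after the second
header <- strsplit(trimws(txt_vec[rules[1] + 1]), "\\s+")[[1]]
body   <- txt_vec[(rules[2] + 1):length(txt_vec)]

# Parse the remaining lines as a whitespace-separated table
out <- read.table(text = body, col.names = header)
str(out)
```

If the unnecessary lines at the top also contain hyphen-only lines, the indices into rules would need adjusting, for example by taking the last two matches.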
marc_s
yusuzech