How to read certain lines from a text file while ignoring a few lines in between using R (and also delimit those lines into columns)?

Question

I have a water balance text file whose first 20 lines are not needed for analysis. Then there is a line of column names that I want to keep, followed by a line of units and a line of hyphens that I want to ignore. After those comes the data, which I want to end up directly under the column names. There are 17 lines of unnecessary data before the column-names line, and the file generally looks like this:

Unnecessary lines
Unnecessary lines
Unnecessary lines
Unnecessary lines
Unnecessary lines
Unnecessary lines
---------------------------------------
Column_names Column_names Column_names
unit         unit         unit
---------------------------------------
Data Data Data
Data Data Data
Data Data Data
Data Data Data

First, I thought I would use read.table, skip the lines above the column names, and then delete the rows with hyphens below them, but I always got the error "Error in scan(file = file, what = what, sep = sep, quote = quote, dec = dec, : line 1 did not have 20 elements"

So far I have tried:

data1 <- read.table("2_wat.txt", skip = 17, sep = '\t')

If I do the following, I get the data but lose the column names:

data1 <- read.table("2_wat.txt", skip = 22)

If anyone has suggestions, I would greatly appreciate your help.

You can read the header with `readLines` and set the names – akrun Jul 24 '19 at 03:33
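A minimal sketch of what this comment suggests, run on a small mock file rather than the real `2_wat.txt`. The column names (`Date`, `Precip`, `Runoff`), the values, and the positions `header_line` and `first_data` are made up for illustration and would need to match the actual file:

```r
# Build a small mock file with the same layout as in the question:
# junk lines, a rule, the column names, a units line, a rule, then data.
# The names and values here are invented for the example.
path <- tempfile(fileext = ".txt")
writeLines(c(
  rep("Unnecessary lines", 6),
  "---------------------------------------",
  "Date Precip Runoff",
  "day  mm     mm",
  "---------------------------------------",
  "1 2.5 0.3",
  "2 0.0 0.1",
  "3 7.1 1.2"
), path)

header_line <- 8   # position of the column-name line in this mock file
first_data  <- 11  # position of the first data row in this mock file

# Read only as far as the header line and keep that last line.
header_txt <- readLines(path, n = header_line)[header_line]

# scan() splits on any run of whitespace, so uneven spacing is fine.
col_names <- scan(text = header_txt, what = "", quiet = TRUE)

# Skip everything before the data and attach the names.
data1 <- read.table(path, skip = first_data - 1, col.names = col_names)
str(data1)
```

Because the units and hyphen lines are never passed to `read.table`, the columns keep their numeric types instead of being read as text.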

GKi · Answer 1 · 2019-07-24T07:21:22.677

1

An easy way would be to set the column names by hand:

data1 <- read.table("2_wat.txt", skip = 22, col.names=c("col1", "col2", "col3"))

Or you can read the file twice, once for the header and once for the data:

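# Read only the header line; as.is = TRUE keeps the names as character strings
tt <- read.table("2_wat.txt", skip = 17, sep = '\t', nrows = 1, as.is = TRUE)
# Read the data rows
data1 <- read.table("2_wat.txt", skip = 22)
# unlist() turns the one-row data frame into a plain character vector
colnames(data1) <- unlist(tt)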

edited Jul 24 '19 at 07:21

answered Jul 24 '19 at 06:05

GKi


Hello, thank you so much for your help. The first approach sounds easy too, but I have many columns, so I am looking for a much simpler process. The second approach looks easy and each line of code ran fine, but colnames(data1) <- tt didn't bring the expected result. It only returned columns with names like 1.1 1.2 1.3 1.4 ..... – Samrat Jul 24 '19 at 07:15
@Samrat Maybe then the header is not in line 17? I updated the answer and included `as.is`. – GKi Jul 24 '19 at 07:21
Thank you very much, that also worked out very well. I very much appreciate you taking the time to help me :) – Samrat Jul 25 '19 at 16:51

score 0 · Answer 2 · edited Aug 14 '19 at 18:28

It should be easier to read all lines into a character vector using readLines.

Then you can treat each element separately.

# prepare data
txt_path <- tempfile(fileext = ".txt")
con <- file(txt_path)
txt <-"Unnecessary lines
Unnecessary lines
Unnecessary lines
Unnecessary lines
Unnecessary lines
Unnecessary lines
---------------------------------------
Column_names Column_names Column_names
unit         unit         unit
---------------------------------------
Data Data Data
Data Data Data
Data Data Data
Data Data Data"
writeLines(txt, con)
close(con)

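# read txt file line by line, it returns a character vector
txt_vec <- readLines(con = txt_path)

# line 8 holds the column names; split on runs of spaces
headers <- unlist(strsplit(txt_vec[8], " +"))

# lines 11 to 14 hold the data; split each line, then bind the pieces as rows
out <- as.data.frame(do.call(rbind, strsplit(txt_vec[11:14], " +")))
# the sample header repeats the same name, so make the names unique
names(out) <- make.names(headers, unique = TRUE)

> print(out)

  Column_names Column_names.1 Column_names.2
1         Data           Data           Data
2         Data           Data           Data
3         Data           Data           Data
4         Data           Data           Data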
Thank you very much! This worked out pretty well, and I learnt a new thing. – Samrat Jul 24 '19 at 07:25
