Questions tagged [data-manipulation]

Data manipulation is the process of altering data from a less useful state to a more useful state.

Data manipulation is the process of moving data from a source or format that is hard to read or search into a format or storage solution that can be read and searched quickly. For example, a log's output could be split into rows of a database to make it easier to pull out just the entries that pertain to a situation, or simply reordered so that entries are easier to locate by the sorted field. Data manipulation can make data mining easier.

Data manipulation is the process of taking raw data and parsing, filtering, extracting, organizing, combining, cleaning, or otherwise converting it into a consistent, usable form for further processing or as input to an algorithm or system.
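As a rough illustration of the log example in the tag description, the sketch below splits log output into database rows and then pulls out only the relevant entries. The log layout ("timestamp level message"), the table name `log`, and the helper names `parse_log` and `load_rows` are assumptions made for this example, not part of any particular tool.

```python
import re
import sqlite3

# Hypothetical log output; the "date time LEVEL message" layout is an assumption.
RAW_LOG = """\
2024-01-05 10:02:11 INFO service started
2024-01-05 10:02:15 ERROR disk quota exceeded
2024-01-05 10:03:40 INFO request handled
2024-01-05 10:04:02 ERROR connection reset
"""

# Captures: timestamp (two whitespace-free tokens), level, rest of the line.
LINE_RE = re.compile(r"^(\S+ \S+) (\w+) (.*)$")


def parse_log(text):
    """Split raw log text into (timestamp, level, message) tuples."""
    rows = []
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match:  # silently skip lines that do not fit the assumed layout
            rows.append(match.groups())
    return rows


def load_rows(rows):
    """Store the parsed rows in an in-memory SQLite table named 'log'."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE log (ts TEXT, level TEXT, message TEXT)")
    conn.executemany("INSERT INTO log VALUES (?, ?, ?)", rows)
    return conn


conn = load_rows(parse_log(RAW_LOG))

# Once the data is in rows, selecting and ordering entries is a single query.
errors = conn.execute(
    "SELECT ts, message FROM log WHERE level = 'ERROR' ORDER BY ts"
).fetchall()

for ts, message in errors:
    print(ts, message)
```

An in-memory SQLite database keeps the sketch self-contained; any other searchable store would serve the same purpose.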

3845 questions
17 votes, 6 answers

Vim search and replace, adding a constant

I know this is a long shot, but I have a huge text file and I need to add a given number to other numbers matching some criteria. E.g. identifying text 1.1200 identifying text 1.1400 and I'd like to transform this (by adding, say, 1.15) to identifying…
Internet man • 1,115 • 3 • 13 • 21

17 votes, 5 answers

How to find differences between two JavaScript arrays of objects?

I have two JavaScript arrays, orig (the original array of objects) and update (the updated orig array of objects), that have the same length and contain objects, and I want to output the differences between each pair of objects. Example: var orig…
Valip • 4,440 • 19 • 79 • 150

17 votes, 5 answers

Extract letters from a string in R

I have a character vector containing variable names such as x <- c("AB.38.2", "GF.40.4", "ABC.34.2"). I want to extract the letters so that I have a character vector now containing only the letters e.g. c("AB", "GF", "ABC"). Because the number of…
Moose • 275 • 1 • 2 • 7

16 votes, 4 answers

REAL() can only be applied to a 'numeric', not a 'integer'

Though this question seems to be a duplicate, I'm posting it because none of the others gave a solution relevant to my problem. dtrain<-xgb.DMatrix(data=data.matrix(train),label=data[t,c(31)]) Error in xgb.DMatrix(data = data.matrix(train), label = data[t,…
Shankar Pandala • 969 • 2 • 8 • 28

16 votes, 3 answers

Assign value to group based on condition in column

I have a data frame that looks like the following: > df = data.frame(group = c(1,1,1,2,2,2,3,3,3), date = c(1,2,3,4,5,6,7,8,9), value = c(3,4,3,4,5,6,6,4,9)) > df group date value 1 1 1 3 2 1 2 …
Boudewijn Aasman • 1,236 • 1 • 13 • 20

16 votes, 3 answers

Implicit sorting in tidyr::spread and dplyr::summarise

My data are ordered observations and I want to keep the ordering as much as possible while doing manipulations. Take the answer for this question, I put "B" ahead of "A" in the dataframe. The resulting wide data are sorted by the column "name",…
Dong • 481 • 4 • 15

15 votes, 2 answers

How to remove groups of observation with dplyr::filter()

For the following data ds <- read.table(header = TRUE, text =" id year attend 1 2007 1 1 2008 1 1 2009 1 1 2010 1 1 2011 1 8 2007 3 8 2008 NA 8 2009 3 8 2010 NA 8 2011 3 9 2007 2 9 2008 3 9…
andrey • 2,029 • 2 • 18 • 23

15 votes, 8 answers

Flatten a column with value of type list while duplicating the other column's value accordingly in Pandas

Dear power Pandas experts: I'm trying to implement a function to flatten a column of a dataframe whose elements are of type list. For each row of the dataframe where the column has an element of type list, I want all columns but the designated column to…
Yu Shen • 2,770 • 3 • 33 • 48

15 votes, 3 answers

Check python string format?

I have a bunch of strings, but I only want to keep the ones with this format: x/x/xxxx xx:xx. What is the easiest way to check whether a string meets this format? (Assume I want to check whether it has two /'s and a ':'.)
user1487000 • 1,161 • 3 • 12 • 17

15 votes, 2 answers

Sliding time intervals for time series data in R

I am trying to extract interesting statistics for an irregular time series data set, but coming up short on finding the right tools for the job. The tools for manipulating regularly sampled time series or index-based series of any time are pretty…
Iterator • 20,250 • 12 • 75 • 111

14 votes, 8 answers

Shifting non-NA cells to the left

There are many NA's in my dataset and I need to shift all those cells (at row level) to the left. Example- my dataframe: df=data.frame(x=c("l","m",NA,NA,"p"),y=c(NA,"b","c",NA,NA),z=c("u",NA,"w","x","y")) df x y z 1 …
sidpat • 735 • 10 • 26

13 votes, 1 answer

Replacing NAs in a column with the values of other column

I wonder how to replace NAs in a column with the values of another column in R using dplyr. MWE is below. Letters <- LETTERS[1:5] Char <- c("a", "b", NA, "d", NA) df1 <- data.frame(Letters, Char) df1 library(dplyr) df1 %>% mutate(Char1 =…
MYaseen208 • 22,666 • 37 • 165 • 309

13 votes, 3 answers

Normalize (reformat) cross-tab data for Tableau without using Excel

Tableau generally works best when input data is in "normalized" format, rather than cross-tab. This is also referred to as converting from "wide format" to "long format". That is, converting from: To: Tableau provides a "reshaping tool" for Excel…
Steve Bennett • 114,604 • 39 • 168 • 219

13 votes, 3 answers

Passing strings as arguments in dplyr verbs

I would like to be able to define arguments for dplyr verbs condition <- "dist > 50" and then use these strings in dplyr functions: require(ggplot2) ds <- cars ds1 <- ds %>% filter (eval(condition)) ds1 But it throws an error Error: filter…
andrey • 2,029 • 2 • 18 • 23

11 votes, 8 answers

Means across vectors of different lengths

I have 5 vectors of different lengths a <- c(1) #with length of 1 b <- c(4.4,3.5) #length 2 c <- c(5.6,7.8,6.0) #length 3 d <- c(0.8,6.9,8.8,5.8) #length 4 e <- c(1.8,2.5,2.3,6.5,1.1) #length is 5 I am trying to get the mean of elements across all…
Bella_18 • 624 • 1 • 14