Questions tagged [data-manipulation]

Data manipulation is the process of altering data from a less useful state to a more useful state.

Data manipulation is the process of taking data from a source or format that is hard to read or search and converting it into a format or data storage solution that can be read and searched quickly. For example, a log's output could be split into database rows, making it easier to pull out only the entries that pertain to a situation, or simply reordered so that entries can be located by the ordered field. Data manipulation can make data mining easier.
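To make the log example concrete, here is a minimal Python sketch that splits raw log lines into structured rows, keeps only the entries relevant to one situation, and reorders them by a field. The `<timestamp> <level> <message>` line format, the sample lines, and the `parse_log`/`entries_for` names are hypothetical, chosen for illustration rather than taken from any standard log format.

```python
from typing import NamedTuple


class LogEntry(NamedTuple):
    timestamp: str
    level: str
    message: str


def parse_log(raw: str) -> list[LogEntry]:
    """Split raw log text into one structured row per non-empty line.

    Assumes (hypothetically) each line looks like: "<timestamp> <level> <message>".
    """
    entries = []
    for line in raw.splitlines():
        line = line.strip()
        if not line:
            continue
        timestamp, level, message = line.split(" ", 2)
        entries.append(LogEntry(timestamp, level, message))
    return entries


def entries_for(entries: list[LogEntry], level: str) -> list[LogEntry]:
    """Keep only the entries at one level, ordered by timestamp."""
    return sorted((e for e in entries if e.level == level), key=lambda e: e.timestamp)


raw_log = """
2024-01-02T10:00:05 ERROR disk full
2024-01-02T09:59:00 INFO started
2024-01-02T09:58:30 ERROR config missing
"""

rows = parse_log(raw_log)
errors = entries_for(rows, "ERROR")
print([e.message for e in errors])  # ['config missing', 'disk full']
```

ISO 8601 timestamps sort correctly as plain strings, which is why no date parsing is needed in this sketch.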

The process of taking raw data and parsing, filtering, extracting, organizing, combining, cleaning or otherwise converting it into a consistent usable form for further processing or input to an algorithm or system.

3845 questions
7 votes, 5 answers

How do you handle the fetchxml result data?

I have avoided working with fetchxml as I have been unsure of the best way to handle the result data after calling crmService.Fetch(fetchXml). In a couple of situations, I have used an XDocument with LINQ to retrieve the data from this data structure,…
Luke Baulch
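One common way to handle an XML result set is to flatten each result element into a dictionary mapping field name to text. The sketch below does this with Python's `xml.etree.ElementTree` rather than the LINQ to XML approach mentioned in the question. The `<resultset>`/`<result>` shape of the sample string and the `rows_from_resultset` name are assumptions for illustration, not a guaranteed description of what `crmService.Fetch` returns.

```python
import xml.etree.ElementTree as ET


def rows_from_resultset(xml_text: str) -> list[dict[str, str]]:
    """Flatten each <result> element into a {child tag: text} dictionary."""
    root = ET.fromstring(xml_text.strip())
    rows = []
    for result in root.iter("result"):
        # Each child element becomes one field; elements without text become "".
        rows.append({child.tag: (child.text or "") for child in result})
    return rows


# Hypothetical result string; the real shape depends on the fetch query.
sample = """
<resultset morerecords="0">
  <result><name>Contoso</name><accountid>{A1}</accountid></result>
  <result><name>Fabrikam</name><accountid>{B2}</accountid></result>
</resultset>
"""

for row in rows_from_resultset(sample):
    print(row["name"], row["accountid"])
```

Once each record is a plain dictionary, the rest of the code no longer depends on the XML layout.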
6 votes, 3 answers

Expand a matrix of rankings (1 ~ 4) to a bigger binary matrix

I have a matrix which I want to convert to one with binary output (0 vs 1). The matrix to be converted contains four rows of rankings (1 to 4): mat1.data <- c(4, 3, 3, 3, 3, 2, 2, 1, 1, 1, 3, 4, 2, 4, 2, 3, 1, 3, 3,…
cliu
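Assuming each ranking value k (1 to 4) should become an indicator that is 1 where the original entry equals k and 0 elsewhere (one-hot encoding), the expansion can be sketched in plain Python. The layout chosen here, where each input row is replaced by four binary rows, is an assumption, since the question's desired output is truncated; the small matrix is illustrative only.

```python
def expand_rankings(matrix: list[list[int]], n_ranks: int = 4) -> list[list[int]]:
    """Replace each row of rankings with n_ranks binary indicator rows.

    Output row (i * n_ranks + k - 1) holds 1 where row i of the input equals k.
    """
    expanded = []
    for row in matrix:
        for k in range(1, n_ranks + 1):
            expanded.append([1 if value == k else 0 for value in row])
    return expanded


small = [[4, 3, 1],
         [1, 2, 2]]
for line in expand_rankings(small):
    print(line)
```

Transposing the result (or indexing by column instead of row) gives the alternative layout with four binary columns per original column.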
6 votes, 7 answers

R: Recursive Averages

I am working with the R programming language. I have the following data: library(dplyr) my_data = data.frame(id = c(1,1,1,1,2,2,2,3,4,4,5,5,5,5,5), var_1 = sample(c(0,1), 15, replace = TRUE) , var_2 =sample(c(0,1), 15 , replace = TRUE) ) my_data =…
stats_noob
6 votes, 7 answers

Python - Push forward weekend values to Monday

I have a dataframe (called df) that looks like this: I'm trying to take all weekend 'Volume' values (the ones where column 'WEEKDAY'=5 (Saturday) or 6 (Sunday)) and add their sum to the following Monday (WEEKDAY=0). I tried a few things but nothing…
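The logic can be sketched in plain Python without pandas: walk the rows in date order, hold back Saturday (5) and Sunday (6) volumes, and add the held amount to the next Monday (0). Python's `date.weekday()` uses the same numbering as the question's WEEKDAY column. This assumes rows are sorted by date and stored as `(date, volume)` pairs, and it drops the weekend rows from the result, which is only one possible reading of the question.

```python
from datetime import date


def push_weekend_to_monday(rows: list[tuple[date, float]]) -> list[tuple[date, float]]:
    """Add Saturday/Sunday volume to the following Monday and drop weekend rows.

    Assumes rows are sorted by date. Carried volume waits for the next Monday
    present in the data; weekend volume with no later Monday is discarded.
    """
    result = []
    carried = 0.0
    for day, volume in rows:
        weekday = day.weekday()  # Monday == 0 ... Sunday == 6
        if weekday in (5, 6):
            carried += volume
            continue
        if weekday == 0:
            volume += carried
            carried = 0.0
        result.append((day, volume))
    return result


rows = [
    (date(2024, 1, 5), 10.0),  # Friday
    (date(2024, 1, 6), 3.0),   # Saturday
    (date(2024, 1, 7), 2.0),   # Sunday
    (date(2024, 1, 8), 7.0),   # Monday
]
print(push_weekend_to_monday(rows))
```

In pandas the same idea is usually expressed by mapping each weekend date to its following Monday and then grouping by the adjusted date.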
6 votes, 1 answer

R: Extracting Rules from a Decision Tree

I am working with the R programming language. Recently, I read about a new decision tree algorithm called "Reinforcement Learning Trees" (RLT) which supposedly has the potential to fit "better" decision trees to a dataset. The documentation for…
stats_noob
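Independently of the RLT package, rules can be read off any fitted binary tree by walking each root-to-leaf path and joining the split conditions. The sketch below uses a hypothetical nested-dictionary tree (`feature`, `threshold`, `left`, `right`, `prediction` keys); it is not the structure RLT returns, and the convention that the left branch means `feature <= threshold` is an assumption.

```python
def extract_rules(node: dict, conditions: tuple[str, ...] = ()) -> list[str]:
    """Return one 'IF ... THEN ...' rule per leaf of a binary decision tree."""
    if "prediction" in node:
        premise = " AND ".join(conditions) if conditions else "TRUE"
        return [f"IF {premise} THEN {node['prediction']}"]
    feature, threshold = node["feature"], node["threshold"]
    # Assumed convention: the left branch holds rows with feature <= threshold.
    return (extract_rules(node["left"], conditions + (f"{feature} <= {threshold}",))
            + extract_rules(node["right"], conditions + (f"{feature} > {threshold}",)))


tree = {
    "feature": "x1", "threshold": 2.5,
    "left": {"prediction": "A"},
    "right": {
        "feature": "x2", "threshold": 0.7,
        "left": {"prediction": "B"},
        "right": {"prediction": "A"},
    },
}
for rule in extract_rules(tree):
    print(rule)
```

Adapting this to a real package means writing a small converter from its tree object to this kind of nested structure.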
6 votes, 1 answer

Filter nan values out of rows in pandas

I am working on a calculator to determine what to feed your fish as a fun project to learn python, pandas, and numpy. My data is organized like this: As you can see, my fishes are rows, and the different foods are columns. What I am hoping to do,…
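Assuming the goal is, for each fish (row), to keep only the foods (columns) that actually have a value, the filtering step can be sketched in plain Python; in pandas the per-row equivalent is usually `row.dropna()`. The fish and food names and amounts below are hypothetical.

```python
import math


def foods_per_fish(table: dict[str, dict[str, float]]) -> dict[str, dict[str, float]]:
    """Drop NaN entries from each row, keeping only the foods a fish has values for."""
    return {
        fish: {food: amount for food, amount in foods.items() if not math.isnan(amount)}
        for fish, foods in table.items()
    }


nan = float("nan")
table = {
    "goldfish": {"flakes": 1.5, "pellets": nan, "bloodworms": nan},
    "betta": {"flakes": nan, "pellets": 0.5, "bloodworms": 2.0},
}
print(foods_per_fish(table))
# {'goldfish': {'flakes': 1.5}, 'betta': {'pellets': 0.5, 'bloodworms': 2.0}}
```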
6 votes, 0 answers

Is there an R package which can modify an existing PDF file?

I need a package that can convert or extract all of the information contained in a PDF file, so that I can later produce a new PDF file with some of that information changed. pdftools can give me a lot of the information, but I was not able to…
6 votes, 1 answer

R dplyr - Rearrange columns by pattern of names

I've got some long format data that 1) needs to be reshaped to wide and then 2) needs the columns re-sorted according to the pattern of their names. The example data is below: #Original data set.seed(100) long_df <- tibble(id = rep(1:5, each = 3), …
Paryl
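The two steps, pivoting long rows to wide and then ordering columns by a pattern in their names, can be sketched in plain Python. The `(id, column name, value)` record layout, the `<measure>_<time>` naming pattern, and the choice to sort by the time suffix first are all hypothetical, since the question's example data is truncated.

```python
from collections import defaultdict


def long_to_wide(records: list[tuple[int, str, float]]) -> dict[int, dict[str, float]]:
    """Pivot (id, column name, value) records into one dict of columns per id."""
    wide: dict[int, dict[str, float]] = defaultdict(dict)
    for row_id, column, value in records:
        wide[row_id][column] = value
    return dict(wide)


def sort_columns(columns: list[str]) -> list[str]:
    """Order '<measure>_<time>' names by time suffix, then by measure."""
    def key(name: str) -> tuple[int, str]:
        measure, time = name.rsplit("_", 1)
        return int(time), measure
    return sorted(columns, key=key)


records = [
    (1, "score_2", 0.4), (1, "count_1", 3.0), (1, "score_1", 0.9), (1, "count_2", 5.0),
    (2, "score_1", 0.1), (2, "count_1", 1.0), (2, "score_2", 0.8), (2, "count_2", 2.0),
]
wide = long_to_wide(records)
all_columns = {col for cols in wide.values() for col in cols}
print(sort_columns(list(all_columns)))  # ['count_1', 'score_1', 'count_2', 'score_2']
```

Swapping the order of the tuple returned by `key` groups the columns by measure instead of by time.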
6 votes, 1 answer

Identify only non duplicated rows

I have a dataset with many duplicated rows, and I would like to isolate only the non-duplicated values. My df looks something like this: df <- data.frame("group" = c("A", "A", "A","A","A","B","B","B"), "id" = c("id1", "id2", "id3",…
Alex
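Assuming "non duplicated" means rows that occur exactly once (so every copy of a duplicated row is dropped, not just the repeats), the task comes down to counting occurrences. In R this is often written as `df[!(duplicated(df) | duplicated(df, fromLast = TRUE)), ]`. A plain-Python sketch with rows as tuples follows; the sample rows are hypothetical.

```python
from collections import Counter


def unique_rows(rows: list[tuple]) -> list[tuple]:
    """Return the rows that appear exactly once, preserving their original order."""
    counts = Counter(rows)
    return [row for row in rows if counts[row] == 1]


rows = [("A", "id1"), ("A", "id2"), ("A", "id1"), ("B", "id3"), ("B", "id2")]
print(unique_rows(rows))  # [('A', 'id2'), ('B', 'id3'), ('B', 'id2')]
```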
6 votes, 1 answer

Any workaround to find optimal threshold for filtering raw features based on correlation matrix in R?

I intended to extract highly correlated features by measuring their Pearson correlation, and I got a correlation matrix by doing that. However, to filter highly correlated features, I selected the correlation coefficient arbitrarily; I don't know the…
Jerry07
6 votes, 3 answers

Split a data frame into overlapping dataframes

I'm trying to write a function that behaves as follows, but it is proving very difficult: DF <- data.frame(x = seq(1,10), y = rep(c('a','b','c','d','e'),2)) > DF x y 1 1 a 2 2 b 3 3 c 4 4 d 5 5 e 6 6 a 7 7 b 8 8 c 9 9 d 10 10…
Zach
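Assuming "overlapping" means consecutive windows of a fixed number of rows that advance by a smaller step, the split can be sketched generically in Python. The `size` and `step` parameters and the `overlapping_chunks` name are hypothetical, since the desired output in the question is truncated; the sample rows mirror the question's `DF` (x = 1 to 10, y = a to e repeated).

```python
def overlapping_chunks(rows: list, size: int, step: int) -> list[list]:
    """Split rows into windows of `size` rows, starting a new window every `step` rows.

    Windows overlap when step < size; a trailing partial window is dropped.
    """
    if size <= 0 or step <= 0:
        raise ValueError("size and step must be positive")
    return [rows[start:start + size] for start in range(0, len(rows) - size + 1, step)]


rows = list(zip(range(1, 11), "abcdeabcde"))
for chunk in overlapping_chunks(rows, size=4, step=2):
    print(chunk)
```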
6 votes, 1 answer

R: create a data frame out of a rolling window

Let's say I have a data frame with the following structure: DF <- data.frame(x = 0:4, y = 5:9) > DF x y 1 0 5 2 1 6 3 2 7 4 3 8 5 4 9 What is the most efficient way to turn 'DF' into a data frame with the following structure: w x y 1 0 5 1 1 6 2 1…
Zach
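Reading the expected output as windows of two consecutive rows, each labelled with a window number `w`, the stacking can be sketched in plain Python. The window width of 2 is inferred from the truncated example (`1 0 5`, `1 1 6`, `2 1 …`) and is an assumption; `stack_windows` is a hypothetical name.

```python
def stack_windows(rows: list[tuple[int, int]], width: int = 2) -> list[tuple[int, int, int]]:
    """Return (w, x, y) rows: window w (1-based) holds 0-based rows w-1 .. w+width-2."""
    stacked = []
    for w, start in enumerate(range(len(rows) - width + 1), start=1):
        for x, y in rows[start:start + width]:
            stacked.append((w, x, y))
    return stacked


df_rows = list(zip(range(0, 5), range(5, 10)))  # x = 0:4, y = 5:9 as in the question
for row in stack_windows(df_rows):
    print(row)
```

With n rows and width 2 the result has 2 * (n - 1) rows, so its size grows linearly with the input.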
6 votes, 1 answer

R - Using purrr to replace NULL elements with NA in a list of lists

I am trying to replace the NULL elements of the list below with NAs inside a map() before using rbindlist on the cleaned list: m = list(min = list(id = "min", val = NULL), max = list(id = "max", val = 7), split = list(id = "split", val =…
user51462
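The same clean-up in Python terms: replace `None` (playing the role of R's `NULL`) with `math.nan` (playing the role of `NA`) in every inner record before stacking the records into rows. This is an analogy, not the purrr `map()` solution the question asks about; only the fully visible `min` and `max` entries of the question's `m` list are used, and `null_to_na` is a hypothetical name.

```python
import math


def null_to_na(records: dict[str, dict]) -> dict[str, dict]:
    """Return a copy of the nested records with every None value replaced by NaN."""
    return {
        name: {key: (math.nan if value is None else value) for key, value in fields.items()}
        for name, fields in records.items()
    }


m = {
    "min": {"id": "min", "val": None},
    "max": {"id": "max", "val": 7},
}
cleaned = null_to_na(m)
rows = [(fields["id"], fields["val"]) for fields in cleaned.values()]
print(rows)  # [('min', nan), ('max', 7)]
```

Replacing the missing values before stacking matters because a `NULL`/`None` field would otherwise be dropped or break the column types when the records are bound into a table.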
6 votes, 1 answer

Cumulative minimum value by group

I want to calculate the cumulative minimum within a given group. My current data frame: Group <- c('A', 'A', 'A','A', 'B', 'B', 'B', 'B') Target <- c(1, 0, 5, 0, 3, 5, 1, 3) data <- data.frame(Group, Target) My desired output: Desired.Variable <- c(1,…
Lisa
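A grouped cumulative minimum only needs a running minimum per group that starts fresh whenever a new group is seen. In R this is commonly written as `ave(data$Target, data$Group, FUN = cummin)`. The plain-Python sketch below uses the question's `Group` and `Target` values; `cumulative_min_by_group` is a hypothetical name.

```python
def cumulative_min_by_group(groups: list[str], values: list[float]) -> list[float]:
    """Running minimum of `values`, computed separately within each group."""
    running: dict[str, float] = {}
    result = []
    for group, value in zip(groups, values):
        current = min(running.get(group, value), value)
        running[group] = current
        result.append(current)
    return result


group = ["A", "A", "A", "A", "B", "B", "B", "B"]
target = [1, 0, 5, 0, 3, 5, 1, 3]
print(cumulative_min_by_group(group, target))  # [1, 0, 0, 0, 3, 3, 1, 1]
```

Because the running minimum is stored per group, this also works when the rows of different groups are interleaved rather than sorted.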
6 votes, 1 answer

Any workaround to construct temperature distribution over multi-layers raster in R

Here I found a very interesting blog: critical threshold in temperature effects and empirical approach. It is very interesting, so I want to implement its idea in R. However, I have multi-layer raster data of Germany's historical daily temperatures (15…
jyson