Questions tagged [data-manipulation]

Data manipulation is the process of altering data from a less useful state to a more useful state.

Data manipulation is the process of moving data from a source or format that is hard to read or search into a format or data store that can be read and searched quickly. For example, a log's output could be split into rows of a database to make it easier to pull out only the entries relevant to a situation, or simply reordered so that entries are easier to locate by the sorted field. Data manipulation can also make data mining easier.

More broadly, it covers taking raw data and parsing, filtering, extracting, organizing, combining, cleaning or otherwise converting it into a consistent, usable form for further processing or as input to an algorithm or system.
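As an illustration of the log example in the tag description, here is a minimal Python sketch using only the standard library. The log format (`YYYY-MM-DD HH:MM:SS LEVEL message`), the table name `log`, and the helper `load_log` are assumptions made for this example and do not come from any question under the tag.

```python
import re
import sqlite3

# Assumed (hypothetical) log format: "YYYY-MM-DD HH:MM:SS LEVEL message"
LOG_LINE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) (\w+) (.*)$")


def load_log(lines):
    """Parse raw log lines into rows of an in-memory SQLite table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE log (ts TEXT, level TEXT, message TEXT)")
    rows = []
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match:  # skip lines that do not fit the assumed format
            rows.append(match.groups())
    conn.executemany("INSERT INTO log VALUES (?, ?, ?)", rows)
    return conn


raw = [
    "2024-01-02 10:15:00 INFO service started",
    "2024-01-02 10:14:00 ERROR disk full",
    "garbled line without a timestamp",
    "2024-01-02 10:16:30 ERROR connection lost",
]

conn = load_log(raw)

# Pull out only the entries that pertain to errors, ordered by timestamp.
errors = conn.execute(
    "SELECT ts, message FROM log WHERE level = 'ERROR' ORDER BY ts"
).fetchall()
print(errors)
```

Once the lines are rows, both operations mentioned in the description (filtering to the relevant entries and reordering by a field) reduce to a single query. A pandas DataFrame would serve the same purpose for data that fits in memory.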

3845 questions

5 answers

How do you handle the fetchxml result data?

I have avoided working with fetchxml because I have been unsure of the best way to handle the result data after calling crmService.Fetch(fetchXml). In a couple of situations, I have used an XDocument with LINQ to retrieve the data from this data structure,…

asked Jul 31 '09 at 00:17

Luke Baulch

3 answers

Expand a matrix of rankings (1 ~ 4) to a bigger binary matrix

I have a matrix which I want to convert to one with binary output (0 vs 1). The matrix to be converted contains four rows of rankings (1 to 4): mat1.data <- c(4, 3, 3, 3, 3, 2, 2, 1, 1, 1, 3, 4, 2, 4, 2, 3, 1, 3, 3,…

r matrix data-manipulation data-conversion ranking

asked Jul 16 '22 at 14:03

cliu

7 answers

R: Recursive Averages

I am working with the R programming language. I have the following data: library(dplyr) my_data = data.frame(id = c(1,1,1,1,2,2,2,3,4,4,5,5,5,5,5), var_1 = sample(c(0,1), 15, replace = TRUE) , var_2 =sample(c(0,1), 15 , replace = TRUE) ) my_data =…

r dplyr data-manipulation

asked Jun 06 '22 at 02:15

stats_noob

7 answers

Python - Push forward weekend values to Monday

I have a dataframe (called df) that looks like this: I'm trying to take all weekend 'Volume' values (the ones where column 'WEEKDAY' is 5 (Saturday) or 6 (Sunday)) and add them to the subsequent Monday (WEEKDAY=0). I tried a few things but nothing…

python dataframe datetime data-manipulation weekend

asked Mar 22 '22 at 17:01

Bruno Di Franco Albuquerque

1 answer

R: Extracting Rules from a Decision Tree

I am working with the R programming language. Recently, I read about a new decision tree algorithm called "Reinforcement Learning Trees" (RLT) which supposedly has the potential to fit "better" decision trees to a dataset. The documentation for…

r tree data-manipulation prediction decision-tree

asked Nov 02 '21 at 02:46

stats_noob

1 answer

Filter nan values out of rows in pandas

I am working on a calculator to determine what to feed your fish as a fun project to learn python, pandas, and numpy. My data is organized like this: As you can see, my fishes are rows, and the different foods are columns. What I am hoping to do,…

python pandas dataframe data-manipulation

asked Feb 17 '21 at 12:33

Bigglesworth95

0 answers

Is there an R package which can modify an existing PDF file?

I need a package that can extract all of the information contained in a PDF file, so that I can later produce a new PDF file with some of that information changed. pdftools can give me a lot of the information, but I was not able to…

r file pdf data-manipulation

asked Jul 20 '20 at 01:16

waltertheves

1 answer

R dplyr - Rearrange columns by pattern of names

I've got some long-format data that 1) needs to be reshaped to wide and then 2) needs its columns re-sorted according to the pattern of their names. The example data is below: #Original data set.seed(100) long_df <- tibble(id = rep(1:5, each = 3), …

r dplyr tidyverse reshape data-manipulation

asked May 06 '20 at 17:14

Paryl

1 answer

Identify only non duplicated rows

I have a dataset with many duplicated rows, and I would like to isolate only the non-duplicated values. My df looks something like this: df <- data.frame("group" = c("A", "A", "A","A","A","B","B","B"), "id" = c("id1", "id2", "id3",…

r unique rows data-manipulation

asked Sep 27 '19 at 15:48

Alex

1 answer

Any workaround to find optimal threshold for filtering raw features based on correlation matrix in R?

I intended to extract highly correlated features by measuring their Pearson correlation, and I got a correlation matrix by doing that. However, to filter highly correlated features, I selected the correlation coefficient arbitrarily; I don't know the…

r correlation data-manipulation feature-extraction

asked Jul 20 '19 at 21:58

Jerry07

3 answers

Split a data frame into overlapping dataframes

I'm trying to write a function that behaves as follows, but it is proving very difficult: DF <- data.frame(x = seq(1,10), y = rep(c('a','b','c','d','e'),2)) > DF x y 1 1 a 2 2 b 3 3 c 4 4 d 5 5 e 6 6 a 7 7 b 8 8 c 9 9 d 10 10…

r dataframe data-manipulation data-management

asked Apr 13 '11 at 18:23

Zach

1 answer

R: create a data frame out of a rolling window

Let's say I have a data frame with the following structure: DF <- data.frame(x = 0:4, y = 5:9) > DF x y 1 0 5 2 1 6 3 2 7 4 3 8 5 4 9 What is the most efficient way to turn 'DF' into a data frame with the following structure: w x y 1 0 5 1 1 6 2 1…

r data-manipulation data-management rolling-computation

asked Apr 04 '11 at 19:32

Zach

1 answer

R - Using purrr to replace NULL elements with NA in a list of lists

I am trying to replace the NULL elements of the list below with NAs inside a map() before using rbindlist on the cleaned list: m = list(min = list(id = "min", val = NULL), max = list(id = "max", val = 7), split = list(id = "split", val =…

r list data-manipulation purrr

asked Feb 28 '19 at 23:50

user51462

1 answer

Cumulative minimum value by group

I want to calculate the cumulative minimum within a given group. My current data frame: Group <- c('A', 'A', 'A','A', 'B', 'B', 'B', 'B') Target <- c(1, 0, 5, 0, 3, 5, 1, 3) data <- data.frame(Group, Target) My desired output: Desired.Variable <- c(1,…

r data-manipulation

asked Feb 01 '19 at 16:50

Lisa

1 answer

Any workaround to construct temperature distribution over multi-layers raster in R

Here I found a very interesting blog: critical threshold in temperature effects. Its empirical approach is very interesting, so I want to implement its idea in R. However, I have multi-layer raster data of Germany's historical daily temperatures (15…

r time-series raster data-manipulation

asked Jun 20 '18 at 04:42

jyson
