Questions tagged [data-manipulation]

Data manipulation is the process of altering data from a less useful state to a more useful state.

Data manipulation is the process of converting data from a source or format that is hard to read or search into a format or storage solution that can be read and searched quickly. For example, log output could be split into rows of a database table, making it easier to pull out only the entries that pertain to a given situation, or simply sorted so that entries are easier to locate by the sort field. Data manipulation can make data mining easier.

The process of taking raw data and parsing, filtering, extracting, organizing, combining, cleaning or otherwise converting it into a consistent usable form for further processing or input to an algorithm or system.
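
The following is a minimal sketch of the log example from the description: log output is split into database rows, then filtered and sorted. It uses Python's standard-library sqlite3 module with an in-memory database. The log lines, their "date time LEVEL message" layout and the table name `log` are hypothetical assumptions made for illustration.

```python
import sqlite3

# Hypothetical log output; the "date time LEVEL message" layout is an assumption.
RAW_LOG = """\
2024-01-03 12:00:05 ERROR disk full on /dev/sda1
2024-01-03 11:59:58 INFO service started
2024-01-03 12:00:01 WARN slow response from db
2024-01-03 12:00:09 ERROR retry limit reached
"""


def load_log(raw: str) -> sqlite3.Connection:
    """Split each log line into (ts, level, message) and store it as one table row."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE log (ts TEXT, level TEXT, message TEXT)")
    rows = []
    for line in raw.splitlines():
        if not line.strip():
            continue  # skip blank lines
        date, time, level, message = line.split(" ", 3)
        rows.append((f"{date} {time}", level, message))
    conn.executemany("INSERT INTO log VALUES (?, ?, ?)", rows)
    return conn


conn = load_log(RAW_LOG)

# Pull out only the entries for one situation (errors), ordered by timestamp.
errors = conn.execute(
    "SELECT ts, message FROM log WHERE level = 'ERROR' ORDER BY ts"
).fetchall()
for ts, message in errors:
    print(ts, message)
```

Because the timestamps are stored in ISO-like text form, sorting them as strings also sorts them chronologically. The same table can be reordered on any other field with a different `ORDER BY`.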

3845 questions

1 answer

Pivot categorical values into Boolean columns in SQL

I'm looking to 'flatten' my dataset in order to facilitate data mining. Each categorical column should be changed to multiple Boolean columns. I have a column with categorical values, e.g.:

ID  col1
1   A
2   B
3   A

I'm looking for…

asked Aug 21 '12 at 05:47

Omri374 (2,555 reputation; 3 gold, 26 silver, 40 bronze badges)
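
The flattening this question describes can be sketched as follows, assuming SQLite (driven from Python's sqlite3 module) and a hypothetical table named `t` holding the three sample rows. One `CASE WHEN` expression per distinct value turns `col1` into 0/1 indicator columns. Other databases may offer dedicated pivot syntax; this is only one portable approach.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table "t" holding the question's sample rows.
conn.execute("CREATE TABLE t (ID INTEGER, col1 TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?)", [(1, "A"), (2, "B"), (3, "A")])

# Discover the categories present in col1.
categories = [r[0] for r in conn.execute("SELECT DISTINCT col1 FROM t ORDER BY col1")]

# One CASE expression per category yields a 0/1 column such as "col1_A".
# The category value is bound as a parameter; only the column alias is built
# from it, which assumes the categories are safe to use inside an identifier.
case_columns = ", ".join(
    f'CASE WHEN col1 = ? THEN 1 ELSE 0 END AS "col1_{c}"' for c in categories
)
cursor = conn.execute(f"SELECT ID, {case_columns} FROM t ORDER BY ID", categories)

header = [d[0] for d in cursor.description]
rows = cursor.fetchall()
print(header)
for row in rows:
    print(row)
```

Querying the distinct values first keeps the query in step with the data; if the set of categories is fixed and known, the `CASE` columns can simply be written out by hand.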

8 answers

awk command: if a line doesn't start with a given character, remove the preceding newline

I'm trying to use an awk command to implement this rule: if a line doesn't start with "O|", "A|" or "S|", I want to remove the newline at the end of the previous line. I have this input file…

string unix awk data-manipulation

asked Dec 19 '22 at 16:30

Luca L
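
The rule in this question can also be expressed outside awk. The Python function below is a swapped-in technique, not an awk solution. It appends any line that doesn't start with "O|", "A|" or "S|" to the previous line, which is the same as deleting the newline before it. The sample lines are hypothetical, since the question's input file is elided.

```python
PREFIXES = ("O|", "A|", "S|")


def join_continuations(lines):
    """Append every line that lacks a record prefix to the line before it."""
    out = []
    for line in lines:
        if out and not line.startswith(PREFIXES):
            out[-1] += line  # glue onto the previous line, i.e. drop the newline between them
        else:
            out.append(line)  # a new record (or the very first line)
    return out


# Hypothetical input; the real file from the question is not shown.
sample = ["O|1|first", "continued", "A|2|second", "S|3|third", "more", "text"]
print("\n".join(join_continuations(sample)))
```

For a real file, the input lines could come from `open(path).read().splitlines()`, where `path` is a hypothetical placeholder.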

5 answers

Conditionally Concatenating Strings in R

I have this dataset in R:

id = 1:5
col1 = c("12 ABC", "123", "AB", "123344567", "1345677.")
col2 = c("gggw", "12", "567", "abc 123", "p")
col3 = c("abw", "abi", "klo", "poy", "17df")
col4 = c("13 AB", "344", "Huh8", "98", "b")
my_data =…

r data-manipulation

asked Nov 18 '22 at 19:27

stats_noob (5,401 reputation; 4 gold, 27 silver, 83 bronze badges)

6 answers

Making Combinations of Items

Suppose I have the following lists of factors:

factor_1 = c("A1", "A2", "A3")
factor_2 = c("B1", "B2")
factor_3 = c("C1", "C2", "C3", "C4")
factor_4 = c("D1", "D2", "D3")

I made the following data frame that contains all 3 * 2 * 4 * 3 = 72…

r random integer data-manipulation

asked Apr 20 '22 at 03:27

stats_noob (5,401 reputation; 4 gold, 27 silver, 83 bronze badges)
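
Enumerating every combination of the four factors amounts to taking a Cartesian product. In R this is commonly done with `expand.grid()`. The Python sketch below uses `itertools.product` on the same four lists to confirm the 3 * 2 * 4 * 3 = 72 count.

```python
from itertools import product

factor_1 = ["A1", "A2", "A3"]
factor_2 = ["B1", "B2"]
factor_3 = ["C1", "C2", "C3", "C4"]
factor_4 = ["D1", "D2", "D3"]

# Every (A, B, C, D) tuple, with the last factor varying fastest.
combinations = list(product(factor_1, factor_2, factor_3, factor_4))
print(len(combinations))  # 72
print(combinations[0])    # ('A1', 'B1', 'C1', 'D1')
```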

3 answers

Solving Logic Puzzles Using R

I came across the following logic problem: In this problem, you are required to match the real names of basketball players to their nicknames, and sort the basketball players by their heights. Normally, this problem would require you to manually…

r list sorting data-manipulation

asked Dec 30 '21 at 06:03

stats_noob (5,401 reputation; 4 gold, 27 silver, 83 bronze badges)

3 answers

Create a variable capturing the most frequent occurrence by group

Define:

df1 <- data.frame(
  id = c(rep(1, 3), rep(2, 3)),
  v1 = as.character(c("a", "b", "b", rep("c", 3)))
)

such that

> df1
  id v1
1  1  a
2  1  b
3  1  b
4  2  c
5  2  c
6  2  c

I want to create a third variable freq that contains the most frequent observation…

r count frequency data-manipulation data-management

asked Jun 28 '11 at 21:38

Fred (1,833 reputation; 3 gold, 24 silver, 29 bronze badges)
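
Here is a minimal sketch of the per-group mode, written in plain Python over the same six rows as `df1`. The list names `ids` and `v1` mirror the R columns and are otherwise assumptions. The approach counts values within each `id`, takes the most common one, and broadcasts it back to every row as `freq`. With ties, `Counter.most_common` returns the value encountered first, which may differ from the tie rule an R solution would use.

```python
from collections import Counter

# The rows of df1 from the question, as two parallel lists.
ids = [1, 1, 1, 2, 2, 2]
v1 = ["a", "b", "b", "c", "c", "c"]

# Count the values of v1 separately for each id.
counts = {}
for i, v in zip(ids, v1):
    counts.setdefault(i, Counter())[v] += 1

# Most frequent value per id (ties go to the value seen first in that group).
mode_by_id = {i: c.most_common(1)[0][0] for i, c in counts.items()}

# Attach the group mode to every row as the new "freq" column.
freq = [mode_by_id[i] for i in ids]
print(freq)  # ['b', 'b', 'b', 'c', 'c', 'c']
```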

5 answers

Windows command for cutting columns from a text

The following content is stored in a file:

chrome.exe   512 Console  0  73,780 K
chrome.exe   800 Console  0  11,052 K
chrome.exe  1488 Console  0 …

windows command-line data-manipulation cut

asked Dec 14 '10 at 16:56

Vineel Kumar Reddy (4,588 reputation; 9 gold, 33 silver, 37 bronze badges)

1 answer

pandas merge on date column issue

I am trying to merge two dataframes on a date column (I tried the column both as type object and as datetime.date), but the merge fails to give the desired output:

import pandas as pd
df1 = pd.DataFrame({'amt': {0: 1549367.9496070854, 1: 2175801.78219801, 2:…

python pandas merge data-manipulation

asked Mar 15 '17 at 14:10

muon (12,821 reputation; 11 gold, 69 silver, 88 bronze badges)

3 answers

Cumulative sum of a division with varying denominators in R

OK, here is the problem I would love to solve with an efficient, elegant solution, such as one using data.table or dplyr. Define:

DT = data.table(group=c(rep("A",3),rep("B",5)),value=c(2,9,2,3,4,1,0,3))

   time group value
1:    1     A     2
2: …

r data.table dplyr data-manipulation

asked Sep 29 '16 at 23:28

EdM

2 answers

dplyr's filter function: how to return every value (or «cancel» the effect of filter)?

This may seem like a weird question, but is there a way to pass a value to filter() that basically does nothing?

data(cars)
library(dplyr)
cars %>% filter(speed==`magic_value_that_returns_cars?`)

And you'd get the whole data frame cars back. I'm…

r dplyr data-manipulation

asked Jul 18 '16 at 20:54

brodrigues (1,541 reputation; 2 gold, 14 silver, 19 bronze badges)

3 answers

Clean an R data frame so that no value in a column is greater than twice the next row's value

I have a data frame exemplified by the following:

dist <- c(1.1,1.0,10.0,5.0,2.1,12.2,3.3,3.4)
id <- rep("A",length(dist))
df<-cbind.data.frame(id,dist)
df
  id dist
1  A  1.1
2  A  1.0
3  A 10.0
4  A  5.0
5  A  2.1
6  A 12.2
7  A  3.3
8  A  3.4

I…

r dataframe data-manipulation data-cleaning

asked Jan 29 '15 at 17:21

Kristian

1 answer

Fast way to split string and convert to long format in data.table

I do the following:

library(data.table)
library(stringr)
dt <- data.table(string_column = paste(sample(c(letters, " "), 500000, replace = TRUE) , sample(c(letters, " "), 500000, replace = TRUE) …

r substring data.table data-manipulation

asked Mar 27 '14 at 04:20

RInatM (1,208 reputation; 1 gold, 17 silver, 39 bronze badges)

3 answers

Generating a moving sum variable in R

I suspect this is a somewhat simple question with multiple solutions, but I'm still a bit of a novice in R and an exhaustive search didn't yield answers that spoke well to what I'm wanting to do. I'm trying to create, for lack of better term,…

r data-manipulation

asked Jul 10 '13 at 14:25

steve

3 answers

Can I access an object in C++ other than using an expression?

According to C++03 3.10/1, every expression is either an lvalue or an rvalue. When I use = to assign a new value to a variable, the variable name on the left of the assignment is an lvalue expression. And it looks like whatever I try to do with a…

c++ expression language-lawyer data-manipulation

asked Dec 10 '12 at 11:51

sharptooth (167,383 reputation; 100 gold, 513 silver, 979 bronze badges)

5 answers

Efficiently center a large matrix in R

I have a large matrix that I would like to center:

X <- matrix(sample(1:10, 5e+08, replace=TRUE), ncol=10000)

Finding the means is quick and efficient with colMeans:

means <- colMeans(X)

But what's a good (fast and memory-efficient) way to…

r center data-manipulation

asked Sep 08 '12 at 16:09

Zach (29,791 reputation; 35 gold, 142 silver, 201 bronze badges)
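
A common memory-friendly approach is to compute the column means once and then subtract them in place, rather than building a second full-size matrix. The NumPy sketch below illustrates this on a much smaller, randomly generated stand-in for `X`; its size and seed are arbitrary assumptions. The matrix is converted to floating point first, because subtracting fractional means in place from an integer array would fail.

```python
import numpy as np

rng = np.random.default_rng(0)
# Small stand-in for the question's matrix (values 1..10); the shape is an arbitrary assumption.
# astype makes one float64 copy here so the later in-place update can hold fractional results.
X = rng.integers(1, 11, size=(1000, 100)).astype(np.float64)

means = X.mean(axis=0)  # column means, analogous to R's colMeans(X)
X -= means              # broadcasting subtracts each column's mean, writing into X itself

print(np.allclose(X.mean(axis=0), 0.0))  # True
```

The in-place `-=` reuses the existing buffer, so the only extra allocation during centering is the small vector of column means.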
