Questions tagged [data-manipulation]

Data manipulation is the process of altering data from a less useful state to a more useful state.

Data manipulation is the process of moving data from a source or format that is hard to read or search into a format or storage system that can be read and searched quickly. For example, a log's output could be split into rows of a database so that only the entries relevant to a given situation can be pulled out, or simply sorted by a field so that entries are easier to locate by that field. Data manipulation can also make data mining easier.

Put another way, it is the process of taking raw data and parsing, filtering, extracting, organizing, combining, cleaning, or otherwise converting it into a consistent, usable form for further processing or as input to an algorithm or system.
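
As a rough illustration of the log example in the description, the sketch below splits a few log lines into database rows, pulls out only the entries for one situation, and reorders the rest by a field. The log format ("timestamp level message"), the sample lines, the table name `log`, and the helper names `parse_line` and `load_log` are all assumptions made up for this sketch; only Python's standard `sqlite3` module is used.

```python
import sqlite3

# Hypothetical log lines in an assumed "timestamp level message" format.
RAW_LOG = """\
2024-01-01T10:02:00 ERROR disk full
2024-01-01T10:00:00 INFO service started
2024-01-01T10:01:00 WARN high memory usage
2024-01-01T10:03:00 ERROR write failed
"""


def parse_line(line):
    """Split one log line into (timestamp, level, message)."""
    timestamp, level, message = line.split(" ", 2)
    return timestamp, level, message


def load_log(raw_log):
    """Parse every non-empty line and store it as a row in an in-memory table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE log (ts TEXT, level TEXT, message TEXT)")
    rows = [parse_line(line) for line in raw_log.splitlines() if line.strip()]
    conn.executemany("INSERT INTO log VALUES (?, ?, ?)", rows)
    return conn


conn = load_log(RAW_LOG)

# Pull out only the entries that pertain to one situation (here: errors).
errors = conn.execute(
    "SELECT ts, message FROM log WHERE level = ? ORDER BY ts", ("ERROR",)
).fetchall()

# Reorder all entries by a field (here: the timestamp).
ordered = conn.execute("SELECT ts, level FROM log ORDER BY ts").fetchall()

print(errors)
print(ordered)
```

An in-memory SQLite table keeps the sketch self-contained; the same split-then-query idea would apply to a data frame or any other searchable store.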

3845 questions
10 votes, 6 answers

insert missing category for each group in pandas dataframe

I need to insert missing category for each group, here is an example: import pandas as pd import numpy as np df = pd.DataFrame({ "group":[1,1,1 ,2,2], "cat": ['a', 'b', 'c', 'a', 'c'] , "value": range(5), …
muon
10 votes, 3 answers

Iteratively and hierarchically cycle through rows till a condition is met

I'm trying to solve a data management problem in R. Suppose my data looks as follows: id <- c("123", "414", "606") next.up <- c("414", "606", "119") is.cond.met <- as.factor(c("FALSE", "FALSE", "TRUE")) df <- data.frame(id, next.up, is.cond.met) >…
Thomas Speidel
10 votes, 2 answers

R multiple statistics for multiple columns with data.table

I want the same results as in R summarizing multiple columns with data.table but for several summary functions. Here is an example data <- as.data.table(list(x1 = runif(200), x2 = 10*runif(200), group = factor(sample(letters[1:2])))) res <- data[,…
RInatM
9 votes, 3 answers

Javascript JSON data manipulation library

I'm currently working on a project where I'm dealing with a fair amount of JSON data being transmitted backwards and forwards and stored by the browser as lists of javascript objects. For example: person: { // Primary Key key: "id", // The…
Steven de Salas
9 votes, 4 answers

How can we check if any 2 intervals of a unique ID overlaps?

I have data of patient prescription of oral DM drugs, i.e. DPP4 and SU, and would like to find out if patients had taken the drugs concurrently (i.e. whether there are overlapping intervals for DPP4 and SU within the same patient ID). Sample data: …
HNSKD
9 votes, 1 answer

dplyr : how-to programmatically full_join dataframes contained in a list of lists?

Context and data structure: I'll share with you a simplified version of my huge dataset. This simplified version fully respects the structure of my original dataset but contains fewer list elements, dataframes, variables and observations than the…
pokyah
9 votes, 3 answers

Select nth element from multidimensional JSON array with jq

How can I use jq to transform this array of arrays: [ [ "sequence", "int" ], [ "time", "string" ], ... ] Into an array that contains the first (0) element from every subarray? Meaning to produce output like this: [ …
Dreen
9 votes, 3 answers

Transpose data by groups in R

I have data in the following structure: x <- read.table(header=T, text=" X Y D S a e 1 10 a e 2 20 a f 1 50 b c 1 40 b c 2 30 b c 3 60 b d 1 10 b d 2 20") And I want to get the following result: X Y 1 2 3 a e 10 20 a f 50 b c 40 30 …
Tomas Greif
8 votes, 6 answers

R Count unique values without specific symbol

I have a dataframe 'df' that has categorical and POSIXct columns. The data look like: Category DateTime A 2022-08-29 00:00:00 A 2022-08-29 00:00:00 A 1 2022-08-29 00:00:00 A 1 2022-08-29 00:00:00 A 1 2022-08-29…
Jacob
8 votes, 3 answers

Looking for a sequential pattern with condition

I have a df as Id Event SeqNo 1 A 1 1 B 2 1 C 3 1 ABD 4 1 A 5 1 C 6 1 A 7 1 CDE 8 1 D 9 1 B 10 1 ABD 11 1 D 12 1 B 13 1 CDE …
No_body
8 votes, 2 answers

Reshape data frame from wide to panel with multiple variables and some time invariant

This is a basic problem in data analysis which Stata deals with in one step. Create a wide data frame with time invariant data (x0) and time varying data for years 2000 and 2005 (x1,x2): d1 <- data.frame(subject = c("id1", "id2"), x0 = c("male",…
Fred
8 votes, 1 answer

Using group by and tidy to run several models and extract results to dataframe

I would like to use group_by %>% do(tidy(*)) to run several linear regression models and to extract model results to the data frame. The data frame should include the following for each model: outcome variable, exposure variable, sample size, beta…
aelhak
8 votes, 2 answers

Postgres: convert single row to multiple rows (unpivot)

I have a table: Table_Name: price_list --------------------------------------------------- | id | price_type_a | price_type_b | price_type_c | --------------------------------------------------- | 1 | 1234 | 5678 | 9012 | |…
skybunk
8 votes, 5 answers

Converting all occurrence of True/False to 1/0 in a dataframe with mixed datatype

I have a dataframe that has about 100 columns. There are some Boolean columns and some chars. I want to replace all Boolean values True/False, and also -1, with 1/0. I want to apply it to the whole dataframe instead of a single column. I saw some…
muni
8 votes, 1 answer

Transpose only certain columns in data.frame

Here is the data I have: am group v1 v2 v3 v4 1 2015-10-31 A 693 803 700 17% 2 2015-10-31 B 524 859 302 77% 3 2015-10-31 C 266 675 86 7% 4 2015-10-31 D 376 455 650 65% 5 2015-11-30 A 618…
Ken