Questions tagged [data-munging]

The process of taking raw data and parsing, filtering, extracting, organizing, combining, cleaning, or otherwise converting it into a consistent, usable form for further processing or as input to an algorithm or system.

236 questions
0
votes
0 answers

Data Munging Challenge. How do I join the correct coefficients to the correct observation in a summarized table

Before I start, a basic answer to this question can be found here: Correctly binding coefficients to summarized table. This question is different in that I need to join the correct coefficients to the correct position in the…
Jordan
  • 1,415
  • 3
  • 18
  • 44
0
votes
1 answer

Correctly binding coefficients to summarized table

I have a glm model and a summarized dataset that requires I bind the coefficients, standard error and p.value from the summary of the model to the summarized dataset. For an example, I used the mtcars data set. I added columns to the final unioned…
Jordan
  • 1,415
  • 3
  • 18
  • 44
0
votes
1 answer

data cleaning - conversion to tidyverse

I am curious if the following code could be converted to tidyverse code. I've tried dplyr::mutate and haven't been able to get it to work quite right. df$Gender[df$Gender == "M"] <- "Man" df$Gender[df$Gender == "Male"] <- "Man" df$Gender[df$Gender…
AB Cross
  • 13
  • 1
0
votes
1 answer

Replacing elements in a dataframe not contained in a vector

Simple problem, but I couldn't find a solution: How to replace all elements in a dataframe not contained in a vector with a specific string? My dataframe looks like this: ID <- sample(1:8) Country <- c("USA", "RUS", "Unknown", "Not specified",…
guillem
  • 23
  • 3
0
votes
1 answer

MySQL add period to name initials if no period exists

I have a dataset like this: Juan Corona Jane L Doe John Q. Public R S Fitzgerald I need to clean this up so it's: Juan Corona Jane L. Doe John Q. Public R. S. Fitzgerald But since MySQL doesn't support regex search and replace I feel like I'm in a…
Slam
  • 3,125
  • 1
  • 15
  • 24
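For the name clean-up described in the excerpt above, here is a minimal sketch of the idea in Python rather than MySQL. It assumes that an initial is a lone capital letter followed by whitespace or the end of the string. The helper name `add_initial_periods` is hypothetical, and the four sample names are the ones quoted in the question.

```python
import re

# Assumption: an "initial" is a single capital letter that stands alone,
# i.e. it is followed by whitespace or the end of the string.
# Letters already followed by a period are left untouched.
_INITIAL = re.compile(r"\b([A-Z])(?=\s|$)")


def add_initial_periods(name: str) -> str:
    """Append a period to each bare initial in a name (hypothetical helper)."""
    return _INITIAL.sub(r"\1.", name)


names = ["Juan Corona", "Jane L Doe", "John Q. Public", "R S Fitzgerald"]
cleaned = [add_initial_periods(n) for n in names]
for before, after in zip(names, cleaned):
    print(f"{before!r} -> {after!r}")
```

The rule is deliberately narrow: any standalone capital letter is treated as an initial, so it would need adjusting for data where single letters mean something else.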
0
votes
1 answer

Getting the counts of the number of months a user has been using a particular service in SQL

I have data like below: user_id month_id service 895 201612 S 262 201612 V 5300 201612 BB Now there can be users who have used more than one service in a year, and I would like to have a query which gives me that. For…
Shuvayan Das
  • 1,198
  • 3
  • 20
  • 40
0
votes
1 answer

Regular Expressions not working with Pandas Dataframe

I have a pandas DataFrame made up of emails that I need to clean using regex. However, my attempts to clean the column aren't actually being applied to the text. Example data is below: |subeject | description …
James C
  • 23
  • 3
0
votes
1 answer

Elegantly reposition values within a dataframe

I'm working with the text layer of a PDF and have some minor corrections to make... The tidy dataframe I've generated has one or two data values that are off by a row. I have the 'coordinates' of the incorrectly positioned values (defined by a…
joga
  • 207
  • 2
  • 4
  • 10
0
votes
3 answers

split vector or data.frame into intervals by condition and print interval's first and last value

I have a data.frame that looks like this: v1 <- c(1:10) v2 <- c(FALSE, FALSE, TRUE, FALSE, FALSE, FALSE, TRUE, FALSE, FALSE, FALSE) dfb <- data.frame(v1, v2) > dfb v1 v2 1 1 FALSE 2 2 FALSE 3 3 TRUE 4 4 FALSE 5 5 FALSE 6 6 FALSE 7…
Wakan Tanka
  • 7,542
  • 16
  • 69
  • 122
0
votes
1 answer

One to Many merge row level

I have run into a data problem that I sense many have encountered. I currently have a data set that contains transaction information. Based on the Transaction_Number, I will find how long each person involved in the transaction spent on their part. The…
CodeNoob
  • 9
  • 4
0
votes
0 answers

Apply factor column values to new columns in R

Did some extensive searching but couldn't find a solution. I have a dataframe that looks like this: FAC | NUM | VAL A | 1 | 100 A | 2 | 200 B | 1 | 300 B | 2 | 200 And I want it to look like this: NUM | A | B 1 | 100 | 300 2 …
Stu Richards
  • 141
  • 1
  • 11
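The long-to-wide reshape in the excerpt above (one column per FAC value, one row per NUM) can be sketched as follows. This is an illustration in plain Python, not R, and the helper name `pivot_wide` is hypothetical. The input rows are the four shown in the question.

```python
# Input rows copied from the question's example table.
rows = [
    {"FAC": "A", "NUM": 1, "VAL": 100},
    {"FAC": "A", "NUM": 2, "VAL": 200},
    {"FAC": "B", "NUM": 1, "VAL": 300},
    {"FAC": "B", "NUM": 2, "VAL": 200},
]


def pivot_wide(rows, index, columns, values):
    """Spread `columns` into separate keys, one output row per `index` value.

    Hypothetical helper; assumes each (index, columns) pair occurs at most
    once, otherwise later rows overwrite earlier ones.
    """
    wide = {}
    for row in rows:
        # Create the output row on first sight of this index value.
        out = wide.setdefault(row[index], {index: row[index]})
        out[row[columns]] = row[values]
    return [wide[key] for key in sorted(wide)]


table = pivot_wide(rows, index="NUM", columns="FAC", values="VAL")
for line in table:
    print(line)
```

In R itself, the same reshape is usually done with a dedicated pivoting function rather than a hand-written loop; the loop here only makes the mechanics visible.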
0
votes
2 answers

Regex help for find and replace

I have a file in notepad with this text as example: *Given* I get an user ID from XXX *And* I set header "Authorization" with value "invalid_token" *When* I send a POST request to api/endpoint/"documentlibraryID"/"identity_id"/root/"new_name" *Then*…
ryoishikawa74
  • 177
  • 3
  • 11
0
votes
3 answers

Data munging with python: transforming string into rows

I'm fairly new to Python, and I need to perform some data munging. I want some advice on the best practice for this: libraries, modules, better code to implement, or just direction. So I have a text file with data organised in the following…
tyrfingnir
  • 11
  • 3
0
votes
3 answers

Retrieve a flattened array of values taken from an object array that exists within a top-level object of which I am given an array

Please feel free to modify the title; it was rather hard for me to explain and thus to search for. var booking = [ { x: "1", y: "2", days: [ { hours: 8 }, ] }, {...} ] var hoursBooked = [8, 2, 4, 8, 2, 8, 3,…
aspirant_sensei
  • 1,568
  • 1
  • 16
  • 36
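The flattening described in the excerpt above comes down to a nested iteration: over each booking, then over each entry in its `days` array. Here is a minimal sketch in Python rather than the question's JavaScript. The second record is invented for illustration, since the original elides it, and `booked_hours` is a hypothetical name.

```python
# First record mirrors the question; the second is a made-up example
# standing in for the elided `{...}`.
bookings = [
    {"x": "1", "y": "2", "days": [{"hours": 8}]},
    {"x": "3", "y": "4", "days": [{"hours": 2}, {"hours": 4}]},  # hypothetical
]


def booked_hours(bookings):
    """Return a flat list of every `hours` value across all bookings."""
    return [
        day["hours"]
        for booking in bookings
        for day in booking.get("days", [])  # tolerate bookings with no days
    ]


print(booked_hours(bookings))
```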
0
votes
0 answers

Tika messed up the structure of my document, how to fix it?

After extracting the textual content from some PDF files, I noticed that Tika misaligned my document's text. For example, my original PDF doc looks like this: Animal name: Cat Food stock: …
tumbleweed
  • 4,624
  • 12
  • 50
  • 81