Questions tagged [data-munging]

The process of taking raw data and parsing, filtering, extracting, organizing, combining, cleaning, or otherwise converting it into a consistent, usable form for further processing or as input to an algorithm or system.

236 questions
1
vote
3 answers

Pluggable/dynamic data processing/munging/transforming perl module?

Cross-posted from perlmonks: I have to clean up some gross, ancient code at $work, and before I try to make a new module I'd love to use an existing one if anyone knows of something appropriate. At runtime I am parsing a file to determine what…
Randy Stauner
  • 696
  • 5
  • 6
1
vote
1 answer

Managing duplicates that are not entered as duplicates in R

I have a data set from a state agency and am trying to clean it up. One obstacle is that there are no input standards for titles (e.g., DIR, DIRECTOR, DIR., are all allowable inputs). Another obstacle is that an individual may have several job…
Brian Holt
  • 75
  • 9
1
vote
1 answer

'Stack()' output with all Individual index's filled in Pandas DataFrame

I have the following DataFrame: import pandas as pd import numpy as np dates = pd.date_range('20130101',periods=6) df = pd.DataFrame(np.random.randn(6,4),index=dates,columns=list('ABCD')) which is: out[]:df A B C…
mari
  • 167
  • 4
  • 15
1
vote
1 answer

Mapping PHP associative arrays into PDO prepared statements

I'm doing some cleanup and transformation on data (that part is done, whew), and need to insert it into a MySQL table. Having done this kind of thing in Perl previously, I assumed that, as part of processing, it would make sense for me to structure…
1
vote
1 answer

Replace NAs in a Single Column of a Data Table in R

I'm trying to replace NAs in a single column of a data table in R with "-999" and I can't quite get it. There is a related question here on Stack Overflow, but I think this can be done without iterating through the table. I have a column, column_to_check…
Windstorm1981
  • 2,564
  • 7
  • 29
  • 57
1
vote
1 answer

Python, Pandas - Issue applying function to a column in a dataframe to replace only certain items

I have a dictionary of abbreviations of some city names that our system (for some reason) applies to data (e.g. 'Kansas City' is abbreviated 'Kansas CY', while Oklahoma City is spelled correctly). I am having an issue getting my function to apply to…
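
A minimal sketch of one way to expand such abbreviations while leaving correctly spelled names alone. Only the 'Kansas CY' pair comes from the excerpt above; the column name `city`, the helper `expand_city_names`, and the sample rows are hypothetical, and the asker's actual function is not shown in the excerpt. `Series.replace` with a dict swaps only values that match a key exactly.

```python
import pandas as pd

# Hypothetical abbreviation table; only the 'Kansas CY' pair is from the question.
ABBREVIATIONS = {"Kansas CY": "Kansas City"}


def expand_city_names(df, column, mapping):
    """Return a copy of df with exact-match abbreviations in `column` expanded."""
    out = df.copy()
    # A dict passed to Series.replace rewrites only values equal to a key;
    # every other value is left untouched.
    out[column] = out[column].replace(mapping)
    return out


# Hypothetical sample data for illustration.
cities = pd.DataFrame({"city": ["Kansas CY", "Oklahoma City", "Tulsa"]})
fixed = expand_city_names(cities, "city", ABBREVIATIONS)
```

Working on a copy keeps the raw column available for comparison, which can help when checking that only the intended rows changed.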
1
vote
2 answers

How to assign a value for a column based on another column value in R?

I have a dataframe df <- data.frame(structure(list(col1= c("A", "B", "C", "D", "A"), col2= c(1, 1, 1, 1, 5), col3 = c(2L, 1L, 1L, 1L, 1L)), .Names = c("col1", "col2", "col3"), row.names = c(NA, -5L), class =…
Chris
  • 1,248
  • 4
  • 17
  • 25
1
vote
1 answer

"Doing work" on csv DictReader fails

I am writing a script where I need to read a CSV into a DictReader, do some work on the fields (data munging), then output the DictReader to a csv via DictWriter. If I read the CSV then write the Dict, the process works. #Create the sample…
mikebmassey
  • 8,354
  • 26
  • 70
  • 95
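
The excerpt above is cut off before the failing code, so the actual cause is not known here. As a general sketch of the read, modify, write pattern it describes: a `csv.DictReader` is a one-pass iterator, so one common approach is to materialize the rows into a list, change them, and hand the list to `csv.DictWriter`. The function name `munge_csv`, the `strip_fields` transform, and the sample data are all hypothetical.

```python
import csv
import io


def munge_csv(source, dest, transform):
    """Read rows from `source`, apply `transform` to each, write them to `dest`."""
    reader = csv.DictReader(source)
    # DictReader can be iterated only once, so collect the rows up front.
    rows = [transform(dict(row)) for row in reader]
    # Reuse the input header; transform must not add keys outside it,
    # otherwise DictWriter raises ValueError.
    writer = csv.DictWriter(dest, fieldnames=reader.fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    return rows


def strip_fields(row):
    """Hypothetical cleanup step: trim whitespace around every value."""
    return {key: value.strip() for key, value in row.items()}


# In-memory stand-ins for real files, used only for illustration.
sample = io.StringIO("name,city\nann, Boston \nbob,Denver\n")
output = io.StringIO()
munge_csv(sample, output, strip_fields)
```

With real files, opening both with `newline=""` is what the `csv` module documentation recommends.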
1
vote
1 answer

server side Adobe AIR apps

This might sound like a really stupid question, but is there any way to run an Adobe AIR application in a headless server-side mode on a non-UI server (i.e. Linux)? I'm trying to build server-side bots to interact with an API (grapevinetalk.com) and…
Robbie
  • 209
  • 1
  • 3
  • 10
1
vote
1 answer

Data munging in R: Subsetting and arranging vectors of uneven length

I am sorry I could not make a more specific title. I am trying to wean myself off of spreadsheets for the more difficult tasks and this one is giving me particular trouble - I can do it in Excel but I don't really know how to begin in R. It is…
syntonicC
  • 371
  • 3
  • 17
1
vote
1 answer

Saving image urls to database table

I am trying to save image URLs to a MySQL database table. The column field is long enough. The table and database are using UTF-8 CI-general collation (IIRC). The URLs look something like…
Stick it to THE MAN
  • 5,621
  • 17
  • 77
  • 93
1
vote
2 answers

data wrangling with Flask: how to do this using SQL language? Does it make sense to use pandas?

Quite new to SQL, and working with Flask and SQLAlchemy. Here is my issue (I hope it's not too long). Overview: I have a SQL table structured like this: name vector axis value unit ref…
Nic
  • 3,365
  • 3
  • 20
  • 31
1
vote
2 answers

How do I export a df as.character in R?

How do I export a data frame completely as.character in R? I have digits that need to be treated as text in large data frames, and I'm using write.csv, but even though I imported the digits into R as characters, they are exporting as numbers (not…
1
vote
1 answer

data munging in python compared with R (from an excel sheet)

I have a hypothetical example here with file attached (Excel File Link) where I'm loading in a file from excel and formatting it into something I can work with to either analyse or store more permanently. In R I would use the following few lines to…
Tahnoon Pasha
  • 5,848
  • 14
  • 49
  • 75
1
vote
5 answers

Python program that generate a list from a given list according to a mapping

E.g. org_list: aa b2 c d; mapping: aa 1, b2 2, d 3, c 4; gen_list: 1 2 4 3. What is the Python way to implement this? Suppose org_list and the mapping are in files org_list.txt and mapping.txt, while the gen_list will be written into…
JackWM
  • 10,085
  • 22
  • 65
  • 92
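
A minimal sketch of the lookup described in the excerpt above, using its own example values. It assumes the mapping arrives as one whitespace-separated key/value pair per line, which the excerpt does not confirm; the helper names `parse_mapping` and `generate_list` are hypothetical, and reading from and writing to the files is left out because the output file name is cut off.

```python
def parse_mapping(lines):
    """Build a dict from lines shaped like 'key value' (assumed format)."""
    mapping = {}
    for line in lines:
        parts = line.split()
        if len(parts) == 2:  # skip blank or malformed lines
            key, value = parts
            mapping[key] = value
    return mapping


def generate_list(org_list, mapping):
    """Look up each item in order; raises KeyError if an item has no mapping."""
    return [mapping[item] for item in org_list]


# Values taken from the example in the question.
org_list = "aa b2 c d".split()
mapping = parse_mapping(["aa 1", "b2 2", "d 3", "c 4"])
gen_list = generate_list(org_list, mapping)
```

The values stay strings here; if missing keys should be tolerated instead of raising, `mapping.get(item)` is one alternative.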