Questions tagged [data-munging]

The process of taking raw data and parsing, filtering, extracting, organizing, combining, cleaning or otherwise converting it into a consistent useable form for further processing or input to an algorithm or system.

236 questions
1
vote
1 answer

Is there a way to read columns with strings as strings when using RGoogleDocs?

I use RGoogleDocs a lot. I use it to read in data that is private or only shared with a few people. I know that read.table and read.csv allow one to use stringsAsFactors=FALSE. I want to do something similar in RGoogleDocs. Here is my typical…
Farrel
  • 10,244
  • 19
  • 61
  • 99
0
votes
1 answer

PHP: How to match all occurrences of a regex pattern in a document

I am doing some data munging on documents that may or may not contain occurrences of a regular expression pattern. I would like to write a PHP function to process the documents - the job of the function…
Homunculus Reticulli
  • 65,167
  • 81
  • 216
  • 341
0
votes
2 answers

How to select only the last hour of weather data from each week in R?

I have a weather dataset with observations collected at 15-minute intervals for several weeks. I would like to extract only the last hour of weather data for each week and disregard the rest. In week 15, for example, I only want to keep rows from…
Ahsk
  • 241
  • 1
  • 7
0
votes
3 answers

pandas series mark all the rows between two values

I have a series (a single column in a df) with 3 possible values: Stable, Increase, Decrease, and I want to mark all the areas between an Increase and the subsequent Decrease. So for the…
Cranjis
  • 1,590
  • 8
  • 31
  • 64
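
One possible approach to the marking described above, as a minimal sketch. The question's own example is truncated, so the sample series `s` is hypothetical, and the choice to include the closing Decrease row in the marked area is an assumption:

```python
import pandas as pd

# Hypothetical sample data; the example in the question is cut off.
s = pd.Series(
    ["Stable", "Increase", "Stable", "Stable", "Decrease",
     "Stable", "Increase", "Decrease", "Stable"]
)

# Carry the most recent non-Stable label forward, so each row knows
# which event (Increase or Decrease) happened last.
last_event = s.where(s != "Stable").ffill()

# A Decrease closes a run when the event before it was an Increase.
# Assumption: the closing Decrease row itself counts as marked.
closes_run = s.eq("Decrease") & last_event.shift().eq("Increase")

# Marked rows: everything after an Increase up to and including its Decrease.
marked = last_event.eq("Increase") | closes_run
print(marked.tolist())
```

If the closing Decrease row should not be marked, dropping the `closes_run` term leaves only the rows from the Increase up to the row before the Decrease.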
0
votes
4 answers

pandas dataframe get rows when list values in specific columns meet certain condition

I have a dataframe: df = A B 1 [0.2,0.8] 2 [0.6,0.9]. I want to get only the rows where all the values of B are >= 0.5, so here: new_df = A B 2 [0.6, 0.9]. What is the best way to do it?
Cranjis
  • 1,590
  • 8
  • 31
  • 64
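
A minimal sketch of one way to do the filtering asked about above, assuming each cell of column B holds a plain Python list of numbers (the question does not say how the lists are stored):

```python
import pandas as pd

# The two rows from the question.
df = pd.DataFrame({"A": [1, 2], "B": [[0.2, 0.8], [0.6, 0.9]]})

# Build a boolean mask: True where every element of the list is >= 0.5.
# Assumption: cells are Python lists; an empty list would count as True.
mask = df["B"].apply(lambda values: all(v >= 0.5 for v in values))

new_df = df[mask]
print(new_df)
```

`apply` with a Python-level `all` is simple but not vectorized; for large frames it may be worth converting B to a 2-D array first if all lists have the same length.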
0
votes
1 answer

Problems running regex with sed from AppleScript

I'm trying to use sed under AppleScript to execute regex commands. The goal is to run a series of regex commands to clean up text. I've tried to use the TextEdit app as the source for the data to be passed on to the regex commands. It will not run…
jeff
  • 9
0
votes
1 answer

Pandas conditional join and calculation

I have two Pandas dataframes, df_stock_prices and df_sentiment_mean. I would like to do the following: Left join/merge these two dataframes into one dataframe, joined by Date and by ticker. In df_stock_prices, ticker is the column name, for…
billv1179
  • 323
  • 5
  • 15
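
A sketch of the join step only, since the rest of the question is truncated. All data values and the `price` and `sentiment` column names are hypothetical; the sketch assumes `df_stock_prices` has one column per ticker, as the excerpt suggests:

```python
import pandas as pd

# Hypothetical wide-format prices: one column per ticker.
df_stock_prices = pd.DataFrame(
    {"Date": ["2022-01-03", "2022-01-04"], "AAA": [10.0, 11.0], "BBB": [20.0, 21.0]}
)
# Hypothetical sentiment table keyed by Date and ticker.
df_sentiment_mean = pd.DataFrame(
    {"Date": ["2022-01-03", "2022-01-04"], "ticker": ["AAA", "BBB"], "sentiment": [0.4, -0.1]}
)

# Reshape so the ticker becomes a column value instead of a column name.
prices_long = df_stock_prices.melt(id_vars="Date", var_name="ticker", value_name="price")

# Left join keeps every price row; rows without sentiment get NaN.
merged = prices_long.merge(df_sentiment_mean, on=["Date", "ticker"], how="left")
print(merged)
```

Reshaping with `melt` first means the merge keys line up by name, so any later calculation can work on a single long table.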
0
votes
1 answer

R code that iteratively creates a "rank_order" column for every column in a given dataframe

Given a data frame such as the following, how do I get a rank-order column (e.g. an integer column ranking the values in descending order as "1,2,3") for every single column without writing out every single column? df <- data.frame( col1 =…
jaykay
  • 41
  • 1
0
votes
1 answer

Adding a (fixed) new row to the top of each dataset in a list of N datasets using apply

I have N data sets which were loaded into RStudio and stored in the list object "datasets". The problem is that what I want to be the top row, or the header, of each of them is in its third row. The initial version of this…
0
votes
1 answer

How to apply the equivalent of standard subsetting operations to a list of dataframes instead of to a single dataframe

I have a set of 40 different datasets within a file folder which I have loaded into my workspace in RStudio with: datasets <- lapply(filepaths_list, read.csv, header = FALSE) This object datasets is a list of 40 dataframes. I would like to run code…
Marlen
  • 171
  • 11
0
votes
2 answers

pandas: subtract columns of two dataframes when the indexes are not equal, aligning on another column

I have two dataframes: df1 = C0 C1 C2 4 AB 1 2 5 AC 7 8 6 AD 9 9 7 AE 2 6 8 AG 8 9 df2 = C0 C1 C2 8 AB 0 1 9 AE 6 3 10 AD 1 2 I want to apply a subtraction between these two dataframes,…
Cranjis
  • 1,590
  • 8
  • 31
  • 64
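
A minimal sketch of one reading of the subtraction described above. The question is truncated, so two things are assumptions: that rows should be matched on column C0, and that rows of df1 with no partner in df2 should be dropped:

```python
import pandas as pd

# The two frames from the question, with their unequal integer indexes.
df1 = pd.DataFrame(
    {"C0": ["AB", "AC", "AD", "AE", "AG"], "C1": [1, 7, 9, 2, 8], "C2": [2, 8, 9, 6, 9]},
    index=[4, 5, 6, 7, 8],
)
df2 = pd.DataFrame(
    {"C0": ["AB", "AE", "AD"], "C1": [0, 6, 1], "C2": [1, 3, 2]},
    index=[8, 9, 10],
)

# Use C0 as the index so arithmetic aligns on it, not on the row numbers.
left = df1.set_index("C0")
right = df2.set_index("C0")

# Assumption: keep only the keys present in both frames.
common = left.index.intersection(right.index)
diff = left.loc[common] - right.loc[common]
print(diff)
```

To keep the unmatched rows of df1 unchanged instead, `left.sub(right, fill_value=0)` is one alternative, although it also keeps any keys that appear only in df2.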
0
votes
0 answers

Pandas dataframe sort and groupby columns and add columns that are based on calculations from previous group

I have a df: df = Date id1 amount is_winner 2022-07-14 02:34:20.348 A 87.11 False 2022-07-14 02:34:20.348 B 77.12 True 2022-07-14 02:37:20.348 A 89.11 False 2022-07-14 02:37:20.348 B 87.12 True 2022-07-14…
Cranjis
  • 1,590
  • 8
  • 31
  • 64
0
votes
1 answer

Pandas add column of count of another column across all the dataframe

I have a dataframe: df = C1 C2 E 1 2 3 4 9 1 3 1 1 8 2 8 8 1 2 I want to add another column that will have the count of the value that is in the column 'E' across the whole dataframe (in the column…
Cranjis
  • 1,590
  • 8
  • 31
  • 64
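
A minimal sketch of one interpretation of the count described above. The question is truncated, so it is an assumption that the count should cover every cell of every column; `E_count` is a hypothetical name for the new column:

```python
import pandas as pd

# The five rows from the question.
df = pd.DataFrame(
    {"C1": [1, 4, 3, 8, 8], "C2": [2, 9, 1, 2, 1], "E": [3, 1, 1, 8, 2]}
)

# Count how often each value appears anywhere in the frame
# (assumption: all columns are included in the count).
counts = pd.Series(df.to_numpy().ravel()).value_counts()

# Look up the count for each value of E; values never seen would get 0.
df["E_count"] = df["E"].map(counts).fillna(0).astype(int)
print(df)
```

If the count should be restricted to particular columns, flattening `df[["C1", "C2"]]` instead of the whole frame is the only change needed.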
0
votes
0 answers

Label a variable in R from a survey

I'm trying to analyze a survey in R. When I import the data and look at the table, I can see in the heading that there is the variable name (e.g. 'gender') and, below it, R puts the question originally asked in the survey (e.g. 'What gender do…
Sofia
  • 21
  • 3
0
votes
1 answer

Pandas histogram of number of occurrences of other columns after groupby

I have a dataframe: df = Batch_ID DateTime Code A1 A2 ABC '2019-01-02 17:03:41.000' 230 2 4 ABC '2019-01-02 17:03:41.000' 230 1 5 ABC '2019-01-02 17:03:42.000' 231 1 4 …
Cranjis
  • 1,590
  • 8
  • 31
  • 64