Questions tagged [data-manipulation]

Data manipulation is the process of altering data from a less useful state to a more useful state.

Data manipulation is the process of taking data from a source or format that is hard to read or search and moving it into a format or data store that can be read and searched quickly. For example, a log's output could be split into database rows so that only the entries relevant to a situation can be pulled out, or simply reordered so that entries are easier to locate by the ordered field. Data manipulation can also make data mining easier.

More broadly, it is the process of taking raw data and parsing, filtering, extracting, organizing, combining, cleaning, or otherwise converting it into a consistent, usable form for further processing or as input to an algorithm or system.
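The log example in the description can be sketched in a few lines. This is a minimal illustration, not a prescribed method: the log lines, the "timestamp level message" layout, and the `load_log` helper are all assumptions made up for the example, and SQLite is just one convenient data store.

```python
import sqlite3

# Hypothetical log lines; the "timestamp level message" layout is an assumption.
LOG_LINES = [
    "2018-12-08T01:12:00 INFO service started",
    "2018-12-08T01:12:05 ERROR disk quota exceeded",
    "2018-12-08T01:11:58 INFO loading configuration",
    "2018-12-08T01:12:09 ERROR connection reset",
]


def load_log(lines):
    """Split each log line into (timestamp, level, message) and store it as a row."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE log (ts TEXT, level TEXT, message TEXT)")
    rows = [line.split(" ", 2) for line in lines]
    conn.executemany("INSERT INTO log VALUES (?, ?, ?)", rows)
    return conn


conn = load_log(LOG_LINES)

# Pull out only the entries that pertain to a situation (here: errors).
errors = conn.execute(
    "SELECT ts, message FROM log WHERE level = 'ERROR' ORDER BY ts"
).fetchall()

# Reordering on a field makes entries easier to locate by that field.
ordered = [row[0] for row in conn.execute("SELECT ts FROM log ORDER BY ts")]
```

Once the lines are rows, filtering and ordering become one-line queries instead of ad hoc text searches.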

3845 questions

1 answer

How can I add a column with mutate() to each of the multiple data sets I read?

I am a beginner in R and am currently learning how to wrangle multiple data sets. Right now I read 55 CSV data sets with 300 rows using the following code: Rawdata <- list.files(pattern = "*.csv") for(i in 1:length(Rawdata)){ …

r data-manipulation

asked Dec 08 '18 at 01:12

Meijuan Zeng

2 answers

Create column with dplyr based on value and also frequency of another column, in R

I will edit the post name shortly as I think up a better title, but for the time being, a short example below highlights what I am struggling with: dput(mydf) structure(list(gameID = c("34", "34", "34", "34", "34", "25", "25", "25")), class =…

r dplyr data-manipulation

asked Dec 07 '18 at 22:21

Canovice (reputation 9,012; badges 22, 93, 211)

1 answer

Extract part of URL in R

dput(mydf) structure(list(urls = c("/players/a/abdulma02.html", "/players/a/abdulta01.html", "/players/a/abdursh01.html", "/players/a/alexaco01.html", "/players/a/alexaco02.html" ), names = c("Mahmoud Abdul-Rauf", "Tariq Abdul-Wahad", "Shareef…

r data-manipulation

asked Dec 07 '18 at 18:19

Canovice (reputation 9,012; badges 22, 93, 211)

2 answers

Group by two columns and get sum?

x1 = [{'id1': 'Africa', 'id2': 'Europe', 'v': 1}, {'id1': 'Europe', 'id2': 'North America', 'v': 5}, {'id1': 'North America', 'id2': 'Asia', 'v': 2,}, {'id1': 'North America', 'id2': 'Asia', 'v': 3}] df = pd.DataFrame(x1) How…

python python-3.x pandas data-manipulation python-applymap

asked Nov 26 '18 at 08:55

Chipmunkafy
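
For the "Group by two columns and get sum?" question above, the grouping idea can be sketched in plain Python. The records are copied from the excerpt; the `sum_by_pair` helper is a hypothetical name, and since the excerpt is cut off, summing `v` per (`id1`, `id2`) pair is an assumption based on the title. With the DataFrame from the excerpt, the pandas counterpart would be along the lines of `df.groupby(['id1', 'id2'])['v'].sum()`.

```python
from collections import defaultdict

# Sample records copied from the question excerpt.
x1 = [
    {'id1': 'Africa', 'id2': 'Europe', 'v': 1},
    {'id1': 'Europe', 'id2': 'North America', 'v': 5},
    {'id1': 'North America', 'id2': 'Asia', 'v': 2},
    {'id1': 'North America', 'id2': 'Asia', 'v': 3},
]


def sum_by_pair(records):
    """Sum 'v' for every distinct (id1, id2) pair."""
    totals = defaultdict(int)
    for rec in records:
        totals[(rec['id1'], rec['id2'])] += rec['v']
    return dict(totals)


totals = sum_by_pair(x1)
```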

2 answers

SAS set value less than mean to missing

Let's say I have data that look like this: DATA temp; INPUT id a1 b2 d1 f8; DATALINES; 1 2.3 2.1 4.2 1.2 2 5.3 2.3 1.5 3.2 3 1.2 5.4 6.6 6.6 ; run; What I want to do is use the data and set statements to say that if the values in a1 and f8 are…

sas data-manipulation

asked Nov 25 '18 at 21:58

user10703531

1 answer

Applying a function only works for one column instead of multiple?

x = [{'list1':'[1,6]', 'list2':'[1,1]'}, {'list1':'[1,7]', 'list2':'[1,2]'}] df = pd.DataFrame(x) Now I'm going to transform it from string to list type: df[['list1','list2']].apply(lambda x: ast.literal_eval(x.strip())) >> ("'Series' object…

python python-3.x apply data-manipulation

asked Nov 25 '18 at 21:44

Chipmunkafy

1 answer

Is there an efficient way to display a time chart in dc.js with date range data?

I am trying to create a timechart to show the number of rooms occupied in different scenarios using dc.js. To reduce data transmission, my room occupancy data is represented by discrete start and end times. [{"room": "1", "start":"10/13/2018…

dc.js data-manipulation date-range

asked Nov 25 '18 at 19:20

Jernigan

2 answers

Assign date to all lines below until the next date

df index col1 ------------------------ 0 2017-01-01 1 a 2 b 3 c 4 2017-01-02 5 d 6 e 7 f 8 2017-01-03 9 g 10 h 11 i expected df index …

python dataframe data-manipulation

asked Nov 18 '18 at 00:52

Chipmunkafy
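
For the "Assign date to all lines below until the next date" question above, one possible reading is a forward fill: remember the most recent date row and attach it to every following non-date row. The sketch below uses plain Python rather than a DataFrame. The values come from the excerpt, but the expected output there is cut off, so dropping the date rows and returning (date, value) pairs is an assumption, and `is_date` and `assign_dates` are hypothetical helper names.

```python
from datetime import datetime

# Column values copied from the question excerpt.
col1 = ["2017-01-01", "a", "b", "c",
        "2017-01-02", "d", "e", "f",
        "2017-01-03", "g", "h", "i"]


def is_date(text, fmt="%Y-%m-%d"):
    """Return True if text parses as a date in the given format."""
    try:
        datetime.strptime(text, fmt)
        return True
    except ValueError:
        return False


def assign_dates(values):
    """Pair every non-date value with the most recent date seen above it."""
    current = None
    out = []
    for value in values:
        if is_date(value):
            current = value  # a new date starts a new block
        else:
            out.append((current, value))
    return out


rows = assign_dates(col1)
```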

1 answer

Calculate how many reports are running at a certain time

I am trying to calculate how many reports are running at a certain time. The data is like: ReportID StartTime Duration 1 2018-11-02 13:00:00 240 seconds 2 2018-11-02 14:00:00 300 seconds 3 2018-11-02 14:01:15 300 seconds …

r algorithm data-manipulation

asked Nov 08 '18 at 19:45

ProgrammerOliv
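
For the "Calculate how many reports are running at a certain time" question above, a simple approach is to turn each report into a start/end interval and count the intervals that contain the time of interest. The question is tagged R; this sketch uses Python only to illustrate the logic. The three reports are copied from the excerpt, while the `running_at` helper and the choice of a half-open interval (a report no longer counts at its exact end time) are assumptions.

```python
from datetime import datetime, timedelta

FMT = "%Y-%m-%d %H:%M:%S"

# (ReportID, StartTime, Duration in seconds), copied from the question excerpt.
reports = [
    (1, "2018-11-02 13:00:00", 240),
    (2, "2018-11-02 14:00:00", 300),
    (3, "2018-11-02 14:01:15", 300),
]


def running_at(reports, when):
    """Count reports whose [start, start + duration) interval contains `when`."""
    t = datetime.strptime(when, FMT)
    count = 0
    for _report_id, start, seconds in reports:
        begin = datetime.strptime(start, FMT)
        end = begin + timedelta(seconds=seconds)
        if begin <= t < end:
            count += 1
    return count
```

For example, `running_at(reports, "2018-11-02 14:02:00")` counts reports 2 and 3, which overlap at that moment.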

2 answers

Get sum of values from last nth row by group id

I just want to know how to get the sum of the last 5 values based on id for every row. df: id values ----------------- a 5 a 10 a 10 b 2 c 2 d 2 a 5 a 10 a 20 a 10 a …

python database pandas dataframe data-manipulation

asked Oct 29 '18 at 21:06

Mike

2 answers

R: Counting occurrences in each column and replacing that column's value with the count (SQL?)

Here is an example of the original data: ID Test1 Test2 Test3 Test4 1 0 0 NA 1.2 1 0 NA NA 3.0 1 NA NA NA 0 2 0 …

r data-manipulation sqldf

asked Oct 26 '18 at 23:09

aspratle

3 answers

How to group this dataframe in python?

I have this problem: import pandas as pd stripline = "----------------------------" rawData = { 'order number': ['11xa', '11xa', '11xa', '21xb', '31xc'], 'working area': ['LLA', 'LLE', 'LLS', 'MLA', 'MLE'], 'time': [1, 6, 13, 35,…

python pandas dataframe grouping data-manipulation

asked Oct 24 '18 at 09:43

ScienceLover

1 answer

Create columns based on bins

I have data: # dt Column1 1 2 3 4 5 6 7 8 9 I want to create a new column from the average of each bin's min and max. # dt Column1 Column2 1 2 2 2 3 …

python pandas data-manipulation

asked Oct 24 '18 at 05:08

Peter Chen (reputation 1,464; badges 3, 21, 48)

1 answer

Replace values in row above based on condition

I want to move values into the row above based on a condition, as follows: if pc_no = DELL, assign the values of pc_no and cust_id to event_rep and loc_id in the row above. After that, I want to delete the row that has "DELL". id pc_no cust_id event_id…

r replace data-manipulation data-cleaning

asked Oct 18 '18 at 14:50

kimi

2 answers

proportion data frame for each factor level based on another column

I would like to summarize a data frame by month where each column is the proportion of each factor level based on the Records column in the data frame below. I have been attempting to use dplyr but haven't quite figured it…

r dplyr data-manipulation

asked Sep 30 '18 at 00:37

alleyway
