Questions tagged [data-wrangling]
1242 questions
0
votes
2 answers
What is the easiest way to split a string into a first name and last name?
The dataset has 14k rows and has many titles, etc.
I am a beginner in Pandas and Python and I'd like to know how to proceed with getting the output of first name and last name from this dataset.
Dataset:
0 Pr.Doz.Dr. Klaus Semmler Facharzt…

jerof
- 73
- 1
- 8
0
votes
1 answer
Reshaping data.frame with a by-group where id variable repeats
I want to reshape/rearrange a dataset that is stored as a data.frame with 2 columns:
id (non-unique, i.e. can repeat over several rows) --> stored as character
value --> stored as numeric value (range 1:3)
Sample data:
id <-…

pjot.r
- 1
- 2
0
votes
2 answers
Formatting grouped data for tables in R
I'm trying to display my data in table format and I can't figure out how to rearrange my data to display it in the proper format. I'm used to wrangling data for plots, but I'm finding myself a little lost when it comes to preparing tables. This…

Corey
- 405
- 2
- 6
- 18
0
votes
0 answers
partial transpose - data wrangling - python
I have a dataset (code below) that looks like this:
d = pd.DataFrame({
'Year': [
2019,
2020,
2021,
2022,
2019,
2020,
2020,
2021,
2019,
2020,
2021,
…

Swapnil
- 25
- 1
- 9
0
votes
1 answer
The well-defined dimension of a tf.tensor is inexplicably `None`
The example below is extracted from the official TensorFlow tutorial on data pipelines. Basically, one resizes a bunch of JPGs to be (128, 128, 3). For some reason, when applying the map() operation, the colour dimension, namely 3, is turned into a…

Tfovid
- 761
- 2
- 8
- 24
0
votes
1 answer
pandas group_by dataframe outputs only the aggregation column when written to excel; how to get the entire output on excel?
I am trying to group and sum-aggregate a specific column in my dataframe and then write this entire output to Excel; however, when I check the Excel file after using the below code, it only contains the one aggregated column as the output and does…

Vic
- 43
- 1
- 6
0
votes
1 answer
Getting KeyError while grouping my dataset into 2 samples
I am taking an online course. 'bikesharing_data' is the name of the pandas object and 'workingday' is the name of the column in that data frame. The tutor wants to divide the dataset into two samples and divides the 'workingday' into ([0, 1])…
0
votes
1 answer
How to combine data in rows into a new column and into a new data frame
I have a data frame that has multiple entries on the same day with a TSS score.
athlete workoutday tss
1 Athlete_1 2020-03-20 30
2 Athlete_1 2020-03-20 21
3 Athlete_1 2020-03-20 64
I would like some help in knowing how to…

BigBird
- 5
- 3
0
votes
1 answer
How to find the distinct names (in column 1) whose weight (in column 2) always increased over the weeks (in column 3) in BigQuery?
I have a BigQuery result which shows the weight of each person over many weeks, and I want to find the names of the people whose weight always increased over the weeks. Below is some sample data.
name week weight
tom …

Savybossman
- 5
- 1
0
votes
1 answer
How to add variables from the same record id but with multiple names in R?
I have a question that came up while I was trying to arrange my data. I have a data frame like the one below:
ID price location
1 10.2 A
2 9.0 B
2 9.0 C
3 8.5 F
3 8.5 G
For each unique ID, all the columns are the same except for the location.…

Stella
- 65
- 4
0
votes
1 answer
how to deal with -inf values in data wrangling problems using Python pandas
While data wrangling in Python, using pandas on data from a CSV file, how do I deal with -inf values which might arise when making a column for percentage change calculations?
Suppose you have data which you loaded into Python using pandas as a DataFrame.…
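One possible way to approach this, as a minimal sketch only: the excerpt does not show the actual data, so the column name `value` and the numbers below are hypothetical. It assumes the -inf values come from `pct_change` dividing by a previous value of zero, and that treating them as missing values is acceptable.

```python
import numpy as np
import pandas as pd

# Hypothetical data; the column name "value" is an assumption for illustration.
df = pd.DataFrame({"value": [0.0, 5.0, 0.0, -2.0, -2.0]})

# pct_change divides by the previous row, so a previous value of 0
# produces +inf or -inf depending on the sign of the current value.
df["pct"] = df["value"].pct_change()

# Replace both infinities with NaN so they can be handled like any
# other missing value (dropped, filled, or left as-is).
df["pct_clean"] = df["pct"].replace([np.inf, -np.inf], np.nan)

print(df)
```

Whether to drop the resulting NaN rows or fill them depends on what the percentage change is used for afterwards.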

Pranjita Chakraborty
- 13
- 3
0
votes
2 answers
Not able to extract nested table body with pandas from webpage
I am trying to extract a nested table from the URL 'http://gsa.nic.in/report/janDhan.html' using pandas with this code:
import pandas as pd
url…

Dhirendra Sinha
- 11
- 1
0
votes
1 answer
How can I add new rows from a dataframe to another one based on key column
My df1 is something like the first table in the image below, with the key column being Name. I want to add new rows from another dataframe, df2, which has only Name, Year, and Value columns. The new rows should get added based on Name. Other columns…

BigP
- 3
- 4
0
votes
1 answer
Converting numeric column (difference between arrival and departure time) of dataframe to minutes
Dear R community members,
I would like to create a new variable (commute time) based on the difference between the departure and arrival time of commuters (Arrivaltime - Departuretime) from the origin to their destination (24 hour format).…

Xaviermoros
- 131
- 10
0
votes
1 answer
Averaging duplicates in a pandas DataFrame instead of using drop_duplicates to keep first
Assume that I have a Pandas DataFrame of the form:
id price dur
1 153 80.0 0.0
2 153 130.0 0.0
3 153 95.0 0.0
4 156 115.0 0.0
5 156 165.0 0.0
6 156 130.0 …
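A rough sketch of one way to average duplicates rather than keep the first row, assuming that "duplicate" means rows sharing the same `id`. It uses only the five complete rows visible in the excerpt above, since the rest of the table is cut off, and the name `averaged` is made up for illustration.

```python
import pandas as pd

# The five complete rows shown in the excerpt; the rest is truncated there.
df = pd.DataFrame(
    {
        "id": [153, 153, 153, 156, 156],
        "price": [80.0, 130.0, 95.0, 115.0, 165.0],
        "dur": [0.0, 0.0, 0.0, 0.0, 0.0],
    },
    index=[1, 2, 3, 4, 5],
)

# Group rows that share an id and take the mean of every other column.
# as_index=False keeps "id" as a regular column in the result.
averaged = df.groupby("id", as_index=False).mean()

print(averaged)
```

If only some columns should be averaged and others handled differently, `groupby(...).agg(...)` with a per-column mapping is the usual alternative.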

JA-pythonista
- 1,225
- 1
- 21
- 44