Questions tagged [data-wrangling]
1242 questions
-1 votes · 1 answer
How to transform this into a dataframe and save it as a csv?
This data was earlier provided as a .txt file. I converted it to .csv format and tried to sort it into the wanted form, but failed. I am trying to find ways to convert this data structure (as displayed below):
bakeryA
77300 Baker Street
bun:…

karan
-1 votes · 1 answer
Dealing with a dataset that has the separator character inside text in parentheses in R
I've encountered a simple issue with a dataset I'm working on right now. It contains written text like you would see on much of social media, where people sensibly use commas in their writing. The whole text is in column 1 of the dataset, followed by a…

Hayguneys
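A minimal sketch in Python (the question itself uses R), assuming the free text in column 1 is followed by a fixed number of columns that never contain the separator. Splitting each line from the right then leaves the commas inside the text alone. The sample rows and the `N_TRAILING` count are hypothetical:

```python
# Hypothetical rows: free text (with commas) in column 1, then N_TRAILING
# fixed columns. Column count and values are made up for illustration.
N_TRAILING = 2

raw_lines = [
    "I love this, honestly, so much,positive,2021-05-01",
    "meh, not great,negative,2021-05-02",
]


def split_row(line, n_trailing=N_TRAILING):
    """Split off the last n_trailing fields; everything left of them is the text."""
    # rsplit with maxsplit only cuts at the rightmost n_trailing commas,
    # so commas inside the text column are left untouched.
    return line.rsplit(",", n_trailing)


rows = [split_row(line) for line in raw_lines]
for text, label, date in rows:
    print(repr(text), label, date)
```

If the text field is instead wrapped in double quotes, a standard CSV parser (for example Python's `csv.reader`) already treats commas inside the quotes as part of the field.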
-1 votes · 1 answer
How to separate a JSON string into multiple columns in R
I have a column with strings in dictionary format, like this:
{'district': 'Ilha do Retiro', 'city': 'RECIFE', 'state': 'PE', 'country': 'BR', 'latitude': -8.062004, 'longitude': -34.908081, 'timezone': 'Etc/GMT+3', 'zipCode': '50830000',…

dantaspg
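Because the strings use single quotes, they are Python-literal syntax rather than strict JSON. A minimal sketch in Python (the question asks about R), assuming every cell is a flat dictionary; the sample cell is trimmed to keys visible in the question, and each key becomes one CSV column:

```python
import ast
import csv
import io

# Strings as they appear in the column: Python-style dicts with single quotes,
# which json.loads rejects but ast.literal_eval accepts.
cells = [
    "{'district': 'Ilha do Retiro', 'city': 'RECIFE', 'state': 'PE', "
    "'country': 'BR', 'latitude': -8.062004, 'longitude': -34.908081}",
]

records = [ast.literal_eval(cell) for cell in cells]

# One output column per key, in first-seen order.
fieldnames = []
for rec in records:
    for key in rec:
        if key not in fieldnames:
            fieldnames.append(key)

# Write the parsed records as CSV (to a string buffer here; use a file in practice).
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=fieldnames)
writer.writeheader()
writer.writerows(records)
print(buf.getvalue())
```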
-1 votes · 1 answer
Count 30 variables based on condition?
Hello guys, I am new to R. I basically have a data frame made up of 31 variables (ID, and 30 items coded 1, 2, 3).
I would like to create a new variable based on a specific condition. I want it to look like this:
(because 2 was present only in those 2…
user14335155
-1 votes · 2 answers
How to split data in a column into some separate columns in Python?
So, I have a data frame given below:
import pandas as pd
df = pd.DataFrame(
{
"id": [8233037, 8233313],
"geometry": [
"{'type': 'MultiLineString', 'coordinates': [[[107.612018, -6.921755], [107.611888, -6.92303],…

Anwar San
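A minimal pandas sketch, assuming every `geometry` cell is a Python-style dict string like the one shown (single quotes, so `json.loads` would reject it). The sample keeps only the first row's id and its first two coordinate pairs:

```python
import ast

import pandas as pd

# Sample built from the first row in the question, trimmed to two points.
df = pd.DataFrame(
    {
        "id": [8233037],
        "geometry": [
            "{'type': 'MultiLineString', 'coordinates': "
            "[[[107.612018, -6.921755], [107.611888, -6.92303]]]}"
        ],
    }
)

# Each cell is a Python-literal dict string; parse it into a real dict.
parsed = df["geometry"].map(ast.literal_eval)

# Expand the dicts into one column per key and attach them to the original rows.
expanded = pd.DataFrame(parsed.tolist(), index=df.index)
result = df.drop(columns="geometry").join(expanded)
print(result)
```

The `coordinates` column still holds nested lists; splitting it further depends on which separate columns are wanted, which the excerpt does not say.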
-1 votes · 1 answer
How to parse an object?
I have an object that prints this to console:
{
"symbol": "GOOGL",
"annualReports": [
{
"fiscalDateEnding": "2020-12-31",
"reportedCurrency": "USD",
"operatingCashflow": "65124000000",
…

thesmashten
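If the object is actually JSON text (the question does not say which language printed it), a JSON parser turns it into nested dictionaries and lists. A minimal sketch in Python, using only the fields visible in the question; if the value is already a parsed object in your language, skip the parsing step and index into it directly:

```python
import json

# Text as printed in the console, trimmed to the fields visible in the question.
raw = """
{
  "symbol": "GOOGL",
  "annualReports": [
    {
      "fiscalDateEnding": "2020-12-31",
      "reportedCurrency": "USD",
      "operatingCashflow": "65124000000"
    }
  ]
}
"""

data = json.loads(raw)  # nested dicts and lists

for report in data["annualReports"]:
    # Numbers arrive as strings, so convert explicitly before doing math.
    cashflow = int(report["operatingCashflow"])
    print(data["symbol"], report["fiscalDateEnding"], cashflow)
```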
-1 votes · 1 answer
How to rename timestamp column names to strings/objects in a MultiIndex DataFrame using Python
DataFrame: df
None | A B volumeshare volumeshare
X | 2020-10-1 2020-11-1
---------------------------------------
0 | e1 f1 12 65
1 | e1 f2 23 20
2 | e1 f3 0 …

Prasad Patil
-1 votes · 1 answer
Data wrangling into a time-series format
Here you will find the sample CSV file, and below is the Python code I was using.
Company…

DataJam
-1 votes · 2 answers
Count Total Number of NAs per Column in R
I am currently trying to count the number of NAs found in each of my dataset's columns.
I am running the following code:
function(x, df1, df2, ncp, log = FALSE)
apply(Total_HousingData, 2, function(x) {sum(is.na(x))})
Here is my output:
…

Jamie Warren
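The `apply(Total_HousingData, 2, function(x) {sum(is.na(x))})` call already counts NAs column by column (`MARGIN = 2` means columns); in R, `colSums(is.na(Total_HousingData))` is a common shorthand for the same result. The `function(x, df1, df2, ncp, log = FALSE)` line matches the signature of R's built-in `df()` (F distribution density), so it looks like printed output rather than part of the computation. The same per-column count sketched in Python, with a hypothetical table where `None` stands in for `NA`:

```python
# Hypothetical table stored column-wise; None marks a missing value (R's NA).
housing = {
    "price": [250000, None, 310000, None],
    "rooms": [3, 4, None, 2],
    "city": ["A", "B", "C", "D"],
}

# Count missing entries per column.
na_counts = {col: sum(value is None for value in values) for col, values in housing.items()}
print(na_counts)  # {'price': 2, 'rooms': 1, 'city': 0}
```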
-1 votes · 2 answers
Manipulating a dataset
I have a dataset with 3 columns and over 300,000 rows. The first two columns show the symptoms a patient is experiencing, and the last column shows the vaccine they were given. I want a dataset that counts the number of combinations of symptoms…

Faraz Ali Khan
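Counting combinations amounts to tallying each distinct (symptom, symptom, vaccine) tuple. A minimal sketch in Python with made-up rows; treating the two symptom columns as unordered is an assumption, so drop the `sorted` call if order matters:

```python
from collections import Counter

# Hypothetical rows: (symptom 1, symptom 2, vaccine). Values are made up.
rows = [
    ("headache", "fever", "VaxA"),
    ("fever", "headache", "VaxA"),
    ("nausea", "fever", "VaxB"),
    ("headache", "fever", "VaxB"),
]

# Sort the two symptoms so ("fever", "headache") and ("headache", "fever")
# count as the same combination (an assumption about what "combination" means).
combo_counts = Counter(
    (tuple(sorted((s1, s2))), vaccine) for s1, s2, vaccine in rows
)

for (symptoms, vaccine), n in combo_counts.most_common():
    print(symptoms, vaccine, n)
```

A single pass with `Counter` scales linearly, so 300,000 rows is not a problem.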
-1 votes · 3 answers
Using Power Query M code for data wrangling, how to fill in missing values from grouped rows
Using Power Query M code, how can I fill in required missing values using the most common value in a group of rows?
For example, starting with this table:
id | group | attribute 1 | attribute 2 | attribute…

J. M. Becker
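The logic, sketched in Python rather than M: tally the non-missing values in each group, then replace each gap with that group's most frequent value. The rows and column names are hypothetical:

```python
from collections import Counter, defaultdict

# Hypothetical rows: (id, group, attribute); None marks a missing value.
rows = [
    (1, "g1", "red"),
    (2, "g1", "red"),
    (3, "g1", None),
    (4, "g2", "blue"),
    (5, "g2", None),
]

# Tally non-missing values per group.
tallies = defaultdict(Counter)
for _id, group, value in rows:
    if value is not None:
        tallies[group][value] += 1


def fill(group, value):
    """Return value, or the group's most common value if it is missing."""
    if value is not None:
        return value
    common = tallies[group].most_common(1)
    return common[0][0] if common else None  # None if the group has no values


filled = [(i, g, fill(g, v)) for i, g, v in rows]
print(filled)
```

On ties, `Counter.most_common` returns values in the order they were first seen, so the first value encountered in the group wins.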
-1 votes · 1 answer
Python: If column Address1, Address2, Address3, or Address4 contains 'x', then write 'x' in column Address4
I'm new to Python and I'm not sure where to start with wrangling my dataset.
I have customer e-commerce sales data and need one of the columns to contain the county part of the address. In most cases the county is already in the Address4 column, but…

Peter Snee
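A minimal sketch in plain Python, assuming you have a list of valid county names (the `COUNTIES` list and the sample rows are hypothetical). Each row's address columns are searched, starting with Address4 since the county is usually already there, and the first match is written to Address4:

```python
# Hypothetical list of county names to look for; replace with the real list.
COUNTIES = ["Cork", "Kerry", "Dublin"]

customers = [
    {"Address1": "1 Main St", "Address2": "Killarney", "Address3": "Kerry", "Address4": ""},
    {"Address1": "5 High St", "Address2": "Dublin", "Address3": "", "Address4": ""},
    {"Address1": "2 Quay Rd", "Address2": "", "Address3": "", "Address4": "Cork"},
]

# Check Address4 first, then work backwards through the other columns.
SEARCH_ORDER = ["Address4", "Address3", "Address2", "Address1"]

for row in customers:
    for col in SEARCH_ORDER:
        # Case-insensitive substring match, as the title describes ("contains 'x'").
        match = next((c for c in COUNTIES if c.lower() in row[col].lower()), None)
        if match:
            row["Address4"] = match
            break  # first matching column wins

print([row["Address4"] for row in customers])
```

Substring matching can misfire when one place name contains another, so matching whole fields may be safer with real data.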
-1 votes · 1 answer
How to create a new variable that is the sum of a column, by group, in R?
I am trying to create a new variable in my dataframe that is the group-specific sum of a variable. For example:
df <- data.frame(group = c(1, 1, 1, 2, 2, 2),
                 variable = c(1, 2, 1, 3, 4, 5))
df
group variable
1 1 1
2…

PotterFan
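The idea is a grouped sum broadcast back to every row. A minimal sketch in Python using the same values as the R example; in R itself, `ave(df$variable, df$group, FUN = sum)` returns the same per-row vector:

```python
from collections import defaultdict

# Same data as the question's R example.
group = [1, 1, 1, 2, 2, 2]
variable = [1, 2, 1, 3, 4, 5]

# First pass: total per group.
totals = defaultdict(int)
for g, v in zip(group, variable):
    totals[g] += v

# Second pass: give every row its group's total (like a grouped mutate).
group_sum = [totals[g] for g in group]
print(group_sum)  # [4, 4, 4, 12, 12, 12]
```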
-1 votes · 3 answers
How to reconcile two different IDs as one, then apply to a df with both IDs but count the subject only once in R?
I have two different IDs for the same subject (patient).
In another vector of IDs, both IDs that identify the same patient appear. How do I count the patient only once (by ID1), instead of as two different patients with different…

Tong Claire Xu
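One approach is to map every secondary ID onto its ID1 first and then count distinct values. A minimal sketch in Python with a hypothetical `id2_to_id1` lookup and made-up IDs; IDs missing from the lookup are assumed to already be ID1 values:

```python
# Hypothetical mapping from the second ID to the patient's ID1.
id2_to_id1 = {"B-17": "A-01", "B-42": "A-02"}

# Vector that mixes both kinds of ID for the same patients (made-up values).
seen_ids = ["A-01", "B-17", "A-02", "A-03", "B-42", "A-01"]

# Translate every ID to its ID1 form (IDs with no mapping are assumed to
# already be ID1), then count distinct patients.
canonical = [id2_to_id1.get(i, i) for i in seen_ids]
n_patients = len(set(canonical))
print(n_patients)  # 3
```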
-1 votes · 1 answer
Removing upper and lower quintiles from data in R
I'm trying to remove the upper and lower quintiles from a data set. I can see there is a quartile function but not one for quintiles.
Any advice on how to do this?

Con Des
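Quintile cut points are just the 20th and 80th percentiles, so a general quantile function covers them; in R, `quantile(x, probs = c(0.2, 0.8))` returns both. A minimal sketch in Python with hypothetical data 1 to 100, where `statistics.quantiles(..., n=5)` returns the four quintile cut points:

```python
import statistics

values = list(range(1, 101))  # hypothetical data: 1..100

# n=5 returns the four cut points that split the data into quintiles.
cuts = statistics.quantiles(values, n=5)
low, high = cuts[0], cuts[-1]

# Keep only the middle three quintiles.
trimmed = [x for x in values if low <= x <= high]
print(cuts)  # [20.2, 40.4, 60.6, 80.8]
print(min(trimmed), max(trimmed), len(trimmed))  # 21 80 60
```

Quantile methods interpolate slightly differently (R's default method is not the same as Python's default `'exclusive'` method), so values right at the boundaries may be kept by one tool and dropped by another.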