Questions tagged [data-analysis]

Data Analysis involves extracting meaning and insights from raw data. It involves methods and algorithms that examine, clean, transform and model the data to obtain conclusions.

Typically, data analysis proceeds in a series of steps: measuring the parameters of interest, collecting the data, cleaning it, storing it in a meaningful way, summarizing and examining it, and testing various hypotheses about it.
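
As a rough illustration of the cleaning, summarizing, and hypothesis-testing steps described above, here is a minimal sketch using only the Python standard library. The sample values, the helper names (`clean`, `summarize`, `t_statistic`), and the reference mean of 5.0 are all made up for the example; real analyses would more likely use a library such as pandas or SciPy.

```python
import math
import statistics

# Hypothetical raw measurements; None and "n/a" stand in for missing values.
raw = ["5.1", "4.9", None, "5.3", "n/a", "5.0", "5.2"]


def clean(values):
    """Keep only the entries that can be parsed as floats."""
    cleaned = []
    for value in values:
        try:
            cleaned.append(float(value))
        except (TypeError, ValueError):
            continue  # skip missing or malformed entries
    return cleaned


def summarize(values):
    """Return the sample size, mean, and sample standard deviation."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "stdev": statistics.stdev(values),
    }


def t_statistic(values, mu0):
    """One-sample t statistic for the hypothesis that the mean equals mu0."""
    s = summarize(values)
    return (s["mean"] - mu0) / (s["stdev"] / math.sqrt(s["n"]))


data = clean(raw)                 # cleaning
summary = summarize(data)         # summarizing
t = t_statistic(data, mu0=5.0)    # testing a hypothesis about the mean
print(summary, t)
```

The t statistic would still need to be compared against a t distribution with `n - 1` degrees of freedom to obtain a p-value; that step is left out here to keep the sketch dependency-free.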

More information can be found on Wikipedia's Data Analysis page.

4642 questions
1
vote
2 answers

How do I replace the similar looking values in a pandas dataframe?

I am new to Pandas. I have the following data types in my dataset. (The dataset is Indian Startup Funding downloaded from Kaggle.) Date datetime64[ns] StartupName object IndustryVertical object CityLocation …
user9149054
1
vote
1 answer

Python - Matplotlib plots incorrect graph when using pandas dataframe

(My first ever StackOverflow question) I'm trying to plot bitcoin's market-cap against the date using pandas and matplotlib in Python. Here is my code: %matplotlib inline import pandas as pd import numpy as np import matplotlib.pyplot as plt #read…
AjithA
  • 11
  • 1
1
vote
1 answer

How to add an extra number on top of each bar on a bar chart

According to the explanation why this question is different from this link this link get the height from the diagram as far as I understood, but in my case I do not have this column numpatients6month in the diagram at all, I just have that on the…
sariii
  • 2,020
  • 6
  • 29
  • 57
1
vote
2 answers

Aggregate a bunch of different data in a single groupby with multiple columns

I have large dataframe of data in Pandas (let's say of courses at a university) looking like: ID name credits enrolled ugrad/grad year semester 1 Math 4 62 ugrad 2016 …
charlieshades
  • 323
  • 1
  • 3
  • 16
1
vote
1 answer

plot a stacked bar chart matplotlib pandas

I want to plot this data frame but I get an error. this is my df: 6month final-formula Question Text 166047.0 1 0.007421 bathing 166049.0 1 0.006441 dressing 166214.0 1 0.001960 …
sariii
  • 2,020
  • 6
  • 29
  • 57
1
vote
1 answer

Predicting Customer Activity Absence

Could you please assist me with the following question? I have a customer activity dataframe that looks like this: It contains at least 500,000 customers and a "timeseries" of 42 months. The ones and zeroes represent customer activity. If a customer…
Andrei
  • 133
  • 1
  • 8
1
vote
1 answer

How to calculate a ratio formula based on some conditions in a pandas data frame

I have a data frame that looks like this, except that the last column is not there. I mean, I do not have the formula column, and my purpose here is to calculate that column. But how has it been calculated? The formula for the last column is: for each…
sariii
  • 2,020
  • 6
  • 29
  • 57
1
vote
1 answer

Could not convert string to float with a huge TXT file

I have a huge text file. It is the second txt file, labeled hhrr1996221.txt.zip. I am trying to analyze the data by counts vs. time; the times start at 2 ms, and then 6 data sets (Counts) are given and repeat. I have not used Python since last year,…
Fahad
  • 11
  • 2
1
vote
0 answers

MovieLens popularity recommender code with R

I'm now studying R, and I'm doing a project about a movie recommendation algorithm. I used the MovieLens 100k data with the recommenderlab library, and use these…
MS.K
  • 25
  • 6
1
vote
1 answer

How to use Python to group by and scale values?

I would like to rescale column 'w'. I have averaged 'w'. aveData_set = Data_Set.groupby(['buildingid', pd.Grouper(key='reporttime',freq='15T')])['w'].mean().reset_index() aveData_set result: Then I would like each 24H rescaling column…
Linminxiang
  • 325
  • 2
  • 14
1
vote
1 answer

Using python to create an average out of a list of times in pandas

I have a large amount of data. I need to average 'w' over each fifteen-minute interval. Right now I use a for loop to do this, but it is very slow. Does pandas have any tool that can help? I really need your help. Many thanks.
Linminxiang
  • 325
  • 2
  • 14
1
vote
1 answer

Convert a folder of PDFs into a csv of CMYK values

tl;dr: How can I convert a folder of PDFs into a list of CMYK values (or RGB or any kind of colour scale values), preferably in Python? I have a folder with around 100,000 documents in it. To make sampling these documents easier I want to run data…
The Lemon
  • 1,211
  • 15
  • 26
1
vote
0 answers

Music21 and D3.js for music feature extraction and visualization?

I am looking for suggestions on what tools could be used for the following scenarios about music feature extraction and visualization (on my Mac): identify and group notes in a score (from different voices/instruments) that sound concurrently (even…
1
vote
0 answers

Window function for unique rows in SQL Server

I have a table like the one below. The main idea is to get the amount of each channel for each orderID. If the channel repeats for an ID, it should take the amount only once and the rest should be null. The result should look like below. I want to do the same…
user123
  • 31
  • 1
  • 7
1
vote
1 answer

BigQuery Filter by Date

I want to filter my data as per the DATE and TIME format '2016-01-09 16:31:04.000 UTC' using Legacy SQL in BigQuery. Kindly help me out with the correct syntax. I'm stuck. Code SELECT * FROM [table.column] AS Alias WHERE date > '2017-03-31Z';
shakti nayan
  • 13
  • 1
  • 4