Questions tagged [data-analysis]

Data Analysis involves extracting meaning and insights from raw data. It involves methods and algorithms that examine, clean, transform and model the data to obtain conclusions.

Typically, data analysis involves a series of steps: measuring some parameters of interest, collecting the data, cleaning it, storing it in meaningful ways, then summarizing and examining it, and testing various hypotheses about the data.

More information can be found on Wikipedia's Data Analysis page.
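The cleaning, summarizing, and hypothesis-testing steps described above can be sketched with the Python standard library alone. This is only an illustration: the measurements, the helper names (`clean`, `summarize`, `one_sample_t`), and the reference mean of 12.0 are all made up for the example and do not come from any question on this page.

```python
import statistics

# Hypothetical raw measurements; None and "n/a" stand in for missing values.
raw = ["12.5", "13.1", None, "n/a", "12.9", "13.4", "12.7"]


def clean(values):
    """Keep only the entries that can be parsed as floats."""
    cleaned = []
    for value in values:
        try:
            cleaned.append(float(value))
        except (TypeError, ValueError):
            # float(None) raises TypeError, float("n/a") raises ValueError.
            continue
    return cleaned


def summarize(values):
    """Return basic descriptive statistics for a list of floats."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "stdev": statistics.stdev(values),
    }


def one_sample_t(values, mu0):
    """t statistic for the null hypothesis that the population mean is mu0."""
    n = len(values)
    standard_error = statistics.stdev(values) / n ** 0.5
    return (statistics.mean(values) - mu0) / standard_error


data = clean(raw)
summary = summarize(data)
t_stat = one_sample_t(data, mu0=12.0)  # 12.0 is an assumed reference value
```

Turning the t statistic into a p-value needs a t-distribution, which the standard library does not provide; a statistics package would normally be used for that step.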

4642 questions
1
vote
2 answers

converting the column's data type object to float in pandas using python

I have been trying to do some analysis on the Year column in the CSV file. Since it has the object data type, I am trying to convert it to float to carry forward my analyses. Code##... import pandas as…
pestoSauce
  • 9
  • 1
  • 3
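One common way to approach the conversion asked about above is `pd.to_numeric`. The question's CSV is not shown, so the frame below is hypothetical, and the non-numeric entry is an assumption about why the column was read as object in the first place.

```python
import pandas as pd

# Hypothetical stand-in for the CSV's Year column; the real data is not shown.
df = pd.DataFrame({"Year": ["2001", "2005", "unknown", "2010"]})

# errors="coerce" turns entries that cannot be parsed into NaN instead of raising.
df["Year"] = pd.to_numeric(df["Year"], errors="coerce")

# Rows that failed to parse can then be inspected or dropped.
unparsed_rows = df[df["Year"].isna()]
```

Because NaN is a float, the resulting column has a float dtype even when every parseable value is a whole number.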
1
vote
2 answers

Use VBA to suppress Analysis Toolpak Histogram function message

Question overview: I am using Excel VBA histogram function from 'Analysis Toolpak' to generate approximately 25 histograms automatically. When Histogram graph is generated, it is placed on top of cells that have values in it, effectively hiding them…
1
vote
0 answers

'x' must be atomic for 'sort.list', using dbFD(). FD package

I am trying to run dbFD(traits, as.matrix(abun)) but I receive this error: Error in sort.list(y) : 'x' must be atomic for 'sort.list' Have you called 'sort' on a list? My data looks similar to this, but larger:
1
vote
3 answers

Subtracting values across grouped data frames in Pandas

I have a set of IDs and Timestamps, and want to calculate the "total time elapsed per ID" by getting the difference of the oldest / earliest timestamps, grouped by ID. Data id timestamp 1 2018-02-01 03:00:00 1 2018-02-01 03:01:00 2 …
tbd_
  • 1,058
  • 1
  • 16
  • 39
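A minimal sketch of the "total time elapsed per ID" calculation from the question above, assuming the timestamps can be parsed as datetimes. The two rows for id 1 come from the excerpt; the rows for id 2 are truncated in the excerpt, so the ones below are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    "id": [1, 1, 2, 2, 2],
    "timestamp": pd.to_datetime([
        "2018-02-01 03:00:00",
        "2018-02-01 03:01:00",
        # Hypothetical rows for id 2 (elided in the question excerpt).
        "2018-02-01 04:00:00",
        "2018-02-01 04:10:00",
        "2018-02-01 04:30:00",
    ]),
})

# Latest minus earliest timestamp within each id gives a Timedelta per id.
grouped = df.groupby("id")["timestamp"]
elapsed = grouped.max() - grouped.min()
```

The result is a Series indexed by id, so `elapsed.dt.total_seconds()` can be used if a plain number is preferred.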
1
vote
1 answer

How to add a new column and aggregate values in R

I am completely new to gnuplot and am only trying this because I need to learn it. I have values in three columns where the first represents the filename (date and time, one hour interval) and the remaining two columns represent two different…
sfactor
  • 12,592
  • 32
  • 102
  • 152
1
vote
1 answer

Google Cloud SQL Export without disruption

I'm trying to export from a postgres database without causing any disruption to the main server and so far I can see a few ways I might achieve this. The database isn't under huge load (10 req/s), but I don't want to cause any significant…
1
vote
2 answers

How to reduce part of a dataframe column value based on another column

I have a dataframe like this. I am trying to remove the string that is present in the substring column. Main substring Sri playnig well cricket cricket sri went out NaN Ram is in NaN Ram went to UK,US …
Pyd
  • 6,017
  • 18
  • 52
  • 109
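One possible row-wise approach to the question above, using the first three rows of the excerpt. The fourth row is truncated in the excerpt and is left out, and the helper name `strip_substring` is made up for this sketch.

```python
import pandas as pd

df = pd.DataFrame({
    "Main": ["Sri playnig well cricket", "sri went out", "Ram is in"],
    "substring": ["cricket", None, None],
})


def strip_substring(row):
    """Remove row['substring'] from row['Main']; leave Main alone if it is missing."""
    sub = row["substring"]
    if pd.isna(sub):
        return row["Main"]
    return row["Main"].replace(sub, "").strip()


# axis=1 passes each row to the helper, so both columns are available together.
df["Main"] = df.apply(strip_substring, axis=1)
```

`DataFrame.apply` with `axis=1` is simple but not fast on large frames; it is used here only because the value to remove differs per row.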
1
vote
1 answer

Why convert numbers to factors while model building

I was following a tutorial on model building using logistic regression. In the tutorial, columns having numeric data type and with levels 3, were converted into factors using as.factor function. I wanted to know the reason for this conversion.
krish___na
  • 692
  • 7
  • 14
1
vote
0 answers

Getting an error while writing dataframe into csv

I am trying to write dataframe into csv file using !cat but I'm getting some errors. Code: data.to_csv(r'C:\Users\Downloads\pydata\pydata-book-2nd-edition\examples\out.csv') !cat…
Pruthvish
  • 21
  • 3
1
vote
0 answers

How to refresh shape data file in spotfire

I am a beginner in working with geospatial data. What I have done so far: I created a map chart visualization in spotfire. I created a shape file using QGIS. I added the shapefile in the spotfire using Add Data Table -> File I added a feature layer…
Jay
  • 339
  • 1
  • 7
  • 23
1
vote
1 answer

Need help to solve the Unnamed and to change it in dataframe in pandas

How do I set my indexes from "Unnamed" to the first line of my dataframe in python? import pandas as pd df = pd.read_excel('example.xls','Day_Report',index_col=None ,skip_footer=31 ,index=False) df = df.dropna(how='all',axis=1) df =…
hzgfx
  • 13
  • 2
1
vote
1 answer

How to analyze information from the comments of users on my site?

Can anybody suggest a way to process the information and analyze the data from the comments users post on an article on my website? I want to process the comments as follows: Example: an article on computerization may get the following…
Lokesh Sah
  • 2,283
  • 5
  • 23
  • 33
1
vote
1 answer

Pandas: fix typos in keys within a dataframe

So, I have a large data frame with customer names. I used the phone number and email combined to create a unique ID key for each customer. But, sometimes there will be a typo in the email so it will create two keys for the same customer. Like…
1
vote
1 answer

compare list of data with CSV file and sort the matching

I have a data set of product names and a brands list. I need to find how many branded products there are in my list. **Brands sample :** ['HM International', 'Sara', 'Wildcraft', 'Nike'] **Product name sample :** [Attache backpack11Green…
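A rough sketch of one way to count branded products, using case-insensitive substring matching. The brands list is taken from the question; the product names are hypothetical because the sample in the excerpt is truncated, and `find_brand` is a made-up helper name.

```python
brands = ["HM International", "Sara", "Wildcraft", "Nike"]

# Hypothetical product names; the question's own sample is cut off.
products = [
    "Wildcraft Trailblazer backpack",
    "Nike running shoes",
    "Plain cotton shirt",
]


def find_brand(product_name, brand_list):
    """Return the first brand whose name appears in the product name, else None."""
    lowered = product_name.lower()
    for brand in brand_list:
        if brand.lower() in lowered:
            return brand
    return None


matches = {name: find_brand(name, brands) for name in products}
branded_count = sum(1 for brand in matches.values() if brand is not None)
```

Plain substring matching can produce false positives (for example, "Sara" would also match a product containing "Sarah"), so matching on word boundaries may be needed for real data.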
1
vote
0 answers

How to make Jupyter Notebooks Sharable to your colleagues

At my organisation we currently use a SQL query tool on top of Redshift. This provides us with the ability to save our SQL queries and create a place where anyone can search for a query name and look at it and its results. We can also give query links…
ila
  • 920
  • 12
  • 35