Questions tagged [data-analysis]

Data analysis is the process of extracting meaning and insights from raw data. It uses methods and algorithms that examine, clean, transform, and model the data to reach conclusions.

Typically, data analysis involves a series of steps: measuring some parameters of interest, collecting the data, cleaning it, storing it in meaningful ways, then summarizing and examining it, and testing various hypotheses about the data.

More information can be found on Wikipedia's Data Analysis page.

4642 questions
1
vote
1 answer

How do I find the mean/median of the different values in a row?

I have a dataset in a csv file that looks like this:

teacher  student  student grade
Jon      marin    99
Jon      Rob      81
Jon      marly    90
Bon      martin   76
Bon …
JJ123
  • 573
  • 1
  • 4
  • 18
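
A minimal sketch for the question above, reading it as a request for the mean and median grade per teacher. The sample uses only the four complete rows visible in the excerpt; the column names, the comma delimiter, and the helper name `grade_stats_by_teacher` are assumptions made for illustration, not details from the asker's file.

```python
import csv
import io
from collections import defaultdict
from statistics import mean, median

# Hypothetical sample built from the four complete rows in the excerpt;
# the header names and the comma delimiter are assumptions.
SAMPLE_CSV = """teacher,student,grade
Jon,marin,99
Jon,Rob,81
Jon,marly,90
Bon,martin,76
"""


def grade_stats_by_teacher(csv_file):
    """Return {teacher: (mean, median)} computed over the grade column."""
    grades = defaultdict(list)
    for row in csv.DictReader(csv_file):
        grades[row["teacher"]].append(float(row["grade"]))
    return {teacher: (mean(vals), median(vals)) for teacher, vals in grades.items()}


stats = grade_stats_by_teacher(io.StringIO(SAMPLE_CSV))
print(stats)
```

For a real file, an open file handle can be passed in place of the `io.StringIO` wrapper.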
1
vote
1 answer

Best way to do real time data analytics

I'm currently interested in performing real-time data analytics on real-time aircraft performance data for predictive analysis. What tools and technologies could be used to implement such a system at a research level?
TheShark
  • 420
  • 3
  • 6
  • 17
1
vote
1 answer

What are the empty files after RDD.saveAsTextFile?

I'm learning Spark by working through some of the examples in Learning Spark: Lightning Fast Data Analysis and then adding my own developments in. I created this class to get a look at basic transformations and actions. /** * Find errors in a log…
runnerpaul
  • 5,942
  • 8
  • 49
  • 118
1
vote
0 answers

Can I normalise subsets of training data for a neural network?

Say I have a training set with 50 vectors. I split this set into 5 sets each with 10 vectors and then I scale the vectors in each subset and normalise the subsets. Then I train my ANN with each vector from each subset. After training is complete, I…
1
vote
4 answers

Combine data from two columns into one, except if second is already occupied in pandas

Say I have two columns in a data frame, one of which is incomplete.

df = pd.DataFrame({'a': [1, 2, 3, 4], 'b': [5, '', 6, '']})
df
Out:
   a  b
0  1  5
1  2
2  3  6
3  4

Is there a way to fill the empty values in column b with…
skailasa
  • 121
  • 8
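
One possible approach, sketched under the assumption (taken from the question title) that the empty strings in `b` should be filled from column `a`. The output column name `combined` is made up for the example.

```python
import pandas as pd

# Frame copied from the question excerpt.
df = pd.DataFrame({'a': [1, 2, 3, 4], 'b': [5, '', 6, '']})

# Keep b wherever it is not an empty string; otherwise take the value from a.
df['combined'] = df['b'].where(df['b'] != '', df['a'])

print(df)
```

`Series.where` keeps the original value where the condition holds and substitutes the aligned value from `a` elsewhere, so rows where `b` is already occupied are left alone.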
1
vote
2 answers

Number of Anomalies generated in Bucket Span [X-PACK]

Sorry for the newbie question, but I am new to Elastic products. I am learning X-Pack from tutorials by Elastic. While watching this video tutorial on investigating anomalies in a dataset using Kibana and X-Pack, I got confused (though I answered them…
Rajat Khandelwal
  • 477
  • 1
  • 5
  • 19
1
vote
2 answers

return the difference between maximum and minimum value of float items in a list

I am trying to return the difference of max and min of float items in the sequence. The output shall be an int, but the algorithm given below returns a list instead. Could someone let me know what I am missing? def flatten(*args): res = [] …
Yu Ni
  • 65
  • 4
  • 8
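
The asker's code is truncated, so the following is not a fix for it, only a sketch of one way to get a single number instead of a list. The names `flatten` and `float_spread` and the sample input are hypothetical, and the sketch assumes that only `float` leaves of nested lists or tuples should count.

```python
def flatten(items):
    """Yield the leaf values of arbitrarily nested lists and tuples."""
    for item in items:
        if isinstance(item, (list, tuple)):
            yield from flatten(item)
        else:
            yield item


def float_spread(*args):
    """Return max - min over the float leaves of args, as a single number."""
    floats = [x for x in flatten(args) if isinstance(x, float)]
    if not floats:
        raise ValueError("no float items found")
    return max(floats) - min(floats)


# Ints and strings are skipped; only 2.5, 3.0 and 7.5 are considered.
spread = float_spread(1, 2.5, [3.0, "a", (7.5, 4)])
print(spread)
```

If an int is required, as the question states, `int(spread)` truncates toward zero and `round(spread)` rounds; which one is wanted is not clear from the excerpt.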
1
vote
2 answers

How to split a (569, 31) DataFrame into two with shapes (569, 30) and (569,)

How to split a (569, 31) DataFrame into two with shapes (569, 30) and (569,)? The DataFrame has 31 columns; df.columns yields this: Index([u'mean radius', u'mean texture', u'mean perimeter', u'mean area', u'mean smoothness', u'mean…
surabhi gupta
  • 65
  • 1
  • 1
  • 9
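
A rough sketch of a positional split, assuming the column to separate out is the last one. The column list in the question is truncated, so a placeholder frame of the same shape with invented column names stands in for the real data.

```python
import numpy as np
import pandas as pd

# Placeholder frame with the same shape as in the question; the real
# column names are truncated there, so these names are invented.
df = pd.DataFrame(np.zeros((569, 31)), columns=[f"col_{i}" for i in range(31)])

# Assumption: the single column to split off is the last one.
features = df.iloc[:, :-1]  # DataFrame with the first 30 columns
target = df.iloc[:, -1]     # Series holding the last column

print(features.shape, target.shape)
```

If the column to split off is known by name instead, `df.drop(columns=[name])` and `df[name]` give the same two pieces.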
1
vote
1 answer

How do we return multiple plots in R through plumber?

This is what my code looks like:

library(plumber)
data(mtcars)
test=mtcars

#' @get /graph
#' @png
makePlot <- function(){
  par(mfrow=c(2,1))
  hist(test$mpg)
  hist(test$wt)
}

r <- plumb("plum_api.R")
…
1
vote
1 answer

How to minimize duplicate queries

Suppose I have two datasets. In QlikView, if I try to include these in a load using a query like the following:

sql select marriage_id, primary_person_id, seconary_person_id, marriage_start_date, marriage_end_date from marriage_table;
sql select…
Marisa
  • 732
  • 6
  • 22
1
vote
1 answer

barplot - Grouping x-axis labels without manipulating accompanying bars

I'm doing some basic data analysis on this dataset: https://www.kaggle.com/murderaccountability/homicide-reports I'm generating a basic barplot using the State names as the x-axis values, and the y-axis values are the percentage of nationwide…
1
vote
1 answer

How to group pandas timestamps, plot several plots in one figure, and stack them together in matplotlib?

I have a data frame with perfectly organised timestamps, like below: It's a web log, and the timestamps go through the whole year. I want to cut them into each day, show the visits within each hour, plot them into the same figure, and stack…
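
The asker's data is not shown, so this is only a sketch of the counting step, on a handful of made-up timestamps in a column assumed to be called `timestamp`. It builds a day-by-hour table of visit counts, which can then be plotted.

```python
import pandas as pd

# Hypothetical web-log timestamps; the real data is not shown in the question.
log = pd.DataFrame({"timestamp": pd.to_datetime([
    "2017-01-01 00:15:00", "2017-01-01 00:45:00", "2017-01-01 13:05:00",
    "2017-01-02 00:30:00", "2017-01-02 13:10:00", "2017-01-02 13:50:00",
])})

ts = log["timestamp"]

# Count rows per (day, hour), then pivot the hours into columns.
visits = (
    log.groupby([ts.dt.date.rename("day"), ts.dt.hour.rename("hour")])
       .size()
       .unstack("hour", fill_value=0)
)

print(visits)
```

With matplotlib installed, `visits.T.plot(subplots=True)` should draw one stacked panel per day with the hour on the x-axis.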
1
vote
3 answers

Talend : Create a Component using java code

I am a new user of Talend Open Studio. I want to find a way to add a component like tinputfile or tligrow without the drag-and-drop tools, but manually with Java code. Help please. Thank you very much.
1
vote
1 answer

Is there a way of using Pandas or Matplotlib to plot Pandas Time Series density?

I am having a hard time plotting the density of a pandas time series. I have a data frame with perfectly organised timestamps, like below: It's a web log, and I want to show the density of the timestamps, which indicates how many visitors in…
1
vote
4 answers

Splitting data from a txt file

I am new to Python. I am trying to split what I got from a txt file to select just the Aperture and the ShutterSpeed values. This is what my data looks like (30 different values of Aperture and ShutterSpeed): ========…
Mourad Over Flow
  • 191
  • 1
  • 5
  • 16