Questions tagged [data-science]

Use this tag for implementation questions about data science. Data science concerns extracting knowledge or insights from data in any shape or form. It can involve predictive analytics and usually requires a lot of data wrangling. General questions about data science should be posted to their respective communities.

Data science is an interdisciplinary field that uses scientific methods, processes, and systems to extract knowledge and insights from data in various forms, both structured and unstructured, similar to data mining.

Wikipedia

NOTE: If you want to use this tag for a question not directly concerning implementation, then consider posting on Cross Validated, Data Science, or Artificial Intelligence instead. Otherwise you're probably off-topic.

9099 questions
2 votes, 1 answer

How can we extract speed and acceleration features from GPS (longitude and latitude) time series data using python

I want to extract as many numerical features as I can from GPS data (longitude and latitude). I am using pandas with Python. The main features I am interested in are speed and lateral and longitudinal acceleration. A sample of the data is below…
Busy Bee
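A minimal sketch of one common way to get speed and longitudinal acceleration from such data, assuming each fix is a timestamp in seconds plus latitude and longitude in degrees. It uses the haversine great-circle distance between consecutive fixes; the function names and the sample track are illustrative assumptions, not taken from the question.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def speed_and_acceleration(fixes):
    """fixes: list of (t_seconds, lat_deg, lon_deg), sorted by strictly increasing time.

    Returns (speeds, accels):
      speeds[i] is the mean speed in m/s between fix i and fix i + 1;
      accels[i] is the change from speeds[i] to speeds[i + 1] divided by the
      time between the midpoints of those two segments, in m/s^2.
    """
    speeds = []
    for (t0, lat0, lon0), (t1, lat1, lon1) in zip(fixes, fixes[1:]):
        speeds.append(haversine_m(lat0, lon0, lat1, lon1) / (t1 - t0))
    accels = []
    for i in range(len(speeds) - 1):
        dt_mid = (fixes[i + 2][0] - fixes[i][0]) / 2
        accels.append((speeds[i + 1] - speeds[i]) / dt_mid)
    return speeds, accels


# Hypothetical fixes along the equator, 10 s apart (not the asker's data)
track = [(0, 0.0, 0.0), (10, 0.0, 0.001), (20, 0.0, 0.003)]
speeds, accels = speed_and_acceleration(track)
```

This covers only speed and along-track (longitudinal) acceleration. Lateral acceleration would additionally need the change in heading between segments, which this sketch leaves out.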
2 votes, 3 answers

Nearest Neighbor for partially unknown vector

Let's say we have a list of people and would like to find people like person X. The feature vector has 3 items [weight, height, age] and there are 3 persons in our list. Note that we don't know the height of person C. A: [70kg, 170cm, 60y] B: [60kg,…
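One possible approach, sketched under assumptions: compare two people only on the features known for both, and average the scaled squared differences so that a vector with fewer known features is not automatically "closer". Only person A's values come from the excerpt; B, C, the query X, and the per-feature scales are made up for illustration.

```python
import math


def partial_distance(u, v, scales):
    """Root-mean-square scaled difference over features known (not None) in both vectors."""
    terms = [
        ((a - b) / s) ** 2
        for a, b, s in zip(u, v, scales)
        if a is not None and b is not None
    ]
    if not terms:
        return math.inf  # nothing to compare on
    return math.sqrt(sum(terms) / len(terms))


def nearest(query, people, scales):
    """Name of the person whose vector is closest to the query."""
    return min(people, key=lambda name: partial_distance(query, people[name], scales))


# [weight kg, height cm, age y]; None marks an unknown value
people = {
    "A": [70, 170, 60],     # from the question
    "B": [60, 160, 30],     # hypothetical
    "C": [72, None, 58],    # hypothetical, height unknown
}
scales = (10, 10, 10)       # hypothetical per-feature scales
x = [71, 175, 59]           # hypothetical person X
closest = nearest(x, people, scales)
```

With these made-up numbers, C is nearest because it is compared on weight and age only. Whether that is desirable depends on the data; imputing the missing height is another option.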
2 votes, 2 answers

How to transform a key/value string into distinct rows?

I have an R dataset with key/value strings which looks like below: quest<-data.frame(city=c("Atlanta","New York","Atlanta","Tampa"),…
Aiyanna K
2 votes, 1 answer

Using Text Sentiment as feature in machine learning model?

I am researching what features I'll have for my machine learning model, with the data I have. My data contains a lot of text data, so I was wondering how to extract valuable features from it. Contrary to my previous belief, this often consists of…
2 votes, 1 answer

Looking at an H2O MOJO model, how can I figure out the datatypes of the training data it was trained on?

Looking at an H2O MOJO model, is there a way to figure out the datatypes of the training data it was trained on?
kivk02
2 votes, 4 answers

reshape multi id repeated variable readings from long to wide

This is what I have: id<-c(1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2) measure<-c("speed","weight","time","speed","weight","time","speed","weight","time", …
aadeeb
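The question is about R, and the excerpt is cut off before the desired output, so the following is only a Python sketch of one plausible wide layout: one row per (id, repetition) with one column per measure. The function name, the `rep` column, and all values are assumptions for illustration.

```python
from collections import defaultdict


def long_to_wide(rows):
    """rows: iterable of (id, measure, value) in reading order.

    Returns a list of dicts, one per (id, repetition), where the n-th reading
    of each measure for an id goes into that id's n-th row.
    """
    seen = defaultdict(int)  # (id, measure) -> number of readings seen so far
    wide = {}                # (id, repetition index) -> row dict
    for id_, measure, value in rows:
        rep = seen[(id_, measure)]
        seen[(id_, measure)] += 1
        row = wide.setdefault((id_, rep), {"id": id_, "rep": rep + 1})
        row[measure] = value
    return list(wide.values())


# Hypothetical readings; the real values are truncated in the excerpt
long_rows = [
    (1, "speed", 10), (1, "weight", 5), (1, "time", 2),
    (1, "speed", 12), (1, "weight", 6), (1, "time", 3),
    (2, "speed", 9),
]
wide_rows = long_to_wide(long_rows)
```

Counting occurrences per (id, measure) is what disambiguates the repeated readings; without such an index, a plain pivot has duplicate keys.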
2 votes, 3 answers

Python Pandas: how to convert a list of pair mappings to a row-vector format?

I have a 2-column DataFrame, column-1 corresponds to customer, column-2 corresponds to the city this customer has visited. The DataFrame looks like the following: print(df) customer visited_city 0 John London 1 Mary …
cwl
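A minimal sketch, assuming "row-vector format" means one row per customer with one column per city. `pd.crosstab` counts how often each (customer, city) pair occurs. Only the John/London row appears in the excerpt; the other rows here are made up.

```python
import pandas as pd

df = pd.DataFrame({
    # Only John/London is from the question; the remaining rows are hypothetical
    "customer": ["John", "Mary", "John", "Steve"],
    "visited_city": ["London", "Paris", "Paris", "London"],
})

# One row per customer, one column per city, values are visit counts
row_vectors = pd.crosstab(df["customer"], df["visited_city"])

# Optional: cap counts at 1 to get a 0/1 indicator per city
indicators = row_vectors.clip(upper=1)
```

If a customer can appear with the same city more than once and only presence matters, the clipped version is probably the one to use.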
2 votes, 1 answer

xgboost error: label must be in [0,1], but my label is already numeric and I need numeric results, not values in the range 0 to 1

I'm using xgboost for a regression problem, but I'm getting an error about the response variable, which is output sales. It is numeric to begin with, but xgboost still raises the error, and I want the output in numeric form only. labels <-…
jatin singh
2 votes, 1 answer

silhouette analysis on GaussianMixture

I am doing silhouette analysis using GaussianMixture. I tried to modify similar code from the scikit-learn website but I am getting a weird error: --> 82 centers = clusterer.cluster_centers_ 83 # Draw white circles at cluster centers …
2 votes, 1 answer

How to merge two CSV files having same value for two column fields using python?

The first CSV file. DATE TIME ENG-1 ENG-2 ENG-3 ENG-4 ENG-5 ENG-6 '01 10 2016' '06:35:00' 0.28596 0.29029 0.28756 0.28571 0.30868 0.14109 '01 10 2016' '06:40:00' 0.44193 0.45012 0.44324 0.44423 0.46907…
user3110784
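A standard-library sketch of joining two CSV files on the DATE and TIME columns, assuming comma-separated input with a header row and at most one row per (DATE, TIME) in the second file. The first file's values are taken from the excerpt (trimmed to two ENG columns); the second file and its TEMP column are hypothetical.

```python
import csv
import io


def merge_on_keys(left_file, right_file, keys=("DATE", "TIME")):
    """Inner-join two CSV streams on the given key columns."""
    right_rows = {}
    for row in csv.DictReader(right_file):
        right_rows[tuple(row[k] for k in keys)] = row
    merged = []
    for row in csv.DictReader(left_file):
        match = right_rows.get(tuple(row[k] for k in keys))
        if match is not None:
            merged.append({**row, **match})  # keep columns from both files
    return merged


first = io.StringIO(
    "DATE,TIME,ENG-1,ENG-2\n"
    "01 10 2016,06:35:00,0.28596,0.29029\n"
    "01 10 2016,06:40:00,0.44193,0.45012\n"
)
second = io.StringIO(  # hypothetical second file
    "DATE,TIME,TEMP\n"
    "01 10 2016,06:40:00,21.5\n"
    "01 10 2016,06:45:00,21.7\n"
)
merged = merge_on_keys(first, second)
```

With real files, the `io.StringIO` objects would be replaced by files opened with `open(path, newline="")`. In pandas, `pd.merge(a, b, on=["DATE", "TIME"])` performs the same kind of inner join.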
2 votes, 3 answers

Pandas Correlation Between List of Columns X Whole Dataframe

I'm looking for help with the Pandas .corr() method. As is, I can use the .corr() method to calculate a heatmap of every possible combination of columns: corr = data.corr() sns.heatmap(corr) Which, on my dataframe of 23,000 columns, may terminate…
julianstanley
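A possible sketch of correlating a short list of columns against the whole dataframe without building the full column-by-column matrix. It relies on `DataFrame.corrwith`, which correlates every column with a given Series; the helper name and the sample data are assumptions for illustration.

```python
import pandas as pd


def correlate_subset(data, columns):
    """Pearson correlation of each requested column against every column of `data`.

    The result has one row per column of `data` and one column per entry in `columns`.
    """
    return pd.DataFrame({col: data.corrwith(data[col]) for col in columns})


# Small hypothetical frame standing in for the wide dataframe
data = pd.DataFrame({
    "a": [1, 2, 3, 4],
    "b": [2, 4, 6, 8],
    "c": [4, 3, 2, 1],
    "d": [1, 3, 2, 4],
})
subset_corr = correlate_subset(data, ["a"])
```

For k requested columns out of n, this computes k times n correlations instead of n squared, and the result can be passed to `sns.heatmap` in the same way as the output of `.corr()`.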
2 votes, 1 answer

Create Feature Using K-Nearest Neighbors

I'm relatively new to Python and Machine Learning, but I've been working on building out a predictive model for Mortgage prices. Where I'm struggling is using the K-Nearest Neighbor algorithm to create a feature. Here's how I understand the…
2 votes, 1 answer

Beginner Tkinter in Python: Functions with Inputs

I would like to have a basic GUI with two text box inputs: one for each of the arguments in my function, convert_databases, but I'm not sure how to pass those arguments (I've seen some examples using lambda, but I wasn't able to implement them…
julianstanley
2 votes, 1 answer

How to show and run code but don't print results in Rmarkdown knitr

When I knit the following code chunk in Rmarkdown it will print out the results as well. I just want to run and show the code. In other code chunks in the same .Rmd file this knitr syntax works... ```{r import, results = "hide"} gs_ls() df <-…
Tdebeus
2 votes, 1 answer

Simple/Beginner Excel Transformation in Pandas

I have an Excel document formatted like so (Columns are datasets, Rows are cell types, values are comma-delimited gene names). I would like to reformat the sheet like so (Columns are still datasets, but Rows are now gene names, and values are…
julianstanley