Questions tagged [data-science]

Implementation questions about data science. Data science concerns extracting knowledge or insights from data, in whatever shape or form. It can involve predictive analytics and usually requires a lot of data wrangling. General questions about data science should be posted to their respective communities.

Data science is an interdisciplinary field that uses scientific methods, processes, and systems to extract knowledge and insights from data in various forms, both structured and unstructured, similar to data mining.

Wikipedia

NOTE: If you want to use this tag for a question not directly concerning implementation, then consider posting on Cross Validated, Data Science, or Artificial Intelligence instead. Otherwise you're probably off-topic.

9099 questions
2
votes
1 answer

matplotlib graph for IMDB Voting vs Rating

I am plotting votes against ratings for movies from IMDB data. What is the best way to show a "Weighted Rank" voting vs. rating graph with pandas and Matplotlib? I tried this so far, but it doesn't appear in the correct format, even the…
min2bro
  • 4,509
  • 5
  • 29
  • 55
2
votes
2 answers

Is there any way to know the progress in sklearn GridSearch

Grid search is always time-consuming, so I want to see how far it has run. For example, it might output: paramsXXX processed, paramsYYY processed, ...
mrbean
  • 171
  • 2
  • 15
2
votes
0 answers

Cannot connect to a database on Redshift in R with the RODBC package

I am trying to connect to a DB on Redshift in R using the following syntax (I am using a Mac): odbcConnect("xxxxaddresss.redshift.amazonaws.com:5439", uid = "xxxx", pwd = "xxxx") and get the following errors. Warning messages: 1: In …
2
votes
1 answer

Notebook as production REST API

I know Databricks offers the possibility to simply convert notebooks into "production-grade" REST APIs. Is there similar functionality for open-source notebooks like Zeppelin, Scala-Notebook, Jupyter Notebook, or hue-notebook? It would be great…
Georg Heiler
  • 16,916
  • 36
  • 162
  • 292
2
votes
0 answers

More training set errors than bounded support vectors?

We are training a one-class SVM using scikit-learn's OneClassSVM, which is a wrapper around libsvm. When we run with verbose=True, it reports the number of bounded support vectors, nBSV = 106 in the output below. >>> clf = svm.OneClassSVM(nu=0.75,…
Daniel Mahler
  • 7,653
  • 5
  • 51
  • 90
2
votes
2 answers

Elixir for Data Science

I recently started playing with Elixir, and some patterns remind me of Python, which is widely used in data science projects: for example, list comprehensions or anonymous functions. Considering the high performance of Elixir and the ability to run…
Ole Spaarmann
  • 15,845
  • 27
  • 98
  • 160
2
votes
1 answer

Get ImageNet label for a specific index in the 1000-dimensional output tensor in torch

I have the output Tensor of a forward pass for a Facebook implementation of the ResNet model with a cat image. That is a 1000-dimensional Tensor with the classification probabilities. Using torch.topk I can obtain the top-5 probabilities and their…
Manuel Araoz
  • 15,962
  • 24
  • 71
  • 95
2
votes
2 answers

Python pandas and matplotlib installation conflict

I am using Mac OS X Yosemite 10.10.5 and I am trying to practice data science with Python on my laptop. I am using Python 3.5.1 in a virtualenv; however, when I install pandas and matplotlib, it seems like the two of them conflict when trying…
Dean Christian Armada
  • 6,724
  • 9
  • 67
  • 116
2
votes
2 answers

How do I check whether a given string is a valid geographical location or not?

I have a list of strings (noun phrases) and I want to filter out all valid geographical locations from them. Most of these (unwanted location names) are country or city or state names. What would be a way to do this? Is there any open-source lookup…
Soumyajit
  • 435
  • 1
  • 9
  • 19
2
votes
1 answer

F# CSV type provider questions

I'm struggling to get my head around using the CSV type provider in F# for simple data analysis tasks. I have done some googling around the 'Seq' function and the CSV type provider as a whole but can't find resources relevant to my issue, so help is…
Alex Zevenbergen
  • 181
  • 1
  • 12
2
votes
1 answer

How to represent linear data in TensorFlow

I'm trying to model some oscilloscope-like data in TensorFlow - a linear stream of energy pulses with a duration, intensity, etc. - but otherwise performing very similar classification tasks, and I'm having trouble figuring out how best to represent…
BioInfoBrett
  • 305
  • 1
  • 8
2
votes
0 answers

Finding k for kmeans in python

So I have a dataset consisting of 130,000 points, in the format (x,y). My final goal is to cluster this data using kmeans. But to apply that, I need to find the optimum number of clusters to pass to the kmeans algorithm. How should I apply something…
2
votes
1 answer

SVM for text classification in R

I am using an SVM to classify my text, but I don't actually get the result; instead I get numerical probabilities. Dataframe (1:20 trained set, 21:50 test set) Updated: ou <- structure(list(text = structure(c(1L, 6L, 1L, 1L, 8L, 13L, 24L,…
KRU
  • 291
  • 4
  • 18
1
vote
3 answers

Time Series Long to Wide Format R?

In R, I have a time series ts_big in long format as shown below, with observations of type A and B: ts1<-tibble(dates=c("2023-01-01","2023-02-01","2023-03-01", "2023-04-01"), numbers_1=c(1.0, 2.8, 2.9, 2.0), …
James Rider
  • 633
  • 1
  • 9
1
vote
1 answer

Datetime column is deformed when it is converted to a Parquet file

I am working on a CSV file that includes a column of dates, but the dtype of this column is actually just object, so I changed it to datetime. This part went without a flaw: the data wasn't changed except for its datatype. But when I turn this dataframe…