Questions tagged [data-science]

Implementation questions about data science. Data science concerns extracting knowledge or insights from data, in whatever shape or form. It can involve predictive analytics and usually requires a lot of data wrangling. General questions about data science should be posted to their respective communities.

Data science is an interdisciplinary field that uses scientific methods, processes, algorithms, and systems to extract knowledge and insights from data in various forms, both structured and unstructured, similar to data mining.

Wikipedia

NOTE: If you want to use this tag for a question not directly concerning implementation, then consider posting on Cross Validated, Data Science, or Artificial Intelligence instead. Otherwise you're probably off-topic.

9099 questions

votes

1 answer

matplotlib graph for IMDB Voting vs Rating

I am plotting a graph of votes against ratings for movies from IMDB data. What is the best way to show a "Weighted Rank" Voting vs Rating graph with the help of Pandas and Matplotlib? I tried this so far, but it doesn't appear in the correct format, even the…

python pandas matplotlib data-science

asked May 27 '16 at 15:38

min2bro


votes

2 answers

Is there any way to know the progress of sklearn GridSearch?

Grid search is always time-consuming, so I want to see how far it has run. For example, it might output paramsXXX processed paramsYYY processed ...

python machine-learning scikit-learn data-science

asked May 22 '16 at 14:30

mrbean

votes

0 answers

Cannot connect to a database on Redshift in R with the RODBC package

I am trying to connect to a DB on Redshift in R using the following syntax (I am using a Mac): odbcConnect("xxxxaddresss.redshift.amazonaws.com:5439", uid = "xxxx", pwd = "xxxx") and get the following errors. Warning messages: 1: In …

mysql r amazon-redshift rodbc data-science

asked Apr 19 '16 at 12:39

Hossein Yousefi

votes

1 answer

Notebook as production rest API

I know Databricks offers the possibility to simply convert notebooks into "production-grade" REST APIs. Is there similar functionality for open-source notebooks like Zeppelin, Scala-Notebook, Jupyter Notebook, or hue-notebook? It would be great…

apache-spark jupyter-notebook apache-zeppelin data-science

asked Apr 07 '16 at 18:27

Georg Heiler


votes

0 answers

More training set errors than bounded support vectors?

We are training a one-class SVM using scikit-learn OneClassSVM, which is a wrapper around libsvm. When we run with verbose=True, it reports the number of bounded support vectors, nBSV = 106 in the output below. >>> clf = svm.OneClassSVM(nu=0.75,…

machine-learning scikit-learn svm libsvm data-science

asked Mar 29 '16 at 21:24

Daniel Mahler


votes

2 answers

Elixir for Data Science

I recently started playing with Elixir, and some patterns remind me of Python, which is widely used in data science projects: for example, list comprehensions or anonymous functions. Considering the high performance of Elixir and the ability to run…

python elixir data-science

asked Mar 01 '16 at 14:25

Ole Spaarmann


votes

1 answer

Get ImageNet label for a specific index in the 1000-dimensional output tensor in torch

I have the output Tensor of a forward pass for a Facebook implementation of the ResNet model with a cat image. That is a 1000-dimensional Tensor with the classification probabilities. Using torch.topk I can obtain the top-5 probabilities and their…

label torch data-science resnet imagenet

asked Feb 20 '16 at 20:56

Manuel Araoz


votes

2 answers

python pandas and matplotlib installation conflict

I am using Mac OS X Yosemite 10.10.5 and I am trying to practice data science with Python on my laptop. I am using Python 3.5.1 in a virtualenv; however, when I install pandas and matplotlib, it seems like both of them are having a conflict when trying…

python pandas matplotlib python-3.5 data-science

asked Jan 23 '16 at 13:47

Dean Christian Armada


votes

2 answers

How do I check whether a given string is a valid geographical location or not?

I have a list of strings (noun phrases) and I want to filter out all valid geographical locations from them. Most of these (unwanted location names) are country or city or state names. What would be a way to do this? Is there any open-source lookup…

geolocation nlp gis text-mining data-science

asked Jan 08 '16 at 17:37

Soumyajit

votes

1 answer

F# csv type provider questions

I'm struggling to get my head around using the csv type provider in F# for simple data analysis tasks. I have done some googling around the 'Seq' function and the csv type provider as a whole but can't find resources relevant to my issue, so help is…

.net f# functional-programming data-science

asked Dec 11 '15 at 13:33

Alex Zevenbergen

votes

1 answer

How to represent linear data in TensorFlow

I'm trying to model some oscilloscope-like data in TensorFlow - a linear stream of energy pulses with a duration, intensity, etc. - but otherwise performing very similar classification tasks, and I'm having trouble figuring out how best to represent…

machine-learning tensorflow data-science

asked Dec 07 '15 at 04:11

BioInfoBrett

votes

0 answers

Finding k for kmeans in python

So I have a dataset consisting of 130000 points, in the format (x,y). My final goal is to cluster this data using kmeans. But to apply that, I need to find the optimum number of clusters to pass to the kmeans algorithm. How should I apply something…

python machine-learning cluster-analysis k-means data-science

asked Nov 19 '15 at 19:50

Siddharth Shah

votes

1 answer

SVM for text classification in R

I am using SVM to classify my text, but I don't actually get the result; instead I get numerical probabilities. Dataframe (1:20 trained set, 21:50 test set) Updated: ou <- structure(list(text = structure(c(1L, 6L, 1L, 1L, 8L, 13L, 24L,…

r svm text-classification data-science

asked Apr 17 '15 at 07:11

KRU

vote

3 answers

Time Series Long to Wide Format R?

In R, I have a time series ts_big in long format as shown below, with observations of type A and B: ts1<-tibble(dates=c("2023-01-01","2023-02-01","2023-03-01", "2023-04-01"), numbers_1=c(1.0, 2.8, 2.9, 2.0), …

r dataframe time-series data-science

asked Aug 30 '23 at 00:49

James Rider

vote

1 answer

Datetime column is deformed when converted to a Parquet file

I am working on a csv file which includes a column of dates, but the dtype of this column is actually just object, so I changed it to datetime. This part went without a flaw: the data wasn't changed except for its datatype. But when I turn this dataframe…

python dataframe datetime data-science parquet

asked Aug 24 '23 at 08:58

miraakbutnotded
