Questions tagged [data-science]

Implementation questions about data science. Data science concerns extracting knowledge or insights from data, in whatever shape or form. It can include predictive analytics and usually involves a lot of data wrangling. General questions about data science should be posted to their respective communities.

Data science is an interdisciplinary field that uses scientific methods, processes, algorithms and systems to extract knowledge and insights from data in various forms, both structured and unstructured, similar to data-mining.

Wikipedia

NOTE: If you want to use this tag for a question not directly concerning implementation, then consider posting on Cross Validated, Data Science, or Artificial Intelligence instead. Otherwise you're probably off-topic.

9099 questions

1 answer

Cannot load plugin `ipynb.markup` | ImportError: No module named 'IPython' when I run "pelican content"

I want to use "pelican content" to create an HTML file from a '.ipynb' file, but it goes wrong. (C:\Users\Administrator\Anaconda3) D:\jupyter-blog>pelican content WARNING: PLUGIN_PATH setting has been replaced by PLUGIN_PATHS, moving it to the…

jupyter-notebook jupyter data-science pelican

asked Jul 22 '17 at 00:00

Victor

3 answers

In machine learning, which algorithm should I use to make recommendations based on different features like rating, type, gender, etc.?

I am developing a website that will recommend recipes to visitors based on their data. I am collecting data from their profile, website activity, and Facebook. Currently I have data like [username/userId, rating of recipes, age, gender,…

machine-learning pyspark apache-spark-mllib data-science

asked Jul 14 '17 at 19:36

JAGDISH CHAUDHARI

1 answer

How to crosstab this table with pandas?

I have this data and I want to cross-tabulate GDP level (above average vs. below average) against level of alcohol consumption (above average vs. below average), and find the correlation. I'm trying this, but it is not what I…

pandas crosstab data-science

asked Jul 05 '17 at 04:32

GeeGeeks
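
For readers skimming this entry, here is a minimal sketch of one way to build that kind of above/below-average cross-tabulation with `pd.crosstab`. The asker's data is not shown in the excerpt, so the DataFrame, its column names (`gdp`, `alcohol`), and the helper `level` are all hypothetical.

```python
import pandas as pd

# Hypothetical data: the column names and values are placeholders,
# not the asker's dataset.
df = pd.DataFrame({
    "country": ["A", "B", "C", "D", "E", "F"],
    "gdp": [10.0, 20.0, 30.0, 40.0, 50.0, 60.0],
    "alcohol": [1.0, 2.0, 6.0, 3.0, 8.0, 7.0],
})

def level(series):
    """Label each value as above or below the mean of its column."""
    labels = {True: "above average", False: "below average"}
    return series.gt(series.mean()).map(labels)

# Count countries in each (GDP level, alcohol level) combination.
table = pd.crosstab(
    level(df["gdp"]).rename("gdp_level"),
    level(df["alcohol"]).rename("alcohol_level"),
)

# Pearson correlation on the raw (unbinned) values.
corr = df["gdp"].corr(df["alcohol"])

print(table)
print(corr)
```

Binning against the mean throws away information, so the correlation here is computed on the raw columns rather than on the two-level labels.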

2 answers

Feature Engineering in Python with Pandas Using Multiple Rows Per Calculation

I have CSV data in the following format:

+-----------------+--------+-------------+
| reservation_num | rate   | guest_name  |
+-----------------+--------+-------------+
| B874576         | 169.95 | Bob Smith   |
| H786234         | 258.95 | Jane…

python python-3.x pandas machine-learning data-science

asked Jul 03 '17 at 18:49

HMLDude

2 answers

How to access Python groupby object values

I group a pandas DataFrame using the groupby() function with multiple columns: df_tr_mod = df_tr.groupby(['Col1','Col2']).aCol.agg(['count']) Now I want to access these count values (I want to multiply all of them by 10). How can I do this?

python pandas data-science pandas-groupby

asked Jun 29 '17 at 03:27

GihanDB
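
A minimal sketch of what the question describes, assuming a small stand-in for `df_tr` (the sample rows are hypothetical; only the column names come from the question). Because `agg(['count'])` is given a list, the result is a DataFrame with a `count` column indexed by `(Col1, Col2)`, so the counts can be scaled like any other column.

```python
import pandas as pd

# Hypothetical stand-in for df_tr; only the column names follow the question.
df_tr = pd.DataFrame({
    "Col1": ["a", "a", "b", "b", "b"],
    "Col2": ["x", "x", "x", "y", "y"],
    "aCol": [1, 2, 3, 4, 5],
})

# Same call as in the question: one 'count' column, MultiIndex rows.
df_tr_mod = df_tr.groupby(["Col1", "Col2"]).aCol.agg(["count"])

# Multiply every count by 10 in place.
df_tr_mod["count"] = df_tr_mod["count"] * 10

# Optional: turn the group keys back into ordinary columns.
flat = df_tr_mod.reset_index()
print(flat)
```

A single group can then be read with `df_tr_mod.loc[("a", "x"), "count"]`.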

1 answer

Time-Based Clustering of Multidimensional Data

I'm trying to do clustering of a large number of people based on the pattern of their hours worked across a week. This is an example of the data I'm working with: …

r analytics data-science

asked Jun 27 '17 at 09:45

NewbCoder

2 answers

Scatter_Matrix Will Not Display Using Pandas and

Working through the following machine learning tutorial: http://machinelearningmastery.com/machine-learning-in-python-step-by-step/ Specifically, Section 4.2. Unfortunately, my code is throwing an error: NameError: name 'scatter_matrix' is not…

python pandas matplotlib machine-learning data-science

asked Jun 16 '17 at 19:01

HMLDude
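
The excerpt is cut off, so the actual cause is not shown here. One common reason for a `NameError` on `scatter_matrix` is that the name was never imported. The sketch below assumes that cause and a pandas version of 0.20 or later, where the function lives in `pandas.plotting`; the helper name `plot_pairs` is hypothetical.

```python
import pandas as pd
from pandas.plotting import scatter_matrix  # location in pandas 0.20 and later

def plot_pairs(df):
    """Draw pairwise scatter plots for the numeric columns of df.

    Plotting needs matplotlib, so it is imported only when this is called.
    """
    import matplotlib.pyplot as plt
    scatter_matrix(df)
    plt.show()
```

Calling `plot_pairs(dataset)` on a loaded DataFrame would then open the plot window, provided matplotlib is installed and a display backend is available.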

1 answer

Optimizing over two loss functions in different ranges.

I am optimizing over two loss functions which take very different values. To give an example: loss1 = 1534 loss2 = 0.723 and I want to optimize over loss1+loss2. Would rescaling loss1 to values closer to loss2 be a good idea? I tried the naive way…

optimization data-science loss cost-based-optimizer

asked Jun 13 '17 at 13:19

Qubix
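
To make the scale problem concrete, here is a small sketch using the two values from the question. The function `combined_loss` and its `scale`/`weight` parameters are hypothetical, and dividing each loss by a reference magnitude is only one possible rescaling, not necessarily what the asker tried or what the answer recommends.

```python
def combined_loss(loss1, loss2, scale1=1.0, scale2=1.0, weight1=1.0, weight2=1.0):
    """Weighted sum of two losses, each divided by a reference scale first."""
    return weight1 * loss1 / scale1 + weight2 * loss2 / scale2

loss1, loss2 = 1534.0, 0.723

# Plain sum: loss1 accounts for almost all of the total.
raw = combined_loss(loss1, loss2)
share_of_loss1 = loss1 / raw

# Rescaled sum: each loss is divided by its own current magnitude,
# so both terms start out contributing equally.
balanced = combined_loss(loss1, loss2, scale1=1534.0, scale2=0.723)

print(raw, share_of_loss1, balanced)
```

In the plain sum the optimizer sees almost nothing but `loss1`; after rescaling, the `weight1` and `weight2` arguments express the intended trade-off directly.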

1 answer

How to edit specific portions of the File upload functionality in Shiny?

I am totally new to R programming (and Stack Overflow too) and was working on a small R project using the Shiny package (it seemed a lot easier and more in tune with my requirements). Now I need to upload a .csv file for which Shiny has already provided a…

r shiny data-science

asked Jun 11 '17 at 09:31

total_noob

0 answers

Undersampling for multilabel imbalanced datasets in pandas

I'm working on a roll-your-own undersampling function, since imblearn does not work neatly with multi-label classification (e.g. it only accepts a one-dimensional y). I want to iterate through X and y, removing a row every 2 or 3 rows that are part…

pandas data-science imblearn

asked May 31 '17 at 19:51

tw0000
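
The excerpt is truncated before it says which rows qualify for removal, so the following is only a sketch under assumptions: the caller supplies a boolean `mask` marking the rows eligible for removal, and every `n`-th eligible row is dropped from `X` and `y` together. The function name `drop_every_nth` and the toy data are hypothetical.

```python
import numpy as np
import pandas as pd

def drop_every_nth(X, y, mask, n):
    """Drop every n-th row among rows where mask is True, from X and y together."""
    mask = np.asarray(mask, dtype=bool)
    candidates = np.flatnonzero(mask)   # positions of rows eligible for removal
    to_drop = candidates[n - 1::n]      # every n-th eligible row
    keep = np.ones(len(X), dtype=bool)
    keep[to_drop] = False
    return X[keep], y[keep]

# Hypothetical toy data with a two-column (multi-label) y.
X = pd.DataFrame({"f": range(8)})
y = pd.DataFrame({
    "label_a": [1, 1, 1, 1, 1, 1, 0, 0],
    "label_b": [0, 1, 0, 0, 1, 0, 1, 1],
})

# Assumption: rows carrying the over-represented label_a are eligible.
mask = (y["label_a"] == 1).to_numpy()
X_small, y_small = drop_every_nth(X, y, mask, n=3)
print(X_small["f"].tolist())
```

Working with positions rather than an explicit row loop keeps `X` and `y` aligned, since both are filtered with the same boolean array.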

3 answers

Drop a row and column at the same time Pandas Dataframe

I have a dataframe which is something like this:

Victim Sex       Female    Male  Unknown
Perpetrator Sex
Female            10850   37618       24
Male              99354  299781       92
Unknown           33068  156545        …

python pandas dataframe data-science

asked May 24 '17 at 07:48

nrmb
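
A minimal sketch of dropping a row label and a column label in one call, rebuilt from the numbers visible in the excerpt. The last cell is truncated in the source, so it is left as NaN here rather than guessed. Passing `index=` and `columns=` together to `DataFrame.drop` assumes pandas 0.21 or later.

```python
import numpy as np
import pandas as pd

table = pd.DataFrame(
    [[10850, 37618, 24],
     [99354, 299781, 92],
     [33068, 156545, np.nan]],  # last cell is truncated in the excerpt
    index=pd.Index(["Female", "Male", "Unknown"], name="Perpetrator Sex"),
    columns=pd.Index(["Female", "Male", "Unknown"], name="Victim Sex"),
)

# Remove the 'Unknown' row and the 'Unknown' column in a single call.
known = table.drop(index="Unknown", columns="Unknown")
print(known)
```

`drop` returns a new DataFrame by default, so `table` itself is left unchanged.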

4 answers

Pandas series between two date time files

My question is about using Pandas time series. I have one file (Spots) that holds a pandas time series of a month's data at 7.5-second intervals. Example: 2016-11-01 00:00:00,0 2016-11-01 00:00:07.500000,1 2016-11-01 00:00:15,2 2016-11-01…

python-2.7 pandas data-science

asked May 03 '17 at 02:27

Himanshu Sharma

1 answer

Failing to ignore NAs in my list of files

I have a list of files (from 1 to 332) inside my directory. file1 corresponds to id1, file2 corresponds to id2, and so on. Each file contains 4 columns, and I have to calculate the sums and lengths of the 2nd column…

r data-science

asked May 02 '17 at 10:20

Kathia

1 answer

How should I train my models (multiple or single) with Azure Machine Learning?

I am working on my thesis (making the traffic lights system work more efficiently by letting them learn) and in my first part of this research, which is how to predict the traffic intensities of the next fifteen minutes, I have to predict the…

machine-learning regression prediction data-science azure-machine-learning-service

asked Apr 17 '17 at 12:32

A. Gh

0 answers

Running a Google Compute Engine VM instance entirely on a RAM disk

I'm trying to develop a data exploration environment for heavy processing of "Small Data" (10 - 30 GB). Reliability and stability are not concerns for these lightweight environments (that basically just contain Jupyter, Julia, Python, and R, plus…

google-cloud-platform google-compute-engine data-science ramdisk

asked Apr 12 '17 at 23:12

Nick
