Questions tagged [data-science]

Implementation questions about data science. Data science concerns extracting knowledge or insights from data, in whatever shape or form. It can involve predictive analytics and usually requires a lot of data wrangling. General questions about data science should be posted to their respective communities.

Data science is an interdisciplinary field that uses scientific methods, processes, and systems to extract knowledge and insights from data in various forms, both structured and unstructured, similar to data mining.

Wikipedia

NOTE: If you want to use this tag for a question not directly concerning implementation, then consider posting on Cross Validated, Data Science, or Artificial Intelligence instead. Otherwise you're probably off-topic.

9099 questions
2 votes, 1 answer

Cannot load plugin `ipynb.markup` | ImportError: No module named 'IPython' when I run "pelican content"

I want to use "pelican content" to generate an HTML file from an '.ipynb' file, but it fails:
(C:\Users\Administrator\Anaconda3) D:\jupyter-blog>pelican content
WARNING: PLUGIN_PATH setting has been replaced by PLUGIN_PATHS, moving it to the…
Victor
2 votes, 3 answers

In machine learning, which algorithm should I use to make recommendations based on features like rating, type, gender, etc.?

I am developing a website that will recommend recipes to visitors based on their data. I am collecting data from their profiles, website activity, and Facebook. Currently I have data like [username/userId, rating of recipes, age, gender,…
2 votes, 1 answer

How to crosstab this table with pandas?

I have this data and I want to cross-tabulate GDP level (above vs. below average) against level of alcohol consumption (above vs. below average), and find the correlation. I'm trying this, but it is not what I…
GeeGeeks
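A minimal sketch of one way to build that table with pandas, under stated assumptions: the question's data is not shown, so the country values below are hypothetical, and "average" is taken to mean the column mean. Each numeric column is turned into an above/below label, pd.crosstab counts the label combinations, and Series.corr gives the Pearson correlation of the raw numeric values.

```python
import pandas as pd

# Hypothetical sample data; the question's actual dataset is not shown.
df = pd.DataFrame({
    "country": ["A", "B", "C", "D", "E", "F"],
    "gdp": [50_000, 12_000, 45_000, 8_000, 30_000, 5_000],
    "alcohol": [11.0, 4.5, 9.8, 6.0, 3.2, 2.1],
})

# Label each row as above or below the column mean (assumed meaning of "average").
gdp_level = (df["gdp"] > df["gdp"].mean()).map({True: "above", False: "below"})
alcohol_level = (df["alcohol"] > df["alcohol"].mean()).map({True: "above", False: "below"})

# Count every (GDP level, alcohol level) combination; missing combinations show as 0.
table = pd.crosstab(gdp_level.rename("gdp_level"), alcohol_level.rename("alcohol_level"))
print(table)

# Pearson correlation between the raw numeric columns.
print(df["gdp"].corr(df["alcohol"]))
```

Correlating the raw columns keeps more information than correlating the two binary labels; which one fits depends on what the analysis is meant to show.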
2 votes, 2 answers

Feature Engineering in Python with Pandas Using Multiple Rows Per Calculation

I have CSV data in the following format:
+-----------------+--------+-------------+
| reservation_num | rate   | guest_name  |
+-----------------+--------+-------------+
| B874576         | 169.95 | Bob Smith   |
| H786234         | 258.95 | Jane…
HMLDude
2 votes, 2 answers

How to access the values of a pandas groupby object

I group a pandas DataFrame by multiple columns using the groupby() function:
df_tr_mod = df_tr.groupby(['Col1','Col2']).aCol.agg(['count'])
Now I want to access these count values (I want to multiply all of them by 10). How can I do this?
GihanDB
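A short sketch of why this works: `.agg(['count'])` on a single grouped column returns an ordinary DataFrame with one column named "count", indexed by the (Col1, Col2) MultiIndex, so the column can be selected and multiplied like any other. The toy frame below is hypothetical and only reuses the column names from the question.

```python
import pandas as pd

# Hypothetical frame using the column names from the question.
df_tr = pd.DataFrame({
    "Col1": ["a", "a", "b", "b", "b"],
    "Col2": ["x", "x", "y", "y", "z"],
    "aCol": [1, 2, 3, 4, 5],
})

# Same aggregation as in the question: a DataFrame with a "count" column,
# indexed by the (Col1, Col2) MultiIndex.
df_tr_mod = df_tr.groupby(["Col1", "Col2"]).aCol.agg(["count"])

# Select the column and scale it in place.
df_tr_mod["count"] = df_tr_mod["count"] * 10
print(df_tr_mod)

# A single group's value is addressed by its index tuple.
print(df_tr_mod.loc[("b", "y"), "count"])
```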
2 votes, 1 answer

Time-Based Clustering of Multidimensional Data

I'm trying to cluster a large number of people based on the pattern of hours they work across a week. This is an example of the data I'm working with: …
NewbCoder
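The question does not name an algorithm, so the following is only one possible approach, sketched under assumptions: each person's week is represented as a vector of hours worked per day (Monday to Sunday), and the vectors are grouped with k-means. It is written in plain NumPy so it needs no extra libraries; scikit-learn's `sklearn.cluster.KMeans` is the usual off-the-shelf alternative. The `kmeans` helper and the sample weeks are hypothetical.

```python
import numpy as np

def kmeans(X, k, n_iter=100):
    """Plain k-means: returns (labels, centroids) for the rows of X."""
    X = np.asarray(X, dtype=float)
    # Deterministic farthest-point start: row 0, then repeatedly the row
    # farthest from every centroid chosen so far.
    centroids = [X[0]]
    for _ in range(1, k):
        dist = np.min([np.linalg.norm(X - c, axis=1) for c in centroids], axis=0)
        centroids.append(X[np.argmax(dist)])
    centroids = np.array(centroids)
    for _ in range(n_iter):
        # Assign every row to its nearest centroid (Euclidean distance).
        d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # Move each centroid to the mean of its rows; keep it if its cluster is empty.
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
                        for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return labels, centroids

# Hypothetical weeks: hours worked Monday..Sunday for four people.
hours = np.array([
    [8, 8, 8, 8, 8, 0, 0],
    [9, 8, 8, 8, 7, 0, 0],
    [0, 0, 0, 0, 0, 12, 12],
    [0, 0, 0, 0, 2, 10, 12],
])
labels, centroids = kmeans(hours, k=2)
print(labels)
```

One design caveat: Euclidean distance treats a schedule shifted by one day as completely different, so if shifted patterns should count as similar, the features (for example, total hours and number of days worked) need to be chosen with that in mind.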
2 votes, 2 answers

Scatter_Matrix Will Not Display Using Pandas and

I'm working through this machine learning tutorial, specifically Section 4.2: http://machinelearningmastery.com/machine-learning-in-python-step-by-step/ Unfortunately, my code throws an error: NameError: name 'scatter_matrix' is not…
HMLDude
2 votes, 1 answer

Optimizing over two loss functions in different ranges

I am optimizing over two loss functions which take very different values. To give an example: loss1 = 1534 loss2 = 0.723 and I want to optimize over loss1+loss2. Would rescaling loss1 to values closer to loss2 be a good idea? I tried the naive way…
Qubix
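A minimal sketch of the rescaling idea, under assumptions: dividing a loss by a constant divides its gradient by the same constant, so rescaling is equivalent to choosing relative weights for the two objectives. Here each loss is divided by a reference scale (for example, its value at initialization) so both terms start near 1.0, and `weight` sets the trade-off. The function name and the `scale1`/`scale2`/`weight` parameters are hypothetical.

```python
def combined_loss(loss1, loss2, scale1, scale2, weight=0.5):
    """Weighted sum of two losses after dividing each by a reference scale.

    scale1/scale2 are assumptions, e.g. each loss's value at initialization
    or a running average, so that both terms contribute on a similar scale.
    """
    return weight * (loss1 / scale1) + (1.0 - weight) * (loss2 / scale2)

# Values from the question, each used as its own reference scale.
print(combined_loss(1534.0, 0.723, scale1=1534.0, scale2=0.723))
```

Fixing the scales once (rather than recomputing them from the current loss every step) keeps the objective stationary, which is usually easier for the optimizer.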
2 votes, 1 answer

How to edit specific portions of the File upload functionality in Shiny?

I am totally new to R programming (and to Stack Overflow too) and was working on a small R project using the Shiny package (it seemed a lot easier and suited my requirements). Now I need to upload a .csv file for which Shiny has already provided a…
total_noob
2 votes, 0 answers

Undersampling for multilabel imbalanced datasets in pandas

I'm working on a roll-your-own undersampling function, since imblearn does not work neatly with multi-label classification (e.g. it only accepts a one-dimensional y). I want to iterate through X and y, removing a row every 2 or 3 rows that are part…
tw0000
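The question is cut off, so this is only a sketch under an assumption: the goal is to thin out the rows that carry one over-represented label by dropping every k-th such row. It works on NumPy arrays with a binary indicator matrix `y` (one column per label) and avoids an explicit Python loop by selecting the row indices directly. The function name and its `label`/`every` parameters are hypothetical.

```python
import numpy as np

def undersample_label(X, y, label, every=3):
    """Drop every `every`-th row among the rows where y[:, label] == 1."""
    X = np.asarray(X)
    y = np.asarray(y)
    rows = np.flatnonzero(y[:, label] == 1)   # indices of rows carrying the label
    drop = rows[every - 1::every]             # every `every`-th of those rows
    keep = np.setdiff1d(np.arange(len(y)), drop)
    return X[keep], y[keep]

# Hypothetical data: 6 samples, 2 features, 2 labels; label 0 is over-represented.
X = np.arange(12).reshape(6, 2)
y = np.array([[1, 0], [1, 0], [1, 1], [0, 1], [1, 0], [1, 0]])
X_res, y_res = undersample_label(X, y, label=0, every=2)
print(X_res)
print(y_res.sum(axis=0))  # remaining count per label
```

In the multi-label case, dropping a row also removes every other label on that row, so it is worth checking the per-label counts after resampling.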
2 votes, 3 answers

Drop a row and a column at the same time in a pandas DataFrame

I have a DataFrame that looks something like this:
Victim Sex        Female    Male  Unknown
Perpetrator Sex
Female             10850   37618       24
Male               99354  299781       92
Unknown            33068  156545  …
nrmb
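A minimal sketch, assuming a reasonably recent pandas: `DataFrame.drop` accepts the `index=` and `columns=` keywords together, so the "Unknown" row and the "Unknown" column can be removed in one call. The values are taken from the question; the last cell is cut off there, so NaN stands in for it rather than a made-up number.

```python
import pandas as pd

# Values from the question; the truncated last cell is represented by NaN.
df = pd.DataFrame(
    [[10850, 37618, 24],
     [99354, 299781, 92],
     [33068, 156545, float("nan")]],
    index=pd.Index(["Female", "Male", "Unknown"], name="Perpetrator Sex"),
    columns=pd.Index(["Female", "Male", "Unknown"], name="Victim Sex"),
)

# Drop the row and the column labelled "Unknown" in a single call.
trimmed = df.drop(index="Unknown", columns="Unknown")
print(trimmed)
```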
2 votes, 4 answers

Pandas series between two datetime files

My question is about using pandas time series. I have one file (Spots) containing a pandas time series for a month of data at 7.5-second intervals. Example:
2016-11-01 00:00:00,0
2016-11-01 00:00:07.500000,1
2016-11-01 00:00:15,2
2016-11-01…
2 votes, 1 answer

Failing to ignore NAs in my list of files

I have a list of files (numbered 1 to 332) in my directory. file1 corresponds to id1, file2 corresponds to id2, and so on. Each file contains 4 columns, and I have to calculate the sums and lengths of the second column…
Kathia
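The key point, whatever language is used, is that missing values must be skipped both in the sum and in the length; otherwise the length counts NAs that contributed nothing to the sum. A sketch of the same computation in pandas, under assumptions: the files are represented by small hypothetical in-memory DataFrames (in practice each would come from something like `pd.read_csv`), and `second_column_stats` is a hypothetical helper.

```python
import numpy as np
import pandas as pd

def second_column_stats(frames):
    """Total sum and number of non-missing values in column 2 across frames."""
    total, n = 0.0, 0
    for df in frames:
        col = df.iloc[:, 1]      # second column, selected by position
        total += col.sum()       # Series.sum skips NaN by default (skipna=True)
        n += int(col.count())    # Series.count counts only non-NaN entries
    return total, n

# Hypothetical stand-ins for file1 and file2, four columns each.
file1 = pd.DataFrame({"c1": [1, 2, 3], "c2": [1.5, np.nan, 2.5], "c3": [0, 0, 0], "c4": [1, 1, 1]})
file2 = pd.DataFrame({"c1": [4, 5], "c2": [np.nan, 4.0], "c3": [0, 0], "c4": [2, 2]})

total, n = second_column_stats([file1, file2])
print(total, n, total / n)
```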
2 votes, 1 answer

How should I train my models (multiple or single) with Azure Machine Learning?

I am working on my thesis (making traffic light systems more efficient by letting them learn), and in the first part of this research, which is how to predict the traffic intensities of the next fifteen minutes, I have to predict the…
2 votes, 0 answers

Running a Google Compute Engine VM instance entirely on a RAM disk

I'm trying to develop a data exploration environment for heavy processing of "Small Data" (10 - 30 GB). Reliability and stability are not concerns for these lightweight environments (that basically just contain Jupyter, Julia, Python, and R, plus…
Nick