Questions tagged [data-science-experience]

IBM Data Science Experience is an interactive, collaborative, cloud-based environment where data scientists can use multiple tools to activate their insights.

Source: http://datascience.ibm.com/blog/welcome-to-the-data-science-experience/

261 questions
0
votes
1 answer

Scaling of Nominal, Ordinal, Binary with numeric Variable dataset

If the dataset is given in characters, i.e. categorical, do we need to convert it into numerical data using one-hot encoding? My second question is whether one-hot encoding is meaningful only for the nominal data type or for both…
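
To make the nominal/ordinal distinction in this question concrete, here is a minimal sketch in plain Python. It follows one common convention (not the only one): indicator columns for nominal categories, integer ranks for ordinal ones. The column values and the helper names `one_hot` and `ordinal_encode` are hypothetical, made up for illustration.

```python
def one_hot(values):
    """One-hot encode a nominal column: one 0/1 indicator per category."""
    categories = sorted(set(values))
    return [{f"is_{c}": int(v == c) for c in categories} for v in values]


def ordinal_encode(values, order):
    """Map an ordinal column to integer ranks following the given order."""
    rank = {level: i for i, level in enumerate(order)}
    return [rank[v] for v in values]


colors = ["red", "green", "red"]       # hypothetical nominal column
sizes = ["small", "large", "medium"]   # hypothetical ordinal column

encoded_colors = one_hot(colors)
encoded_sizes = ordinal_encode(sizes, order=["small", "medium", "large"])
```

The ordinal mapping keeps the ordering information that one-hot encoding would discard; whether that ordering should be treated as equally spaced is a modelling decision.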
0
votes
1 answer

Problem updating joblib library from GitHub repo in IBM Watson Studio

In my program, I need to use some joblib functions. However, when I run the program, I get the error message: "sklearn.externals.joblib is deprecated in 0.21 and will be removed in 0.23". Apparently the library has been updated in this GitHub repo but…
0
votes
1 answer

How to display classification report in Flask web application

I need the output to be displayed in classification matrix form, but I am getting a string as output. from pyod.models.xgbod import XGBClassifier clf = XGBClassifier(max_depth=15, min_child_weight=4, gamma=0.3, colsample_bytree=0.4) …
0
votes
1 answer

What is the threshold of the difference between training set and testing set?

There is always a performance difference between the training set and the testing set. I am wondering what threshold for this difference is acceptable. For example, maybe the score for training is 87% and for testing is 83%. The 4 %…
henry
  • 11
  • 3
0
votes
1 answer

502 Bad Gateway error when I try to launch neural network modeller canvas from IBM Watson Studio?

I am using the Neural Network Modeller Beta version and consistently get a 502 Bad Gateway error when I try to launch the modeller flow. Please advise whether it is a known issue and whether there's a fix.
0
votes
1 answer

Is there a way to calculate the correlation coefficient between binary variables a and b?

So there are two variables: a -- whether a person is older than 40 (binary, 0 or 1); b -- whether they have a luxury car (binary, 0 or 1). Now they have the data sum values. Total sample size -- 500 Total number of people…
VPapz
  • 13
  • 3
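
For two binary variables summarized as counts, one standard measure is the phi coefficient, which equals the Pearson correlation computed on the 0/1 values. The sketch below computes it from a 2x2 table. The counts are hypothetical (the question's own totals are cut off above), and `phi_coefficient` is a name chosen for illustration.

```python
import math


def phi_coefficient(n11, n10, n01, n00):
    """Phi coefficient from 2x2 counts.

    n11: a=1 and b=1, n10: a=1 and b=0, n01: a=0 and b=1, n00: a=0 and b=0.
    """
    row1, row0 = n11 + n10, n01 + n00
    col1, col0 = n11 + n01, n10 + n00
    denom = math.sqrt(row1 * row0 * col1 * col0)
    if denom == 0:
        raise ValueError("phi is undefined when a row or column total is zero")
    return (n11 * n00 - n10 * n01) / denom


# Hypothetical counts that sum to a sample size of 500.
phi = phi_coefficient(n11=120, n10=80, n01=60, n00=240)
```

Only the four cell counts are needed, so the coefficient can be computed from summary totals without row-level data.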
0
votes
1 answer

Differentiating between numerical and categorical columns

I have started working at a company where we use a lot of data tables, most of which don't contain a description of the columns, and when a column is categorical, the categories are usually not defined. I came up with a solution to send a list…
bazinga
  • 2,120
  • 4
  • 21
  • 35
0
votes
1 answer

Best method to identify and replace outliers for Salary column in Python

What is the best method to identify and replace outliers for the ApplicantIncome, CoapplicantIncome, LoanAmount, Loan_Amount_Term columns in pandas? I tried IQR with a seaborn boxplot, and tried to identify the outliers and fill them with NaN after…
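
As an illustration of the IQR rule mentioned in this question, here is a minimal sketch using only the standard library. The income values are hypothetical, the multiplier `k=1.5` is the usual convention rather than a requirement, and `iqr_bounds` and `mask_outliers` are illustrative names. With pandas, the same bounds could be derived from `Series.quantile`.

```python
import statistics


def iqr_bounds(values, k=1.5):
    """Return (low, high) fences at Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def mask_outliers(values, k=1.5):
    """Replace values outside the IQR fences with NaN."""
    low, high = iqr_bounds(values, k)
    return [v if low <= v <= high else float("nan") for v in values]


incomes = [2500, 3000, 3200, 3500, 4000, 4200, 81000]  # hypothetical data
low, high = iqr_bounds(incomes)
cleaned = mask_outliers(incomes)
```

Whether a flagged value should be set to NaN, capped at the fence, or kept is a judgment call that depends on the data; the rule only flags candidates.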
0
votes
1 answer

Model comparison with RMSE

I am new to data science and would like to ask for help with model selection. I have built 8 models to predict salary from years of experience, position name and location. Then I tried to compare the 8 models by RMSE. But in the end, I am not sure which model I…
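
A minimal sketch of comparing models by RMSE on the same held-out data, assuming each model's predictions are already available as a list. The numbers and the model names `model_a` and `model_b` are made up for illustration.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical held-out targets and predictions from two models.
y_test = [50.0, 60.0, 70.0, 80.0]
predictions = {
    "model_a": [52.0, 58.0, 71.0, 79.0],
    "model_b": [45.0, 66.0, 75.0, 70.0],
}

scores = {name: rmse(y_test, pred) for name, pred in predictions.items()}
best = min(scores, key=scores.get)  # lowest RMSE wins
```

The comparison is only fair if every model is scored on the same test rows, ideally rows that none of the models saw during training.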
0
votes
1 answer

Right way to serialize a Random Forest Regression File

I am working on building a Random Forest Regression model for predicting ETA. I am saving the model in pickle format using the pickle package. I have also used joblib to save the model. But the size of the file is really large (more than 100 GB). I would…
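
One option sometimes tried for large pickles is writing them through a compressed stream. The sketch below shows the mechanics with the standard library only. The dictionary is a hypothetical stand-in for a fitted model, and `save_compressed` and `load_compressed` are illustrative names. Compression may not be enough on its own for a forest of this size; reducing the number or depth of trees is a separate lever.

```python
import gzip
import os
import pickle
import tempfile


def save_compressed(obj, path):
    """Pickle an object through a gzip stream."""
    with gzip.open(path, "wb") as fh:
        pickle.dump(obj, fh, protocol=pickle.HIGHEST_PROTOCOL)


def load_compressed(path):
    """Load an object written by save_compressed."""
    with gzip.open(path, "rb") as fh:
        return pickle.load(fh)


# Hypothetical stand-in for a fitted model object.
model_stand_in = {"n_estimators": 10, "thresholds": [0.5] * 10_000}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "model.pkl.gz")
    save_compressed(model_stand_in, path)
    compressed_size = os.path.getsize(path)
    restored = load_compressed(path)

raw_size = len(pickle.dumps(model_stand_in, protocol=pickle.HIGHEST_PROTOCOL))
```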
0
votes
0 answers

What is the mean term added to the moving average model in time series?

The equation is taken from here. Does the mean term represent the best fit for the bias term of the MA model, obtained by minimizing the mean squared error?
Chaymae Ahmed
  • 371
  • 1
  • 4
  • 14
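
The equation the question links to is not reproduced on this page. For reference, the standard textbook form of an MA(q) model with a mean term is shown below, assuming $\varepsilon_t$ is zero-mean white noise. The notation ($\mu$, $\theta_i$, $q$) is the conventional one and may differ from the linked source.

```latex
X_t = \mu + \varepsilon_t + \sum_{i=1}^{q} \theta_i \, \varepsilon_{t-i},
\qquad
\mathbb{E}[X_t] = \mu + \mathbb{E}[\varepsilon_t] + \sum_{i=1}^{q} \theta_i \, \mathbb{E}[\varepsilon_{t-i}] = \mu .
```

Under that assumption, $\mu$ is the expected value of the series, which is why it plays the role of a constant (bias-like) offset.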
0
votes
1 answer

Why do we use the ARMA model, which mixes the AR and MA models? Isn't AR or MA sufficient?

Why do we use the ARMA model, which mixes the AR and MA models? Isn't AR or MA sufficient? I know that the AR model is a function of previous readings and the MA model is a function of previous errors. I also know that identifying an AR model is best done with the PACF and…
Chaymae Ahmed
  • 371
  • 1
  • 4
  • 14
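
For reference, the three model families in this question can be written side by side in conventional notation, assuming $\varepsilon_t$ is zero-mean white noise. The symbols $c$, $\mu$, $\phi_i$, $\theta_j$ are the usual textbook choices, not taken from the question.

```latex
\begin{aligned}
\text{AR}(p):\quad & X_t = c + \sum_{i=1}^{p} \phi_i X_{t-i} + \varepsilon_t \\
\text{MA}(q):\quad & X_t = \mu + \varepsilon_t + \sum_{j=1}^{q} \theta_j \varepsilon_{t-j} \\
\text{ARMA}(p,q):\quad & X_t = c + \sum_{i=1}^{p} \phi_i X_{t-i} + \varepsilon_t + \sum_{j=1}^{q} \theta_j \varepsilon_{t-j} \\[4pt]
\text{MA}(1),\ |\theta| < 1:\quad & X_t - \mu = \varepsilon_t - \sum_{k=1}^{\infty} (-\theta)^{k} \left( X_{t-k} - \mu \right)
\end{aligned}
```

The last line shows an invertible MA(1) rewritten as an AR model with infinitely many lags. It is one way to see the usual argument for ARMA: a pure AR (or pure MA) fit may need many terms to approximate what a mixed model captures with a few.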
0
votes
1 answer

Batch size and weight updates

Could you clarify this question? I have Epoch of 1000 and Batch size 100. (Then 1 epoch will be 10 batch sizes). May I know when the weights get updated? Will they be updated for every batch or at the end of every epoch? Thanks
Chakra
  • 647
  • 1
  • 8
  • 16
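
In standard mini-batch gradient descent, the weights are updated once per batch, so an epoch performs as many updates as it has batches. The sketch below counts updates while fitting a one-parameter model. It assumes 1000 training samples (an interpretation of the question's numbers), and the data, the learning rate, and the `train` function are hypothetical.

```python
def train(xs, ys, batch_size, epochs, lr=0.5):
    """Fit y = w * x with mini-batch gradient descent; return (w, updates)."""
    w, updates = 0.0, 0
    for _ in range(epochs):
        for start in range(0, len(xs), batch_size):
            bx = xs[start:start + batch_size]
            by = ys[start:start + batch_size]
            # Gradient of the mean squared error on this batch only.
            grad = sum(2 * (w * x - y) * x for x, y in zip(bx, by)) / len(bx)
            w -= lr * grad  # one weight update per batch
            updates += 1
    return w, updates


# Hypothetical data: 1000 samples generated from y = 3 * x.
xs = [i / 1000 for i in range(1000)]
ys = [3.0 * x for x in xs]

w, updates = train(xs, ys, batch_size=100, epochs=5)
# 1000 samples / 100 per batch = 10 updates per epoch, so 50 updates in total.
```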
0
votes
1 answer

I was visualizing a data set using seaborn in Python 3 but it's giving me an error: unsupported operand type(s) for /: 'str' and 'int'

import pandas as pd from pandas import Series,DataFrame import numpy as np import matplotlib.pyplot as plt import seaborn as sns sns.set_style('whitegrid') %matplotlib…
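
The rest of the question's code is cut off, so the actual cause is unknown. One common source of this exact message is a numeric column that was read in as text. The sketch below reproduces the error and shows a conversion step in plain Python. The values and the `to_numeric` helper are hypothetical; with pandas, `pd.to_numeric(..., errors="coerce")` serves a similar purpose.

```python
def to_numeric(values):
    """Convert values to float, using NaN for anything unparseable."""
    converted = []
    for v in values:
        try:
            converted.append(float(v))
        except (TypeError, ValueError):
            converted.append(float("nan"))
    return converted


raw = ["22", "38", "n/a", "26"]  # hypothetical column read as strings

# Dividing a string by an integer reproduces the error from the title.
try:
    raw[0] / 2
    error_message = None
except TypeError as exc:
    error_message = str(exc)

ages = to_numeric(raw)
halved = [a / 2 for a in ages]  # arithmetic works after conversion
```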
0
votes
0 answers

How to replace NAs in categorical or continuous variables using WOE and IV in R?

I have a dataframe with 25 variables and 70000 rows, of which 3 variables have 1000, 250 and 250 NA values respectively. How do I replace the NA values using the WOE method? Do we need to replace all the columns with WOE values or only find the WOE for 3…
Rameez Shaik
  • 31
  • 1
  • 5
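
One approach sometimes used with WOE is to treat missing values as their own bin and replace every value, missing or not, with the WOE of its bin. The sketch below illustrates that idea in Python rather than R. It assumes the convention WOE = ln(share of non-events / share of events); sign conventions vary between sources. The data and the `woe_by_bin` helper are hypothetical.

```python
import math
from collections import Counter


def woe_by_bin(bins, targets, missing_label="MISSING"):
    """WOE per bin, with None treated as a separate bin.

    targets: 1 = event, 0 = non-event.
    """
    keys = [missing_label if b is None else b for b in bins]
    events = Counter(k for k, t in zip(keys, targets) if t == 1)
    non_events = Counter(k for k, t in zip(keys, targets) if t == 0)
    total_events = sum(events.values())
    total_non_events = sum(non_events.values())

    woe = {}
    for key in set(keys):
        if events[key] == 0 or non_events[key] == 0:
            raise ValueError(f"bin {key!r} needs both events and non-events")
        share_non_events = non_events[key] / total_non_events
        share_events = events[key] / total_events
        woe[key] = math.log(share_non_events / share_events)
    return woe


# Hypothetical binned variable with missing values and a binary target.
bins = ["A", "A", "A", "B", "B", None, None, None]
targets = [0, 0, 1, 0, 1, 1, 1, 0]

woe = woe_by_bin(bins, targets)
encoded = [woe["MISSING" if b is None else b] for b in bins]
```

Bins with no events or no non-events are rejected here because the logarithm is undefined for them; merging sparse bins or adding a smoothing constant are common workarounds.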