Questions tagged [data-science-experience]

IBM Data Science Experience is an interactive, collaborative, cloud-based environment where data scientists can use multiple tools to activate their insights.

Source: http://datascience.ibm.com/blog/welcome-to-the-data-science-experience/

261 questions
1 vote, 1 answer

Is it possible for a Spark job on Bluemix to see a list of the other processes on the operating system?

A common approach for connecting to third-party systems from Spark is to pass the credentials for those systems as arguments to the Spark script. However, this raises some security questions. E.g., see this question: Bluemix spark-submit --…
Chris Snow
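
A hedged way to check this from inside a job, assuming the executor runs Linux with /proc mounted: on a standard Linux host (no hidepid option on the /proc mount), a process can usually read other processes' command lines from /proc/<pid>/cmdline, which is also where ps gets them. Whether Bluemix's Spark service restricts this is not stated above; the function name below is hypothetical.

```python
import os


def list_process_cmdlines(proc_root: str = "/proc") -> dict:
    """Return {pid: [argv...]} for every process whose cmdline this process can read.

    Assumes a Linux /proc filesystem; hypothetical helper, not a Bluemix API.
    """
    cmdlines = {}
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # only numeric entries are per-process directories
        try:
            with open(os.path.join(proc_root, entry, "cmdline"), "rb") as f:
                raw = f.read()
        except OSError:
            continue  # the process exited meanwhile, or access was denied
        # Arguments are NUL-separated; kernel threads have an empty cmdline.
        args = [a.decode(errors="replace") for a in raw.split(b"\0") if a]
        if args:
            cmdlines[int(entry)] = args
    return cmdlines


if __name__ == "__main__":
    for pid, args in sorted(list_process_cmdlines().items()):
        print(pid, " ".join(args))
```

If credentials passed on the spark-submit command line appear in this listing, any other code able to run on the same host can read them too.
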
1 vote, 3 answers

!pip install nltk -> permission denied

I'm trying to install nltk with the following notebook command:
!pip install nltk
However, that throws the following error:
error: could not create '/usr/local/src/bluemix_ipythonspark_141/notebook/lib/python2.7/site-packages/nltk': Permission…
Chris Snow
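
A common workaround for this kind of permission error is a per-user install (pip's --user flag), which writes to the user's own site-packages directory instead of the shared, read-only one; in a notebook cell that is !pip install --user nltk. The sketch below does the same from plain Python; it is not necessarily what the answers recommend, and the helper names are hypothetical.

```python
import subprocess
import sys


def pip_install_user_cmd(package: str) -> list:
    """Build a pip command that installs into the per-user site-packages."""
    # sys.executable is the interpreter running this kernel, so pip targets it.
    return [sys.executable, "-m", "pip", "install", "--user", package]


def pip_install_user(package: str) -> None:
    """Run the install; raises CalledProcessError if pip fails."""
    subprocess.check_call(pip_install_user_cmd(package))


if __name__ == "__main__":
    print(pip_install_user_cmd("nltk"))
```

If the user site-packages directory did not exist when the kernel started, Python will not have put it on sys.path, so a kernel restart may be needed before import nltk works.
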
0 votes, 0 answers

Handling a Dataset with a High Percentage (25%) of Missing Values

I am working on a project that involves a dataset with a significant amount of missing data: approximately 25% of the entries are missing. The dataset is large and diverse, encompassing multiple features relevant to my analysis. I am keen to…
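
One common baseline for heavy missingness, offered as a sketch rather than a recommendation for this particular dataset: measure the missing fraction per column, fill numeric columns with the median and other columns with the most frequent value, and optionally keep a boolean indicator of which cells were filled so a model can still use the missingness pattern. Whether this is appropriate depends on why the values are missing, which the excerpt does not say; the function names are hypothetical.

```python
import pandas as pd


def missing_report(df: pd.DataFrame) -> pd.Series:
    """Fraction of missing entries per column, largest first."""
    return df.isna().mean().sort_values(ascending=False)


def simple_impute(df: pd.DataFrame, add_indicators: bool = False) -> pd.DataFrame:
    """Median-fill numeric columns, mode-fill the rest; optionally flag filled cells."""
    out = df.copy()
    for col in df.columns:
        missing = df[col].isna()
        if not missing.any():
            continue
        if add_indicators:
            out[f"{col}_was_missing"] = missing  # remember which cells were imputed
        if pd.api.types.is_numeric_dtype(df[col]):
            out[col] = df[col].fillna(df[col].median())
        else:
            mode = df[col].mode(dropna=True)
            if not mode.empty:
                out[col] = df[col].fillna(mode.iloc[0])
    return out


if __name__ == "__main__":
    sample = pd.DataFrame({"a": [1.0, None, 3.0, None], "b": ["x", None, "x", "y"]})
    print(missing_report(sample))
    print(simple_impute(sample, add_indicators=True))
```
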
0 votes, 1 answer

I want to make my cosine similarity calculation faster

I have a NumPy array of shape (96341, 1000) and I want to compute its cosine similarity. The machine I'm working on has 8 vCPUs and 32 GB of RAM. This is my initial code. I want this function to run faster and to be able to control/limit the amount of memory used…
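
A rough size estimate explains the memory pressure: a full pairwise similarity matrix for 96,341 rows has 96,341² ≈ 9.28 × 10⁹ entries, about 37 GB as float32 and 74 GB as float64, more than the machine's 32 GB. A sketch of one standard remedy, assuming pairwise row similarity is what is wanted: normalize the rows once, then compute similarities in row blocks with a matrix product so that only chunk_size × 96,341 values are in memory at a time (about 395 MB for a 1,024-row float32 block). The function names and chunk size are illustrative.

```python
import numpy as np


def normalize_rows(x: np.ndarray) -> np.ndarray:
    """Scale each row to unit L2 norm; all-zero rows are left as zeros."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # avoid division by zero
    return x / norms


def cosine_similarity_chunks(x, chunk_size: int = 1024, dtype=np.float32):
    """Yield (start_row, block) where block = cosine similarity of rows
    start_row..start_row+chunk_size against all rows."""
    xn = normalize_rows(np.asarray(x, dtype=dtype))
    for start in range(0, xn.shape[0], chunk_size):
        stop = min(start + chunk_size, xn.shape[0])
        # Dot products of unit vectors are cosine similarities.
        yield start, xn[start:stop] @ xn.T


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    data = rng.normal(size=(5000, 1000))
    for start, block in cosine_similarity_chunks(data, chunk_size=1024):
        print(start, block.shape)  # reduce or save each block here
```

Each block can be reduced (for example to the top-k most similar rows with np.argpartition) or written to disk before the next one is computed. The matrix product typically runs on the multithreaded BLAS that NumPy is linked against, so the extra vCPUs are usually used without additional code.
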
0 votes, 1 answer

How can I merge multiple part files into a single file in Databricks?

I am trying to merge multiple part files into a single file. The code iterates over all the files in the staging folder, and they all have the same schema. We are converting the part files to .Tab files. The files are generated per salesorgcode (e.g., 7001, 600, 8002), every country having…
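
A minimal sketch of the merge step, assuming the part files are tab-separated, each has a header row, and they are reachable through the driver's local filesystem; the directory, glob pattern, and function name are hypothetical. When the data is still a Spark DataFrame, calling coalesce(1) before writing is the usual way to get a single part file in the first place.

```python
import glob
import os

import pandas as pd


def merge_part_files(staging_dir: str, pattern: str, output_path: str,
                     sep: str = "\t") -> pd.DataFrame:
    """Concatenate every file matching pattern in staging_dir into one file."""
    paths = sorted(glob.glob(os.path.join(staging_dir, pattern)))
    if not paths:
        raise FileNotFoundError(f"no files match {pattern!r} in {staging_dir!r}")
    # All parts share one schema, so they can be stacked row-wise.
    frames = [pd.read_csv(path, sep=sep) for path in paths]
    merged = pd.concat(frames, ignore_index=True)
    merged.to_csv(output_path, sep=sep, index=False)  # header written once
    return merged


if __name__ == "__main__":
    merge_part_files("/tmp/staging", "part-*.tab", "/tmp/merged.tab")
```
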
0 votes, 0 answers

Getting an error when running DeepSpeed in Dolly training: exits with return code = -9

Describe the bug: I started running a DeepSpeed config on the EleutherAI/pythia-2.8b model and ran into an error: it exits with return code = -9. After splitting and preprocessing the dataset, I am getting [ERROR] [launch.py:434:sigkill_handler]. Log output: I used…
Dinesh
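
On POSIX systems, Python's subprocess reports a child killed by signal N as return code -N, so -9 means a worker process received SIGKILL rather than raising a Python error. A frequent source of SIGKILL while loading or training a model of this size is the Linux out-of-memory killer; that is a likely cause worth ruling out (for example in the kernel log), not something the excerpt confirms. The helper below is a hypothetical utility for decoding such codes.

```python
import signal
import subprocess
import sys


def describe_returncode(returncode: int) -> str:
    """Translate a subprocess return code into a readable description (POSIX)."""
    if returncode < 0:
        try:
            name = signal.Signals(-returncode).name
        except ValueError:
            name = f"unknown signal {-returncode}"
        return f"terminated by {name}"
    return f"exited with status {returncode}"


if __name__ == "__main__":
    # Start a child that would sleep, kill it, and decode its return code.
    child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(30)"])
    child.kill()  # sends SIGKILL on POSIX
    child.wait()
    print(describe_returncode(child.returncode))
```
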
0 votes, 0 answers

ROC/AUC Curve Issue

I have written AOC_Curve:
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=.20, random_state=1)
# AUC and ROC for the training data
from sklearn.model_selection import *
from sklearn.linear_model import *
# predict probabilities
probs =…
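
A sketch of the ROC/AUC computation with scikit-learn for both the training and the test split. LogisticRegression, the synthetic data from make_classification, and the function name are assumptions, since the excerpt is cut off before the model appears; explicit imports replace the wildcard imports in the question.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split


def train_and_roc(X, Y, random_state=1):
    """Fit a classifier and return AUC plus ROC points for train and test splits."""
    X_train, X_test, Y_train, Y_test = train_test_split(
        X, Y, test_size=0.20, random_state=random_state)
    model = LogisticRegression(max_iter=1000).fit(X_train, Y_train)
    results = {}
    for name, xs, ys in (("train", X_train, Y_train), ("test", X_test, Y_test)):
        probs = model.predict_proba(xs)[:, 1]  # probability of the positive class
        fpr, tpr, _ = roc_curve(ys, probs)
        results[name] = {"auc": roc_auc_score(ys, probs), "fpr": fpr, "tpr": tpr}
    return results


if __name__ == "__main__":
    X, Y = make_classification(n_samples=500, random_state=0)
    res = train_and_roc(X, Y)
    print("train AUC:", res["train"]["auc"], "test AUC:", res["test"]["auc"])
```

To draw the curve, plot fpr against tpr for each split (for example with matplotlib).
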
0 votes, 0 answers

AttributeError: 'CountVectorizer' object has no attribute 'fit_transfrom'

      1 from sklearn.feature_extraction.text import CountVectorizer
      2 cv = CountVectorizer()
----> 3 X = cv.fit_transfrom(df['transformed_text']).toarray()
no error in this line
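
The AttributeError comes from the misspelled method name: CountVectorizer provides fit_transform, not fit_transfrom. A minimal corrected version follows, with a small hypothetical DataFrame standing in for the question's df.

```python
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical sample text standing in for df['transformed_text'].
df = pd.DataFrame({"transformed_text": ["free prize call now",
                                        "call me later",
                                        "free free prize"]})

cv = CountVectorizer()
# fit_transform learns the vocabulary and returns a sparse document-term matrix.
X = cv.fit_transform(df["transformed_text"]).toarray()

if __name__ == "__main__":
    print(sorted(cv.vocabulary_, key=cv.vocabulary_.get))
    print(X)
```
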
0 votes, 2 answers

How do I take the sum (in the denominator) when calculating a group-by weighted average in a dataframe?

I have a data frame that looks like this.
import pandas as pd
import numpy as np
data = [ ['A',1,2,3,4], ['A',5,6,7,8], ['A',9,10,11,12], ['B',13,14,15,16], ['B',17,18,19,20], ['B',21,22,23,24], ['B',25,26,27,28], …
Bad Coder
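
A sketch of the grouped weighted average, sum(value × weight) / sum(weight) per group, using only the rows visible in the excerpt. The column names (group, value, weight, c3, c4) and the function name are hypothetical, because the question's own names are cut off.

```python
import pandas as pd

# Rows visible in the question; column names are hypothetical.
data = [['A', 1, 2, 3, 4], ['A', 5, 6, 7, 8], ['A', 9, 10, 11, 12],
        ['B', 13, 14, 15, 16], ['B', 17, 18, 19, 20],
        ['B', 21, 22, 23, 24], ['B', 25, 26, 27, 28]]
df = pd.DataFrame(data, columns=["group", "value", "weight", "c3", "c4"])


def weighted_average(df, group_col, value_col, weight_col):
    """Per-group sum(value * weight) divided by per-group sum(weight)."""
    numerator = (df[value_col] * df[weight_col]).groupby(df[group_col]).sum()
    denominator = df.groupby(group_col)[weight_col].sum()  # the per-group sum
    return numerator / denominator


if __name__ == "__main__":
    print(weighted_average(df, "group", "value", "weight"))
```

If the group's denominator is needed on every row instead, df.groupby("group")["weight"].transform("sum") returns it aligned to the original index.
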
0 votes, 4 answers

How to calculate percentage change with zero in pandas?

I want to calculate the percentage change for the following data frame.
import pandas as pd
df = pd.DataFrame({'team': ['A', 'A', 'A', 'B', 'B', 'B', 'C', 'C'], 'points': [12, 0, 19, 22, 0, 25, 0, 30], 'score':…
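
A sketch using the team and points values shown (the truncated score column is left out): compute the change within each team with groupby(...).pct_change(), which yields inf whenever the previous value is 0, then replace those infinities with NaN. Treating a change from zero as undefined is one choice among several; the helper name is hypothetical.

```python
import numpy as np
import pandas as pd

# 'team' and 'points' from the question; the truncated 'score' column is omitted.
df = pd.DataFrame({'team': ['A', 'A', 'A', 'B', 'B', 'B', 'C', 'C'],
                   'points': [12, 0, 19, 22, 0, 25, 0, 30]})


def pct_change_by_group(df, group_col, value_col):
    """Change relative to the previous row within each group (fraction, not %)."""
    pct = df.groupby(group_col)[value_col].pct_change()
    # Dividing by a previous value of 0 gives +/-inf; mark those as undefined.
    return pct.replace([np.inf, -np.inf], np.nan)


if __name__ == "__main__":
    df["points_pct"] = pct_change_by_group(df, "team", "points") * 100
    print(df)
```
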
0 votes, 1 answer

How can we assign new variables after each for-loop iteration in Python?

It might be easy, but I am not able to get the hang of it. I want to assign the result to a new variable on every iteration of the for loop. I don't wish to do it by initializing a list or dict and then adding the result, because that…
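
The usual Python answer, even though the asker would prefer to avoid it: store one result per iteration in a dictionary keyed by the name the variable would have had. Creating real module-level names is possible by writing to globals(), but it is generally discouraged because any code that reads those variables has to build their names as strings anyway. The function name, key format, and squaring step are hypothetical placeholders.

```python
def results_per_iteration(values):
    """Return one named result per loop iteration, e.g. {'result_0': ..., 'result_1': ...}."""
    results = {}
    for i, value in enumerate(values):
        results[f"result_{i}"] = value ** 2  # placeholder computation
    return results


if __name__ == "__main__":
    results = results_per_iteration([3, 4, 5])
    print(results["result_1"])
```
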
0 votes, 1 answer

How do I check the values of a column across different rows within the same group and return a specific value?

I have the following code that generates the two columns.
import pandas as pd
data = {'Group': ['1', '1', '1', '1', '1', '1', '2', '2', '2', '2', '2', '2', '3', '3', '3', '3', '3', '3', '4',…
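
The excerpt is cut off before it says which value should be returned, so this is only a sketch of the general pattern: compare rows within each group with groupby(...).transform, which returns a result aligned to every original row. Here the hypothetical rule labels a group "mixed" when its rows hold more than one distinct value and "same" otherwise; the Value column, the labels, and the function name are assumptions.

```python
import numpy as np
import pandas as pd


def label_groups_by_variety(df, group_col, value_col, same="same", mixed="mixed"):
    """For every row, report whether its group's rows share one value or several."""
    # transform keeps the original row index, so the result lines up with df.
    n_unique = df.groupby(group_col)[value_col].transform("nunique")
    return pd.Series(np.where(n_unique > 1, mixed, same), index=df.index,
                     name="group_status")


if __name__ == "__main__":
    df = pd.DataFrame({"Group": ["1", "1", "2", "2"], "Value": ["x", "x", "x", "y"]})
    df["status"] = label_groups_by_variety(df, "Group", "Value")
    print(df)
```
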
0 votes, 0 answers

How to match labels for train and test in machine learning using Python?

The title may not be clear, but I will try to explain my problem as clearly as possible. I have dummy data, i.e.
data = {'month': ['2022-01-01', '2022-02-01', '2022-03-01', '2022-01-01', '2022-02-01', '2022-03-01', '2022-01-01', '2022-02-01',…
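
The excerpt does not say what goes wrong, so this sketch addresses one common cause, offered as an assumption: one-hot encoding a categorical column (such as month) separately on the training and test sets produces different dummy columns. Encoding both and then reindexing the test columns to the training columns keeps them aligned. The helper name and the sample values are hypothetical, apart from using month strings in the question's format.

```python
import pandas as pd


def encode_aligned(train, test, columns):
    """One-hot encode `columns` in both splits using the training split's categories."""
    train_enc = pd.get_dummies(train, columns=columns, dtype=int)
    test_enc = pd.get_dummies(test, columns=columns, dtype=int)
    # Add training categories missing from test as all-zero columns,
    # and drop test categories never seen in training.
    test_enc = test_enc.reindex(columns=train_enc.columns, fill_value=0)
    return train_enc, test_enc


if __name__ == "__main__":
    train = pd.DataFrame({"month": ["2022-01-01", "2022-02-01"], "x": [1, 2]})
    test = pd.DataFrame({"month": ["2022-02-01", "2022-03-01"], "x": [3, 4]})
    tr, te = encode_aligned(train, test, ["month"])
    print(te)
```
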
0 votes, 2 answers

How can we group by a single column without aggregation in pandas?

I have a couple of questions about the groupby function.
1. I would like to group a pandas data frame by a single column without aggregation.
2. After grouping, I would like to split the dataset into several datasets by month.
So, I wasn't…
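
A sketch covering both parts: iterating over df.groupby(...) yields (key, sub-DataFrame) pairs with no aggregation, and grouping by the month of a date column splits the data into one DataFrame per month. The column names and the helper name are hypothetical.

```python
import pandas as pd


def split_by_month(df, date_col):
    """Return {'YYYY-MM': sub-DataFrame} without aggregating anything."""
    months = pd.to_datetime(df[date_col]).dt.to_period("M")
    # Iterating a GroupBy yields (key, sub-DataFrame) pairs; rows are kept as-is.
    return {str(month): part for month, part in df.groupby(months)}


if __name__ == "__main__":
    df = pd.DataFrame({"date": ["2022-01-05", "2022-01-20", "2022-02-03"],
                       "v": [1, 2, 3]})
    for month, part in split_by_month(df, "date").items():
        print(month)
        print(part)
```
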
0 votes, 0 answers

Does SVR handle outliers and seasonality?

I have time-series data and am trying to build a model using Support Vector Regression (SVR). If I use SVR to build the model, should I be worried about seasonality, trend, and outliers? If I should care about these things, how can I deal with…