Highest Voted 'data-science-experience' Questions

1

vote

1 answer

Is it possible for a Spark job on Bluemix to see a list of the other processes on the operating system?

A common approach for connecting to third-party systems from Spark is to provide the credentials for those systems as arguments to the Spark script. However, this raises some questions about security. For example, see this question: Bluemix spark-submit --…

asked May 25 '16 at 21:46

Chris Snow

23,813
35
144
309

1

vote

3 answers

!pip install nltk -> permission denied

I'm trying to install nltk with the following notebook command: !pip install nltk However, that throws the following error: error: could not create '/usr/local/src/bluemix_ipythonspark_141/notebook/lib/python2.7/site-packages/nltk': Permission…

apache-spark ibm-cloud nltk jupyter data-science-experience

asked Dec 01 '15 at 22:22

Chris Snow

23,813
35
144
309

0

votes

0 answers

Handling a Dataset with a High Percentage (25%) of Missing Values

I am working on a project that involves a dataset with a significant amount of missing data—approximately 25% of the dataset entries are missing. The dataset is large and diverse, encompassing multiple features relevant to my analysis. I am keen to…

python-2.7 data-science data-analysis data-wrangling data-science-experience

asked Aug 26 '23 at 15:54

Ahmed Adel

49
6

0

votes

1 answer

I want to improve the efficiency of cosine similarity calculation to make it faster

I have a numpy array of size (96341, 1000). I want to find the cosine similarity of this array. The machine I'm working on has 8 vCPUs and 32 GB of memory. This is my initial code. I want this function to run faster and to control/limit the amount of memory used…

numpy matrix cosine-similarity data-science-experience

asked Jun 22 '23 at 04:50

Nared Fuengverojsakul

19
2

0

votes

1 answer

How can I merge multiple part files into a single file in Databricks?

I am trying to merge multiple part files into a single file. In the staging folder, it iterates over all the files; the schema is the same. We are converting the part files to .Tab files. Files are generated based on salesorgcode (e.g. 7001, 600, 8002); every country has…

pandas azure azure-databricks data-science-experience pyspark-pandas

asked Jun 14 '23 at 08:46

KIRAN KUMAR

7
2

0

votes

0 answers

Error when running DeepSpeed in Dolly training: exits with return code = -9

Describe the bug: I started running a DeepSpeed config on the EleutherAI/pythia-2.8b model and ran into an error: exits with return code = -9. After splitting and preprocessing the dataset, I am getting [ERROR] [launch.py:434:sigkill_handler]. Log output: I used…

python-3.x databricks data-science-experience

asked May 06 '23 at 14:39

Dinesh

9
2

0

votes

0 answers

AOC CURVE Issue

I have written AOC_Curve: X_train,X_test,Y_train,Y_test=train_test_split(X,Y,test_size=.20,random_state=1) # AUC and ROC for the training data from sklearn.model_selection import * from sklearn.linear_model import * # predict probabilities probs =…

data-science-experience

asked Mar 01 '23 at 14:16

Rohit Kulkarni

21
1
5

0

votes

0 answers

AttributeError: 'CountVectorizer' object has no attribute 'fit_transfrom'

1 from sklearn.feature_extraction.text import CountVectorizer 2 cv = CountVectorizer() ----> 3 X = cv.fit_transfrom(df['transformed_text']).toarray() no error in this line

python-3.x google-colaboratory sklearn-pandas countvectorizer data-science-experience

asked Feb 06 '23 at 15:04

67_Thorat Manish

1
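
For the `AttributeError` question above, the traceback itself shows the method spelled `fit_transfrom`, whereas scikit-learn's `CountVectorizer` provides `fit_transform`. A minimal, standard-library-only sketch of how one might spot such a misspelling follows. `FakeVectorizer` and `suggest_attribute` are hypothetical names used for illustration and are not part of scikit-learn.

```python
import difflib


class FakeVectorizer:
    """Hypothetical stand-in exposing the same method names as a vectorizer."""

    def fit(self, docs):
        return self

    def transform(self, docs):
        # Toy behaviour: number of whitespace-separated tokens per document.
        return [len(d.split()) for d in docs]

    def fit_transform(self, docs):
        return self.fit(docs).transform(docs)


def suggest_attribute(obj, name, n=3):
    """Return public attribute names of obj that resemble name, best match first."""
    public = [a for a in dir(obj) if not a.startswith("_")]
    return difflib.get_close_matches(name, public, n=n)


cv = FakeVectorizer()
# The misspelled name from the question; the closest real attribute is listed first.
print(suggest_attribute(cv, "fit_transfrom"))
```

The same check should work on a real `CountVectorizer` instance, since it only inspects `dir(obj)`.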

0

votes

2 answers

How to take a sum (in denominator) for calculating group by weighted average in a dataframe?

I have a data frame that looks like this. import pandas as pd import numpy as np data = [ ['A',1,2,3,4], ['A',5,6,7,8], ['A',9,10,11,12], ['B',13,14,15,16], ['B',17,18,19,20], ['B',21,22,23,24], ['B',25,26,27,28], …

python pandas dataframe data-science-experience

asked Dec 04 '22 at 02:03

Bad Coder

177
11

0

votes

4 answers

How to calculate percentage change with zero in pandas?

I want to calculate the percentage change for the following data frame. import pandas as pd df = pd.DataFrame({'team': ['A', 'A', 'A', 'B', 'B', 'B', 'C', 'C'], 'points': [12, 0, 19, 22, 0, 25, 0, 30], 'score':…

python pandas dataframe group-by data-science-experience

asked Nov 18 '22 at 19:23

Bad Coder

177
11
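
One way to think about the zero case raised in the percentage-change question above: when the previous value is 0, the usual formula (current - previous) / previous * 100 is undefined. The sketch below uses plain Python rather than pandas. Because the excerpt is truncated and the asker's desired behaviour is not visible, it assumes that undefined changes should be reported as `None`. `safe_pct_change` is a hypothetical helper name.

```python
def safe_pct_change(values):
    """Percentage change between consecutive values; None where undefined."""
    if not values:
        return []
    changes = [None]  # the first value has no predecessor
    for prev, curr in zip(values, values[1:]):
        if prev == 0:
            changes.append(None)  # division by zero: change is undefined
        else:
            changes.append((curr - prev) / prev * 100)
    return changes


# 'points' values for team A from the excerpt's data frame
print(safe_pct_change([12, 0, 19]))
```

Whether `None`, `NaN`, or some sentinel value is the right choice depends on how the result is consumed downstream; the excerpt does not say.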

0

votes

1 answer

How can we assign new variables after each for loop iteration in python?

Although it might be easy, I am not able to get the hang of it. I want to assign the result to a new variable every time the for loop iterates. I don't wish to do this by initializing a list or dict and then adding the result, because that…

python data-science-experience

asked Nov 18 '22 at 14:26

Ravi_DataScientist

3
2

0

votes

1 answer

How to check different row values of a column within the same group and return a specific value?

I have the following code that generates the two columns. import pandas as pd data = {'Group': ['1', '1', '1', '1', '1', '1', '2', '2', '2', '2', '2', '2', '3', '3', '3', '3', '3', '3', '4',…

python pandas dataframe group-by data-science-experience

asked Nov 07 '22 at 20:19

Bad Coder

177
11

0

votes

0 answers

How to match labels for train and test in machine learning using python?

The title may not be clear but I will try to explain my problem as clearly as possible. I have dummy data i.e. data = {'month': ['2022-01-01', '2022-02-01', '2022-03-01', '2022-01-01', '2022-02-01', '2022-03-01', '2022-01-01', '2022-02-01',…

python pandas dataframe scikit-learn data-science-experience

asked Sep 21 '22 at 17:56

Bad Coder

177
11

0

votes

2 answers

How can we perform group by single column without aggregation in pandas?

I have a couple of questions about the group by function. 1. I would like to group a pandas data frame by a single column without aggregation. 2. After the group by, I would like to split the dataset into several datasets by the month date. So, I wasn't…

python pandas dataframe group-by data-science-experience

asked Sep 20 '22 at 16:09

Bad Coder

177
11

0

votes

0 answers

Does SVR handle outliers and seasonality?

I have time-series data and am trying to build a model using Support Vector Regression (SVR). If I use SVR to build the model, should I be worried about seasonality, trend, and outliers? If I should care about these things, then how can I deal with…

time-series regression data-science svm data-science-experience

asked Sep 12 '22 at 22:16

Bad Coder

177
11
