Questions tagged [kaggle]

Relating to Competitions, Datasets, Kernels, Learn, or Kaggle's API.

1115 questions
0
votes
0 answers

Error for SVM tuning in R for Kaggle Titanic dataset

I'm trying to complete tuning for an SVM model in R, using the Titanic Kaggle dataset. When I run the following code: tune.out = tune(svm, Survived ~ Pclass + Sex + Age + Fare + Embarked + family, data = boat, kernel = "linear", …
zthomas.nc
  • 3,689
  • 8
  • 35
  • 49
0
votes
1 answer

kaggle titanic Subset Women and Children

I am trying to make a feature variable from the Titanic dataset on Kaggle by pulling specific information from two variables, but I can't figure out how to code it. I want to combine the "Sex" variable and the "Parch" variable. What I want is if the…
Darrin Thomas
  • 159
  • 1
  • 10
0
votes
1 answer

Indexing using iloc

I'm going through a Kaggle tutorial right now. While I get the basic idea of what it does from looking at the output and reading the documentation, I think I need confirmation of what is going on here: predictors = ["Pclass", "Sex", "Age", "SibSp",…
PurpleCoffee
  • 35
  • 1
  • 2
  • 10
0
votes
0 answers

Smarter than an Eighth grader? Kaggle AI Challenge. R

I am working on the Allen AI Science Challenge currently up on Kaggle. The idea behind the challenge is to train a model using the training data provided (a set of eighth-grade-level science questions along with four answer options, one of which…
0
votes
2 answers

error: cannot convert argument to integer in Python

I am working on a dataset from Kaggle and I want to extract the titles of a Pandas column with names. I use the following code: def extract_patt(patt, linea): matchObj = re.match(patt, linea) result = "" if matchObj: …
Tasos
  • 7,325
  • 18
  • 83
  • 176
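
A minimal sketch of the kind of title extraction this question describes, using only the standard `re` module. The pattern, the helper name `extract_title`, and the sample strings are assumptions for illustration (names written as "Last, Title. First"); they are not the asker's code, and the sketch does not diagnose the reported error.

```python
import re

# Hypothetical pattern: the title is the text between the first ", " and the next "."
TITLE_PATTERN = re.compile(r",\s*([^.,]+)\.")


def extract_title(name):
    """Return the title found in a name string, or "" if the pattern does not match."""
    match = TITLE_PATTERN.search(name)
    return match.group(1).strip() if match else ""


# Illustrative sample strings in the assumed "Last, Title. First" layout
names = [
    "Braund, Mr. Owen Harris",
    "Cumings, Mrs. John Bradley (Florence Briggs Thayer)",
    "No title here",
]
titles = [extract_title(n) for n in names]
print(titles)
```

With pandas, the same helper could be applied to a column of names via `Series.apply`, for example `df["Name"].apply(extract_title)`, where the column name `Name` is an assumption.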
0
votes
1 answer

Infrastructure for running Spark

I'm participating in a Kaggle competition with 4 other people. We all met in a MOOC by edx.org. Although we can code using the Apache Spark engine, we don't know how to set up a cluster and install the necessary software to run Spark on it. Ideally,…
Paca
  • 69
  • 7
0
votes
0 answers

Python code runs in loop with PTVS in Visual Studio 2013

I have a simple Python script (digit recognition exercise from Kaggle), which runs fine if I execute it from the command line (I use Windows 8.1 64-bit with Enthought Canopy 1.4.1). import numpy from sklearn.ensemble import RandomForestClassifier from…
darXider
  • 447
  • 5
  • 16
0
votes
1 answer

Read multiple files from a directory using Spark

I am trying to solve this problem on Kaggle using Spark. The input hierarchy looks like this: drivers/{driver_id}/trip#.csv, e.g., drivers/1/1.csv, drivers/1/2.csv, drivers/2/1.csv. I want to read the parent directory "drivers" and for…
vishnu viswanath
  • 3,794
  • 2
  • 36
  • 47
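
The directory layout in this question (drivers/{driver_id}/trip#.csv) can be illustrated without Spark. The sketch below is plain Python using `pathlib`, not a Spark solution, and it assumes the goal is to group trip files by driver, which the truncated excerpt does not confirm. The helper name `trips_by_driver` is hypothetical.

```python
from collections import defaultdict
from pathlib import Path


def trips_by_driver(root):
    """Map each driver_id directory name to a sorted list of its trip CSV paths."""
    grouped = defaultdict(list)
    # "*/*.csv" matches exactly one directory level below root: {driver_id}/{trip}.csv
    for csv_path in Path(root).glob("*/*.csv"):
        grouped[csv_path.parent.name].append(csv_path)
    return {driver: sorted(paths) for driver, paths in grouped.items()}
```

In PySpark, a similar glob such as `drivers/*/*.csv` can typically be passed as the path to `SparkContext.wholeTextFiles`, which yields (file path, file content) pairs, so the driver id could be recovered from each path in the same way.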
0
votes
0 answers

Kaggle Titanic: Machine Learning From Disaster Decision Tree for Cabin Prediction

One of the variables, 'Cabin', has a hefty amount of NAs. I am trying to use a decision tree (rpart) to predict the Cabin deck of passengers whose Cabin is not available. Currently, this is the structure of my data table, which is a rbind of the…
0
votes
1 answer

RuntimeError on windows trying python multiprocessing

Here is the error output I got while trying to run a Python script: Preprocess validation data upfront Using gpu device 0: Tesla K20c Traceback (most recent call last): File "", line 1, in File…
Thalish Sajeed
  • 1,351
  • 11
  • 25
0
votes
1 answer

RStudio Shiny Error - number of items to replace is not a multiple of replacement length

I am fairly new to R and currently working on a Shiny web app using RStudio to recognise handwritten digits. The data I am using is from a Kaggle competition: Digit-Recogniser. I have the following function to render average representations of…
umutesen
  • 2,523
  • 1
  • 27
  • 41
0
votes
0 answers

TclError with kaggle tutorial in ubuntu 14.04 : ipython, pylab and pandas

import pandas as pd df = pd.read_csv('./csv/train.csv', header=0) import pylab as P df['Age'].hist() P.show() TclError Traceback (most recent call last) in () 1 import…
0
votes
0 answers

Merging data in R; repeats data in columns?

I have two datasets from these links: cmu: http://lib.stat.cmu.edu/S/Harrell/data/descriptions/titanic.html kaggle: https://www.kaggle.com/c/titanic-gettingStarted/data When I try to merge them, my columns to the right repeat. Is there any way I can fix this?…
Redspart
  • 3
  • 1
  • 3
0
votes
1 answer

How to filter data of integer64 class in data.table in R

I have a 20GB transaction data set from Kaggle (http://www.kaggle.com/c/acquire-valued-shoppers-challenge/data). It has over 300 million rows and 11 variables, which is too heavy to handle in R, so I want to filter the data. id class is…
Rokmc1050
  • 463
  • 1
  • 6
  • 16
0
votes
1 answer

Trying to use separate to split one column into more than 2 columns

I'm new to R and practicing using the Titanic data set from Kaggle. I am attempting to separate last name, first name, salutation, and extra information into separate columns so that I can try to categorize the age of the passengers - adult or…
SandraK
  • 1
  • 1
  • 3