Questions tagged [categorization]
228 questions
33
votes
2 answers
Categorize numeric variable with mutate
I would like to categorize a numeric variable in my data.frame object using dplyr (and have no idea how to do it).
Without dplyr, I would probably do something like:
df <- data.frame(a = rnorm(1e3), b = rnorm(1e3))
df$a <- cut(df$a ,…

Marta Karas
- 4,967
- 10
- 47
- 77
32
votes
4 answers
Categorize numeric variable into groups/bins/breaks
I am trying to categorize a numeric variable (age) into groups defined by intervals so it will not be continuous. I have this code:
data$agegrp(data$age >= 40 & data$age <= 49) <- 3
data$agegrp(data$age >= 30 & data$age <= 39) <-…

leian
- 443
- 2
- 5
- 5
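A minimal sketch of the interval binning asked about in the age-group question above, written in Python rather than R. The excerpt only shows the 30-39 and 40-49 bands (with 40-49 mapped to 3); the other boundaries and the name `age_group` are assumptions made for illustration.

```python
from bisect import bisect_right

# Lower bound of each age band. Only the 30 and 40 boundaries come from the
# question excerpt; 20 and 50 are assumed so the example is complete.
LOWER_BOUNDS = [20, 30, 40, 50]

def age_group(age):
    """Return the band index: 0 below 20, 1 for 20-29, 2 for 30-39,
    3 for 40-49, and 4 for 50 and above."""
    return bisect_right(LOWER_BOUNDS, age)

ages = [25, 30, 39, 40, 49, 50]
groups = [age_group(a) for a in ages]
```

A sorted list of boundaries keeps the bands in one place, so adding or moving a band does not require another comparison per interval.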
20
votes
4 answers
How does music fingerprinting work (for sites such as Shazam and Lala.com)?
My large (120 GB) music collection contains many duplicate songs, and I've been trying to fingerprint tracks in the hopes of detecting duplicates. Since I'm a CS major, I'm also very curious about what is done out there. Nothing I do has nearly the…

Niels Joubert
- 342
- 3
- 8
18
votes
1 answer
plotting in different shapes using pch= argument
When plotting in R, how can I assign a particular shape to data points belonging to one category (using the pch argument to plot()) based on a column in my data frame that holds the categorical data? Will using as.factor() to group data and then…

Anurag Mishra
- 1,007
- 6
- 16
- 23
8
votes
4 answers
Domain name classification API
I need to categorize domains into different categories that offer the best use of a domain name.
Like categorizing 'gamez.com' as a gaming portal.
Is there any service that offers domain name classification the way Sedo does?

Paige Cherry
- 175
- 2
- 7
8
votes
3 answers
Extracting motion data from a list of coordinates
I have a series of CSV files of timestamped coordinates (X, Y, and Z in mm). What would be the simplest way to extract motion data from them?
Measurables
The information I'd like to extract includes the following:
Number of direction…

Tom Wright
- 11,278
- 15
- 74
- 148
7
votes
2 answers
How to change multiple Pandas DF columns to categorical without a loop
I have a DataFrame where I want to change several columns from type 'object' to 'category'.
I can change several columns at the same time for float,
dftest[['col3', 'col4', 'col5', 'col6']] = \
dftest[['col3', 'col4', 'col5',…

Pablo Marin-Garcia
- 4,151
- 2
- 32
- 50
7
votes
3 answers
Is integration testing an umbrella term and if so, what types of tests does it include?
I find the concept of 'integration testing' confusing. There seem to be quite a few explanations and scopes:
Functional/acceptance testing (e.g. testing the user interface with Selenium)
Testing the integration of different…

Tuukka Mustonen
- 4,722
- 9
- 49
- 79
7
votes
4 answers
text categorization classifiers
Does anybody know of good open-source text-categorization models? I know about Stanford Classifier, Weka, Mallet, etc. but all of them require training.
I need to classify news articles into Sports/Politics/Health/Gaming/etc. Is there any…

MFARID
- 700
- 1
- 12
- 18
6
votes
1 answer
Rails 4 collection_check_boxes, with a has_many through
I'm trying to associate categories to products.
The way I've implemented it so far is
Class Product
has_many :categorizations
has_many :categories, through: :categorizations
.
Class Categorization
belongs_to :product
belongs_to…

JacobJuul
- 152
- 1
- 12
5
votes
5 answers
Designing a SQL table with hierarchy/sub-categories
I have a table that looks something like this:
ID | Keyword | Category | Sub-Category | Sub-Sub-Category | Sub-Sub-Sub-Category
Do I need to split it up into two tables (a keyword table and a categories table with parent id)
if one…

chips
- 2,296
- 2
- 16
- 17
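For the hierarchy question above, here is a rough sketch of the two-table layout the asker mentions (a keyword table plus a categories table with a parent id), using Python's built-in sqlite3 module. The table names, column names, sample rows, and the `category_path` helper are all hypothetical; this is one possible shape, not necessarily what the answers recommend.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE categories (
    id        INTEGER PRIMARY KEY,
    name      TEXT NOT NULL,
    parent_id INTEGER REFERENCES categories(id)  -- NULL for a top-level category
);
CREATE TABLE keywords (
    id          INTEGER PRIMARY KEY,
    keyword     TEXT NOT NULL,
    category_id INTEGER NOT NULL REFERENCES categories(id)
);
-- Made-up sample data: a three-level chain and one keyword at the deepest level.
INSERT INTO categories VALUES (1, 'Sports', NULL), (2, 'Ball games', 1), (3, 'Football', 2);
INSERT INTO keywords VALUES (1, 'penalty kick', 3);
""")

def category_path(conn, keyword):
    """Return the category names for a keyword, from the root down to its own category."""
    rows = conn.execute("""
        WITH RECURSIVE chain(id, name, parent_id, depth) AS (
            SELECT c.id, c.name, c.parent_id, 0
            FROM keywords k JOIN categories c ON c.id = k.category_id
            WHERE k.keyword = ?
            UNION ALL
            SELECT p.id, p.name, p.parent_id, chain.depth + 1
            FROM categories p JOIN chain ON p.id = chain.parent_id
        )
        SELECT name FROM chain ORDER BY depth DESC
    """, (keyword,)).fetchall()
    return [name for (name,) in rows]

path = category_path(conn, "penalty kick")
```

With a parent id column the nesting depth is no longer fixed by the number of Sub-…-Category columns; the recursive query walks up however many levels exist.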
5
votes
1 answer
Reverse query matching solr
I have a list of user queries to Solr from a website (hundreds of thousands of them). My requirement is to return all the queries, in the given list, that are true for a document. I know I could index that one document and loop through the list of…

everreadyeddy
- 738
- 1
- 8
- 18
5
votes
2 answers
Calculating IDF (Inverse Document Frequency) for document categorization
I have a question about calculating IDF (Inverse Document Frequency) in document categorization. I have more than one category with multiple documents for training. I am calculating IDF for each term in a document using the following formula:
IDF(t,D)=log(Total…

vignesh kumar rathakumar
- 628
- 8
- 19
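The formula in the IDF question above is cut off, so the sketch below assumes the common textbook definition, log(N / df), where N is the number of documents and df is the number of documents containing the term. The function name and the tiny corpus are made up for illustration, and the asker's actual formula may differ.

```python
import math

def inverse_document_frequency(documents):
    """Map each term to log(N / df) over a list of tokenized documents.

    N is the total number of documents; df is the number of documents
    in which the term appears at least once.
    """
    n_docs = len(documents)
    doc_freq = {}
    for doc in documents:
        for term in set(doc):  # count each term once per document
            doc_freq[term] = doc_freq.get(term, 0) + 1
    return {term: math.log(n_docs / df) for term, df in doc_freq.items()}

# Hypothetical training documents, already split into tokens.
corpus = [
    ["goal", "match", "team"],
    ["vote", "party", "team"],
    ["match", "score"],
]
idf = inverse_document_frequency(corpus)
```

Here IDF is computed over the whole training set regardless of category; whether to compute it per category instead is exactly the kind of choice the question is about, and this sketch does not settle it.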
4
votes
3 answers
Algorithms used for programmatic classification of recipes
I'm interested in classifying recipes programmatically based on a statistical analysis of various properties of the recipe. In other words, I want to classify a recipe as Breakfast, Lunch, Dinner or Dessert without any user input.
The properties I…

Mike Christensen
- 88,082
- 50
- 208
- 326
4
votes
1 answer
classification using lingpipe
As part of my academic research project, I am trying to build an application wherein I will have a set of URLs retrieved from the web. The task is to classify each of these URLs into some category.
For instance, the following URL is regarding cricket…

funnyguy
- 513
- 3
- 6
- 15