Questions tagged [data-mining]

Data mining is the process of analyzing large amounts of data in order to find patterns and commonalities.

Data mining, also known as knowledge discovery, is the process of digging through and analyzing enormous sets of data and then extracting the meaning of the data. Data mining tools such as SQL Server Analysis Services predict behaviors and future trends, allowing businesses to make proactive, knowledge-driven decisions. They can answer business questions that traditionally were too time-consuming to resolve: they scour databases for hidden patterns, finding predictive information that experts may miss because it lies outside their expectations. The inputs to data mining algorithms are variously called cases, samples, examples, instances, events, or observations.

machine-learning, artificial-intelligence and statistics provide many techniques used in data mining, in combination with database technologies for efficiency. Please use the appropriate tag (e.g. machine-learning) to refer to the raw methods.
Cluster analysis (dataclustering) and outlier detection (outliers) are two of the main challenges in data mining.
Wiki Links
Data Mining Introduction

3094 questions

3 answers

An understandable clusterization

I have a dataset. Each element of this set consists of numerical and categorical variables. Categorical variables are nominal and ordinal. There is some natural structure in this dataset. Commonly, experts clusterize datasets such as mine using…

algorithm machine-learning computer-science data-mining cluster-analysis

asked Aug 28 '12 at 08:01

Artem Pianykh

3 answers

How to test if a kernel is a valid kernel

If I define my own method of determining the similarity between two input entities of my Support Vector Machine classifier, and thus define it as my kernel, how do I verify if it is indeed a valid kernel that I can use? For example, if my inputs are…
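
One way to probe this, sketched under assumptions: by Mercer's condition a valid kernel must yield a symmetric, positive semidefinite Gram matrix for every finite sample, so building the Gram matrix on a sample and testing it can disprove validity (passing on one sample does not prove it). The function names, the tolerance, and the two example kernels below are hypothetical and are not taken from the question.

```python
import math

def gram_matrix(kernel, samples):
    """Evaluate the kernel on every pair of samples."""
    return [[kernel(x, y) for y in samples] for x in samples]

def is_symmetric(K, tol=1e-9):
    n = len(K)
    return all(abs(K[i][j] - K[j][i]) <= tol
               for i in range(n) for j in range(i))

def is_positive_semidefinite(K, tol=1e-9):
    """Symmetric elimination: every pivot must be >= 0, and a zero
    pivot must come with an all-zero remaining column."""
    n = len(K)
    A = [row[:] for row in K]  # work on a copy
    for k in range(n):
        pivot = A[k][k]
        if pivot < -tol:
            return False
        if pivot <= tol:
            if any(abs(A[i][k]) > tol for i in range(k + 1, n)):
                return False
            continue
        for i in range(k + 1, n):
            factor = A[i][k] / pivot
            for j in range(k + 1, n):
                A[i][j] -= factor * A[k][j]  # Schur complement update
    return True

def passes_gram_check(kernel, samples):
    K = gram_matrix(kernel, samples)
    return is_symmetric(K) and is_positive_semidefinite(K)

samples = [0.0, 1.0, 2.0]                     # hypothetical 1-D inputs
rbf = lambda x, y: math.exp(-(x - y) ** 2)    # known valid kernel
abs_diff = lambda x, y: abs(x - y)            # a distance, not a kernel

print(passes_gram_check(rbf, samples))
print(passes_gram_check(abs_diff, samples))
```

A failed check on any sample is enough to reject the function as a kernel; a passed check only means that no counterexample was found on that sample, so it is worth repeating on several samples.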

machine-learning data-mining svm

asked Aug 02 '12 at 17:08

London guy

6 answers

How to find the minimum support in Apriori algorithm

When the percentage values of support and confidence are given, how can I find the minimum support in the Apriori algorithm? For example, when support and confidence are given as 60% and 60% respectively, what is the minimum support?
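
Under one common reading of the question, a minimum support given as a percentage is turned into a minimum transaction count by multiplying by the number of transactions and rounding up. The sketch below assumes that reading; the transactions and the function names are hypothetical illustration data, not part of the question.

```python
def min_support_count(min_support_pct, n_transactions):
    """Smallest number of transactions an itemset must appear in.
    Integer arithmetic (ceiling division) avoids float rounding."""
    return (min_support_pct * n_transactions + 99) // 100

def support_count(itemset, transactions):
    """Number of transactions that contain every item of the itemset."""
    return sum(1 for t in transactions if itemset <= t)

def confidence(antecedent, consequent, transactions):
    """support(antecedent and consequent) / support(antecedent)."""
    both = support_count(antecedent | consequent, transactions)
    return both / support_count(antecedent, transactions)

# Hypothetical market-basket data: 5 transactions
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk"},
]

threshold = min_support_count(60, len(transactions))
pair = {"bread", "milk"}
print(threshold, support_count(pair, transactions) >= threshold)
print(confidence({"bread"}, {"milk"}, transactions))
```

With 5 transactions, a 60% minimum support means an itemset must occur in at least 3 of them; the rule bread → milk is then kept only if its confidence also reaches the 60% confidence threshold.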

data-mining apriori

asked Apr 28 '12 at 14:53

Chanikag

3 answers

Latent Semantic Analysis concepts

I've read about using Singular Value Decomposition (SVD) to do Latent Semantic Analysis (LSA) on a corpus of texts. I understand how to do that, and I also understand the mathematical concepts of SVD. But I don't understand why it works when applied to…

algorithm nlp data-mining text-mining latent-semantic-indexing

asked Aug 14 '11 at 21:49

stemm

2 answers

Weka simple K-means clustering assignments

I have what feels like a simple problem, but I can't seem to find an answer. I'm pretty new to Weka, but I feel like I've done a bit of research on this (at least read through the first couple of pages of Google results) and come up dry. I am using…

cluster-analysis data-mining weka k-means

asked Jul 13 '11 at 21:32

machine yearning

1 answer

Naive Bayesian for Topic detection using "Bag of Words" approach

I am trying to implement a Naive Bayesian approach to find the topic of a given document or stream of words. Is there a Naive Bayesian approach that I might be able to look up for this? Also, I am trying to improve my dictionary as I go along.…

machine-learning nlp data-mining naivebayes

asked May 06 '10 at 14:18

AlgoMan

3 answers

Cosine distance as vector distance function for k-means

I have a graph of N vertices where each vertex represents a place. Also I have vectors, one per user, each one of N coefficients where the coefficient's value is the duration in seconds spent at the corresponding place or 0 if that place was not…
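
A minimal sketch of the quantity involved, assuming per-user duration vectors like the ones described: the cosine distance is 1 minus the cosine of the angle between two vectors, and for vectors scaled to unit length the squared Euclidean distance equals exactly twice the cosine distance. That identity is why normalizing the vectors first and then running ordinary k-means is often suggested as a stand-in. The sample vectors and function names are hypothetical.

```python
import math

def norm(v):
    return math.sqrt(sum(x * x for x in v))

def cosine_distance(a, b):
    """1 - cos(angle between a and b); undefined for zero vectors."""
    na, nb = norm(a), norm(b)
    if na == 0 or nb == 0:
        raise ValueError("cosine distance is undefined for a zero vector")
    dot = sum(x * y for x, y in zip(a, b))
    return 1.0 - dot / (na * nb)

def normalize(v):
    """Scale v to unit length."""
    n = norm(v)
    if n == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / n for x in v]

def squared_euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

# Hypothetical users: seconds spent at each of N = 3 places
user_a = [3600, 0, 1800]
user_b = [0, 600, 1200]

d_cos = cosine_distance(user_a, user_b)
d_euc = squared_euclidean(normalize(user_a), normalize(user_b))
print(d_cos, d_euc)  # d_euc is 2 * d_cos up to rounding
```

Users who visited no place at all have an all-zero vector and need separate handling, since neither the cosine distance nor the normalization is defined for them.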

cluster-analysis data-mining distance k-means cosine-similarity

asked Aug 07 '14 at 11:15

Thalis K.

2 answers

What method do you use for selecting the optimum number of clusters in k-means and EM?

Many algorithms for clustering are available. A popular one is k-means, where, based on a given number of clusters, the algorithm iterates to find the best clusters for the objects. What method do you use to determine the number of clusters in…
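
One widely used heuristic, offered here only as an illustration and not as the asker's method, is the "elbow" approach: run k-means for several values of k, record the within-cluster sum of squares (WCSS), and pick the k after which the WCSS stops dropping sharply. The sketch below uses a tiny 1-D k-means with a deterministic initialization; the data and all names are hypothetical.

```python
def kmeans_1d(points, k, max_iters=100):
    """Lloyd's algorithm on 1-D data; returns (centers, wcss)."""
    pts = sorted(points)
    n = len(pts)
    # Deterministic init: spread initial centers across the sorted data
    centers = [pts[(2 * i + 1) * n // (2 * k)] for i in range(k)]
    for _ in range(max_iters):
        clusters = [[] for _ in range(k)]
        for p in pts:
            nearest = min(range(k), key=lambda c: (p - centers[c]) ** 2)
            clusters[nearest].append(p)
        # Keep the old center if a cluster ends up empty
        new_centers = [sum(c) / len(c) if c else centers[i]
                       for i, c in enumerate(clusters)]
        if new_centers == centers:
            break
        centers = new_centers
    wcss = sum(min((p - c) ** 2 for c in centers) for p in pts)
    return centers, wcss

# Hypothetical data with three well-separated groups
data = [1.0, 1.1, 0.9, 5.0, 5.1, 4.9, 9.0, 9.1, 8.9]

wcss_by_k = {k: kmeans_1d(data, k)[1] for k in range(1, 5)}
for k, wcss in wcss_by_k.items():
    print(k, round(wcss, 3))
```

On this data the WCSS falls steeply up to k = 3 and barely changes afterwards, which is the "elbow". On real data the bend is often less clear, so the heuristic is usually combined with other criteria.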

r cluster-analysis data-mining expectation-maximization

asked Feb 22 '10 at 17:53

gd047

3 answers

WEKA Tutorials / Examples for a Newbie

In a follow-up to this answer I want to ask if any of you know any good (and more importantly easy to understand) tutorials and / or examples of data mining with the Weka toolkit. I've been very interested in Data Mining ever since I first heard…

machine-learning data-mining weka

asked Feb 19 '10 at 00:07

Alix Axel

6 answers

Monitor brands with common words

Let's say you should monitor the brand "ONE" online. What algorithms can be used to separate pages about the brand ONE from pages containing the common word ONE? I'm thinking maybe Bayes could work, but are there other ways to do this?

algorithm language-agnostic data-mining linguistics

asked Feb 15 '10 at 12:20

Christian Davén

4 answers

Hierarchical Clustering: Determine optimal number of clusters and statistically describe clusters

I could use some advice on methods in R to determine the optimal number of clusters and later on describe the clusters with different statistical criteria. I’m new to R with basic knowledge about the statistical foundations of cluster analysis.…

r data-mining cluster-analysis

asked Nov 06 '12 at 10:51

Joschi

2 answers

What free/paid search APIs allow for programmatic querying and caching/storage of the resulting data?

If you've done any serious research into search APIs, you know that most of them have a huge slew of TOS/TOU restrictions that make them nearly impossible to use in anything but the most inane applications. Bing's 2.0 API, Yahoo Search BOSS, Google…

api search screen-scraping data-mining

asked Aug 31 '11 at 23:15

rinogo

2 answers

Python, Scipy: Building triplets using large adjacency matrix

I am using an adjacency matrix to represent a network of friends which can be visually interpreted as

Mary   0 1 1 1
Joe    1 0 1 1
Bob    1 1 0 1
Susan  1 1 1 0
…

python numpy data-mining scipy adjacency-matrix

asked Aug 03 '11 at 19:15

will

4 answers

Outlier detection in data mining

I have a few sets of questions regarding outlier detection: Can we find outliers using k-means and is this a good approach? Is there any clustering algorithm which does not accept any input from the user? Can we use support vector machine or any…

data-mining svm outliers

asked May 17 '11 at 03:53

Navin

5 answers

Randomness in Artificial Intelligence & Machine Learning

This question came to my mind while working on 2 projects in AI and ML. What if I'm building a model (e.g. a classification neural network, k-NN, etc.) and this model uses some function that includes randomness? If I don't fix the seed, then I'm…
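
To make the "fix the seed" part concrete, here is a small sketch using only Python's standard `random` module: a dedicated generator created with a given seed produces the same shuffle, and therefore the same train/test split, on every run. The helper name `shuffled_split` and the 80/20 split are hypothetical choices for illustration.

```python
import random

def shuffled_split(items, seed, train_fraction=0.8):
    """Shuffle a copy of items with a seeded generator, then split it."""
    rng = random.Random(seed)  # private generator; global state untouched
    shuffled = list(items)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

data = list(range(10))

train_1, test_1 = shuffled_split(data, seed=42)
train_2, test_2 = shuffled_split(data, seed=42)

# Same seed, same split: results are reproducible across runs
print(train_1 == train_2 and test_1 == test_2)
```

Using a private `random.Random` instance rather than the module-level functions keeps the reproducible part of an experiment isolated from other code that may also draw random numbers.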

artificial-intelligence machine-learning data-mining classification

asked May 05 '11 at 01:32

Morano88
