Questions tagged [data-mining]

Data mining is the process of analyzing large amounts of data in order to find patterns and commonalities.

Data mining, also known as knowledge discovery, is the process of digging through and analyzing enormous sets of data and then extracting meaning from the data. Data mining tools such as SQL Server Analysis Services predict behaviors and future trends, allowing businesses to make proactive, knowledge-driven decisions. Data mining tools can answer business questions that traditionally were too time-consuming to resolve. They scour databases for hidden patterns, finding predictive information that experts may miss because it lies outside their expectations. The input to data mining algorithms is variously called cases, samples, examples, instances, events, or observations.

Techniques from machine-learning, artificial-intelligence and statistics are used throughout data mining, combined with database technologies for efficiency. Please use the appropriate tag (e.g. machine-learning) when asking about the underlying methods themselves.
Cluster analysis (dataclustering) and outlier detection (outliers) are two of the main challenges in data mining.
Wiki Links
Data Mining Introduction

3094 questions

1 answer

ELKI OPTICSXi - how to set xi?

I'm trying to use ELKI to cluster a dataset of geolocations using OPTICS. I've understood that to extract the clusters I need to use the OPTICSXi algorithm rather than OPTICS, which computes only the cluster order. I was wondering if you could give…

cluster-analysis data-mining elki optics-algorithm

asked Nov 18 '13 at 18:00

Deborah

3 answers

Can a "splitting attribute" appear many times in a decision tree?

Just want to clarify one thing: the same attribute can appear in a decision tree many times, as long as the occurrences are in different "branches", right?

machine-learning data-mining decision-tree

asked Nov 15 '13 at 03:45

yvetterowe

1,239 reputation · 7 gold, 20 silver, 34 bronze badges
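For the decision-tree question: a split is chosen independently at each node, so nothing stops the same attribute from being tested in different branches. For numeric attributes, binary-split learners such as CART also allow the same attribute to reappear further down one path with a new threshold. The hand-built tree below is hypothetical (attribute names, thresholds and labels are invented) and only illustrates the structure; it is not learned from data.

```python
from dataclasses import dataclass
from typing import Union

@dataclass
class Leaf:
    label: str

@dataclass
class Node:
    attribute: str
    threshold: float
    left: "Tree"   # followed when sample[attribute] <= threshold
    right: "Tree"  # followed when sample[attribute] > threshold

Tree = Union[Leaf, Node]

def predict(tree: Tree, sample: dict) -> str:
    """Walk from the root to a leaf and return its label."""
    while isinstance(tree, Node):
        tree = tree.left if sample[tree.attribute] <= tree.threshold else tree.right
    return tree.label

def attribute_uses(tree: Tree) -> dict:
    """Count how many internal nodes test each attribute."""
    counts = {}
    stack = [tree]
    while stack:
        node = stack.pop()
        if isinstance(node, Node):
            counts[node.attribute] = counts.get(node.attribute, 0) + 1
            stack.extend([node.left, node.right])
    return counts

# Hypothetical tree: "income" is tested in both branches of the root, and the
# numeric attribute "age" is tested again (new threshold) lower on the right path.
tree = Node("age", 30,
            left=Node("income", 50_000, Leaf("no"), Leaf("yes")),
            right=Node("income", 80_000,
                       Node("age", 60, Leaf("yes"), Leaf("no")),
                       Leaf("yes")))
```

Here `attribute_uses(tree)` returns `{"age": 2, "income": 2}`, and a sample such as `{"age": 70, "income": 40000}` is routed through both "age" tests before reaching the "no" leaf.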

1 answer

Cluster high dimensional data with python and DBSCAN

I have a dataset with 1000 dimensions and I am trying to cluster the data with DBSCAN in Python. I have a hard time understanding what metric to choose and why. Can someone explain this? And how should I decide what values to set eps to? I am…

python cluster-analysis data-mining dbscan n-dimensional

asked Apr 22 '13 at 14:14

Ekgren

1,024 reputation · 1 gold, 9 silver, 13 bronze badges
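On choosing eps for DBSCAN: a widely used heuristic is the sorted k-distance graph, where you compute each point's distance to its k-th nearest neighbour, sort the values in descending order, and pick eps near the "knee" where the curve flattens. The k you plot is usually tied to DBSCAN's minPts, though implementations differ on whether a point counts itself. The sketch below is brute force (O(n²)) and uses Euclidean distance as an assumption; in very high-dimensional data such as the 1000-dimensional set in the question, Euclidean distances tend to become similar to one another, so the curve may show no clear knee, which is one reason to consider another metric (e.g. cosine) or dimensionality reduction first.

```python
import math

def k_distances(points, k):
    """For every point, the distance to its k-th nearest other point,
    sorted in descending order (the sorted k-dist graph). Brute force, O(n^2)."""
    result = []
    for i, p in enumerate(points):
        others = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        result.append(others[k - 1])
    return sorted(result, reverse=True)

# Two tight 2-D groups plus one isolated point (toy data, not from the question).
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (50, 50)]
print(k_distances(points, k=3))
```

The isolated point produces the single large value; the eight clustered points all sit at about 1.41, so an eps slightly above that level separates the groups from the noise.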

2 answers

ELKI implementation of OPTICS clustering algorithm detects only one cluster

I'm having an issue using the OPTICS implementation in the ELKI environment. I have used the same data with the DBSCAN implementation and it worked like a charm. Probably I'm missing something with the parameters, but I can't figure it out; everything seems to be…

cluster-analysis data-mining dbscan elki optics-algorithm

asked Dec 25 '12 at 14:54

Ilya Khaustov

2 answers

Removing outliers from a k-means cluster

I have a number of smaller data sets, each containing 10 XY coordinates. I am using Matlab (R2012a) and k-means to obtain a centroid. In some of the clusters (see figure below) I can see some extreme points; because my datasets are as small as they are,…

matlab data-mining cluster-analysis k-means outliers

asked Dec 21 '12 at 11:35

carro
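For trimming extreme points from one small cluster, a minimal sketch is: compute the centroid, measure each point's distance to it, drop points whose distance is far above the typical distance, and recompute the centroid. The median/MAD rule and the cutoff of 3 are assumptions chosen for illustration, not something the question specifies, and the sketch is in Python rather than Matlab; it post-processes a single cluster rather than re-running k-means.

```python
import math
import statistics

def centroid(points):
    """Mean of the x and y coordinates (what k-means uses as the cluster centre)."""
    xs, ys = zip(*points)
    return (statistics.fmean(xs), statistics.fmean(ys))

def trim_outliers(points, cutoff=3.0):
    """Drop points whose distance to the centroid lies more than `cutoff`
    MADs above the median distance, then recompute the centroid."""
    c = centroid(points)
    dists = [math.dist(p, c) for p in points]
    med = statistics.median(dists)
    # Median absolute deviation; the tiny fallback avoids division by zero
    # when most points are equidistant from the centroid.
    mad = statistics.median(abs(d - med) for d in dists) or 1e-12
    kept = [p for p, d in zip(points, dists) if (d - med) / mad <= cutoff]
    return kept, centroid(kept)

# Ten XY points (toy data): nine close together and one extreme point.
points = [(0, 0), (1, 0), (0, 1), (1, 1), (0.5, 0.5),
          (1, 0.5), (0.5, 1), (0, 0.5), (0.5, 0), (20, 20)]
kept, c = trim_outliers(points)
print(len(kept), c)
```

Using the median and MAD rather than the mean and standard deviation matters with only ten points, because a single extreme point inflates the standard deviation enough to hide itself.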

5 answers

How to approach Machine Learning problems with dynamically sized input collection?

I'm approaching a problem trying to classify a data sample as good or bad quality with machine learning. The data sample is stored in a relational database. A sample contains the attributes id, name, number of up-votes (for good/bad quality…

machine-learning relational-database neural-network data-mining feature-extraction

asked Dec 06 '12 at 10:09

user822448

3 answers

How to speed up color clustering in OpenCV?

For a project I want to implement a color-clustering algorithm that replaces similar colors with the average color of their cluster. For now, I use the k-means algorithm to cluster the whole image, but this takes a long time. Does anyone have an idea how…

opencv cluster-analysis data-mining k-means image-segmentation

asked Nov 28 '12 at 20:10

501 - not implemented

2,638 reputation · 4 gold, 39 silver, 74 bronze badges
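One common way to speed up color clustering (a suggestion, not taken from the question) is to fit k-means on a random subsample of pixels and then assign every pixel to its nearest fitted center, since the assignment pass is much cheaper than repeated fitting over the full image. The sketch below uses plain Python lists of RGB tuples instead of OpenCV so that it is self-contained; `sample_size`, `iters` and the toy image are illustrative assumptions.

```python
import random

def nearest(color, centers):
    """Index of the center closest to `color` (squared Euclidean distance in RGB)."""
    return min(range(len(centers)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(color, centers[i])))

def kmeans(colors, k, iters=10, seed=0):
    """Plain Lloyd iterations on a list of RGB tuples."""
    rng = random.Random(seed)
    centers = rng.sample(colors, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for c in colors:
            groups[nearest(c, centers)].append(c)
        # Move each center to the mean of its group; keep it if the group is empty.
        centers = [tuple(sum(ch) / len(g) for ch in zip(*g)) if g else centers[i]
                   for i, g in enumerate(groups)]
    return centers

def quantize(pixels, k, sample_size=1000, seed=0):
    """Fit k-means on a random subsample, then map every pixel to its nearest center."""
    rng = random.Random(seed)
    sample = pixels if len(pixels) <= sample_size else rng.sample(pixels, sample_size)
    centers = kmeans(sample, k, seed=seed)
    return [centers[nearest(p, centers)] for p in pixels]

# Toy "image": 10,000 pixels in two colors.
pixels = [(250, 10, 10)] * 5000 + [(10, 10, 250)] * 5000
out = quantize(pixels, 2)
```

Fitting cost now depends on `sample_size` rather than on the image size; downscaling the image before fitting is a similar trick.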

4 answers

Machine Learning Algorithm for Completing Sparse Matrix Data

I've seen some machine learning questions on here, so I figured I would post a related question: Suppose I have a dataset where athletes participate in running competitions of 10 km and 20 km on hilly courses, i.e. every competition has its own…

algorithm machine-learning data-mining

asked Nov 21 '12 at 16:42

user1141785

3 answers

How should I start learning the math required for AI?

I studied mathematics, but that was a long time ago. I have been a programmer for 8 years, but since I started studying concepts in AI and data mining I have found it very difficult to understand the theory. Now I have wasted 2-3 years and I have got…

math artificial-intelligence data-mining

asked Oct 23 '12 at 02:04

Mirage

30,868 reputation · 62 gold, 166 silver, 261 bronze badges

1 answer

What is evaluation of a cluster in WEKA?

What do we mean when we say that we are evaluating the clusters in the WEKA framework? Clustering is an unsupervised approach to grouping objects. What do we mean when we say we want to evaluate the result? Also, in addition to this, when we say that we…

java machine-learning data-mining weka

asked Jun 04 '12 at 09:23

London guy

27,522 reputation · 44 gold, 121 silver, 179 bronze badges
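On what "evaluating clusters" means: without class labels, evaluation relies on internal measures of how compact or separated the clusters are; with labels, the clusters can be compared against the known classes (WEKA's Explorer offers a "classes to clusters" evaluation mode for this). One of the simplest internal measures, and one that k-means implementations such as WEKA's SimpleKMeans report, is the within-cluster sum of squared errors. The sketch below computes it for given labels and centers; preprocessing such as attribute normalization will change the numbers a tool reports.

```python
def within_cluster_sse(points, labels, centers):
    """Sum of squared Euclidean distances from each point to the center of its cluster.
    Lower means tighter clusters, but the best achievable value never increases
    as the number of clusters grows, so compare runs with the same k."""
    return sum(sum((a - b) ** 2 for a, b in zip(p, centers[label]))
               for p, label in zip(points, labels))

# Toy 2-D data with a known assignment (illustrative only).
points = [(0, 0), (2, 0), (10, 0), (12, 0)]
labels = [0, 0, 1, 1]
centers = [(1, 0), (11, 0)]
print(within_cluster_sse(points, labels, centers))
```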

2 answers

How do I create a new data table in Orange?

I am using Orange (in Python) for some data mining tasks. More specifically, for clustering. Although I have gone through the tutorial and read most of the documentation, I still have a problem. All the examples in docs and tutorials assume that I…

python data-mining orange

asked Jan 24 '12 at 12:22

George Eracleous

4,278 reputation · 6 gold, 41 silver, 50 bronze badges

2 answers

How is BI related to data mining?

I'm a little confused about how to connect BI with data mining. Can BI be considered a manifestation of data mining? How different is a BI tool like Microsoft Analysis Services from a data mining tool like Weka? I guess BI involves more of…

olap business-intelligence data-mining

asked May 09 '09 at 23:59

Arnkrishn

29,828 reputation · 40 gold, 114 silver, 128 bronze badges

2 answers

Sequence mining for time and product prediction

I am facing a tricky sequence-mining problem. Say I have 10 products and millions of records, each containing a user, a product and a purchase timestamp. Each user may have only 1 record or 100 records, such as: user 1, p1, t1 user 1, p1,…

algorithm artificial-intelligence machine-learning data-mining

asked Dec 08 '11 at 17:14

yzhang
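For the sequence-mining question, a simple baseline (our assumption, not the asker's method) is a first-order Markov model: group records by user, sort each user's purchases by timestamp, count product-to-product transitions, and predict the most frequent next product together with the mean observed time gap. This is much simpler than full sequential-pattern mining algorithms such as GSP or PrefixSpan, and with millions of records you would likely stream the grouping from a sorted file or database rather than hold it all in memory as this sketch does.

```python
from collections import Counter, defaultdict
from statistics import fmean

def build_model(records):
    """records: iterable of (user, product, timestamp) with numeric timestamps."""
    by_user = defaultdict(list)
    for user, product, ts in records:
        by_user[user].append((ts, product))
    counts = defaultdict(Counter)   # counts[prev][next] = number of observed transitions
    gaps = defaultdict(list)        # gaps[(prev, next)] = observed time gaps
    for events in by_user.values():
        events.sort()               # chronological order per user
        for (t1, prev), (t2, nxt) in zip(events, events[1:]):
            counts[prev][nxt] += 1
            gaps[(prev, nxt)].append(t2 - t1)
    return counts, gaps

def predict_next(model, last_product):
    """Most frequent next product after `last_product` and its mean time gap, or None."""
    counts, gaps = model
    if last_product not in counts:
        return None
    nxt, _ = counts[last_product].most_common(1)[0]
    return nxt, fmean(gaps[(last_product, nxt)])

# Toy records (hypothetical users, products and timestamps).
records = [("u1", "p1", 1), ("u1", "p2", 3),
           ("u2", "p1", 10), ("u2", "p2", 14), ("u2", "p3", 20),
           ("u3", "p1", 5)]
model = build_model(records)
print(predict_next(model, "p1"))
```

Users with a single record (like `u3`) contribute no transitions, which is one limitation of this baseline for the "only 1 record" users mentioned in the question.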

2 answers

Similarity matrix -> feature vectors algorithm?

If we have a set of M words, and know the similarity of the meaning of each pair of words in advance (have an M x M matrix of similarities), which algorithm can we use to make one k-dimensional bit vector for each word, so that each pair of words can…

algorithm vector machine-learning data-mining similarity

asked Oct 12 '11 at 06:00

Ognjen

2,508 reputation · 2 gold, 31 silver, 47 bronze badges

2 answers

Implementing proximity matrix for clustering

I am a little new to this field, so pardon me if the question sounds trivial or basic. I have a group of datasets (bags of words, to be specific) and I need to generate a proximity matrix using their edit distance from each other to find and…

r machine-learning cluster-analysis data-mining

asked Aug 08 '11 at 19:09

damola
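For building a proximity matrix from edit distances: compute the Levenshtein distance for every pair and store it in a symmetric matrix with zeros on the diagonal, which most hierarchical clustering routines can then consume. The question uses R, where base `utils::adist()` already computes a matrix of generalized Levenshtein distances; the sketch below is plain Python for illustration. Because it only compares elements with `!=`, the same function works on strings or on lists of word tokens.

```python
def levenshtein(a, b):
    """Edit distance (insertions, deletions, substitutions, each costing 1)
    using a single rolling row of the dynamic-programming table."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,               # deletion
                           cur[j - 1] + 1,            # insertion
                           prev[j - 1] + (ca != cb)))  # substitution (free if equal)
        prev = cur
    return prev[-1]

def proximity_matrix(items):
    """Symmetric n x n matrix of pairwise edit distances, zeros on the diagonal."""
    n = len(items)
    m = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            m[i][j] = m[j][i] = levenshtein(items[i], items[j])
    return m

print(proximity_matrix(["cat", "cart", "dog"]))
```

For the toy input this gives `[[0, 1, 3], [1, 0, 4], [3, 4, 0]]`.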
