Questions tagged [data-mining]

Data mining is the process of analyzing large amounts of data in order to find patterns and commonalities.

Data mining, also known as knowledge discovery, is the process of digging through and analyzing enormous sets of data and then extracting meaning from them. Data mining tools, such as SQL Server Analysis Services, predict behaviors and future trends, allowing businesses to make proactive, knowledge-driven decisions. These tools can answer business questions that were traditionally too time-consuming to resolve. They scour databases for hidden patterns, finding predictive information that experts may miss because it lies outside their expectations. The inputs to data mining algorithms are variously called cases, samples, examples, instances, events, or observations.

3094 questions
7
votes
1 answer

ELKI OPTICSXi - how to set xi?

I'm trying to use ELKI to cluster a dataset of geolocations using OPTICS. I understand that to extract the clusters, I need to use the OPTICSXi algorithm rather than OPTICS, which computes only the cluster order. I was wondering if you could give…
Deborah
  • 355
  • 1
  • 5
  • 15
7
votes
3 answers

Can a "splitting attribute" appear many times in a decision tree?

Just want to clarify one thing: the same attribute can appear in a decision tree many times, as long as the occurrences are in different "branches", right?
yvetterowe
  • 1,239
  • 7
  • 20
  • 34
7
votes
1 answer

Cluster high dimensional data with python and DBSCAN

I have a dataset with 1000 dimensions and I am trying to cluster the data with DBSCAN in Python. I have a hard time understanding what metric to choose and why. Can someone explain this? And how should I decide what values to set eps to? I am…
Ekgren
  • 1,024
  • 1
  • 9
  • 13
7
votes
2 answers

ELKI implementation of OPTICS clustering algorithm detects only one cluster

I'm having an issue using the OPTICS implementation in the ELKI environment. I have used the same data with the DBSCAN implementation and it worked like a charm. I'm probably missing something with the parameters, but I can't figure it out; everything seems to be…
7
votes
2 answers

Removing outliers from a k-means cluster

I have a number of smaller data sets, containing 10 XY coordinates each. I am using Matlab (R2012a) and k-means to obtain a centroid. In some of the clusters (see figure below) I can see some extreme points; because my datasets are as small as they are,…
carro
  • 109
  • 1
  • 1
  • 6
7
votes
5 answers

How to approach Machine Learning problems with dynamically sized input collection?

I'm approaching a problem trying to classify a data sample as good or bad quality with machine learning. The data sample is stored in a relational database. A sample contains the attributes id, name, number of up-votes (for good/bad quality…
7
votes
3 answers

How to speed up color clustering in OpenCV?

For a project I want to implement a color-clustering algorithm that replaces similar colors with the average color of a cluster. For now, I use the k-means algorithm to cluster the whole image, but this takes a long time. Does anyone have an idea how…
7
votes
4 answers

Machine Learning Algorithm for Completing Sparse Matrix Data

I've seen some machine learning questions on here, so I figured I would post a related question: Suppose I have a dataset where athletes participate in running competitions of 10 km and 20 km with hilly courses, i.e. every competition has its own…
user1141785
  • 431
  • 7
  • 21
7
votes
3 answers

How should I start learning the math required for AI?

I have studied mathematics, but that was a long time ago. I have been a programmer for 8 years, but when I started to study concepts in AI and data mining I found it very difficult to understand the theory. Now I have wasted 2-3 years and I have got…
Mirage
  • 30,868
  • 62
  • 166
  • 261
7
votes
1 answer

What is evaluation of a cluster in WEKA?

What do we mean when we say that we are evaluating the clusters in the WEKA framework? Clustering is an unsupervised approach to grouping objects. What do we mean when we say we want to evaluate the result? In addition to this, when we say that we…
London guy
  • 27,522
  • 44
  • 121
  • 179
6
votes
2 answers

How do I create a new data table in Orange?

I am using Orange (in Python) for some data mining tasks. More specifically, for clustering. Although I have gone through the tutorial and read most of the documentation, I still have a problem. All the examples in docs and tutorials assume that I…
George Eracleous
  • 4,278
  • 6
  • 41
  • 50
6
votes
2 answers

How is BI related to data mining?

I'm a little confused on how to connect BI with data mining. Can BI be termed as some kind of a manifestation of data mining? How different is a BI tool like Microsoft Analysis Services from a data mining tool like Weka? I guess BI involves more of…
Arnkrishn
  • 29,828
  • 40
  • 114
  • 128
6
votes
2 answers

Sequence mining for time and product prediction

I am facing a tricky problem in sequence mining. Say I have 10 products and millions of records, each containing a user, a product, and the timestamp of the purchase. Each user may have only 1 record or as many as 100 records, such as: user 1, p1, t1 user 1, p1,…
6
votes
2 answers

Similarity matrix -> feature vectors algorithm?

If we have a set of M words, and know the similarity in meaning of each pair of words in advance (we have an M x M matrix of similarities), which algorithm can we use to make one k-dimensional bit vector for each word, so that each pair of words can…
Ognjen
  • 2,508
  • 2
  • 31
  • 47
6
votes
2 answers

Implementing proximity matrix for clustering

I am a little new to this field, so pardon me if the question sounds trivial or basic. I have a group of datasets (bags of words, to be specific) and I need to generate a proximity matrix using their edit distance from each other to find and…
damola
  • 282
  • 5
  • 18