Questions tagged [data-mining]

Data mining is the process of analyzing large amounts of data in order to find patterns and commonalities.

Data mining, also known as knowledge discovery, is the process of digging through and analyzing enormous data sets and extracting meaning from them. Data mining tools such as SQL Server Analysis Services predict behaviors and future trends, allowing businesses to make proactive, knowledge-driven decisions. These tools can answer business questions that were traditionally too time-consuming to resolve. They scour databases for hidden patterns, finding predictive information that experts may miss because it lies outside their expectations. The inputs to data mining algorithms are variously called cases, samples, examples, instances, events, or observations.

machine-learning, artificial-intelligence and statistics provide many techniques used in data mining, in combination with database technologies for efficiency. Please use the appropriate tag (e.g. machine-learning) to refer to the raw methods.
Cluster analysis (dataclustering) and outlier detection (outliers) are two of the main challenges in data mining.
Wiki Links
Data Mining Introduction

3094 questions

5 answers

how to determine the number of topics for LDA?

I am new to LDA and I want to use it in my work. However, I have run into some problems. In order to get the best performance, I want to estimate the best number of topics. After reading "Finding Scientific topics", I know that I can calculate log P(w|z)…

nlp data-mining lda

asked Jul 02 '13 at 09:22

Chelsea Wang

5 answers

Algorithm to find the most common substrings in a string

Is there any algorithm that can be used to find the most common phrases (or substrings) in a string? For example, the following string would have "hello world" as its most common two-word phrase: "hello world this is hello world. hello world repeats…

algorithm substring language-agnostic data-mining

asked Feb 03 '13 at 08:16

Anderson Green
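
For readers skimming this question, here is a minimal sketch of one common approach: split the string into words, count every run of n consecutive words, and report the most frequent. The function name `most_common_phrases` and its parameters are hypothetical, and the sample string is only the visible part of the truncated excerpt above. Treat it as an illustration, not the accepted answer.

```python
import re
from collections import Counter

def most_common_phrases(text, n=2, top=1):
    """Return the `top` most frequent n-word phrases in `text` (hypothetical helper)."""
    # Lowercase and keep only word characters so "world." matches "world".
    words = re.findall(r"\w+", text.lower())
    # Slide a window of n words across the token list.
    ngrams = (" ".join(words[i:i + n]) for i in range(len(words) - n + 1))
    return Counter(ngrams).most_common(top)

# Visible portion of the example string from the question excerpt.
sample = "hello world this is hello world. hello world repeats"
print(most_common_phrases(sample))  # [('hello world', 3)]
```

Note that this sketch ignores sentence boundaries, so a phrase such as "world hello" spanning a full stop is also counted. For very long inputs, suffix-based structures may be more appropriate than counting every window.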

5 answers

Error in do_one(nmeth) : NA/NaN/Inf in foreign function call (arg 1)

I have a data table ("norm") containing numeric (at least as far as I can see) normalized values of the following form: When I execute k <- kmeans(norm,center=3) I receive the following error: Error in do_one(nmeth) : NA/NaN/Inf in…

r machine-learning cluster-analysis data-mining k-means

asked Apr 07 '16 at 07:40

Jonathan Rhein

20 answers

Data Mining open source tools

I'm due to take up a data mining project. Before I jump in, I wanted to probe around for different data mining tools (preferably open source) that allow web-based reporting. In my scenario the data would be provided to me, so I'm not…

open-source data-mining

asked May 07 '09 at 16:37

Arnkrishn

5 answers

random unit vector in multi-dimensional space

I'm working on a data mining algorithm where I want to pick a random direction from a particular point in the feature space. If I pick a random number for each of the n dimensions from [-1,1] and then normalize the vector to a length of 1, will I…

random distribution data-mining computational-geometry uniform

asked Jun 08 '11 at 17:53

Matt
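
As a hedged illustration for this question: a widely used way to draw a direction uniformly on the unit sphere is to sample each coordinate from a standard normal distribution and then normalize, because the multivariate normal is rotationally symmetric. Sampling each coordinate uniformly from [-1,1] and normalizing, by contrast, favors directions toward the corners of the cube. The function name `random_unit_vector` and its `rng` parameter are hypothetical, and this is a sketch, not the answer given on the original page.

```python
import math
import random

def random_unit_vector(n, rng=random):
    """Return a list of n floats with Euclidean norm 1 (hypothetical helper).

    Coordinates are drawn from N(0, 1), so the direction is uniform on the sphere.
    """
    while True:
        v = [rng.gauss(0.0, 1.0) for _ in range(n)]
        norm = math.sqrt(sum(x * x for x in v))
        # Guard against the (practically impossible) all-zero draw.
        if norm > 0.0:
            return [x / norm for x in v]

direction = random_unit_vector(5)
print(direction, sum(x * x for x in direction))  # squared norm is 1 up to rounding
```

Passing a seeded `random.Random` instance as `rng` makes the draw reproducible.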

3 answers

Difference between Closed and open Sequential Pattern Mining Algorithms

I want to use some algorithms to mine my log data. I found a pattern mining framework at: http://www.philippe-fournier-viger.com/spmf/index.php?link=algorithms.php I have tried several algorithms, and the BIDE+ algorithm performs the best. The BIDE+…

pattern-matching data-mining sequential apriori

asked Apr 22 '13 at 10:57

leon

6 answers

Fast (< n^2) clustering algorithm

I have 1 million 5-dimensional points that I need to group into k clusters with k << 1 million. In each cluster, no two points should be too far apart (e.g. they could be bounding spheres with a specified radius). That means that there probably has…

algorithm machine-learning cluster-analysis data-mining k-means

asked Dec 09 '10 at 23:11

John Hawksley

3 answers

Clustering values by their proximity in python (machine learning?)

I have an algorithm that is running on a set of objects. This algorithm produces a score value that dictates the differences between the elements in the set. The sorted output is something like…

python machine-learning cluster-analysis data-mining

asked Aug 21 '13 at 17:31

PCoelho

3 answers

Javascript and Scientific Processing?

Matlab, R, and Python are powerful but either costly or slow for some data mining work I'd like to do. I'm considering using JavaScript for its speed, its good visualization libraries, and the ability to use the browser as an interface. The first…

javascript data-mining scientific-computing

asked Jul 25 '12 at 13:40

MikeB

2 answers

Hierarchical clustering of 1 million objects

Can anyone point me to a hierarchical clustering tool (preferably in Python) that can cluster ~1 million objects? I have tried hcluster and also Orange. hcluster had trouble with 18k objects. Orange was able to cluster 18k objects in seconds, but…

python machine-learning cluster-analysis data-mining hierarchical-clustering

asked Feb 06 '12 at 07:40

Atish Kathpal

6 answers

What is the difference between Big Data and Data Mining?

As Wikipedia states: "The overall goal of the data mining process is to extract information from a data set and transform it into an understandable structure for further use." How is this related to Big Data? Is it correct if I say that Hadoop…

hadoop machine-learning bigdata data-mining data-science

asked Mar 15 '14 at 05:25

DesirePRG

7 answers

Finding 2 & 3 word Phrases Using R TM Package

I am trying to find code that actually works to find the most frequently used two- and three-word phrases with the R text mining package (maybe there is another package for it that I do not know). I have been trying to use the tokenizer, but seem to have…

r data-mining text-mining

asked Jan 17 '12 at 16:53

appletree

3 answers

Using frequent itemset mining to build association rules?

I am new to this area as well as the terminology, so please feel free to correct me if I go wrong somewhere. I have two datasets like this: Dataset 1: A B C 0 E A 0 C 0 0 A 0 C D E A 0 C 0 E The way I interpret this is that at some point in time, (A,B,C,E)…

python machine-learning data-mining

asked Aug 13 '11 at 00:01

Legend

4 answers

Information retrieval (IR) vs data mining vs Machine Learning (ML)

People often throw around the terms IR, ML, and data mining, but I have noticed a lot of overlap between them. From people with experience in these fields, what exactly draws the line between these?

machine-learning data-mining information-retrieval

asked Aug 05 '10 at 18:04

Boris Yeltz

2 answers

What is the difference between a Confusion Matrix and Contingency Table?

I'm writing a piece of code to evaluate my clustering algorithm, and I find that every kind of evaluation method needs the basic data from an m*n matrix like A = {aij}, where aij is the number of data points that are members of class ci and elements…

matrix cluster-analysis data-mining difference

asked Sep 30 '11 at 15:56

MangMang
