Questions tagged [data-mining]

Data mining is the process of analyzing large amounts of data in order to find patterns and commonalities.

Data mining, also known as knowledge discovery, is the process of digging through and analyzing enormous sets of data and then extracting meaning from them. Data mining tools, such as SQL Server Analysis Services, predict behaviors and future trends, allowing businesses to make proactive, knowledge-driven decisions. They can answer business questions that were traditionally too time-consuming to resolve. They scour databases for hidden patterns, finding predictive information that experts may miss because it lies outside their expectations. The inputs to learning and mining algorithms are variously called cases, samples, examples, instances, events, or observations.

machine-learning, artificial-intelligence and statistics provide many techniques used in data mining, in combination with database technologies for efficiency. Please use the appropriate tag (e.g. machine-learning) to refer to the raw methods.
Cluster analysis (dataclustering) and outlier detection (outliers) are two of the main challenges in data mining.
Wiki Links
Data Mining Introduction

3094 questions

3 answers

What techniques/tools are there for discovering common phrases in chunks of text?

Let's say I have 100000 email bodies and 2000 of them contain an arbitrary common string like "the quick brown fox jumps over the lazy dog" or "lorem ipsum dolor sit amet". What techniques could/should I use to "mine" these phrases? I'm not…

.net data-mining

asked Sep 15 '09 at 10:42

JohannesH

6,430
5
37
71
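
One simple way to approach the question above is to count word n-grams by the number of documents they appear in. The following is a minimal sketch, assuming whitespace tokenization and a fixed phrase length; the function name `common_phrases`, its parameters, and the sample emails are illustrative and do not come from the question.

```python
from collections import Counter

def common_phrases(documents, n=5, min_docs=2):
    """Return word n-grams that occur in at least min_docs documents."""
    doc_freq = Counter()
    for text in documents:
        words = text.lower().split()
        # A set makes each n-gram count at most once per document.
        grams = {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}
        doc_freq.update(grams)
    return {gram: count for gram, count in doc_freq.items() if count >= min_docs}

# Hypothetical sample data standing in for the email bodies.
emails = [
    "Hello, the quick brown fox jumps over the lazy dog today",
    "Reminder: the quick brown fox jumps over the lazy dog",
    "Nothing in common here at all",
]
phrases = common_phrases(emails, n=4, min_docs=2)
```

Overlapping n-grams that survive the threshold can then be merged into longer phrases. For 100000 documents, keeping every n-gram in memory may be too costly, so a hashing or suffix-array approach might be a better fit.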

7 answers

Application of machine learning

I've seen some machine learning questions on here so I figured I would post a related question: Suppose I have a randomly generated food list which includes an entree, dessert, and a drink. An example would be Chicken, cheesecake, orange juice. The…

machine-learning data-mining

asked Nov 19 '12 at 21:27

ono

2,984
9
43
85

2 answers

Handling missing attributes in Naive Bayes classifier

I am writing a Naive Bayes classifier for performing indoor room localization from WiFi signal strength. So far it is working well, but I have some questions about missing features. This occurs frequently because I use WiFi signals, and WiFi access…

java machine-learning data-mining bayesian classification

asked Nov 19 '12 at 18:33

stackoverflowuser2010

38,621
48
169
217

3 answers

Computing F-measure for clustering

Can anyone help me calculate the F-measure collectively? I know how to calculate recall and precision, but I don't know how to calculate a single F-measure value for a given algorithm. As an example, suppose my algorithm creates m clusters, but I know…

cluster-analysis data-mining precision-recall

asked Oct 04 '12 at 10:27

mahesh cs
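
One common way to get a single value is the pairwise F-measure, which treats every pair of points as a decision: a pair placed in the same cluster is a positive prediction, and a pair sharing a true class is a positive label. The sketch below assumes ground-truth class labels are available; the function name and the sample labels are illustrative, and other definitions (for example, a class-weighted best-match F-measure) exist.

```python
from itertools import combinations

def pairwise_f_measure(true_labels, cluster_labels, beta=1.0):
    """F-measure over all point pairs (one of several clustering F-measures)."""
    tp = fp = fn = 0
    for i, j in combinations(range(len(true_labels)), 2):
        same_class = true_labels[i] == true_labels[j]
        same_cluster = cluster_labels[i] == cluster_labels[j]
        if same_cluster and same_class:
            tp += 1          # pair correctly grouped together
        elif same_cluster:
            fp += 1          # grouped together but different classes
        elif same_class:
            fn += 1          # same class but split across clusters
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)

# Hypothetical labels: two true classes, three clusters found.
truth = ["a", "a", "a", "b", "b", "b"]
clusters = [0, 0, 1, 1, 2, 2]
score = pairwise_f_measure(truth, clusters)
```

Because it works on pairs, this measure does not require the number of clusters to match the number of classes.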

4 answers

Finding the center of a cluster

I have the following problem, made abstract to bring out the key issues. I have 10 points, each of which is some distance from the others. I want to be able to find the center of the cluster, i.e. the point for which the pairwise distance to each other…

algorithm cluster-analysis data-mining

asked Aug 10 '09 at 08:52

Ankur

50,282
110
242
312
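
If only pairwise distances are known, one reasonable reading of "center" is the medoid: the existing point whose total distance to all other points is smallest. The sketch below assumes that interpretation and a full distance matrix; `medoid_index` and the sample coordinates are hypothetical and serve only to build an example matrix.

```python
import math

def medoid_index(dist):
    """Return the index of the point with the smallest total distance to the rest."""
    return min(range(len(dist)), key=lambda i: sum(dist[i]))

# Hypothetical 2-D points, used only to build an example distance matrix.
points = [(0, 0), (1, 0), (0, 1), (1, 1), (0.5, 0.5), (10, 10)]
dist = [[math.dist(p, q) for q in points] for p in points]
center = points[medoid_index(dist)]
```

This costs O(n^2) in the number of points, which is negligible for 10 points. Minimizing the maximum distance instead of the sum would give a different notion of center.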

3 answers

What is the meaning of jitter in the Visualize tab of Weka?

In Weka I load an ARFF file. I can view the relationship between attributes using the Visualize tab. However, I can't understand the meaning of the jitter slider. What is its purpose?

java data-mining weka arff

asked Aug 09 '09 at 16:52

Xolve

22,298
21
77
125

2 answers

Irregular plot of k-means clustering, outlier removal

Hi, I'm trying to cluster network data from the 1999 DARPA data set. Unfortunately I'm not really getting clustered data, at least not compared to some of the literature, even though I'm using the same techniques and methods. My data comes out like this: As you…

matlab plot data-mining cluster-analysis k-means

asked Jul 07 '12 at 07:32

G Gr

6,030
20
91
184

3 answers

Formula for "Relative absolute error" and "Root relative squared error" used in machine learning (as computed by Weka)

In the open source data mining software Weka (written in Java), when I run a data mining algorithm such as linear regression, Weka returns a model and some model evaluation metrics for the test data. It looks like this: Correlation coefficient …

machine-learning data-mining weka

asked May 27 '12 at 19:35

Rasto

17,204
47
154
245
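
The textbook definitions of these two metrics compare the model's errors with those of a baseline that always predicts a mean value. The sketch below follows those definitions and uses the mean of the supplied actual values as the baseline when none is given. That default is an assumption: Weka may use the mean of the training data instead, so its reported numbers can differ. The function name and sample values are illustrative.

```python
import math

def relative_errors(actual, predicted, baseline=None):
    """Return (relative absolute error, root relative squared error) in percent."""
    if baseline is None:
        # Assumption: fall back to the mean of the actual test values.
        baseline = sum(actual) / len(actual)
    abs_err = sum(abs(p - a) for a, p in zip(actual, predicted))
    abs_base = sum(abs(baseline - a) for a in actual)
    sq_err = sum((p - a) ** 2 for a, p in zip(actual, predicted))
    sq_base = sum((baseline - a) ** 2 for a in actual)
    if abs_base == 0 or sq_base == 0:
        raise ValueError("baseline predictor has zero error; metrics undefined")
    rae = 100.0 * abs_err / abs_base
    rrse = 100.0 * math.sqrt(sq_err / sq_base)
    return rae, rrse

# Hypothetical test data.
actual = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.0, 2.5, 4.0]
rae, rrse = relative_errors(actual, predicted)
```

Values below 100% mean the model does better than the constant baseline; values above 100% mean it does worse.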

2 answers

What is the Time and Space complexity of FP-Growth algorithm?

How do we calculate the time complexity and space complexity of the FP-Growth algorithm in data mining?

algorithm complexity-theory data-mining apriori

asked Mar 26 '12 at 09:44

Kalyan Manda

2 answers

Association mining with large number of small datasets

I have a large number (100-150) of small (approx 1 kbyte) datasets. We will call these the 'good' datasets. I also have a similar number of 'bad' datasets. Now I'm looking for software (or perhaps algorithm(s)) to find rules for what constitutes a…

algorithm machine-learning data-mining

asked Mar 04 '12 at 13:05

Paul Lovell

3 answers

Weka GUI - Not enough memory, won't load?

This same installation of Weka has loaded for me in the past. I am simply trying to load the Weka GUI (double click on the icon) and I get the following error. How can I fix it? OutOfMemory Not enough memory. Please load a smaller dataset or use…

machine-learning data-mining weka

asked Feb 06 '12 at 17:48

Jim

4,509
16
50
80

3 answers

Recommendation algorithm (and implementation) for finding similar items and users

I have a database of about 700k users along with items they have watched/listened to/read/bought/etc. I would like to build a recommendation engine that recommends new items based on what users with similar taste in things have enjoyed, as well as…

algorithm theory data-mining recommendation-engine collaborative-filtering

asked Jan 19 '12 at 19:25

vomitcuddle

1 answer

Calculating a user's importance or 'Betweenness Centrality' from a user's followers?

I want to know how I can find interesting relationships between user accounts, such as the most connected or most valuable users based on their connections to others. Below I have the two tables I use. One has all the users, the other has the keys…

php data-mining rdbms graph-databases

asked Jan 14 '12 at 02:55

Xeoncross

55,620
80
262
364

2 answers

Data Mining situation

Suppose I have the data as mentioned below. 11AM user1 Brush 11:05AM user1 Prep Brakfast 11:10AM user1 eat Breakfast 11:15AM user1 Take bath 11:30AM user1 Leave for office 12PM user2 Brush 12:05PM user2 Prep Brakfast 12:10PM user2 eat…

data-mining text-mining

asked Sep 30 '11 at 17:23

user722856

2 answers

Adding CURE clustering algorithm to WEKA

I have written a Java program to perform CURE clustering. I wish to add this program to Weka as a clustering algorithm and visualize the clustering. Has anyone already implemented it in Weka? Any links to that would be very helpful. How do I…

java cluster-analysis weka data-mining

asked May 15 '11 at 18:09

JSpider
