Questions tagged [data-mining]

Data mining is the process of analyzing large amounts of data in order to find patterns and commonalities.

Data mining, also known as knowledge discovery, is the process of searching through and analyzing very large data sets in order to extract meaningful information from them. Data mining tools such as SQL Server Analysis Services predict behaviors and future trends, allowing businesses to make proactive, knowledge-driven decisions. They can answer business questions that were traditionally too time-consuming to resolve, scouring databases for hidden patterns and finding predictive information that experts may miss because it lies outside their expectations. The inputs to data mining algorithms are variously called cases, samples, examples, instances, events, or observations.

Machine learning (machine-learning), artificial intelligence (artificial-intelligence), and statistics provide many of the techniques used in data mining, in combination with database technologies for efficiency. Please use the appropriate tag (e.g. machine-learning) to refer to the underlying methods.
Cluster analysis (dataclustering) and outlier detection (outliers) are two of the main challenges in data mining.
Wiki Links
Data Mining Introduction

3094 questions

4 answers

Maximal vs. Closed Patterns in Association Rule Mining

In frequent itemset generation for association rule mining, what is the fundamental difference between maximal and closed itemsets? Can someone point me to a resource about them?

machine-learning data-mining

asked Sep 05 '15 at 14:41

Michael
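
As a rough illustration of the distinction asked about above: a frequent itemset is usually called closed when no proper superset has the same support, and maximal when no proper superset is frequent at all. The sketch below is a brute-force toy, not an efficient miner; the transactions, the `min_support` threshold, and all function names are made up for illustration.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Brute-force support counting; only sensible for toy data."""
    items = sorted({item for t in transactions for item in t})
    frequent = {}
    for size in range(1, len(items) + 1):
        for candidate in combinations(items, size):
            itemset = frozenset(candidate)
            support = sum(1 for t in transactions if itemset <= t)
            if support >= min_support:
                frequent[itemset] = support
    return frequent


def closed_itemsets(frequent):
    # Closed: no proper superset has exactly the same support.
    return {
        s for s, sup in frequent.items()
        if not any(s < other and frequent[other] == sup for other in frequent)
    }


def maximal_itemsets(frequent):
    # Maximal: no proper superset is frequent at all.
    return {s for s in frequent if not any(s < other for other in frequent)}


# Hypothetical toy transaction database.
transactions = [frozenset(t) for t in (
    {"a", "b", "c"}, {"a", "b"}, {"a", "b", "c"}, {"a"},
)]
frequent = frequent_itemsets(transactions, min_support=2)
closed = closed_itemsets(frequent)
maximal = maximal_itemsets(frequent)
```

Under these definitions every maximal itemset is also closed, so the maximal set is the more compact summary, while the closed set still lets the support of every frequent itemset be recovered.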

2 answers

Normalization methods for stream data

I am using the CluStream algorithm and have figured out that I need to normalize my data. I decided to use min-max normalization, but I think that this way the values of newly arriving data objects will be calculated differently, as the values of min…

stream machine-learning data-mining normalization

asked Jul 24 '15 at 15:03

T.Sh
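
The concern raised in this question can be made concrete with a small sketch. It assumes one simple strategy, scaling each value against the minimum and maximum seen so far; the `RunningMinMax` class and the sample values are hypothetical and not part of CluStream or any library.

```python
class RunningMinMax:
    """Tracks the min and max seen so far and scales values into [0, 1]."""

    def __init__(self):
        self.lo = None
        self.hi = None

    def update(self, x):
        # Widen the observed range if x falls outside it.
        self.lo = x if self.lo is None else min(self.lo, x)
        self.hi = x if self.hi is None else max(self.hi, x)

    def scale(self, x):
        # With no range yet (or a degenerate one), fall back to 0.0.
        if self.lo is None or self.hi == self.lo:
            return 0.0
        return (x - self.lo) / (self.hi - self.lo)


scaler = RunningMinMax()
scaled = []
for value in [10.0, 20.0, 15.0, 40.0]:
    scaler.update(value)
    scaled.append(scaler.scale(value))

# The value 20.0 was scaled to 1.0 when it arrived, but against the
# current range [10, 40] it would now scale to one third.
rescaled_20 = scaler.scale(20.0)
```

The same raw value therefore maps to different normalized values depending on when it arrives, which is the inconsistency the question describes; fixed bounds known in advance or a windowed range are common alternatives.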

6 answers

What is the difference between association rule mining & frequent itemset mining?

I am new to data mining and confused about association rules and frequent itemset mining. To me they seem to be the same, but I would like views from experts on this forum. My question is: what is the difference between association rule mining & frequent itemset…

data-mining

asked Jun 16 '10 at 05:20

Zia

3 answers

Clustering a very large dataset in R

I have a dataset consisting of 70,000 numeric values representing distances ranging from 0 to 50, and I want to cluster these numbers; however, if I try the classical clustering approach, I would have to establish a 70,000X70,000…

r machine-learning bigdata cluster-analysis data-mining

asked Feb 24 '14 at 10:24

DOSMarter

4 answers

Creating a comparable and flexible fingerprint of an object

My situation: Say I have thousands of objects, which in this example could be movies. I parse these movies in a lot of different ways, collecting parameters, keywords and statistics about each of them. Let's call them keys. I also assign a weight to…

c# sql algorithm data-mining bigdata

asked Feb 07 '14 at 08:42

Magnus Engdal

4 answers

Using adaboost within R's caret package

I've been using the ada R package for a while, and more recently, caret. According to the documentation, caret's train() function should have an option that uses ada. But, caret is puking at me when I use the same syntax that sits within my ada()…

r machine-learning data-mining classification adaboost

asked Oct 11 '13 at 17:04

Bryan

2 answers

How to use Weka for predicting results

I'm new to Weka and I'm confused with the tool. I have a data set about fruit prices and related attributes. I'm trying to predict the specific fruit price using the data set. Since I'm new to Weka, I couldn't figure out how to do this task. Please…

dataset data-mining classification weka prediction

asked Nov 17 '12 at 17:11

Prabodha Dissanayake

1 answer

In scikit-learn, how to deal with data that mixes numerical and nominal values?

I know that the computation in scikit-learn is based on NumPy so everything is a matrix or array. How does this package handle mixed data (numerical and nominal values)? For example, a product could have the attribute 'color' and 'price', where…

python machine-learning scikit-learn data-mining mixed

asked Jul 27 '12 at 15:26

xueliang liu

7 answers

What is data mining from a developer's perspective?

I can find the technical explanation of what data mining is in a book or on Wikipedia, but I'm wondering what sort of development exactly it involves. Is it more about using tools or more about writing tools? Is it really much different from…

data-mining

asked Jul 14 '09 at 08:00

aberrant80

4 answers

Algorithm to handle data aggregation from multiple error-prone sources

I'm aggregating concert listings from several different sources, none of which are both complete and accurate. Some of the data comes from users (such as on last.fm), and may be incorrect. Other data sources are highly accurate, but may not contain…

algorithm data-mining

asked May 25 '11 at 03:14

Matt Green

2 answers

What is stratified bootstrap?

I have learned bootstrap and stratification. But what is stratified bootstrap? And how does it work? Let's say we have a dataset of n instances (observations), and m is the number of classes. How should I divide the dataset, and what's the…

algorithm machine-learning data-mining

asked Feb 10 '16 at 23:44

Kevin217
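
One common reading of the term, sketched here under that assumption: a stratified bootstrap resamples with replacement separately inside each class, so every bootstrap replicate keeps the original class counts. The function name `stratified_bootstrap` and the toy data are hypothetical.

```python
import random
from collections import defaultdict


def stratified_bootstrap(samples, labels, rng=None):
    """Draw one bootstrap replicate, resampling within each class."""
    rng = rng or random.Random()
    by_class = defaultdict(list)
    for sample, label in zip(samples, labels):
        by_class[label].append(sample)

    resampled = []
    for label, members in by_class.items():
        # Sample with replacement, keeping the size of each stratum fixed.
        draws = rng.choices(members, k=len(members))
        resampled.extend((s, label) for s in draws)
    return resampled


# Toy data: n = 10 instances, m = 2 classes with a 3:7 split.
samples = list(range(10))
labels = ["pos"] * 3 + ["neg"] * 7
boot = stratified_bootstrap(samples, labels, rng=random.Random(0))
```

A plain bootstrap over all n instances could by chance draw very few (or no) members of a small class; fixing the per-class sample size avoids that.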

2 answers

Removing "almost duplicate" strings in subquadratic time

I'm trying to do machine learning on a real-life dataset (hotel reviews). Unfortunately, it's plagued by spam, which comes in the form of almost identical reviews, complicating matters for me greatly. I would like to remove "almost duplicates" from…

algorithm data-mining

asked Jan 10 '14 at 15:16

Alexei Averchenko

2 answers

Estimating/Choosing optimal Hyperparameters for DBSCAN

I need to find naturally occurring classes of nouns based on their distribution with different prepositions (like agentive, instrumental, time, place etc.). I tried k-means clustering, but it was of little help: it didn't work well, there was a lot of…

data-mining cluster-analysis dbscan

asked Feb 24 '13 at 09:29

Riyaz

1 answer

No. of hidden layers, units in hidden layers and epochs until a neural network starts behaving acceptably on training data

I am trying to solve this Kaggle problem using neural networks. I am using the PyBrain Python library. It's a classical supervised learning problem. In the following code: the 'data' variable is a NumPy array (892*8). 7 fields are my features and 1 field is my…

machine-learning artificial-intelligence neural-network data-mining pybrain

asked Oct 08 '12 at 05:41

Jack Smith

2 answers

How can I cluster documents using k-means (FLANN with Python)?

I want to cluster documents based on similarity. I have tried ssdeep (similarity hashing), which is very fast, but I was told that k-means is faster and that FLANN is the fastest of all implementations and more accurate, so I am trying FLANN with Python bindings but…

nlp cluster-analysis data-mining k-means text-mining

asked Sep 19 '12 at 14:51

Phyo Arkar Lwin
