Questions tagged [data-mining]

Data mining is the process of analyzing large amounts of data in order to find patterns and commonalities.

Data mining, also known as knowledge discovery, is the process of digging through and analyzing enormous sets of data and then extracting meaning from them. Data mining tools such as SQL Server Analysis Services predict behaviors and future trends, allowing businesses to make proactive, knowledge-driven decisions. They can answer business questions that were traditionally too time-consuming to resolve, scouring databases for hidden patterns and finding predictive information that experts may miss because it lies outside their expectations. The inputs to data mining algorithms are variously called cases, samples, examples, instances, events, or observations.

Machine learning, artificial intelligence, and statistics provide many of the techniques used in data mining, in combination with database technologies for efficiency. Please use the appropriate tag (e.g. machine-learning) to refer to the raw methods.
Cluster analysis (dataclustering) and outlier detection (outliers) are two of the main challenges in data mining.
Wiki Links
Data Mining Introduction

3094 questions

0 answers

predictive attributes in WEKA

I am trying to select the best attributes for my training data set, which contains numeric values/attributes. Which attribute evaluator/method would yield the best results for about 10 or so attributes? Training dataset is about 1400 lines of…

data-mining weka

asked Feb 05 '14 at 00:57

user3015045

1 answer

R-convert transaction format dataset to basket format for sequence mining

ORIGINAL TABLE
CELL NUMBER    ACTIVITY    TIME
001            call a      12.23
002            call b      01.00
002            call…

r data-mining arules

asked Feb 01 '14 at 08:24

steven

3 answers

Performance of Frequent Itemset mining

I have implemented the Apriori algorithm for mining frequent itemsets. It works fine for sample data, but when I tried to run it on the retail dataset available at http://fimi.ua.ac.be/data/retail.dat, which is around 3 MB of data with 88k transaction…

algorithm data-mining distributed-system apriori

asked Jan 21 '14 at 15:56

user1276005

1 answer

scikit-learn interpretation of integer variables

I've just started to use scikit-learn after years of data mining with SAS/SPSS products. I'm amazed by the capability of scikit-learn and pandas; however, there is one thing I can't figure out by myself. Let us assume that my training data is built up…

data-mining scikit-learn decision-tree

asked Jan 20 '14 at 16:10

dealah

2 answers

Why are the training and testing files the same in svmlight?

I downloaded SVM-Light for Linux and ran the commands, which produced two executables, svm_learn and svm_classify. Using these, I tried to run an example (it contains train.dat and test.dat files) with the following command: ./svm_learn example1/train.dat…

linux machine-learning data-mining svm svmlight

asked Jan 20 '14 at 05:52

user39133

2 answers

Clustering huge data matrix in python?

I want to cluster 1.5 million chemical compounds. This means having a 1.5 million x 1.5 million distance matrix... I think I can generate such a big table using pyTables, but then, having such a table, how will I cluster it? I guess I can't just pass…

scikit-learn bigdata cluster-analysis data-mining pytables

asked Jan 15 '14 at 12:06

mnowotka (reputation 16,430; badges: 18 gold, 88 silver, 134 bronze)

4 answers

Implementation of k-means clustering algorithm

In my program, I'm taking k=2 for the k-means algorithm, i.e. I want only 2 clusters. I have implemented it in a very simple and straightforward way, but I still can't understand why my program gets into an infinite loop. Can anyone please guide me where…

java algorithm data-mining cluster-analysis k-means

asked Jan 14 '14 at 10:23

chinu

1 answer

How to compute a knee in k-distance plot?

I want to implement an improvement of the DBSCAN algorithm where the user does not need to enter the input parameters (minPts and Eps). My idea is to use the k-distance plot, but what is the best method to compute the 'knee' of this plot? How to count…

cluster-analysis data-mining dbscan

asked Jan 09 '14 at 19:54

user3146344
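
One possible way to approach the question above, sketched in Python under stated assumptions: compute each point's distance to its k-th nearest neighbour, sort those distances in descending order, and take as the knee the value farthest from the straight line joining the first and last values. This is only one common heuristic, not the method DBSCAN itself prescribes, and the brute-force neighbour search is quadratic, so it suits small data only. The function names `k_distances` and `knee_index` are hypothetical.

```python
import math


def k_distances(points, k):
    """Distance from each point to its k-th nearest neighbour, sorted descending."""
    result = []
    for i, p in enumerate(points):
        # Brute force: distances from p to every other point, ascending.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        result.append(dists[k - 1])
    return sorted(result, reverse=True)


def knee_index(values):
    """Index of the value farthest from the chord joining the first and last values."""
    n = len(values)
    x0, y0, x1, y1 = 0, values[0], n - 1, values[-1]
    norm = math.hypot(x1 - x0, y1 - y0)
    best_i, best_d = 0, -1.0
    for i, y in enumerate(values):
        # Perpendicular distance from the plot point (i, y) to the chord.
        d = abs((y1 - y0) * (i - x0) - (x1 - x0) * (y - y0)) / norm
        if d > best_d:
            best_i, best_d = i, d
    return best_i
```

With `kd = k_distances(points, k)`, the value `kd[knee_index(kd)]` could serve as a candidate Eps for minPts = k. It requires at least two points and k smaller than the number of points.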

1 answer

Finding data patterns in sequential Postgresql rows

I'd like to ask Postgres how often two occurrences of an event, one occurrence per row, are seen. For example, if I have user events like: User 1: Clicked button 1, redirected to page 2 User 1: Clicked button 2, redirected to page 3 User 1: Clicked…

sql database postgresql data-mining

asked Jan 08 '14 at 06:28

Carson (reputation 17,073; badges: 19 gold, 66 silver, 87 bronze)

1 answer

Aggregating overlapping "all-previous-events" features from time series data - in Python

My problem is pretty general and can probably be solved in many ways. But what is a smart way considering time and memory? I have time series data of user interactions of the following form: cookie_id interaction --------- ----------- 1234 …

python pandas time-series data-mining data-extraction

asked Jan 08 '14 at 03:17

elgehelge (reputation 2,014; badges: 1 gold, 19 silver, 24 bronze)

2 answers

Pattern mining for item sets of length 2

I am looking for an association mining algorithm that mines frequent itemsets of length 2 only. Is it better to use a database query to compute frequent items when stopping at 2-itemsets?

data-mining apriori

asked Jan 06 '14 at 22:06

user1239080
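
As a rough illustration of what stopping at 2-itemsets involves, the following Python sketch counts frequent pairs in two passes over the transactions. It assumes the transactions fit in memory as lists of items. The function name `frequent_pairs`, the `min_support` parameter (an absolute transaction count), and the toy baskets are all hypothetical and not taken from the question.

```python
from collections import Counter
from itertools import combinations


def frequent_pairs(transactions, min_support):
    """Return {pair: count} for item pairs found in at least min_support transactions."""
    # Pass 1: count single items. By the Apriori property, a frequent pair
    # can only be made of items that are frequent on their own.
    item_counts = Counter(item for t in transactions for item in set(t))
    frequent_items = {i for i, c in item_counts.items() if c >= min_support}

    # Pass 2: count pairs built from frequent items only.
    pair_counts = Counter()
    for t in transactions:
        kept = sorted(set(t) & frequent_items)
        pair_counts.update(combinations(kept, 2))

    return {p: c for p, c in pair_counts.items() if c >= min_support}


# Hypothetical toy data for illustration only.
baskets = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["butter", "milk"],
    ["bread", "butter"],
]
print(frequent_pairs(baskets, min_support=2))
```

The second pass does the same work as a self-join with GROUP BY and HAVING in a database, so whether a query or a dedicated algorithm is preferable likely depends on where the data already lives and how large it is.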

1 answer

SQL To Find Word Pairs/Clusters Between Columns

I have a SQL Server 2012 database with a table that contains questions and answers. Simplified structure is like this: question_id int, question varchar(500), answer varchar(50). I'd like to find word pairs or clusters between the question and…

sql sql-server text data-mining

asked Dec 30 '13 at 01:10

user3144970

4 answers

Intelligent Database - Capable of identifying out of the ordinary values

I am looking for a tool or system to take a look at the database and identify values that are out of the ordinary. I don't need anything to do real time checks, just a system which does processing overnight or at scheduled points. I am looking for a…

sql mysql sql-server database data-mining

asked Jan 17 '10 at 11:56

Pasta (reputation 2,491; badges: 5 gold, 24 silver, 33 bronze)

2 answers

DBSCAN in hadoop

I don't know what the key and value for map() should be, or what the input and output formats should be. If map() reads one point at a time, how can the neighbors be computed from a single point, given that the remaining points are not…

hadoop mapreduce data-mining cluster-analysis dbscan

asked Dec 16 '13 at 12:12

Girjesh

1 answer

Clustering Data in a 3D matrix with another matrix

I have two data cubes represented as 3D matrices. Both have the same dimensions. We have to do rule-based ordering. Our condition is that if any sub-cube of both of them (the sub-cube must match exactly in location and orientation) matches…

algorithm matrix 3d data-mining cluster-analysis

asked Dec 14 '13 at 05:11

infinitum
