Questions tagged [data-mining]

Data mining is the process of analyzing large amounts of data in order to find patterns and commonalities.

Data mining, also known as knowledge discovery, is the process of digging through and analyzing enormous sets of data and then extracting meaning from them. Data mining tools such as SQL Server Analysis Services predict behaviors and future trends, allowing businesses to make proactive, knowledge-driven decisions. They can answer business questions that were traditionally too time-consuming to resolve. They scour databases for hidden patterns, finding predictive information that experts may miss because it lies outside their expectations. The inputs to data mining algorithms are variously called cases, samples, examples, instances, events, or observations.

The fields of machine-learning, artificial-intelligence and statistics provide many of the techniques used in data mining, in combination with database technologies for efficiency. Please use the appropriate tag (e.g. machine-learning) to refer to the raw methods.
Cluster analysis (dataclustering) and outlier detection (outliers) are two of the main challenges in data mining.

3094 questions

2 answers

Understanding Partition density of partitioned network

I am implementing the Link Communities community detection algorithm. I have trouble understanding the explanation of partition density described in the paper. Here is only the part defining partition density: I cannot find the connection between…

data-mining cluster-analysis hierarchical-clustering

asked Feb 25 '13 at 20:54

hendrix

3,364
8
31
46
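
A minimal sketch of how partition density is usually computed for a link partition, assuming the standard definition from the link communities literature: for a community with m_c links touching n_c nodes, D_c = (m_c - (n_c - 1)) / (n_c(n_c - 1)/2 - (n_c - 1)), with D_c = 0 when n_c = 2, and the overall density is the link-weighted average D = (1/M) Σ m_c D_c. The function name and the input format (a list of edge lists) are assumptions made for illustration, not taken from the question.

```python
def partition_density(communities):
    """Partition density D of a link partition.

    `communities` is a list of link communities, each given as a list of
    (node_u, node_v) edges. This input format is an assumption.
    """
    total_links = sum(len(edges) for edges in communities)
    if total_links == 0:
        return 0.0
    weighted_sum = 0.0
    for edges in communities:
        m_c = len(edges)                                   # links in the community
        n_c = len({node for edge in edges for node in edge})  # nodes they touch
        if n_c <= 2:
            continue                                       # D_c is defined as 0 here
        min_links = n_c - 1                                # a spanning tree
        max_links = n_c * (n_c - 1) / 2                    # a clique
        d_c = (m_c - min_links) / (max_links - min_links)
        weighted_sum += m_c * d_c
    return weighted_sum / total_links
```

Under this definition a community that is a tree contributes 0 and a clique contributes 1, so D measures how far each community's links exceed the minimum needed to keep its nodes connected.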

0 answers

Declarative Data Mining: Frequent Itemset Tiling

For a course in my Computer Science studies, I have to come up with a set of constraints and a score-definition to find a tiling for frequent itemset mining. The matrix with the data consists of ones and zeroes. My task is to come up with a set of…

data-mining declarative constraint-programming

asked Feb 19 '13 at 17:12

Aäron Verachtert

1 answer

Do we need to normalize input segment of training set only?

I want to know whether the required data normalization must be applied to the whole training set, both input and output, or whether normalizing the input segment is enough.

neural-network normalization data-mining backpropagation

asked Feb 15 '13 at 07:36

saeed sheikholeslami
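
As an illustration of the mechanics behind this question, here is a small sketch of min-max scaling where the statistics are learned from the training set and can be applied to inputs, to targets, or to both. Whether the targets need scaling depends on the network (for example, on the range of the output activation), so this is one common setup under stated assumptions and not a definitive answer. The class name and the sample numbers are hypothetical.

```python
class MinMaxScaler:
    """Scale each column to [0, 1] using statistics learned from training data."""

    def fit(self, rows):
        columns = list(zip(*rows))
        self.lo = [min(c) for c in columns]
        self.hi = [max(c) for c in columns]
        return self

    def transform(self, rows):
        # Constant columns (hi == lo) are mapped to 0.0 to avoid dividing by zero.
        return [[(v - lo) / (hi - lo) if hi > lo else 0.0
                 for v, lo, hi in zip(row, self.lo, self.hi)]
                for row in rows]

    def inverse_transform(self, rows):
        # Map scaled values (e.g. network outputs) back to the original units.
        return [[v * (hi - lo) + lo for v, lo, hi in zip(row, self.lo, self.hi)]
                for row in rows]


# Hypothetical training data: two input features, one target.
train_x = [[1.0, 200.0], [3.0, 400.0], [5.0, 600.0]]
train_y = [[10.0], [20.0], [30.0]]

x_scaler = MinMaxScaler().fit(train_x)
y_scaler = MinMaxScaler().fit(train_y)   # optional: only if targets are scaled too

scaled_x = x_scaler.transform(train_x)
scaled_y = y_scaler.transform(train_y)
```

If the targets are scaled, predictions have to be passed through `inverse_transform` before they are compared with values in the original units.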

2 answers

Data mining: Apriori issue. Min-support

I wrote an Apriori data mining algorithm. It works well on small test data, but I am having issues running it on bigger data sets. I am trying to generate rules for items which were frequently bought together. My small test data is 5 transactions and 10…

algorithm data-mining apriori

asked Feb 13 '13 at 19:51

John Latham
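
For context on the min-support part of this question, a compact sketch of level-wise Apriori itemset mining, assuming min-support is a fraction of all transactions. It is not the asker's implementation; the function name and the sample baskets in the usage note are hypothetical.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return {itemset: support} for itemsets with relative support >= min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)
    # Level 1 candidates: every single item that occurs anywhere.
    candidates = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while candidates:
        # Count how many transactions contain each candidate.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        level = {c: cnt / n for c, cnt in counts.items() if cnt / n >= min_support}
        frequent.update(level)
        k += 1
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        keys = list(level)
        candidates = {a | b for a, b in combinations(keys, 2) if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in level for s in combinations(c, k - 1))}
    return frequent
```

Calling `frequent_itemsets([["bread", "milk"], ["bread", "butter"], ["milk"]], 0.5)` keeps only `bread` and `milk`. A threshold that is sensible for a handful of transactions can be far too low for a large data set, because the number of candidates generated at each level grows quickly as min-support drops.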

3 answers

alternative similarity measure in DBSCAN?

I test my image set on the DBSCAN algorithm in the scikit-learn Python module. There are alternatives for similarity computing:

    # Compute similarities
    D = distance.squareform(distance.pdist(X))
    S = 1 - (D / np.max(D))

A weighted measure or something like…

python scikit-learn cluster-analysis data-mining dbscan

asked Feb 13 '13 at 13:03

postgres

2,242
5
34
50

0 answers

What's a good way of storing R models for future scoring

Let's say I run random forest or kmeans. I get an R object. Now I want to save that model for future use. I thought PMML was a good format but then realized that R can't read PMML and turn it back into an object that can be used for scoring. It can…

r data-mining

asked Jan 30 '13 at 02:01

user1827975

1 answer

How much mxRealloc can affect a C-Mex matlab code?

For the past few days I have been working on C-MEX code in order to speed up DBSCAN MATLAB code. I have now finished a DBSCAN implementation in C-MEX, but it takes more time (14.64 seconds in MATLAB, 53.39 seconds in C-MEX) with my test data…

c matlab data-mining cluster-analysis dbscan

asked Jan 29 '13 at 20:48

mrDataos

3 answers

Writing a large number of queries to a text file

I have a list of about 200,000 entities, and I need to query a specific RESTful API for each of those entities, and end up with all the 200,000 entities saved in JSON format in txt files. The naive way of doing it is going through the list of the…

python rest data-mining

asked Jan 14 '13 at 05:12

leonsas

4,718
6
43
70
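
One possible alternative to a purely sequential loop for the task described above is to overlap the network waits with a thread pool and write one JSON object per line. This is a sketch under assumptions: `fetch` stands in for whatever function performs the actual API request (it is not shown here and is hypothetical), and the output is a single text file.

```python
import json
from concurrent.futures import ThreadPoolExecutor


def save_entities(entity_ids, fetch, out_path, max_workers=8):
    """Fetch every entity and write one JSON object per line to `out_path`.

    `fetch(entity_id) -> dict` is a caller-supplied function that performs
    the API request; it is an assumption of this sketch.
    """
    written = 0
    with ThreadPoolExecutor(max_workers=max_workers) as pool, \
            open(out_path, "w", encoding="utf-8") as out:
        # pool.map runs fetch concurrently but yields results in input order,
        # so only the main thread ever writes to the file.
        for record in pool.map(fetch, entity_ids):
            out.write(json.dumps(record) + "\n")
            written += 1
    return written
```

Keeping all file writes in the main thread avoids interleaved lines without any locking. Rate limits and retries for failed requests are not handled here and would need to live inside `fetch`.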

1 answer

Decision Tree - Sparse dataset

I have a very sparse dataset with a huge number of attributes (~12K features and 700K records), and I cannot fit it in memory (attribute values are binomial, i.e. True/False). As it is sparse, I keep the dataset in (ID, Feature) format, so for example I…

machine-learning data-mining decision-tree

asked Jan 05 '13 at 21:30

Arian

7,397
21
89
177

1 answer

How to make weka treat empty strings as 0

I'm using weka for clustering binary data. Note that I use weka directly through the API or the source code. My data input is a huge .csv file, for example:

    attrib1, attrib2, atrib3
    0,1,0
    1,0,1
    0,0,1

But in order to reduce the .csv size the data…

null boolean data-mining cluster-analysis weka

asked Jan 05 '13 at 18:13

Flo

1,367
1
13
27

1 answer

Changing feature value type in RapidMiner

I have a dataset with many attributes (2k), of which a few (about 10) are not binary and the rest are binary (0,1). I want to change the value type of these binary attributes from integer to binomial. As the names of the features are not fixed, I…

machine-learning data-mining rapidminer

asked Dec 29 '12 at 22:08

Arian

7,397
21
89
177

1 answer

How can I use the rule-based learning algorithms for this example

I want to do predictive learning on which features people find attractive in a model when purchasing clothes online. I have data as follows: COLORofCLOTHING MODELHAIR_COLOR MODEL_BUILD SELLER_CATEGORY Red …

machine-learning data-mining weka

asked Dec 21 '12 at 03:39

ExceptionHandler

1 answer

Can SAS Enterprise Miner - Cluster node - take coordinate matrix as input?

I am using SAS proc distance to create a distance matrix. I wanted to know whether the SAS EM cluster node can use this matrix to perform k-means clustering.

machine-learning sas cluster-analysis data-mining k-means

asked Dec 09 '12 at 18:44

user1889930

1 answer

Create Edge List From Ragged Data Frame in R (for network analysis)

I have a ragged data frame with each row as an occurrence in time of one or more entities, like so:

    (time1) entitya entityf entityz
    (time2) entityg entityh
    (time3) entityo entityp entityk entityL
    (time4) entityM

I want to create an edge list for…

r social-networking data-mining

asked Dec 08 '12 at 21:30

Olga Mu

2 answers

Fast and scalable similarity detection

I have a large PostgreSQL database containing documents. Every document is represented as a row in a table. When a new document is added to the database, I need to check for duplicates. But I can't just use a select to find an exact match. Two documents can vary…

data-mining inverted-index minhash

asked Dec 04 '12 at 11:13

Evgeny Lazin

9,193
6
47
83
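
Since this question is tagged minhash, here is a minimal sketch of the MinHash idea for near-duplicate detection: each document is reduced to a set of character shingles, and a short signature of per-hash minima estimates the Jaccard similarity of two shingle sets. The function names, the shingle length and the number of hash functions are assumptions chosen for illustration, and the database side (storing and indexing signatures) is not covered.

```python
import hashlib


def shingles(text, k=5):
    """Set of overlapping k-character substrings of whitespace-normalised text."""
    text = " ".join(text.lower().split())
    return {text[i:i + k] for i in range(max(1, len(text) - k + 1))}


def minhash_signature(items, num_hashes=64):
    """For each of num_hashes salted hash functions, keep the minimum hash value."""
    signature = []
    for seed in range(num_hashes):
        salt = seed.to_bytes(8, "little")  # a different salt gives a different hash function
        signature.append(min(
            int.from_bytes(
                hashlib.blake2b(s.encode("utf-8"), digest_size=8, salt=salt).digest(),
                "big")
            for s in items))
    return signature


def estimate_similarity(sig_a, sig_b):
    """Fraction of matching positions, an estimate of the Jaccard similarity."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)
```

Two documents whose signatures agree in most positions are likely near-duplicates and can then be compared in full. More hash functions give a tighter estimate at the cost of a longer signature.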
