Questions tagged [data-mining]

Data mining is the process of analyzing large amounts of data in order to find patterns and commonalities.

Data mining, also known as knowledge discovery, is the process of digging through and analyzing enormous sets of data and then extracting meaning from them. Data mining tools such as SQL Server Analysis Services predict behaviors and future trends, allowing businesses to make proactive, knowledge-driven decisions. Data mining tools can answer business questions that were traditionally too time-consuming to resolve. They scour databases for hidden patterns, finding predictive information that experts may miss because it lies outside their expectations. The input to data mining algorithms is variously called cases, samples, examples, instances, events, or observations.

Techniques from machine-learning, artificial-intelligence and statistics are widely used in data mining, combined with database technologies for efficiency. Please use the appropriate tag (e.g. machine-learning) for questions about the underlying methods themselves.
Cluster analysis (dataclustering) and outlier detection (outliers) are two of the main challenges in data mining.
Wiki Links
Data Mining Introduction

3094 questions

1 answer

How to extract useful features from a graph?

Things are like this: I have some graphs like the pictures above, and I am trying to classify them into different kinds so that the shape of a character can be recognized. Here is what I've done: I apply a 2-D FFT to the graphs, so I can get the…

matlab computer-vision signal-processing fft data-mining

asked Apr 08 '13 at 14:50

Bob Fang

6,963
10
39
72

5 answers

How to get the most similar texts from a large set of pages?

Get the x texts most similar to a given text from a large collection of texts. Converting each page to plain text first may work better. The solution should not compare the text against every other text, because that is too slow.

algorithm data-mining text-processing similarity

asked Oct 19 '09 at 04:51

Bruce Dou

4,673
10
37
56
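
One common way to avoid comparing a text against every other text is MinHash combined with banded locality-sensitive hashing (LSH). The sketch below is a minimal, stdlib-only illustration under stated assumptions: texts are compared as sets of 3-word shingles (Jaccard similarity), the seeded hash functions are built from `hashlib.blake2b`, and `num_perm=64`, `bands=16` and the names `shingles`, `minhash_signature` and `MinHashLSH` are illustrative choices, not tuned values or an existing library API.

```python
import hashlib
from collections import defaultdict


def shingles(text, k=3):
    """Return the set of k-word shingles (lower-cased, whitespace-split) of a text."""
    words = text.lower().split()
    if len(words) < k:
        return {" ".join(words)}
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}


def minhash_signature(shingle_set, num_perm=64):
    """For each of num_perm seeded hash functions, keep the minimum hash over the shingles."""
    signature = []
    for seed in range(num_perm):
        salt = seed.to_bytes(4, "little")  # blake2b accepts salts of up to 16 bytes
        signature.append(min(
            int.from_bytes(
                hashlib.blake2b(s.encode(), digest_size=8, salt=salt).digest(), "little"
            )
            for s in shingle_set
        ))
    return signature


class MinHashLSH:
    """Banded LSH index: two texts become candidates if their signatures agree on a whole band."""

    def __init__(self, num_perm=64, bands=16):
        if num_perm % bands:
            raise ValueError("num_perm must be divisible by bands")
        self.num_perm, self.bands = num_perm, bands
        self.rows = num_perm // bands
        self.buckets = [defaultdict(set) for _ in range(bands)]
        self.signatures = {}

    def _bands(self, signature):
        for b in range(self.bands):
            yield b, tuple(signature[b * self.rows:(b + 1) * self.rows])

    def add(self, key, text):
        signature = minhash_signature(shingles(text), self.num_perm)
        self.signatures[key] = signature
        for b, band in self._bands(signature):
            self.buckets[b][band].add(key)

    def query(self, text, top_x=5):
        """Return up to top_x (key, estimated Jaccard similarity) pairs, best first."""
        signature = minhash_signature(shingles(text), self.num_perm)
        candidates = set()
        for b, band in self._bands(signature):
            candidates |= self.buckets[b].get(band, set())
        # Only the candidates are scored, instead of comparing against every stored text.
        scored = sorted(
            ((sum(a == c for a, c in zip(signature, self.signatures[k])) / self.num_perm, k)
             for k in candidates),
            reverse=True,
        )
        return [(k, score) for score, k in scored[:top_x]]
```

Usage: call `index.add(key, text)` once per page, then `index.query(text, top_x=x)`. Using more bands with fewer rows each lowers the similarity threshold at which pairs become candidates, at the cost of more candidates to score.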

2 answers

What does `locality-sensitive` stand for in locality-sensitive hashing?

What does locality-sensitive stand for in locality-sensitive hashing? Is there a formal definition of this term?

data-mining discrete-mathematics

asked Apr 05 '13 at 16:47

Qbik

5,885
14
62
93
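
For reference, the standard textbook definition makes "locality-sensitive" precise: nearby points must collide under a randomly drawn hash function with higher probability than distant points. In one common formulation, for a distance d, with MinHash under the Jaccard distance as the classic instance:

```latex
\mathcal{H} \text{ is } (r_1, r_2, p_1, p_2)\text{-sensitive for } d \text{ if, for all } x, y:\quad
\begin{cases}
d(x, y) \le r_1 \;\Rightarrow\; \Pr_{h \sim \mathcal{H}}[h(x) = h(y)] \ge p_1, \\
d(x, y) \ge r_2 \;\Rightarrow\; \Pr_{h \sim \mathcal{H}}[h(x) = h(y)] \le p_2,
\end{cases}
\qquad r_1 < r_2,\; p_1 > p_2.

\Pr[h_{\min}(A) = h_{\min}(B)] = J(A, B) = \frac{|A \cap B|}{|A \cup B|}
\;\Rightarrow\; \text{MinHash is } (r_1, r_2, 1 - r_1, 1 - r_2)\text{-sensitive for } d_J = 1 - J.
```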

2 answers

Cluster text documents in database

I have 20,000 text files loaded into a PostgreSQL database, one file per row, all stored in a table named docs with columns doc_id and doc_content. I know that there are approximately 8 types of documents. Here are my questions: How can I find these…

postgresql data-mining text-mining document-classification

asked Apr 04 '13 at 08:02

Tomas Greif

21,685
23
106
155

6 answers

How do I data mine text?

Here's the problem. I have a bunch of large text files with paragraphs and paragraphs of written matter. Each para contains references to a few people (names), and documents a few topics (places, objects). How do I data mine this pile to assemble…

sorting text data-mining

asked Oct 15 '09 at 21:04

Robin Rodricks

110,798
141
398
607

2 answers

Input ARFF file for Weka Apriori

I am trying to do association mining on version history. I have my transaction data in MySQL. Weka's Apriori algorithm requires an ARFF or CSV file in a certain format. It has to have a column for each item. The values will be specified as TRUE or FALSE…

data-mining weka

asked Mar 28 '13 at 20:57

user1239080
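
A minimal sketch of the reshaping step in Python, assuming the transactions have already been fetched from MySQL as (transaction_id, item) pairs; the query itself is not shown, and the sample rows and the helper name `transactions_to_boolean_csv` are hypothetical. It produces the layout described above: a header with one column per distinct item, and TRUE/FALSE in each transaction row.

```python
import csv
import io
from collections import defaultdict


def transactions_to_boolean_csv(rows, out):
    """Pivot (transaction_id, item) pairs into one CSV column per item, TRUE/FALSE per row."""
    baskets = defaultdict(set)
    for tid, item in rows:
        baskets[tid].add(item)
    items = sorted(set().union(*baskets.values()))
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(items)  # header: one column per distinct item
    for tid in sorted(baskets):
        writer.writerow(["TRUE" if item in baskets[tid] else "FALSE" for item in items])
    return items


# Hypothetical rows, e.g. (commit id, changed file) pairs pulled from the version history.
rows = [(1, "a.c"), (1, "b.c"), (2, "b.c"), (3, "a.c"), (3, "c.h")]
buf = io.StringIO()
transactions_to_boolean_csv(rows, buf)
print(buf.getvalue())
```

For a real file, pass an open file handle instead of `io.StringIO`. Note that Weka's Apriori treats FALSE as an ordinary nominal value, so rules about absent items can crowd out the useful ones; writing `?` (missing) instead of FALSE is a commonly suggested workaround.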

1 answer

How can I install the "Data Mining Add-in for Office 2007" as part of my setup?

I'm writing a setup program that needs to install the Data Mining Add-in for Office 2007. 1) How do I detect whether it's already installed? 2) If it is not installed, I download and run the MSI (SQLServer2008_DMAddin.msi). But how can I run the Server…

sql-server excel data-mining

asked Oct 13 '09 at 14:44

Nestor

13,706
11
78
119

1 answer

Extract data from a cube's dimension created from a View

We have imported an SQL View table into a dimension. We already programmed a connector that talks with data cubes (MDX queries). That said, the view we originally imported contains all the raw data we need to query. Problem is, the MDX client…

tsql ssas mdx data-mining cube-dimension

asked Mar 14 '13 at 20:20

Tommy Dubé-Leblanc

1 answer

ELKI COPAC implementation

I tried to run COPAC ELKI implementation on the example dataset provided on the official site (mouse.csv) but I get a NullPointerException which leads me to think that there is some detail that I omit (shame on me). The exception is the…

algorithm data-mining cluster-analysis elki

asked Mar 12 '13 at 14:22

Gibbster

1 answer

Extract x-axis value using y-axis data in R

I have a time-series dataset in this format:
Time  Val1  Val2
0     0.68  0.39
30    0.08  0.14
35    0.12  0.07
40    0.17  0.28
45    0.35  0.31
50    0.14  0.45
100   1.01  1.31
105   0.40  1.20
110   2.02  0.57
115   1.51  0.58
130…

time-series data-mining missing-data interpolation

asked Mar 08 '13 at 00:24

Khader Shameer
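
The question is about R, but the idea is language-independent. Here is a sketch in Python, assuming the series is piecewise linear between samples: scan consecutive points and, wherever the y-values bracket the target, solve that line segment for x. A non-monotonic series can cross the target several times, so all crossings are returned. The helper name `x_at_y` and the target value 1.0 are hypothetical; the sample values are the Val1 rows from the excerpt.

```python
def x_at_y(xs, ys, target):
    """Return every x where the piecewise-linear curve through (xs, ys) equals target."""
    hits = []
    for (x0, y0), (x1, y1) in zip(zip(xs, ys), zip(xs[1:], ys[1:])):
        if y0 == target:
            hits.append(float(x0))  # exact hit on a sample point
        elif (y0 - target) * (y1 - target) < 0:
            # The segment crosses the target: linear interpolation between the two samples.
            hits.append(x0 + (target - y0) * (x1 - x0) / (y1 - y0))
    if ys and ys[-1] == target:
        hits.append(float(xs[-1]))  # the loop never inspects the last point as y0
    return hits


time = [0, 30, 35, 40, 45, 50, 100, 105, 110, 115]
val1 = [0.68, 0.08, 0.12, 0.17, 0.35, 0.14, 1.01, 0.40, 2.02, 1.51]
print(x_at_y(time, val1, 1.0))
```

For this data the call finds three crossings, roughly at times 99.4, 100.1 and 106.9, because Val1 rises above 1.0, dips below it, and rises again.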

1 answer

How can I globally access a huge dict in each mapper of a Hadoop MapReduce program?

I'm doing a co-occurrence analysis on huge web logs. I have computed the occurrence count for each item, and the co-occurrence count for each pair of items, using Hadoop. Now I want to compute some correlation measure for a pair, such as…

hadoop data-mining

asked Mar 07 '13 at 15:32

rudaoshi

1 answer

How to calculate Confidence from Support in Java

Right now I am working on a program that takes a list of users who have rated movies and calculates the support for all movies. I give my program a maximum number of movies I want to calculate, a support minimum, and a confidence minimum. Currently…

java data-mining apriori

asked Mar 07 '13 at 02:02

Michael Staudt
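
In association-rule mining, confidence is derived directly from support: conf(X ⇒ Y) = supp(X ∪ Y) / supp(X). The question is about Java; the sketch below uses Python only to show the arithmetic, treating each user's set of rated movies as one transaction. The movie titles and the helper names `support` and `confidence` are invented for illustration.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = frozenset(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """conf(X => Y) = supp(X ∪ Y) / supp(X); returns None when supp(X) == 0."""
    s_x = support(antecedent, transactions)
    if s_x == 0:
        return None
    return support(set(antecedent) | set(consequent), transactions) / s_x


# Hypothetical data: each frozenset is the set of movies one user has rated.
users = [
    frozenset({"Alien", "Heat"}),
    frozenset({"Alien", "Heat", "Up"}),
    frozenset({"Alien"}),
    frozenset({"Up"}),
]
print(support({"Alien"}, users))                # 0.75
print(confidence({"Alien"}, {"Heat"}, users))   # 0.5 / 0.75 ≈ 0.667
```

Because supp(X ∪ Y) ≤ supp(X), confidence always lies between 0 and 1, and the supports already computed for frequent itemsets can be reused without rescanning the data.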

0 answers

A-close data mining implementation

I need to compare the Apriori and A-close algorithms on a dataset, so I need implementations of both. I can find implementations of Apriori, but I can't find any implementations of A-close. It saves me lots of…

data-mining implementation apriori

asked Mar 07 '13 at 00:24

Bart Koopmans

4 answers

Python: DIY-generalize this "all_subsets" function to subsets of any size

Implementing a toy Apriori algorithm for small-data association rule mining, I need a function that returns all subsets. The length of the subsets is given by the parameter i. I need to generalize this function for any i. The cases for i = 1 or 2…

python algorithm data-mining nested-loops apriori

asked Mar 02 '13 at 13:06

Cris Stringfellow

3,714
26
48
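
In Python, the standard library already covers the general case: `itertools.combinations(items, i)` yields every size-i combination of the input, so no hand-written loop nesting per size is needed. Below is a sketch of an `all_subsets` helper built on it, plus a recursive choose-or-skip version for the DIY angle. The question's own function signature is not shown, so both signatures here are assumptions.

```python
from itertools import combinations


def all_subsets(items, i):
    """Return all size-i subsets of items as frozensets, via itertools.combinations."""
    return [frozenset(c) for c in combinations(items, i)]


def all_subsets_diy(items, i):
    """Same subsets without itertools: for the first item, either include it or skip it."""
    items = list(items)
    if i == 0:
        return [frozenset()]  # exactly one subset of size 0
    if len(items) < i:
        return []  # not enough items left to fill a subset of size i
    first, rest = items[0], items[1:]
    with_first = [s | {first} for s in all_subsets_diy(rest, i - 1)]
    return with_first + all_subsets_diy(rest, i)


print(sorted(sorted(s) for s in all_subsets("abc", 2)))
```

Returning frozensets makes the subsets hashable, which is convenient for Apriori's pruning step (checking that every (k-1)-subset of a candidate is frequent).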

1 answer

Data-mining algorithm for dynamically consolidating recurring substrings?

I am trying to construct an artificial intelligence unit. I plan to do this by first collecting sensory input ('observations') into a short-term working-memory list, continually forming patterns found in this list ('ideas'), and committing those…

string artificial-intelligence substring data-mining longest-substring

asked Feb 27 '13 at 04:34

Bondolin

2,793
7
34
62
