Questions tagged [data-mining]

Data mining is the process of analyzing large amounts of data in order to find patterns and commonalities.

Data mining, also known as knowledge discovery, is the process of digging through and analyzing enormous sets of data and then extracting the meaning of the data. Data mining tools, such as SQL Server Analysis Services, predict behaviors and future trends, allowing businesses to make proactive, knowledge-driven decisions. They can answer business questions that traditionally were too time-consuming to resolve. They scour databases for hidden patterns, finding predictive information that experts may miss because it lies outside their expectations. The inputs to data mining algorithms are variously called cases, samples, examples, instances, events, or observations.

machine-learning, artificial-intelligence and statistics provide many techniques used in data mining, in combination with database technologies for efficiency. Please use the appropriate tag (e.g. machine-learning) to refer to the raw methods.
Cluster analysis (dataclustering) and outlier detection (outliers) are two of the main challenges in data mining.
Wiki Links
Data Mining Introduction

3094 questions

4 answers

Which datamining tool to use?

Can somebody explain to me the main pros and cons of the best-known open-source data mining tools? Everywhere I read that RapidMiner, Weka, Orange, and KNIME are the best ones. Look at this blog post. Can somebody do a fast technical comparison in a small…

comparison weka data-mining rapidminer

asked Jul 25 '16 at 09:58

user2670818

2 answers

Efficient algorithm to group points in clusters by distance between every two points

I am looking for an efficient algorithm for the following problem: given a set of points in 2D space, where each point is defined by its X and Y coordinates, split this set of points into a set of clusters so that if the distance between two…

algorithm machine-learning cluster-analysis data-mining

asked Sep 06 '15 at 21:34

ovk

5 answers

'Similarity' in Data Mining

In the field of data mining, is there a specific sub-discipline called 'Similarity'? If yes, what does it deal with? Any examples, links, or references would be helpful. Also, being new to the field, I would like the community's opinion on how closely…

artificial-intelligence data-mining similarity

asked May 22 '10 at 09:16

Shailesh Tainwala

1 answer

What FFT descriptors should be used as feature to implement classification or clustering algorithm?

I have some sampled geographical trajectories to analyze, and I calculated the histogram of the data in the spatial and temporal dimensions, which yielded a time-domain feature for each spatial element. I want to perform a discrete FFT to transform the…

machine-learning fft data-mining similarity feature-extraction

asked Dec 18 '14 at 12:19

LittleLittleQ

3 answers

How to plot/visualize a C50 decision tree in R?

I am using the C50 decision tree algorithm. I am able to build the tree and get the summaries, but cannot figure out how to plot or visualize the tree. My C50 model is called credit_model. In other decision tree packages, I usually use something like…

r plot visualization data-mining decision-tree

asked Jan 22 '14 at 03:16

mpg

3 answers

Historical weather data from NOAA

I am working on a data mining project and I would like to gather historical weather data. I am able to get historical data through the web interface that they provide at http://www.ncdc.noaa.gov/cdo-web/search. But I would like to access this data…

web-scraping data-mining weather-api

asked Nov 14 '13 at 13:19

azrosen92

2 answers

TFIDF calculating confusion

I found the following code on the internet for calculating TF-IDF: https://github.com/timtrueman/tf-idf/blob/master/tf-idf.py I added "1+" in the function def idf(word, documentList) so I won't get a divide-by-zero error: return…

python data-mining text-processing information-retrieval tf-idf

asked May 20 '13 at 11:33

badc0re
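
The "1+" smoothing mentioned in the TFIDF question above can be sketched as follows. This is a minimal illustration, not the code from the linked repository: the excerpt cuts off before the return expression, so placing the "1 +" in the denominator of the IDF ratio is an assumption, as are the whitespace tokenization and the helper names `tf` and `tf_idf`.

```python
import math


def tf(word, document):
    """Term frequency: share of tokens in `document` equal to `word`."""
    tokens = document.lower().split()
    return tokens.count(word) / len(tokens) if tokens else 0.0


def idf(word, document_list):
    """Smoothed inverse document frequency.

    Assumption: the "1 +" goes in the denominator, which keeps the
    division defined when no document contains `word`.
    """
    containing = sum(1 for doc in document_list if word in doc.lower().split())
    return math.log(len(document_list) / (1 + containing))


def tf_idf(word, document, document_list):
    """Product of term frequency and smoothed IDF."""
    return tf(word, document) * idf(word, document_list)


docs = ["the cat sat", "the dog barked", "a bird sang"]
print(idf("zebra", docs))  # word in no document: log(3 / 1), no ZeroDivisionError
print(idf("the", docs))    # word in 2 of 3 documents: log(3 / 3) = 0.0
```

One side effect of this form of smoothing is that a word occurring in every document gets a slightly negative IDF, since the ratio drops below 1.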

5 answers

Similarity distance measures

I have vectors like this: v1 = {0 0 0 1 1 0 0 1 0 1 1}, v2 = {0 1 1 1 1 1 0 1 0 1 0}, v3 = {0 0 0 0 0 0 0 0 0 0 1}. I need to calculate the similarity between them. The Hamming distance between v1 and v2 is 4, and between v1 and v3 it is also 4. But because I am…

vector data-mining similarity hamming-distance

asked May 11 '13 at 11:29

user1306283
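
The Hamming distances quoted in the question above can be checked with a short sketch. The Jaccard similarity is included only for comparison. It is not part of the excerpt, and it is shown here as one measure that happens to separate the two pairs, not as the asker's intended solution.

```python
def hamming(a, b):
    """Number of positions at which two equal-length vectors differ."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(x != y for x, y in zip(a, b))


def jaccard(a, b):
    """Shared 1-positions divided by positions where either vector has a 1."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return both / either if either else 1.0


v1 = [0, 0, 0, 1, 1, 0, 0, 1, 0, 1, 1]
v2 = [0, 1, 1, 1, 1, 1, 0, 1, 0, 1, 0]
v3 = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1]

print(hamming(v1, v2), hamming(v1, v3))  # 4 4
print(jaccard(v1, v2), jaccard(v1, v3))  # 0.5 0.2
```

Hamming distance counts matching zeros as agreement, which is why both pairs come out equal. Jaccard ignores positions where both vectors are 0.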

4 answers

Splitting data into training/testing datasets in MATLAB?

After some research I found two functions in MATLAB to do the task: the cvpartition function in the Statistics Toolbox and the crossvalind function in the Bioinformatics Toolbox. Now I've used cvpartition to create n-fold cross-validation subsets before,…

matlab data-mining

asked Sep 03 '09 at 07:05

Amro
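
The idea behind the train/test question above can be sketched without any toolbox. This is Python, not MATLAB, and the function names `holdout_split` and `kfold_indices` are hypothetical. They do not reproduce the interfaces of cvpartition or crossvalind and only illustrate a hold-out split and n-fold partitioning over sample indices.

```python
import random


def holdout_split(n_samples, test_fraction=0.3, seed=0):
    """Shuffle indices 0..n_samples-1 and split them into (train, test)."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_test = int(round(n_samples * test_fraction))
    return indices[n_test:], indices[:n_test]


def kfold_indices(n_samples, k=5, seed=0):
    """Yield (train, test) index lists; each index is tested exactly once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, folds[i]


train, test = holdout_split(10, test_fraction=0.3)
print(len(train), len(test))  # 7 3

for fold_train, fold_test in kfold_indices(10, k=5):
    print(sorted(fold_test))
```

Fixing the seed makes the partition reproducible, which helps when comparing models on the same split.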

1 answer

How to perform collaborative filtering in R

I have matrix data containing some null values. To fill in the null values, I'd like to perform collaborative filtering. As I am learning R, I'd like to use R for this. Does anyone know how to perform collaborative filtering in R?

r data-mining collaborative-filtering

asked May 26 '12 at 14:25

Chappy 003

5 answers

When are n-grams (n>3) important as opposed to just bigrams or trigrams?

I am just wondering what the use of n-grams (n>3) (and their occurrence frequency) is, considering the computational overhead in computing them. Are there any applications where bigrams or trigrams are simply not enough? If so, what is the…

nlp data-mining nltk n-gram

asked Apr 23 '12 at 18:20

Legend
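
To make the n-gram question above concrete, here is a minimal sketch of extracting and counting n-grams for several values of n. The sample sentence, the whitespace tokenization, and the helper name `ngrams` are assumptions for illustration. The question itself uses NLTK, which is not used here.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-token windows as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


text = "to be or not to be that is the question"
tokens = text.split()

for n in (2, 3, 4):
    counts = Counter(ngrams(tokens, n))
    # Longer n-grams repeat less often, so their counts get sparser.
    print(n, len(counts), counts.most_common(1))
```

On real corpora the number of distinct n-grams tends to grow quickly with n while each one is seen less often, which is the overhead the question refers to.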

1 answer

OpenNLP Name Finder

I am using the NameFinder API example from the OpenNLP documentation. After initializing the Name Finder, the documentation uses the following code for the input text: for (String document[][] : documents) { for (String[] sentence : document) { Span…

apache nlp data-mining opennlp

asked Apr 16 '12 at 19:33

Chris

2 answers

Combining different similarities to build one final similarity

I'm pretty new to data mining and recommendation systems, and I'm trying to build some kind of recommendation system for users that have these parameters: city, education, interest. To calculate the similarity between them, I'm going to apply cosine similarity and…

cluster-analysis data-mining distance similarity

asked Nov 20 '11 at 13:09

Leg0

3 answers

What are the differences between Dynamic Time Warping and Needleman-Wunsch algorithm?

I am looking for the differences between Dynamic Time Warping and the Needleman-Wunsch algorithm. Basically, they both find an alignment score. I need to calculate an alignment (similarity) score between a short sequence of strings (<20 characters) and…

time-series alignment bioinformatics data-mining

asked Aug 04 '11 at 23:39

iinception

5 answers

Machine learning library for .net analog of Apache Mahout

Are there libraries for .NET like Mahout? What can you recommend for machine learning?

c# java machine-learning data-mining

asked Jun 25 '11 at 09:21

John
