Questions tagged [data-mining]

Data mining is the process of analyzing large amounts of data in order to find patterns and commonalities.

Data mining, also known as knowledge discovery, is the process of digging through and analyzing enormous sets of data and then extracting the meaning of the data. Data mining tools such as SQL Server Analysis Services predict behaviors and future trends, allowing businesses to make proactive, knowledge-driven decisions. Data mining tools can answer business questions that traditionally were too time-consuming to resolve. They scour databases for hidden patterns, finding predictive information that experts may miss because it lies outside their expectations. The inputs to learning and mining algorithms are variously called cases, samples, examples, instances, events, or observations.

The fields covered by the machine-learning, artificial-intelligence and statistics tags provide many of the techniques used in data mining, in combination with database technologies for efficiency. Please use the appropriate tag (e.g. machine-learning) to refer to the raw methods.
Cluster analysis (dataclustering) and outlier detection (outliers) are two of the main challenges in data mining.
Wiki Links
Data Mining Introduction

3094 questions

5 answers

Simplest feature selection algorithm

I am trying to create my own simple feature selection algorithm. The data set that I am going to work with is here (very famous data set). Can someone give me a pointer on how to do so? I am planning to write a feature rank algorithm for a text…

algorithm machine-learning data-mining semantic-analysis

asked Mar 07 '11 at 17:10

aherlambang

1 answer

Is DLIB a good open source library for developing my own machine learning algorithms in C++?

Is DLIB a good open source library for developing my own machine learning algorithms in C++? How about other ones, such as libSVM, SHOGUN?

c++ machine-learning data-mining dlib

asked Jan 22 '11 at 19:30

user297850

3 answers

How to get the topic associated with each document using pyspark (2.1.0) LDA?

I am using the LDAModel of pyspark to get topics from a corpus. My goal is to find the topics associated with each document. For that purpose I tried to set topicDistributionCol as per the docs. Since I am new to this, I am not sure what the purpose of this…

pyspark data-mining lda topic-modeling data-processing

asked Jan 31 '17 at 13:09

Hiren patel

1 answer

How to draw a small graph with community structure in networkx

The graph has around 100 nodes, and the number of communities ranges from 5 to 20. Is there any way to draw the graph such that the nodes of the same community are close to each other? I've tried to assign different communities different colors,…

python data-mining networkx graph-visualization

asked Dec 02 '16 at 21:41

user3813057

4 answers

How to choose initial centroids for k-means clustering

I am working on implementing k-means clustering in Python. What is a good way to choose initial centroids for a data set? For instance, I have the following data set: A,1,1 B,2,1 C,4,4 D,4,5 I need to create two different clusters. How do I start…

python cluster-analysis data-mining k-means centroid

asked Mar 12 '16 at 00:15

Clint Whaley
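
One common way to approach the question above is farthest-first seeding: take one point as the first centroid, then repeatedly pick the point farthest from the centroids chosen so far. The sketch below is only an illustration on the four points quoted in the excerpt, not the asker's code or an accepted answer; the function names `farthest_first_centroids` and `assign` are hypothetical, and starting from the first point in the list is an assumption.

```python
import math


def farthest_first_centroids(points, k):
    """Pick k initial centroids: start from the first point (an assumption),
    then repeatedly take the point farthest from all centroids chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        next_point = max(
            points,
            key=lambda p: min(math.dist(p, c) for c in centroids),
        )
        centroids.append(next_point)
    return centroids


def assign(points, centroids):
    """Return, for each point, the index of its nearest centroid."""
    return [
        min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
        for p in points
    ]


# The data set quoted in the question: A,1,1 B,2,1 C,4,4 D,4,5
data = {"A": (1, 1), "B": (2, 1), "C": (4, 4), "D": (4, 5)}
points = list(data.values())

initial = farthest_first_centroids(points, k=2)  # A and D are the seeds
labels = assign(points, initial)                 # A, B -> 0; C, D -> 1
```

A randomized variant of the same idea (k-means++) is often preferred on larger data because a purely farthest-first choice tends to pick outliers as seeds.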

2 answers

DBSCAN for clustering data by location and density

I'm using the method dbscan::dbscan in order to cluster my data by location and density. My data looks like this: str(data) 'data.frame': 4872 obs. of 3 variables: $ price : num ... $ lat : num ... $ lng : num ... Now I'm using…

r machine-learning cluster-analysis data-mining dbscan

asked Jan 25 '16 at 11:54

Paul

2 answers

Anything better than ruby alchemy for extracting keywords?

I've currently written an algorithm in Ruby based on the arc90 readability code to extract an article from a web page. Now that I have the article, I want to extract keywords and specific information from it (names, author, etc) I heard Alchemy was…

ruby rubygems data-mining extract keyword

asked Aug 09 '10 at 19:39

dpigera

10 answers

Hadoop beginners

I'm trying to practice some data mining algorithms using Hadoop. Can I do this with HDFS alone, or do I need to use sub-projects like Hive/HBase/Pig?

hadoop data-mining

asked Jul 19 '10 at 00:18

realnumber

3 answers

Is there a stop word list for Twitter?

I want to do some mining on tweets. Is there a more specific stop word list for tweets, one that removes things such as "lol" and other Twitter smileys?

twitter nlp data-mining

asked Apr 30 '15 at 03:28

陈家泽
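
The idea in the question above can be sketched as an ordinary stop word set extended with tweet-specific tokens, plus a pattern for simple emoticons. This is a minimal illustration only: both word sets below are small hypothetical samples rather than a published list, and the emoticon pattern covers just a few common ASCII forms.

```python
import re

# Tiny illustrative sets; a real list would be much longer (assumption).
BASE_STOP_WORDS = {"a", "an", "the", "is", "to", "and", "of", "on", "in", "i"}
TWEET_STOP_WORDS = {"lol", "rt", "omg", "lmao"}  # hypothetical tweet-specific additions

# URLs and @mentions carry little topical content in most mining tasks.
URL_OR_MENTION = re.compile(r"https?://\S+|@\w+")
# Stand-alone ASCII emoticons such as :) ;-) =D :P
EMOTICON = re.compile(r"(?<!\S)[:;=][-']?[)(DPp](?!\S)")


def clean_tweet(text):
    """Lowercase and tokenize a tweet, dropping URLs, mentions, emoticons
    and every token found in either stop word set."""
    text = URL_OR_MENTION.sub(" ", text)
    text = EMOTICON.sub(" ", text)
    tokens = re.findall(r"[a-z0-9#']+", text.lower())
    stop_words = BASE_STOP_WORDS | TWEET_STOP_WORDS
    return [t for t in tokens if t not in stop_words]


tokens = clean_tweet("lol the new phone is great :) @bob http://t.co/x")
```

Keeping the tweet-specific words in a separate set makes it easy to tune them per data set without touching the general-purpose list.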

1 answer

Implementing Naïve Bayes algorithm in Java - Need some guidance

As a school assignment I'm required to implement the Naïve Bayes algorithm, which I intend to do in Java. In trying to understand how it's done, I've read the book "Data Mining - Practical Machine Learning Tools and Techniques", which has a section…

java algorithm data-mining

asked May 22 '10 at 12:49

ke3pup

1 answer

R: unclear behaviour of tuneRF function (randomForest package)

I am unsure about the meaning of the stepFactor parameter of the tuneRF function, which is used for tuning the mtry parameter that is then passed to the randomForest function. The documentation of tuneRF says that stepFactor is a magnitude by which…

r optimization machine-learning data-mining random-forest

asked Nov 30 '14 at 09:19

Anna Monika

2 answers

Speed-efficient classification in Matlab

I have an RGB image of size uint8(576,720,3) in which I want to classify each pixel into a set of colors. I have transformed it from RGB to LAB space using rgb2lab and then removed the L layer, so it is now a double(576,720,2) consisting of AB. Now, I…

performance matlab machine-learning classification data-mining

asked Nov 18 '14 at 12:25

casparjespersen

4 answers

Sentiment analysis Java library

I have some unlabeled microblogging posts and I want to create a sentiment analysis module. To do this I have tried the Stanford library and the Alchemy API web service, but the results are not very good. For now I don't want to train my own classifier. So I…

java machine-learning data-mining text-mining sentiment-analysis

asked Nov 15 '14 at 18:32

Jimmysnn

5 answers

Which data mining algorithm would you suggest for this particular scenario?

This is not directly a programming question, but it's about selecting the right data mining algorithm. I want to infer the age of people from their first names, from the region they live in, and from whether they have an internet product or not. The…

algorithm data-mining

asked Mar 01 '10 at 16:39

ercan

7 answers

How to tell whether a string is randomly generated or plausibly an English word?

I have a corpus of text which contains some strings. Some of these strings are English words and some are random, such as VmsVKmGMY6eQE4eMI; there is no limit on the number of characters in each string. Is there any way to test whether or not one…

java text data-mining text-mining

asked Feb 11 '14 at 23:21

ikel
