Questions tagged [data-mining]

Data mining is the process of analyzing large amounts of data in order to find patterns and commonalities.

Data mining, also known as knowledge discovery, is the process of digging through and analyzing enormous sets of data and then extracting the meaning of the data. Data mining tools like SQL Server Analysis Services, predict behaviors and future trends, allowing businesses to make proactive, knowledge-driven decisions. Data mining tools can answer business questions that traditionally were too time consuming to resolve. They scour databases for hidden patterns, finding predictive information that experts may miss because it lies outside their expectations. Input to learning mining algorithms is called cases, samples, examples, instances, events, and observations.

machine-learning, artificial-intelligence and statistics provide many techniques used in data mining, in combination with database technologies for efficiency. Please use the appropriate tag (e.g. machine-learning) to refer to the raw methods.
Cluster analysis (dataclustering) and outlier detection (outliers) are two of the main challenges from data mining.
Wiki Links
Data Mining Introduction

3094 questions

votes

4 answers

Guided mining of common substructures in large set of graphs

I have a large (>1000) set of directed acyclic graphs with a large (>1000) set of vertices each. The vertices are labeled, the label's cardinality is small (< 30) I want to identify (mine) substructures that appear frequently over the whole set of…

algorithm graph data-mining graph-theory

asked Dec 15 '16 at 17:48

user2722968

13,636
2
46
67

votes

3 answers

How to find out if a sentence is a question (interrogative)?

Is there an open source Java library/algorithm for finding if a particular piece of text is a question or not? I am working on a question answering system that needs to analyze if the text input by user is a question. I think the problem can…

java algorithm nlp data-mining text-processing

asked Aug 26 '10 at 09:41

nabeelmukhtar

1,371
15
24

votes

6 answers

Text mining with PHP

I'm doing a project for a college class I'm taking. I'm using PHP to build a simple web app that classify tweets as "positive" (or happy) and "negative" (or sad) based on a set of dictionaries. The algorithm I'm thinking of right now is Naive Bayes…

php nlp data-mining nltk weka

asked May 06 '10 at 17:17

garyc40

votes

1 answer

How to find common phrases in a large body of text

I'm working on a project at the moment where I need to pick out the most common phrases in a huge body of text. For example say we have three sentences like the following: The dog jumped over the woman. The dog jumped into the car. The dog jumped…

data-structures graph data-mining text-analysis

asked Dec 18 '09 at 15:52

benmcredmond

1,702
2
15
22

votes

2 answers

pandas pivot table rename columns

How to rename columns with multiple levels after pandas pivot operation? Here's some code to generate test data: import pandas as pd df = pd.DataFrame({ 'c0': ['A','A','B','C'], 'c01': ['A','A1','B','C'], 'c02': ['b','b','d','c'], …

python pandas pivot pivot-table data-mining

asked Feb 07 '17 at 20:09

muon

12,821
11
69
88

votes

2 answers

scikit-learn: clustering text documents using DBSCAN

I'm tryin to use scikit-learn to cluster text documents. On the whole, I find my way around, but I have my problems with specific issues. Most of the examples I found illustrate clustering using scikit-learn with k-means as clustering algorithm.…

machine-learning scikit-learn cluster-analysis data-mining dbscan

asked Aug 09 '14 at 09:22

Christian

3,239
5
38
79

votes

7 answers

Can k-means clustering do classification?

I want to know whether the k-means clustering algorithm can do classification? If I have done a simple k-means clustering . Assume I have many data , I use k-means clusterings, then get 2 clusters A, B. and the centroid calculating method is…

algorithm cluster-analysis data-mining k-means

asked Mar 10 '14 at 13:00

Sirius Wang

votes

6 answers

Is it ok to define your own cost function for logistic regression?

In least-squares models, the cost function is defined as the square of the difference between the predicted value and the actual value as a function of the input. When we do logistic regression, we change the cost function to be a logarithmic…

machine-learning data-mining regression

asked Aug 28 '12 at 11:01

London guy

27,522
44
121
179

votes

5 answers

How can I perform K-means clustering on time series data?

How can I do K-means clustering of time series data? I understand how this works when the input data is a set of points, but I don't know how to cluster a time series with 1XM, where M is the data length. In particular, I'm not sure how to update…

matlab time-series cluster-analysis data-mining k-means

asked Aug 17 '10 at 14:44

Jaz

votes

3 answers

Better text documents clustering than tf/idf and cosine similarity?

I'm trying to cluster the Twitter stream. I want to put each tweet to a cluster that talk about the same topic. I tried to cluster the stream using an online clustering algorithm with tf/idf and cosine similarity but I found that the results are…

machine-learning data-mining cluster-analysis text-mining

asked Jul 08 '13 at 23:40

Jack Twain

6,273
15
67
107

votes

2 answers

Can an author's unique "literary style" be used to identify him/her as the author of a text?

Let's imagine, I have two English language texts written by the same person. Is it possible to apply some Markov chain algorithm to analyse each: create some kind of fingerprint based on statistical data, and compare fingerprints gotten from…

machine-learning data-mining markov-chains nlp

asked Jan 22 '11 at 23:22

user313885

votes

1 answer

Clustering cosine similarity matrix

A few questions on stackoverflow mention this problem, but I haven't found a concrete solution. I have a square matrix which consists of cosine similarities (values between 0 and 1), for example: | A | B | C | D A | 1.0 | 0.1 | 0.6 | 0.4 B…

python math scikit-learn cluster-analysis data-mining

asked May 06 '15 at 23:58

Stefan D

1,229
2
15
29

votes

5 answers

Machine learning challenge: diagnosing program in java/groovy (datamining, machine learning)

I'm planning to develop program in Java which will provide diagnosis. The data set is divided into two parts one for training and the other for testing. My program should learn to classify from the training data (BTW which contain answer for 30…

java groovy artificial-intelligence machine-learning data-mining

asked Dec 03 '09 at 00:20

Registered User

3,050
5
26
32

votes

4 answers

Best clustering algorithm? (simply explained)

Imagine the following problem: You have a database containing about 20,000 texts in a table called "articles" You want to connect the related ones using a clustering algorithm in order to display related articles together The algorithm should do…

algorithm text cluster-analysis data-mining text-mining

asked May 12 '09 at 14:38

caw

30,999
61
181
291

votes

5 answers

What kind of artificial intelligence jobs are out there?

Throughout my academic years in computer science I fell in love with many aspects of artificial intelligence. From expert systems, neural networks, to data mining (classification). I wonder, if I was to transform this academic passion…

artificial-intelligence neural-network data-mining

asked Apr 03 '09 at 18:53

wsb3383

3,841
12
44
59

Prev 1 2 3

…

99 100 Next