Questions tagged [data-mining]

Data mining is the process of analyzing large amounts of data in order to find patterns and commonalities.

Data mining, also known as knowledge discovery, is the process of digging through and analyzing enormous sets of data and then extracting meaning from them. Data mining tools such as SQL Server Analysis Services predict behaviors and future trends, allowing businesses to make proactive, knowledge-driven decisions. Data mining tools can answer business questions that traditionally were too time-consuming to resolve. They scour databases for hidden patterns, finding predictive information that experts may miss because it lies outside their expectations. The inputs to data mining algorithms are variously called cases, samples, examples, instances, events, or observations.

3094 questions
10 votes, 1 answer

Time Series Breakout/Change/Disturbance Detection in R: strucchange, changepoint, BreakoutDetection, bfast, and more

I would like this to become a signpost for the various time series breakout/change/disturbance detection methods in R. My question is: what are the motivations for and differences between the approaches of each of the following packages? That is, when does…
JasonAizkalns
10 votes, 4 answers

In scikit-learn, can DBSCAN use a sparse matrix?

I got a MemoryError when running scikit-learn's DBSCAN algorithm. My data is a binary matrix of about 20000 × 10000. (Maybe DBSCAN is not suitable for such a matrix. I'm a beginner in machine learning. I just want to find a clustering method…
10 votes, 4 answers

Expectation Maximization coin toss examples

I've been self-studying Expectation Maximization lately and picked up some simple examples in the process: http://cs.dartmouth.edu/~cs104/CS104_11.04.22.pdf There are 3 coins, 0, 1 and 2, with probabilities P0, P1 and P2 of landing on heads when…
10 votes, 1 answer

Comparing sentences according to their meaning

Python has the NLTK library, which is a vast resource of texts and corpora, along with a slew of text mining and processing methods. Is there any way to compare sentences based on the meaning they convey for a possible match? That is, an…
SexyBeast
10 votes, 2 answers

How to group nearby latitude and longitude locations stored in SQL

I'm trying to analyse data from cycle accidents in the UK to find statistical black spots. Here is an example of the data from another website: http://www.cycleinjury.co.uk/map I am currently using SQLite to store ~100k lat/lon locations. I want…
Robert
10 votes, 2 answers

Large-scale data mining with Clojure

I'm looking for a good reference on large-scale data mining with Clojure. I know of many good Clojure programming books (Programming Clojure, Joy of Clojure, ...), and many good data mining textbooks (Mining of Massive Datasets, Managing Gigabytes,…
user1383359
10 votes, 5 answers

Large-scale clustering library, possibly with Python bindings

I've been trying to cluster a larger dataset consisting of 50000 measurement vectors of dimension 7. I'm trying to generate about 30 to 300 clusters for further processing. I've been trying the following clustering implementations with no…
tisch
9 votes, 4 answers

Check if one regex covers another regex

I'm attempting to implement a text clustering algorithm. The algorithm clusters similar lines of raw text by replacing them with regexes, and aggregates the number of patterns matching each regex so as to provide a neat summary of the input text…
Kowshik
9 votes, 5 answers

Clustering algorithm with discrete and continuous attributes?

Does anyone know a good algorithm for clustering on both discrete and continuous attributes? I am working on a problem of identifying a group of similar customers, and each customer has both discrete and continuous attributes (Think type of…
Matt W
9 votes, 1 answer

Clustering and MATLAB

I'm trying to cluster some data from the KDD Cup 1999 dataset. The output from the file looks like…
G Gr
9 votes, 1 answer

The relationship between latent Dirichlet allocation and document clustering

I would like to clarify the relationship between latent Dirichlet allocation (LDA) and the generic task of document clustering. The LDA analysis tends to output the topic proportions for each document. If my understanding is correct, this is not…
user785099
9 votes, 2 answers

Java text clustering library

Which of the Java data mining libraries can do text clustering?
bme
9 votes, 1 answer

How to perform clustering on Word2Vec

I have a semi-structured dataset in which each row pertains to a single user:

id, skills
0,"java, python, sql"
1,"java, python, spark, html"
2, "business management, communication"

It is semi-structured because the following skills can only be selected…
Ivan
9 votes, 3 answers

Can someone please explain data mining, SSIS, BI, ETL and other related technologies?

I was talking with a co-worker yesterday regarding a situation where he used SSIS (or something like that) to do some really cool thing with an SSIS Package where he passed in a name like "Dr. Reginald Williams, PhD." and based on some weighting…
Micah
9 votes, 3 answers

Apriori algorithm: anti-monotonic vs. monotonic

According to Wikipedia, a monotonic function is a function that is either increasing or decreasing. If a function is increasing and decreasing then it's not a monotonic function or it's an anti-monotonic function. But the data mining book, "Data…
Mohamed Horani