Questions tagged [data-mining]

Data mining is the process of analyzing large amounts of data in order to find patterns and commonalities.

Data mining, also known as knowledge discovery, is the process of digging through and analyzing enormous sets of data and then extracting meaning from them. Data mining tools, such as SQL Server Analysis Services, predict behaviors and future trends, allowing businesses to make proactive, knowledge-driven decisions. They can answer business questions that were traditionally too time-consuming to resolve, scouring databases for hidden patterns and finding predictive information that experts may miss because it lies outside their expectations. Inputs to learning and mining algorithms are variously called cases, samples, examples, instances, events, or observations.

3094 questions
16
votes
10 answers

Datamining open source software alternatives

I am evaluating data mining packages. I have found these two so far: RapidMiner and Weka. Do you have any experience to share with these two products, or any other product to recommend? Thanks
Guillermo Vasconcelos
16
votes
3 answers

Use feedback or reinforcement in machine learning?

I am trying to solve a classification problem. It seems many classical approaches follow a similar paradigm: train a model with some training set and then use it to predict the class labels for new instances. I am wondering if it is…
smwikipedia
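The train-then-predict paradigm described in this question can be sketched in a few lines. This is only an illustration, assuming a 1-nearest-neighbour classifier and made-up toy data; the function names `train` and `predict` are hypothetical and are not taken from the question.

```python
import math

def train(samples, labels):
    """'Training' a 1-nearest-neighbour model just stores the labelled examples."""
    return list(zip(samples, labels))

def predict(model, x):
    """Return the label of the stored example closest to x (Euclidean distance)."""
    _, nearest_label = min(model, key=lambda pair: math.dist(pair[0], x))
    return nearest_label

# Made-up toy data: two well-separated groups in 2-D.
training_samples = [(1.0, 1.0), (1.2, 0.8), (4.0, 4.2), (4.1, 3.9)]
training_labels = ["a", "a", "b", "b"]

model = train(training_samples, training_labels)
print(predict(model, (0.9, 1.1)))  # label for a new, unseen instance
print(predict(model, (3.8, 4.0)))
```

Note that the model is fixed after `train` returns; nothing in this paradigm feeds the quality of later predictions back into the model.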
16
votes
4 answers

java framework for image pattern recognition?

I'm looking for a Java framework to help with some data mining specific to images. We have a set of historical images that I would like to categorize and classify. I was hoping to find something like Weka http://www.cs.waikato.ac.nz/ml/weka/ or…
D.C.
16
votes
2 answers

What are data requirements for FP-Growth in Weka?

I'd like to use the FP-Growth association rule algorithm on my dataset (model) in Weka. Unfortunately, this algorithm is greyed out. What preconditions do I have to meet in order to use it?
ŁukaszBachman
15
votes
6 answers

Load MIT-BIH Arrhythmia ECG database onto MATLAB

I am working on ECG signal processing using a neural network, which involves pattern recognition. As I need to collect all the data from Matlab to use it as a test signal, I am finding it difficult to load it into Matlab. I am using MIT Arrhythmia…
L.fole
15
votes
3 answers

Architecture for database analytics

We have an architecture where we provide each customer with Business Intelligence-like services for their website (internet merchant). Now, I need to analyze that data internally (for algorithmic improvement, performance tracking, etc.) and those are…
David Cournapeau
15
votes
9 answers

Best DataMining Database

I am an occasional Python programmer who has so far only worked with MySQL or SQLite databases. I am the computer person for everything in a small company, and I have started a new project where I think it is about time to try new databases.…
Eric
15
votes
6 answers

What does dimensionality reduction mean?

What does dimensionality reduction mean exactly? I searched for its meaning and only found that it means the transformation of raw data into a more useful form. So what is the benefit of having data in a useful form? I mean, how can I use it in a…
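As one concrete illustration of dimensionality reduction, the sketch below projects made-up 2-D points onto a single axis using principal component analysis (PCA). PCA is only one of many techniques, and it is an assumption here rather than something the question names. The first component is found with plain power iteration instead of a library eigen-solver so that the example has no dependencies; the function names are hypothetical.

```python
import math

def first_principal_component(rows, iterations=100):
    """Return (means, direction) of the first principal component via power iteration."""
    n, dims = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(dims)]
    centered = [[r[j] - means[j] for j in range(dims)] for r in rows]
    # Sample covariance matrix of the centred data.
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(dims)]
           for i in range(dims)]
    v = [1.0] * dims
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(dims)) for i in range(dims)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return means, v

def project(rows, means, direction):
    """Reduce each row to one number: its coordinate along the given direction."""
    return [sum((r[j] - means[j]) * direction[j] for j in range(len(means)))
            for r in rows]

# Made-up points lying roughly along the line y = 2x.
points = [(1, 2.1), (2, 3.9), (3, 6.2), (4, 7.8), (5, 10.1)]
means, direction = first_principal_component(points)
reduced = project(points, means, direction)
print(direction)  # roughly proportional to (1, 2)
print(reduced)    # one value per point instead of two
```

Each point is now described by one number instead of two while keeping most of the variation in the data, which is the practical benefit: fewer features to store, plot, or feed into a later model.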
15
votes
4 answers

Trajectory Clustering: Which Clustering Method?

As a newbie in Machine Learning, I have a set of trajectories that may be of different lengths. I wish to cluster them, because some of them are actually the same path and they just SEEM different due to the noise. In addition, not all of them are…
Sibbs Gambling
14
votes
5 answers

NLP and Machine learning for sentiment analysis

I'm trying to write a program that takes text (an article) as input and outputs the polarity of this text, whether it's a positive or a negative sentiment. I've read extensively about different approaches but I am still confused. I read about many…
14
votes
2 answers

Is there a good way to do this type of mining?

I am trying to find points that are closest in space in X and Y directions (sample dataset given at the end) and am looking to see if there are smarter approaches to do this than my trivial (and untested) approach. The plot of these points in space…
Legend
14
votes
3 answers

Comparing a large number of graphs for isomorphism

I am comparing a large set of networkx graphs for isomorphism, where most of the graphs should not be isomorphic (let's say 0-20% are isomorphic to something in the list, for example). I have tried the following approach. graphs = [] # A list of…
Tristan Maxson
14
votes
1 answer

Issues in getting trigrams using Gensim

I want to get bigrams and trigrams from the example sentences I have mentioned. My code works fine for bigrams. However, it does not capture trigrams in the data (e.g., human computer interaction, which is mentioned in 5 places of my…
user8566323
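For readers unfamiliar with the terms, bigrams and trigrams are contiguous runs of two and three tokens. The sketch below counts them in plain Python on made-up sentences. It does not use Gensim and is not how Gensim's phrase detection scores candidate phrases, so it illustrates the concept only and is not a fix for the question; the helper `ngrams` is hypothetical.

```python
from collections import Counter

def ngrams(tokens, n):
    """Return all contiguous n-token tuples from a list of tokens."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

# Made-up example sentences, not the asker's data.
sentences = [
    "human computer interaction is a research field",
    "she studies human computer interaction",
]

bigram_counts = Counter()
trigram_counts = Counter()
for sentence in sentences:
    tokens = sentence.split()
    bigram_counts.update(ngrams(tokens, 2))
    trigram_counts.update(ngrams(tokens, 3))

print(trigram_counts[("human", "computer", "interaction")])
print(bigram_counts[("human", "computer")])
```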
14
votes
3 answers

Data Mining Operation using SQL Query (Fuzzy Apriori Algorithm) - Coding it using SQL

So I have this table:

Trans_ID  Name  Fuzzy_Value  Total_Item
100       I1    0.33333333   3
100       I2    0.33333333   3
100       I5    0.33333333   3
200       I2    0.5          2
200       I5    0.5          …
Rico
14
votes
4 answers

Find substring in text which has the highest similarity to a given keyword

Say I have this text = "I love apples, kiwis, oranges and bananas" and the searchString = "kiwis and bananas" and a similarity algorithm, say the Jaccard index. How can I efficiently find the substring in text which has the highest similarity to…
pathikrit
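To make the problem concrete, here is a brute-force baseline using the text and search string from the question. It assumes word-level tokens and a Jaccard index over word sets, which the excerpt does not specify, and it checks every contiguous word window, so it is the naive approach rather than the efficient one being asked for. The names `jaccard` and `best_window` are hypothetical.

```python
import re

def jaccard(a, b):
    """Jaccard index of two token collections, treated as sets."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def best_window(text, query):
    """Try every contiguous word window and keep the first one with the top score."""
    words = re.findall(r"\w+", text.lower())
    target = re.findall(r"\w+", query.lower())
    best_score, best_span = 0.0, []
    for start in range(len(words)):
        for end in range(start + 1, len(words) + 1):
            score = jaccard(words[start:end], target)
            if score > best_score:
                best_score, best_span = score, words[start:end]
    return " ".join(best_span), best_score

text = "I love apples, kiwis, oranges and bananas"
search_string = "kiwis and bananas"
print(best_window(text, search_string))  # ('kiwis oranges and bananas', 0.75)
```

With n words there are on the order of n² windows and each one is scored from scratch, which is why a smarter method is needed for long texts.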