Questions tagged [data-mining]

Data mining is the process of analyzing large amounts of data in order to find patterns and commonalities.

Data mining, also known as knowledge discovery, is the process of digging through and analyzing enormous sets of data and then extracting meaning from them. Data mining tools such as SQL Server Analysis Services predict behaviors and future trends, allowing businesses to make proactive, knowledge-driven decisions. These tools can answer business questions that were traditionally too time-consuming to resolve. They scour databases for hidden patterns, finding predictive information that experts may miss because it lies outside their expectations. The inputs to data mining algorithms are variously called cases, samples, examples, instances, events, or observations.

3094 questions
8
votes
3 answers

What techniques/tools are there for discovering common phrases in chunks of text?

Let's say I have 100000 email bodies and 2000 of them contain an arbitrary common string like "the quick brown fox jumps over the lazy dog" or "lorem ipsum dolor sit amet". What techniques could/should I use to "mine" these phrases? I'm not…
JohannesH
  • 6,430
  • 5
  • 37
  • 71
8
votes
7 answers

Application of machine learning

I've seen some machine learning questions on here so I figured I would post a related question: Suppose I have a randomly generated food list which includes an entree, dessert, and a drink. An example would be Chicken, cheesecake, orange juice. The…
ono
  • 2,984
  • 9
  • 43
  • 85
8
votes
2 answers

Handling missing attributes in Naive Bayes classifier

I am writing a Naive Bayes classifier for performing indoor room localization from WiFi signal strength. So far it is working well, but I have some questions about missing features. This occurs frequently because I use WiFi signals, and WiFi access…
stackoverflowuser2010
  • 38,621
  • 48
  • 169
  • 217
8
votes
3 answers

Computing F-measure for clustering

Can anyone help me to calculate the F-measure collectively? I know how to calculate recall and precision, but I don't know how to calculate a single F-measure value for a given algorithm. As an example, suppose my algorithm creates m clusters, but I know…
mahesh cs
  • 337
  • 1
  • 2
  • 12
8
votes
4 answers

Finding the center of a cluster

I have the following problem, made abstract to bring out the key issues. I have 10 points, each of which is some distance from the others. I want to be able to find the center of the cluster, i.e. the point for which the pairwise distance to each other…
Ankur
  • 50,282
  • 110
  • 242
  • 312
8
votes
3 answers

What is the meaning of jitter in visualize tab of weka

In Weka I load an ARFF file. I can view the relationship between attributes using the Visualize tab. However, I can't understand the meaning of the jitter slider. What is its purpose?
Xolve
  • 22,298
  • 21
  • 77
  • 125
8
votes
2 answers

Irregular plot of k-means clustering, outlier removal

Hi, I'm trying to cluster network data from the 1999 DARPA data set. Unfortunately I'm not really getting clustered data, at least not compared to some of the literature, using the same techniques and methods. My data comes out like this: As you…
G Gr
  • 6,030
  • 20
  • 91
  • 184
8
votes
3 answers

Formula for "Relative absolute error" and "Root relative squared error" used in machine learning (as computed by Weka)

In the open source data mining software Weka (written in Java), when I run a data mining algorithm such as linear regression, Weka returns a model and some model evaluation metrics for the test data. It looks like this: Correlation coefficient …
Rasto
  • 17,204
  • 47
  • 154
  • 245
7
votes
2 answers

What is the Time and Space complexity of FP-Growth algorithm?

How do we calculate the time complexity and space complexity of the FP-Growth algorithm in data mining?
Kalyan Manda
  • 71
  • 1
  • 2
7
votes
2 answers

Association mining with large number of small datasets

I have a large number (100-150) of small (approx 1 kbyte) datasets. We will call these the 'good' datasets. I also have a similar number of 'bad' datasets. Now I'm looking for software (or perhaps algorithm(s)) to find rules for what constitutes a…
7
votes
3 answers

Weka GUI - Not enough memory, won't load?

This same installation of Weka has loaded for me in the past. I am simply trying to load the Weka GUI (double click on the icon) and I get the following error. How can I fix it? OutOfMemory Not enough memory. Please load a smaller dataset or use…
Jim
  • 4,509
  • 16
  • 50
  • 80
7
votes
3 answers

Recommendation algorithm (and implementation) for finding similar items and users

I have a database of about 700k users along with items they have watched/listened to/read/bought/etc. I would like to build a recommendation engine that recommends new items based on what users with similar taste in things have enjoyed, as well as…
7
votes
1 answer

Calculating a user's importance or 'Betweenness Centrality' from a user's followers?

I want to know how I can find interesting relationships between user accounts, such as the most connected or most valuable users based on their connections to others. Below I have the two tables I use. One has all the users, the other has the keys…
Xeoncross
  • 55,620
  • 80
  • 262
  • 364
7
votes
2 answers

Data Mining situation

Suppose I have the data as mentioned below. 11AM user1 Brush 11:05AM user1 Prep Breakfast 11:10AM user1 eat Breakfast 11:15AM user1 Take bath 11:30AM user1 Leave for office 12PM user2 Brush 12:05PM user2 Prep Breakfast 12:10PM user2 eat…
user722856
  • 415
  • 1
  • 4
  • 14
7
votes
2 answers

Adding CURE clustering algorithm to WEKA

I have written a Java program to perform CURE clustering. I wish to add this program to Weka as a clustering algorithm and visualize the clustering. Has anyone already implemented it in Weka? Any links to that would be very helpful. How do I…
JSpider
  • 71
  • 2