Questions tagged [data-mining]

Data mining is the process of analyzing large amounts of data in order to find patterns and commonalities.

Data mining, also known as knowledge discovery, is the process of digging through and analyzing enormous sets of data and then extracting meaning from that data. Data mining tools, such as SQL Server Analysis Services, predict behaviors and future trends, allowing businesses to make proactive, knowledge-driven decisions. These tools can answer business questions that traditionally were too time-consuming to resolve. They scour databases for hidden patterns, finding predictive information that experts may miss because it lies outside their expectations. The input to a data mining algorithm is variously called cases, samples, examples, instances, events, or observations.

3094 questions
1
vote
1 answer

How to extract useful features from a graph?

The situation is this: I have some graphs like the pictures above, and I am trying to classify them into different kinds so that the shape of a character can be recognized. Here is what I've done: I apply a 2-D FFT to the graphs, so I can get the…
Bob Fang
  • 6,963
  • 10
  • 39
  • 72
1
vote
5 answers

How to get similar texts from a lot of pages?

Get the x most similar texts to a given text from a large collection of texts. Converting each page to plain text first is probably better. The text should not be compared against every other text, because that is too slow.
Bruce Dou
  • 4,673
  • 10
  • 37
  • 56
1
vote
2 answers

What does `locality-sensitive` stand for in locality-sensitive hashing?

What does locality-sensitive stand for in locality-sensitive hashing? Is there a formal definition of this term?
Qbik
  • 5,885
  • 14
  • 62
  • 93
1
vote
2 answers

Cluster text documents in database

I have 20,000 text files loaded into a PostgreSQL database, one file per row, all stored in a table named docs with columns doc_id and doc_content. I know that there are approximately 8 types of documents. Here are my questions: How can I find these…
Tomas Greif
  • 21,685
  • 23
  • 106
  • 155
1
vote
6 answers

How do I data mine text?

Here's the problem. I have a bunch of large text files with paragraphs and paragraphs of written matter. Each para contains references to a few people (names), and documents a few topics (places, objects). How do I data mine this pile to assemble…
Robin Rodricks
  • 110,798
  • 141
  • 398
  • 607
1
vote
2 answers

Input arff file for Weka Apriori

I am trying to do association mining on version history. I have my transaction data in MySQL. The Weka Apriori algorithm requires an ARFF or CSV file in a certain format. It has to have a column for each item. The values will be specified as TRUE or FALSE…
user1239080
  • 61
  • 2
  • 6
1
vote
1 answer

How can I install "Data Mining Add-ins for Office 2007" as part of my setup?

I'm writing a setup program that needs to install the Data Mining Add-ins for Office 2007. 1) How do I detect if it's already installed? 2) If it is not installed, I download and run the MSI (SQLServer2008_DMAddin.msi). But how can I run the Server…
Nestor
  • 13,706
  • 11
  • 78
  • 119
1
vote
1 answer

Extract data from a cube's dimension created from a View

We have imported an SQL View table into a dimension. We already programmed a connector that talks with data cubes (MDX queries). That said, the view we originally imported contains all the raw data we need to query. Problem is, the MDX client…
Tommy Dubé-Leblanc
  • 317
  • 1
  • 7
  • 20
1
vote
1 answer

ELKI COPAC implementation

I tried to run the ELKI COPAC implementation on the example dataset provided on the official site (mouse.csv), but I get a NullPointerException, which leads me to think that there is some detail I am omitting (shame on me). The exception is the…
Gibbster
  • 33
  • 4
1
vote
1 answer

Extract x-axis value using y-axis data in R

I have a time-series dataset in this format:

Time  Val1  Val2
0     0.68  0.39
30    0.08  0.14
35    0.12  0.07
40    0.17  0.28
45    0.35  0.31
50    0.14  0.45
100   1.01  1.31
105   0.40  1.20
110   2.02  0.57
115   1.51  0.58
130…
1
vote
1 answer

How can I globally access a huge dict in each mapper of a Hadoop map-reduce program?

I'm doing a co-occurrence analysis on huge web logs. I have computed the occurrence count for each item and the co-occurrence count for each pair of items using Hadoop. Now I want to compute some correlation measure for a pair, such as…
rudaoshi
  • 53
  • 5
1
vote
1 answer

How to calculate Confidence from Support in Java

Right now I am working on a program that takes a list of users who have rated movies and calculates the support for all movies. I give my program a maximum number of movies I want to calculate, a support minimum, and a confidence minimum. Currently…
Michael Staudt
  • 327
  • 2
  • 5
  • 13
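
For readers skimming this question: the confidence of an association rule A -> B is conventionally defined as support(A ∪ B) / support(A). The following is only a minimal sketch of that relationship, written in Python rather than the asker's Java. The function names, the `ratings` data, and the movie titles are all hypothetical and do not come from the question.

```python
from typing import List, Set


def support(itemset: Set[str], transactions: List[Set[str]]) -> float:
    """Fraction of transactions that contain every item in `itemset`."""
    if not transactions:
        return 0.0
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(antecedent: Set[str], consequent: Set[str],
               transactions: List[Set[str]]) -> float:
    """confidence(A -> B) = support(A | B) / support(A)."""
    base = support(antecedent, transactions)
    if base == 0.0:
        # The antecedent never occurs, so the rule has no evidence;
        # returning 0.0 here is a convention chosen for this sketch.
        return 0.0
    return support(antecedent | consequent, transactions) / base


# Hypothetical data: each set holds the movies one user has rated.
ratings = [
    {"Alien", "Blade Runner"},
    {"Alien", "Blade Runner", "Casablanca"},
    {"Alien"},
    {"Casablanca"},
]

# Rule: users who rated "Alien" also rated "Blade Runner".
rule_confidence = confidence({"Alien"}, {"Blade Runner"}, ratings)
```

Here "Alien" appears in 3 of 4 transactions and the pair appears in 2 of 4, so the rule's confidence is 0.5 / 0.75. A rule would then be kept only if both its support and its confidence meet the chosen minimums.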
1
vote
0 answers

A-close data mining implementation

I need to compare the Apriori and the A-close algorithms on a dataset, so I need implementations of both. I can find implementations of the Apriori algorithm, but I can't find implementations of the A-close algorithm. It saves me lots of…
1
vote
4 answers

Python : DIY generalize this "all_subsets" function to any size subsets

Implementing a toy Apriori algorithm for a small-data association rule mine, I have a need for a function to return all subsets. The length of the subsets is given by parameter i. I need to generalize this function for any i. The cases for i 1 or 2…
Cris Stringfellow
  • 3,714
  • 26
  • 48
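
One common way to produce all subsets of a fixed size i in Python is `itertools.combinations` from the standard library. The sketch below reuses the name `all_subsets` and the parameter `i` from the question; the signature, the deduplication, and the sorted output order are assumptions made for illustration, not the asker's code.

```python
from itertools import combinations
from typing import Iterable, List, Tuple


def all_subsets(items: Iterable[str], i: int) -> List[Tuple[str, ...]]:
    """Return every subset of `items` that has exactly `i` elements."""
    # sorted(set(...)) drops duplicates and makes the output order
    # deterministic; this is a choice made for the sketch.
    unique_items = sorted(set(items))
    return list(combinations(unique_items, i))


# Candidate 2-itemsets for a toy Apriori pass over three items.
pairs = all_subsets(["a", "b", "c"], 2)
# pairs == [("a", "b"), ("a", "c"), ("b", "c")]
```

Asking for a size larger than the number of items yields an empty list, and size 0 yields a single empty tuple.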
1
vote
1 answer

Data-mining algorithm for dynamically consolidating recurring substrings?

I am trying to construct an artificial intelligence unit. I plan to do this by first collecting sensory input ('observations') into a short-term working-memory list, continually forming patterns found in this list ('ideas'), and committing those…