I have created a term-document matrix using the R `tm` package, converted it to a data frame, and exported it to a CSV file.

Sample portion of the term document matrix:

                1   10  12  14  15  16  17
century         0   4   0   0   1   5   3
pete            0   2   0   6   1   0   0
additive        2   0   0   0   0   0   0
administration  1   5   3   0   3   0   0
administration  1   0   0   0   0   0   5
administrator   0   0   0   0   0   0   0
aeronautical    3   0   0   45  5   0   0
agency          0   0   5   0   0   0   0
amateur         0   0   6   0   0   0   0
anchor          5   0   1   0   0   6   0
basic           0   0   0   0   0   0   0
charles         0   0   6   0   0   0   0
commercial      0   6   0   0   0   4   0
commercial      0   0   0   0   0   2   0
commission      0   0   3   7   2   0   0
committee       0   4   0   0   1   5   3
compelling      0   2   7   6   1   0   0
construction    2   0   0   0   0   0   0
controlled      1   5   6   0   3   0   0
cooperating     1   0   0   0   0   0   5
cost            0   0   0   0   0   0   0
crewmember      3   0   0   45  0   0   0
depressed       0   0   0   0   0   0   0
developer       0   0   8   0   0   0   0
development     5   0   0   0   0   0   0
development     0   0   0   0   0   0   0
direct          0   0   0   0   0   0   0

How can I convert it into a table like the one below, which lists each title (document number) together with only the terms that actually occur in it, for further analysis in Tableau?

Title   term            freq
1       additive        2
1       administration  1
1       administration  1
1       aeronautical    3
1       anchor          5
1       construction    2
1       controlled      1
1       cooperating     1
1       crewmember      3
1       development     5
10      century         4
10      pete            2
10      administration  5
10      commercial      6
10      committee       4
10      compelling      2
10      controlled      5
12      administration  3
12      agency          5
12      amateur         6
12      anchor          1
12      charles         6
12      commission      3
12      compelling      7
12      controlled      6
12      developer       8
.   ... ..
.   ... ..
.   ... ..
.   ... ..
.   ... ..
koder
  • This seems like a case for `melt` in package `reshape2`, i.e. reshaping data from 'wide' to 'long' format. There is a plethora of posts on this topic on SO. – Henrik Apr 01 '14 at 09:17
  • To me that looks like pandas. I don't understand the format of your original data. Could you explain it further? What is the header of your data? – Tengis Apr 01 '14 at 09:20
  • @Henrik I'll look into the reshape threads. Thanks! @Tengis This is a term-document matrix generated by the R text mining (tm) package from a corpus of text documents. I'm not sure whether it has a header; 1, 10, 12, 14 etc. are document numbers. – koder Apr 01 '14 at 09:37
  • melt did the trick! Thanks a ton Henrik! – koder Apr 15 '14 at 08:52
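
For later readers, the `melt` suggestion from the comments could look roughly like the sketch below. It is not the asker's actual code: the small `tdm` data frame is a made-up stand-in for the exported CSV, and the column names `term`, `Title` and `freq` are assumptions chosen to match the desired output in the question.

```r
library(reshape2)

# Hypothetical stand-in for the exported CSV (assumption, not the real data):
# one row per term, one column per document number.
tdm <- data.frame(
  term = c("century", "pete", "additive", "agency"),
  `1`  = c(0, 0, 2, 0),
  `10` = c(4, 2, 0, 0),
  `12` = c(0, 0, 0, 5),
  check.names = FALSE,       # keep "1", "10", "12" as column names
  stringsAsFactors = FALSE
)

# Wide -> long: one row per (term, document) pair.
long <- melt(tdm, id.vars = "term",
             variable.name = "Title", value.name = "freq")

# Keep only terms that occur in a document, in Title/term/freq column order.
long <- long[long$freq > 0, c("Title", "term", "freq")]
long <- long[order(long$Title), ]
rownames(long) <- NULL

print(long)
```

When loading a real export with `read.csv`, passing `check.names = FALSE` keeps the document numbers as column names instead of turning them into `X1`, `X10`, and so on. The resulting long table can then be written back out with `write.csv(long, ..., row.names = FALSE)` for Tableau.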
