
I'm creating an array within an array (paragraphs within articles). Then I flatten this into one array. Let's call this ARRAY_COMPARISON_REFERENCE.

Next to that, I am making an array that gives a number to every successive paragraph. The first paragraph of the first article is number one, the second paragraph is number two, the first paragraph of the second article is number three, and so on. Let's call this ARRAY_INFORMATION_REFERENCE.

Every time I find a 50% match of words between two paragraphs, I want to save the text of those paragraphs to a file. I use .flatten to loop through all the articles and use the resulting index numbers as references. However, these do not correspond to the index numbers of ARRAY_INFORMATION_REFERENCE.

How do I translate an index from a flattened two-level array into a regular running reference (every new paragraph += 1)?

paragraphnumber = Array.new
paragraphs = []
Dir.glob("*.txt").each do |textfile|
  # first level: text files
  paragraphtext = []
  File.foreach(textfile, "\.\r") do |paragraph|
    # second level: paragraphs within the text files
    # Here I fill the array; every push through the loop gives the paragraph the next index.
    # THIS IS THE ARRAY_INFORMATION_REFERENCE
    paragraphtext << paragraph
  end
  paragraphs << paragraphtext
end

# here I make the second index:
paragraphs.flatten.each_with_index do |x, indexx|
  paragraphs.flatten.each_with_index do |y, indexy|
    # count the words of x that also occur in y
    count = x.split.count { |word| y.include?(word) }
    if count > 20
      # these are the reference numbers
      index_paragraph1 = "#{indexx}"
      index_paragraph2 = "#{indexy}"
      # Here I try to use the index from the flattened array to look up the
      # information in the first array, which is not working.
      # THIS IS THE ARRAY_COMPARISON_INDEX
      information_paragraph1 = paragraphtext.at(indexx)
      information_paragraph2 = paragraphtext.at(indexy)
    end
  end
end

The problem is: the reference number of a paragraph in ARRAY_COMPARISON_REFERENCE does not correspond to its number in ARRAY_INFORMATION_REFERENCE. Apparently .flatten builds the index in a different way. How do I translate between the two indexes?

Seeb
  • It doesn't look like you're re-initializing `paragraphtext` within your `glob(...).each` block. Is that really true? – Peter Alfvin Dec 19 '13 at 16:14
  • My actual code is much longer and has a lot more elements. But yes, in the actual code there is a paragraphtext = [] AFTER starting glob(...).each. If that's what you mean. – Seeb Dec 19 '13 at 16:19
  • Yeah, that's what I mean. I suggest you add that statement and eliminate the `paragraphtext = Array.new` line altogether, for clarity. – Peter Alfvin Dec 19 '13 at 16:21
  • How do you initialize and increment `paragraafno`? I think it's important to show that. – Peter Alfvin Dec 19 '13 at 16:24
  • Actually, I don't think it is. I used it to reference the number of the paragraph WITHIN an article. In this case, every time I push a new value into the paragraphtext array, the reference number simply increases by 1. Example: p = Array.new | p << a | p << b | p[1] == b – Seeb Dec 19 '13 at 16:37
  • I see you edited the question. I thought you were building up `paragraphnumber` as what you described as `ARRAY_INFORMATION_REFERENCE`. Is that not the case? On a related point, what's the purpose of showing `paragraphnumber` now? – Peter Alfvin Dec 19 '13 at 16:42
  • In any event, if `paragraphnumber` is not `ARRAY_INFORMATION_REFERENCE`, then what _is_? You're asking why the indexes in `ARRAY_INFORMATION_REFERENCE` don't match the indexes you're getting from `each_with_index` after flattening, but you're not showing us how you create `ARRAY_INFORMATION_REFERENCE`. – Peter Alfvin Dec 19 '13 at 16:46
  • Alfvin, first of all: thanks for the attention to my question. Really great. The reason I don't think paragraphnumber is necessary is the following: an array is automatically numbered. The first value gets a '0', the second value gets a '1'. The "ARRAY_INFORMATION_REFERENCE" is therefore automatic: every value in the array gets a number. The third value gets the number 2 as a reference, the fourth the number 3, and so forth. I am trying to use the automatic indexing of an array as the reference for ARRAY_INFORMATION_REFERENCE. However, this does not correspond with ARRAY_COMPARISON_REF – Seeb Dec 19 '13 at 19:33
  • Ok, I see that you've explained in comments what you mean by `ARRAY_INFORMATION_REFERENCE` and `ARRAY_COMPARISON_INDEX`. – Peter Alfvin Dec 19 '13 at 20:14
  • I think your question would be a lot clearer if you eliminated any reference to `ARRAY_INFORMATION_REFERENCE` and `ARRAY_COMPARISON_INDEX` in your question and rephrased it in terms of variables actually used in your code. – Peter Alfvin Dec 19 '13 at 20:19
  • Also, `paragraphnumber` is no longer relevant and should be removed from your sample code. Similarly, there is no apparent reason for including the assignments to `index_paragraph1` or `index_paragraph2`. – Peter Alfvin Dec 19 '13 at 20:21

1 Answer


Since you're not providing all the relevant code and we can't see what is going into paragraphs, we can't know the effect of flatten.

However, we can say that the flattened indices will only match the unflattened indices if flatten has no effect (i.e. if there are no arrays within paragraphs.)
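To make the mismatch concrete, here is a minimal sketch. The sample strings are hypothetical stand-ins for paragraphs read from files, and it assumes each article is a plain array of strings. It shows that a paragraph's index within its own article differs from its index in the flattened array, and that a flat index is only meaningful when used against the flattened array itself.

```ruby
# Hypothetical sample data: two articles with two and three paragraphs.
paragraphs = [
  ["a1 p1", "a1 p2"],
  ["a2 p1", "a2 p2", "a2 p3"]
]

# Flatten once and keep the result around.
flat = paragraphs.flatten

# The first paragraph of the second article sits at index 0 of its own
# sub-array, but at index 2 of the flattened array.
nested_index = paragraphs[1].index("a2 p1")
flat_index = flat.index("a2 p1")

puts "nested index: #{nested_index}, flat index: #{flat_index}"

# A flat index has to be looked up in the flattened array, not in a sub-array.
puts flat[flat_index]
```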

Peter Alfvin
  • I will give my code another look. Apparently, in trimming it down so it's presentable on Stack Overflow, I've lost some elements. The issue is that I want to use the index from paragraphs.flatten to locate the corresponding value in paragraphtext (the array with the information). However, using flatten changes the index. Whereas the first paragraph of the second article could be at index 3 in ARRAY_INFORMATION_REFERENCE, it could be at, for example, 7 in ARRAY_COMPARISON_REFERENCE. I'll try to improve the code in the example. – Seeb Dec 19 '13 at 20:40
  • You say: "flattened indices will only match the unflattened indices if flatten has no effect (i.e. if there are no arrays within paragraphs.)" -- and that's exactly my problem. How then can I refer back to (the array with) all the information? If I know there is a match between two paragraphs, I then want the information out of those paragraphs, but I don't know how. – Seeb Dec 19 '13 at 20:59
  • If you have a non-trivial structure for your paragraphs array (i.e. embedded arrays, hashes, etc.), then you're going to have to navigate and retain the indices of that complicated structure if you want to subsequently index into that structure. Alternatively, you can retain and index into the flattened array. – Peter Alfvin Dec 19 '13 at 21:03
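
The two alternatives from the last comment could be sketched roughly as follows. This is only an illustration under assumptions: the sample data and the `positions` array are hypothetical and not part of the original code, and each article is assumed to be a flat array of strings. Option 1 keeps the flattened array and indexes into it directly; option 2 records, for every flat index, where the paragraph lives in the nested structure.

```ruby
# Hypothetical sample data standing in for the paragraphs read from files.
paragraphs = [
  ["first article, paragraph one", "first article, paragraph two"],
  ["second article, paragraph one"]
]

# Option 1: flatten once, keep the result, and index into that same array.
flat = paragraphs.flatten

# Option 2: record where each flat index came from in the nested structure.
# positions[i] is [article_index, paragraph_index] for flat[i].
positions = paragraphs.each_with_index.flat_map do |article, article_index|
  article.each_index.map { |paragraph_index| [article_index, paragraph_index] }
end

# Both lookups reach the same paragraph text.
flat.each_index do |i|
  article_index, paragraph_index = positions[i]
  same = flat[i] == paragraphs[article_index][paragraph_index]
  puts "flat #{i} -> article #{article_index}, paragraph #{paragraph_index} (match: #{same})"
end
```

If only the paragraph text is needed after a match, option 1 is the simpler choice; option 2 is useful when the output also has to say which article a paragraph came from.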