
I would like to split a data set into multiple data sets of 1000 rows each. How can I do this?

The Row Splitter node has only two outputs. Let me know if there is any way to use a Java Snippet for this requirement.

Danny
  • Welcome to SO, please be a bit more specific when asking a question: what have you tried, what do you expect, etc. See [how to ask](http://stackoverflow.com/help/how-to-ask) – Nehal Jan 22 '16 at 11:50
  • Thanks, Nehal, for your comment! Addition: I have tried to use the Java Snippet Row Splitter to split the rows based on the criterion `return $Value$ <= 1000;`. That splits the entire data set into two parts, but I would like to split it into multiple data sets so that I can cluster them without outliers and plot them in a scatter plot (it does not accept a large number of rows). Can someone suggest a method to multi-split a data set? – Danny Jan 22 '16 at 11:55
  • Please try to add your code and what you have tried so far. – Nehal Jan 22 '16 at 11:58
  • There isn't any code. I have just used the default nodes available in KNIME. – Danny Jan 22 '16 at 12:02
  • When asking at a *programming* website, you should first try to find a *programming* solution yourself, not only "default nodes" in a UI. – Has QUIT--Anony-Mousse Jan 22 '16 at 12:25
  • My problem is which node to use for splitting the data set. This is my first exercise in KNIME, and I do not have a good idea of all the nodes. I am pretty sure that this simple operation could be achieved by tweaking workflow control. Mousse: am I asking in the wrong group? Or could you redirect me to the correct audience? – Danny Jan 22 '16 at 13:28
  • May I know why my query was downvoted? I have explained everything that I did and what I require. – Danny Jan 22 '16 at 14:17
  • @Nehal I think of these visual workflow systems as visual programming languages, so in my opinion asking how to achieve things is probably on-topic without code (the workflow is the code, and it can be described in regular words), as long as there is a description of what he or she has tried. In this case, though, the question could be more specific about how the table should be split. – Gábor Bakos Jan 24 '16 at 20:36
  • Thanks for the explanation, Gabor. I don't have any specific condition for the split. I just need to know how to achieve a multi-split, because the default Row Splitter node has one input and two outputs (it splits into two based on the condition). – Danny Jan 25 '16 at 20:33

1 Answer


It is not entirely clear how you want to split the table, but there are two loop types that might do what you are looking for: Chunk Loop (Start) or Group Loop (Start). Your workflow would probably look like this:

[(Chunk/Group) Loop Start] --> Your processing nodes of the selected rows --> [Loop End]

In the "Your processing nodes of the selected rows" part, each iteration sees only the rows of the current split, not the whole table.

The difference between the two nodes is the following: the Chunk Loop Start node groups rows by their position (consecutive rows belong to the same group until the requested number of rows has been consumed), while the Group Loop Start node collects rows that share the same values in the selected columns into the same group for processing. (The Loop End node might not be the best fit, depending on your processing requirements; in that case look at the other Loop End nodes.)
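To make the distinction concrete, here is a minimal plain-Java sketch of the two splitting strategies. It is only an illustration of the idea, not KNIME API code and not what the loop nodes do internally; the class name `SplitSketch` and the methods `chunkByPosition` and `groupByKey` are hypothetical names chosen for this example.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Plain-Java illustration only; this is not KNIME API code.
final class SplitSketch {

    // Chunk-style split: consecutive rows go into the same part until
    // chunkSize rows have been consumed, then a new part starts.
    static <T> List<List<T>> chunkByPosition(List<T> rows, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<List<T>> parts = new ArrayList<>();
        for (int start = 0; start < rows.size(); start += chunkSize) {
            int end = Math.min(start + chunkSize, rows.size());
            parts.add(new ArrayList<>(rows.subList(start, end)));
        }
        return parts;
    }

    // Group-style split: rows sharing the same key end up in the same part,
    // regardless of where they sit in the table.
    static <T, K> Map<K, List<T>> groupByKey(List<T> rows, Function<T, K> keyOf) {
        Map<K, List<T>> parts = new LinkedHashMap<>();
        for (T row : rows) {
            parts.computeIfAbsent(keyOf.apply(row), k -> new ArrayList<>()).add(row);
        }
        return parts;
    }

    public static void main(String[] args) {
        // Stand-in for a table: 2500 "rows", each just an integer value.
        List<Integer> rows = new ArrayList<>();
        for (int i = 0; i < 2500; i++) {
            rows.add(i);
        }

        // Split by position into parts of at most 1000 rows.
        List<List<Integer>> chunks = chunkByPosition(rows, 1000);
        System.out.println(chunks.size() + " chunks, last has "
                + chunks.get(chunks.size() - 1).size() + " rows");

        // Split by a property of the row (here: the value modulo 3).
        Map<Integer, List<Integer>> groups = groupByKey(rows, v -> v % 3);
        System.out.println(groups.size() + " groups");
    }
}
```

With a chunk size of 1000, the position-based split matches the "data sets of 1000 rows" asked for in the question; the last part simply holds whatever rows remain.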

If these are not sufficient, you might try the parallel chunk loop nodes, or, as far as I remember, there are also bagging, ensemble, and cross-validation (X-Validation) nodes in some extensions. (For more complex workflows you can also use recursive loops.) There is also support for feature elimination.

Gábor Bakos