7

Is there any rule of thumb for initializing the num_leaves parameter in LightGBM? For example, for a dataset with 1,000 features, we know that a tree depth of 10 can cover the entire dataset, so we can choose the depth accordingly, and the search space for tuning is also limited.

But in LightGBM, how can we roughly guess this parameter? Otherwise its search space will be pretty large when using grid search.

Any intuition on selecting this parameter would be helpful.

Mischa Lisovyi
Ankish Bansal

1 Answer

8

The best recommendation that I have come across is this awesome summary by Laurae on the LightGBM GitHub. As always, this very much depends on your data.

My personal rule of thumb, based on limited Kaggle experience, is to start by trying values in the range [10, 100]. But if you have a solid heuristic for choosing the tree depth, you can always use it and set num_leaves to 2^tree_depth - 1.
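As a rough illustration of this rule of thumb, the sketch below turns a chosen tree depth into a short list of `num_leaves` values to try. The function name `num_leaves_candidates` and its `low`, `high`, and `step` arguments are hypothetical helpers, not part of LightGBM; the only assumption taken from the text is the [10, 100] starting range and the cap of 2^tree_depth - 1.

```python
def num_leaves_candidates(max_depth, low=10, high=100, step=10):
    """Return (full_tree_leaves, candidates) for a given depth.

    Hypothetical helper, not a LightGBM API. It assumes `max_depth`
    counts levels of splits, so a full binary tree has 2**max_depth leaves.
    """
    full_tree_leaves = 2 ** max_depth
    # Stay inside the [low, high] rule-of-thumb range and below the full tree.
    upper = min(high, full_tree_leaves - 1)
    candidates = list(range(low, upper + 1, step))
    # Always include the upper bound itself as the last candidate.
    if not candidates or candidates[-1] != upper:
        candidates.append(upper)
    return full_tree_leaves, candidates


if __name__ == "__main__":
    for depth in (3, 5, 7):
        print(depth, num_leaves_candidates(depth))
```

Each candidate could then be passed as `num_leaves` (together with `max_depth`) in a grid search, which keeps the search space to a handful of values instead of the full integer range.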

Mischa Lisovyi
  • 3
    why `-1`, why not just `2^tree_depth`? – zyxue Jun 18 '20 at 16:21
  • 1
    A tree of depth one has 1 leaf/node, depth two has (1+2) leaves, depth three has (1+2+4). The rest you get by induction – Mischa Lisovyi Jun 19 '20 at 08:56
  • I think leaf should mean terminal node, so a tree of depth n could have 2^n leaves/terminal nodes and 2^n - 1 non-terminal nodes. https://lightgbm.readthedocs.io/en/latest/Parameters-Tuning.html confirms my understanding. But I'm not sure of the intuition behind 2^n - 1 num_leaves. – zyxue Jun 19 '20 at 13:51
  • I fail to see where the page confirms the assumption that leaves are terminal nodes. The number of leaves in layer n of a tree is 2^(n-1), but this does not relate to `num_leaves` – Mischa Lisovyi Jun 19 '20 at 19:25
  • For "the assumption that leaves are terminal nodes", I'm looking at the line `Theoretically, we can set num_leaves = 2^(max_depth) to obtain the same number of leaves as depth-wise tree.`. I think **a tree of depth one has one split or non-terminal node, and two leaves/terminal nodes**. Saying that a tree of depth one has 1 leaf/node is _NOT_ correct. – zyxue Jun 19 '20 at 21:08
  • 2
    Ah, I think they have not been explicit in that statement. You can see at the end of that paragraph that they actually quote the value 2**7-1 for the example they give. Throughout the docs you will see that the maximum value is odd and not even. One other example is the parameter docs by Laurae: https://sites.google.com/view/lauraepp/parameters -> `Maximum leaves`: "On LightGBM, the maximum leaves must be tuned with the maximum depth together. To get xgboost behavior, set the maximum leaves to 2^depth - 1." You can verify your hypothesis by building a tree of depth 1 and plotting it – Mischa Lisovyi Jun 21 '20 at 06:37
  • 31 is random, see [this issue](https://github.com/microsoft/LightGBM/issues/3177#issuecomment-647206218) – zyxue Jun 22 '20 at 00:51
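
The counting disagreement in the comments above largely comes down to what "depth" means. The minimal sketch below counts terminal and non-terminal nodes of a full binary tree, assuming depth counts levels of splits (a root with no split has depth 0). The function `count_nodes` is a hypothetical helper written for this illustration and does not use LightGBM.

```python
def count_nodes(depth):
    """Return (leaves, internal_nodes) of a full binary tree.

    Assumption: `depth` is the number of split levels, so depth 0 is a
    single unsplit node, which counts as one leaf.
    """
    if depth == 0:
        return 1, 0
    # Both children of a full tree are full trees of depth - 1.
    child_leaves, child_internal = count_nodes(depth - 1)
    leaves = 2 * child_leaves
    internal = 2 * child_internal + 1  # +1 for the split at this node
    return leaves, internal


if __name__ == "__main__":
    for d in range(1, 4):
        leaves, internal = count_nodes(d)
        print(f"depth={d}: leaves={leaves}, internal={internal}")
```

Under this convention a tree of depth n has 2^n leaves and 2^n - 1 splits. If depth instead counts levels of nodes (a lone root has depth 1), the sum 1 + 2 + 4 = 7 = 2^3 - 1 is the total number of nodes in a three-level tree, which matches the other count given in the thread.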