Questions tagged [boosting]

Boosting is a machine learning ensemble meta-algorithm in supervised learning, and a family of machine learning algorithms that convert weak learners into strong ones. Also: boosting is the process of enhancing the relevancy of a document or field.

From [the docs]:

"Boosting" is a machine learning ensemble meta-algorithm primarily for reducing bias, and also variance, in supervised learning, and a family of machine learning algorithms that convert weak learners into strong ones.
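
To make "convert weak learners into strong ones" concrete, here is a minimal sketch of AdaBoost with one-dimensional decision stumps, written in plain Python. The toy dataset and all function names (`fit_stump`, `adaboost`, `predict`, `accuracy`) are made up for illustration and are not taken from any library; real projects would more likely use something like scikit-learn's `AdaBoostClassifier`.

```python
import math

def stump_predict(x, threshold, polarity):
    """Decision stump: `polarity` for x < threshold, `-polarity` otherwise."""
    return polarity if x < threshold else -polarity

def fit_stump(xs, ys, weights):
    """Return (weighted_error, threshold, polarity) of the best stump."""
    values = sorted(set(xs))
    thresholds = [values[0] - 1.0] + [v + 0.5 for v in values]
    best = None
    for t in thresholds:
        for polarity in (1, -1):
            err = sum(w for x, y, w in zip(xs, ys, weights)
                      if stump_predict(x, t, polarity) != y)
            if best is None or err < best[0]:
                best = (err, t, polarity)
    return best

def adaboost(xs, ys, rounds):
    """Train `rounds` stumps, reweighting misclassified points each round."""
    n = len(xs)
    weights = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        err, t, polarity = fit_stump(xs, ys, weights)
        err = min(max(err, 1e-10), 1 - 1e-10)      # avoid log(0) / division by zero
        alpha = 0.5 * math.log((1 - err) / err)    # vote strength of this stump
        ensemble.append((alpha, t, polarity))
        # Increase the weight of mistakes, decrease the weight of correct points.
        weights = [w * math.exp(-alpha * y * stump_predict(x, t, polarity))
                   for x, y, w in zip(xs, ys, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble

def predict(ensemble, x):
    """Weighted majority vote of all stumps."""
    score = sum(alpha * stump_predict(x, t, p) for alpha, t, p in ensemble)
    return 1 if score >= 0 else -1

def accuracy(ensemble, xs, ys):
    return sum(predict(ensemble, x) == y for x, y in zip(xs, ys)) / len(xs)

# Toy data (illustrative only): no single threshold separates the classes.
xs = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
ys = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]

single = adaboost(xs, ys, rounds=1)
boosted = adaboost(xs, ys, rounds=3)
print(accuracy(single, xs, ys))   # 0.7
print(accuracy(boosted, xs, ys))  # 1.0
```

On this toy data a single stump cannot do better than 70% training accuracy, while three reweighted stumps voting together classify every point correctly.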

Also:

From the docs:

Boosting is the process of enhancing the relevancy of a document or field. Field-level mapping lets you define an explicit boost level on a specific field. The boost field mapping (applied on the root object) lets you define a boost field whose content controls the boost level of the document.
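
As a rough illustration of relevancy boosting, the sketch below builds a query body that weights one field more heavily than another using the `field^boost` syntax of a `multi_match` query. This is query-time boosting, not the mapping-level mechanism quoted above, and the field names `title` and `description` as well as the helper `boosted_query` are hypothetical.

```python
import json

def boosted_query(text, field_boosts):
    """Build a multi_match query body where each field carries a ^boost suffix."""
    fields = [f"{name}^{boost}" for name, boost in field_boosts.items()]
    return {"query": {"multi_match": {"query": text, "fields": fields}}}

# Hypothetical fields: matches in `title` count three times as much as in `description`.
body = boosted_query("apple iphone", {"title": 3, "description": 1})
print(json.dumps(body, indent=2))
```

The resulting JSON would be sent as the body of a search request; how the boost translates into final scores depends on the engine's scoring function.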

181 questions
2
votes
2 answers

Classification results depend on random_state?

I want to implement an AdaBoost model using scikit-learn (sklearn). My question is similar to another question, but it is not quite the same. As far as I understand, the random_state variable described in the documentation is for randomly splitting…
kensaii
2
votes
1 answer

Boosting has no effect in a Boolean-filtered query in Elasticsearch

I'm trying to add a boost to documents that match a term filter. The basis is a Boolean/MatchAll query, but the boosting in my Elasticsearch query has no effect. All result scores are set to 1: curl -XPOST…
muwnd
2
votes
1 answer

Gradient Boosting Classifier Loss Function with sklearn - operands could not be broadcast together

I am having a problem with the estimator.loss_ method for the sklearn Gradient Boosting Classifier. I am trying to graph the test error in comparison to the training error over time. Here is some of my data prep: # convert data to numpy array train…
2
votes
1 answer

ElasticSearch - Boost score for fuzzy words

I want to perform a fuzzy search on the user's search words (apple iphone 5s). I want to give the highest score to the first word (apple), a little less to the second, and so on. I started with the query given below, but it is not working as I expected: { "query": { …
Mahesh
1
vote
0 answers

How to correctly choose the number of jobs for estimator and validation?

I have a classification problem to solve and use different classifiers for the task. I use cross_val_score and cross_val_predict for validation and prediction. Both of them, as well as estimators such as LGBMClassifier, support parallelization. I have 46…
Nourless
1
vote
1 answer

Tree based algorithm different behavior with duplicated features

I don't understand why I get three different behaviors depending on the classifier I use, even though they should go hand in hand. Here is the code to dig deeper into the question: from sklearn import datasets from sklearn.ensemble import…
mat
1
vote
1 answer

How to use a GradientBoostingRegressor in scikit-learn with 3 output dimensions

I am trying to map 13-dimensional input data to 3-dimensional output data by using RandomForest and GradientBoostingRegressor of scikit-learn. While for the RandomForest regressor this works fine, I get a ValueError for the GradientBoostingRegressor…
PeterBe
1
vote
1 answer

Elasticsearch: prioritize search results in match query, composite bool query

I have the Elasticsearch query below and I want to set a priority order in my query, irrespective of score. E.g., if I set the priority attack_id > name > description in the match query, then the results should come back in this sorted order …
1
vote
0 answers

XGBoost iterative training: Not having all 0,...,C labels in minibatch without erroring

When training XGBoost iteratively for data too large to fit in memory, one may want to use "batches". The problem is, however, that each batch may not contain all 0,...,C labels. This leads to the error ValueError: The label must consist of integer…
Julian L
1
vote
1 answer

Solr - how to use only the top boost value to rank the result?

In a Solr query with boosting, I would like Solr to use only the top boost value to rank the result, ignoring the secondary score matches. For example: q=field_1=123^100 OR field_2=123^50 OR field_3=123^10 If one document matches two fields,…
David
1
vote
0 answers

Read hyperparameters from lightgbm.basic.Booster object

How do you read the hyperparameters from a lightgbm.basic.Booster object? The object is created from file: model = pickle.load(open(filename, 'rb')) Stuff like n_estimators, boosting_type, learning_rate is not available from model.dump_model().
Endre Moen
1
vote
1 answer

scikit-learn pipeline is not processed correctly with GridSearchCV

I am trying to feed in a dataset with categorical and numerical variables. So I one-hot encode the categorical features and feed them into a pipeline used in GridSearchCV. The error occurs at the last line, when I try to fit the model. My understanding is it…
delalma
1
vote
1 answer

LightGBM: Intent of lightgbm.Dataset()

What is the purpose of lightgbm.Dataset(), as described in the docs, when I can use the sklearn API to feed the data and train a model? Real-world examples explaining the usage of lightgbm.Dataset() would be welcome.
1
vote
1 answer

Fitting Ensemble Regressor within a loop generates repeat values

I'm trying to use an ensemble regressor to predict production based on a couple of material measurements. My data is annual, going back to 1965. (Some details stripped out and random data used because this is for a work project using sensitive…
Bob
1
vote
0 answers

Negative R2 score and bad predictions for my sales prediction problem using LightGBM

My project involves trying to predict the sales quantity for a specific item across a whole year. I've used the LightGBM package for making the predictions. The params I've set for it are as follows: params = { 'nthread': 10, 'max_depth': 5,…