
2014-09-20 19:29:47 UTC

Any help someone might be able to provide with this error would be greatly

appreciated.

I was trying to familiarize myself with sklearn's decision tree source code by playing around with some of the lower-level classes and functions on a portion of the iris data set. I'm trying to build a single tree using BestFirstTreeBuilder, using code I borrowed from tree.py. The last line in my code raises the error:

builder.build(tree_, x, y)

The full error is:

Exception MemoryError: MemoryError() in 'sklearn.tree._tree.Tree._resize'

ignored

I've attached the full code I'm using (one page) and also pasted it below.

The version of sklearn being used is the pip-installed version rather than the most recent version on GitHub. I'm also trying this on Windows.

Anyone recognize this error?

Best, and thanks so much!

Ken

import numpy as np
from sklearn.tree._tree import BestFirstTreeBuilder
from sklearn import datasets
from sklearn.tree import _tree
from sklearn.tree._tree import Tree
from sklearn.utils import check_random_state

iris = datasets.load_iris()
x = iris.data[:, :2]  # only the first two features.
y = iris.target

CRITERIA_CLF = {"gini": _tree.Gini, "entropy": _tree.Entropy}
CRITERIA_REG = {"mse": _tree.MSE, "friedman_mse": _tree.FriedmanMSE}
SPLITTERS = {"best": _tree.BestSplitter,
             "presort-best": _tree.PresortBestSplitter,
             "random": _tree.RandomSplitter}

criterion = "gini"
splitter = "best"
min_samples_split = 2
min_samples_leaf = 1
# min_weight_leaf = 0
max_depth = (2 ** 31) - 1
max_leaf_nodes = (2 ** 31) - 1
random_state = 0
random_state = check_random_state(random_state)

y = np.reshape(y, (-1, 1))
n_samples, n_features_ = x.shape
max_features_ = n_features_
n_outputs_ = y.shape[1]
n_classes_ = [1] * n_outputs_
n_classes_ = np.array(n_classes_, dtype=np.intp)

criterion = CRITERIA_CLF[criterion](n_outputs_, n_classes_)
splitter = SPLITTERS[splitter](criterion, max_features_,
                               min_samples_leaf, random_state)

# Note: current pipped version of sklearn omits min_weight_leaf
builder = BestFirstTreeBuilder(splitter, min_samples_split,
                               min_samples_leaf,
                               # min_weight_leaf,
                               max_depth, max_leaf_nodes)

tree_ = Tree(n_features_, n_classes_, n_outputs_)
builder.build(tree_, x, y)
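
For comparison, a minimal sketch of the same fit through the public DecisionTreeClassifier API (assuming a recent enough sklearn that exposes max_leaf_nodes, which should route to BestFirstTreeBuilder internally); tree.py also appears to convert the inputs to float32/float64 before calling the builder:

import numpy as np
from sklearn import datasets
from sklearn.tree import DecisionTreeClassifier

iris = datasets.load_iris()
x = iris.data[:, :2]
y = iris.target

# Public-API equivalent: a non-None max_leaf_nodes should select the
# best-first builder internally.
clf = DecisionTreeClassifier(criterion="gini", splitter="best",
                             max_leaf_nodes=50, random_state=0)
clf.fit(x, y)

# Input conversions tree.py appears to perform before builder.build()
# (an assumption based on reading the source, not verified here):
x32 = np.asarray(x, dtype=np.float32)
y64 = np.ascontiguousarray(y.reshape(-1, 1), dtype=np.float64)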


I read the RGF paper. Interesting method but definitely too early to

include it in scikit-learn (we focus on mature algorithms). This also looks

more complicated to implement than gradient boosting, since tree induction

and boosting are interleaved.

The paper also clarified what "fully corrective" means, thanks. To summarize, here are the different strategies for setting the weights in gradient boosting:

1. using the same constant value (`learning_rate`) for all estimators

2. setting the weight of the last estimator by line search (other weights

are kept fixed)

3. setting one separate weight per leaf node of the last estimator, by

line search

4. refitting all estimators weights (including the past ones)

5. refitting all leaf nodes of all estimators?

Some authors [1] argued that option 1 is sufficient in practice to get

good performance since our goal is to do well on test data, not training

data.

Option 3 is what scikit-learn implements. Option 4 is the fully corrective case. It basically amounts to least squares for the square loss or to logistic regression for the log loss (using each estimator's predictions as features). One thing though is that our implementation doesn't store the estimator weights explicitly (leaves are updated directly). Setting a penalty (l1 or l2) on the estimator weights (i.e., on the functional) should prevent overfitting. Option 5 sounds pretty computationally expensive, although the update doesn't need to be done at every iteration.
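
As a rough sketch, option 4 with explicitly stored estimator weights could look like this for the square loss (illustrative helper names, with an l2 penalty on the weights as suggested above):

import numpy as np
from sklearn.linear_model import Ridge

def fully_corrective_refit(estimators, X, y, l2=1e-3):
    # Stack each fitted estimator's predictions as one feature column.
    H = np.column_stack([est.predict(X) for est in estimators])
    # Joint least squares over all estimator weights, with an l2 penalty.
    reg = Ridge(alpha=l2, fit_intercept=False)
    reg.fit(H, y)
    return reg.coef_  # one weight per estimator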

I recently re-implemented gradient boosting [2]. One difference in my

implementation is that it is possible to use any base learner (not just

trees). So my implementation stores estimator weights explicitly and uses

option 2 above. The fully corrective updates (option 4) might be easier to

add to my implementation.
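
A minimal sketch of option 2 with explicit weights (illustrative names, not code from ivalice or scikit-learn):

import numpy as np
from scipy.optimize import minimize_scalar

def line_search_weight(y, f_prev, h_new, loss):
    # Choose the weight of the newest base learner only; the weights of
    # the earlier estimators stay fixed.
    res = minimize_scalar(lambda w: loss(y, f_prev + w * h_new))
    return res.x

square_loss = lambda y_true, y_pred: np.mean((y_true - y_pred) ** 2)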

BTW, the notion of fully corrective updates is also present in the matching pursuit (-> orthogonal matching pursuit) and Frank-Wolfe literature.

Mathieu

[1] "Boosting Algorithms: Regularization, Prediction and Model Fitting",

Peter B Ìuhlmann and Torsten Hothorn (thanks to Peter for telling me about

this paper)

[2]

https://github.com/mblondel/ivalice/blob/master/ivalice/impl/gradient_boosting.py


Yes - in fact my real goal is ultimately to implement RGF, though I had considered building/forking off the current GradientBoostingRegressor package as a starting point, (a) because I'm new to developing for scikit-learn and (b) to maintain as much consistency as possible with the rest of the package.

Upon further review though, I don't think there's much point in adding fully corrective updates to GBR, because without the regularization (the rest of RGF) it is probably useless and leads to overfitting, as per http://stat.rutgers.edu/home/tzhang/software/rgf/tpami14-rgf.pdf.

So it would likely be more useful to go ahead and create RGF as an

entirely separate module. But that could take some time, of course.

Is anyone working on RGF for sklearn?

Arnaud, thanks for directing me to your sparse implementation, I will

have a look! It would certainly be excellent to have this work for all

decision tree algorithms.

Ken

On Tue, Sep 16, 2014 at 11:16 AM, Peter Prettenhofer <


The only reference I know is the Regularized Greedy Forest paper by Johnson and Zhang [1].

I haven't read the primary source (also by Zhang).

[1] http://arxiv.org/abs/1109.0887

Peter Prettenhofer


Could you give a reference for gradient boosting with fully corrective

updates?

Since the philosophy of gradient boosting is to fit each tree against the residuals (or negative gradient) so far, I am wondering how such a fully corrective update would work...
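
For concreteness, the plain non-corrective loop in question looks roughly like this sketch (square loss, so the negative gradient is just the residual; illustrative code, not scikit-learn's implementation):

import numpy as np
from sklearn.tree import DecisionTreeRegressor

def plain_gradient_boosting(X, y, n_estimators=100, learning_rate=0.1):
    f = np.zeros_like(y, dtype=float)
    trees = []
    for _ in range(n_estimators):
        residuals = y - f                     # negative gradient of square loss
        tree = DecisionTreeRegressor(max_depth=3).fit(X, residuals)
        f += learning_rate * tree.predict(X)  # constant weight; earlier trees untouched
        trees.append(tree)
    return trees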

Mathieu


Is anyone working on making GradientBoostingRegressor work with sparse matrices?

Or is anyone working on adding an option for fully corrective gradient boosting, i.e. all trees in the ensemble are re-weighted at each iteration?

These are things I would like to see and may be able to help with if

no one is currently working on them.
