Create a template for advanced optimizers and implement Adam with easy customization

class Parameter[source]

Parameter(data=None, requires_grad=True)

Defines a base class for all parameters that need to be learned by the model
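
The implementation is behind the [source] link; as a rough sketch (details are assumptions, with NumPy standing in for the real tensor type), the pieces the optimizers below rely on are a .data array, a .grad buffer and a zero_grad method:

import numpy as np

class Parameter():
    "Minimal sketch: wrap an array and keep a gradient buffer alongside it"
    def __init__(self, data=None, requires_grad=True):
        self.data = np.zeros(1) if data is None else np.asarray(data, dtype=float)
        self.requires_grad = requires_grad
        self.grad = np.zeros_like(self.data)

    def zero_grad(self):
        # Reset the accumulated gradient before the next backward pass
        self.grad = np.zeros_like(self.data)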

Steppers

sgd_step[source]

sgd_step(p, lr, **kwargs)

Performs a basic SGD step for the optimizer
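
A sketch of what the step amounts to, assuming a parameter p with .data and .grad arrays as in the Parameter sketch above:

def sgd_step(p, lr, **kwargs):
    # Vanilla SGD: move the parameter against its gradient, scaled by the learning rate
    p.data -= lr * p.grad
    return p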

l2_reg_step[source]

l2_reg_step(p, wd, **kwargs)

Adds weight decay regularization to the gradients
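
As a sketch (same assumptions as above), this folds the L2 penalty's gradient into p.grad so that whichever stepper runs afterwards shrinks the weights towards zero:

def l2_reg_step(p, wd, **kwargs):
    # L2 regularization: add wd * weight to the gradient before the actual step
    p.grad += wd * p.data
    return p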

update_defaults[source]

update_defaults(defaults, passed)

A way to override the optimizer's default hyperparameters with values passed in by the user
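
One plausible implementation (an assumption; the real helper may merge differently), where any explicitly passed value wins over the default:

def update_defaults(defaults, passed):
    # Start from the library defaults and overwrite any key the caller passed explicitly
    out = {**defaults}
    out.update(passed)
    return out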

Optimizer v2

Class for optimizer with param groups and default hyperparameters

class NewOptimizer():
    "Optimizer to handle different step functions and hyperparameters"
    def __init__(self, params, step_fcns=[sgd_step], **defaults):
        _defaults = {'lr':0.1, 'wd':1e-4}
        defaults = update_defaults(_defaults, defaults)
        
        #Make params a list of lists
        self.params = list(params)
        if not isinstance(self.params[0],list): self.params = [self.params]
            
        # One hyperparameter dict per parameter group, starting from the defaults
        self.hypes = [{**defaults} for p in self.params]
        self.step_fcns = step_fcns

    def step(self):
        # Run every stepper on every parameter, passing that group's hyperparameters
        for pg,hype in zip(self.params,self.hypes):
            for p in pg:
                for step in self.step_fcns:
                    step(p, **hype)

    def zero_grad(self):
        for pg in self.params: 
            for p in pg:
                p.zero_grad()

class Learner[source]

Learner(model, loss_func, optimizer, db, lr=0.5)

m,_,lf = get_linear_model(0.1)
o = NewOptimizer
db = get_mnist_databunch()
learn = Learner(m,lf,o,db, step_fcns=[l2_reg_step, sgd_step])
run = Runner(learn,[Stats([accuracy]), ProgressCallback()])
run.fit(1, 0.5)
epoch  train_loss  train_accuracy  valid_loss  valid_accuracy  time
0      0.287021    0.914040        0.194590    0.941180        00:01

Momentum

update_default_states[source]

update_default_states(stats, state, init)

A way to populate a dictionary with default states

class StatedOptimizer[source]

StatedOptimizer(params, step_fcns=[sgd_step], stats=[], **defaults)

Optimizer with ability to keep and update various parameter states
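
The source is behind the [source] link; a sketch of the idea, building on the NewOptimizer above (the exact protocol between stats and steppers is an assumption):

class StatedOptimizer(NewOptimizer):
    "Sketch: a NewOptimizer that also keeps a per-parameter state dict maintained by `stats`"
    def __init__(self, params, step_fcns=[sgd_step], stats=[], **defaults):
        super().__init__(params, step_fcns, **defaults)
        self.stats = stats
        self.state = {}   # maps each parameter to its running state (grad_avg, ...)

    def step(self):
        for pg, hype in zip(self.params, self.hypes):
            for p in pg:
                # Let every stat update this parameter's state before stepping
                state = self.state.get(p, {})
                for stat in self.stats:
                    state = stat.update(p, state, **hype)
                self.state[p] = state
                # Steppers see both the hyperparameters and the running state
                for step in self.step_fcns:
                    step(p, **{**hype, **state})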

class OptimStat[source]

OptimStat()

Base class for stats to be kept track of in the optimizer
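
A sketch of the interface the stats below are assumed to follow:

class OptimStat():
    "Sketch of the base class: a stat knows how to update its piece of the parameter state"
    def update(self, p, state, **kwargs):
        # Subclasses return the new state dict for this parameter
        return state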

class AverageGrad[source]

AverageGrad(dampening:bool=False) :: OptimStat

Keeps track of the exponentially weighted moving average of the gradients
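
A sketch of the running average it maintains. The hyperparameter name mom and the state key grad_avg follow the stepper signatures on this page; stashing the dampening factor in the state (so adam_step's damp_mom argument has a source) is an assumption:

class AverageGrad(OptimStat):
    "Sketch: exponentially weighted moving average of the gradients"
    def __init__(self, dampening=False):
        self.dampening = dampening

    def update(self, p, state, mom, **kwargs):
        # With dampening the new gradient enters with weight (1 - mom), otherwise weight 1
        state['damp_mom'] = 1 - mom if self.dampening else 1.
        state['grad_avg'] = mom * state.get('grad_avg', 0.) + state['damp_mom'] * p.grad
        return state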

momentum_step[source]

momentum_step(p, lr, grad_avg, **kwargs)

Performs an optimizer step using the exponentially weighted moving average of the gradient
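
A sketch of the update, assuming grad_avg is the moving average kept by AverageGrad:

def momentum_step(p, lr, grad_avg, **kwargs):
    # SGD with momentum: step along the moving average instead of the raw gradient
    p.data -= lr * grad_avg
    return p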

m,_,lf = get_linear_model(0.1)
o = StatedOptimizer
db = get_mnist_databunch()
learn = Learner(m,lf,o,db, step_fcns=[momentum_step, l2_reg_step], stats=[AverageGrad()])
run = Runner(learn,[Stats([accuracy]), ProgressCallback()])
run.fit(1,0.001)
epoch  train_loss  train_accuracy  valid_loss  valid_accuracy  time
0      0.306740    0.906840        0.172461    0.947620        00:01

Adding Dampening

class OptimCounter[source]

OptimCounter() :: OptimStat

Keeps track of how many optimizer steps were taken
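
A sketch, where the state key steps_taken matches the adam_step signature below (the rest is an assumption):

class OptimCounter(OptimStat):
    "Sketch: counts the optimizer steps taken, which Adam's debiasing needs"
    def update(self, p, state, **kwargs):
        state['steps_taken'] = state.get('steps_taken', 0) + 1
        return state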

class AverageGrad[source]

AverageGrad(dampening:bool=False) :: OptimStat

Keeps track of the exponentially weighted moving average of the gradients

class AverageSquaredGrad[source]

AverageSquaredGrad(dampening:bool=True) :: OptimStat

Keeps track of the exponentially weighted moving average of the squared gradients
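
A sketch mirroring the AverageGrad sketch above, with sqr_mom and sqr_grad_avg taken from the adam_step signature (the rest is an assumption):

class AverageSquaredGrad(OptimStat):
    "Sketch: exponentially weighted moving average of the squared gradients"
    def __init__(self, dampening=True):
        self.dampening = dampening

    def update(self, p, state, sqr_mom, **kwargs):
        # Same recurrence as AverageGrad, but on the element-wise squared gradient
        state['sqr_damp_mom'] = 1 - sqr_mom if self.dampening else 1.
        state['sqr_grad_avg'] = sqr_mom * state.get('sqr_grad_avg', 0.) + state['sqr_damp_mom'] * p.grad**2
        return state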

debias[source]

debias(mom, damp, step)

Computes the bias-correction factor for a moving average, whether or not dampening is used
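
One way to write it (an assumption, following the common fastai-style formulation):

def debias(mom, damp, step):
    # Bias-correction factor for an EWMA after `step` updates: with damp = 1 - mom this
    # reduces to Adam's usual 1 - mom**step; with damp = 1 it is the sum of the weights
    return damp * (1 - mom**step) / (1 - mom)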

adam_step[source]

adam_step(p, lr, mom, damp_mom, steps_taken, sqr_mom, sqr_damp_mom, grad_avg, sqr_grad_avg, eps=1e-05, **kwargs)

Performs an Adam step of the optimizer
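
A sketch of the update (signature as documented above; the body is an assumption following the standard Adam formula, using the debias helper and NumPy arrays as in the earlier sketches):

import numpy as np

def adam_step(p, lr, mom, damp_mom, steps_taken, sqr_mom, sqr_damp_mom,
              grad_avg, sqr_grad_avg, eps=1e-5, **kwargs):
    # Debias both moving averages, then scale the step by the RMS of the gradients
    debias1 = debias(mom, damp_mom, steps_taken)
    debias2 = debias(sqr_mom, sqr_damp_mom, steps_taken)
    p.data -= lr * (grad_avg / debias1) / (np.sqrt(sqr_grad_avg / debias2) + eps)
    return p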

adam_opt[source]

adam_opt(beta1=0.9, beta2=0.99)

Returns an Adam optimizer with momentum parameters beta1 and beta2
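
Since the Learner below is handed o = adam_opt() where it previously took an optimizer class directly, a plausible sketch returns a partially applied StatedOptimizer (the stepper list and hyperparameter names are assumptions):

from functools import partial

def adam_opt(beta1=0.9, beta2=0.99):
    # Bundle the Adam stepper with the stats it needs; beta1/beta2 become mom/sqr_mom
    return partial(StatedOptimizer,
                   step_fcns=[adam_step, l2_reg_step],
                   stats=[AverageGrad(dampening=True), AverageSquaredGrad(), OptimCounter()],
                   mom=beta1, sqr_mom=beta2)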

m,lf = get_conv_model(), CrossEntropy()
o = adam_opt()
db = get_mnist_databunch()
learn = Learner(m,lf,o,db)
run = Runner(learn,[Stats([accuracy]), ProgressCallback(), HyperRecorder(['lr'])])
run.fit(1, 0.001)
epoch  train_loss  train_accuracy  valid_loss  valid_accuracy  time
0      0.397584    0.884680        0.301798    0.915960        00:50
run.cbs[3].plot_loss()