Optimizer using Evolution Strategies

EvolutionStrategiesOptimizer

class l2l.optimizers.evolutionstrategies.optimizer.EvolutionStrategiesOptimizer(traj, optimizee_create_individual, optimizee_fitness_weights, parameters, optimizee_bounding_func=None)

Bases: l2l.optimizers.optimizer.Optimizer

Class implementing the evolution strategies optimizer as in: Salimans, T., Ho, J., Chen, X. & Sutskever, I. Evolution Strategies as a Scalable Alternative to Reinforcement Learning. arXiv:1703.03864 [cs, stat] (2017).

In pseudocode, the algorithm does the following:

For n_iteration iterations do:

Perturb the current individual by adding a random value with zero mean and noise_std standard deviation.
If mirrored sampling is enabled, also perturb the current individual by subtracting the same values that were added in the previous step.
Evaluate the individuals and get their fitness.
Update the current individual as

$$\theta_{t+1} \leftarrow \theta_t + \frac{\alpha}{n \sigma^2} \sum_{i=1}^{n} F_i e_i$$

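where F_i is the fitness of the i-th individual, e_i is its perturbation, alpha is the learning_rate, sigma is noise_std, and n is the number of evaluated individuals.
If fitness shaping is enabled, F_i in this update is replaced by the utility u_i, which is calculated as: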

$$u_i = \frac{\max\left(0, \log\left(\frac{n}{2} + 1\right) - \log i\right)}{\sum_{k=1}^{n} \max\left(0, \log\left(\frac{n}{2} + 1\right) - \log k\right)} - \frac{1}{n}$$

where i and k are the indices of the individuals sorted in descending order of fitness F_i (index 1 is the fittest individual), as in the paper: Wierstra, D. et al. Natural Evolution Strategies. Journal of Machine Learning Research 15, 949–980 (2014).
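The loop above can be sketched in NumPy as follows. This is a minimal illustration of the described algorithm, not the code of this class: the function names fitness_shaping_utilities and es_step are hypothetical, and the perturbations are assumed to be Gaussian (the description only fixes their mean and standard deviation). Both optional steps, mirrored sampling and fitness shaping, are included.

```python
import numpy as np


def fitness_shaping_utilities(fitnesses):
    """Rank-based utilities u_i, returned in the same order as `fitnesses`."""
    fitnesses = np.asarray(fitnesses, dtype=float)
    n = len(fitnesses)
    ranks = np.arange(1, n + 1)  # rank 1 = highest fitness
    raw = np.maximum(0.0, np.log(n / 2 + 1) - np.log(ranks))
    utilities_by_rank = raw / raw.sum() - 1.0 / n
    order = np.argsort(-fitnesses)  # individual indices, fittest first
    utilities = np.empty(n)
    utilities[order] = utilities_by_rank  # scatter back to input order
    return utilities


def es_step(theta, fitness_fn, rng, learning_rate, noise_std, pop_size,
            mirrored_sampling=False, fitness_shaping=False):
    """One iteration: perturb, evaluate, then update theta (hypothetical helper)."""
    # Zero-mean perturbations with standard deviation noise_std
    # (Gaussian noise is an assumption of this sketch).
    perturbations = noise_std * rng.standard_normal((pop_size, theta.size))
    if mirrored_sampling:
        # Evaluate theta - e_i alongside every theta + e_i.
        perturbations = np.concatenate([perturbations, -perturbations])
    fitnesses = np.array([fitness_fn(theta + e) for e in perturbations])
    if fitness_shaping:
        fitnesses = fitness_shaping_utilities(fitnesses)
    n = len(fitnesses)  # number of evaluated individuals
    # theta_{t+1} = theta_t + alpha * sum_i F_i e_i / (n * sigma^2)
    return theta + learning_rate * (fitnesses @ perturbations) / (n * noise_std ** 2)


# Usage: maximize f(x) = -||x - 3||^2, whose optimum is x = (3, 3).
rng = np.random.default_rng(0)
theta = np.zeros(2)
for _ in range(400):
    theta = es_step(theta, lambda x: -np.sum((x - 3.0) ** 2), rng,
                    learning_rate=0.1, noise_std=0.1, pop_size=20,
                    mirrored_sampling=True, fitness_shaping=True)
```

Because the utilities sum to zero and depend only on the ranks of the fitnesses, the size of each update does not scale with the raw fitness values.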

NOTE: This is not the most efficient implementation in terms of communication, since the new parameters themselves, rather than the random seed as in the paper, are communicated to the individuals.
NOTE: Fitness shaping and mirrored sampling are not yet included in this implementation.
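For comparison, the seed-based communication of the paper can be sketched as follows, assuming every process holds the same current individual theta and uses NumPy's seeded generator to rebuild perturbations. The names perturbation_from_seed, worker_evaluate and update_from_seeds are hypothetical and not part of l2l; mirrored sampling and fitness shaping are omitted for brevity.

```python
import numpy as np


def perturbation_from_seed(seed, dim, noise_std):
    # The same seed always regenerates the same perturbation e_i,
    # so processes only need to exchange the seed, not the vector.
    return noise_std * np.random.default_rng(seed).standard_normal(dim)


def worker_evaluate(theta, seed, noise_std, fitness_fn):
    # A worker rebuilds its perturbed individual locally and
    # sends back a single scalar fitness.
    return fitness_fn(theta + perturbation_from_seed(seed, theta.size, noise_std))


def update_from_seeds(theta, seeds, fitnesses, learning_rate, noise_std):
    # The update needs only (seed, fitness) pairs to rebuild sum_i F_i e_i.
    total = np.zeros_like(theta)
    for seed, fitness in zip(seeds, fitnesses):
        total += fitness * perturbation_from_seed(seed, theta.size, noise_std)
    return theta + learning_rate * total / (len(seeds) * noise_std ** 2)


# Usage: one iteration in which only integers and floats are communicated.
theta = np.zeros(3)
seeds = list(range(10))
fitnesses = [worker_evaluate(theta, s, 0.1, lambda x: -np.sum((x - 1.0) ** 2))
             for s in seeds]
theta = update_from_seeds(theta, seeds, fitnesses, learning_rate=0.05, noise_std=0.1)
```

The trade-off is extra computation: every process regenerates all perturbations, in exchange for sending one scalar per individual instead of a full parameter vector.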

Parameters:

traj (Trajectory) – Use this trajectory to store the parameters of the specific runs. The parameters should be initialized based on the values in parameters.
optimizee_create_individual – Function that creates a new individual. All parameters of the Individual-Dict returned should be of numpy.float64 type.
optimizee_fitness_weights – Fitness weights. The fitness returned by the Optimizee is multiplied by these values (one for each element of the fitness vector).
parameters – Instance of namedtuple() EvolutionStrategiesParameters containing the parameters needed by the Optimizer.
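The fitness weights reduce each individual's fitness vector to the scalar F_i used in the update rule. A minimal sketch, assuming the weighted elements are summed (a dot product); the description above only says they are multiplied, and weighted_fitness is a hypothetical helper, not an l2l function.

```python
import numpy as np


def weighted_fitness(fitness, optimizee_fitness_weights):
    """Scalar F_i for one individual (hypothetical helper).

    Assumption: each fitness element is multiplied by its weight and the
    products are summed, since the update needs one scalar per individual.
    """
    fitness = np.asarray(fitness, dtype=float)
    weights = np.asarray(optimizee_fitness_weights, dtype=float)
    if fitness.shape != weights.shape:
        raise ValueError("need exactly one weight per fitness element")
    return float(np.dot(fitness, weights))


# Usage: the first objective is rewarded, the second is penalized.
score = weighted_fitness([2.0, 3.0], (1.0, -0.5))
```

Since the update rule moves the individual towards higher F_i, a negative weight turns the corresponding fitness element into a quantity to be minimized.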

post_process(traj, fitnesses_results): See Optimizer.post_process().

end(traj): See Optimizer.end().

EvolutionStrategiesParameters

class l2l.optimizers.evolutionstrategies.optimizer.EvolutionStrategiesParameters

Bases: tuple

Parameters:

learning_rate – Learning rate.
noise_std – Standard deviation of the perturbation added to the current individual (the perturbation has zero mean).
mirrored_sampling_enabled – Whether to enable mirrored sampling, i.e. sampling both e and -e.
fitness_shaping_enabled – Whether to enable fitness shaping, i.e. replacing each fitness F_i in the update with the rank-based utility u_i.
pop_size – Number of individuals per simulation.
n_iteration – Number of iterations to perform.
stop_criterion – (Optional) Stop if this fitness is reached.
seed – The random seed used for generating new individuals.

fitness_shaping_enabled

learning_rate

mirrored_sampling_enabled

n_iteration

noise_std

pop_size

seed

stop_criterion
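Because EvolutionStrategiesParameters is a namedtuple, it can be built with keyword arguments, which does not depend on the field order. The sketch below uses a local stand-in with the documented field names so that it runs without l2l installed; its field order is an assumption, the values are illustrative only, and real code would import the class from l2l.optimizers.evolutionstrategies.optimizer instead.

```python
from collections import namedtuple

# Hypothetical local stand-in: same field names as documented above, but
# the field order is an assumption. Real code would import
# EvolutionStrategiesParameters from l2l.optimizers.evolutionstrategies.optimizer.
EvolutionStrategiesParameters = namedtuple(
    "EvolutionStrategiesParameters",
    ["learning_rate", "noise_std", "mirrored_sampling_enabled",
     "fitness_shaping_enabled", "pop_size", "n_iteration",
     "stop_criterion", "seed"],
)

# Keyword arguments keep the call independent of the field order.
parameters = EvolutionStrategiesParameters(
    learning_rate=0.1,
    noise_std=0.1,
    mirrored_sampling_enabled=True,
    fitness_shaping_enabled=True,
    pop_size=20,
    n_iteration=1000,
    stop_criterion=float("inf"),  # illustrative: a fitness that is never reached
    seed=42,
)

# Namedtuples are immutable; _replace returns a modified copy.
smaller_steps = parameters._replace(noise_std=0.05)
```

Deriving variants with _replace leaves the original parameter set untouched, which keeps separate runs reproducible.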
