macop.policies.reinforcement
Reinforcement learning policy classes for the Operator Selection Strategy.
Classes
UCBPolicy | Upper Confidence Bound (UCB) policy class, which applies the UCB strategy when selecting and applying an operator
class macop.policies.reinforcement.UCBPolicy(operators, C=100.0, exp_rate=0.5)
Upper Confidence Bound (UCB) policy class, which applies the UCB strategy when selecting and applying an operator.
Rather than exploring by simply selecting an arbitrary action with a probability that remains constant, the UCB algorithm changes its exploration-exploitation balance as it gathers more knowledge of the environment. It moves from being primarily focused on exploration, when the actions that have been tried the least are preferred, to concentrating on exploitation, selecting the action with the highest estimated reward.
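For reference, the textbook UCB1 selection rule can be written as follows. This is the standard formulation, given here only as an indication of the general shape of the score; the exact expression used by macop may differ. In this sketch, $\bar{r}_i$ is the mean reward observed for operator $i$, $n_i$ is the number of times it has been selected, $N$ is the total number of selections, and $C$ is the exploration hyper-parameter.

```latex
i^{*} = \arg\max_{i} \left[ \bar{r}_i + C \sqrt{\frac{\ln N}{n_i}} \right]
```

The first term favours operators with a high observed reward (exploitation). The second term is large for rarely selected operators and shrinks as $n_i$ grows (exploration).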
operators – {[Operator]} – list of operators the algorithm can select from
C – {float} – exploration hyper-parameter; it scales the second (exploration) term of the UCB equation
exp_rate – {float} – exploration rate (probability of choosing the next operator at random)
rewards – {[float]} – list of summed rewards obtained for each operator
occurrences – {[int]} – number of times each operator has been selected
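The way these attributes could interact during selection can be sketched as below. This is a minimal illustration under stated assumptions, not macop's actual implementation: the function name `ucb_select` and the `rng` parameter are hypothetical, and the choices to use the mean reward and to try unused operators first are assumptions borrowed from the textbook UCB1 algorithm.

```python
import math
import random


def ucb_select(rewards, occurrences, C=100.0, exp_rate=0.5, rng=random):
    """Return the index of the next operator (hypothetical helper, not macop API)."""
    n_ops = len(rewards)

    # With probability exp_rate, ignore the statistics and pick at random.
    if rng.random() < exp_rate:
        return rng.randrange(n_ops)

    # Assumption: operators that were never used are tried first,
    # which also avoids a division by zero in the score below.
    unused = [i for i, n in enumerate(occurrences) if n == 0]
    if unused:
        return rng.choice(unused)

    total = sum(occurrences)
    # Score = mean reward (exploitation) + C * exploration bonus.
    scores = [
        rewards[i] / occurrences[i]
        + C * math.sqrt(math.log(total) / occurrences[i])
        for i in range(n_ops)
    ]
    return scores.index(max(scores))
```

With `exp_rate=0.0` and `C=0.0` the sketch is purely greedy on the mean reward; a larger `C` shifts the choice toward operators that have been selected less often.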
Example:
>>> import random
>>> # operators import
>>> from macop.operators.discrete.crossovers import SimpleCrossover
>>> from macop.operators.discrete.mutators import SimpleMutation
>>> # policy import
>>> from macop.policies.reinforcement import UCBPolicy
>>> # solution and algorithm
>>> from macop.solutions.discrete import BinarySolution
>>> from macop.algorithms.mono import IteratedLocalSearch
>>> # evaluator import
>>> from macop.evaluators.knapsacks import KnapsackEvaluator
>>> # evaluator initialization (worth of each object passed as data)
>>> worths = [random.randint(0, 20) for i in range(20)]
>>> evaluator = KnapsackEvaluator(data={'worths': worths})
>>> # validator specification (based on the weight of each object)
>>> weights = [random.randint(5, 30) for i in range(20)]
>>> validator = lambda solution: sum(weights[i] for i, value in enumerate(solution._data) if value == 1) < 200
>>> # initializer function defined as a lambda
>>> initializer = lambda x=20: BinarySolution.random(x, validator)
>>> # operators list with crossover and mutation
>>> operators = [SimpleCrossover(), SimpleMutation()]
>>> policy = UCBPolicy(operators)
>>> algo = IteratedLocalSearch(initializer, evaluator, operators, policy, validator, maximise=True, verbose=False)
>>> policy._occurences
[0, 0]
>>> solution = algo.run(100)
>>> type(solution).__name__
'BinarySolution'
>>> policy._occurences # one more due to first evaluation
[51, 53]
apply(solution)
Applies the chosen operator to create a new solution, computes its fitness, and returns the new solution.
The fitness improvement is saved as a reward.
The occurrence count of the selected operator is also incremented.
- Parameters
solution – {Solution} – the solution to use for generating new solution
- Returns
{Solution} – newly generated solution
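The bookkeeping described for `apply` (reward from fitness improvement, occurrence count incremented) might look like the sketch below. The helper `update_statistics` and its parameters are hypothetical, and two details are assumptions rather than documented behaviour: only positive improvements are credited, and the raw fitness difference is used without normalisation.

```python
def update_statistics(rewards, occurrences, index,
                      old_fitness, new_fitness, maximise=True):
    """Record the outcome of applying operator `index` (hypothetical helper)."""
    # Improvement is measured in the direction being optimised.
    if maximise:
        improvement = new_fitness - old_fitness
    else:
        improvement = old_fitness - new_fitness

    # Assumption: a worse solution earns no reward rather than a penalty.
    if improvement > 0:
        rewards[index] += improvement

    # The operator counts as used whether or not it improved the solution.
    occurrences[index] += 1


rewards = [0.0, 0.0]
occurrences = [0, 0]
update_statistics(rewards, occurrences, 1, old_fitness=10.0, new_fitness=12.5)
update_statistics(rewards, occurrences, 0, old_fitness=10.0, new_fitness=8.0)
```

After these two calls, only the second operator has accumulated a reward, while both occurrence counts have been incremented.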