2017-10-05

## Solving problem by searching

A problem can be defined by 5 components:

1. initial state, In(Arad)
2. action, Go(Sibiu)
3. transition model, RESULT(In(A), Go(B)) = In(B)
• a function that returns the state that results from doing action a in state s
• successor: any state reachable from a given state by a single action
• state space: sets of all states reachable from the initial state by any sequence of actions
• path in state space: a sequence of states connected by a sequence of actions
4. goal test: determine whether a given state is a goal state.
5. path cost: assigns a numeric cost to each path
• step cost: c(s, a, s'), taking action a in state s to reach state s'
• solution: a sequence of actions leading from initial state to goal state

### Searching for solutions

function TREE-SEARCH(problem) returns a solution, or failure

initialize the frontier using the initial state of problem
loop do

if the frontier is empty then return failure
choose a leaf node and remove it from the frontier
if the node contains a goal state then return the corresponding solution
expand the chosen node, adding the resulting nodes to the frontier

function GRAPH-SEARCH(problem) returns a solution, or failure

initialize the frontier using the initial state of problem

initialize the explored set to be empty

loop do

if the frontier is empty then return failure

choose a leaf node and remove it from the frontier

if the node contains a goal state then return the corresponding solution

add the node to the explored set

expand the chosen node, adding the resulting nodes to the frontier only if not in the frontier or explored set

#### Data structure

Node:

• state
• parent
• action
• path-cost

Frontier, use priority queue.

Explored set: hash table

#### Measuring problem-solving performance

1. Completeness: Is algorithm guaranteed to find a solution when there is one?
2. Optimality: Does the strategy find the optimal solution?
• optimal solution has the lowest path cost among all solutions
3. Time complexity: How long does it take to find a solution?
4. Space complexity: How much memory is needed to perform the search?
• branching factor: b, maximum number of successors of any node
• depth: d, depth of the shallowest goal node
• length: m, maximum length of any path in the state space

Total cost combines

• search cost: time complexity
• path cost: memory usage

The UNINFORMED term means that the strategies have no additional information about states beyond that provided in the problem definition. All they can do is generate successors and distinguish a goal state from a non-goal state.

FIFO queue

• complete
• the shallowest goal node is not necessarily the optimal one; technically, breadth-first search is optimal if the path cost is a nondecreasing function of the depth of the node.
• time: $O(b^d)$
• space: $O(b^d)$ in total; explored set $O(b^{d-1})$; frontier $O(b^d)$
• memory is a bigger problem than time

Use priority queue for frontier, guided by path cost instead of depth

function UNIFORM-COST-SEARCH(problem) returns a solution, or failure

node $\leftarrow$ a node with STATE = problem.INITIAL-STATE, PATH-COST = 0

frontier $\leftarrow$ a priority queue ordered by PATH-COST, with node as the only element

explored $\leftarrow$ an empty set

loop do

if EMPTY?( frontier) then return failure

node $\leftarrow$ POP( frontier ) /* chooses the lowest-cost node in frontier */

if problem.GOAL-TEST(node.STATE) then return SOLUTION(node)

for each action in problem.ACTIONS(node.STATE) do

child ←CHILD-NODE(problem, node, action)

if child .STATE is not in explored or frontier then

frontier $\leftarrow$ INSERT(child , frontier )

else if child.STATE is in frontier with higher PATH-COST then

replace that frontier node with child

Difference with BFS

• goal test while expanding this node to get optimal instead of suboptimal; BFS does it when the node is generated
• a test is added in case a better path is found to a node currently on the frontier
1. optimal, uniform-cost search expands nodes in order of their optimal path cost
2. Completeness is guaranteed provided the cost of every step exceeds some small positive constant $\epsilon$
3. Let $C^{*}$ be the cost of the optimal solution and assume that every action costs at least $\epsilon$. Worst case time and space complexity: $O(b^{1+[C^{*} / \epsilon]})$. It can be much greater than $b^d$

#### DFS

LIFO queue.

1. complete for graph-search in finite state spaces; not complete for tree-search
2. non-optimal
3. time: $O(b^m)$
4. space: no advantage if graph-search; for tree search, $O(bm)$

A variant version is backtracking search. Space is $O(m)$.

Choose a predetermined depth limit $l$. It solves the infinite-path problem.

1. incompleteness, if $% $
2. nonoptimal if $% $
3. time is $O(b^l)$
4. space is $O(bl)$

function DEPTH-LIMITED-SEARCH(problem, limit ) returns a solution, or failure/cutoff

return RECURSIVE-DLS(MAKE-NODE(problem.INITIAL-STATE), problem, limit )

function RECURSIVE-DLS(node, problem, limit ) returns a solution, or failure/cutoff

if problem.GOAL-TEST(node.STATE) then return SOLUTION(node)

else if limit = 0 then return cutoff

else

cutoff occurred? $\leftarrow$ false

for each action in problem.ACTIONS(node.STATE) do

child ←CHILD-NODE(problem, node, action)

result ←RECURSIVE-DLS(child , problem, limit − 1)

if result = cutoff then cutoff occurred? $\leftarrow$ true

else if result != failure then return result

if cutoff occurred? then return cutoff else return failure

#### Iterative deepening DFS

Find the best depth limit by gradually increasing the limit, 0, 1, 2… until a goal is found.
In general, iterative deepening is the preferred uninformed search method when the search space is large and the depth of the solution is not known.

1. complete when branching factor is finite
2. optimal when the path cost is a nondecreasing function of the depth of the node
3. time: $O(b^d)$
4. space: $O(bd)$

function ITERATIVE-DEEPENING-SEARCH(problem) returns a solution, or failure

for depth = 0 to $\infty$ do

result $\leftarrow$ DEPTH-LIMITED-SEARCH(problem, depth)

if result != cutoff then return result

iterative lengthening search: use increasing path-cost limits instead of increasing depth limits

#### Comparison

Criterion BFS Uniform-Cost DFS Depth-limited Iterative Deepening Bidirectional
Complete? Yes Yes No No Yes Yes
Optimal? Yes Yes No No Yes Yes
Time $O(b^d)$ $O(b^{1+[C^{*} / \epsilon]})$ $O(b^m)$ $O(b^l)$ $O(b^d)$ $O(b^{d/2})$
Space $O(b^d)$ $O(b^{1+[C^{*} / \epsilon]})$ $O(bm)$ $O(bl)$ $O(bd)$ $O(b^{d/2})$

The generic best-first search algorithm selects a node for expansion according to an evaluation function $f(n)$. The evaluation function is constructed as a cost estimate, so the node with the lowest evaluation is expanded first.

Greedy best-first search expands nodes with minimal $h(n)$(estimated cost of the cheapest path from the state at node n to a goal state). $f(n) = h(n)$. It is not optimal but is often efficient.

A∗ search expands nodes with minimal $f(n) = g(n) + h(n)$. $f(n)$ is estimated cost of the cheapest solution through n. A∗ is complete and optimal, provided that h(n) is admissible (for TREE-SEARCH) or consistent (for GRAPH-SEARCH). The space complexity of A∗ is still prohibitive.

If $C^∗$ is the cost of the optimal solution path, then we can say the following:

1. A∗ expands all nodes with $% $.
2. A∗ might then expand some of the nodes right on the “goal contour” (where $f(n) = C^∗$) before selecting a goal node.
• Completeness requires that there be only finitely many nodes with cost less than or equal to $C^∗$.
• A condition that is true if all step costs exceed some finite $\epsilon$ and if b is finite.
3. Notice that A∗ expands no nodes with $f(n) > C^*$

A* is optimally efficient for any given consistent heuristic. No other optimal algorithm is guaranteed to expand fewer nodes than A∗. This is because any algorithm that does not expand all nodes with $% $ runs the risk of missing the optimal solution.

1. The first condition we require for optimality is that h(n) be an admissible heuristic. An admissible heuristic is one that never overestimates the cost to reach the goal. This is for tree-search.
2. A second, slightly stronger condition called consistency (or sometimes monotonicity) is required only for applications of A* to graph search. Estimated cost of a node n to goal is no greater than getting to successor of n then getting to goal. This is for graph-search.

Properties:

1. time: $O(b^{\epsilon d})$
• relative error $\epsilon = (h^* - h) / h^*$
2. space is main drawback, storing all nodes.

#### Search better

metalevel state space: Each state in a metalevel state space captures the internal (computational) state of a program that is searching in an object-level state space. For example, the internal state of the A∗ algorithm consists of the current search tree.

For harder problems, there will be many missteps. A metalevel learning algorithm can learn from these experiences to avoid exploring unpromising subtrees. The goal of learning is to minimize the total cost of problem solving, trading off computational expense and path cost.

### Heuristic functions

It is generally better to use a heuristic function with higher values, provided it is consistent and that the computation time for the heuristic is not too long.

#### relaxed problem

A problem with fewer restrictions on the actions is called a relaxed problem. The state-space graph of the relaxed problem is a supergraph of the original state space because the removal of restrictions creates added edges in the graph. Because the relaxed problem adds edges to the state space, any optimal solution in the original problem is, by definition, also a solution in the relaxed problem; but the relaxed problem may have better solutions if the added edges provide short cuts.The cost of an optimal solution to a relaxed problem is an admissible heuristic for the original problem.

Original: A is horizontally or vertically adjacent to B and B is blank

Relaxed: A tile can move from square A to square B if A is adjacent to B. Or A tile can move from square A to square B if B is blank.

#### subproblem, pattern databases

Clearly, the cost of the optimal solution of this subproblem is a lower bound on the cost of the complete problem.

The idea behind pattern databases is to store these exact solution costs for every possible
subproblem instance. Then we compute an admissible heuristic $h_{DB}$ for each complete state encountered during a search simply by looking up the corresponding subproblem configuration in the database. (h(12345678) = h(1234) + h(5678)). For h(1234), only count moves related to 1234.

#### experience

If the path to the goal does not matter, we might consider a different class of algorithms,
ones that do not worry about paths at all. Local search algorithms operate using a single current node (rather than multiple paths) and generally move only to neighbors of that node.

In addition to finding goals, local search algorithms are useful for solving pure optimization problems, in which the aim is to find the best state according to an objective function.

state-space landscape: A landscape has both “location” (defined by the state) and “elevation” (defined by the value of the heuristic cost function or objective function). If elevation corresponds to cost, then the aim is to find the lowest valley—a global minimum. A complete local search algorithm always finds a goal if one exists; an optimal algorithm always finds a global minimum/maximum

function HILL-CLIMBING(problem) returns a state that is a local maximum

current $\leftarrow$ MAKE-NODE(problem.INITIAL-STATE)

loop do

neighbor $\leftarrow$ a highest-valued successor of current

if neighbor.VALUE $\le$ current.VALUE then return current.STATE

current $\leftarrow$ neighbor

Hill climbing is sometimes called greedy local search because it grabs a good neighbor state without thinking ahead about where to go next.

hill climbing often gets stuck for the following reasons:

1. Local maxima: a local maximum is a peak that is higher than each of its neighboring states but lower than the global maximum.
2. Ridges: Ridges result in a sequence of local maxima that is very difficult for greedy algorithms to navigate.
3. Plateaux: a plateau is a flat area of the state-space landscape. It can be a flat local maximum, from which no uphill exit exists, or a shoulder, from which progress is possible. A hill-climbing search might get lost on the plateau.

#### Variants

1. Stochastic hill climbing chooses at random from among the uphill moves; the probability of selection can vary with the steepness of the uphill move.
2. First-choice hill climbing implements stochastic hill climbing by generating successors randomly until one is generated that is better than the current state.
3. Random-restart hill climbing conducts a series of hill-climbing searches from randomly generated initial states, until a goal is found.

NP-hard problems typically have an exponential number of local maxima to get stuck on. Despite this, a reasonably good local maximum can often be found after a small number of restarts.

### Simulated annealing

Instead of picking the best move, however, it picks a random move. If the move improves the situation, it is always accepted. Otherwise, the algorithm accepts the move with some probability less than 1.

The probability decreases exponentially with the “badness” of the move—the amount ΔE by which the evaluation is worsened.

The probability also decreases as the “temperature” T goes down: “bad” moves are more likely to be allowed at the start when T is high, and they become more unlikely as T decreases. If the schedule lowers T slowly enough, the algorithm will find a global optimum with probability approaching 1.

SimulatedAnnealing(init,successors(),score(),max_iterations,schedule())

curr $\leftarrow$ init

for i=1..max_iterations:

T = schedule(i)

next $\leftarrow$ random_choice(successors(curr))

$\Delta E$ = score(next)-score(curr)

if $\Delta E > 0$ then curr $\leftarrow$ next

else:

$p = e^{\Delta E/T}$ // since $% $, $0 \le p \le 1$

if $% $ then curr $\leftarrow$ next // accept with probability $e^{\Delta E/T}$

The local beam search algorithm keeps track of k states rather than just one. It begins with k randomly generated states. At each step, all the successors of all k states are generated. If any one is a goal, the algorithm halts. Otherwise, it selects the k best successors from the complete list and repeats.

In a random-restart search, each search process runs independently of the others. In a local beam search, useful information is passed among the parallel search threads. In effect, the states that generate the best successors say to the others. The algorithm quickly abandons unfruitful searches and moves its resources to where the most progress is being made.

In its simplest form, local beam search can suffer from a lack of diversity among the k states—they can quickly become concentrated in a small region of the state space, making the search little more than an expensive version of hill climbing.

A variant called stochastic beam search, analogous to stochastic hill climbing, helps alleviate this problem. Instead of choosing the best k from the the pool of candidate successors, stochastic beam search chooses k successors at random, with the probability of choosing a given successor being an increasing function of its value.

function SIMULATED-ANNEALING(problem, schedule) returns a solution state

inputs: problem, a problem

schedule, a mapping from time to temperature

current ←MAKE-NODE(problem.INITIAL-STATE)

for t = 1 to $\infty$ do

T ←schedule(t )

if T = 0 then return current

next←a randomly selected successor of current

$\Delta E$ ←next.VALUE – current.VALUE

if $\Delta E$ > 0 then current ←next

else current $\leftarrow$ next only with probability $e^{\Delta E/T}$

### Games

Def:

1. $S_0$: The initial state, how the game is set up at the start
2. PLAYER(s): Defines which player has the move in a state.
3. ACTIONS(s): Returns the set of legal moves in a state.
4. RESULT(s, a): The transition model, which defines the result of a move.
5. TERMINAL-TEST(s): A terminal test, which is true when the game is over and false otherwise. States where the game has ended are called terminal states.
6. UTILITY(s, p): A utility function defines the final numeric value for a game that ends in terminal state s for a player p. objective function or payoff function
• A zero-sum game is (confusingly) defined as one where the total payoff to all players is the same for every instance of the game.

### Optimal decisions in games

Given a game tree, the optimal strategy can be determined from the minimax value
of each node, which we write as $MINIMAX(n)$.

$MINIMAX(s) = \left\{ \begin{array}{cc} UTILITY(s) & \text{if TERMINAL-TEST(s)} \\ max_{a \in Actions(s)} MINIMAX(RESULT(s, a)) & \text{if PLAYER(s) = MAX} \\ min_{a \in Actions(s)} MINIMAX(RESULT(s, a)) & \text{if PLAYER(s) = MIN}\end{array} \right.$

#### The minimax algorithm

The minimax algorithm computes the minimax decision from the current state. It uses a simple recursive computation of the minimax values of each successor state, directly implementing the defining equations. The recursion proceeds all the way down to the leaves of the tree, and then the minimax values are backed up through the tree as the recursion unwinds.

the maximum depth of the tree is m

there are b legal moves at each point

• time complexity of the minimax algorithm is $O(b^m)$
• space complexity is O(bm)

function MINIMAX-DECISION(state) returns an action

return $argmax_{a \in ACTIONS(s)} MIN-VALUE(RESULT(state, a))$

function MAX-VALUE(state) returns a utility value

if TERMINAL-TEST(state) then return UTILITY(state)

for each a in ACTIONS(state) do

v $\leftarrow$ MAX(v, MIN-VALUE(RESULT(s, a)))

return v

function MIN-VALUE(state) returns a utility value

if TERMINAL-TEST(state) then return UTILITY(state)

for each a in ACTIONS(state) do

v $\leftarrow$ MIN(v, MAX-VALUE(RESULT(s, a)))

return v

If with multiplayer or not zero-sum game, collaboration can occur.

### Alpha-beta pruning

alpha: increase from $- \infty$, best option for max

beta: decrease from $\infty$, best option for min

function ALPHA-BETA-SEARCH(state) returns an action

return the action in ACTIONS(state) with value v

function $MAX-VALUE(state,\alpha, \beta)$ returns a utility value

if TERMINAL-TEST(state) then return UTILITY(state)

for each a in ACTIONS(state) do

if $v \ge \beta$ then return v

return v

function $MIN-VALUE(state,\alpha, \beta)$ returns a utility value

if TERMINAL-TEST(state) then return UTILITY(state)

for each a in ACTIONS(state) do

if $v \le \alpha$ then return v

return v

#### Move ordering

If this can be done, then it turns out that alpha–beta needs to examine only $O(b^{m/2})$
nodes to pick the best move, instead of $O(b^m)$ for minimax. Adding dynamic move-ordering schemes, such as trying first the moves that were found to be best in the past, brings us quite close to the theoretical limit.

$EXPECTMINIMAX(s) = \left\{ \begin{array}{cc} UTILITY(s) & \text{if TERMINAL-TEST(s)} \\ max_{a} EXPECTMINIMAX(RESULT(s, a)) & \text{if PLAYER(s) = MAX} \\ min_{a} EXPECTMINIMAX(RESULT(s, a)) & \text{if PLAYER(s) = MIN} \\ \sum_r P(r)EXPECTMINIMAX(RESULT(s, a)) & \text{if PLAYER(s) = CHANCE} \end{array} \right.$

Usually, it is not feasible to consider the whole game tree (even with alpha–beta), so we need to cut the search off at some point and apply a heuristic evaluation function that estimates the utility of a state.

Many game programs precompute tables of best moves in the opening and endgame so that they can look up a move rather than search.

Games of chance can be handled by an extension to the minimax algorithm that evaluates a chance node by taking the average utility of all its children, weighted by the probability of each child.

Optimal play in games of imperfect information, such as Kriegspiel and bridge, requires reasoning about the current and future belief states of each player. A simple approximation can be obtained by averaging the value of an action over each possible configuration of missing information.

## Constraint satisfaction problems

A constraint satisfaction problem consists of three components, X, D and C:

• X is a set of variables, $X_1,..., X_n$
• D is a set of domains, $D_1,..., D_n$, one for each variable.
• Each domain $D_i$ consists of a set of allowable values, ${v_1,..., v_k}$ for variable $X_i$
• C is a set of constraints that specify allowable combinations of values.
• Each constraint $C_i$ consists of a pair <scope, rel>, where scope is a tuple of variables that participate in the constraint and rel is a relation that defines the values that those variables can take on. For example: <(X1, X2), X1 NE X2> or <(X1, X2), [(A, B), (B, A)]>

Solution:

• Each state in a CSP is defined by an assignment of values to some or all of the variables.
• An assignment that does not violate any constraints is called a consistent or legal assignment.
• A complete assignment is one in which every variable is assigned
• A solution to a CSP is a consistent, complete assignment
• A partial assignment is one that assigns values to only some of the variables.

It can be shown that no algorithm exists for solving general nonlinear constraints on integer variables.

The best-known category of continuous-domain CSPs is that of linear programming problems, where constraints must be linear equalities or inequalities. Linear programming problems can be solved in time polynomial in the number of variables.

### Constraint propagation

using the constraints to reduce the number of legal values for a variable, which in turn can reduce the legal values for another variable, and so on. The key idea is local consistency.

#### Node consistency

A single variable (corresponding to a node in the CSP network) is node-consistent if all the values in the variable’s domain satisfy the variable’s unary constraints. It is always possible to eliminate all the unary constraints in a CSP by running node consistency. It is also possible to transform all n-ary constraints into binary ones.

#### Arc consistency

A variable in a CSP is arc-consistent if every value in its domain satisfies the variable’s binary constraints.

More formally, $X_i$ is arc-consistent with respect to another variable $X_j$ if for every value in the current domain $D_i$ there is some value in the domain $D_j$ that satisfies the binary constraint on the arc $(X_i,X_j)$.

#### AC-3

function AC-3(csp) returns false if an inconsistency is found and true otherwise

inputs: csp, a binary CSP with components (X, D, C)

local variables: queue, a queue of arcs, initially all the arcs in csp

while queue is not empty do

(Xi, Xj) $\leftarrow$ REMOVE-FIRST(queue)

if REVISE(csp, Xi, Xj ) then

if size of Di = 0then return false

for each Xk in Xi.NEIGHBORS - {Xj} do

return true

function REVISE(csp, Xi, Xj ) returns true iff we revise the domain of Xi

revised $\leftarrow$ false

for each x in Di do

if no value y in Dj allows (x ,y) to satisfy the constraint between Xi and Xj then

delete x from Di

revised $\leftarrow$ true

return revised

Assume a CSP with n variables, each with domain size at most d, and with c binary constraints (arcs).

Time: $O(cd^3)$

#### Path consistency

PC-2 algorithm

A two-variable set {Xi,Xj} is path-consistent with respect to a third variable Xm if, for every assignment {Xi = a,Xj = b} consistent with the constraints on {Xi,Xj}, there is an assignment to Xm that satisfies the constraints on {Xi,Xm} and {Xm,Xj}.

A CSP is k-consistent if, for any set of k − 1 variables and for any consistent assignment to those variables, a consistent value can always be assigned to any kth variable.

A CSP is strongly k-consistent if it is k-consistent and is also (k − 1)-consistent, (k − 2)-consistent,…, all the way down to 1-consistent. Time is $O(n^2d)$. Any algorithm for establishing n-consistency must take time exponential in n in the worst case. Worse, n-consistency also requires space that is exponential in n.

#### Global constraints

A global constraint is one involving an arbitrary number of variables (but not necessarily all variables).

• Resource constraint
• Bounds propagation
• domains are represented by upper and lower bounds
• Bounds consistent
• MRV, minumum remaining values heuristic. choosing the variable with the fewest “legal” values.
• LCV, least constraining value heuristic. For a variable, choose the value that rules out the fewest choices for the neighboring variables in the constraint graph, trying to leave the maximum flexibility for subsequent variable assignments.
• forward checking, same as arc consistency. Whenever a variable X is assigned, the forward-checking process establishes arc consistency for it: for each unassigned variable Y that is connected to X by a constraint, delete from Y ‘s domain any value that is inconsistent with the value chosen for X.
• backjumping method backtracks to the most recent assignment in the conflict set. to backtrack to a variable that might fix the problem—a variable that was responsible for making one of the possible values of new variable impossible.
• conflict-directed backjumping
• no-good: a minimum set of variables from the conflict set that causes the problem; record these values
• constraint learning

### Local search for CSPs

The point of local search is to eliminate the violated constraints. In choosing a new value for a variable, the most obvious heuristic is to select the value that results in the minimum number of conflicts with other variables—the min-conflicts heuristic. Min-conflicts is surprisingly effective for many CSPs.

function MIN-CONFLICTS(csp,max steps) returns a solution or failure

inputs: csp, a constraint satisfaction problem

max steps, the number of steps allowed before giving up

current $\leftarrow$ an initial complete assignment for csp

for i = 1 to max steps do

if current is a solution for csp then return current

var $\leftarrow$ a randomly chosen conflicted variable from csp.VARIABLES

value $\leftarrow$ the value v for var that minimizes CONFLICTS(var, v, current , csp)

set var =value in current

return failure

• Plateau search—allowing sideways moves to another state with the same score, can help local search find its way off this plateau.
• The landscape of a CSP under the min conflicts heuristic usually has a series of plateaux.
• constraint weighting, can help concentrate the search on the important constraints.
• Each constraint is given a numeric weight
• At each step of the search, the algorithm chooses a variable/value pair to change that will result in the lowest total weight of all violated constraints. The weights are then adjusted by incrementing the weight of each constraint that is violated by the current assignment.
• It adds topography to plateaux, making sure that it is possible to improve from the current state
• It also, over time, adds weight to the constraints that are proving difficult to solve.

Another advantage of local search is that it can be used in an online setting when the problem changes, which is important in scheduling problems. It can repair with a minimum number of changes. A backtracking search with the new set of constraints usually requires much more time and might find a solution with many changes from the current schedule.