Project 4: Battleship

Due 11/12 at 11:59pm

[11/07/07] Updates posted and due date changed.


In this project, you will design agents for a variant of the game of Battleship.  In this game, you must locate the opponent's ships on a grid using sensors, and then guess their locations.  For this project, the ships do not move.

The code for this project contains the following files, available as a zip archive.

Battleship -- The main code for the game of Battleship. You should familiarize yourself with the general outline of this code, which you will be interacting with for the next project as well.
Tutorial script -- Start here!
Plug-in for the GUI interface. You can ignore this file.
The graphical user interface for Battleship. You can ignore this file.
Graphics utilities. You can ignore this file.
Tools used in battleship. You may be familiar with some of these by now, and they will save you a lot of time.
Agents -- This is where all your code will go.


What to submit: You will fill in portions of the agents file (only) during the assignment, and submit it.

Evaluation: Your code will be autograded for technical correctness. Please do not change the names of any provided functions or classes within the code, or you will wreak havoc on the autograder.

Academic Dishonesty: We will be checking your code against other submissions in the class for logical redundancy. If you copy someone else's code and submit it with minor changes, we will know. These cheat detectors are quite hard to fool, so please don't try. We trust you all to submit your own work only; please don't let us down. Instead, contact the course staff if you are having trouble.

Battleship and BNs

In this CS188 version of Battleship, the goal is to locate and bomb the battleship(s) hiding on the grid.  Unlike in classic Battleship, however, you aren't guessing at random.  You can obtain information about the ships' whereabouts by dropping sensors at grid positions.  These sensors return readings, which are correlated with the distance to the closest ship.  You only have one bomb per ship, and bombing, successful or not, ends the game.  There are two tasks of interest in this game.  First, we would like to compute probability distributions over ship locations given sensor readings.  This is the inference problem.  Second, we want to decide, given our current beliefs about ship locations, whether to sense or bomb, and where.  This is the decision problem.  You will build agents to solve both of these problems.

You can probably figure out the rules just by playing, but here's the official version:

At any time, you can sense or bomb.  If you wish to sense, left-click a grid location, and you will reveal the sensor reading at that location.  You cannot "re-sense" the same location.  The intuition is that time is frozen and your observation, though noisy, will not change until the world changes (which it will not do until the next checkpoint).  The observation you get from a sensor will be one of RED, ORANGE, YELLOW, or GREEN, which roughly indicates how close the closest ship is to the sensor location.  The exact sensor model is given in the game code.  Your agent loses one point per sensor revealed.
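For intuition, a sensor model of this kind can be pictured as a table mapping the distance to the closest ship to a distribution over readings.  The sketch below is illustrative only -- the numbers and the Manhattan-distance assumption are made up; the real distributions live in the provided game code.

```python
# Hypothetical noisy sensor model, for intuition only. Each entry maps the
# Manhattan distance to the closest ship to a distribution over readings.
NOISY_SENSOR_MODEL = {
    0: {'RED': 0.9, 'ORANGE': 0.1, 'YELLOW': 0.0, 'GREEN': 0.0},
    1: {'RED': 0.1, 'ORANGE': 0.8, 'YELLOW': 0.1, 'GREEN': 0.0},
    2: {'RED': 0.0, 'ORANGE': 0.1, 'YELLOW': 0.8, 'GREEN': 0.1},
    3: {'RED': 0.0, 'ORANGE': 0.0, 'YELLOW': 0.1, 'GREEN': 0.9},
}

def reading_distribution(sensor_pos, ship_positions, model=NOISY_SENSOR_MODEL):
    """Distribution over readings at sensor_pos, given the ship positions.

    The reading depends only on the distance to the closest ship; distances
    beyond the largest table entry are clamped to it.
    """
    closest = min(abs(sensor_pos[0] - r) + abs(sensor_pos[1] - c)
                  for (r, c) in ship_positions)
    return model[min(closest, max(model))]
```

A deterministic model is just the special case where each row puts probability 1.0 on a single color.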

Once you are ready to bomb, click the BOMB button. It will turn red to indicate that you are in bombing mode.  All you can do now is drop as many bombs as there are ships on the board.  Once all your bombs are used up, you will see which were hits and which were misses.  All ships in any squares you bomb are hit (you can hit multiple ships with a single bomb if you're lucky).  Your agent gains 50 points for each ship that is hit.

In the next project, you will also have the ability to advance time, at which point the ships will move and the observations will be updated.

To get started with battleship, let's play a few games.  Run battleship from the command line:

  python -w -q

As before, there are many command-line options, which you can display with the -h option.  In this case, the -w flag shows the true locations of the ships, and the -q flag suppresses the display of agent beliefs (since you have no agent yet).

Left-click to sense a square or click the bomb button to begin bombing.  Remember that once you bomb, the game will end, whether you hit the ship or not.  Try a bigger layout using (-l medium).  Now, using (-l test), a 3 by 3 layout, try sensing all 9 squares. Notice that there is no noise in the sensor readings -- that is, the information provided by RED, ORANGE, YELLOW, and GREEN is deterministic given the location of the ship.  You can set the sensor reading distribution at the command line with (-s noisy). Take a look at the sensor model code to see what the deterministic and noisy distributions look like.  It should be more difficult to find the ship using the noisy distribution.  Try some games on the medium layout with noisy sensors and no true locations to get an idea of what we're asking the agents to do!

Question 0 (no points)  Work through the brief tutorial script, which provides an introduction to the major battleship classes and functions you will find helpful.

Let's formalize the static battleship domain as a Bayes' net.  The full network structure is shown below (for a 2x3 layout):

The S node represents the tuple of ship positions.  You could imagine having a separate variable for each ship; this formulation is equivalent.  You can get the prior distribution over ship tuples from Game.getShipTupleDistribution().  In general, there are a lot of important Game.getX() methods, and your agents and inference modules will have Game objects.  Note that if there is only one ship in play, you will still get singleton tuples of locations, not just a bare location.  There is a random variable Ri,j for each position (row, col) = (i, j).  Each reading depends only on the ship position.  The conditional distribution for Ri,j given a value for S can be fetched with Game.getReadingDistributionGivenShipTuple().

Question 1 (4 points)  Try running without the -q flag, if you haven't already.  The GUI will now display the agent's posterior beliefs about the location of the ship (S) given the revealed sensors ({Ri,j=ri,j}).  However, as you haven't written anything yet, there is a placeholder which always returns the uniform distribution over locations.  Look at the StaticKeyboardAgent, which is actually playing the game: it delegates action choice to you, the human, and uses an ExactStaticInferenceModule to compute distributions over ships.  You will implement the function getShipTupleDistributionGivenObservations() in the ExactStaticInferenceModule class.  It takes a dictionary of location / reading pairs and should calculate the posterior distribution over the location of a ship given those observations.  The returned object should be a dictionary which has all singleton tuples of all the grid locations as keys and posterior probabilities (that the ship is in that position) as values.  You might try printing out the observations as you play to see what they look like.  Test your agent by playing with noiseless sensors (-s deterministic, which is the default). You also might use -w to show the true ship position for debugging.  Once your code seems to be working, try noisy sensor distributions (-s noisy).  Note that thanks to your results from the homework, you can ignore all unobserved reading variables when you compute the posterior over S. That is, you can treat the network on the left as if it were the one on the right:
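Since the readings are conditionally independent given S, the posterior is just the prior times the product of the observed readings' likelihoods, renormalized.  A minimal sketch outside the project's class structure, with the Game accessors stood in by a plain dictionary and a function (the real signatures in the project code may differ):

```python
def ship_tuple_posterior(prior, reading_dist_given_tuple, observations):
    """P(S = s | readings) is proportional to P(S = s) * product over
    observed (pos, r) of P(R_pos = r | S = s).

    prior: dict mapping ship tuple -> prior probability (what
           Game.getShipTupleDistribution() returns, per the text above)
    reading_dist_given_tuple: fn(pos, ship_tuple) -> dict reading -> prob
           (standing in for Game.getReadingDistributionGivenShipTuple)
    observations: dict mapping sensed position -> observed reading
    """
    beliefs = {}
    for ship_tuple, prob in prior.items():
        # Multiply in the likelihood of each revealed reading.
        for pos, reading in observations.items():
            prob *= reading_dist_given_tuple(pos, ship_tuple).get(reading, 0.0)
        beliefs[ship_tuple] = prob
    # Renormalize so the posterior sums to 1.
    total = sum(beliefs.values())
    return {s: p / total for s, p in beliefs.items()}
```

For example, with two equally likely singleton ship tuples and one RED reading that is more likely near the ship, the posterior shifts toward the sensed square.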


Question 2 (3 points) You can put multiple ships on the board with the -k option.  Try a game with -k 2 -q.  You now need to make sure your inference works correctly for the case of multiple ships; it may already be correct, depending on how you coded your answer to question 1 (you will only turn in this more general version).  Whereas before S was essentially a ship position, it is now a ship tuple, and so we are calculating posteriors over ship tuples.  For example, if we have 2 ships, we calculate the posterior probability of (ship1, ship2) being ((0,0), (0,0)), ((0,0), (0,1)), and so on.  Because the ships are interchangeable, you will never know which exact ship is in which location, but you may know which k locations are occupied.  When there are multiple ships, the GUI will display the expected number of ships at each location.  We have provided a function in the mostly abstract StaticBattleshipAgent class called getExpectedShipCounts() which extracts expected counts from your posterior over S; make sure you understand this method.  In the single-ship case, the expected ship counts are just the beliefs we compute in getShipTupleDistributionGivenObservations().  In the k-ship case, however, the expected number of ships across all locations will be k.  For example, you may have a posterior that ((0,0), (0,1)) is 0.5 and ((0,1), (0,0)) is 0.5, at which point both of (0,0) and (0,1) will have expected counts of 1.0.  Try your code with 2 and 3 ships. Make sure you understand why inference slows down as k increases.
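The extraction of expected counts from the posterior can be sketched in a few lines (a stand-in for what the provided getExpectedShipCounts() computes; read the real method in the project code):

```python
def expected_ship_counts(posterior):
    """Expected number of ships at each grid location, given a posterior
    over ship tuples: each tuple contributes its probability to every
    position it occupies.  The counts sum to k, the number of ships."""
    counts = {}
    for ship_tuple, prob in posterior.items():
        for pos in ship_tuple:
            counts[pos] = counts.get(pos, 0.0) + prob
    return counts
```

On the example above, where ((0,0), (0,1)) and ((0,1), (0,0)) each have posterior 0.5, both squares get an expected count of 1.0.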

You will now write a new agent, which makes decisions using your inference code.  The simplest decision to make is where to bomb given current beliefs over ship locations S.  This computation is basically an expectimax computation, shown diagrammatically below.  At the root max node, you can select any bombing option (tuple of k squares), and the expected utility (score) is the max over all actions' expected utilities.  Each action will produce an expected utility which is the expected number of ships in those squares times the score per ship, BATTLESHIP_SCORE.
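The bombing side of this expectimax can be sketched as follows, assuming BATTLESHIP_SCORE is 50 as stated above (the helper names here are illustrative, not the project's own):

```python
from itertools import combinations

BATTLESHIP_SCORE = 50  # points per ship hit, per the rules above

def bombing_utility(squares, posterior):
    """Expected utility of bombing a set of squares: the expected number of
    ships occupying those squares, times the score per ship."""
    expected_hits = 0.0
    for ship_tuple, prob in posterior.items():
        expected_hits += prob * sum(1 for pos in ship_tuple if pos in squares)
    return expected_hits * BATTLESHIP_SCORE

def best_bombing(posterior, grid_positions, k):
    """The max node: pick the tuple of k squares with highest expected utility."""
    return max(combinations(grid_positions, k),
               key=lambda squares: bombing_utility(set(squares), posterior))
```

Note that in practice you can rank single squares by their expected ship counts rather than enumerating all k-subsets; the brute-force max above is just the literal expectimax picture.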

However, you need not bomb right away.  Instead, you might sense first, revealing a single new reading, and then bomb.  In this case, you would have various sensing actions available to you.  Each sensing action leads to a chance node with a distribution P(Ri,j | {r}), which describes the agent's beliefs about what reading will emerge at that location, given the previous readings.  For each such new reading, we will have an optimal bombing option and the new accompanying expected utility.  Your agent should usually take the sensing action which reveals the reading which has the largest value of information, that is, expected gain in maximum expected utility.  However, once the value of the information is less than 1 point, it should stop sensing and bomb.  In deciding its actions, your agent does not evaluate the futures in which it senses multiple times (though it may indeed end up sensing multiple times).  This decision process is therefore not optimal; the optimal agent would also explicitly consider the possibility of sensing more than once before bombing.  In other words, you have an agent with look-ahead depth 2 for its action selection.
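The chance-node computation can be sketched like this.  The reading distribution marginalizes over ship tuples, P(Ri,j = r | {r}) = sum over s of P(s | {r}) P(r | s), and the value of sensing is the expected post-reading maximum expected utility minus the current one.  This is a sketch under stated assumptions -- the posterior is passed around as a plain dictionary and meu is any function returning the best achievable bombing utility, not the project's actual interfaces:

```python
def reading_distribution_given_observations(position, posterior, reading_given_tuple):
    """P(R_pos = r | observations) = sum_s P(s | obs) * P(R_pos = r | s).

    posterior: dict ship tuple -> prob, already conditioned on past readings
    reading_given_tuple: fn(pos, ship_tuple) -> dict reading -> prob
    """
    dist = {}
    for ship_tuple, p in posterior.items():
        for r, pr in reading_given_tuple(position, ship_tuple).items():
            dist[r] = dist.get(r, 0.0) + p * pr
    return dist

def value_of_sensing(position, posterior, reading_given_tuple, meu):
    """Expected gain in maximum expected utility from sensing `position`.

    meu(posterior) -> best expected bombing utility under that posterior.
    The agent should sense only while this gain exceeds the 1-point cost.
    """
    current = meu(posterior)
    dist = reading_distribution_given_observations(position, posterior, reading_given_tuple)
    expected = 0.0
    for r, pr in dist.items():
        if pr == 0.0:
            continue
        # Bayes-update the posterior with the hypothesized reading r.
        updated = {s: p * reading_given_tuple(position, s).get(r, 0.0) / pr
                   for s, p in posterior.items()}
        expected += pr * meu(updated)
    return expected - current
```

In a deterministic two-square toy world, a single reading pins down the ship exactly, so sensing is worth the full jump from a 50/50 gamble to a sure hit.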

The search tree for this process is represented in the following diagram:

Question 3 (5 points) Build an agent which not only computes posterior beliefs, but also makes decisions, using the at-most-one-sensor look-ahead described above.  You will need to fill in code in StaticVPIAgent.getAction() and ExactStaticInferenceModule.getReadingDistributionGivenObservations().  You may wish to think first about the one-ship case before generalizing to the multi-ship case, but the code should be the same either way.  Test your VPI agent with (-p vpi), first using deterministic sensors (-s deterministic) and then with noisy sensors (-s noisy).  It should work very well in the deterministic case, but might sometimes make obviously non-optimal decisions in the noisy case, because of the limited look-ahead.