Project 4 - Distributed Key-Value Store

Overview

In project 4, you will implement a distributed key-value store that runs across multiple nodes.

Multiple clients communicate with a single master server in a given message format (KVMessage) using a client library (KVClient). The master contains a set-associative cache (KVCache), which it uses to serve GET requests without contacting the key-value (slave) servers it coordinates; the slave servers are contacted for a GET only upon a cache miss on the master. The master uses the TPCMaster library to forward client PUT and DEL requests to multiple slave servers, following the TPC protocol so that PUT and DEL are atomic across those slave servers. The KVServers themselves remain the same as in project 3.
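To make that GET path concrete, here is a minimal sketch of the control flow just described; it is not the actual TPCMaster API, and GetPathSketch/lookupFromSlaves are hypothetical stand-ins:

import java.util.HashMap;
import java.util.Map;

class GetPathSketch {
    private final Map<String, String> cache = new HashMap<>();  // stands in for KVCache

    String handleGet(String key) {
        String value = cache.get(key);      // 1. try the master's cache first
        if (value == null) {                // 2. cache miss: ask the slave replicas
            value = lookupFromSlaves(key);
            cache.put(key, value);          // 3. refill the write-through cache
        }
        return value;
    }

    private String lookupFromSlaves(String key) { return "stub"; }  // network GET to replicas
}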

Concepts to learn: two-phase commit, logging and recovery, consistent hashing, failure detection using timeouts

Figure: A distributed key-value store with replication factor of two.

Architecture Diagram

Figure: Functional diagram of a distributed key-value store. Components in colors other than black are the key ones you will be developing for this project. Blue depicts the GET execution path, green depicts PUT and DEL, and purple components are shared between operations. Note that components in black might also require minor modifications to suit your purposes. Not shown in the figure: PUT and DEL will involve updating the write-through cache in the master.


Task Outline

  1. [7%] Modify KVMessage to allow two-phase commit messages (if necessary). See message specs in tables below. Then finish the TPCLog class that slaves will use to log transactions and rebuild the server during recovery.
  2. [20%] Implement registration logic in the TPCRegistrationHandler, TPCSlaveInfo, and TPCMasterHandler.registerWithMaster.
  3. [19%] Implement the methods registerSlave, findFirstReplica and findSuccessor in TPCMaster which will be used to find the slaves involved in a transaction. Then implement TPCClientHandler.
  4. [29%] Implement the rest of TPCMaster to handle the two-phase commit logic on the master node (handleTPCRequest and handleGet).
  5. [25%] Implement TPCMasterHandler that handles TPC logic on the slaves.

Note that this project may require more collaboration and understanding across tasks than previous projects, since some tasks depend on the implementation of others: (2,3), (3,4), (4,5), (5,1).

Requirements

Unless otherwise specified below, you will have to satisfy the requirements described in project 3. Again, you should bulletproof your code such that the nodes do not crash under normal operation.

You are required to keep up with this Piazza thread for ongoing clarifications.

Two-phase Commit

Figure: Sequence diagram of concurrent read/write operations using the TPC protocol in project 4. The project 3 blocks in the diagram refer to the activities from when clients wrote to a single-node slave server; here the master acts as the client to the individual slaves. The GET request from Client 1 hits the cache in the diagram above; had it missed, the GET request would have been forwarded to each of the slave servers until the key was found.

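Below is a hedged sketch of the master-side flow in the diagram, under simplifying assumptions (a blocking exchange call per message, timeouts elided); TPCFlowSketch and its helpers are hypothetical names, not the skeleton's API:

class TPCFlowSketch {
    interface Slave { String exchange(String request); }  // send one message, read the reply

    /** Runs one PUT/DEL through two-phase commit on both replicas. */
    boolean runTransaction(Slave primary, Slave secondary, String request) {
        // Phase 1: forward the request to both replicas; each votes "ready" or "abort".
        boolean primaryReady = primary.exchange(request).equals("ready");
        boolean secondaryReady = secondary.exchange(request).equals("ready");
        boolean commit = primaryReady && secondaryReady;

        // Phase 2: broadcast the global decision; resend until each slave acks.
        String decision = commit ? "commit" : "abort";
        sendDecisionUntilAck(primary, decision);
        sendDecisionUntilAck(secondary, decision);
        return commit;
    }

    private void sendDecisionUntilAck(Slave slave, String decision) {
        while (!slave.exchange(decision).equals("ack")) { /* retry after a timeout */ }
    }
}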

Failures, Timeouts, and Recovery

At any moment there will be at most one slave server down. Upon revival, slave servers always come back with the same ID. For this particular project, you can assume that the master will never go down, meaning there is no need to log its state. Individual slave servers, however, must log the information necessary to survive failures.

Your code should be able to handle a slave server failing at any point during a TPC transaction.
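As one illustration of what slave-side recovery has to do, here is a hedged sketch of replaying such a log, assuming a simplified Entry record rather than the skeleton's serialized KVMessages:

import java.util.List;
import java.util.Map;

class RecoverySketch {
    static class Entry {                         // simplified log record
        String type, key, value;                 // type: putreq/delreq/commit/abort
        Entry(String type, String key, String value) {
            this.type = type; this.key = key; this.value = value;
        }
    }

    /** Replays the log in order, applying a request only once its commit is seen. */
    void rebuildServer(List<Entry> log, Map<String, String> store) {
        Entry pending = null;
        for (Entry e : log) {
            if (e.type.equals("putreq") || e.type.equals("delreq")) {
                pending = e;                     // in-flight request, decision not yet logged
            } else if (e.type.equals("commit") && pending != null) {
                if (pending.type.equals("putreq")) store.put(pending.key, pending.value);
                else store.remove(pending.key);
                pending = null;
            } else if (e.type.equals("abort")) {
                pending = null;                  // the request never took effect
            }
        }
        // A trailing pending entry means the slave died mid-transaction and must
        // rejoin the protocol to learn the master's decision before serving requests.
    }
}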

Registration

These are the message formats for registration. You are free to create the registration message itself with string concatenation.

register
<?xml version="1.0" encoding="UTF-8"?>
<KVMessage type="register">
<Message>SlaveServerID@HostName:Port</Message>
</KVMessage>
resp
<?xml version="1.0" encoding="UTF-8"?>
<KVMessage type="resp">
<Message>Successfully registered SlaveServerID@HostName:Port</Message>
</KVMessage>
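A hedged sketch of parsing that registration payload, in the spirit of TPCSlaveInfo; the regex and field names are assumptions (and remember the 64-bit slave ID is compared as an unsigned value, though it may print as a negative signed long):

import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SlaveInfoSketch {
    private static final Pattern FORMAT = Pattern.compile("^(-?\\d+)@([^:]+):(\\d+)$");

    final long slaveID;       // 64-bit ID; treat as unsigned when ordering the ring
    final String hostname;
    final int port;

    SlaveInfoSketch(String info) {
        Matcher m = FORMAT.matcher(info);
        if (!m.matches()) throw new IllegalArgumentException("bad registration: " + info);
        slaveID = Long.parseLong(m.group(1));
        hostname = m.group(2);
        port = Integer.parseInt(m.group(3));
    }
}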

Consistent Hashing

Figure: Consistent Hashing. Four slave servers and three hashed keys, along with where they are placed in the 64-bit address space. In this figure, for example, the servers split the key space into S1: [2^16 + 1, 2^23], S2: [2^23 + 1, 2^35], and S3: [2^35 + 1, 2^55]; note that the last server, S4, owns the wrap-around key space [2^55 + 1, 2^64 - 1] and [0, 2^16]. Now when a key hashes to, say, the value 2^55 + 1, it will be stored on the server that owns that key space, i.e., S4, as well as on the immediately next server in the ring, S1.

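A hedged sketch of the lookups this figure implies (task 3), assuming the ring is kept in a TreeMap ordered by Long.compareUnsigned so positions behave as unsigned 64-bit values:

import java.util.Map;
import java.util.TreeMap;

class RingSketch {
    // Ring positions ordered as unsigned 64-bit integers.
    private final TreeMap<Long, String> ring = new TreeMap<>(Long::compareUnsigned);

    void registerSlave(long slaveID, String slave) { ring.put(slaveID, slave); }

    /** First replica: the first slave at or clockwise after the key's hash. */
    String findFirstReplica(long keyHash) {
        Map.Entry<Long, String> e = ring.ceilingEntry(keyHash);
        return (e != null ? e : ring.firstEntry()).getValue();  // wrap around the ring
    }

    /** Successor: the next slave clockwise, which holds the second copy. */
    String findSuccessor(long slaveID) {
        Map.Entry<Long, String> e = ring.higherEntry(slaveID);
        return (e != null ? e : ring.firstEntry()).getValue();
    }
}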

TPC message formats

put/del/get requests should remain unchanged from project 3 since they have no two-phase semantics. There are no newlines in these messages. Italicized fields should be replaced with the actual value.

putreq
<?xml version="1.0" encoding="UTF-8"?>
<KVMessage type="putreq">
<Key>key</Key>
<Value>value</Value>
</KVMessage>
delreq
<?xml version="1.0" encoding="UTF-8"?>
<KVMessage type="delreq">
<Key>key</Key>
</KVMessage>
getreq
<?xml version="1.0" encoding="UTF-8"?>
<KVMessage type="getreq">
<Key>key</Key>
</KVMessage>
ready vote
<?xml version="1.0" encoding="UTF-8"?>
<KVMessage type="ready">
</KVMessage>
abort vote
<?xml version="1.0" encoding="UTF-8"?>
<KVMessage type="abort">
<Message>Error Message</Message>
</KVMessage>
commit decision
<?xml version="1.0" encoding="UTF-8"?>
<KVMessage type="commit">
</KVMessage>
abort decision
<?xml version="1.0" encoding="UTF-8"?>
<KVMessage type="abort">
</KVMessage>
ack
<?xml version="1.0" encoding="UTF-8"?>
<KVMessage type="ack">
</KVMessage>
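To tie the formats together, here is a hedged sketch of slave-side handling in the spirit of TPCMasterHandler (task 5); the class, method, and field names are hypothetical:

import java.util.Map;

class SlaveTPCSketch {
    private String pendingType, pendingKey, pendingValue;   // the in-flight request

    /** Phase 1: log the request durably, then vote ready or abort. */
    String handleRequest(String type, String key, String value) {
        logAppend(type, key, value);                        // must reach disk before voting
        if (key == null || key.isEmpty()) return "abort";   // vote abort on invalid input
        pendingType = type; pendingKey = key; pendingValue = value;
        return "ready";
    }

    /** Phase 2: log the decision, apply or discard the pending request, then ack. */
    String handleDecision(String decision, Map<String, String> store) {
        logAppend(decision, null, null);
        if (decision.equals("commit") && pendingType != null) {
            if (pendingType.equals("putreq")) store.put(pendingKey, pendingValue);
            else store.remove(pendingKey);
        }
        pendingType = null;                                 // on abort, simply forget it
        return "ack";
    }

    private void logAppend(String type, String key, String value) { /* TPCLog append */ }
}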

Testing

Testing for this project is complicated. Expect to spend a significant portion of your time testing (~half the time of the project). To get you started, a basic end-to-end test has been provided in TPCEndToEndTemplate and TPCEndToEndTest. For information about the testing framework, please refer back to the project 3 spec. We expect extensive use of Mockito and/or PowerMock.
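As an example of the style of unit test we have in mind, here is a hedged Mockito sketch checking that a single abort vote forces a global abort; it reuses the hypothetical TPCFlowSketch from the two-phase commit section above:

import static org.junit.Assert.assertFalse;
import static org.mockito.Mockito.*;
import org.junit.Test;

public class TPCFlowSketchTest {
    @Test
    public void abortVoteForcesGlobalAbort() {
        TPCFlowSketch.Slave primary = mock(TPCFlowSketch.Slave.class);
        TPCFlowSketch.Slave secondary = mock(TPCFlowSketch.Slave.class);
        when(primary.exchange("putreq")).thenReturn("ready");
        when(secondary.exchange("putreq")).thenReturn("abort");  // simulate a failing vote
        when(primary.exchange("abort")).thenReturn("ack");
        when(secondary.exchange("abort")).thenReturn("ack");

        assertFalse(new TPCFlowSketch().runTransaction(primary, secondary, "putreq"));
        verify(primary).exchange("abort");     // the global decision reached both slaves
        verify(secondary).exchange("abort");
    }
}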

Skeleton

Project 4 builds on top of the single-server key-value store developed in project 3. Below is a package of files to be added to your project. You may define additional classes and methods as you see fit.

http://www-inst.eecs.berkeley.edu/~cs162/sp14/kvstore/proj4-skeleton.tar.gz

To add these files to your repository, you can simply unpack the tar in your groupN-kvstore directory (and outside your kvstore directory).


Deliverables

Initial design (Tue 4/29)

Follow the general outline of our design doc template; 15 pages maximum. For testing, include a one- to two-sentence description of each test case you plan to implement for each class and the reason for including it (what it tests).

Code with JUnit tests (Thu 5/8)

Submit a tarball (cs162-proj4.tar.gz) of your code and tests to proj4-code by running the following command (use a hive server if you don't have ant installed). You will need to edit the handin target in your build.xml file to create a cs162-proj4.tar.gz file instead; this is equivalent to simply renaming the generated file without modifying your build.xml.

$ ant handin

Final design and Group Evals (Fri 5/9)

For testing, include a description of each test case you have implemented for each class. Our evaluation of your test cases will be worth 10 of the 20 points on your final design. We are looking for unit tests with good code coverage and integration tests that verify the behavior of your system. 18 pages maximum (an extra 3 pages for tests).