
CSE/EE 597 - Distributed Computing, Spring 2018

Distributed algorithms form the foundations of modern cloud computing systems. In a distributed system, multiple processes, possibly spanning different machines, must coordinate their actions to accomplish a distributed computation. Even seemingly simple tasks can be significantly more complicated and challenging for distributed algorithms than for their sequential counterparts, because the tasks must be executed correctly despite each process having only local information, a lack of synchronization, and possible machine or network failures. This course will introduce the student to the elegant theory of distributed algorithms. We will focus on problems, models, algorithms, and impossibility results. Theoretical discussions will be supplemented with demonstrations of real-world practical systems that use the ideas developed. The course will also involve a project that allows the student to apply concepts discussed in class. The specific topics covered will be:
  1. Modeling of Asynchronous Systems
  2. Global states and ordering of events
  3. Consistency, definitions and properties
  4. Consistent Shared Memory Emulation
  5. Fault tolerant consensus, state machine replication
Time permitting, additional topics related to distributed snapshots, Byzantine fault tolerance and transactional memory may be covered.
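
To give a flavor of topic 2 (global states and ordering of events), here is a minimal sketch of Lamport's logical clocks in Python. The course material itself is paper-based, so this sketch, its class name, and its method names are illustrative assumptions, not course code: each process keeps a counter, increments it on local and send events, and on receive jumps past the timestamp carried by the message.

```python
# Illustrative sketch of Lamport logical clocks (names are assumptions).
class LamportClock:
    def __init__(self):
        self.time = 0

    def local_event(self):
        # A local step is an event: advance the clock.
        self.time += 1
        return self.time

    def send(self):
        # A send is an event: advance the clock, attach the timestamp.
        self.time += 1
        return self.time

    def receive(self, msg_time):
        # On receive, jump past the sender's timestamp.
        self.time = max(self.time, msg_time) + 1
        return self.time

p, q = LamportClock(), LamportClock()
t1 = p.send()       # p's clock becomes 1
t2 = q.receive(t1)  # q's clock becomes max(0, 1) + 1 = 2
t3 = q.send()       # q's clock becomes 3
t4 = p.receive(t3)  # p's clock becomes max(1, 3) + 1 = 4
assert t1 < t2 < t3 < t4  # causally ordered events get increasing timestamps
```

Note the one-way guarantee this illustrates: if event a happens-before event b, then a's timestamp is smaller, but equal or incomparable timestamps say nothing about causality, which is what motivates vector clocks later in the schedule.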

PREREQUISITES

The most important requirements are: (i) the ability to develop correct proofs, or the willingness to work towards it, and (ii) knowledge of a high-level programming language, or the willingness to learn one. The ideal student will have a background in sequential algorithms (CSE 465 and CSE 565); however, the ability and willingness to learn and work with formal mathematical models and "algorithmic thinking" is a good substitute.

LOGISTICS

LOCATION: EES 118

CLASS HOURS: MW 4:00 PM - 5:15 PM

INSTRUCTORS: Viveck Cadambe (231 EE West, viveck@engr.psu.edu), Bhuvan Urgaonkar (W371 Westgate Building, bhuvan@cse.psu.edu)

OFFICE HOURS: Instructors available after class or by appointment.

TEXTBOOK AND REFERENCE BOOKS

No textbook is required for the course. The material for the course will be disseminated through several papers that will be linked on the course website, and slides on Canvas.

REFERENCE MATERIALS

Lynch, Nancy A., Distributed algorithms, Morgan Kaufmann, 1996.

Attiya, Hagit, and Jennifer Welch, Distributed computing: fundamentals, simulations, and advanced topics, Vol. 19. John Wiley & Sons, 2004.

Herlihy, Maurice, and Nir Shavit, The art of multiprocessor programming, Morgan Kaufmann, 2011.

GRADING POLICY

Homeworks (50%), Final Project (40%), Class Participation (10%)

Collaboration on homework questions is permitted but not encouraged. Copying homework answers is NOT allowed. A student who collaborates with another student on a question must list the collaborator's name and briefly describe the nature of the collaboration in the submitted homework. In all cases, each student must turn in his or her own homework, which should reflect the student's own understanding of how the problem is solved.

Project submission will involve an in-class presentation and a detailed report. Project ideas will be provided midway through the semester. Credit for class participation will be given based on questions asked in or after class, and on responses to frequent quiz questions posed by the instructors in class or on Canvas.

ACADEMIC INTEGRITY

All Penn State policies regarding ethics and honorable behavior apply to this course.

LECTURES

The lecture schedule is tentative, and the table will be populated and/or edited as the semester progresses.

Reading for Week 13 (Randomized Consensus): Ben-Or, Michael. "Another advantage of free choice (extended abstract): Completely asynchronous agreement protocols." Proceedings of the Second Annual ACM Symposium on Principles of Distributed Computing (PODC). ACM, 1983. Extra reading: Aguilera, M. K., and S. Toueg. "The correctness proof of Ben-Or's randomized consensus algorithm." Distributed Computing 25.5 (2012): 371-381.
Date/Week | Topic | Comments | Reading Material
Week 1 (1/8, 1/10) | Motivating example: mutual exclusion over shared memory. Peterson's and Bakery algorithms. | Course overview and logistics. | Lamport, Leslie. "A new solution of Dijkstra's concurrent programming problem." Communications of the ACM 17.8 (1974): 453-455.
Week 2 (1/17) | Modeling: asynchronous systems. Shared memory versus message passing. Liveness and safety properties. | No class on 1/15 (Martin Luther King Day). |
Week 3 (1/22, 1/24) | Causal ordering of events. Lamport clocks. | | Lamport, Leslie. "Time, clocks, and the ordering of events in a distributed system." Communications of the ACM (1978).
Week 4 (1/29, 1/31) | Distributed snapshots (Chandy-Lamport). Distributed mutual exclusion. Vector clocks. | | Chandy, K. Mani, and Leslie Lamport. "Distributed snapshots: Determining global states of distributed systems." ACM Transactions on Computer Systems (TOCS) 1985. Babaoglu, Ozalp, and Keith Marzullo. "Consistent global states of distributed systems: Fundamental concepts and mechanisms." Distributed Systems 1993.
Week 5 (2/5, 2/7) | Distributed mutual exclusion. Replicated state machines. | No lecture on 2/7 due to weather. |
Week 6 (2/12, 2/14) | Definitions of linearizability and sequential consistency. Non-blocking property and composability. | |
Week 7 (2/19, 2/21) | Linearizable shared memory emulation. | | Attiya, Hagit, Amotz Bar-Noy, and Danny Dolev. "Sharing memory robustly in message-passing systems." Journal of the ACM (JACM) 42.1 (1995): 124-142. (Dijkstra Prize winner)
Week 8 (2/26, 2/28) | Sequentially consistent shared memory emulation. Lower bounds on linearizable and sequentially consistent shared memory emulation. | | Attiya, Hagit, and Jennifer L. Welch. "Sequential consistency versus linearizability." ACM Transactions on Computer Systems (TOCS) 12.2 (1994): 91-122.
Spring Break
Week 9 (3/12, 3/14) | Causal consistency. Causally consistent shared memory emulation. | | Ahamad, Mustaque, Gil Neiger, James E. Burns, Prince Kohli, and Phillip W. Hutto. "Causal memory: Definitions, implementation, and programming." Distributed Computing 9.1 (1995): 37-49.
Week 10 (3/19, 3/21) | Implementation of causally consistent data stores. | | Lloyd, Wyatt, Michael J. Freedman, Michael Kaminsky, and David G. Andersen. "Don't settle for eventual: Scalable causal consistency for wide-area storage with COPS." Proceedings of the Twenty-Third ACM Symposium on Operating Systems Principles (SOSP). ACM, 2011, pp. 401-416.
Week 11 (3/26, 3/28) | Consistency, availability, and convergence. | | Mahajan, Prince, Lorenzo Alvisi, and Mike Dahlin. "Consistency, availability, and convergence." University of Texas at Austin Tech Report 11 (2011). Extra reading: Attiya, Hagit, Faith Ellen, and Adam Morrison. "Limitations of highly-available eventually-consistent data stores." IEEE Transactions on Parallel and Distributed Systems 28.1 (2017): 141-155.
Week 12 (4/2, 4/4) | FLP impossibility. | | Fischer, Michael J., Nancy A. Lynch, and Michael S. Paterson. "Impossibility of distributed consensus with one faulty process." Journal of the ACM (JACM) 32.2 (1985): 374-382.
Week 13 (4/9, 4/11) | Paxos. Randomized consensus. | | Lamport, Leslie. "Paxos made simple." ACM SIGACT News 32.4 (2001): 18-25. Lamport, Leslie. "Fast Paxos." Distributed Computing 19.2 (2006): 79-103.
Week 14 (4/16, 4/18) | Multi-Paxos. Byzantine Paxos. | | Lamport, Leslie. "Byzantizing Paxos by refinement." International Symposium on Distributed Computing. Springer, 2011. Junqueira, Flavio, Yanhua Mao, and Keith Marzullo. "Classic Paxos vs. Fast Paxos: caveat emptor." Proceedings of USENIX Hot Topics in System Dependability (HotDep), 2007.
Week 15 (4/23, 4/25) | Project presentations. | |
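
As a taste of the Week 1 material, the following is a hedged sketch of Peterson's two-process mutual exclusion algorithm in Python. It is illustrative only, not course code: Peterson's algorithm is correct under sequentially consistent memory, which CPython's interpreter effectively provides for this example, but the algorithm is not safe on relaxed hardware memory models without memory fences.

```python
# Illustrative sketch of Peterson's algorithm for two processes.
import threading

flag = [False, False]   # flag[i] is True when process i wants the critical section
turn = 0                # tie-breaker: the process that must wait
counter = 0             # shared state protected by the lock
N = 10000               # iterations per process

def worker(i):
    global turn, counter
    other = 1 - i
    for _ in range(N):
        # Entry protocol: announce intent, then yield the tie-break to the other.
        flag[i] = True
        turn = other
        while flag[other] and turn == other:
            pass  # busy-wait until it is safe to enter
        # Critical section: a non-atomic read-modify-write on shared state.
        counter += 1
        # Exit protocol: withdraw intent.
        flag[i] = False

threads = [threading.Thread(target=worker, args=(i,)) for i in (0, 1)]
for t in threads:
    t.start()
for t in threads:
    t.join()
assert counter == 2 * N  # mutual exclusion prevents lost updates
```

Without the entry and exit protocols, the two threads' increments could interleave and lose updates; the final assertion checks that mutual exclusion held across all 2N critical-section entries.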