🌐

Distributed Computing

TagsComputer scienceNetwork system
Created
Updated

Preface

Prerequisites

Learning ethics

Introduction

What is Distributed Computing?

Characteristics.

Transparency.

Global Scheduler.

Design. Interfaces. Open-close principle.

High-performance distributed computing.

Pervasive systems (Supranet)

The model

A misconception is that distributed systems are microservices, indeed the latter is an implementation of the former.

Types of distribution transparency

Access

Location

Relocation

Migration

Replication

Concurrency

Failure.

Pitfalls

Types of distributed systems

Grid computing.

Cloud computing.

Cluster computing.

Distributed information systems

Enterprise Application Integration

Primitives.

ACID properties.

Pervasive systems

Mobile systems

Embedded Devices

Sensor Networks

Why does Distributed Computing matter to you?

If you have performance issues where the process is parallelizable, you have intensive IO operations, or you need a reliable system you must check out distributed computing.

Research

Ecosystem

Standards, jobs, industry, roles, …

Clouds

Hadoop

Docker

Story

Case of study

Google File System (GFS)

Summary

FAQ

Worked examples

Concurrent and Parallel Principles

Key concepts

Formally, we said the operationoperation to be idempotent if operation(x,x)=xoperation(x,x)=x for all xXx \in X.

For example, suppose the operation consults the money in my bank account you could ask for it a lot of times but always you have the same amount, so it is idempotent. However, transfer $3145 from my bank account is not idempotent because you’re changing the state.

ModelPropertiesExample
Multi-threadingParallelism, blocking system callsWorker in the browsers
Single-threaded processNo parallelism, blocking system callsTypical synchronous function
Finite-state machineParallelism, nonblocking system calls, Asynchronous operationEvent loop in the browser.

Case of study. JavaScript.

Event loop.

Stack, microtasks, macrotasks, promises, callbacks, html elements, and custom events, subscribers.

Case of study. Python.

Case of study. Go.

Case of study. C and C++.

Case of study. Java.

https://replit.com/@sanchezcarlosjr/distributed-systems-2023-1

Case of study. Haskell.

We classify programming languages as stateful and stateless to abstract the concurrent and parallel principles, even though people have named the in different ways over the years. Stateful languages are those that encourage us to assign variables, on another hand, stateless languages are those that don’t encourage us to assign variables. JavaScript, Go, and C is stateful languages meanwhile Haskell and Erlang are stateless. Immutability. Mutability.

Semigroups and monoids

Group theory in cryptography

Case of study. Erlang.

Coroutines

Foundations

Timed Input/Output Automaton (TIOA)

Distributed algorithms

Process calculus

Process Algebra

Algebraic Theory of Processes

π-calculus

References

Distributed Algorithms

Pierce, Benjamin (1996-12-21). "Foundational Calculi for Programming Languages". The Computer Science and Engineering Handbook. CRC Press. pp. 2190–2207. ISBN 0-8493-2909-4.

MIT OpenCourseWare. (2023, January 27). MIT OpenCourseWare. Retrieved from https://ocw.mit.edu/courses/6-046j-design-and-analysis-of-algorithms-spring-2015/resources/lecture-20-asynchronous-distributed-algorithms-shortest-paths-spanning-trees

http://profesores.elo.utfsm.cl/~agv//elo330/

Architectures

Architectural styles

Architectural styles have two main views the software and hardware organization such that the style are a set of components connected to each other and how these are configured into a system. A component is a replaceable modular unit with interfaces [OMG, 2004b] where “replaceable” means during that system operation you can change the implementation or interface to another one, but you must consider changing interfaces is a pain in the ass. You connect them by “connector”, a mechanism that mediates communication, coordination, or cooperation among them. Some of them are remote procedure calls, message passing, or streaming data.

Now, we focus on software architecture, that is, the logical organization into software components of which the most important architectural styles are layered architecture, object-based architecture, and resource-centered architecture.

https://www.youtube.com/watch?v=nH4qjmP2KEE&ab_channel=ByteByteGo

Application layering

Object-based and service-oriented architectures

Microservices

Resources-based architectures

Middleware organization

System architecture

Decentralized organizations

System design

Algorithms

https://twitter.com/bytebytego

Case of study

Astrolabe

Robbert Van Renesse, Kenneth P. Birman, and Werner Vogels. 2003. Astrolabe: A robust and scalable technology for distributed system monitoring, management, and data mining. ACM Trans. Comput. Syst. 21, 2 (May 2003), 164–206. https://doi.org/10.1145/762483.762485

Globule

Globule – Scalable Web application hosting. (2023, February 09). Retrieved from http://www.globule.org

Jade

Jade Site | Java Agent DEvelopment Framework. (2023, February 09). Retrieved from https://jade.tilab.com

Summary

Worked examples

Processes

Threads in distributed systems

The most important thread property is allowing blocking system calls but not blocking the entire process in which the thread is running because it makes it much easier to express multiple logical connections at the same time.

Virtualization

Examples are Docker, Virtual Machines, and Emulators.

Docker,OpenVZ/Virtuozzo,and LXC/LXD.

Clients

If you must establish a high degree of distribution transparency on the client side, that is, you should hide the communication latencies, then the most common is to initiate the communication right off the bat, display something quickly to users, and not block the user interface. All together can be done via threads.

Replicated calls from a client

A replicate call is a remote procedure call (RPC) between a node and the replicas, that is, the other nodes in the system that holds a copy of the data or chunk being requested, where a response may or may not be received due to system design or network problems.

Typically, a replicated call refers to a situation where multiple replicas of a server-side application are involved in processing a client's request.

Web browser

Servers

Window X

Window X designates a user’s terminal as hosting the server, while the application is known as the client. It makes sense because on one hand there is a limited resource and $n$ applications wanting to access them, so the Windows X server coordinates the clients and resource interruptions.

Code migration

Webhook

Checkpoint and restore

Parallel and concurrent process

Asynchronous and synchronous Programming

Queue system analysis

https://www.youtube.com/watch?v=z611FLLR3_8&ab_channel=GeoffreyMessier

https://www.youtube.com/watch?v=Hda5tMrLJqc&ab_channel=USENIX

https://www.youtube.com/watch?v=oQGreeij-OE&ab_channel=8thLightInc.

Summary

Exercises

Communication

Protocols

Routing algorithms

Deadlock-free packet switching

Wave and Traversal Algorithms

Election algorithms

Termination detection

Anonymous Networks

Snapshots

Sense of direction and orientation

Synchrony in networks

Remote procedure call (RPC)

Webhook

A remote procedure call is a transparent function or method call that internally sends and receives data from remote processes. It must hide the message-passing nature of network communication. You can find a lot of RPC examples such as REST, SOAP, gRPC, Ajax in Web Browsers, graphQL, CORBA, and Trihft.

Message-oriented communication

Multicast communication

Worked examples

Summary

@startmindmap
!theme materia
* Communication
** OSI Model
*** What is it?
**** Communication protocols
***** Application, Presentation, Session, Transport, Network, Data link, Physical
***** Message = payload + header for each protocol
**** Middleware protocols
***** What is it?
****** General-purpose protocols that warrant their own layer
***** Examples
****** DNS
****** Authentication and authorization protocols
*** Why?
**** It allows thinking about open systems to communicate with each other
*** Examples
**** Connectionless services
***** Email
**** Connection-oriented communication
***** Telephone
** Remote procedure call
*** Why?
**** Achieve transparency
*** What is it?
**** The real implementation is executed in another node over a network
*** Examples
**** API REST
**** GraphQL
**** gRPC
**** CORBA
**** Ajax in browsers
*** How?
**** Local stub communicates with the server via passing messages
**** Transport Procedure Call
**** Transparent calls
**** Marshaling and Serialization
**** Interface Definition Language (IDL)
***** Issue
****** Code migration and references
** Message-oriented communication
*** Issues
**** Does the communication persist?
**** Is the communication synchronous?
***  How?
**** Async communication
**** Sync communication
**** Technologies
***** Socket interfaces
***** Blocking calls
*** Solution
****  Design Patterns
***** Pipeline pattern
***** Publish-subscribe pattern
***** Request-reply pattern
***** Message-queuing systems
***** Message brokers
**** Message-oriented middleware
***** When?
******  When the RPCs are not appropriate.
***** Examples
****** MQTT
****** ZeroMQ
****** The Message-Passing Interface (MPI)
** Multicast comunication
*** What is it?
**** Disseminate information across multiple subscribers.
**** Examples
***** IPFS
*** How?
**** Peer-to-peer system
***** Structured overlay network
***** Every peer sets up a tree
***** Distributed hash table (DHT), Distributed data
**** Flooding messages across the network
**** Epidemic protocols
***** Simple
***** Robust
@endmindmap

Naming

Flat naming

Structured naming

Attribute-based naming

Worked examples

Summary

Coordination

Clock synchronization

Summary

Questions

Consistency and replication

Stateless and stateful

Atomic objects

Fault tolerance

Fault Tolerance in Distributed systems

Case of study

VMware FT

Circuit Breaker Design Pattern

Questions

Fault tolerance

Fault tolerance in Asynchronous systems

Fault tolerance in Synchronous systems

Failure detection

Stabilization

Process resilience

Reliable client-server communication

Distributed commit

CAP theorem

Summary

Security

Tutorials and assignments

We will explore some alternatives to perform distributed computing, both locally and in a network, starting from the most basic approaches and progressing to the use of the most feature-rich libraries.

The “&” operator

(sleep 1 && echo "abc") & (sleep 2 && echo "def") & (echo "efg")

GNU parallel

In my opinion, GNU Parallel works well for complex tasks when run locally, but it can be difficult to use on a cluster. You can see for yourself using https://www.msi.umn.edu/support/faq/how-can-i-use-gnu-parallel-run-lot-commands-parallel.

Examples

cd /tmp/ && touch {1..5}.my_random_file | find . -type f -name '*.my_random_file' | parallel gzip --best
parallel echo ::: 😀 😃 😄 😁 😆 😅 😂 🤣 🥲 done!
touch images.txt && parallel printf 'https://picsum.photos/200/300\\n%.0s' ::: {1..1000} | parallel -j+0 -k "echo {} | tee -a images.txt" && cat images.txt | parallel wget
touch images.txt && parallel -j+0 "printf 'https://picsum.photos/200/300\\n%.0s'" ::: {1..2} | parallel -j+0 "echo {} >> images.txt" && parallel -a images.txt -j+0 wget

xargs

cat images.txt | xargs -n 1 -P 10 curl -s | gawk '{print $1}'

Ray

On the other hand, Ray performs well for complex tasks on an on-premise cluster, but when used locally, it can be overwhelming.

https://www.ray.io/

A distributed Hello World program

Network File System

Hadoop or Spark

Open MPI

Low-level alternative

LibP2P

Location-addressing approach vs content addressing

Content identifier

Peer. A participant in a decentralized network that is equally privileged.

Queue messaging

https://mqtt.org/

MQTT

Message broker

Quality of service

OASIS (organization)

ZeroMQ

https://zeromq.org/

Quality of service

Message-oriented middleware

Message queuing service

Your own map-reducer

https://pdos.csail.mit.edu/6.824/labs/lab-mr.html

Other Alternatives

Object storage

1. Network Attached Storage (NAS) - a file-level storage architecture that allows multiple users to access shared files over a network. 2. Distributed File System (DFS) - a protocol that allows users to access files on multiple servers as if they were on a single shared drive. 3. iSCSI (Internet Small Computer System Interface) - a protocol used for connecting storage devices over a network as if they were directly attached to a computer. 4. Remote Direct Memory Access (RDMA) - a protocol used for direct memory access between servers over a network, often used for high-performance computing and data-intensive applications. 6. AFS (Andrew File System) - a distributed file system designed for large-scale collaboration and data sharing built by Carnegie Mellon University. 8. GlusterFS - a distributed file system designed for scalability and high availability, often used for cloud computing and big data applications. 9. Lustre - a parallel distributed file system designed for high-performance computing and scientific applications.

10. File Transfer Protocol (FTP) and Secure File Transfer Protocol (SFTP) - a protocol used for transferring files over a network, often used for website hosting.

version: '3'

services:
  sftp:
    image: atmoz/sftp
    ports:
      - "2222:22"
    environment:
      SFTP_USERS: sftpuser:pass:1001:/data
    volumes:
      - /path/to/sftpdata:/data

11. WebNative FileSystem (WNFS)

  1. MinIO - an open-source, self-hosted cloud-based object storage service that can be deployed on-premises or in a cloud environment.
  1. Ceph - an open-source, software-defined storage system that can be used to create a distributed object store.
  1. OpenStack Swift - an open-source object storage system that can be used to create a scalable and highly available object-store.
  1. Apache Cassandra - a distributed NoSQL database system that can be used to store and retrieve large amounts of data.
  1. GlusterFS - an open-source, distributed file system that can be used to create a scalable and highly available object store.
  1. Riak KV - an open-source, distributed NoSQL database system that can be used to store and retrieve large amounts of data.
  1. OpenIO - an open source, object storage, and data management platform that can be used to store and retrieve large amounts of data.
  1. Scality Ring - an open-source, distributed storage system that can be used to create a highly available and scalable object store.
  1. SwiftStack - an open-source, software-defined storage system that can be used to create a distributed object store.
  1. Nextcloud Object Storage - an open-source, self-hosted cloud storage system that includes an object storage component that can be used to store and retrieve large amounts of data.
  1. SSH and RSYNC over SSH

https://hub.docker.com/r/linuxserver/openssh-server

---
version: "2.1"
services:
  openssh-server:
    image: lscr.io/linuxserver/openssh-server:latest
    container_name: openssh-server
    hostname: openssh-server #optional
    environment:
      - PUID=1000
      - PGID=1000
      - TZ=Etc/UTC
    volumes:
      - ./config:/config
    ports:
      - 2222:2222
    restart: unless-stopped
  1. Network File System
  1. Server Message Block (SMB) and CIFS/SMB
  1. BitTorrent Sync
  1. Atom (web standard)
  1. Content Management Interoperability Services
  1. Linked Data Platform
  1. Unison File Synchronizer https://github.com/bcpierce00/unison
  1. netcat (nc)
nc -lk 1234 < ~/shared

Simulations

Simgrid

OverSim

Next steps

References

Tanenbaum, Andrew S.; Steen, Maarten van (2002). Distributed systems: principles and paradigms. Upper Saddle River, NJ: Pearson Prentice Hall. ISBN 0-13-088893-1. Archived from the original on 2020-08-12. Retrieved 2020-08-28.

6.5840 Home Page: Spring 2023. (2023, January 19). Retrieved from https://pdos.csail.mit.edu/6.824

"Distributed Programs". Texts in Computer Science. London: Springer London. 2010. pp. 373–406. doi:10.1007/978-1-84882-745-5_11. ISBN 978-1-84882-744-8. ISSN 1868-0941.

Lynch, Nancy. Distributed Algorithms. Burlington, MA: Morgan Kaufmann, 1996. ISBN:9781558603486.

Attiya, Hagit, and Jennifer Welch. Distributed Computing: Fundamentals, Simulations, and Advanced Topics. 2nd ed. New York, NY: Wiley-Interscience, 2004. ISBN: 9780471453246.

Herlihy, Maurice, and Nir Shavit. The Art of Multiprocessor Programming. Burlington, MA: Morgan Kaufmann, 2008. ISBN: 9780123705914.

Guerraoui, Rachid, and Michal Kapalka. Transactional Memory: The Theory. San Rafael, CA: Morgan and Claypool, 2010. ISBN: 9781608450114.

Dolev, Shlomi. Self-Stabilization. Cambridge, MA: MIT Press, 2000. ISBN:9780262041782.

Kaynar, Disun, Nancy Lynch, Roberto Segala, and Frits Vaandrager. The Theory of Timed I/O Automata. 2nd ed. San Rafael, CA: Morgan and Claypool, 2010. ISBN:9781608450022. https://groups.csail.mit.edu/tds/papers/Lynch/Monograph-second-edition.pdf

MIT 6.033

MIT 6.5840

MIT 6.824. (2023, January 27). MIT OpenCourseWare. Retrieved from https://ocw.mit.edu/courses/6-824-distributed-computer-systems-engineering-spring-2006

Systems, Mit 6. 824: Distributed. "Lecture 1: Introduction." YouTube, 6 Feb. 2020, www.youtube.com/watch?v=cQP8WApzIQQ&list=PLrw6a1wE39_tb2fErI4-WkMbsvGQk9_UB&ab_channel=MIT6.824%3ADistributedSystems.

Saltzer and Kaashoek's Principles of Computer System Design: An Introduction (Morgan Kaufmann 2009).

Nancy Lynch, Michael Merritt, William Weihl, and Alan Fekete. Atomic Transactions. Morgan Kaufmann Publishers, 1994.

MIT OpenCourseWare. (2023, January 27). MIT OpenCourseWare. Retrieved from https://ocw.mit.edu/courses/6-852j-distributed-algorithms-fall-2009

cs6234-16-pds https://www.comp.nus.edu.sg/~rahul/allfiles/cs6234-16-pds.pdf

Stevens, W. Richard, Bill Fenner, and Andrew M. Rudoff. UNIX Network Programming, Vol. 1: The Sockets Networking API. 3rd ed. Reading, MA: Addison-Wesley Professional, 2003. ISBN: 9780131411555.

McKusick, Marshall Kirk, Keith Bostic, Michael J. Karels, and John S. Quarterman. The Design and Implementation of the 4.4 BSD Operating System. Reading, MA: Addison-Wesley Professional, 1996. ISBN: 9780201549799.

Stevens, W. Richard, and Stephen Rago. Advanced Programming in the UNIX Environment. 2nd ed. Reading, MA: Addison-Wesley Professional, 2005. ISBN: 9780201433074.

. "Distributed Systems lecture series." YouTube, 31 Jan. 2023, www.youtube.com/playlist?list=PLeKd45zvjcDFUEv_ohr_HdUFe97RItdiB.

https://ki.pwr.edu.pl/lemiesz/info/Tel.pdf

http://bafybeiemcaemivgxkzkuqq2kvvfmyo32joipgp6pivk2ea32la2stnmkwm.ipfs.localhost:8080/?filename=Maarten van Steen%2C Andrew S. Tanenbaum - Distributed Systems_ Principles and Paradigms-Maarten van Steen (2023).pdf

https://www.youtube.com/@DistributedSystems

https://www.youtube.com/@lindseykuperwithasharpie

https://www.youtube.com/@rbdiwate626

IPFS and libp2p

TODO