Start of distributed system:
- Development of powerful microprocessors
- High-speed networks, LAN, WAN
A distributed system is a collection of autonomous computing elements that appears to its users as a single coherent system.
- A collection of autonomous computing elements, nodes
- No global clock
- Manage the group membership and organization
- Authentication, indeed communication with group member or not, confidentiality
- Open group: any node can join and send messages
- Closed group: communicate with in the group
- Users think they are dealing with a single system
- Nodes needs to collaborate
As the organization of the collection, overlay network should be connected. There should always be a path between two nodes to route messages. It can be formed by P2P networks.
- Structured overlay
- Nodes have a well defined set of neighbors to communicate
- Unstructured overlay
- randomly selected other nodes
Single coherent system
Ask the system to appear to be coherent.
Distribution transparency: End users would not know which computer a process is running, where the data is stored.
But partial failures are difficult to hide.
Middleware is a distributed-system layer placed on top of OS of nodes, as the OS of the distributed system. It’s a container of commonly used components and functions that now provided by the system rather than applications. Provide services:
- Resource management
- Facilities for inter-application communication
- Masking of and recovery from failures
Differences from OS:
- Offered in a networked environment
- Most services are useful to many applications
- e.g. Remote Procedure Call (RPC): an application can invoke functions implemented and executed on a remote computer as if locally available
- Support in an all-or-nothing fashion, atomic
- Service composition
- Develop new application by combining existing ones. e.g. mashups.
- Enhanced functions for building reliable applications
- e.g. Horus, messages are received in all-or-no fashion
Supporting resource sharing
- Cheaper to have a single high-end reliable storage facility be shared.
- Best illustrated by P2P networks.
Making distribution transparent
Hide the fact that its processes and resources are physically distributed across multiple computers (possibly separated by large distances). Invisible to end users and applications.
Types of transparency:
- data representation
- the access way
- Where the objects are stored
- Objects may be moved while in use
- Objects may be moved
- Shared by several users
- Failure and recovery
The price for achieving full transparency may be surprisingly high.
System offers components that can be easily used by or integrated into other systems.
Define services with interface, using Interface Definition Language.
- Two components can work together as specified by a common standard
- An application running on system A, can be run on system B if implements the same interface as A
- Easy to add parts
Separate policy from mechanism
It is crucial that the system be organized as a collection of small, easily replaceable components.
- Size scalability
- Add users and resources without losing performance
- CPU, storage and I/O rate, network
- Geographical scalability
- Users and resources may lie far apart, with small communication delays
- Synchronous communication, block until response. Slow on WAN
- Communication less reliable on WAN
- Limited bandwidth
- WAN system has very limited facilities for multi-point communications
- Administrative scalability
- Easily managed even if it spans many admin orgs
- Conflicting policies of resource usage, management and security
- System needs to protect itself against attacks from the new domain
- New domain users may only have read access
- Scaling up: improve CPU, memory, network capacity
- Scaling out: expanding by deploying more machines
Method for scaling out:
- Hiding communication latencies
- In the case of geographical scalability
- Try to avoid waiting for remote response as much as possible
- Asynchronous communication
- Move job from server to client
- Distribution of work
- e.g. DNS lookup
- Distribute copies across the system
- Drawback: consistency
- Global synchronization is hard to scale
False assumptions while developing systems:
- Reliable network
- Secure network
- Homogeneous network
- Topology does not change
- Zero latency
- Infinite bandwidth
- Zero transport cost
- One administrator
Types on distributed systems
High performance distributed computing
- Cluster computing
- Collection of similar workstations with the same OS and LAN
- Super computer
- Grid computing
- Federation of computer systems under different admin domains, software and hardware
- Collaboration: virtual organization
- Layers: fabric; connectivity & resource; collective; applications
- Cloud computing
- Constructed dynamically
- Provides a lot of resources
- Open resources to customers, virtualized
- Layers: hardware, infrastructure, platform, application
- Services: infrastructure, platform, software -as-a-service
Distributed information systems
Interoperability: Make it easier to integrate applications into an enterprise-wide information system.
Enterprise Application Integration: letting applications communicate directly with each other.
Distributed transaction (operation on a database) properties:
- Do not violate system invariant
- Concurrent transactions do not interfere with each other
- Once committed, changes are permanent
Nested transactions may consist of many sub transactions. Permanence only applys to the top-level transactions.
Transaction processing monitor handles distributed transactions.
Enterprise Application Integration
Applications communicate directly with each other.
Among several types of communication middleware, RPC allows an application can send request to and receive response from other application components by doing a local procedure call.
Remote method invocations (RMI) is similar to RPC, but on objects instead of functions.
RPC and RMI drawback: caller and callee need to be up and running for communication, and need to know how to approach each other. This leads to message-oriented middleware (MOM). Messages are sent to logical contact points. publish/subscribe is used; applications can indicate the messages they are interested.
By the introduction of mobile and embedded computing devices, pervasive systems are naturally distributed.
- Have many sensors and actuators
- Devices: small, battery, mobile, wireless
- Ubiquitous computing systems
- System is pervasive and continuously present
- Distribution, in a transparent manner
- Interaction, highly unobtrusive
- Context awareness: System knows user context
- Address allocation
- Adding devices
- Auto updates
- Intelligence, handles a wide range of dynamic actions.
- Mobile systems
- Devices vary widely
- Locations are changing over time
- Flooding-based send messages to spread on a part of network
- Disruption-tolerant network, intermediate node can store messages until finding the next node
- Sensor networks
- A collection of input devices
3-tier on cloud infrastructure:
- Web (Client) tier
- Business Logic tier
- Data tier
From 1996 to 2013,
- Server spending keeps steady.
- Management and administration part are increasing.
- Number of physical servers keeps going up until 2008 then has been constant.
- Number of logical servers keeps increasing.
20 years ago,
- Very low server utilization, 7%
- Mainframe optimization very high
- Run multiple OSs in one mainframe
- Virtualizing the hardware
- Hypervisors enable multiple OSs to run in the same hardware
- But server-based industry didn’t do this
Virtualization is regarding a single server.
- Hypervisor provides virtualized access to hardware for multiple OSs.
- Virtual Machine Monitor is an efficient, isolated duplicate of the real machine.
- fidelity, performance, safety and isolation
- Until 2005, x86 was not virtualizable
- 9 instructions break things
- Evolution of virtual support: CPU, memory, I/O
- Itanium, MIPS, ARM are mostly virtualizable
VMware software VMM:
- Software VMMs runs very slow due to interpretation
- Replace sensitive instructions in guest on-the-fly
- Binary input, not source code
- Present software interface to virtual machines that is similar but not identical to that of the underlying hardware
- Allow the guests to handle difficult portion of code to VMM
- Guest OS should be explicitly ported for para-API
- A convention OS may not work on p-VMM
- HyperV, MS
- Turtle Project in IBM
- Hypervisor running on top of Hypervisor
- Ship hardware with a modified hypervisor. Other hypervisors can be run on top of this hypervisor.
Difference between cloud computing and virtualization
- Virtualization is part of cloud computing
- Human is needed to interact with hypervisors to create virtual machines
- Cloud computing: API handles all automation to communicate with hypervisors
Container is user-level virtualization.
Now applications are:
- constantly developed
- New versions are being deployed often
- built from loosely coupled components
- deployed to a multitude of servers
Containers make apps portable
- looks the same everywhere
- No matter where it is running
- Don’t need to install all dependencies on the host
- contains project code and provisioning script
- Can be pushed to VM, Github and various server
- Put code in environment
- Dockerfile: define dependencies
- Docker image: A complete wrap of application and dependencies
- Build environment
- Multiple computers connected with each other
- Working together
- Seems like a single coherent system.
- Failures are normal and tolerant
- Expected to fail
- Accessing the same data
- No common lock
Example of problems
- Nodes agree on a value of a piece of data
- Leader election
- Group management
- Failure detection
- Clock synchronization
- Social network data
- MongoDB, XML
- Volume: large amounts of data
- Unstructured: hard to fit into a schema
- Images, videos
- Random reads and writes, write heavy
- Speed: queries quickly
- Distributed: availability, no single point of failure
- Low cost: include cost of system admin
- Scalable: demand changes quickly
- Non relational
- No schema required
- Easily distributed
- Horizontal scaling, scale out (add more machines)
Scenario to use:
- Fetching data from database or remote API call is slow
- Data access pattern
- Frequency: high
- Mostly static data
- Amazon’s ElastiCache
- In-memory key-value store
- Values can be anything, webpages, numbers
- Items in it have an expiration time
- Items replaced in LRU fashion when cache is full
- get, set, delete
- Failures are the norm
- Need failure detector to determine if a process has failed
- Send ping and receive ACK
- Randomly pick target
- If no ACK, ask other processes to check the target
- Gossip protocol
- Messages are like infection, go to neighboring nodes (or just a subset)
- Every node maintains a membership list
- Gossip messages may include membership list
- On receiving msg, nodes update membership list
- Mark a node as failed if no update after a certain period
- Messages are like infection, go to neighboring nodes (or just a subset)
- Nodes periodically send messages with sequence numbers
- Leader may fail
- Or ring based
- Or all to all
- C: Consistency
- Operations on the distributed system act as if on a single machine
- Every successful read receives the most recent write.
- A: Availability
- Every request to a non-failing node must result in a response
- Translate to revenue
- Read and write should complete with a low latency
- P: Partition tolerance
- Good even when all msg from one part of the system to other are cut off
- Consistency: read the most recent write (the written node may be blocked)
- Availability: read the latest copy (the node with that copy may be blocked)
Partition can happen when:
- Internet outages
- Between data centers
- Between racks within a data center
- Data centers may be blocked
A distributed data store can satisfy only at most two of CAP guarantees.
- Strong consistency
- Read always see the most recent write
- Eventual consistency
- Read may see older copy sometimes
- Eventually, the most recent
- Causal consistency
- Only causally related reads wait for the most recent write
- Read y does not wait to update x value (unrelated)
- Discover neighbors
- Request files
- Query hit
- Discover neighbors
- Consistent hashing
- Tradeoff between routing table size and lookup time
- A node knows every other node
- Fast lookup, large table
- A node only knows neighbors
- Slow lookup, small table
- Finger table: A node knows m other nodes
- Maps nodes at distance , , …,
- A node knows every other node