1.7 Design Issues and Challenges

AU: May-22

1.7.1 Challenges from System Perspective

Communication mechanisms
Distributed systems need proper mechanisms for communication among processes. Examples are Remote Procedure Call (RPC), Remote Object Invocation (ROI), message-oriented communication, and stream-oriented communication.
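
As a rough illustration of the RPC idea, the sketch below shows a client stub that looks like a local function call but actually exchanges marshaled request and reply messages. It is a minimal in-process sketch, not a real RPC library: the procedure table, the JSON message format, and the names rpc_call and server_handle are assumptions made for this example, and a direct function call stands in for the network.

```python
import json

# Hypothetical server-side procedure; the name is illustrative only.
def add(a, b):
    return a + b

# Table of procedures the (simulated) server is willing to execute.
PROCEDURES = {"add": add}

def server_handle(request_bytes):
    """Unmarshal a request, run the named procedure, marshal the reply."""
    request = json.loads(request_bytes.decode("utf-8"))
    result = PROCEDURES[request["method"]](*request["params"])
    return json.dumps({"result": result}).encode("utf-8")

def rpc_call(method, *params):
    """Client stub: looks like a local call but goes through messages."""
    request = {"method": method, "params": list(params)}
    request_bytes = json.dumps(request).encode("utf-8")
    # Assumption: this direct call stands in for a network send and receive.
    reply_bytes = server_handle(request_bytes)
    return json.loads(reply_bytes.decode("utf-8"))["result"]

total = rpc_call("add", 2, 3)
```

The caller never sees the message format, which is the point of the stub.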

Processes
Important issues are code migration, process and thread management at client and server, and the design of software agents and mobile agents.

Naming
Distributed systems need easy-to-use names and identifiers to locate resources and processes. Naming must be transparent and scalable.
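
The role of a name service can be sketched as a table that maps a stable name to a current address. This is only an illustrative sketch: the NameService class, the name "printer", and the addresses are hypothetical, and a real name service would be distributed and replicated rather than a single dictionary.

```python
class NameService:
    """Toy name service: maps a stable name to a current address."""

    def __init__(self):
        self._table = {}

    def register(self, name, address):
        # Re-registering a name simply replaces the old address.
        self._table[name] = address

    def resolve(self, name):
        return self._table[name]

names = NameService()
names.register("printer", ("10.0.0.5", 9100))   # hypothetical address
first = names.resolve("printer")

# The resource moves; clients keep using the same name.
names.register("printer", ("10.0.0.9", 9100))
second = names.resolve("printer")
```

Clients that always resolve the name before use are unaffected when the resource moves.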

Synchronization
Mechanisms are required for coordination among processes. Examples: Mutual exclusion and leader election.
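
To make leader election concrete, the sketch below computes the outcome a bully-style election aims for: the live process with the highest identifier becomes the leader. It deliberately leaves out the message exchange and timeouts of a real election protocol, and the function name and process IDs are assumptions made for illustration.

```python
def elect_leader(process_ids, alive):
    """Return the highest ID among processes that are still alive."""
    candidates = [pid for pid in process_ids if pid in alive]
    if not candidates:
        raise RuntimeError("no live process can become leader")
    return max(candidates)

processes = [1, 2, 3, 4, 5]

# Process 5 (the previous leader) has crashed.
leader = elect_leader(processes, alive={1, 2, 3, 4})
```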

Data storage and access
Distributed systems need fast and scalable schemes for storage, searching, and lookup. Established concepts such as file system design must be revisited for the distributed setting.

Consistency and replication
To avoid bottlenecks and provide fast access, data is replicated. Replication helps scalability, but the replicas must then be kept consistent with one another.
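
One simple way to reason about replica consistency is to attach a version number to every update. The sketch below is a minimal illustration under that assumption, not a full consistency protocol; the Replica class and the helper functions are hypothetical names. It shows a replica that misses an update and a reader that still obtains the newest value by comparing versions.

```python
class Replica:
    """Holds one copy of a data item together with its version number."""

    def __init__(self):
        self.version = 0
        self.value = None

    def apply(self, version, value):
        # Accept only newer versions so old updates cannot overwrite new ones.
        if version > self.version:
            self.version = version
            self.value = value

def write(replicas, version, value):
    """Push an update to every replica that is currently reachable."""
    for replica in replicas:
        replica.apply(version, value)

def read_latest(replicas):
    """Read from all replicas and return the value with the highest version."""
    newest = max(replicas, key=lambda r: r.version)
    return newest.value

a, b, c = Replica(), Replica(), Replica()
write([a, b, c], 1, "x")
write([a, b], 2, "y")          # replica c is unreachable and misses this update
stale = c.value                # c still holds the old value
current = read_latest([a, b, c])
```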

Distributed systems security
Important issues are secure channels, access control, key management (generation and distribution), authorization, and secure group management.

1.7.2 Challenges

Designing distributed systems is not simple. The following challenges must be solved:

Heterogeneity
Openness
Security
Scalability
Failure handling
Concurrency
Transparency

1.7.2.1 Heterogeneity

Modern distributed systems are heterogeneous in bandwidth, processor speed, disk capacity, failure rate, and security.

Computer networks: LAN, wireless, satellite links.
Hardware devices: Laptops, mobiles, tablets, computers.
Operating systems: Linux, UNIX, Windows.
Programming languages: C, C++, Java, PHP.
Roles: Developers, designers, system managers.

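Different systems use different data representations for integers, floating-point numbers, and characters (for example, different byte orders). Marshaling converts data into an agreed external format for transmission, so that the receiver can reconstruct the same values.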
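
Marshaling can be sketched with Python's standard struct module, which packs values into a fixed byte layout. The record layout here (a 4-byte integer followed by an 8-byte float, both in network byte order) and the function names are assumptions chosen for this example, not a standard wire format.

```python
import struct

# "!" selects network (big-endian) byte order with standard sizes:
# "i" is a 4-byte signed integer, "d" is an 8-byte IEEE 754 double.
RECORD_FORMAT = "!id"

def marshal(sensor_id, reading):
    """Convert local values into an agreed byte sequence."""
    return struct.pack(RECORD_FORMAT, sensor_id, reading)

def unmarshal(data):
    """Rebuild the values from the byte sequence on the receiving side."""
    return struct.unpack(RECORD_FORMAT, data)

message = marshal(7, 21.5)
received = unmarshal(message)
```

Because both sides agree on the format string, the bytes mean the same thing regardless of each machine's native byte order.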

Middleware hides heterogeneity and provides uniform high-level interfaces to applications. Examples: CORBA, DCOM, Java RMI. Middleware allows applications to be composed, reused, ported, and interoperable.

1.7.2.2 Openness

Openness means easy extension and modification of the system. It supports plug and play and allows new services that follow the same interface contract.

Open systems are built using well-defined interfaces so components can be replaced or extended. Developers can add features or replace subsystems easily.

A stable architecture is needed to integrate new components while preserving older ones.
Open systems follow standard rules (syntax and semantics) to provide services.
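
The interface contract idea can be sketched with an abstract base class: any component that implements the agreed operations can be plugged in without changing client code. The StorageService interface and both implementations below are hypothetical and exist only to illustrate the principle.

```python
from abc import ABC, abstractmethod

class StorageService(ABC):
    """The published interface contract that every component must follow."""

    @abstractmethod
    def put(self, key, value):
        ...

    @abstractmethod
    def get(self, key):
        ...

class MemoryStorage(StorageService):
    """Original component."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def get(self, key):
        return self._data[key]

class LoggingStorage(StorageService):
    """New component added later; it wraps any other StorageService."""

    def __init__(self, inner):
        self._inner = inner
        self.log = []

    def put(self, key, value):
        self.log.append(("put", key))
        self._inner.put(key, value)

    def get(self, key):
        self.log.append(("get", key))
        return self._inner.get(key)

def client(store):
    """Client code depends only on the interface, not on the implementation."""
    store.put("k", "v")
    return store.get("k")

old_result = client(MemoryStorage())
extended = LoggingStorage(MemoryStorage())
new_result = client(extended)
```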

1.7.2.3 Security

Security is a major concern in distributed systems. Important issues:

Authentication, authorization, encryption, digital signatures, privacy, and non-repudiation.
Security system goals: protect information, detect intrusions, confine breaches, recover to stable secure state.

Three components of security:

Confidentiality – Protection from unauthorized access (e.g., ACL in UNIX).
Integrity – Protection from data corruption (e.g., checksum).
Availability – Protection from denial of service.
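
The checksum example for integrity can be sketched with the standard hashlib module. The function names are assumptions for this example. Note that a plain checksum detects accidental corruption only; protecting against deliberate tampering would additionally need a keyed mechanism such as a message authentication code or a digital signature.

```python
import hashlib

def checksum(data):
    """Return a SHA-256 digest of the data as a hexadecimal string."""
    return hashlib.sha256(data).hexdigest()

def verify(data, expected):
    """Report whether the data still matches the checksum computed earlier."""
    return checksum(data) == expected

original = b"transfer 100 to account 42"
digest = checksum(original)

intact = verify(original, digest)
corrupted = verify(b"transfer 900 to account 42", digest)
```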

Challenges:

Denial-of-service attacks (DoS).
Security of mobile code (Java applets, scripts).

Mobile code supports service customization, dynamic extension, autonomy, fault tolerance, disconnected operation.

1.7.2.4 Scalability

A distributed system is scalable if it can handle growth in the number of users and resources without a significant loss of performance or increase in complexity.

Types of scalability:

Size – Large number of users, machines, tasks.
Location – Distribution and mobility of resources.
Administration – Crossing multiple ownership regions.

Challenges:

Control cost of resources.
Control performance loss (e.g., a hierarchical DNS lookup scales better than a linear search through a flat table of names).
Prevent running out of resources (e.g., IPv6 uses 128-bit addresses to overcome the 32-bit address limit of IPv4).
Avoid performance bottlenecks (DNS replication and partitioning).

Techniques for scalability:

Replication – Increase availability, load balancing.
Caching – Faster repeated access.
Asynchronous communication – Hide latency.
Distribution – Splitting components across system.
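
Of the techniques above, caching is the easiest to sketch. The example below wraps a slow fetch function with a local cache so that repeated requests for the same key do not reach the server again. The CachingClient class and the simulated remote_fetch function are assumptions for illustration, and the sketch omits cache invalidation, which a real system would need to keep cached copies consistent.

```python
class CachingClient:
    """Keeps local copies of results so repeated requests stay local."""

    def __init__(self, fetch):
        self._fetch = fetch
        self._cache = {}

    def get(self, key):
        if key not in self._cache:
            # Cache miss: go to the (slow) remote source once.
            self._cache[key] = self._fetch(key)
        return self._cache[key]

remote_calls = []

def remote_fetch(key):
    """Stand-in for a remote request; records each call it receives."""
    remote_calls.append(key)
    return key.upper()

client_cache = CachingClient(remote_fetch)
first_answer = client_cache.get("page")
second_answer = client_cache.get("page")   # served from the cache
```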

1.7.2.5 Failure Handling

Failures in distributed systems are partial, meaning some components fail while others continue.

Techniques:

Detecting failures – e.g., corrupted file detected by checksum.
Masking failures – Hide effects, e.g., retransmit messages, duplicate files.
Tolerating failures – Clients are designed to keep working, possibly with reduced service, when a server crashes.
Recovering from failures – Rollback after crash, recover permanent data.
Redundancy – Replicate services, routes, and databases.

Examples:

At least two routes between internet routers.
DNS name tables replicated.
Database replicated on multiple servers.
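
Masking a failure by retransmission, as listed under the techniques above, can be sketched as a retry loop. The function name, the attempt limit, and the simulated flaky operation are assumptions for this example. A real system would usually also wait between attempts and make sure the operation is safe to repeat.

```python
def call_with_retry(operation, attempts=3):
    """Retry an operation so that transient failures are hidden from the caller."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for _ in range(attempts):
        try:
            return operation()
        except ConnectionError as exc:
            # Remember the failure and try again.
            last_error = exc
    # Every attempt failed, so the failure can no longer be masked.
    raise last_error

calls = {"count": 0}

def flaky_fetch():
    """Simulated remote call that loses the first two messages."""
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("simulated lost message")
    return "reply"

result = call_with_retry(flaky_fetch)
```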

1.7.2.6 Concurrency

In distributed systems, components run concurrently. Multiple clients may access a shared resource at the same time.

Resources must be protected to work correctly in concurrent environments.
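
Protecting a shared resource can be sketched with a lock from the standard threading module. The Counter class and the numbers of clients and updates are assumptions for this example; threads in one process stand in for concurrent clients of a shared resource.

```python
import threading

class Counter:
    """Shared resource whose update is protected by a lock."""

    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def increment(self):
        # Only one thread at a time may perform the read-modify-write step.
        with self._lock:
            self.value += 1

def run_clients(counter, clients=4, updates=1000):
    """Start several concurrent clients that all update the same counter."""
    def work():
        for _ in range(updates):
            counter.increment()

    threads = [threading.Thread(target=work) for _ in range(clients)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()

counter = Counter()
run_clients(counter)
```

With the lock in place, no update is lost and the final value equals the total number of increments.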

1.7.2.7 Transparency

Transparency means that users and application programmers see the system as a single whole rather than as a collection of independent components.

Types of transparency:

Location transparency – User doesn’t need to know resource location. Example: URL.
Access transparency – Same operations for local/remote resources. Example: web hyperlink.
Concurrency transparency – Multiple users share resources safely.
Replication transparency – Users don’t know about replicas. Example: web cache.
Failure transparency – Users continue tasks despite component failure. Example: email.
Mobility transparency – Resources/users move without affecting service. Example: mobile phones.
Performance transparency – System reconfigures to improve performance.
Scaling transparency – System expands without structural changes.
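
Replication and failure transparency can be sketched together as a front end that hides how many replicas exist and which of them are down. The ReplicatedStore class and the simulated replicas below are hypothetical; a real front end would also handle updates and stale replicas.

```python
class ReplicatedStore:
    """Front end that hides the replicas, and their failures, from the user."""

    def __init__(self, replicas):
        # Each replica is modeled as a callable: key -> value.
        self._replicas = replicas

    def get(self, key):
        for replica in self._replicas:
            try:
                return replica(key)
            except ConnectionError:
                # This replica is down; silently try the next one.
                continue
        raise ConnectionError("all replicas failed")

def crashed_replica(key):
    raise ConnectionError("simulated crash")

def healthy_replica(key):
    return "value-of-" + key

store = ReplicatedStore([crashed_replica, healthy_replica])
answer = store.get("x")   # the caller never learns that a replica failed
```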

Advantages:

Users/programmers don’t worry about system topology.
Easier to use and understand.

Disadvantages:

Users and programmers cannot optimize for the underlying layout, because it is hidden from them.
Failures may cause strange behavior.
Underlying system becomes complex.

1.7.3 Application of Distributed Computing and Challenges

1. Mobile system

Devices: laptops, PDAs, mobiles, wearable devices, appliances.
Mobile computing: users access resources while moving.
Location-aware computing: use nearby resources.

2. Ubiquitous computing (pervasive computing)

Many small devices present in user’s environment (home, office).
Enabling network technologies include WiFi, Bluetooth, WiMAX, and 3G mobile networks.
Programs interact by message passing over the Internet.
