SMC


Introduction

SMC stands for Statistic Multiplexed Computing and Communication. The idea was inspired by von Neumann's 1952 paper on probabilistic logics and the synthesis of reliable organisms from unreliable components [1]. Building fault-tolerant mission-critical systems was the triggering use case.

Modern computer applications rely on networking services. Every networked application forms its own infrastructure consisting of network, processing, storage and software components. A data center or cloud service can multiplex multiple applications onto the same hardware, but the infrastructure is still defined per application.

The challenge of building a fault-tolerant mission-critical application is deciding how its programs should communicate with other programs and devices.

Since every hardware component has a finite service life, in theory, mission-critical software programs must be completely decoupled from hardware components such as networking, processing, storage, sensors and other devices. This decoupling affords hardware management freedom without changing the programs.

This simple requirement seems very difficult to satisfy in practice.

Canary In The Coal Mine

The TCP/IP protocol began as an experimental protocol for connecting any two computers in a world-wide network, like the "canary in a coal mine". The OSI model [2] is a conceptual model from the ISO (International Organization for Standardization) that provides "a common basis for the coordination of standards development for the purpose of systems interconnection" [3]. Today, its 7-layer model is widely taught in universities and technology schools.

For every connection-oriented socket circuit, all commercial implementations guarantee "order-preserved lossless data communication": packet losses, duplicates and errors are automatically corrected by the lower-layer protocols. It therefore seems perfectly reasonable to deploy these "hop-to-hop" connection-oriented TCP sockets for mission-critical applications.

Unfortunately, the fine print in the OSI model specification has attracted little attention. The "order-preserved lossless data communication" is guaranteed only if both the sender and the receiver are 100% reliable. If the sender or the receiver can crash arbitrarily, then reliable communication is impossible [4]. In practice, we call TCP/IP a "best effort" reliable protocol, because all nodes do try their best to forward packets and repair errors and duplicates. Building mission-critical applications on these guarantees alone is inappropriate.

The problem is that single points of failure cannot be eliminated. The "canary" worked well only for communication applications where data losses and service downtime can be tolerated to some extent. For mission-critical applications, such as banking, trading, and air/sea/land navigation, service crashes can cause irreparable damage and loss of human life.

Even in banking applications, arbitrary transaction losses cannot be eliminated completely. Banks resort to legal measures to compensate verified customer losses, and unclaimed funds are left for the banks to handle.

In business, IT professionals simply try to minimize "opportunity costs". Solving the infrastructure scaling problem seems out of reach for most people.

Fruits of the Poisonous Tree

The "hop-to-hop" protocol is the DNA of existing enterprise system program paradigms. It is the root of existing enterprise application programming paradigms. These include RPC (remote procedure call), MPI (message passing interface), RMI (remote method invocation) and distributed share-memory. These are the only means to communicate and retrieve data from a remote computer.

Applications built on these protocols form infrastructures with many single points of failure. Any one of these failure points can bring down the entire enterprise.

The "Achilles Heel" of the legacy enterprise programming paradigms is the lack of "re-transmission discipline". The hop-to-hop protocols provide false hopes of data communication reliability. Programmers are at a loss when dealing with timeout events: do we retransmit or not retransmit? If not, data would be lost. If we do, what is the probability of improvement from the last timeout?


Eliminating infrastructure single points of failure requires complete program and data decoupling from hardware components -- a programming paradigm shift away from the hop-to-hop communication paradigms.

Active Content Addressable Networking (ACAN)

Content addressable networks are peer-to-peer (P2P) networks. Many content addressable networks (CAN) have been proposed [5]. These include distributed hash tables (DHT) [6] and the projects Chord [7], Pastry [8] and Tapestry [9]. These networks focus on information retrieval efficiency.
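As a rough illustration of the content-addressable idea behind DHTs such as Chord, the following sketch hashes node names and data keys onto the same ring and routes each key to the first node at or after its position. The node names and the HashRing class are hypothetical stand-ins, not part of any cited system.

<pre>
import hashlib
from bisect import bisect_right

def ring_position(name: str) -> int:
    """Map a node name or data key onto a 2^32 hash ring."""
    return int(hashlib.sha256(name.encode()).hexdigest(), 16) % (2**32)

class HashRing:
    """Toy content-addressable lookup: data key -> responsible node."""
    def __init__(self, nodes):
        self._ring = sorted((ring_position(n), n) for n in nodes)

    def lookup(self, key: str) -> str:
        pos = ring_position(key)
        points = [p for p, _ in self._ring]
        idx = bisect_right(points, pos) % len(self._ring)  # wrap around the ring
        return self._ring[idx][1]

ring = HashRing(["node-a", "node-b", "node-c"])
print(ring.lookup("block:42"))   # the node responsible for this content key
</pre>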

We need both information retrieval efficiency and the freedom to multiplex hardware components in real time. We call this ACAN (Active Content Addressable Networking).

The idea of ACAN is to enable automated parallel processing based on an application's data retrieval patterns, without explicitly building parallel SIMD, MIMD and pipeline clusters. This was inspired by the failures of early dataflow machine research [10], which did not recognize the importance of parallel processing granularity [11]. The meticulously designed multiprocessor architectures failed to compete against simpler single processors.

The parallel processing granularity optimization problem must balance the disparity between processing and communication capabilities in an infrastructure. The solution can be modeled after the Brachistochrone problem [12], a classic mathematical puzzle solved by Johann Bernoulli [13].
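A toy cost model (our own illustration, not the formulation in [12][13]) shows the balance: with n work units split into chunks of size g over a pool of workers, very small chunks drown in per-message overhead while very large chunks serialize the work, so an optimal grain size sits in between.

<pre>
def total_time(n, g, t_compute, t_msg, workers):
    """Toy model: n work units in chunks of g, distributed over `workers`.

    Each chunk costs g*t_compute of processing plus t_msg of communication
    overhead; chunks are processed `workers` at a time.
    """
    chunks = -(-n // g)                 # ceil(n / g)
    rounds = -(-chunks // workers)      # ceil(chunks / workers)
    return rounds * (g * t_compute + t_msg)

# Coarse grains serialize the work, fine grains drown in messaging overhead;
# heterogeneous node speeds push the optimum toward finer grains still.
n, t_compute, t_msg, workers = 10_000, 1e-4, 5e-3, 8
best_g = min(range(1, n + 1),
             key=lambda g: total_time(n, g, t_compute, t_msg, workers))
print(best_g, total_time(n, best_g, t_compute, t_msg, workers))
</pre>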

On the surface, parallel computing over a P2P network should be slower than direct hop-to-hop networking such as MPI -- the protocol commonly used for TOP500 supercomputer benchmarks. The problem is that hop-to-hop protocols favor a fixed parallel processing granularity because of their rigid programming model.

ACAN uses the Tuple Space abstraction for automated data-parallel computing [14].
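For illustration, here is a minimal in-process sketch of the tuple-space pattern, assuming a Linda-style put/take coordination model: a master puts anonymous work tuples into a shared space and stateless workers take whatever matches and put back results. A real ACAN would distribute this space across the network; the queues and names here are only stand-ins.

<pre>
import queue
import threading

work = queue.Queue()      # "work tuples": (task_id, payload)
results = queue.Queue()   # "result tuples": (task_id, value)

def worker():
    # Workers are anonymous and stateless: any worker may take any tuple,
    # compute, and put the result back into the space.
    while True:
        task_id, payload = work.get()
        if task_id is None:              # poison pill to stop
            break
        results.put((task_id, payload * payload))

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()

for i in range(16):                      # master puts work tuples
    work.put((i, i))
collected = dict(results.get() for _ in range(16))
for _ in threads:
    work.put((None, None))
for t in threads:
    t.join()
print(collected[7])                      # 49
</pre>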

Computing experiments demonstrated that optimal granularity tuning with ACAN can outperform fixed-partition MPI programs by large margins [15].

Similar experiments compared ACAN storage performance against the Hadoop distributed file system [16].

Programs in ACAN are "stateless" programs that can run on any computer in the ACAN. There is no single point of failure in the infrastructure.

A more remarkable feature of ACAN applications is the ability to deliver unbounded performance as the infrastructure expands with open-ended problem sizes. It is even capable of harnessing multiple quantum computers.

Blockchain Protocols

Blockchain protocols also form P2P networks. Although there are different consensus protocols, such as POW (proof of work), POS (proof of stake), POH (proof of history) and POP (proof of possession), the general architecture remains the same: the protocol runs on all computers, each computer validates transactions subject to the consensus protocol for finality, and once committed to the cryptographically linked chain, the transactions are immutable.
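The shared architecture can be illustrated with a minimal hash-linked chain (the consensus step -- POW, POS and so on -- is omitted): every node keeps the full chain, each block records the hash of its predecessor, and tampering with any committed transaction breaks every later link. The function names below are illustrative only.

<pre>
import hashlib
import json

def block_hash(block: dict) -> str:
    return hashlib.sha256(json.dumps(block, sort_keys=True).encode()).hexdigest()

def append_block(chain: list, transactions: list) -> None:
    prev = block_hash(chain[-1]) if chain else "0" * 64
    chain.append({"prev": prev, "txs": transactions})

def verify(chain: list) -> bool:
    """Every block must reference the hash of its predecessor."""
    for i in range(1, len(chain)):
        if chain[i]["prev"] != block_hash(chain[i - 1]):
            return False
    return True

chain = []
append_block(chain, ["alice->bob:5"])
append_block(chain, ["bob->carol:2"])
print(verify(chain))                    # True
chain[0]["txs"][0] = "alice->bob:500"   # tamper with a committed transaction
print(verify(chain))                    # False: the link to the next block breaks
</pre>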

The POW protocol is not environmentally friendly: it consumes too much electricity solving the mining puzzles. Other consensus protocols are more energy efficient, but all suffer the same scaling challenge: expanding the network and growing the ledger gradually degrade network performance. The storage requirement is monotonically increasing, and the network will eventually crash when all nodes are storage-saturated.

The blockchain protocols avoid the hop-to-hop protocol trap by using various "gossip" protocols. In theory, there should be no single point of failure in any blockchain network or its applications; barring protocol bugs, there should never be a service downtime. The Bitcoin network has demonstrated an uptime record [17] that no legacy infrastructure can match.

Another remarkable feature of the Bitcoin network is that its protocol programs are open source. Anyone can download them, manipulate them and turn them against the network, yet there have been no successful attacks on the Bitcoin network since its 2009 launch.

SMC Blockchain Protocol

The SMC blockchain protocol aims to solve the scalability challenge by enabling automated parallel processing without negatively impacting network security and reliability. The ACAN concept is a perfect fit for this case: ACAN enables automated SMC by forcing the transaction validation blocks to form SIMD, MIMD and pipeline clusters in real time.

SMC POS Consensus

SMC POS consensus randomly selects staking validators without staking pools (a minimal selection sketch is given at the end of this section). The SMC POS protocol also has an adjustable replication factor: the network ledger keeps a fixed replication factor at any given time, which can be tuned on demand without service interruptions. Therefore, the SMC POS consensus protocol can deliver:

  1. Byzantine failure resistance
  2. Fault tolerance
  3. Program tamper resistance
  4. Centralization avoidance
  5. Incrementally better performance, security and reliability

More nodes in the network expand the network storage capacity without adding extra redundancy. More nodes also facilitate longer pipelines for better performance.
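Below is the pool-free, stake-weighted validator selection sketch promised above. It is purely illustrative: it assumes a shared pseudo-random seed (for example derived from the previous block) so that all nodes reproduce the same draw, and the actual SMC POS selection rule may differ.

<pre>
import random

def select_validators(stakes: dict, k: int, seed: int) -> list:
    """Illustrative stake-weighted sampling of k distinct validators.

    Every staking account participates directly (no pools); a shared seed
    lets all nodes agree on the same draw without coordination.
    """
    rng = random.Random(seed)
    remaining = dict(stakes)
    chosen = []
    for _ in range(min(k, len(remaining))):
        accounts = list(remaining)
        weights = [remaining[a] for a in accounts]
        pick = rng.choices(accounts, weights=weights, k=1)[0]
        chosen.append(pick)
        del remaining[pick]   # without replacement: distinct validators
    return chosen

stakes = {"v1": 50, "v2": 30, "v3": 15, "v4": 5}
print(select_validators(stakes, k=3, seed=0xABCDEF))
</pre>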

Summary

The SMC blockchain protocol is the only statistic multiplexed blockchain protocol to date. Its development history links blockchain protocol design to stateless HPC (high performance computing) developments. This is the only known technology that can offer decentralized processing for centralized controls, making it a natural gateway for integrating Web 2.0 applications with Web 3.0 infrastructures.

References: