SMC: Difference between revisions
|  (Added probabilistic logics reference) | No edit summary | ||
| Line 1: | Line 1: | ||
| == Introduction == | == Introduction == | ||
| SMC stands for Statistic Multiplexed Computing and Communication. The idea was inspired by von Neumann's 1952 paper on probabilistic logics and synthesis of reliable organisms using unreliable components <ref> | SMC stands for Statistic Multiplexed Computing and Communication. The idea was inspired by von Neumann's 1952 paper on probabilistic logics and synthesis of reliable organisms using unreliable components <ref>https://static.ias.edu/pitp/archive/2012files/Probabilistic_Logics.pdf</ref>. Building fault tolerant mission critical system was the triggering use case.  | ||
| === Blockchain  | |||
| === Consensus === | Modern computer applications must rely on networking services. Every networked application forms its own infrastructure consisting of network, processing,  storage and software components. A data center or cloud service can duplex multiple applications on the same hardware. The definition of infrastructure remains with each application. | ||
| ==== Proof of Work ==== | |||
| ==== Proof of Stake ==== | The challenge of building fault tolerant mission critical application is deciding how the programs should communicate with other programs and devices. | ||
| Since every hardware component has finite service life, in theory, mission critical software programs must be completely decoupled from hardware components, such as networking, processing, storage, sensors and other devices. This decoupling affords hardware redundancy management freedom without changing programs for each hardware configuration. | |||
| This simple requirement seems very difficult to satisfy in practice. | |||
| ==Canary In The Coal Mine== | |||
| The TCP/IP protocol was an experimental protocol for connecting arbitrary two computers in the world-wide network as the "canary in a coal mine". The OSI model <ref>https://en.wikipedia.org/wiki/OSI_model</ref> is a conceptual model from the ISO (International Organization for Standardization) that provides "a common basis for the coordination of standards development for the purpose of systems interconnection <ref>https://www.iso.org/standard/20269.html</ref>. Its 7-layer model is widely taught in universities and technology schools.  | |||
| For every connection-oriented socket circuit, all commercial implementations guarantees "order-preserved lossless data communication", packet losses, duplicates and errors are automatically corrected in the lower layer protocols.  It seems perfectly reasonable to deploy this "hop-to-hop" connected-oriented TCP sockets for mission critical applications. | |||
| Unfortunately, the fine-prints in the OSI model specification never caught wide-spread attention. The "order-preserved lossless data communication" is guaranteed only if both the sender and receiver are 100% reliable. If the sender or the receiver can crash arbitrarily, then reliable communication is impossible <ref>https://groups.csail.mit.edu/tds/papers/Lynch/jacm93.pdf</ref>. In practice, we call the TCP/IP protocol the "best effort" reliable protocol for all nodes are indeed try their best to forward packets and repair errors and duplicates. Building mission critical application using these protocols is inappropriate. | |||
| The problem is that it is impossible to eliminate single-point failures. The "canary" worked well only for communication applications where data losses and service downtimes can be tolerated to some extent. For mission critical applications, such as banking, trading, air/sea/land navigation, service crashes can cause irreparable damages and loss of human lives. | |||
| Even for banking applications, arbitrary transaction losses cannot be eliminated completely. Banks resort to legal measures to compensate verified customer losses. | |||
| In business, the IT professionals simply try to minimize "opportunity costs" than solving the root causes. | |||
| ==Fruits of Poisonous Tree== | |||
| The "hop-to-hop" protocol is the DNA of existing enterprise system program paradigms. These include RPC (remote procedure call), MPI (message passing interface), RMI (remote method invocation) and distributed share-memory. These are the only means to communicate and retrieve data from a remote computer. | |||
| Applications built using these protocols form infrastructures that have many single-point failures. Every such point failure can bring down the entire enterprise. | |||
| The "Achilles Heel" of the legacy enterprise programming paradigms is the lack of "re-transmission discipline". The hop-to-hop protocols provide false hopes of data communication reliability. Programmers are at a loss when dealing with timeout events: do we retransmit or not retransmit? If not, data would be lost. If we do, what is the probability of improvement from the last timeout? | |||
| To eliminate infrastructure single-point failures requires complete program and data decoupling from hardware components -- a programming paradigm shift from the hop-to-hop communication paradigms. | |||
| ==Active Content Addressable Networking (ACAN)== | |||
| Content addressable networks are peer-to-peer(P2P) networks. There are many proposed contend addressable networks (CAN) <ref>https://en.wikipedia.org/wiki/Content-addressable_network</ref>. These include distributed hash table (DHT)<ref>https://en.wikipedia.org/wiki/Distributed_hash_table</ref>, and projects: Chord<ref>https://en.wikipedia.org/wiki/Chord_(peer-to-peer)</ref>, Pastry<ref>https://en.wikipedia.org/wiki/Pastry_(DHT)</ref> and Tapestry<ref>https://en.wikipedia.org/wiki/Tapestry_(DHT)</ref>. These networks focused on information retrieval efficiency. | |||
| We need information retrieval efficiency and the freedom of multiplexing hardware components in real time. We call this ACAN (Active Content Addressable Networking). | |||
| The idea of ACAN is to enable automated parallel processing based on application's data retrieval patterns without explicitly building parallel SIMD, MIMD and pipeline clusters. This was inspired by the failures from the early-day dataflow machine research <ref>https://en.wikipedia.org/wiki/Dataflow</ref>. The earlier dataflow computer research failed to recognize the importance of parallel processing granularity<ref>https://en.wikipedia.org/wiki/Dataflow_architecture</ref>. The meticulously designed multiprocessor architectures failed to compete against simpler single processors. | |||
| The parallel processing granularity optimization problem needs to balance the disparity of processing and communication capabilities in an infrastructure. The solution can be modeled as the Brachistochrone problem <ref>https://www.researchgate.net/publication/319996786_Scalability_Dilemma_and_Statistic_Multiplexed_Computing_-_A_Theory_and_Experiment</ref>, an ancient mathematical puzzle solved by Johnann Bernoulli <ref>https://en.wikipedia.org/wiki/Brachistochrone_curve</ref>. | |||
| On surface, parallel computing using P2P network will be slower than direct hop-to-hop networks, such as MPI -- the commonly used TOP500 supercomputer benchmark protocol. The problem is that hop-to-hop protocol favor fixed parallel processing granularity due to its rigid programming. | |||
| ACAN used the Tuple Space abstraction for automated data-parallel computing <ref>J. Shi, Statistic Multiplexed Computing System for Network-Scale Reliable High Performance Services, US PTO, #US 11,588,926 B2, granted February 21, 2023 </ref>. | |||
| Computing experiments demonstrated that optimal granularity tuning using ACAN can out-performance fixed partitioned MPI program by big margins<ref>https://www.researchgate.net/publication/319996786_Scalability_Dilemma_and_Statistic_Multiplexed_Computing_-_A_Theory_and_Experiment</ref>. | |||
| Similar experiments were conducted on ACAN storage performance against the Hadoop distributed file system <ref>https://www.researchgate.net/publication/285579358_AnkaCom_A_Development_and_Experiment_for_Extreme_Scale_Computing</ref>. | |||
| Programs in ACAN are "stateless" programs that can run on any computer in ACAN. There is no single-point of failure in the infrastructure. | |||
| A more remarkable feature of ACAN applications is the ability to deliver unbounded performance as the infrastructure expands in size with open problem sizes. It is capable to harness multiple quantum computers. | |||
| ==Blockchain Protocols== | |||
| Blockchain protocol also forms P2P network. Although there are different consensus protocols, such as POW (proof of work), POS (proof of stake), POH (proof of history) and POP (proof of possession), etc., the general architecture remains the same: the protocol runs on all computers. Each computer validates transactions subject to the consensus protocol for finality. Once committed to the cryptographically linked chain, the transactions are immutable. | |||
| POW protocol is not environmentally friendly that consumes too much electricity for solving the mining puzzles. Other consensus protocols are more energy efficient but all suffer the same scaling challenges: expanding the network and growing the ledger will bring down the network performance gradually. The storage requirements are monotonically increasing. The network will eventually crash when all nodes are storage saturated. | |||
| The blockchain protocols avoided the hop-to-hop protocol trap by using various "gossip" protocols. In theory, there should be no single-point failure in any blockchain network and applications. | |||
| ===Consensus=== | |||
| ====Proof of Work==== | |||
| ====Proof of Stake==== | |||
| <references /> | |||
Revision as of 19:14, 23 June 2023
Introduction
SMC stands for Statistic Multiplexed Computing and Communication. The idea was inspired by von Neumann's 1952 paper on probabilistic logics and synthesis of reliable organisms using unreliable components [1]. Building fault tolerant mission critical system was the triggering use case.
Modern computer applications must rely on networking services. Every networked application forms its own infrastructure consisting of network, processing, storage and software components. A data center or cloud service can duplex multiple applications on the same hardware. The definition of infrastructure remains with each application.
The challenge of building fault tolerant mission critical application is deciding how the programs should communicate with other programs and devices.
Since every hardware component has finite service life, in theory, mission critical software programs must be completely decoupled from hardware components, such as networking, processing, storage, sensors and other devices. This decoupling affords hardware redundancy management freedom without changing programs for each hardware configuration.
This simple requirement seems very difficult to satisfy in practice.
Canary In The Coal Mine
The TCP/IP protocol was an experimental protocol for connecting arbitrary two computers in the world-wide network as the "canary in a coal mine". The OSI model [2] is a conceptual model from the ISO (International Organization for Standardization) that provides "a common basis for the coordination of standards development for the purpose of systems interconnection [3]. Its 7-layer model is widely taught in universities and technology schools.
For every connection-oriented socket circuit, all commercial implementations guarantees "order-preserved lossless data communication", packet losses, duplicates and errors are automatically corrected in the lower layer protocols. It seems perfectly reasonable to deploy this "hop-to-hop" connected-oriented TCP sockets for mission critical applications.
Unfortunately, the fine-prints in the OSI model specification never caught wide-spread attention. The "order-preserved lossless data communication" is guaranteed only if both the sender and receiver are 100% reliable. If the sender or the receiver can crash arbitrarily, then reliable communication is impossible [4]. In practice, we call the TCP/IP protocol the "best effort" reliable protocol for all nodes are indeed try their best to forward packets and repair errors and duplicates. Building mission critical application using these protocols is inappropriate.
The problem is that it is impossible to eliminate single-point failures. The "canary" worked well only for communication applications where data losses and service downtimes can be tolerated to some extent. For mission critical applications, such as banking, trading, air/sea/land navigation, service crashes can cause irreparable damages and loss of human lives.
Even for banking applications, arbitrary transaction losses cannot be eliminated completely. Banks resort to legal measures to compensate verified customer losses.
In business, the IT professionals simply try to minimize "opportunity costs" than solving the root causes.
Fruits of Poisonous Tree
The "hop-to-hop" protocol is the DNA of existing enterprise system program paradigms. These include RPC (remote procedure call), MPI (message passing interface), RMI (remote method invocation) and distributed share-memory. These are the only means to communicate and retrieve data from a remote computer.
Applications built using these protocols form infrastructures that have many single-point failures. Every such point failure can bring down the entire enterprise.
The "Achilles Heel" of the legacy enterprise programming paradigms is the lack of "re-transmission discipline". The hop-to-hop protocols provide false hopes of data communication reliability. Programmers are at a loss when dealing with timeout events: do we retransmit or not retransmit? If not, data would be lost. If we do, what is the probability of improvement from the last timeout?
To eliminate infrastructure single-point failures requires complete program and data decoupling from hardware components -- a programming paradigm shift from the hop-to-hop communication paradigms.
Active Content Addressable Networking (ACAN)
Content addressable networks are peer-to-peer(P2P) networks. There are many proposed contend addressable networks (CAN) [5]. These include distributed hash table (DHT)[6], and projects: Chord[7], Pastry[8] and Tapestry[9]. These networks focused on information retrieval efficiency.
We need information retrieval efficiency and the freedom of multiplexing hardware components in real time. We call this ACAN (Active Content Addressable Networking).
The idea of ACAN is to enable automated parallel processing based on application's data retrieval patterns without explicitly building parallel SIMD, MIMD and pipeline clusters. This was inspired by the failures from the early-day dataflow machine research [10]. The earlier dataflow computer research failed to recognize the importance of parallel processing granularity[11]. The meticulously designed multiprocessor architectures failed to compete against simpler single processors.
The parallel processing granularity optimization problem needs to balance the disparity of processing and communication capabilities in an infrastructure. The solution can be modeled as the Brachistochrone problem [12], an ancient mathematical puzzle solved by Johnann Bernoulli [13].
On surface, parallel computing using P2P network will be slower than direct hop-to-hop networks, such as MPI -- the commonly used TOP500 supercomputer benchmark protocol. The problem is that hop-to-hop protocol favor fixed parallel processing granularity due to its rigid programming.
ACAN used the Tuple Space abstraction for automated data-parallel computing [14].
Computing experiments demonstrated that optimal granularity tuning using ACAN can out-performance fixed partitioned MPI program by big margins[15].
Similar experiments were conducted on ACAN storage performance against the Hadoop distributed file system [16].
Programs in ACAN are "stateless" programs that can run on any computer in ACAN. There is no single-point of failure in the infrastructure.
A more remarkable feature of ACAN applications is the ability to deliver unbounded performance as the infrastructure expands in size with open problem sizes. It is capable to harness multiple quantum computers.
Blockchain Protocols
Blockchain protocol also forms P2P network. Although there are different consensus protocols, such as POW (proof of work), POS (proof of stake), POH (proof of history) and POP (proof of possession), etc., the general architecture remains the same: the protocol runs on all computers. Each computer validates transactions subject to the consensus protocol for finality. Once committed to the cryptographically linked chain, the transactions are immutable.
POW protocol is not environmentally friendly that consumes too much electricity for solving the mining puzzles. Other consensus protocols are more energy efficient but all suffer the same scaling challenges: expanding the network and growing the ledger will bring down the network performance gradually. The storage requirements are monotonically increasing. The network will eventually crash when all nodes are storage saturated.
The blockchain protocols avoided the hop-to-hop protocol trap by using various "gossip" protocols. In theory, there should be no single-point failure in any blockchain network and applications.
Consensus
Proof of Work
Proof of Stake
- ↑ https://static.ias.edu/pitp/archive/2012files/Probabilistic_Logics.pdf
- ↑ https://en.wikipedia.org/wiki/OSI_model
- ↑ https://www.iso.org/standard/20269.html
- ↑ https://groups.csail.mit.edu/tds/papers/Lynch/jacm93.pdf
- ↑ https://en.wikipedia.org/wiki/Content-addressable_network
- ↑ https://en.wikipedia.org/wiki/Distributed_hash_table
- ↑ https://en.wikipedia.org/wiki/Chord_(peer-to-peer)
- ↑ https://en.wikipedia.org/wiki/Pastry_(DHT)
- ↑ https://en.wikipedia.org/wiki/Tapestry_(DHT)
- ↑ https://en.wikipedia.org/wiki/Dataflow
- ↑ https://en.wikipedia.org/wiki/Dataflow_architecture
- ↑ https://www.researchgate.net/publication/319996786_Scalability_Dilemma_and_Statistic_Multiplexed_Computing_-_A_Theory_and_Experiment
- ↑ https://en.wikipedia.org/wiki/Brachistochrone_curve
- ↑ J. Shi, Statistic Multiplexed Computing System for Network-Scale Reliable High Performance Services, US PTO, #US 11,588,926 B2, granted February 21, 2023
- ↑ https://www.researchgate.net/publication/319996786_Scalability_Dilemma_and_Statistic_Multiplexed_Computing_-_A_Theory_and_Experiment
- ↑ https://www.researchgate.net/publication/285579358_AnkaCom_A_Development_and_Experiment_for_Extreme_Scale_Computing