介绍 | 大狗精读 PBFT 论文（二）

李大狗
发布于 2020-03-17 17:07
阅读 7546

继续来读经典论文吧ヾ(◍°∇°◍)ﾉﾞ

## 1 原文与翻译

> **1 Introduction**
> Malicious attacks and software errors are increasingly common. The growing reliance of industry and government on online information services makes malicious attacks more attractive and makes the consequences of successful attacks more serious. In addition, the number of software errors is increasing due to the growth in size and complexity of software. Since malicious attacks and software errors can cause faulty nodes to exhibit Byzantine (i.e., arbitrary) behavior, Byzantine-fault-tolerant algorithms are increasingly important.
> 恶意攻击和软件错误越来越普遍。行业和政府对在线信息服务的日益依赖使恶意攻击更具吸引力，并且使成功攻击的后果更加严重。另外，由于软件系统的规模和复杂性的增加，软件系统错误的数量也在增加。由于恶意攻击和软件错误可能导致故障节点表现出拜占庭（即任意）行为，因此拜占庭容错算法变得越来越重要。
> 
> This paper presents a new, practical algorithm for state machine replication that tolerates Byzantine faults. The algorithm offers both liveness and safety provided at most (n-1)/3 out of a total of replicas are simultaneously faulty. This means that clients eventually receive replies to their requests and those replies are correct according to linearizability .The algorithm works in asynchronous systems like the Internet and it incorporates important optimizations that enable it to perform efficiently.
> 
> 本文提出了一种新的实用的状态机复制算法，可以容忍拜占庭式错误。该算法同时提供了活动性和安全性，但同时最多有 ( n - 1 ) / 3 个副本同时出现故障。这意味着客户端最终会收到对其请求的答复，并且遵循线性一致性（ linearizability ），这些答复是正确的。该算法在 Internet 等异步系统中运行，并且结合了重要的优化功能，使其能够高效执行。
> 
> There is a significant body of work on agreement and replication techniques that tolerate Byzantine faults. However, most earlier work either concerns techniques designed to demonstrate theoretical feasibility that are too inefficient to be used in practice, or assumes synchrony, i.e., relies on known bounds on message delays and process speeds. The systems closest to ours, Rampart and SecureRing , were designed to be practical, but they rely on the synchrony assumption for correctness, which is dangerous in the presence of malicious attacks. An attacker may compromise the safety of a service by delaying non-faulty nodes or the communication between them untilthey are tagged asfaulty and excluded from the replica group. Such a denial-of-service attack is generally easier than gaining control over a non-faulty node.
> 
> 在协议和复制技术上可以容忍拜占庭式错误的工作量很大。 但是，大多数较早的工作要么涉及「旨在证明理论上的」可行性的技术，而这些技术的效率太低而无法在实践中使用，要么涉及同步性，即依赖消息延迟和处理速度在已知范围内。 最接近我们的系统， Rampart 和 SecureRing 被设计为实用的，但是它们依靠同步假设来确保正确性，这在存在恶意攻击的情况下是危险的。 攻击者可能会延迟非故障节点或它们之间的通信，直到它们被标记为故障节点并从副本组中排除，从而损害服务的安全性。 通常，这种服务拒绝攻击（DOS）比获得对非故障节点的控制要容易得多。
>  
> Our algorithm is not vulnerable to this type of attack because it does not rely on synchrony for safety. In addition, it improves the performance of Rampart and SecureRing by more than an order of magnitude as explained in Section 7. It uses only one message round trip to execute read-only operations and two to execute read-write operations. Also, it uses an efficient authentication scheme based on message authentication codes during normal operation; public-key cryptography, which was cited as the major latency [29] and throughput [22] bottleneck in Rampart, is used only when there are faults.  
>   
> 我们的算法不容易受到此类攻击，因为它不依赖同步来保证安全。 此外，如第 7 节所述，它将 Rampart 和SecureRing 的性能提高了一个数量级以上。它仅使用一次消息往返执行只读操作，使用两次消息往返执行读写操作。 此外，它在正常操作期间使用基于消息身份验证代码的有效身份验证方案； 被称为 Rampart 中的主要延迟[29]和吞吐量[22]瓶颈的公钥密码学，仅在出现故障时才使用。  
>   
> To evaluate our approach, we implemented a replication library and used it to implement a real service: a Byzantine-fault-tolerant distributed file system that supports the NFS protocol. We used the Andrew benchmark [15] to evaluate the performance of oursystem. The results show that our system is only 3% slower than the standard NFS daemon in the Digital Unix kernel during normal-case operation.  
>   
> 为了评估我们的方法，我们实现了一个复制库，并使用它来实现真实的服务：支持 NFS 协议的拜占庭容错分布式文件系统。 我们使用 Andrew 基准[15]来评估我们系统的性能。 结果表明，在正常情况下，我们的系统仅比 Digital Unix 内核中的标准 NFS 后台程序慢 3％。  
>   
> Thus, the paper makes the following contributions:  
>   
> * It describes the first state-machine replication protocol that correctly survives Byzantine faults in asynchronous networks.  
>   
> * It describes a number of important optimizations that allow the algorithm to perform well so that it can be used in real systems.  
>   
> * It describes the implementation of a Byzantine-faulttolerant distributed file system.  
>   
> * It provides experimental results that quantify the cost of the replication technique.  
>   
>   
> 因此，本篇论文做出了如下贡献：  
>   
> * 描述了第一个状态机复制协议，该协议可以在异步网络中正确地克服拜占庭式错误。  
>   
> * 描述了许多重要的优化，这些优化使算法能够很好地执行，因此可以在实际系统中使用。  
>   
> * 描述了拜占庭容错分布式文件系统的实现。  
>   
> * 提供了复制技术的量化的成本的试验结果。  
>   
>   
> The remainder of the paper is organized as follows. We begin by describing our system model, including our failure assumptions. Section 3 describes the problem solved by the algorithm and states correctness conditions. The algorithm is described in Section 4 and some important optimizations are described in Section 5. Section 6 describes our replication library and how we used it to implement a Byzantine-fault-tolerant NFS. Section 7 presents the results of our experiments. Section 8 discusses related work. We conclude with a summary of what we have accomplished and a discussion of future research directions.  
>   
> 在本文的其余部分安排如下。 我们首先描述我们的系统模型，包括我们的故障假设。 第 3 节描述了该算法解决的问题并陈述了正确性条件。 该算法在第 4 节中进行了介绍，而一些重要的优化则在第 5 节中进行了介绍。第 6 节介绍了我们的复制库以及我们如何使用它来实现拜占庭式容错的 NFS 。 第7节介绍了我们的实验结果。 第 8 节讨论相关工作。 最后，我们总结了已完成的工作并讨论了未来的研究方向。

## 2 概念解析

### 2.1 「活动性」与「安全性」

这两个概念由《[The Byzantine Generals Problem](https://people.eecs.berkeley.edu/~luca/cs174/byzantine.pdf)》的作者 [Leslie Lamport](https://en.wikipedia.org/wiki/Leslie_Lamport) 提出，是分布式系统中非常重要的属性。

这两个属性可以这样理解 ——

* **安全性（Safety）：**程序永远不会产生错误的结果，如「数组下标越界」。
    
* **活动性（Liveness）：**程序总会产生预期的结果，如「请求资源一定有返回」。

（来源：[https://lrita.github.io/images/posts/distribution/Safety-and-liveness.pdf](https://lrita.github.io/images/posts/distribution/Safety-and-liveness.pdf)）

在设计分布式系统的时候，要[同时考虑到这两个属性](http://www.bailis.org/blog/safety-and-liveness-eventual-consistency-is-not-safe/)。

> 避免 `死锁` 是保证 liveness 的一个充分条件，虽然这个约束比较弱。但是是比较好验证的，如果算法、逻辑中存在死锁，那么一定不能保证系统的 livenes`。避免`饥饿`也是保证`liveness`的一个充分条件，这个约束比避免死锁更强一些，因为避免饥饿的算法一定是避免死锁的。更强的约束是，保证算法、逻辑能在有限步骤内完成，这样系统一定是`liveness 的，但是这个约束条件很难被验证和证明。
> 
> —— 来自博客文章 [https://lrita.github.io/2018/10/23/safety-and-liveness-in-distributed](https://lrita.github.io/2018/10/23/safety-and-liveness-in-distributed)

### 2.2 线性一致性

线性一致性（Linearizability）的解释来自于这篇论文：

[https://cs.brown.edu/~mph/HerlihyW90/p463-herlihy.pdf](https://cs.brown.edu/~mph/HerlihyW90/p463-herlihy.pdf)

就是在并发编程里，我们进行了一番操作，得到了一个结果。然后这个操作的运行记录，和按照串行顺序一步步来的运行记录相一致，我们就能称其为「线性一致的（linearizable）」。

举个例子，小明和小王同时吃饭，他们吃完了饭小红来收拾碗筷，抽象的话就是「小红收拾碗筷」这一事件在「小明吃饭」和「小王吃饭」这两个事件之后。如果按照串行顺序一步步来，「小明吃饭」-> 「小王吃饭」->「小红收拾碗筷」。「小红收拾碗筷」依然在他们两吃饭之后，所以这个过程就是线性一致的。

在分布式系统中，如果我们可以按照操作发生的先后顺序, 构造一个「线性一致的」运行记录, 那么这个系统就具有线性一致性（Linearizability）。

比特币系统可以认为是 Linearizability 的，虽然所有矿工都在并发挖矿，但是「主链」块高顺序递增，始终保持线性一致。

值得注意的一点是，比特币系统中能确定顺序的是块高，而非时间戳。时间戳是一个含糊的参数，因为在没有中心的情况下无法保证全网时间统一。

> 合法的时间戳必须大于前11个区块的中位数，并且小于网络调整时间+2小时。
> 
> — [https://en.bitcoin.it/wiki/Block_timestamp](https://en.bitcoin.it/wiki/Block_timestamp)

更多关于线性一致性的文章：

[https://zhuanlan.zhihu.com/p/42239873](https://zhuanlan.zhihu.com/p/42239873)

[https://zhuanlan.zhihu.com/p/35473164](https://zhuanlan.zhihu.com/p/35473164)

此外，「线性一致性」还是分布式系统重要理论—— CAP 理论中的重要一环，暂且放下不表，之后再另行文章详细说明。

### 2.3 拒绝服务攻击

DOS 攻击被形象称为「洪水攻击」，指的是通过大量的数据或其它方式让服务器的带宽和可用资源过载，从而不能提供正常服务。

在比特维基中提及了这种攻击方法：

[https://en.bitcoin.it/wiki/Weaknesses#Denial_of_Service_.28DoS.29_attacks](https://en.bitcoin.it/wiki/Weaknesses#Denial_of_Service_.28DoS.29_attacks)

对于区块链系统而言，比特币系统因为节点众多，这种攻击的危害较小。

其次是出块节点不确定的区块链系统——因为出块节点不确定，所以 DOS 攻击无法有明确的目标。

对 DOS 攻击抵御最差的是出块节点顺序明确的区块链系统，对他们来说 DOS 攻击只能硬抗。

对于 BFT 系统来说，DOS 可能造成这样一种局面 —— 节点迟迟没有响应，但无法区分究竟是网络不好还是节点因为 DOS 攻击而宕机，这样就会造成可用节点数量减少，系统危险性增加。

### 2.4 NFS 文件系统

NFS 全称 Network File System，是一种分布式文件系统协议，允许用户通过计算机网络远程访问文件，就像访问本地存储一样。

Github 上 NFS-Server 话题下的仓库列表：

> [https://github.com/topics/nfs-server](https://github.com/topics/nfs-server)

### 2.5 状态机

目前我们仅讨论「有限状态机（finite-state machine — FSM ）」。有限状态机，是一种计算机数学模型。任何一个生活中的对象都可以抽象为状态机。例如，一个苹果我们可以抽象其为具有健康/腐坏两种属性的状态机。

在分布式系统中，我们可以将节点中的数据状态抽象为状态机。那么本质上来讲，分布式系统做的是这么一件事：

依据某种既定规则，将某个节点的状态「复制」给其它的节点。在所有时刻，总能认定某一个状态机的状态是这个系统的唯一状态，这个状态被称之为「世界状态（World State）」。

1 原文与翻译

1 Introduction Malicious attacks and software errors are increasingly common. The growing reliance of industry and government on online information services makes malicious attacks more attractive and makes the consequences of successful attacks more serious. In addition, the number of software errors is increasing due to the growth in size and complexity of software. Since malicious attacks and software errors can cause faulty nodes to exhibit Byzantine (i.e., arbitrary) behavior, Byzantine-fault-tolerant algorithms are increasingly important. 恶意攻击和软件错误越来越普遍。行业和政府对在线信息服务的日益依赖使恶意攻击更具吸引力，并且使成功攻击的后果更加严重。另外，由于软件系统的规模和复杂性的增加，软件系统错误的数量也在增加。由于恶意攻击和软件错误可能导致故障节点表现出拜占庭（即任意）行为，因此拜占庭容错算法变得越来越重要。

This paper presents a new, practical algorithm for state machine replication that tolerates Byzantine faults. The algorithm offers both liveness and safety provided at most (n-1)/3 out of a total of replicas are simultaneously faulty. This means that clients eventually receive replies to their requests and those replies are correct according to linearizability .The algorithm works in asynchronous systems like the Internet and it incorporates important optimizations that enable it to perform efficiently.

本文提出了一种新的实用的状态机复制算法，可以容忍拜占庭式错误。该算法同时提供了活动性和安全性，但同时最多有 ( n - 1 ) / 3 个副本同时出现故障。这意味着客户端最终会收到对其请求的答复，并且遵循线性一致性（ linearizability ），这些答复是正确的。该算法在 Internet 等异步系统中运行，并且结合了重要的优化功能，使其能够高效执行。

There is a significant body of work on agreement and replication techniques that tolerate Byzantine faults. However, most earlier work either concerns techniques designed to demonstrate theoretical feasibility that are too inefficient to be used in practice, or assumes synchrony, i.e., relies on known bounds on message delays and process speeds. The systems closest to ours, Rampart and SecureRing , were designed to be practical, but they rely on the synchrony assumption for correctness, which is dangerous in the presence of malicious attacks. An attacker may compromise the safety of a service by delaying non-faulty nodes or the communication between them untilthey are tagged asfaulty and excluded from the replica group. Such a denial-of-service attack is generally easier than gaining control over a non-faulty node.

在协议和复制技术上可以容忍拜占庭式错误的工作量很大。但是，大多数较早的工作要么涉及「旨在证明理论上的」可行性的技术，而这些技术的效率太低而无法在实践中使用，要么涉及同步性，即依赖消息延迟和处理速度在已知范围内。最接近我们的系统， Rampart 和 SecureRing 被设计为实用的，但是它们依靠同步假设来确保正确性，这在存在恶意攻击的情况下是危险的。攻击者可能会延迟非故障节点或它们之间的通信，直到它们被标记为故障节点并从副本组中排除，从而损害服务的安全性。通常，这种服务拒绝攻击（DOS）比获得对非故障节点的控制要容易得多。

Our algorithm is not vulnerable to this type of attack because it does not rely on synchrony for safety. In addition, it improves the performance of Rampart and SecureRing by more than an order of magnitude as explained in Section 7. It uses only one message round trip to execute read-only operations and two to execute read-write operations. Also, it uses an efficient authentication scheme based on message authentication codes during normal operation; public-key cryptography, which was cited as the major latency [29] and throughput [22] bottleneck in Rampart, is used only when there are faults.

我们的算法不容易受到此类攻击，因为它不依赖同步来保证安全。此外，如第 7 节所述，它将 Rampart 和SecureRing 的性能提高了一个数量级以上。它仅使用一次消息往返执行只读操作，使用两次消息往返执行读写操作。此外，它在正常操作期间使用基于消息身份验证代码的有效身份验证方案；被称为 Rampart 中的主要延迟[29]和吞吐量[22]瓶颈的公钥密码学，仅在出现故障时才使用。

To evaluate our approach, we implemented a replication library and used it to implement a real service: a Byzantine-fault-tolerant distributed file system that supports the NFS protocol. We used the Andrew benchmark [15] to evaluate the performance of oursystem. The results show that our system is only 3% slower than the standard NFS daemon in the Digital Unix kernel during normal-case operation.

为了评估我们的方法，我们实现了一个复制库，并使用它来实现真实的服务：支持 NFS 协议的拜占庭容错分布式文件系统。我们使用 Andrew 基准[15]来评估我们系统的性能。结果表明，在正常情况下，我们的系统仅比 Digital Unix 内核中的标准 NFS 后台程序慢 3％。

Thus, the paper makes the following contributions:

It describes the first state-machine replication protocol that correctly survives Byzantine faults in asynchronous networks.

It describes a number of important optimizations that allow the algorithm to perform well so that it can be used in real systems.

It describes the implementation of a Byzantine-faulttolerant distributed file system.

It provides experimental results that quantify the cost of the replication technique.

因此，本篇论文做出了如下贡献：

描述了第一个状态机复制协议，该协议可以在异步网络中正确地克服拜占庭式错误。

描述了许多重要的优化，这些优化使算法能够很好地执行，因此可以在实际系统中使用。

描述了拜占庭容错分布式文件系统的实现。

提供了复制技术的量化的成本的试验结果。

The remainder of the paper is organized as follows. We begin by describing our system model, including our failure assumptions. Section 3 describes the problem solved by the algorithm and states correctness conditions. The algorithm is described in Section 4 and some important optimizations are described in Section 5. Section 6 describes our replication library and how we used it to implement a Byzantine-fault-tolerant NFS. Section 7 presents the results of our experiments. Section 8 discusses related work. We conclude with a summary of what we have accomplished and a discussion of future research directions.

在本文的其余部分安排如下。我们首先描述我们的系统模型，包括我们的故障假设。第 3 节描述了该算法解决的问题并陈述了正确性条件。该算法在第 4 节中进行了介绍，而一些重要的优化则在第 5 节中进行了介绍。第 6 节介绍了我们的复制库以及我们如何使用它来实现拜占庭式容错的 NFS 。第7节介绍了我们的实验结果。第 8 节讨论相关工作。最后，我们总结了已完成的工作并讨论了未来的研究方向。

2 概念解析

2.1 「活动性」与「安全性」

这两个概念由《The Byzantine Generals Problem》的作者 Leslie Lamport 提出，是分布式系统中非常重要的属性。

这两个属性可以这样理解 ——

安全性（Safety）：程序永远不会产生错误的结果，如「数组下标越界」。
活动性（Liveness）：程序总会产生预期的结果，如「请求资源一定有返回」。

（来源：https://lrita.github.io/images/posts/distribution/Safety-and-liveness.pdf）

在设计分布式系统的时候，要同时考虑到这两个属性。

避免 死锁 是保证 liveness 的一个充分条件，虽然这个约束比较弱。但是是比较好验证的，如果算法、逻辑中存在死锁，那么一定不能保证系统的 livenes。避免饥饿也是保证liveness的一个充分条件，这个约束比避免死锁更强一些，因为避免饥饿的算法一定是避免死锁的。更强的约束是，保证算法、逻辑能在有限步骤内完成，这样系统一定是liveness 的，但是这个约束条件很难被验证和证明。

—— 来自博客文章 https://lrita.github.io/2018/10/23/safety-and-liveness-in-distributed