Creating, Updating, and Reading Strings

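The `Rust` core language has only one string type: the string slice `str`, which usually appears in its borrowed form, `&str`. Because string literals are stored in the program's binary, string literals are also string slices. The `String` type discussed below is provided by the standard library rather than built into the core language.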

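# Creating a New String
Many operations available on `Vec` are also available on `String`; in fact, `String` is implemented as a wrapper around a vector of bytes with some extra guarantees, restrictions, and capabilities. One function that works the same way on `Vec<T>` and `String` is `new`, which creates an instance: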
```rust
fn main() {
    let s = String::new();
}
```
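This creates a new, empty string called `s`, into which we can then load data. Often we already have some initial data that the string should start with. For that, use the `to_string` method, which is available on any type that implements the `Display` trait, as string literals do: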
```rust
fn main() {
    let s = "hello world".to_string();
}
```
This code creates a string containing `hello world`.
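
Because `to_string` comes from the `Display` trait, it is not limited to string literals. The following is a minimal sketch; the helper name `describe` is made up for illustration and is not part of the standard library:

```rust
use std::fmt::Display;

// Hypothetical helper: convert any value that implements Display into a String.
fn describe<T: Display>(value: T) -> String {
    value.to_string()
}

fn main() {
    // Integers, floats, and chars all implement Display.
    println!("{}", describe(42));
    println!("{}", describe(1.5));
    println!("{}", describe('x'));
}
```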

We can also use the `String::from` function to create a `String` from a string literal. It is equivalent to using `to_string`:
```rust
fn main() {
    let s = String::from("hello world");
}
```
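Because strings are used for so many things, there are many generic `API`s to choose from. Some may seem redundant, but each has its place. In this case, `String::from` and `to_string` do exactly the same thing, so which one you choose is a matter of style and readability.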

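# Updating a String
A `String` can grow in size and its contents can change, just as the contents of a `Vec` change when you push more data into it. In addition, you can conveniently use the `+` operator or the `format!` macro to concatenate `String` values.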

## Appending to a String with `push_str` and `push`
We can grow a `String` by using the `push_str` method to append a string slice:
```rust
fn main() {
    let mut s = String::from("hello ");
    s.push_str("world");
}
```
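After these two lines, `s` will contain `hello world`. The `push_str` method takes a string slice because we don't need to take ownership of the argument. For example, we want to be able to keep using `s2` after appending its contents to `s1`: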
```rust
fn main() {
    let mut s1 = String::from("hello ");
    let s2 = "world";
    s1.push_str(s2);
    println!("s2 : {}", s2);
}
```
If the `push_str` method took ownership of `s2`, we wouldn't be able to print its value on the last line.

The `push` method takes a single character as a parameter and appends it to the `String`. The following code uses the `push` method to add the letter `'l'` to a `String`:
```rust
fn main() {
    let mut s = String::from("lo");
    s.push('l');
}
```
After this code runs, `s` will contain `lol`. Note that `push` takes a `char`, which is why the argument is written with single quotes.

## Concatenation with the `+` Operator or the `format!` Macro
Often you'll want to combine two existing strings. One way to do so is to use the `+` operator, like this:
```rust
fn main() {
    let s1 = String::from("hello ");
    let s2 = String::from("world");
    let s3 = s1 + &s2; // note that s1 has been moved here and can no longer be used
}
```
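After this code runs, the string `s3` will contain `hello world`. The reason `s1` is no longer valid after the addition, and the reason we used a reference to `s2`, both come from the signature of the function that is called when we use the `+` operator. The `+` operator uses the `add` method, whose signature looks something like this: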
```rust
fn add(self, s: &str) -> String {...}
```
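
To make the consequences of that signature concrete, here is a small sketch. The wrapper function `concat` is a hypothetical name used only for illustration; it mirrors the shape of `add` by taking the left operand by value and the right operand as `&str`:

```rust
// Hypothetical wrapper with the same shape as `add`:
// `left` is taken by value (moved), `right` is only borrowed.
fn concat(left: String, right: &str) -> String {
    left + right
}

fn main() {
    let s1 = String::from("hello ");
    let s2 = String::from("world");

    // `&s2` is a `&String`, which the compiler coerces to `&str` here.
    let s3 = concat(s1, &s2);

    println!("{}", s3);
    // `s2` was only borrowed, so it is still usable; `s1` is not.
    println!("{}", s2);
}
```

Reusing the buffer of the left operand is what lets `+` avoid copying both strings, at the cost of moving `s1`.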
If we need to concatenate multiple strings, the behavior of `+` gets unwieldy:
```rust
fn main() {
    let s1 = String::from("l");
    let s2 = String::from("o");
    let s3 = String::from("l");
    let s = s1 + "-" + &s2 + "-" + &s3;
}
```
At this point, `s` will be `l-o-l`. With all of the `+` and `"` characters, it's difficult to see what's going on. For more complicated string combining, we can use the `format!` macro:
```rust
fn main() {
    let s1 = String::from("l");
    let s2 = String::from("o");
    let s3 = String::from("l");
    let s = format!("{}-{}-{}", s1, s2, s3);
}
```
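This code also sets `s` to `l-o-l`. The `format!` macro works like `println!`, but instead of printing the output to the screen, it returns a `String` with the contents. This version is much easier to read, and the code generated by `format!` uses references, so it doesn't take ownership of any of its arguments.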
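
As a quick check of the ownership claim, the sketch below uses all three inputs again after calling `format!`. The helper name `dashed` is an assumption made for this example:

```rust
// Hypothetical helper: join three pieces with dashes using format!.
fn dashed(a: &str, b: &str, c: &str) -> String {
    format!("{}-{}-{}", a, b, c)
}

fn main() {
    let s1 = String::from("l");
    let s2 = String::from("o");
    let s3 = String::from("l");

    let s = format!("{}-{}-{}", s1, s2, s3);
    println!("{}", s);

    // Unlike with `+`, all three inputs are still valid here.
    println!("{}", dashed(&s1, &s2, &s3));
}
```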

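# Indexing into Strings
In many other languages, accessing individual characters in a string by index is a valid and common operation. However, if you try to access part of a `String` using indexing syntax in `Rust`, you'll get an error: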
```rust
fn main() {
    let s1 = String::from("lol");
    let s2 = s1[0];
}
```
This code results in the following error:
```
$ cargo run
   Compiling variables v0.1.0 (/projects/variables)
error[E0277]: the type `String` cannot be indexed by `{integer}`
 --> src/main.rs:3:14
  |
3 |     let s2 = s1[0];
  |              ^^^^^ `String` cannot be indexed by `{integer}`
  |
  = help: the trait `Index<{integer}>` is not implemented for `String`
  = help: the following other types implement trait `Index<Idx>`:
            <String as Index<RangeFrom<usize>>>
            <String as Index<RangeFull>>
            <String as Index<RangeInclusive<usize>>>
            <String as Index<RangeTo<usize>>>
            <String as Index<RangeToInclusive<usize>>>
            <String as Index<std::ops::Range<usize>>>

For more information about this error, try `rustc --explain E0277`.
error: could not compile `variables` due to previous error
```
The error and the help notes tell the whole story: `Rust` strings don't support indexing. The next question is why not. Consider this example:
```rust
fn main() {
    let s1 = String::from("lol");
    println!("s1 length {}", s1.len());

    let s2 = String::from("你好");
    println!("s2 length {}", s2.len());
}
```
Output:
```
s1 length 3
s2 length 6
```
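Here each letter takes one byte in `UTF-8`, while each Chinese character takes three bytes, and `len` reports the length in bytes. A byte index therefore does not always correspond to a whole character, which is why `Rust` strings cannot be accessed by subscript. We can, however, convert the string into an array-like collection of characters or bytes and index into that instead.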
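
One way to do that conversion is to collect the characters into a `Vec<char>`. This is a minimal sketch; the helper name `char_at` is made up for illustration, and collecting allocates a new vector on every call, so it suits small strings rather than hot loops:

```rust
// Hypothetical helper: look up the character at a given position
// by first collecting all characters into a vector.
fn char_at(s: &str, index: usize) -> Option<char> {
    let chars: Vec<char> = s.chars().collect();
    // `get` returns None instead of panicking when the index is out of range.
    chars.get(index).copied()
}

fn main() {
    let s = String::from("你好");
    println!("{:?}", char_at(&s, 1));
    println!("{:?}", char_at(&s, 2));
}
```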

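# Iterating over Strings
The best way to operate on pieces of a string is to be explicit about whether you want characters or bytes. For individual `Unicode` scalar values, use the `chars` method. Calling `chars` on `你好` separates out and returns two values of type `char`, and you can iterate over the result to access each element: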
```rust
fn main() {
    let s1 = String::from("你好");
    for s in s1.chars() {
        println!("{}", s);
    }
}
```
This code prints the following:
```
你
好
```
Alternatively, the `bytes` method returns each raw byte, which might be appropriate for your use case:
```rust
fn main() {
    let s1 = String::from("你好");
    for s in s1.bytes() {
        println!("{}", s);
    }
}
```
This code prints the `6` bytes that make up this `String`:
```
228
189
160
229
165
189
```
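
The two views can be combined with the standard `char_indices` method, which yields each character together with the byte offset at which it starts. A short sketch, with `char_offsets` as a hypothetical helper name:

```rust
// Hypothetical helper: pair every character with its starting byte offset.
fn char_offsets(s: &str) -> Vec<(usize, char)> {
    s.char_indices().collect()
}

fn main() {
    let s1 = String::from("你好");
    for (offset, c) in char_offsets(&s1) {
        // Each of these characters starts three bytes after the previous one.
        println!("{} starts at byte {}", c, offset);
    }
}
```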

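`Rust` also gives us a safer way to fetch the character at a given position: `chars().nth(n)` returns an `Option<char>`, which is `None` when the position is out of range: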
```rust
fn main() {
    let s1 = String::from("你好");
    if let Some(s) = s1.chars().nth(0) {
        println!("{}", s);
    } else {
        println!("invalid index")
    }
}
```
This code prints the following:
```
你
```

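# Slicing Strings
Slicing a string is risky: `Rust` supports it, but I don't recommend it. A string slice in `Rust` is taken by byte offsets, which has a serious consequence: if a slice boundary falls inside a `Unicode` character, `Rust` panics at runtime and the whole program crashes. Here is a slice whose boundaries happen to be valid: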
```rust
fn main() {
    let s1 = String::from("你好");
    println!("{}", &s1[0..3]);
}
```
Here, `&s1[0..3]` is a `&str` that contains the first three bytes of the string. Earlier we saw that each of these Chinese characters is three bytes long, so the slice is `你`.

What happens if we take `&s1[0..1]`? The answer: `Rust` panics at runtime, just as it does when an invalid index is accessed in a vector:
```rust
fn main() {
    let s1 = String::from("你好");
    println!("{}", &s1[0..1]);
}
```
Running it fails with:
```
 $ cargo run
    Finished dev [unoptimized + debuginfo] target(s) in 0.00s
     Running `target/debug/variables`
thread 'main' panicked at 'byte index 1 is not a char boundary; it is inside '你' (bytes 0..3) of `你好`', src/main.rs:3:21
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
```
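
If a slice range comes from untrusted input, one way to avoid the panic is the standard `str::get` method, which returns `None` when the range is out of bounds or does not fall on character boundaries. The sketch below wraps it in a helper called `safe_slice`, a name chosen only for this example:

```rust
// Hypothetical helper: slice by byte range without panicking.
fn safe_slice(s: &str, start: usize, end: usize) -> Option<&str> {
    s.get(start..end)
}

fn main() {
    let s1 = String::from("你好");

    match safe_slice(&s1, 0, 3) {
        Some(part) => println!("{}", part),
        None => println!("invalid slice range"),
    }

    // Byte 1 lies inside the first character, so this yields None.
    match safe_slice(&s1, 0, 1) {
        Some(part) => println!("{}", part),
        None => println!("invalid slice range"),
    }

    // `is_char_boundary` reports whether a byte offset is safe to slice at.
    println!("{}", s1.is_char_boundary(1));
}
```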
