Comparing Redis crates on actix-web with Gatling

This post is the day 19 entry of the ナイル Advent Calendar 2021.

In the previous post I implemented Redis access on actix-web with several Redis crates. In this post I compare their performance to see which crate is the best choice.

What I want to do

Among the Redis crates implemented in the actix-web + Redis sample from the previous post, find the one that holds up best under load.
I use Gatling to generate the load.

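A serious performance test requires tuning OS settings such as ulimit.
Gatling - Operations
This time I only want a rough picture, so I just run things casually in Docker. For comparing endpoints against each other, this should be good enough.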

The test machine is an XPS 13 (9300) with an i7-1065G7 CPU and 16 GB of memory, running Arch Linux.

About Gatling

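Gatling is a load-testing tool.
You use it to find out how much responses slow down under heavy traffic, or whether the server stops returning 200 OK altogether.
For web applications where performance really matters, it is probably common to run this kind of test before a release.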

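I would like to describe what makes Gatling distinctive, but I have not used other load-testing tools much, so I cannot really compare.
One thing it can do that the widely used lightweight tool ApacheBench cannot is capture part of a response and reuse it in later requests. I use that feature here. (Most comparable tools probably have it too.)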

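Gatling is written in Scala, and I assumed test scripts also had to be written in Scala, but Java and Kotlin are supported as of version 3.7. Quite a few things changed in 3.7, so be careful. For example, session attributes that used to be referenced with ${} are now referenced with #{}.
Gatling - Migrating from 3.6 to 3.7
Test scripts end up reading like Gatling's own DSL, so whether you use Scala or Java, you do not need much knowledge of the host language (at least for simple cases).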
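As a small illustration of why the move from ${} to #{} is convenient in Scala scripts, here is a sketch that uses only plain Scala string interpolation and no Gatling calls. The object name ElSyntaxDemo and the sample path are made up for this example. With the new syntax, Scala's s-interpolator leaves #{uuid} untouched, while the old syntax needs $$ to get a literal $ through.

```scala
object ElSyntaxDemo {
  val path = "/direct"

  // 3.7+ style: Scala substitutes ${path}; "#{uuid}" is passed through unchanged
  // so that Gatling can resolve it from the session later.
  val newStyle: String = s"${path}/#{uuid}"

  // Pre-3.7 style: "$$" is required so the s-interpolator emits a literal "$".
  val oldStyle: String = s"${path}/$${uuid}"

  def main(args: Array[String]): Unit = {
    println(newStyle) // /direct/#{uuid}
    println(oldStyle) // /direct/${uuid}
  }
}
```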

(As another load-testing tool, k6 seems to be popular lately. It looks good too.)

Trying it out

I made this: docker_gatling. It runs Gatling in Docker and hammers an application running on the host.
The application code is exactly what is on GitHub, with a connection pool size of 30. I tested a release build.

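The script looks like this. Inside setUp(), I leave only the one line I want to run and comment out the rest. It does not seem possible to test every endpoint separately in a single run, so I run them one at a time, tedious as that is. (There may be a way that I am not aware of.)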

package computerdatabase

import scala.concurrent.duration._

import io.gatling.core.Predef._
import io.gatling.http.Predef._

class SampleActixWebWithRedis extends Simulation {
  // val baseUrl = "http://host.docker.internal:8080" // use this on Mac/Windows
  val baseUrl = "http://localhost:8080"

  val userAgent = "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.93 Safari/537.36"

  val httpProtocol = http
    .baseUrl(baseUrl)
    .acceptHeader("text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8")
    .doNotTrackHeader("1")
    .acceptLanguageHeader("en-US,en;q=0.5")
    .acceptEncodingHeader("gzip, deflate")
    .userAgentHeader(userAgent)

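  // Builds a two-step scenario: "set" a value, then "get" it back using the returned UUID.
  def createScn(
    title: String,
    path: String,
  ) = scenario(title)
    .exec(
      http("testing set...")
        .get(path)
        .check(status.in(200))
        .check(
          // Capture the first run of [0-9a-z-] in the response body as the session attribute "uuid".
          regex("""([0-9a-z\-]+)""")
            .saveAs("uuid"),
        ),
    )
    .exec(
      http("testing get...")
        // Scala interpolates ${path}; Gatling resolves #{uuid} from the session at run time.
        .get(s"${path}/#{uuid}")
        .check(status.in(200)),
    )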

  // Injects `usersPerSec` new virtual users every second for `duration` seconds.
  def createScenario(
    name: String,
    path: String,
    usersPerSec: Int,
    duration: Int
  ) = {
    createScn(name, path)
      .inject(constantUsersPerSec(usersPerSec).during(duration.seconds))
      .protocols(httpProtocol)
  }

  setUp(
    createScenario("direct", "/direct", 100, 10),
    // createScenario("r2d2", "/r2d2", 100, 10),
    // createScenario("bb8", "/bb8", 100, 10),
    // createScenario("deadpool", "/deadpool", 100, 10),
    // createScenario("mobc", "/mobc", 100, 10),
    // createScenario("alt_r2d2", "/alt_r2d2", 100, 10),
  )
}

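As written above, each virtual user first accesses http://localhost:8080/direct and checks for 200 OK. It then extracts a string from the response and saves it as uuid. After that it accesses http://localhost:8080/direct/{uuid} and again checks for 200 OK.
This is repeated with 100 new users per second for 10 seconds.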
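As a quick sanity check on the numbers, the total request count follows directly from the injection settings, since each user sends two requests (one set and one get). The sketch below is plain Scala rather than Gatling code, and the names ExpectedRequests and total are made up for illustration.

```scala
object ExpectedRequests {
  // Each virtual user in this scenario sends two requests: one "set" and one "get".
  val requestsPerUser = 2

  // Total requests for a constant arrival rate held over a fixed duration.
  def total(usersPerSec: Int, durationSec: Int): Int =
    usersPerSec * durationSec * requestsPerUser

  def main(args: Array[String]): Unit = {
    println(total(100, 10))  // 2000
    println(total(1500, 10)) // 30000
  }
}
```

The first value matches the request count of 2000 in the sample report, and the second matches the 30000 requests seen in the runs at 1500 users.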

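Running it produces output like the following. It shows the total number of requests, how many of them met the checks, the minimum and maximum response times, how many responses came back in under 800 ms, and so on.
Opening the HTML file listed at the bottom in a browser gives a more detailed, graphical report.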

================================================================================
---- Global Information --------------------------------------------------------
> request count                                       2000 (OK=2000   KO=0     )
> min response time                                      1 (OK=1      KO=-     )
> max response time                                     23 (OK=23     KO=-     )
> mean response time                                     2 (OK=2      KO=-     )
> std deviation                                          1 (OK=1      KO=-     )
> response time 50th percentile                          2 (OK=2      KO=-     )
> response time 75th percentile                          3 (OK=3      KO=-     )
> response time 95th percentile                          3 (OK=3      KO=-     )
> response time 99th percentile                          5 (OK=5      KO=-     )
> mean requests/sec                                    200 (OK=200    KO=-     )
---- Response Time Distribution ------------------------------------------------
> t < 800 ms                                          2000 (100%)
> 800 ms < t < 1200 ms                                   0 (  0%)
> t > 1200 ms                                            0 (  0%)
> failed                                                 0 (  0%)
================================================================================

Reports generated in 0s.
Please open the following file: /gatling/results/sampleactixwebwithredis-20211220002129635/index.html

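My plan was to raise the number of users in steps of 100 until the "t < 800 ms" share in this Gatling output dropped below 90%.
That was the plan, but after many runs I noticed the load average had gone above 9, feared the PC would break, and gave up.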
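For reference, the step-up procedure I abandoned could be sketched roughly as follows. This is plain Scala, not something I actually ran. The measure function is a hypothetical stand-in for "run Gatling once at this rate and read off the t < 800 ms percentage", and the names StepUpSearch, firstRateBelow and maxRate are made up for illustration.

```scala
object StepUpSearch {
  // Raise the arrival rate in fixed steps and return the first rate at which the share
  // of responses under 800 ms falls below the threshold, or None if maxRate is reached first.
  def firstRateBelow(
    measure: Int => Double,
    step: Int = 100,
    threshold: Double = 90.0,
    maxRate: Int = 5000
  ): Option[Int] =
    Iterator.iterate(step)(_ + step)
      .takeWhile(_ <= maxRate)
      .find(rate => measure(rate) < threshold)

  def main(args: Array[String]): Unit = {
    // Made-up response curve for demonstration only:
    // 100% up to 1000 users/sec, then 5 points lower for every additional 100.
    val fake: Int => Double =
      rate => if (rate <= 1000) 100.0 else 100.0 - (rate - 1000) / 100 * 5.0
    println(firstRateBelow(fake)) // Some(1300)
  }
}
```

The upper bound is there so the search stops even if the threshold is never crossed, which matters when every step means another heavy run on the same machine.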

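Instead, I fixed the load at 1500 users per second for 10 seconds (30000 requests per run) and looked at the share of OK requests and the median response time.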

Results

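My expectation was that asynchronous connection pools would be fastest and that having no connection pool would be slowest.
So I predicted the following performance ranking.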

bb8, deadpool, mobc > r2d2, alt_r2d2 > direct

Measured with the method described above, the results were as follows.

endpoint	OK ratio [%]	median response time [ms]
/direct	25	299
/r2d2	100	146
/bb8	100	120
/deadpool	29	329
/mobc	97	348
/alt_r2d2	100	178
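The OK ratio column can be reproduced from the OK and total request counts in the detailed reports at the end of this post. The snippet below is a small plain-Scala sketch of that arithmetic. The names OkRatio, counts and percent are made up for illustration.

```scala
object OkRatio {
  // (endpoint, OK count, total request count), copied from the detailed reports in this post.
  val counts: Seq[(String, Int, Int)] = Seq(
    ("/direct",   7630,  30000),
    ("/r2d2",     30000, 30000),
    ("/bb8",      30000, 30000),
    ("/deadpool", 8634,  30000),
    ("/mobc",     29065, 30000),
    ("/alt_r2d2", 30000, 30000)
  )

  // OK ratio in percent, rounded to the nearest integer.
  def percent(ok: Int, total: Int): Long = math.round(ok * 100.0 / total)

  def main(args: Array[String]): Unit =
    counts.foreach { case (endpoint, ok, total) =>
      println(s"$endpoint\t${percent(ok, total)}")
    }
}
```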

In other words, on my machine the performance ranking came out like this.

bb8, r2d2, alt_r2d2 > mobc > direct, deadpool

Closing thoughts

Honestly, I am not at all convinced by these results.

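I do not believe deadpool can really be this slow. I measured several times at different points in time and the result did not change. Did I get something wrong in the Rust implementation?
It is also strange that the synchronous r2d2 is this fast.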

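I would like to look into this a bit more.
This is how it turned out on my machine, but it will not necessarily be the same everywhere, so please do not take this as evidence that any particular crate is bad.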

Appendix: accessing the host from inside a Docker container

I always struggle with this, so here is a note. In this setup, Gatling runs in Docker and needs to reach an application running on the host. How do you do that?

The following is probably the easiest way. I do not know whether it performs differently from other approaches.

When the host is Linux
Pass --network host to docker run. The container can then reach the host as localhost.
- Use host networking | Docker Documentation
When the host is Mac/Windows (Docker Desktop for Mac/Windows)
Host networking is not available there, so run docker run without the --network option. From the container, refer to the host as host.docker.internal instead of localhost.
- Networking features in Docker Desktop for Mac | Docker Documentation
- Networking features in Docker Desktop for Windows | Docker Documentation
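Rather than commenting baseUrl lines in and out of the script per platform, one option would be to read the target URL from an environment variable. The sketch below is plain Scala. TARGET_BASE_URL is a hypothetical variable name I made up, not something Gatling or Docker defines, and TargetUrl is likewise an illustrative name.

```scala
object TargetUrl {
  // TARGET_BASE_URL is a hypothetical variable name chosen for this example.
  // Falls back to localhost, which works on Linux with --network host.
  def baseUrl(env: Map[String, String] = sys.env): String =
    env.getOrElse("TARGET_BASE_URL", "http://localhost:8080")

  def main(args: Array[String]): Unit =
    println(baseUrl())
}
```

On Docker Desktop, the variable could then be set with docker run -e TARGET_BASE_URL=http://host.docker.internal:8080, leaving the script itself unchanged.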

Detailed results

direct

================================================================================
---- Global Information --------------------------------------------------------
> request count                                      30000 (OK=7630   KO=22370 )
> min response time                                      0 (OK=4      KO=0     )
> max response time                                   7376 (OK=7376   KO=3852  )
> mean response time                                   494 (OK=572    KO=468   )
> std deviation                                        725 (OK=781    KO=702   )
> response time 50th percentile                        299 (OK=393    KO=269   )
> response time 75th percentile                        575 (OK=734    KO=488   )
> response time 95th percentile                       1680 (OK=1717   KO=1670  )
> response time 99th percentile                       3700 (OK=3812   KO=3681  )
> mean requests/sec                                   2500 (OK=635.833 KO=1864.167)
---- Response Time Distribution ------------------------------------------------
> t < 800 ms                                          6221 ( 21%)
> 800 ms < t < 1200 ms                                 874 (  3%)
> t > 1200 ms                                          535 (  2%)
> failed                                             22370 ( 75%)
---- Errors --------------------------------------------------------------------
> status.find.in(200), but actually found 404                     11234 (50.22%)
> status.find.in(200), but actually found 500                     11136 (49.78%)
================================================================================

r2d2

================================================================================
---- Global Information --------------------------------------------------------
> request count                                      30000 (OK=30000  KO=0     )
> min response time                                      0 (OK=0      KO=-     )
> max response time                                   3376 (OK=3376   KO=-     )
> mean response time                                   246 (OK=246    KO=-     )
> std deviation                                        283 (OK=283    KO=-     )
> response time 50th percentile                        146 (OK=146    KO=-     )
> response time 75th percentile                        355 (OK=355    KO=-     )
> response time 95th percentile                        711 (OK=711    KO=-     )
> response time 99th percentile                       1605 (OK=1605   KO=-     )
> mean requests/sec                                   3000 (OK=3000   KO=-     )
---- Response Time Distribution ------------------------------------------------
> t < 800 ms                                         29082 ( 97%)
> 800 ms < t < 1200 ms                                 436 (  1%)
> t > 1200 ms                                          482 (  2%)
> failed                                                 0 (  0%)
================================================================================

bb8

================================================================================
---- Global Information --------------------------------------------------------
> request count                                      30000 (OK=30000  KO=0     )
> min response time                                      0 (OK=0      KO=-     )
> max response time                                   3773 (OK=3773   KO=-     )
> mean response time                                   261 (OK=261    KO=-     )
> std deviation                                        432 (OK=432    KO=-     )
> response time 50th percentile                        120 (OK=120    KO=-     )
> response time 75th percentile                        358 (OK=358    KO=-     )
> response time 95th percentile                        873 (OK=873    KO=-     )
> response time 99th percentile                       1858 (OK=1858   KO=-     )
> mean requests/sec                                   3000 (OK=3000   KO=-     )
---- Response Time Distribution ------------------------------------------------
> t < 800 ms                                         28163 ( 94%)
> 800 ms < t < 1200 ms                                 976 (  3%)
> t > 1200 ms                                          861 (  3%)
> failed                                                 0 (  0%)
================================================================================

deadpool

================================================================================
---- Global Information --------------------------------------------------------
> request count                                      30000 (OK=8634   KO=21366 )
> min response time                                      0 (OK=4      KO=0     )
> max response time                                   9577 (OK=9577   KO=8991  )
> mean response time                                   677 (OK=801    KO=627   )
> std deviation                                       1237 (OK=1383   KO=1169  )
> response time 50th percentile                        329 (OK=436    KO=288   )
> response time 75th percentile                        678 (OK=810    KO=550   )
> response time 95th percentile                       3560 (OK=3308   KO=3573  )
> response time 99th percentile                       7803 (OK=8575   KO=7697  )
> mean requests/sec                                2307.692 (OK=664.154 KO=1643.538)
---- Response Time Distribution ------------------------------------------------
> t < 800 ms                                          6404 ( 21%)
> 800 ms < t < 1200 ms                                1041 (  3%)
> t > 1200 ms                                         1189 (  4%)
> failed                                             21366 ( 71%)
---- Errors --------------------------------------------------------------------
> status.find.in(200), but actually found 404                     10732 (50.23%)
> status.find.in(200), but actually found 500                     10634 (49.77%)
================================================================================

mobc

================================================================================
---- Global Information --------------------------------------------------------
> request count                                      30000 (OK=29065  KO=935   )
> min response time                                      0 (OK=0      KO=0     )
> max response time                                   7179 (OK=7179   KO=3782  )
> mean response time                                   633 (OK=637    KO=529   )
> std deviation                                        969 (OK=969    KO=940   )
> response time 50th percentile                        348 (OK=351    KO=230   )
> response time 75th percentile                        774 (OK=781    KO=505   )
> response time 95th percentile                       3393 (OK=3323   KO=3576  )
> response time 99th percentile                       4042 (OK=4050   KO=3719  )
> mean requests/sec                                   2500 (OK=2422.083 KO=77.917)
---- Response Time Distribution ------------------------------------------------
> t < 800 ms                                         22111 ( 74%)
> 800 ms < t < 1200 ms                                3291 ( 11%)
> t > 1200 ms                                         3663 ( 12%)
> failed                                               935 (  3%)
---- Errors --------------------------------------------------------------------
> status.find.in(200), but actually found 404                       473 (50.59%)
> status.find.in(200), but actually found 500                       462 (49.41%)
================================================================================

alt_r2d2

================================================================================
---- Global Information --------------------------------------------------------
> request count                                      30000 (OK=30000  KO=0     )
> min response time                                      1 (OK=1      KO=-     )
> max response time                                   3997 (OK=3997   KO=-     )
> mean response time                                   314 (OK=314    KO=-     )
> std deviation                                        413 (OK=413    KO=-     )
> response time 50th percentile                        178 (OK=178    KO=-     )
> response time 75th percentile                        426 (OK=426    KO=-     )
> response time 95th percentile                        856 (OK=856    KO=-     )
> response time 99th percentile                       1689 (OK=1689   KO=-     )
> mean requests/sec                                2727.273 (OK=2727.273 KO=-     )
---- Response Time Distribution ------------------------------------------------
> t < 800 ms                                         28189 ( 94%)
> 800 ms < t < 1200 ms                                 962 (  3%)
> t > 1200 ms                                          849 (  3%)
> failed                                                 0 (  0%)
================================================================================