Technology
- 將 Windows IIS 用的 *.pfx 轉換成 Linux 使用的 SSL 憑證
openssl pkcs12 -in ssl.pfx -nodes -out ssl.pem
openssl rsa -in ssl.pem -out ssl.key
openssl x509 -in ssl.pem -out ssl.crt
MongoDB 3.6 & mongoose Issue
使用的 MongoDB server 版本:3.6.7
Crash Log:
pthread_create failed: Resource temporarily unavailable in sharding cluster
Terminating session due to error: InternalError: failed to create service entry worker thread
簡單來說就是 OS 的 connection 用完了,vm.max_map_count 預設上限是 65530,MongoDB Operations Checklist 有提到 production 環境參考設定,顯然不是設定值調校的問題。
繼續找 root cause,發現 log 有大量的連線沒有完全被關掉,一直誤以為是正常的。
[thread4] Starting new replica set monitor for rs/172.31.15.27:27017,172.31.5.133:27017,172.31.5.84:27017
[thread4] Successfully connected to 172.31.5.133:27017 (1 connections now open to 172.31.5.133:27017 with a 5 second timeout)
[ReplicaSetMonitor-TaskExecutor-0] Successfully connected to 172.31.5.84:27017 (1 connections now open to 172.31.5.84:27017 with a 5 second timeout)
[listener] connection accepted from 172.31.15.27:49040 #4 (4 connections now open)
[conn4] received client metadata from 172.31.15.27:49040 conn4: { driver: { name: "MongoDB Internal Client", version: "3.6.7" }, os: { type: "Linux", name: "Ubuntu", architecture: "x86_64", version: "16.04" } }
[thread4] Successfully connected to 172.31.15.27:27017 (1 connections now open to 172.31.15.27:27017 with a 5 second timeout)
[thread4] Successfully connected to 172.31.5.84:27017 (1 connections now open to 172.31.5.84:27017 with a 0 second timeout)
[thread4] scoped connection to 172.31.5.84:27017 not being returned to the pool
[thread4] Starting new replica set monitor for rs/172.31.15.27:27017,172.31.5.133:27017,172.31.5.84:27017
[thread4] Successfully connected to 172.31.5.84:27017 (2 connections now open to 172.31.5.84:27017 with a 0 second timeout)
[thread4] scoped connection to 172.31.5.84:27017 not being returned to the pool
[thread4] Starting new replica set monitor for rs/172.31.15.27:27017,172.31.5.133:27017,172.31.5.84:27017
[thread4] Successfully connected to 172.31.5.84:27017 (3 connections now open to 172.31.5.84:27017 with a 0 second timeout)
[thread4] scoped connection to 172.31.5.84:27017 not being returned to the pool
接下來就是無止盡的 scoped connection not being returned to the pool,測試環境累積了三個月也達到了 12.5 萬個 connection 沒有被關掉,但實際上只有 11 connections now open。
最後查到了幾個 ticket,似乎是個 3.6 系列版本的 bug,直到 3.6.8 (2018-09-19) 才解掉。
Related issues:
- Tailable cursor fails on getMore against a sharded cluster
- scoped connection not being returned to the pool
但似乎還不能解釋為什麼會有 65k 個連線開著…
隔天早上遠端的神隊友丟來了一個連結,訴說著 mongoose 的故事:
看起來問題發生的時機是 reconnect 時,所以平常連線正常的使用情境下也遇不到,可能觸發的時機是 MongoDB 掛掉或是正在 failover,導致 mongoose 需要 reconnect,此時 connection 就會暴增。
解法也很簡單,把 mongoose 升級到 5.2.9 / 2018-08-17 以上的版本即可。
Other
- 初階管理者和進階的管理者,有什麼不同?
- 理想與現實的平衡
- 面對現實提出解決方案的能力
- 只有一邊叫一邊做的人,才能先把事情做完,然後把事情做好
- 從商業角度探討老闆、專業經理人與freelancer的差異
- Sars: 同時身為老闆、專業經理人與 freelancer,覺得很有挑戰…
- 初談 OKR 的評分
- 「Specific」(明確)
- 「Measurable」(可衡量)
- 「Achievable」(可達成)
- 「Relevant」(相關)
- 「Time-bound」(有時限)
- COO功課 # 1 — 營運長的定位
- 營運長該管理整個公司的「運作」,讓公司變得更有效率
- COO功課 # 2 — 該從哪裡下手呢?