Technology
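- Converting a Windows IIS *.pfx file into SSL certificate files for Linux
openssl pkcs12 -in ssl.pfx -nodes -out ssl.pem   # unpack the PFX (prompts for its password); -nodes leaves the private key unencrypted
openssl rsa -in ssl.pem -out ssl.key             # extract the RSA private key
openssl x509 -in ssl.pem -out ssl.crt            # extract the first certificate found in ssl.pem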
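The ssl.pem written by the first command can hold the private key plus more than one certificate (for example the server certificate and any intermediate CA certificates included in the PFX), while `openssl x509` copies only the first certificate it reads. Below is a minimal, stdlib-only Python sketch that lists the PEM blocks in ssl.pem so you can check which certificate comes first. The script and the name `list_pem_blocks` are hypothetical, and the `subject=` hint assumes the "Bag Attributes" lines that `openssl pkcs12` normally writes before each certificate. Their exact format varies between OpenSSL versions.

```python
import re
import sys

# One PEM block: -----BEGIN <LABEL>----- ... -----END <LABEL>-----
PEM_BLOCK = re.compile(r"-----BEGIN ([A-Z0-9 ]+)-----(.*?)-----END \1-----", re.S)


def list_pem_blocks(text):
    """Return (label, subject_hint) for every PEM block, in file order.

    subject_hint is the last 'subject=' line between the previous block and
    this one (assumed to come from openssl pkcs12 "Bag Attributes"), or None.
    """
    blocks = []
    pos = 0
    for match in PEM_BLOCK.finditer(text):
        preamble = text[pos:match.start()]
        subjects = [line.strip() for line in preamble.splitlines()
                    if line.strip().startswith("subject=")]
        blocks.append((match.group(1), subjects[-1] if subjects else None))
        pos = match.end()
    return blocks


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical file name): python3 list_pem.py ssl.pem
    with open(sys.argv[1], encoding="ascii", errors="replace") as f:
        for index, (label, subject) in enumerate(list_pem_blocks(f.read())):
            print(index, label, subject or "")
```

To double-check that ssl.key and ssl.crt belong together, a common approach for RSA keys is to compare the output of `openssl x509 -noout -modulus -in ssl.crt` with `openssl rsa -noout -modulus -in ssl.key`.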
MongoDB 3.6 & mongoose Issue
MongoDB server version in use: 3.6.7
Crash Log:
pthread_create failed: Resource temporarily unavailable in sharding cluster
Terminating session due to error: InternalError: failed to create service entry worker thread
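In short, the OS ran out of resources for new connections: mongod could not create the worker thread for an incoming connection (pthread_create failed). vm.max_map_count defaults to 65530, and the MongoDB Operations Checklist lists recommended settings for production, but this was clearly not a problem that tuning could fix.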
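To see how close a running mongod is to these limits, here is a rough, Linux-only Python sketch. It reads vm.max_map_count from /proc/sys and compares it with the process's thread count and number of memory mappings. vm.max_map_count caps the mappings of a single process, and each thread stack is typically its own mapping. The function names are hypothetical. Reading another user's /proc/<pid>/maps generally requires root.

```python
import sys
from pathlib import Path

# Kernel default for vm.max_map_count, as noted above.
DEFAULT_MAX_MAP_COUNT = 65530


def read_sysctl(name, root="/proc/sys"):
    """Read an integer sysctl, e.g. 'vm.max_map_count' -> /proc/sys/vm/max_map_count."""
    return int(Path(root, *name.split(".")).read_text().split()[0])


def thread_count(pid, proc_root="/proc"):
    """Thread count of a process, taken from the 'Threads:' line of /proc/<pid>/status."""
    for line in Path(proc_root, str(pid), "status").read_text().splitlines():
        if line.startswith("Threads:"):
            return int(line.split()[1])
    raise ValueError(f"no Threads: line for pid {pid}")


def map_count(pid, proc_root="/proc"):
    """Number of memory mappings of a process (one line per mapping in /proc/<pid>/maps)."""
    with open(Path(proc_root, str(pid), "maps")) as f:
        return sum(1 for _ in f)


if __name__ == "__main__" and len(sys.argv) > 1:
    pid = int(sys.argv[1])  # e.g. the output of `pidof mongod`
    limit = read_sysctl("vm.max_map_count")
    print(f"vm.max_map_count = {limit} (default {DEFAULT_MAX_MAP_COUNT})")
    print(f"threads = {thread_count(pid)}, mappings = {map_count(pid)} / {limit}")
```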
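Digging further for the root cause, I found that the log contained a large number of connections that were never fully closed, entries I had long mistaken for normal behavior.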
[thread4] Starting new replica set monitor for rs/172.31.15.27:27017,172.31.5.133:27017,172.31.5.84:27017
[thread4] Successfully connected to 172.31.5.133:27017 (1 connections now open to 172.31.5.133:27017 with a 5 second timeout)
[ReplicaSetMonitor-TaskExecutor-0] Successfully connected to 172.31.5.84:27017 (1 connections now open to 172.31.5.84:27017 with a 5 second timeout)
[listener] connection accepted from 172.31.15.27:49040 #4 (4 connections now open)
[conn4] received client metadata from 172.31.15.27:49040 conn4: { driver: { name: "MongoDB Internal Client", version: "3.6.7" }, os: { type: "Linux", name: "Ubuntu", architecture: "x86_64", version: "16.04" } }
[thread4] Successfully connected to 172.31.15.27:27017 (1 connections now open to 172.31.15.27:27017 with a 5 second timeout)
[thread4] Successfully connected to 172.31.5.84:27017 (1 connections now open to 172.31.5.84:27017 with a 0 second timeout)
[thread4] scoped connection to 172.31.5.84:27017 not being returned to the pool
[thread4] Starting new replica set monitor for rs/172.31.15.27:27017,172.31.5.133:27017,172.31.5.84:27017
[thread4] Successfully connected to 172.31.5.84:27017 (2 connections now open to 172.31.5.84:27017 with a 0 second timeout)
[thread4] scoped connection to 172.31.5.84:27017 not being returned to the pool
[thread4] Starting new replica set monitor for rs/172.31.15.27:27017,172.31.5.133:27017,172.31.5.84:27017
[thread4] Successfully connected to 172.31.5.84:27017 (3 connections now open to 172.31.5.84:27017 with a 0 second timeout)
[thread4] scoped connection to 172.31.5.84:27017 not being returned to the pool
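After that came an endless stream of "scoped connection not being returned to the pool". Over three months, the test environment accumulated 125,000 connections that were never closed, even though in reality there were only "11 connections now open".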
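The following small Python sketch quantifies the leak from the mongod log. It counts "scoped connection ... not being returned to the pool" lines per host, tracks the highest per-host "N connections now open to host" counter, and keeps the last listener "(N connections now open)" value. The regular expressions are assumptions based only on the 3.6.7 log lines quoted above. Later versions that log in JSON would need different patterns. The function name `scan` is hypothetical.

```python
import re
import sys
from collections import Counter

# Patterns taken from the 3.6.7 log excerpt above (assumed, not an official format spec).
LEAK = re.compile(r"scoped connection to (\S+) not being returned to the pool")
PER_HOST = re.compile(r"\((\d+) connections now open to (\S+) with")
LISTENER = re.compile(r"\((\d+) connections now open\)")


def scan(lines):
    """Return (leaked per host, peak per-host counter, last listener count)."""
    leaked = Counter()
    peak = {}
    listener_open = None
    for line in lines:
        m = LEAK.search(line)
        if m:
            leaked[m.group(1)] += 1
        m = PER_HOST.search(line)
        if m:
            count, host = int(m.group(1)), m.group(2)
            peak[host] = max(peak.get(host, 0), count)
        m = LISTENER.search(line)
        if m:
            listener_open = int(m.group(1))
    return leaked, peak, listener_open


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], encoding="utf-8", errors="replace") as f:
        leaked, peak, listener_open = scan(f)
    for host, n in leaked.most_common():
        print(f"{host}: {n} not returned to the pool (peak counter {peak.get(host, 0)})")
    print(f"last listener count: {listener_open}")
```

A large gap between the per-host counters and the listener count is the same symptom described above.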
Eventually I found a few tickets. This appears to be a bug in the 3.6 series that was not fixed until 3.6.8 (2018-09-19).
Related issues:
- Tailable cursor fails on getMore against a sharded cluster
- scoped connection not being returned to the pool
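But that still did not seem to explain why 65k connections were open…
The next morning, a brilliant remote teammate sent over a link telling the mongoose side of the story:
The problem appears to occur on reconnect, so it never shows up while connections are healthy. It can be triggered when MongoDB goes down or is in the middle of a failover, forcing mongoose to reconnect, and at that point the number of connections explodes.
The fix is simple: upgrade mongoose to 5.2.9 (released 2018-08-17) or later.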
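Here is a minimal Python sketch for checking whether a Node.js project already runs a fixed mongoose. It assumes the usual npm layout, where the installed version is recorded in node_modules/mongoose/package.json. The function names are hypothetical, and the comparison is a simple numeric tuple comparison rather than full semver.

```python
import json
import sys
from pathlib import Path

# First mongoose release with the reconnect fix, per the notes above.
MONGOOSE_FIXED = (5, 2, 9)


def parse_version(text):
    """'5.2.9' -> (5, 2, 9). A leading 'v' and '-rc0' / '+build' suffixes are dropped,
    so a 5.2.9 pre-release is treated as 5.2.9."""
    core = text.strip().lstrip("v").split("-")[0].split("+")[0]
    return tuple(int(part) for part in core.split(".")[:3])


def mongoose_has_fix(version):
    return parse_version(version) >= MONGOOSE_FIXED


def installed_mongoose(project_dir="."):
    """Version string from node_modules/mongoose/package.json under project_dir."""
    pkg = Path(project_dir, "node_modules", "mongoose", "package.json")
    return json.loads(pkg.read_text(encoding="utf-8"))["version"]


if __name__ == "__main__" and len(sys.argv) > 1:
    version = installed_mongoose(sys.argv[1])
    status = "ok" if mongoose_has_fix(version) else "upgrade to >= 5.2.9"
    print(f"mongoose {version}: {status}")
```

On the server side, `db.version()` in the mongo shell reports the mongod version to compare against 3.6.8.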
Other
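- What is the difference between junior and senior managers?
- Balancing ideals and reality
- The ability to face reality and propose solutions
- Only people who grumble but keep working get things done first, and then get them done well
- The differences between a business owner, a professional manager, and a freelancer, from a business perspective
- Sars: Being an owner, a professional manager, and a freelancer all at once is quite a challenge…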
- A first look at scoring OKRs
- Specific
- Measurable
- Achievable
- Relevant
- Time-bound
- COO homework #1: the role of the COO
- The COO should manage how the whole company "operates" and make it more efficient
- COO homework #2: where to start?