Troubleshooting etcd containers that keep restarting during k8s cluster deployment

无间道 · 142 views · published 2023-05-23

While creating instances on a k8s cluster, the etcd cluster reported connection failures and instance creation failed. This article walks through diagnosing and fixing etcd containers that restart repeatedly after a Kubernetes cluster deployment; readers facing the same problem may find it useful.


Problem symptoms

When installing Kubernetes 1.26, after initializing the cluster with kubeadm, every kubectl command failed with the following error:

The connection to the server localhost:8080 was refused - did you specify the right host or port?

Checking whether kubelet was healthy showed that it could not reach the apiserver on port 6443:

Dec 21 09:36:03 k8s-master kubelet[7127]: E1221 09:36:03.015089 7127 kubelet_node_status.go:540] "Error updating node status, will retry" err="error getting node \"k8s-master\": Get \"https://192.168.2.200:6443/api/v1/nodes/k8s-master?timeout=10s\": dial tcp 192.168.2.200:6443: connect: connection refused"
Dec 21 09:36:03 k8s-master kubelet[7127]: E1221 09:36:03.015445 7127 kubelet_node_status.go:540] "Error updating node status, will retry" err="error getting node \"k8s-master\": Get \"https://192.168.2.200:6443/api/v1/nodes/k8s-master?timeout=10s\": dial tcp 192.168.2.200:6443: connect: connection refused"
Dec 21 09:36:03 k8s-master kubelet[7127]: E1221 09:36:03.015654 7127 kubelet_node_status.go:540] "Error updating node status, will retry" err="error getting node \"k8s-master\": Get \"https://192.168.2.200:6443/api/v1/nodes/k8s-master?timeout=10s\": dial tcp 192.168.2.200:6443: connect: connection refused"
Dec 21 09:36:03 k8s-master kubelet[7127]: E1221 09:36:03.015818 7127 kubelet_node_status.go:540] "Error updating node status, will retry" err="error getting node \"k8s-master\": Get \"https://192.168.2.200:6443/api/v1/nodes/k8s-master?timeout=10s\": dial tcp 192.168.2.200:6443: connect: connection refused"
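Error lines like the ones above can be pulled out of the kubelet journal mechanically. The sketch below filters for error-level klog lines (those starting with E followed by the month/day); a heredoc sample stands in for live journalctl output, so the filter can be shown without a running cluster:

```shell
# In practice the input would come from: journalctl -u kubelet --no-pager
# The heredoc below is a trimmed sample standing in for that output.
sample_log=$(cat <<'EOF'
Dec 21 09:36:03 k8s-master kubelet[7127]: E1221 09:36:03.015089 7127 kubelet_node_status.go:540] "Error updating node status, will retry" err="connection refused"
Dec 21 09:36:04 k8s-master kubelet[7127]: I1221 09:36:04.000000 7127 some_info.go:1] "routine informational message"
EOF
)

# Keep only error-level klog lines: the severity letter E followed by MMDD
printf '%s\n' "$sample_log" | grep -E 'kubelet\[[0-9]+\]: E[0-9]{4}'
```

On a live node, `journalctl -u kubelet --since "10 min ago"` piped through the same grep gives a quick view of recent errors.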

The next step was to check the apiserver container itself. Since the cluster uses containerd as its container runtime and kubectl was unusable, crictl ps -a was used to list all containers, including exited ones.

root@k8s-master:~/k8s/calico# crictl ps -a
CONTAINER       IMAGE           CREATED          STATE     NAME                      ATTEMPT   POD ID          POD
395b45b1cb733   a31e1d84401e6   50 seconds ago   Exited    kube-apiserver            28        e87800ae06ff5   kube-apiserver-k8s-master
b5c7e2a07bf1b   5d7c5dfd3ba18   3 minutes ago    Running   kube-controller-manager   32        6b7cc9dd07f1d   kube-controller-manager-k8s-master
944aa31862613   556768f31eb1d   4 minutes ago    Exited    kube-proxy                27        ccb6557c6f629   kube-proxy-ctjjq
c097332b6f416   fce326961ae2d   4 minutes ago    Exited    etcd                      30        079d491eb9925   etcd-k8s-master
b8103090322c4   dafd8ad70b156   6 minutes ago    Exited    kube-scheduler            32        48f9544c9798c   kube-scheduler-k8s-master
a14b969e8ad05   5d7c5dfd3ba18   12 minutes ago   Exited    kube-controller-manager   31        5576806b4e142   kube-controller-manager-k8s-master
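When kubectl is down, crictl talks to containerd directly, so the Exited rows can be extracted mechanically and fed to crictl logs. A small sketch, using a trimmed heredoc copy of the output above in place of a live crictl ps -a:

```shell
# Live usage would be:
#   crictl ps -a                 # list all containers, including exited ones
#   crictl logs <container-id>   # dump an individual container's log
# The heredoc below stands in for `crictl ps -a` output so the filter can be shown.
ps_output=$(cat <<'EOF'
CONTAINER       IMAGE           CREATED          STATE     NAME             ATTEMPT   POD ID
395b45b1cb733   a31e1d84401e6   50 seconds ago   Exited    kube-apiserver   28        e87800ae06ff5
b5c7e2a07bf1b   5d7c5dfd3ba18   3 minutes ago    Running   kube-controller-manager   32   6b7cc9dd07f1d
c097332b6f416   fce326961ae2d   4 minutes ago    Exited    etcd             30        079d491eb9925
EOF
)

# Print the IDs of containers that have exited, one per line
printf '%s\n' "$ps_output" | awk '/Exited/ {print $1}'
```

Each printed ID can then be passed to crictl logs to inspect why that container stopped.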

The kube-apiserver container had exited, so its log was checked for anomalies. The log shows that kube-apiserver could not connect to etcd on port 2379, so the problem appeared to lie with etcd.

W1221 07:00:20.392868       1 logging.go:59] [core] [Channel #1 SubChannel #2] grpc: addrConn.createTransport failed to connect to {
  "Addr": "127.0.0.1:2379",
  "ServerName": "127.0.0.1",
  "Attributes": null,
  "BalancerAttributes": null,
  "Type": 0,
  "Metadata": null
}. Err: connection error: desc = "transport: Error while dialing dial tcp 127.0.0.1:2379: connect: connection refused"
W1221 07:00:21.391330       1 logging.go:59] [core] [Channel #4 SubChannel #6] grpc: addrConn.createTransport failed to connect to {
  "Addr": "127.0.0.1:2379",
  "ServerName": "127.0.0.1",
  "Attributes": null,
  "BalancerAttributes": null,
  "Type": 0,
  "Metadata": null
}. Err: connection error: desc = "transport: Error while dialing dial tcp 127.0.0.1:2379: connect: connection refused"

Meanwhile, the etcd container was also restarting continuously, yet its log contained no error-level messages:

{"level":"info","ts":"2022-12-21T10:29:00.740Z","logger":"raft","caller":"etcdserver/zap_raft.go:77","msg":"d975d9ebc69964b3 is starting a new election at term 2"}
{"level":"info","ts":"2022-12-21T10:29:00.740Z","logger":"raft","caller":"etcdserver/zap_raft.go:77","msg":"d975d9ebc69964b3 became pre-candidate at term 2"}
{"level":"info","ts":"2022-12-21T10:29:00.740Z","logger":"raft","caller":"etcdserver/zap_raft.go:77","msg":"d975d9ebc69964b3 received MsgPreVoteResp from d975d9ebc69964b3 at term 2"}
{"level":"info","ts":"2022-12-21T10:29:00.740Z","logger":"raft","caller":"etcdserver/zap_raft.go:77","msg":"d975d9ebc69964b3 became candidate at term 3"}
{"level":"info","ts":"2022-12-21T10:29:00.740Z","logger":"raft","caller":"etcdserver/zap_raft.go:77","msg":"d975d9ebc69964b3 received MsgVoteResp from d975d9ebc69964b3 at term 3"}
{"level":"info","ts":"2022-12-21T10:29:00.740Z","logger":"raft","caller":"etcdserver/zap_raft.go:77","msg":"d975d9ebc69964b3 became leader at term 3"}
{"level":"info","ts":"2022-12-21T10:29:00.740Z","logger":"raft","caller":"etcdserver/zap_raft.go:77","msg":"raft.node: d975d9ebc69964b3 elected leader d975d9ebc69964b3 at term 3"}
{"level":"info","ts":"2022-12-21T10:29:00.742Z","caller":"etcdserver/server.go:2054","msg":"published local member to cluster through raft","local-member-id":"d975d9ebc69964b3","local-member-attributes":"{Name:k8s-master ClientURLs:[https://192.168.2.200:2379]}","request-path":"/0/members/d975d9ebc69964b3/attributes","cluster-id":"f88ac1c8c4bab6","publish-timeout":"7s"}
{"level":"info","ts":"2022-12-21T10:29:00.742Z","caller":"embed/serve.go:100","msg":"ready to serve client requests"}
{"level":"info","ts":"2022-12-21T10:29:00.742Z","caller":"embed/serve.go:100","msg":"ready to serve client requests"}
{"level":"info","ts":"2022-12-21T10:29:00.743Z","caller":"etcdmain/main.go:44","msg":"notifying init daemon"}
{"level":"info","ts":"2022-12-21T10:29:00.743Z","caller":"etcdmain/main.go:50","msg":"successfully notified init daemon"}
{"level":"info","ts":"2022-12-21T10:29:00.744Z","caller":"embed/serve.go:198","msg":"serving client traffic securely","address":"192.168.2.200:2379"}
{"level":"info","ts":"2022-12-21T10:29:00.745Z","caller":"embed/serve.go:198","msg":"serving client traffic securely","address":"127.0.0.1:2379"}
{"level":"info","ts":"2022-12-21T10:30:20.624Z","caller":"osutil/interrupt_unix.go:64","msg":"received signal; shutting down","signal":"terminated"}
{"level":"info","ts":"2022-12-21T10:30:20.624Z","caller":"embed/etcd.go:373","msg":"closing etcd server","name":"k8s-master","data-dir":"/var/lib/etcd","advertise-peer-urls":["https://192.168.2.200:2380"],"advertise-client-urls":["https://192.168.2.200:2379"]}
{"level":"info","ts":"2022-12-21T10:30:20.636Z","caller":"etcdserver/server.go:1465","msg":"skipped leadership transfer for single voting member cluster","local-member-id":"d975d9ebc69964b3","current-leader-member-id":"d975d9ebc69964b3"}
{"level":"info","ts":"2022-12-21T10:30:20.637Z","caller":"embed/etcd.go:568","msg":"stopping serving peer traffic","address":"192.168.2.200:2380"}
{"level":"info","ts":"2022-12-21T10:30:20.639Z","caller":"embed/etcd.go:573","msg":"stopped serving peer traffic","address":"192.168.2.200:2380"}
{"level":"info","ts":"2022-12-21T10:30:20.639Z","caller":"embed/etcd.go:375","msg":"closed etcd server","name":"k8s-master","data-dir":"/var/lib/etcd","advertise-peer-urls":["https://192.168.2.200:2380"],"advertise-client-urls":["https://192.168.2.200:2379"]}

However, one line in the log shows that etcd received a termination signal; it was shut down from outside rather than crashing on its own:

{"level":"info","ts":"2022-12-21T10:30:20.624Z","caller":"osutil/interrupt_unix.go:64","msg":"received signal; shutting down","signal":"terminated"}

Solution

The root cause is an incorrect cgroup setting: kubelet, as configured by kubeadm, expects the systemd cgroup driver, while containerd's runc runtime defaults to the cgroupfs driver. With the two drivers mismatched, the containers keep getting terminated and recreated, which is why etcd received a termination signal rather than crashing. The fix is to edit containerd's configuration file /etc/containerd/config.toml and set SystemdCgroup to true:

[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
  BinaryName = ""
  CriuImagePath = ""
  CriuPath = ""
  CriuWorkPath = ""
  IoGid = 0
  IoUid = 0
  NoNewKeyring = false
  NoPivotRoot = false
  Root = ""
  ShimCgroup = ""
  SystemdCgroup = true
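One way to apply this change non-interactively is with sed. The sketch below edits a temporary copy so nothing on the real system is touched; on an actual node you would target /etc/containerd/config.toml (after backing it up), or regenerate a fresh default with containerd config default:

```shell
# Work on a throwaway copy of the relevant config fragment.
# On a real node, point sed at /etc/containerd/config.toml instead.
cfg=$(mktemp)
cat > "$cfg" <<'EOF'
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
  SystemdCgroup = false
EOF

# Flip the runc cgroup driver to systemd
sed -i 's/SystemdCgroup = false/SystemdCgroup = true/' "$cfg"

grep 'SystemdCgroup' "$cfg"
```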

Then restart the containerd service:

systemctl restart containerd

After the restart, the etcd container no longer restarted, the other containers recovered, and the problem was solved.
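A quick way to confirm the recovery is to check that etcd stays Running and that its restart counter (the ATTEMPT column in crictl ps) stops climbing between two snapshots taken a minute or so apart. A sketch, with heredoc snapshots standing in for the two live crictl runs:

```shell
# Live usage: run `crictl ps -a | grep etcd` twice, some time apart, and compare ATTEMPT.
# The two heredocs below are hypothetical snapshots standing in for those runs.
before=$(cat <<'EOF'
c097332b6f416   fce326961ae2d   4 minutes ago   Running   etcd   30   079d491eb9925
EOF
)
after=$(cat <<'EOF'
c097332b6f416   fce326961ae2d   5 minutes ago   Running   etcd   30   079d491eb9925
EOF
)

# Field 8 is the ATTEMPT (restart) counter in this column layout
attempt_before=$(printf '%s\n' "$before" | awk '{print $8}')
attempt_after=$(printf '%s\n' "$after" | awk '{print $8}')

if [ "$attempt_before" = "$attempt_after" ]; then
    echo "etcd restart counter stable"
fi
```

If the counter keeps increasing, the container is still crash-looping and the logs need another look.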

