
Setting Up the Experiment Environment

Setting Up KubeEdge

Environment Configuration

The simulated network environment is built with Mininet.

Virtual machine hardware configuration

Model          Disk  Memory  CPU
VirtualBox VM  60G   2G      Intel i5-11320H, 2 cores, 3.2GHz
VirtualBox VM  60G   2G      Intel i5-11320H, 2 cores, 3.2GHz
VirtualBox VM  60G   2G      Intel i5-11320H, 2 cores, 3.2GHz
VirtualBox VM  60G   2G      Intel i5-11320H, 2 cores, 3.2GHz

Software configuration: due to compatibility issues, we use Kubernetes 1.23 and KubeEdge 1.12.2.

Hostname  IP                              OS
master    10.0.1.200 (192.168.80.134)     Ubuntu 20.04
node1     192.168.0.201 (192.168.80.135)  Ubuntu 20.04
node2     192.168.0.202 (192.168.80.136)  Ubuntu 20.04
node3     192.168.0.203                   Ubuntu 20.04

Initial system configuration

Set the hostname on every host (substitute the appropriate name on each node):

sudo hostnamectl set-hostname master
reboot

Disable the firewall on every host:

sudo systemctl stop ufw
sudo systemctl disable ufw

Disable swap on every host:

sudo vi /etc/fstab
# Comment out the swap line
sudo swapoff -a   # disable all swap now
sudo swapon -s    # check swap status (should print nothing)
sudo swapon -a    # (for reference: re-enables all swap)
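
If you prefer not to edit fstab by hand, a minimal non-interactive sketch, assuming the swap entry contains the word "swap" surrounded by whitespace:

sudo sed -i.bak '/\sswap\s/s/^/#/' /etc/fstab   # comment out the swap line, keep a .bak backup
sudo swapoff -a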

Set up time synchronization on every host:

sudo apt install -y ntpdate
sudo ntpdate time.windows.com
sudo timedatectl set-timezone Asia/Shanghai

Add hosts entries on every node:

# Edit the hosts file
sudo vi /etc/hosts
# Append the following lines
10.0.1.200 master
192.168.0.201 node1
192.168.0.202 node2
192.168.0.203 node3
185.199.108.133 raw.githubusercontent.com

Enable IPv4 forwarding:

sudo vi /etc/sysctl.conf
# Uncomment the following line
net.ipv4.ip_forward = 1
sudo sysctl -p /etc/sysctl.conf

Installing Docker

We install Ubuntu's docker.io package, which is the simplest route.

Remember: Docker must be installed on all four nodes.

sudo apt install docker.io

The official Docker registry can be slow to reach; a domestic Docker Hub mirror can be used to speed up pulls.

sudo mkdir -p /etc/docker
sudo tee /etc/docker/daemon.json <<-'EOF'
{
"registry-mirrors": ["https://knjsrl1b.mirror.aliyuncs.com","https://docker.hub.com"]
}
EOF
# Aliyun mirror: https://knjsrl1b.mirror.aliyuncs.com
# USTC mirror: https://docker.mirrors.ustc.edu.cn
# On the cloud node, also add "exec-opts": ["native.cgroupdriver=systemd"],
# On edge nodes the default cgroupfs driver is fine; it matches KubeEdge
sudo systemctl daemon-reload
sudo systemctl restart docker
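
For the cloud node specifically, a daemon.json sketch that also sets the systemd cgroup driver mentioned in the comments above (mirror URL reused from above):

sudo tee /etc/docker/daemon.json <<-'EOF'
{
  "registry-mirrors": ["https://knjsrl1b.mirror.aliyuncs.com"],
  "exec-opts": ["native.cgroupdriver=systemd"]
}
EOF
sudo systemctl daemon-reload
sudo systemctl restart docker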

Installing Kubernetes on the hosts

For compatibility, we install Kubernetes 1.23.17.

Following Aliyun's tutorial, install the kubelet, kubeadm, and kubectl components from the Aliyun mirror on the cloud host:

sudo apt-get update && sudo apt-get install -y apt-transport-https
curl https://mirrors.aliyun.com/kubernetes/apt/doc/apt-key.gpg | sudo apt-key add -
sudo vim /etc/apt/sources.list.d/kubernetes.list
# Add the following line:
deb https://mirrors.aliyun.com/kubernetes/apt/ kubernetes-xenial main
sudo apt update
# Run 'apt list kubelet -a' first to see all available versions, then pin one
sudo apt install -y kubelet=1.23.17-00 kubeadm=1.23.17-00 kubectl=1.23.17-00
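
Optionally, pin these packages so a routine apt upgrade cannot move them off 1.23.17:

sudo apt-mark hold kubelet kubeadm kubectl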

On the cloud master host, create the Kubernetes cluster with kubeadm. We use the Aliyun registry to speed up image pulls; kubeadm installs the Kubernetes version that matches its own.

If the host's IP is itself a public IP, initialize as follows:

sudo kubeadm init \
--apiserver-advertise-address=192.168.132.100 \
--image-repository registry.aliyuncs.com/google_containers \
--service-cidr=10.96.0.0/12 \
--pod-network-cidr=10.244.0.0/16

If the cloud server's public IP is not visible on the host itself, have the apiserver listen on all interfaces instead, and add the public IP as an extra certificate SAN:

sudo kubeadm init \
--apiserver-advertise-address=0.0.0.0 \
--apiserver-cert-extra-sans=139.9.72.62 \
--image-repository registry.aliyuncs.com/google_containers \
--service-cidr=10.96.0.0/12 \
--pod-network-cidr=10.244.0.0/16

When initialization completes, it prints several follow-up commands for us to run:

Your Kubernetes control-plane has initialized successfully!

To start using your cluster, you need to run the following as a regular user:

mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config

Alternatively, if you are the root user, you can run:

export KUBECONFIG=/etc/kubernetes/admin.conf

You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
https://kubernetes.io/docs/concepts/cluster-administration/addons/

Then you can join any number of worker nodes by running the following on each as root:

kubeadm join 192.168.132.100:6443 --token 20vasa.tus6j1y6edbm6e1i \
--discovery-token-ca-cert-hash sha256:830a4e14fdecfb8c9eb7143fe44a1abb8bc68956d959b55447b2e7a1d6e61d85

Following the prompts, run these once as the regular user and once as root, so kubectl can reach the local kube-apiserver:

mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config

Next, install the CNI network plugin. If the download is too slow, use an IP-lookup site to find the address of raw.githubusercontent.com and write it into the hosts file.

wget https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
vi kube-flannel.yml
# (or: kubectl edit -n kube-flannel daemonset.apps/kube-flannel-ds)
......   # edit the affinity below, adding a key so flannel is not scheduled onto edge nodes
              - key: node-role.kubernetes.io/edge
                operator: DoesNotExist
......
kubectl apply -f kube-flannel.yml
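
For orientation, the edited block sits under spec.template.spec of the flannel DaemonSet. A sketch of the surrounding YAML, assuming a recent kube-flannel manifest that already matches kubernetes.io/os (older manifests may differ); the edge key is appended to the same matchExpressions list:

      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: kubernetes.io/os
                operator: In
                values: ["linux"]
              - key: node-role.kubernetes.io/edge
                operator: DoesNotExist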

Run kubectl get pods -n kube-flannel; output like the following means the network plugin installed successfully:

...
kube-flannel-ds-hgn9l 1/1 Running 0 44m
...

To let the master also act as a worker node and run user Pods:

kubectl taint node master node-role.kubernetes.io/master-

To keep the master from running user Pods:

kubectl taint node master node-role.kubernetes.io/master=:NoSchedule
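
Either way, the node's current taints can be checked with:

kubectl describe node master | grep Taints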

After a moment, run kubectl get nodes on the master; output like the following means the node joined successfully:

kubectl get node
NAME STATUS ROLES AGE VERSION
master Ready control-plane,master 13m v1.22.15

Create a pod in the Kubernetes cluster to verify that everything runs correctly:

kubectl create deployment nginx --image=nginx
kubectl expose deployment nginx --port=80 --type=NodePort
kubectl get pod,svc
NAME READY STATUS RESTARTS AGE
pod/nginx-6799fc88d8-hf2m9 1/1 Running 0 22s

NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 14m
service/nginx NodePort 10.104.74.138 <none> 80:31332/TCP 8s

nginx is exposed on port 31332, so visiting http://39.106.4.225:31332 successfully shows the nginx welcome page.


Installing KubeEdge

Like Kubernetes, KubeEdge provides a keadm tool for quickly setting up a cluster. Download keadm in advance from the KubeEdge GitHub releases page (this writeup mixes v1.12.2 and v1.13.1 in places; pick one version and stay consistent):

wget https://github.com/kubeedge/kubeedge/releases/download/v1.13.1/keadm-v1.13.1-linux-amd64.tar.gz

Install keadm on every node:

tar -xvf keadm-v1.12.2-linux-amd64.tar.gz
sudo mv keadm-v1.12.2-linux-amd64/keadm/keadm /usr/bin/
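
A quick sanity check that the binary is on the PATH and is the expected version:

keadm version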

Cloud-side installation

Use keadm to install cloudcore, KubeEdge's cloud-side component.

If downloads are slow, pull the cloudcore image in advance:

sudo docker pull kubeedge/cloudcore:v1.13.1

sudo keadm init --advertise-address=192.168.43.118 --profile version=v1.13.1
# (these flags are obsolete: --set cloudcore-tag=v1.13.0 --kubeedge-version=1.13.0)
Kubernetes version verification passed, KubeEdge installation will start...
CLOUDCORE started
=========CHART DETAILS=======
NAME: cloudcore
LAST DEPLOYED: Thu Nov 3 11:05:24 2022
NAMESPACE: kubeedge
STATUS: deployed
REVISION: 1

--advertise-address=xxx.xx.xx.xx should be replaced with the cloud host's public address. --profile version=v1.13.1 pins the KubeEdge version to install; if omitted, keadm downloads the latest version.

Note that this command pulls the cloudcore container image from the registry.

The cloudcore Pod and Service are now running; cloudcore listens on local ports 10000-10004:

kubectl get pod,svc -n kubeedge
NAME READY STATUS RESTARTS AGE
pod/cloudcore-5768d46f8d-fqdcn 1/1 Running 0 78s

NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/cloudcore ClusterIP 10.99.61.17 <none> 10000/TCP,10001/TCP,10002/TCP,10003/TCP,10004/TCP 78s

Get the token for edge devices to join:

sudo keadm gettoken
0825d1d733ec84877374418cc4ecd379501efe7fe1c778e91022367c834a22a6.eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJleHAiOjE2OTM0NDg5NzV9.ws0B17aZrvhSL0mu1FEVElnewTFGrh5MNn_4reBgbNA

Edge-node installation

The image can be pulled in advance (though in our case pre-pulling did not help):

sudo docker pull kubeedge/installation-package:v1.13.0

Join the cluster. keadm installs edgecore along with mosquitto, an MQTT broker, which listens on localhost:1883.

For KubeEdge v1.12:

sudo keadm join --cloudcore-ipport=192.168.43.117:10000 --kubeedge-version=1.12.2 --token=f17ab9d16aa9b82249d2242101759e257e44970f58b347e61155ea0c34f836a4.eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJleHAiOjE2NzY4MTc5ODF9.60yCItWyPoNJrIjEBZxNzcQlqTQuiLYkF3Ky9zQ16Ps

Since v1.13 defaults to the containerd runtime, the runtime type can be manually set to docker:

sudo keadm join --cloudcore-ipport=192.168.43.118:10000 --kubeedge-version=1.13.1 --runtimetype=docker --token=0825d1d733ec84877374418cc4ecd379501efe7fe1c778e91022367c834a22a6.eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJleHAiOjE2OTM0NDg5NzV9.ws0B17aZrvhSL0mu1FEVElnewTFGrh5MNn_4reBgbNA

--cloudcore-ipport is the IP:port of the cloud master that the edge node can reach, and --token is the token generated on the cloud master above.

A successful install looks like this:

......
W1024 13:10:08.370505 4423 validation.go:71] NodeIP is empty , use default ip which can connect to cloud.
I1024 13:10:08.371425 4423 join.go:100] 9. Run EdgeCore daemon
I1024 13:10:08.822777 4423 join.go:317]
I1024 13:10:08.822789 4423 join.go:318] KubeEdge edgecore is running, For logs visit: journalctl -u edgecore.service -xe

If sudo systemctl status edgecore shows the service failed, inspect the logs with journalctl -u edgecore.service -xe.

Check on the master:

root@master:~# kubectl get node
NAME STATUS ROLES AGE VERSION
edgenode1 Ready agent,edge 10m v1.22.6-kubeedge-v1.12.0
edgenode2 Ready agent,edge 5m8s v1.22.6-kubeedge-v1.12.0
edgenode3 Ready agent,edge 2s v1.22.6-kubeedge-v1.12.0
master Ready control-plane,master 47m v1.22.15

Configuring kubectl logs support for edge nodes

Deploy an Nginx on the edge:

kubectl create deployment nginx --image=nginx -oyaml --dry-run=client > nginx.yaml
vi nginx.yaml
# Add edge node affinity under spec.template.spec
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: node-role.kubernetes.io/edge
                operator: In
                values: [""]
kubectl apply -f nginx.yaml
# (If the apiserver port has changed, first update the port in ~/.kube/config)
kubectl expose deployment nginx --port=80 --type=NodePort
kubectl get pod,svc
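
For reference, the resulting nginx.yaml should look roughly like this sketch; the Deployment skeleton is what the dry-run generates (names and labels may differ slightly), and the affinity block is our addition:

apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: nginx
  name: nginx
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: node-role.kubernetes.io/edge
                operator: In
                values: [""]
      containers:
      - image: nginx
        name: nginx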

nginx is exposed on port 30865, so from any edge node, curl http://node2:30865 successfully returns the nginx welcome page.

An out-of-the-box KubeEdge setup like this does not support viewing edge-node logs from the master; you get an error like the following:

kubectl logs nginx-597c67fd4d-kx44m
Error from server: Get "https://192.168.40.10:10350/containerLogs/default/nginx-597c67fd4d-kx44m/nginx": dial tcp 192.168.40.10:10350: i/o timeout

Refer to the official documentation: "Deploying with Keadm" on the KubeEdge website.

kube-proxy is incompatible with KubeEdge by default, so we remove kube-proxy from the edge side:

kubectl edit daemonsets.apps -n kube-system kube-proxy

Edit kube-proxy's node affinity so it is not scheduled onto edge nodes:

# Please edit the object below. Lines beginning with a '#' will be ignored,
# and an empty file will abort the edit. If an error occurs while saving this file will be
# reopened with the relevant failures.
#
apiVersion: apps/v1
kind: DaemonSet
metadata:
  annotations:
    deprecated.daemonset.template.generation: "1"
  creationTimestamp: "2022-06-20T12:43:20Z"
  generation: 1
  labels:
    k8s-app: kube-proxy
  name: kube-proxy
  namespace: kube-system
  resourceVersion: "92283"
  uid: 39dd85f5-8d7f-47ff-83b4-59df66de7803
spec:
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      k8s-app: kube-proxy
  template:
    metadata:
      creationTimestamp: null
      labels:
        k8s-app: kube-proxy
    spec:
      affinity:                  # add the affinity here
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: node-role.kubernetes.io/edge
                operator: DoesNotExist
      containers:
      - command:
        - /usr/local/bin/kube-proxy
        - --config=/var/lib/kube-proxy/config.conf
        - --hostname-override=$(NODE_NAME)
        env:
......

Checking the system pods from the cloud master now shows that the edge kube-proxy pods are gone:

kubectl get pod -n kube-system -owide

flannel does not support the edge environment and cannot run there (see KubeEdge issue #2287; we will use EdgeMesh later instead). Modify flannel's affinity the same way so it does not run on the edge:

# Redeploy flannel
kubectl delete daemonset.apps/kube-flannel-ds -nkube-flannel
wget https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
vi kube-flannel.yml
# (or: kubectl edit -n kube-flannel daemonset.apps/kube-flannel-ds)
......   # edit the affinity as before
              - key: node-role.kubernetes.io/edge
                operator: DoesNotExist
......
# Redeploy
kubectl apply -f kube-flannel.yml
# Check again: flannel now runs only on the cloud host
kubectl get all -nkube-flannel -owide

Verify that Kubernetes' ca.crt and ca.key files both exist:

ls /etc/kubernetes/pki
apiserver.crt apiserver.key ca.crt front-proxy-ca.crt front-proxy-client.key
apiserver-etcd-client.crt apiserver-kubelet-client.crt ca.key front-proxy-ca.key sa.key
apiserver-etcd-client.key apiserver-kubelet-client.key etcd front-proxy-client.crt sa.pub

On the cloud node, generate certificates for CloudStream; certgen.sh ships with the KubeEdge source:


wget https://github.com/kubeedge/kubeedge/archive/refs/tags/v1.13.1.tar.gz
tar -xf v1.13.1.tar.gz
sudo mkdir /etc/kubeedge
sudo cp kubeedge-1.13.1/build/tools/certgen.sh /etc/kubeedge/
cd /etc/kubeedge/
sudo su
export CLOUDCOREIPS="192.168.43.118"
bash /etc/kubeedge/certgen.sh stream

On the master, add an iptables rule that redirects all packets destined for edgecore's port 10350 to cloudcore, so that edgecore traffic is forwarded over the stream tunnel:

# Ports 10003 and 10350 are the default ports of CloudStream and edgecore
sudo iptables -t nat -A OUTPUT -p tcp --dport 10350 -j DNAT --to 10.0.1.200:10003
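
The rule can be inspected with the command below. Note that it lives only in the running kernel and is lost on reboot, so it must be re-added (or persisted, e.g. with the iptables-persistent package) after a restart:

sudo iptables -t nat -L OUTPUT -n --line-numbers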

Edit the edge-side edgecore configuration file /etc/kubeedge/config/edgecore.yaml:

sudo gedit /etc/kubeedge/config/edgecore.yaml
edgeStream:
  enable: true                                  # change to true
  handshakeTimeout: 30
  readDeadline: 15
  server: 101.201.181.239:10004                 # cloudcore's IP address (tunnel port 10004)
  tlsTunnelCAFile: /etc/kubeedge/ca/rootCA.crt
  tlsTunnelCertFile: /etc/kubeedge/certs/server.crt
  tlsTunnelPrivateKeyFile: /etc/kubeedge/certs/server.key
  writeDeadline: 15

Restart edgecore:

sudo systemctl restart edgecore

edgecore on the edge now starts normally:

sudo systemctl start edgecore
sudo systemctl status edgecore
● edgecore.service
Loaded: loaded (/etc/systemd/system/edgecore.service; enabled; vendor preset: enabled)
Active: active (running) since Thu 2022-07-07 16:53:51 CST; 4s ago
Main PID: 58760 (edgecore)
Tasks: 10 (limit: 992)
Memory: 36.5M
CPU: 315ms
CGroup: /system.slice/edgecore.service
└─58760 /usr/local/bin/edgecore

Logs from the edge can now be viewed from the cloud:

kubectl logs nginx-597c67fd4d-hwmdz
/docker-entrypoint.sh: /docker-entrypoint.d/ is not empty, will attempt to perform configuration
/docker-entrypoint.sh: Looking for shell scripts in /docker-entrypoint.d/
......

Installing EdgeMesh

With kube-proxy removed from the edge, the edge needs a network proxy plugin; we use KubeEdge's official EdgeMesh.

EdgeMesh roughly plays the combined role of kube-proxy + flannel + CoreDNS.

EdgeMesh has been split out of KubeEdge and now has its own documentation site. According to the docs, EdgeMesh is a standalone Kubernetes network proxy that does not depend on KubeEdge; installing it with helm on the master node is enough.

Remove the taint from the K8s master node. If no application that needs proxying runs on the master, this step can be skipped:

kubectl taint nodes --all node-role.kubernetes.io/master-

Normally you do not want EdgeMesh to proxy the Kubernetes API service itself, so add the filter label to it (see the Service Filtering docs for details):

kubectl label services kubernetes service.edgemesh.kubeedge.io/service-proxy-name=""
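
The label can be verified afterwards:

kubectl get service kubernetes --show-labels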

Enabling KubeEdge's edge Kube-API endpoint service

On the cloud side, enable the dynamicController module; after the change, cloudcore must be restarted:

kubectl edit cm cloudcore -n kubeedge
modules:
  ...
  dynamicController:
    enable: true
  ...

# Kill the cloudcore pod so it restarts with the new config
kubectl get all -nkubeedge
kubectl delete -nkubeedge pod/cloudcore-6687684d4d-92cvz
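
Instead of deleting the pod by its generated name, the deployment can also be bounced generically:

kubectl -n kubeedge rollout restart deployment cloudcore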

On the edge nodes, enable the metaServer module; then restart edgecore:

sudo gedit /etc/kubeedge/config/edgecore.yaml
modules:
  ...
  metaManager:
    metaServer:
      enable: true
  ...
# Restart edgecore
sudo systemctl restart edgecore

On the edge nodes, configure clusterDNS and clusterDomain, then restart edgecore. (This lets edge applications reach EdgeMesh's DNS service; it is unrelated to the edge Kube-API endpoint itself, but it is convenient to do here.

The clusterDNS value '169.254.96.16' comes from the default of bridgeDeviceIP in commonConfig; normally there is no need to change it, and if you must, keep the two consistent.)

sudo gedit /etc/kubeedge/config/edgecore.yaml
modules:
  ...
  edged:
    ...
    tailoredKubeletConfig:
      ...
      clusterDNS:
      - 169.254.96.16
      clusterDomain: cluster.local
  ...
# Restart edgecore
sudo systemctl restart edgecore

Finally, on an edge node, test whether the edge Kube-API endpoint works:

curl 127.0.0.1:10550/api/v1/services
# A JSON string starting with apiVersion indicates the endpoint works
{"apiVersion":"v1","items":[{"api......

Installing helm3 on the host

wget https://get.helm.sh/helm-v3.11.0-linux-amd64.tar.gz
tar -xf helm-v3.11.0-linux-amd64.tar.gz
sudo mv linux-amd64/helm /usr/local/bin/

Set up a helm repo; here we use the Microsoft (Azure) mirror:

helm repo add stable http://mirror.azure.cn/kubernetes/charts
helm repo update

Generate a PSK secret:

openssl rand -base64 32
JDhvPrqj/mA/2zA4P9voxqQIR8ectRzY8pDKaD+vlHo=

Install EdgeMesh with helm. Only the master node acts as the relay node; the advertise address is the cloud host's public IP:

helm install edgemesh --namespace kubeedge \
--set agent.psk=JDhvPrqj/mA/2zA4P9voxqQIR8ectRzY8pDKaD+vlHo= \
--set agent.relayNodes[0].nodeName=master,agent.relayNodes[0].advertiseAddress="{192.168.80.128}" \
https://raw.githubusercontent.com/kubeedge/edgemesh/main/build/helm/edgemesh.tgz

Multiple relay nodes: for services to be reachable within an edge LAN, a relay node should be set up inside that LAN (though this may not actually be necessary):

helm install edgemesh --namespace kubeedge \
--set agent.psk=udj41ZTdaQNb0gUaS64QuLgkFNTYy9dlXKg6bvQYuls= \
--set agent.relayNodes[0].nodeName=master,agent.relayNodes[0].advertiseAddress="{101.201.181.239}" \
--set agent.relayNodes[1].nodeName=edgenode2,agent.relayNodes[1].advertiseAddress="{192.168.56.11}" \
--set agent.relayNodes[2].nodeName=edgenode3,agent.relayNodes[2].advertiseAddress="{192.168.56.12}" \
https://raw.githubusercontent.com/kubeedge/edgemesh/main/build/helm/edgemesh.tgz

To uninstall EdgeMesh:

helm uninstall edgemesh -n kubeedge

Verify the deployment:

helm ls -A
NAME NAMESPACE REVISION UPDATED STATUS CHART APP VERSION
edgemesh kubeedge 1 2022-10-08 22:36:18.261721438 +0800 CST deployed edgemesh-0.1.0 latest

Check again:

kubectl get all -n kubeedge -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
pod/cloudcore-5768d46f8d-t8dnn 1/1 Running 1 (50m ago) 159m 172.28.40.134 master <none> <none>
pod/edgemesh-agent-4f5xt 0/1 ContainerCreating 0 4m8s 192.168.40.10 area1-node1 <none> <none>
pod/edgemesh-agent-6czts 0/1 CrashLoopBackOff 5 (27s ago) 4m8s 172.28.40.134 master <none> <none>
pod/edgemesh-agent-krvsc 0/1 Pending 0 4m8s <none> area2-node1 <none> <none>
pod/edgemesh-agent-p22b5 0/1 Pending 0 4m8s <none> area2-node2 <none> <none>
pod/edgemesh-agent-tzq7g 0/1 Pending 0 4m8s <none> area1-node2 <none> <none>
......

The edgemesh-agent on the cloud host crash-loops, while the edge agents sit in image pulls, ContainerCreating, and Pending. It is worth checking whether an outdated edgemesh image is the cause.

After waiting overnight, the edgemesh image finally finished pulling and the agents came up on the edge.

Because a helm-deployed EdgeMesh does not come back after a reboot, we deploy EdgeMesh manually instead:

git clone https://github.com/kubeedge/edgemesh.git
cd edgemesh
kubectl apply -f build/crds/istio/
# Set relayNodes in build/agent/resources/04-configmap.yaml and regenerate the PSK
kubectl apply -f build/agent/resources/
# Verify the deployment
kubectl get all -n kubeedge -o wide

Performance Testing

Check listening ports:

sudo lsof -i -P -n | grep LISTEN
sudo netstat -tulpn | grep LISTEN

Wireshark display filters for separating the traffic:

# Cloud-edge traffic
ip.addr == 192.168.80.128
# Edge-edge traffic
arp or mdns or ssdp or (ip.src == 192.168.0.0/24 and ip.dst == 192.168.0.0/24)
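
To obtain a capture to apply these filters to, traffic can be recorded with tcpdump and opened in Wireshark afterwards (the interface name here is machine-specific):

sudo tcpdump -i enp0s3 -w cloud-edge.pcap host 192.168.80.128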

Health check

#!/bin/bash
# Poll the scheduler endpoint until the reported power value is no longer 0.000
while true
do
    res=$(curl -s http://10.96.0.28:8091/scheduler | grep 0.000)
    if [ -z "$res" ]; then
        echo "Got a valid power value"
        echo -e "\a"   # ring the terminal bell
        break
    fi
    sleep 1
done
curl http://10.96.0.28:8091/scheduler

# Time the run with time
time ./test.sh

Measuring offline-detection accuracy

Deploy nginx applications so that they are evenly distributed across the nodes.

Cut the cloud-edge connection of a node that runs an application, and watch whether the control plane migrates that node's Pods:

# Block a node's IP from reaching the external network
sudo iptables -A FORWARD -i enp0s8 -s 192.168.56.10 -o enp0s3 -j DROP
# Restore the node's external access (note: an appended ACCEPT sits after the
# DROP rule, so deleting the DROP rule, as below, is what actually restores access)
sudo iptables -A FORWARD -i enp0s8 -s 192.168.56.10 -o enp0s3 -j ACCEPT
# List rules with line numbers
sudo iptables -L -n --line-number
# Delete a rule by number
sudo iptables -D FORWARD 1

Power off a node that runs an application, and watch whether the control plane migrates its Pods.

Reconnect-synchronization performance

After the edge loses its network, wait until the node turns NotReady and gets tainted. Then start recording the apiserver's traffic with nethogs, restore the edge network, and stop recording 10 seconds after the node becomes Ready again; the per-second traffic summed over this window is the data transferred for reconnect synchronization.

# Use nethogs to watch the apiserver process's bandwidth
sudo apt install nethogs
nethogs -b | grep kube-apiserver > mon.txt
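
A rough way to total the trace afterwards, assuming the last two fields of each nethogs -b line are the sent and received rates in KB/sec, sampled once per second:

awk '{ sum += $(NF-1) + $NF } END { printf "total KB: %.1f\n", sum }' mon.txt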

Fixing Ubuntu 20.04 LTS hanging at boot on "A start job is running for Wait for Network to be Configured"

# Add TimeoutStartSec=2sec under [Service] (a 2-second timeout), as follows:
sudo vi /etc/systemd/system/network-online.target.wants/systemd-networkd-wait-online.service
[Service]
Type=oneshot
ExecStart=/lib/systemd/systemd-networkd-wait-online
RemainAfterExit=yes
TimeoutStartSec=2sec

Raising the Pod limit

sudo vi /etc/kubeedge/config/edgecore.yaml   # KubeEdge edge nodes
...
# Change the following line
maxPods: 500
sudo vi /var/lib/kubelet/config.yaml         # regular kubelet nodes
...
# Add the following line
maxPods: 500
sudo systemctl restart kubelet
sudo systemctl restart edgecore
# Check from the master:
kubectl describe node node1 | grep -i "Capacity\|Allocatable" -A 6

Installing Prometheus