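Environment: OCP 3.4
Note: Metrics is very memory-hungry, so allocate at least 4 GB of memory to the node that runs Metrics.
1. Install and configure Metrics (on the Master machine)
Create an admin user and grant it permissions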
htpasswd -b /etc/origin/master/htpasswd admin admin
oadm policy add-cluster-role-to-user admin admin
Log in as the system:admin user and switch to the openshift-infra project
oc login -u system:admin
oc project openshift-infra
oc get node --show-labels
Pods in the openshift-infra project may only be deployed to nodes labeled infra=yes
oc project openshift-infra
oc annotate namespace openshift-infra openshift.io/node-selector='infra=yes' --overwrite
Edit the version and registry parameters in metrics-deployer.yaml
cp /usr/share/ansible/openshift-ansible/roles/openshift_hosted_templates/files/v1.4/enterprise/metrics-deployer.yaml ~/
vim metrics-deployer.yaml
name: IMAGE_VERSION
value: "3.4.0"
==> "v3.4"
name: IMAGE_PREFIX
value: "registry.access.redhat.com/openshift3/"
==> "registry.example.com:5000/openshift3/"
The changes above are needed to match the image that is actually available locally; check it with docker images | grep metrics-deployer
Output:
registry.example.com:5000/openshift3/metrics-deployer v3.4
Set or add the metricsPublicURL parameter
vim /etc/origin/master/master-config.yaml
Find assetConfig: and add the following line beneath it, indented as one of its keys:
  metricsPublicURL: "https://metrics.apps.example.com/hawkular/metrics"
Restart the master and node services
systemctl restart atomic-openshift-{master,node};
Create the service account
oc create -f - <<EOF
apiVersion: v1
kind: ServiceAccount
metadata:
  name: metrics-deployer
secrets:
- name: metrics-deployer
EOF
Grant roles to the service accounts
oadm policy add-role-to-user edit system:serviceaccount:openshift-infra:metrics-deployer
oadm policy add-cluster-role-to-user cluster-reader system:serviceaccount:openshift-infra:heapster
Confirm that the roles were granted
oc get rolebinding
oc get clusterrolebinding
Create the secret for the service account (if something goes wrong, this step must be redone)
oc secrets new metrics-deployer nothing=/dev/null
Edit the iptables rules and add the following two lines to open the NFS (2049) and rpcbind (111) ports
vim /etc/sysconfig/iptables
-A OS_FIREWALL_ALLOW -p tcp -m state --state NEW -m tcp --dport 2049 -j ACCEPT
-A OS_FIREWALL_ALLOW -p tcp -m state --state NEW -m tcp --dport 111 -j ACCEPT
Restart iptables
systemctl restart iptables
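Before restarting, it can help to confirm that both rules really are in the file. The following is only a minimal sketch, assuming the rules use the same --dport syntax shown above; the function name missing_nfs_ports is hypothetical and not part of OpenShift or iptables.

```shell
# Print each required port that has no ACCEPT rule in the given rules file.
# Hypothetical helper; the port list (2049, 111) comes from the rules above.
missing_nfs_ports() {
    rules_file=$1
    for port in 2049 111; do
        # -e guards the pattern, which starts with a dash
        if ! grep -q -e "--dport ${port} -j ACCEPT" "$rules_file"; then
            echo "$port"
        fi
    done
}
```

Calling missing_nfs_ports /etc/sysconfig/iptables prints nothing when both rules are present.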
2. Create the NFS server (on the Registry machine)
yum -y install nfs-utils
export volname=cassandra
mkdir -p /srv/nfs/${volname}
chown nfsnobody:nfsnobody /srv/nfs/${volname}
chmod 700 /srv/nfs/${volname}
echo "/srv/nfs/${volname} *(rw,sync,all_squash)" >> /etc/exports
systemctl restart rpcbind nfs-server nfs-lock nfs-idmap
systemctl enable nfs-server
showmount -e
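Because the echo ... >> /etc/exports line appends unconditionally, rerunning this step leaves duplicate entries in the file. One way to avoid that is sketched below; add_export is a hypothetical helper name, and the sketch assumes an exact-line match is a good enough duplicate check.

```shell
# Append an export line only if an identical line is not already present.
# Hypothetical helper; pass the real /etc/exports path when using it.
add_export() {
    exports_file=$1
    export_line=$2
    touch "$exports_file"
    # -x matches whole lines, -F treats the line as a fixed string
    if ! grep -qxF -e "$export_line" "$exports_file"; then
        printf '%s\n' "$export_line" >> "$exports_file"
    fi
}
```

For example: add_export /etc/exports "/srv/nfs/${volname} *(rw,sync,all_squash)"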
3. Verify that the NFS server works (on the Node1 machine)
mkdir -p /mnt/nfs
mount -t nfs nfs.example.com:/srv/nfs/cassandra /mnt/nfs
umount /mnt/nfs
4. Create the PV (on the Master machine; if something goes wrong, this step must be redone)
echo '{
"apiVersion": "v1",
"kind": "PersistentVolume",
"metadata": {
"name": "cassandra-volume"
},
"spec": {
"capacity": {
"storage": "10Gi"
},
"accessModes": [ "ReadWriteOnce","ReadWriteMany" ],
"nfs": {
"path": "/srv/nfs/cassandra",
"server": "nfs.example.com"
}
}
}' | oc create -f -
5. Set the metrics-deployer.yaml parameters and create the deployment (on the Master machine)
oc process -f metrics-deployer.yaml -v HAWKULAR_METRICS_HOSTNAME=metrics.apps.example.com -v CASSANDRA_PV_SIZE=10Gi | oc create -f -
This step takes a while; please be patient.
oc get pod -w
NAME READY STATUS RESTARTS AGE
hawkular-cassandra-1-2asw6 0/1 Running 0 2m
hawkular-metrics-p3wvl 0/1 CrashLoopBackOff 2 2m
heapster-we30w 0/1 Running 0 2m
metrics-deployer-0gv9q 1/1 Running 0 2m
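A listing like the one above can be filtered down to just the pods that are not fully ready. This is a rough sketch, assuming the default oc get pod column layout (NAME, then READY as ready/total); not_ready_pods is a hypothetical helper name.

```shell
# Read `oc get pod` style output on stdin and print the names of pods
# whose READY column (e.g. 0/1) shows fewer ready containers than expected.
not_ready_pods() {
    # NR > 1 skips the header row; $2 is the READY column
    awk 'NR > 1 { split($2, r, "/"); if (r[1] != r[2]) print $1 }'
}
```

Usage: oc get pod | not_ready_pods. Pods that have already run to completion also show 0/1 in this column, so they appear in the output too.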
oc logs hawkular-metrics-p3wvl
Output:
Error: the service account for Hawkular Metrics does not have permission to view resources in this namespace. View permissions are required for Hawkular Metrics to function properly.
Usually this can be resolved by running: oc adm policy add-role-to-user view system:serviceaccount:openshift-infra:hawkular -n openshift-infra
So, following the hint, run
oc adm policy add-role-to-user view system:serviceaccount:openshift-infra:hawkular -n openshift-infra
To start over, run the following cleanup commands, wait one minute, and then repeat the steps above
On the Master machine:
oc delete all --selector="metrics-infra"
oc delete sa --selector="metrics-infra"
oc delete templates --selector="metrics-infra"
oc delete secrets --selector="metrics-infra"
oc delete pvc --selector="metrics-infra"
oc delete pv cassandra-volume
oc delete sa metrics-deployer
oc delete secret metrics-deployer
On the Registry machine:
cd /srv/nfs/cassandra/
rm -rf *
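Note that rm -rf * runs in whatever the current directory is if the cd fails, and it leaves dotfiles behind. A more defensive variant is sketched below; empty_dir is a hypothetical helper name, and it assumes a find that supports -mindepth and -delete (GNU find on RHEL does).

```shell
# Remove everything inside a directory, but refuse to run on an empty
# argument or on a path that is not a directory. Hypothetical helper.
empty_dir() {
    target=$1
    if [ -z "$target" ] || [ ! -d "$target" ]; then
        echo "not a directory: $target" >&2
        return 1
    fi
    # -mindepth 1 keeps the directory itself; -delete removes depth-first
    find "$target" -mindepth 1 -delete
}
```

Usage: empty_dir /srv/nfs/cassandra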
6. Verify that Metrics is working
Run the deployer in validate mode; this command deploys a new metrics-deployer pod
oc process -f metrics-deployer.yaml -v HAWKULAR_METRICS_HOSTNAME=metrics.apps.example.com -v CASSANDRA_PV_SIZE=10Gi -v MODE=validate | oc create -f -
Check the pod log to confirm there are no errors and that validation passed; if there are errors, the log shows the details
oc logs metrics-deployer-sa2cq
Output:
......
Will retry in 5 seconds.
========================
--- validate_deployment_artifacts ---
--- validate_deployed_project ---
VALIDATION SUCCEEDED
validate_nodes_accessible: ok
validate_deployment_artifacts: ok
validate_deployed_project:
Success!
Diagnose MetricsApiProxy
oadm diagnostics MetricsApiProxy
curl -k -X GET https://`oc get pod $(oc get pods | grep -i hawkular-metrics | awk '{print $1}') -o template --template='{{.status.podIP}}'`:8443/hawkular/metrics/status
Output:
{"MetricsService":"STARTED","Implementation-Version":"0.21.5.Final-redhat-1","Built-From-Git-SHA1":"632f908a52d3e45b3a0bafa84e117ec6ca87bb19"}
oc describe pod hawkular-metrics-mwn9i | grep -i IP
Output:
IP: 10.128.0.19
curl -k https://10.128.0.19:8443/hawkular/metrics/status
Output:
{"MetricsService":"STARTED","Implementation-Version":"0.21.5.Final-redhat-1","Built-From-Git-SHA1":"632f908a52d3e45b3a0bafa84e117ec6ca87bb19"}
curl -k https://metrics.apps.example.com/hawkular/metrics/status
Output:
{"MetricsService":"STARTED","Implementation-Version":"0.21.5.Final-redhat-1","Built-From-Git-SHA1":"632f908a52d3e45b3a0bafa84e117ec6ca87bb19"}
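The three status checks above all look for the same thing in the response. They could be scripted roughly as follows, assuming the status document keeps the "MetricsService":"STARTED" shape shown in the output; metrics_started is a hypothetical helper name.

```shell
# Read a Hawkular Metrics status document on stdin and succeed only if
# it reports MetricsService as STARTED. Hypothetical helper.
metrics_started() {
    grep -q '"MetricsService"[[:space:]]*:[[:space:]]*"STARTED"'
}
```

Usage: curl -ks https://metrics.apps.example.com/hawkular/metrics/status | metrics_started && echo "Metrics is up"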