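Thursday, February 9, 2017

OpenShift_051: Offline Installation of OCP 3.4: Installing and Configuring Metrics

Environment: OCP 3.4

Note: Metrics is memory-intensive, so the node that runs Metrics needs at least 4 GB of memory.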

1. Install and configure Metrics (on the Master machine)
Create an admin user and grant it permissions:
htpasswd -b /etc/origin/master/htpasswd admin admin
oadm policy add-cluster-role-to-user admin admin

Log in as the system:admin user and switch to the openshift-infra project:
oc login -u system:admin
oc project openshift-infra
oc get node --show-labels

Restrict pods in the openshift-infra project to nodes labeled infra=yes:
oc project openshift-infra
oc annotate namespace openshift-infra openshift.io/node-selector='infra=yes' --overwrite
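Once this node selector is set, the metrics pods can only be scheduled on nodes that actually carry the infra=yes label, so it may be worth confirming that at least one such node exists. The following is a minimal sketch that filters `oc get node --show-labels` output read from stdin. The function name `infra_nodes` is hypothetical, and the sketch assumes the comma-separated labels are the last column of each row.

```shell
# Print the names of nodes whose label list contains infra=yes.
# Reads `oc get node --show-labels` output on stdin; assumes the
# comma-separated labels are the last whitespace-separated column.
infra_nodes() {
  awk 'NR > 1 {
    n = split($NF, labels, ",")
    for (i = 1; i <= n; i++)
      if (labels[i] == "infra=yes") { print $1; break }
  }'
}

# Typical use on the Master:
#   oc get node --show-labels | infra_nodes
```

If the function prints nothing, no node matches the selector and the metrics pods would have nowhere to run.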

Edit the version and registry parameters in metrics-deployer.yaml:
cp /usr/share/ansible/openshift-ansible/roles/openshift_hosted_templates/files/v1.4/enterprise/metrics-deployer.yaml ~/

vim metrics-deployer.yaml
  name: IMAGE_VERSION
  value: "3.4.0"    ==> change to "v3.4"
  name: IMAGE_PREFIX
  value: "registry.access.redhat.com/openshift3/"    ==> change to "registry.example.com:5000/openshift3/"

These changes are needed because the values must match the image in the local registry. Running docker images | grep metrics-deployer
prints the following:
registry.example.com:5000/openshift3/metrics-deployer                        v3.4
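The value of IMAGE_VERSION has to match the tag shown in the second column of that listing. As a small sketch, the tag can be pulled out of the `docker images` output with awk; the function name `deployer_tags` is an illustrative assumption, not part of any tool.

```shell
# Print the tag (second column) of every metrics-deployer image found
# in `docker images` output read from stdin.
deployer_tags() {
  awk '$1 ~ /\/metrics-deployer$/ { print $2 }'
}

# Typical use:
#   docker images | deployer_tags
```

Whatever this prints is the value to put in IMAGE_VERSION, and the repository prefix in the first column is the value for IMAGE_PREFIX.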

Set or add the metricsPublicURL parameter:
vim /etc/origin/master/master-config.yaml
Find assetConfig: and add the following line below it:
metricsPublicURL: "https://metrics.apps.example.com/hawkular/metrics"

Restart the master and node services:
systemctl restart atomic-openshift-{master,node};
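A typo in master-config.yaml is easy to miss, so a quick check that the metricsPublicURL entry is really present can save a debugging round. The sketch below prints the configured value and fails when the key is missing; the function name `metrics_public_url` is hypothetical, and the check is a plain text match rather than a real YAML parse.

```shell
# Print the metricsPublicURL value found in FILE ($1).
# Exits non-zero when the key is absent or its value is empty.
# This is a simple line match, not a YAML parser.
metrics_public_url() {
  sed -n 's/^[[:space:]]*metricsPublicURL:[[:space:]]*//p' "$1" | tr -d '"' | grep .
}

# Typical use on the Master:
#   metrics_public_url /etc/origin/master/master-config.yaml
```

The host in the printed URL should be the same one passed as HAWKULAR_METRICS_HOSTNAME when the deployer is run.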

Create the service account:
oc create -f - <<EOF
apiVersion: v1
kind: ServiceAccount
metadata:
  name: metrics-deployer
secrets:
- name: metrics-deployer
EOF


Grant permissions to the service accounts:
oadm policy add-role-to-user edit system:serviceaccount:openshift-infra:metrics-deployer

oadm policy add-cluster-role-to-user cluster-reader system:serviceaccount:openshift-infra:heapster

Confirm that the permissions were applied:
oc get rolebinding
oc get clusterrolebinding

Create the secret for the service account (if the deployment fails, this step must be redone):
oc secrets new metrics-deployer nothing=/dev/null

Edit the iptables rules to open ports 2049 (NFS) and 111 (rpcbind):
vim /etc/sysconfig/iptables
-A OS_FIREWALL_ALLOW -p tcp -m state --state NEW -m tcp --dport 2049 -j ACCEPT
-A OS_FIREWALL_ALLOW -p tcp -m state --state NEW -m tcp --dport 111 -j ACCEPT

Restart iptables:
systemctl restart iptables

2. Create the NFS server (on the Registry machine)
yum -y install nfs-utils
export volname=cassandra
mkdir -p /srv/nfs/${volname}
chown nfsnobody:nfsnobody /srv/nfs/${volname}
chmod 700 /srv/nfs/${volname}
echo "/srv/nfs/${volname} *(rw,sync,all_squash)" >> /etc/exports
systemctl restart rpcbind nfs-server nfs-lock nfs-idmap
systemctl enable nfs-server
showmount -e
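Because the export line is appended with >>, repeating this step after a failed attempt adds a duplicate entry to /etc/exports. One way to avoid that, sketched here under the assumption that an exact line match is good enough, is to append only when the line is not already present. The helper name `add_line_once` is hypothetical.

```shell
# Append LINE ($1) to FILE ($2) only if an identical line is not
# already there, so the step can be repeated safely.
add_line_once() {
  line=$1
  file=$2
  touch "$file"
  grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Typical use on the Registry machine, in place of the echo line:
#   add_line_once "/srv/nfs/${volname} *(rw,sync,all_squash)" /etc/exports
```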

3. Verify the NFS server (on the Node1 machine)
mkdir -p /mnt/nfs
mount -t nfs nfs.example.com:/srv/nfs/cassandra /mnt/nfs
umount /mnt/nfs
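The mount above only proves that the export is reachable. Since Cassandra has to write to the volume, a write probe run between the mount and the umount gives a slightly stronger check. This is a sketch only; the function name `can_write` and the probe file name are assumptions for illustration.

```shell
# Create and remove a probe file in DIR ($1).
# A non-zero exit status means the directory is not writable.
can_write() {
  probe="$1/.nfs-probe.$$"
  touch "$probe" 2>/dev/null && rm -f "$probe"
}

# Typical use on Node1, after the mount and before the umount:
#   can_write /mnt/nfs && echo "export is writable"
```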

4. Create the PV (on the Master machine; if the deployment fails, this step must be redone)
echo '{
  "apiVersion": "v1",
  "kind": "PersistentVolume",
  "metadata": {
    "name": "cassandra-volume"
  },
  "spec": {
    "capacity": {
        "storage": "10Gi"
        },
    "accessModes": [ "ReadWriteOnce","ReadWriteMany" ],
    "nfs": {
        "path": "/srv/nfs/cassandra",
        "server": "nfs.example.com"
    }
  }
}' | oc create -f -

5. Set the metrics-deployer.yaml parameters and create the deployment (on the Master machine)
oc process -f metrics-deployer.yaml -v HAWKULAR_METRICS_HOSTNAME=metrics.apps.example.com -v CASSANDRA_PV_SIZE=10Gi | oc create -f -
This step takes a while, so be patient.

oc get pod -w
NAME                         READY     STATUS             RESTARTS   AGE
hawkular-cassandra-1-2asw6   0/1       Running            0          2m
hawkular-metrics-p3wvl       0/1       CrashLoopBackOff   2          2m
heapster-we30w               0/1       Running            0          2m
metrics-deployer-0gv9q       1/1       Running            0          2m
oc logs hawkular-metrics-p3wvl
The output is:
Error: the service account for Hawkular Metrics does not have permission to view resources in this namespace. View permissions are required for Hawkular Metrics to function properly.
Usually this can be resolved by running: oc adm policy add-role-to-user view system:serviceaccount:openshift-infra:hawkular -n openshift-infra

Following the hint, run:
oc adm policy add-role-to-user view system:serviceaccount:openshift-infra:hawkular -n openshift-infra
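When scanning a pod listing like the one above, it can be handy to reduce it to just the pods that are not ready yet. The sketch below does that with awk on `oc get pod` output read from stdin; the function name `not_ready_pods` is hypothetical, and it assumes the column order NAME, READY, STATUS shown in the listing.

```shell
# Print the name of every pod whose READY column is not n/n, skipping
# the header row and pods whose STATUS is Completed.
# Reads `oc get pod` output on stdin.
not_ready_pods() {
  awk 'NR > 1 && $3 != "Completed" {
    split($2, r, "/")
    if (r[1] != r[2]) print $1
  }'
}

# Typical use on the Master (without -w, so the command returns):
#   oc get pod | not_ready_pods
```

For the listing above this would print the hawkular-cassandra, hawkular-metrics and heapster pods, but not metrics-deployer, which is already 1/1.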

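Run the following commands to clean up, wait one minute, and then repeat the steps above (the service account, secret, and PV are deleted here, so they must be recreated before redeploying).
On the Master machine: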
oc delete all --selector="metrics-infra"
oc delete sa --selector="metrics-infra"
oc delete templates --selector="metrics-infra"
oc delete secrets --selector="metrics-infra"
oc delete pvc --selector="metrics-infra"
oc delete pv cassandra-volume
oc delete sa metrics-deployer
oc delete secret metrics-deployer
On the Registry machine:
cd /srv/nfs/cassandra/ && rm -rf ./*
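Since the cleanup may have to be repeated several times while troubleshooting, the Master-side commands can be wrapped in a small function. The sketch below prints the commands by default and only executes them when RUN=1 is set, so the destructive calls can be reviewed first. The function name `cleanup_metrics` and the RUN switch are assumptions for illustration; the commands themselves are the ones listed above.

```shell
# Print (default) or run the Master-side metrics cleanup commands.
# Set RUN=1 in the environment to execute them instead of printing.
cleanup_metrics() {
  if [ "${RUN:-0}" = 1 ]; then run=""; else run="echo"; fi
  # Resources created by the deployer carry the metrics-infra label.
  for kind in all sa templates secrets pvc; do
    $run oc delete "$kind" --selector="metrics-infra"
  done
  # Objects created by hand in the earlier steps.
  $run oc delete pv cassandra-volume
  $run oc delete sa metrics-deployer
  $run oc delete secret metrics-deployer
}

# Typical use on the Master:
#   cleanup_metrics          # dry run, prints the commands
#   RUN=1 cleanup_metrics    # actually deletes the resources
```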

6. Verify that Metrics works
Run the deployer in validate mode; this command deploys a new metrics-deployer pod:
oc process -f metrics-deployer.yaml -v HAWKULAR_METRICS_HOSTNAME=metrics.apps.example.com -v CASSANDRA_PV_SIZE=10Gi -v MODE=validate | oc create -f -

Check the pod log to confirm that there are no errors and that validation passed; if something failed, the log contains the details:
oc logs metrics-deployer-sa2cq
The output is:
......
  Will retry in 5 seconds.
========================
--- validate_deployment_artifacts ---
--- validate_deployed_project ---

VALIDATION SUCCEEDED
validate_nodes_accessible: ok
validate_deployment_artifacts: ok
validate_deployed_project:
Success!

Run the MetricsApiProxy diagnostic:
oadm diagnostics MetricsApiProxy

curl -k -X GET https://`oc get pod $(oc get pods | grep -i hawkular-metrics | awk '{print $1}') -o template --template='{{.status.podIP}}'`:8443/hawkular/metrics/status
The output is:
{"MetricsService":"STARTED","Implementation-Version":"0.21.5.Final-redhat-1","Built-From-Git-SHA1":"632f908a52d3e45b3a0bafa84e117ec6ca87bb19"}

oc describe pod hawkular-metrics-mwn9i | grep -i IP
The output is:
IP:            10.128.0.19

curl -k https://10.128.0.19:8443/hawkular/metrics/status
The output is:
{"MetricsService":"STARTED","Implementation-Version":"0.21.5.Final-redhat-1","Built-From-Git-SHA1":"632f908a52d3e45b3a0bafa84e117ec6ca87bb19"}

curl -k https://metrics.apps.example.com/hawkular/metrics/status
The output is:
{"MetricsService":"STARTED","Implementation-Version":"0.21.5.Final-redhat-1","Built-From-Git-SHA1":"632f908a52d3e45b3a0bafa84e117ec6ca87bb19"}
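The service can take a while to come up, so rather than re-running curl by hand, the status endpoint can be polled until it reports STARTED. The following is a minimal sketch using the same curl flags as above; the function names `metrics_started` and `wait_for_metrics`, and the default of 30 attempts spaced 10 seconds apart, are assumptions chosen for illustration.

```shell
# Succeed if the status JSON on stdin reports MetricsService as STARTED.
metrics_started() {
  grep -q '"MetricsService"[[:space:]]*:[[:space:]]*"STARTED"'
}

# Poll URL ($1) until the service reports STARTED.
# $2 = maximum attempts (default 30), $3 = seconds between attempts (default 10).
wait_for_metrics() {
  url=$1
  tries=${2:-30}
  delay=${3:-10}
  i=0
  while [ "$i" -lt "$tries" ]; do
    if curl -ks "$url" | metrics_started; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}

# Typical use:
#   wait_for_metrics https://metrics.apps.example.com/hawkular/metrics/status
```

The function returns non-zero if the service never reports STARTED within the allotted attempts, which makes it usable in scripts that should stop on failure.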



