SLUB: Unable to Allocate Memory

Fault

As shown in the figure, the system log reports that memory cannot be allocated.

Solution

Temporary fix: reboot the affected node.

Permanent fix:

Upgrade the kernel to 3.10.0-1062.XXX.el7.x86_64

    yum provides kernel
    yum install -y kernel-3.10.0-1062.9.1.el7.x86_64
    awk -F\' '$1=="menuentry " {print i++ " : " $2}' /etc/grub2.cfg

Add the kernel parameter cgroup.memory=nokmem

    [root@acp2-node-1 ~]# cat /etc/default/grub
    GRUB_TIMEOUT=5
    GRUB_DISTRIBUTOR="$(sed 's, release .*$,,g' /etc/system-release)"
    GRUB_DEFAULT=saved
    GRUB_DISABLE_SUBMENU=true
    GRUB_TERMINAL="serial console"
    GRUB_TERMINAL_OUTPUT="serial console"
    GRUB_CMDLINE_LINUX="crashkernel=auto cgroup.memory=nokmem console=ttyS0 console=tty0 panic=5 net.ifnames=0 biosdevname=0"
    GRUB_DISABLE_RECOVERY="true"
    GRUB_SERIAL_COMMAND="serial --speed=9600 --unit=0 --word=8 --parity=no --stop=1"
    [root@acp2-node-1 ~]#

Save and exit, then refresh the GRUB menu……
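After the node reboots, it may be worth confirming that the parameter actually reached the running kernel. The following is a minimal sketch, not part of the original post: the helper name has_kernel_param is hypothetical, and it only assumes that /proc/cmdline holds the boot command line, as it does on Linux.

```shell
#!/bin/bash
# has_kernel_param PARAM [CMDLINE]
# Hypothetical helper: succeeds if PARAM appears as a whole word in the
# kernel command line. CMDLINE defaults to the running kernel's /proc/cmdline.
has_kernel_param() {
    local param=$1
    local cmdline=${2:-$(cat /proc/cmdline)}
    local word
    # Unquoted on purpose: split the command line on whitespace.
    for word in $cmdline; do
        [ "$word" = "$param" ] && return 0
    done
    return 1
}

if has_kernel_param cgroup.memory=nokmem; then
    echo "cgroup.memory=nokmem is active"
else
    echo "cgroup.memory=nokmem is NOT active"
fi
```

Matching whole words avoids a false positive from a longer parameter that merely contains the same text.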

Fixing Redis in Docker failing to start after a server reboot

Symptom

The redis container in the harbor service fails to start.

    [root@acp2-master-1 ~]# kubectl get po -n default
    NAME                                           READY   STATUS             RESTARTS   AGE
    docker-registry-fb854474f-jmwq5                1/1     Running            16         212d
    gitlab-ce-gitlab-ce-5c7b984fc-85clk            1/1     Running            8          9d
    gitlab-ce-gitlab-ce-database-8f7d789ff-hm2rf   1/1     Running            8          9d
    gitlab-ce-gitlab-ce-redis-c6b479b95-t5rjr      1/1     Running            8          9d
    harbor-harbor-chartmuseum-7bfd86c887-7dvnt     1/1     Running            0          33m
    harbor-harbor-clair-5d6bd4fdf-nxcw8            1/1     Running            3          33m
    harbor-harbor-core-d95c5f884-5x2cm             0/1     CrashLoopBackOff   5          14m
    harbor-harbor-database-0                       1/1     Running            0          32m
    harbor-harbor-jobservice-f5d9c4995-nh8qk       1/1     Running            6          14m
    harbor-harbor-nginx-774f9569cb-njxtn           1/1     Running            0          33m
    harbor-harbor-notary-server-867d58d99f-hdq8v   1/1     Running            0          33m
    harbor-harbor-notary-signer-6f6955b4fc-z99sh   1/1     Running            0          33m
    harbor-harbor-portal-65fd74dcbb-8pctx          1/1     Running            0          33m
    harbor-harbor-redis-bd8dbdf49-kttnr            0/1     CrashLoopBackOff   6          7m36s
    harbor-harbor-registry-7bbf6cb89f-ncxz9        2/2     Running            0          33m

Log error

    [root@acp2-master-1 ~]# kubectl logs --tail 20 -f -n default harbor-harbor-redis-bd8dbdf49-kttnr
    ( ' , .-` | `, ) Running in standalone mode |`-._`-...-` __...-.``-._|'` _.-'|……

Restoring cluster data from an etcd snapshot

Back up etcd and the related certificates

    #!/bin/bash
    set -eux
    mkdir -p /cpaas/{etcd_bak,pki_bak}
    BACKUP_ETC_DIR=/cpaas/etcd_bak
    BACKUP_PKI_DIR=/cpaas/pki_bak/
    IP=`/usr/sbin/ifconfig eth0 | grep -w 'inet' | awk '{print $2}'`
    ETCDCTL=/usr/local/bin/etcdctl
    TAR=/usr/bin/tar

    backup_etcd() {
        ETCDCTL_API=3 ${ETCDCTL} --endpoints ${IP}:2379 \
            --cert="/etc/kubernetes/pki/etcd/server.crt" \
            --key="/etc/kubernetes/pki/etcd/server.key" \
            --cacert="/etc/kubernetes/pki/etcd/ca.crt" \
            snapshot save ${BACKUP_ETC_DIR}/snap-$(date +%Y%m%d%H%M).db
    }

    backup_pki() {
        ${TAR} -cvf ${BACKUP_PKI_DIR}pki-$(date +%Y%m%d%H%M).tar /etc/kubernetes/pki/
    }

    del_backup() {
        find ${BACKUP_ETC_DIR} -mtime +5 -a -name '*.db' | xargs rm -rf
        find ${BACKUP_PKI_DIR} -mtime +5 -a -name '*.tar' | xargs rm -rf
    }

    backup_etcd   ## back up etcd
    backup_pki    ## back up the certificates
    del_backup    ## delete backups older than 5 days

Restore

Note: the restore order is the global cluster first, then the business clusters. If only the business cluster……
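Before relying on a snapshot for a restore, it can help to confirm that the newest backup exists and is not empty. The following is a rough sketch, not part of the original script: the function names latest_snapshot and check_latest_snapshot are hypothetical, and the code assumes the snap-YYYYmmddHHMM.db naming used by the backup script above, so that file names sort in chronological order.

```shell
#!/bin/bash
# latest_snapshot DIR
# Hypothetical helper: prints the newest snap-*.db file in DIR.
# Glob results are sorted, and the timestamped names sort chronologically,
# so the last match is the most recent snapshot.
latest_snapshot() {
    local dir=$1 f latest=""
    for f in "$dir"/snap-*.db; do
        [ -e "$f" ] || continue   # the glob did not match anything
        latest=$f
    done
    [ -n "$latest" ] && printf '%s\n' "$latest"
}

# check_latest_snapshot DIR
# Fails if there is no snapshot or the newest one is empty.
check_latest_snapshot() {
    local snap
    snap=$(latest_snapshot "$1") || { echo "no snapshot found in $1" >&2; return 1; }
    [ -s "$snap" ] || { echo "snapshot is empty: $snap" >&2; return 1; }
    echo "latest snapshot: $snap"
    # If etcdctl is installed, also print the snapshot hash, revision and key count.
    if command -v etcdctl >/dev/null 2>&1; then
        ETCDCTL_API=3 etcdctl snapshot status "$snap" -w table
    fi
}

check_latest_snapshot "${BACKUP_ETC_DIR:-/cpaas/etcd_bak}" || true
```

Running this right after the backup job catches the case where the snapshot step failed silently and only old files remain.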

Understanding and configuring Out of Memory: Kill Process

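Understanding the OOM killer

Recently one VPS customer complained that MySQL kept dying for no apparent reason, and another complained that their VPS frequently froze. Logging in to the terminal showed that both were the common Out of memory problem. It usually happens when applications request a large amount of memory at some moment and the system runs short, which typically triggers the Out of Memory (OOM) killer in the Linux kernel. The OOM killer kills a process to free memory for the system, so that the system does not crash immediately……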
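To see which processes the OOM killer would currently favor, one can read the per-process score the kernel exposes. The following is a small sketch, not taken from the original article: the function name top_oom_candidates is hypothetical, and it assumes a Linux /proc filesystem, where /proc/<pid>/oom_score holds the current score and /proc/<pid>/comm the command name.

```shell
#!/bin/bash
# top_oom_candidates [N]
# Hypothetical helper: prints the N processes with the highest oom_score
# as "score<TAB>pid<TAB>command". Higher scores are chosen first by the OOM killer.
top_oom_candidates() {
    local n=${1:-5} dir pid score name
    for dir in /proc/[0-9]*; do
        pid=${dir#/proc/}
        # A process may exit while we iterate, so skip entries we cannot read.
        score=$(cat "$dir/oom_score" 2>/dev/null) || continue
        name=$(cat "$dir/comm" 2>/dev/null) || continue
        printf '%s\t%s\t%s\n' "$score" "$pid" "$name"
    done | sort -rn | awk -v n="$n" 'NR <= n'
}

top_oom_candidates 5
```

On a host where MySQL is the largest memory consumer, it would typically appear near the top of this list, which matches the symptom described above.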

Captain Operations Manual

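Installing captain

Installation package

    helm install --version <captain chart version> --debug --namespace=<ns> --set global.registry.address=<init registry> --set alaudaChartRepoURL=<chart repo on the init node> --set namespace=<ns> --name=captain stable/captain --wait --timeout 3000

kubectl-captain is in the other directory under the installation directory.

Installation requirements

Software dependencies: captain depends on cert-manager, so captain must be installed only after cert-manager has been deployed successfully.

Hardware dependencies: none, any hardware works.

Replacing helm with captain

How to migrate charts already deployed through helm to captain

See the upgrade guide for details. The general flow is as follows: create a helmrequest resource, either with the kubectl captain command or directly. hr resources are stored in the global cluster……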

Helm Common Commands Handbook

Common Helm commands

Check the version

    helm version

List the currently installed charts

    helm list

Search for charts

    helm search nginx

Download a remote package to the local machine

    helm fetch rancher-stable/rancher

View package details

    helm inspect chart

Install a chart

    #helm install --name nginx --namespace prod bitnami/nginx

Check the status of a chart

    #helm status nginx

Delete a chart

    #helm delete --purge nginx

Add a repo

    #helm repo add stable https://kubernetes.oss-cn-hangzhou.aliyuncs.com/charts
    #helm repo add --username admin --password password myps https://harbor.pt1.cn/chartrepo/charts

Update the repo index

    #helm repo update

Create a char……

Upgrading Docker 1.12.6 to 18.09 and switching storage from direct-lvm to overlay2

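First make sure the operating system kernel is 3.10.0-862 or later. You can check it with uname -r.

Stop and uninstall docker

    systemctl stop docker && yum remove -y docker*

Remove the docker storage and delete the docker directory

    vgremove docker    # if this step reports "device busy", reboot the node and run vgremove again
    pvremove /dev/***
    rm -rf /var/lib/docker/*

Format the overlay2 storage

    mkfs.xfs -n ftype=1 /dev/***
    mount……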
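The kernel prerequisite can be checked by script instead of by eye. The following is a minimal sketch, not part of the original procedure: the function name kernel_at_least is hypothetical, and the comparison relies on the version sort (-V) option of sort, which is assumed to be available as in GNU coreutils.

```shell
#!/bin/bash
# kernel_at_least REQUIRED [CURRENT]
# Hypothetical helper: succeeds if CURRENT (default: uname -r) is at least
# REQUIRED. Both versions are version-sorted; if REQUIRED comes out first,
# CURRENT is equal or newer.
kernel_at_least() {
    local required=$1
    local current=${2:-$(uname -r)}
    local lowest
    lowest=$(printf '%s\n%s\n' "$required" "$current" | sort -V | sed -n 1p)
    [ "$lowest" = "$required" ]
}

if kernel_at_least 3.10.0-862; then
    echo "kernel $(uname -r) is new enough for overlay2"
else
    echo "kernel $(uname -r) is older than 3.10.0-862, upgrade it first"
fi
```

A plain string comparison would rank 1062 below 862, which is why a version-aware sort is used here.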

overlay2 on an XFS filesystem causes system hang

Log errors

Error messages

    [166973.065674] XFS: runc:[1:CHILD](13230) possible memory allocation deadlock in kmem_zone_alloc (mode:0x8250)
    [166974.848634] XFS: runc:[1:CHILD](13230) possible memory allocation deadlock in kmem_zone_alloc (mode:0x8250)
    [166976.857584] XFS: runc:[1:CHILD](13230) possible memory allocation deadlock in kmem_zone_alloc (mode:0x8250)
    [166978.697604] XFS: runc:[1:CHILD](13230) possible memory allocation deadlock in kmem_zone_alloc (mode:0x8250)
    [166980.524526] XFS: runc:[1:CHILD](13230) possible memory allocation deadlock in kmem_zone_alloc (mode:0x8250)
    [166982.529419] XFS: runc:[1:CHILD](13230) possible memory allocation deadlock in kmem_zone_alloc (mode:0x8250)
    [166984.534372] XFS: runc:[1:CHILD](13230) possible memory allocation deadlock in kmem_zone_alloc (mode:0x8250)

Troubleshooting

Reference: https://access.redhat.com/solutions/532663

Resolution

This is a long-standing issue with xfs and highly fragmented files. Our engineering team is working on a long term resolution for this issue.

Workarounds

There are several solutions that can be used to avoid high file fragmentation:

Preallocate the space to be used by the file with unwritten extents. This gives……

Adding a new master node with the old IP in a K8s environment

Note: the following applies to k8s 1.16, where the node is newly added and no backup is available.

Backup

Configure etcd 3

    docker cp `docker ps |grep etcd |grep -v pause |awk '{print $1}'`:/usr/local/bin/etcdctl /tmp/
    export `cat /etc/kubernetes/manifests/etcd.yaml |grep ETCDCTL_API -A1 |xargs |sed 's/^.//g' |awk '{print $1 }'` ;echo $ETCDCTL_API

Get the etcdctl command:

    export etcdctl=`cat /etc/kubernetes/manifests/etcd.yaml |grep ETCDCTL_API -A1 |xargs |sed 's/^.//g' |sed 's/ETCDCTL_API=3 /\/tmp\//g'`

Back up etcd (preferably the /etc/kubernetes directory as well):

    mkdir /back ; cd /back ; $etcdctl snapshot save snapshot.db

Remove the bad etcd member by ID:……

k8s 1.13 certificate upgrade (including etcd certificates)

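1. etcd backup

The script backs up k8s but does not back up the etcd data, so the etcd data must be backed up separately. See "etcd (v3 API) backup and restore".

2. Replacing the k8s certificates on the master nodes (run on each master in turn)

It is best to run the script from an empty directory.

    #!/bin/bash
    if [ ! -d "/root/tmp/" ]; then
        mkdir /root/tmp/
    fi
    cp -rf /etc/kubernetes /root/tmp/kubernetes_`date '+%Y%m%d_%H.%M.%S'`
    # Replace the apis……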
