Docker Resource Limits

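Introduction to container resource limits

Official documentation: https://docs.docker.com/config/containers/resource_constraints/

By default, a container has no resource limits and can use as much of a given resource as the host kernel scheduler allows.

Docker provides ways to control how much memory, CPU, and other resources a container can use, through runtime configuration flags of the docker run command.

Many of these features require support from the host kernel. To check for support, run docker info. If a capability is disabled in the kernel, a warning like the following may appear at the end of the output:

WARNING: No swap limit support #shown when swap limiting is not enabled

The warning can be removed by changing the kernel boot parameters.

Official documentation: https://docs.docker.com/install/linux/linux-postinstall/#your-kernel-does-not-support-cgroup-swap-limit-capabilities

Example: change the kernel boot parameters to remove the warning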

[root@ubuntu1804 ~]#docker info
Client:
Debug Mode: false

Server:
....
WARNING: No swap limit support

#Modify the kernel boot parameters
[root@ubuntu1804 ~]#vim /etc/default/grub
GRUB_CMDLINE_LINUX="cgroup_enable=memory net.ifnames=0 swapaccount=1"
[root@ubuntu1804 ~]#update-grub
[root@ubuntu1804 ~]#reboot
[root@ubuntu1804 ~]#docker info

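OOM (Out of Memory Exception)

On a Linux host, if the kernel detects that there is not enough memory to perform important system tasks, it raises an OOM (Out Of Memory Exception) and starts killing processes to free memory. Any process running on the host may be killed, including dockerd and other applications. If an important system process is killed, every service that depends on it goes down. In general, the more memory an application consumes, the more likely it is to be killed, for example MySQL databases and Java programs.

Docker tries to mitigate these risks by adjusting the OOM priority of the Docker daemon so that it is less likely to be killed than other processes on the system. The OOM priority of individual containers is not adjusted, which makes a single container more likely to be killed than the Docker daemon or other system processes. Bypassing these safeguards is not recommended, whether by manually setting --oom-score-adj to an extreme negative number on the daemon or a container, or by setting --oom-kill-disable on a container.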

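OOM priority mechanism

Linux computes a score for each process and kills the process with the highest score.

/proc/PID/oom_score_adj
#Range -1000 to 1000. The higher the value, the more likely the process is to be killed by the host kernel; at -1000 the process is never killed by the kernel OOM killer

/proc/PID/oom_adj
#Legacy interface kept for compatibility with older Linux kernels. Range -17 to +15, mapped linearly onto oom_score_adj. The higher the value, the more likely the process is to be killed; -17 means it cannot be killed. Readable and writable by root

/proc/PID/oom_score
#Read-only score that the kernel derives from the process's memory consumption, nominally 0 to 1000, where 0 means never kill and 1000 means always kill. The higher the value, the more likely the process is to be selected
#oom_score = memory consumed / total memory * 1000
#memory consumed = resident memory (RSS) + page tables + swap used by the process
#total memory = total physical memory + swap space
#The more memory a process consumes, the higher its score and the more likely the host kernel is to kill it

#When memory runs short, the kernel adds oom_score_adj to the memory-based score (the value read from oom_score already reflects this adjustment) and sends a kill signal to the process with the highest total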
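The formulas above can be turned into a rough calculator. This is a simplified sketch of the scoring described in these notes, not the kernel's exact code; oom_estimate and oom_adj_from_score_adj are hypothetical helper names, and the 17/1000 scaling and the special case for 1000 are assumptions about how the legacy oom_adj value is derived from oom_score_adj.

```shell
# Simplified OOM score model (illustrative only, not the exact kernel algorithm).

# oom_estimate RSS_KB PAGETABLE_KB SWAP_KB TOTAL_KB ADJ
# TOTAL_KB is physical RAM plus swap; ADJ is the oom_score_adj value.
oom_estimate() (
    rss=$1 pagetables=$2 swap=$3 total=$4 adj=$5
    score=$(( (rss + pagetables + swap) * 1000 / total + adj ))
    # Assumption: negative totals are reported as 0.
    if [ "$score" -lt 0 ]; then score=0; fi
    echo "$score"
)

# oom_adj_from_score_adj ADJ
# Assumed linear mapping from oom_score_adj (-1000..1000) to oom_adj (-17..15).
oom_adj_from_score_adj() (
    if [ "$1" -eq 1000 ]; then
        echo 15
    else
        echo $(( $1 * 17 / 1000 ))
    fi
)
```

For instance, `oom_estimate 94684 0 0 2936844 0` prints 32 for a process with 94684 KiB resident on a host with 985104 KiB of RAM plus 1951740 KiB of swap, and `oom_adj_from_score_adj -500` prints -8.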

Example: view OOM-related values

#Sort by memory usage
[root@ubuntu1804 ~]#top
top - 20:15:38 up 5:53, 3 users, load average: 0.00, 0.00, 0.00
Tasks: 191 total, 1 running, 116 sleeping, 0 stopped, 0 zombie
%Cpu(s): 0.0 us, 0.3 sy, 0.0 ni, 99.7 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem : 985104 total, 310592 free, 448296 used, 226216 buff/cache
KiB Swap: 1951740 total, 1892860 free, 58880 used. 384680 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
19674 2019 20 0 2241656 94684 12452 S 0.0 9.6 0:16.05 java
19675 2019 20 0 2235512 74816 12440 S 0.0 7.6 0:14.89 java
19860 99 20 0 183212 67748 960 S 0.0 6.9 0:01.15 haproxy
4969 root 20 0 937880 49352 12612 S 0.0 5.0 0:46.07 dockerd
2981 root 20 0 793072 13552 1808 S 0.0 1.4 0:13.78 containerd
500 root 19 -1 78560 7552 7112 S 0.0 0.8 0:01.45 systemd-journal
798 root 20 0 170416 6604 4084 S 0.0 0.7 0:00.77 networkd-dispat
1 root 20 0 78036 6200 4416 S 0.0 0.6 0:05.39 systemd

[root@ubuntu1804 ~]#cat /proc/19674/oom_adj
0
[root@ubuntu1804 ~]#cat /proc/19674/oom_score
32
[root@ubuntu1804 ~]#cat /proc/19674/oom_score_adj
0
[root@ubuntu1804 ~]#cat /proc/7108/oom_adj
0
[root@ubuntu1804 ~]#cat /proc/7108/oom_score
1
[root@ubuntu1804 ~]#cat /proc/7108/oom_score_adj
0

#Default OOM values of the dockerd process
[root@ubuntu1804 ~]#cat /proc/`pidof dockerd`/oom_adj
-8
[root@ubuntu1804 ~]#cat /proc/`pidof dockerd`/oom_score
0
[root@ubuntu1804 ~]#cat /proc/`pidof dockerd`/oom_score_adj
-500

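stress-ng stress testing tool

Introduction to stress-ng

stress-ng is a stress testing tool. It can be installed from the distribution's package repositories and is also available as a Docker image.

Official site: https://kernel.ubuntu.com/~cking/stress-ng/

Official documentation: https://wiki.ubuntu.com/Kernel/Reference/stress-ng

Installing stress-ng

Example: install from packages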

[root@centos7 ~]#yum -y install stress-ng
[root@ubuntu1804 ~]#apt update && apt -y install stress-ng

Example: use the container image

[root@ubuntu1804 ~]#docker pull lorel/docker-stress-ng
Using default tag: latest
latest: Pulling from lorel/docker-stress-ng
Image docker.io/lorel/docker-stress-ng:latest uses outdated schema1 manifest format. Please upgrade to a schema2 image for better future compatibility. More information at https://docs.docker.com/registry/spec/deprecated-schema-v1/
c52e3ed763ff: Pull complete
a3ed95caeb02: Pull complete
7f831269c70e: Pull complete
Digest: sha256:c8776b750869e274b340f8e8eb9a7d8fb2472edd5b25ff5b7d55728bca681322
Status: Downloaded newer image for lorel/docker-stress-ng:latest
docker.io/lorel/docker-stress-ng:latest
[root@ubuntu1804 ~]#docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
lorel/docker-stress-ng latest 1ae56ccafe55 3 years ago 8.1MB

Using stress-ng

Example: view help

[root@ubuntu1804 ~]#docker run -it --rm lorel/docker-stress-ng
stress-ng, version 0.03.11
Usage: stress-ng [OPTION [ARG]]
--h, --help show help
--affinity N start N workers that rapidly change CPU affinity
--affinity-ops N stop when N affinity bogo operations completed
--affinity-rand change affinity randomly rather than sequentially
--aio N start N workers that issue async I/O requests
--aio-ops N stop when N bogo async I/O requests completed
--aio-requests N number of async I/O requests per worker
-a N, --all N start N workers of each stress test
-b N, --backoff N wait of N microseconds before work starts
-B N, --bigheap N start N workers that grow the heap using calloc()
--bigheap-ops N stop when N bogo bigheap operations completed
--bigheap-growth N grow heap by N bytes per iteration
--brk N start N workers performing rapid brk calls
--brk-ops N stop when N brk bogo operations completed
--brk-notouch don't touch (page in) new data segment page
--bsearch start N workers that exercise a binary search
--bsearch-ops stop when N binary search bogo operations completed
--bsearch-size number of 32 bit integers to bsearch
-C N, --cache N start N CPU cache thrashing workers
--cache-ops N stop when N cache bogo operations completed (x86 only)
--cache-flush flush cache after every memory write (x86 only)
--cache-fence serialize stores
--class name specify a class of stressors, use with --sequential
--chmod N start N workers thrashing chmod file mode bits
--chmod-ops N stop chmod workers after N bogo operations
-c N, --cpu N start N workers spinning on sqrt(rand())
--cpu-ops N stop when N cpu bogo operations completed
-l P, --cpu-load P load CPU by P %%, 0=sleep, 100=full load (see -c)
--cpu-method m specify stress cpu method m, default is all
-D N, --dentry N start N dentry thrashing processes
--dentry-ops N stop when N dentry bogo operations completed
--dentry-order O specify dentry unlink order (reverse, forward, stride)
--dentries N create N dentries per iteration
--dir N start N directory thrashing processes
--dir-ops N stop when N directory bogo operations completed
-n, --dry-run do not run
--dup N start N workers exercising dup/close
--dup-ops N stop when N dup/close bogo operations completed
--epoll N start N workers doing epoll handled socket activity
--epoll-ops N stop when N epoll bogo operations completed
--epoll-port P use socket ports P upwards
--epoll-domain D specify socket domain, default is unix
--eventfd N start N workers stressing eventfd read/writes
--eventfd-ops N stop eventfd workers after N bogo operations
--fault N start N workers producing page faults
--fault-ops N stop when N page fault bogo operations completed
--fifo N start N workers exercising fifo I/O
--fifo-ops N stop when N fifo bogo operations completed
--fifo-readers N number of fifo reader processes to start
--flock N start N workers locking a single file
--flock-ops N stop when N flock bogo operations completed
-f N, --fork N start N workers spinning on fork() and exit()
--fork-ops N stop when N fork bogo operations completed
--fork-max P create P processes per iteration, default is 1
--fstat N start N workers exercising fstat on files
--fstat-ops N stop when N fstat bogo operations completed
--fstat-dir path fstat files in the specified directory
--futex N start N workers exercising a fast mutex
--futex-ops N stop when N fast mutex bogo operations completed
--get N start N workers exercising the get*() system calls
--get-ops N stop when N get bogo operations completed
-d N, --hdd N start N workers spinning on write()/unlink()
--hdd-ops N stop when N hdd bogo operations completed
--hdd-bytes N write N bytes per hdd worker (default is 1GB)
--hdd-direct minimize cache effects of the I/O
--hdd-dsync equivalent to a write followed by fdatasync
--hdd-noatime do not update the file last access time
--hdd-sync equivalent to a write followed by fsync
--hdd-write-size N set the default write size to N bytes
--hsearch start N workers that exercise a hash table search
--hsearch-ops stop when N hash search bogo operations completed
--hsearch-size number of integers to insert into hash table
--inotify N start N workers exercising inotify events
--inotify-ops N stop inotify workers after N bogo operations
-i N, --io N start N workers spinning on sync()
--io-ops N stop when N io bogo operations completed
--ionice-class C specify ionice class (idle, besteffort, realtime)
--ionice-level L specify ionice level (0 max, 7 min)
-k, --keep-name keep stress process names to be 'stress-ng'
--kill N start N workers killing with SIGUSR1
--kill-ops N stop when N kill bogo operations completed
--lease N start N workers holding and breaking a lease
--lease-ops N stop when N lease bogo operations completed
--lease-breakers N number of lease breaking processes to start
--link N start N workers creating hard links
--link-ops N stop when N link bogo operations completed
--lsearch start N workers that exercise a linear search
--lsearch-ops stop when N linear search bogo operations completed
--lsearch-size number of 32 bit integers to lsearch
-M, --metrics print pseudo metrics of activity
--metrics-brief enable metrics and only show non-zero results
--memcpy N start N workers performing memory copies
--memcpy-ops N stop when N memcpy bogo operations completed
--mmap N start N workers stressing mmap and munmap
--mmap-ops N stop when N mmap bogo operations completed
--mmap-async using asynchronous msyncs for file based mmap
--mmap-bytes N mmap and munmap N bytes for each stress iteration
--mmap-file mmap onto a file using synchronous msyncs
--mmap-mprotect enable mmap mprotect stressing
--msg N start N workers passing messages using System V messages
--msg-ops N stop msg workers after N bogo messages completed
--mq N start N workers passing messages using POSIX messages
--mq-ops N stop mq workers after N bogo messages completed
--mq-size N specify the size of the POSIX message queue
--nice N start N workers that randomly re-adjust nice levels
--nice-ops N stop when N nice bogo operations completed
--no-madvise don't use random madvise options for each mmap
--null N start N workers writing to /dev/null
--null-ops N stop when N /dev/null bogo write operations completed
-o, --open N start N workers exercising open/close
--open-ops N stop when N open/close bogo operations completed
-p N, --pipe N start N workers exercising pipe I/O
--pipe-ops N stop when N pipe I/O bogo operations completed
-P N, --poll N start N workers exercising zero timeout polling
--poll-ops N stop when N poll bogo operations completed
--procfs N start N workers reading portions of /proc
--procfs-ops N stop procfs workers after N bogo read operations
--pthread N start N workers that create multiple threads
--pthread-ops N stop pthread workers after N bogo threads created
--pthread-max P create P threads at a time by each worker
-Q, --qsort N start N workers exercising qsort on 32 bit random integers
--qsort-ops N stop when N qsort bogo operations completed
--qsort-size N number of 32 bit integers to sort
-q, --quiet quiet output
-r, --random N start N random workers
--rdrand N start N workers exercising rdrand instruction (x86 only)
--rdrand-ops N stop when N rdrand bogo operations completed
-R, --rename N start N workers exercising file renames
--rename-ops N stop when N rename bogo operations completed
--sched type set scheduler type
--sched-prio N set scheduler priority level N
--seek N start N workers performing random seek r/w IO
--seek-ops N stop when N seek bogo operations completed
--seek-size N length of file to do random I/O upon
--sem N start N workers doing semaphore operations
--sem-ops N stop when N semaphore bogo operations completed
--sem-procs N number of processes to start per worker
--sendfile N start N workers exercising sendfile
--sendfile-ops N stop after N bogo sendfile operations
--sendfile-size N size of data to be sent with sendfile
--sequential N run all stressors one by one, invoking N of them
--sigfd N start N workers reading signals via signalfd reads
--sigfd-ops N stop when N bogo signalfd reads completed
--sigfpe N start N workers generating floating point math faults
--sigfpe-ops N stop when N bogo floating point math faults completed
--sigsegv N start N workers generating segmentation faults
--sigsegv-ops N stop when N bogo segmentation faults completed
-S N, --sock N start N workers doing socket activity
--sock-ops N stop when N socket bogo operations completed
--sock-port P use socket ports P to P + number of workers - 1
--sock-domain D specify socket domain, default is ipv4
--stack N start N workers generating stack overflows
--stack-ops N stop when N bogo stack overflows completed
-s N, --switch N start N workers doing rapid context switches
--switch-ops N stop when N context switch bogo operations completed
--symlink N start N workers creating symbolic links
--symlink-ops N stop when N symbolic link bogo operations completed
--sysinfo N start N workers reading system information
--sysinfo-ops N stop when sysinfo bogo operations completed
-t N, --timeout N timeout after N seconds
-T N, --timer N start N workers producing timer events
--timer-ops N stop when N timer bogo events completed
--timer-freq F run timer(s) at F Hz, range 1000 to 1000000000
--tsearch start N workers that exercise a tree search
--tsearch-ops stop when N tree search bogo operations completed
--tsearch-size number of 32 bit integers to tsearch
--times show run time summary at end of the run
-u N, --urandom N start N workers reading /dev/urandom
--urandom-ops N stop when N urandom bogo read operations completed
--utime N start N workers updating file timestamps
--utime-ops N stop after N utime bogo operations completed
--utime-fsync force utime meta data sync to the file system
-v, --verbose verbose output
--verify verify results (not available on all tests)
-V, --version show version
-m N, --vm N start N workers spinning on anonymous mmap
--vm-bytes N allocate N bytes per vm worker (default 256MB)
--vm-hang N sleep N seconds before freeing memory
--vm-keep redirty memory instead of reallocating
--vm-ops N stop when N vm bogo operations completed
--vm-locked lock the pages of the mapped region into memory
--vm-method m specify stress vm method m, default is all
--vm-populate populate (prefault) page tables for a mapping
--wait N start N workers waiting on child being stop/resumed
--wait-ops N stop when N bogo wait operations completed
--zero N start N workers reading /dev/zero
--zero-ops N stop when N /dev/zero bogo read operations completed
Example: stress-ng --cpu 8 --io 4 --vm 2 --vm-bytes 128M --fork 4 --timeout 10s
Note: Sizes can be suffixed with B,K,M,G and times with s,m,h,d,y

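Container memory limits

Docker can enforce hard memory limits, which allow the container to use no more than a given amount of memory.

Docker can also enforce soft limits, which allow the container to use as much memory as it needs unless the kernel detects that memory is running low on the host.

The docker run options below control these limits. Most of them take a positive integer followed by a suffix of b, k, m, or g to indicate bytes, kilobytes, megabytes, or gigabytes.

Option	Description
`-m`, `--memory=`	The maximum amount of physical memory the container can use (hard limit). The minimum allowed value is `4m` (4 MB). This is the most commonly used option.
`--memory-swap`	The amount of memory this container is allowed to swap to disk. It can be used only after memory has been limited with `-m`; see the details below.
`--memory-swappiness`	How readily the container's memory is swapped out, in the range 0-100. Higher values favor swap more: 0 avoids swapping wherever possible and 100 swaps as freely as possible. The value is a relative tendency, not a memory-usage percentage at which swapping starts.
`--memory-reservation`	A soft limit smaller than `--memory`, activated when Docker detects contention or low memory on the host. It must be set lower than `--memory` to take precedence. Because it is a soft limit, there is no guarantee that the container stays below it.
`--kernel-memory`	The maximum amount of kernel memory the container can use; the minimum is `4m`. Kernel memory cannot be swapped out, so a container that is starved of kernel memory may block host resources and affect the host or other containers. Setting a kernel memory limit is therefore not recommended.
`--oom-kill-disable`	By default, the kernel kills processes in a container when an out-of-memory (OOM) error occurs. Use this option to change that behavior. Disable the OOM killer only on containers that also set `-m`/`--memory`; without the `-m` flag, the host can run out of memory and the kernel may have to kill host system processes to free memory.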
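The size suffixes accepted by these options can be illustrated with a small converter. This is a minimal sketch, assuming binary multiples (1k = 1024 bytes), which is consistent with the cgroup values shown later in these notes; to_bytes is a hypothetical helper, not part of Docker.

```shell
# to_bytes SIZE
# Convert a size such as 200m or 1g to bytes, assuming binary multiples.
to_bytes() (
    value=$1
    number=${value%[bBkKmMgG]}     # strip one trailing suffix letter, if any
    suffix=${value#"$number"}      # whatever was stripped
    case "$suffix" in
        ""|b|B) echo "$number" ;;
        k|K)    echo $(( number * 1024 )) ;;
        m|M)    echo $(( number * 1024 * 1024 )) ;;
        g|G)    echo $(( number * 1024 * 1024 * 1024 )) ;;
    esac
)
```

For example, `to_bytes 200m` prints 209715200, which is the figure the kernel reports in memory.limit_in_bytes for a container started with `-m 200M`.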

Example:

[root@ubuntu1804 ~]#docker run -e MYSQL_ROOT_PASSWORD=123456 -it --rm -m 1g --oom-kill-disable mysql:5.7.30
2020-02-04 13:11:54+00:00 [Note] [Entrypoint]: Entrypoint script for MySQL Server 5.7.29-1debian9 started.
2020-02-04 13:11:54+00:00 [Note] [Entrypoint]: Switching to dedicated user 'mysql'
2020-02-04 13:11:54+00:00 [Note] [Entrypoint]: Entrypoint script for MySQL Server 5.7.29-1debian9 started.
2020-02-04 13:11:54+00:00 [Note] [Entrypoint]: Initializing database files
......
Version: '5.7.29' socket: '/var/run/mysqld/mysqld.sock' port: 3306 MySQL Community Server (GPL)

Example:

[root@ubuntu1804 ~]#sysctl -a |grep swappiness
sysctl: reading key "net.ipv6.conf.all.stable_secret"
sysctl: reading key "net.ipv6.conf.default.stable_secret"
sysctl: reading key "net.ipv6.conf.docker0.stable_secret"
sysctl: reading key "net.ipv6.conf.eth0.stable_secret"
sysctl: reading key "net.ipv6.conf.lo.stable_secret"
vm.swappiness = 60

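Swap limits

Kubernetes requirements for swap

K8s 1.8.3 changelog:

If swap is enabled on the host, the preflight checks that run before installation report an error: https://github.com/kubernetes/kubernetes/blob/release-1.8/CHANGELOG-1.8.md

The --memory-swap option of docker run controls swap usage.

--memory-swap #Only meaningful when --memory is also set. With swap, the container can write memory beyond its limit out to disk. WARNING: applications that swap memory to disk frequently suffer degraded performance

Different --memory-swap settings have different effects:

--memory-swap	--memory	Effect
positive S	positive M	The container can use S in total, of which M is RAM and S-M is swap. If S=M, no swap is available.
0	positive M	Treated as unset.
unset	positive M	If swap is enabled on the Docker host, the container can use 2*M in total: M of RAM plus M of swap.
-1	positive M	If swap is enabled on the Docker host, the container can use up to all of the swap available on the host.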

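--memory-swap #If set to a positive value, both --memory and --memory-swap must be set. --memory-swap is the total of memory plus swap the container can use. For example, with --memory=300m and --memory-swap=1g, the container can use 300m of physical memory and about 700m of swap (1024m - 300m = 724m). --memory is the physical memory limit and does not change; available swap is calculated as (--memory-swap) - (--memory)
--memory-swap #If set to 0, the setting is ignored and treated as unset
--memory-swap #If equal to --memory, and --memory is a positive integer, the container has no access to swap
--memory-swap #If unset and the host has swap enabled, the container can use as much swap as its --memory value, so memory plus swap totals 2x --memory. For example, with --memory="300m" and --memory-swap unset, the container can use 300m of memory and 300m of swap, 600m in total. The swap size reported by free inside a container is not accurate: every container sees the host's full swap size, but host swap is finite and shared, so the figures seen by the containers cannot simply be added up
--memory-swap #If set to -1 and the host has swap enabled, the container can use as much swap as the host has available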
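The --memory-swap cases above can be summarized as a small calculation. This is an illustrative sketch only: container_swap is a hypothetical helper, all values are plain megabyte counts, and capping the unset case at the host's swap size is an assumption added here.

```shell
# container_swap MEMORY_MB MEMORY_SWAP HOST_SWAP_MB
# Print how much swap (in MB) a container could use.
# MEMORY_SWAP is a number of MB, -1, 0, or the word "unset".
container_swap() (
    mem=$1 memswap=$2 host_swap=$3
    case "$memswap" in
        unset|0)
            # Unset (0 is treated as unset): swap equal to --memory,
            # assumed here to be capped by what the host actually has.
            if [ "$mem" -lt "$host_swap" ]; then echo "$mem"; else echo "$host_swap"; fi ;;
        -1)
            # Unlimited: up to all swap available on the host.
            echo "$host_swap" ;;
        *)
            # Positive S: swap is S minus M (zero when S equals M).
            echo $(( memswap - mem )) ;;
    esac
)
```

For example, `container_swap 300 1024 2000` prints 724, and `container_swap 300 300 2000` prints 0.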

Note: free inside a container shows the host's memory and swap usage, not the container's own swap usage.

Example: view memory inside a container

[root@ubuntu1804 ~]#free
 		 total used free shared buff/cache available
Mem: 3049484 278484 1352932 10384 1418068 2598932
Swap: 1951740 0 1951740
[root@ubuntu1804 ~]#docker run -it --rm -m 2G centos:centos7.7.1908 bash
[root@f5d387b5022f /]# free
		total used free shared buff/cache available
Mem: 3049484 310312 1320884 10544 1418288 2566872
Swap: 1951740 0 1951740
[root@f5d387b5022f /]#

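Testing memory limits with stress-ng

A container created without a memory limit can use as much memory as the system has. Containers have no memory limit by default.

Example: each vm worker allocates 256M by default, so 2 workers use about 512M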

[root@ubuntu1804 ~]#docker run --name c1 -it --rm lorel/docker-stress-ng --vm 2
stress-ng: info: [1] defaulting to a 86400 second run per stressor
stress-ng: info: [1] dispatching hogs: 2 vm

#The previous command runs in the foreground, so run the following in another terminal window; about 512M of memory is in use
[root@ubuntu1804 ~]#docker stats
CONTAINER ID NAME CPU % MEM USAGE / LIMIT MEM % NET I/O BLOCK I/O PIDS
fd184869ff7e c1 91.00% 524.3MiB / 962MiB 54.50% 766B / 0B 860kB / 0B 5

Example: set a maximum memory limit

[root@ubuntu1804 ~]#docker run --name c1 -it --rm -m 300m lorel/docker-stress-ng --vm 2
WARNING: Your kernel does not support swap limit capabilities or the cgroup is not mounted. Memory limited without swap.
stress-ng: info: [1] defaulting to a 86400 second run per stressor
stress-ng: info: [1] dispatching hogs: 2 vm

[root@ubuntu1804 ~]#vim /etc/default/grub
GRUB_CMDLINE_LINUX="cgroup_enable=memory swapaccount=1 net.ifnames=0"
[root@ubuntu1804 ~]#update-grub
Generating grub configuration file ...
Found linux image: /boot/vmlinuz-4.15.0-29-generic
Found initrd image: /boot/initrd.img-4.15.0-29-generic
done
[root@ubuntu1804 ~]#reboot
[root@ubuntu1804 ~]#docker run --name c1 -it --rm -m 300m lorel/docker-stress-ng --vm 2
stress-ng: info: [1] defaulting to a 86400 second run per stressor
stress-ng: info: [1] dispatching hogs: 2 vm

#Run in another terminal window
[root@ubuntu1804 ~]#docker stats --no-stream
CONTAINER ID NAME CPU % MEM USAGE / LIMIT MEM % NET I/O BLOCK I/O PIDS
6a93f6b22034 c1 27.06% 297.2MiB / 300MiB 99.07% 1.45kB / 0B 4.98GB / 5.44GB 5

Example: containers exhausting memory and triggering OOM

[root@ubuntu1804 ~]#docker run -it --rm --name c1 lorel/docker-stress-ng --vm 6
stress-ng: info: [1] defaulting to a 86400 second run per stressor
stress-ng: info: [1] dispatching hogs: 6 vm
#Run the following command at the same time in another terminal window
[root@ubuntu1804 ~]#docker run -it --rm --name c2 lorel/docker-stress-ng --vm 6
stress-ng: info: [1] defaulting to a 86400 second run per stressor
stress-ng: info: [1] dispatching hogs: 6 vm

[root@ubuntu1804 ~]#docker stats
CONTAINER ID NAME CPU % MEM USAGE / LIMIT MEM % NET I/O BLOCK I/O PIDS
f33cebf5b55d c2 -- -- / -- -- -- -- --
b14b597c5a4f cool_banach -- -- / -- -- -- -- --

#Check the log for OOM events
[root@ubuntu1804 ~]#tail /var/log/syslog
Feb 4 22:59:40 ubuntu1804 kernel: [ 785.928835] [ 2575] 0 2575 67104 39218 544768 22906 1000 stress-ng-vm
Feb 4 22:59:40 ubuntu1804 kernel: [ 785.928836] [ 2594] 0 2594 67104 37503 409600 7725 1000 stress-ng-vm
Feb 4 22:59:40 ubuntu1804 kernel: [ 785.928837] [ 2601] 0 2601 67104 38815 438272 9779 1000 stress-ng-vm
Feb 4 22:59:40 ubuntu1804 kernel: [ 785.928838] [ 2602] 0 2602 1568 861 49152 0 1000 stress-ng-vm
Feb 4 22:59:40 ubuntu1804 kernel: [ 785.928839] [ 2610] 0 2610 1568 861 49152 0 1000 stress-ng-vm
Feb 4 22:59:40 ubuntu1804 kernel: [ 785.928840] [ 2614] 0 2614 1157 174 53248 0 0 update-motd-hwe
Feb 4 22:59:40 ubuntu1804 kernel: [ 785.928841] [ 2615] 0 2615 3100 15 61440 0 0 apt-config
Feb 4 22:59:40 ubuntu1804 kernel: [ 785.928842] Out of memory: Kill process 2570 (stress-ng-vm) score 1090 or sacrifice child
Feb 4 22:59:40 ubuntu1804 kernel: [ 785.929493] Killed process 2570 (stress-ng-vm) total-vm:268416kB, anon-rss:170352kB, file-rss:632kB, shmem-rss:28kB
Feb 4 22:59:40 ubuntu1804 kernel: [ 786.018319] oom_reaper: reaped process 2570 (stress-ng-vm), now anon-rss:0kB, file-rss:0kB, shmem-rss:28kB

Example: view the memory limit

#Start two workers, each allowed to use up to 256M; the host places no memory limit on this container
[root@ubuntu1804 ~]#docker run -it --name c1 --rm lorel/docker-stress-ng --vm 2
stress-ng: info: [1] defaulting to a 86400 second run per stressor
stress-ng: info: [1] dispatching hogs: 2 vm

[root@ubuntu1804 ~]#docker ps -a
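CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
13e46172e1ae lorel/docker-stress-ng "/usr/bin/stress-ng …" 24 seconds ago Up 22 seconds gallant_moore

#This path is not available on newer releases such as Ubuntu 22.04 (which default to cgroup v2); it works on Rocky 8 and Ubuntu 18.04 (cgroup v1)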
[root@ubuntu1804 ~]#ls /sys/fs/cgroup/memory/docker/
13e46172e1ae8593569f05a3bebc7b41b7839da44369d43b29102661364ac2cd
memory.kmem.tcp.limit_in_bytes memory.numa_stat
cgroup.clone_children
memory.kmem.tcp.max_usage_in_bytes memory.oom_control
cgroup.event_control
memory.kmem.tcp.usage_in_bytes memory.pressure_level
cgroup.procs
memory.kmem.usage_in_bytes memory.soft_limit_in_bytes
memory.failcnt
memory.limit_in_bytes memory.stat
memory.force_empty
memory.max_usage_in_bytes memory.swappiness
memory.kmem.failcnt
memory.memsw.failcnt memory.usage_in_bytes
memory.kmem.limit_in_bytes
memory.memsw.limit_in_bytes memory.use_hierarchy
memory.kmem.max_usage_in_bytes
memory.memsw.max_usage_in_bytes notify_on_release
memory.kmem.slabinfo
memory.memsw.usage_in_bytes tasks
memory.kmem.tcp.failcnt
memory.move_charge_at_immigrate

[root@ubuntu1804 ~]#cat /sys/fs/cgroup/memory/docker/13e46172e1ae8593569f05a3bebc7b41b7839da44369d43b29102661364ac2cd/memory.limit_in_bytes
9223372036854771712
[root@ubuntu1804 ~]#echo 2^63|bc
9223372036854775808
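The value read from memory.limit_in_bytes for an unlimited container is slightly below 2^63. A plausible explanation, stated here as an assumption rather than something taken from these notes, is that the kernel tracks the limit in whole pages, so the largest signed 64-bit integer is rounded down to a multiple of the page size. The sketch below reproduces the number for an assumed 4096-byte page; cgroup_v1_unlimited is a hypothetical helper name.

```shell
# cgroup_v1_unlimited [PAGE_SIZE]
# Largest signed 64-bit value rounded down to a multiple of the page size.
# Assumption: 4096-byte pages unless another size is given.
cgroup_v1_unlimited() (
    page_size=${1:-4096}
    echo $(( 9223372036854775807 / page_size * page_size ))
)
```

With the default page size this prints 9223372036854771712, i.e. 2^63 minus 4096, so a limit equal to this value can be read as "no limit set".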

Example: limit memory to 200m

#Limit the maximum memory the container can use on the host:
[root@ubuntu1804 ~]#docker run -it --rm --name c1 -m 200M lorel/docker-stress-ng --vm 2 --vm-bytes 256M
stress-ng: info: [1] defaulting to a 86400 second run per stressor
stress-ng: info: [1] dispatching hogs: 2 vm

[root@ubuntu1804 ~]#docker stats --no-stream
CONTAINER ID NAME CPU % MEM USAGE / LIMIT MEM % NET I/O BLOCK I/O PIDS
f69729b2acc1 sleepy_haibt 85.71% 198MiB / 200MiB 98.98% 1.05kB / 0B 697MB / 60.4GB 5

#View the cgroup memory limit that the host applies to the container

[root@ubuntu1804 ~]#cat /sys/fs/cgroup/memory/docker/f69729b2acc16e032658a4efdab64d21ff97dcb6746d1cef451ed82d5c98a81f/memory.limit_in_bytes
209715200


[root@ubuntu1804 ~]#echo 209715200/1024/1024|bc
200

#Change the memory limit dynamically
[root@ubuntu1804 ~]#echo 300*1024*1024|bc
314572800

[root@ubuntu1804 ~]#echo 314572800 > /sys/fs/cgroup/memory/docker/f69729b2acc16e032658a4efdab64d21ff97dcb6746d1cef451ed82d5c98a81f/memory.limit_in_bytes
[root@ubuntu1804 ~]#cat /sys/fs/cgroup/memory/docker/f69729b2acc16e032658a4efdab64d21ff97dcb6746d1cef451ed82d5c98a81f/memory.limit_in_bytes
314572800
[root@ubuntu1804 ~]#docker stats --no-stream
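CONTAINER ID NAME CPU % MEM USAGE / LIMIT MEM % NET I/O BLOCK I/O PIDS
f69729b2acc1 sleepy_haibt 76.69% 297.9MiB / 300MiB 99.31% 1.05kB / 0B 1.11GB / 89.1GB 5

#The limit can be changed with echo, but here it can only be raised above the original value; lowering it again, below what the container is currently using, fails with write error: Device or resource busy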
[root@ubuntu1804 ~]#echo 209715200 > /sys/fs/cgroup/memory/docker/f69729b2acc16e032658a4efdab64d21ff97dcb6746d1cef451ed82d5c98a81f/memory.limit_in_bytes
-bash: echo: write error: Device or resource busy
[root@ubuntu1804 ~]#cat /sys/fs/cgroup/memory/docker/f69729b2acc16e032658a4efdab64d21ff97dcb6746d1cef451ed82d5c98a81f/memory.limit_in_bytes
314572800

Example: soft memory limit

[root@ubuntu1804 ~]#docker run -it --rm -m 256m --memory-reservation 128m --name c1 lorel/docker-stress-ng --vm 2 --vm-bytes 256M
stress-ng: info: [1] defaulting to a 86400 second run per stressor
stress-ng: info: [1] dispatching hogs: 2 vm

[root@ubuntu1804 ~]#docker stats --no-stream
CONTAINER ID NAME CPU % MEM USAGE / LIMIT MEM % NET I/O BLOCK I/O PIDS
aeb38acde581 c1 72.45% 253.9MiB / 256MiB 99.20% 976B / 0B 9.47GB / 39.4GB 5

#View the hard limit
[root@ubuntu1804 ~]#cat /sys/fs/cgroup/memory/docker/aeb38acde58155d421f998a54e9a99ab60635fe00c9070da050cc49a2f62d274/memory.limit_in_bytes
268435456

#View the soft limit
[root@ubuntu1804 ~]#cat /sys/fs/cgroup/memory/docker/aeb38acde58155d421f998a54e9a99ab60635fe00c9070da050cc49a2f62d274/memory.soft_limit_in_bytes
134217728
#The soft limit cannot be higher than the hard limit
[root@ubuntu1804 ~]#docker run -it --rm --name c1 -m 256m --memory-reservation 257m lorel/docker-stress-ng --vm 2 --vm-bytes 256M
docker: Error response from daemon: Minimum memory limit can not be less than memory reservation limit, see usage.
See 'docker run --help'.

Disabling the OOM killer

# docker run -it --rm -m 256m --oom-kill-disable --name c1 lorel/docker-stress-ng --vm 2 --vm-bytes 256M
# cat /sys/fs/cgroup/memory/docker/CONTAINER_ID/memory.oom_control
oom_kill_disable 1
under_oom 1
oom_kill 0

Example: disable the OOM killer

#View Docker's default OOM control settings
[root@ubuntu1804 ~]#cat /sys/fs/cgroup/memory/docker/memory.oom_control
oom_kill_disable 0
under_oom 0
oom_kill 0

#Disable the OOM killer when starting the container
[root@ubuntu1804 ~]#docker run -it --rm -m 200m --name c1 --oom-kill-disable lorel/docker-stress-ng --vm 2 --vm-bytes 256M
stress-ng: info: [1] defaulting to a 86400 second run per stressor
stress-ng: info: [1] dispatching hogs: 2 vm

[root@ubuntu1804 ~]#docker stats --no-stream
CONTAINER ID NAME CPU % MEM USAGE / LIMIT MEM % NET I/O BLOCK I/O PIDS
b655d88228c0 silly_borg 0.00% 197.2MiB / 200MiB 98.58% 1.31kB / 0B 1.84MB / 484MB 5

[root@ubuntu1804 ~]#cat /sys/fs/cgroup/memory/docker/b655d88228c04d7db6a6ad833ed3d05d4cd596ef09834382e17942db0295dc0c/memory.oom_control
oom_kill_disable 1
under_oom 1
oom_kill 0
[root@ubuntu1804 ~]#
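The memory.oom_control file shown above is a list of key/value pairs, so a single field can be pulled out with a short filter. This is a minimal sketch; oom_control_field is a hypothetical helper that reads the file contents on standard input, and the field names are the ones visible in the output above.

```shell
# oom_control_field KEY
# Read memory.oom_control text on stdin and print the value of one field.
oom_control_field() {
    awk -v key="$1" '$1 == key { print $2 }'
}
```

A possible use is `cat /sys/fs/cgroup/memory/docker/CONTAINER_ID/memory.oom_control | oom_control_field under_oom`, where CONTAINER_ID stands for the full container ID. A result of 1 together with oom_kill_disable 1 indicates that the cgroup has hit its limit and its tasks are being held rather than killed, which fits the 0.00% CPU reading in the docker stats output above.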

Swap limit:

# docker run -it --rm -m 256m --memory-swap 512m --name c1 centos bash
# cat /sys/fs/cgroup/memory/docker/CONTAINER_ID/memory.memsw.limit_in_bytes
536870912 #return value

Example:

[root@ubuntu1804 ~]#docker run -it --rm --name c1 -m 200m --memory-swap 512m lorel/docker-stress-ng --vm 2
stress-ng: info: [1] defaulting to a 86400 second run per stressor
stress-ng: info: [1] dispatching hogs: 2 vm

#Verify through the cgroup on the host:
[root@ubuntu1804 ~]#cat /sys/fs/cgroup/memory/docker/23733a0cafa21f3e94ca8c96110978b12e53076261f1b92fd2052bafe659c8ab/memory.memsw.limit_in_bytes
536870912

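Container CPU limits

Introduction to container CPU limits

Official documentation: https://docs.docker.com/config/containers/resource_constraints/

A host may have a few dozen CPU cores yet run hundreds or thousands of processes at the same time, each handling a different task. CPU is a compressible resource: one core can serve many processes through scheduling, but only one process runs on a core at any given moment. So how are all these processes scheduled onto the CPUs?

The Linux kernel schedules processes with CFS (Completely Fair Scheduler).

CPU-bound and I/O-bound workloads

CPU-bound: these tasks perform large amounts of computation and consume CPU resources, for example calculating pi, processing data, or decoding high-definition video. They depend entirely on the CPU's computing power.
I/O-bound: tasks that involve network or disk I/O are I/O-bound. They consume little CPU and spend most of their time waiting for I/O to complete, because I/O is far slower than the CPU and memory. Web applications are a typical example; on a high-concurrency, data-heavy dynamic website, the database is usually I/O-bound.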

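How CFS works

CFS defines a new scheduling model. It gives every process in cfs_rq (the CFS run queue) a virtual clock, vruntime. While a process runs, its vruntime keeps increasing; the vruntime of a process that is not running stays unchanged. The scheduler always picks the process with the smallest vruntime to run next, which is what "completely fair" means. To distinguish priorities, the vruntime of a higher-priority process grows more slowly, so that process gets more chances to run. On a system that mixes many compute-heavy processes with interactive I/O processes, CFS treats the interactive I/O processes more favorably and fairly than other schedulers do.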
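The vruntime idea can be illustrated with a toy simulation. This is a heavily simplified sketch and not the kernel's implementation: simulate_cfs is a hypothetical helper, each tick runs exactly one task, ties go to the task listed first, and a task's vruntime advances by 1024 divided by its weight, so a task with a larger weight ages more slowly.

```shell
# simulate_cfs TICKS WEIGHT...
# Toy model: on every tick, run the task with the smallest vruntime and
# advance its vruntime by 1024/weight. Prints how many ticks each task got.
simulate_cfs() (
    ticks=$1; shift
    awk -v ticks="$ticks" -v weights="$*" 'BEGIN {
        n = split(weights, w, " ")
        for (i = 1; i <= n; i++) { vruntime[i] = 0; runs[i] = 0 }
        for (t = 0; t < ticks; t++) {
            pick = 1
            for (i = 2; i <= n; i++)
                if (vruntime[i] < vruntime[pick]) pick = i
            runs[pick]++
            vruntime[pick] += 1024 / w[pick]
        }
        out = runs[1]
        for (i = 2; i <= n; i++) out = out " " runs[i]
        print out
    }'
)
```

Running `simulate_cfs 300 1024 2048` prints `100 200`: the task with twice the weight receives twice as many ticks, which is the proportional behavior the paragraph above describes.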

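Configuring the default CFS scheduler

By default, each container's access to the host's CPU cycles is unlimited. Various constraints can be set to limit a container's access to the host's CPU cycles. Most users use and configure the default CFS scheduler. In Docker 1.13 and later, the realtime scheduler can also be configured.

CFS is the Linux kernel CPU scheduler for normal Linux processes. Several runtime flags configure how much CPU a container may use. When these settings are used, Docker modifies the settings of the container's cgroup on the host.

Option	Description
`--cpus=`	The maximum amount of CPU the container can use. For example, on a host with 2 CPUs, `--cpus="1.5"` guarantees the container at most 1.5 CPUs (the time may be spread across cores, for example a little on each core of a 4-core CPU, totaling 1.5 cores). Equivalent to `--cpu-period="100000"` with `--cpu-quota="150000"`. This is the most commonly used option.
`--cpu-period=`	Legacy option. The CPU CFS scheduler period, which must be used together with `--cpu-quota`. The default is 100 milliseconds (100000 microseconds). Most users do not change the default. On Docker 1.13 or later, use `--cpus` instead.
`--cpu-quota=`	Legacy option. Imposes a CPU CFS quota on the container; the number of CPUs allowed is `cpu-quota / cpu-period`. On Docker 1.13 and later, this is usually set through `--cpus`.
`--cpuset-cpus`	The CPU numbers the container may run on, i.e. CPU pinning. Accepts a comma-separated list (such as `0,3`) or a hyphen-separated range (such as `0-3`). The first CPU is numbered 0.
`--cpu-shares`	The container's relative weight in CFS scheduling. This is a soft limit. For example, if container A has 1024 and container B has 2048, B gets twice the CPU time of A. The default is 1024. Note: the effect is visible only when processes are competing for CPU cores.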
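The relationship between `--cpus`, `--cpu-quota`, and `--cpu-period` in the table is plain arithmetic, sketched below. The helper names cpus_from_quota and quota_for_cpus are hypothetical, and the default period of 100000 microseconds is the value given in the table.

```shell
# cpus_from_quota QUOTA PERIOD
# Number of CPUs that a quota/period pair allows.
cpus_from_quota() {
    awk -v q="$1" -v p="$2" 'BEGIN { printf "%.2f\n", q / p }'
}

# quota_for_cpus CPUS [PERIOD]
# Quota (in microseconds) that corresponds to a --cpus value.
quota_for_cpus() {
    awk -v c="$1" -v p="${2:-100000}" 'BEGIN { printf "%d\n", c * p }'
}
```

For example, `quota_for_cpus 1.5` prints 150000, and `cpus_from_quota 2000 1000` prints 2.00, meaning a quota of 2000 with a period of 1000 allows two CPUs' worth of time.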

Testing CPU settings with stress-ng

Example: view the CPU-related help of stress-ng

[root@ubuntu1804 ~]#docker run -it --rm --name c1 lorel/docker-stress-ng|grep cpu
-c N, --cpu N 		start N workers spinning on sqrt(rand())
	--cpu-ops N 	stop when N cpu bogo operations completed
-l P, --cpu-load P 	load CPU by P %%, 0=sleep, 100=full load (see -c)
	--cpu-method m 	specify stress cpu method m, default is all
Example: stress-ng --cpu 8 --io 4 --vm 2 --vm-bytes 128M --fork 4 --timeout 10s
[root@ubuntu1804 ~]#

Example: no CPU limit on the container

[root@ubuntu1804 ~]#lscpu |grep CPU
CPU op-mode(s):                          32-bit, 64-bit
CPU(s):                                  32
On-line CPU(s) list:                     0-31
CPU family:                              25
CPU(s) scaling MHz:                      63%
CPU max MHz:                             2501.0000
CPU min MHz:                             427.8030
NUMA node0 CPU(s):                       0-31
Vulnerability Tsa:                       Mitigation; Clear CPU buffers

#Uses about 4 CPUs' worth of resources, spread evenly across the available CPUs
[root@ubuntu1804 ~]#docker run -it --rm --name c1 lorel/docker-stress-ng --cpu 4
stress-ng: info: [1] defaulting to a 86400 second run per stressor
stress-ng: info: [1] dispatching hogs: 4 cpu, 4 vm

[root@ubuntu1804 ~]#docker stats --no-stream

CONTAINER ID NAME CPU % MEM USAGE / LIMIT MEM % NET I/O BLOCK I/O PIDS
818a85e1da2f frosty_taussig 595.57% 1.037GiB / 2.908GiB 35.64% 1.12kB / 0B 0B / 0B 13

[root@ubuntu1804 ~]#cat /sys/fs/cgroup/cpuset/docker/818a85e1da2f9a4ef297178a9dc09b338b2308108195ad8d4197a1c47febcbff/cpuset.cpus
0-5
[root@ubuntu1804 ~]#top

Example: limit CPU usage

[root@ubuntu1804 ~]#docker run -it --rm --name c1 --cpus 1.5 lorel/docker-stress-ng --cpu 4
stress-ng: info: [1] defaulting to a 86400 second run per stressor
stress-ng: info: [1] dispatching hogs: 4 cpu, 4 vm

[root@ubuntu1804 ~]#docker stats --no-stream
CONTAINER ID NAME CPU % MEM USAGE / LIMIT MEM % NET I/O BLOCK I/O PIDS
9f8b2e693113 busy_hodgkin 147.71% 786.8MiB / 2.908GiB 26.42% 836B / 0B 0B / 0B 13
[root@ubuntu1804 ~]#top

Example: limit CPU with --cpu-quota and --cpu-period

[root@ubuntu1804 ~]#docker run -it --rm --name c1 --cpu-quota 2000 --cpu-period 1000 lorel/docker-stress-ng --cpu 4
stress-ng: info: [1] defaulting to a 86400 second run per stressor
stress-ng: info: [1] dispatching hogs: 4 cpu, 4 vm
[root@ubuntu1804 ~]#docker stats --no-stream
CONTAINER ID NAME CPU % MEM USAGE / LIMIT MEM % NET I/O BLOCK I/O PIDS
bd949bb6698e affectionate_chebyshev 185.03% 1.037GiB / 2.908GiB 35.64% 836B / 0B 0B / 0B 13
[root@ubuntu1804 ~]#

Example: pin the container to specific CPUs

#Pinning to CPU 0 is generally not recommended, because CPU 0 tends to be busier
[root@ubuntu1804 ~]#docker run -it --rm --name c1 --cpus 1.5 --cpuset-cpus 2,4-5 lorel/docker-stress-ng --cpu 4
stress-ng: info: [1] defaulting to a 86400 second run per stressor
stress-ng: info: [1] dispatching hogs: 4 cpu, 4 vm

[root@ubuntu1804 ~]#ps axo pid,cmd,psr |grep stress
1964 /usr/bin/stress-ng --cpu 4 2
1996 /usr/bin/stress-ng --cpu 4 5
1997 /usr/bin/stress-ng --cpu 4 2
1998 /usr/bin/stress-ng --cpu 4 4
1999 /usr/bin/stress-ng --cpu 4 2
2002 grep --color=auto stress 1
[root@ubuntu1804 ~]#docker stats --no-stream
CONTAINER ID NAME CPU % MEM USAGE / LIMIT MEM % NET I/O BLOCK I/O PIDS
585879094e73 hungry_albattani 154.35% 1.099GiB / 2.908GiB 37.79% 906B / 0B 0B / 0B 13
[root@ubuntu1804 ~]#cat /sys/fs/cgroup/cpuset/docker/585879094e7382d2ef700947b4454426eee7f943f8d1438fe42ce34df789227b/cpuset.cpus
2,4-5
[root@ubuntu1804 ~]#top
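The list format accepted by `--cpuset-cpus` (comma-separated entries, each either a single CPU or a range) can be checked with a small counter. This is an illustrative sketch; cpuset_count is a hypothetical helper and does not validate its input.

```shell
# cpuset_count LIST
# Count how many CPUs a cpuset list such as "2,4-5" or "0-3" selects.
cpuset_count() {
    echo "$1" | awk -F, '{
        total = 0
        for (i = 1; i <= NF; i++) {
            n = split($i, range, "-")
            if (n == 2) total += range[2] - range[1] + 1   # range like 4-5
            else total += 1                                # single CPU like 2
        }
        print total
    }'
}
```

For the pinning example above, `cpuset_count 2,4-5` prints 3, so the four stress-ng workers share three CPUs while `--cpus 1.5` further caps the total at about 150%.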

Example: CPU utilization ratio across multiple containers

#Start two containers at the same time
[root@ubuntu1804 ~]#docker run -it --rm --name c1 --cpu-shares 1000 lorel/docker-stress-ng --cpu 4
stress-ng: info: [1] defaulting to a 86400 second run per stressor
stress-ng: info: [1] dispatching hogs: 4 cpu, 4 vm
[root@ubuntu1804 ~]#docker run -it --rm --name c2 --cpu-shares 500 lorel/docker-stress-ng --cpu 4
stress-ng: info: [1] defaulting to a 86400 second run per stressor
stress-ng: info: [1] dispatching hogs: 4 cpu, 4 vm

#Note: the effect is visible only when there are more busy processes than CPU cores. If the two containers together use no more CPUs than the host has, both show 400%

[root@ubuntu1804 ~]#docker stats --no-stream
CONTAINER ID NAME CPU % MEM USAGE / LIMIT MEM % NET I/O BLOCK I/O PIDS
a1d4c6e6802d c2 195.88% 925.3MiB / 2.908GiB 31.07% 726B / 0B 0B / 0B 13
d5944104aff4 c1 398.20% 1.036GiB / 2.908GiB 35.64% 906B / 0B 0B / 0B 13
[root@ubuntu1804 ~]#

#View the CPU shares of container c1
[root@ubuntu1804 ~]#cat /sys/fs/cgroup/cpu,cpuacct/docker/d5944104aff40b7b76f536c45a68cd4b98ce466a73416b68819b9643e3f49da7/cpu.shares
1000

#View the CPU shares of container c2
[root@ubuntu1804 ~]#cat /sys/fs/cgroup/cpu,cpuacct/docker/a1d4c6e6802d1b846b33075f3c1e1696376009e85d9ff8756f9a8d93d3da3ca6/cpu.shares
500

#Start another container; the CPU allocation is rebalanced dynamically
[root@ubuntu1804 ~]#docker run -it --rm --name c3 --cpu-shares 2000 lorel/docker-stress-ng --cpu 4
[root@ubuntu1804 ~]#docker stats --no-stream
CONTAINER ID NAME CPU % MEM USAGE / LIMIT MEM % NET I/O BLOCK I/O PIDS
c2d54818e1fe c3 360.15% 664.5MiB / 2.908GiB 22.31% 726B / 0B 1.64GB / 150MB 13
a1d4c6e6802d c2 82.94% 845.2MiB / 2.908GiB 28.38% 936B / 0B 103MB / 4.54MB 13
d5944104aff4 c1 181.18% 930.1MiB / 2.908GiB 31.23% 1.12kB / 0B 303MB / 19.8MB 13

Example: adjust the cpu shares value dynamically

[root@ubuntu1804 ~]#echo 2000 > /sys/fs/cgroup/cpu,cpuacct/docker/a1d4c6e6802d1b846b33075f3c1e1696376009e85d9ff8756f9a8d93d3da3ca6/cpu.shares

[root@ubuntu1804 ~]#docker stats --no-stream
CONTAINER ID NAME CPU % MEM USAGE / LIMIT MEM % NET I/O BLOCK I/O PIDS
a1d4c6e6802d c2 389.31% 1.037GiB / 2.908GiB 35.64% 1.01kB / 0B 1.16GB / 14MB 13
d5944104aff4 c1 200.28% 1.036GiB / 2.908GiB 35.63% 1.19kB / 0B 2.66GB / 26.7MB 13
[root@ubuntu1804 ~]#