overlay2 on an XFS filesystem causes system hangs

Kernel log errors

  • Error messages:
    
    [166973.065674] XFS: runc:[1:CHILD](13230) possible memory allocation deadlock in kmem_zone_alloc (mode:0x8250)
    [166974.848634] XFS: runc:[1:CHILD](13230) possible memory allocation deadlock in kmem_zone_alloc (mode:0x8250)
    [166976.857584] XFS: runc:[1:CHILD](13230) possible memory allocation deadlock in kmem_zone_alloc (mode:0x8250)
    [166978.697604] XFS: runc:[1:CHILD](13230) possible memory allocation deadlock in kmem_zone_alloc (mode:0x8250)
    [166980.524526] XFS: runc:[1:CHILD](13230) possible memory allocation deadlock in kmem_zone_alloc (mode:0x8250)
    [166982.529419] XFS: runc:[1:CHILD](13230) possible memory allocation deadlock in kmem_zone_alloc (mode:0x8250)
    [166984.534372] XFS: runc:[1:CHILD](13230) possible memory allocation deadlock in kmem_zone_alloc (mode:0x8250)
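The same process (runc, PID 13230) logs the warning about every two seconds. On kernels of this generation, XFS's kmem_zone_alloc() keeps retrying a failed allocation instead of returning an error, and prints this line periodically while it loops. A long run of identical lines therefore suggests the task is stuck in that retry loop. Below is a minimal sketch for summarizing such warnings per process. It assumes the message format shown above, and the function name is hypothetical:

```shell
#!/bin/sh
# Summarize XFS "possible memory allocation deadlock" warnings per process.
# Reads kernel log lines on stdin, e.g.:  dmesg | count_xfs_deadlock_warnings
# Prints one line per "comm(pid)" with its warning count, most frequent first.
count_xfs_deadlock_warnings() {
    # Keep only the text between "XFS: " and the warning, i.e. "comm(pid)";
    # lines that do not match are dropped by sed -n.
    sed -n 's/.*XFS: \(.*\) possible memory allocation deadlock.*/\1/p' \
        | sort | uniq -c | sort -rn
}
```

Fed the seven lines above, it reports a count of 7 for runc:[1:CHILD](13230).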
    

Troubleshooting

  • Investigation: the warning matches the following Red Hat knowledge-base solution (excerpt):
    
    https://access.redhat.com/solutions/532663
    Resolution
    This is a long standing issue with xfs and highly fragmented files.
    Our engineering team is working on a long term resolution for this issue.
    Workarounds
    There are several solutions that can be used to avoid high file fragmentation:
    Preallocate the space to be used by the file with unwritten extents. This gives the allocator the opportunity to allocate the whole file in one go and use the least amount of extents. As the blocks are written they will break up the unwritten extents into written/unwritten space and when all of the unwritten space has been converted the extent map will match the original optimal preallocated state.
      
    Use the extent size hint feature of XFS. This feature tells the allocator to allocate more space than may be needed by the current write request so that a minimum extent size is used. The extent will initially be allocated as an unwritten extent and will be converted as the individual blocks within the extent are written. As with preallocated files, when the entire extent has been written the extent size will match the original unwritten extent. The extent size hint feature can be set on a file or directory with this command:
    $ xfs_io -c "extsize <extent size>" <dir or file>
      
    If set on a directory then all files created within that directory after the hint is set will inherit the feature. You cannot set the hint on files that already have extents allocated. If it is not possible to modify the application then this is the suggested option to use.
      
    Use asynchronous buffered I/O. This will offer the chance to have many logically consecutive pages build up in the cache before being written out. Extents can then be allocated for the entire range of outstanding pages instead of each page individually. This will not only reduce fragmentation but means fewer I/Os need to be issued to the storage device.
      
    Avoid writing the file in a random order. If blocks can be coalesced within the application before being written out using direct I/O then there's a chance the file can be written sequentially which the allocator can use to allocate extents contiguously.
      
    Use xfs_fsr to defragment individual large files. Note that xfs_fsr is unable to defragment files that are currently in use; using the -v option is recommended to report any issues that prevent defragmentation.
    xfs_fsr -v /path/to/large/file
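The preallocation and extent-size-hint workarounds above can be scripted roughly as follows. This is a minimal sketch that assumes util-linux fallocate(1), GNU stat and xfsprogs xfs_io(8) are installed. The function names and example sizes are hypothetical, and the right hint size depends on the workload.

```shell
#!/bin/sh
# Workaround 1: preallocate FILE as unwritten extents so the allocator can
# lay it out in as few extents as possible.
# Usage: preallocate_file FILE SIZE   (SIZE as accepted by fallocate, e.g. 1G)
preallocate_file() {
    fallocate -l "$2" "$1"
}

# Workaround 2: set an XFS extent size hint on a directory. Files created in
# the directory afterwards inherit the hint; existing files are unaffected.
# Usage: set_extsize_hint DIR SIZE   (SIZE as accepted by xfs_io, e.g. 1m)
set_extsize_hint() {
    if [ ! -d "$1" ]; then
        echo "not a directory: $1" >&2
        return 1
    fi
    # stat -f reports the filesystem type; the hint only exists on XFS.
    if [ "$(stat -f -c %T "$1")" != "xfs" ]; then
        echo "not on XFS: $1" >&2
        return 1
    fi
    xfs_io -c "extsize $2" "$1" &&
        xfs_io -c "extsize" "$1"   # with no value, prints the hint in effect
}
```

For example, `set_extsize_hint /data/app 1m` (path hypothetical) would set a 1 MiB hint that files created later in /data/app inherit.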
    

Temporary workarounds

  • Drop caches

    
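    sync                                # write dirty pages first; drop_caches only frees clean objects
    echo 1 > /proc/sys/vm/drop_caches   # free the page cache
    echo 2 > /proc/sys/vm/drop_caches   # free reclaimable slab objects (dentries and inodes)
    echo 3 > /proc/sys/vm/drop_caches   # free both; on its own equivalent to the two lines above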
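Dropping caches only helps if it actually returns memory. Below is a minimal sketch for checking that by reading MemFree and Cached from /proc/meminfo before and after the drop. The helper names are hypothetical, and the comparison is only a rough indicator, since other processes allocate memory at the same time.

```shell
#!/bin/sh
# Print the value (in kB) of one field from /proc/meminfo, e.g. MemFree.
meminfo_kb() {
    awk -v key="$1:" '$1 == key { print $2 }' /proc/meminfo
}

# Show free memory and page cache size.
# Typical use (as root): show_mem; sync; echo 3 > /proc/sys/vm/drop_caches; show_mem
show_mem() {
    echo "MemFree: $(meminfo_kb MemFree) kB, Cached: $(meminfo_kb Cached) kB"
}
```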
    
  • Defragment the filesystem

    
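    xfs_db -c frag -r <dev>   # read-only (-r) report of file fragmentation
    xfs_fsr -v <dev>          # reorganize (defragment) files on the mounted filesystem, verbosely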
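`xfs_db -c frag` prints a summary line of the form `actual N, ideal M, fragmentation factor X%`, where X is (actual - ideal) / actual * 100. Because that percentage approaches 100% even at a modest number of extents per file, the ratio actual / ideal (roughly, average extents per file) is often the easier figure to judge. Below is a minimal sketch that parses the line. The function name is hypothetical, and the numbers in the comment are illustrative, not measurements.

```shell
#!/bin/sh
# Parse the summary line of `xfs_db -c frag -r <dev>` read on stdin, e.g.
#   actual 2150, ideal 2099, fragmentation factor 2.37%
# and print "<factor-percent> <actual/ideal>".
frag_summary() {
    awk '/^actual [0-9]+, ideal [0-9]+, fragmentation factor/ {
        actual = $2 + 0; ideal = $4 + 0      # "2150," -> 2150
        factor = $7; sub(/%$/, "", factor)   # "2.37%" -> 2.37
        printf "%s %.2f\n", factor, (ideal > 0 ? actual / ideal : 0)
    }'
}
```

Usage: `xfs_db -c frag -r <dev> | frag_summary`.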
    

Tune kernel parameters

  • Reference: http://www.cnblogs.com/itfriend/archive/2011/12/14/2287160.html

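  • Linux provides the min_free_kbytes parameter, which sets how much memory the kernel tries to keep free and therefore the threshold at which it starts reclaiming memory. The higher the value, the earlier the kernel starts reclaiming and the more memory stays free.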

  • Set /proc/sys/vm/min_free_kbytes to 4 GB (the value is in kilobytes):

    
    echo 4194304 > /proc/sys/vm/min_free_kbytes   # 4 * 1024 * 1024 KiB = 4 GiB; lost on reboot
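The kernel documentation warns that setting min_free_kbytes too high can OOM the machine, so 4 GB is only reasonable on a host with plenty of RAM. Below is a minimal sketch that converts GiB to KiB and refuses values above a fraction of MemTotal before applying them. The 5% ceiling and the function names are assumptions for illustration, not a kernel rule.

```shell
#!/bin/sh
# Convert GiB to KiB, the unit used by vm.min_free_kbytes.
gib_to_kib() {
    echo $(( $1 * 1024 * 1024 ))
}

# Succeed only if KIB is at most PERCENT of MemTotal (default 5, an
# arbitrary safety margin chosen for this example).
# Usage: min_free_is_sane KIB [PERCENT]
min_free_is_sane() {
    total=$(awk '$1 == "MemTotal:" { print $2 }' /proc/meminfo)
    [ "$1" -le $(( total * ${2:-5} / 100 )) ]
}

# Example (as root):
#   kib=$(gib_to_kib 4)
#   min_free_is_sane "$kib" && sysctl -w vm.min_free_kbytes="$kib"
```

To keep the setting across reboots, the same key can be placed in a file under /etc/sysctl.d/.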
    

Upgrade the kernel

    yum provides kernel                                 # list the kernel package versions available
    yum install -y kernel-3.10.0-1062.9.1.el7.x86_64    # install this kernel build; reboot to use it
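The new kernel only takes effect after rebooting into it. Below is a minimal sketch for confirming that the running kernel is at least the target version by comparing version strings with GNU `sort -V`. The function name is hypothetical, and version sort only approximates RPM's own ordering rules, which can differ for unusual release strings.

```shell
#!/bin/sh
# Succeed if kernel version $1 is the same as or newer than $2,
# using GNU coreutils version sort (sort -V).
kernel_at_least() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n1)" = "$2" ]
}

# Example (after rebooting):
#   grubby --default-kernel   # kernel the bootloader will start next
#   kernel_at_least "$(uname -r)" 3.10.0-1062.9.1.el7.x86_64 && echo upgraded
```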