RHAAP: Red Hat Ansible Automation Platform

 2025/02/19 

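This post is mostly theory, with a few hands-on exercises.

I did not record everything. The course was honestly pretty wild, so I only kept the parts I found interesting. It was rough going; does anyone actually use ansible-navigator?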

Introduction to RHAAP

RHAAP (Red Hat Ansible Automation Platform) is Red Hat's automation product; at its core it is the open-source Ansible.

It is built from Ansible Core and Ansible Content Collections.

Ansible Core

Provides the basic playbook functionality, such as loops and conditionals.

Think of it as a stripped-down Ansible.

Ansible Content Collections

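Why are Collections needed?

There are so many modules that a full Ansible install became too large.
Modules are now packaged into separate Collections. This also avoids module name clashes and makes version management easier.

To call a module, prefix its name with the Collection it belongs to.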

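Navigator and the Automation Execution Environment

Core and the Collections are packaged into a container image, and ansible-navigator drives the Ansible inside that container.
This keeps the Ansible environment consistent everywhere it runs.

In other words, Ansible used to run in the host system environment and now runs in a container.

Automation Controller (formerly Tower)

Provides a graphical interface and a REST API, which makes CI integration easier.
It integrates with Navigator.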

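Collections

As described above, Collections are packaged into the automation execution environment, that is, the image.

Official Ansible documentation: Ansible Documentation

List the Collections in the current execution environment

Use ansible-navigator collections:

ansible-navigator collections --eei hub.lab.example.com/ee-supported-rhel8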

View a Collection's documentation

ansible-navigator doc redhat.insights.insights_register \
--mode stdout --eei ee-supported-rhel8:latest

When using a module, refer to it by its FQCN (Fully Qualified Collection Name):
redhat.insights.insights_register
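
To make the naming scheme concrete, here is a minimal Python sketch that splits an FQCN into its parts. The helper name split_fqcn is made up for illustration and is not part of Ansible.

```python
def split_fqcn(fqcn):
    """Split 'namespace.collection.name' into (namespace, collection, name).

    Hypothetical helper for illustration only; Ansible resolves FQCNs internally.
    """
    parts = fqcn.split(".")
    if len(parts) < 3 or not all(parts):
        raise ValueError(f"not a fully qualified collection name: {fqcn!r}")
    namespace, collection = parts[0], parts[1]
    # Everything after the second dot is the module or plugin name.
    name = ".".join(parts[2:])
    return namespace, collection, name


print(split_fqcn("redhat.insights.insights_register"))
```

The first two parts identify the Collection (namespace and collection name); the rest identifies the module inside it.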

Using fully qualified names

Before
---
- name: Configure a basic web server
  hosts: servere.lab.example.com
  become: true
  tasks:
    - name: Install software
      yum:
        name:
          - httpd
          - firewalld
        state: latest

    - name: Start and enable services
      service:
        name: "{{ item }}"
        state: started
        enabled: true
      loop:
        - httpd
        - firewalld

    - name: Open access to http
      firewalld:
        service: http
        immediate: true
        permanent: true
        state: enabled

    - name: Configure simple index.html
      copy:
        content: "Hello world from {{ ansible_facts['fqdn'] }}.\n"
        dest: /var/www/html/index.html
        
Approach (1): use the full module names
---
- name: Configure a basic web server
  hosts: servere.lab.example.com
  become: true
  tasks:
    - name: Install software
      ansible.builtin.yum:
        name:
          - httpd
          - firewalld
        state: latest

    - name: Start and enable services
      ansible.builtin.service:
        name: "{{ item }}"
        state: started
        enabled: true
      loop:
        - httpd
        - firewalld

    - name: Open access to http
      ansible.posix.firewalld:
        service: http
        immediate: true
        permanent: true
        state: enabled

    - name: Configure simple index.html
      ansible.builtin.copy:
        content: "Hello world from {{ ansible_facts['fqdn'] }}.\n"
        dest: /var/www/html/index.html

Approach (2): add the collections keyword
---
- name: Configure a basic web server
  hosts: serverf.lab.example.com
  become: true
  collections:
    - ansible.builtin
    - ansible.posix
  tasks:
...

Sources and installation of Collections

Collections come from officially supported sources, the community Galaxy, and so on. You can download a tar archive from the website, or install locally with ansible-galaxy collection install.

Hybrid Cloud Console

From a local tar archive
ansible-galaxy collection install /tmp/redhat-insights-1.0.5.tar.gz

From a URL
ansible-galaxy collection install http://www.example.com/redhat-insights-1.0.5.tar.gz

From git
ansible-galaxy collection install git@github.com:ansible-collections/community.mysql.git

List every required Collection in a YAML file and install from it
Create collections/requirements.yml
ansible-galaxy collection install -r collections/requirements.yml

By default ansible-galaxy downloads collections from galaxy.ansible.com; this can be changed in ansible.cfg
[galaxy]
server_list = automation_hub, galaxy
[galaxy_server.automation_hub]
url=xxxxx
auth_url=xxxxx
token=xxxx
[galaxy_server.galaxy]
url=https://galaxy.ansible.com

Install with ansible-galaxy into the collections directory, like this:
ansible-galaxy collection install ansible.netcommon -p collections/

Add the collection paths to ansible.cfg; collections are installed into the first path by default
[defaults]
inventory = ./inventory
collections_paths = ./collections:/usr/share/ansible/collections

ansible-galaxy collection list
# /home/student/manage-finding/collections/ansible_collections
Collection        Version
----------------- -------
ansible.netcommon 4.1.0
ansible.utils     2.8.0

# /usr/share/ansible/collections/ansible_collections
Collection               Version
------------------------ -------
redhat.rhel_system_roles 1.16.2

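Managing inventory dynamically

When Ansible manages a cloud or virtualization environment, hosts are created and destroyed all the time. A static inventory would need constant editing, which is error-prone and hurts the stability of the whole environment.

Ansible supports two ways of generating inventory:

1. inventory plugins
2. inventory scripts

Either way, something has to query the platform for its hosts, for example the instances on OpenStack, or the VM list on PVE via qm list, like this: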

root@pve:~# qm list
      VMID NAME                 STATUS     MEM(MB)    BOOTDISK(GB) PID
       100 1panel               running    30720            500.00 1273
       101 Centos7.9-template   stopped    4096              50.00 0
       102 almalinux9.3-template stopped    4096              50.00 0
       103 ws-k8s-master1       stopped    16384            200.00 0
       104 ws-k8s-node1         stopped    12288            200.00 0
       105 ws-k8s-master2       stopped    8192             200.00 0
       106 ws-k8s-node2         stopped    12288            200.00 0
       107 Harbor               running    4096              50.00 1406

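Besides the VM name configured in the platform, you also need each host's hostname and IP address.

The output then has to be formatted, and fetching it from the command line may also require credentials.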
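
As a rough sketch of that formatting step, the Python below parses text shaped like the qm list output above and keeps the running VMs. The function names and the embedded sample are illustrative assumptions; a real script would run the command (or call the platform API) instead of using a hard-coded string, and would still need to look up hostnames and IP addresses separately.

```python
# Sample text copied from the qm list output above (shortened).
SAMPLE = """\
      VMID NAME                 STATUS     MEM(MB)    BOOTDISK(GB) PID
       100 1panel               running    30720            500.00 1273
       101 Centos7.9-template   stopped    4096              50.00 0
       107 Harbor               running    4096              50.00 1406
"""


def parse_qm_list(text):
    """Return one dict per VM row, keyed by the column headers."""
    lines = [line for line in text.splitlines() if line.strip()]
    header = lines[0].split()
    vms = []
    for line in lines[1:]:
        fields = line.split()
        if len(fields) != len(header):
            continue  # skip rows this simple parser cannot handle
        vms.append(dict(zip(header, fields)))
    return vms


def running_vm_names(text):
    """Names of the VMs whose STATUS column is 'running'."""
    return [vm["NAME"] for vm in parse_qm_list(text) if vm["STATUS"] == "running"]


print(running_vm_names(SAMPLE))
```

Splitting on whitespace is enough here because no column in the sample contains spaces; that is an assumption about the data, not a guarantee of the tool.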

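Developing a dynamic inventory script

Ansible reads inventory in only a few formats, such as YAML, JSON, and INI. Indentation is hard to get right when a script generates YAML, so the script should emit JSON instead, like this: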

Using JSON
{
    "webservers": ["web1.example.com", "web2.example.com"],
    "databases": ["db1.example.com", "db2.example.com"],
    "bastion": {
        "children": [
            "backup",
            "ipa"
        ],
        "hosts": [
            "servera",
            "serverb",
            "serverc"
        ]
    }
}

Using YAML
lb_servers:
  hosts:
    proxy.example.com:
webservers:
  hosts:
    web1.example.com:
    web2.example.com:
backend_server_pool:
  hosts:
    appserver[a:e].example.com:
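
A minimal sketch of such a script in Python follows. It assumes the usual inventory-script convention of answering --list with the whole inventory as JSON and --host with one host's variables. The HOSTS dictionary is placeholder data standing in for a real query against a cloud or virtualization API.

```python
#!/usr/bin/env python3
import argparse
import json

# Placeholder data; a real script would query the platform here.
HOSTS = {
    "webservers": ["web1.example.com", "web2.example.com"],
    "databases": ["db1.example.com", "db2.example.com"],
}


def build_inventory(groups):
    """Build the JSON structure returned for --list."""
    inventory = {name: {"hosts": hosts} for name, hosts in groups.items()}
    # With _meta.hostvars present, Ansible does not need to call --host per host.
    inventory["_meta"] = {"hostvars": {}}
    return inventory


def main(argv=None):
    parser = argparse.ArgumentParser(description="Sketch of a dynamic inventory script")
    parser.add_argument("--list", action="store_true", help="print the full inventory")
    parser.add_argument("--host", help="print variables for a single host")
    args = parser.parse_args(argv)

    # This sketch defines no per-host variables, so --host returns an empty dict.
    result = {} if args.host else build_inventory(HOSTS)
    print(json.dumps(result, indent=2))
    return result


if __name__ == "__main__":
    main()
```

Saved as an executable file, a script like this can be passed to ansible-playbook with -i in place of a static inventory file.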

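Ansible best practices

Improve readability

Do not use the folded syntax.

Write standard YAML with consistent indentation and spacing.

Give every task a name; the name doubles as a comment.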

---
- name: loop
  hosts: worknode
  vars:
    web_service:
    - firewalld
    - httpd
  tasks:
    - name: install
    - name: xxxx1
    - name: xxxx2

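Use existing modules

Prefer modules to shell. Most shell commands cannot guarantee idempotency, whereas modules describe a desired state such as absent or present,

so idempotency comes built in.

Keep a uniform style

When a team maintains a large project, everyone should follow the same style,

for example the same variable naming rules and the same indentation.

A sensible project structure

This matters most for large Ansible projects.

An example directory layout: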

.
├── dbservers.yml
├── inventories
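│   ├── prod            # production environment
│   │   ├── group_vars  # variables for host groups
│   │   ├── host_vars   # variables for individual hosts
│   │   └── inventory   # inventory of hosts
│   └── stage           # staging (test) environment
│       ├── group_vars
│       ├── host_vars
│       └── inventory
├── roles               # roles directory
├── site.yml            # main playbook; it pulls in the sibling .yml files and the roles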
├── storage.yml
└── webservers.yml

You can also include roles conditionally, for example:

- name: Configure test hosts
  hosts: test
  become: yes
  tasks:
    - name: Include Nginx role if the host is CentOS
      include_role:
        name: nginx
      when: ansible_distribution == "CentOS"

A small project can use a layout like this:

[root@ansible ansible]# tree
.
├── ansible.cfg
├── inventory
├── playbook
│   └── centos_nginx_install.yml
└── roles
    └── nginx
        ├── files
        │   └── index.html
        ├── handlers
        │   └── main.yml
        ├── tasks
        │   └── main.yml
        ├── templates
        │   └── nginx.conf.j2
        └── vars
            └── main.yml

cat ansible.cfg
[defaults]
inventory = ./inventory
# connect over SSH as root
remote_user = root
# do not prompt for the SSH password
ask_pass = false
host_key_checking = false
roles_path = ./roles

[privilege_escalation]
# no escalation is needed when remote_user is root
# whether to escalate privileges
become = false
# escalation method
become_method = sudo
# escalate to the root user
become_user = root
# do not prompt for the sudo password
become_ask_pass = false

cat playbook/centos_nginx_install.yml
 - name: install_nginx
   hosts: centos:other # union of the two groups; use centos:&other for their intersection
   become: yes
   roles:
   - nginx

cat roles/nginx/files/index.html
test

cat roles/nginx/tasks/main.yml
    - name: Stop any running nginx process
      shell: pkill nginx
      ignore_errors: yes
    - name: Remove the old user
      user:
        name: www
        state: absent
        remove: yes
    - name: Remove old files
      file:
        path: "{{ item }}"
        state: absent
      loop:
        - /usr/local/nginx
        - /tmp/nginx-1.18.0.tar.gz
        - /tmp/nginx-1.18.0
    - name: Install dependencies
      yum:
        name:
            - "@Development Tools"
            - openssl-devel.x86_64
            - epel-release
            - pcre
            - pcre-devel
        state: present
    - name: Create a dedicated user
      user:
        name: www
        shell: /sbin/nologin
        create_home: no
        state: present
    - name: Download the nginx tarball
      get_url:
        url: https://nginx.org/download/nginx-1.18.0.tar.gz
        dest: /tmp/nginx-1.18.0.tar.gz
        mode: '0644'
    - name: Extract the nginx tarball
      unarchive:
        src: /tmp/nginx-1.18.0.tar.gz
        dest: /tmp/
        remote_src: yes
        owner: root
        group: root
        mode: '0755'
    - name: Build and install nginx
      shell: |
        cd /tmp/nginx-1.18.0 && \
        ./configure \
        --user=www \
        --group=www \
        --prefix=/usr/local/nginx \
        --sbin-path=/usr/local/nginx/sbin/nginx \
        --conf-path=/usr/local/nginx/conf/nginx.conf \
        --error-log-path=/usr/local/nginx/logs/error.log \
        --http-log-path=/usr/local/nginx/logs/access.log \
        --pid-path=/var/run/nginx.pid \
        --lock-path=/var/lock/subsys/nginx \
        --with-http_stub_status_module \
        --with-http_ssl_module \
        --with-http_gzip_static_module \
        --with-pcre && \
        make && \
        make install
    - name: copy nginx.conf
      template:
        src: nginx.conf.j2
        dest: /usr/local/nginx/conf/nginx.conf
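    - name: Start nginx as a smoke test
      shell: "/usr/local/nginx/sbin/nginx"
    - name: Check that something is listening on port 80
      shell: "ss -tunlp | grep -w 80"
    - name: Report success
      debug:
        msg: "nginx started successfully"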

cat roles/nginx/templates/nginx.conf.j2
#user  nobody;
worker_processes  {{ worker_connections }};
...

cat roles/nginx/vars/main.yml
worker_connections: 2

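Designing the variable file layout

Ideally the project directory holds a single variable file, so each change means editing that one file and running again.

Where variables can live

1. defaults/main.yml and vars/main.yml inside roles
2. host and group variables in the inventory file
3. variables under group_vars and host_vars
4. ad hoc definitions in a play, role, or task

Principles for defining variables

1. Keep them simple
2. Do not repeat them (a duplicate is easily overridden by a higher-precedence definition)
3. Keep them organized, for example with a consistent style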

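Variable merging and variable precedence

In real work, a badly designed project structure is what forces you to think about variable precedence.

If the same variable is defined in several places, you have to work out which definition wins on every run, and that gets painful.

Precedence from lowest to highest:
1. group variables in the inventory
2. group variables in group_vars/all in the inventory directory
3. group variables in group_vars/all in the playbook directory
4. group variables in group_vars in the inventory directory
5. group variables in group_vars in the playbook directory
6. host variables in the inventory
7. variables in host_vars in the inventory directory
8. variables in host_vars in the playbook directory
9. facts and cached facts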

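Precedence of variables within a play, from lowest to highest

1. vars at the play level
2. variables entered through vars_prompt in the play
3. variables from vars_files
4. variables in a role's vars/main.yml
5. variables on a block
6. vars on a task
7. variables loaded by the include_vars module
8. variables set by the set_fact module
9. variables passed to include_role
10. variables passed to include_tasks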
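
The effect of the two lists can be sketched in a few lines of Python: sources are applied from lowest to highest precedence, and a later source replaces any key it shares with an earlier one. This is a simplified model, not Ansible's code. The variable names are invented, and it assumes the default behaviour in which a redefined variable is replaced as a whole rather than deep-merged.

```python
def resolve_vars(*sources):
    """Merge variable sources given in order from lowest to highest precedence."""
    merged = {}
    for source in sources:
        # A higher-precedence source overwrites keys defined by earlier ones.
        merged.update(source)
    return merged


# Hypothetical variables defined in three different places.
group_vars_all = {"http_port": 80, "app_env": "prod"}
host_vars = {"http_port": 8080}
task_vars = {"app_env": "debug"}

print(resolve_vars(group_vars_all, host_vars, task_vars))
```

Here http_port ends up as 8080 and app_env as debug, which is exactly the kind of surprise that defining each variable in only one place avoids.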

Defining variables in the inventory is not recommended. The cleanest approach is to define them all under group_vars and host_vars, like this:

[root@1panel ~]# tree ./group_vars/
./group_vars/
├── db_servers.yml
├── lb_servers.yml
└── web_servers.yml

[root@1panel ~]# !tree
tree ./group_vars/
./group_vars/
├── db_servers
│   └── 1
├── lb_servers
│   └── 2
└── web_servers
    └── 123

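Ansible execution order

1. pre_tasks
2. handlers notified by pre_tasks
3. roles
4. tasks
5. handlers notified by roles and tasks
6. post_tasks
7. handlers notified by post_tasks

With include_role, a role can run from within tasks.

With meta: flush_handlers, you can force pending handlers to run at a chosen point in tasks.

With listen, a handler subscribes to a topic, so multiple tasks can trigger the same topic.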

handlers:
  - name: Restart Service
    listen: service_restarted  # the topic this handler listens to
    service:
      name: myservice
      state: restarted

tasks:
  - name: Update Configuration File
    copy:
      src: new_config.conf
      dest: /etc/myservice/config.conf
    notify: service_restarted  # trigger the topic
    
Using listen decouples tasks from handlers and improves readability.
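
The notify/listen relationship behaves much like a small publish/subscribe registry. The Python below is only a conceptual model with invented class and method names. It captures two properties: a handler runs at most once per flush no matter how often it was notified, and handlers run in the order they were defined.

```python
class Handlers:
    """Toy model of handler notification; not how Ansible is implemented."""

    def __init__(self):
        self.order = []        # handler names in definition order
        self.topics = {}       # topic -> names of handlers listening to it
        self.notified = set()  # handlers waiting to run

    def add(self, name, listen=()):
        self.order.append(name)
        # A handler can be notified by its own name or by any topic it listens to.
        for topic in (name, *listen):
            self.topics.setdefault(topic, []).append(name)

    def notify(self, topic):
        self.notified.update(self.topics.get(topic, []))

    def flush(self):
        """Run pending handlers once each, in definition order."""
        ran = [name for name in self.order if name in self.notified]
        self.notified.clear()
        return ran


handlers = Handlers()
handlers.add("Restart Service", listen=["service_restarted"])
handlers.notify("service_restarted")
handlers.notify("service_restarted")  # a second notification does not queue a second run
print(handlers.flush())
```

Because tasks only know the topic name, a handler can be renamed or a second listener added without touching the tasks.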

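Transforming data with filters

Ansible uses Jinja2 templates to render variables into playbooks.

Jinja2 also supports filters, which process variable values in playbooks and templates.

A filter is really just a function, and you can write your own filters in Python. For example:

{{ ( ansible_facts['date_time']['hour'] | int ) + 1 }}
ansible_facts['date_time']['hour'] is a string; int converts it to an integer.

{{ 1764 | root }} takes the square root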
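
Since a filter is just a function, a custom one can be sketched as a small Python file placed in a filter_plugins/ directory next to the playbook. Ansible loads filter plugins through a class named FilterModule whose filters() method returns a name-to-function mapping. The two filters here, add_hours and int_root, are invented for illustration.

```python
import math


def add_hours(value, hours=1):
    """Convert a string such as '19' to an integer and add a number of hours."""
    return int(value) + hours


def int_root(value):
    """Integer square root of a non-negative number."""
    return math.isqrt(int(value))


class FilterModule(object):
    """Entry point that Ansible looks for in a filter plugin file."""

    def filters(self):
        # Keys are the filter names usable in templates, values the functions.
        return {
            "add_hours": add_hours,
            "int_root": int_root,
        }
```

With this file in place, a template could use an expression such as {{ ansible_facts['date_time']['hour'] | add_hours(2) }}.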

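Working with lists:
{{ [ 2,4,5,7,8,10 ] | sum }}  sum
{{ [1,1,2,2,3,3,4,4 ] | unique | list }}  remove duplicates
union: set union
intersect: set intersection
difference: set difference
{{ [2,4,6,8,10] | difference([2,4,6,16]) }}

Working with dictionaries:
combine: merge dictionaries
{{ {'A':1,'B':2} | combine({'B':4,'C':5}) }}
dict2items: convert a dictionary to a list of key/value items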
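
For readers who think in Python, the functions below approximate what these filters compute for the examples above. They are plain-Python stand-ins written for this note, not Ansible's implementations, and they ignore options the real filters accept.

```python
def unique(items):
    """Drop duplicates while keeping first-seen order."""
    seen = []
    for item in items:
        if item not in seen:
            seen.append(item)
    return seen


def difference(a, b):
    """Items of a that do not appear in b."""
    return [x for x in unique(a) if x not in b]


def combine(base, override):
    """Merge two dicts; keys in override win."""
    merged = dict(base)
    merged.update(override)
    return merged


def dict2items(d):
    """Turn a dict into a list of {'key': ..., 'value': ...} items."""
    return [{"key": k, "value": v} for k, v in d.items()]


print(sum([2, 4, 5, 7, 8, 10]))                      # 36
print(unique([1, 1, 2, 2, 3, 3, 4, 4]))              # [1, 2, 3, 4]
print(difference([2, 4, 6, 8, 10], [2, 4, 6, 16]))   # [8, 10]
print(combine({"A": 1, "B": 2}, {"B": 4, "C": 5}))   # {'A': 1, 'B': 4, 'C': 5}
print(dict2items({"A": 1}))                          # [{'key': 'A', 'value': 1}]
```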

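hash: compute a hash of a string
b64encode / b64decode: Base64 encode and decode

String formatting and regular-expression search are also supported.

Extract information and convert it to JSON
community.general.json_query
to_json

map: pull one field out of every item, a bit like grep
{{ webapp_find_files['files'] | map(attribute='path' ) | list }}
{{ webapp_deployed_files | map('relpath', webapp_content_root_dir) | list }}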

Gathering network information with filters

{{ '192.0.2.1/24' | ansible.netcommon.ipaddr('netmask') }}
Netmask: 255.255.255.0

{{ '192.0.2.1/24' | ansible.netcommon.ipaddr('prefix') }}
Prefix length: 24

{{ '192.0.2.1/24' | ansible.netcommon.ipaddr('broadcast') }}
Broadcast address: 192.0.2.255

{{ '192.0.2.1/24' | ansible.netcommon.ipaddr('network') }}
Network address: 192.0.2.0
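
The same values can be cross-checked with Python's standard ipaddress module. This is only a way to verify the arithmetic; it does not show how the ipaddr filter is implemented.

```python
import ipaddress

iface = ipaddress.ip_interface("192.0.2.1/24")

info = {
    "address": str(iface.ip),
    "netmask": str(iface.network.netmask),
    "prefix": iface.network.prefixlen,
    "broadcast": str(iface.network.broadcast_address),
    "network": str(iface.network.network_address),
}

for key, value in info.items():
    print(f"{key}: {value}")
```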

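Lookup plugins

A lookup plugin is Ansible's mechanism for fetching data from external sources while a playbook runs. It runs on the control node and lets you retrieve variables or configuration data on demand, which makes playbooks more dynamic.

Core features

Dynamic data retrieval: fetch data at run time from files, databases, APIs, environment variables, and other external sources.
Runs on the control node: lookups execute on the Ansible control node, not on the managed hosts, so they read local resources.
Flexible use: usable in task parameters, variable definitions, and templates, with many data sources supported.

Common examples

Read the host list from files

vars:
  hosts: "{{ lookup('file', '/etc/hosts', '/etc/issue') }}"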

Inject public keys

- name: Add authorized keys
  ansible.posix.authorized_key:
    user: "{{ item }}"
    key: "{{ lookup('file', item + '.key.pub') }}"
  loop: "{{ user }}"

Fetch information from the Kubernetes API

{{ lookup('kubernetes.core.k8s', kind='Deployment', namespace='ns', resource_name='my_res') }}

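Loops: with and loop

with_xxx and loop both implement loops.

with is the older loop mechanism, built on lookup plugins, for example with_items and with_file.

loop was introduced in Ansible 2.5 to replace with and to provide a more uniform, efficient loop syntax.

| Data type | Using `loop` | Using `with_*` | Use case |
| --- | --- | --- | --- |
| Simple list | `loop: "{{ list }}"` | `with_items: "{{ list }}"` | Direct iteration |
| Dictionary | `loop` + `dict2items` | `with_dict` | Iterating over key/value pairs |
| File content | `loop` + `query('lines', 'cat file.txt')` | `with_lines: "cat file.txt"` | Reading a file line by line |
| Nested structure | `loop` + index access (`item.0`) | `with_nested` | Multi-level nested iteration |
| Generated data | `loop` + `range` or a generator function | `with_sequence` / `with_random_choice` | Number sequences, random values |
| Registered results | `loop: "{{ result.stdout_lines }}"` | `with_items: "{{ result.stdout_lines }}"` | Processing command output |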

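Delegation, parallelism, and rolling updates

Delegation

Normally Ansible runs tasks on the managed hosts.

Sometimes, though, a task should run not on the target host but on another specific host. That is what delegate_to is for. For example, changing a load balancer's configuration or sending a notification to a monitoring system has to happen on a central node rather than on every target host.

The delegate_to keyword hands the task to another host.

For example:

The haproxy module changes a web server's state through the HAProxy socket, so although the play targets the web servers, the task has to run on the load balancers.

- name: Remove the server from HAProxy
  haproxy:
    state: disabled
    host: "{{ ansible_facts['fqdn'] }}"
    socket: /var/lib/haproxy/stats
  delegate_to: "{{ item }}"
  loop: "{{ groups['lbservers'] }}"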

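Parallelism

Two settings control parallelism: forks and serial.

| Feature | `forks` | `serial` |
| --- | --- | --- |
| Scope | Global (all plays) | A single play |
| Main purpose | Speed up execution (parallelism) | Control the order of execution (batches) |
| Precedence | Low (a `serial` batch smaller than `forks` limits parallelism to the batch size) | High (directly determines the batches) |
| Where configured | `ansible.cfg` or the command line | The play's YAML definition |
| Error handling | A failure does not stop other hosts | A failed batch stops the remaining batches |

**forks**: about parallelism; used to speed things up globally.
**serial**: about ordering; used to control the flow in batches.

Choose by need: use forks for speed, and serial for rolling updates or to avoid outages. The two can be combined.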

---
- name: Rolling update
  hosts: webservers
  serial: 2
  tasks:
    - name: install apache
      yum:
        name: httpd
        state: latest
      notify: restart apache

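Managing rolling updates

1. Roll back the configuration of the hosts in the affected batch

2. Isolate the affected hosts so the failed deployment can be analyzed:
failed hosts are removed from the play and the remaining hosts keep running

3. Send deployment notifications

serial keyword
max_fail_percentage keyword: the maximum percentage of failed hosts; the play fails if it is exceeded
run_once keyword: one-off tasks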

# fixed batch size
serial: 2

# batch size as a percentage
serial: 25%

# batch size that changes as the play progresses
serial:
- 1
- 10%
- 25%
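
To see how a serial list turns into batches, the Python sketch below models the splitting as I understand it: percentages are taken of the total host count and rounded down (with a minimum of one host), and the last entry repeats until no hosts remain. It is a simplified model with made-up function names, not Ansible's code.

```python
def pct_to_int(value, total, min_value=1):
    """Convert '25%' to a host count; plain numbers pass through unchanged."""
    if isinstance(value, str) and value.endswith("%"):
        return int(float(value[:-1]) / 100 * total) or min_value
    return int(value)


def serial_batches(hosts, serial):
    """Split hosts into batches according to a serial value or list of values."""
    spec = serial if isinstance(serial, list) else [serial]
    remaining = list(hosts)
    total = len(remaining)
    batches = []
    index = 0
    while remaining:
        size = pct_to_int(spec[index], total)
        if size <= 0:
            # A non-positive size means "everything that is left".
            batches.append(remaining)
            break
        batches.append(remaining[:size])
        remaining = remaining[size:]
        # Once the list is exhausted, keep reusing its last entry.
        index = min(index + 1, len(spec) - 1)
    return batches


hosts = [f"web{i}" for i in range(1, 21)]  # 20 hypothetical hosts
print([len(batch) for batch in serial_batches(hosts, [1, "10%", "25%"])])
```

With 20 hosts, the list 1, 10%, 25% gives batches of 1, 2, 5, 5, 5 and 2 hosts: one canary host first, then progressively larger groups.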

Tolerate at most 30% failures
max_fail_percentage: 30

One-off task:
- name: Reactivate hosts
  shell: /sbin/activate.sh {{ active_hosts_string }}
  run_once: yes
  delegate_to: monitor.example.com
  vars:
    active_hosts_string: "{{ ansible_play_batch | join(' ') }}"

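If neither serial nor max_fail_percentage is defined, all hosts run in a single batch.
If every host fails, the play fails.

If serial is defined, the hosts run in several batches,
and if every host in any one batch fails, the play fails.

If max_fail_percentage is defined, the play fails when the share of failed hosts in a batch exceeds that percentage.
If a play fails, Ansible aborts all remaining plays in the playbook.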
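
The threshold rule can be written as a one-line check. The sketch below assumes the percentage has to be exceeded, not merely reached, for the play to fail. The function name is invented, and integer arithmetic is used to avoid floating-point rounding at the boundary.

```python
def play_should_abort(failed, batch_size, max_fail_percentage):
    """True when the failed share of a batch exceeds max_fail_percentage."""
    if batch_size == 0:
        return False
    # Equivalent to failed / batch_size * 100 > max_fail_percentage, without floats.
    return failed * 100 > max_fail_percentage * batch_size


print(play_should_abort(3, 10, 30))  # exactly 30% failed: keep going
print(play_should_abort(4, 10, 30))  # 40% failed: abort
```

Under this rule, a batch of 10 hosts with max_fail_percentage: 30 tolerates three failures and aborts on the fourth.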

Original author: 王盛

Original link: https://akemi.zj.cn/2025/02/19/RHCA-RHAAP/

Published: February 19th 2025, 7:44:08 pm

Updated: February 20th 2025, 6:37:27 pm
