Deploying Airflow locally via Helm with a custom Dockerfile

JaredPar, 2 months ago

I'm having trouble deploying Apache Airflow locally using the Helm chart with a custom Dockerfile. Here is a detailed description of my problem:

Context:
- Environment: local Kubernetes cluster (Kind cluster)
- Orchestration tool: Helm charts
- Airflow executor: KubernetesExecutor
- Customization: a custom Dockerfile for Airflow

1) I created a Dockerfile:

FROM apache/airflow:2.8.0

USER root



USER airflow


RUN pip install apache-airflow-providers-apache-spark

2) Then I built the image:

sudo docker build -t my-custom/airflow:latest .

3) Loaded it into kind:

sudo kind load docker-image my-custom/airflow:latest

4) Installed the chart with my values.yaml:

helm install airflow apache-airflow/airflow -f values.yaml

My values.yaml:

executor: KubernetesExecutor

airflow:
  image:
    repository: my-custom/airflow
    tag: latest
    pullPolicy: IfNotPresent
    pullSecret: ""
    uid: 50000
    gid: 0

config:
  core:
    load_examples: 'False'
    load_default_connections: 'False'
  webserver:
    expose_config: 'False'
  logging:
    remote_logging: 'True'
    remote_log_conn_id: "google_cloud_default"
    remote_base_log_folder: "gs://my-gcs-bucket"
        
        
dags:
  gitSync:
    enabled: true
    repo: #my external repo
    branch: main
    rev: HEAD
    subPath: dags
    depth: 1
    wait: 60

scheduler:
  replicas: 1

web:
  replicas: 1
  service:
    type: LoadBalancer
  resources:
    requests:
      cpu: 500m
      memory: 7Gi
    limits:
      cpu: 500m
      memory: 16Gi
  initContainers:
    - name: wait-for-scheduler
      image: busybox
      command: ['sh', '-c', 'until nslookup airflow-scheduler; do echo waiting for scheduler; sleep 2; done;']
  livenessProbe:
    initialDelaySeconds: 1800
    periodSeconds: 20
    timeoutSeconds: 5
    failureThreshold: 5
  readinessProbe:
    initialDelaySeconds: 1800
    periodSeconds: 20
    timeoutSeconds: 5
    failureThreshold: 5
  startupProbe:
    initialDelaySeconds: 300
    periodSeconds: 30
    timeoutSeconds: 5
    failureThreshold: 10

postgresql:
  enabled: false

data:
  metadataConnection:
    protocol: postgres
    host: 192.168.15.8
    port: 5432
    db: airflow
    user: postgres
    pass: airflow

flower:
  enabled: false

redis:
  enabled: false

triggerer:
  enabled: true

ingress:
  enabled: false

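The install succeeds, but the output reports an Airflow version different from the one specified in the Docker image, and when I open the Airflow UI the Spark provider is not installed.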

NAME: airflow
LAST DEPLOYED: Thu Aug  1 17:33:21 2024
NAMESPACE: default
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
**Thank you for installing Apache Airflow 2.9.3!**

Your release is named airflow.
You can now access your dashboard(s) by executing the following command(s) and visiting the corresponding port at localhost in your browser:

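Has anyone run into a similar problem, or does anyone know what causes this behavior? Any tips on how to debug and fix it would be much appreciated.

Thanks in advance for your help!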
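
A possible starting point for debugging, sketched in Python. The NOTES output reports Airflow 2.9.3 rather than the 2.8.0 base image, which would be consistent with the chart falling back to its default image because the override keys in values.yaml are not the ones it reads. The sketch assumes that the official apache-airflow/airflow chart takes the image from `images.airflow.repository` and `images.airflow.tag`. That assumption, and the helper names `lookup` and `missing_image_overrides`, are not from the post and should be checked against the output of `helm show values apache-airflow/airflow`.

```python
from collections.abc import Mapping
from typing import Any, List, Optional


def lookup(values: Mapping, dotted_path: str) -> Optional[Any]:
    """Return the value at a dotted path such as 'images.airflow.tag', or None if absent."""
    node: Any = values
    for key in dotted_path.split("."):
        if not isinstance(node, Mapping) or key not in node:
            return None
        node = node[key]
    return node


# Trimmed copy of the image-related part of the values.yaml from the question.
user_values = {
    "executor": "KubernetesExecutor",
    "airflow": {"image": {"repository": "my-custom/airflow", "tag": "latest"}},
}

# ASSUMPTION (not from the post): the keys the official chart reads for the image.
# Verify with `helm show values apache-airflow/airflow` before relying on this.
EXPECTED_IMAGE_KEYS = ("images.airflow.repository", "images.airflow.tag")


def missing_image_overrides(values: Mapping) -> List[str]:
    """List the expected image keys that the given values do not set."""
    return [path for path in EXPECTED_IMAGE_KEYS if lookup(values, path) is None]


if __name__ == "__main__":
    # The question's values set airflow.image.* but none of the expected keys.
    print(missing_image_overrides(user_values))
```

If the list is non-empty, moving the image settings under the keys the chart documents and then checking the image of a running pod with `kubectl describe pod` would confirm or rule out this explanation.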

kind


user9437856, 2 months ago (#2)
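I'm trying to create directories for the different 'ID' values stored in a .csv file (they are stored under the 'ID' column). However, snakemake doesn't seem to be able to detect `IDs` in the output when it runs.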
```
rule make_directories:
    input:"exptXXXXX_metadata.csv"
    output:
        directory(expand("tissues/{id}", id = IDs))
    run:
        import pandas as pd
        df =  pd.read_csv('exptXXXXX_metadata.csv')

        ##unique IDs which I want to make different directories of under the parent_dir "tissues"
        IDs =  set(df['ID'])

        for i in IDs:
            f = output[id]
            shell("mkdir {f}")
```
I tried different suggestions from the snakemake documentation: https://snakemake.readthedocs.io/en/stable/project_info/faq.html#how-do-i-access-elements-of-input-or-output-by-a-variable-index

mg tint, 2 months ago (#3)

Snakemake resolves every job in the following order: output, then input, then run (or shell or script).

So the `output:` section cannot refer to a value that is only computed inside the rule's `run:` block.

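My guess is that you want to move the code that reads the CSV file and builds the list of IDs out of the rule entirely. That `IDs` list can then be used in the rule's `output:` section.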
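
A minimal sketch of that idea in plain Python rather than Snakefile syntax, so it can be run on its own. The helper names `read_ids`, `tissue_dirs` and `make_directories` are hypothetical, and the standard `csv` module stands in for pandas. In a Snakefile, the `read_ids(...)` call would sit at the top level, outside any rule, so that `IDs` already exists when `expand("tissues/{id}", id=IDs)` is evaluated in `output:`.

```python
import csv
import os
import tempfile


def read_ids(csv_path, column="ID"):
    """Return the sorted unique values found in `column` of the CSV file."""
    with open(csv_path, newline="") as handle:
        return sorted({row[column] for row in csv.DictReader(handle)})


def tissue_dirs(ids, parent="tissues"):
    """Build one path per ID, mirroring expand("tissues/{id}", id=IDs)."""
    return [f"{parent}/{sample_id}" for sample_id in ids]


def make_directories(paths):
    """Create every directory in `paths`; existing ones are left alone."""
    for path in paths:
        os.makedirs(path, exist_ok=True)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        # Hypothetical stand-in for exptXXXXX_metadata.csv.
        metadata = os.path.join(workdir, "metadata.csv")
        with open(metadata, "w", newline="") as handle:
            handle.write("ID,tissue\nA1,liver\nB2,lung\nA1,liver\n")

        ids = read_ids(metadata)  # computed before any "rule" needs it
        paths = tissue_dirs(ids, parent=os.path.join(workdir, "tissues"))
        make_directories(paths)
        print(ids)
```

Inside the rule itself, the loop would then iterate over the declared output paths directly, instead of indexing `output` with `id`, which in the posted code refers to Python's built-in function and not to the loop variable `i`.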