I would like to know what the output_gradients argument does in the gradient method of TensorFlow's GradientTape object. According to https://www.tensorflow.org/api_docs/python/tf/GradientTape#gradient , this argument should contain "a list of gradients, one for each differentiable element of target". Its default value is None.

It is still not clear to me what this actually does. When I omit the argument, the function computes the Jacobian of some function, e.g. for z <- f(x) I get dz/dx. I assumed that passing output_gradients = dL/dz (the gradient with respect to some loss L) would apply the chain rule and compute the product dL/dx = dz/dx * dL/dz, but when I test this I get a different result. What does output_gradients actually do? There is no real information about this in the documentation...
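For context, here is the kind of minimal scalar check I had in mind (my own toy example, not from the gpflow code; the numbers are arbitrary):

import tensorflow as tf

x = tf.Variable(3.0)
with tf.GradientTape() as tape:
    z = x ** 2  # z = f(x), so dz/dx = 2x = 6

# my expectation: passing dL/dz = 5 as output_gradients chains it on,
# i.e. dL/dx = dz/dx * dL/dz = 6 * 5 = 30
dL_dx = tape.gradient(z, x, output_gradients=tf.constant(5.0))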
Here is some dummy code (taken from the gpflow repo):
from typing import Optional, Tuple

import numpy as np
import tensorflow as tf
import gpflow


def natgrad_apply_gradients(
    q_mu_grad: tf.Tensor,
    q_sqrt_grad: tf.Tensor,
    q_mu: gpflow.Parameter,
    q_sqrt: gpflow.Parameter,
    xi_transform: Optional[gpflow.optimizers.natgrad.XiTransform] = None,
) -> Tuple[tf.Tensor, tf.Tensor, tf.Tensor, tf.Tensor]:
    gamma = 1  # kept from the original gpflow code, unused here
    xi_transform = gpflow.optimizers.natgrad.XiNat()

    # map the "loss gradients" from the unconstrained to the constrained space
    dL_dmean = gpflow.base._to_constrained(q_mu_grad, q_mu.transform)
    dL_dvarsqrt = gpflow.base._to_constrained(q_sqrt_grad, q_sqrt.transform)

    with tf.GradientTape(persistent=True, watch_accessed_variables=False) as tape:
        tape.watch([q_mu.unconstrained_variable, q_sqrt.unconstrained_variable])
        eta1, eta2 = gpflow.optimizers.natgrad.meanvarsqrt_to_expectation(q_mu, q_sqrt)
        meanvarsqrt = gpflow.optimizers.natgrad.expectation_to_meanvarsqrt(eta1, eta2)

    # with output_gradients: I expected [dL_dmean, dL_dvarsqrt] to be chained
    # onto d(meanvarsqrt)/d(eta1, eta2)
    dL_deta1, dL_deta2 = tape.gradient(
        meanvarsqrt, [eta1, eta2], output_gradients=[dL_dmean, dL_dvarsqrt]
    )
    # without output_gradients (the default None)
    dtheta_deta1, dtheta_deta2 = tape.gradient(
        meanvarsqrt, [eta1, eta2], output_gradients=None
    )
    return dL_deta1, dL_deta2, dtheta_deta1, dtheta_deta2


X_data = tf.ones(5)
num_latent_gps = 1
static_num_data = X_data.shape[0]
q_sqrt_unconstrained_shape = (num_latent_gps, gpflow.utilities.triangular_size(static_num_data))
num_data = gpflow.Parameter(tf.shape(X_data)[0], shape=[], dtype=tf.int32, trainable=False)
dynamic_num_data = tf.convert_to_tensor(num_data)

mu = np.array([[0.93350756], [0.15833747], [0.23830378], [0.28742445], [0.14999759]])
q_mu = gpflow.Parameter(mu, shape=(static_num_data, num_latent_gps))

q_sqrt = tf.eye(dynamic_num_data, batch_shape=[num_latent_gps])
q_sqrt = gpflow.Parameter(
    q_sqrt,
    transform=gpflow.utilities.triangular(),
    unconstrained_shape=q_sqrt_unconstrained_shape,
    constrained_shape=(num_latent_gps, static_num_data, static_num_data),
)

# dummy "loss gradients" w.r.t. the unconstrained variables
q_mu_grad = q_mu.unconstrained_variable * 0.33
q_sqrt_grad = q_sqrt.unconstrained_variable

dL_deta1, dL_deta2, dtheta_deta1, dtheta_deta2 = natgrad_apply_gradients(
    q_mu_grad, q_sqrt_grad, q_mu, q_sqrt
)
The result, however, shows that dL_deta1 is not equal to dtheta_deta1 * q_mu_grad.
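Roughly, the comparison I ran after the call above looks like this (just a sketch of my check; the elementwise product is my naive chain-rule expectation):

# compare the output_gradients result with the manual chain-rule product
print(np.allclose(dL_deta1.numpy(), (dtheta_deta1 * q_mu_grad).numpy()))  # False in my run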
Hopefully someone knows what is going on here. Thanks in advance!