
Best way to preprocess a corpus with both single tokens and bigram tokens?

b w · 2 months ago



I'm wondering whether there is any general advice on the smartest way to approach this problem.

I'm using word2vec to determine similarity scores between specific words (this is the final output I'm interested in). Some of them are single tokens, but others should be bigrams. To complicate matters, I'm doing this in TensorFlow (in order to learn TensorFlow).

I want to keep the bigrams I care about in a separate list:

Bigram_list = ["northern lights", "cloud cover", "table leg",...]

Currently, the process looks something like this:

  1. Identify bigrams in the corpus (using NLTK collocations)
  2. Create identified_bigrams_list = ["northern lights", "cloud cover", "banana peel",...]
  3. Search identified_bigrams_list for matches against Bigram_list
  4. Problem: replace the matches in the corpus with underscore-joined forms, e.g. 'northern_lights', 'cloud_cover'. I tried using a dictionary built from Bigram_list (e.g. "northern lights": "northern_lights") and substituting back into the corpus, so that each matched bigram is treated as a single token and handled as a single embedding. (A sketch of the whole pipeline follows this list.)
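
A minimal sketch of steps 1-4 (my own toy example; the corpus and the cutoff of 100 candidates are placeholders). It uses nltk.collocations for detection and nltk.tokenize.MWETokenizer for the replacement, which joins matched pairs in a single retokenizing pass instead of repeated string substitution over the corpus:

from nltk.collocations import BigramAssocMeasures, BigramCollocationFinder
from nltk.tokenize import MWETokenizer

Bigram_list = ["northern lights", "cloud cover", "table leg"]
tokens = "we watched the northern lights under heavy cloud cover".split()

# Steps 1-2: rank candidate bigrams in the corpus by PMI.
finder = BigramCollocationFinder.from_words(tokens)
identified_bigrams = set(finder.nbest(BigramAssocMeasures.pmi, 100))

# Step 3: keep only the candidates that also appear in Bigram_list.
wanted = {tuple(b.split()) for b in Bigram_list}
matches = identified_bigrams & wanted

# Step 4: join each matched pair with '_', so "northern lights"
# becomes the single token "northern_lights".
tokenizer = MWETokenizer(matches, separator="_")
print(tokenizer.tokenize(tokens))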

Even if I can get this to work, it seems computationally inefficient, especially as I move to a larger corpus for real training (at the moment I'm using a tiny corpus just to get it working).
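
For a larger corpus, one streaming alternative (my suggestion, not something from the post) is gensim's Phrases model, which counts and joins collocations in a single pass over an iterable of tokenized sentences, so the corpus never needs to sit in memory as one string. The min_count and threshold below are toy settings tuned to this tiny corpus:

from gensim.models.phrases import Phrases

# Sentences can be streamed from disk; a list is used here only for brevity.
sentences = [
    "we watched the northern lights".split(),
    "the northern lights were bright".split(),
    "heavy cloud cover rolled in".split(),
    "cloud cover blocked the view".split(),
]

# Pairs scoring above the threshold are emitted joined with '_',
# e.g. "northern_lights" and "cloud_cover".
bigram = Phrases(sentences, min_count=1, threshold=5)
for sent in sentences:
    print(bigram[sent])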

有什么建议吗?

Latest replies (0)

Author's recent topics:
  • I'm trying to use a Bayesian neural network, adding Bayesian layers to a neural-network model. Here is the code I'm using, taken from the Keras website.

    import numpy as np
    import tensorflow as tf
    from tensorflow import keras
    from tensorflow.keras import layers
    import tensorflow_datasets as tfds
    import tensorflow_probability as tfp
    
    
    
    dataset_size = 4898
    batch_size = 256
    train_size = int(dataset_size * 0.85)
    
    # Create training and evaluation datasets
    def get_train_and_test_splits(train_size, batch_size=1):
        # We prefetch with a buffer the same size as the dataset because the dataset
        # is very small and fits into memory.
        dataset = (
            tfds.load(name="wine_quality", as_supervised=True, split="train")
            .map(lambda x, y: (x, tf.cast(y, tf.float32)))
            .prefetch(buffer_size=dataset_size)
            .cache()
        )
        # We shuffle with a buffer the same size as the dataset.
        train_dataset = (
            dataset.take(train_size).shuffle(buffer_size=train_size).batch(batch_size)
        )
        test_dataset = dataset.skip(train_size).batch(batch_size)
    
        return train_dataset, test_dataset
    
    # get train and test data
    train_dataset, test_dataset = get_train_and_test_splits(train_size, batch_size)
    
    
    # Compile, train, and evaluate the model
    hidden_units = [8, 8]
    learning_rate = 0.001
    num_epochs = 100
    mse_loss = keras.losses.MeanSquaredError()
    
    
    def run_experiment(model, loss, train_dataset, test_dataset):
    
        model.compile(
            optimizer=keras.optimizers.RMSprop(learning_rate=learning_rate),
            loss=loss,
            metrics=[keras.metrics.RootMeanSquaredError()],
        )
    
        print("Start training the model...")
        model.fit(train_dataset, epochs=num_epochs, validation_data=test_dataset)
        print("Model training finished.")
        _, rmse = model.evaluate(train_dataset, verbose=0)
        print(f"Train RMSE: {round(rmse, 3)}")
    
        print("Evaluating model performance...")
        _, rmse = model.evaluate(test_dataset, verbose=0)
        print(f"Test RMSE: {round(rmse, 3)}")
    
    
    # Create model inputs
    FEATURE_NAMES = [
        "fixed acidity",
        "volatile acidity",
        "citric acid",
        "residual sugar",
        "chlorides",
        "free sulfur dioxide",
        "total sulfur dioxide",
        "density",
        "pH",
        "sulphates",
        "alcohol",
    ]
    
    
    def create_model_inputs():
        inputs = {}
        for feature_name in FEATURE_NAMES:
            inputs[feature_name] = layers.Input(
                name=feature_name, shape=(1,), dtype=tf.float32
            )
        return inputs
    
    # Define the prior weight distribution as Normal of mean=0 and stddev=1.
    # Note that, in this example, the prior distribution is not trainable,
    # as we fix its parameters.
    def prior(kernel_size, bias_size, dtype=None):
        n = kernel_size + bias_size
        prior_model = keras.Sequential(
            [
                tfp.layers.DistributionLambda(
                    lambda t: tfp.distributions.MultivariateNormalDiag(
                        loc=tf.zeros(n), scale_diag=tf.ones(n)
                    )
                )
            ]
        )
        return prior_model
    
    
    # Define variational posterior weight distribution as multivariate Gaussian.
    # Note that the learnable parameters for this distribution are the means,
    # variances, and covariances.
    def posterior(kernel_size, bias_size, dtype=None):
        n = kernel_size + bias_size
        posterior_model = keras.Sequential(
            [
                tfp.layers.VariableLayer(
                    tfp.layers.MultivariateNormalTriL.params_size(n), dtype=dtype
                ),
                tfp.layers.MultivariateNormalTriL(n),
            ]
        )
        return posterior_model
    
    def create_bnn_model(train_size):
        inputs = create_model_inputs()
        features = keras.layers.concatenate(list(inputs.values()))
        features = layers.BatchNormalization()(features)
    
        # Create hidden layers with weight uncertainty using the DenseVariational layer.
        for units in hidden_units:
            features = tfp.layers.DenseVariational(
                units=units,
                make_prior_fn=prior,
                make_posterior_fn=posterior,
                kl_weight=1 / train_size,
                activation="sigmoid",
            )(features)
    
        # The output is deterministic: a single point estimate.
        outputs = layers.Dense(units=1)(features)
        model = keras.Model(inputs=inputs, outputs=outputs)
        return model
    
    num_epochs = 500
    train_sample_size = int(train_size * 0.3)
    small_train_dataset = train_dataset.unbatch().take(train_sample_size).batch(batch_size)
    
    bnn_model_small = create_bnn_model(train_sample_size)
    run_experiment(bnn_model_small, mse_loss, small_train_dataset, test_dataset)
    
    sample = 10
    examples, targets = list(test_dataset.unbatch().shuffle(batch_size * 10).batch(sample))[
        0
    ]
    def compute_predictions(model, iterations=100):
        predicted = []
        for _ in range(iterations):
            predicted.append(model(examples).numpy())
        predicted = np.concatenate(predicted, axis=1)
    
        prediction_mean = np.mean(predicted, axis=1).tolist()
        prediction_min = np.min(predicted, axis=1).tolist()
        prediction_max = np.max(predicted, axis=1).tolist()
        prediction_range = (np.max(predicted, axis=1) - np.min(predicted, axis=1)).tolist()
    
        for idx in range(sample):
            print(
                f"Predictions mean: {round(prediction_mean[idx], 2)}, "
                f"min: {round(prediction_min[idx], 2)}, "
                f"max: {round(prediction_max[idx], 2)}, "
                f"range: {round(prediction_range[idx], 2)} - "
                f"Actual: {targets[idx]}"
            )
    
    
    compute_predictions(bnn_model_small)
    
    num_epochs = 500
    bnn_model_full = create_bnn_model(train_size)
    run_experiment(bnn_model_full, mse_loss, train_dataset, test_dataset)
    
    compute_predictions(bnn_model_full)
    

    But I get this error:

    File "/Users/S/Documents/B/Prediction/test1.py", line 148, in <module>
        bnn_model_small = create_bnn_model(train_sample_size)
      File "/Users/S/Documents/B/Prediction/test1.py", line 131, in create_bnn_model
        features = tfp.layers.DenseVariational(
      File "/Users/S/.local/share/virtualenvs/B-hz56sUDM/lib/python3.9/site-packages/tf_keras/src/utils/traceback_utils.py", line 70, in error_handler
        raise e.with_traceback(filtered_tb) from None
      File "/Users/S/.local/share/virtualenvs/B-hz56sUDM/lib/python3.9/site-packages/tf_keras/src/engine/input_spec.py", line 251, in assert_input_compatibility
        ndim = x.shape.rank
    AttributeError: 'tuple' object has no attribute 'rank'
    

    I tried the approach recommended here, using from tf_agents.environments import tf_py_environment; environment = tf_py_environment.TFPyEnvironment(environment). The problem is that it requires numpy<1.20, which causes many conflicts with other libraries.
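
    One possible cause worth checking (my assumption; the tf_keras paths in the traceback are consistent with it) is a Keras-version mismatch: on TF 2.16+, tensorflow.keras resolves to Keras 3, whose tensors expose shape as a plain tuple, while tfp.layers is built against the legacy tf_keras, which calls x.shape.rank. A minimal sketch of one common workaround:

    # Assumption: TF >= 2.16 with the tf-keras package installed.
    # Forcing the legacy Keras 2 implementation makes tensorflow.keras
    # and tfp.layers agree on the tensor type; this must run before
    # TensorFlow is imported anywhere.
    import os
    os.environ["TF_USE_LEGACY_KERAS"] = "1"

    import tensorflow as tf
    from tensorflow import keras  # now resolves to tf_keras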

  • My TensorFlow graph neural network has a loss function like the one below. It computes the loss based on a key from the input. Since the GNN stores keys in bytes format, I need to evaluate the key as a string in order to index into the features; but because eager execution is disabled, it fails.

    #! /usr/bin/python
    import tensorflow as tf
    
    tf.config.run_functions_eagerly(False) # numpy() works when eager is enabled
    
    @tf.function
    def loss_fn(d):
        return tf.reduce_mean(d['features'][d['key'][0].numpy().decode('utf-8')])
    
    d = {'key': tf.constant(['A01', 'B01']),
         'features': {'A01': [0.1,0.2],
                      'B01': [0.3,0.4]}}
    loss=loss_fn(d)
    
    print(loss)
    
    
    Traceback (most recent call last):
      File "/home/mikehuang/programs/test.py", line 13, in <module>
        loss=loss_fn(d)
             ^^^^^^^^^^
      File "/home/mikehuang/.local/lib/python3.12/site-packages/tensorflow/python/util/traceback_utils.py", line 153, in error_handler
        raise e.with_traceback(filtered_tb) from None
      File "/tmp/__autograph_generated_filezbap1vrc.py", line 12, in tf__loss_fn
        retval_ = ag__.converted_call(ag__.ld(tf).reduce_mean, (ag__.ld(d)['features'][ag__.converted_call(ag__.converted_call(ag__.ld(d)['key'][0].numpy, (), None, fscope).decode, ('utf-8',), None, fscope)],), None, fscope)
        ^^^^^
    AttributeError: in user code:
    
        File "/home/mikehuang/programs/test.py", line 8, in loss_fn  *
            return tf.reduce_mean(d['features'][d['key'][0].numpy().decode('utf-8')])
    
        AttributeError: 'SymbolicTensor' object has no attribute 'numpy'
    

    How can I access the feature based on the input key?
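
    One graph-friendly workaround (my sketch, assuming the feature dict can be frozen before tracing) is to stack the features into a matrix and map keys to row indices with tf.lookup.StaticHashTable, so the lookup stays symbolic and no .numpy() call is needed:

    import tensorflow as tf

    # Freeze the feature dict into a matrix plus a key -> row-index table.
    keys = tf.constant(["A01", "B01"])
    feature_matrix = tf.constant([[0.1, 0.2],
                                  [0.3, 0.4]])
    table = tf.lookup.StaticHashTable(
        tf.lookup.KeyValueTensorInitializer(keys, tf.range(2, dtype=tf.int64)),
        default_value=-1,
    )

    @tf.function
    def loss_fn(d):
        # lookup and gather are both graph ops, so this runs with
        # eager execution disabled.
        idx = table.lookup(d["key"][0])
        return tf.reduce_mean(tf.gather(feature_matrix, idx))

    print(loss_fn({"key": tf.constant(["A01", "B01"])}))  # mean of row "A01"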

  • I'm working on a single-page React app and can't get an image created with the Image constructor to display on the page. Here is the component:

    export default function Journal() {
    
        const myImg = new Image()
        myImg.src = "src/images/my_japan_trip.jpg"
        
        const journal = {
            title: "My Japan trip",
            image: myImg.src,
            text: "I went to Japan last winter."
        }
        return (
            <div id="journal">
    
                <div>
                    <h2>
                        {journal.title}
                    </h2>
                </div>
    
                <div>
                    <img src={journal.image} />
                </div>
    
                <div>
                    <p>
                        {journal.text}
                    </p>
                </div>
                
            </div>
        )
    }

    [Screenshots attached: the journal entry element as rendered, and the code used to create it.]

    I'm working on a single-page React app and made a module for creating journal entries. The module has a component called Journal. I have a variable that uses the Image constructor to create a new image element, and I then assign its src attribute a value equal to the path of an image file inside my project's assets. After creating the image variable, I created an object for the journal entry, with title, image, and text properties. The return statement contains the JSX that renders the journal entry. The object's title and text values appear on the page, but I can't get the img element to display the image; it currently shows a broken-image icon. I simply copied and pasted the image path as the value of the myImg variable's src attribute. Can someone help me figure out how to get the image to display? Thanks.

  • I'm wondering whether you should try /path_to_src/src/images/my_japan_trip.jpg (note the leading slash), or use a relative path such as ../../images/my_japan_trip.jpg.

  • Without knowing your file structure it's hard to tell whether the source path is wrong, but trying

    myImg.src = "./src/images/my_japan_trip.jpg"

    might work for you.
