I want to use the TensorFlow Dataset API to build my dataset with a TensorFlow Hub module, using the dataset.map function to convert my text data into embeddings. My TensorFlow version is 1.14.
Since I use the ELMo v2 module, which converts an array of sentences into their word embeddings, I wrote the following code:
import tensorflow as tf
import tensorflow_hub as hub
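sentences_array = load_sentences()
# e.g. sentences_array = ["I love Python", "python is a good PL"]
elmo = hub.Module("./ELMO")
# Pass the 1-D list of sentences directly; the "default" signature expects untokenized strings
embeddings = elmo(sentences_array, signature="default", as_dict=True)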
# TextLineDataset expects file names; for in-memory sentences use from_tensor_slices
dataset = tf.data.Dataset.from_tensor_slices(sentences_array)
dataset = dataset.apply(tf.data.experimental.map_and_batch(map_func =
I want the embeddings of the text array as a tensor of shape [batch_size, max_words_in_batch, embedding_size], but I get this error message:
"NotImplementedError: Using TF-Hub module within a TensorFlow defined function is currently not supported."
How can I get the expected results?
Unfortunately, this is not supported in TensorFlow 1.x.
It is, however, supported in TensorFlow 2.0. If you can upgrade to TensorFlow 2 and pick one of the text embedding modules available for TF 2 (the current list is on TF Hub), you can use it in your dataset pipeline. Something like this:
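A minimal sketch of that approach, assuming TensorFlow 2.x with tensorflow_hub installed. The helper names (load_hub_embedder, build_embedded_dataset) and the nnlm-en-dim50 handle are illustrative choices, not something from the question. That particular module returns one vector per sentence, so the output is [batch_size, embedding_size] rather than ELMo's per-word shape.

```python
import tensorflow as tf  # assumes TensorFlow 2.x


def load_hub_embedder(handle="https://tfhub.dev/google/nnlm-en-dim50/2"):
    """Load a TF2-format text embedding from TF Hub as a callable layer.

    The handle is only an example; any TF2 text embedding should fit here.
    """
    import tensorflow_hub as hub  # imported lazily so the helper below has no hub dependency
    return hub.KerasLayer(handle, input_shape=[], dtype=tf.string)


def build_embedded_dataset(sentences, embed_fn, batch_size=2):
    """Batch the raw sentences, then embed each batch inside dataset.map."""
    dataset = tf.data.Dataset.from_tensor_slices(sentences)
    dataset = dataset.batch(batch_size)
    # embed_fn receives a 1-D string tensor holding one batch of sentences
    return dataset.map(embed_fn)


# Hypothetical usage (downloads the module on first call):
# embed = load_hub_embedder()
# dataset = build_embedded_dataset(["I love Python", "python is a good PL"], embed)
# for batch in dataset:
#     print(batch.shape)
```

Batching before the map call means the module runs once per batch instead of once per sentence, which is usually cheaper.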
If you are tied to 1.x or to ELMo (which I don't think is available in the new format yet), the only option I can see for embedding in the preprocessing stage is to first run your dataset through a simple embedding model and save the results, then use the saved vectors for the downstream task separately. (I appreciate this is less than ideal.)
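The two-stage workaround could look roughly like the sketch below, assuming TensorFlow 1.x and the ELMo module saved at ./ELMO as in the question. The helper names, the per-batch .npy layout, and the batch size are all assumptions made for illustration. Each batch is saved separately because max_words_in_batch differs from batch to batch.

```python
import os

import numpy as np


def make_elmo_embedder(module_path="./ELMO"):
    """Return embed(list_of_str) -> ndarray, built on TF 1.x and hub.Module."""
    import tensorflow as tf  # TensorFlow 1.x assumed
    import tensorflow_hub as hub

    graph = tf.Graph()
    with graph.as_default():
        text = tf.placeholder(tf.string, shape=[None])
        elmo = hub.Module(module_path)
        # "elmo" is the per-word output: [batch_size, max_words_in_batch, 1024]
        output = elmo(text, signature="default", as_dict=True)["elmo"]
        init = tf.group(tf.global_variables_initializer(),
                        tf.tables_initializer())
    session = tf.Session(graph=graph)
    session.run(init)
    return lambda batch: session.run(output, feed_dict={text: batch})


def precompute_embeddings(sentences, embed_batch, out_dir, batch_size=32):
    """Embed the sentences batch by batch and save one .npy file per batch."""
    os.makedirs(out_dir, exist_ok=True)
    paths = []
    for start in range(0, len(sentences), batch_size):
        batch = sentences[start:start + batch_size]
        path = os.path.join(out_dir, "batch_%05d.npy" % (start // batch_size))
        np.save(path, embed_batch(batch))
        paths.append(path)
    return paths


# Hypothetical usage:
# paths = precompute_embeddings(sentences_array, make_elmo_embedder(), "./embedded")
# The downstream model can then read each batch back with np.load(path).
```

Because the embedding step runs in its own graph and session, the downstream input pipeline only has to read arrays from disk and never calls the hub module inside dataset.map.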