Tensorflow-hub Text-Module Preprocessing

I'm playing around with the new modules available on TensorFlow Hub (which I really like, thanks for that!).

What's unclear to me is the preprocessing that should take place when feeding in a sentence. The module documentation says that in the preprocessing step the input sentences get split at spaces.

However, when I run the following program, I only get a single vector:

import tensorflow as tf
import tensorflow_hub as hub

# Load the pretrained NNLM text-embedding module on the CPU.
with tf.device("/cpu:0"):
  embed = hub.Module("https://tfhub.dev/google/nnlm-en-dim128/1")

global_step1 = tf.train.get_or_create_global_step()
with tf.device("/cpu:0"):
  # Feed a batch containing a single sentence.
  embeddings = embed({"default": ["Cat sat on mat"]})

with tf.train.MonitoredTrainingSession(is_chief=True) as sess:
  message_embeddings_cat = sess.run(embeddings)
  print(message_embeddings_cat.shape)  # result: (1, 128)

How do I get the embeddings for each word, and what does the single vector represent? A fixed-dimensional representation of the sentence, the unknown-word embedding, or something else?

Thanks in advance!

Edit: It seems the result is a combined embedding created with tf.nn.embedding_lookup_sparse. (Thanks for the confirmation, @svsgoogle.)
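For intuition, here is a minimal NumPy sketch of how such a combined embedding could be produced, assuming the module combines per-word vectors the way tf.nn.embedding_lookup_sparse does with its "sqrtn" combiner (sum of the word vectors divided by the square root of the word count). The tiny 3-dimensional vectors below are made-up stand-ins for the module's 128-dimensional word embeddings:

```python
import numpy as np

def combine_sqrtn(word_vecs):
    """Combine per-word vectors into one sentence vector, mimicking
    tf.nn.embedding_lookup_sparse(..., combiner="sqrtn"):
    sum of the word vectors divided by sqrt(number of words)."""
    word_vecs = np.asarray(word_vecs, dtype=np.float32)
    return word_vecs.sum(axis=0) / np.sqrt(len(word_vecs))

# Four fake 3-dim word embeddings, stand-ins for "Cat", "sat", "on", "mat".
words = [[1.0, 0.0, 2.0],
         [0.0, 1.0, 0.0],
         [1.0, 1.0, 0.0],
         [2.0, 0.0, 2.0]]

sentence = combine_sqrtn(words)
print(sentence.shape)  # (3,)
print(sentence)        # column sums [4, 2, 4] / sqrt(4) -> [2. 1. 2.]
```

Whatever the exact combiner, the key point is that the four word vectors collapse into a single fixed-dimensional sentence vector, which matches the (1, 128) shape observed above.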


Yes, the output represents a fixed-dimensional representation of the entire sentence.

You can also embed single words to get their vectors. In your case:

embeddings = embed({"default": ["Cat", "sat", "on", "mat"]})

This should give you a result of shape (4, 128).

Posted by svsgoogle