Reuse Jupyter Notebooks

Use Jupyter Notebooks as "Objects"

Jupyter Notebooks are great for data exploration, visualization, documentation, prototyping and interacting with code, but they fall short when it comes to turning a notebook into a program or using it in production. I often find myself copying cells from a notebook into a script just so I can run the code with command line arguments. There is no easy way to run a notebook and get a result back from its execution, to pass arguments to a notebook, or to run individual code cells programmatically. Have you ever wrapped a code cell in a function just so you could call it in a loop with different parameters?

I wrote a small utility tool, nbloader, that enables code reuse from Jupyter notebooks. With it, you can import a notebook as an object, pass variables into its namespace, run its code cells, and pull variables back out of its namespace.
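Here is a minimal sketch of what that workflow might look like, assuming the Notebook class, run_all() method, and ns namespace dict described in the nbloader README; the notebook name and variable names are hypothetical, so check the project README for the exact API:

```python
from nbloader import Notebook

# Load a notebook as a regular Python object (hypothetical notebook name).
analysis = Notebook('analysis.ipynb')

# Push a variable into the notebook's namespace before running it.
analysis.ns['learning_rate'] = 0.01

# Execute all code cells programmatically.
analysis.run_all()

# Pull a result back out of the notebook's namespace (hypothetical variable).
print(analysis.ns['accuracy'])
```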

This tutorial will show you how to make your notebooks reusable with nbloader.

Read more…

Word2Vec with TensorFlow

Word2Vec with Skip-Gram and TensorFlow

This is a tutorial and a basic example for getting started with the word2vec model by Mikolov et al. It is used to learn vector representations of words, called "word embeddings". For more information about embeddings, read my previous post.

The word2vec model can be trained with two different word representations:

  • Continuous Bag-of-Words (CBOW): predicts target words (e.g. 'mat') from source context words ('the cat sits on the')
  • Skip-Gram: predicts source context words (e.g. 'the cat sits on the') from the target word ('mat')

Skip-Gram tends to produce better results, and this tutorial will implement word2vec with skip-grams.
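To make the Skip-Gram setup concrete, here is a small sketch (not taken from the tutorial) that generates (target, context) training pairs from a toy sentence with a window size of 1:

```python
# Build (target, context) skip-gram pairs from a toy sentence.
sentence = "the cat sits on the mat".split()
window = 1  # how many words to the left/right count as context

pairs = []
for i, target in enumerate(sentence):
    for j in range(max(0, i - window), min(len(sentence), i + window + 1)):
        if j != i:
            pairs.append((target, sentence[j]))

print(pairs[:4])
# [('the', 'cat'), ('cat', 'the'), ('cat', 'sits'), ('sits', 'cat')]
```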

The goal is to train the model's embedding layer so that words with similar meanings end up close to each other in their N-dimensional vector representations. The model has two layers: an embeddings layer and a linear layer. Because the last layer is linear, the distance between the embedding vectors of two words is linearly related to the distance in meaning between those words. In other words, we can do mathematical operations with the vectors such as: [king] - [man] + [woman] ~= [queen]
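The sketch below shows one standard way to wire up those two layers in the TensorFlow 1.x API that was current when this was written (an embedding lookup followed by a linear output layer trained with NCE loss, as in the official TensorFlow word2vec tutorial); the vocabulary size, embedding size, and number of negative samples are assumed values, and the post's actual code may differ:

```python
import tensorflow as tf  # TensorFlow 1.x style API

vocab_size, embedding_size, num_sampled = 50000, 128, 64

# Layer 1: the embedding matrix we actually want to learn.
embeddings = tf.Variable(
    tf.random_uniform([vocab_size, embedding_size], -1.0, 1.0))

# Layer 2: a linear (no activation) output layer, trained with NCE loss.
nce_weights = tf.Variable(
    tf.truncated_normal([vocab_size, embedding_size], stddev=0.05))
nce_biases = tf.Variable(tf.zeros([vocab_size]))

train_inputs = tf.placeholder(tf.int32, shape=[None])     # target word ids
train_labels = tf.placeholder(tf.int32, shape=[None, 1])  # context word ids

# Look up the embedding vectors for the batch of target words.
embed = tf.nn.embedding_lookup(embeddings, train_inputs)

# Noise-contrastive estimation loss over a few sampled negative classes.
loss = tf.reduce_mean(
    tf.nn.nce_loss(weights=nce_weights, biases=nce_biases,
                   labels=train_labels, inputs=embed,
                   num_sampled=num_sampled, num_classes=vocab_size))
```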

Read more…

Embeddings with TensorFlow

Embeddings in TensorFlow

To present discrete values such as words to a machine learning algorithm, we need to transform every class into a one-hot encoded vector or an embedding vector.

Using embeddings for sparse data often results in a more efficient representation than the one-hot encoding approach. For example, a typical vocabulary for NLP problems contains from 20,000 to 200,000 unique words, and it is very inefficient to represent every word by a vector of thousands of 0s and a single 1.
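As a rough illustration of the size difference (toy numbers chosen here, not taken from the post), compare a one-hot vector for a 100,000-word vocabulary with a 128-dimensional embedding looked up by the same word index:

```python
import numpy as np

vocab_size, embedding_dim = 100000, 128
word_id = 42  # index of some word in the vocabulary

# One-hot: a huge vector that is all zeros except for one position.
one_hot = np.zeros(vocab_size, dtype=np.float32)
one_hot[word_id] = 1.0

# Embedding: a small dense vector looked up by the same index.
embedding_matrix = np.random.randn(vocab_size, embedding_dim).astype(np.float32)
dense = embedding_matrix[word_id]

print(one_hot.shape, dense.shape)  # (100000,) (128,)
```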

Embeddings can also be "trained" by an optimizer so that the similarities between them reflect semantic similarities between words. For example, a model using trained embeddings can still make reasonable predictions on a test dataset containing words that never appeared in the training dataset, by relying on similar words it did see during training.

In this post, I'll show and describe use cases of embeddings with Python and TensorFlow.

Read more…

Jupyter Blogging

Jupyter Blogging in 5 minutes.

Q: What is Jupyter Blogging?
A: Blogging with Jupyter notebooks.

Q: Why Jupyter Blogging?
A: As a Data Scientist, I use Jupyter to create notebooks with code, equations, visualizations, documentation, etc. "Jupyter Blogging" allows me to share those notebooks with the world without any additional work.

Q: How is this achieved?
A: Jupyter Notebook + Github Pages + Nikola = Jupyter Blogging.

This tutorial will give you basic instructions for setting up a minimal blog powered by Jupyter notebooks. The tutorial itself is written in a Jupyter notebook.

Read more…