title: Contextual Subtitle for Language Learning
description: ""
category:
tags: []

As a native Chinese speaker, I sometimes practice spoken English by watching American TV series. The trouble is the subtitles. If I turn on Chinese subtitles, I end up reading them to understand the dialogue instead of listening. If I turn on English subtitles, everyday words outside my vocabulary keep interrupting me, which makes the show less enjoyable. So I came up with the idea of a mixed subtitle.

I wrote a Python script that creates the mixed subtitles. It goes through the English subtitle file (.srt format), checks each word to see whether it is unfamiliar to me, and adds the Chinese translation immediately after every unfamiliar word. Here are two screenshots of the subtitles generated for The Big Bang Theory season 2.

[Screenshots: mixed subtitles from The Big Bang Theory season 2]
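The core step can be illustrated with a minimal Python sketch. It is not the original script, only one way to do the same thing. It assumes two inputs prepared in advance: `known`, a set of lower-cased words I already know, and `en_zh`, a dict mapping lower-cased English words to a Chinese translation. The function names `annotate_line` and `annotate_srt`, the regex-based tokenization, and the `word(翻译)` output format are all hypothetical choices made for illustration.

```python
import re

# A "word" is a run of letters, optionally with one internal apostrophe ("don't", "Sheldon's").
WORD_RE = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)?")


def annotate_line(line, known, en_zh):
    """Insert a Chinese translation right after every word that is not in `known`
    but does have an entry in the English-Chinese dict `en_zh`."""
    def gloss(match):
        word = match.group(0)
        key = word.lower()
        if key in known or key not in en_zh:
            return word  # familiar word, or no translation available: keep as-is
        return f"{word}({en_zh[key]})"
    return WORD_RE.sub(gloss, line)


def annotate_srt(srt_text, known, en_zh):
    """Rewrite only the subtitle text of an .srt file. Each block is
    an index line, a timing line ("00:00:01,000 --> 00:00:03,000"), then text lines;
    the first two lines of every block are copied through unchanged."""
    out_blocks = []
    for block in re.split(r"\n\s*\n", srt_text.strip()):
        lines = block.splitlines()
        header, text = lines[:2], lines[2:]
        annotated = [annotate_line(t, known, en_zh) for t in text]
        out_blocks.append("\n".join(header + annotated))
    return "\n\n".join(out_blocks) + "\n"
```

For example, with `known = {"sheldon", "is", "a"}` and the toy dictionary `{"theoretical": "理论的", "physicist": "物理学家"}`, the subtitle line `Sheldon is a theoretical physicist.` becomes `Sheldon is a theoretical(理论的) physicist(物理学家).`, while the cue number and timing line stay untouched. Words missing from the dictionary pass through unchanged, so unknown names or slang are never mangled.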

To generate the mixed subtitles, I first need an English-Chinese dictionary. The LDC provides such a word list. Second, I need a list of common English words, which is very easy to find. I use resource1, resource2, and resource3 as the known word lists. One more file stores words that appear frequently in the show I am watching (such as character names), so they are treated as known too.
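Loading these resources could look like the sketch below. It assumes every word list is a plain text file with one word per line, and that the dictionary has one `english<TAB>chinese` entry per line. I have not verified the actual layout or encoding of the LDC file, so `load_dictionary` (a hypothetical helper, like `load_word_set`) will probably need adjusting, for example by passing a different `encoding`.

```python
from pathlib import Path


def load_word_set(*paths, encoding="utf-8"):
    """Union of several one-word-per-line files, lower-cased.
    Blank lines and lines starting with '#' are skipped."""
    words = set()
    for path in paths:
        for line in Path(path).read_text(encoding=encoding).splitlines():
            word = line.strip().lower()
            if word and not word.startswith("#"):
                words.add(word)
    return words


def load_dictionary(path, encoding="utf-8"):
    """Read 'english<TAB>chinese' lines into a dict keyed by lower-cased English.
    ASSUMED file layout; lines without a tab are ignored."""
    en_zh = {}
    for line in Path(path).read_text(encoding=encoding).splitlines():
        if "\t" not in line:
            continue
        english, chinese = line.split("\t", 1)
        # If a word appears on several lines, keep the first translation.
        en_zh.setdefault(english.strip().lower(), chinese.strip())
    return en_zh
```

Passing the common-word lists and the show-specific file to one call, e.g. `load_word_set("resource1.txt", "resource2.txt", "resource3.txt", "show_words.txt")` (hypothetical file names), merges them into a single set, so a name like Sheldon counts as known and is never annotated.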

The source code can be downloaded from the GitHub gist.

Problems

Todos

Anyway, the current version is already in use :) I have enjoyed many episodes of The Big Bang Theory with the mixed subtitles.