Han Xiaogang's Blog

Information Retrieval & Data Mining | Data platform | Python | Growing toy ideas into next big things.

Setup Pelican

Here is a simple step-by-step guide to setting up Pelican (3.4) on GitHub Pages.

Install pelican

virtualenv pelican-env
source pelican-env/bin/activate
pip install pelican markdown

Bootstrap the blog

mkdir myblog
cd myblog
pelican-quickstart
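After answering the quickstart questions, the site settings live in a `pelicanconf.py` file in the blog directory. As a rough sketch of what that file typically looks like (the values below are placeholders I made up, not taken from this blog; only the setting names are standard Pelican settings):

```python
# pelicanconf.py: illustrative values only; replace them with your own.
AUTHOR = "Your Name"        # placeholder author name
SITENAME = "My Blog"        # placeholder site title
SITEURL = ""                # left empty for local development

PATH = "content"            # directory that holds the posts
TIMEZONE = "UTC"            # placeholder; use your own timezone
DEFAULT_LANG = "en"

DEFAULT_PAGINATION = 10     # number of posts per index page
```

Since the file is plain Python, settings can be edited by hand at any time and take effect the next time the site is generated.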

Connect to GitHub Pages

cd myblog
git clone git@github.com ...

How to use the APIs to collect data from Sina Weibo

Suppose you want to extract tweets posted by a selected group of Weibo users. Here are the steps:

Step 1: Create a new Weibo account

After the account is created, manually add the target users (whose tweets you want to collect) as your followees.

Step 2: Create a new app ...

SSH Login Without Password

Suppose you want to access Linux machine B from Windows machine A. Here are the steps to set up passwordless SSH login.

Generate public/private keys on local machine A

  1. Use PuTTYgen to generate a public/private key pair and save both keys to the same directory on machine A. For the public ...

Setup SCM Manager

SCM Manager is an easy-to-use server for Git, Mercurial, and Subversion repositories. It is written in Java and is easy to install and configure through its web interface. It is standalone (no Apache or database required) and provides user, group, and permission management.

I primarily use Mercurial. This post is a step-by-step guide to ...

style-check.py

Writing readable scientific papers is hard, especially for beginners. Books (e.g., The Elements of Style) and guides promote best practices, but it would help if their advice could be applied to our writing automatically, in the form of concrete suggestions.

style-check.rb is such ...

A Twitter Bot for programming.reddit

As a long-time programming.reddit reader, I have tried many ways of following the stories that appear there: web browsing, GReader, and now following @reddit_prog on Twitter. Following @reddit_prog works well for me because it saves me a lot of time and makes it easy to skip irrelevant stories at ...

Clustered Hacker News

The goal

Topic-based organization of information sources might be a solution to the information (cognition) overload problem on the social Web. For example, many news stories are posted every day on Hacker News, a startup news aggregation site. The problem is that different users have different preferences on the ...

Contextual Subtitle for Language Learning

As a native Chinese speaker, I occasionally practice spoken English by watching American TV series. One problem is the subtitles. If I use Chinese subtitles, I rely on them for understanding rather than on listening; if I use English subtitles, there are regular words that are ...