
How to use the APIs to collect data from Sina Weibo

Suppose you want to extract tweets posted by a selected group of Weibo users. Here are the steps:

Step 1: Create a new Weibo account

After the account is created, manually add the target users (whose tweets you want to collect) as your followees.

Step 2: Create a new app ...

SSH Login Without Password

Suppose you want to access Linux machine B from Windows machine A. Here are the steps to set up passwordless SSH login.

Generate public/private keys on local machine A

  1. Use PuTTYgen to generate public/private keys and save them to the same directory on machine A. For the public ...

Setup SCM Manager

SCM Manager is an easy-to-use server for Git/Mercurial/Subversion repositories. It is written in Java and is very easy to install and configure through its web interface. It is standalone (no Apache or database required) and provides user/group/permission management.

I primarily use Mercurial. This post is a step-by-step guide to ...

Writing readable scientific papers is hard, especially for beginners. Although books (e.g., The Elements of Style) and guides promote best practices, it would be more helpful if their advice could be applied to our writing automatically, in the form of concrete suggestions.

style-check.rb is such ...

Printable Bash Shortcuts Cheat Sheet

Alan Skorkin wrote a great post on bash shortcuts. I made a printable version using LaTeX to keep on my desk. You can download the PDF here.

Preview: bash shortcuts

My Solution to the Quora ML Problem (Predicting whether a question gets an upvoted answer)

The Quora ML CodeSprint 2013 just finished. My entry for the first task, Quora ML Problem: Answered, was ranked 7th/71 on the dashboard.

The objective of the task is to predict whether a question gets an upvoted answer within 1 day, given Quora question text and ...

A Twitter Bot for programming.reddit

As a long-term programming.reddit reader, I have tried many ways (web browsing, GReader, and now following @reddit_prog on Twitter) to keep up with the stories appearing there. Following @reddit_prog is very comfortable for me, because it saves me a lot of time and it is easy to skip irrelevant stories at ...

Clustered Hacker News

The goal

Topic-based organization of information sources might be a solution to the information (cognition) overload problem on the social Web. For example, many news stories are posted every day on Hacker News, a startup news aggregation site. The problem is that different users have different preferences on the ...

Contextual Subtitle for Language Learning

As a native Chinese speaker, I occasionally practice spoken English by watching American TV series. One problem is the subtitles. If I use Chinese subtitles, I rely on them for understanding rather than on "hearing"; if I use English subtitles, there are regular words that are ...

Parsing Evernote export file (.enex) using Python


My wife keeps all the stories written about our baby in Evernote and then publishes them on her blog every few days. Recently, as the baby boy is going to celebrate his first birthday, my wife plans to publish the stories as a book. This post is about how ...
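As a rough illustration of the kind of parsing involved (a minimal sketch, not necessarily the approach taken in the post), an .enex file is an XML document whose `<en-export>` root holds one `<note>` element per note, with the note body stored as a nested XML string inside `<content>`. The sample data and the `parse_enex` helper below are hypothetical, and the sketch assumes the note body contains no HTML entities such as `&nbsp;`, which the standard-library parser would reject.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample in the ENEX layout: an <en-export> root with <note> children.
SAMPLE_ENEX = """<en-export>
  <note>
    <title>First steps</title>
    <content><![CDATA[<en-note><div>He walked today.</div></en-note>]]></content>
    <created>20130101T120000Z</created>
  </note>
</en-export>"""


def parse_enex(xml_text):
    """Return one dict per note with its title, creation stamp, and plain text."""
    notes = []
    root = ET.fromstring(xml_text)
    for note in root.iter("note"):
        raw = note.findtext("content", default="")
        text = ""
        if raw.strip():
            # The note body is itself an XML document stored as CDATA,
            # so it needs a second parse before the text can be extracted.
            body = ET.fromstring(raw)
            text = "".join(body.itertext()).strip()
        notes.append({
            "title": note.findtext("title", default=""),
            "created": note.findtext("created", default=""),
            "text": text,
        })
    return notes


if __name__ == "__main__":
    for item in parse_enex(SAMPLE_ENEX):
        print(item["created"], item["title"], item["text"])
```

For a real export, the file contents would be read from disk and passed to `parse_enex`; notes with richer formatting or embedded resources would likely need an HTML-aware parser instead of `itertext`.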

© Han Xiaogang. Built using Pelican. Theme by Giulio Fidente on GitHub.