XG's Blog

Train NLP and image recognition models; Build distributed systems (k8s and mesos); Love Python and Golang

Text classification and similarity search with Python and sklearn

Motivation: Nearest neighbour search and classification are two of the most common use cases. In this post, I will summarize how to set up a basic flow for both. Text classification code: from sklearn.feature_extraction.text import CountVectorizer from sklearn.feature_extraction.text import ... [Read More]
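
The two use cases named in this excerpt, similarity search and classification, can be illustrated with a small sketch. This is not the code from the post, which is truncated here. It is a dependency-free stand-in that uses `collections.Counter` for the bag-of-words counts that sklearn's `CountVectorizer` would normally produce. The helper names (`vectorize`, `cosine`, `nearest`, `classify`) and the toy corpus are assumptions made up for illustration.

```python
import math
from collections import Counter


def vectorize(text):
    """Lowercase, split on whitespace, and count tokens (a bag of words)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def nearest(query, corpus):
    """Return the index of the corpus document most similar to the query."""
    q = vectorize(query)
    scores = [cosine(q, vectorize(doc)) for doc in corpus]
    return max(range(len(corpus)), key=scores.__getitem__)


def classify(query, corpus, labels):
    """1-nearest-neighbour classification: reuse the closest document's label."""
    return labels[nearest(query, corpus)]


# Hypothetical toy data, made up for illustration only.
corpus = [
    "the cat sat on the mat",
    "dogs chase cats in the yard",
    "stocks fell as markets closed",
    "investors sold shares and stocks",
]
labels = ["pets", "pets", "finance", "finance"]

if __name__ == "__main__":
    print(nearest("stocks and shares", corpus))
    print(classify("a cat on a mat", corpus, labels))
```

The same nearest-neighbour lookup serves both tasks: similarity search returns the closest document, and classification returns that document's label. With real data, a tokenizer and vectorizer from sklearn would likely replace the whitespace split used here.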

Sina Data Crawler

Title: How to use the APIs to collect data from Sina Weibo
Date: 2014-07-01 15:40
Category: Coding
Tags: sns, weibo, data
Slug: weibo-data

Suppose you want to extract tweets posted by a selected group of Weibo users. Here are the steps: Step 1: Create a new Weibo account. After... [Read More]

Style Check.py

Title: style-check.py
Date: 2014-01-25 17:16
Category: Writing
Tags: style-check, python, sublime text
Slug: style-check

Writing readable scientific papers is hard, especially for beginners. Although books (e.g., The Elements of Style) and guides are written to promote the best practices, it is still desired that such guides can be automatically... [Read More]