XG's Blog

Train NLP and image recognition models; build distributed systems (k8s and Mesos); love Python and Golang

Text classification and similarity search with Python and sklearn

Motivation

Nearest neighbour search and classification are two of the most common use cases. In this post, I will summarize how to set up a basic flow for both.

Text classification code

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_extraction.text import ...
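The excerpt cuts off after the first import, so the full flow is not shown here. Below is a minimal sketch of how the two cases might fit together in sklearn. It is not the post's actual code; the following are all assumptions made for illustration:

- the toy corpus and labels are hypothetical;
- using `TfidfTransformer` after `CountVectorizer` is an assumption;
- `MultinomialNB` as the classifier is an assumption;
- cosine distance for `NearestNeighbors` is an assumption.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.naive_bayes import MultinomialNB
from sklearn.neighbors import NearestNeighbors
from sklearn.pipeline import Pipeline

# Hypothetical toy corpus and labels, for illustration only.
docs = [
    "the cat sat on the mat",
    "dogs and cats are pets",
    "stock prices fell sharply today",
    "the market rallied after earnings",
]
labels = ["pets", "pets", "finance", "finance"]

# Shared text featurizer: raw text -> token counts -> tf-idf weights.
vectorizer = Pipeline([
    ("counts", CountVectorizer()),
    ("tfidf", TfidfTransformer()),
])
X = vectorizer.fit_transform(docs)  # sparse matrix, one row per document

# Case 1: classification (assumed classifier: multinomial naive Bayes).
clf = MultinomialNB().fit(X, labels)
predicted = clf.predict(vectorizer.transform(["my cat likes the mat"]))

# Case 2: nearest neighbour search over the same tf-idf vectors.
# Cosine distance is a common choice for text; it uses brute-force search
# and works directly on the sparse matrix.
nn = NearestNeighbors(n_neighbors=2, metric="cosine").fit(X)
distances, indices = nn.kneighbors(
    vectorizer.transform(["prices in the stock market"])
)

print(predicted)   # predicted label for the query document
print(indices[0])  # row indices into `docs` of the 2 most similar documents
```

Both cases share one fitted vectorizer. This way, new queries are mapped into the same vocabulary and idf weights as the indexed documents. Words never seen during fitting, such as "my" or "in" above, are simply ignored.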

Sina Data Crawler

Title: How to use the APIs to collect data from Sina Weibo
Date: 2014-07-01 15:40
Category: Coding
Tags: sns, weibo, data
Slug: weibo-data

Suppose you want to extract tweets posted by a selected group of Weibo users. Here are the steps:

Step 1: Create a new Weibo account
After...