How to perform K-means clustering with Python in Scikit?

Deep learning may be today's most fashionable class of machine learning algorithms, but there is more out there. Clustering is a type of machine learning in which you do not feed the model a labeled training set; instead, the algorithm derives characteristics from the dataset itself in order to structure it in a different way. It belongs to the class of unsupervised machine learning algorithms. By Christian Versloot.

k-means clustering is a method (…) that aims to partition n observations into k clusters in which each observation belongs to the cluster with the nearest mean (cluster centers or cluster centroid), serving as a prototype of the cluster.

Wikipedia
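
The definition quoted above can be made concrete with a small NumPy sketch. It is not code from the article: the function name `kmeans_sketch`, the random initialisation from data points, and the toy data are all assumptions chosen for illustration. It alternates the two steps the definition implies: assign each observation to the nearest mean, then recompute each mean.

```python
import numpy as np

def kmeans_sketch(X, k, n_iter=100, seed=0):
    """Illustrative (hypothetical) helper: plain assign-then-update iteration."""
    rng = np.random.default_rng(seed)
    # Assumption: initialise centroids from k distinct data points picked at random.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Squared Euclidean distance from every point to every centroid, shape (n, k).
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # New centroid = mean of its assigned points; keep the old one if a cluster is empty.
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    # Final assignment so that labels match the returned centroids.
    dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return dists.argmin(axis=1), centroids

# Toy data (made up): two well-separated groups of three points each.
X = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
              [5.0, 5.0], [5.1, 5.2], [5.2, 5.1]])
labels, centroids = kmeans_sketch(X, k=2)
```

On data this cleanly separated, the first three points should end up sharing one label and the last three the other. On real data the result depends on the initialisation, which is one reason library implementations usually run several restarts.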

The article dives into:

  • What is K-means clustering?
  • Introducing K-means clustering
  • The K-means clustering algorithm
  • Inertia / Within-cluster sum-of-squares criterion
  • On convergence of K-means clustering
  • The drawbacks of K-means clustering – when is it a bad choice?
  • Implementing K-means clustering with Python and Scikit-learn
  • Generating convex and isotropic clusters
  • Applying the K-means clustering algorithm
  • Full model code
  • Results
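
Several of the topics listed above (generating isotropic clusters, applying the algorithm, and the inertia criterion) can be sketched in a few lines with Scikit-learn. This is a minimal example under assumed settings, not the article's full model code: the sample count, number of centers, standard deviation, and random seeds are all chosen here for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Assumption: 300 synthetic samples drawn from 3 isotropic Gaussian blobs.
X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.8, random_state=42)

# Fit K-means with 3 clusters; n_init=10 reruns the algorithm from
# 10 different initialisations and keeps the run with the lowest inertia.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
labels = kmeans.fit_predict(X)

# Inertia is the within-cluster sum of squared distances to the assigned centroid.
inertia_manual = ((X - kmeans.cluster_centers_[labels]) ** 2).sum()

print("Centroids:\n", kmeans.cluster_centers_)
print("Inertia (scikit-learn):", kmeans.inertia_)
print("Inertia (recomputed):  ", inertia_manual)
```

The recomputed value should closely match `kmeans.inertia_`, which shows that the criterion is nothing more than the summed squared distance between each sample and its cluster center.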

This blog post looks at K-means clustering with Python and Scikit-learn. You will also get a good explanation of the Python code, links to further reading and resources, and some video talks explaining the concepts and science behind the article. Nice one!

Tags python data-science analytics big-data