The data science behind Natural Language Processing

John Thuma published this piece about the data science behind Natural Language Processing (NLP). Human communication is one of the most fascinating attributes of being sentient. We communicate in a variety of ways including speech and written symbols.

Chris Manning, Professor in Machine Learning at Stanford University, describes communication as “a discrete, symbolic, categorical signaling system”.

Natural language processing (NLP) is a discipline within computer science and artificial intelligence. The article details some of the basic techniques used in the field:

  • Tokenization
  • Parts of Speech
  • Stop Word Removal
  • Stemming
  • Lemmatization
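
To make a few of these steps concrete, here is a minimal sketch in plain Python. It is not the code from the article: the stop-word list, the lemma table, and the suffix rules are small hand-made assumptions for illustration, and part-of-speech tagging is left out because it needs a trained model or a full lexicon. A real project would more likely use a library such as NLTK or spaCy.

```python
import re

# Small illustrative stop-word list (an assumption; real lists are much longer).
STOP_WORDS = {"the", "a", "an", "is", "are", "was", "were", "of", "in", "on", "and", "to"}

# Tiny hand-made lemma table (hypothetical; real lemmatizers use a full lexicon).
LEMMAS = {"mice": "mouse", "running": "run", "better": "good"}


def tokenize(text):
    """Split text into lowercase word tokens, dropping punctuation."""
    return re.findall(r"[a-z0-9']+", text.lower())


def remove_stop_words(tokens):
    """Keep only tokens that are not in the stop-word list."""
    return [t for t in tokens if t not in STOP_WORDS]


def naive_stem(token):
    """Strip a few common suffixes. A toy rule set, not the Porter algorithm."""
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def lemmatize(token):
    """Look the token up in the lemma table, falling back to the token itself."""
    return LEMMAS.get(token, token)


sentence = "The mice were running in the gardens."
tokens = tokenize(sentence)
content = remove_stop_words(tokens)
stems = [naive_stem(t) for t in content]
lemmas = [lemmatize(t) for t in content]

print(tokens)
print(content)
print(stems)
print(lemmas)
```

The sketch shows how stemming and lemmatization differ: the crude suffix rule turns "running" into "runn", which is not a word, while the lookup maps it to the dictionary form "run" and maps the irregular plural "mice" to "mouse".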

Linguistics is the study of language, including its morphology, syntax, phonetics, and semantics. The field, together with data science and computing, has grown enormously over the past 60 years. The article includes code examples in Python. Insightful!

Tags miscellaneous big-data data-science learning