Tag: Data science
-
Finding CRAN packages right from the R console
Posted on June 25, 2019, Level intermediate Resource Length short
An article by Joachim Zuckarelli about working with R. Currently, there are more than 14,000 R package contributions on CRAN, providing R with an unparalleled wealth of features. The downside of the large and growing number of packages is that it becomes increasingly difficult to find the right tools to tackle a specific problem.
Tags programming big-data data-science
-
Image recognition in Python with TensorFlow and Keras
Posted on June 14, 2019, Level intermediate Resource Length medium
One of the most common uses of TensorFlow and Keras is the recognition and classification of images. If you want to learn how to use Keras to classify or recognize images, this article will teach you how.
Tags python big-data data-science
-
Look at how Twitter handles its time series data ingestion challenges
Posted on June 11, 2019, Level intermediate Resource Length short
Ram Dagar is the author of this overview of time series data. The components of a time series are as complex and sophisticated as the data itself. As time passes, the volume of data grows. More data does not always mean more information, but a larger sample helps avoid the error due to random sampling.
Tags devops database machine-learning data-science software
-
How we use Apache Kafka and the Confluent Platform
Posted on June 4, 2019, Level intermediate Resource Length short
Jendrik Poloczek from confluent.io published this article about their experience building the core infrastructure to integrate, clean, and analyze blockchain data.
Tags blockchain apache apis data-science scala
-
Great engineer needs the liberal arts
Posted on May 23, 2019, Level beginner Resource Length medium
Thomas Betts wrote for infoq.com about how a liberal arts education can provide new insights and perspectives that shine a light on technical tasks for any software developer. For example, empathy helps you know your audience and create great software that delights your customers.
Tags miscellaneous data-science learning programming
-
How to create histogram in Rlang
Posted on May 22, 2019, Level intermediate Resource Length short
In this article, Data Sharkie shows you how to create a histogram in R using the ggplot2 package. When we get a new dataset for analysis or research, we often want to learn about the frequency distribution of the variable of interest.
Tags analytics miscellaneous big-data cio data-science
-
Building self-served ETL pipeline for third-party data ingestion
Posted on April 18, 2019, Level intermediate Resource Length medium
An article by Nikolaos Tsipas from Skyscanner, with the help of colleagues Omar Kooheji and Michael Okarimia, about how to import datasets from external sources and make them available for querying. Examples of imported data include analytics metrics, advertising data, and currency exchange rates, all of which are used by Skyscanner engineers and data scientists.
Tags big-data data-science software-architecture
-
Google's EdgeTPU benchmarked vs Intel's Movidius
Posted on March 24, 2019, Level beginner Resource Length short
An article by Frederik Bode: the first benchmark of Google's EdgeTPU Dev Board is in. The comparison is made against Intel's (first generation) Movidius Neural Compute Stick, and Google is the clear winner regarding inference time.
Tags big-data data-science analytics machine-learning
-
The data science behind Natural Language Processing
Posted on March 22, 2019, Level beginner Resource Length medium
John Thuma published this piece about the data science behind Natural Language Processing (NLP). Human communication is one of the most fascinating attributes of being sentient. We communicate in a variety of ways including speech and written symbols.
Tags miscellaneous big-data data-science learning
-
Code Migration in Production: Rewriting the Sharding Layer of Uber's Schemaless Datastore
Posted on March 20, 2019, Level intermediate Resource Length medium
An older article by Jesper Lindstrøm Nielsen and Anders Johnsen about how Uber Engineering built Schemaless, their fault-tolerant and scalable datastore, to facilitate the rapid growth of the company.
Tags database data-science nosql
-
Managing analysis workflows in geospatial data science with GNU Make
Posted on March 3, 2019, Level intermediate Resource Length long
Martí Bosch wrote this guide on using Jupyter Notebooks with an iterative approach to both data analysis and software development. He also explains how to avoid some bad practices. Many issues can be settled by choosing helpful file names, good organization, documentation, and source control of the code.
Tags big-data machine-learning data-science miscellaneous python
-
Understanding stabilising experience replay for deep multi-agent reinforcement learning
Posted on March 1, 2019, Level advanced Resource Length long
An article by Parnian Barekatain in which she describes some basic concepts in reinforcement learning. She also provides a link to Udacity's free course on Deep Learning with PyTorch.
Tags big-data machine-learning data-science miscellaneous