Kevin Vecmanis
Engineering + Data Science

Try > Fail > Learn > Repeat



Accelerate - The Science of Lean Software and DevOps


A review of key concepts I learned from the book Accelerate: The Science of Lean Software and DevOps.



How to Help People When You Aren't the Subject Matter Expert


The ability to help a colleague or direct report when they’re stuck is an important part of being an engineer. Sometimes, you lack the same context or technical expertise as the person seeking help. This is a strategy I learned for being helpful in these situations.



Clustering Frequency Domain Data


Unsupervised clustering algorithms can be a great way to explore any structure that is inherent to the data and perhaps not immediately obvious to the analyst.



Binary Tree Methods in Python


In this post I show you a class for creating binary trees (and a cool way to display them!), as well as some methods for analyzing binary trees. Enjoy!



Stream Data to Google BigQuery with Apache Beam


In this post I walk through the process of handling unbounded streaming data using Apache Beam, and pushing it to Google BigQuery as a data warehouse.



Train and Evaluate Machine Learning Models in Google BigQuery


In this post I walk through one of the coolest features in Google BigQuery: the ability to train and evaluate machine learning models directly in BigQuery using SQL syntax. I’m going to import Google’s public e-commerce dataset from Google Analytics and build a machine learning model that predicts return buyers.



Create Fast, Fault-Tolerant ETL Pipelines with Google Kubernetes


In this post I do a walk-through demonstrating how to distribute a data ingestion process across a Kubernetes cluster to achieve fast, inexpensive, and fault-tolerant data pipelines on Google Cloud Platform. This model can be used for many kinds of distributed computing - not just data pipelines! I enjoyed learning this because it cut my data processing costs for VanAurum significantly. I hope you enjoy it!



Building a Brain - Distributing Machine Learning Models in NoSQL


In this post I walk through an architecture model for building better operational intelligence into VanAurum by distributing and accessing many machine learning models in MongoDB, a popular open-source NoSQL database.



Analyzing the S&P 500 with PySpark


The Spark dataframe API is moving undeniably towards the look and feel of Pandas dataframes, but there are some key differences in the way these two libraries operate. In this post I walk through an analysis of the S&P 500 to illustrate common data analysis functionality in PySpark.



How to Implement Bayesian Optimization in Python


In this post I do a complete walk-through of implementing Bayesian hyperparameter optimization in Python. This method of hyperparameter optimization is extremely fast and effective compared to other “dumb” methods like GridSearchCV and RandomizedSearchCV.



Complete Guide to Installing PySpark on MacOS


Getting PySpark set up locally is an involved process that took me a few tries to get right. In this post I cover the entire process of successfully installing PySpark on MacOS. Enjoy!



Analyzing the Cost Benefits of Robo-Advisors


Robo-advisors are on a clear path to dominating the future of asset management. I was curious about how much the lower costs of robo-advisors can contribute to the growth of a portfolio. In this series we perform some hypothesis testing to analyze the benefits of using a robo-advisor compared with directing your own portfolio. How significant are they?



Support Vector Machine Hyperparameter Tuning - A Visual Guide


In this post I walk through the powerful Support Vector Machine (SVM) algorithm and use the analogy of sorting M&M’s to illustrate the effects of tuning SVM hyperparameters.



XGBoost Hyperparameter Tuning - A Visual Guide


XGBoost is a very powerful machine learning algorithm that is typically a top performer in data science competitions. In this post I’m going to walk through the key hyperparameters that can be tuned for this amazing algorithm, visualizing the process as we go so you can get an intuitive understanding of the effect the changes have on the decision boundaries.



An Intuitive Walk-through of Linear Regression in Python


In this article I do my best to explain some of the more confusing regression-related topics using something that we can all relate to - pizza! Through the course of the tutorial, I introduce regressions, the intuition behind them, how to execute them in Python, and how to interpret the results.



Algorithmic Portfolio Optimization in Python


In this installment I demonstrate the code and concepts required to build a Markowitz Optimal Portfolio in Python, including the calculation of the capital market line. I build flexible functions that can optimize portfolios for Sharpe ratio, maximum return, and minimal risk.



Selecting Machine Learning Algorithms Part 1


In this post I demonstrate how to build a spot-checking algorithm that can evaluate a basket of machine learning algorithms on scaled and unscaled data. Once you have established a baseline performance, you can move on to forming and testing hypotheses about how transformations and parameter tweaks might affect your model performance.


