Try > Fail > Learn > Repeat
A review of key concepts I learned from the book Accelerate: The Science of Lean Software and DevOps
The ability to help a colleague or direct report when they’re stuck is an important part of being an engineer. Sometimes, though, you lack the context or technical expertise that the person seeking help has. This post describes a strategy I learned for being helpful in these situations.
In this post I walk through one of the coolest features in Google BigQuery: the ability to train and evaluate machine learning models directly in BigQuery using SQL syntax. I’m going to import Google’s public e-commerce dataset from Google Analytics and build a machine learning model that predicts return buyers.
In this post I demonstrate how to distribute a data ingestion process across a Kubernetes cluster to build fast, inexpensive, and fault-tolerant data pipelines on Google Cloud Platform. The same pattern works for many kinds of distributed computing, not just data pipelines! I enjoyed learning this because it significantly cut my data processing costs for VanAurum. I hope you enjoy it!
In this post I walk through an architecture for building better operational intelligence into VanAurum by distributing and accessing many machine learning models in MongoDB, a popular NoSQL document database.
The Spark DataFrame API increasingly resembles pandas DataFrames in look and feel, but there are some key differences in the way the two libraries operate, such as Spark’s lazy, distributed execution. In this post I walk through an analysis of the S&P 500 to illustrate common data analysis functionality in PySpark.
In this post I do a complete walk-through of implementing Bayesian hyperparameter optimization in Python. Because it uses the results of previous trials to decide which hyperparameters to try next, this method is often much faster and more effective than uninformed search methods like GridSearchCV and RandomizedSearchCV.
Robo-advisors are on a clear path to dominating the future of asset management. I was curious how much the lower costs of robo-advisors actually benefit the growth of a portfolio. In this series we use hypothesis testing to analyze the benefits of using a robo-advisor rather than directing your own portfolio. How significant are they?
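The question behind the robo-advisor series is how much a lower fee compounds into a larger portfolio. The sketch below is not the hypothesis test from the series, only the compounding arithmetic behind the question, and it rests on stated assumptions: a constant annual return, a fee charged once a year as a percentage of assets, and hypothetical inputs ($10,000, a 6% return, 0.25% versus 1.00% fees, 30 years). The helper name `final_value` is illustrative.

```python
def final_value(principal: float, annual_return: float, annual_fee: float, years: int) -> float:
    """Grow a lump sum year by year, deducting a percentage-of-assets fee each year.

    Assumptions (illustrative only): the return is the same every year and the
    fee is charged once, at year end, on the post-growth balance.
    """
    value = principal
    for _ in range(years):
        value *= 1 + annual_return  # market growth for the year
        value *= 1 - annual_fee     # fee taken as a fraction of assets
    return value


# Hypothetical inputs, not data from the series.
low_fee = final_value(10_000, 0.06, 0.0025, 30)   # 0.25% annual fee
high_fee = final_value(10_000, 0.06, 0.0100, 30)  # 1.00% annual fee
print(f"0.25% fee: ${low_fee:,.0f}")
print(f"1.00% fee: ${high_fee:,.0f}")
print(f"Fee drag:  ${low_fee - high_fee:,.0f}")
```

Because the fee is applied to the whole balance every year, its cost compounds along with the returns. Whether a difference in realized returns is statistically significant is a separate question, which is what the hypothesis testing addresses.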
In this post I walk through the powerful Support Vector Machine (SVM) algorithm and use the analogy of sorting M&M’s to illustrate the effects of tuning SVM hyperparameters.
XGBoost is a very powerful machine learning algorithm that is frequently a top performer in data science competitions. In this post I walk through the key hyperparameters that can be tuned for this amazing algorithm, visualizing the process as we go so you can build an intuitive understanding of how each change affects the decision boundaries.
In this article I do my best to explain some of the more confusing regression-related topics using something we can all relate to: pizza! Over the course of the tutorial, I introduce regression, the intuition behind it, how to run it in Python, and how to interpret the results.
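To make “how to interpret the results” concrete, here is a minimal, standard-library-only sketch of simple linear regression by ordinary least squares. It is not the code from the tutorial; the function name `fit_line` and the pizza data (diameter in inches versus price in dollars) are hypothetical. The slope is the covariance of x and y divided by the variance of x, and R² is the share of the variance in price that the line explains.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x, plus R^2."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x); the 1/n factors cancel.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R^2 = 1 - (residual sum of squares / total sum of squares).
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return slope, intercept, 1 - ss_res / ss_tot


# Hypothetical data: pizza diameter (inches) vs. price (dollars).
diameters = [8, 10, 12, 14, 16]
prices = [7, 9, 12, 14, 17]
slope, intercept, r2 = fit_line(diameters, prices)
print(f"price ~= {intercept:.2f} + {slope:.2f} * diameter  (R^2 = {r2:.3f})")
```

Read the slope as the expected change in price per extra inch of diameter; with these made-up numbers it is $1.25 per inch. The negative intercept is a reminder not to extrapolate far outside the data, since a zero-inch pizza is meaningless.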
In this installment I demonstrate the code and concepts required to build a Markowitz optimal portfolio in Python, including the calculation of the capital market line. I build flexible functions that can optimize portfolios for maximum Sharpe ratio, maximum return, or minimum risk.
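The core quantities in that installment can be sketched in a few lines. For weights w, mean returns μ, and covariance Σ, expected return is wᵀμ, volatility is √(wᵀΣw), the Sharpe ratio is (return − r_f) / volatility, and the capital market line is E[R] = r_f + S_max·σ, where S_max is the Sharpe ratio of the tangency (maximum-Sharpe) portfolio. This is a minimal standard-library sketch, not the installment’s code: it swaps a real optimizer for a brute-force grid over two long-only assets, and the function names and inputs (returns, volatilities, correlation, risk-free rate) are hypothetical.

```python
import math


def portfolio_stats(weights, mean_returns, cov):
    """Expected return w.mu and volatility sqrt(w' Sigma w) of a portfolio."""
    n = len(weights)
    ret = sum(w * m for w, m in zip(weights, mean_returns))
    var = sum(weights[i] * weights[j] * cov[i][j] for i in range(n) for j in range(n))
    return ret, math.sqrt(var)


def sharpe_ratio(weights, mean_returns, cov, risk_free):
    """Excess return per unit of volatility."""
    ret, vol = portfolio_stats(weights, mean_returns, cov)
    return (ret - risk_free) / vol


def max_sharpe_two_assets(mean_returns, cov, risk_free, steps=1000):
    """Grid search over long-only weights (w, 1 - w); a stand-in for a real optimizer."""
    best_weights, best_sharpe = None, -math.inf
    for k in range(steps + 1):
        weights = [k / steps, 1 - k / steps]
        s = sharpe_ratio(weights, mean_returns, cov, risk_free)
        if s > best_sharpe:
            best_weights, best_sharpe = weights, s
    return best_weights, best_sharpe


def cml_return(volatility, risk_free, max_sharpe):
    """Capital market line: E[R] = r_f + S_max * sigma."""
    return risk_free + max_sharpe * volatility


# Hypothetical annualized inputs: 15% and 25% volatility, correlation 0.3.
mean_returns = [0.08, 0.12]
vols = [0.15, 0.25]
corr = 0.3
cov = [[vols[0] ** 2, corr * vols[0] * vols[1]],
       [corr * vols[0] * vols[1], vols[1] ** 2]]
risk_free = 0.02

weights, s_max = max_sharpe_two_assets(mean_returns, cov, risk_free)
ret, vol = portfolio_stats(weights, mean_returns, cov)
print(f"Tangency weights: {weights[0]:.3f} / {weights[1]:.3f}")
print(f"Return {ret:.2%}, volatility {vol:.2%}, Sharpe {s_max:.3f}")
print(f"CML at 10% volatility: {cml_return(0.10, risk_free, s_max):.2%}")
```

The capital market line passes through the risk-free rate at zero volatility and touches the efficient frontier at the tangency portfolio. Maximizing return or minimizing volatility instead only changes the quantity compared inside the search loop.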
In this post I demonstrate how to build a spot-checking algorithm that evaluates a basket of machine learning algorithms on both scaled and unscaled data. Once you have established baseline performance, you can move on to forming and testing hypotheses about how transformations and parameter tweaks might affect your model’s performance.
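The spot-checking idea can be sketched without any ML library. This minimal example is not the post’s implementation: the three models (a majority-class baseline, 1-nearest-neighbor, and nearest-centroid), the leave-one-out evaluation, the helper names, and the small dataset are all hypothetical choices. The dataset was picked so that one feature has a much larger scale than the other, which is where standardization matters for distance-based models.

```python
import math
import statistics
from collections import Counter


def fit_scaler(X):
    """Return a function that standardizes rows using X's column means and stds."""
    cols = list(zip(*X))
    means = [statistics.fmean(c) for c in cols]
    stds = [statistics.pstdev(c) or 1.0 for c in cols]  # guard constant columns
    return lambda row: [(v - m) / s for v, m, s in zip(row, means, stds)]


# Each "model" takes training data and returns a predict(x) function.
def majority_baseline(X, y):
    label = Counter(y).most_common(1)[0][0]
    return lambda x: label


def nearest_neighbor(X, y):
    def predict(x):
        dists = [math.dist(x, row) for row in X]
        return y[dists.index(min(dists))]
    return predict


def nearest_centroid(X, y):
    centroids = {}
    for label in set(y):
        members = [row for row, lab in zip(X, y) if lab == label]
        centroids[label] = [statistics.fmean(col) for col in zip(*members)]
    return lambda x: min(centroids, key=lambda lab: math.dist(x, centroids[lab]))


def loo_accuracy(model_fn, X, y, scale):
    """Leave-one-out accuracy; the scaler is fit on each training fold only."""
    correct = 0
    for i in range(len(X)):
        X_train, y_train, x_test = X[:i] + X[i + 1:], y[:i] + y[i + 1:], X[i]
        if scale:
            transform = fit_scaler(X_train)
            X_train = [transform(row) for row in X_train]
            x_test = transform(x_test)
        predict = model_fn(X_train, y_train)
        correct += predict(x_test) == y[i]
    return correct / len(X)


def spot_check(models, X, y):
    """Score every model on unscaled and scaled data."""
    return {
        f"{name} ({'scaled' if scale else 'unscaled'})": loo_accuracy(fn, X, y, scale)
        for name, fn in models.items()
        for scale in (False, True)
    }


# Hypothetical data: feature 0 separates the classes, feature 1 is large-scale noise.
X = [[1.0, 500], [1.2, 300], [0.9, 700], [1.1, 100],
     [3.0, 450], [3.2, 650], [2.9, 150], [3.1, 350]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
models = {"majority": majority_baseline, "1-NN": nearest_neighbor, "centroid": nearest_centroid}
results = spot_check(models, X, y)
for name, acc in results.items():
    print(f"{name:22s} {acc:.2f}")
```

Fitting the scaler inside each fold keeps the held-out row out of the scaling statistics. One quirk worth knowing: on a perfectly balanced dataset, a majority-class baseline scores 0 under leave-one-out, because removing one sample always tips the majority toward the other class.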