Kevin Vecmanis

How to Think About Bitcoin Multi-Sig Arrangements

Jan 4, 2022

Using multi-signature self-custody arrangements can be confusing for newcomers to the space and those interested in setting up inheritance structures. In this post I talk about a useful way to conceptualize multisig.

bitcoin multisig

Accelerate - The Science of Lean Software and DevOps

Nov 20, 2020

A review of key concepts I learned in the book Accelerate: The Science of Lean Software and DevOps.

devops agile management

How to Help People When You Aren't the Subject Matter Expert

Nov 16, 2020

The ability to help a colleague or direct report when they’re stuck is an important part of being an engineer. Sometimes, you lack the same context or technical expertise as the person seeking help. This is a strategy I learned for being helpful in these situations.

problem solving management

Clustering Frequency Domain Data

Jul 18, 2019

Unsupervised clustering algorithms can be a great way to explore any structure that is inherent to the data and perhaps not immediately obvious to the analyst.

machine learning kmeans clustering

Binary Tree Methods in Python

Jun 20, 2019

In this post I show you a class for creating binary trees (and a cool way to display them!), as well as some methods for analyzing binary trees. Enjoy!

python data structures binary trees

Stream Data to Google BigQuery with Apache Beam

Jun 18, 2019

In this post I walk through the process of handling unbounded streaming data using Apache Beam, and pushing it to Google BigQuery as a data warehouse.

python apache beam google cloud platform BigQuery

Train and Evaluate Machine Learning Models in Google BigQuery

Jun 16, 2019

In this post I walk through one of the coolest features in Google BigQuery - the ability to train and evaluate machine learning models directly in BigQuery using SQL syntax. I’m going to import Google’s public e-commerce dataset from Google Analytics and build a machine learning model that predicts return buyers.

SQL Google Cloud BigQuery

Create Fast, Fault-Tolerant ETL Pipelines with Google Kubernetes

Jun 10, 2019

In this post I do a walk-through demonstrating how to distribute a data ingestion process across a Kubernetes cluster to achieve fast, inexpensive, and fault-tolerant data pipelines on Google Cloud Platform. This model can be used for many kinds of distributed computing - not just data pipelines! I enjoyed learning this because it cut my data processing costs for VanAurum significantly. I hope you enjoy it!

python ETL Google Cloud Platform Kubernetes Docker

Building a Brain - Distributing Machine Learning Models in NoSQL

Jun 6, 2019

In this post I walk through an architecture model for building better operational intelligence into VanAurum by distributing and accessing many machine learning models in MongoDB, a popular open source NoSQL database framework.

python NoSQL data science machine learning ETL

Analyzing the S&P 500 with PySpark

Jun 2, 2019

The Spark dataframe API is moving undeniably towards the look and feel of Pandas dataframes, but there are some key differences in the way these two libraries operate. In this post I walk through an analysis of the S&P500 to illustrate common data analysis functionality in PySpark.

python pyspark data science

How to Implement Bayesian Optimization in Python

Jun 1, 2019

In this post I do a complete walk-through of implementing Bayesian hyperparameter optimization in Python. This method of hyperparameter optimization is extremely fast and effective compared to other “dumb” methods like GridSearchCV and RandomizedSearchCV.

python statistics machine learning SMBO bayesian optimization

Complete Guide to Installing PySpark on MacOS

May 31, 2019

Getting PySpark set up locally can be a bit of an involved process that took me a few tries to get right. In this post I cover the entire process of successfully installing PySpark on MacOS. Enjoy!

python pyspark

Analyzing the Cost Benefits of Robo-Advisors

May 31, 2019

Robo-advisors are on a clear path to dominating the future of asset management. I was curious about how much the lower costs of robo-advisors can help the growth of a portfolio. In this series we perform some hypothesis testing to analyze the benefits of using a robo-advisor rather than directing your own portfolio. How significant are they?

python finance statistics

Support Vector Machine Hyperparameter Tuning - A Visual Guide

May 12, 2019

In this post I walk through the powerful Support Vector Machine (SVM) algorithm and use the analogy of sorting M&M’s to illustrate the effects of tuning SVM hyperparameters.

dataviz python xgboost machine learning

XGBoost Hyperparameter Tuning - A Visual Guide

May 11, 2019

XGBoost is a very powerful machine learning algorithm that is typically a top performer in data science competitions. In this post I’m going to walk through the key hyperparameters that can be tuned for this amazing algorithm, visualizing the process as we go so you can get an intuitive understanding of the effect the changes have on the decision boundaries.

dataviz python xgboost machine learning

An Intuitive Walk-through of Linear Regression in Python

Apr 5, 2019

In this article I do my best to explain some of the more confusing regression-related topics using something that we can all relate to - pizza! Through the course of the tutorial, I introduce regressions, the intuition behind them, how to execute them in Python, and how to interpret the results.

dataviz python statistics regression

Algorithmic Portfolio Optimization in Python

Apr 2, 2019

In this installment I demonstrate the code and concepts required to build a Markowitz Optimal Portfolio in Python, including the calculation of the capital market line. I build flexible functions that can optimize portfolios for Sharpe ratio, maximum return, and minimal risk.

dataviz minimization python

Selecting Machine Learning Algorithms Part 1

Mar 2, 2019

In this post I demonstrate how to build a spot-checking algorithm that can evaluate a basket of machine learning algorithms on scaled and unscaled data. By establishing a baseline performance you can then move on to forming and testing hypotheses about how transformations and parameter tweaks might affect your model performance.

dataviz python statistics machine learning