ds/dx - a data science & ml engineering blog
2023
Mar 25
Scheduling a Google Cloud Function to periodically backup your personal Spotify Discover Weekly playlist
serverless
gcp
cloud function
cloud run
cloud scheduler
python
spotify
spotipy
terraform
cloud infrastructure
2021
Jul 20
Using Terraform to build a serverless Airflow via Amazon Managed Workflows and automatic DAG sync using GitHub Actions
airflow
mwaa
terraform
gcp
serverless
workflows
pipeline
dag
apache
github actions
cloud
Apr 24
Building a serverless, containerized batch prediction model using Google Cloud Run, Pub/Sub, Cloud Storage and Terraform
model
containerized
machine learning
gcp
cloud run
cloud storage
pub-sub
terraform
prediction
batch-prediction
serverless
cloud
Jan 10
Building a serverless, containerized machine learning model API using AWS Lambda & API Gateway and Terraform
model deployment
containerized
machine learning
api
aws
lambda
api gateway
terraform
prediction
inference
serverless
s3
cloud
2020
Dec 18
Simulating the effect of multicollinearity in linear modelling using R, purrr & parallel computing
linear regression
modeling
model diagnostics
pitfalls
multicollinearity
anova
t-test
residuals
outliers
log-model
log-log-model
vif
purrr
simulation
glm
glmnet
Aug 13
Speeding up a sklearn model pipeline to serve single predictions with very low latency
sklearn
speedup
tuning
performance
service
endpoint
predictions
pipeline
transformers
optimization
Jun 24
Estimating travel times using geospatial feature engineering and tree based models
tree
travel-times
geospatial
haversine-distance
azimuth
coordinates
taxi
bigquery
gradientboosting
lightgbm
xgboost
seaborn
numpy
pyproj
Apr 19
Managing cloud infrastructure in code with Terraform: spawning your own jupyterhub instance on AWS with mounted S3 bucket
terraform
aws
jupyterhub
s3
cloud
ec2
tljh
Mar 8
Using BigQuery on hundreds of millions of events: deduplicate, analyse, visualize, find connected intervals and aggregate to timeseries
bigquery
timeseries
deduplication
bigdata
sql
island
gaps
intervals
grid
events
Feb 23
Using neural networks with embedding layers to encode high cardinality categorical variables
neuralnetwork
embeddings
sklearn
onehotencoding
regression
highcardinality
2019
Dec 15
Writing your own sklearn transformer: DataFrames, feature scaling and ColumnTransformer (writing your own sklearn functions, part 2)
sklearn
transformer
columntransformer
dataframes
features
scaling
pipeline
pandas
numpy
Dec 8
Using a Keras model in combination with sklearn preprocessing, pipelines, grid search and cross-validation
keras
sklearn
gridsearch
pipeline
preprocessing
Oct 30
Combining tree based models with a linear baseline model to improve extrapolation (writing your own sklearn functions, part 1)
trees
sklearn
extrapolation
estimator
regression
ensembling
Sep 11
Caching multiple R functions to an S3 bucket via memoise and aws.s3
caching
memoization
memoise
aws.s3
Sep 3
Dockerizing an R machine learning model with S3 connection and end-to-end tests on Travis
docker
travis
end-to-end
test
machine learning
s3
data lake
random forest
pytest
ci
2018
May 23
Parallelized batch time series forecasting and forecast blending with purrr and multidplyr
forecasting
timeseries
parallelization
ensembleforecasting
purrr
Apr 4
Visualizing Berlin population using leaflet and sf
sf
leaflet
shapefile
maps