Director, Platform Engineering
April 2018 — Present
Took on a leadership role for the platform engineering team. Grew the team from 4 people to over 50 with 6 managers, creating and standardizing an objective-based hiring process that included post-interview surveys and NPS scores. Managed cross-functional requirements across product groups to deliver major product milestones. Continued to lead cross-team and cross-business-unit initiatives with competing priorities and delivered them on schedule.
Developed several systems to track releases, release frequency, and defect escape rates using in-house tooling alongside Sentry across 50+ microservices.
Built out and iterated on a post-mortem process focused on identifying the scope of issues and rapid triage. Used PagerDuty to track recovery times and report on MTTR and the teams and individuals affected, in order to reduce bus-factor risk and improve production runbooks.
Built out a mentoring program and hiring process for interns from colleges and special interest groups to help increase diversity. The program continues to generate referrals and goodwill as it expands, reducing the effort needed to find talented junior engineers.
Ran a continuous improvement process for engineering onboarding, halving the time new engineers took to become productive.
Pushed for releasing the company's first product-related SDK and handled customer questions to help drive API adoption across different vendors.
Developed a career planning guide and promotion guide so team members understood expectations and how to advance on both the individual contributor and management tracks. Developed NPS surveys for the engineering team and product groups to drive continuous improvement. Built programs and projects aligned with business needs and the interests of critical talent, keeping attrition low and giving ICs opportunities to lead valuable initiatives.
Started and ran an anonymous retrospective process for the engineering organization to share cross-team concerns, what was going well or not going well, shout-outs, and similar feedback. Made sure the teams' concerns were heard and acted on, and collected mood metrics to track against decisions being made.
Gave tech talks and presentations on topics such as running a honeypot network, engagement and team happiness, and how to drive recognition and appreciation. Started internal training and learning screencasts on working with our stack, deploying, and using Kubernetes.
Originally, I joined as an individual contributor and moved into a leadership role. Led and developed major features prior to product launch: a standard API gateway for all services that handled validation, security checks, and auditing; a syslog receiver for customers; standards such as common code frameworks (Buffalo); a design doc process, which I championed; Kubernetes operators to interact with Consul and Sentry; proof-of-concept work with open service brokers to ease development and deployment; a CA/PKI API; a back-office tenant manager; and runbooks for service outages and postmortems.
Championed and led the migration of APIs from REST to GraphQL using gqlgen and nautilus. Created a developer-focused CLI to help with tasks like code generation and project creation. Wrote various services automating the provisioning of Jupyter notebooks using SageMaker and CLI tooling. Wrote release management tooling to track releases going to different environments and their dependencies.
Applications Security Platform Engineering
March 2017 — April 2018
Built a container scanning pipeline to provide static analysis of the container operating system and packages, as well as analysis of the applications’ language packages to check for insecure packages in Go.
Built a WAF in Go with microsecond response times to protect customer applications from threats like directory traversal, SQL injection, and XSS injection, with DNS checks (against known block lists), HTTP verb checking, CSP enforcement, Content-Type enforcement, and X-Frame enforcement. Built tooling and a test suite around it to verify different types of attacks, along with the ability to replay ELB logs from already deployed applications to ensure applications wouldn’t break. Made it deployable as a sidecar application or Helm dependency so customers could easily integrate it with existing applications.
Built out several Kubernetes controllers/operators to enable customer deployments and integration with secret storage, automatic OAuth client generation, and existing legacy applications. Added per-tenant encryption to a shared secret system to ensure all customer data was secured at rest and in transit.
Created a release management system that enabled developers to deploy continuously and provided a tracking and promotion mechanism for higher-level environments (i.e., dev to staging to prod), integrated with Helm, Kubernetes, and our CI/CD process.
Led a cross-team group for several months to ensure customer success by focusing on end-to-end integration of the product, from customer standup to application installation on the tenant Kubernetes clusters.
Lead DevOps Engineer
March 2017 — April 2018
Moved all servers to Ansible for configuration management and taught other teams how to use configuration management. Worked to find and build a hybrid solution (a mix of dedicated hardware and AWS) for our security scanning infrastructure and performance needs, including procurement in Europe and China. Focused heavily on performance optimizations with Go and security tools like nmap; took the lead on building tools and automation in Go.
Built, scaled, and set up monitoring for our Elasticsearch clusters with over 50 billion documents. Led the effort to consolidate disparate HTTP services/endpoints and Redis servers and move towards a queueing infrastructure with RabbitMQ for reliability. Scaled RabbitMQ and debugged issues at scale. Converted a large MySQL DB to Postgres with a final cutover and zero downtime. Managed Elasticsearch upgrades on clusters with tens of billions of documents across multiple point and major releases.
Built a Shodan-like tool from the ground up to scan the entire internet using masscan, nmap, Go, RabbitMQ, and Elasticsearch.
Set up a monitoring stack using Grafana, StatsD, and InfluxDB for application metrics, Zabbix for alerting and server monitoring, and the ELK stack for log aggregation. Wrote tooling and libraries in Go to clean up projects, and helped educate others on Go best practices. Migrated disparate server environments and one-off tools to Docker, deployed via Jenkins/Ansible.
Set up the build environment and automation for all projects using Jenkins, Docker, and Ansible, including security and auditing tools that ran on every build to check for vulnerabilities. Led the way in implementing tests for both Rails and Go projects. Gave talks on various topics in security, Docker, and infrastructure automation. Built several in-house tools to automatically audit AWS security groups and GitHub permissions.
Software Engineer & DevOps
April 2014 — June 2015
2014 — Present
Software Engineer & DevOps
June 2012 — April 2015
Software Engineer & Systems Configuration, Applications Programmer III
Tallahassee, FL & Remote
October 2011 — July 2012
Software Engineer & Partner
2005 — October 2011
Software & Systems Engineer
2008 — October 2011