Dave Beckett — Resume
- Location
- San Francisco, California, USA
- Email
- dave@dajobe.org
- Telephone
- Google Voice: 650-450-8421 (will call screen)
- Sites
- Home page: www.dajobe.org
- Blog: www.dajobe.org/blog/
- Software: github.com/dajobe
- Digital resumes
- www.dajobe.org/cv/
- Stack Overflow
- www.linkedin.com/in/dajobe
Interests and Experience
- Reliability: scaling, complexity, change
- Web: technologies, software design and architecture
- Data and metadata: Big Data (Hadoop full stack), NoSQL, Semantic Web and RDF, relational (SQL), semi-structured, real time/low latency, distributed
- Open: standards, Open Source / Free Software development, open data
Key Skills
- Software development
- Analysis, design and architecture for large-scale software systems
- Strong skills in technical leadership, training, mentoring and communicating
- Coding considering long-term portability, packaging, maintenance and support
- Technical writing, documentation and presentations
- Languages: C, Python, Perl, automake, autoconf, shell, flex and bison (expert 15+ years);
Ansible, Hive SQL, MySQL (experienced)
- Expert on Resource Description Framework (RDF) and Semantic Web Technology
- Expert on XML, XML Namespaces, XML Infoset and web architecture
- Extensive experience with Web concepts, architecture and technologies
- Experience with geo and local search technologies and business.
- Experience with social networking technologies and products.
- Systems and reliability engineering
- Capacity planning of servers, services to the multi-datacenter level
- Forecasting of future capacity. Expert at spreadsheets (Google
Sheets / Excel) and pivot tables.
- Systems, services and distributed monitoring and observability
- Deployment and configuration management.
- Performance optimizing, tuning, specifying hardware requirements and working with hardware engineers to develop, test and productionize new hardware.
- Problem and incident analysis, remediation and identification of longer term steps
- SRE practice and processes: SLA, SLI, SLO, error budgets and executing that with development partners. Training on how SRE works.
- Free Software / Open Source
- Licensing, collaboration, community, policy issues.
- Founder of Redland RDF, Flickcurl Flickr API projects
- Standards development activity: W3C, RDF and Dublin Core
- Co-author of 1 W3C Recommendation on Turtle with Sir Tim Berners-Lee, Eric Prud'hommeaux and Gavin Carothers (Feb 2014)
- Editor of 3 W3C RDF Recommendations, 1 Dublin Core Recommendation
- Member of W3C RDF Data Access Working Group (2004-2005)
- Member of W3C RDF Core Working Group (2001-2004)
- W3C representative for the University of Bristol (2002-2005) and University of Kent (2000)
- Portable Network Graphics (PNG) (1995-) and the first browser implementation of it
- Technologies
- Hadoop stack: HDFS, Map Reduce, YARN, Hive, HBase
- Operation of Linux (RedHat CentOS, Debian / Ubuntu, Gentoo), OSX
- Familiar with Docker, such as that used in docker nghttp2
- Configuration management: Puppet, Ansible, some Chef
- Linux systems administration and network administration.
- Software, Community and Professional roles
- Program committee member for O'Reilly Strata conference on big data (2011-2015)
- W3C Semantic Web Interest Group (2000-2015)
- Debian Project Developer (2005-2022)
- Co-founded planetrdf.com (2004-2020)
- Co-ran W3C Semantic Web Interest Group IRC logs and community scratchpad (2000-2020)
Professional Experience
- Mar 2023 — Present: Google Inc, Sunnyvale California, USA
- Staff System Engineer (March 2023 — Present)
- A Staff System Engineer (SRE) in the Site Reliability
Engineering part of the Google Cloud Platform (GCP) organization
working on a Sovereign Trusted Cloud project with an external partner
organization. Developed the operational time model for partners across
the full Google scope of 100s of services to aid prioritizing and planning
of the project and hiring. Identified gaps in the operational tools needed for
service operations, created a new source of truth to capture their state
and drove making them available for use. Supported training of partners
on Google operations across dozens of services, with office hours for
interactive help. On-call for partner service testing to get feedback to
improve operational practices. Advocate for SRE training and improving
operational practices in the company and for partners.
- May 2016 — Feb 2023: Twitter Inc, San Francisco, California, USA
- Senior Staff Site Reliability Engineer (Jun 2021 — Feb 2023)
- Kept the Data Platform at Twitter operating both on-premise
and in GCP. Included some of the largest Hadoop clusters in the
world. Led the team in improving automation, capacity planning,
fixing operational problems, adding operational features and performing
upgrades. Worked with management on strategic and technical challenges,
organization and planning. Dedicated to making the SRE team successful
wherever they worked.
- Achievements: Optimizing and planning capacity, saving many $Ms several times. Setting optimal hardware requirements and working to execute them. Safety and reliability strategy company wide. Coded automation system for removing toil of Hadoop bare metal fleet maintenance including reboots, upgrades, problem discovery, remediation and more.
- Staff Site Reliability Engineer (Mar 2017 — Jun 2021)
- Keeping the Hadoop clusters at Twitter (some of the largest in
the world) running both on-premise and in cloud along with the rest
of the Data Platform. Leading the team in improving automation,
capacity planning, fixing operational problems, adding operational
features and performing upgrades. Working with management on
strategic and technical challenges and planning.
- Achievements: Leading data platform cloud migration for both
deployment and network design, in collaboration with cloud vendor
(GCP).
- Senior Site Reliability Engineer (May 2016 — Mar 2017)
- Keeping the Hadoop clusters at Twitter (some of the largest in
the world) running on-premise (bare metal). Leading the team in
improving automation, capacity planning, fixing operational
problems, adding operational features and performing
upgrades.
- Achievements: technical
evaluation of multiple cloud vendors for data platform. Planning
migration approaches with leadership. Providing technical input
into cloud decision process.
- July 2013 — May 2016: Rackspace Hosting Inc, San Francisco, California, USA
- Senior Software Engineer
- Building Hadoop-based big data enterprise platforms coding in Python
and devops with Chef and Ansible. Application coding in Map-Reduce
Hadoop with HBase and Hive in Java and some Scala. Performing Hadoop
day-to-day operations (HDFS, Map-Reduce, HBase, Hive, ...) including
operation, deployment and debugging of job issues. Single-handedly
administering and supporting multiple HDP clusters via command line
and more recently with Apache Ambari. Developed Hive-based analytics
over large data feeds including managing data schema mappings and data
management with Airflow and some Cascading and Scalding. I track big
data industry technology trends developing longer term tech strategies.
Learning Spark.
- Achievements: Optimization of large-scale reporting Scalding jobs with custom Hive windowing functions.
- June 2012 — July 2013: Turner Broadcasting Inc, San Francisco, California, USA
- Senior Software Engineer
- Social news and content management software in Python with Linux and Chef configuration management work.
- September 2010 — May 2012: Digg Inc, San Francisco, California, USA
- Lead Software Engineer (September 2010-May 2012)
- Coding with Python, PHP and a little JavaScript. Working with
Cassandra, Redis, Memcached, Hive, Hadoop Map-Reduce and Tornado.
Developed with Gerrit code review and Git with continuous integration
via Hudson. Engineering infrastructure design and architecture.
Documented existing systems design and synthesized architecture.
Lead on tracking and analytics stack supporting business metrics and
analysis needs. Mobile device and mobile web lead fixing Digg main
and mobile sites on touch and small screen devices. Lead on public
web API supporting iOS app, dealing with client and server OAuth and
developing new APIs. Doing whatever it takes to get the job done.
- October 2005 — August 2010: Yahoo! Inc, Sunnyvale, California, USA
- Principal Software Architect (Jan 2010 — Aug 2010)
- Social media technology domain architect for Yahoo! Media
property group: News, Sports, Finance, Entertainment globally.
Providing technical leadership over multiple projects
in the social media area, looking at integration with Facebook,
Twitter and other networks, social engagement technology such as
blogging and commenting, polls, ratings, reviews.
Designing integrations and developing social technology
strategy working with product, business and technology leadership.
Mentoring and training other technical contributors.
- Senior Software Architect (Feb 2009 — Jan 2010)
- Technical leadership over multiple projects and Technical Leads
using Web, Storage and Serving technologies at large scale.
Designing major projects from scratch with global reach,
scaling as needed, with best of breed storage and search technology.
Architect of Yahoo! Local
serving local event and business listings integrated with maps
and geo/local search.
- Software Architect (Jul 2007 — Feb 2009)
- Technical leadership over multiple projects and Technical Leads
using Web, Database, XML, Semantic/Natural Language and Semantic Web and other novel technologies.
Designing software architectures, large scale deployments and developing
the long term technical plans and visions. Participating in
company-wide leading-edge technological developments and plans.
- Principal/Senior Software Engineer (Oct 2005-Jul 2007)
- Technical lead on projects using Web and Semantic Web technologies.
Designing web APIs and implementing them in PHP and C. Moved
RDF via the Redland libraries into a key
technology for managing Yahoo! content and metadata.
- 2000 — October 2005: University of Bristol, UK
- Senior Technical Researcher, technical leader, IEMSR Project (Aug 2004-Oct 2005)
- Management and administration: responsibilities including project technical direction, project team management, co-leading ILRT Web Futures Group including bidding for funding.
- Worked on the W3C RDF Data Access Working Group developing the SPARQL RDF query language (2004-).
- Java development with Eclipse, SWT and JFace.
- Senior Technical Researcher, SWAD Europe (Dec 2002-Oct 2004)
- Ran development, outreach and workshops for SWAD Europe
- Designed and developed the portable Redland RDF API, Raptor RDF parser and Rasqal RDF query libraries
- Worked on the W3C RDF Core Working Group (WG) editing two W3C Recommendations
- Participated in many RDF developer communities and activities
- Built Web Search Environments (WSE) novel web crawling/metadata system
- 1998 — 2000: University of Kent at Canterbury, UK
- Research Fellow
- UK Mirror Service (UKMS): designed, implemented and operated.
- Created the UKMS metadata, search, web mirroring and logging systems.
- Extensive Linux and Solaris administration.
- Created the premier online RDF Resource Guide (1998-present)
- Operated and maintained the database-driven department web site.
- 1990 — 1998: University of Kent at Canterbury, UK
- Computing Officer
- Parallel computing with INMOS Transputers, Meiko, occam language
- Support Parallel Computing/HPC service center for south east UK
- Created and operated the Internet Parallel Computing Archive (IPCA) (1993-1998).
- Participated in the Dublin Core Metadata Initiative (1995-)
Education
- 1987-1990, University of Bristol
- BSc (Hons) Degree in Computer Science
Selected publications
- Boosting Hadoop* Performance and Cost Efficiency with Caching, Fast SSDs, and More Compute, Dave Beckett, Matt Singer, Millind Damle, Rakes Radhakrishnan, Barrie Wheeler, white paper with Intel, 2019 (PDF copy)
- Turtle - Terse RDF Triple Language, W3C Recommendation. Edited by Eric Prud'hommeaux and Gavin Carothers. Co-authored with Sir Tim Berners-Lee, Eric Prud'hommeaux and Gavin Carothers, 25 February 2014
- SPARQL Query Results XML Format, Sandro Hawke (second edition editor), Dave Beckett and Jeen Broekstra (editors), W3C Recommendation, 21 March 2013.
- Semantics Through the Tag paper (slides) presented at XTech 2006, Amsterdam 19 May 2006.
- RDF/XML Syntax Specification (Revised), Dave Beckett (editor), W3C Recommendation, 10 February 2004
- RDF Test Cases, Jan Grant and Dave Beckett (editors), W3C Recommendation, 10 February 2004
- SWAD Europe deliverable report on Workshop on Semantic Web Storage and Retrieval, held 13-14 November 2003 at Vrije Universiteit, Amsterdam. 12 January 2004
Selected presentations and events
- Accelerating Hadoop at Twitter with NVMe SSDs: A Hybrid Approach - Matthew Singer, Dave Beckett, Varun Sampat, Mark Schonbach at Intel Flash Memory Summit, 2019. PDF.
- Why Twitter moved its big data into Google Cloud with Derek Lyon. Promotional video for Google Cloud, 10 Aug 2018.
- How @TwitterHadoop Chose Google Cloud
presentation and video of presentation at Google Cloud Next 2018 with Derek Lyon. Videoed with Derek for promotion that originally appeared at cloud.google.com/twitter but is now at the Internet Archive
- Moving the Twitter Hadoop Elephant Partly on Clouds
Cloud Next 2020 was cancelled, but I had an accepted presentation on the SRE work for the above
- Screencast video: Command Line Semantic Web with Redland presented at the Semantic Web Austin Meetup during SXSW, 15 March 2010.
- Open Source Semantic Web, Semantic Technology Conference 2009 open source Code Camp, 14 June 2009.
- Invited keynote panel speaker, Semantic Technology Conference, San Jose, May 2007
- Redland, Raptor and Rasqal - Open Source RDF in C, Perl, Python, PHP, Ruby, Tcl, Java and C#, invited talk at XMLOpen, Cambridge, 21-23 September 2004
- Invited participant to speak on the semantic web at the Rueschlikon conference on information policy in the New Economy, organised by the John F. Kennedy School of Government, Harvard University, sponsored by The Rueschlikon Centre for Global Dialogue, Switzerland, 19-21 June 2003
- Semantic Web Technologies for UK HE and FE Institutions (session details), Invited lecture given at Institutional Web Management Workshop 2003, University of Kent, Canterbury, 12 June 2003
- Semantic Web Today, invited lecture in Electronic Commerce and New Media series, Department of Information Systems, Vienna University of Economics and Business Administration, Austria, 21 May 2001.