Jazz 2020
Automated code generation using Jazz, the lightweight data processing framework
Open Expo Europe, June 2020
Open source software released by BBVA Data & Analytics
1. Introduction to Jazz
2. The ARC Challenge
3. How Big is Big?
4. Code Generation in Nature
5. Formal Fields
6. The Present State of Jazz
Jazz in 2018
• Efficiency is never wrong
• Key-value storage on a memory-mapped file (LMDB)
• Built-in HTTP client and server (libcurl, µhttpd)
• Highly efficient data structures and containers
• One multithreaded process, the same in every node
• Bebop, a language for the backend (was a project in 2018)
Introduction to Jazz
Big Thanks to
Releasing Jazz as OSS shows a lot of courage, vision and commitment.
Digital transformation is not just about creating great apps; it is a fundamental change in the relationship with people.
The problem for our solution
Automated code generation
Automated Code Generation
Is NOT: Expressing ideas in a different formal language, or expressing them using a diagram, and compiling into an executable form.
IS: Creating a formal (unambiguous, executable) piece of code to solve a problem. The problem can be expressed in human language (ambiguous, inconsistent, assuming prior knowledge, etc.).
Some recent results

Title                                           Author, Year                     Affiliation
Neural Turing Machines (+ more)                 Graves++, 2014-2018              Google DeepMind, London, UK
Reinforcement Learning Neural Turing Machines   Zaremba & Sutskever, 2015-2019   Facebook AI Research & Google Brain
Stoke (5 papers)                                Schkufza++, 2013-2016            Stanford
Prose Team (2 papers)                           Gulwani++, 2011-2018             Microsoft Research
Sketch (3 papers)                               Solar-Lezama, 2008-2013          MIT
… many more …                                   many, 2011-2020                  Dawn Song (Berkeley), Quoc Le (Google Brain),
                                                                                 Reed & de Freitas (Google DeepMind),
                                                                                 Joulin (Facebook AI Research), Yale-NUS (Yale),
                                                                                 UTOPIA Research Group (University of Texas), ...
The ARC Challenge
Class 1 (of 7): Macroscopic
How Big is Big?

Computer resources by the €            Clock cycles          Evaluations (@ 1M/sec)
1 euro                                 4 x 10^13             4 x 10^10
1 million euros                        4 x 10^19             4 x 10^16
1 billion euros                        4 x 10^22             4 x 10^19

GDP of the whole planet                8.6 x 10^13 (USD)
Age of the universe                    1.4 x 10^10 (years)   4.3 x 10^17 (sec)
Weight of all biomass of the planet    5.6 x 10^11 (ton)     5.6 x 10^14 (kg)

Human population
Human population                       7.8 x 10^9
Doing something 1000 times (taking pictures, owning money, eating, …)   7.8 x 10^12
Doing something a million times        7.8 x 10^15
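The two compute columns can be cross-checked with a few lines of arithmetic. The sketch below uses only the figures quoted on the slide; the derived quantities (about 1000 clock cycles per evaluation, and the compute time a budget buys at 1M evaluations per second) are inferences from those figures, not statements from the talk.

```python
# Figures quoted on the slide; everything derived from them is an inference.
CYCLES_PER_EURO = 4e13
EVALS_PER_EURO = 4e10
EVALS_PER_SECOND = 1e6


def budget(euros):
    """Return (clock cycles, evaluations, seconds of compute) for a budget in euros."""
    evals = euros * EVALS_PER_EURO
    return euros * CYCLES_PER_EURO, evals, evals / EVALS_PER_SECOND


# Ratio between the two columns of the table
cycles_per_eval = CYCLES_PER_EURO / EVALS_PER_EURO

for euros in (1, 1e6, 1e9):
    cycles, evals, seconds = budget(euros)
    print(f"{euros:>13,.0f} EUR: {cycles:.0e} cycles, "
          f"{evals:.0e} evaluations, {seconds / 3600:,.0f} hours")
```

Under these assumptions one euro corresponds to 40,000 seconds (roughly 11 hours) of evaluation at the quoted rate.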
Class 2 (of 7): The Universe
Number of particles in the universe: 10^80
Class 3 (of 7): Combinatorial
• Make a sequence of decisions taken from a limited set (choose words, play games, type text, do complex manipulation, …)
• Fit a large model
• Generate images
• Code
• …
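To get a feel for how quickly a sequence of decisions outgrows the macroscopic budgets, the sketch below computes the longest sequence over a fixed alphabet that could be enumerated exhaustively within a given number of evaluations. The 26-symbol alphabet is an assumption made for illustration; the budgets are the evaluation counts quoted for 1 euro and 1 billion euros.

```python
def max_exhaustive_length(alphabet_size, evaluations):
    """Longest n such that all alphabet_size**n sequences fit in the budget."""
    n = 0
    while alphabet_size ** (n + 1) <= evaluations:
        n += 1
    return n


ALPHABET = 26                       # assumption: lowercase letters only
ONE_EURO = 4 * 10**10               # evaluations, from the macroscopic table
ONE_BILLION_EUROS = 4 * 10**19      # evaluations, from the macroscopic table

print(max_exhaustive_length(ALPHABET, ONE_EURO))           # 7
print(max_exhaustive_length(ALPHABET, ONE_BILLION_EUROS))  # 13
```

Multiplying the budget by a billion does not even double the length of text that can be brute forced, which is why combinatorial problems need structure and feedback rather than enumeration.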
Class 4 (of 7): Formal
A(4) = 4!!!! cannot be computed on Earth
B(9) = A(A(A(A(A(A(A(A(A(9))))))))) cannot be computed using the whole universe as a computer
C(9) = B(B(B(B(B(B(B(B(B(9))))))))) is still a finite number
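A rough calculation shows why A(4) is already out of reach. The sketch below assumes that 4!!!! means four nested factorials, (((4!)!)!)!, rather than the multifactorial that the same notation sometimes denotes; that reading is an assumption, since the slide does not define A. The first two factorials are exact; the third is only estimated through the log-gamma function.

```python
import math

f1 = math.factorial(4)     # 4! = 24
f2 = math.factorial(f1)    # (4!)! = 24!, still an exact integer

# ((4!)!)! is far too large to store, so estimate its size instead:
# log10(x!) = lgamma(x + 1) / ln(10)
digits_f3 = math.lgamma(float(f2) + 1.0) / math.log(10)

print(f1)
print(f2)
print(f"((4!)!)! has about {digits_f3:.2e} decimal digits")
```

After three factorials the number of digits alone is on the order of 10^25; the fourth factorial, which is what A(4) asks for under this reading, takes that number as its input.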
Classes (5, 6, 7): Countable, Continuum and the Power Set of ℝ
Countable:
• The set of all integers
• The set of rational numbers
• The set of tuples of integers
. . .
Continuum:
• The set of all real numbers
• The set of tuples of real numbers
. . .
Power Set of ℝ:
• The set of all sets of real numbers
• The set of functions ℝ → ℝ
. . .
Combinatorial and Formal
Code Generation in Nature
• Both classes are bigger than anything material, including the universe. Only trivial combinatorial problems are small enough to be “brute forced” (e.g., tic-tac-toe).
• We know how to search the class of combinatorial problems, provided there is some structure and we are not searching for a “needle in a haystack”.
• We don’t know how to search the class of formal problems. Without some limitation, it is “just too big”.
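Tic-tac-toe is a useful reference point for what “small enough to brute force” means. The following sketch, which is only an illustration and not part of Jazz, enumerates every possible game by plain recursion and counts them.

```python
WIN_LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
             (0, 3, 6), (1, 4, 7), (2, 5, 8),
             (0, 4, 8), (2, 4, 6)]


def winner(board):
    """Return 'X' or 'O' if a line is complete, else None."""
    for a, b, c in WIN_LINES:
        if board[a] is not None and board[a] == board[b] == board[c]:
            return board[a]
    return None


def count_games(board, player):
    """Count all move sequences that end in a win or a full board."""
    if winner(board) is not None or all(cell is not None for cell in board):
        return 1
    total = 0
    for i in range(9):
        if board[i] is None:
            board[i] = player
            total += count_games(board, "O" if player == "X" else "X")
            board[i] = None   # undo the move before trying the next one
    return total


games = count_games([None] * 9, "X")
print(games)   # 255168
```

A few hundred thousand games is far below the roughly 4 x 10^10 evaluations quoted for one euro, which is why this problem counts as trivial while most combinatorial problems do not.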
The origin of everything (before code)
1. Matter: Higgs mechanism, 1960-2012
2. Protons, e⁻: Particle Physics, 1950-1970s
3. Atoms: Nuclear Physics, 1906-1950s
4. Molecules: Chemistry, 19th century (1869, Mendeleev)
5. Biomolecules: Miller-Urey, 1952
The origin of everything (code)
We understand how code works in nature, but we don’t know how, when or even where it originated.
The origin of everything (after code)
Lamarck & Darwin, 19th century: “Theory of Evolution”
Watson & Crick, 1953 + others: Science of Evolution
Important Ideas
1. If automatic code creation were impossible, we would not exist.
2. It is not as hard as solving abiogenesis. We don’t create code out of nothing; we:
● Copy existing short code items
● Mutate code items
● Fit arguments to existing code items
● Recombine items to form new snippets
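The four operations in this list can be sketched on a toy representation. Everything below is hypothetical: items are modelled as (opcode, argument) pairs over three arithmetic opcodes invented for the example, and the fitting step is a plain search over a small range of integers. It is not the algorithm used in Jazz.

```python
import random

# Hypothetical opcodes for illustration; not Bebop's actual instruction set.
OPCODES = {
    "add": lambda x, arg: x + arg,
    "mul": lambda x, arg: x * arg,
    "sub": lambda x, arg: x - arg,
}


def run(snippet, x):
    """Apply each (opcode, argument) item once, in sequence."""
    for opcode, arg in snippet:
        x = OPCODES[opcode](x, arg)
    return x


def copy_item(item):
    """Copy an existing short code item."""
    return tuple(item)


def mutate_item(item, rng):
    """Swap the opcode for a different one, keeping the argument."""
    opcode, arg = item
    return (rng.choice([op for op in OPCODES if op != opcode]), arg)


def fit_argument(item, x, target, candidates=range(-10, 11)):
    """Keep the opcode and pick the argument that lands closest to the target."""
    opcode, _ = item
    best = min(candidates, key=lambda a: abs(OPCODES[opcode](x, a) - target))
    return (opcode, best)


def recombine(*items):
    """Chain items into a new snippet."""
    return list(items)


rng = random.Random(0)
seed_item = ("add", 1)
mutated = mutate_item(seed_item, rng)                         # same argument, new opcode
doubled = fit_argument(("mul", 0), x=3, target=6)             # ('mul', 2)
shifted = fit_argument(copy_item(seed_item), x=6, target=10)  # ('add', 4)
snippet = recombine(doubled, shifted)
print(snippet, run(snippet, 3))   # [('mul', 2), ('add', 4)] 10
```

None of the steps writes code from scratch: each starts from an item that already exists, which is the point of the slide.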
Forms of code (1 of 2)
Forms of code (2 of 2)
Takeaways from code in Nature
1. Code is a sequence run once
● It can stop anytime (error is a result)
● Conditionals by “inhibition” (arguments, not jumps)
2. Code has structure
● Primary structure (opcodes & types)
● Secondary (items are evaluated)
● Tertiary (snippets have a goal)
Putting it all Together
• In general, code generation belongs to the class of formal problems. Even if some problems in that class can be written simply (low Kolmogorov complexity), anything in that class is “just too big” and non-computable.
• Nature solves this problem by creating code that executes in sequence just once and does conditionals through inhibition or in translation.
• Accepting some (apparent) limitations, we can still have Turing-complete code snippets and search for them in the class of not-too-big combinatorial problems with feedback and structure.
Formal Fields
Formal Fields is a framework to automate code generation across domains using the same algorithms and language grammar.
Kind: A set of types used in source and destination
Formal field: A source and destination + a domain language
Relation: A field + a code base
Prior: Value of a code item from previous experience
Evaluation: A vector with intermediate goals for an item
Reward: A value based on evaluation, used in search
Formal fields are intended to enable multi-domain lifelong learning systems.
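One way to read these definitions is as a handful of plain data structures. The sketch below is hypothetical: the class names follow the slide, but every attribute name, the item encoding and the choice of the mean as a reward are assumptions made for illustration and are not taken from the Jazz code base.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Item = Tuple[str, tuple]      # (opcode name, arguments); hypothetical encoding
Snippet = List[Item]


@dataclass(frozen=True)
class Kind:
    """A set of types used in source and destination."""
    types: frozenset


@dataclass
class FormalField:
    """A source and destination plus a domain language."""
    source: Kind
    destination: Kind
    language: frozenset       # opcode names available in this domain


@dataclass
class Relation:
    """A field plus a code base of working snippets."""
    formal_field: FormalField
    code_base: List[Snippet] = field(default_factory=list)
    # Prior: value of each code item from previous experience
    prior: Dict[Item, float] = field(default_factory=dict)


def reward(evaluation: List[float]) -> float:
    """Collapse the vector of intermediate goals into one value for search.

    The mean is an arbitrary choice made for this sketch.
    """
    return sum(evaluation) / len(evaluation)


grid = Kind(frozenset({"grid"}))
arc_field = FormalField(source=grid, destination=grid,
                        language=frozenset({"flip", "fill"}))
relation = Relation(arc_field)
relation.code_base.append([("flip", ()), ("fill", (3,))])
relation.prior[("flip", ())] = 0.8
print(reward([1.0, 0.0, 0.5]))   # 0.5
```

Keeping the prior and the code base on the relation rather than on the field mirrors the slide's wording, where a relation is a field plus what has been learned for it.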
Bebop
• One time sequence of opcodes
• Strictly typed
• Express complex domains and functions: speech, video, …
• Break as fast as possible (HCF is a result)
• Do conditionals via arguments
• Do loops by rewarding repetition

Level                 Term           Description
Primary structure     opcodes        built-in functions
Secondary structure   items          shortest sequence that can be evaluated
“                     alleles        items with the same code and different arguments
“                     isomorphisms   items with the same type sequence (form)
Tertiary structure    snippets       complete programs, source → destination
“                     code base      collection of working snippets

HCF is a core opcode.
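The execution model described above can be illustrated with a toy interpreter. This is a minimal sketch under stated assumptions, not Bebop itself: the opcode names, their types and the (status, value) result format are invented for the example, and HCF is modelled simply as an early return.

```python
# Hypothetical opcodes: name -> (input type, output type, function).
OPCODES = {
    "inc":    (int, int, lambda x, arg: x + arg),
    "scale":  (int, int, lambda x, arg: x * arg),
    "to_str": (int, str, lambda x, arg: str(x)),
    # Conditional via argument: the flag decides whether the value passes
    # through or is replaced by 0; there is no jump.
    "select": (int, int, lambda x, arg: x if arg else 0),
}


def run_once(snippet, value):
    """Run each (opcode, argument) item exactly once, in order.

    Returns ("OK", result) or ("HCF", offending opcode name).
    """
    for name, arg in snippet:
        if name not in OPCODES:            # includes an explicit "hcf" item
            return ("HCF", name)
        in_type, _out_type, fn = OPCODES[name]
        if type(value) is not in_type:     # strict typing: break as fast as possible
            return ("HCF", name)
        value = fn(value, arg)
    return ("OK", value)


print(run_once([("inc", 2), ("scale", 3), ("to_str", None)], 1))  # ('OK', '9')
print(run_once([("to_str", None), ("inc", 1)], 1))                # ('HCF', 'inc')
print(run_once([("inc", 1), ("select", False)], 5))               # ('OK', 0)
print(run_once([("inc", 1)] * 3, 0))                              # ('OK', 3)
```

The last call shows repetition standing in for a loop: the same item appears three times in the sequence, and a search that rewards repetition would be what produces such snippets.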
The Formal Fields Paper
Will be linked in the GitHub repository: https://github.com/kaalam/JazzARC
We are thrilled!
1. We found the problem for our solution
2. We experience intelligence emerging from a machine
3. We have many years ahead tackling exciting problems while building Open Source Software
The Present State of Jazz
Jazz is more important than ever!
1. Going from proof of concept to a production-level, scalable process that is as efficient as possible.
2. Building code, knowledge and a team to last many years.
Now, we are on our own, with our strengths …
• We don't just shoot at the moon, we shoot at a moon in a distant galaxy.
• We have decades of successful experience in AI to understand what works and why.
• We deliver.
• We stand on the shoulders of giants: OSS.
• We are used to wearing the running shoes.
• We enjoy doing it.
Two years ago: Jazz @ Open Expo 2018
… and weaknesses
• We still have to finish the MVP.
• We have to create a community.
• We need success stories.
• Short version: We need help.
And remember, the product is the second most important thing in any Open Source Software project.
The most important thing in any Open Source Software project is, of course … the community.
We need volunteers!
Thank you!
ARC challenge: https://github.com/kaalam/JazzARC
Development: https://github.com/kaalam/Jazz
Programming doc: https://kaalam.github.io/develop
Jazz reference: https://kaalam.github.io/jazz_reference
kaalam.ai
@kaalam_ai
