EN: The presentation will be in English. Info in
English below.

 

The next meetup of the Prague Czech Java User Group
takes place on Monday, 27 May, from 19:00 in lecture
hall S5 at the Faculty of Mathematics and Physics of
Charles University, Malostranské náměstí 25, Praha 1.
Admission to CZJUG events is free and no advance
registration is needed. If you plan to come, let us
know by joining this event on Facebook:
https://www.facebook.com/events/898763063796125/

 

Time: 27 May 2019, 19:00

Title: CZJUG Praha – A look under the hood of H2O – machine
learning for developers

Venue: Lecture hall S5 at the Faculty of Mathematics
and Physics of Charles University, Malostranské
náměstí 25, Praha 1

 

Refreshments for this meetup are provided by H2O.ai, for which we thank them very much.

 

CZJUG is also regularly supported by:

  • Avast – recording of the talks for online
    viewing

  • JetBrains – developer tool licences for
    speakers

  • others: Oracle, the www.java.cz portal

 

EN: The next meetup takes place on Monday, 27 May,
at 19:00 in room S5 on the 2nd floor of the Charles
University building at Malostranské náměstí 25,
Praha 1.


Title: Productionizing H2O Models with Apache
Spark

 

Spark pipelines represent a powerful concept to
support productionizing machine learning workflows.
Their API lets users combine data processing with
machine learning algorithms and opens opportunities
for integration with various machine learning
libraries. However, to benefit from the power of
pipelines, users need the freedom to choose and
experiment with any machine learning algorithm or
library. Therefore, we developed Sparkling Water,
which embeds the H2O machine learning library of
advanced algorithms into the Spark ecosystem and
exposes them via the pipeline API.
Furthermore, the algorithms benefit from H2O MOJOs
(Model Object, Optimized), a powerful concept shared
across the entire H2O platform for storing and
exchanging models. MOJOs are designed for effective
model deployment, with a focus on scoring speed,
traceability, exchangeability, and backward
compatibility. In this talk we will explain the
architecture of Sparkling Water, with a focus on its
integration with Spark pipelines and MOJOs. We’ll
demonstrate how to create pipelines that integrate
H2O machine learning models and how to deploy them
using Scala or Python.

 

Speaker: Jakub Háva

 

Jakub (or “Kuba” as we call him) completed his
Bachelor’s Degree in Computer Science and Master’s
Degree in Software Systems at Charles University in
Prague. During his master’s degree studies, he
developed a cluster monitoring tool for JVM-based
languages that makes debugging distributed systems
and reasoning about their performance easier, using a
concept called distributed stack traces. Kuba enjoys
dealing with problems and learning new programming
languages. At H2O.ai, Kuba leads development of the
Sparkling Water project and takes care of integrating
Sparkling Water with the rest of the H2O.ai
ecosystem.

 

Title: H2O internals from the technical point of view

 

H2O-3 is an open-source machine learning platform
made to be scalable and fast. While providing
interfaces familiar to data scientists (Python, R,
Scala, Web UI and others), H2O-3 itself is
implemented in Java. It contains many of the most
popular machine learning algorithms, including
Gradient Boosting Machines, XGBoost, Generalized
Linear Models, Deep Learning and much more. It is a
distributed, scalable platform: users can start by
simply running it on their laptops with minimal
requirements and then take it to the cloud, running
large H2O clusters and processing vast amounts of
data. The talk opens with an introduction to H2O’s
features and mission to illustrate the challenges
faced when implementing such a system. A look under
H2O’s hood follows, revealing some of the internal
mechanisms used to make machine learning algorithms
distributed and fast, and the challenges that brings.
Finally, H2O is not an isolated island floating in
the waters of machine learning. A lot of engineering
effort goes into integration with other systems, such
as databases, file systems and distributed computing
platforms. The resulting models must also be
productionized. There will be a guided tour through
the engineering of these parts, focusing on the
challenges introduced by the distributed nature of
the system.

 

Speaker: Pavel Pscheidl

 

Pavel is a machine learning engineer at H2O. He holds
a master’s degree in Applied Informatics from the
Faculty of Informatics, UHK, where his studies focused
on applied statistics and stochastic methods,
agent-based simulations and optimization. He joined a
research team as a Ph.D. candidate, working on
problems such as the effectiveness of fraud detection
methods in highly distributed systems. With his roots
in computer science, his commercial focus was on
enterprise Java systems and related standards. He is
an author of the book Java EE 8 Microservices. In
2017, Pavel joined H2O’s awesome team, abandoning all
other activities. At H2O, he is proud to apply his
passion for algorithms and optimization while diving
deeper into many other fields, including statistics,
by learning from the all-star H2O team.