OUR TRAINING
BigQuery
Two-day training: 70% theory + 30% practice
BigQuery is a serverless, highly scalable and cost-effective enterprise data warehouse, available across multiple cloud platforms, including Google Cloud Platform. Google's enterprise data warehouse was designed to make large-scale data analysis accessible to everyone, from data-savvy profiles such as data engineers and data scientists to less technical profiles such as report and dashboard builders and business analysts.
We'll show you how BigQuery can help you get valuable insights from your data with ease, whether it is log data from thousands of retail systems or IoT data streams from millions of vehicle sensors.
If your company is on the road to becoming data-driven, it is probably analysing large volumes of data sitting in a data warehouse or in Hadoop clusters, either on-premises or in the cloud. Very quickly, you'll run into problems with data and usage growth, scalability, maintenance, upgrades and licensing costs. If you have trouble finding the necessary skills and budgets for managing a scalable setup, then a cloud data platform like BigQuery may be the answer. Say goodbye to data silos, huge licensing costs, infrastructure maintenance and upgrades, and start analysing all of your data.
BigQuery's serverless architecture decouples compute and storage, which allows the different layers of the architecture to perform and scale independently and gives data developers flexibility in design and deployment.
This two-day workshop shows you how and why BigQuery has become one of the best platforms for analysing and learning from data. Powerful features include Standard SQL, deeply nested data, user-defined functions in JavaScript and SQL, geospatial data, integrated machine learning, URL-addressable data sharing, federated queries, integration with other Google products, ... just to name a few. All these features allow you to implement the self-service, ad-hoc data exploration that many users demand.
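As a small taste of what the workshop covers, the sketch below runs a Standard SQL query against one of Google's public datasets using the BigQuery Python client. It assumes the google-cloud-bigquery package is installed and that default credentials and a default project are already configured; the query itself is just an illustrative example.

```python
# Minimal sketch: a Standard SQL query executed through the BigQuery Python client.
# Assumes google-cloud-bigquery is installed and default credentials / project are set up.
from google.cloud import bigquery

client = bigquery.Client()  # picks up the default project from the environment

query = """
    SELECT name, SUM(number) AS total
    FROM `bigquery-public-data.usa_names.usa_1910_current`
    WHERE state = 'TX'
    GROUP BY name
    ORDER BY total DESC
    LIMIT 10
"""

# BigQuery runs the query on its serverless backend; the client simply streams back the rows.
for row in client.query(query).result():
    print(f"{row.name}: {row.total}")
```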
During this workshop, we will use Google's Colaboratory, or 'Colab' for short, as our development environment. Colab allows you to write and execute SQL and Python in your browser without installing and configuring any software, while offering free access to GPUs and really easy sharing. Think of Colab as a Jupyter notebook stored in Google Drive. Whether you're a student, a data scientist or an AI researcher, Colab can make your work easier. Watch Introduction to Colab to find out more.
For this workshop, we will set up a personal BigQuery / Colab development environment in which you can execute the prepared exercises in the browser and experiment with your own code and data. Many exercises are based (with permission) on the fantastic Google BigQuery: The Definitive Guide book by Valliappa "Lak" Lakshmanan.
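As a rough sketch of what that setup looks like, a Colab notebook can be connected to BigQuery in a single cell; the project ID below is a placeholder you would replace with your own, and the sample query is just an illustration.

```python
# Sketch of a Colab setup cell, assuming you have access to a Google Cloud project.
# 'your-project-id' is a placeholder for your own project ID.
from google.colab import auth
auth.authenticate_user()            # interactive Google sign-in inside the notebook

%load_ext google.cloud.bigquery     # enables the %%bigquery cell magic

# A follow-up cell can then contain plain Standard SQL, for example:
# %%bigquery --project your-project-id
# SELECT word, SUM(word_count) AS occurrences
# FROM `bigquery-public-data.samples.shakespeare`
# GROUP BY word
# ORDER BY occurrences DESC
# LIMIT 5
```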
Google BigQuery was recognised by Gartner as a leader in the Magic Quadrant for Cloud Database Management Systems (23 November 2020).
APACHE SPARK
Two-day training: 60% theory + 40% practice
Big Data is the hype of the moment in ICT and marketing. Since its inception in 2006, Apache Hadoop has been regarded as the de facto standard for the storage and processing of big data volumes in batch.
But every technology has its limitations, and this is no different for Hadoop: it is batch-oriented and the MapReduce framework is too limited for handling all types of data analysis within the same technology stack.
Apache Spark makes big data easy to work with. It was developed in 2009 at the AMPLab (Algorithms, Machines, and People Lab) of the University of California, Berkeley, and donated to the open-source community in 2010. It is faster than Hadoop, in some cases 100 times faster, and it offers a framework that supports different types of data analysis within the same technology stack: fast interactive queries, streaming analysis, graph analysis and machine learning. During this two-day hands-on workshop, we discuss the theory and practice of several data analysis applications.
Notebook technologies like Zeppelin, Jupyter, Spark Notebook and Databricks Cloud allow you to go from prototype to production workflow in one go. Notebooks let you implement “reproducible research” by mixing executable code with comments, images, tables, links, …
We’ve chosen Databricks Cloud as our notebook technology because it is the most mature, enterprise-ready notebook environment on the market at this moment. It’s available on AWS and Azure.
This course supports Spark 2.x using Python & Scala.
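To give a feel for the hands-on part, here is a minimal PySpark sketch with invented data; the same SparkSession drives the DataFrame API, SQL, streaming and MLlib, which is exactly the "one technology stack" argument made above.

```python
# Minimal PySpark sketch (Spark 2.x): one SparkSession, several APIs.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("workshop-demo").getOrCreate()

# A small in-memory DataFrame standing in for a real dataset.
sales = spark.createDataFrame(
    [("BE", 120.0), ("NL", 80.0), ("BE", 45.0), ("FR", 60.0)],
    ["country", "amount"],
)

# The same aggregation expressed with the DataFrame API ...
sales.groupBy("country").agg(F.sum("amount").alias("total")).show()

# ... and with Spark SQL on a temporary view.
sales.createOrReplaceTempView("sales")
spark.sql("SELECT country, SUM(amount) AS total FROM sales GROUP BY country").show()
```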
KSTREAM AND KSQL
Available soon
Two-day training: one day of theory, plus an additional day of exercises on demand
Apache Kafka is an open-source stream-processing software platform. It is used for many use cases where data needs to flow in real time and be readily available.
KStreams (the Kafka Streams library) and KSQL are technologies for building streaming applications. These applications transform input Kafka topics into output Kafka topics. Kafka Streams lets you do this with concise code in a way that is distributed and fault-tolerant, while KSQL lets you express the same transformations in a SQL-like language, without writing any code.
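Kafka Streams is a JVM library and KSQL is a SQL dialect, so the Python sketch below (using the confluent-kafka client, with placeholder broker and topic names) only illustrates the underlying idea of a streaming application: read from an input topic, transform each record, write to an output topic.

```python
# Illustration only: read from an input topic, transform, write to an output topic.
# Broker address and topic names are placeholders.
from confluent_kafka import Consumer, Producer

consumer = Consumer({
    "bootstrap.servers": "localhost:9092",
    "group.id": "uppercase-demo",
    "auto.offset.reset": "earliest",
})
producer = Producer({"bootstrap.servers": "localhost:9092"})

consumer.subscribe(["input-topic"])

while True:
    msg = consumer.poll(1.0)
    if msg is None or msg.error():
        continue
    # The "stream processing" step: here simply upper-casing the payload.
    transformed = msg.value().decode("utf-8").upper().encode("utf-8")
    producer.produce("output-topic", value=transformed)
    producer.poll(0)  # serve delivery callbacks without blocking
```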
More details about the content of this course will follow soon.
AI FOR BUSINESS
Half-day seminar for business people
This seminar brings you up to speed on the state of the art in artificial intelligence and offers you a guided tour through the fascinating world of automation, (chat)bots, data mining, neural networks, (un)supervised learning, machine learning, deep learning and data science.
AI and its subdomain Data Science are in the news almost every day: from how “sexy” the job of a data scientist is to the “infinite” possibilities of Artificial Intelligence.
What do the internet giants like GAFA do with AI?
Beneath all the hype that surrounds artificial intelligence (AI), automation and data science, real breakthroughs in AI are happening at this moment, and they are transforming the way we do business. AI developers are creating software that doesn’t just do what it is programmed to do, but is able to anticipate the needs of customers and users through a combination of pattern recognition, knowledge mining, planning and reasoning.
How do you tackle your AI projects? What kind of data architecture is optimal?
This seminar explains how Artificial Intelligence evolved throughout its “winters” into the narrow AI we all use in our daily lives. Some studies predict that AI will change our jobs and our economy in a huge way. Even today, companies can already plug into AI services from the cloud and start augmenting their employees.
Data Science is a subdomain of Artificial Intelligence that is already widely adopted by large organisations. It helps those organisations define the next best offer for their customers, predict which customers are most likely to churn, segment their customer base, and so on. We will give an overview of what it means to be a data scientist, how data science can be used for business and how to start adopting data science within your organisation.
HADOOP
Two-day training: one day of theory + one day of practice; can be shortened to half a day on demand.
We still offer this course for legacy reasons.
The rise of the internet, social media and mobile technologies, and in the very near future the Internet of Things, ensures that our data footprint is growing fast.
Companies like Google and Facebook were quickly confronted with massive data sets, which led to a new way of thinking about data. Hadoop provides an open-source solution based on the same technology used within Google. It allows you to store and analyse huge amounts of data in a scalable way and create new insights.
With this workshop we want to give everyone the opportunity to get acquainted with the Hadoop Ecosystem.