Predictive Analytics with Big Data Technologies
The objective of this project was to build a distributed ETL data flow for an aviation maintenance business case. The original dataset consists of maintenance and flight data from a real airline (extracted from AIMS and AMOS). The goal was to clean the data and extract the KPIs needed to train a model that predicts whether a specific airplane requires maintenance.
All code was written with distributed execution in mind, using Python and PySpark, the Spark Python API. Prediction was done with MLlib.
The final result is a script that runs from a terminal. It can train a model on data extracted online and predict on demand from user input, with several configurable settings.
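As a rough illustration of how a terminal script with separate training and prediction modes might be organized, here is a minimal sketch using Python's standard argparse module. It is not the project's actual interface: the subcommand names (train, predict), the flags (--model-path, --max-depth, --aircraft, --date) and the returned messages are all hypothetical, and the Spark and MLlib work is replaced by placeholder strings.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a hypothetical CLI with 'train' and 'predict' subcommands."""
    parser = argparse.ArgumentParser(
        description="Illustrative maintenance-prediction CLI (all names are assumptions)."
    )
    subparsers = parser.add_subparsers(dest="command", required=True)

    # Hypothetical training mode: would run the ETL flow and fit the model.
    train = subparsers.add_parser("train", help="Run the ETL flow and fit a model.")
    train.add_argument("--model-path", default="model", help="Where to save the model.")
    train.add_argument("--max-depth", type=int, default=5, help="Example hyperparameter.")

    # Hypothetical prediction mode: would load the model and score one input.
    predict = subparsers.add_parser("predict", help="Predict for one aircraft and date.")
    predict.add_argument("--aircraft", required=True, help="Aircraft identifier.")
    predict.add_argument("--date", required=True, help="Date to predict for.")
    predict.add_argument("--model-path", default="model", help="Where to load the model from.")
    return parser


def main(argv=None) -> str:
    """Parse the arguments and return a message describing the requested action."""
    args = build_parser().parse_args(argv)
    if args.command == "train":
        # Placeholder for the real ETL and training steps.
        return f"training model (max depth {args.max_depth}), saving to {args.model_path}"
    # Placeholder for loading the model and scoring the user input.
    return f"predicting maintenance for {args.aircraft} on {args.date} using {args.model_path}"
```

Calling main(["predict", "--aircraft", "XY-ABC", "--date", "2015-06-01"]) returns the prediction message for that made-up aircraft identifier. Adding print(main()) at the bottom of the file would let the same function read its arguments from the terminal.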
The code and additional information on how to use the tool can be found in this GitHub repository.