Showing posts from April, 2020

Great Taxi Data Demo with FLaNK

My friend Ian Brooks just wrote an amazing demo using the FLaNK Stack: https://github.com/BrooksIan/Flink2Kafka

It uses Apache NiFi to read NYC Taxi data (CSV), preprocess and transform it, and then publish it to an Apache Kafka topic. An Apache Flink SQL streaming job reads those Kafka events, enriches them, and publishes them to another Apache Kafka topic. Finally, Apache NiFi consumes the enriched events from that second topic.

The next step is to store both of these feeds in Apache Kudu (or another datastore in CDP, such as Hive, Impala (Parquet), HBase, Druid, or HDFS/S3) and then write some queries and reports on top with DAS, Hue, Zeppelin or Jupyter. After that comes some additional machine learning with CML and a visual application written in CML.

Really cool app and a great use of Flink SQL! Also note that he developed and ran all of this in IntelliJ, a nice use of local tools; we can then push the final app to a large cloud or Kubernetes (K8s) hosted cluster like Cloude