Great Taxi Data Demo with FLaNK

My friend, Ian Brooks, just wrote an amazing demo utilizing the FLaNK Stack.
It uses Apache NiFi to read NYC Taxi data (CSV), preprocess and transform it, and then publish it to an Apache Kafka topic. An Apache Flink SQL streaming job reads the Kafka events, enriches them, and publishes them to another Apache Kafka topic. Finally, Apache NiFi consumes the enriched events from that topic.
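To make the flow concrete, here is a minimal sketch in plain Python of the same three stages: read CSV, preprocess each record, and enrich it before handing it on as JSON. It is not Ian's code and it does not use NiFi, Kafka, or Flink. The column names, the sample rows, and the `fare_per_mile` enrichment are all assumptions made up for illustration.

```python
import csv
import io
import json

# Hypothetical sample rows; the column names are assumptions, not taken from the demo.
SAMPLE_CSV = """vendor_id,pickup_datetime,passenger_count,trip_distance,fare_amount
1,2020-01-01 00:28:15,1,1.20,6.00
2,2020-01-01 00:35:39,2,0.00,3.50
"""


def preprocess(row):
    """Cast the string fields of one CSV row to typed values (stand-in for the NiFi step)."""
    return {
        "vendor_id": int(row["vendor_id"]),
        "pickup_datetime": row["pickup_datetime"],
        "passenger_count": int(row["passenger_count"]),
        "trip_distance": float(row["trip_distance"]),
        "fare_amount": float(row["fare_amount"]),
    }


def enrich(event):
    """Add a derived field (stand-in for the Flink SQL enrichment; the field is hypothetical)."""
    enriched = dict(event)
    distance = event["trip_distance"]
    # Avoid dividing by zero for trips that report no distance.
    enriched["fare_per_mile"] = (
        round(event["fare_amount"] / distance, 2) if distance > 0 else None
    )
    return enriched


def run_pipeline(csv_text):
    """Return one JSON string per input row, as it might be published to a topic."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [json.dumps(enrich(preprocess(row))) for row in reader]


if __name__ == "__main__":
    for message in run_pipeline(SAMPLE_CSV):
        print(message)
```

In the real demo each stage runs as its own service and Kafka topics sit between them, so the list returned by `run_pipeline` only stands in for the messages on the second topic.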
The next step is to store both of these feeds in Apache Kudu (or another datastore in CDP, such as Hive, Impala (Parquet), HBase, Druid, or HDFS/S3) and then write some queries and reports on top with, say, DAS, Hue, Zeppelin, or Jupyter. After that, we could do some additional machine learning with CML and write a visual application in CML.
Really cool app and great use of Flink SQL!
Also, please note that he developed and ran all of this in IntelliJ. That is a nice use of local tools, and we can then push the final app to a large cloud or Kubernetes-hosted cluster like Cloudera Data Platform.

#FLaNKStack #ApacheFlink #ApacheNiFi #ApacheKafka

