
Great Taxi Data Demo with FLaNK

My friend, Ian Brooks, just wrote an amazing demo utilizing the FLaNK Stack: https://github.com/BrooksIan/Flink2Kafka. It uses Apache NiFi to read NYC Taxi data (CSV), preprocess it, transform it, and publish it to an Apache Kafka topic. An Apache Flink SQL streaming job reads the Kafka events, enriches them, and publishes them to another Apache Kafka topic. Finally, Apache NiFi consumes the events from that topic. The next step is to store both of these feeds in Apache Kudu (or another datastore in CDP, say Hive, Impala (Parquet), HBase, Druid, or HDFS/S3) and then write some queries / reports on top with, say, DAS, Hue, Zeppelin, or Jupyter, before doing some additional machine learning with CML and writing a visual application in CML. Really cool app and great use of Flink SQL! Also, please note he developed and ran all of this utilizing IntelliJ; nice use of local tools, and then we can push the final app to a large cloud or K8s hosted cluster like Cloudera Data Platf…
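To make the Flink SQL piece concrete, here is a minimal sketch of that kind of enrichment job in Java, assuming a recent Flink version with the Kafka SQL connector. The topic names (taxi-raw, taxi-enriched), the broker address, and the taxi fields are illustrative assumptions, not the exact schema from Ian's repo:

import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class TaxiEnrichmentJob {
    public static void main(String[] args) {
        TableEnvironment tEnv = TableEnvironment.create(EnvironmentSettings.inStreamingMode());

        // Raw taxi events that NiFi published to Kafka (topic and fields are assumptions)
        tEnv.executeSql(
            "CREATE TABLE taxi_raw (" +
            "  vendor_id STRING," +
            "  trip_distance DOUBLE," +
            "  fare_amount DOUBLE" +
            ") WITH (" +
            "  'connector' = 'kafka'," +
            "  'topic' = 'taxi-raw'," +
            "  'properties.bootstrap.servers' = 'localhost:9092'," +
            "  'properties.group.id' = 'flink-taxi'," +
            "  'scan.startup.mode' = 'earliest-offset'," +
            "  'format' = 'json')");

        // Enriched events for NiFi to consume from the second topic
        tEnv.executeSql(
            "CREATE TABLE taxi_enriched (" +
            "  vendor_id STRING," +
            "  trip_distance DOUBLE," +
            "  fare_amount DOUBLE," +
            "  fare_per_mile DOUBLE" +
            ") WITH (" +
            "  'connector' = 'kafka'," +
            "  'topic' = 'taxi-enriched'," +
            "  'properties.bootstrap.servers' = 'localhost:9092'," +
            "  'format' = 'json')");

        // Continuous streaming INSERT: every raw event is enriched and republished
        tEnv.executeSql(
            "INSERT INTO taxi_enriched " +
            "SELECT vendor_id, trip_distance, fare_amount, " +
            "  CASE WHEN trip_distance > 0 THEN fare_amount / trip_distance ELSE 0 END " +
            "FROM taxi_raw");
    }
}

The INSERT INTO runs as a continuous streaming query, so every event NiFi publishes to the raw topic shows up enriched on the second topic for NiFi to pick back up.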

Introducing Mm FLaNK... An Apache Flink Stack for Rapid Streaming Development From Edge 2 AI

See: https://blog.cloudera.com/announcing-support-for-apache-flink-with-the-ga-of-cloudera-streaming-analytics/

Source:   https://github.com/tspannhw/MmFLaNK

Stateless NiFi: https://www.datainmotion.dev/2019/11/exploring-apache-nifi-110-parameters.html

To show an example of using the Mm FLaNK stack, we have an Apache NiFi flow that reads IoT data (JSON) and sends it to Apache Kafka. An Apache Flink streaming application running in YARN reads it, validates the data, and sends it to another Kafka topic; a minimal sketch of that validation job follows below. We monitor and check the data with SMM. The data from that second topic is read by Apache NiFi and pushed to Apache Kudu tables.
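As a rough sketch of what that validation step could look like with Flink's DataStream API in Java (using the Flink 1.14+ KafkaSource/KafkaSink connectors; the topic names, broker address, and required fields are hypothetical, not taken from the actual flow):

import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.connector.kafka.sink.KafkaRecordSerializationSchema;
import org.apache.flink.connector.kafka.sink.KafkaSink;
import org.apache.flink.connector.kafka.source.KafkaSource;
import org.apache.flink.connector.kafka.source.enumerator.initializer.OffsetsInitializer;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class IotValidationJob {

    private static final ObjectMapper MAPPER = new ObjectMapper();

    // Accept only well-formed JSON carrying the fields the downstream Kudu
    // table will need (the field names here are hypothetical)
    private static boolean isValid(String value) {
        try {
            JsonNode node = MAPPER.readTree(value);
            return node.has("sensor_id") && node.has("temp_c");
        } catch (Exception e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        KafkaSource<String> source = KafkaSource.<String>builder()
                .setBootstrapServers("localhost:9092")
                .setTopics("iot-raw")                      // topic NiFi publishes to
                .setGroupId("flink-iot-validator")
                .setStartingOffsets(OffsetsInitializer.earliest())
                .setValueOnlyDeserializer(new SimpleStringSchema())
                .build();

        KafkaSink<String> sink = KafkaSink.<String>builder()
                .setBootstrapServers("localhost:9092")
                .setRecordSerializer(KafkaRecordSerializationSchema.builder()
                        .setTopic("iot-valid")             // topic NiFi reads back from
                        .setValueSerializationSchema(new SimpleStringSchema())
                        .build())
                .build();

        // Read from Kafka, keep only valid events, write to the second topic
        env.fromSource(source, WatermarkStrategy.noWatermarks(), "kafka-iot-raw")
           .filter(IotValidationJob::isValid)
           .sinkTo(sink);

        env.execute("Mm FLaNK IoT validation");
    }
}

The same jar submits to YARN unchanged; only the execution environment's target differs.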

Mm FLaNK Stack (MXNet, MiNiFi, Flink, NiFi, Kafka, Kudu)



First, we rapidly ingest, route, transform, convert, query and process data with Apache NiFi. Once we have transformed it into a clean, schema-validated, known data type, we can stream it to Kafka for additiona…
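For reference, the publish step that NiFi performs at that point amounts to something like the following plain Java Kafka producer; the topic name, key, and JSON payload here are made-up placeholders, since in the real flow NiFi's record processors handle all of this declaratively:

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class IotPublisher {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        // A schema-validated JSON reading; in the real flow NiFi produces
        // this after conversion and validation (payload is a placeholder)
        String json = "{\"sensor_id\":\"device-01\",\"temp_c\":21.5,\"ts\":1577836800000}";

        // try-with-resources closes the producer, which flushes pending sends
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("iot-raw", "device-01", json));
        }
    }
}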