A Real-time Data Stream Processing Model for a Smart City Application Leveraging Intelligent Internet of Things (IoT) Concepts
CHAPTER ONE
Objectives of the Study
The main objective of this thesis is to demonstrate how big data tools such as Cassandra, Kafka, ZooKeeper and Spark can be leveraged to perform real-time stream processing for a traffic monitoring system in a smart city IoT environment. Specifically, the study aims to ensure that no data is lost; to monitor the displayed data; to determine succeeding data processes; and to monitor road traffic. The expected result is an integration of Kafka and Spark that performs real-time data stream processing, in which Apache Spark processes the data and forwards it to the database. A big data tool is then expected to be used to query the data from the database.
CHAPTER TWO
LITERATURE REVIEW
Smart City Concepts
The combination of social, physical and information technology infrastructure to improve the quality of services and enhance citizens’ quality of life is termed a ‘Smart City’. A smart city collects real-world urban data through sensors, server infrastructure, network infrastructure and client devices, and implements solutions with the support of the instrumentation and interconnection of sensors, actuators and mobile devices. According to Policy & Division (2015), a ‘Smart city is a city that monitors and integrates conditions of all of its critical infrastructures including roads, bridges, vehicles, waste management, tunnels, rails, subways, airports, sea-ports, communications, water, power, even major buildings, can better optimize its resources, plan its preventive maintenance activities, and monitor security aspects while maximizing services and enhance the quality of life to its citizens’. Smart cities can be seen as systems with flows of energy, services, people and financing. Moreover, urban planning is closely related to the business, economic and social metabolism of communities. Identifying, integrating and optimizing the different energy, transport and data flows in city planning, monitoring, control, decision-making and management is crucial to creating sustainable smart environments (Highl, n.d.). Some areas of smart city applications are as follows:
Smart Mobility
In the mobility era, transportation must move people and goods quickly, seamlessly and appropriately in urban and IoT environments. Such an environment needs intelligent infrastructure that can process the vast amount of information collected in real time as data streams, and provide the most efficient transportation services to businesses, technologies and citizens alike. Providing transportation in a way that realizes the smart mobility concept requires building a network that coordinates transportation companies or entities; collects, processes and analyses information from the various entities that operate in the city; and supplies each entity with information it can use to optimize its use of the overall system (Okuda, 2012). All around the world, people are moving into cities. Fifty-three percent of the population currently lives in urban areas and, by 2050, this is expected to reach 67 percent. Many studies have shown that most cities are poorly designed and cannot cope with their underlying transportation constraints, resulting in congestion within cities as the population grows.
Smart Grid
A smart grid is an electrical grid that includes a wide variety of operational and energy measures, including smart meters, smart appliances and the combination of renewable with non-renewable energy resources to manage energy consumption. It is a modernized electrical grid that uses information and communication technology and networks of physical devices to collect available data (such as information about the behaviour of suppliers and consumers), and to process and act on that data in an automated fashion to add value (Al Nuaimi, Al Neyadi, Mohamed & Al-Jaroodi, 2015).
Smart Buildings
A smart building is a smart network with a central computer used for programming its environment, devices and building appliances. A smart building is one that achieves significant energy savings by taking advantage of improved technology and materials in its structure, appliances and electrical systems (“What is a Smart Building? Building Efficiency Initiative, WRI Ross Center for Sustainable Cities,” n.d.).
For example, a smart building can use motion-sensor lights that switch off automatically when a room is empty; smart meters that detect a leaking pipe; and a smart electric meter that keeps track of electricity usage and generates an alert when usage reaches a specified threshold.
CHAPTER THREE
METHODOLOGY
This chapter discusses the methods and tools used to implement the thesis and how the implementation was carried out. It describes the data set of the traffic monitoring system in a smart city environment, the methods used to build the real-time data stream processing model, how IoT is leveraged, how the traffic data is monitored, and what the data is used for.
Apache Kafka
Apache Kafka is a distributed streaming platform and a publish-subscribe messaging system implemented as a distributed commit log, suitable for both offline and online message streaming. It collects and delivers large volumes of data as the data is captured, and it is scalable, durable, reliable and fast. Kafka is used to ingest data and send it across a cluster: once the data is captured from the sensors, Kafka transmits it to the system and distributes it over the cluster for processing. Kafka addresses a problem common to real-time software solutions, namely handling real-time data and routing it to multiple consumers quickly. It integrates the producers and consumers of information seamlessly, without blocking the producers and without divulging the identity of subscribers to producers. In this thesis, Kafka collects the data in real time before it is analysed. In the most basic structure, a Kafka producer publishes messages to a Kafka topic, which is created on a Kafka broker acting as the Kafka server, and consumers subscribe to that topic to consume the data from the producers.
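The producer, topic, broker and consumer roles described above can be illustrated with a small in-memory model. This is only a sketch of the publish-subscribe idea, not Kafka's client API: the class `ToyBroker`, the topic name `traffic-data`, the consumer names and the message fields are all hypothetical, and a real deployment would use a Kafka client library against a running broker.

```python
from collections import defaultdict


class ToyBroker:
    """In-memory stand-in for a broker: one append-only log per topic."""

    def __init__(self):
        self._logs = defaultdict(list)    # topic name -> ordered messages
        self._offsets = defaultdict(int)  # (consumer id, topic) -> next offset

    def publish(self, topic, message):
        """Append a message to the topic log and return its offset."""
        self._logs[topic].append(message)
        return len(self._logs[topic]) - 1

    def poll(self, consumer_id, topic):
        """Return the messages this consumer has not yet seen, then advance its offset."""
        log = self._logs[topic]
        start = self._offsets[(consumer_id, topic)]
        self._offsets[(consumer_id, topic)] = len(log)
        return log[start:]


broker = ToyBroker()

# The producer publishes without knowing who will consume the messages.
broker.publish("traffic-data", {"vehicle_id": "v1", "speed_kmh": 42})
broker.publish("traffic-data", {"vehicle_id": "v2", "speed_kmh": 57})

# Two independent consumers each receive every message on the topic.
dashboard_batch = broker.poll("dashboard", "traffic-data")
analytics_batch = broker.poll("analytics", "traffic-data")
```

Because each consumer keeps its own offset into the log, adding a consumer does not slow down or block the producer, which is the decoupling the paragraph describes.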
CHAPTER FOUR
IMPLEMENTATION AND RESULTS
This chapter explains how the implementation was carried out, the challenges faced, and how the expected results were achieved. It also describes how traffic is monitored in a smart city environment, how the data streaming model works, how Apache Kafka performs data processing, and the dashboard of the traffic monitoring system.
Introduction
Cassandra is launched first. ZooKeeper is started next, because Kafka cannot work unless ZooKeeper has started successfully; the Kafka server is then started. Spark is started after that: the master node and the worker nodes must all be running, because that is where the jobs are submitted for processing. Finally, spark-submit is used to submit all the jobs to the Spark cluster for stream processing.
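The start-up sequence above can be expressed as a dependency graph and ordered automatically. The sketch below is illustrative only: the service names are labels rather than real commands, and apart from "ZooKeeper before Kafka", which the text states, the edges (Spark master before worker, and the job depending on Cassandra, Kafka and a worker) are assumptions made for the example.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Each service maps to the set of services that must already be running.
# Only "kafka needs zookeeper" is stated in the text; the other edges
# are assumptions for illustration.
DEPENDS_ON = {
    "cassandra": set(),
    "zookeeper": set(),
    "kafka": {"zookeeper"},
    "spark-master": set(),
    "spark-worker": {"spark-master"},
    "spark-submit": {"cassandra", "kafka", "spark-worker"},
}


def startup_order(depends_on):
    """Return one valid start-up order that respects every dependency."""
    return list(TopologicalSorter(depends_on).static_order())


order = startup_order(DEPENDS_ON)
for step, service in enumerate(order, start=1):
    print(step, service)
```

Any order the function returns starts ZooKeeper before Kafka and leaves the job submission until last; services with no dependency between them may appear in either order.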
CHAPTER FIVE
SUMMARY, CONCLUSION AND RECOMMENDATION
Chapter Five summarizes the work done in the thesis and presents the conclusion on what was achieved and how it was achieved. It also discusses what should be done in the future and the methods to follow to carry out these recommendations.
Summary
The thesis monitors the movement of traffic by using connected cars in the IoT environment. The implementation achieved two objectives: performing real-time data stream processing using Apache Kafka with Spark, and processing data in the IoT environment using the information captured from the connected cars. The information captured from the cars was used to make data-driven decisions. The system monitors the movement of the vehicles, uses their longitude and latitude to calculate distance, and processes the data with Apache Spark as soon as it is received.
The proposed system achieved its aim of integrating Apache Kafka with Spark for real-time streaming: it captures the data from the IoT devices and processes it instantaneously.
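The thesis does not state which formula is used to calculate distance from longitude and latitude. One common choice is the haversine great-circle distance, sketched below under that assumption; the function name `haversine_km`, the Earth radius constant and the sample coordinates are illustrative and are not taken from the implementation.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumed constant


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1 = math.radians(lat1)
    phi2 = math.radians(lat2)
    d_phi = math.radians(lat2 - lat1)
    d_lambda = math.radians(lon2 - lon1)

    # Haversine of the central angle between the two points.
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# Hypothetical vehicle position and point of interest, in degrees.
distance = haversine_km(33.877495, -95.50238, 34.000000, -95.000000)
print(f"{distance:.2f} km")
```

In a streaming job, a function like this would be applied to each incoming vehicle record to decide, for example, whether the vehicle is within a given radius of a point of interest.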
Conclusion
In an IoT environment such as a smart city with a traffic control system, the problem lies in monitoring road traffic so that life may be comfortable for the citizens of the city.
The objective of this thesis was to use data generated by IoT devices to monitor traffic, and to make decisions instantaneously using the data captured from the sensors, because the data is usually critical and time-sensitive.
The approach rested on understanding how smart devices work in an IoT environment, how to generate and capture data from these devices, and how to process the captured data instantaneously. It was realized using Kafka, a distributed streaming platform and messaging system, to publish messages for streaming.
Firstly, Kafka topics were created to store the messages. The brokers (the Kafka servers) then replicated the messages in order to avoid message loss and to ensure that messages were delivered successfully for processing by Spark Streaming. Spark Streaming, as the consumer, fetches the messages from the topics and processes them immediately. It also interprets the structure of the messages so that the data can be processed.
Secondly, the Spark processor processes the data and pushes it to the Cassandra database. In Cassandra, keyspaces were created to store the processed messages; a keyspace defines how data is replicated across nodes. Finally, Spring Boot was used to retrieve the processed messages from the database and display them on the dashboard for monitoring.
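The way replication protects against message loss can be shown with a toy model in which every message is copied to several brokers. This is a deliberately simplified sketch: real Kafka replicates partitions through a leader and follower replicas, which is not modelled here, and the class `ReplicatedLog` and the broker names are hypothetical.

```python
class ReplicatedLog:
    """Toy model: every message is copied to each live broker's local log."""

    def __init__(self, broker_ids):
        self.replicas = {broker_id: [] for broker_id in broker_ids}
        self.alive = set(broker_ids)

    def append(self, message):
        """Write the message to every live replica; fail if none is left."""
        if not self.alive:
            raise RuntimeError("no live broker to accept the message")
        for broker_id in self.alive:
            self.replicas[broker_id].append(message)

    def fail(self, broker_id):
        """Simulate a broker crash: its copy is no longer reachable."""
        self.alive.discard(broker_id)

    def read_all(self):
        """Read the full log from any surviving replica."""
        if not self.alive:
            raise RuntimeError("all replicas lost")
        survivor = min(self.alive)
        return list(self.replicas[survivor])


log = ReplicatedLog(["broker-1", "broker-2", "broker-3"])
log.append("m1")
log.append("m2")
log.fail("broker-1")       # one broker crashes
log.append("m3")
messages = log.read_all()  # the surviving replicas still hold every message
```

With three copies, the consumer can still read the complete log after one or two brokers fail; messages become unavailable only when every replica is lost.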
Future work
It is recommended that future work on traffic monitoring use the Kaa IoT platform to run the application. Kaa IoT is a highly flexible, multi-purpose, 100% open-source middleware platform for implementing complete end-to-end IoT solutions, connected applications and smart products. It generates real-time data from smart devices for developing IoT applications. The data displayed on the monitoring dashboard should be used to avoid traffic congestion and prevent road accidents in the city. Future work should also use decision tree and random forest models to predict probable future events, and compare the two methods to determine which works faster and better.
REFERENCES
- Al Nuaimi, E., Al Neyadi, H., Mohamed, N., & Al-Jaroodi, J. (2015). Applications of big data to smart cities. Journal of Internet Services and Applications, 6(1), 25. https://doi.org/10.1186/s13174-015-0041-5
- Amini, S., & Prehofer, C. (n.d.). Big Data Analytics Architecture for Real-Time Traffic Control, (Tum Llcm).
- Apache Kafka. (n.d.).
- Architecture of smart cities. (n.d.).
- Caliri, G. V. (n.d.). Introduction to Analytical Modeling.
- Chong, M. M., Abraham, A., & Paprzycki, M. (1997). Traffic accident analysis using.
- Cities, S. (2015). IoT What is IoT?
- Corporation, I. B. M. (2013). Hive © 2013.
- Gehlot, R. (2016). Storage and Retrieval of Data for Smart City using Hadoop, 3(5), 85–89.