Evaluating big data technologies and their usefulness for automobile research analysis
This report evaluates big data technologies and explains how each can help an automobile research company carry out the analyses it requires.
- Data Lakes – A data lake is a centralized repository that stores structured and unstructured data at any scale. Data can be stored as-is, without first being structured, and different types of analytics can then be run on it: dashboards and visualizations, big data processing, real-time analytics, and machine learning to guide better decisions. Organizations that successfully generate business value from their data can outperform their competitors. One estimate is that companies that implemented a data lake outperformed their competitors by 9% in organic revenue growth. Analysts have been able to apply several types of analysis, such as machine learning, to click-streams and to data from internet-connected devices stored in the data lake. This creates opportunities to make decisions faster, attract and retain customers, boost productivity, maintain devices proactively, and make decisions that are efficient and effective. Depending on its requirements, an organization will need both a data lake and a data warehouse. The data lake stores relational data from line-of-business applications and non-relational data from mobile apps, IoT devices and social media. The structure or schema of the data is not defined when it is captured. This means that data can be saved without careful up-front design, which helps in situations where it is not yet clear what questions will need to be asked in future, since the questions change as the requirements change (LeClerc & Cale 2020).
- In-memory Databases – An in-memory database stores data in the main memory of the computer instead of on a disk drive so that it can respond much more quickly. Queries are answered from data already held in memory, which eliminates the time needed to read it from disk. It is used by applications that require rapid responses and for real-time data management. Sectors that benefit from in-memory databases include banking, gaming, travel and telecommunications. It is also called a real-time database or a main memory database. All of the data is kept in the RAM of the computer, so storage depends on random-access memory rather than on traditional drives, and the system can access the data and work faster. The data is stored in a compressed, non-relational format that is directly usable and can be navigated by row or column, with the system being read-only. It can also hold historical data that is then used by business intelligence applications. This reduces the need for indexing and ultimately reduces cost. The main benefit of in-memory databases is therefore shorter transaction times. Typical uses include processing streaming sensor data, building embedded software systems, and supporting e-commerce applications.
- Streaming Analytics – Streaming analytics is a real-time analytics and event-processing engine designed to analyze high volumes of fast-streaming data from multiple sources at the same time. Relationships and patterns can be identified in information extracted from input sources such as sensors, social media feeds and applications. Examples of where stream analytics can be used include predictive maintenance, real-time analytics on point-of-sale data, and real-time streams from IoT devices. A streaming job consists of an input, a query and an output. The queries are based on the SQL query language, so data can easily be filtered, sorted, joined and aggregated over a period of time. The transformed data can be sent to one or several outputs. Stream analytics is easy to use, reliable, flexible and scalable to the size of the job. It is also easy to get started with, and it takes little effort to connect multiple sources into an end-to-end pipeline (Jason 2021).
- Edge Computing – Edge computing is an information technology (IT) approach in which client data is processed close to where it is generated, giving the business valuable insight and control over its critical processes and operations. Nowadays business processes produce very large data sets. This data can be collected by many kinds of sensors and IoT devices, operating in real time from remote locations almost anywhere in the world. The ways of handling data are therefore changing as the data grows. Traditional methods can no longer be used as conditions change and become more complex. Businesses face many problems as this complexity increases and the data grows to the size of an ocean, generated by sources such as factories and stores across many sectors and industries. Traditionally, data is moved across a WAN such as the internet to the corporate LAN, where it is saved and stored. The volume of data is now so large that it is no longer practical to store all of it in a traditional data center, so storage and processing have shifted to the logical edge of the infrastructure, at the point where the data is generated (Bigelow 2021).
- Artificial Intelligence – AI is defined as the ability of a digital computer or computer-controlled robot to perform tasks that are associated with intelligent beings. The development of artificial intelligence has produced systems with characteristics of human intellect, such as the ability to understand meaning, learn from past experience, reason and generalize. Computers can be programmed to carry out complex tasks, such as proving mathematical theorems. Processing speed and memory capacity keep increasing, but no program has yet been created that matches human flexibility. Some programs have attained high performance levels on specific tasks. AI programs can draw inferences appropriate to the situation, which are classified as deductive or inductive. Problem solving in artificial intelligence can be characterized as a systematic search through a range of possible actions in order to reach a goal. Problem-solving methods are divided into special-purpose and general-purpose. A special-purpose method is tailored to one very specific problem, whereas a general-purpose technique applies to a wide variety of problems. The available actions can be, for example, to pick up, put down, move right, move left or move forward. A diverse range of problems has been solved using artificial intelligence (Copeland 2022).
- Apache Spark – Apache Spark is a data processing framework that can quickly perform processing tasks on very large data sets. It can also distribute data processing across multiple computers, either on its own or together with other distributed computing tools. It reduces the programming burden through an easy-to-use API that takes over much of the heavy lifting. It has become one of the key distributed processing frameworks for big data in the world. It provides bindings for programming languages such as Python, R, Java and Scala. It is already used by banks, gaming companies, governments and tech giants such as Facebook, Microsoft and Apple. One of its biggest advantages is speed: it can reportedly run some workloads up to one hundred times faster than Hadoop MapReduce. The Spark API is also user friendly and serves as the interface developers use to build applications. Its Spark SQL component focuses on processing structured data, using a DataFrame approach borrowed from Python and R (Pointer 2020).
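Returning to the data lake item in the list above, the point that data is captured without a schema and structured only when it is read can be illustrated with a small sketch. It is not tied to any particular data lake product; the records, field names and the `read_with_schema` helper are all hypothetical and chosen only for illustration.

```python
import json

# Hypothetical raw records, stored exactly as they arrived (no schema enforced).
RAW_LINES = [
    '{"source": "crm", "customer_id": 17, "region": "EU"}',
    '{"source": "iot", "device": "sensor-3", "temp_c": 71.5}',
    '{"source": "social", "user": "@driver42", "text": "great mileage"}',
    '{"source": "iot", "device": "sensor-9", "temp_c": 64.0}',
]


def read_with_schema(lines, source, fields):
    """Apply a schema at read time: keep one source and project chosen fields."""
    rows = []
    for line in lines:
        record = json.loads(line)
        if record.get("source") == source:
            rows.append({name: record.get(name) for name in fields})
    return rows


# The question ("what did the IoT sensors report?") is decided only now,
# long after the raw data was stored.
iot_rows = read_with_schema(RAW_LINES, "iot", ["device", "temp_c"])
print(iot_rows)
```

A different question later on, for example about CRM customers, would reuse the same raw lines with a different source and field list.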
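The in-memory database item above describes keeping the data in RAM rather than on disk. One way to try the idea is the `:memory:` mode of SQLite from the Python standard library. This is only a sketch of the "data lives in RAM" aspect: SQLite is a relational engine, so it does not show the compressed, non-relational storage mentioned above, and the table and sample readings are made up.

```python
import sqlite3

# ":memory:" asks SQLite for a database held in RAM rather than in a file on disk.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (sensor TEXT, value REAL)")
conn.executemany(
    "INSERT INTO readings VALUES (?, ?)",
    [("s1", 10.0), ("s1", 14.0), ("s2", 7.5)],  # made-up sample rows
)

# The query is answered entirely from memory; nothing is read from disk.
avg_by_sensor = dict(
    conn.execute(
        "SELECT sensor, AVG(value) FROM readings GROUP BY sensor ORDER BY sensor"
    ).fetchall()
)
print(avg_by_sensor)
conn.close()
```

The database disappears when the connection is closed, which is the usual trade-off of holding data only in memory.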
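The streaming analytics item above mentions aggregating events over a period of time. A common form of this is a tumbling window, where events are grouped into fixed, non-overlapping time slots. The sketch below shows the idea in plain Python rather than in the SQL dialect of any streaming service; the event data and the `tumbling_window_avg` function are hypothetical.

```python
from collections import defaultdict

# Hypothetical events: (timestamp in seconds, sensor id, reading).
EVENTS = [
    (0, "engine", 70.0),
    (4, "engine", 74.0),
    (9, "brake", 30.0),
    (11, "engine", 80.0),
    (19, "brake", 34.0),
]


def tumbling_window_avg(events, window_seconds):
    """Average readings per (window start, sensor) over fixed, non-overlapping windows."""
    totals = defaultdict(lambda: [0.0, 0])  # running [sum, count] per bucket
    for timestamp, sensor, value in events:
        window_start = (timestamp // window_seconds) * window_seconds
        bucket = totals[(window_start, sensor)]
        bucket[0] += value
        bucket[1] += 1
    return {key: total / count for key, (total, count) in sorted(totals.items())}


averages = tumbling_window_avg(EVENTS, window_seconds=10)
for (window_start, sensor), avg in averages.items():
    print(window_start, sensor, avg)
```

A real streaming engine would emit each window as soon as it closes instead of waiting for the whole list, but the grouping logic is the same.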
- The car manufacturer with the fewest models is Aston Martin, with 8, and the manufacturer with the most is General Motors, with 127, followed by BMW with 119.
- Highest average fuel economy by city
The highest average city fuel economy is 25.46, for Mitsubishi Motors Co.
- Highest average fuel economy by highway
The highest average highway fuel economy is 34.25, for Mazda.
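The model counts and average fuel economy figures above come from grouping the data by manufacturer. Outside Tableau, the same kind of grouping could be sketched as follows. The rows and the "Maker A/B/C" names are made up for illustration and are not the data set analysed in this report, so the output does not reproduce the figures above.

```python
from collections import Counter, defaultdict
from statistics import mean

# Made-up sample rows, NOT the data set analysed above:
# (manufacturer, city fuel economy, highway fuel economy)
ROWS = [
    ("Maker A", 20.0, 28.0),
    ("Maker A", 24.0, 32.0),
    ("Maker B", 26.0, 30.0),
    ("Maker C", 18.0, 35.0),
    ("Maker C", 22.0, 33.0),
    ("Maker C", 20.0, 31.0),
]

# Number of models per manufacturer.
model_counts = Counter(maker for maker, _, _ in ROWS)

# Collect the city and highway values per manufacturer, then average them.
city, highway = defaultdict(list), defaultdict(list)
for maker, city_value, highway_value in ROWS:
    city[maker].append(city_value)
    highway[maker].append(highway_value)

avg_city = {maker: mean(values) for maker, values in city.items()}
avg_highway = {maker: mean(values) for maker, values in highway.items()}

fewest_models = min(model_counts, key=model_counts.get)
most_models = max(model_counts, key=model_counts.get)
best_city = max(avg_city, key=avg_city.get)
best_highway = max(avg_highway, key=avg_highway.get)
print(fewest_models, most_models, best_city, best_highway)
```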
- Taking the combined fuel efficiency, find the highest and lowest average fuel economy across all transmission types. The bar chart shows that the highest value is for Semi-Automatic transmissions, at 6848, and the lowest is for Continuously Variable transmissions, at 348.
- Find out which car manufacturers have 4WD (4-wheel drive) and 2WD (2-wheel drive) models with engine power greater than 3.5.
To find which manufacturers have 4-wheel drive and 2-wheel drive models with engine power greater than 3.5, the drive disc variable is filtered so that only the 4-wheel drive and 2-wheel drive categories are selected. To keep only engine power values greater than 3.5, that is from 3.6 upwards, a range filter is shown for engine power and its lower bound is set to 3.6. After filtering on both the drive category and engine power, the final output is obtained.
General Motors has the highest values, with 77.8 for 4-wheel drive, 251.3 for 2-wheel drive (rear) and 114.2 for 2-wheel drive (front). The lowest value is observed for Subaru.
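The two filters described above, one on the drive category and one on engine power, can be expressed in a few lines of code. The following is a minimal sketch with made-up rows and hypothetical category labels, not the data set or field values used in the Tableau workbook.

```python
from collections import defaultdict

# Made-up sample rows, NOT the data set analysed above:
# (manufacturer, drive type, engine power)
ROWS = [
    ("Maker A", "4-Wheel Drive", 3.6),
    ("Maker A", "2-Wheel Drive, Rear", 5.0),
    ("Maker A", "All-Wheel Drive", 4.0),
    ("Maker B", "2-Wheel Drive, Front", 2.0),
    ("Maker B", "4-Wheel Drive", 3.5),
    ("Maker C", "2-Wheel Drive, Front", 3.7),
]

# Hypothetical labels for the categories kept by the drive filter.
WANTED_DRIVES = {"4-Wheel Drive", "2-Wheel Drive, Rear", "2-Wheel Drive, Front"}
MIN_ENGINE = 3.5  # strictly greater than this, so 3.5 itself is excluded


def makers_by_drive(rows):
    """Return, per drive type, the manufacturers with a model passing both filters."""
    matches = defaultdict(set)
    for maker, drive, engine in rows:
        if drive in WANTED_DRIVES and engine > MIN_ENGINE:
            matches[drive].add(maker)
    return {drive: sorted(makers) for drive, makers in matches.items()}


result = makers_by_drive(ROWS)
print(result)
```

Using a strict comparison against 3.5 has the same effect as starting the range filter at 3.6 when engine power is recorded to one decimal place.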
- Critically evaluate the strengths and weaknesses of data analytics using Tableau and other recommended tools (up to 3 tools) that are used for data analytics.
The strengths of data analytics using Tableau are:
- It is used to make interactive visualizations: Tableau makes it very easy to build a visualization just by dragging and dropping.
- Tableau is very easy to learn compared with programming tools such as R or Python. There is also an option to make a live connection to other data sources such as SQL databases.
- Tableau can easily handle very large numbers of rows, in the millions.
- Tableau offers many visualization types, such as heat maps, highlight tables, pie charts, horizontal bars, stacked bars and tree maps, to suit the requirements of the client.
- Tableau can use scripting languages, including R and Python, to perform complex calculations and avoid performance issues. The data set can be cleaned with packages and loaded into Tableau by means of a Python script. Some packages need to be installed because Python is not a native language in Tableau.
- Tableau is very mobile friendly. Mobile apps are available for Android and iOS, which puts Tableau at the fingertips of its users. Most of the functions and features of the desktop version are also supported in the Tableau mobile app.
- Various online guides are available, which makes Tableau easy to understand and use.
- An important dashboard feature is the ability to build dynamic dashboards for devices such as laptops and smartphones. Tableau detects which device the user is viewing the report on and automatically adjusts so that the report is displayed properly.
- Tableau is comparatively low cost compared with its larger competitors.
The weaknesses of data analytics using Tableau are:
- Tableau does not offer automatic refreshing of reports through scheduling, so everything has to be updated manually, which takes effort whenever the client needs to update the data.
- Visuals have to be recreated and cannot be imported, unlike Power BI, where custom visuals can be imported easily.
- Fields cannot all be applied directly; they have to be entered manually every time, which is time consuming.
- Tableau has limited data processing capabilities. It is mostly used for visualizing a data set and drawing conclusions from it. Tableau Desktop allows only basic preprocessing, such as different kinds of joins, converting data types, adding calculated columns and creating parameters.
- Tableau is very expensive when used in large organizations.
Python – Python is a programming language that can be used for data analysis. It makes analysis comparatively easy because it is easier to learn than many other programming languages, interactive to use, and easy for the client to understand. A data set can be imported or created directly in Python. Python can clean and prepare the data for analysis and manipulate it using pandas DataFrames. It can also be used for descriptive statistics and for summarizing data, and machine learning models can be built with scikit-learn. Python is an alternative to the R language for data analysis programming (Van Rossum and Drake 1995).
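The cleaning and summarizing steps mentioned above can be sketched briefly. pandas would be the usual choice for this kind of work; the example below uses only the standard library so that it runs without extra packages, and the raw values and the `clean_numbers` helper are made up for illustration.

```python
from statistics import mean, median

# Made-up raw values as they might arrive from a CSV export.
RAW = ["21.5", " 30 ", "", "n/a", "18.0", "25"]


def clean_numbers(values):
    """Strip whitespace and drop entries that cannot be parsed as numbers."""
    cleaned = []
    for value in values:
        try:
            cleaned.append(float(value.strip()))
        except ValueError:
            continue  # skip blanks and placeholders such as "n/a"
    return cleaned


numbers = clean_numbers(RAW)
summary = {"count": len(numbers), "mean": mean(numbers), "median": median(numbers)}
print(summary)
```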
SPSS – SPSS stands for Statistical Package for the Social Sciences. Researchers use it to carry out many kinds of statistical data analysis. The software was created for the management and analysis of social science data. Like Minitab, SPSS can be used by people without a technical background: the user only needs to select the kind of output required, and all the options are already provided, so no programming skills are needed. For this reason it is widely used across industries and sectors. Many leading research projects use SPSS as the tool for analyzing survey data and for delivering the results (Hinton, McMurray and Brownlow 2014).
SQL – SQL stands for Structured Query Language. It is widely used across industries and sectors because it can perform most data processing tasks. It can be used to access, clean and analyze data stored in a database. It is easy and useful for data analysts to learn because its statements are built from common words such as AND, OR, AS, HAVING, SELECT, FROM, GROUP BY, ORDER BY and many others. SQL can access large amounts of data directly where it is stored. Analysis done in SQL is easy to replicate and audit. SQL can work with multiple tables at the same time, which makes it much more valuable: joins let a query combine data from different tables through an element common to both (Melton and Simon 1993).
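The keywords and the join described above can be seen together in one short query. The sketch below runs the SQL through the `sqlite3` module of the Python standard library; the `makers` and `models` tables, their columns and their rows are hypothetical and serve only to show two tables being combined on a common element.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway database for the example
conn.executescript(
    """
    CREATE TABLE makers (maker_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE models (maker_id INTEGER, model TEXT, city_economy REAL);
    INSERT INTO makers VALUES (1, 'Maker A'), (2, 'Maker B');
    INSERT INTO models VALUES
        (1, 'A1', 20.0), (1, 'A2', 24.0), (2, 'B1', 30.0);
    """
)

# maker_id is the element common to both tables, so it is used for the join.
query = """
    SELECT mk.name, COUNT(*) AS n_models, AVG(md.city_economy) AS avg_city
    FROM models AS md
    JOIN makers AS mk ON mk.maker_id = md.maker_id
    GROUP BY mk.name
    HAVING COUNT(*) >= 1
    ORDER BY avg_city DESC
"""
rows = conn.execute(query).fetchall()
print(rows)
conn.close()
```

Because the whole analysis is a single text query, it can be stored, rerun and reviewed later, which is what makes SQL work easy to replicate and audit.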
References
Anon, n.d. What is an In-Memory Database? Definition and FAQs. HEAVY.AI. Available at: https://www.heavy.ai/technical-glossary/in-memory-database.
Bigelow, S.J., 2021. What is edge computing? Everything you need to know. SearchDataCenter. Available at: https://www.techtarget.com/searchdatacenter/definition/edge-computing.
Copeland, B.J., 2022. Artificial Intelligence. Encyclopædia Britannica. Available at: https://www.britannica.com/technology/artificial-intelligence.
LeClerc, B. & Cale, J., 2020. Big data. Amazon. Available at: https://aws.amazon.com/big-data/datalakes-and-analytics/what-is-a-data-lake/.
Pointer, I., 2020. What is Apache Spark? The Big Data Platform that Crushed Hadoop. InfoWorld. Available at: https://www.infoworld.com/article/3236869/what-is-apache-spark-the-big-data-platform-that-crushed-hadoop.html.
Howell, J., 2021. Introduction to Azure Stream Analytics. Microsoft Docs. Available at: https://docs.microsoft.com/en-us/azure/stream-analytics/stream-analytics-introduction.
MeLean, H., 2022. 10 Advantages of Tableau for Data Visualization. Jacksonville. Available at: https://www.newhorizons-jax.com/blog/10-advantages-of-tableau-for-data-visualization.
Van Rossum, G. and Drake Jr, F.L., 1995. Python tutorial (Vol. 620). Amsterdam, The Netherlands: Centrum voor Wiskunde en Informatica.
Hinton, P., McMurray, I. and Brownlow, C., 2014. SPSS explained. Routledge.
Melton, J. and Simon, A.R., 1993. Understanding the new SQL: a complete guide. Morgan Kaufmann.