Imagine data collection just a decade ago. Collecting data meant sending people into the field to gather or observe it. It was not only tedious but also very limited, because a physical presence was required. Products and services had to be developed largely by trial and error, because collecting data was costly to the point of being nearly prohibitive. Fast forward to the arrival of cloud computing, together with other advances in computing, and the playing field has had a huge makeover.
With those advances, innovating with data has become a realistic option. Businesses have used information about performance and operations to make decisions for centuries, but data used to be difficult to collect, and analysis took a long time and was often incomplete because of the sheer volume involved. Today’s collection and analysis tools let data be collected, analysed and used at a speed and scale that was not possible before.
Enter cloud technology. With cloud technology, data analysis can be done in real time, and we can do much more because the data is at our fingertips whenever we need it. On top of that, we can leverage machine learning and artificial intelligence to produce more meaningful insights, more quickly.
What has data got to do with digital transformation?
These days, data is power. Thanks to the internet, businesses have more access to data than ever before. They have internal information, which is collected within the organisation, and external information, which is gathered with the help of the internet, AI and machine learning. Put together, the two give businesses the best of both sets of data and help them grow. That is why we are seeing some businesses, like Amazon, growing at dizzying speed and scaling dizzying heights.
Data can be collected these days in even the unlikeliest of places. Sure, you have your usual internal information like maintenance and operation reports, financials, personnel data, customer feedback, etc. But these days, it can also come from digital interactions, such as the length of time a user spends on a web page, how they navigate through the page or how they react to a social media post. These are new and very rich sources of information about human behaviour. They are useful, for example, in helping marketeers understand how to approach their clients, or in helping advertisers craft their advertisements for maximum impact.
Capturing, analysing and leveraging this data highlights the power of digital transformation: it helps unlock more business value because it lets businesses identify the exact measures needed to make a product or service work. However, with such huge quantities of data being collected, quite a few obstacles need to be overcome in order to leverage the value of the data, like:
- Processing big volumes and varieties of new data.
- Finding cost effective solutions.
- Scaling resource capacity based on needs.
- Accessing historical data.
- Connecting the dots between data.
Additionally, businesses traditionally had to spend quite a bit on infrastructure, like data centres to store and process the data, which was very costly. Cloud technology is a game changer here. So how does cloud technology actually help manage and innovate with data?
How can cloud technology help to manage and innovate with data?
With cloud technology, businesses can store data on an offsite server that is typically owned and overseen by a vendor, like Google Cloud. There are many benefits of leveraging cloud technology:
- Scalability – Can scale according to needs.
- Automation – Automate where there is manual overhead.
- Integration – Bring together data points and platforms fragmented across the business’s ecosystem.
- Speed – Process terabytes of data in real time and run queries on request to retrieve and use data instantly.
- Less downtime – Resources are distributed across a global network, which creates resilience against data loss or service interruption.
Example: A retail store
Traditionally, retailers have had access to data about their stores like stock levels, items purchased and average spend per customer. With IoT devices, retailers are now able to capture more nuanced in-store customer behaviour to improve the customer experience.
For example, take the security cameras in a retailer’s stores. Retailers use security cameras for one main purpose: detecting criminal behaviour. What retailers can do further is mine the footage to generate insights on the paths customers take through the store, their sentiment and their dwell time. This enables the retailer to correlate data on shoppers’ behaviour and use it to provide a better customer experience, better security and greater operational efficiency, and it also helps the company focus on growth.
The same can be achieved with loyalty cards, which can be linked to an individual customer’s purchase behaviour.
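To make the idea of dwell time concrete, here is a minimal sketch of how it could be derived once camera footage has been turned into detection events. The event format, zone names and the `dwell_seconds_by_zone` function are all assumptions made for illustration, and the sketch assumes a shopper stays in a zone until their next detection.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical detection events: (shopper_id, zone, ISO timestamp).
# In practice these would come out of a video-analytics pipeline.
EVENTS = [
    ("s1", "entrance", "2024-01-06T10:00:00"),
    ("s1", "shoes", "2024-01-06T10:01:00"),
    ("s1", "checkout", "2024-01-06T10:09:00"),
    ("s1", "exit", "2024-01-06T10:12:00"),
    ("s2", "entrance", "2024-01-06T10:03:00"),
    ("s2", "shoes", "2024-01-06T10:04:00"),
    ("s2", "exit", "2024-01-06T10:06:00"),
]


def dwell_seconds_by_zone(events):
    """Total seconds shoppers spent in each zone.

    Assumption: a shopper remains in a zone until their next event.
    """
    by_shopper = defaultdict(list)
    for shopper, zone, timestamp in events:
        by_shopper[shopper].append((datetime.fromisoformat(timestamp), zone))

    totals = defaultdict(float)
    for visits in by_shopper.values():
        visits.sort()  # order each shopper's events by time
        for (start, zone), (end, _next_zone) in zip(visits, visits[1:]):
            totals[zone] += (end - start).total_seconds()
    return dict(totals)


dwell = dwell_seconds_by_zone(EVENTS)
```

A retailer could compare figures like these across zones or days to see where shoppers linger and where they pass straight through.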
How do you start?
The ideal starting point is to identify and map your data. A data map is a chart of all the data used in end-to-end business processes. What should you include as a data point in your data map?
Understand your data
Generally, there are three sources of data – (i) User data, (ii) Corporate data and (iii) Industry data. The aggregated data usually comes in two types – (i) Structured data, and (ii) Unstructured data.
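One way to picture a data map is as a simple list of entries, each tagged with its source and its type. The sketch below is only an illustration: the `DataPoint` fields and the sample entries are assumptions, not a standard format.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class DataPoint:
    name: str       # what the dataset is
    source: str     # "user", "corporate" or "industry"
    structure: str  # "structured" or "unstructured"
    used_in: str    # business process that relies on it


# Hypothetical entries for a retail chain.
DATA_MAP = [
    DataPoint("transactions", "user", "structured", "sales reporting"),
    DataPoint("customer reviews", "user", "unstructured", "product feedback"),
    DataPoint("staffing levels", "corporate", "structured", "store operations"),
    DataPoint("market trend reports", "industry", "unstructured", "planning"),
]


def count_by(data_map, attribute):
    """Count data points per value of the given attribute."""
    return Counter(getattr(point, attribute) for point in data_map)


by_source = count_by(DATA_MAP, "source")
by_structure = count_by(DATA_MAP, "structure")
```

Counting entries by source or type is a quick way to spot gaps, such as having no industry data at all.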
3 Sources of Data
User data is data from customers who use or purchase your services or products.
Let’s say you run a chain of retail stores. If you aggregate purchase data from all your stores, you have a dataset: transactions. You will have others too, like item returns, footfall, etc.
Corporate data is a more operational kind of data.
Let’s take the same example of running a chain of retail stores. Your corporate datasets would be things like staffing levels in each store, overall sales performance, how many people are in the fitting rooms versus at the cash register, etc.
Industry data is data found outside an organisation that everyone in the sector can view or access to gain knowledge about a specific domain. This includes trends, purchasing patterns, publicly available research papers, etc.
2 Types of Data
| |Structured data|Unstructured data|
|---|---|---|
|Characteristics|Highly organised; quantitative data; usually text only; easy to search|No organisation; qualitative data; may be in text, images, sound, video or other formats (e.g. Binary Large Object “BLOB”); difficult to search|
|Storage|Relational databases; data warehouses|Applications; NoSQL databases; data warehouses; data lakes|
|Examples|Dates; phone numbers; credit card numbers|Images; email messages; audio files; video files|
Traditionally, unstructured data has been hard to analyse. However, cloud technology changes this. With the right cloud tools, businesses can extract value from it by using an application programming interface (“API”) to create structure.
|API – An API is a set of functions that integrates different platforms with different types of data so that new insights can be uncovered.|
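To show what “creating structure” means in practice, the sketch below turns a free-text email into a record with named fields. It is a local stand-in only: the `extract_record` function, the sample email and the regular expressions are assumptions for illustration and do not represent any particular cloud API, which would typically do far more (for example, analysing images or sentiment).

```python
import re

# Hypothetical piece of unstructured data: a free-text customer email.
EMAIL = """From: jane@example.com
Subject: Late delivery

Hi, my order 48213 arrived five days late. Please call me on 555-0142."""


def extract_record(text):
    """Turn free text into a structured record (a dict of named fields).

    Local stand-in for what a cloud API might return; the patterns
    are illustrative only.
    """
    def first(pattern):
        match = re.search(pattern, text, flags=re.MULTILINE)
        return match.group(1) if match else None

    return {
        "sender": first(r"^From:\s*(\S+)"),
        "subject": first(r"^Subject:\s*(.+)$"),
        "order_id": first(r"\border (\d+)\b"),
        "phone": first(r"\b(\d{3}-\d{4})\b"),
    }


record = extract_record(EMAIL)
```

Once the email is a record with fields, it can be stored, searched and joined with other datasets just like structured data.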
Making sense of the datasets, data points and data map
As you add more and more datasets into your list, you need to make sense of them. You can do this by playing with the interactions between your datasets.
- How can you make your data actionable?
- What insight are you looking for?
Experiment with different datasets, and see whether the insights make sense.
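As a small example of playing with the interactions between datasets, the sketch below joins a sales dataset with a staffing dataset to derive a metric that neither holds on its own. The store names, figures and the `sales_per_staff` function are hypothetical.

```python
# Hypothetical per-store datasets, keyed by store ID.
weekly_sales = {"store_a": 42000.0, "store_b": 30000.0, "store_c": 18000.0}
staff_on_shift = {"store_a": 12, "store_b": 6, "store_c": 6}


def sales_per_staff(sales, staffing):
    """Join the two datasets on store ID and derive one metric.

    Stores missing from either dataset are skipped rather than guessed.
    """
    shared = sales.keys() & staffing.keys()
    return {store: sales[store] / staffing[store] for store in sorted(shared)}


insight = sales_per_staff(weekly_sales, staff_on_shift)
best_store = max(insight, key=insight.get)
```

Here the store with the highest total sales is not the one with the highest sales per staff member, which is the kind of insight that only appears once datasets are combined.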
Human bias can also influence the way datasets are collected, combined and used. It’s always important to include strategies to remove unconscious biases as you start to leverage data to build new business value.
Just bear in mind that handling large volumes and a wide diversity of data comes with its own ethical considerations. This requires alternative ways of thinking about security. Not all information that can be captured should be captured.
Understand data storage solutions
To get meaningful insights from data, data must be stored centrally. There are quite a number of solutions for data storage – (i) cloud databases, (ii) cloud data warehouses, and (iii) cloud data lakes.
|Data management priorities:|
– Data integrity – accuracy and consistency of data stored
Cloud databases
First, what is a database? A database is an organised collection of data, generally stored in tables and accessed electronically from a computer system. Databases are the simplest of these solutions to create, and SQL can be used to query and report on the data.
Databases help companies keep better track of their basic transactions, provide information that helps the company run its business efficiently, and help management make better decisions. If there’s an error, databases allow businesses to roll back transactions and review the data history.
For example, suppose a customer goes to an ATM to check their account balance and finds that it’s not the same as the balance shown on the mobile app. The bank needs the ability to roll back the transactions to identify the source of the problem. Perhaps the ATM is broken, or there’s a bug in the app. This rollback functionality protects the bank from fraudulent claims and protects the customer too.
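The sketch below shows the basic mechanics of a rollback, using Python’s built-in SQLite module as a local stand-in for a managed cloud database. It covers only one narrow case, undoing a half-finished transaction when an error occurs, and the table, account names and `transfer` function are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "id TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))"
)
conn.execute("INSERT INTO accounts VALUES ('alice', 100), ('bob', 50)")
conn.commit()


def transfer(conn, src, dst, amount):
    """Move money between accounts as one all-or-nothing transaction."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE id = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE id = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected an overdraft; the credit to the
        # destination account has already been rolled back.
        return False


def balances(conn):
    return dict(conn.execute("SELECT id, balance FROM accounts"))


ok = transfer(conn, "alice", "bob", 30)         # succeeds
rejected = transfer(conn, "alice", "bob", 500)  # would overdraw, rolled back
final = balances(conn)
```

The second transfer credits the destination before the debit fails, yet the final balances show no trace of it, because the whole transaction was rolled back.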
Google offers quite a number of database services.
One of them is called Cloud SQL. Cloud SQL is a fully managed relational database service, built on the performance innovations in Compute Engine. It integrates easily with existing applications and other Google Cloud services like Google Kubernetes Engine and BigQuery.
This tool is especially useful for databases that serve websites, for operational applications for e-commerce, and to feed into report and chart creation that informs business intelligence.
Another is Cloud Spanner, which is designed for global scale. Data is automatically and instantly copied across regions. This means that if one server in a region goes down, the organisation’s data can still be served from another region. It also means that queries always return consistent, ordered answers regardless of the region.
Furthermore, this tool provides enterprise-grade security, making it ideal for organisations that want scalability for their databases. The key takeaway is that it’s fully managed, which dramatically reduces the operational overhead needed to keep the database online and serving traffic.
An example of a company that uses this is Spotify, a Swedish audio streaming and media services provider that always needs to ensure its customers get consistent service. Spotify holds information about the objects it stores, known as metadata. By keeping this metadata in the cloud, Spotify is able to maintain strong listing consistency, with the option to scale easily.
Cloud data warehouses
While databases store transactional data in an online fashion, data warehouses assemble data from multiple sources, including databases, in a centralised place. Data warehouses have one advantage over databases: they can rapidly analyse large, multi-dimensional datasets. Furthermore, they can consolidate data that is structured and semi-structured.
For example, let’s take a business in the hotel industry. Suppose it wants to improve its service and ultimately deliver a better experience to its guests. It would need to combine multiple datasets to uncover meaningful insights. The problem is that its data is both structured (e.g. type of room booked by a guest, number of guests, etc.) and unstructured (e.g. customer posts on social media platforms, etc.). When combined with connector tools (e.g. Pub/Sub, Dataflow, etc.), data warehouses can transform unstructured data into semi-structured data so that it can be used for analysis.
Organisations usually invest in building data warehouses because of their ability to deliver business insights from across the company, and to do so very quickly. This helps when an industry is sensitive to quickly changing trends.
BigQuery is Google Cloud’s leading data warehouse solution. It’s a fully-managed, serverless data warehouse that enables scalable analysis over petabytes of data. It is a Platform as a Service (PaaS), and has built-in machine learning capabilities to boot.
An example of a company that uses this is BUENO Systems, a company based in Australia. BUENO is a Software as a Service (SaaS) company that helps businesses meet their sustainability goals by improving building systems. When you think about maintaining a building, you think about the various networks that control and operate its facilities (e.g. air conditioning, lighting, security systems, etc.), which usually exist in their own silos with no link between them.
By using BigQuery and connectors such as Pub/Sub (a service for real-time ingestion of data) and Dataflow (a service for large-scale processing of data), BUENO is able to bring the unstructured data into the cloud, transform it into semi-structured data and use it for analysis. This functionality enables BUENO to unlock new insights about its customers, do preventive maintenance and more!
Cloud data lakes
A data lake stores structured, semi-structured and unstructured data, and can hold raw data from all sources without the need to process or transform it at that time. Structure is applied only when the data is needed. This is especially ideal for data scientists and data analysts, as it enables them to create new data models on the fly.
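The idea that structure is applied only when the data is needed is often called schema on read. Here is a minimal sketch of it, assuming raw records are kept as JSON lines exactly as they arrived; the record fields and the `read_bookings` function are hypothetical.

```python
import json

# Raw records land in the lake exactly as they arrived (hypothetical samples).
RAW_LAKE = [
    '{"type": "booking", "guest": "A. Tan", "nights": 3}',
    '{"type": "review", "guest": "A. Tan", "text": "Great pool, slow check-in"}',
    '{"type": "booking", "guest": "B. Lee", "nights": "2"}',
]


def read_bookings(raw_lines):
    """Apply a structure only at read time (schema on read)."""
    rows = []
    for line in raw_lines:
        record = json.loads(line)
        if record.get("type") != "booking":
            continue  # other record types stay in the lake untouched
        rows.append({
            "guest": str(record["guest"]),
            "nights": int(record["nights"]),  # normalise "2" and 2 alike
        })
    return rows


bookings = read_bookings(RAW_LAKE)
total_nights = sum(row["nights"] for row in bookings)
```

Nothing was cleaned or reshaped when the records were stored, so a different analysis could later read the same raw lines with a different structure, such as pulling out only the reviews.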
Incorporating business intelligence solutions
The challenge that businesses face is identifying the right business intelligence solution.
Businesses need to be able to serve data in the form of insights at scale.
Looker is a Google Cloud business intelligence solution. It’s a data platform that sits on top of any analytics database and makes it simple to describe your data and define business metrics. That means that once you have a reliable source of truth for your business data, anyone on your team can analyse and explore it. They can ask and answer their own questions, create visualisations and explore row-level details.
Incorporate Artificial Intelligence (AI) and Machine Learning (ML)
When you mention AI or ML, people often think that they are too complex and not accessible to anyone outside the data engineering or data analysis teams. They also assume that it needs to be customised to their needs and will be very costly.
|What is artificial intelligence? |
It’s a broad field or term that describes any kind of machine capable of acting autonomously.
|What is machine learning? |
It’s a type of artificial intelligence that allows software applications to become more accurate at predicting outcomes without being explicitly programmed to do so. Machine learning algorithms use historical data as input to predict the future.
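As a very small illustration of using historical data to predict the future, the sketch below fits a straight line to past monthly sales and extends it by one month. The figures are made up, and ordinary least squares is just one of the simplest possible techniques, chosen here for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    return slope, mean_y - slope * mean_x


# Hypothetical history: month number -> units sold.
months = [1, 2, 3, 4, 5]
units = [110, 120, 130, 140, 150]

slope, intercept = fit_line(months, units)
forecast_month_6 = slope * 6 + intercept
```

Nobody wrote a rule saying “sales grow by ten units a month”; the model recovered that pattern from the data, which is the sense in which it was not explicitly programmed.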
Why should businesses leverage ML now? Because ML can create new business value when it learns from data: by replacing rule-based systems, identifying processes to be automated, understanding unstructured data and personalising applications.
What do I mean? Take rule-based systems, for example. In its early days, Google Search used rule-based systems to decide what to show a user. Let’s say you are searching for the Giants sports team. But wait, there are two: the San Francisco Giants and the New York Giants. Which should Google Search show? Google used hand-coded rules to decide which sports team to show a user depending on the area they searched from. If the user was in New York, it would show results on the New York Giants. If the user was in San Francisco, it would show results on the San Francisco Giants. If the user was anywhere else, it would show results about tall people.
With ML, Google was able to develop RankBrain, Google’s deep neural network for search ranking. It outperformed many human-built signals, and Google was able to replace many of the hand-coded rules. The neural network ended up improving search quality dramatically, and it is still continuously improving itself based on new user queries and new user clicks.
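The contrast between hand-coded rules and learning from clicks can be sketched in a few lines. To be clear, this is not how RankBrain works: `ClickModel` is a hypothetical frequency counter standing in for a real model, and the click log is invented. It only shows the shift from writing rules to learning from behaviour.

```python
from collections import Counter, defaultdict


def rule_based(query, city):
    """Hand-coded rules, as in the Giants example."""
    if query != "giants":
        return None
    if city == "New York":
        return "New York Giants"
    if city == "San Francisco":
        return "San Francisco Giants"
    return "tall people"


class ClickModel:
    """Learns the most-clicked result per (query, city) from logged clicks."""

    def __init__(self):
        self.clicks = defaultdict(Counter)

    def learn(self, query, city, clicked_result):
        self.clicks[(query, city)][clicked_result] += 1

    def predict(self, query, city):
        counts = self.clicks.get((query, city))
        if not counts:
            return rule_based(query, city)  # no data yet: fall back to rules
        return counts.most_common(1)[0][0]


model = ClickModel()
# Hypothetical click log: most users in Chicago click the football team.
for result in ["New York Giants", "New York Giants", "tall people"]:
    model.learn("giants", "Chicago", result)

learned = model.predict("giants", "Chicago")
fallback = model.predict("giants", "San Francisco")
```

The rules would have shown Chicago users results about tall people; the learned version shows what those users actually clicked, with no one having to write a new rule for Chicago.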
The reality is that ML is more accessible now than ever before. Google has quite a few solutions ready to use out of the box.
Google Cloud AI Platform
This is a unified, simply managed platform that makes machine learning easy to adopt by analysts and developers. It provides modern ML services, with the ability to generate tailored models and use pre-trained models.
TensorFlow has a comprehensive, flexible ecosystem of tools, libraries and community resources. It lets researchers push innovations in ML and lets developers easily build and deploy ML-powered applications. TensorFlow takes advantage of Tensor Processing Units (TPUs), hardware designed to accelerate ML workloads by 15-30x. Because you pay only for what you use, there’s no upfront capital investment required.
|Data Quality Issues for ML|
(i) Coverage
The accuracy of ML predictions depends on large volumes of data that cover the scope of a problem domain and all the scenarios it can account for. That means all possible input and output data.
(ii) Clean or consistent
Data is considered dirty or inconsistent if it includes or excludes anything that might prevent an ML model from making accurate predictions.
(iii) Complete
This refers to the availability of sufficient data about the world to replace human knowledge. Think of this as the various data categories or themes that help complete a user’s profile.
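The three data quality questions above (does the data cover the problem, is it clean and consistent, and is enough of it available?) can be turned into simple checks. The sketch below is one possible way to do that; the field names, the expected countries and the valid age range are all assumptions for illustration.

```python
REQUIRED_FIELDS = ("age", "country", "spend")  # hypothetical profile fields
EXPECTED_COUNTRIES = {"MY", "SG", "AU"}        # scenarios the model must cover

records = [
    {"age": 34, "country": "MY", "spend": 120.0},
    {"age": -5, "country": "SG", "spend": 80.0},  # dirty: impossible age
    {"age": 41, "country": "MY", "spend": None},  # incomplete: missing spend
]


def quality_report(rows):
    """Flag gaps in coverage, dirty values and incomplete rows."""
    seen = {row["country"] for row in rows if row.get("country")}
    dirty = [
        index for index, row in enumerate(rows)
        if row.get("age") is not None and not 0 <= row["age"] <= 120
    ]
    incomplete = [
        index for index, row in enumerate(rows)
        if any(row.get(field) is None for field in REQUIRED_FIELDS)
    ]
    return {
        "uncovered_countries": sorted(EXPECTED_COUNTRIES - seen),
        "dirty_rows": dirty,
        "incomplete_rows": incomplete,
    }


report = quality_report(records)
```

Running checks like these before training makes it easier to see whether poor predictions are likely to come from the data rather than from the model.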
The AI Hub
Google also has a hosted repository of plug-and-play AI components, including end-to-end AI pipelines and out-of-the-box algorithms.
APIs are simple methods and tools to connect various applications. They can be deployed in a virtual private cloud, on-premises or in Google’s public cloud. This allows developers to quickly and easily train custom models regardless of their level of ML experience.
Innovating with data is more important than ever, especially in today’s competitive landscape. With cloud technology coupled with AI and ML, businesses can now ingest, analyse and use data at a speed and scale that wasn’t possible before, and they can leverage that to deliver more value to their customers. More importantly, data that used to be difficult to collect, collate and analyse has now become a game changer.
This is part of a 4-part series on the Google Cloud Digital Leader Training Professional Certificate course available on Coursera:
– Introduction to Digital Transformation with Google Cloud
– Innovating with Data and Google Cloud
– Infrastructure and Application Modernisation with Google Cloud
– Understanding Google Cloud Security and Operations
Bonus – Decision Tree
A decision tree I made to help decide which solution is best suited to your needs.