How the Modern Data Stack Is Reshaping Geospatial Data Analysis
Authors: Juan Calvo Ferrándiz, Data Engineer, and Yue Zou, Machine Learning Engineer
Geospatial Data Analysis in the Current Data Ecosystem
Modern Data Stack: a collection of tools and cloud data technologies used to collect, process, store, and analyse data.
The adoption of the Modern Data Stack (MDS) has been driven by the shift to the cloud and the near-unlimited availability of storage and computing power, as well as the need for more efficient, flexible, and faster ways of working. The MDS makes the entire data lifecycle more accessible, from ingestion into the data lake, through modelling in a data warehouse, to delivery to end users.
This has empowered the use of data and analytics, enabling data consumers to discover, understand, and use data easily. As the MDS becomes more prevalent, Geospatial Data Analysis is finding its place within it.
Defining Modern Geospatial Data Analysis
Modern Geospatial Data Analysis is the use of georeferenced data to produce visual deliverables that make complex information understandable and manageable. This enables business decisions to be made based on geospatial data.
In Geospatial Data Analysis, it is common to leverage a range of technologies, such as satellite imagery, coordinate systems, GPS, and IoT sensors. This analysis can be used to present historical changes, as well as current changes in real-time. Furthermore, insights that may be overlooked in a table are revealed in easily recognisable visual patterns and images.
Modern Geospatial Data Analysis is different to a traditional Geographic Information System (GIS). The most significant difference is that modern Geospatial Data Analysis is a much broader area of focus that includes the data, technologies, concepts, development, application, and practices for analysing and deriving value from georeferenced data.
Current Uses of Geospatial Data Analysis
Our world is changing more rapidly than ever before, leaving us with less time to respond to emerging events, often on a global scale. Applications of Geospatial Data Analysis have always been closely relevant to our daily lives, and the two are becoming increasingly intertwined as a result of these trends.
Weather forecasts, for example, are one of the most prevalent practical applications of Geospatial Data Analysis with meteorological data. The recent record-setting heat waves have made the effects of global warming increasingly evident, and we must use the right tools to monitor and study climate change to prepare for the subsequent impacts.
Besides its environmental applications, such as mapping floods and forest fires, Geospatial Data Analysis is also essential for a wide range of business operations. Here are some examples of how Geospatial Data Analysis brings value to different sectors.
The UK Health Security Agency has developed an interactive map that shows Covid-19 case rates across the country. This data influenced the measures taken to combat the spread of Covid-19 and demonstrated the effectiveness of Geospatial Data Analysis as a way of clearly conveying geographic statistics and ongoing trends.
Fig 3: The Covid-19 case rate in and around London between 22/04/2022 and 26/08/2022 with higher case rates indicated by darker colours
Geospatial Data Analysis also helps governments to plan for infrastructure development, such as roads, railways, schools, hospitals, and power and water supplies. The World Bank has created the Africa Electricity Grids Explorer, which makes future decisions easier because existing installations are already captured and visualised.
Fig 4: Existing and planned electricity grids in Africa and the Middle East
Retail & Real Estate
For retail businesses looking to open new stores, or property developers searching for their next site, Geospatial Data Analysis can provide significant insight into site selection. For example, businesses can find the best locations by studying public data such as population density, demographic profile, distance to public transport or car parks, nearby amenities, and cost of land, in conjunction with internal business data such as proximity to warehouses and existing branches.
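As a minimal sketch of how such criteria might be combined, the Python below ranks candidate sites with a weighted sum of min-max normalised criteria. The site names, criteria, values, and weights are all hypothetical and chosen only for illustration. Weighted scoring is one of several possible approaches, not a recommended method.

```python
def min_max(values):
    """Rescale a list of numbers to the 0..1 range (all zeros if constant)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def rank_sites(sites, weights):
    """Score each site as a weighted sum of its normalised criteria.

    Positive weights reward high values; negative weights penalise them.
    Returns (site names ordered best first, score per site).
    """
    names = list(sites)
    scores = {name: 0.0 for name in names}
    for criterion, weight in weights.items():
        normalised = min_max([sites[name][criterion] for name in names])
        for name, value in zip(names, normalised):
            scores[name] += weight * value
    ranking = sorted(names, key=lambda n: scores[n], reverse=True)
    return ranking, scores


# Hypothetical candidate sites: population density (people per km2),
# distance to public transport (km), and land cost (arbitrary units).
sites = {
    "A": {"population_density": 8000, "transit_km": 0.3, "land_cost": 900},
    "B": {"population_density": 5000, "transit_km": 0.1, "land_cost": 600},
    "C": {"population_density": 2000, "transit_km": 1.1, "land_cost": 300},
}
# Hypothetical weights: density is good, distance and cost are bad.
weights = {"population_density": 0.5, "transit_km": -0.3, "land_cost": -0.2}

ranking, scores = rank_sites(sites, weights)
print(ranking)
```

In practice the inputs would come from joined public and internal datasets, and the weights would be agreed with the business rather than hard-coded.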
Geospatial Data Analysis also helps businesses with supply chain management, depot site selection, and route optimisation. The emergence of digital twins is also modernising the sector through real-time realistic simulations to plan operations, manage uncertainties, and mitigate environmental impact.
Managing distribution networks is a key part of the operation of utility companies, and Geospatial Data Analysis helps here too. For example, overlaying network maps on satellite imagery or vegetation data simplifies the process of estimating the proximity of trees to power lines to identify potential risks of an outage and subsequent wildfires. Weather and climate forecasts can also help to predict upcoming interruptions to the network and enable early intervention.
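The tree-to-power-line proximity check can be sketched in a few lines of Python. This is an illustrative example only: it assumes the tree locations and the line have already been projected to a planar coordinate system in metres, and the coordinates, tree names, and 5 m threshold are made up.

```python
from math import hypot


def point_segment_distance(px, py, ax, ay, bx, by):
    """Shortest planar distance from point P to the segment A-B."""
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:
        # Degenerate segment: A and B are the same point.
        return hypot(px - ax, py - ay)
    # Project P onto the line through A and B, then clamp to the segment.
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return hypot(px - (ax + t * dx), py - (ay + t * dy))


def trees_near_line(trees, line, threshold_m):
    """Return the ids of trees within threshold_m of any segment of the line."""
    risky = []
    for tree_id, (tx, ty) in trees.items():
        nearest = min(
            point_segment_distance(tx, ty, ax, ay, bx, by)
            for (ax, ay), (bx, by) in zip(line, line[1:])
        )
        if nearest <= threshold_m:
            risky.append(tree_id)
    return sorted(risky)


# Hypothetical power line (a polyline) and tree positions, in metres.
power_line = [(0.0, 0.0), (100.0, 0.0), (100.0, 50.0)]
trees = {"oak": (50.0, 4.0), "pine": (50.0, 30.0), "birch": (104.0, 60.0)}

risky_trees = trees_near_line(trees, power_line, threshold_m=5.0)
print(risky_trees)
```

A production workflow would more likely use a spatial index or a spatial join in the warehouse, since comparing every tree against every segment does not scale to a full network.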
Agriculture & Farming
Information such as soil conditions, crop health, and precipitation, all of which is available from geospatial data, facilitates real-time monitoring and timely decision-making.
This has become known as “precision agriculture”, defined as “a management strategy that gathers, processes and analyzes temporal, spatial and individual data and combines it with other information to support management decisions according to estimated variability for improved resource use efficiency, productivity, quality, profitability and sustainability of agricultural production”.
How to Become a Geospatial Data-Driven Business
In the next section of this blog, we identify four main business targets that need to be met for organisations to use geospatial data effectively and six capabilities of modern Geospatial Data Analysis that will enable them to achieve these goals.
Goals for Geospatial Data Analysis
To get the most value out of geospatial data, there are four main targets for businesses to aim for. Effective modern Geospatial Data Analysis requires:
- Automating scalable and flexible solutions
- Freeing geospatial analytics workloads from legacy constraints
- Creating accessible resources across all teams
- Being able to support data consumers by generating clear insights
This set of goals allows us to look at Geospatial Data Analysis in a more structured way with clearer outcomes for businesses.
In the next section, we’ll go into more detail to demonstrate the processes and policies that support these overarching goals.
Using Modern Geospatial Data Analysis
The four goals above help to define the way that businesses use modern Geospatial Data Analysis. In this section, we’ll look at how modern Geospatial Data Analysis goes beyond traditional GIS, as well as six important areas of focus for businesses wanting to use geospatial data effectively. These all help to achieve the four main targets outlined above.
A fantastic feature of modern Geospatial Data Analysis is scalability. It allows us to go from one computer to serverless processes; it permits scaling horizontally, across several nodes, and vertically, by adding more power to each machine.
In traditional GIS, geospatial data preprocessing is workstation-oriented. Libraries such as GDAL are memory constrained and limited to a single machine. Desktop GIS tooling is not designed for easy scaling. Also, ingestion processes become complex and unmanageable, with low levels of automation.
Using the cloud allows businesses to avoid initial infrastructure investment costs, and to scale their required infrastructure easily. This flexibility also increases cost-effectiveness, because businesses pay only for what they use.
A prominent technology in this regard is GeoBeam. GeoBeam adds GIS capabilities to Apache Beam pipelines by allowing us to run GDAL and similar geospatial libraries. This has enormous potential when running on Beam, an open-source, unified model for defining both batch and streaming data-parallel processing pipelines.
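What makes geospatial work scale horizontally is that it can often be split into independent spatial tiles. The sketch below illustrates that partitioning idea with the Python standard library only: a local thread pool stands in for the worker nodes that a runner such as Beam would provide, and it does not use the GeoBeam or Beam APIs. The function names, the 2 x 2 grid, and the sample points are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import partial


def split_bbox(min_x, min_y, max_x, max_y, nx, ny):
    """Split a bounding box into nx * ny equally sized tiles, row by row."""
    width = (max_x - min_x) / nx
    height = (max_y - min_y) / ny
    return [
        (min_x + i * width, min_y + j * height,
         min_x + (i + 1) * width, min_y + (j + 1) * height)
        for j in range(ny)
        for i in range(nx)
    ]


def count_points_in_tile(tile, points):
    """Count points inside one tile (lower edges inclusive, upper exclusive)."""
    x0, y0, x1, y1 = tile
    return sum(1 for x, y in points if x0 <= x < x1 and y0 <= y < y1)


# Hypothetical point dataset covering the box (0, 0) to (2, 2).
points = [(0.5, 0.5), (1.5, 0.5), (1.5, 1.5), (1.9, 1.9), (0.1, 1.2)]
tiles = split_bbox(0.0, 0.0, 2.0, 2.0, nx=2, ny=2)

# Each tile is an independent unit of work, so tiles can run in parallel.
with ThreadPoolExecutor(max_workers=4) as pool:
    counts = list(pool.map(partial(count_points_in_tile, points=points), tiles))

print(counts)
```

Because no tile depends on another, the same decomposition can be handed to more workers without changing the per-tile logic.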
Traditionally, GIS tools have a high degree of interaction through user interfaces. This increases visibility and should always be present in the early stages of a project. However, this capability is sometimes lost when working with large amounts of code.
In modern Geospatial Data Analysis, code can, and should, be version-controlled and well-governed. This allows Geospatial Data Analysis to work well alongside other aspects of the Modern Data Stack. The Earth Engine Code Editor is one tool that offers both an interactive interface and a code-based workflow.
Furthermore, testing, deployment, infrastructure, data transformations, and quality checks can be automated, so data scientists set up the process once and then focus on creating code that provides insights, without spending time on repetitive manual steps.
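As an illustration of what an automated quality check might look like, here is a small Python sketch that validates point records before they enter a pipeline. The record layout (id, lon, lat), the rules, and the sample values are assumptions made for this example; real checks would depend on the dataset and are often expressed in a dedicated testing framework.

```python
def is_number(value):
    """True for ints and floats, but not for booleans or None."""
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def validate_record(record):
    """Return a list of problems found in one point record (empty if valid)."""
    lon, lat = record.get("lon"), record.get("lat")
    if not is_number(lon) or not is_number(lat):
        return ["missing or non-numeric coordinates"]
    problems = []
    if not -180.0 <= lon <= 180.0:
        problems.append(f"longitude out of range: {lon}")
    if not -90.0 <= lat <= 90.0:
        problems.append(f"latitude out of range: {lat}")
    if lon == 0 and lat == 0:
        # (0, 0) is a valid location but is often a sign of a failed geocode.
        problems.append("suspicious (0, 0) coordinate")
    return problems


# Hypothetical input records.
records = [
    {"id": 1, "lon": -0.1276, "lat": 51.5072},
    {"id": 2, "lon": 51.5072, "lat": -200.0},
    {"id": 3, "lon": None, "lat": 10.0},
    {"id": 4, "lon": 0.0, "lat": 0.0},
]

report = {record["id"]: validate_record(record) for record in records}
failed_ids = sorted(rid for rid, problems in report.items() if problems)
print(failed_ids)
```

A check like this can run on every commit or every load, which is what removes the repetitive manual step.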
- Open Sharing of Data
While security is extremely important, it is sometimes useful to have an open sharing of knowledge and information to enable collaboration at scale. For example, the required code and data need to be accessible to the relevant teams within a company. This sometimes extends to the global community when developing open-source tools or datasets.
Previously, open-source geospatial services were complex to maintain in an enterprise environment. Businesses needed a dedicated team of users and IT professionals to use them, sometimes in conjunction with other commercial tools.
Now, this is much easier. Cloud providers offer the scalability, support, security and service-level agreements that many large enterprises need while enabling modern geospatial tools and services to be securely used at scale.
Traditional GIS is well known for having heterogeneous data sources and formats. However, this data is often in a particular context using systems and tools with a high level of specialisation. This means GIS teams tend to be isolated, receiving requests from other teams and then returning the results and deliverables.
Modern geospatial tools focus on being able to exchange data and easily share information. They aim to create cross-functional teams, enabling both technical collaboration and the delivery of high-value solutions. For effective Geospatial Data Analysis, we must remove traditional barriers between data scientists, data engineers, and business analysts to enable better collaboration.
One example of this is spatial SQL. Spatial SQL is an interoperable language for working with spatial data that enables transformation workflows, Geospatial Data Analysis, and application development. It provides geospatial data users with the ability to work in the same way as other data users, eliminating traditional silos between geospatial teams and other areas of the business.
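To give a feel for this, the sketch below runs a distance-based join as an ordinary SQL query from Python. It is only an emulation: SQLite has no built-in spatial functions, so a Python haversine function is registered under the made-up name distance_km. Engines with real spatial SQL support provide their own functions for this (for example ST_Distance or ST_DWithin), usually with indexing. The table names, store and depot locations, and 80 km radius are hypothetical.

```python
import sqlite3
from math import asin, cos, radians, sin, sqrt


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km, assuming a spherical Earth (radius 6371 km)."""
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = (sin(dlat / 2) ** 2
         + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2)
    return 2 * 6371.0 * asin(sqrt(a))


conn = sqlite3.connect(":memory:")
# Expose the Python function to SQL; "distance_km" is our own name, not a standard.
conn.create_function("distance_km", 4, haversine_km)

conn.executescript("""
CREATE TABLE stores (name TEXT, lat REAL, lon REAL);
CREATE TABLE depots (name TEXT, lat REAL, lon REAL);
INSERT INTO stores VALUES ('London', 51.5072, -0.1276),
                          ('Manchester', 53.4808, -2.2426);
INSERT INTO depots VALUES ('Reading', 51.4543, -0.9781),
                          ('Leeds', 53.8008, -1.5491);
""")

# Spatial join: pair each store with every depot within 80 km.
rows = conn.execute("""
SELECT s.name, d.name
FROM stores AS s
JOIN depots AS d
  ON distance_km(s.lat, s.lon, d.lat, d.lon) < 80
ORDER BY s.name
""").fetchall()
conn.close()

print(rows)
```

The point is that the query reads like any other join, so analysts who already know SQL can work with spatial relationships without a separate toolchain.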
- Single Source of Truth
A Single Source of Truth (SSOT) is a type of data solution that provides users with a complete view of all the data available to them. This means the entire team can act with greater confidence when all their data has been confirmed in one central location. Data users at all levels can now propose new strategies and support their ideas with access to geospatial data, increasing the potential for innovation, as well as trust in company-wide data.
Data should be stored at its lowest granularity in a data lake layer within the data warehouse, so that everything downstream can be rebuilt from that layer at any point and produce the same output. This gives teams faster and more accurate calculations.
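The "rebuild at any point and get the same output" property can be illustrated with a small Python sketch. It treats a list of raw readings as the lowest-granularity layer and derives grid-cell totals with a pure function, so rerunning the build gives an identical result. The readings, the 0.5 degree cell size, and the function name are hypothetical.

```python
from collections import defaultdict
from math import floor


def build_cell_totals(raw_readings, cell_size_deg):
    """Aggregate raw (lon, lat, value) readings into totals per grid cell.

    The function has no side effects and sorts its output, so the derived
    table depends only on the raw layer and the cell size.
    """
    totals = defaultdict(float)
    for lon, lat, value in raw_readings:
        cell = (floor(lon / cell_size_deg), floor(lat / cell_size_deg))
        totals[cell] += value
    return dict(sorted(totals.items()))


# Hypothetical raw layer: one row per reading, never overwritten.
raw_layer = [
    (-0.12, 51.50, 2.0),
    (-0.13, 51.51, 3.0),
    (0.40, 51.90, 1.5),
]

first_build = build_cell_totals(raw_layer, cell_size_deg=0.5)
# Rebuilding later, even with the rows read in a different order,
# yields the same derived table for this data.
rebuilt = build_cell_totals(list(reversed(raw_layer)), cell_size_deg=0.5)
print(first_build == rebuilt)
```

Keeping the raw layer untouched also means a new aggregation, such as a different cell size, can be added later without re-ingesting anything.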
- More Deliverables
With GIS, businesses are traditionally limited to certain deliverable formats such as web maps, story maps, and static images. However, a Modern Data Stack provides a new catalogue of deliverables, such as dashboards, datasets, APIs, reports and layered maps. This allows geospatial solutions to be integrated more widely across the organisation and to play a larger role in solving problems and producing insights.
As technologies and tools become more advanced, the world of modern Geospatial Data Analysis is evolving rapidly. By understanding its four main goals, data users can begin to implement it effectively in their businesses. In this blog, we have also presented six key points that enable businesses to achieve these goals.
These areas of focus will be the foundations on which modern Geospatial Data Analysis will evolve. It is a fascinating time, and never before has Geospatial Data Analysis offered so much value across so many potential use cases.
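As one example of a dataset or API-style deliverable, the Python sketch below turns tabular rows into a GeoJSON FeatureCollection, a format that web maps, dashboards, and other services can consume. The row fields and values are invented for illustration. The only fixed convention used is that GeoJSON point coordinates are written in longitude, latitude order.

```python
import json


def to_feature_collection(rows):
    """Convert rows with 'lon' and 'lat' keys into a GeoJSON FeatureCollection.

    All other keys in each row are carried over as feature properties.
    """
    features = [
        {
            "type": "Feature",
            "geometry": {
                "type": "Point",
                # GeoJSON positions are [longitude, latitude].
                "coordinates": [row["lon"], row["lat"]],
            },
            "properties": {
                key: value for key, value in row.items()
                if key not in ("lon", "lat")
            },
        }
        for row in rows
    ]
    return {"type": "FeatureCollection", "features": features}


# Hypothetical query result from the warehouse.
rows = [
    {"name": "London", "lon": -0.1276, "lat": 51.5072, "visits": 120},
    {"name": "Leeds", "lon": -1.5491, "lat": 53.8008, "visits": 45},
]

collection = to_feature_collection(rows)
payload = json.dumps(collection, indent=2)  # ready to serve or write to a file
print(payload)
```

The same payload could back an API response, a layered map, or a downloadable dataset, which is how one warehouse table ends up supporting several deliverables.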