Bridging the Observability Gap • The Registry
Sponsored In modern computing, visibility is everything. IT administrators and site reliability engineers (SREs) depend on their ability to see what is going on in their systems. Unfortunately, as systems have become more sophisticated, it has become harder to see what they are doing. This is why the industry is promoting observability as an evolution of existing concepts such as monitoring and metrics, and why vendors are multiplying the tools designed to close a growing visibility gap.
What is observability?
IT departments have been monitoring infrastructure and applications for decades, so isn’t “Observability” just a marketing term for what is already good practice? Not according to Dhiraj Goklani, regional vice president of IT and DevOps, APAC at Splunk, which recently launched its Splunk Observability Cloud.
“We have had a massive transition to cloud infrastructure and containerized applications as organizations dramatically accelerate their digital and direct-to-consumer initiatives,” he explains.
In the past, operators could easily scrutinize computer operations by running programs that directly monitored their servers. The apps were monolithic, so you could point monitoring software at them and log the results. Things got a lot more difficult when companies began making everything more distributed. When they started running everything in the cloud, they relied on the monitoring services of their cloud service providers. When they spread operations across multiple cloud providers, as well as their own on-premises systems, things became more disjointed.
Composable apps have made things even more difficult. For years, companies have struggled to atomize their applications, building them from smaller, more manageable parts. Microservices and the containers that run them finally brought the practice into the mainstream, giving development teams modular applications whose parts they could update individually. The downside to this approach was that these smaller pieces, running on a more abstract cloud infrastructure, became more difficult to monitor.
Developers now accessed these services through APIs when they assembled them to create new applications. These interfaces also had to be monitored as part of the overall end-to-end journey. Existing monitoring tools are often siloed, designed to examine particular domains of the stack or infrastructure. They can’t handle complex, disjointed workflows. That is the void observability tools fill, by bringing applications and infrastructure together into a single end-to-end view across all levels of the stack.
Take note of the traces
This unified view requires better telemetry, explains Goklani. Splunk cut its teeth developing tools that take machine-generated log analysis to the next level, allowing IT administrators and SREs to make sense of reams of operational data. Observability combines these logs with metrics that summarize performance and availability. Both forms of operational data are well understood, but there is a third type that is essential in modern composable cloud infrastructures, Goklani explains: traces.
Traces document the interactions between the thousands, if not millions, of microservices that work together to serve an application request. These small pieces of code are typically replicated for a mix of scalability and resilience.
When a user of a microservices-based application logs into a web application, changes their account configuration, talks to a support technician in a chat window, searches for products, compares features, checks their cart, and then pays, they are touching thousands of individual pieces of software. That journey is more difficult to follow in an atomized microservices infrastructure where many individual pieces of code work together, often on infrastructure owned by different entities.
“I call traces footprints in the sands of time,” quips Goklani. “They trace my journey through all these microservices in the background. If my transaction fails, anyone in the Global Services team can understand why this happened, for me in particular. I might have a slow connection here in my network, or it could be something in the back end that caused it.”
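The idea of a trace as a set of linked footprints can be sketched in a few lines of Python. This is only an illustration, not Splunk's or OpenTelemetry's data model: the `Span` fields, the `path_to_failure` helper, and the service names are all hypothetical. It assumes each unit of work records which trace it belongs to and which span called it, which is enough to walk back from a failure to the user's original request.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Span:
    """One unit of work done by one service on behalf of a request (hypothetical model)."""
    trace_id: str             # shared by every span in the same user journey
    span_id: str
    parent_id: Optional[str]  # None for the root span (the user's request)
    service: str
    duration_ms: float
    error: bool = False


def path_to_failure(spans: List[Span], trace_id: str) -> List[str]:
    """Return service names from the root span down to the deepest failed span.

    Returns an empty list if no span in the trace failed.
    """
    in_trace = {s.span_id: s for s in spans if s.trace_id == trace_id}

    def depth(span: Span) -> int:
        # Count how many parents sit between this span and the root.
        d = 0
        while span.parent_id is not None and span.parent_id in in_trace:
            span = in_trace[span.parent_id]
            d += 1
        return d

    failed = [s for s in in_trace.values() if s.error]
    if not failed:
        return []

    # The deepest failure is the most likely origin; its parents merely report it.
    node: Optional[Span] = max(failed, key=depth)
    chain: List[str] = []
    while node is not None:
        chain.append(node.service)
        node = in_trace.get(node.parent_id) if node.parent_id else None
    return list(reversed(chain))


# Made-up sample data: one failed checkout journey and one healthy request.
SAMPLE_SPANS = [
    Span("t1", "a", None, "web-frontend", 420.0, error=True),
    Span("t1", "b", "a", "cart", 35.0),
    Span("t1", "c", "a", "checkout", 380.0, error=True),
    Span("t1", "d", "c", "payment", 370.0, error=True),
    Span("t2", "e", None, "web-frontend", 50.0),
]
```

Calling `path_to_failure(SAMPLE_SPANS, "t1")` on this sample walks from the failed payment span back up to the front end, which is the kind of per-user answer Goklani describes. Real tracing systems carry far more context per span, such as timestamps and attributes.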
GigaOm’s Cloud Observability Radar, which analyzes 14 competitors in the observability space, highlights the OpenTelemetry project as a key initiative in this space. This project, to which Splunk is a main contributor, is an open source observability framework managed by the Cloud Native Computing Foundation. It merged two existing projects, OpenTracing and OpenCensus, which respectively offered standard APIs for collecting traces and for application behavior metrics.
The GigaOm report also names Splunk as the only outperformer in the area of observability. “Splunk has become a leader in the field of observability with strategic acquisitions and the development of targeted organic solutions.”
Splunk began in 2003 with an offering to distill large amounts of machine-generated data from technology infrastructures. Since then, it has expanded its product portfolio to cover aspects of IT ranging from security to IoT. Today, it monitors a wide range of infrastructure and application performance data while giving IT administrators and SREs a deep dive into system logs for forensic investigations.
GigaOm ranked Splunk No. 1 among the vendors it assessed because of its wide range of integrations and rich set of back-end features. “Splunk ingests full-fidelity data from all sources (logs, metrics and traces) across the stack,” the analyst firm said. “It also offers massive scalability, sophisticated in-stream analytics, and native support for OpenTelemetry.”
Splunk Observability Cloud brings together several components of the company’s existing portfolio under a single interface that it says makes it easier for IT administrators and SREs to create these end-to-end views. These cover infrastructure and application performance monitoring, as well as log observations. It also includes the Splunk Real User Monitoring product, which captures and measures user activity using browser interactions with back-end resources to get an accurate picture of user experiences in the real world. The full product also includes Splunk On-Call, which forwards any emerging issues discovered to the appropriate members of the incident response team.
Splunk Observability Cloud also offers a new product, Splunk Synthetic Monitoring, which allows IT administrators and SREs to script user interactions and test how well they perform. This complements the suite's other tools with the ability to check common interactions with critical applications at set intervals, quickly surfacing any issues. Goklani also highlights its ability to test API interactions.
Splunk Observability Cloud is one of three such suites from Splunk. Its siblings are the IT Cloud and Security Cloud products. They target different use cases, but they have one thing in common: a new pricing model. Splunk traditionally priced its services based on the amount of data customers ingested. The suite-based approach replaces this with what it calls entity-based pricing, which prices products according to different types of infrastructure unit. A unit can be an IP address or an individual user. The company defines these units based on the suites that the customer uses.
It’s possible to buy all the products separately and enjoy automatic integrations as they are discovered, Goklani explains, but the advantage of unifying them all under one interface is that it makes workflows easier to track and share.
“In the past, IT and DevOps teams had to pivot from tool to tool, correlating their information and writing custom solutions,” he explains, adding that this was the only way to get a unified view. “Now we have provided a unified interface to consume all of this data.”
A single interface helps DevOps and IT teams maintain service availability, he says, protecting processes that businesses simply cannot allow to fail. It also helps IT administrators and SREs maintain the performance of services spread across different applications and infrastructures by quickly bringing issues to light and allowing different team members to analyze and resolve them using the same tool.
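As a rough illustration of what a scripted synthetic check involves, the Python sketch below runs a sequence of named steps, times each one, and stops at the first step that fails or exceeds a time budget. It does not use or represent the Splunk Synthetic Monitoring API: `CheckResult`, `run_synthetic_check`, the `budget_ms` parameter, and the step names are assumptions made for this example, and each step is a plain callable standing in for a real browser or API interaction.

```python
import time
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class CheckResult:
    step: str
    ok: bool            # True only if the step succeeded within the budget
    elapsed_ms: float


def run_synthetic_check(
    steps: List[Tuple[str, Callable[[], bool]]],
    budget_ms: float,
    clock: Callable[[], float] = time.perf_counter,
) -> List[CheckResult]:
    """Run scripted steps in order, stopping at the first failed or slow step.

    Each step is a (name, action) pair; the action returns True on success.
    `clock` is injectable so the timing logic can be exercised without waiting.
    """
    results: List[CheckResult] = []
    for name, action in steps:
        start = clock()
        try:
            succeeded = bool(action())
        except Exception:
            # A step that raises counts as a failed interaction.
            succeeded = False
        elapsed_ms = (clock() - start) * 1000.0
        result = CheckResult(name, succeeded and elapsed_ms <= budget_ms, elapsed_ms)
        results.append(result)
        if not result.ok:
            break  # later steps depend on earlier ones, so stop here
    return results


# Hypothetical journey: the search step fails, so checkout is never attempted.
example = run_synthetic_check(
    [("login", lambda: True), ("search", lambda: False), ("checkout", lambda: True)],
    budget_ms=1000.0,
)
```

In practice a scheduler would run a check like this at set intervals and raise an alert when the last result is not `ok`; that scheduling and alerting layer is left out of the sketch.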
How observability supports DevOps
Nowadays, the IT administrators and SREs who spot and manage these issues are just as likely to be the people who developed the code. Goklani explains that collecting tools in an observability suite supports DevOps disciplines that put software engineers at the center of the software services lifecycle. Now that developers own the underlying cloud infrastructure as well as the code itself, they need a way to get full visibility into how it works and to make it reliable, performant, and secure. “Observability helps monitor the DevOps lifecycle,” says Goklani.
Better observability translates into concrete results. Lenovo has started using Splunk Observability Cloud to manage its e-commerce business. It implemented the tool to collect data from its e-commerce systems and identify emerging issues. It cut the time needed to recover from a system failure from half an hour to five minutes.
The suite has also proven useful in dealing with an unexpected increase in demand in the midst of a pandemic. Lenovo expected its traffic to increase on Black Friday 2020 after introducing pricing incentives and gaming product giveaways, but it did not expect traffic to be 300% higher than the previous year. Splunk Observability Cloud helped the company maintain 100% uptime, executives said.
Splunk’s pricing change and the bundling of products into one Splunk Observability Cloud go hand in hand and represent a sea change for a company that has built its revenue by the byte. It’s Splunk’s bid to get customers to do more with its products, pushing it further into their environments with a portfolio it wants to be ubiquitous.
Sponsored by Splunk