Network-Centric Performance Management – The Discourse Continues …

Well, it's time again to continue the discourse on performance management that we began earlier this year, which involved our highly successful panel on End-to-End Performance Management & Monitoring in July 2012.

Having looked at the relationship between the application and the network, or between the OTT provider and the infrastructure provider, it was natural to turn our attention to the relationship between adjacent network segments and between the different operators that make up the end-to-end network between the user and the data center, namely the access, metro, middle-mile, and last-mile providers.

The reason is that for any operator today, it is not enough to be able to manage just their segment of the network (access, middle-mile or last-mile). Rather, with the diversity of applications and services, and the growth in mobile broadband in the first mile and data centers in the last mile, it is imperative for an operator to have cross-domain insights. 

So, an operator must not only know the performance of the network (latency, jitter, delay, loss, throughput/bandwidth, and availability) locally, but also how these parameters are affected by adjacent network segments, and how the network behavior, as a whole, affects application performance. While much has been discussed in various forums about application performance, there can be no application performance without sound network performance!
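To make the cross-segment dependence concrete, here is a minimal sketch of how two of these parameters compose along a path. It assumes that mean delays simply add across segments and that packet loss in each segment is independent of loss in the others. The segment names, the `SegmentStats` type, and the function names are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from math import prod


@dataclass
class SegmentStats:
    """Hypothetical per-segment measurements reported by one operator."""
    name: str
    mean_delay_ms: float  # mean one-way delay across this segment
    loss_ratio: float     # fraction of packets lost, between 0 and 1


def end_to_end(segments):
    """Compose per-segment stats into end-to-end delay and loss.

    Assumption: delays add, and loss events in different segments
    are independent, so delivery probabilities multiply.
    """
    total_delay_ms = sum(s.mean_delay_ms for s in segments)
    delivered = prod(1.0 - s.loss_ratio for s in segments)
    return total_delay_ms, 1.0 - delivered


def worst_delay_contributor(segments):
    """Return the name of the segment adding the most delay."""
    return max(segments, key=lambda s: s.mean_delay_ms).name
```

Even in this simplified view, an operator who sees only its own segment cannot compute either end-to-end figure, which is one reason sharing metrics across boundaries matters.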

Thus, network performance monitoring, that is, the monitoring of the activities needed to keep the network up and running reliably and efficiently, is critical, especially in today’s dynamic networks. Examples include synchronization distribution in 3G and 4G/LTE mobile backhaul networks, impairment detection, and fault isolation & notification, to name a few.

In this discourse, we will, therefore, discuss end-to-end performance management & monitoring with a focus on network-level problems and measurements. That is, we will consider the (horizontal) interactions between providers in different segments (access, middle-mile, and last-mile), the issues of sharing metrics across operator boundaries, which metrics are critical for network performance and why, and how performance problems can be fixed efficiently and speedily.

It turns out that while there are many tools and techniques for performance management (advanced OAM, better sampling & aggregation methods, synthetic transactions, support in vendor systems, and sophisticated software), it is still very tricky to answer fundamental questions such as:

  • Where did a performance problem originate?
  • Who is responsible for it?
  • How do we go about fixing it?

The current initiative has again brought together key players from the carrier eco-system to discuss, debate, and answer important questions on network-centric performance management, such as:

i) What are key advances in performance monitoring of fundamental network-level activities that enable the proper & efficient running of a network?

ii) Which advances are designed to enable proactive performance management, as opposed to reactive performance management?

iii) Where are we on the ownership and sharing of performance data – how could operators in different segments share this data (without revealing internal details)? What is making this a necessity in today’s environment? What changes are occurring (in operator practices, vendor offerings, and software solutions) to facilitate that?

iv) What is the role of standardization in enabling the industry to converge on performance management capabilities?

v) What are the advances in real-time collection and processing of data, and how do they aid performance management in today’s complex IP/Ethernet networks?

vi) With network complexity and scale, automated network enforcement actions could be valuable. However, they are perceived as complicated and risky. What is the eco-system doing to facilitate these? Why or why not?

vii) How does fault isolation across boundaries work? How is fault notification done? How does one determine and assign “responsibility” to the liable network segment/operator?

viii) Are some of the ensuing difficulties merely organizational? Or, are there limits from software and systems? If the latter, what is the eco-system doing to remove those?

ix) What is the contribution of the network (or a network segment) to application performance? How does advanced system design enable an operator to better determine/track that?

In subsequent postings, we’ll be sharing some of the discussions that have ensued between the players in our Roundtables so far, and some of the open questions that the industry, as a whole, must work on.

In the meantime, what is your view? Do you think the questions above capture the open issues well? Are there some we’ve missed that you’d like to add? Or perhaps you think this isn’t as much of an issue?

The companies cooperating in this initiative are:

