Per-Core Networking Stack
Overview
Networking applications use the Berkeley socket model to interface with a networking stack that resides in the operating system kernel. This model requires costly context switches between the application and the kernel, as well as memory copies on both the sending and receiving paths. Context switches pollute the TLB and caches and can severely degrade instructions per cycle (IPC) for tens of thousands of cycles. The resulting performance limit becomes even more apparent as network bandwidth doubles every 17-18 months, whereas CPU and DRAM performance doubles only every 26-27 months. For example, the Memcached application spends over 80% of its CPU time in the kernel networking stack while using less than 5% of the available network bandwidth.
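To make the doubling rates quoted above concrete, the following is a minimal sketch of the arithmetic. It assumes the midpoints of the quoted ranges (17.5 and 26.5 months) and steady exponential growth; the function name `growth_gap` and its parameters are illustrative only.

```python
def growth_gap(months: float,
               net_doubling: float = 17.5,
               cpu_doubling: float = 26.5) -> float:
    """Factor by which network bandwidth outgrows CPU/DRAM performance.

    Assumes both quantities grow exponentially with the given doubling
    periods (in months). The defaults are the midpoints of the ranges
    quoted in the text, which is an assumption of this sketch.
    """
    network_growth_exponent = months / net_doubling
    cpu_growth_exponent = months / cpu_doubling
    # Ratio of 2**a to 2**b is 2**(a - b).
    return 2 ** (network_growth_exponent - cpu_growth_exponent)


if __name__ == "__main__":
    for years in (1, 5, 10):
        print(years, "years:", round(growth_gap(12 * years), 2))
```

Under these assumptions the gap slightly more than doubles over five years, which is why per-packet CPU overhead becomes a larger share of the cost as links get faster.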
Applications using this model also suffer from a lack of connection locality, as the kernel can process packets on a different core from the one running the application. Multicore scalability is limited by this lack of connection locality and by the synchronisation overhead of sharing networking state across multiple cores. To achieve multicore scalability, different parallelisation techniques can be used, such as a run-to-completion model, where packets are processed on the same core as the application, or a streaming model, where application and network cores are separate and communicate using message passing. The streaming model can achieve parallelisation within a request, whereas the run-to-completion model attempts to improve temporal locality by processing packets as early as possible.
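The structural difference between the two models can be sketched as follows. This is only an illustration, not the design evaluated in this project: `parse` and `handle` are hypothetical stand-ins for network-stack and application work, and Python threads are neither pinned to cores nor truly parallel, so the sketch shows the shape of each model rather than its performance.

```python
import queue
import threading


def parse(packet: bytes) -> str:
    """Hypothetical stand-in for network-stack processing."""
    return packet.decode()


def handle(request: str) -> str:
    """Hypothetical stand-in for application processing."""
    return request.upper()


def run_to_completion(packets):
    """One worker performs network and application work for each packet."""
    return [handle(parse(p)) for p in packets]


def streaming(packets):
    """Separate network and application workers linked by message passing."""
    requests = queue.Queue()
    responses = []
    done = object()  # sentinel marking the end of the stream

    def network_worker():
        for p in packets:
            requests.put(parse(p))  # hand the parsed request over
        requests.put(done)

    def application_worker():
        while True:
            item = requests.get()
            if item is done:
                break
            responses.append(handle(item))

    workers = [threading.Thread(target=network_worker),
               threading.Thread(target=application_worker)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return responses


if __name__ == "__main__":
    pkts = [b"get a", b"get b"]
    print(run_to_completion(pkts))
    print(streaming(pkts))
```

In the streaming version the network worker can already be parsing the next packet while the application worker handles the previous one, at the cost of a queue hand-off per request. The run-to-completion version avoids that hand-off and keeps each packet's data with a single worker.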
This research aims to evaluate the impact of using kernel bypass technologies to accelerate network-bound applications. Research questions include:
- The design of an efficient zero-copy interface to replace the Berkeley socket model.
- The performance impact of dedicated networking and application cores versus a run-to-completion model, and which workloads benefit from each design.
- The overhead of the current kernel networking stack.
People
- Stephen Mallon, University of Sydney
- Dr. Guillaume Jourjon, Data61-CSIRO
- Dr. Vincent Gramoli, University of Sydney
News
- Paper accepted at ASPLOS 2018!
Publications
- Stephen Mallon, Vincent Gramoli, and Guillaume Jourjon, “Are Today’s SDN Controllers Ready for Primetime?”, IEEE Local Computer Networks 2016. Dubai, UAE.
- Stephen Mallon, Vincent Gramoli, and Guillaume Jourjon, “DLibOS: Performance and Protection with Network-on-Chip”, in ASPLOS 2018, the 23rd ACM International Conference on Architectural Support for Programming Languages and Operating Systems. March 2018, Williamsburg, VA, USA.