********************
Monitoring P4 switch
********************

Monitoring of the P4 switch is multifaceted and can be done through several avenues. In particular, we can identify the following categories:

* P4 table configuration: the configuration of the various tables of the P4 switch (SPEAD routing, PSR...)
* Port health monitoring: information about the current state of every data port, including but not limited to:

  * port configuration such as speed (100G, 25G...), FEC configuration, auto-negotiation
  * port status such as up (T/F), enabled
  * global statistics such as the number of bytes/packets received and sent through the port, and the number of errors on reception/transmission
  * number of packets/bytes per type of traffic (PTP, ARP, SPEAD, SDP...)

* Live traffic monitoring: through the advanced telemetry, we can expose precise real-time traffic monitoring for SKA-specific traffic:

  * SPEAD monitoring (SPS to CBF)
  * SPEAD monitoring (CBF to SDP)
  * PSR monitoring (CBF to PSS/PST)

* Tango Health State, which follows the pre-defined SKA Tango `Health State `_

Access to this monitoring information is not yet fully automated, but it is available through a few Tango attributes and methods.

Health Status
#############

The Health Status is a Tango attribute that provides a deeper view of the current state of the various ports.
In particular, the Health Status reports:

* Port Status

  * "Enable": whether the port is enabled,
  * "Up": whether the port is up (i.e. synchronisation is done on the transceiver),
  * "Speed": the port speed configuration,
  * "Rx": the number of packets received on this port,
  * "Tx": the number of packets sent from this port,
  * "FCS": the number of frames with an FCS error,
  * "Rx_errors": the number of packets received on this port with errors,
  * "Tx_errors": the number of packets sent from this port that generated an error,
  * "TX_PPS": sent packets per second (calculated over a period of 1 second),
  * "RX_PPS": received packets per second (calculated over a period of 1 second),
  * "TX_RATE": sent bytes per second (calculated over a period of 1 second),
  * "RX_RATE": received bytes per second (calculated over a period of 1 second).

This Health Status attribute is published every second so that EDA or any other monitoring system can display and store it.

Table Counters
##############

Associated with the various routing tables, we have activated the recording of counters for every configured entry in the table. Each table entry has two kinds of counters: a byte counter and a packet counter. These counters are incremented every time a packet that matches an entry in the table is received on any port. Note that these counters are 32 bits wide and will therefore roll over after some time. Currently, these counters operate on a pull basis.

Advanced Telemetry
##################

The final type of monitoring relates to detailed traffic monitoring for the SKA. This monitoring no longer operates by pulling values from the ASIC routing tables. Instead, it relies on a push mechanism in which telemetry packets are constructed within the ASIC itself.
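
For contrast with the push mechanism, the sketch below illustrates what a pull-based reader has to do with the 32-bit table counters described above: compute differences between successive reads in a way that tolerates a roll-over, and derive a rate from them. It is only an illustration under stated assumptions. The function names ``counter_delta`` and ``rate_per_second`` are hypothetical and are not part of the Tango device.

```python
COUNTER_BITS = 32                      # counter width stated in the text
COUNTER_MODULUS = 1 << COUNTER_BITS    # value at which the counter rolls over


def counter_delta(previous: int, current: int) -> int:
    """Increase between two reads of a 32-bit counter.

    Modular arithmetic gives the right answer as long as the counter
    wrapped at most once between the two reads.
    """
    return (current - previous) % COUNTER_MODULUS


def rate_per_second(previous: int, current: int, interval_s: float) -> float:
    """Average rate over the polling interval (hypothetical helper)."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    return counter_delta(previous, current) / interval_s


if __name__ == "__main__":
    # Normal case: no roll-over between the two reads.
    print(counter_delta(10, 25))
    # Roll-over case: the counter wrapped past 2**32 between the two reads.
    print(counter_delta(0xFFFFFFF0, 0x10))
```

This approach only recovers a single roll-over between two reads. A 32-bit byte counter on an entry carrying a sustained 100 Gbit/s would wrap in roughly 0.34 s (2^32 bytes at 12.5 GB/s), so the polling interval bounds how much traffic can be measured reliably this way.
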
In particular, the advanced telemetry works as follows:

* Update the register counter for the given SKA packet type (ska_a)
* If the counter equals the reporting packet number:

  * calculate the number of packets since the last report
  * calculate the number of bytes since the last report
  * prepare the telemetry header
  * instruct the ASIC to generate a telemetry packet towards the operating system
  * send the telemetry packet

In the Tango device, we leverage the `eBPF `_ framework, and in particular the BCC package, to extract the information and update the throughput information.
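
The reporting logic listed above can be modelled in software as follows. This is a minimal Python sketch of the counting and reporting steps only, not the P4 program running in the ASIC and not the eBPF/BCC code of the Tango device. The class names, field names and the ``report_every`` threshold are hypothetical, and the sketch assumes that a report is triggered once a fixed number of packets has been seen since the previous report.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TelemetryReport:
    """Content of one telemetry packet (hypothetical layout)."""
    traffic_type: str
    packet_count: int   # packets since the last report
    byte_count: int     # bytes since the last report


@dataclass
class TelemetryCounter:
    """Software model of one per-traffic-type register counter."""
    traffic_type: str
    report_every: int           # assumed reporting threshold, in packets
    packets: int = 0
    byte_total: int = 0
    last_packets: int = 0
    last_bytes: int = 0

    def on_packet(self, length: int) -> Optional[TelemetryReport]:
        # Update the register counter for this traffic type.
        self.packets += 1
        self.byte_total += length
        # Only report once the threshold is reached.
        if self.packets - self.last_packets < self.report_every:
            return None
        # Compute packets and bytes since the last report.
        report = TelemetryReport(
            traffic_type=self.traffic_type,
            packet_count=self.packets - self.last_packets,
            byte_count=self.byte_total - self.last_bytes,
        )
        # Remember the totals so the next report is relative to this one.
        self.last_packets = self.packets
        self.last_bytes = self.byte_total
        return report


if __name__ == "__main__":
    counter = TelemetryCounter("SPEAD", report_every=3)
    for _ in range(7):
        report = counter.on_packet(100)
        if report is not None:
            print(report)
```

A receiver that timestamps each report can turn ``packet_count`` and ``byte_count`` into throughput by dividing by the time elapsed since the previous report.
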