Thank you for sharing your exploration of these interesting devices. First, I would like to get more context on the setup. According to your setup trace file, there are two APs (QuantennaCom, TPLink AP) and two terminals in this test (RaspberryPi, Apple), right?
The first issue I found interesting is, "It should be noted that here that the initial transmit of the key exchange message 2 is unsuccessful (packet 180) as a second transmit of this message is seen in packet 185". I checked the radio parameters of these two packets, and they are actually the same. Moreover, with SNR=49dB and RSSI=-42dBm, both were sent in almost perfect radio conditions, so I guess the receiver simply failed to receive the first one. I doubt we can blame the 1m distance (a signal so strong that it saturates the receiver), because here the RaspberryPi was receiving packets at RSSI=-27dBm (much higher than -42dBm).
The second issue I found is the power saving issue. I suspect that proper communication was never set up between the QuantennaCom and the RaspberryPi. This suspicion is based on the following considerations:
1. You mentioned "Looking at packet 249 we see a QoS NULL frame with the ‘power management’ bit of the frame control flags field set. This indicates the device is entering power save, and packets directed to this client should be buffered until the client wakes up." However, just after the acknowledgment in packet 250, the RaspberryPi starts transmitting packet 252, then packet 253.
2. You mentioned "An interesting sequence in the trace is packets 330→337. Here we see a number of packets being transmitted to the RPi and subsequently retried due to lack of acknowledgement by the RPi.", but here it is the Apple device that is transmitting packets to the RaspberryPi. Are they on the same BSSID? Who is the AP and who is the station? Are they using Wi-Fi Direct, or going through "From DS" and "To DS"?
I haven't finished reading your publication yet. It is really insightful, and I look forward to more discussion with you in the future. Thanks again.
Hi Wenzhen, thank you for your insightful comments!
First to clarify the setup. In the trace I provided the TPLink AP should be ignored - it's on the same channel but not involved in the over the air exchanges. It is connected to the same Ethernet backbone and provides the routing function in my network, which is why you may see packets from this source address both directly OTA (ie, beacons), as well as transmitted via the Quantenna BSS. The two terminals in the test are the RPi5 (connected wirelessly via the Quantenna BSS), and an Apple desktop computer connected via Ethernet to my backbone network, to which the Quantenna AP is also connected.
Regarding the signal saturation issue, thank you for clarifying this. Some other thoughts pop up as to what could be happening:
1. Given we don't see any MAC retry of the key exchange packet 2 (which would be expected if an ACK/BACK was not received), then the packet could have been acknowledged at the MAC layer then lost on the receive path on the Quantenna device. In this case the sniffer may not have picked up the ACK frame for some reason.
2. The RPi5 device is not correctly performing the MAC transmission - ie, the Quantenna device misses the initial transmission and the RPi does not do a retry. We see in the QoS Control field of packet 180 that the packet DOES expect an ACK (Ack Policy bits in the QoS Control field). Perhaps this is by design or an error in the RPi5 WiFi?
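For anyone following along in the capture, the Ack Policy check mentioned above can be sketched roughly as follows. This is only a minimal sketch: it assumes the 16-bit QoS Control value has already been read from the frame in little-endian order, and the function names and sample values are illustrative, not taken from the trace.

```python
# Ack Policy occupies bits 5-6 of the QoS Control field; TID occupies bits 0-3.
ACK_POLICIES = {
    0: "Normal Ack / Implicit Block Ack Request",
    1: "No Ack",
    2: "No Explicit Ack / PSMP Ack",
    3: "Block Ack",
}


def qos_ack_policy(qos_control: int) -> str:
    """Return the Ack Policy name encoded in a 16-bit QoS Control value."""
    return ACK_POLICIES[(qos_control >> 5) & 0x3]


def qos_tid(qos_control: int) -> int:
    """Return the traffic identifier (TID) from a 16-bit QoS Control value."""
    return qos_control & 0xF


if __name__ == "__main__":
    # Hypothetical value: TID 0 with Ack Policy bits 00, so an ACK is expected.
    print(qos_tid(0x0000), qos_ack_policy(0x0000))
```

A frame whose Ack Policy decodes to "Normal Ack" should be followed by an ACK from the receiver; if none arrives, a MAC retry from the transmitter would normally be expected.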
Regarding packet 249, this is the entry point for power save, and the RPi can transmit at any time after this to indicate wakeup - this is what happens in packet 253 which is the corresponding wakeup QoS NULL (directed 'To DS' to the AP). Packet 252 is a multicast packet transmitted on behalf of the RPi5 by the AP - note the 'From DS' bit is set, and the packet comes directly after the DTIM beacon (packet 251, DTIM count is zero). The directed transmit of this packet from the RPi to the AP is in packet number 244.
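To make the flag reading above easier to reproduce, here is a minimal sketch of decoding the two Frame Control bytes as they appear on the air (first byte carries type and subtype, second byte carries the flags). The class and function names are my own, and the sample byte values are constructed for illustration rather than copied from the trace.

```python
from dataclasses import dataclass


@dataclass
class FrameControl:
    ftype: int        # 0 = management, 1 = control, 2 = data
    subtype: int
    to_ds: bool
    from_ds: bool
    retry: bool
    power_mgmt: bool
    more_data: bool


def parse_frame_control(fc: bytes) -> FrameControl:
    """Decode the first two bytes of an 802.11 MAC header (wire order)."""
    b0, b1 = fc[0], fc[1]
    return FrameControl(
        ftype=(b0 >> 2) & 0x3,
        subtype=(b0 >> 4) & 0xF,
        to_ds=bool(b1 & 0x01),
        from_ds=bool(b1 & 0x02),
        retry=bool(b1 & 0x08),
        power_mgmt=bool(b1 & 0x10),
        more_data=bool(b1 & 0x20),
    )


def is_qos_null(fc: FrameControl) -> bool:
    """QoS Null is a data frame (type 2) with subtype 12."""
    return fc.ftype == 2 and fc.subtype == 12


if __name__ == "__main__":
    # Illustrative QoS Null, To DS, power management bit set (entering power save).
    enter_ps = parse_frame_control(bytes([0xC8, 0x11]))
    # Illustrative QoS Null, To DS, power management bit clear (wakeup).
    wake_up = parse_frame_control(bytes([0xC8, 0x01]))
    print(is_qos_null(enter_ps), enter_ps.power_mgmt, wake_up.power_mgmt)
```

With this decoding, a power save entry and the matching wakeup differ only in the power management bit, while a frame relayed by the AP has From DS set instead of To DS.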
On the packet sequence 330-337, the data packets are From DS - ie, the Apple transmissions coming from the backend network, via the AP WiFi interface to the RPi. The action frames are trying to establish a block ACK session from the Quantenna AP to the RPi5. None of these packets are successful (no ACK received), and we see that by packet 337, the Quantenna AP has rate-shifted down from 325Mbps to 32.5Mbps. Note that we see a QoS NULL in packet number 339, with the power save bit cleared. This is another indication that the RPi was in power save, but did not inform the AP (or both the AP and the sniffer missed the QoS NULL transmission).
In general, if an AP sees packets transmitted with no ACK received, it will assume the device has gone into power save (or off-channel) and start buffering packets for that client. After some further time the AP may probe the client, and if enough time passes with no transmissions to or from that client, will deauth the client to force a reconnect and re-synch of the state.
Let me know if I should clarify anything else or if what I write above is confusing.
In general, the deeper you look into traces the more issues you'll uncover. I've never seen the 'perfect' trace where everything just works as expected :)
Hi Richard,
Thank you very much for the deep dive into these traces. It's really nice work.
I also double-checked the key points you mentioned, and I would like to raise the following two points. (I would like to attach two images here, but don't know how :-{ )
First, "1. Given we don't see any MAC retry of the key exchange packet 2 (which would be expected if an ACK/BACK was not received)"
Actually, the second transmission of message 2 of 4, in packet 185, is the retry of the first, failed transmission in packet 180. Please refer to my attached snapshot.
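Since I cannot attach the snapshot, here is a rough sketch of the check I mean: a MAC-level retry carries the Retry bit and reuses the transmitter address and Sequence Control value of the earlier frame. The `TxFrame` record, the function names, and the sample values are all hypothetical and only illustrate the idea; they are not the values from the trace.

```python
from typing import List, NamedTuple, Tuple


class TxFrame(NamedTuple):
    number: int     # packet number in the capture
    ta: str         # transmitter address
    seq_ctrl: int   # 16-bit Sequence Control field
    retry: bool     # Retry bit from the Frame Control field


def sequence_number(seq_ctrl: int) -> int:
    """Sequence number is bits 4-15; bits 0-3 hold the fragment number."""
    return (seq_ctrl >> 4) & 0xFFF


def find_mac_retries(frames: List[TxFrame]) -> List[Tuple[int, int]]:
    """Return (earlier packet, retry packet) number pairs."""
    first_seen = {}
    pairs = []
    for f in frames:
        key = (f.ta, f.seq_ctrl)
        if f.retry and key in first_seen:
            pairs.append((first_seen[key], f.number))
        else:
            # First sighting, or a retry whose original the sniffer missed.
            first_seen[key] = f.number
    return pairs


if __name__ == "__main__":
    sta = "02:00:00:00:00:01"  # made-up address
    frames = [
        TxFrame(10, sta, 0x0150, False),
        TxFrame(15, sta, 0x0150, True),   # same sequence number, Retry set
        TxFrame(20, sta, 0x0160, False),  # new sequence number, not a retry
    ]
    print(find_mac_retries(frames))
```

A retransmission triggered by the upper layer (for example a key exchange timeout) would instead show up with a new sequence number and the Retry bit clear, so it would not be paired by this check.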
Regarding the second point, the power save mode, it may be very difficult to figure out what really happened based on these traces alone. But we are on the same page that "something is wrong".
As you mentioned, "Note that we see a QoS NULL in packet number 339, with the power save bit cleared. This is another indication that the RPi was in power save, but did not inform the AP (or both the AP and the sniffer missed the QoS NULL transmission)." The same thing happens in packet 326. But even before that, in packet 320, the Raspberry Pi was requesting to send, which means that it was awake.
My guess is that the Beacon frame in packet 318, with DTIM=0, caused the Raspberry Pi to wake up temporarily to check for its traffic, but that does not seem to have worked, since in the following packet 319 the AP was transmitting a multicast packet on behalf of the Raspberry Pi.
Putting all these facts together, I suspect something is wrong in the power save process (the AP and STA engagement). But maybe the trace alone cannot provide enough information for triage?
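One way to check this guess in the capture is to decode the TIM element of the beacon and see whether it signals buffered group traffic or unicast traffic for the station. Below is a minimal sketch, assuming the TIM element body (the bytes after the element ID and length) has already been extracted; the names and the sample bytes are my own illustration, not values from the trace.

```python
from typing import NamedTuple


class Tim(NamedTuple):
    dtim_count: int
    dtim_period: int
    group_traffic: bool    # bit 0 of Bitmap Control: buffered group-addressed frames
    first_octet: int       # index of the first octet carried in the partial bitmap
    partial_bitmap: bytes


def parse_tim(body: bytes) -> Tim:
    """Parse the body of a TIM element (element ID 5)."""
    if len(body) < 3:
        raise ValueError("TIM element body too short")
    bitmap_ctrl = body[2]
    return Tim(
        dtim_count=body[0],
        dtim_period=body[1],
        group_traffic=bool(bitmap_ctrl & 0x01),
        first_octet=bitmap_ctrl & 0xFE,  # Bitmap Offset field times two
        partial_bitmap=body[3:],
    )


def has_unicast_traffic(tim: Tim, aid: int) -> bool:
    """True if the bit for this association ID is set in the bitmap."""
    idx = aid // 8 - tim.first_octet
    if idx < 0 or idx >= len(tim.partial_bitmap):
        return False
    return bool(tim.partial_bitmap[idx] & (1 << (aid % 8)))


if __name__ == "__main__":
    # Illustrative DTIM beacon: count 0, period 3, group traffic buffered, AID 1 flagged.
    tim = parse_tim(bytes([0x00, 0x03, 0x01, 0x02]))
    print(tim.dtim_count == 0, tim.group_traffic, has_unicast_traffic(tim, 1))
```

If the group traffic bit is set in a beacon with DTIM count zero, the multicast frame right after it is expected behaviour from the AP and does not by itself tell us whether the station woke up.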
Thanks again for sharing!
All the best,
Wenzhen