Porting PicoTCP WIP
731
kernel/picotcp/RFC/rfc1106.txt
Normal file
@@ -0,0 +1,731 @@
Network Working Group                                             R. Fox
Request for Comments: 1106                                        Tandem
                                                               June 1989

                     TCP Big Window and Nak Options

Status of this Memo

This memo discusses two extensions to the TCP protocol to provide a
more efficient operation over a network with a high bandwidth*delay
product. The extensions described in this document have been
implemented and shown to work using resources at NASA. This memo
describes an Experimental Protocol; these extensions are not proposed
as an Internet standard, but as a starting point for further
research. Distribution of this memo is unlimited.

Abstract

Two extensions to the TCP protocol are described in this RFC in order
to provide a more efficient operation over a network with a high
bandwidth*delay product. The main issue that still needs to be
solved is congestion versus noise. This issue is touched on in this
memo, but further research is still needed on the applicability of
the extensions in the Internet as a whole, and not just in high
bandwidth*delay product networks. Even with this outstanding
issue, this document does describe the use of these options in the
isolated satellite network environment to facilitate more efficient
use of this special medium and to help offload bulk data transfers
from links needed for interactive use.

1. Introduction

Recent work on TCP has shown great performance gains over a variety
of network paths [1]. However, these changes still do not work well
over network paths that have a large round trip delay (satellite with
a 600 ms round trip delay) or a very large bandwidth
(transcontinental DS3 line). These two networks exhibit a higher
bandwidth*delay product, over 10**6 bits, than the 10**5 bits that
TCP is currently limited to. The bandwidth*delay product is the
amount of data that must be allowed to remain unacknowledged so that
all of the network's bandwidth is utilized by TCP. This may also be
referred to as "filling the pipe" [2]: the sender of data can
always put data onto the network, the receiver will always have
something to read, and neither end of the connection will be forced
to wait for the other end.

After the last batch of algorithm improvements to TCP, performance
over high bandwidth*delay networks is still very poor. It appears
that algorithm changes alone will not make any significant
improvement over high bandwidth*delay networks; an extension to the
protocol itself is required. This RFC discusses two possible
options to TCP for this purpose.

The two options implemented and discussed in this RFC are:

1. NAKs

   This extension allows the receiver of data to inform the sender
   that a packet of data was not received and needs to be resent.
   This option proves to be useful over any network path (both high
   and low bandwidth*delay type networks) that experiences periodic
   errors such as lost packets, noisy links, or dropped packets due
   to congestion. The information conveyed by this option is
   advisory and, if ignored, does not have any effect on TCP
   whatsoever.

2. Big Windows

   This option gives a method of expanding the current 16 bit (64
   Kbytes) TCP window to 32 bits, of which 30 bits (over 1 gigabyte)
   are allowed for the receive window. (This is the maximum window
   size allowed in TCP, due to the requirement that TCP distinguish
   old data from new data. For a good explanation please see [2].)
   No changes are required to the standard TCP header [6]. The 16 bit
   field in the TCP header that is used to convey the receive window
   will remain unchanged. The 32 bit receive window is achieved
   through the use of an option that contains the upper half of the
   window. It is this option that is necessary to fill large data
   pipes such as a satellite link.

This RFC is broken up into the following sections. Section 2
discusses the operation of the NAK option in greater detail, and
section 3 discusses the big window option in greater detail.
Section 4 discusses other effects of the big window and nak features
when used together. Included in this section is a brief discussion
of the effects of congestion versus noise on TCP and possible options
for satellite networks. Section 5 is a conclusion with some hints
as to what future development may be done at NASA, followed by an
appendix containing some test results.

2. NAK Option

Any packet loss in a high bandwidth*delay network will have a
catastrophic effect on throughput because of the simple
acknowledgement scheme of TCP. TCP always acks the stream of data
that has successfully been received and tells the sender the next
byte of data of the stream that is expected. If a packet is lost and
succeeding packets arrive, the current protocol has no way of telling
the sender that it missed one packet but received following packets.
TCP currently resends all of the data over again, after a timeout or
when the sender suspects a lost packet due to a duplicate ack
algorithm [1], until the receiver receives the lost packet and can
then ack the lost packet as well as succeeding packets received. On
a normal low bandwidth*delay network this effect is minimal if the
timeout period is set short enough. However, on a long delay network
such as a T1 satellite channel this is catastrophic because by the
time the lost packet can be sent and the ack returned, the TCP window
would have been exhausted and both the sender and receiver would be
temporarily stalled waiting for the packet and ack to fully travel
the data pipe. This causes the pipe to become empty and requires the
sender to refill the pipe after the ack is received. This will cause
a minimum bandwidth loss of 3*X, where X is the one way delay of the
medium; the loss may be much higher depending on the size of the
timeout period and the bandwidth*delay product. It is 1X for the
packet to be resent, 1X for the ack to be received and 1X for the
next packet being sent to reach the destination. This calculation
assumes that the window size is much smaller than the pipe size
(window = 1/2 data pipe or 1X), which is the typical case with the
current TCP window limitation over long delay networks such as a T1
satellite link.
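
The 3*X figure can be put into numbers. The following is only an
illustrative sketch: the 300 ms one way delay is assumed to be half
of the 600 ms satellite round trip mentioned in the introduction, the
link rate is the standard 1.544 Mbit/s T1 rate, and the function name
is hypothetical.

```python
def idle_capacity_bits(one_way_delay_ms, link_bits_per_second):
    """Link capacity left unused while the pipe sits empty for 3*X."""
    # 1X to resend the packet, 1X for its ack, 1X for the next new packet
    stall_ms = 3 * one_way_delay_ms
    return stall_ms * link_bits_per_second // 1000


# Assumed figures: X = 300 ms, T1 at 1,544,000 bit/s
lost_bits = idle_capacity_bits(300, 1_544_000)
print(lost_bits, "bits of capacity unused for one lost packet")
print(lost_bits // 8, "bytes")
```

This is the minimum case described in the text; a long timeout period
makes the stall, and so the unused capacity, larger.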

An attempt to reduce this wasted bandwidth from 3*X was introduced in
[1] by having the sender resend a packet after it notices that a
number of consecutively received acks acknowledge only data that has
already been acknowledged. On a typical network this will reduce the
lost bandwidth to almost nil, since the packet will be resent before
the TCP window is exhausted and, with the data pipe being much
smaller than the TCP window, the data pipe will not become empty and
no bandwidth will be lost. On a high delay network the reduction is
minimal, so that the lost bandwidth is still significant. On a very
noisy satellite, for instance, the lost bandwidth is very high (see
appendix for some performance figures) and performance is very poor.

There are two methods of informing the sender of lost data:
selective acknowledgements and NAKs. Selective acknowledgements have
been the object of research in a number of experimental protocols
including VMTP [3], NETBLT [4], and SatFTP [5]. The idea behind
selective acks is that the receiver tells the sender which pieces it
received so that the sender can resend the data not acked but already
sent once. NAKs, on the other hand, tell the sender that a
particular packet of data needs to be resent.

There are a couple of disadvantages of selective acks. Namely, in
some of the protocols mentioned above, the receiver waits a certain
time before sending the selective ack so that acks may be bundled up.
This delay can cause some wasted bandwidth and requires more complex
state information than the simple nak. Even if the receiver doesn't
bundle up the selective acks but sends them as it notices that
packets have been lost, more complex state information is needed to
determine which packets have been acked and which packets need to be
resent. With naks, only the immediate data needed to move the left
edge of the window is naked, thus almost completely eliminating all
state information.

The selective ack has one advantage over naks. If the link is very
noisy and packets are being lost close together, then the sender will
find out about all of the missing data at once and can send all of
the missing data out immediately in an attempt to move the left
window edge in the acknowledge number of the TCP header, thus keeping
the data pipe flowing. With naks, by contrast, the sender will be
notified of lost packets one at a time, and this will cause the
sender to process extra packets compared to selective acks. However,
empirical studies have shown that most lost packets occur far enough
apart that the advantage of selective acks over naks is rarely seen.
Also, if naks are sent out as soon as a packet has been determined
lost, then the advantage of selective acks becomes no more than
possibly a more aesthetic algorithm for handling lost data, offering
no gains over naks as described in this paper. It is for this
reason that the simplicity of naks was chosen over selective acks for
the current implementation.

2.1 Implementation details

When the receiver of data notices a gap between the expected sequence
number and the actual sequence number of the packet received, the
receiver can assume that the data between the two sequence numbers is
either going to arrive late or is lost forever. Since the receiver
cannot distinguish between the two events, a nak should be sent in
the TCP option field. Naking a packet still destined to arrive has
the effect of causing the sender to resend the packet, wasting one
packet's worth of bandwidth. Since this event is fairly rare, the
lost bandwidth is insignificant compared to that of not sending a
nak when the packet is not going to arrive. The option takes the
following form:

+========+=========+=========================+================+
+option= + length= + sequence number of      + number of      +
+   A    +    7    + first byte being naked  + segments naked +
+========+=========+=========================+================+

This option contains the first sequence number not received and a
count of how many segments need to be resent, where a segment is the
size of the current TCP MSS being used for the connection. Since a
nak is an advisory piece of information, the sending of a nak is
unreliable and no means for retransmitting a nak is provided at this
time.
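
The layout of the nak option can be sketched in a few lines. This is
an illustration under stated assumptions, not the implementation used
for the experiments: the text gives only a total length of 7, so the
4-byte sequence number and 1-byte segment count are inferred from
that length, the option kind "A" is left unassigned in the text and a
placeholder value is used here, and the function names are
hypothetical.

```python
import struct

NAK_LENGTH = 7             # total option length given in the diagram
PLACEHOLDER_KIND_A = 250   # hypothetical; the text only calls this option "A"


def encode_nak(kind, first_missing_seq, segment_count):
    """Pack kind, length, 32-bit sequence number and 8-bit segment count."""
    if not 0 <= segment_count <= 0xFF:
        raise ValueError("segment count must fit in one byte")
    return struct.pack("!BBIB", kind, NAK_LENGTH,
                       first_missing_seq & 0xFFFFFFFF, segment_count)


def decode_nak(option):
    """Unpack a 7-byte nak option into (kind, first_missing_seq, count)."""
    kind, length, seq, count = struct.unpack("!BBIB", option)
    if length != NAK_LENGTH:
        raise ValueError("unexpected nak option length")
    return kind, seq, count


option = encode_nak(PLACEHOLDER_KIND_A, 1000, 3)
print(option.hex(), decode_nak(option))
```

Network byte order is assumed because it is what the TCP header
itself uses.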

When the sender of data receives the option it may either do nothing
or resend the missing data immediately and then continue sending data
where it left off before receiving the nak. The receiver will keep
track of the last nak sent so that it will not repeat the same nak.
If it were to repeat the same nak, the protocol could get into a mode
where on every reception of data the receiver would nak the first
missing data frame. Since the data pipe may be very large, by the
time the first nak is read and responded to by the sender, many naks
would have been sent by the receiver. Since the sender does not know
that the naks are repetitious, it will resend the data each time,
thus wasting network bandwidth with useless retransmissions of the
same piece of data. Having an unreliable nak may result in a nak
being damaged and not being received by the sender; in this case, we
let TCP recover by its normal means. Empirical data has shown that
the likelihood of the nak being lost is quite small and thus this
advisory nak option works quite well.
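
The receiver behaviour described above (nak the first gap once, then
stay quiet about it) can be sketched as follows. This is a simplified
model and not the 4.3bsd code: the class and attribute names are
hypothetical, segments are identified only by sequence number and
length, and capping the count at 255 assumes the one-byte count
field inferred from the option length.

```python
SEQ_MOD = 2**32  # TCP sequence numbers wrap at 32 bits


class NakReceiver:
    def __init__(self, next_expected, mss):
        self.next_expected = next_expected  # left edge of the receive window
        self.mss = mss
        self.out_of_order = {}              # seq -> length of early segments
        self.last_nak_seq = None            # first byte of the last nak sent

    def on_segment(self, seq, length):
        """Return (first_missing_seq, segments) if a nak is due, else None."""
        gap = (seq - self.next_expected) % SEQ_MOD
        if gap == 0:
            self.next_expected = (seq + length) % SEQ_MOD
            # absorb early data that has now become contiguous
            while self.next_expected in self.out_of_order:
                held = self.out_of_order.pop(self.next_expected)
                self.next_expected = (self.next_expected + held) % SEQ_MOD
            return None
        if gap >= SEQ_MOD // 2:
            return None  # old data left of the window; nothing is missing
        self.out_of_order[seq] = length
        if self.last_nak_seq == self.next_expected:
            return None  # this hole was already naked; do not repeat it
        self.last_nak_seq = self.next_expected
        segments = min(255, -(-gap // self.mss))  # ceiling division
        return self.next_expected, segments


receiver = NakReceiver(next_expected=1000, mss=500)
for seq in (1000, 2000, 2500, 1500):
    print(seq, receiver.on_segment(seq, 500))
```

Suppressing on the first missing sequence number alone keeps the
state down to a single value, in line with the argument that naks
need almost no state information.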

3. Big Window Option

Currently TCP has a 16 bit window limitation built into the protocol.
This limits the amount of outstanding unacknowledged data to 64
Kbytes. We have already seen that some networks have a pipe larger
than 64 Kbytes. A T1 satellite channel and a cross country DS3
network with a 30ms delay have data pipes much larger than 64 Kbytes.
Thus, even on a perfectly conditioned link with no bandwidth wasted
due to errors, the data pipe will not be filled and bandwidth will be
wasted. What is needed is the ability to send more unacknowledged
data. This is achieved by having bigger windows, bigger than the
current limitation of 16 bits. This option expands the window
size to 30 bits, or over 1 gigabyte, by literally expanding the
window size mechanism currently used by TCP. The added option
contains the upper 15 bits of the window while the lower 16 bits will
continue to go where they normally go [6] in the TCP header.
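
The pipe sizes mentioned above can be checked with a little
arithmetic. This is a sketch under assumptions that the text does not
state: the T1 rate is 1.544 Mbit/s with the 600 ms round trip from
the introduction, the DS3 rate is taken as the standard 44.736
Mbit/s, the 30ms DS3 delay is treated as a round trip figure, and
the function names are hypothetical.

```python
MAX_16BIT_WINDOW = 2**16 - 1  # bytes, the largest window the header can carry


def pipe_bytes(link_bits_per_second, round_trip_ms):
    """Bytes that must be in flight to keep the link busy for one round trip."""
    return link_bits_per_second * round_trip_ms // 8000


def window_limited_rate(window_bytes, round_trip_ms):
    """Best case bit/s when at most one window is sent per round trip."""
    return window_bytes * 8 * 1000 // round_trip_ms


links = {
    "T1 satellite": (1_544_000, 600),
    "cross country DS3": (44_736_000, 30),
}
for name, (rate, rtt_ms) in links.items():
    pipe = pipe_bytes(rate, rtt_ms)
    best = window_limited_rate(MAX_16BIT_WINDOW, rtt_ms)
    print(name, "pipe:", pipe, "bytes;",
          "16 bit window limit:", best, "of", rate, "bit/s")
```

Under these assumptions both pipes exceed 65,535 bytes, so a 16 bit
window leaves part of each link idle even without errors.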

A TCP session will use the big window options only if both sides
agree to use them; otherwise the option is not used and the normal 16
bit windows will be used. Once the two sides agree to use the big
windows, every packet thereafter will be expected to contain the
window option with the current upper 15 bits of the window. The
negotiation to decide whether or not to use the bigger windows takes
place during the SYN and SYN ACK segments of the TCP connection
startup process. The originator of the connection will include in
the SYN segment the following option:

  1 byte    1 byte      4 bytes
+=========+==========+===============+
+option=B + length=6 + 30 bit window +
+=========+==========+===============+

If the other end of the connection wants to use big windows it will
include the same option back in the SYN ACK segment that it must
send. At this point, both sides have agreed to use big windows and
the specified windows will be used. It should be noted that the SYN
and SYN ACK segments will use the small windows, and once the big
window option has been negotiated then the bigger windows will be
used.

Once both sides have agreed to use 32 bit windows the protocol will
function just as it did before with no difference in operation, even
in the event of lost packets. This claim holds true since the
rcv_wnd and snd_wnd variables of TCP contain the 16 bit windows until
the big window option is negotiated, and then they are replaced with
the appropriate 32 bit values. Thus, the use of big windows becomes
part of the state information kept by TCP.

Other methods of expanding the windows have been presented, including
a window multiple [2] or streaming [5], but this solution is more
elegant in the sense that it is a true extension of the window that
one day may easily become part of the protocol and not just be an
option to the protocol.

3.1 How does it work

Once a connection has decided to use big windows every succeeding
packet must contain the following option:

+=========+==========+==========================+
+option=C + length=4 + upper 15 bits of rcv_wnd +
+=========+==========+==========================+

With all segments sent, the sender supplies the size of its receive
window. If the connection is only using 16 bits then this option is
not supplied; otherwise the lower 16 bits of the receive window go
into the TCP header where they currently reside [6] and the upper 15
bits of the window are put into the data portion of option C.
When the receiver processes the packet it must first reassemble the
window and then process the packet as it would in the absence of the
option.
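
Splitting and rebuilding the window can be sketched as below. This is
an illustration only: it follows the 15 + 16 bit split given for
option C and does not separately enforce the 30 bit limit stated
elsewhere in the text, the option kind "C" is unassigned in the text
so a placeholder value is used, and the function names are
hypothetical.

```python
import struct

PLACEHOLDER_KIND_C = 252  # hypothetical; the text only calls this option "C"


def split_window(rcv_wnd):
    """Split a receive window into (upper 15 bits, lower 16 bits)."""
    if not 0 <= rcv_wnd < 2**31:
        raise ValueError("window does not fit in 15 + 16 bits")
    return rcv_wnd >> 16, rcv_wnd & 0xFFFF


def encode_option_c(kind, upper_bits):
    """Option C as drawn above: kind, length=4, then two bytes of data."""
    return struct.pack("!BBH", kind, 4, upper_bits)


def reform_window(header_window, upper_bits):
    """Receiver side: rebuild the full window before normal processing."""
    return ((upper_bits & 0x7FFF) << 16) | (header_window & 0xFFFF)


upper, lower = split_window(1_000_000)
option = encode_option_c(PLACEHOLDER_KIND_C, upper)
print(upper, lower, option.hex(), reform_window(lower, upper))
```

The lower half travels in the ordinary TCP header window field, so a
peer that ignores the option still sees a valid 16 bit window.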

3.2 Impact of changes

In implementing the first version of the big window option there was
very little change required to the source. State information must be
added to the protocol to determine if the big window option is to be
used, and all 16 bit variables that dealt with window information
must now become 32 bit quantities. A future document will describe
in more detail the changes required to the 4.3 bsd tcp source code.
Test results of the window change only are presented in the appendix.
Expanding 16 bit quantities to 32 bit quantities in the TCP control
block in the source (4.3 bsd source) may cause the structure to
become larger than the mbuf used to hold the structure. Care must
be taken to ensure this doesn't occur with your system or
undetermined events may take place.

4. Effects of Big Windows and Naks when used together

With big windows alone, transfer times over a satellite were quite
impressive in the absence of any introduced errors. However, when
an error simulator was used to create random errors during transfers,
performance went down extremely fast. When the nak option was added
to the big window option, performance in the face of errors went up
somewhat but not to the level that was expected. This section
discusses some issues that were overcome to produce the results given
in the appendix.

4.1 Window Size and Nak benefits

Without errors, the window size required to keep the data pipe full
is equal to the round trip delay * throughput desired, or the data
pipe bandwidth (called Z from now on). This and other calculations
assume that processing time of the hosts is negligible. In the event
of an error (without NAKs), the window size needs to become larger
than Z in order to keep the data pipe full while the sender is
waiting for the ack of the resent packet. If the window size is
equal to Z and we assume that the retransmission timer is equal
to Z, then when a packet is lost, the retransmission timer will go
off as the last piece of data in the window is sent. In this case,
the lost piece of data can be resent with no delay. The data pipe
will empty out because it will take 1/2Z worth of data to get the ack
back to the sender, and an additional 1/2Z worth of data to get the
data pipe refilled with new data. This causes the required window to
be 2Z: 1Z to keep the data pipe full during normal operations and 1Z
to keep the data pipe full while waiting for a lost packet to be
resent and acked.
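
As a worked example of the Z and 2Z figures, the sketch below
computes both for one link. The numbers are assumptions chosen for
illustration (a 1.544 Mbit/s T1 link with a 600 ms round trip), Z is
expressed in bytes, and the function name is hypothetical.

```python
def required_window_bytes(link_bits_per_second, round_trip_ms,
                          tolerate_one_loss):
    """Z keeps the pipe full without errors; 2Z also covers one lost packet."""
    z = link_bits_per_second * round_trip_ms // 8000
    return 2 * z if tolerate_one_loss else z


no_loss = required_window_bytes(1_544_000, 600, tolerate_one_loss=False)
one_loss = required_window_bytes(1_544_000, 600, tolerate_one_loss=True)
print("Z  =", no_loss, "bytes, needs", no_loss.bit_length(), "bits")
print("2Z =", one_loss, "bytes, needs", one_loss.bit_length(), "bits")
```

With these figures both values need more than 16 bits, which is why
the window calculation only becomes usable together with the big
window option.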

If the same scenario as in the last paragraph is used with the
addition of NAKs, the required window size still needs to be 2Z to
avoid wasting any bandwidth in the event of a dropped packet. This
appears to mean that the nak option does not provide any benefits at
all. However, testing showed that the retransmission timer was larger
than the data pipe and, in the event of errors, became much bigger
than the data pipe because of the retransmission backoff. Thus, the
nak option bounds the required window to 2Z such that in the event of
an error there is no lost bandwidth, even with the retransmission
timer fluctuations. The results in the appendix show that by using
naks, bandwidth waste associated with the retransmission timer
facility is eliminated.

4.2 Congestion vs Noise

An issue that must be looked at when implementing both the NAKs and
big window scheme together is in the area of congestion versus lost
packets due to the medium, or noise. In the recent algorithm
enhancements [1], slow start was introduced so that whenever a data
transfer is being started on a connection or right after a dropped
packet, the effective send window would be set to a very small size
(typically equal to the MSS being used). This is done so that a
new connection would not cause congestion by immediately overloading
the network, and so that an existing connection would back off the
network if a packet was dropped due to congestion and allow the
network to clear up. If a connection using big windows loses a
packet due to the medium (a packet corrupted by an error) the last
thing that should be done is to close the send window so that the
connection can only send 1 packet and must use the slow start
algorithm to slowly work itself back up to sending full windows'
worth of data. This algorithm would quickly limit the usefulness of
the big window and nak options over lossy links.

On the other hand, if a packet was dropped due to congestion and the
sender assumes the packet was dropped because of noise, the sender
will continue sending large amounts of data. This action will cause
the congestion to continue, more packets will be dropped, and that
part of the network will collapse. In this instance, the sender
would want to back off from sending at the current window limit.
Using the current slow start mechanism over a satellite builds up the
window too slowly [1]. Possibly a better solution would be for the
window to be opened 2*Rlog2(W) instead of R*log2(W) [1] (open window
by 2 packets instead of 1 for each acked packet). This will reduce
the wasted bandwidth by opening the window much quicker while giving
the network a chance to clear up. More experimentation is necessary
to find the optimal rate of opening the window, especially when large
windows are being used.
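
The effect of opening the window by 2 packets instead of 1 for each
acked packet can be explored with a small model. This is a rough
sketch under simplifying assumptions that are not in the text: every
packet is acked individually, nothing is lost, the window is counted
in packets starting from 1, and the function name is hypothetical.
It counts round trips only and does not try to reproduce the
expressions quoted from [1].

```python
def round_trips_to_open(target_packets, opened_per_ack):
    """Count round trips until the send window reaches target_packets."""
    window = 1   # slow start begins at one packet (one MSS)
    trips = 0
    while window < target_packets:
        # each packet in the current window is acked once per round trip,
        # and every ack opens the window by opened_per_ack packets
        window += opened_per_ack * window
        trips += 1
    return trips


ROUND_TRIP_MS = 600  # satellite round trip assumed from the introduction
for per_ack in (1, 2):
    trips = round_trips_to_open(64, per_ack)
    print(per_ack, "per ack:", trips, "round trips,",
          trips * ROUND_TRIP_MS, "ms")
```

In this model the window doubles each round trip when opened by 1 per
ack and triples when opened by 2, so the larger increment reaches a
given window in fewer round trips.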

The current recommendation for TCP is to use the slow start mechanism
in the event of any lost packet. If an application knows that it
will be using a satellite with a high error rate, it doesn't make
sense to force it to use the slow start mechanism for every dropped
packet. Instead, the application should be able to choose what
action should happen in the event of a lost packet. In the BSD
environment, a setsockopt call should be provided so that the
application may inform TCP to handle lost packets in a special way
for this particular connection. If the error rate of a link is
known to be small, then using slow start with the modified rate from
above will cause the amount of bandwidth lost to be very small with
respect to the amount of bandwidth actually utilized. In this case,
the setsockopt call should not be used. What is really needed is a
way for a host to determine if a packet or packets are being dropped
due to congestion or noise. Then, the host can choose to do the
right thing. This will require a mechanism like source quench to be
used. For this to happen more experimentation is necessary to
determine a solid definition of the use of this mechanism. At
present it is believed by some that using source quench to avoid
congestion only adds to the problem rather than helping to suppress
it.

The TCP used to gather the results in the appendix for the big window
with nak experiment assumed that lost packets were the result of
noise and not congestion. This assumption was used to show how to
make the current TCP work in such an environment. The actual
satellite used in the experiment (when the satellite simulator was
not used) only experienced an error rate around 10e-10. With this
error rate it is suggested that in practice when big windows are used
over the link, TCP should use the slow start mechanism for all lost
packets with the 2*Rlog2(W) rate discussed above. Under most
situations when long delay networks are being used (transcontinental
DS3 networks using fiber with very low error rates, or satellite
links with low error rates) big windows and naks should be used with
the assumption that lost packets are the result of congestion until a
better algorithm is devised [7].

Another problem noticed while testing the effects of slow start over
a satellite link was that, at times, the retransmission timer was set
so restrictively that milliseconds before a naked packet's ack was
received, the retransmission timer would go off due to a timed packet
within the send window. The timer was set at the round trip delay of
the network, allowing no time for packet processing. If this timer
went off due to congestion then backing off is the right thing to do;
otherwise, to avoid the scenario discovered by experimentation, the
retransmission timer should be set a little longer so that it does
not go off too early. Care must be taken to make sure the right
thing is done in the implementation in question so that a packet
isn't retransmitted too soon and blamed on congestion when, in fact,
the ack is on its way.

4.3 Duplicate Acks

Another problem found with the 4.3bsd implementation is in the area
of duplicate acks. When the sender of data receives a certain number
of acks (3 in the current Berkeley release) that acknowledge
previously acked data, it assumes that a packet has been lost. It
will resend the one packet assumed lost and close its send window as
if the network is congested, and the slow start algorithm mentioned
above will be used to open the send window. This facility is
no longer needed since the sender can use the reception of a nak as
its indicator that a particular packet was dropped. If the nak
packet is lost then the retransmit timer will go off and the packet
will be retransmitted by normal means. If a sender's algorithm
continues to count duplicate acks, then because of the large size of
the data pipe the sender may find itself receiving many duplicate
acks after it has already resent the packet in response to a nak. By
receiving all of these duplicate acks the sender may find itself
doing nothing but resending the same packet of data unnecessarily
while keeping the send window closed for absolutely no reason. By
removing this feature of the implementation a user can expect to find
a satellite connection working much better in the face of errors, and
other connections should not see any performance loss, but a slight
improvement in performance if anything at all.

5. Conclusion

This paper has described two new options that, if used, will make TCP
a more efficient protocol in the face of errors and a more efficient
protocol over networks that have a high bandwidth*delay product,
without decreasing performance over more common networks. If a
system that implements the options talks with one that does not, the
two systems should still be able to communicate with no problems.
This assumes that the system doesn't use the option numbers defined
in this paper in some other way or doesn't panic when faced with an
option that the machine does not implement. Currently at NASA, there
are many machines that do not implement either option and communicate
just fine with the systems that do implement them.

The drive for implementing big windows has been the direct result of
trying to make TCP more efficient over large delay networks [2,3,4,5]
such as a T1 satellite. However, another practical use of large
windows is becoming more apparent as the local area networks being
developed are becoming faster and supporting much larger MTUs.
Hyperchannel, for instance, has been stated to be able to support 1
megabit MTUs in their new line of products. With the current
implementation of TCP, hyperchannel is not used as efficiently as it
should be because the physical medium's MTU is larger than the
maximum window of the protocol being used. By increasing the TCP
window size, better utilization of networks like hyperchannel will be
gained instantly because the sender can send 64 Kbyte packets (IP
limitation) but not have to operate in a stop and wait fashion.
Future work is being started to increase the IP maximum datagram size
so that even better utilization of fast local area networks will be
seen by having the TCP/IP protocols able to send large packets
over mediums with very large MTUs. This will hopefully eliminate
the network protocol as the bottleneck in data transfers while
workstations and workstation file system technology advance even
further than they already have.

An area of concern when using the big window mechanism is the use of
machine resources. When running over a satellite and a packet is
dropped such that 2Z (where Z is the round trip delay) worth of data
is unacknowledged, both ends of the connection need to be able to
buffer the data using machine mbufs (or whatever mechanism the
machine uses), usually a valuable and scarce commodity. If the
window size is not chosen properly, some machines will crash when the
memory is all used up, or it will keep other parts of the system from
running. Thus, setting the window to some fairly large arbitrary
number is not a good idea, especially on a general purpose machine
where many users log on at any time. What is currently being
engineered at NASA is the ability for certain programs to use the
setsockopt feature of 4.3bsd to ask for big windows, such that the
average user may not have access to the large windows, thus limiting
the use of big windows to applications that absolutely need them and
protecting a valuable system resource.
|
||||
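
A program-level request of this kind might look roughly like the
sketch below.  It is not the NASA implementation: it uses the
standard SO_SNDBUF and SO_RCVBUF socket options as a stand-in for
whatever option would enable big windows, and the function names and
the per-program limit are hypothetical.

```python
import socket


def capped_size(wanted_bytes, limit_bytes):
    """Clamp a requested buffer size to a per-program limit (hypothetical policy)."""
    if wanted_bytes <= 0 or limit_bytes <= 0:
        raise ValueError("sizes must be positive")
    return min(wanted_bytes, limit_bytes)


def request_buffers(sock, wanted_bytes, limit_bytes):
    """Ask for send and receive buffers, never exceeding limit_bytes.

    Returns the (send, receive) sizes the operating system reports
    afterwards. These may differ from the request, because the system
    is free to round or clamp the value.
    """
    size = capped_size(wanted_bytes, limit_bytes)
    # SO_SNDBUF / SO_RCVBUF are standard socket-level options; they are
    # assumed here only as an illustration of a setsockopt request.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, size)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, size)
    return (
        sock.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF),
        sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF),
    )
```

Reading the sizes back with getsockopt matters because the request
is only advisory: a system protecting its buffer pool may grant less
than was asked for.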

6.  References

[1] Jacobson, V., "Congestion Avoidance and Control", SIGCOMM 88,
    Stanford, Ca., August 1988.

[2] Jacobson, V., and R. Braden, "TCP Extensions for Long-Delay
    Paths", LBL, USC/Information Sciences Institute, RFC 1072,
    October 1988.

[3] Cheriton, D., "VMTP: Versatile Message Transaction Protocol", RFC
    1045, Stanford University, February 1988.

[4] Clark, D., M. Lambert, and L. Zhang, "NETBLT: A Bulk Data
    Transfer Protocol", RFC 998, MIT, March 1987.

[5] Fox, R., "Draft of Proposed Solution for High Delay Circuit File
    Transfer", GE/NAS Internal Document, March 1988.

[6] Postel, J., "Transmission Control Protocol - DARPA Internet
    Program Protocol Specification", RFC 793, DARPA, September 1981.

[7] Leiner, B., "Critical Issues in High Bandwidth Networking", RFC
    1077, DARPA, November 1988.

7.  Appendix

Both options have been implemented and tested.  This section contains
performance data gathered to support the use of these two options.
The satellite channel used was a 1.544 Mbit/s link with a 580 ms
round trip delay.  All values are given in units of bytes.
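
For orientation, the bandwidth*delay product of this channel and the
throughput ceiling imposed by a given window can be worked out as in
the sketch below.  It assumes that K means 1024 bytes and ignores
protocol header overhead; neither assumption is stated in this memo,
and the function names are hypothetical.

```python
LINK_BITS_PER_SECOND = 1_544_000   # 1.544 Mbit link from the text
ROUND_TRIP_SECONDS = 0.580         # 580 ms round trip delay from the text
K = 1024                           # assumption: "K" means 1024 bytes


def bandwidth_delay_product(bits_per_second, rtt_seconds):
    """Bytes that must be in flight to keep the link busy for one round trip."""
    return bits_per_second / 8 * rtt_seconds


def window_limited_rate(window_bytes, rtt_seconds, bits_per_second):
    """Upper bound on throughput in bytes per second for a given window.

    At most one window can be delivered per round trip, and the rate
    can never exceed the raw link rate.
    """
    return min(window_bytes / rtt_seconds, bits_per_second / 8)


bdp = bandwidth_delay_product(LINK_BITS_PER_SECOND, ROUND_TRIP_SECONDS)
ceiling_64k = window_limited_rate(64 * K, ROUND_TRIP_SECONDS, LINK_BITS_PER_SECOND)
ceiling_156k = window_limited_rate(156 * K, ROUND_TRIP_SECONDS, LINK_BITS_PER_SECOND)
```

Under these assumptions the bandwidth*delay product is roughly
112,000 bytes, so a 64K window cannot fill the channel, while a 156K
window is limited by the link rate of 193,000 bytes per second rather
than by the window.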

                 TCP with Big Windows, No Naks:

            |---------------transfer rates----------------------|
Window Size |  no error  |  10e-7 error rate | 10e-6 error rate |
-----------------------------------------------------------------
    64K     |     94K    |        53K        |       14K        |
-----------------------------------------------------------------
    72K     |    106K    |        51K        |       15K        |
-----------------------------------------------------------------
    80K     |    115K    |        42K        |       14K        |
-----------------------------------------------------------------
    92K     |    115K    |        43K        |       14K        |
-----------------------------------------------------------------
   100K     |    135K    |        66K        |       15K        |
-----------------------------------------------------------------
   112K     |    126K    |        53K        |       17K        |
-----------------------------------------------------------------
   124K     |    154K    |        45K        |       14K        |
-----------------------------------------------------------------
   136K     |    160K    |        66K        |       15K        |
-----------------------------------------------------------------
   156K     |    167K    |        45K        |       14K        |
-----------------------------------------------------------------
                            Figure 1.

                 TCP with Big Windows, and Naks:

            |---------------transfer rates----------------------|
Window Size |  no error  |  10e-7 error rate | 10e-6 error rate |
-----------------------------------------------------------------
    64K     |     95K    |        83K        |       43K        |
-----------------------------------------------------------------
    72K     |    104K    |        87K        |       49K        |
-----------------------------------------------------------------
    80K     |    117K    |        96K        |       62K        |
-----------------------------------------------------------------
    92K     |    124K    |       119K        |       39K        |
-----------------------------------------------------------------
   100K     |    140K    |       124K        |       35K        |
-----------------------------------------------------------------
   112K     |    151K    |       126K        |       53K        |
-----------------------------------------------------------------
   124K     |    160K    |       140K        |       36K        |
-----------------------------------------------------------------
   136K     |    167K    |       148K        |       38K        |
-----------------------------------------------------------------
   156K     |    167K    |       160K        |       38K        |
-----------------------------------------------------------------
                            Figure 2.

With a 10e-6 error rate, many Naks as well as data packets were
dropped, causing the wide swings in transfer rates.  Also, please
note that the machines used were SGI Iris 2500 Turbos running the 3.6
OS with the new TCP enhancements.  The Irises are slower than a Sun
3/260, but the Iris was used because of some source code
restrictions.  Initial results on the Sun showed slightly higher
performance and less variance.
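
One way to read the two figures together is to divide each rate
measured with Naks by the corresponding rate measured without them.
The sketch below does this using the values copied from Figures 1 and
2; the ratio itself is an illustrative summary, not a metric defined
in this memo, and the function names are hypothetical.

```python
# Transfer rates copied from Figures 1 and 2, in units of K as printed,
# listed in the order of the window sizes below.
WINDOWS = [64, 72, 80, 92, 100, 112, 124, 136, 156]

NO_NAK = {
    "no error": [94, 106, 115, 115, 135, 126, 154, 160, 167],
    "10e-7": [53, 51, 42, 43, 66, 53, 45, 66, 45],
    "10e-6": [14, 15, 14, 14, 15, 17, 14, 15, 14],
}

WITH_NAK = {
    "no error": [95, 104, 117, 124, 140, 151, 160, 167, 167],
    "10e-7": [83, 87, 96, 119, 124, 126, 140, 148, 160],
    "10e-6": [43, 49, 62, 39, 35, 53, 36, 38, 38],
}


def nak_gain(column):
    """Per-window ratio of the rate with Naks to the rate without Naks."""
    return [w / n for w, n in zip(WITH_NAK[column], NO_NAK[column])]


def summarize(column):
    """Return (min, mean, max) of the gain ratios for one error-rate column."""
    gains = nak_gain(column)
    return min(gains), sum(gains) / len(gains), max(gains)


for column in NO_NAK:
    low, mean, high = summarize(column)
    print(f"{column:>8}: min {low:.2f}x  mean {mean:.2f}x  max {high:.2f}x")
```

For these measurements the ratio ranges from roughly 1.6 to 3.6 at
the 10e-7 error rate and from roughly 2.3 to 4.4 at the 10e-6 error
rate, while in the no error case the two sets of rates are close.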

Author's Address

Richard Fox
950 Linden #208
Sunnyvale, Cal, 94086

EMail: rfox@tandem.com