Network Working Group                                             R. Fox
Request for Comments: 1106                                         Tandem
                                                                June 1989


                     TCP Big Window and Nak Options

Status of this Memo

This memo discusses two extensions to the TCP protocol to provide
more efficient operation over a network with a high bandwidth*delay
product.  The extensions described in this document have been
implemented and shown to work using resources at NASA.  This memo
describes an Experimental Protocol; these extensions are not proposed
as an Internet standard, but as a starting point for further
research.  Distribution of this memo is unlimited.

Abstract

Two extensions to the TCP protocol are described in this RFC in order
to provide more efficient operation over a network with a high
bandwidth*delay product.  The main issue that still needs to be
solved is congestion versus noise.  This issue is touched on in this
memo, but further research is still needed on the applicability of
the extensions in the Internet as a whole infrastructure, and not
just in high bandwidth*delay product networks.  Even with this
outstanding issue, this document describes the use of these options
in an isolated satellite network environment to facilitate more
efficient use of this special medium and to offload bulk data
transfers from links needed for interactive use.

1. Introduction

Recent work on TCP has shown great performance gains over a variety
of network paths [1].  However, these changes still do not work well
over network paths that have a large round trip delay (a satellite
with a 600 ms round trip delay) or a very large bandwidth (a
transcontinental DS3 line).  These two networks exhibit a higher
bandwidth*delay product, over 10**6 bits, than the 10**5 bits that
TCP is currently limited to.  This high bandwidth*delay product
refers to the amount of data that may be unacknowledged so that all
of the network's bandwidth is being utilized by TCP.  This may also
be referred to as "filling the pipe" [2]: the sender of data can
always put data onto the network, the receiver will always have
something to read, and neither end of the connection will be forced
to wait for the other end.

After the last batch of algorithm improvements to TCP, performance
over high bandwidth*delay networks is still very poor.  It appears
that algorithm changes alone will not bring any significant
improvement over high bandwidth*delay networks; an extension to the
protocol itself is required.  This RFC discusses two possible options
to TCP for this purpose.

The two options implemented and discussed in this RFC are:

1. NAKs

   This extension allows the receiver of data to inform the sender
   that a packet of data was not received and needs to be resent.
   This option proves to be useful over any network path (both high
   and low bandwidth*delay type networks) that experiences periodic
   errors such as lost packets, noisy links, or dropped packets due
   to congestion.  The information conveyed by this option is
   advisory and, if ignored, does not have any effect on TCP
   whatsoever.

2. Big Windows

   This option provides a method of expanding the current 16 bit (64
   Kbyte) TCP window to 32 bits, of which 30 bits (over 1 gigabyte)
   are allowed for the receive window.  (This is the maximum window
   size allowed in TCP, due to the requirement that TCP be able to
   distinguish old data from new data; for a good explanation please
   see [2].)  No changes are required to the standard TCP header [6].
   The 16 bit field in the TCP header that is used to convey the
   receive window remains unchanged.  The 32 bit receive window is
   achieved through the use of an option that contains the upper half
   of the window.  It is this option that is necessary to fill large
   data pipes such as a satellite link.

The rest of this RFC is organized as follows: section 2 discusses the
operation of the NAK option in greater detail, and section 3
discusses the big window option in greater detail.  Section 4
discusses other effects of the big window and nak features when used
together, including a brief discussion of the effects of congestion
versus noise on TCP and possible options for satellite networks.
Section 5 concludes with some hints as to what future development may
be done at NASA, and an appendix containing some test results
follows.

2. NAK Option

Any packet loss in a high bandwidth*delay network has a catastrophic
effect on throughput because of the simple acknowledgement scheme of
TCP.  TCP always acks the stream of data that has successfully been
received and tells the sender the next byte of the stream that is
expected.  If a packet is lost and succeeding packets arrive, the
current protocol has no way of telling the sender that it missed one
packet but received the following ones.  TCP currently resends all of
the data over again, after a timeout or after the sender suspects a
lost packet due to the duplicate ack algorithm [1], until the
receiver receives the lost packet and can then ack it as well as the
succeeding packets received.  On a normal low bandwidth*delay network
this effect is minimal if the timeout period is set short enough.
However, on a long delay network such as a T1 satellite channel it is
catastrophic: by the time the lost packet can be sent and the ack
returned, the TCP window will have been exhausted and both the sender
and receiver will be temporarily stalled waiting for the packet and
ack to fully travel the data pipe.  This causes the pipe to become
empty and requires the sender to refill it after the ack is received.
This causes a minimum of 3*X bandwidth loss, where X is the one way
delay of the medium, and may be much higher depending on the size of
the timeout period and the bandwidth*delay product.  It is 1X for the
packet to be resent, 1X for the ack to be received, and 1X for the
next packet sent to reach the destination.  This calculation assumes
that the window size is much smaller than the pipe size (window = 1/2
data pipe, or 1X), which is the typical case with the current TCP
window limitation over long delay networks such as a T1 satellite
link.

An attempt to reduce this wasted bandwidth from 3*X was introduced in
[1] by having the sender resend a packet after it notices that a
number of consecutively received acks completely acknowledge already
acknowledged data.  On a typical network this reduces the lost
bandwidth to almost nil, since the packet is resent before the TCP
window is exhausted; with the data pipe being much smaller than the
TCP window, the data pipe does not become empty and no bandwidth is
lost.  On a high delay network the reduction of lost bandwidth is
minimal, so the lost bandwidth is still significant.  On a very noisy
satellite, for instance, the lost bandwidth is very high (see the
appendix for some performance figures) and performance is very poor.

There are two methods of informing the sender of lost data: selective
acknowledgements and NAKs.  Selective acknowledgements have been the
object of research in a number of experimental protocols, including
VMTP [3], NETBLT [4], and SatFTP [5].  The idea behind selective acks
is that the receiver tells the sender which pieces it received, so
that the sender can resend the data not acked but already sent once.
NAKs, on the other hand, tell the sender that a particular packet of
data needs to be resent.

There are a couple of disadvantages to selective acks.  Namely, in
some of the protocols mentioned above, the receiver waits a certain
time before sending the selective ack so that acks may be bundled up.
This delay can cause some wasted bandwidth and requires more complex
state information than the simple nak.  Even if the receiver does not
bundle up the selective acks but sends them as it notices that
packets have been lost, more complex state information is needed to
determine which packets have been acked and which packets need to be
resent.  With naks, only the immediate data needed to move the left
edge of the window is naked, thus almost completely eliminating all
state information.

The selective ack has one advantage over naks.  If the link is very
noisy and packets are being lost close together, then the sender will
find out about all of the missing data at once and can send all of
the missing data out immediately in an attempt to move the left
window edge in the acknowledgement number of the TCP header, thus
keeping the data pipe flowing.  With naks, the sender is notified of
lost packets one at a time, which causes the sender to process extra
packets compared to selective acks.  However, empirical studies have
shown that most lost packets occur far enough apart that the
advantage of selective acks over naks is rarely seen.  Also, if naks
are sent out as soon as a packet has been determined lost, then the
advantage of selective acks becomes no more than possibly a more
aesthetic algorithm for handling lost data, and offers no gains over
naks as described in this paper.  It is for this reason that the
simplicity of naks was chosen over selective acks for the current
implementation.

2.1 Implementation details

When the receiver of data notices a gap between the expected sequence
number and the actual sequence number of the packet received, the
receiver can assume that the data between the two sequence numbers is
either going to arrive late or is lost forever.  Since the receiver
cannot distinguish between the two events, a nak should be sent in
the TCP option field.  Naking a packet still destined to arrive has
the effect of causing the sender to resend the packet, wasting one
packet's worth of bandwidth.  Since this event is fairly rare, the
lost bandwidth is insignificant compared to that of not sending a nak
when the packet is not going to arrive.  The option takes the
following form:

   +========+=========+=========================+================+
   +option= + length= + sequence number of      + number of      +
   +   A    +    7    + first byte being naked  + segments naked +
   +========+=========+=========================+================+

This option contains the first sequence number not received and a
count of how many segments need to be resent, where a segment is the
size of the current TCP MSS being used for the connection.  Since a
nak is an advisory piece of information, the sending of a nak is
unreliable and no means for retransmitting a nak is provided at this
time.
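
As an illustration (not part of the option definition above), the
following sketch shows how a receiver might encode this option.  The
byte positions follow the figure; the symbolic option kind and the
helper name are assumptions of the sketch.

    #include <stdint.h>
    #include <string.h>
    #include <arpa/inet.h>

    #define TCPOPT_NAK 'A'            /* option kind used in this memo */

    /* Build a nak option: kind, length = 7, the 32 bit sequence number
     * of the first byte being naked (network byte order), and a one
     * byte count of MSS-sized segments being naked. */
    static size_t build_nak_option(uint8_t *opt, uint32_t first_missing_seq,
                                   uint8_t segments_naked)
    {
        uint32_t seq_net = htonl(first_missing_seq);

        opt[0] = TCPOPT_NAK;              /* option = A                 */
        opt[1] = 7;                       /* length = 7                 */
        memcpy(&opt[2], &seq_net, 4);     /* first byte being naked     */
        opt[6] = segments_naked;          /* number of segments naked   */
        return 7;
    }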

When the sender of data receives the option it may either do nothing
or resend the missing data immediately and then continue sending data
where it left off before receiving the nak.  The receiver keeps track
of the last nak sent so that it will not repeat the same nak.  If it
were to repeat the same nak, the protocol could get into a mode where
on every reception of data the receiver naks the first missing data
frame.  Since the data pipe may be very large, by the time the first
nak is read and responded to by the sender, many naks would have been
sent by the receiver.  Since the sender does not know that the naks
are repetitious it would resend the data each time, wasting network
bandwidth with useless retransmissions of the same piece of data.
Because the nak is unreliable, a nak may be damaged and never
received by the sender; in this case, TCP is left to recover by its
normal means.  Empirical data has shown that the likelihood of a nak
being lost is quite small, and thus this advisory nak option works
quite well.
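
A minimal sketch of the receiver-side rule described above (remember
the last gap naked and do not nak it again) is shown below.  The
structure and variable names are illustrative assumptions, not taken
from any particular TCP source.

    #include <stdint.h>

    struct nak_state {
        uint32_t rcv_nxt;         /* next sequence number expected      */
        uint32_t last_nak_seq;    /* start of the last gap we naked     */
        int      nak_outstanding; /* nonzero while that gap is unfilled */
    };

    /* Called when a segment arrives with seg_seq > rcv_nxt (a gap).
     * Returns the number of MSS-sized segments to nak, or 0 if this
     * gap has already been naked once.  nak_outstanding is cleared
     * elsewhere when the missing data finally arrives. */
    static unsigned segments_to_nak(struct nak_state *s, uint32_t seg_seq,
                                    uint32_t mss)
    {
        if (s->nak_outstanding && s->last_nak_seq == s->rcv_nxt)
            return 0;                     /* do not repeat the same nak */

        s->last_nak_seq = s->rcv_nxt;
        s->nak_outstanding = 1;
        return (seg_seq - s->rcv_nxt + mss - 1) / mss;
    }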

3. Big Window Option

Currently TCP has a 16 bit window limitation built into the protocol.
This limits the amount of outstanding unacknowledged data to 64
Kbytes.  We have already seen that some networks have a pipe larger
than 64 Kbytes: a T1 satellite channel and a cross country DS3
network with a 30 ms delay both have data pipes much larger than 64
Kbytes.  Thus, even on a perfectly conditioned link with no bandwidth
wasted due to errors, the data pipe will not be filled and bandwidth
will be wasted.  What is needed is the ability to send more
unacknowledged data.  This is achieved by having bigger windows,
bigger than the current limitation of 16 bits.  This option expands
the window size to 30 bits, or over 1 gigabyte, by literally
expanding the window mechanism currently used by TCP.  The added
option contains the upper 15 bits of the window, while the lower 16
bits continue to go where they normally go [6] in the TCP header.

A TCP session will use the big window option only if both sides agree
to use it; otherwise the option is not used and the normal 16 bit
windows are used.  Once the two sides agree to use big windows, every
packet thereafter is expected to contain the window option with the
current upper 15 bits of the window.  The negotiation to decide
whether or not to use the bigger windows takes place during the SYN
and SYN ACK segments of the TCP connection startup process.  The
originator of the connection includes in the SYN segment the
following option:

     1 byte     1 byte      4 bytes
   +=========+==========+===============+
   +option=B + length=6 + 30 bit window +
   +=========+==========+===============+

If the other end of the connection wants to use big windows, it
includes the same option in the SYN ACK segment that it must send.
At this point, both sides have agreed to use big windows and the
specified windows will be used.  It should be noted that the SYN and
SYN ACK segments themselves use the small windows; once the big
window option has been negotiated, the bigger windows are used.
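
As a sketch of the negotiation step (illustrative only; the option
kind symbol and helper name are assumptions), the originator might
append option B to its SYN like this, and the responder would do the
same in its SYN ACK:

    #include <stdint.h>
    #include <string.h>
    #include <arpa/inet.h>

    #define TCPOPT_BIGWND 'B'        /* option kind from the figure */

    /* Append the big window option to the option area of a SYN or
     * SYN ACK segment.  wnd is the 30 bit receive window offered. */
    static size_t append_bigwnd_option(uint8_t *opt, uint32_t wnd)
    {
        uint32_t wnd_net = htonl(wnd & 0x3fffffff);  /* keep 30 bits */

        opt[0] = TCPOPT_BIGWND;          /* option = B     */
        opt[1] = 6;                      /* length = 6     */
        memcpy(&opt[2], &wnd_net, 4);    /* 30 bit window  */
        return 6;
    }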

Once both sides have agreed to use 32 bit windows, the protocol
functions just as it did before, with no difference in operation,
even in the event of lost packets.  This claim holds true because the
rcv_wnd and snd_wnd variables of TCP contain the 16 bit windows until
the big window option is negotiated, after which they are replaced
with the appropriate 32 bit values.  Thus, the use of big windows
becomes part of the state information kept by TCP.

Other methods of expanding the windows have been presented, including
a window multiple [2] or streaming [5], but this solution is more
elegant in the sense that it is a true extension of the window that
one day may easily become part of the protocol and not just be an
option to the protocol.

3.1 How does it work

Once a connection has decided to use big windows, every succeeding
packet must contain the following option:

   +=========+==========+==========================+
   +option=C + length=4 + upper 15 bits of rcv_wnd +
   +=========+==========+==========================+

With all segments sent, the sender supplies the size of its receive
window.  If the connection is only using 16 bit windows, this option
is not supplied; otherwise, the lower 16 bits of the receive window
go into the TCP header where they currently reside [6] and the upper
15 bits of the window are put into the data portion of option C.
When the receiver processes the packet it must first reform the
window and then process the packet as it would in the absence of the
option.
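
Reforming the window is a simple shift and OR; the sketch below is
illustrative, with names chosen for clarity rather than taken from
any particular implementation:

    #include <stdint.h>

    /* Rebuild the full receive window advertised by the peer:
     * hdr_wnd is the 16 bit window field from the TCP header and
     * opt_upper is the 15 bit value carried in option C. */
    static uint32_t reform_window(uint16_t hdr_wnd, uint16_t opt_upper)
    {
        return ((uint32_t)(opt_upper & 0x7fff) << 16) | hdr_wnd;
    }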

3.2 Impact of changes

Implementing the first version of the big window option required very
little change to the source.  State information must be added to the
protocol to determine whether the big window option is to be used,
and all 16 bit variables that deal with window information must
become 32 bit quantities.  A future document will describe in more
detail the changes required to the 4.3 bsd TCP source code.  Test
results of the window change only are presented in the appendix.
Note that expanding 16 bit quantities to 32 bit quantities in the TCP
control block (4.3 bsd source) may cause the structure to become
larger than the mbuf used to hold it.  Care must be taken to ensure
this does not occur on your system, or undetermined events may take
place.
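
A simple way to guard against that overflow is to compare the control
block size against the mbuf data size at build or test time.  The
sketch below is generic: the structure is a stand-in for the real
control block, and the mbuf data size is a placeholder, not the
actual 4.3 bsd constant.

    #include <stdio.h>

    #define MBUF_DATA_BYTES 112      /* placeholder; use your system's
                                        actual MLEN value              */

    struct tcpcb_sketch {            /* stand-in for struct tcpcb      */
        unsigned long snd_wnd;       /* window variables, now 32 bits  */
        unsigned long rcv_wnd;
        /* ... remaining control block fields ... */
    };

    int main(void)
    {
        if (sizeof(struct tcpcb_sketch) > MBUF_DATA_BYTES)
            printf("control block no longer fits in a single mbuf\n");
        else
            printf("control block fits: %lu of %d bytes\n",
                   (unsigned long)sizeof(struct tcpcb_sketch),
                   MBUF_DATA_BYTES);
        return 0;
    }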

4. Effects of Big Windows and Naks when used together

With big windows alone, transfer times over a satellite were quite
impressive in the absence of any introduced errors.  However, when an
error simulator was used to create random errors during transfers,
performance went down extremely fast.  When the nak option was added
to the big window option, performance in the face of errors went up
some, but not to the level that was expected.  This section discusses
some issues that were overcome to produce the results given in the
appendix.

4.1 Window Size and Nak benefits

Without errors, the window size required to keep the data pipe full
is equal to the round trip delay * desired throughput, i.e., the data
pipe size (called Z from now on).  This and the other calculations
below assume that the processing time of the hosts is negligible.  In
the event of an error (without NAKs), the window size needs to become
larger than Z in order to keep the data pipe full while the sender is
waiting for the ack of the resent packet.  If the window size is
equal to Z and we assume that the retransmission timer is equal to Z,
then when a packet is lost the retransmission timer goes off just as
the last piece of data in the window is sent.  In this case, the lost
piece of data can be resent with no delay.  The data pipe will still
empty out, because it takes 1/2 Z worth of data to get the ack back
to the sender and an additional 1/2 Z worth of data to get the data
pipe refilled with new data.  This makes the required window 2Z: 1Z
to keep the data pipe full during normal operation and 1Z to keep the
data pipe full while waiting for a lost packet to be resent and
acked.

If the same scenario is used with the addition of NAKs, the required
window size still needs to be 2Z to avoid wasting any bandwidth in
the event of a dropped packet.  This appears to mean that the nak
option does not provide any benefit at all.  Testing showed, however,
that the retransmission timer was larger than the data pipe and, in
the event of errors, became much larger than the data pipe because of
the retransmission backoff.  Thus, the nak option bounds the required
window to 2Z, such that in the event of an error there is no lost
bandwidth even with the retransmission timer fluctuations.  The
results in the appendix show that by using naks, the bandwidth waste
associated with the retransmission timer facility is eliminated.
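
As a worked example (a sketch using the T1 satellite parameters given
in the appendix: a 1.544 Mbit/s link and a 580 ms round trip delay),
the pipe size Z and the 2Z window bound come out roughly as follows:

    #include <stdio.h>

    int main(void)
    {
        double bw_bits = 1.544e6;    /* T1 satellite link, bits/second */
        double rtt_sec = 0.580;      /* round trip delay (appendix)    */

        double z_bytes = bw_bits * rtt_sec / 8.0;   /* pipe size Z     */
        double w_bytes = 2.0 * z_bytes;             /* 2Z window bound */

        printf("Z  = %.0f bytes (about %.0f Kbytes)\n", z_bytes,
               z_bytes / 1024.0);
        printf("2Z = %.0f bytes (about %.0f Kbytes)\n", w_bytes,
               w_bytes / 1024.0);
        return 0;
    }

Both values (roughly 109 Kbytes and 219 Kbytes) exceed the 64 Kbyte
limit of the unextended protocol, which is why the big window option
is needed before the nak option can show its benefit.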

4.2 Congestion vs Noise

An issue that must be examined when implementing the NAK and big
window schemes together is congestion versus packets lost due to the
medium, or noise.  In the recent algorithm enhancements [1], slow
start was introduced so that whenever a data transfer is started on a
connection, or right after a dropped packet, the effective send
window is set to a very small size (typically equal to the MSS being
used).  This is done so that a new connection does not cause
congestion by immediately overloading the network, and so that an
existing connection backs off the network if a packet was dropped due
to congestion, allowing the network to clear up.  If a connection
using big windows loses a packet due to the medium (a packet
corrupted by an error), the last thing that should be done is to
close the send window so that the connection can only send 1 packet
and must use the slow start algorithm to slowly work itself back up
to sending full windows worth of data.  This behavior would quickly
limit the usefulness of the big window and nak options over lossy
links.

On the other hand, if a packet was dropped due to congestion and the
sender assumes the packet was dropped because of noise, the sender
will continue sending large amounts of data.  This action causes the
congestion to continue, more packets are dropped, and that part of
the network collapses.  In this instance, the sender would want to
back off from sending at the current window limit.  Using the current
slow start mechanism over a satellite builds up the window too slowly
[1].  Possibly a better solution would be for the window to be opened
at 2*Rlog2(W) instead of R*log2(W) [1] (open the window by 2 packets
instead of 1 for each acked packet).  This reduces the wasted
bandwidth by opening the window much quicker, while still giving the
network a chance to clear up.  More experimentation is necessary to
find the optimal rate of opening the window, especially when large
windows are being used.
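
A sketch of the modified opening rate suggested above (2 segments per
acked packet during slow start instead of 1) follows; the variable
names follow common BSD usage, but the routine itself is illustrative
only:

    #include <stdint.h>

    /* Illustrative congestion window update on receipt of an ack.
     * All quantities are in bytes; mss is the segment size. */
    static uint32_t open_cwnd_on_ack(uint32_t cwnd, uint32_t ssthresh,
                                     uint32_t mss)
    {
        if (cwnd < ssthresh)
            cwnd += 2 * mss;    /* proposed: open by 2 packets per ack */
        else
            cwnd += (uint32_t)(((uint64_t)mss * mss) / cwnd);
                                /* unchanged congestion avoidance step */
        return cwnd;
    }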

The current recommendation for TCP is to use the slow start mechanism
in the event of any lost packet.  If an application knows that it
will be using a satellite with a high error rate, it does not make
sense to force it to use the slow start mechanism for every dropped
packet.  Instead, the application should be able to choose what
action happens in the event of a lost packet.  In the BSD
environment, a setsockopt call should be provided so that the
application may inform TCP to handle lost packets in a special way
for this particular connection.  If the error rate of a link is known
to be small, then using slow start with the modified rate from above
keeps the amount of bandwidth lost very small with respect to the
amount of bandwidth actually utilized; in this case, the setsockopt
call should not be used.  What is really needed is a way for a host
to determine whether packets are being dropped due to congestion or
to noise; then the host can choose to do the right thing.  This will
require a mechanism like source quench to be used, and for this to
happen more experimentation is necessary to arrive at a solid
definition of the use of such a mechanism.  It is currently believed
by some that using source quench to avoid congestion only adds to the
problem rather than suppressing it.
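
As a sketch of how an application might request the "treat losses as
noise" behavior, assuming a hypothetical TCP_NOISY_LINK option were
added (no such option exists in the standard 4.3 bsd release):

    #include <sys/types.h>
    #include <sys/socket.h>
    #include <netinet/in.h>
    #include <netinet/tcp.h>

    #ifndef TCP_NOISY_LINK
    #define TCP_NOISY_LINK 0x40   /* hypothetical option number */
    #endif

    /* Ask TCP to treat lost packets on this connection as noise
     * (resend on nak, keep the window open) instead of congestion. */
    static int request_noisy_link_mode(int sock)
    {
        int on = 1;

        return setsockopt(sock, IPPROTO_TCP, TCP_NOISY_LINK,
                          &on, sizeof(on));
    }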

The TCP used to gather the results in the appendix for the big window
with nak experiment assumed that lost packets were the result of
noise and not congestion.  This assumption was used to show how to
make the current TCP work in such an environment.  The actual
satellite used in the experiment (when the satellite simulator was
not used) only experienced an error rate of around 10e-10.  With this
error rate it is suggested that in practice, when big windows are
used over the link, TCP should use the slow start mechanism for all
lost packets, with the 2*Rlog2(W) rate discussed above.  Under most
situations when long delay networks are being used (transcontinental
DS3 networks using fiber with very low error rates, or satellite
links with low error rates), big windows and naks should be used with
the assumption that lost packets are the result of congestion, until
a better algorithm is devised [7].

Another problem noticed while testing the effects of slow start over
a satellite link was that, at times, the retransmission timer was set
so restrictively that, milliseconds before a naked packet's ack was
received, the timer would go off because of a timed packet within the
send window.  The timer was set at the round trip delay of the
network, allowing no time for packet processing.  If this timer goes
off due to congestion, then backing off is the right thing to do;
otherwise, to avoid the scenario discovered by experimentation, the
retransmission timer should be set a little longer so that it does
not go off too early.  Care must be taken in the implementation in
question so that a packet is not retransmitted too soon and the loss
blamed on congestion when, in fact, the ack is on its way.

4.3 Duplicate Acks

Another problem found with the 4.3 bsd implementation is in the area
of duplicate acks.  When the sender of data receives a certain number
of acks (3 in the current Berkeley release) that acknowledge
previously acked data, it assumes that a packet has been lost; it
resends the one packet assumed lost and closes its send window as if
the network were congested, and the slow start algorithm mentioned
above is then used to reopen the send window.  This facility is no
longer needed, since the sender can use the reception of a nak as its
indicator that a particular packet was dropped.  If the nak packet is
lost, the retransmit timer will go off and the packet will be
retransmitted by normal means.  If a sender's algorithm continues to
count duplicate acks, the sender may find itself receiving many
duplicate acks after it has already resent the packet in response to
a nak, because of the large size of the data pipe.  On receiving all
of these duplicate acks the sender may find itself doing nothing but
resending the same packet of data unnecessarily, while keeping the
send window closed for absolutely no reason.  By removing this
feature of the implementation, a user can expect to find a satellite
connection working much better in the face of errors, and other
connections should not see any performance loss, but rather a slight
improvement in performance if anything at all.
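
A minimal sketch of this change (suppress the duplicate-ack fast
retransmit once the nak option has been negotiated) follows; the
function and parameter names are illustrative assumptions:

    /* Decide whether this ack should trigger a fast retransmit.
     * When naks are in use, rely on the nak (or the retransmit timer)
     * instead of counting duplicate acks. */
    static int dupack_should_retransmit(int nak_option_negotiated,
                                        int dupack_count)
    {
        if (nak_option_negotiated)
            return 0;

        return dupack_count >= 3;    /* threshold used by 4.3 bsd */
    }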

5. Conclusion

This paper has described two new options that, if used, will make TCP
a more efficient protocol in the face of errors and over networks
that have a high bandwidth*delay product, without decreasing
performance over more common networks.  If a system that implements
the options talks with one that does not, the two systems should
still be able to communicate with no problems.  This assumes that the
other system does not use the option numbers defined in this paper in
some other way and does not panic when faced with an option that it
does not implement.  Currently at NASA there are many machines that
do not implement either option, and they communicate just fine with
the systems that do implement them.

The drive for implementing big windows has been the direct result of
trying to make TCP more efficient over large delay networks
[2,3,4,5], such as a T1 satellite.  However, another practical use of
large windows is becoming more apparent as the local area networks
being developed become faster and support much larger MTUs.
Hyperchannel, for instance, has been stated to be able to support 1
megabit MTUs in its new line of products.  With the current
implementation of TCP, Hyperchannel is not utilized as efficiently as
it should be, because the physical medium's MTU is larger than the
maximum window of the protocol being used.  By increasing the TCP
window size, better utilization of networks like Hyperchannel will be
gained instantly, because the sender can send 64 Kbyte packets (an IP
limitation) without having to operate in a stop and wait fashion.
Future work is being started to increase the IP maximum datagram size
so that even better utilization of fast local area networks will be
seen, by having the TCP/IP protocols send large packets over mediums
with very large MTUs.  This will hopefully eliminate the network
protocol as the bottleneck in data transfers while workstation and
workstation file system technology advances even further than it
already has.

An area of concern when using the big window mechanism is the use of
machine resources.  When running over a satellite, if a packet is
dropped such that 2Z worth of data (where Z is the data pipe size
defined in section 4.1) is unacknowledged, both ends of the
connection need to be able to buffer that data using machine mbufs
(or whatever mechanism the machine uses), usually a valuable and
scarce commodity.  If the window size is not chosen properly, some
machines will crash when the memory is all used up, or other parts of
the system will be kept from running.  Thus, setting the window to
some fairly large arbitrary number is not a good idea, especially on
a general purpose machine where many users may be logged on at any
time.  What is currently being engineered at NASA is the ability for
certain programs to use the setsockopt feature of 4.3 bsd to ask for
big windows, such that the average user does not have access to the
large windows.  This limits the use of big windows to the
applications that absolutely need them and protects a valuable system
resource.

6. References

[1] Jacobson, V., "Congestion Avoidance and Control", SIGCOMM 88,
    Stanford, Ca., August 1988.

[2] Jacobson, V., and R. Braden, "TCP Extensions for Long-Delay
    Paths", RFC 1072, LBL, USC/Information Sciences Institute,
    October 1988.

[3] Cheriton, D., "VMTP: Versatile Message Transaction Protocol",
    RFC 1045, Stanford University, February 1988.

[4] Clark, D., M. Lambert, and L. Zhang, "NETBLT: A Bulk Data
    Transfer Protocol", RFC 998, MIT, March 1987.

[5] Fox, R., "Draft of Proposed Solution for High Delay Circuit File
    Transfer", GE/NAS Internal Document, March 1988.

[6] Postel, J., "Transmission Control Protocol - DARPA Internet
    Program Protocol Specification", RFC 793, DARPA, September 1981.

[7] Leiner, B., "Critical Issues in High Bandwidth Networking",
    RFC 1077, DARPA, November 1988.

7. Appendix

Both options have been implemented and tested.  Contained in this
section is some performance data gathered to support the use of these
two options.  The satellite channel used was a 1.544 Mbit link with a
580 ms round trip delay.  All values are given in units of bytes.

TCP with Big Windows, No Naks:

              |---------------transfer rates---------------------|
 Window Size  |  no error  | 10e-7 error rate | 10e-6 error rate |
 -----------------------------------------------------------------
     64K      |     94K    |        53K       |        14K       |
 -----------------------------------------------------------------
     72K      |    106K    |        51K       |        15K       |
 -----------------------------------------------------------------
     80K      |    115K    |        42K       |        14K       |
 -----------------------------------------------------------------
     92K      |    115K    |        43K       |        14K       |
 -----------------------------------------------------------------
    100K      |    135K    |        66K       |        15K       |
 -----------------------------------------------------------------
    112K      |    126K    |        53K       |        17K       |
 -----------------------------------------------------------------
    124K      |    154K    |        45K       |        14K       |
 -----------------------------------------------------------------
    136K      |    160K    |        66K       |        15K       |
 -----------------------------------------------------------------
    156K      |    167K    |        45K       |        14K       |
 -----------------------------------------------------------------

                            Figure 1.

TCP with Big Windows, and Naks:

              |---------------transfer rates---------------------|
 Window Size  |  no error  | 10e-7 error rate | 10e-6 error rate |
 -----------------------------------------------------------------
     64K      |     95K    |        83K       |        43K       |
 -----------------------------------------------------------------
     72K      |    104K    |        87K       |        49K       |
 -----------------------------------------------------------------
     80K      |    117K    |        96K       |        62K       |
 -----------------------------------------------------------------
     92K      |    124K    |       119K       |        39K       |
 -----------------------------------------------------------------
    100K      |    140K    |       124K       |        35K       |
 -----------------------------------------------------------------
    112K      |    151K    |       126K       |        53K       |
 -----------------------------------------------------------------
    124K      |    160K    |       140K       |        36K       |
 -----------------------------------------------------------------
    136K      |    167K    |       148K       |        38K       |
 -----------------------------------------------------------------
    156K      |    167K    |       160K       |        38K       |
 -----------------------------------------------------------------

                            Figure 2.

With a 10e-6 error rate, many naks as well as data packets were
dropped, causing the wild swings in transfer times.  Also, please
note that the machines used were SGI Iris 2500 Turbos running the 3.6
OS with the new TCP enhancements.  The performance of the Irises is
lower than that of a Sun 3/260, but due to some source code
restrictions the Iris was used.  Initial results on the Sun showed
slightly higher performance and less variance.

Author's Address

Richard Fox
950 Linden #208
Sunnyvale, CA 94086

EMail: rfox@tandem.com