Porting PicoTCP WIP

2025-10-29 14:29:06 +01:00
parent 6722f42e68
commit 815c2239fe
464 changed files with 235009 additions and 24 deletions
--- a/kernel/picotcp/RFC/get_all_rfc
+++ b/kernel/picotcp/RFC/get_all_rfc
@ -0,0 +1,15 @@
+#!/bin/sh
+
+wget -O rfc4614.txt http://tools.ietf.org/rfc/rfc4614.txt
+
+
+for RFC in `grep "\[RFC" rfc4614.txt | sed -e "s/^.*RFC/rfc/" | grep -v "rfc \|rfc$" | sed -e "s/\].*$/.txt/g" |sort |uniq`; do
+	wget -O ${RFC} http://tools.ietf.org/rfc/${RFC}
+done
+
+wget -O rfc3927.txt http://tools.ietf.org/rfc/rfc3927.txt
+
+# Get PPP related RFC's
+for RFC in $(echo 1332 1334 1661 1662 1877 1994 | sed -r "s/[^ ]+/rfc&.txt/g"); do
+	wget -O ${RFC} http://tools.ietf.org/rfc/${RFC}
+done
--- a/kernel/picotcp/RFC/rfc0793.txt
+++ b/kernel/picotcp/RFC/rfc0793.txt
--- a/kernel/picotcp/RFC/rfc0813.txt
+++ b/kernel/picotcp/RFC/rfc0813.txt
--- a/kernel/picotcp/RFC/rfc0814.txt
+++ b/kernel/picotcp/RFC/rfc0814.txt
@ -0,0 +1,763 @@
+
+RFC:  814
+
+
+
+                   NAME, ADDRESSES, PORTS, AND ROUTES
+
+                             David D. Clark
+                  MIT Laboratory for Computer Science
+               Computer Systems and Communications Group
+                               July, 1982
+
+
+     1.  Introduction
+
+
+     It has been said that the principal function of an operating system
+
+is to define a number of different names for the same object, so that it
+
+can  busy  itself  keeping  track of the relationship between all of the
+
+different names.  Network protocols  seem  to  have  somewhat  the  same
+
+characteristic.    In  TCP/IP,  there  are  several ways of referring to
+
+things.  At the human visible  interface,  there  are  character  string
+
+"names"  to  identify  networks,  hosts,  and  services.  Host names are
+
+translated into network "addresses", 32-bit  values  that  identify  the
+
+network  to  which  a  host is attached, and the location of the host on
+
+that net.  Service names are translated into a "port identifier",  which
+
+in  TCP  is  a  16-bit  value.    Finally, addresses are translated into
+
+"routes", which are the sequence of steps a packet must  take  to  reach
+
+the  specified  addresses.  Routes show up explicitly in the form of the
+
+internet routing options, and also implicitly in the  address  to  route
+
+translation tables which all hosts and gateways maintain.
+
+
+     This  RFC  gives  suggestions  and  guidance  for the design of the
+
+tables and algorithms necessary to keep track of these various sorts  of
+
+identifiers inside a host implementation of TCP/IP.
+
+                                   2
+
+
+     2.  The Scope of the Problem
+
+
+     One  of the first questions one can ask about a naming mechanism is
+
+how many names one can expect to encounter.  In order to answer this, it
+
+is necessary to know something about the expected maximum  size  of  the
+
+internet.  Currently, the internet is fairly small.  It contains no more
+
+than  25  active  networks,  and no more than a few hundred hosts.  This
+
+makes it possible to install tables which exhaustively list all of these
+
+elements.  However, any implementation undertaken now should be based on
+
+an assumption of a much  larger  internet.    The  guidelines  currently
+
+recommended  are  an upper limit of about 1,000 networks.  If we imagine
+
+an average number of 25 hosts per net,  this  would  suggest  a  maximum
+
+number  of 25,000 hosts.  It is quite unclear whether this host estimate
+
+is high or low, but even if it is off by several  factors  of  two,  the
+
+resulting  number  is  still  large enough to suggest that current table
+
+management strategies are unacceptable.  Some fresh techniques  will  be
+
+required to deal with the internet of the future.
+
+
+     3.  Names
+
+
+     As the previous section suggests, the internet will eventually have
+
+a  sufficient  number  of  names  that a host cannot have a static table
+
+which provides a translation from every name to its associated  address.
+
+There  are  several  reasons  other than sheer size why a host would not
+
+wish to have such a table.  First, with that many names, we  can  expect
+
+names  to  be  added  and deleted at such a rate that an installer might
+
+spend all his time just revising the table.  Second, most of  the  names
+
+will  refer  to  addresses  of  machines with which nothing will ever be
+
+                                   3
+
+
+exchanged.  In fact, there may be whole networks with which a particular
+
+host will never have any traffic.
+
+
+     To  cope  with  this  large  and  somewhat dynamic environment, the
+
+internet is moving from its current position  in  which  a  single  name
+
+table  is  maintained  by  the  NIC  and  distributed to all hosts, to a
+
+distributed approach in which each network (or  group  of  networks)  is
+
+responsible  for maintaining its own names and providing a "name server"
+
+to translate between the names and the addresses in that network.   Each
+
+host   is   assumed   to  store  not  a  complete  set  of  name-address
+
+translations, but only a cache of recently used names.  When a  name  is
+
+provided  by  a  user for translation to an address, the host will first
+
+examine its local cache, and if  the  name  is  not  found  there,  will
+
+communicate  with  an appropriate name server to obtain the information,
+
+which it may then insert into its cache for future reference.
+
+
+     Unfortunately, the name server mechanism is not totally in place in
+
+the internet yet, so for the moment, it is necessary to continue to  use
+
+the  old  strategy of maintaining a complete table of all names in every
+
+host.  Implementors, however, should structure this table in such a  way
+
+that  it  is  easy  to  convert  later  to  a  name server approach.  In
+
+particular, a reasonable programming strategy would be to make the  name
+
+table  accessible  only  through  a subroutine interface, rather than by
+
+scattering direct references to the table all through the code.  In this
+
+way, it will be possible, at a later date,  to  replace  the  subroutine
+
+with one capable of making calls on remote name servers.
+
+
+     A  problem  which  occasionally arises in the ARPANET today is that
+
+                                   4
+
+
+the information in a local host table is out of date, because a host has
+
+moved,  and a revision of the host table has not yet been installed from
+
+the NIC.  In this case, one attempts to connect to a particular host and
+
+discovers an unexpected machine at the address obtained from  the  local
+
+table.    If  a  human is directly observing the connection attempt, the
+
+error  is  usually  detected  immediately.    However,  for   unattended
+
+operations  such as the sending of queued mail, this sort of problem can
+
+lead to a great deal of confusion.
+
+
+     The nameserver scheme will only make this problem worse,  if  hosts
+
+cache  locally  the  address associated with names that have been looked
+
+up, because the host has no way of knowing when the address has  changed
+
+and the cache entry should be removed.  To solve this problem, plans are
+
+currently  under  way  to  define  a simple facility by which a host can
+
+query a foreign address to determine what name  is  actually  associated
+
+with  it.    SMTP already defines a verification technique based on this
+
+approach.
+
+
+     4.  Addresses
+
+
+     The IP layer must know something about addresses.   In  particular,
+
+when  a datagram is being sent out from a host, the IP layer must decide
+
+where to send it on the immediately  connected  network,  based  on  the
+
+internet address.  Mechanically, the IP first tests the internet address
+
+to  see  whether  the network number of the recipient is the same as the
+
+network number of the sender.  If so, the packet can be sent directly to
+
+the final recipient.  If not, the datagram must be sent to a gateway for
+
+further forwarding.  In this latter case,  a  second  decision  must  be
+
+                                   5
+
+
+made, as there may be more than one gateway available on the immediately
+
+attached network.
+
+
+     When  the  internet address format was first specified, 8 bits were
+
+reserved  to  identify  the  network.     Early   implementations   thus
+
+implemented  the  above  algorithm by means of a table with 256 entries,
+
+one for each possible net, that specified the gateway of choice for that
+
+net, with a special case entry for those nets  to  which  the  host  was
+
+immediately connected.  Such tables were sometimes statically filled in,
+
+which caused confusion and malfunctions when gateways and networks moved
+
+(or crashed).
+
+
+     The  current  definition  of  the  internet  address provides three
+
+different options for network numbering, with the  goal  of  allowing  a
+
+very  large  number of networks to be part of the internet.  Thus, it is
+
+no longer possible to imagine having an exhaustive  table  to  select  a
+
+gateway  for any foreign net.  Again, current implementations must use a
+
+strategy based on a local cache of  routing  information  for  addresses
+
+currently being used.
+
+
+     The  recommended  strategy  for  address to route translation is as
+
+follows.    When  the  IP  layer  receives  an  outbound  datagram   for
+
+transmission,  it  extracts  the  network  number  from  the destination
+
+address, and queries its local table to determine  whether  it  knows  a
+
+suitable  gateway to which to send the datagram.  If it does, the job is
+
+done.    (But  see  RFC  816  on  Fault  Isolation  and  Recovery,   for
+
+recommendations  on  how  to  deal  with  the  possible  failure  of the
+
+gateway.)  If there is no such entry in the local table, then select any
+
+                                   6
+
+
+accessible  gateway at random, insert that as an entry in the table, and
+
+use it to send the packet.  Either the guess will be right or wrong.  If
+
+it is wrong, the gateway to which the packet was  sent  will  return  an
+
+ICMP  redirect message to report that there is a better gateway to reach
+
+the net in question.  The arrival  of  this  redirect  should  cause  an
+
+update of the local table.
+
+
+     The  number  of  entries in the local table should be determined by
+
+the maximum number of active connections which this particular host  can
+
+support  at  any  one  time.  For a large time sharing system, one might
+
+imagine a table with 100 or more entries.  For a personal computer being
+
+used to support a single user telnet connection,  only  one  address  to
+
+gateway association need be maintained at once.
+
+
+     The  above strategy actually does not completely solve the problem,
+
+but only pushes it down one level, where the problem then arises of  how
+
+a  new  host,  freshly  arriving  on  the  internet,  finds  all  of its
+
+accessible gateways.  Intentionally, this problem is not  solved  within
+
+the  internetwork  architecture.   The reason is that different networks
+
+have drastically different strategies for allowing a host  to  find  out
+
+about  other  hosts  on  its  immediate  network.    Some  nets permit a
+
+broadcast mechanism.  In this case, a host can send out  a  message  and
+
+expect  an  answer  back  from  all  of the attached gateways.  In other
+
+cases, where a particular network  is  richly  provided  with  tools  to
+
+support  the  internet, there may be a special network mechanism which a
+
+host can invoke to determine where the gateways are.  In other cases, it
+
+may be necessary for an installer to manually provide  the  name  of  at
+
+                                   7
+
+
+least  one  accessible  gateway.  Once a host has discovered the name of
+
+one gateway, it can build up a table of all other available gateways, by
+
+keeping track of every gateway that has been reported back to it  in  an
+
+ICMP message.
+
+
+     5.  Advanced Topics in Addressing and Routing
+
+
+     The  preceding  discussion  describes  the  mechanism required in a
+
+minimal implementation,  an  implementation  intended  only  to  provide
+
+operational  service  access  today to the various networks that make up
+
+the internet.  For any host which will participate in  future  research,
+
+as  contrasted  with  service,  some  additional  features are required.
+
+These features will also be helpful for service hosts if  they  wish  to
+
+obtain access to some of the more exotic networks which will become part
+
+of  the internet over the next few years.  All implementors are urged to
+
+at least provide a structure into which these features  could  be  later
+
+integrated.
+
+
+     There   are   several  features,  either  already  a  part  of  the
+
+architecture or now under development,  which  are  used  to  modify  or
+
+expand  the  relationships  between addresses and routes.  The IP source
+
+route options allow a host to explicitly direct  a  datagram  through  a
+
+series of gateways to its foreign host.  An alternative form of the ICMP
+
+redirect  packet  has  been  proposed,  which  would  return information
+
+specific to a  particular  destination  host,  not  a  destination  net.
+
+Finally, additional IP options have been proposed to identify particular
+
+routes  within  the internet that are unacceptable.  The difficulty with
+
+implementing these new features  is  that  the  mechanisms  do  not  lie
+
+                                   8
+
+
+entirely within the bounds of IP.  All the mechanisms above are designed
+
+to apply to a particular connection, so that their use must be specified
+
+at  the  TCP level.  Thus, the interface between IP and the layers above
+
+it must include mechanisms to allow passing this  information  back  and
+
+forth,  and TCP (or any other protocol at this level, such as UDP), must
+
+be prepared to store this  information.    The  passing  of  information
+
+between IP and TCP is made more complicated by the fact that some of the
+
+information,  in  particular  ICMP packets, may arrive at any time.  The
+
+normal interface envisioned between TCP  and  IP  is  one  across  which
+
+packets  can  be  sent  or received.  The existence of asynchronous ICMP
+
+messages implies that there must be an additional  channel  between  the
+
+two,  unrelated  to the actual sending and receiving of data.  (In fact,
+
+there are many other ICMP messages which arrive asynchronously and which
+
+must be passed from IP  up  to  higher  layers.    See  RFC  816,  Fault
+
+Isolation and Recovery.)
+
+
+     Source  routes  are  already  in  use  in  the  internet,  and many
+
+implementations will wish to be able to take advantage  of  them.    The
+
+following  sorts  of  usages  should  be permitted.  First, a user, when
+
+initiating a TCP connection, should be able to hand a source route  into
+
+TCP,  which in turn must hand the source route to IP with every outgoing
+
+datagram.  The user might initially obtain the source route by  querying
+
+a  different  sort  of  name  server,  which would return a source route
+
+instead of an address, or the user may have fabricated the source  route
+
+manually.    A  TCP  which  is  listening  for a connection, rather than
+
+attempting to open one, must be prepared to  receive  a  datagram  which
+
+contains  a  IP return route, in which case it must remember this return
+
+route, and use it as a source route on all returning datagrams.
+
+                                   9
+
+
+     6.  Ports and Service Identifiers
+
+
+     The  IP  layer of the architecture contains the address information
+
+which specifies the destination host to  which  the  datagram  is  being
+
+sent.    In  fact, datagrams are not intended just for particular hosts,
+
+but for particular agents within a host,  processes  or  other  entities
+
+that  are  the  actual  source and sink of the data.  IP performs only a
+
+very simple dispatching once the datagram  has  arrived  at  the  target
+
+host,   it   dispatches  it  to  a  particular  protocol.    It  is  the
+
+responsibility of that protocol handler,  for  example  TCP,  to  finish
+
+dispatching  the  datagram  to the particular connection for which it is
+
+destined.    This  next  layer  of  dispatching  is  done  using   "port
+
+identifiers",  which  are  a  part  of  the  header  of the higher level
+
+protocol, and not the IP layer.
+
+
+     This two-layer dispatching architecture has caused  a  problem  for
+
+certain  implementations.    In  particular,  some  implementations have
+
+wished to put the IP layer within the kernel of  the  operating  system,
+
+and  the  TCP  layer  as  a  user  domain  application  program.  Strict
+
+adherence to this partitioning can lead to grave  performance  problems,
+
+for  the  datagram  must  first  be  dispatched from the kernel to a TCP
+
+process, which then dispatches the datagram  to  its  final  destination
+
+process.   The overhead of scheduling this dispatch process can severely
+
+limit the achievable throughput of the implementation.
+
+
+     As is discussed in RFC 817, Modularity and Efficiency  in  Protocol
+
+Implementations,  this  particular  separation  between  kernel and user
+
+leads to other performance problems, even ignoring  the  issue  of  port
+
+                                   10
+
+
+level  dispatching.   However, there is an acceptable shortcut which can
+
+be taken to move the higher  level  dispatching  function  into  the  IP
+
+layer, if this makes the implementation substantially easier.
+
+
+     In  principle,  every  higher level protocol could have a different
+
+dispatching  algorithm.    The  reason  for  this  is  discussed  below.
+
+However,  for  the  protocols  involved  in  the  service offering being
+
+implemented today, TCP and UDP, the dispatching algorithm is exactly the
+
+same, and the port field is located in precisely the same place  in  the
+
+header.  Therefore, unless one is interested in participating in further
+
+protocol  research,  there  is only one higher level dispatch algorithm.
+
+This algorithm takes into account the internet  level  foreign  address,
+
+the protocol number, and the local port and foreign port from the higher
+
+level  protocol  header.  This algorithm can be implemented as a sort of
+
+adjunct to the IP layer implementation, as long as no other higher level
+
+protocols are to be implemented.  (Actually, the above statement is only
+
+partially true, in that the UDP dispatch function is subset of  the  TCP
+
+dispatch  function.  UDP dispatch depends only protocol number and local
+
+port.  However, there is an occasion within TCP  when  this  exact  same
+
+subset comes into play, when a process wishes to listen for a connection
+
+from  any  foreign  host.    Thus,  the range of mechanisms necessary to
+
+support TCP dispatch are also sufficient to support  precisely  the  UDP
+
+requirement.)
+
+
+     The decision to remove port level dispatching from IP to the higher
+
+level  protocol  has  been questioned by some implementors.  It has been
+
+argued that if all of the address structure were part of the  IP  layer,
+
+                                   11
+
+
+then IP could do all of the packet dispatching function within the host,
+
+which  would  lead  to  a  simpler  modularity.    Three  problems  were
+
+identified with this.  First, not all protocol implementors could  agree
+
+on  the  size  of the port identifier.  TCP selected a fairly short port
+
+identifier, 16 bits, to reduce  header  size.    Other  protocols  being
+
+designed,  however, wanted a larger port identifier, perhaps 32 bits, so
+
+that the port identifier, if  properly  selected,  could  be  considered
+
+probabilistically  unique.    Thus,  constraining  the  port  id  to one
+
+particular IP level mechanism would prevent certain  fruitful  lines  of
+
+research.    Second,  ports  serve  a  special  function  in addition to
+
+datagram delivery:   certain  port  numbers  are  reserved  to  identify
+
+particular services.  Thus, TCP port 23 is the remote login service.  If
+
+ports  were  implemented  at  the  IP level, then the assignment of well
+
+known ports could not be done on a protocol basis, but would have to  be
+
+done  in a centralized manner for all of the IP architecture.  Third, IP
+
+was designed with a very simple layering role:    IP  contained  exactly
+
+those functions that the gateways must understand.  If the port idea had
+
+been  made a part of the IP layer, it would have suggested that gateways
+
+needed to know about ports, which is not the case.
+
+
+     There are, of course, other ways  to  avoid  these  problems.    In
+
+particular,  the  "well-known  port" problem can be solved by devising a
+
+second mechanism, distinct from port  dispatching,  to  name  well-known
+
+ports.   Several protocols have settled on the idea of including, in the
+
+packet which sets up a  connection  to  a  particular  service,  a  more
+
+general  service  descriptor,  such  as a character string field.  These
+
+special  packets,  which  are  requesting  connection  to  a  particular
+
+                                   12
+
+
+service,  are  routed on arrival to a special server, sometimes called a
+
+"rendezvous server", which  examines  the  service  request,  selects  a
+
+random  port  which  is to be used for this instance of the service, and
+
+then passes the packet along to  the  service  itself  to  commence  the
+
+interaction.
+
+
+     For  the  internet architecture, this strategy had the serious flaw
+
+that it presumed all protocols would fit into the same service paradigm:
+
+an initial setup phase, which might contain a certain overhead  such  as
+
+indirect routing through a rendezvous server, followed by the packets of
+
+the  interaction  itself,  which  would  flow  directly  to  the process
+
+providing the service.  Unfortunately, not all high level  protocols  in
+
+internet  were  expected to fit this model.  The best example of this is
+
+isolated datagram exchange using UDP.  The simplest exchange in  UDP  is
+
+one process sending a single datagram to another.  Especially on a local
+
+net,  where  the  net  related overhead is very low, this kind of simple
+
+single datagram interchange can be extremely efficient,  with  very  low
+
+overhead  in  the  hosts.  However, since these individual packets would
+
+not be part of an established connection, if  IP  supported  a  strategy
+
+based  on  a  rendezvous  server and service descriptors, every isolated
+
+datagram would have to  be  routed  indirectly  in  the  receiving  host
+
+through  the  rendezvous  server, which would substantially increase the
+
+overhead of processing, and every datagram would have to carry the  full
+
+service  request  field,  which  would  increase  the size of the packet
+
+header.
+
+
+     In general, if a network is intended for "virtual circuit service",
+
+                                   13
+
+
+or  things similar to that, then using a special high overhead mechanism
+
+for circuit setup makes sense.  However, current directions in  research
+
+are  leading  away  from  this  class  of  protocol,  so  once again the
+
+architecture  was  designed  not  to   preclude   alternative   protocol
+
+structures.    The  only  rational  position  was  that  the  particular
+
+dispatching strategy used should be part of the  higher  level  protocol
+
+design, not the IP layer.
+
+
+     This  same  argument about circuit setup mechanisms also applies to
+
+the design of the IP address structure.  Many protocols do not  transmit
+
+a  full  address  field  as  part of every packet, but rather transmit a
+
+short identifier which is created as part of a circuit setup from source
+
+to destination.  If the full address needs to be  carried  in  only  the
+
+first  packet  of  a long exchange, then the overhead of carrying a very
+
+long address field can easily be justified.  Under these  circumstances,
+
+one  can  create  truly extravagant address fields, which are capable of
+
+extending to address almost  any  conceivable  entity.    However,  this
+
+strategy  is  useable  only  in a virtual circuit net, where the packets
+
+being transmitted are part of a  established  sequence,  otherwise  this
+
+large  extravagant  address  must be transported on every packet.  Since
+
+Internet explicitly rejected this restriction on  the  architecture,  it
+
+was  necessary  to come up with an address field that was compact enough
+
+to be sent in every datagram, but general enough to correctly route  the
+
+datagram  through  the  catanet  without a previous setup phase.  The IP
+
+address of 32 bits is the compromise that results.  Clearly it  requires
+
+a  substantial  amount  of shoehorning to address all of the interesting
+
+places in the universe with only 32 bits.  On the other  hand,  had  the
+
+                                   14
+
+
+address  field  become  much  bigger,  IP would have been susceptible to
+
+another criticism, which is that the header had grown unworkably  large.
+
+Again, the fundamental design decision was that the protocol be designed
+
+in  such  a way that it supported research in new and different sorts of
+
+protocol architectures.
+
+
+     There are some limited restrictions imposed by the IP design on the
+
+port mechanism selected by the higher level  process.    In  particular,
+
+when  a packet goes awry somewhere on the internet, the offending packet
+
+is returned, along with an error indication, as part of an ICMP  packet.
+
+An  ICMP  packet  returns only the IP layer, and the next 64 bits of the
+
+original datagram.  Thus, any higher level protocol which wishes to sort
+
+out from which port a particular offending datagram came must make  sure
+
+that  the  port information is contained within the first 64 bits of the
+
+next level header.  This also means, in most cases, that it is  possible
+
+to  imagine,  as  part  of the IP layer, a port dispatch mechanism which
+
+works by masking and matching on the  first  64  bits  of  the  incoming
+
+higher level header.
+
+
--- a/kernel/picotcp/RFC/rfc0816.txt
+++ b/kernel/picotcp/RFC/rfc0816.txt
@ -0,0 +1,648 @@
+
+
+RFC:  816
+
+
+
+                      FAULT ISOLATION AND RECOVERY
+
+                             David D. Clark
+                  MIT Laboratory for Computer Science
+               Computer Systems and Communications Group
+                               July, 1982
+
+
+     1.  Introduction
+
+
+     Occasionally, a network or a gateway will go down, and the sequence
+
+of  hops  which the packet takes from source to destination must change.
+
+Fault isolation is that action which  hosts  and  gateways  collectively
+
+take  to  determine  that  something  is  wrong;  fault  recovery is the
+
+identification and selection of an alternative route which will serve to
+
+reconnect the source to the destination.  In fact, the gateways  perform
+
+most  of  the  functions  of  fault  isolation and recovery.  There are,
+
+however, a few actions which hosts must take if they wish to  provide  a
+
+reasonable  level  of  service.   This document describes the portion of
+
+fault isolation and recovery which is the responsibility of the host.
+
+
+     2.  What Gateways Do
+
+
+     Gateways collectively implement an algorithm which  identifies  the
+
+best  route  between  all pairs of networks.  They do this by exchanging
+
+packets  which  contain  each  gateway's  latest   opinion   about   the
+
+operational status of its neighbor networks and gateways.  Assuming that
+
+this  algorithm is operating properly, one can expect the gateways to go
+
+through a period of confusion immediately after some network or  gateway
+
+                                   2
+
+
+has  failed,  but  one  can assume that once a period of negotiation has
+
+passed, the gateways are equipped with a consistent and correct model of
+
+the connectivity of the internet.  At present this period of negotiation
+
+may actually take several minutes, and many TCP implementations time out
+
+within that period, but it is a design goal of  the  eventual  algorithm
+
+that  the  gateway  should  be  able to reconstruct the topology quickly
+
+enough that a TCP connection should be able to survive a failure of  the
+
+route.
+
+
+     3.  Host Algorithm for Fault Recovery
+
+
+     Since  the gateways always attempt to have a consistent and correct
+
+model of the internetwork topology, the host strategy for fault recovery
+
+is very simple.  Whenever the host feels that  something  is  wrong,  it
+
+asks the gateway for advice, and, assuming the advice is forthcoming, it
+
+believes  the  advice  completely.  The advice will be wrong only during
+
+the transient  period  of  negotiation,  which  immediately  follows  an
+
+outage, but will otherwise be reliably correct.
+
+
+     In  fact,  it  is  never  necessary  for a host to explicitly ask a
+
+gateway for advice, because the gateway will provide it as  appropriate.
+
+When  a  host  sends  a datagram to some distant net, the host should be
+
+prepared to receive back either  of  two  advisory  messages  which  the
+
+gateway  may  send.    The  ICMP  "redirect"  message indicates that the
+
+gateway to which the host sent the  datagram  is  not  longer  the  best
+
+gateway  to  reach the net in question.  The gateway will have forwarded
+
+the datagram, but the host should revise its routing  table  to  have  a
+
+different  immediate  address  for  this  net.    The  ICMP "destination
+
+                                   3
+
+
+unreachable"  message  indicates  that  as  a result of an outage, it is
+
+currently impossible to reach the addressed net or host in  any  manner.
+
+On  receipt  of  this  message, a host can either abandon the connection
+
+immediately without any further retransmission, or resend slowly to  see
+
+if the fault is corrected in reasonable time.
+
+
+     If  a  host  could assume that these two ICMP messages would always
+
+arrive when something was amiss in the network, then no other action  on
+
+the  part  of the host would be required in order maintain its tables in
+
+an optimal condition.  Unfortunately, there are two circumstances  under
+
+which  the  messages  will  not  arrive  properly.    First,  during the
+
+transient following a failure, error messages may  arrive  that  do  not
+
+correctly  represent  the  state of the world.  Thus, hosts must take an
+
+isolated error message with some scepticism.  (This transient period  is
+
+discussed  more  fully  below.)    Second,  if the host has been sending
+
+datagrams to a particular gateway, and that gateway itself crashes, then
+
+all the other gateways in the internet will  reconstruct  the  topology,
+
+but  the  gateway  in  question will still be down, and therefore cannot
+
+provide any advice back to the host.  As long as the host  continues  to
+
+direct  datagrams at this dead gateway, the datagrams will simply vanish
+
+off the face of the earth, and nothing will come back in return.   Hosts
+
+must detect this failure.
+
+
+     If some gateway many hops away fails, this is not of concern to the
+
+host, for then the discovery of the failure is the responsibility of the
+
+immediate  neighbor gateways, which will perform this action in a manner
+
+invisible to the host.  The  problem  only  arises  if  the  very  first
+
+                                   4
+
+
+gateway, the one to which the host is immediately sending the datagrams,
+
+fails.   We thus identify one single task which the host must perform as
+
+its part of fault isolation in the internet:  the  host  must  use  some
+
+strategy  to  detect  that a gateway to which it is sending datagrams is
+
+dead.
+
+
+     Let us  assume  for  the  moment  that  the  host  implements  some
+
+algorithm  to  detect  failed  gateways; we will return later to discuss
+
+what this algorithm might be.  First, let  us  consider  what  the  host
+
+should  do  when it has determined that a gateway is down. In fact, with
+
+the exception of one small problem, the action the host should  take  is
+
+extremely  simple.    The host should select some other gateway, and try
+
+sending the datagram to it.  Assuming that  gateway  is  up,  this  will
+
+either  produce  correct  results, or some ICMP advice.  Since we assume
+
+that, ignoring temporary periods immediately following  an  outage,  any
+
+gateway  is capable of giving correct advice, once the host has received
+
+advice from any gateway, that host is in as good a condition as  it  can
+
+hope to be.
+
+
+     There is always the unpleasant possibility that when the host tries
+
+a different gateway, that gateway too will be down.  Therefore, whatever
+
+algorithm  the  host  uses to detect a dead gateway must continuously be
+
+applied, as the host tries every gateway in turn that it knows about.
+
+
+     The only difficult part of this algorithm is to specify  the  means
+
+by which the host maintains the table of all of the gateways to which it
+
+has  immediate  access.    Currently,  the specification of the internet
+
+protocol does not architect any message by which a host can  ask  to  be
+
+                                   5
+
+
+supplied  with  such a table.  The reason is that different networks may
+
+provide very different mechanisms by which this table can be filled  in.
+
+For  example,  if  the  net is a broadcast net, such as an ethernet or a
+
+ringnet, every gateway may simply broadcast such a table  from  time  to
+
+time,  and  the  host  need do nothing but listen to obtain the required
+
+information.  Alternatively, the network may provide  the  mechanism  of
+
+logical  addressing,  by  which  a whole set of machines can be provided
+
+with a single group  address,  to  which  a  request  can  be  sent  for
+
+assistance.   Failing those two schemes, the host can build up its table
+
+of neighbor gateways by remembering all the gateways from which  it  has
+
+ever received a message.  Finally, in certain cases, it may be necessary
+
+for  this  table,  or  at  least the initial entries in the table, to be
+
+constructed manually by a manager or operator at the  site.    In  cases
+
+where  the  network  in question provides absolutely no support for this
+
+kind of host query, at least some manual intervention will  be  required
+
+to  get  started,  so  that  the  host  can  find out about at least one
+
+gateway.
+
+
+     4.  Host Algorithms for Fault Isolation
+
+
+     We now return to the question raised above.  What  strategy  should
+
+the  host use to detect that it is talking to a dead gateway, so that it
+
+can know to switch to some other gateway in the list. In fact, there are
+
+several algorithms which can be used.   All  are  reasonably  simple  to
+
+implement, but they have very different implications for the overhead on
+
+the  host, the gateway, and the network.  Thus, to a certain extent, the
+
+algorithm picked must depend on the details of the network  and  of  the
+
+host.
+
+                                   6
+
+
+
+1.  NETWORK LEVEL DETECTION
+
+
+     Many  networks,  particularly  the  Arpanet,  perform precisely the
+
+required function internal to the network.  If a host sends  a  datagram
+
+to  a dead gateway on the Arpanet, the network will return a "host dead"
+
+message, which is precisely the information the host needs  to  know  in
+
+order  to  switch  to  another  gateway.   Some early implementations of
+
+Internet on  the  Arpanet  threw  these  messages  away.    That  is  an
+
+exceedingly poor idea.
+
+
+2.  CONTINUOUS POLLING
+
+
+     The  ICMP  protocol  provides an echo mechanism by which a host may
+
+solicit a response from a gateway.    A  host  could  simply  send  this
+
+message  at  a  reasonable  rate, to assure itself continuously that the
+
+gateway was still up.  This works, but, since the message must  be  sent
+
+fairly  often  to  detect  a fault in a reasonable time, it can imply an
+
+unbearable overhead on the host itself, the network,  and  the  gateway.
+
+This  strategy  is  prohibited  except  where  a  specific  analysis has
+
+indicated that the overhead is tolerable.
+
+
+3.  TRIGGERED POLLING
+
+
+     If the use of polling could be restricted to only those times  when
+
+something  seemed  to  be  wrong,  then  the overhead would be bearable.
+
+Provided that one can get the proper  advice  from  one's  higher  level
+
+protocols,  it  is  possible to implement such a strategy.  For example,
+
+one could program the TCP level so  that  whenever  it  retransmitted  a
+
+                                   7
+
+
+segment  more  than  once,  it  sent  a  hint down to the IP layer which
+
+triggered polling.  This strategy does not have excessive overhead,  but
+
+does  have  the problem that the host may be somewhat slow to respond to
+
+an error, since only after polling has started will the host be able  to
+
+confirm  that  something  has  gone wrong, and by then the TCP above may
+
+have already timed out.
+
+
+     Both forms of polling suffer from a minor flaw.  Hosts as  well  as
+
+gateways respond to ICMP echo messages.  Thus, polling cannot be used to
+
+detect  the  error  that  a  foreign  address thought to be a gateway is
+
+actually a host.  Such a confusion can arise if the  physical  addresses
+
+of machines are rearranged.
+
+
+4.  TRIGGERED RESELECTION
+
+
+     There  is a strategy which makes use of a hint from a higher level,
+
+as did the previous  strategy,  but  which  avoids  polling  altogether.
+
+Whenever  a  higher  level  complains  that  the  service  seems  to  be
+
+defective, the Internet layer can pick the next gateway from the list of
+
+available gateways, and switch to it.  Assuming that this gateway is up,
+
+no real harm can come of this decision, even if it was  wrong,  for  the
+
+worst that will happen is a redirect message which instructs the host to
+
+return to the gateway originally being used.  If, on the other hand, the
+
+original  gateway  was indeed down, then this immediately provides a new
+
+route, so the period of time until recovery is  shortened.    This  last
+
+strategy  seems  particularly clever, and is probably the most generally
+
+suitable for those cases where the network itself does not provide fault
+
+isolation.  (Regretably, I have forgotten who suggested this idea to me.
+
+It is not my invention.)
+
+                                   8
+
+
+     5.  Higher Level Fault Detection
+
+
+     The  previous  discussion  has  concentrated on fault detection and
+
+recovery at the IP layer.  This section considers what the higher layers
+
+such as TCP should do.
+
+
+     TCP has a single fault recovery action; it repeatedly retransmits a
+
+segment until either it gets an acknowledgement or its connection  timer
+
+expires.    As discussed above, it may use retransmission as an event to
+
+trigger a request for fault recovery to the IP  layer.    In  the  other
+
+direction,  information  may  flow  up from IP, reporting such things as
+
+ICMP  Destination  Unreachable  or  error  messages  from  the  attached
+
+network.    The  only  subtle  question about TCP and faults is what TCP
+
+should do when such an error message arrives  or  its  connection  timer
+
+expires.
+
+
+     The  TCP  specification discusses the timer.  In the description of
+
+the open call, the timeout is described as an optional  value  that  the
+
+client  of  TCP  may  specify; if any segment remains unacknowledged for
+
+this period, TCP should abort the  connection.    The  default  for  the
+
+timeout  is  30 seconds.  Early TCPs were often implemented with a fixed
+
+timeout interval, but this  did  not  work  well  in  practice,  as  the
+
+following discussion may suggest.
+
+
+     Clients  of  TCP can be divided into two classes:  those running on
+
+immediate behalf of a human, such as  Telnet,  and  those  supporting  a
+
+program, such as a mail sender.  Humans require a sophisticated response
+
+to  errors.    Depending  on  exactly  what went wrong, they may want to
+
+                                   9
+
+
+abandon the connection at once, or wait for a long time to see if things
+
+get  better.   Programs do not have this human impatience, but also lack
+
+the power to make complex decisions based on details of the exact  error
+
+condition.  For them, a simple timeout is reasonable.
+
+
+     Based  on these considerations, at least two modes of operation are
+
+needed in TCP.  One,  for  programs,  abandons  the  connection  without
+
+exception  if  the  TCP  timer  expires.    The other mode, suitable for
+
+people, never abandons the connection on its own initiative, but reports
+
+to the layer above when the timer expires.  Thus, the human user can see
+
+error messages coming from all the relevant layers, TCP  and  ICMP,  and
+
+can request TCP to abort as appropriate.  This second mode requires that
+
+TCP  be  able to send an asynchronous message up to its client to report
+
+the timeout, and it requires  that  error  messages  arriving  at  lower
+
+layers similarly flow up through TCP.
+
+
+     At  levels  above TCP, fault detection is also required.  Either of
+
+the following can happen.  First, the foreign client of  TCP  can  fail,
+
+even  though TCP is still running, so data is still acknowledged and the
+
+timer never expires.  Alternatively, the communication  path  can  fail,
+
+without the TCP timer going off, because the local client has no data to
+
+send.  Both of these have caused trouble.
+
+
+     Sending  mail  provides an example of the first case.  When sending
+
+mail using SMTP, there is an SMTP level acknowledgement that is returned
+
+when a piece of mail is successfully  delivered.    Several  early  mail
+
+receiving programs would crash just at the point where they had received
+
+all of the mail text (so TCP did not detect a timeout due to outstanding
+
+                                   10
+
+
+unacknowledged  data)  but  before the mail was acknowledged at the SMTP
+
+level.  This failure would cause early mail senders to wait forever  for
+
+the  SMTP level acknowledgement.  The obvious cure was to set a timer at
+
+the SMTP level, but the first attempt to do this did not work, for there
+
+was no simple way to  select  the  timer  interval.    If  the  interval
+
+selected  was  short,  it  expired  in normal operational when sending a
+
+large file to a slow host.  An interval of many minutes  was  needed  to
+
+prevent  false timeouts, but that meant that failures were detected only
+
+very slowly.  The current solution in  several  mailers  is  to  pick  a
+
+timeout interval proportional to the size of the message.
+
+
+     Server telnet provides an example of the other kind of failure.  It
+
+can  easily  happen that the communications link can fail while there is
+
+no traffic flowing, perhaps because the user is thinking.    Eventually,
+
+the  user will attempt to type something, at which time he will discover
+
+that the connection is dead and abort it.   But  the  host  end  of  the
+
+connection,  having  nothing  to send, will not discover anything wrong,
+
+and will remain waiting forever.  In some systems there is no way for  a
+
+user  in  a  different  process  to  destroy or take over such a hanging
+
+process, so there is no way to recover.
+
+
+     One solution to this would be to have the host server telnet  query
+
+the  user  end now and then, to see if it is still up.  (Telnet does not
+
+have an explicit query  feature,  but  the  host  could  negotiate  some
+
+unimportant   option,   which   should   produce   either  agreement  or
+
+disagreement in  return.)    The  only  problem  with  this  is  that  a
+
+reasonable  sample interval, if applied to every user on a large system,
+
+                                   11
+
+
+can  generate  an unacceptable amount of traffic and system overhead.  A
+
+smart server telnet would use  this  query  only  when  something  seems
+
+wrong, perhaps when there had been no user activity for some time.
+
+
+     In  both  these  cases, the general conclusion is that client level
+
+error detection is needed, and that the details  of  the  mechanism  are
+
+very dependent on the application.  Application programmers must be made
+
+aware  of  the  problem  of  failures,  and  must  understand that error
+
+detection at the TCP or lower level cannot solve the whole  problem  for
+
+them.
+
+
+     6.  Knowing When to Give Up
+
+
+     It  is  not  obvious,  when error messages such as ICMP Destination
+
+Unreachable arrive, whether TCP should  abandon  the  connection.    The
+
+reason  that  error  messages  are  difficult  to  interpret is that, as
+
+discussed above, after a failure of a gateway or  network,  there  is  a
+
+transient   period   during   which  the  gateways  may  have  incorrect
+
+information,  so  that  irrelevant  or  incorrect  error  messages   may
+
+sometimes  return.   An isolated ICMP Destination Unreachable may arrive
+
+at a host, for example, if a packet is sent during the period  when  the
+
+gateways  are  trying  to find a new route.  To abandon a TCP connection
+
+based on such a message arriving would be to ignore the valuable feature
+
+of the Internet that for many  internal  failures  it  reconstructs  its
+
+function without any disruption of the end points.
+
+
+     But  if failure messages do not imply a failure, what are they for?
+
+In fact, error messages serve several important  purposes.    First,  if
+
+                                   12
+
+
+they  arrive  in response to opening a new connection, they probably are
+
+caused by opening the connection improperly  (e.g.,  to  a  non-existent
+
+address)  rather  than  by  a  transient  network failure.  Second, they
+
+provide valuable information, after the TCP timeout has occurred, as  to
+
+the  probable  cause of the failure.  Finally, certain messages, such as
+
+ICMP Parameter Problem, imply a possible  implementation  problem.    In
+
+general, error messages give valuable information about what went wrong,
+
+but  are  not  to  be  taken as absolutely reliable.  A general alerting
+
+mechanism, such as the TCP timeout  discussed  above,  provides  a  good
+
+indication  that  whatever  is wrong is a serious condition, but without
+
+the advisory messages to augment the timer, there  is  no  way  for  the
+
+client  to  know  how  to  respond to the error.  The combination of the
+
+timer and the advice from the error messages provide a reasonable set of
+
+facts for the client layer to have.  It is important that error messages
+
+from all layers be passed up to  the  client  module  in  a  useful  and
+
+consistent way.
+
+
+-------
--- a/kernel/picotcp/RFC/rfc0817.txt
+++ b/kernel/picotcp/RFC/rfc0817.txt
--- a/kernel/picotcp/RFC/rfc0826.txt
+++ b/kernel/picotcp/RFC/rfc0826.txt
@ -0,0 +1,470 @@
+Network Working Group                                   David C. Plummer 
+Request For Comments:  826                                  (DCP@MIT-MC)
+                                                           November 1982
+
+
+	     An Ethernet Address Resolution Protocol
+                            -- or --
+              Converting Network Protocol Addresses
+                   to 48.bit Ethernet Address
+                       for Transmission on
+                        Ethernet Hardware
+
+
+
+
+
+			    Abstract
+
+The implementation of protocol P on a sending host S decides,
+through protocol P's routing mechanism, that it wants to transmit
+to a target host T located some place on a connected piece of
+10Mbit Ethernet cable.  To actually transmit the Ethernet packet
+a 48.bit Ethernet address must be generated.  The addresses of
+hosts within protocol P are not always compatible with the
+corresponding Ethernet address (being different lengths or
+values).  Presented here is a protocol that allows dynamic
+distribution of the information needed to build tables to
+translate an address A in protocol P's address space into a
+48.bit Ethernet address.
+
+Generalizations have been made which allow the protocol to be
+used for non-10Mbit Ethernet hardware.  Some packet radio
+networks are examples of such hardware.
+
+--------------------------------------------------------------------
+
+The protocol proposed here is the result of a great deal of
+discussion with several other people, most notably J. Noel
+Chiappa, Yogen Dalal, and James E. Kulp, and helpful comments
+from David Moon.
+
+
+
+
+[The purpose of this RFC is to present a method of Converting
+Protocol Addresses (e.g., IP addresses) to Local Network
+Addresses (e.g., Ethernet addresses).  This is a issue of general
+concern in the ARPA Internet community at this time.  The
+method proposed here is presented for your consideration and
+comment.  This is not the specification of a Internet Standard.]
+
+Notes:
+------       
+
+This protocol was originally designed for the DEC/Intel/Xerox
+10Mbit Ethernet.  It has been generalized to allow it to be used
+for other types of networks.  Much of the discussion will be
+directed toward the 10Mbit Ethernet.  Generalizations, where
+applicable, will follow the Ethernet-specific discussion.
+
+DOD Internet Protocol will be referred to as Internet.
+
+Numbers here are in the Ethernet standard, which is high byte
+first.  This is the opposite of the byte addressing of machines
+such as PDP-11s and VAXes.  Therefore, special care must be taken
+with the opcode field (ar$op) described below.
+
+An agreed upon authority is needed to manage hardware name space
+values (see below).  Until an official authority exists, requests
+should be submitted to
+	David C. Plummer
+	Symbolics, Inc.
+	243 Vassar Street
+	Cambridge, Massachusetts  02139
+Alternatively, network mail can be sent to DCP@MIT-MC.
+
+The Problem:
+------------
+
+The world is a jungle in general, and the networking game
+contributes many animals.  At nearly every layer of a network
+architecture there are several potential protocols that could be
+used.  For example, at a high level, there is TELNET and SUPDUP
+for remote login.  Somewhere below that there is a reliable byte
+stream protocol, which might be CHAOS protocol, DOD TCP, Xerox
+BSP or DECnet.  Even closer to the hardware is the logical
+transport layer, which might be CHAOS, DOD Internet, Xerox PUP,
+or DECnet.  The 10Mbit Ethernet allows all of these protocols
+(and more) to coexist on a single cable by means of a type field
+in the Ethernet packet header.  However, the 10Mbit Ethernet
+requires 48.bit addresses on the physical cable, yet most
+protocol addresses are not 48.bits long, nor do they necessarily
+have any relationship to the 48.bit Ethernet address of the
+hardware.  For example, CHAOS addresses are 16.bits, DOD Internet
+addresses are 32.bits, and Xerox PUP addresses are 8.bits.  A
+protocol is needed to dynamically distribute the correspondences
+between a <protocol, address> pair and a 48.bit Ethernet address.
+
+Motivation:
+-----------
+
+Use of the 10Mbit Ethernet is increasing as more manufacturers
+supply interfaces that conform to the specification published by
+DEC, Intel and Xerox.  With this increasing availability, more
+and more software is being written for these interfaces.  There
+are two alternatives: (1) Every implementor invents his/her own
+method to do some form of address resolution, or (2) every
+implementor uses a standard so that his/her code can be
+distributed to other systems without need for modification.  This
+proposal attempts to set the standard.
+
+Definitions:
+------------
+
+Define the following for referring to the values put in the TYPE
+field of the Ethernet packet header:
+	ether_type$XEROX_PUP,
+	ether_type$DOD_INTERNET,
+	ether_type$CHAOS, 
+and a new one:
+	ether_type$ADDRESS_RESOLUTION.  
+Also define the following values (to be discussed later):
+	ares_op$REQUEST (= 1, high byte transmitted first) and
+	ares_op$REPLY   (= 2), 
+and
+	ares_hrd$Ethernet (= 1).
+
+Packet format:
+--------------
+
+To communicate mappings from <protocol, address> pairs to 48.bit
+Ethernet addresses, a packet format that embodies the Address
+Resolution protocol is needed.  The format of the packet follows.
+
+    Ethernet transmission layer (not necessarily accessible to
+	 the user):
+	48.bit: Ethernet address of destination
+	48.bit: Ethernet address of sender
+	16.bit: Protocol type = ether_type$ADDRESS_RESOLUTION
+    Ethernet packet data:
+	16.bit: (ar$hrd) Hardware address space (e.g., Ethernet,
+			 Packet Radio Net.)
+	16.bit: (ar$pro) Protocol address space.  For Ethernet
+			 hardware, this is from the set of type
+			 fields ether_typ$<protocol>.
+	 8.bit: (ar$hln) byte length of each hardware address
+	 8.bit: (ar$pln) byte length of each protocol address
+	16.bit: (ar$op)  opcode (ares_op$REQUEST | ares_op$REPLY)
+	nbytes: (ar$sha) Hardware address of sender of this
+			 packet, n from the ar$hln field.
+	mbytes: (ar$spa) Protocol address of sender of this
+			 packet, m from the ar$pln field.
+	nbytes: (ar$tha) Hardware address of target of this
+			 packet (if known).
+	mbytes: (ar$tpa) Protocol address of target.
+
+
+Packet Generation:
+------------------
+
+As a packet is sent down through the network layers, routing
+determines the protocol address of the next hop for the packet
+and on which piece of hardware it expects to find the station
+with the immediate target protocol address.  In the case of the
+10Mbit Ethernet, address resolution is needed and some lower
+layer (probably the hardware driver) must consult the Address
+Resolution module (perhaps implemented in the Ethernet support
+module) to convert the <protocol type, target protocol address>
+pair to a 48.bit Ethernet address.  The Address Resolution module
+tries to find this pair in a table.  If it finds the pair, it
+gives the corresponding 48.bit Ethernet address back to the
+caller (hardware driver) which then transmits the packet.  If it
+does not, it probably informs the caller that it is throwing the
+packet away (on the assumption the packet will be retransmitted
+by a higher network layer), and generates an Ethernet packet with
+a type field of ether_type$ADDRESS_RESOLUTION.  The Address
+Resolution module then sets the ar$hrd field to
+ares_hrd$Ethernet, ar$pro to the protocol type that is being
+resolved, ar$hln to 6 (the number of bytes in a 48.bit Ethernet
+address), ar$pln to the length of an address in that protocol,
+ar$op to ares_op$REQUEST, ar$sha with the 48.bit ethernet address
+of itself, ar$spa with the protocol address of itself, and ar$tpa
+with the protocol address of the machine that is trying to be
+accessed.  It does not set ar$tha to anything in particular,
+because it is this value that it is trying to determine.  It
+could set ar$tha to the broadcast address for the hardware (all
+ones in the case of the 10Mbit Ethernet) if that makes it
+convenient for some aspect of the implementation.  It then causes
+this packet to be broadcast to all stations on the Ethernet cable
+originally determined by the routing mechanism.
+
+
+
+Packet Reception:
+-----------------
+
+When an address resolution packet is received, the receiving
+Ethernet module gives the packet to the Address Resolution module
+which goes through an algorithm similar to the following.
+Negative conditionals indicate an end of processing and a
+discarding of the packet.
+
+?Do I have the hardware type in ar$hrd?
+Yes: (almost definitely)
+  [optionally check the hardware length ar$hln]
+  ?Do I speak the protocol in ar$pro?
+  Yes:
+    [optionally check the protocol length ar$pln]
+    Merge_flag := false
+    If the pair <protocol type, sender protocol address> is
+        already in my translation table, update the sender
+	hardware address field of the entry with the new
+	information in the packet and set Merge_flag to true. 
+    ?Am I the target protocol address?
+    Yes:
+      If Merge_flag is false, add the triplet <protocol type,
+          sender protocol address, sender hardware address> to
+	  the translation table.
+      ?Is the opcode ares_op$REQUEST?  (NOW look at the opcode!!)
+      Yes:
+	Swap hardware and protocol fields, putting the local
+	    hardware and protocol addresses in the sender fields.
+	Set the ar$op field to ares_op$REPLY
+	Send the packet to the (new) target hardware address on
+	    the same hardware on which the request was received.
+
+Notice that the <protocol type, sender protocol address, sender
+hardware address> triplet is merged into the table before the
+opcode is looked at.  This is on the assumption that communcation
+is bidirectional; if A has some reason to talk to B, then B will
+probably have some reason to talk to A.  Notice also that if an
+entry already exists for the <protocol type, sender protocol
+address> pair, then the new hardware address supersedes the old
+one.  Related Issues gives some motivation for this.
+
+Generalization:  The ar$hrd and ar$hln fields allow this protocol
+and packet format to be used for non-10Mbit Ethernets.  For the
+10Mbit Ethernet <ar$hrd, ar$hln> takes on the value <1, 6>.  For
+other hardware networks, the ar$pro field may no longer
+correspond to the Ethernet type field, but it should be
+associated with the protocol whose address resolution is being
+sought.
+
+
+Why is it done this way??
+-------------------------
+
+Periodic broadcasting is definitely not desired.  Imagine 100
+workstations on a single Ethernet, each broadcasting address
+resolution information once per 10 minutes (as one possible set
+of parameters).  This is one packet every 6 seconds.  This is
+almost reasonable, but what use is it?  The workstations aren't
+generally going to be talking to each other (and therefore have
+100 useless entries in a table); they will be mainly talking to a
+mainframe, file server or bridge, but only to a small number of
+other workstations (for interactive conversations, for example).
+The protocol described in this paper distributes information as
+it is needed, and only once (probably) per boot of a machine.
+
+This format does not allow for more than one resolution to be
+done in the same packet.  This is for simplicity.  If things were
+multiplexed the packet format would be considerably harder to
+digest, and much of the information could be gratuitous.  Think
+of a bridge that talks four protocols telling a workstation all
+four protocol addresses, three of which the workstation will
+probably never use.
+
+This format allows the packet buffer to be reused if a reply is
+generated; a reply has the same length as a request, and several
+of the fields are the same.
+
+The value of the hardware field (ar$hrd) is taken from a list for
+this purpose.  Currently the only defined value is for the 10Mbit
+Ethernet (ares_hrd$Ethernet = 1).  There has been talk of using
+this protocol for Packet Radio Networks as well, and this will
+require another value as will other future hardware mediums that
+wish to use this protocol.
+
+For the 10Mbit Ethernet, the value in the protocol field (ar$pro)
+is taken from the set ether_type$.  This is a natural reuse of
+the assigned protocol types.  Combining this with the opcode
+(ar$op) would effectively halve the number of protocols that can
+be resolved under this protocol and would make a monitor/debugger
+more complex (see Network Monitoring and Debugging below).  It is
+hoped that we will never see 32768 protocols, but Murphy made
+some laws which don't allow us to make this assumption.
+
+In theory, the length fields (ar$hln and ar$pln) are redundant,
+since the length of a protocol address should be determined by
+the hardware type (found in ar$hrd) and the protocol type (found
+in ar$pro).  It is included for optional consistency checking,
+and for network monitoring and debugging (see below). 
+
+The opcode is to determine if this is a request (which may cause
+a reply) or a reply to a previous request.  16 bits for this is
+overkill, but a flag (field) is needed.
+
+The sender hardware address and sender protocol address are
+absolutely necessary.  It is these fields that get put in a
+translation table.
+
+The target protocol address is necessary in the request form of
+the packet so that a machine can determine whether or not to
+enter the sender information in a table or to send a reply.  It
+is not necessarily needed in the reply form if one assumes a
+reply is only provoked by a request.  It is included for
+completeness, network monitoring, and to simplify the suggested
+processing algorithm described above (which does not look at the
+opcode until AFTER putting the sender information in a table).
+
+The target hardware address is included for completeness and
+network monitoring.  It has no meaning in the request form, since
+it is this number that the machine is requesting.  Its meaning in
+the reply form is the address of the machine making the request.
+In some implementations (which do not get to look at the 14.byte
+ethernet header, for example) this may save some register
+shuffling or stack space by sending this field to the hardware
+driver as the hardware destination address of the packet.
+
+There are no padding bytes between addresses.  The packet data
+should be viewed as a byte stream in which only 3 byte pairs are
+defined to be words (ar$hrd, ar$pro and ar$op) which are sent
+most significant byte first (Ethernet/PDP-10 byte style).  
+
+
+Network monitoring and debugging:
+---------------------------------
+
+The above Address Resolution protocol allows a machine to gain
+knowledge about the higher level protocol activity (e.g., CHAOS,
+Internet, PUP, DECnet) on an Ethernet cable.  It can determine
+which Ethernet protocol type fields are in use (by value) and the
+protocol addresses within each protocol type.  In fact, it is not
+necessary for the monitor to speak any of the higher level
+protocols involved.  It goes something like this:
+
+When a monitor receives an Address Resolution packet, it always
+enters the <protocol type, sender protocol address, sender
+hardware address> in a table.  It can determine the length of the
+hardware and protocol address from the ar$hln and ar$pln fields
+of the packet.  If the opcode is a REPLY the monitor can then
+throw the packet away.  If the opcode is a REQUEST and the target
+protocol address matches the protocol address of the monitor, the
+monitor sends a REPLY as it normally would.  The monitor will
+only get one mapping this way, since the REPLY to the REQUEST
+will be sent directly to the requesting host.  The monitor could
+try sending its own REQUEST, but this could get two monitors into
+a REQUEST sending loop, and care must be taken.
+
+Because the protocol and opcode are not combined into one field,
+the monitor does not need to know which request opcode is
+associated with which reply opcode for the same higher level
+protocol.  The length fields should also give enough information
+to enable it to "parse" a protocol addresses, although it has no
+knowledge of what the protocol addresses mean.
+
+A working implementation of the Address Resolution protocol can
+also be used to debug a non-working implementation.  Presumably a
+hardware driver will successfully broadcast a packet with Ethernet
+type field of ether_type$ADDRESS_RESOLUTION.  The format of the
+packet may not be totally correct, because initial
+implementations may have bugs, and table management may be
+slightly tricky.  Because requests are broadcast a monitor will
+receive the packet and can display it for debugging if desired.
+
+
+An Example:
+-----------
+
+Let there exist machines X and Y that are on the same 10Mbit
+Ethernet cable.  They have Ethernet address EA(X) and EA(Y) and
+DOD Internet addresses IPA(X) and IPA(Y) .  Let the Ethernet type
+of Internet be ET(IP).  Machine X has just been started, and
+sooner or later wants to send an Internet packet to machine Y on
+the same cable.  X knows that it wants to send to IPA(Y) and
+tells the hardware driver (here an Ethernet driver) IPA(Y).  The
+driver consults the Address Resolution module to convert <ET(IP),
+IPA(Y)> into a 48.bit Ethernet address, but because X was just
+started, it does not have this information.  It throws the
+Internet packet away and instead creates an ADDRESS RESOLUTION
+packet with
+	(ar$hrd) = ares_hrd$Ethernet
+	(ar$pro) = ET(IP)
+	(ar$hln) = length(EA(X))
+	(ar$pln) = length(IPA(X))
+	(ar$op)  = ares_op$REQUEST
+	(ar$sha) = EA(X)
+	(ar$spa) = IPA(X)
+	(ar$tha) = don't care
+	(ar$tpa) = IPA(Y)
+and broadcasts this packet to everybody on the cable.
+
+Machine Y gets this packet, and determines that it understands
+the hardware type (Ethernet), that it speaks the indicated
+protocol (Internet) and that the packet is for it
+((ar$tpa)=IPA(Y)).  It enters (probably replacing any existing
+entry) the information that <ET(IP), IPA(X)> maps to EA(X).  It
+then notices that it is a request, so it swaps fields, putting
+EA(Y) in the new sender Ethernet address field (ar$sha), sets the
+opcode to reply, and sends the packet directly (not broadcast) to
+EA(X).  At this point Y knows how to send to X, but X still
+doesn't know how to send to Y.
+
+Machine X gets the reply packet from Y, forms the map from
+<ET(IP), IPA(Y)> to EA(Y), notices the packet is a reply and
+throws it away.  The next time X's Internet module tries to send
+a packet to Y on the Ethernet, the translation will succeed, and
+the packet will (hopefully) arrive.  If Y's Internet module then
+wants to talk to X, this will also succeed since Y has remembered
+the information from X's request for Address Resolution.
+
+Related issue:
+---------------
+
+It may be desirable to have table aging and/or timeouts.  The
+implementation of these is outside the scope of this protocol.
+Here is a more detailed description (thanks to MOON@SCRC@MIT-MC).
+
+If a host moves, any connections initiated by that host will
+work, assuming its own address resolution table is cleared when
+it moves.  However, connections initiated to it by other hosts
+will have no particular reason to know to discard their old
+address.  However, 48.bit Ethernet addresses are supposed to be
+unique and fixed for all time, so they shouldn't change.  A host
+could "move" if a host name (and address in some other protocol)
+were reassigned to a different physical piece of hardware.  Also,
+as we know from experience, there is always the danger of
+incorrect routing information accidentally getting transmitted
+through hardware or software error; it should not be allowed to
+persist forever.  Perhaps failure to initiate a connection should
+inform the Address Resolution module to delete the information on
+the basis that the host is not reachable, possibly because it is
+down or the old translation is no longer valid.  Or perhaps
+receiving of a packet from a host should reset a timeout in the
+address resolution entry used for transmitting packets to that
+host; if no packets are received from a host for a suitable
+length of time, the address resolution entry is forgotten.  This
+may cause extra overhead to scan the table for each incoming
+packet.  Perhaps a hash or index can make this faster.
+
+The suggested algorithm for receiving address resolution packets
+tries to lessen the time it takes for recovery if a host does
+move.  Recall that if the <protocol type, sender protocol
+address> is already in the translation table, then the sender
+hardware address supersedes the existing entry.  Therefore, on a
+perfect Ethernet where a broadcast REQUEST reaches all stations
+on the cable, each station will be get the new hardware address.
+
+Another alternative is to have a daemon perform the timeouts.
+After a suitable time, the daemon considers removing an entry.
+It first sends (with a small number of retransmissions if needed)
+an address resolution packet with opcode REQUEST directly to the
+Ethernet address in the table.  If a REPLY is not seen in a short
+amount of time, the entry is deleted.  The request is sent
+directly so as not to bother every station on the Ethernet.  Just
+forgetting entries will likely cause useful information to be
+forgotten, which must be regained.
+
+Since hosts don't transmit information about anyone other than
+themselves, rebooting a host will cause its address mapping table
+to be up to date.  Bad information can't persist forever by being
+passed around from machine to machine; the only bad information
+that can exist is in a machine that doesn't know that some other
+machine has changed its 48.bit Ethernet address.  Perhaps
+manually resetting (or clearing) the address mapping table will
+suffice.
+
+This issue clearly needs more thought if it is believed to be
+important.  It is caused by any address resolution-like protocol.
+
--- a/kernel/picotcp/RFC/rfc0872.txt
+++ b/kernel/picotcp/RFC/rfc0872.txt
@ -0,0 +1,549 @@
+
+
+     RFC 872                                            September 1982
+                                                                M82-48
+
+
+
+
+
+
+
+                               TCP-ON-A-LAN
+
+
+
+
+     
+     
+
+     
+
+
+
+
+
+
+
+
+
+
+
+
+
+                              M.A. PADLIPSKY
+                           THE MITRE CORPORATION
+                          Bedford, Massachusetts
+     
+
+
+
+
+                                 Abstract
+     
+
+     
+
+          The sometimes-held position that the DoD Standard
+     Transmission Control Protocol (TCP) and Internet Protocol (IP)
+     are inappropriate for use "on" a Local Area Network (LAN) is
+     shown to be fallacious.  The paper is a companion piece to
+     M82-47, M82-49, M82-50, and M82-51.
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+                                     i
+     
+     
+     
+     
+                              "TCP-ON-A-LAN"
+
+                              M. A. Padlipsky
+
+     Thesis
+
+          It is the thesis of this paper that fearing "TCP-on-a-LAN"
+     is a Woozle which needs slaying.  To slay the "TCP-on-a-LAN"
+     Woozle, we need to know three things:  What's a Woozle?  What's a
+     LAN?  What's a TCP?
+
+     Woozles
+
+          The first is rather straightforward [1]:
+
+               One fine winter's day when Piglet was brushing away the
+          snow in front of his house, he happened to look up, and
+          there was Winnie-the-Pooh.  Pooh was walking round and round
+          in a circle, thinking of something else, and when Piglet
+          called to him, he just went on walking.
+               "Hallo!" said Piglet, "what are you doing?"
+               "Hunting," said Pooh.
+               "Hunting what?"
+               "Tracking something," said Winnie-the-Pooh very
+          mysteriously.
+               "Tracking what?" said Piglet, coming closer.
+               "That's just what I ask myself.  I ask myself, What?"
+               "What do you think you'll answer?"
+               "I shall have to wait until I catch up with it," said
+          Winnie-the-Pooh.  "Now look there."  He pointed to the
+          ground in front of him.  "What do you see there?
+               "Tracks," said Piglet, "Paw-marks."  he gave a little
+          squeak of excitement.  "Oh, Pooh!  Do you think it's a--a--a
+          Woozle?"
+
+          Well, they convince each other that it is a Woozle, keep
+     "tracking," convince each other that it's a herd of Hostile
+     Animals, and get duly terrified before Christopher Robin comes
+     along and points out that they were following their own tracks
+     all the long.
+
+          In other words, it is our contention that expressed fears
+     about the consequences of using a particular protocol named "TCP"
+     in a particular environment called a Local Area Net stem from
+     misunderstandings of the protocol and the environment, not from
+     the technical facts of the situation.
+
+
+
+
+
+
+                                     1
+     RFC 872                                            September 1982
+
+
+     LAN's
+
+          The second thing we need to know is somewhat less
+     straightforward:  A LAN is, properly speaking [2], a
+     communications mechanism (or subnetwork) employing a transmission
+     technology suitable for relatively short distances (typically a
+     few kilometers) at relatively high bit-per-second rates
+     (typically greater than a few hundred kilobits per second) with
+     relatively low error rates, which exists primarily to enable
+     suitably attached computer systems (or "Hosts") to exchange bits,
+     and secondarily, though not necessarily, to allow terminals of
+     the teletypewriter and CRT classes to exchange bits with Hosts.
+     The Hosts are, at least in principle, heterogeneous; that is,
+     they are not merely multiple instances of the same operating
+     system.  The Hosts are assumed to communicate by means of layered
+     protocols in order to achieve what the ARPANET tradition calls
+     "resource sharing" and what the newer ISO tradition calls "Open
+     System Interconnection."  Addressing typically can be either
+     Host-Host (point-to-point) or "broadcast." (In some environments,
+     e.g., Ethernet, interesting advantage can be taken of broadcast
+     addressing; in other environments, e.g., LAN's which are
+     constituents of ARPA- or ISO-style "internets", broadcast
+     addressing is deemed too expensive to implement throughout the
+     internet as a whole and so may be ignored in the constituent LAN
+     even if available as part of the Host-LAN interface.)
+
+          Note that no assumptions are made about the particular
+     transmission medium or the particular topology in play.  LAN
+     media can be twisted-pair wires, CATV or other coaxial-type
+     cables, optical fibers, or whatever.  However, if the medium is a
+     processor-to-processor bus it is likely that the system in
+     question is going to turn out to "be" a moderately closely
+     coupled distributed processor or a somewhat loosely coupled
+     multiprocessor rather than a LAN, because the processors are
+     unlikely to be using either ARPANET or ISO-style layered
+     protocols.  (They'll usually -- either be homogeneous processors
+     interpreting only the protocol necessary to use the transmission
+     medium, or heterogeneous with one emulating the expectations of
+     the other.)  Systems like "PDSC" or "NMIC" (the evolutionarily
+     related, bus-oriented, multiple PDP-11 systems in use at the
+     Pacific Data Services Center and the National Military
+     Intelligence Center, respectively), then, aren't LANs.
+
+          LAN topologies can be either "bus," "ring," or "star".  That
+     is, a digital PBX can be a LAN, in the sense of furnishing a
+     transmission medium/communications subnetwork for Hosts to do
+     resource sharing/Open System Interconnection over, though it
+     might not present attractive speed or failure mode properties.
+     (It might, though.)  Topologically, it would probably be a
+     neutron star.
+
+
+
+                                     2
+     RFC 872                                            September 1982
+
+
+          For our purposes, the significant properties of a LAN are
+     the high bit transmission capacity and the good error properties.
+     Intuitively, a medium with these properties in some sense
+     "shouldn't require a heavy-duty protocol designed for long-haul
+     nets," according to some.  (We will not address the issue of
+     "wasted bandwidth" due to header sizes. [2], pp. 1509f, provides
+     ample refutation of that traditional communications notion.)
+     However, it must be borne in mind that for our purposes the
+     assumption of resource-sharing/OSI type protocols between/among
+     the attached Hosts is also extremely significant.  That is, if
+     all you're doing is letting some terminals access some different
+     Hosts, but the Hosts don't really have any intercomputer
+     networking protocols between them, what you have should be viewed
+     as a Localized Communications Network (LCN), not a LAN in the
+     sense we're talking about here.
+
+     TCP
+
+          The third thing we have to know can be either
+     straightforward or subtle, depending largely on how aware we are
+     of the context estabished by ARPANET-style prococols:  For the
+     visual-minded, Figure 1 and Figure 2 might be all that need be
+     "said."  Their moral is meant to be that in ARPANET-style
+     layering, layers aren't monoliths.  For those who need more
+     explanation, here goes:  TCP [3] (we'll take IP later) is a
+     Host-Host protocol (roughly equivalent to the functionality
+     implied by some of ISO Level 5 and all of ISO Level 4).  Its most
+     significant property is that it presents reliable logical
+     connections to protocols above itself.  (This point will be
+     returned to subsequently.)  Its next most significant property is
+     that it is designed to operate in a "catenet" (also known as the,
+     or an, "internet"); that is, its addressing discipline is such
+     that Hosts attached to communications subnets other than the one
+     a given Host is attached to (the "proximate net") can be
+     communicated with as well as Hosts on the proximate net.  Other
+     significant properties are those common to the breed:  Host-Host
+     protocols (and Transport protocols) "all" offer mechanisms for
+     flow Control, Out-of-Band Signals, Logical Connection management,
+     and the like.
+
+          Because TCP has a catenet-oriented addressing mechanism
+     (that is, it expresses foreign Host addresses as the
+     "two-dimensional" entity Foreign Net/Foreign Host because it
+     cannot assume that the Foreign Host is attached to the proximate
+     net), to be a full Host-Host protocol it needs an adjunct to deal
+     with the proximate net.  This adjunct, the Internet Protocol (IP)
+     was designed as a separate protocol from TCP, however, in order
+     to allow it to play the same role it plays for TCP for other
+     Host-Host protocols too.
+
+
+
+
+                                     3
+     RFC 872                                            September 1982
+
+
+          In order to "deal with the proximate net", IP possess the
+     following significant properties:  An IP implementation maps from
+     a virtualization (or common intermediate representation) of
+     generic proximate net qualities (such as precedence, grade of
+     service, security labeling) to the closest equivalent on the
+     proximate net. It determines whether the "Internet Address" of a
+     given transmission is on the proximate net or not; if so, it
+     sends it; if not, it sends it to a "Gateway" (where another IP
+     module resides).  That is, IP handles internet routing, whereas
+     TCP (or some other Host-Host  protocol) handles only internet
+     addressing.  Because some proximate nets will accept smaller
+     transmissions ("packets") than others, IP, qua protocol, also has
+     a discipline for allowing packets to be fragmented while in the
+     catenet and reassembled at their destination.  Finally (for our
+     purposes), IP offers a mechanism to allow the particular protocol
+     it was called by (for a given packet) to be identified so that
+     the receiver can demultiplex transmissions based on IP-level
+     information only. (This is in accordance with the Principle of
+     Layering:  you don't want to have to look at the data IP is
+     conveying to find out what to do with it.)
+
+          Now that all seems rather complex, even though it omits a
+     number of mechanisms.  (For a more complete discussion, see
+     Reference [4].)  But it should be just about enough to slay the
+     Woozle, especially if just one more protocol's most significant
+     property can be snuck in.  An underpublicized member of the
+     ARPANET suite of protocols is called UDP--the "User Datagram
+     Protocol."  UDP is designed for speed rather than accuracy.  That
+     is, it's not "reliable."  All there is to UDP, basically, is a
+     mechanism to allow a given packet to be associated with a given
+     logical connection. Not a TCP logical connection, mind you, but a
+     UDP logical connection.  So if all you want is the ability to
+     demultiplex data streams from your Host-Host protocol, you use
+     UDP, not TCP.  ("You" is usually supposed to be a Packetized
+     Speech protocol, but doesn't have to be.)  (And we'll worry about
+     Flow Control some other time.)
+
+     TCP-on-a-LAN
+
+          So whether you're a Host proximate to a LAN or not, and even
+     whether your TCP/IP is "inboard" or "outboard" of you, if you're
+     talking to a Host somewhere out there on the catenet, you use IP;
+     and if you're exercising some process-level/applications protocol
+     (roughly equivalent to some of some versions of ISO L5 and all of
+     L6 and L7) that expects TCP/IP as its Host-Host protocol (because
+     it "wants" reliable, flow controlled, ordered delivery [whoops,
+     forgot that "ordered" property earlier--but it doesn't matter all
+     that much for present purposes] over logical connections which
+     allow it to be
+
+
+
+
+                                     4
+     RFC 872                                            September 1982
+
+
+     addressed via a Well-Known Socket), you use TCP "above" IP
+     regardless of whether the other Host is on your proximate net or
+     not.  But if your application doesn't require the properties of
+     TCP (say for Packetized Speech), don't use it--regardless of
+     where or what you are.  And if you want to make the decision
+     about whether you're talking to a proximate Host explicitly and
+     not even go through IP, you can even arrange to do that (though
+     it might make for messy implementation under some circumstances).
+     That is, if you want to take advantage of the properties of your
+     LAN "in the raw" and have or don't need appropriate applications
+     protocols, the Reference Model to which TCP/IP were designed
+     won't stop you.  See Figure 2 if you're visual.  A word of
+     caution, though:  those applications probably will need protocols
+     of some sort--and they'll probably need some sort of Host-Host
+     protocol under them, so unless you relish maintaining "parallel"
+     suites of protocols....  that is, you really would be better off
+     with TCP most of the time locally anyway, because you've got to
+     have it to talk to the catenet and it's a nuisance to have
+     "something else" to talk over the LAN--when, of course, what
+     you're talking requires a Host-Host protocol.
+
+          We'll touch on "performance" issues in a bit more detail
+     later. At this level, though, one point really does need to be
+     made:  On the "reliability" front, many (including the author) at
+     first blush take the TCP checksum to be "overkill" for use on a
+     LAN, which does, after all, typically present extremely good
+     error properties. Interestingly enough, however, metering of TCP
+     implementations on several Host types in the research community
+     shows that the processing time expended on the TCP checksum is
+     only around 12% of the per-transmission processing time anyway.
+     So, again, it's not clear that it's worthwhile to bother with an
+     alternate Host-Host protocol for local use (if, that is, you need
+     the rest of the properties of TCP other than "reliability"--and,
+     of course, always assuming you've got a LAN, not an LCN, as
+     distinguished earlier.)
+
+          Take that, Woozle!
+
+     Other Significant Properties
+
+          Oh, by the way, one or two other properties of TCP/IP really
+     do bear mention:
+
+          1.   Protocol interpreters for TCP/IP exist for a dozen or
+               two different operating systems.
+
+          2.   TCP/IP work, and have been working (though in less
+               refined versions) for several years.
+
+
+
+
+
+                                     5
+     RFC 872                                            September 1982
+
+
+          3.   IP levies no constraints on the interface protocol
+               presented by the proximate net (though some protocols
+               at that level are more wasteful than others).
+
+          4.   IP levies no constraints on its users; in particular,
+               any proximate net that offers alternate routing can be
+               taken advantage of (unlike X.25, which appears to
+               preclude alternate routing).
+
+          5.   IP-bearing Gateways both exist and present and exploit
+               properties 3 and 4.
+
+          6.   TCP/IP are Department of Defense Standards.
+
+          7.   Process (or application) protocols compatible with
+               TCP/IP for Virtual Terminal and File Transfer
+               (including "electronic mail") exist and have been
+               implemented on numerous operating systems.
+
+          8.   "Vendor-style" specifications of TCP/IP are being
+               prepared under the aegis of the DoD Protocol Standards
+               Technical Panel, for those who find the
+               research-community-provided specs not to their liking.
+
+          9.   The research community has recently reported speeds in
+               excess of 300 kb/s on an 800 kb/s subnet, 1.2 Mb/s on a
+               3 Mb/s subnet, and 9.2 kbs on a 9.6 kb/s phone
+               line--all using TCP.  (We don't know of any numbers for
+               alternative protocol suites, but it's unlikely they'd
+               be appreciably better if they confer like
+               functionality--and they may well be worse if they
+               represent implementations which haven't been around
+               enough to have been iterated a time or three.)
+
+          With the partial exception of property 8, no other
+     resource-sharing protocol suite can make those claims.
+
+          Note particularly well that none of the above should be
+     construed as eliminating the need for extremely careful
+     measurement of TCP/IP performance in/on a LAN.  (You do, after
+     all, want to know their limitations, to guide you in when to
+     bother ringing in "local" alternatives--but be very careful:  1.
+     they're hard to measure commensurately with alternative
+     protocols; and 2.  most conventional Hosts can't take [or give]
+     as many bits per second as you might imagine.)  It merely
+     dramatically refocuses the motivation for doing such measurement.
+     (And levies a constraint or two on how you outboard, if you're
+     outboarding.)
+
+
+
+
+
+                                     6
+     RFC 872                                            September 1982
+
+
+     Other Contextual Data
+
+          Our case could really rest here, but some amplification of
+     the aside above about Host capacities is warranted, if only to
+     suggest that some quantification is available to supplement the a
+     priori argument:  Consider the previously mentioned PDSC.  Its
+     local terminals operate in a screen-at-a-time mode, each
+     screen-load comprising some 16 kb.  How many screens can one of
+     its Hosts handle in a given second?  Well, we're told that each
+     disk fetch requires 17 ms average latency, and each context
+     switch costs around 2 ms, so allowing 1 ms for transmission of
+     the data from the disk and to the "net" (it makes the arithmetic
+     easy), that would add up to 20 ms "processing" time per screen,
+     even if no processing were done to the disk image.  Thus, even if
+     the Host were doing nothing else, and  even if the native disk
+     I/O software were optimized to do 16 kb reads, it could only
+     present 50 screens to its communications mechanism
+     (processor-processor bus) per second.  That's 800 kb/s. And
+     that's well within the range of TCP-achievable rates (cf.  Other
+     Significant Property 9).  So in a realistic sample environment,
+     it would certainly seem that typical Hosts can't necessarily
+     present so many bits as to overtax the protocols anyway.  (The
+     analysis of how many bits typical Hosts can accept is more
+     difficult because it depends more heavily on system internals.
+     However, the point is nearly moot in that even in the intuitively
+     unlikely event that receiving were appreciably faster in
+     principle [unlikely because of typical operating system
+     constraints on address space sizes, the need to do input to a
+     single address space, and the need to share buffers in the
+     address space among several processes], you can't accept more
+     than you can be given.)
+
+     Conclusion
+
+          The sometimes-expressed fear that using TCP on a local net
+     is a bad idea is unfounded.
+
+     References
+
+     [1]  Milne, A. A., "Winnie-the-Pooh", various publishers.
+
+     [2]  The LAN description is based on Clark, D. D.  et al., "An
+          Introduction to Local Area Networks,"  IEEE Proc., V. 66, N.
+          11, November 1978, pp. 1497-1517, several year's worth of
+          conversations with Dr. Clark, and the author's observations
+          of both the open literature and the Oral Tradition (which
+          were sufficiently well-thought of to have prompted The MITRE
+          Corporation/NBS/NSA Local Nets "Brain Picking Panel" to have
+
+
+
+
+
+                                     7
+     RFC 872                                            September 1982
+
+
+          solicited his testimony during the year he was in FACC's
+          employ.*)
+
+     [3]  The TCP/IP descriptions are based on Postel, J. B.,
+          "Internet Protocol Specification," and "Transmission Control
+          Specification" in DARPA Internet Program Protocol
+          Specifications, USC Information Sciences Institute,
+          September, 1981, and on more than 10 years' worth of
+          conversations with Dr. Postel, Dr. Clark (now the DARPA
+          "Internet Architect") and Dr. Vinton G. Cerf (co-originator
+          of TCP), and on numerous discussions with several other
+          members of the TCP/IP design team, on having edited the
+          referenced documents for the PSTP, and, for that matter, on
+          having been one of the developers of the ARPANET "Reference
+          Model."
+
+     [4]  Padlipsky, M. A., "A Perspective on the ARPANET Reference
+          Model", M82-47, The MITRE Corporation, September 1982; also
+          available in Proc. INFOCOM '83.
+
+     ________________
+     *  In all honesty, as far as I know I started the rumor that TCP
+        might be overkill for a LAN at that meeting.  At the next TCP
+        design meeting, however, they separated IP out from TCP, and
+        everything's been alright for about three years now--except
+        for getting the rumor killed.  (I'd worry about Woozles
+        turning into roosting chickens if it weren't for the facts
+        that:  1.  People tend to ignore their local guru; 2.  I was
+        trying to encourage the IP separation; and 3.  All I ever
+        wanted was some empirical data.)
+
+     NOTE:  FIGURE 1. ARM in the Abstract, and FIGURE 2.  ARMS,
+        Somewhat Particularized, may be obtained by writing to:  Mike
+        Padlipsky, MITRE Corporation, P.O. Box 208, Bedford,
+        Massachusetts, 01730, or sending computer mail to
+        Padlipsky@USC-ISIA.
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+                                     8
--- a/kernel/picotcp/RFC/rfc0879.txt
+++ b/kernel/picotcp/RFC/rfc0879.txt
@ -0,0 +1,638 @@
+
+
+Network Working Group                                          J. Postel
+Request for Comments: 879                                            ISI
+                                                           November 1983
+
+
+
+                      The TCP Maximum Segment Size
+                           and Related Topics
+
+This memo discusses the TCP Maximum Segment Size Option and related
+topics.  The purposes is to clarify some aspects of TCP and its
+interaction with IP.  This memo is a clarification to the TCP
+specification, and contains information that may be considered as
+"advice to implementers".
+
+1.  Introduction
+
+   This memo discusses the TCP Maximum Segment Size and its relation to
+   the IP Maximum Datagram Size.  TCP is specified in reference [1].  IP
+   is specified in references [2,3].
+
+   This discussion is necessary because the current specification of
+   this TCP option is ambiguous.
+
+   Much of the difficulty with understanding these sizes and their
+   relationship has been due to the variable size of the IP and TCP
+   headers.
+
+   There have been some assumptions made about using other than the
+   default size for datagrams with some unfortunate results.
+
+      HOSTS MUST NOT SEND DATAGRAMS LARGER THAN 576 OCTETS UNLESS THEY
+      HAVE SPECIFIC KNOWLEDGE THAT THE DESTINATION HOST IS PREPARED TO
+      ACCEPT LARGER DATAGRAMS.
+
+         This is a long established rule.
+
+   To resolve the ambiguity in the TCP Maximum Segment Size option
+   definition the following rule is established:
+
+      THE TCP MAXIMUM SEGMENT SIZE IS THE IP MAXIMUM DATAGRAM SIZE MINUS
+      FORTY.
+
+         The default IP Maximum Datagram Size is 576.
+         The default TCP Maximum Segment Size is 536.
+
+
+
+
+
+
+
+
+
+Postel                                                          [Page 1]
+
+
+
+RFC 879                                                    November 1983
+TCP Maximum Segment Size                                                
+
+
+2.  The IP Maximum Datagram Size
+
+   Hosts are not required to reassemble infinitely large IP datagrams.
+   The maximum size datagram that all hosts are required to accept or
+   reassemble from fragments is 576 octets.  The maximum size reassembly
+   buffer every host must have is 576 octets.  Hosts are allowed to
+   accept larger datagrams and assemble fragments into larger datagrams,
+   hosts may have buffers as large as they please.
+
+   Hosts must not send datagrams larger than 576 octets unless they have
+   specific knowledge that the destination host is prepared to accept
+   larger datagrams.
+
+3.  The TCP Maximum Segment Size Option
+
+   TCP provides an option that may be used at the time a connection is
+   established (only) to indicate the maximum size TCP segment that can
+   be accepted on that connection.  This Maximum Segment Size (MSS)
+   announcement (often mistakenly called a negotiation) is sent from the
+   data receiver to the data sender and says "I can accept TCP segments
+   up to size X". The size (X) may be larger or smaller than the
+   default.  The MSS can be used completely independently in each
+   direction of data flow.  The result may be quite different maximum
+   sizes in the two directions.
+
+   The MSS counts only data octets in the segment, it does not count the
+   TCP header or the IP header.
+
+   A footnote:  The MSS value counts only data octets, thus it does not
+   count the TCP SYN and FIN control bits even though SYN and FIN do
+   consume TCP sequence numbers.
+
+4.  The Relationship of TCP Segments and IP Datagrams
+
+   TCP segment are transmitted as the data in IP datagrams.  The
+   correspondence between TCP segments and IP datagrams must be one to
+   one.  This is because TCP expects to find exactly one complete TCP
+   segment in each block of data turned over to it by IP, and IP must
+   turn over a block of data for each datagram received (or completely
+   reassembled).
+
+
+
+
+
+
+
+
+
+
+Postel                                                          [Page 2]
+
+
+
+RFC 879                                                    November 1983
+TCP Maximum Segment Size                                                
+
+
+5.  Layering and Modularity
+
+   TCP is an end to end reliable data stream protocol with error
+   control, flow control, etc.  TCP remembers many things about the
+   state of a connection.
+
+   IP is a one shot datagram protocol.  IP has no memory of the
+   datagrams transmitted.  It is not appropriate for IP to keep any
+   information about the maximum datagram size a particular destination
+   host might be capable of accepting.
+
+   TCP and IP are distinct layers in the protocol architecture, and are
+   often implemented in distinct program modules.
+
+   Some people seem to think that there must be no communication between
+   protocol layers or program modules.  There must be communication
+   between layers and modules, but it should be carefully specified and
+   controlled.  One problem in understanding the correct view of
+   communication between protocol layers or program modules in general,
+   or between TCP and IP in particular is that the documents on
+   protocols are not very clear about it.  This is often because the
+   documents are about the protocol exchanges between machines, not the
+   program architecture within a machine, and the desire to allow many
+   program architectures with different organization of tasks into
+   modules.
+
+6.  IP Information Requirements
+
+   There is no general requirement that IP keep information on a per
+   host basis.
+
+   IP must make a decision about which directly attached network address
+   to send each datagram to.  This is simply mapping an IP address into
+   a directly attached network address.
+
+   There are two cases to consider:  the destination is on the same
+   network, and the destination is on a different network.
+
+      Same Network
+
+         For some networks the the directly attached network address can
+         be computed from the IP address for destination hosts on the
+         directly attached network.
+
+         For other networks the mapping must be done by table look up
+         (however the table is initialized and maintained, for
+         example, [4]).
+
+
+
+Postel                                                          [Page 3]
+
+
+
+RFC 879                                                    November 1983
+TCP Maximum Segment Size                                                
+
+
+      Different Network
+
+         The IP address must be mapped to the directly attached network
+         address of a gateway.  For networks with one gateway to the
+         rest of the Internet the host need only determine and remember
+         the gateway address and use it for sending all datagrams to
+         other networks.
+
+         For networks with multiple gateways to the rest of the
+         Internet, the host must decide which gateway to use for each
+         datagram sent.  It need only check the destination network of
+         the IP address and keep information on which gateway to use for
+         each network.
+
+   The IP does, in some cases, keep per host routing information for
+   other hosts on the directly attached network.  The IP does, in some
+   cases, keep per network routing information.
+
+   A Special Case
+
+      There are two ICMP messages that convey information about
+      particular hosts.  These are subtypes of the Destination
+      Unreachable and the Redirect ICMP messages.  These messages are
+      expected only in very unusual circumstances.  To make effective
+      use of these messages the receiving host would have to keep
+      information about the specific hosts reported on.  Because these
+      messages are quite rare it is strongly recommended that this be
+      done through an exception mechanism rather than having the IP keep
+      per host tables for all hosts.
+
+7.  The Relationship between IP Datagram and TCP Segment Sizes
+
+   The relationship between the value of the maximum IP datagram size
+   and the maximum TCP segment size is obscure.  The problem is that
+   both the IP header and the TCP header may vary in length.  The TCP
+   Maximum Segment Size option (MSS) is defined to specify the maximum
+   number of data octets in a TCP segment exclusive of TCP (or IP)
+   header.
+
+   To notify the data sender of the largest TCP segment it is possible
+   to receive the calculation of the MSS value to send is:
+
+      MSS = MTU - sizeof(TCPHDR) - sizeof(IPHDR)
+
+   On receipt of the MSS option the calculation of the size of segment
+   that can be sent is:
+
+      SndMaxSegSiz = MIN((MTU - sizeof(TCPHDR) - sizeof(IPHDR)), MSS)
+
+
+Postel                                                          [Page 4]
+
+
+
+RFC 879                                                    November 1983
+TCP Maximum Segment Size                                                
+
+
+   where MSS is the value in the option, and MTU is the Maximum
+   Transmission Unit (or the maximum packet size) allowed on the
+   directly attached network.
+
+   This begs the question, though.  What value should be used for the
+   "sizeof(TCPHDR)" and for the "sizeof(IPHDR)"?
+
+   There are three reasonable positions to take: the conservative, the
+   moderate, and the liberal.
+
+   The conservative or pessimistic position assumes the worst -- that
+   both the IP header and the TCP header are maximum size, that is, 60
+   octets each.
+
+      MSS = MTU - 60 - 60 = MTU - 120
+
+      If MTU is 576 then MSS = 456
+
+   The moderate position assumes the that the IP is maximum size (60
+   octets) and the TCP header is minimum size (20 octets), because there
+   are no TCP header options currently defined that would normally be
+   sent at the same time as data segments.
+
+      MSS = MTU - 60 - 20 = MTU - 80
+
+      If MTU is 576 then MSS = 496
+
+   The liberal or optimistic position assumes the best -- that both the
+   IP header and the TCP header are minimum size, that is, 20 octets
+   each.
+
+      MSS = MTU - 20 - 20 = MTU - 40
+
+      If MTU is 576 then MSS = 536
+
+      If nothing is said about MSS, the data sender may cram as much as
+      possible into a 576 octet datagram, and if the datagram has
+      minimum headers (which is most likely), the result will be 536
+      data octets in the TCP segment.  The rule relating MSS to the
+      maximum datagram size ought to be consistent with this.
+
+   A practical point is raised in favor of the liberal position too.
+   Since the use of minimum IP and TCP headers is very likely in the
+   very large percentage of cases, it seems wasteful to limit the TCP
+   segment data to so much less than could be transmitted at once,
+   especially since it is less that 512 octets.
+
+
+
+
+Postel                                                          [Page 5]
+
+
+
+RFC 879                                                    November 1983
+TCP Maximum Segment Size                                                
+
+
+      For comparison:  536/576 is 93% data, 496/576 is 86% data, 456/576
+      is 79% data.
+
+8.  Maximum Packet Size
+
+   Each network has some maximum packet size, or maximum transmission
+   unit (MTU).  Ultimately there is some limit imposed by the
+   technology, but often the limit is an engineering choice or even an
+   administrative choice.  Different installations of the same network
+   product do not have to use the same maximum packet size.  Even within
+   one installation not all host must use the same packet size (this way
+   lies madness, though).
+
+   Some IP implementers have assumed that all hosts on the directly
+   attached network will be the same or at least run the same
+   implementation.  This is a dangerous assumption.  It has often
+   developed that after a small homogeneous set of host have become
+   operational additional hosts of different types are introduced into
+   the environment.  And it has often developed that it is desired to
+   use a copy of the implementation in a different inhomogeneous
+   environment.
+
+   Designers of gateways should be prepared for the fact that successful
+   gateways will be copied and used in other situation and
+   installations.  Gateways must be prepared to accept datagrams as
+   large as can be sent in the maximum packets of the directly attached
+   networks.  Gateway implementations should be easily configured for
+   installation in different circumstances.
+
+   A footnote:  The MTUs of some popular networks (note that the actual
+   limit in some installations may be set lower by administrative
+   policy):
+
+      ARPANET, MILNET = 1007
+      Ethernet (10Mb) = 1500
+      Proteon PRONET  = 2046
+
+9.  Source Fragmentation
+
+   A source host would not normally create datagram fragments.  Under
+   normal circumstances datagram fragments only arise when a gateway
+   must send a datagram into a network with a smaller maximum packet
+   size than the datagram.  In this case the gateway must fragment the
+   datagram (unless it is marked "don't fragment" in which case it is
+   discarded, with the option of sending an ICMP message to the source
+   reporting the problem).
+
+   It might be desirable for the source host to send datagram fragments
+
+
+Postel                                                          [Page 6]
+
+
+
+RFC 879                                                    November 1983
+TCP Maximum Segment Size                                                
+
+
+   if the maximum segment size (default or negotiated) allowed by the
+   data receiver were larger than the maximum packet size allowed by the
+   directly attached network.  However, such datagram fragments must not
+   combine to a size larger than allowed by the destination host.
+
+      For example, if the receiving TCP announced that it would accept
+      segments up to 5000 octets (in cooperation with the receiving IP)
+      then the sending TCP could give such a large segment to the
+      sending IP provided the sending IP would send it in datagram
+      fragments that fit in the packets of the directly attached
+      network.
+
+   There are some conditions where source host fragmentation would be
+   necessary.
+
+      If the host is attached to a network with a small packet size (for
+      example 256 octets), and it supports an application defined to
+      send fixed sized messages larger than that packet size (for
+      example TFTP [5]).
+
+      If the host receives ICMP Echo messages with data it is required
+      to send an ICMP Echo-Reply message with the same data.  If the
+      amount of data in the Echo were larger than the packet size of the
+      directly attached network the following steps might be required:
+      (1) receive the fragments, (2) reassemble the datagram, (3)
+      interpret the Echo, (4) create an Echo-Reply, (5) fragment it, and
+      (6) send the fragments.
+
+10. Gateway Fragmentation
+
+   Gateways must be prepared to do fragmentation.  It is not an optional
+   feature for a gateway.
+
+   Gateways have no information about the size of datagrams destination
+   hosts are prepared to accept.  It would be inappropriate for gateways
+   to attempt to keep such information.
+
+   Gateways must be prepared to accept the largest datagrams that are
+   allowed on each of the directly attached networks, even if it is
+   larger than 576 octets.
+
+   Gateways must be prepared to fragment datagrams to fit into the
+   packets of the next network, even if it smaller than 576 octets.
+
+   If a source host thought to take advantage of the local network's
+   ability to carry larger datagrams but doesn't have the slightest idea
+   if the destination host can accept larger than default datagrams and
+   expects the gateway to fragment the datagram into default size
+
+
+Postel                                                          [Page 7]
+
+
+
+RFC 879                                                    November 1983
+TCP Maximum Segment Size                                                
+
+
+   fragments, then the source host is misguided.  If indeed, the
+   destination host can't accept larger than default datagrams, it
+   probably can't reassemble them either. If the gateway either passes
+   on the large datagram whole or fragments into default size fragments
+   the destination will not accept it.  Thus, this mode of behavior by
+   source hosts must be outlawed.
+
+   A larger than default datagram can only arrive at a gateway because
+   the source host knows that the destination host can handle such large
+   datagrams (probably because the destination host announced it to the
+   source host in an TCP MSS option).  Thus, the gateway should pass on
+   this large datagram in one piece or in the largest fragments that fit
+   into the next network.
+
+   An interesting footnote is that even though the gateways may know
+   about know the 576 rule, it is irrelevant to them.
+
+11. Inter-Layer Communication
+
+   The Network Driver (ND) or interface should know the Maximum
+   Transmission Unit (MTU) of the directly attached network.
+
+   The IP should ask the Network Driver for the Maximum Transmission
+   Unit.
+
+   The TCP should ask the IP for the Maximum Datagram Data Size (MDDS).
+   This is the MTU minus the IP header length (MDDS = MTU - IPHdrLen).
+
+   When opening a connection TCP can send an MSS option with the value
+   equal MDDS - TCPHdrLen.
+
+   TCP should determine the Maximum Segment Data Size (MSDS) from either
+   the default or the received value of the MSS option.
+
+   TCP should determine if source fragmentation is possible (by asking
+   the IP) and desirable.
+
+      If so TCP may hand to IP segments (including the TCP header) up to
+      MSDS + TCPHdrLen.
+
+      If not TCP may hand to IP segments (including the TCP header) up
+      to the lesser of (MSDS + TCPHdrLen) and MDDS.
+
+   IP checks the length of data passed to it by TCP.  If the length is
+   less than or equal MDDS, IP attached the IP header and hands it to
+   the ND.  Otherwise the IP must do source fragmentation.
+
+
+
+
+Postel                                                          [Page 8]
+
+
+
+RFC 879                                                    November 1983
+TCP Maximum Segment Size                                                
+
+
+12. What is the Default MSS ?
+
+   Another way of asking this question is "What transmitted value for
+   MSS has exactly the same effect of not transmitting the option at
+   all?".
+
+   In terms of the previous section:
+
+      The default assumption is that the Maximum Transmission Unit is
+      576 octets.
+
+         MTU = 576
+
+      The Maximum Datagram Data Size (MDDS) is the MTU minus the IP
+      header length.
+
+         MDDS = MTU - IPHdrLen = 576 - 20 = 556
+
+      When opening a connection TCP can send an MSS option with the
+      value equal MDDS - TCPHdrLen.
+
+         MSS = MDDS - TCPHdrLen = 556 - 20 = 536
+
+      TCP should determine the Maximum Segment Data Size (MSDS) from
+      either the default or the received value of the MSS option.
+
+         Default MSS = 536, then MSDS = 536
+
+      TCP should determine if source fragmentation is possible and
+      desirable.
+
+         If so TCP may hand to IP segments (including the TCP header) up
+         to MSDS + TCPHdrLen (536 + 20 = 556).
+
+         If not TCP may hand to IP segments (including the TCP header)
+         up to the lesser of (MSDS + TCPHdrLen (536 + 20 = 556)) and
+         MDDS (556).
+
+
+
+
+
+
+
+
+
+
+
+
+
+Postel                                                          [Page 9]
+
+
+
+RFC 879                                                    November 1983
+TCP Maximum Segment Size                                                
+
+
+13. The Truth
+
+   The rule relating the maximum IP datagram size and the maximum TCP
+   segment size is:
+
+      TCP Maximum Segment Size = IP Maximum Datagram Size - 40
+
+   The rule must match the default case.
+
+      If the TCP Maximum Segment Size option is not transmitted then the
+      data sender is allowed to send IP datagrams of maximum size (576)
+      with a minimum IP header (20) and a minimum TCP header (20) and
+      thereby be able to stuff 536 octets of data into each TCP segment.
+
+   The definition of the MSS option can be stated:
+
+      The maximum number of data octets that may be received by the
+      sender of this TCP option in TCP segments with no TCP header
+      options transmitted in IP datagrams with no IP header options.
+
+14. The Consequences
+
+   When TCP is used in a situation when either the IP or TCP headers are
+   not minimum and yet the maximum IP datagram that can be received
+   remains 576 octets then the TCP Maximum Segment Size option must be
+   used to reduce the limit on data octets allowed in a TCP segment.
+
+      For example, if the IP Security option (11 octets) were in use and
+      the IP maximum datagram size remained at 576 octets, then the TCP
+      should send the MSS with a value of 525 (536-11).
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Postel                                                         [Page 10]
+
+
+
+RFC 879                                                    November 1983
+TCP Maximum Segment Size                                                
+
+
+15. References
+
+   [1]  Postel, J., ed., "Transmission Control Protocol - DARPA Internet
+        Program Protocol Specification", RFC 793, USC/Information
+        Sciences Institute, September 1981.
+
+   [2]  Postel, J., ed., "Internet Protocol - DARPA Internet Program
+        Protocol Specification", RFC 791, USC/Information Sciences
+        Institute, September 1981.
+
+   [3]  Postel, J., "Internet Control Message Protocol - DARPA Internet
+        Program Protocol Specification", RFC 792, USC/Information
+        Sciences Institute, September 1981.
+
+   [4]  Plummer, D., "An Ethernet Address Resolution Protocol or
+        Converting Network Protocol Addresses to 48-bit Ethernet
+        Addresses for Transmission on Ethernet Hardware", RFC 826,
+        MIT/LCS, November 1982.
+
+   [5]  Sollins, K., "The TFTP Protocol (Revision 2)", RFC 783, MIT/LCS,
+        June 1981.
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Postel                                                         [Page 11]
+
--- a/kernel/picotcp/RFC/rfc0896.txt
+++ b/kernel/picotcp/RFC/rfc0896.txt
@ -0,0 +1,512 @@
+
+
+Network Working Group                                  John Nagle
+Request For Comments:  896                         6 January 1984
+                    Ford Aerospace and Communications Corporation
+
+           Congestion Control in IP/TCP Internetworks
+
+This memo discusses some aspects of congestion control in  IP/TCP
+Internetworks.   It  is intended to stimulate thought and further
+discussion of this topic.   While some specific  suggestions  are
+made for improved congestion  control  implementation,  this memo
+does not specify any standards.
+
+                          Introduction
+
+Congestion control is a recognized problem in  complex  networks.
+We have discovered that the Department of Defense's Internet Pro-
+tocol (IP) , a pure datagram protocol, and  Transmission  Control
+Protocol  (TCP),  a transport layer protocol, when used together,
+are subject to unusual congestion problems caused by interactions
+between  the  transport  and  datagram layers.  In particular, IP
+gateways are vulnerable to a phenomenon we call  "congestion col-
+lapse",  especially when such gateways connect networks of widely
+different bandwidth.  We have developed  solutions  that  prevent
+congestion collapse.
+
+These problems are not generally recognized because these  proto-
+cols  are used most often on networks built on top of ARPANET IMP
+technology.  ARPANET IMP based networks traditionally  have  uni-
+form  bandwidth and identical switching nodes, and are sized with
+substantial excess capacity.  This excess capacity, and the abil-
+ity  of the IMP system to throttle the transmissions of hosts has
+for most IP / TCP hosts and  networks  been  adequate  to  handle
+congestion.  With the recent split of the ARPANET into two inter-
+connected networks and the growth of other networks with  differ-
+ing properties connected to the ARPANET, however, reliance on the
+benign properties of the IMP system is no longer enough to  allow
+hosts  to  communicate rapidly and reliably. Improved handling of
+congestion is now  mandatory  for  successful  network  operation
+under load.
+
+Ford Aerospace and Communications  Corporation,  and  its  parent
+company,  Ford  Motor  Company,  operate  the only private IP/TCP
+long-haul network in existence today.  This network connects four
+facilities  (one  in Michigan, two in California, and one in Eng-
+land) some with extensive local networks.  This net is cross-tied
+to  the  ARPANET  but  uses  its  own long-haul circuits; traffic
+between Ford  facilities  flows  over  private  leased  circuits,
+including  a  leased  transatlantic  satellite  connection.   All
+switching nodes are pure IP datagram switches  with  no  node-to-
+node  flow  control, and all hosts run software either written or
+heavily modified by Ford or Ford Aerospace.  Bandwidth  of  links
+in  this  network varies widely, from 1200 to 10,000,000 bits per
+second.  In general, we have not been able to afford  the  luxury
+of excess long-haul bandwidth that the ARPANET possesses, and our
+long-haul links are heavily loaded during peak periods.   Transit
+times of several seconds are thus common in our network.
+
+
+RFC 896    Congestion Control in IP/TCP Internetworks      1/6/84
+
+
+Because of our pure datagram orientation, heavy loading, and wide
+variation  in  bandwidth,  we have had to solve problems that the
+ARPANET / MILNET community is just beginning to  recognize.   Our
+network is sensitive to suboptimal behavior by host TCP implemen-
+tations, both on and off our own net.  We have devoted  consider-
+able  effort  to examining TCP behavior under various conditions,
+and have solved some widely  prevalent  problems  with  TCP.   We
+present  here  two problems and their solutions.  Many TCP imple-
+mentations have these problems; if throughput is worse through an
+ARPANET  /  MILNET  gateway  for  a given TCP implementation than
+throughput across a single net, there is a high probability  that
+the TCP implementation has one or both of these problems.
+
+                       Congestion collapse
+
+Before we proceed with a discussion of the two specific  problems
+and  their  solutions,  a  description of what happens when these
+problems are not addressed is in order.  In heavily  loaded  pure
+datagram  networks  with  end to end retransmission, as switching
+nodes become congested, the  round  trip  time  through  the  net
+increases  and  the  count of datagrams in transit within the net
+also increases.  This is normal behavior under load.  As long  as
+there is only one copy of each datagram in transit, congestion is
+under  control.   Once  retransmission  of  datagrams   not   yet
+delivered begins, there is potential for serious trouble.
+
+Host TCP  implementations  are  expected  to  retransmit  packets
+several times at increasing time intervals until some upper limit
+on the retransmit interval is reached.  Normally, this  mechanism
+is  enough to prevent serious congestion problems.  Even with the
+better adaptive host retransmission algorithms, though, a  sudden
+load on the net can cause the round-trip time to rise faster than
+the sending hosts measurements of round-trip time can be updated.
+Such  a  load  occurs  when  a  new  bulk  transfer,  such a file
+transfer, begins and starts filling a large window.   Should  the
+round-trip  time  exceed  the maximum retransmission interval for
+any host, that host will begin to introduce more and more  copies
+of  the same datagrams into the net.  The network is now in seri-
+ous trouble.  Eventually all available buffers in  the  switching
+nodes  will  be full and packets must be dropped.  The round-trip
+time for packets that are delivered is now at its maximum.  Hosts
+are  sending  each packet several times, and eventually some copy
+of each packet arrives at its destination.   This  is  congestion
+collapse.
+
+This condition is stable.  Once the  saturation  point  has  been
+reached,  if the algorithm for selecting packets to be dropped is
+fair, the network will continue to operate in a  degraded  condi-
+tion.   In  this  condition  every  packet  is  being transmitted
+several times and throughput is reduced to a  small  fraction  of
+normal.   We  have pushed our network into this condition experi-
+mentally and observed its stability.  It is possible  for  round-
+trip  time to become so large that connections are broken because
+
+
+RFC 896    Congestion Control in IP/TCP Internetworks      1/6/84
+
+
+the hosts involved time out.
+
+Congestion collapse and pathological congestion are not  normally
+seen  in  the ARPANET / MILNET system because these networks have
+substantial excess  capacity.   Where  connections  do  not  pass
+through IP gateways, the IMP-to host flow control mechanisms usu-
+ally prevent congestion collapse, especially since TCP  implemen-
+tations  tend  to be well adjusted for the time constants associ-
+ated with the pure ARPANET case.  However, other than ICMP Source
+Quench  messages,  nothing fundamentally prevents congestion col-
+lapse when TCP is run over the ARPANET / MILNET and  packets  are
+being  dropped  at  gateways.  Worth  noting is that a few badly-
+behaved hosts can by themselves congest the gateways and  prevent
+other  hosts from passing traffic.  We have observed this problem
+repeatedly with certain hosts (with whose administrators we  have
+communicated privately) on the ARPANET.
+
+Adding additional memory to the gateways will not solve the prob-
+lem.   The  more  memory  added, the longer round-trip times must
+become before packets are dropped.  Thus, the onset of congestion
+collapse  will be delayed but when collapse occurs an even larger
+fraction of the  packets  in  the  net  will  be  duplicates  and
+throughput will be even worse.
+
+                        The two problems
+
+Two key problems with the engineering of TCP implementations have
+been  observed;  we  call  these the small-packet problem and the
+source-quench problem.  The second is being addressed by  several
+implementors; the first is generally believed (incorrectly) to be
+solved.  We have discovered that once  the  small-packet  problem
+has  been  solved,  the  source-quench  problem becomes much more
+tractable.  We thus present  the  small-packet  problem  and  our
+solution to it first.
+
+                    The small-packet problem
+
+There is a special problem associated with small  packets.   When
+TCP  is  used  for  the transmission of single-character messages
+originating at a keyboard, the typical result  is  that  41  byte
+packets  (one  byte  of data, 40 bytes of header) are transmitted
+for each byte of useful data.  This 4000%  overhead  is  annoying
+but tolerable on lightly loaded networks.  On heavily loaded net-
+works, however, the congestion resulting from this  overhead  can
+result  in  lost datagrams and retransmissions, as well as exces-
+sive propagation time caused by congestion in switching nodes and
+gateways.   In practice, throughput may drop so low that TCP con-
+nections are aborted.
+
+This classic problem is well-known and was first addressed in the
+Tymnet network in the late 1960s.  The solution used there was to
+impose a limit on the count of datagrams generated per unit time.
+This limit was enforced by delaying transmission of small packets
+
+
+RFC 896    Congestion Control in IP/TCP Internetworks      1/6/84
+
+
+until a short (200-500ms) time had elapsed, in hope that  another
+character  or two would become available for addition to the same
+packet before the  timer  ran  out.   An  additional  feature  to
+enhance  user  acceptability was to inhibit the time delay when a
+control character, such as a carriage return, was received.
+
+This technique has been used in NCP Telnet, X.25  PADs,  and  TCP
+Telnet. It has the advantage of being well-understood, and is not
+too difficult to implement.  Its flaw is that it is hard to  come
+up  with  a  time limit that will satisfy everyone.  A time limit
+short enough to provide highly responsive service over a 10M bits
+per  second Ethernet will be too short to prevent congestion col-
+lapse over a heavily loaded net with  a  five  second  round-trip
+time;  and  conversely,  a  time  limit long enough to handle the
+heavily loaded net will produce frustrated users on the Ethernet.
+
+            The solution to the small-packet problem
+
+Clearly an adaptive approach is desirable.  One  would  expect  a
+proposal  for  an  adaptive  inter-packet time limit based on the
+round-trip delay observed by TCP.  While such a  mechanism  could
+certainly  be  implemented,  it  is  unnecessary.   A  simple and
+elegant solution has been discovered.
+
+The solution is to inhibit the sending of new TCP  segments  when
+new  outgoing  data  arrives  from  the  user  if  any previously
+transmitted data on the connection remains unacknowledged.   This
+inhibition  is  to be unconditional; no timers, tests for size of
+data received, or other conditions are required.   Implementation
+typically requires one or two lines inside a TCP program.
+
+At first glance, this solution seems to imply drastic changes  in
+the  behavior of TCP.  This is not so.  It all works out right in
+the end.  Let us see why this is so.
+
+When a user process writes to a TCP connection, TCP receives some
+data.   It  may  hold  that data for future sending or may send a
+packet immediately.  If it refrains from  sending  now,  it  will
+typically send the data later when an incoming packet arrives and
+changes the state of the system.  The state changes in one of two
+ways;  the incoming packet acknowledges old data the distant host
+has received, or announces the availability of  buffer  space  in
+the  distant  host  for  new  data.  (This last is referred to as
+"updating the window").    Each time data arrives  on  a  connec-
+tion,  TCP must reexamine its current state and perhaps send some
+packets out.  Thus, when we omit sending data on arrival from the
+user,  we  are  simply  deferring its transmission until the next
+message arrives from the distant host.   A  message  must  always
+arrive soon unless the connection was previously idle or communi-
+cations with the other end have been lost.  In  the  first  case,
+the  idle  connection,  our  scheme will result in a packet being
+sent whenever the user writes to the TCP connection.  Thus we  do
+not  deadlock  in  the idle condition.  In the second case, where
+
+
+RFC 896    Congestion Control in IP/TCP Internetworks      1/6/84
+
+
+the distant host has failed, sending more data is futile  anyway.
+Note  that we have done nothing to inhibit normal TCP retransmis-
+sion logic, so lost messages are not a problem.
+
+Examination of the behavior of this scheme under  various  condi-
+tions  demonstrates  that the scheme does work in all cases.  The
+first case to examine is the one we wanted to solve, that of  the
+character-oriented  Telnet  connection.   Let us suppose that the
+user is sending TCP a new character every  200ms,  and  that  the
+connection  is  via  an Ethernet with a round-trip time including
+software processing of 50ms.  Without any  mechanism  to  prevent
+small-packet congestion, one packet will be sent for each charac-
+ter, and response will be optimal.  Overhead will be  4000%,  but
+this  is  acceptable  on  an Ethernet.  The classic timer scheme,
+with a limit of 2 packets per second, will  cause  two  or  three
+characters to be sent per packet.  Response will thus be degraded
+even though on a high-bandwidth  Ethernet  this  is  unnecessary.
+Overhead  will  drop  to  1500%, but on an Ethernet this is a bad
+tradeoff.  With our scheme, every character the user  types  will
+find  TCP with an idle connection, and the character will be sent
+at once, just as in the no-control case.  The user  will  see  no
+visible  delay.   Thus,  our  scheme  performs as well as the no-
+control scheme and provides better responsiveness than the  timer
+scheme.
+
+The second case to examine is the same Telnet  test  but  over  a
+long-haul  link  with  a  5-second  round trip time.  Without any
+mechanism to prevent  small-packet  congestion,  25  new  packets
+would be sent in 5 seconds.* Overhead here is  4000%.   With  the
+classic timer scheme, and the same limit of 2 packets per second,
+there would still be 10 packets outstanding and  contributing  to
+congestion.  Round-trip time will not be improved by sending many
+packets, of course; in general it will be worse since the packets
+will  contend  for line time.  Overhead now drops to 1500%.  With
+our scheme, however, the first character from the user would find
+an  idle  TCP connection and would be sent immediately.  The next
+24 characters, arriving from the user at 200ms  intervals,  would
+be  held  pending  a  message from the distant host.  When an ACK
+arrived for the first packet at the end of 5  seconds,  a  single
+packet  with  the 24 queued characters would be sent.  Our scheme
+thus results in an overhead reduction to 320% with no penalty  in
+response  time.   Response time will usually be improved with our
+scheme because packet overhead is reduced, here by  a  factor  of
+4.7 over the classic timer scheme.  Congestion will be reduced by
+this factor and round-trip delay will decrease sharply.  For this
+________
+  * This problem is not seen in the pure ARPANET case because the
+    IMPs will block the host when the count of packets
+    outstanding becomes excessive, but in the case where a pure
+    datagram local net (such as an Ethernet) or a pure datagram
+    gateway (such as an ARPANET / MILNET gateway) is involved, it
+    is possible to have large numbers of tiny packets
+    outstanding.
+
+
+RFC 896    Congestion Control in IP/TCP Internetworks      1/6/84
+
+
+case, our scheme has a striking  advantage  over  either  of  the
+other approaches.
+
+We use our scheme for all TCP connections, not just  Telnet  con-
+nections.   Let us see what happens for a file transfer data con-
+nection using our technique. The two extreme cases will again  be
+considered.
+
+As before, we first consider the Ethernet case.  The user is  now
+writing data to TCP in 512 byte blocks as fast as TCP will accept
+them.  The user's first write to TCP will start things going; our
+first  datagram  will  be  512+40  bytes  or 552 bytes long.  The
+user's second write to TCP will not cause a send but  will  cause
+the  block  to  be buffered.  Assume that the user fills up TCP's
+outgoing buffer area before the first ACK comes back.  Then  when
+the  ACK  comes in, all queued data up to the window size will be
+sent.  From then on, the window will be kept full,  as  each  ACK
+initiates  a  sending  cycle  and queued data is sent out.  Thus,
+after a one round-trip time initial period when only one block is
+sent,  our  scheme  settles down into a maximum-throughput condi-
+tion.  The delay in startup is only 50ms on the Ethernet, so  the
+startup  transient  is  insignificant.  All three schemes provide
+equivalent performance for this case.
+
+Finally, let us look at a file transfer over the  5-second  round
+trip  time connection.  Again, only one packet will be sent until
+the first ACK comes back; the window will then be filled and kept
+full.   Since the round-trip time is 5 seconds, only 512 bytes of
+data are transmitted in the first 5 seconds.  Assuming a 2K  win-
+dow,  once  the first ACK comes in, 2K of data will be sent and a
+steady rate of 2K per 5 seconds will  be  maintained  thereafter.
+Only  for  this  case is our scheme inferior to the timer scheme,
+and the difference is only in the startup transient; steady-state
+throughput  is  identical.  The naive scheme and the timer scheme
+would both take 250 seconds to transmit a 100K  byte  file  under
+the  above  conditions  and  our scheme would take 254 seconds, a
+difference of 1.6%.
+
+Thus, for all cases examined, our scheme provides at least 98% of
+the  performance  of  both other schemes, and provides a dramatic
+improvement in Telnet performance over paths with long round trip
+times.   We  use  our  scheme  in  the  Ford  Aerospace  Software
+Engineering Network, and are able to run screen editors over Eth-
+ernet and talk to distant TOPS-20 hosts with improved performance
+in both cases.
+
+                  Congestion control with ICMP
+
+Having solved the small-packet congestion problem and with it the
+problem  of excessive small-packet congestion within our own net-
+work, we turned our attention to the problem of  general  conges-
+tion  control.   Since  our  own network is pure datagram with no
+node-to-node flow control, the only  mechanism  available  to  us
+
+
+RFC 896    Congestion Control in IP/TCP Internetworks      1/6/84
+
+
+under  the  IP standard was the ICMP Source Quench message.  With
+careful handling,  we  find  this  adequate  to  prevent  serious
+congestion problems.  We do find it necessary to be careful about
+the behavior of our hosts and switching  nodes  regarding  Source
+Quench messages.
+
+               When to send an ICMP Source Quench
+
+The present ICMP standard* specifies that an ICMP  Source  Quench
+message  should  be  sent whenever a packet is dropped, and addi-
+tionally may be sent when a gateway finds itself  becoming  short
+of  resources.   There is some ambiguity here but clearly it is a
+violation of the standard to drop a  packet  without  sending  an
+ICMP message.
+
+Our basic assumption is that packets ought not to be dropped dur-
+ing  normal  network  operation.   We  therefore want to throttle
+senders back before they overload switching nodes  and  gateways.
+All  our  switching  nodes  send ICMP Source Quench messages well
+before buffer space is exhausted; they do not wait  until  it  is
+necessary to drop a message before sending an ICMP Source Quench.
+As demonstrated in our  analysis  of  the  small-packet  problem,
+merely  providing  large  amounts of buffering is not a solution.
+In general, our experience is that Source Quench should  be  sent
+when  about  half  the  buffering space is exhausted; this is not
+based on extensive experimentation but appears to be a reasonable
+engineering  decision.   One  could  argue for an adaptive scheme
+that adjusted the quench generation  threshold  based  on  recent
+experience; we have not found this necessary as yet.
+
+There exist other gateway implementations  that  generate  Source
+Quenches  only after more than one packet has been discarded.  We
+consider this approach undesirable since any system for  control-
+ling congestion based on the discarding of packets is wasteful of
+bandwidth and may be susceptible  to  congestion  collapse  under
+heavy  load.   Our understanding is that the decision to generate
+Source Quenches with great reluctance stems from a fear that ack-
+nowledge  traffic  will  be quenched and that this will result in
+connection failure.  As will be shown below, appropriate handling
+of  Source  Quench in host implementations eliminates this possi-
+bility.
+
+        What to do when an ICMP Source Quench is received
+
+We inform TCP or any other  protocol  at  that  layer  when  ICMP
+receives  a Source Quench.  The basic action of our TCP implemen-
+tations is to reduce the amount of data  outstanding  on  connec-
+tions to the host mentioned in the Source Quench. This control is
+________
+  * ARPANET RFC 792 is the present standard.  We are advised by
+    the Defense Communications Agency that the description of
+    ICMP in MIL-STD-1777 is incomplete and will be deleted from
+    future revision of that standard.
+
+
+RFC 896    Congestion Control in IP/TCP Internetworks      1/6/84
+
+
+applied by causing the sending TCP to behave as  if  the  distant
+host's  window  size  has been reduced.  Our first implementation
+was simplistic but effective;  once  a  Source  Quench  has  been
+received  our  TCP behaves as if the window size is zero whenever
+the window isn't  empty.   This  behavior  continues  until  some
+number  (at  present 10) of ACKs have been received, at that time
+TCP returns to normal operation.* David Mills  of  Linkabit  Cor-
+poration  has  since  implemented  a  similar  but more elaborate
+throttle on the count of outstanding packets in his DCN  systems.
+The  additional  sophistication seems to produce a modest gain in
+throughput, but we have not made formal tests.  Both  implementa-
+tions effectively prevent congestion collapse in switching nodes.
+
+Source Quench thus has the effect of limiting the connection to a
+limited number (perhaps one) of outstanding messages.  Thus, com-
+munication can continue but at a reduced rate,  that  is  exactly
+the effect desired.
+
+This scheme has the important property that Source Quench doesn't
+inhibit  the  sending of acknowledges or retransmissions.  Imple-
+mentations of Source Quench entirely within the IP layer are usu-
+ally unsuccessful because IP lacks enough information to throttle
+a connection properly.  Holding back acknowledges tends  to  pro-
+duce  retransmissions and thus unnecessary traffic.  Holding back
+retransmissions may cause loss of a connection by  a  retransmis-
+sion  timeout.   Our  scheme  will  keep  connections alive under
+severe overload but at reduced bandwidth per connection.
+
+Other protocols at the same layer as TCP should also  be  respon-
+sive  to  Source  Quench.  In each case we would suggest that new
+traffic should be throttled but acknowledges  should  be  treated
+normally.   The only serious problem comes from the User Datagram
+Protocol, not normally a major traffic generator.   We  have  not
+implemented  any  throttling  in  these protocols as yet; all are
+passed Source Quench messages by ICMP but ignore them.
+
+                    Self-defense for gateways
+
+As we have shown, gateways are vulnerable to  host  mismanagement
+of  congestion.  Host misbehavior by excessive traffic generation
+can prevent not only the host's own traffic from getting through,
+but  can interfere with other unrelated traffic.  The problem can
+be dealt with at the host level but since one malfunctioning host
+can  interfere  with others, future gateways should be capable of
+defending themselves against such behavior by obnoxious or  mali-
+cious hosts.  We offer some basic self-defense techniques.
+
+On one occasion in late 1983, a TCP bug in an ARPANET host caused
+the  host  to  frantically  generate  retransmissions of the same
+datagram as fast as the ARPANET would accept them.   The  gateway
+________
+  * This follows the control engineering dictum  "Never bother
+    with proportional control unless bang-bang  doesn't work".
+
+
+RFC 896    Congestion Control in IP/TCP Internetworks      1/6/84
+
+
+that connected our net with the ARPANET was saturated and  little
+useful  traffic  could  get  through,  since the gateway had more
+bandwidth to the ARPANET than to our  net.   The  gateway  busily
+sent  ICMP  Source  Quench  messages  but the malfunctioning host
+ignored them.  This continued for several hours, until  the  mal-
+functioning  host  crashed.   During this period, our network was
+effectively disconnected from the ARPANET.
+
+When a gateway is forced to  discard  a  packet,  the  packet  is
+selected  at  the  discretion of the gateway.  Classic techniques
+for making  this  decision  are  to  discard  the  most  recently
+received packet, or the packet at the end of the longest outgoing
+queue.  We suggest that a worthwhile practical measure is to dis-
+card  the  latest  packet  from the host that originated the most
+packets currently queued within the gateway.  This strategy  will
+tend  to  balance throughput amongst the hosts using the gateway.
+We have not yet tried this strategy, but it  seems  a  reasonable
+starting point for gateway self-protection.
+
+Another strategy is to discard a  newly  arrived  packet  if  the
+packet  duplicates  a  packet already in the queue.  The computa-
+tional load for this check is not a problem if hashing techniques
+are  used.   This  check will not protect against malicious hosts
+but will provide some protection against TCP implementations with
+poor  retransmission  control.   Gateways between fast local net-
+works and slower long-haul networks may find this check  valuable
+if the local hosts are tuned to work well with the local network.
+
+Ideally  the  gateway  should  detect  malfunctioning  hosts  and
+squelch them; such detection is difficult in a pure datagram sys-
+tem.  Failure to  respond  to  an  ICMP  Source  Quench  message,
+though,  should be regarded as grounds for action by a gateway to
+disconnect a host.  Detecting such failure is non-trivial but  is
+a worthwhile area for further research.
+
+                           Conclusion
+
+The congestion control problems  associated  with  pure  datagram
+networks  are  difficult, but effective solutions exist.  If IP /
+TCP networks are to be operated under heavy load, TCP implementa-
+tions  must address several key issues in ways at least as effec-
+tive as the ones described here.
+
--- a/kernel/picotcp/RFC/rfc0964.txt
+++ b/kernel/picotcp/RFC/rfc0964.txt
@ -0,0 +1,570 @@
+
+
+Network Working Group                                 Deepinder P. Sidhu
+Request for Comments: 964                               Thomas P. Blumer
+                                               SDC - A Burroughs Company
+                                                           November 1985
+
+              SOME PROBLEMS WITH THE SPECIFICATION OF THE
+            MILITARY STANDARD TRANSMISSION CONTROL PROTOCOL
+
+
+STATUS OF THIS MEMO
+
+   The purpose of this RFC is to provide helpful information on the
+   Military Standard Transmission Control Protocol (MIL-STD-1778) so
+   that one can obtain a reliable implementation of this protocol
+   standard. Distribution of this note is unlimited.
+
+      Reprinted from: Proc. Protocol Specification, Testing and
+      Verification IV, (ed.) Y. Yemini, et al, North-Holland (1984).
+
+ABSTRACT
+
+   This note points out three errors with the specification of the
+   Military Standard Transmission Control Protocol (MIL-STD-1778, dated
+   August 1983 [MILS83]).  These results are based on an initial
+   investigation of this protocol standard.  The first problem is that
+   data accompanying a SYN can not be accepted because of errors in the
+   acceptance policy.  The second problem is that no retransmission
+   timer is set for a SYN packet, and therefore the SYN will not be
+   retransmitted if it is lost.  The third problem is that when the
+   connection has been established, neither entity takes the proper
+   steps to accept incoming data.  This note also proposes solutions to
+   these problems.
+
+1.  Introduction
+
+   In recent years, much progress has been made in creating an
+   integrated set of tools for developing reliable communication
+   protocols.  These tools provide assistance in the specification,
+   verification, implementation and testing of protocols.  Several
+   protocols have been analyzed and developed using such tools.
+
+   In a recent paper, the authors discussed the verification of the
+   connection management of NBS class 4 transport protocol (TP4).  The
+   verification was carried out with the help of a software tool we
+   developed [BLUT82] [BLUT83] [SIDD83].  In spite of the very precise
+   specification of this protocol, our analysis discovered several
+   errors in the current specification of NBS TP4.  These errors are
+   incompleteness errors in the specification, that is, states where
+   there is no transition for the reception of some input event.  Our
+   analysis did not find deadlocks, livelocks or any other problem in
+   the connection management of TP4.  In that paper, we proposed
+
+
+Sidhu & Blumer                                                  [Page 1]
+
+
+
+RFC 964                                                    November 1985
+Some Problems with MIL-STD TCP
+
+
+   solutions for all errors except for errors associated with 2 states
+   whose satisfactory resolution may require redesigning parts of TP4.
+   Modifications to TP4 specification are currently underway to solve
+   the remaining incompleteness problems with 2 states.  It is important
+   to emphasize that we did not find any obvious error in the NBS
+   specification of TP4.
+
+   The authors are currently working on the verification of connection
+   management of the Military Standard Transmission Control Protocol
+   (TCP).  This analysis will be based on the published specification
+   [MILS83] of TCP dated 12 August 1983.
+
+   While studying the MIL standard TCP specification in preparation for
+   our analysis of the connection management features, we have noticed
+   several errors in the specification.  As a consequence of these
+   errors, the Transmission Control Protocol (as specified in [MILS83])
+   will not permit data to be received by TCP entities in SYN_RECVD and
+   ESTAB states.
+
+   The proof of this statement follows from the specification of the
+   three-way handshake mechanism of TCP [MILS83] and from a decision
+   table associated with ESTAB state.
+
+2.  Transmission Control Protocol
+
+   The Transmission Control Protocol (TCP) is a transport level
+   connection-oriented protocol in the DoD protocol hierarchy for use in
+   packet-switched and other networks.  Its most important services are
+   reliable transfer and ordered delivery of data over full-duplex and
+   flow-controlled virtual connections.  TCP is designed to operate
+   successfully over channels that are inherently unreliable, i.e., they
+   can lose, damage, duplicate, and reorder packets.
+
+   TCP is based, in part, on a protocol discussed by Cerf and Kahn
+   [CERV74].  Over the years, DARPA has supported specifications of
+   several versions of this protocol, the last one appeared in [POSJ81].
+   Some issues in the connection management of this protocol are
+   discussed in [SUNC78].
+
+   A few years ago, DCA decided to standardize TCP for use in DoD
+   networks and supported formal specification of this protocol
+   following the design of this protocol discussed in [POSJ81]. A
+   detailed specification of this protocol given in [MILS83] has been
+   adopted as the DoD standard for the Transmission Control Protocol, a
+   reliable connection-oriented transport protocol for DoD networks.
+
+   A TCP connection progresses through three phases: opening (or
+
+
+Sidhu & Blumer                                                  [Page 2]
+
+
+
+RFC 964                                                    November 1985
+Some Problems with MIL-STD TCP
+
+
+   synchronization), maintenance, and closing.  In this note we consider
+   data transfer in the opening and maintenance phases of the
+   connection.
+
+3.  Problems with MIL Standard TCP
+
+   One basic feature of TCP is the three-way handshake which is used to
+   set up a properly synchronized connection between two remote TCP
+   entities.  This mechanism is incorrectly specified in the current
+   specification of TCP.  One problem is that data associated with the
+   SYN packet can not be delivered.  This results from an incorrect
+   specification of the interaction between the accept_policy action
+   procedure and the record_syn action procedure.  Neither of the 2
+   possible strategies suggested in accept_policy will give the correct
+   result when called from the record_syn procedure, because the
+   recv_next variable is updated in record_syn before the accept_policy
+   procedure is called.
+
+   Another problem with the specification of the three-way handshake is
+   apparent in the actions listed for the Active Open event (with or
+   without data) when in the CLOSED state.  No retransmission timer is
+   set in these actions, and therefore if the initial SYN is lost, there
+   will be no timer expiration to trigger retransmission.  This will
+   prevent connection establishment if the initial SYN packet is lost by
+   the network.
+
+   The third problem with the specification is that the actions for
+   receiving data in the ESTAB state are incorrect.  The accept action
+   procedure must be called when data is received, so that arriving data
+   may be queued and possibly passed to the user.
+
+   A general problem with this specification is that the program
+   language and action table portions of the specification were clearly
+   not checked by any automatic syntax checking process.  Several
+   variable and procedure names are misspelled, and the syntax of the
+   action statements is often incorrect.  This can be confusing,
+   especially when a procedure name cannot be found in the alphabetized
+   list of procedures because of misspelling.
+
+   These are some of the very serious errors that we have discovered
+   with the MIL standard TCP.
+
+
+
+
+
+
+
+
+Sidhu & Blumer                                                  [Page 3]
+
+
+
+RFC 964                                                    November 1985
+Some Problems with MIL-STD TCP
+
+
+4.  Detailed Discussion of the Problem
+
+   Problem 1:  Problem with Receiving Data Accompanying SYN
+
+      The following scenario traces the actions of 2 communicating
+      entities during the establishment of a connection.  Only the
+      simplest case is considered, i.e., the case where the connection
+      is established by the exchange of 3 segments.
+
+      TCP entity A                                          TCP entity B
+      ------------                                          ------------
+
+      state                segment         segment          state
+      transition           recvd or sent   recvd or sent    transition
+                           by A            by B
+
+                                                        CLOSED -> LISTEN
+
+      CLOSED -> SYN_SENT   SYN -->
+
+                                           SYN -->   LISTEN -> SYN_RECVD
+                                           <-- SYN ACK
+
+      SYN_SENT -> ESTAB    <-- SYN ACK
+                           ACK -->
+
+                                           ACK -->    SYN_RECVD -> ESTAB
+
+      As shown in the above diagram, 5 state transitions occur and 3 TCP
+      segments are exchanged during the simplest case of the three-way
+      handshake.  We now examine in detail the actions of each entity
+      during this exchange.  Special attention is given to the sequence
+      numbers carried in each packet and recorded in the state variables
+      of each entity.
+
+      In the diagram below, the actions occurring within a procedure are
+      shown indented from the procedure call.  The resulting values of
+      sequence number variables are shown in square brackets to the
+      right of each statement.  The sequence number variables are shown
+      with the entity name (A or B) as prefix so that the two sets of
+      state variables may be easily distinguished.
+
+
+
+
+
+
+
+
+Sidhu & Blumer                                                  [Page 4]
+
+
+
+RFC 964                                                    November 1985
+Some Problems with MIL-STD TCP
+
+
+      Transition 1 (entity B goes from state CLOSED to state LISTEN).
+      The user associated with entity B issues a Passive Open.
+
+         Actions: (see p. 104)
+            open; (see p. 144)
+            new state := LISTEN;
+
+      Transition 2 (entity A goes from state CLOSED to SYN_SENT). The
+      user associated with entity A issues an Active Open with Data.
+
+         Actions: (see p. 104)
+            open; (see p. 144)
+            gen_syn(WITH_DATA); (see p. 141)
+               send_isn := gen_isn();                 [A.send_isn = 100]
+               send_next := send_isn + 1;            [A.send_next = 101]
+               send_una := send_isn;                  [A.send_una = 100]
+               seg.seq_num := send_isn;              [seg.seq_num = 100]
+               seg.ack_flag := FALSE;             [seg.ack_flag = FALSE]
+               seg.wndw := 0;                             [seg.wndw = 0]
+               amount := send_policy()               [assume amount > 0]
+            new state := SYN_SENT;
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Sidhu & Blumer                                                  [Page 5]
+
+
+
+RFC 964                                                    November 1985
+Some Problems with MIL-STD TCP
+
+
+      Transition 3 (Entity B goes from state LISTEN to state SYN_RECVD).
+      Entity B receives the SYN segment accompanying data sent by entity
+      A.
+
+         Actions: (see p. 106)
+            (since this segment has no RESET, no ACK, does have SYN, and
+            we assume reasonable security and precedence parameters, row
+            3 of the table applies)
+            record_syn; (see p. 147)
+               recv_isn := seg.seq_num; [B.recv_isn = seg_seq_num = 100]
+               recv_next := recv_isn + 1;            [B.recv_next = 101]
+               if seg.ack_flag then
+                  send_una := seg.ack_num;                   [no change]
+               accept_policy; (see p. 131)
+                  Accept in-order data only:
+                     Acceptance Test is
+                        seg.seq_num = recv_next;
+                  Accept any data within the receive window:
+                     Acceptance Test has two parts
+                        recv_next =< seg.seq_num =< recv_next +
+                                                               recv_wndw
+                        or
+                        recv_next =< seg.seq_num + length =<
+                                                   recv_next + recv_wndw
+                        ********************************************
+                           An error occurs here, with either possible
+                           strategy given in accept_policy, because
+                           recv_next > seg.seq_num.  Therefore
+                           accept_policy will incorrectly indicate that
+                           the data cannot be accepted.
+                        ********************************************
+            gen_syn(WITH_ACK); (see p. 141)
+               send_isn := gen_isn();                 [B.send_isn = 300]
+               send_next := send_isn + 1;            [B.send_next = 301]
+               send_una := send_isn;                  [B.send_una = 300]
+               seg.seq_num := send_next;             [seg.seq_num = 301]
+               seg.ack_flag := TRUE;               [seg.ack_flag = TRUE]
+               seg.ack_num := recv_isn + 1;          [seg.ack_num = 102]
+            new state := SYN_RECVD;
+
+
+
+
+
+
+
+
+
+
+Sidhu & Blumer                                                  [Page 6]
+
+
+
+RFC 964                                                    November 1985
+Some Problems with MIL-STD TCP
+
+
+      Transition 4 (entity A goes from state SYN_SENT to ESTAB) Entity A
+      receives the SYN ACK sent by entity B.
+
+         Actions: (see p. 107)
+            In order to select the applicable row of the table on p.
+            107, we first evaluate the decision function
+            ACK_status_test1.
+               ACK_status_test1();
+                  if(seg.ack_flag = FALSE) then
+                     return(NONE);
+                  if(seg.ack_num <= send_una) or
+                     (seg.ack_num > send_next) then
+                        return(INVALID)
+                  else
+                     return(VALID);
+
+                  ... and so on.
+
+      The important thing to notice in the above scenario is the error
+      that occurs in transition 3, where the wrong value for recv_next
+      leads to the routine record_syn refusing to accept the data.
+
+   Problem 2:  Problem with Retransmission of SYN Packet
+
+      The actions listed for Active Open (with or without data; see p.
+      103) are calls to the routines open and gen_syn.  Neither of these
+      routines (or routines that they call) explicitly sets a
+      retransmission timer.  Therefore if the initial SYN is lost there
+      is no timer expiration to trigger retransmission of the SYN.  If
+      this happens, the TCP will fail in its attempt to establish the
+      desired connection with a remote TCP.
+
+      Note that this differs with the actions specified for transmission
+      of data from the ESTAB state.  In that transition the routine
+      dispatch (p. 137) is called first which in turn calls the routine
+      send_new_data (p.  156).  One of actions of the last routine is to
+      start a retransmission timer for the newly sent data.
+
+
+
+
+
+
+
+
+
+
+
+
+Sidhu & Blumer                                                  [Page 7]
+
+
+
+RFC 964                                                    November 1985
+Some Problems with MIL-STD TCP
+
+
+   Problem 3:  Problem with Receiving Data in TCP ESTAB State
+
+      When both entities are in the state ESTAB, and one sends data to
+      the other, an error in the actions of the receiver prohibits the
+      data from being accepted.  The following simple scenario
+      illustrates the problem.  Here the user associated with entity A
+      issues a Send request, and A sends data to entity B.  When B
+      receives the data it replies with an acknowledgment.
+
+      TCP entity A                                          TCP entity B
+      ------------                                          ------------
+
+      state                segment         segment          state
+      transition           recvd or sent   recvd or sent    transition
+                           by A            by B
+
+      ESTAB -> ESTAB       DATA -->
+
+                                           DATA -->       ESTAB -> ESTAB
+                                           <-- ACK
+
+      Transition 1 (entity A goes from state ESTAB to ESTAB) Entity A
+      sends data packet to entity B.
+
+         Actions: (see p. 110)
+            dispatch; (see p. 137)
+
+      Transition 2 (entity B goes from state ESTAB to ESTAB) Entity B
+      receives data packet from entity B.
+
+         Actions: (see p. 111)
+            Assuming the data is in order and valid, we use row 6 of the
+            table.
+            update; (see p. 159)
+            ************************************************************
+               An error occurs here, because the routine update does
+               nothing to accept the incoming data, or to arrange to
+               pass it on to the user.
+            ************************************************************
+
+
+
+
+
+
+
+
+
+
+Sidhu & Blumer                                                  [Page 8]
+
+
+
+RFC 964                                                    November 1985
+Some Problems with MIL-STD TCP
+
+
+5.  Solutions to Problems
+
+   The problem with record_syn and accept_policy can be solved by having
+   record_syn call accept_policy before the variable recv_next is
+   updated.
+
+   The problem with gen_syn can be corrected by having gen_syn or open
+   explicitly request the retransmission timer.
+
+   The problem with the reception of data in the ESTAB state is
+   apparently caused by the transposition of the action tables on pages
+   111 and 112.  These tables should be interchanged.  This solution
+   will also correct a related problem, namely that an entity can never
+   reach the CLOSE_WAIT state from the ESTAB state.
+
+   Syntax errors in the action statements and tables could be easily
+   caught by an automatic syntax checker if the document used a more
+   formal description technique.  This would be difficult to do for
+   [MILS83] since this document is not based on a formalized description
+   technique [BREM83].
+
+   The errors pointed out in this note have been submitted to DCA and
+   will be corrected in the next update of the MIL STD TCP
+   specification.
+
+6.  Implementation of MIL Standard TCP
+
+   In the discussion above, we pointed out several serious errors in the
+   specification of the Military Standard Transmission Control Protocol
+   [MILS83].  These errors imply that a TCP implementation that
+   faithfully conforms to the Military TCP standard will not be able to
+
+      Receive data sent with a SYN packet.
+
+      Establish a connection if the initial SYN packet is lost.
+
+      Receive data when in the ESTAB state.
+
+   It also follows from our discussion that an implementation of MIL
+   Standard TCP [MILS83] must include corrections mentioned above to get
+   a running TCP.
+
+   The problems pointed out in this paper with the current specification
+   of the MIL Standard TCP [MILS83] are based on an initial
+   investigation of this protocol standard by the authors.
+
+
+
+
+Sidhu & Blumer                                                  [Page 9]
+
+
+
+RFC 964                                                    November 1985
+Some Problems with MIL-STD TCP
+
+
+REFERENCES
+
+   [BLUT83]  Blumer, T. P., and Sidhu, D. P., "Mechanical Verification
+             and Automatic Implementation of Authentication Protocols
+             for Computer Networks", SDC Burroughs Report (1983),
+             submitted for publication.
+
+   [BLUT82]  Blumer, T. P., and Tenney, R. L., "A Formal Specification
+             Technique and Implementation Method for Protocols",
+             Computer Networks, Vol. 6, No. 3, July 1982, pp. 201-217.
+
+   [BREM83]  Breslin, M., Pollack, R. and Sidhu D. P., "Formalization of
+             DoD Protocol Specification Technique", SDC - Burroughs
+             Report 1983.
+
+   [CERV74]  Cerf, V., and Kahn, R., "A Protocol for Packet Network
+             Interconnection", IEEE Trans. Comm., May 1974.
+
+   [MILS83]  "Military Standard Transmission Control Protocol",
+             MIL-STD-1778, 12 August 1983.
+
+   [POSJ81]  Postel, J. (ed.), "DoD Standard Transmission Control
+             Protocol", Defense Advanced Research Projects Agency,
+             Information Processing Techniques Office, RFC-793,
+             September 1981.
+
+   [SIDD83]  Sidhu, D. P., and Blumer, T. P., "Verification of NBS Class
+             4 Transport Protocol", SDC Burroughs Report (1983),
+             submitted for publication.
+
+   [SUNC78]  Sunshine, C., and Dalal, Y., "Connection Management in
+             Transport Protocols", Computer Networks, Vol. 2, pp.454-473
+             (1978).
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Sidhu & Blumer                                                 [Page 10]
+
--- a/kernel/picotcp/RFC/rfc1066.txt
+++ b/kernel/picotcp/RFC/rfc1066.txt
--- a/kernel/picotcp/RFC/rfc1071.txt
+++ b/kernel/picotcp/RFC/rfc1071.txt
--- a/kernel/picotcp/RFC/rfc1072.txt
+++ b/kernel/picotcp/RFC/rfc1072.txt
@ -0,0 +1,893 @@
+Network Working Group                                        V. Jacobson
+Request for Comments: 1072                                           LBL
+                                                               R. Braden
+                                                                     ISI
+                                                            October 1988
+
+
+                  TCP Extensions for Long-Delay Paths
+
+
+Status of This Memo
+
+   This memo proposes a set of extensions to the TCP protocol to provide
+   efficient operation over a path with a high bandwidth*delay product.
+   These extensions are not proposed as an Internet standard at this
+   time.  Instead, they are intended as a basis for further
+   experimentation and research on transport protocol performance.
+   Distribution of this memo is unlimited.
+
+1. INTRODUCTION
+
+   Recent work on TCP performance has shown that TCP can work well over
+   a variety of Internet paths, ranging from 800 Mbit/sec I/O channels
+   to 300 bit/sec dial-up modems [Jacobson88].  However, there is still
+   a fundamental TCP performance bottleneck for one transmission regime:
+   paths with high bandwidth and long round-trip delays.  The
+   significant parameter is the product of bandwidth (bits per second)
+   and round-trip delay (RTT in seconds); this product is the number of
+   bits it takes to "fill the pipe", i.e., the amount of unacknowledged
+   data that TCP must handle in order to keep the pipeline full.  TCP
+   performance problems arise when this product is large, e.g.,
+   significantly exceeds 10**5 bits.  We will refer to an Internet path
+   operating in this region as a "long, fat pipe", and a network
+   containing this path as an "LFN" (pronounced "elephan(t)").
+
+   High-capacity packet satellite channels (e.g., DARPA's Wideband Net)
+   are LFN's.  For example, a T1-speed satellite channel has a
+   bandwidth*delay product of 10**6 bits or more; this corresponds to
+   100 outstanding TCP segments of 1200 bytes each!  Proposed future
+   terrestrial fiber-optical paths will also fall into the LFN class;
+   for example, a cross-country delay of 30 ms at a DS3 bandwidth
+   (45Mbps) also exceeds 10**6 bits.
+
+   Clever algorithms alone will not give us good TCP performance over
+   LFN's; it will be necessary to actually extend the protocol.  This
+   RFC proposes a set of TCP extensions for this purpose.
+
+   There are three fundamental problems with the current TCP over LFN
+
+
+
+Jacobson & Braden                                               [Page 1]
+
+RFC 1072          TCP Extensions for Long-Delay Paths       October 1988
+
+
+   paths:
+
+
+   (1)  Window Size Limitation
+
+        The TCP header uses a 16 bit field to report the receive window
+        size to the sender.  Therefore, the largest window that can be
+        used is 2**16 = 65K bytes.  (In practice, some TCP
+        implementations will "break" for windows exceeding 2**15,
+        because of their failure to do unsigned arithmetic).
+
+        To circumvent this problem, we propose a new TCP option to allow
+        windows larger than 2**16. This option will define an implicit
+        scale factor, to be used to multiply the window size value found
+        in a TCP header to obtain the true window size.
+
+
+   (2)  Cumulative Acknowledgments
+
+        Any packet losses in an LFN can have a catastrophic effect on
+        throughput.  This effect is exaggerated by the simple cumulative
+        acknowledgment of TCP.  Whenever a segment is lost, the
+        transmitting TCP will (eventually) time out and retransmit the
+        missing segment. However, the sending TCP has no information
+        about segments that may have reached the receiver and been
+        queued because they were not at the left window edge, so it may
+        be forced to retransmit these segments unnecessarily.
+
+        We propose a TCP extension to implement selective
+        acknowledgements.  By sending selective acknowledgments, the
+        receiver of data can inform the sender about all segments that
+        have arrived successfully, so the sender need retransmit only
+        the segments that have actually been lost.
+
+        Selective acknowledgments have been included in a number of
+        experimental Internet protocols -- VMTP [Cheriton88], NETBLT
+        [Clark87], and RDP [Velten84].  There is some empirical evidence
+        in favor of selective acknowledgments -- simple experiments with
+        RDP have shown that disabling the selective acknowlegment
+        facility greatly increases the number of retransmitted segments
+        over a lossy, high-delay Internet path [Partridge87].  A
+        simulation study of a simple form of selective acknowledgments
+        added to the ISO transport protocol TP4 also showed promise of
+        performance improvement [NBS85].
+
+
+
+
+
+
+
+Jacobson & Braden                                               [Page 2]
+
+RFC 1072          TCP Extensions for Long-Delay Paths       October 1988
+
+
+   (3)  Round Trip Timing
+
+        TCP implements reliable data delivery by measuring the RTT,
+        i.e., the time interval between sending a segment and receiving
+        an acknowledgment for it, and retransmitting any segments that
+        are not acknowledged within some small multiple of the average
+        RTT.  Experience has shown that accurate, current RTT estimates
+        are necessary to adapt to changing traffic conditions and,
+        without them, a busy network is subject to an instability known
+        as "congestion collapse" [Nagle84].
+
+        In part because TCP segments may be repacketized upon
+        retransmission, and in part because of complications due to the
+        cumulative TCP acknowledgement, measuring a segments's RTT may
+        involve a non-trivial amount of computation in some
+        implementations.  To minimize this computation, some
+        implementations time only one segment per window.  While this
+        yields an adequate approximation to the RTT for small windows
+        (e.g., a 4 to 8 segment Arpanet window), for an LFN (e.g., 100
+        segment Wideband  Network windows) it results in an unacceptably
+        poor RTT estimate.
+
+        In the presence of errors, the problem becomes worse.  Zhang
+        [Zhang86], Jain [Jain86] and Karn [Karn87] have shown that it is
+        not possible to accumulate reliable RTT estimates if
+        retransmitted segments are included in the estimate.  Since a
+        full window of data will have been transmitted prior to a
+        retransmission, all of the segments in that window will have to
+        be ACKed before the next RTT sample can be taken.  This means at
+        least an additional window's worth of time between RTT
+        measurements and, as the error rate approaches one per window of
+        data (e.g., 10**-6 errors per bit for the Wideband Net), it
+        becomes effectively impossible to obtain an RTT measurement.
+
+        We propose a TCP "echo" option that allows each segment to carry
+        its own timestamp.  This will allow every segment, including
+        retransmissions, to be timed at negligible computational cost.
+
+
+   In designing new TCP options, we must pay careful attention to
+   interoperability with existing implementations.  The only TCP option
+   defined to date is an "initial option", i.e., it may appear only on a
+   SYN segment.  It is likely that most implementations will properly
+   ignore any options in the SYN segment that they do not understand, so
+   new initial options should not cause a problem.  On the other hand,
+   we fear that receiving unexpected non-initial options may cause some
+   TCP's to crash.
+
+
+
+
+Jacobson & Braden                                               [Page 3]
+
+RFC 1072          TCP Extensions for Long-Delay Paths       October 1988
+
+
+   Therefore, in each of the extensions we propose, non-initial options
+   may be sent only if an exchange of initial options has indicated that
+   both sides understand the extension.  This approach will also allow a
+   TCP to determine when the connection opens how big a TCP header it
+   will be sending.
+
+2. TCP WINDOW SCALE OPTION
+
+   The obvious way to implement a window scale factor would be to define
+   a new TCP option that could be included in any segment specifying a
+   window.  The receiver would include it in every acknowledgment
+   segment, and the sender would interpret it.  Unfortunately, this
+   simple approach would not work.  The sender must reliably know the
+   receiver's current scale factor, but a TCP option in an
+   acknowledgement segment will not be delivered reliably (unless the
+   ACK happens to be piggy-backed on data).
+
+   However, SYN segments are always sent reliably, suggesting that each
+   side may communicate its window scale factor in an initial TCP
+   option.  This approach has a disadvantage: the scale must be
+   established when the connection is opened, and cannot be changed
+   thereafter.  However, other alternatives would be much more
+   complicated, and we therefore propose a new initial option called
+   Window Scale.
+
+2.1  Window Scale Option
+
+      This three-byte option may be sent in a SYN segment by a TCP (1)
+      to indicate that it is prepared to do both send and receive window
+      scaling, and (2) to communicate a scale factor to be applied to
+      its receive window.  The scale factor is encoded logarithmically,
+      as a power of 2 (presumably to be implemented by binary shifts).
+
+      Note: the window in the SYN segment itself is never scaled.
+
+      TCP Window Scale Option:
+
+      Kind: 3
+
+             +---------+---------+---------+
+             | Kind=3  |Length=3 |shift.cnt|
+             +---------+---------+---------+
+
+      Here shift.cnt is the number of bits by which the receiver right-
+      shifts the true receive-window value, to scale it into a 16-bit
+      value to be sent in TCP header (this scaling is explained below).
+      The value shift.cnt may be zero (offering to scale, while applying
+      a scale factor of 1 to the receive window).
+
+
+
+Jacobson & Braden                                               [Page 4]
+
+RFC 1072          TCP Extensions for Long-Delay Paths       October 1988
+
+
+      This option is an offer, not a promise; both sides must send
+      Window Scale options in their SYN segments to enable window
+      scaling in either direction.
+
+2.2  Using the Window Scale Option
+
+      A model implementation of window scaling is as follows, using the
+      notation of RFC-793 [Postel81]:
+
+      *    The send-window (SND.WND) and receive-window (RCV.WND) sizes
+           in the connection state block and in all sequence space
+           calculations are expanded from 16 to 32 bits.
+
+      *    Two window shift counts are added to the connection state:
+           snd.scale and rcv.scale.  These are shift counts to be
+           applied to the incoming and outgoing windows, respectively.
+           The precise algorithm is shown below.
+
+      *    All outgoing SYN segments are sent with the Window Scale
+           option, containing a value shift.cnt = R that the TCP would
+           like to use for its receive window.
+
+      *    Snd.scale and rcv.scale are initialized to zero, and are
+           changed only during processing of a received SYN segment.  If
+           the SYN segment contains a Window Scale option with shift.cnt
+           = S, set snd.scale to S and set rcv.scale to R; otherwise,
+           both snd.scale and rcv.scale are left at zero.
+
+      *    The window field (SEG.WND) in the header of every incoming
+           segment, with the exception of SYN segments, will be left-
+           shifted by snd.scale bits before updating SND.WND:
+
+              SND.WND = SEG.WND << snd.scale
+
+           (assuming the other conditions of RFC793 are met, and using
+           the "C" notation "<<" for left-shift).
+
+      *    The window field (SEG.WND) of every outgoing segment, with
+           the exception of SYN segments, will have been right-shifted
+           by rcv.scale bits:
+
+              SEG.WND = RCV.WND >> rcv.scale.
+
+
+      TCP determines if a data segment is "old" or "new" by testing if
+      its sequence number is within 2**31 bytes of the left edge of the
+      window.  If not, the data is "old" and discarded.  To insure that
+      new data is never mistakenly considered old and vice-versa, the
+
+
+
+Jacobson & Braden                                               [Page 5]
+
+RFC 1072          TCP Extensions for Long-Delay Paths       October 1988
+
+
+      left edge of the sender's window has to be at least 2**31 away
+      from the right edge of the receiver's window.  Similarly with the
+      sender's right edge and receiver's left edge.  Since the right and
+      left edges of either the sender's or receiver's window differ by
+      the window size, and since the sender and receiver windows can be
+      out of phase by at most the window size, the above constraints
+      imply that 2 * the max window size must be less than 2**31, or
+
+           max window < 2**30
+
+      Since the max window is 2**S (where S is the scaling shift count)
+      times at most 2**16 - 1 (the maximum unscaled window), the maximum
+      window is guaranteed to be < 2*30 if S <= 14.  Thus, the shift
+      count must be limited to 14.  (This allows windows of 2**30 = 1
+      Gbyte.)  If a Window Scale option is received with a shift.cnt
+      value exceeding 14, the TCP should log the error but use 14
+      instead of the specified value.
+
+
+3. TCP SELECTIVE ACKNOWLEDGMENT OPTIONS
+
+   To minimize the impact on the TCP protocol, the selective
+   acknowledgment extension uses the form of two new TCP options. The
+   first is an enabling option, "SACK-permitted", that may be sent in a
+   SYN segment to indicate that the the SACK option may be used once the
+   connection is established.  The other is the SACK option itself,
+   which may be sent over an established connection once permission has
+   been given by SACK-permitted.
+
+   The SACK option is to be included in a segment sent from a TCP that
+   is receiving data to the TCP that is sending that data; we will refer
+   to these TCP's as the data receiver and the data sender,
+   respectively.  We will consider a particular simplex data flow; any
+   data flowing in the reverse direction over the same connection can be
+   treated independently.
+
+3.1  SACK-Permitted Option
+
+      This two-byte option may be sent in a SYN by a TCP that has been
+      extended to receive (and presumably process) the SACK option once
+      the connection has opened.
+
+
+
+
+
+
+
+
+
+
+Jacobson & Braden                                               [Page 6]
+
+RFC 1072          TCP Extensions for Long-Delay Paths       October 1988
+
+
+      TCP Sack-Permitted Option:
+
+      Kind: 4
+
+             +---------+---------+
+             | Kind=4  | Length=2|
+             +---------+---------+
+
+3.2  SACK Option
+
+      The SACK option is to be used to convey extended acknowledgment
+      information over an established connection.  Specifically, it is
+      to be sent by a data receiver to inform the data transmitter of
+      non-contiguous blocks of data that have been received and queued.
+      The data receiver is awaiting the receipt of data in later
+      retransmissions to fill the gaps in sequence space between these
+      blocks.  At that time, the data receiver will acknowledge the data
+      normally by advancing the left window edge in the Acknowledgment
+      Number field of the TCP header.
+
+      It is important to understand that the SACK option will not change
+      the meaning of the Acknowledgment Number field, whose value will
+      still specify the left window edge, i.e., one byte beyond the last
+      sequence number of fully-received data.  The SACK option is
+      advisory; if it is ignored, TCP acknowledgments will continue to
+      function as specified in the protocol.
+
+      However, SACK will provide additional information that the data
+      transmitter can use to optimize retransmissions.  The TCP data
+      receiver may include the SACK option in an acknowledgment segment
+      whenever it has data that is queued and unacknowledged.  Of
+      course, the SACK option may be sent only when the TCP has received
+      the SACK-permitted option in the SYN segment for that connection.
+
+      TCP SACK Option:
+
+      Kind: 5
+
+      Length: Variable
+
+
+       +--------+--------+--------+--------+--------+--------+...---+
+       | Kind=5 | Length | Relative Origin |   Block Size    |      |
+       +--------+--------+--------+--------+--------+--------+...---+
+
+
+      This option contains a list of the blocks of contiguous sequence
+      space occupied by data that has been received and queued within
+
+
+
+Jacobson & Braden                                               [Page 7]
+
+RFC 1072          TCP Extensions for Long-Delay Paths       October 1988
+
+
+      the window.  Each block is contiguous and isolated; that is, the
+      octets just below the block,
+
+             Acknowledgment Number + Relative Origin -1,
+
+      and just above the block,
+
+             Acknowledgment Number + Relative Origin + Block Size,
+
+      have not been received.
+
+      Each contiguous block of data queued at the receiver is defined in
+      the SACK option by two 16-bit integers:
+
+
+      *    Relative Origin
+
+           This is the first sequence number of this block, relative to
+           the Acknowledgment Number field in the TCP header (i.e.,
+           relative to the data receiver's left window edge).
+
+
+      *    Block Size
+
+           This is the size in octets of this block of contiguous data.
+
+
+      A SACK option that specifies n blocks will have a length of 4*n+2
+      octets, so the 44 bytes available for TCP options can specify a
+      maximum of 10 blocks.  Of course, if other TCP options are
+      introduced, they will compete for the 44 bytes, and the limit of
+      10 may be reduced in particular segments.
+
+      There is no requirement on the order in which blocks can appear in
+      a single SACK option.
+
+         Note: requiring that the blocks be ordered would allow a
+         slightly more efficient algorithm in the transmitter; however,
+         this does not seem to be an important optimization.
+
+3.3  SACK with Window Scaling
+
+      If window scaling is in effect, then 16 bits may not be sufficient
+      for the SACK option fields that define the origin and length of a
+      block.  There are two possible ways to handle this:
+
+      (1)  Expand the SACK origin and length fields to 24 or 32 bits.
+
+
+
+
+Jacobson & Braden                                               [Page 8]
+
+RFC 1072          TCP Extensions for Long-Delay Paths       October 1988
+
+
+      (2)  Scale the SACK fields by the same factor as the window.
+
+
+      The first alternative would significantly reduce the number of
+      blocks possible in a SACK option; therefore, we have chosen the
+      second alternative, scaling the SACK information as well as the
+      window.
+
+      Scaling the SACK information introduces some loss of precision,
+      since a SACK option must report queued data blocks whose origins
+      and lengths are multiples of the window scale factor rcv.scale.
+      These reported blocks must be equal to or smaller than the actual
+      blocks of queued data.
+
+      Specifically, suppose that the receiver has a contiguous block of
+      queued data that occupies sequence numbers L, L+1, ... L+N-1, and
+      that the window scale factor is S = rcv.scale.  Then the
+      corresponding block that will be reported in a SACK option will
+      be:
+
+         Relative Origin = int((L+S-1)/S)
+
+         Block Size = int((L+N)/S) - (Relative Origin)
+
+      where the function int(x) returns the greatest integer contained
+      in x.
+
+      The resulting loss of precision is not a serious problem for the
+      sender.  If the data-sending TCP keeps track of the boundaries of
+      all segments in its retransmission queue, it will generally be
+      able to infer from the imprecise SACK data which full segments
+      don't need to be retransmitted.  This will fail only if S is
+      larger than the maximum segment size, in which case some segments
+      may be retransmitted unnecessarily.  If the sending TCP does not
+      keep track of transmitted segment boundaries, the imprecision of
+      the scaled SACK quantities will only result in retransmitting a
+      small amount of unneeded sequence space.  On the average, the data
+      sender will unnecessarily retransmit J*S bytes of the sequence
+      space for each SACK received; here J is the number of blocks
+      reported in the SACK, and S = snd.scale.
+
+3.4  SACK Option Examples
+
+      Assume the left window edge is 5000 and that the data transmitter
+      sends a burst of 8 segments, each containing 500 data bytes.
+      Unless specified otherwise, we assume that the scale factor S = 1.
+
+
+
+
+
+Jacobson & Braden                                               [Page 9]
+
+RFC 1072          TCP Extensions for Long-Delay Paths       October 1988
+
+
+           Case 1: The first 4 segments are received but the last 4 are
+           dropped.
+
+           The data receiver will return a normal TCP ACK segment
+           acknowledging sequence number 7000, with no SACK option.
+
+
+           Case 2:  The first segment is dropped but the remaining 7 are
+           received.
+
+           The data receiver will return a TCP ACK segment that
+           acknowledges sequence number 5000 and contains a SACK option
+           specifying one block of queued data:
+
+                   Relative Origin = 500;  Block Size = 3500
+
+
+           Case 3:  The 2nd, 4th, 6th, and 8th (last) segments are
+           dropped.
+
+           The data receiver will return a TCP ACK segment that
+           acknowledges sequence number 5500 and contains a SACK option
+           specifying the 3 blocks:
+
+                   Relative Origin =  500;  Block Size = 500
+                   Relative Origin = 1500;  Block Size = 500
+                   Relative Origin = 2500;  Block Size = 500
+
+
+           Case 4: Same as Case 3, except Scale Factor S = 16.
+
+           The SACK option would specify the 3 scaled blocks:
+
+                   Relative Origin =   32;  Block Size = 30
+                   Relative Origin =   94;  Block Size = 31
+                   Relative Origin =  157;  Block Size = 30
+
+           These three reported blocks have sequence numbers 512 through
+           991, 1504 through 1999, and 2512 through 2992, respectively.
+
+
+3.5  Generating the SACK Option
+
+      Let us assume that the data receiver maintains a queue of valid
+      segments that it has neither passed to the user nor acknowledged
+      because of earlier missing data, and that this queue is ordered by
+      starting sequence number.  Computation of the SACK option can be
+      done with one pass down this queue.  Segments that occupy
+
+
+
+Jacobson & Braden                                              [Page 10]
+
+RFC 1072          TCP Extensions for Long-Delay Paths       October 1988
+
+
+      contiguous sequence space are aggregated into a single SACK block,
+      and each gap in the sequence space (except a gap that is
+      terminated by the right window edge) triggers the start of a new
+      SACK block.  If this algorithm defines more than 10 blocks, only
+      the first 10 can be included in the option.
+
+3.6  Interpreting the SACK Option
+
+      The data transmitter is assumed to have a retransmission queue
+      that contains the segments that have been transmitted but not yet
+      acknowledged, in sequence-number order.  If the data transmitter
+      performs re-packetization before retransmission, the block
+      boundaries in a SACK option that it receives may not fall on
+      boundaries of segments in the retransmission queue; however, this
+      does not pose a serious difficulty for the transmitter.
+
+      Let us suppose that for each segment in the retransmission queue
+      there is a (new) flag bit "ACK'd", to be used to indicate that
+      this particular segment has been entirely acknowledged.  When a
+      segment is first transmitted, it will be entered into the
+      retransmission queue with its ACK'd bit off.  If the ACK'd bit is
+      subsequently turned on (as the result of processing a received
+      SACK option), the data transmitter will skip this segment during
+      any later retransmission.  However, the segment will not be
+      dequeued and its buffer freed until the left window edge is
+      advanced over it.
+
+      When an acknowledgment segment arrives containing a SACK option,
+      the data transmitter will turn on the ACK'd bits for segments that
+      have been selectively acknowleged.  More specifically, for each
+      block in the SACK option, the data transmitter will turn on the
+      ACK'd flags for all segments in the retransmission queue that are
+      wholly contained within that block.  This requires straightforward
+      sequence number comparisons.
+
+
+4.  TCP ECHO OPTIONS
+
+   A simple method for measuring the RTT of a segment would be: the
+   sender places a timestamp in the segment and the receiver returns
+   that timestamp in the corresponding ACK segment. When the ACK segment
+   arrives at the sender, the difference between the current time and
+   the timestamp is the RTT.  To implement this timing method, the
+   receiver must simply reflect or echo selected data (the timestamp)
+   from the sender's segments.  This idea is the basis of the "TCP Echo"
+   and "TCP Echo Reply" options.
+
+
+
+
+
+Jacobson & Braden                                              [Page 11]
+
+RFC 1072          TCP Extensions for Long-Delay Paths       October 1988
+
+
+4.1  TCP Echo and TCP Echo Reply Options
+
+      TCP Echo Option:
+
+      Kind: 6
+
+      Length: 6
+
+          +--------+--------+--------+--------+--------+--------+
+          | Kind=6 | Length |   4 bytes of info to be echoed    |
+          +--------+--------+--------+--------+--------+--------+
+
+   This option carries four bytes of information that the receiving TCP
+   may send back in a subsequent TCP Echo Reply option (see below).  A
+   TCP may send the TCP Echo option in any segment, but only if a TCP
+   Echo option was received in a SYN segment for the connection.
+
+   When the TCP echo option is used for RTT measurement, it will be
+   included in data segments, and the four information bytes will define
+   the time at which the data segment was transmitted in any format
+   convenient to the sender.
+
+   TCP Echo Reply Option:
+
+   Kind: 7
+
+   Length: 6
+
+       +--------+--------+--------+--------+--------+--------+
+       | Kind=7 | Length |    4 bytes of echoed info         |
+       +--------+--------+--------+--------+--------+--------+
+
+
+   A TCP that receives a TCP Echo option containing four information
+   bytes will return these same bytes in a TCP Echo Reply option.
+
+   This TCP Echo Reply option must be returned in the next segment
+   (e.g., an ACK segment) that is sent. If more than one Echo option is
+   received before a reply segment is sent, the TCP must choose only one
+   of the options to echo, ignoring the others; specifically, it must
+   choose the newest segment with the oldest sequence number (see next
+   section.)
+
+   To use the TCP Echo and Echo Reply options, a TCP must send a TCP
+   Echo option in its own SYN segment and receive a TCP Echo option in a
+   SYN segment from the other TCP.  A TCP that does not implement the
+   TCP Echo or Echo Reply options must simply ignore any TCP Echo
+   options it receives.  However, a TCP should not receive one of these
+
+
+
+Jacobson & Braden                                              [Page 12]
+
+RFC 1072          TCP Extensions for Long-Delay Paths       October 1988
+
+
+   options in a non-SYN segment unless it included a TCP Echo option in
+   its own SYN segment.
+
+4.2  Using the Echo Options
+
+   If we wish to use the Echo/Echo Reply options for RTT measurement, we
+   have to define what the receiver does when there is not a one-to-one
+   correspondence between data and ACK segments.  Assuming that we want
+   to minimize the state kept in the receiver (i.e., the number of
+   unprocessed Echo options), we can plan on a receiver remembering the
+   information value from at most one Echo between ACKs.  There are
+   three situations to consider:
+
+   (A)  Delayed ACKs.
+
+        Many TCP's acknowledge only every Kth segment out of a group of
+        segments arriving within a short time interval; this policy is
+        known generally as "delayed ACK's".  The data-sender TCP must
+        measure the effective RTT, including the additional time due to
+        delayed ACK's, or else it will retransmit unnecessarily.  Thus,
+        when delayed ACK's are in use, the receiver should reply with
+        the Echo option information from the earliest unacknowledged
+        segment.
+
+   (B)  A hole in the sequence space (segment(s) have been lost).
+
+        The sender will continue sending until the window is filled, and
+        we may be generating ACKs as these out-of-order segments arrive
+        (e.g., for the SACK information or to aid "fast retransmit").
+        An Echo Reply option will tell the sender the RTT of some
+        recently sent segment (since the ACK can only contain the
+        sequence number of the hole, the sender may not be able to
+        determine which segment, but that doesn't matter).  If the loss
+        was due to congestion, these RTTs may be particularly valuable
+        to the sender since they reflect the network characteristics
+        immediately after the congestion.
+
+   (C)  A filled hole in the sequence space.
+
+        The segment that fills the hole represents the most recent
+        measurement of the network characteristics.  On the other hand,
+        an RTT computed from an earlier segment would probably include
+        the sender's retransmit time-out, badly biasing the sender's
+        average RTT estimate.
+
+
+   Case (A) suggests the receiver should remember and return the Echo
+   option information from the oldest unacknowledged segment.  Cases (B)
+
+
+
+Jacobson & Braden                                              [Page 13]
+
+RFC 1072          TCP Extensions for Long-Delay Paths       October 1988
+
+
+   and (C) suggest that the option should come from the most recent
+   unacknowledged segment.  An algorithm that covers all three cases is
+   for the receiver to return the Echo option information from the
+   newest segment with the oldest sequence number, as specified earlier.
+
+   A model implementation of these options is as follows.
+
+
+   (1)  Receiver Implementation
+
+        A 32-bit slot for Echo option data, rcv.echodata, is added to
+        the receiver connection state, together with a flag,
+        rcv.echopresent, that indicates whether there is anything in the
+        slot.  When the receiver generates a segment, it checks
+        rcv.echopresent and, if it is set, adds an echo-reply option
+        containing rcv.echodata to the outgoing segment then clears
+        rcv.echopresent.
+
+        If an incoming segment is in the window and contains an echo
+        option, the receiver checks rcv.echopresent.  If it isn't set,
+        the value of the echo option is copied to rcv.echodata and
+        rcv.echopresent is set.  If rcv.echopresent is already set, the
+        receiver checks whether the segment is at the left edge of the
+        window.  If so, the segment's echo option value is copied to
+        rcv.echodata (this is situation (C) above).  Otherwise, the
+        segment's echo option is ignored.
+
+
+   (2)  Sender Implementation
+
+        The sender's connection state has a single flag bit,
+        snd.echoallowed, added.  If snd.echoallowed is set or if the
+        segment contains a SYN, the sender is free to add a TCP Echo
+        option (presumably containing the current time in some units
+        convenient to the sender) to every outgoing segment.
+
+        Snd.echoallowed should be set if a SYN is received with a TCP
+        Echo option (presumably, a host that implements the option will
+        attempt to use it to time the SYN segment).
+
+
+5.  CONCLUSIONS AND ACKNOWLEDGMENTS
+
+We have proposed five new TCP options for scaled windows, selective
+acknowledgments, and round-trip timing, in order to provide efficient
+operation over large-bandwidth*delay-product paths.  These extensions
+are designed to provide compatible interworking with TCP's that do not
+implement the extensions.
+
+
+
+Jacobson & Braden                                              [Page 14]
+
+RFC 1072          TCP Extensions for Long-Delay Paths       October 1988
+
+
+The Window Scale option was originally suggested by Mike St. Johns of
+USAF/DCA.  The present form of the option was suggested by Mike Karels
+of UC Berkeley in response to a more cumbersome scheme proposed by Van
+Jacobson.  Gerd Beling of FGAN (West Germany) contributed the initial
+definition of the SACK option.
+
+All three options have evolved through discussion with the End-to-End
+Task Force, and the authors are grateful to the other members of the
+Task Force for their advice and encouragement.
+
+6.  REFERENCES
+
+      [Cheriton88]  Cheriton, D., "VMTP: Versatile Message Transaction
+      Protocol", RFC 1045, Stanford University, February 1988.
+
+      [Jain86]  Jain, R., "Divergence of Timeout Algorithms for Packet
+      Retransmissions", Proc. Fifth Phoenix Conf. on Comp. and Comm.,
+      Scottsdale, Arizona, March 1986.
+
+      [Karn87]  Karn, P. and C. Partridge, "Estimating Round-Trip Times
+      in Reliable Transport Protocols", Proc. SIGCOMM '87, Stowe, VT,
+      August 1987.
+
+      [Clark87] Clark, D., Lambert, M., and L. Zhang, "NETBLT: A Bulk
+      Data Transfer Protocol", RFC 998, MIT, March 1987.
+
+      [Nagle84]  Nagle, J., "Congestion Control in IP/TCP
+      Internetworks", RFC 896, FACC, January 1984.
+
+      [NBS85]  Colella, R., Aronoff, R., and K. Mills, "Performance
+      Improvements for ISO Transport", Ninth Data Comm Symposium,
+      published in ACM SIGCOMM Comp Comm Review, vol. 15, no. 5,
+      September 1985.
+
+      [Partridge87]  Partridge, C., "Private Communication", February
+      1987.
+
+      [Postel81]  Postel, J., "Transmission Control Protocol - DARPA
+      Internet Program Protocol Specification", RFC 793, DARPA,
+      September 1981.
+
+      [Velten84] Velten, D., Hinden, R., and J. Sax, "Reliable Data
+      Protocol", RFC 908, BBN, July 1984.
+
+      [Jacobson88] Jacobson, V., "Congestion Avoidance and Control", to
+      be presented at SIGCOMM '88, Stanford, CA., August 1988.
+
+      [Zhang86]  Zhang, L., "Why TCP Timers Don't Work Well", Proc.
+
+
+
+Jacobson & Braden                                              [Page 15]
+
+RFC 1072          TCP Extensions for Long-Delay Paths       October 1988
+
+
+      SIGCOMM '86, Stowe, Vt., August 1986.
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Jacobson & Braden                                              [Page 16]
+
--- a/kernel/picotcp/RFC/rfc1106.txt
+++ b/kernel/picotcp/RFC/rfc1106.txt
@ -0,0 +1,731 @@
+
+
+
+
+
+
+Network Working Group                                             R. Fox
+Request for Comments:  1106                                       Tandem
+                                                               June 1989
+
+
+                     TCP Big Window and Nak Options
+
+Status of this Memo
+
+   This memo discusses two extensions to the TCP protocol to provide a
+   more efficient operation over a network with a high bandwidth*delay
+   product.  The extensions described in this document have been
+   implemented and shown to work using resources at NASA.  This memo
+   describes an Experimental Protocol, these extensions are not proposed
+   as an Internet standard, but as a starting point for further
+   research.  Distribution of this memo is unlimited.
+
+Abstract
+
+   Two extensions to the TCP protocol are described in this RFC in order
+   to provide a more efficient operation over a network with a high
+   bandwidth*delay product.  The main issue that still needs to be
+   solved is congestion versus noise.  This issue is touched on in this
+   memo, but further research is still needed on the applicability of
+   the extensions in the Internet as a whole infrastructure and not just
+   high bandwidth*delay product networks.  Even with this outstanding
+   issue, this document does describe the use of these options in the
+   isolated satellite network environment to help facilitate more
+   efficient use of this special medium to help off load bulk data
+   transfers from links needed for interactive use.
+
+1.  Introduction
+
+   Recent work on TCP has shown great performance gains over a variety
+   of network paths [1].  However, these changes still do not work well
+   over network paths that have a large round trip delay (satellite with
+   a 600 ms round trip delay) or a very large bandwidth
+   (transcontinental DS3 line).  These two networks exhibit a higher
+   bandwidth*delay product, over 10**6 bits, than the 10**5 bits that
+   TCP is currently limited to.  This high bandwidth*delay product
+   refers to the amount of data that may be unacknowledged so that all
+   of the networks bandwidth is being utilized by TCP.  This may also be
+   referred to as "filling the pipe" [2] so that the sender of data can
+   always put data onto the network and the receiver will always have
+   something to read, and neither end of the connection will be forced
+   to wait for the other end.
+
+   After the last batch of algorithm improvements to TCP, performance
+
+
+
+Fox                                                             [Page 1]
+
+RFC 1106             TCP Big Window and Nak Options            June 1989
+
+
+   over high bandwidth*delay networks is still very poor.  It appears
+   that no algorithm changes alone will make any significant
+   improvements over high bandwidth*delay networks, but will require an
+   extension to the protocol itself.  This RFC discusses two possible
+   options to TCP for this purpose.
+
+   The two options implemented and discussed in this RFC are:
+
+   1.  NAKs
+
+      This extension allows the receiver of data to inform the sender
+      that a packet of data was not received and needs to be resent.
+      This option proves to be useful over any network path (both high
+      and low bandwidth*delay type networks) that experiences periodic
+      errors such as lost packets, noisy links, or dropped packets due
+      to congestion.  The information conveyed by this option is
+      advisory and if ignored, does not have any effect on TCP what so
+      ever.
+
+   2.  Big Windows
+
+      This option will give a method of expanding the current 16 bit (64
+      Kbytes) TCP window to 32 bits of which 30 bits (over 1 gigabytes)
+      are allowed for the receive window.  (The maximum window size
+      allowed in TCP due to the requirement of TCP to detect old data
+      versus new data.  For a good explanation please see [2].)  No
+      changes are required to the standard TCP header [6]. The 16 bit
+      field in the TCP header that is used to convey the receive window
+      will remain unchanged.  The 32 bit receive window is achieved
+      through the use of an option that contains the upper half of the
+      window.  It is this option that is necessary to fill large data
+      pipes such as a satellite link.
+
+   This RFC is broken up into the following sections: section 2 will
+   discuss the operation of the NAK option in greater detail, section 3
+   will discuss the big window option in greater detail.  Section 4 will
+   discuss other effects of the big windows and nak feature when used
+   together.  Included in this section will be a brief discussion on the
+   effects of congestion versus noise to TCP and possible options for
+   satellite networks.  Section 5 will be a conclusion with some hints
+   as to what future development may be done at NASA, and then an
+   appendix containing some test results is included.
+
+2.  NAK Option
+
+   Any packet loss in a high bandwidth*delay network will have a
+   catastrophic effect on throughput because of the simple
+   acknowledgement of TCP.  TCP always acks the stream of data that has
+
+
+
+Fox                                                             [Page 2]
+
+RFC 1106             TCP Big Window and Nak Options            June 1989
+
+
+   successfully been received and tells the sender the next byte of data
+   of the stream that is expected.  If a packet is lost and succeeding
+   packets arrive the current protocol has no way of telling the sender
+   that it missed one packet but received following packets.  TCP
+   currently resends all of the data over again, after a timeout or the
+   sender suspects a lost packet due to a duplicate ack algorithm [1],
+   until the receiver receives the lost packet and can then ack the lost
+   packet as well as succeeding packets received.  On a normal low
+   bandwidth*delay network this effect is minimal if the timeout period
+   is set short enough.  However, on a long delay network such as a T1
+   satellite channel this is catastrophic because by the time the lost
+   packet can be sent and the ack returned the TCP window would have
+   been exhausted and both the sender and receiver would be temporarily
+   stalled waiting for the packet and ack to fully travel the data pipe.
+   This causes the pipe to become empty and requires the sender to
+   refill the pipe after the ack is received.  This will cause a minimum
+   of 3*X bandwidth loss, where X is the one way delay of the medium and
+   may be much higher depending on the size of the timeout period and
+   bandwidth*delay product.  Its 1X for the packet to be resent, 1X for
+   the ack to be received and 1X for the next packet being sent to reach
+   the destination.  This calculation assumes that the window size is
+   much smaller than the pipe size (window = 1/2 data pipe or 1X), which
+   is the typical case with the current TCP window limitation over long
+   delay networks such as a T1 satellite link.
+
+   An attempt to reduce this wasted bandwidth from 3*X was introduced in
+   [1] by having the sender resend a packet after it notices that a
+   number of consecutively received acks completely acknowledges already
+   acknowledged data.  On a typical network this will reduce the lost
+   bandwidth to almost nil, since the packet will be resent before the
+   TCP window is exhausted and with the data pipe being much smaller
+   than the TCP window, the data pipe will not become empty and no
+   bandwidth will be lost.  On a high delay network the reduction of
+   lost bandwidth is minimal such that lost bandwidth is still
+   significant.  On a very noisy satellite, for instance, the lost
+   bandwidth is very high (see appendix for some performance figures)
+   and performance is very poor.
+
+   There are two methods of informing the sender of lost data.
+   Selective acknowledgements and NAKS.  Selective acknowledgements have
+   been the object of research in a number of experimental protocols
+   including VMTP [3], NETBLT [4], and SatFTP [5].  The idea behind
+   selective acks is that the receiver tells the sender which pieces it
+   received so that the sender can resend the data not acked but already
+   sent once.  NAKs on the other hand, tell the sender that a particular
+   packet of data needs to be resent.
+
+   There are a couple of disadvantages of selective acks.  Namely, in
+
+
+
+Fox                                                             [Page 3]
+
+RFC 1106             TCP Big Window and Nak Options            June 1989
+
+
+   some of the protocols mentioned above, the receiver waits a certain
+   time before sending the selective ack so that acks may be bundled up.
+   This delay can cause some wasted bandwidth and requires more complex
+   state information than the simple nak.  Even if the receiver doesn't
+   bundle up the selective acks but sends them as it notices that
+   packets have been lost, more complex state information is needed to
+   determine which packets have been acked and which packets need to be
+   resent.  With naks, only the immediate data needed to move the left
+   edge of the window is naked, thus almost completely eliminating all
+   state information.
+
+   The selective ack has one advantage over naks.  If the link is very
+   noisy and packets are being lost close together, then the sender will
+   find out about all of the missing data at once and can send all of
+   the missing data out immediately in an attempt to move the left
+   window edge in the acknowledge number of the TCP header, thus keeping
+   the data pipe flowing.  Whereas with naks, the sender will be
+   notified of lost packets one at a time and this will cause the sender
+   to process extra packets compared to selective acks.  However,
+   empirical studies has shown that most lost packets occur far enough
+   apart that the advantage of selective acks over naks is rarely seen.
+   Also, if naks are sent out as soon as a packet has been determined
+   lost, then the advantage of selective acks becomes no more than
+   possibly a more aesthetic algorithm for handling lost data, but
+   offers no gains over naks as described in this paper.  It is this
+   reason that the simplicity of naks was chosen over selective acks for
+   the current implementation.
+
+2.1  Implementation details
+
+   When the receiver of data notices a gap between the expected sequence
+   number and the actual sequence number of the packet received, the
+   receiver can assume that the data between the two sequence numbers is
+   either going to arrive late or is lost forever.  Since the receiver
+   can not distinguish between the two events a nak should be sent in
+   the TCP option field.  Naking a packet still destined to arrive has
+   the effect of causing the sender to resend the packet, wasting one
+   packets worth of bandwidth.  Since this event is fairly rare, the
+   lost bandwidth is insignificant as compared to that of not sending a
+   nak when the packet is not going to arrive.  The option will take the
+   form as follows:
+
+      +========+=========+=========================+================+
+      +option= + length= + sequence number of      + number of      +
+      +   A    +    7    +  first byte being naked + segments naked +
+      +========+=========+=========================+================+
+
+   This option contains the first sequence number not received and a
+
+
+
+Fox                                                             [Page 4]
+
+RFC 1106             TCP Big Window and Nak Options            June 1989
+
+
+   count of how many segments of bytes needed to be resent, where
+   segments is the size of the current TCP MSS being used for the
+   connection.  Since a nak is an advisory piece of information, the
+   sending of a nak is unreliable and no means for retransmitting a nak
+   is provided at this time.
+
+   When the sender of data receives the option it may either choose to
+   do nothing or it will resend the missing data immediately and then
+   continue sending data where it left off before receiving the nak.
+   The receiver will keep track of the last nak sent so that it will not
+   repeat the same nak.  If it were to repeat the same nak the protocol
+   could get into the mode where on every reception of data the receiver
+   would nak the first missing data frame.  Since the data pipe may be
+   very large by the time the first nak is read and responded to by the
+   sender, many naks would have been sent by the receiver.  Since the
+   sender does not know that the naks are repetitious it will resend the
+   data each time, thus wasting the network bandwidth with useless
+   retransmissions of the same piece of data.  Having an unreliable nak
+   may result in a nak being damaged and not being received by the
+   sender, and in this case, we will let the tcp recover by its normal
+   means.  Empirical data has shown that the likelihood of the nak being
+   lost is quite small and thus, this advisory nak option works quite
+   well.
+
+3.  Big Window Option
+
+   Currently TCP has a 16 bit window limitation built into the protocol.
+   This limits the amount of outstanding unacknowledged data to 64
+   Kbytes.  We have already seen that some networks have a pipe larger
+   than 64 Kbytes.  A T1 satellite channel and a cross country DS3
+   network with a 30ms delay have data pipes much larger than 64 Kbytes.
+   Thus, even on a perfectly conditioned link with no bandwidth wasted
+   due to errors, the data pipe will not be filled and bandwidth will be
+   wasted.  What is needed is the ability to send more unacknowledged
+   data.  This is achieved by having bigger windows, bigger than the
+   current limitation of 16 bits.  This option to expands the window
+   size to 30 bits or over 1 gigabytes by literally expanding the window
+   size mechanism currently used by TCP.  The added option contains the
+   upper 15 bits of the window while the lower 16 bits will continue to
+   go where they normally go [6] in the TCP header.
+
+   A TCP session will use the big window options only if both sides
+   agree to use them, otherwise the option is not used and the normal 16
+   bit windows will be used.  Once the 2 sides agree to use the big
+   windows then every packet thereafter will be expected to contain the
+   window option with the current upper 15 bits of the window.  The
+   negotiation to decide whether or not to use the bigger windows takes
+   place during the SYN and SYN ACK segments of the TCP connection
+
+
+
+Fox                                                             [Page 5]
+
+RFC 1106             TCP Big Window and Nak Options            June 1989
+
+
+   startup process.  The originator of the connection will include in
+   the SYN segment the following option:
+
+                    1 byte    1 byte      4 bytes
+              +=========+==========+===============+
+              +option=B + length=6 + 30 bit window +
+              +=========+==========+===============+
+
+
+   If the other end of the connection wants to use big windows it will
+   include the same option back in the SYN ACK segment that it must
+   send.  At this point, both sides have agreed to use big windows and
+   the specified windows will be used.  It should be noted that the SYN
+   and SYN ACK segments will use the small windows, and once the big
+   window option has been negotiated then the bigger windows will be
+   used.
+
+   Once both sides have agreed to use 32 bit windows the protocol will
+   function just as it did before with no difference in operation, even
+   in the event of lost packets.  This claim holds true since the
+   rcv_wnd and snd_wnd variables of tcp contain the 16 bit windows until
+   the big window option is negotiated and then they are replaced with
+   the appropriate 32 bit values.  Thus, the use of big windows becomes
+   part of the state information kept by TCP.
+
+   Other methods of expanding the windows have been presented, including
+   a window multiple [2] or streaming [5], but this solution is more
+   elegant in the sense that it is a true extension of the window that
+   one day may easily become part of the protocol and not just be an
+   option to the protocol.
+
+3.1  How does it work
+
+   Once a connection has decided to use big windows every succeeding
+   packet must contain the following option:
+
+        +=========+==========+==========================+
+        +option=C + length=4 + upper 15 bits of rcv_wnd +
+        +=========+==========+==========================+
+
+   With all segments sent, the sender supplies the size of its receive
+   window.  If the connection is only using 16 bits then this option is
+   not supplied, otherwise the lower 16 bits of the receive window go
+   into the tcp header where it currently resides [6] and the upper 15
+   bits of the window is put into the data portion of the option C.
+   When the receiver processes the packet it must first reform the
+   window and then process the packet as it would in the absence of the
+   option.
+
+
+
+Fox                                                             [Page 6]
+
+RFC 1106             TCP Big Window and Nak Options            June 1989
+
+
+3.2  Impact of changes
+
+   In implementing the first version of the big window option there was
+   very little change required to the source.  State information must be
+   added to the protocol to determine if the big window option is to be
+   used and all 16 bit variables that dealt with window information must
+   now become 32 bit quantities.  A future document will describe in
+   more detail the changes required to the 4.3 bsd tcp source code.
+   Test results of the window change only are presented in the appendix.
+   When expanding 16 bit quantities to 32 bit quantities in the TCP
+   control block in the source (4.3 bsd source) may cause the structure
+   to become larger than the mbuf used to hold the structure.  Care must
+   be taken to insure this doesn't occur with your system or
+   undetermined events may take place.
+
+4.  Effects of Big Windows and Naks when used together
+
+   With big windows alone, transfer times over a satellite were quite
+   impressive with the absence of any introduced errors.  However, when
+   an error simulator was used to create random errors during transfers,
+   performance went down extremely fast.  When the nak option was added
+   to the big window option performance in the face of errors went up
+   some but not to the level that was expected.  This section will
+   discuss some issues that were overcome to produce the results given
+   in the appendix.
+
+4.1  Window Size and Nak benefits
+
+   With out errors, the window size required to keep the data pipe full
+   is equal to the round trip delay * throughput desired, or the data
+   pipe bandwidth (called Z from now on).  This and other calculations
+   assume that processing time of the hosts is negligible.  In the event
+   of an error (without NAKs), the window size needs to become larger
+   than Z in order to keep the data pipe full while the sender is
+   waiting for the ack of the resent packet.  If the window size is
+   equaled to Z and we assume that the retransmission timer is equaled
+   to Z, then when a packet is lost, the retransmission timer will go
+   off as the last piece of data in the window is sent.  In this case,
+   the lost piece of data can be resent with no delay.  The data pipe
+   will empty out because it will take 1/2Z worth of data to get the ack
+   back to the sender, an additional 1/2Z worth of data to get the data
+   pipe refilled with new data.  This causes the required window to be
+   2Z, 1Z to keep the data pipe full during normal operations and 1Z to
+   keep the data pipe full while waiting for a lost packet to be resent
+   and acked.
+
+   If the same scenario in the last paragraph is used with the addition
+   of NAKs, the required window size still needs to be 2Z to avoid
+
+
+
+Fox                                                             [Page 7]
+
+RFC 1106             TCP Big Window and Nak Options            June 1989
+
+
+   wasting any bandwidth in the event of a dropped packet.  This appears
+   to mean that the nak option does not provide any benefits at all.
+   Testing showed that the retransmission timer was larger than the data
+   pipe and in the event of errors became much bigger than the data
+   pipe, because of the retransmission backoff.  Thus, the nak option
+   bounds the required window to 2Z such that in the event of an error
+   there is no lost bandwidth, even with the retransmission timer
+   fluctuations.  The results in the appendix shows that by using naks,
+   bandwidth waste associated with the retransmission timer facility is
+   eliminated.
+
+4.2  Congestions vs Noise
+
+   An issue that must be looked at when implementing both the NAKs and
+   big window scheme together is in the area of congestion versus lost
+   packets due to the medium, or noise.  In the recent algorithm
+   enhancements [1], slow start was introduced so that whenever a data
+   transfer is being started on a connection or right after a dropped
+   packet, the effective send window would be set to a very small size
+   (typically would equal the MSS being used).  This is done so that a
+   new connection would not cause congestion by immediately overloading
+   the network, and so that an existing connection would back off the
+   network if a packet was dropped due to congestion and allow the
+   network to clear up.  If a connection using big windows loses a
+   packet due to the medium (a packet corrupted by an error) the last
+   thing that should be done is to close the send window so that the
+   connection can only send 1 packet and must use the slow start
+   algorithm to slowly work itself back up to sending full windows worth
+   of data.  This algorithm would quickly limit the usefulness of the
+   big window and nak options over lossy links.
+
+   On the other hand, if a packet was dropped due to congestion and the
+   sender assumes the packet was dropped because of noise the sender
+   will continue sending large amounts of data.  This action will cause
+   the congestion to continue, more packets will be dropped, and that
+   part of the network will collapse.  In this instance, the sender
+   would want to back off from sending at the current window limit.
+   Using the current slow start mechanism over a satellite builds up the
+   window too slowly [1].  Possibly a better solution would be for the
+   window to be opened 2*Rlog2(W) instead of R*log2(W) [1] (open window
+   by 2 packets instead of 1 for each acked packet).  This will reduce
+   the wasted bandwidth by opening the window much quicker while giving
+   the network a chance to clear up.  More experimentation is necessary
+   to find the optimal rate of opening the window, especially when large
+   windows are being used.
+
+   The current recommendation for TCP is to use the slow start mechanism
+   in the event of any lost packet.  If an application knows that it
+
+
+
+Fox                                                             [Page 8]
+
+RFC 1106             TCP Big Window and Nak Options            June 1989
+
+
+   will be using a satellite with a high error rate, it doesn't make
+   sense to force it to use the slow start mechanism for every dropped
+   packet.  Instead, the application should be able to choose what
+   action should happen in the event of a lost packet.  In the BSD
+   environment, a setsockopt call should be provided so that the
+   application may inform TCP to handle lost packets in a special way
+   for this particular connection.  If the known error rate of a link is
+   known to be small, then by using slow start with modified rate from
+   above, will cause the amount of bandwidth loss to be very small in
+   respect to the amount of bandwidth actually utilized.  In this case,
+   the setsockopt call should not be used.  What is really needed is a
+   way for a host to determine if a packet or packets are being dropped
+   due to congestion or noise.  Then, the host can choose to do the
+   right thing.  This will require a mechanism like source quench to be
+   used.  For this to happen more experimentation is necessary to
+   determine a solid definition on the use of this mechanism.  Now it is
+   believed by some that using source quench to avoid congestion only
+   adds to the problem, not help suppress it.
+
+   The TCP used to gather the results in the appendix for the big window
+   with nak experiment, assumed that lost packets were the result of
+   noise and not congestion.  This assumption was used to show how to
+   make the current TCP work in such an environment.  The actual
+   satellite used in the experiment (when the satellite simulator was
+   not used) only experienced an error rate around 10e-10.  With this
+   error rate it is suggested that in practice when big windows are used
+   over the link, TCP should use the slow start mechanism for all lost
+   packets with the 2*Rlog2(W) rate discussed above.  Under most
+   situations when long delay networks are being used (transcontinental
+   DS3 networks using fiber with very low error rates, or satellite
+   links with low error rates) big windows and naks should be used with
+   the assumption that lost packets are the result of congestion until a
+   better algorithm is devised [7].
+
+   Another problem noticed, while testing the affects of slow start over
+   a satellite link, was at times, the retransmission timer was set so
+   restrictive, that milliseconds before a naked packet's ack is
+   received the retransmission timer would go off due to a timed packet
+   within the send window.  The timer was set at the round trip delay of
+   the network allowing no time for packet processing.  If this timer
+   went off due to congestion then backing off is the right thing to do,
+   otherwise to avoid the scenario discovered by experimentation, the
+   transmit timer should be set a little longer so that the
+   retransmission timer does not go off too early.  Care must be taken
+   to make sure the right thing is done in the implementation in
+   question so that a packet isn't retransmitted too soon, and blamed on
+   congestion when in fact, the ack is on its way.
+
+
+
+
+Fox                                                             [Page 9]
+
+RFC 1106             TCP Big Window and Nak Options            June 1989
+
+
+4.3  Duplicate Acks
+
+   Another problem found with the 4.3bsd implementation is in the area
+   of duplicate acks.  When the sender of data receives a certain number
+   of acks (3 in the current Berkeley release) that acknowledge
+   previously acked data before, it then assumes that a packet has been
+   lost and will resend the one packet assumed lost, and close its send
+   window as if the network is congested and the slow start algorithm
+   mention above will be used to open the send window.  This facility is
+   no longer needed since the sender can use the reception of a nak as
+   its indicator that a particular packet was dropped.  If the nak
+   packet is lost then the retransmit timer will go off and the packet
+   will be retransmitted by normal means.  If a senders algorithm
+   continues to count duplicate acks the sender will find itself
+   possibly receiving many duplicate acks after it has already resent
+   the packet due to a nak being received because of the large size of
+   the data pipe.  By receiving all of these duplicate acks the sender
+   may find itself doing nothing but resending the same packet of data
+   unnecessarily while keeping the send window closed for absolutely no
+   reason.  By removing this feature of the implementation a user can
+   expect to find a satellite connection working much better in the face
+   of errors and other connections should not see any performance loss,
+   but a slight improvement in performance if anything at all.
+
+5.  Conclusion
+
+   This paper has described two new options that if used will make TCP a
+   more efficient protocol in the face of errors and a more efficient
+   protocol over networks that have a high bandwidth*delay product
+   without decreasing performance over more common networks.  If a
+   system that implements the options talks with one that does not, the
+   two systems should still be able to communicate with no problems.
+   This assumes that the system doesn't use the option numbers defined
+   in this paper in some other way or doesn't panic when faced with an
+   option that the machine does not implement.  Currently at NASA, there
+   are many machines that do not implement either option and communicate
+   just fine with the systems that do implement them.
+
+   The drive for implementing big windows has been the direct result of
+   trying to make TCP more efficient over large delay networks [2,3,4,5]
+   such as a T1 satellite.  However, another practical use of large
+   windows is becoming more apparent as the local area networks being
+   developed are becoming faster and supporting much larger MTU's.
+   Hyperchannel, for instances, has been stated to be able to support 1
+   Mega bit MTU's in their new line of products.  With the current
+   implementation of TCP, efficient use of hyperchannel is not utilized
+   as it should because the physical mediums MTU is larger than the
+   maximum window of the protocol being used.  By increasing the TCP
+
+
+
+Fox                                                            [Page 10]
+
+RFC 1106             TCP Big Window and Nak Options            June 1989
+
+
+   window size, better utilization of networks like hyperchannel will be
+   gained instantly because the sender can send 64 Kbyte packets (IP
+   limitation) but not have to operate in a stop and wait fashion.
+   Future work is being started to increase the IP maximum datagram size
+   so that even better utilization of fast local area networks will be
+   seen by having the TCP/IP protocols being able to send large packets
+   over mediums with very large MTUs.  This will hopefully, eliminate
+   the network protocol as the bottleneck in data transfers while
+   workstations and workstation file system technology advances even
+   more so, than it already has.
+
+   An area of concern when using the big window mechanism is the use of
+   machine resources.  When running over a satellite and a packet is
+   dropped such that 2Z (where Z is the round trip delay) worth of data
+   is unacknowledged, both ends of the connection need to be able to
+   buffer the data using machine mbufs (or whatever mechanism the
+   machine uses), usually a valuable and scarce commodity.  If the
+   window size is not chosen properly, some machines will crash when the
+   memory is all used up, or it will keep other parts of the system from
+   running.  Thus, setting the window to some fairly large arbitrary
+   number is not a good idea, especially on a general purpose machine
+   where many users log on at any time.  What is currently being
+   engineered at NASA is the ability for certain programs to use the
+   setsockopt feature or 4.3bsd asking to use big windows such that the
+   average user may not have access to the large windows, thus limiting
+   the use of big windows to applications that absolutely need them and
+   to protect a valuable system resource.
+
+6.  References
+
+  [1]  Jacobson, V., "Congestion Avoidance and Control", SIGCOMM 88,
+       Stanford, Ca., August 1988.
+
+  [2]  Jacobson, V., and R. Braden, "TCP Extensions for Long-Delay
+       Paths", LBL, USC/Information Sciences Institute, RFC 1072,
+       October 1988.
+
+  [3]  Cheriton, D., "VMTP: Versatile Message Transaction Protocol", RFC
+       1045, Stanford University, February 1988.
+
+  [4]  Clark, D., M. Lambert, and L. Zhang, "NETBLT: A Bulk Data
+       Transfer Protocol", RFC 998, MIT, March 1987.
+
+  [5]  Fox, R., "Draft of Proposed Solution for High Delay Circuit File
+       Transfer", GE/NAS Internal Document, March 1988.
+
+  [6]  Postel, J., "Transmission Control Protocol -  DARPA Internet
+       Program Protocol Specification",  RFC 793, DARPA, September 1981.
+
+
+
+Fox                                                            [Page 11]
+
+RFC 1106             TCP Big Window and Nak Options            June 1989
+
+
+  [7]  Leiner, B., "Critical Issues in High Bandwidth Networking", RFC
+       1077, DARPA, November 1989.
+
+7.  Appendix
+
+   Both options have been implemented and tested.  Contained in this
+   section is some performance gathered to support the use of these two
+   options.  The satellite channel used was a 1.544 Mbit link with a
+   580ms round trip delay.  All values are given as units of bytes.
+
+
+   TCP with Big Windows, No Naks:
+
+
+               |---------------transfer rates----------------------|
+   Window Size |  no error  |  10e-7 error rate | 10e-6 error rate |
+   -----------------------------------------------------------------
+     64K       |   94K      |      53K          |      14K         |
+   -----------------------------------------------------------------
+     72K       |   106K     |      51K          |      15K         |
+   -----------------------------------------------------------------
+     80K       |   115K     |      42K          |      14K         |
+   -----------------------------------------------------------------
+     92K       |   115K     |      43K          |      14K         |
+    -----------------------------------------------------------------
+     100K      |   135K     |      66K          |      15K         |
+   -----------------------------------------------------------------
+     112K      |   126K     |      53K          |      17K         |
+   -----------------------------------------------------------------
+     124K      |   154K     |      45K          |      14K         |
+   -----------------------------------------------------------------
+     136K      |   160K     |      66K          |      15K         |
+   -----------------------------------------------------------------
+     156K      |   167K     |      45K          |      14K         |
+   -----------------------------------------------------------------
+                                Figure 1.
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Fox                                                            [Page 12]
+
+RFC 1106             TCP Big Window and Nak Options            June 1989
+
+
+   TCP with Big Windows, and Naks:
+
+
+               |---------------transfer rates----------------------|
+   Window Size |  no error  |  10e-7 error rate | 10e-6 error rate |
+   -----------------------------------------------------------------
+     64K       |   95K      |      83K          |      43K         |
+   -----------------------------------------------------------------
+     72K       |   104K     |      87K          |      49K         |
+   -----------------------------------------------------------------
+     80K       |   117K     |      96K          |      62K         |
+   -----------------------------------------------------------------
+     92K       |   124K     |      119K         |      39K         |
+   -----------------------------------------------------------------
+     100K      |   140K     |      124K         |      35K         |
+   -----------------------------------------------------------------
+     112K      |   151K     |      126K         |      53K         |
+   -----------------------------------------------------------------
+     124K      |   160K     |      140K         |      36K         |
+   -----------------------------------------------------------------
+     136K      |   167K     |      148K         |      38K         |
+   -----------------------------------------------------------------
+     156K      |   167K     |      160K         |      38K         |
+   -----------------------------------------------------------------
+                                Figure 2.
+
+   With a 10e-6 error rate, many naks as well as data packets were
+   dropped, causing the wild swing in transfer times.  Also, please note
+   that the machines used are SGI Iris 2500 Turbos with the 3.6 OS with
+   the new TCP enhancements.  The performance associated with the Irises
+   are slower than a Sun 3/260, but due to some source code restrictions
+   the Iris was used.  Initial results on the Sun showed slightly higher
+   performance and less variance.
+
+Author's Address
+
+   Richard Fox
+   950 Linden #208
+   Sunnyvale, Cal, 94086
+
+   EMail: rfox@tandem.com
+
+
+
+
+
+
+
+
+
+
+Fox                                                            [Page 13]
+
--- a/kernel/picotcp/RFC/rfc1110.txt
+++ b/kernel/picotcp/RFC/rfc1110.txt
@ -0,0 +1,171 @@
+
+
+
+
+
+
+Network Working Group                                        A. McKenzie
+Request for Comments: 1110                                       BBN STC
+                                                             August 1989
+
+
+                A Problem with the TCP Big Window Option
+
+Status of this Memo
+
+   This memo comments on the TCP Big Window option described in RFC
+   1106.  Distribution of this memo is unlimited.
+
+Abstract
+
+   The TCP Big Window option discussed in RFC 1106 will not work
+   properly in an Internet environment which has both a high bandwidth *
+   delay product and the possibility of disordering and duplicating
+   packets.  In such networks, the window size must not be increased
+   without a similar increase in the sequence number space.  Therefore,
+   a different approach to big windows should be taken in the Internet.
+
+Discussion
+
+   TCP was designed to work in a packet store-and-forward environment
+   characterized by the possibility of packet loss, packet disordering,
+   and packet duplication.  Packet loss can occur, for example, by a
+   congested network element discarding a packet.  Packet disordering
+   can occur, for example, by packets of a TCP connection being
+   arbitrarily transmitted partially over a low bandwidth terrestrial
+   path and partially over a high bandwidth satellite path.  Packet
+   duplication can occur, for example, when two directly-connected
+   network elements use a reliable link protocol and the link goes down
+   after the receiver correctly receives a packet but before the
+   transmitter receives an acknowledgement for the packet; the
+   transmitter and receiver now each take responsibility for attempting
+   to deliver the same packet to its ultimate destination.
+
+   TCP has the task of recreating at the destination an exact copy of
+   the data stream generated at the source, in the same order and with
+   no gaps or duplicates.  The mechanism used to accomplish this task is
+   to assign a "unique" sequence number to each byte of data at its
+   source, and to sort the bytes at the destination according to the
+   sequence number.  The sorting operation corrects any disordering.  An
+   acknowledgement, timeout, and retransmission scheme corrects for data
+   loss.  The uniqueness of the sequence number corrects for data
+   duplication.
+
+   As a practical matter, however, the sequence number is not unique; it
+
+
+
+McKenzie                                                        [Page 1]
+
+RFC 1110           Comments on TCP Big Window Option         August 1989
+
+
+   is contained in a 32-bit field and therefore "wraps around" after the
+   transmission of 2**32 bytes of data.  Two additional mechanisms are
+   used to insure the effective uniqueness of sequence numbers; these
+   are the TCP transmission window and bounds on packet lifetime within
+   the Internet, including the IP Time-to-Live (TTL).  The transmission
+   window specifies the maximum number of bytes which may be sent by the
+   source in one source-destination roundtrip time.  Since the TCP
+   transmission window is specified by 16 bits, which is 1/65536 of the
+   sequence number space, a sequence number will not be reused (used to
+   number another byte) for 65,536 roundtrip times.  So long as the
+   combination of gateway action on the IP TTL and holding times within
+   the individual networks which interconnect the gateways do not allow
+   a packet's lifetime to exceed 65,536 roundtrip times, each sequence
+   number is effectively unique.  It was believed by the TCP designers
+   that the networks and gateways forming the internet would meet this
+   constraint, and such has been the case.
+
+   The proposed TCP Big Window option, as described in RFC 1106, expands
+   the size of the window specification to 30 bits, while leaving the
+   sequence number space unchanged.  Thus, a sequence number can be
+   reused after 4 roundtrip times.  Further, the Nak option allows a
+   packet to be retransmitted (i.e., potentially duplicated) by the
+   source after only one roundtrip time.  Thus, if a packet becomes
+   "lost" in the Internet for only about 5 roundtrip times it may be
+   delivered when its sequence number again lies within the window,
+   albeit a later cycle of the window.  In this case, TCP will not
+   necessarily recreate at the destination an exact copy of the data
+   stream generated at the source; it may replace some data with earlier
+   data.
+
+   Of course, the problem described above results from the storage of
+   the "lost" packet within the net, and its subsequent out-of-order
+   delivery.  RFC 1106 seems to describe use of the proposed options in
+   an isolated satellite network.  We may hypothesize that this network
+   is memoryless, and thus cannot deliver packets out of order; it
+   either delivers a packet in order or loses it.  If this is the case,
+   then there is no problem with the proposed options.  The Internet,
+   however, can deliver packets out of order, and this will likely
+   continue to be true even if gigabit links become part of the
+   Internet.  Therefore, the approach described in RFC 1106 cannot be
+   adopted for general Internet use.
+
+
+
+
+
+
+
+
+
+
+McKenzie                                                        [Page 2]
+
+RFC 1110           Comments on TCP Big Window Option         August 1989
+
+
+Author's Address
+
+   Alex McKenzie
+   Bolt Beranek and Newman Inc.
+   10 Moulton Street
+   Cambridge, MA 02238
+
+   Phone: (617) 873-2962
+
+   EMail: MCKENZIE@BBN.COM
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+McKenzie                                                        [Page 3]
+
--- a/kernel/picotcp/RFC/rfc1122.txt
+++ b/kernel/picotcp/RFC/rfc1122.txt
--- a/kernel/picotcp/RFC/rfc1123.txt
+++ b/kernel/picotcp/RFC/rfc1123.txt
--- a/kernel/picotcp/RFC/rfc1146.txt
+++ b/kernel/picotcp/RFC/rfc1146.txt
@ -0,0 +1,283 @@
+
+
+
+
+
+
+Network Working Group                                          J. Zweig
+Request for Comments: 1146                                         UIUC
+Obsoletes: RFC 1145                                        C. Partridge
+                                                                    BBN
+                                                             March 1990
+
+
+                     TCP Alternate Checksum Options
+
+Status of This Memo
+
+   This memo suggests a pair of TCP options to allow use of alternate
+   data checksum algorithms in the TCP header.  The use of these options
+   is experimental, and not recommended for production use.
+
+   Note:  This RFC corrects errors introduced in the editing process in
+   RFC 1145.
+
+   Distribution of this memo is unlimited.
+
+Introduction
+
+   Some members of the networking community have expressed interest in
+   using checksum-algorithms with different error detection and
+   correction properties than the standard TCP checksum.  The option
+   described in this memo provides a mechanism to negotiate the use of
+   an alternate checksum at connection-establishment time, as well as a
+   mechanism to carry additional checksum information for algorithms
+   that utilize checksums that are longer than 16 bits.
+
+Definition of the Options
+
+   The TCP Alternate Checksum Request Option may be sent in a SYN
+   segment by a TCP to indicate that the TCP is prepared to both
+   generate and receive checksums based on an alternate algorithm.
+   During communication, the alternate checksum replaces the regular TCP
+   checksum in the checksum field of the TCP header.  Should the
+   alternate checksum require more than 2 octets to transmit, the
+   checksum may either be moved into a TCP Alternate Checksum Data
+   Option and the checksum field of the TCP header be sent as 0, or the
+   data may be split between the header field and the option.  Alternate
+   checksums are computed over the same data as the regular TCP checksum
+   (see TCP Alternate Checksum Data Option discussion below).
+
+TCP Alternate Checksum Request Option
+
+   The format of the TCP Alternate Checksum Request Option is:
+
+
+
+
+Zweig & Partridge                                               [Page 1]
+
+RFC 1146             TCP Alternate Checksum Options           March 1990
+
+
+                 +----------+----------+----------+
+                 |  Kind=14 | Length=3 |  chksum  |
+                 +----------+----------+----------+
+
+   Here chksum is a number identifying the type of checksum to be used.
+
+   The currently defined values of chksum are:
+
+                   0  -- TCP checksum
+                   1  -- 8-bit  Fletcher's algorithm (see Appendix I)
+                   2  -- 16-bit Fletcher's algorithm (see Appendix II)
+
+   Note that the 8-bit Fletcher algorithm gives a 16-bit checksum and
+   the 16-bit algorithm gives a 32-bit checksum.
+
+   Alternate checksum negotiation proceeds as follows:
+
+      A SYN segment used to originate a connection may contain the
+      Alternate Checksum Request Option, which specifies an alternate
+      checksum-calculation algorithm to be used for the connection.  The
+      acknowledging SYN-ACK segment may also carry the option.
+
+      If both SYN segments carry the Alternate Checksum Request option,
+      and both specify the same algorithm, that algorithm must be used
+      for the remainder of the connection.  Otherwise, the standard TCP
+      checksum algorithm must be used for the entire connection.  Thus,
+      for example, if one TCP specifies type 1 checksums, and the other
+      specifies type 2 checksums, then they will use type 0 (the regular
+      TCP checksum).  Note that in practice, one TCP will typically be
+      responding to the other's SYN, and thus either accepting or
+      rejecting the proposed alternate checksum algorithm.
+
+      Any segment with the SYN bit set must always use the standard TCP
+      checksum algorithm.  Thus the SYN segment will always be
+      understood by the receiving TCP.  The alternate checksum must not
+      be used until the first non-SYN segment.  In addition, because RST
+      segments may also be received or sent without complete state
+      information, any segment with the RST bit set must use the
+      standard TCP checksum.
+
+      The option may not be sent in any segment that does not have the
+      SYN bit set.
+
+      An implementation of TCP which does not support the option should
+      silently ignore it (as RFC 1122 requires).  Ignoring the option
+      will force any TCP attempting to use an alternate checksum to use
+      the standard TCP checksum algorithm, thus ensuring
+      interoperability.
+
+
+
+Zweig & Partridge                                               [Page 2]
+
+RFC 1146             TCP Alternate Checksum Options           March 1990
+
+
+TCP Alternate Checksum Data Option
+
+   The format of the TCP Alternate Checksum Data Option is:
+
+                +---------+---------+---------+     +---------+
+                | Kind=15 |Length=N |  data   | ... |  data   |
+                +---------+---------+---------+     +---------+
+
+   This field is used only when the alternate checksum that is
+   negotiated is longer than 16 bits.  These checksums will not fit in
+   the checksum field of the TCP header and thus at least part of them
+   must be put in an option.  Whether the checksum is split between the
+   checksum field in the TCP header and the option or the entire
+   checksum is placed in the option is determined on a checksum by
+   checksum basis.
+
+   The length of this option will depend on the choice of alternate
+   checksum algorithm for this connection.
+
+   While computing the alternate checksum, the TCP checksum field and
+   the data portion TCP Alternate Checksum Data Option are replaced with
+   zeros.
+
+   An otherwise acceptable segment carrying this option on a connection
+   using a 16-bit checksum algorithm, or carrying this option with an
+   inappropriate number of data octets for the chosen alternate checksum
+   algorithm is in error and must be discarded; a RST-segment must be
+   generated, and the connection aborted.
+
+   Note the requirement above that RST and SYN segments must always use
+   the standard TCP checksum.
+
+APPENDIX I:  The 8-bit Fletcher Checksum Algorithm
+
+   The 8-bit Fletcher Checksum Algorithm is calculated over a sequence
+   of data octets (call them D[1] through D[N]) by maintaining 2
+   unsigned 1's-complement 8-bit accumulators A and B whose contents are
+   initially zero, and performing the following loop where i ranges from
+   1 to N:
+
+           A := A + D[i]
+           B := B + A
+
+   It can be shown that at the end of the loop A will contain the 8-bit
+   1's complement sum of all octets in the datagram, and that B will
+   contain (N)D[1] + (N-1)D[2] + ... + D[N].
+
+   The octets covered by this algorithm should be the same as those over
+
+
+
+Zweig & Partridge                                               [Page 3]
+
+RFC 1146             TCP Alternate Checksum Options           March 1990
+
+
+   which the standard TCP checksum calculation is performed, with the
+   pseudoheader being D[1] through D[12] and the TCP header beginning at
+   D[13].  Note that, for purposes of the checksum computation, the
+   checksum field itself must be equal to zero.
+
+   At the end of the loop, the A goes in the first byte of the TCP
+   checksum and B goes in the second byte.
+
+   Note that, unlike the OSI version of the Fletcher checksum, this
+   checksum does not adjust the check bytes so that the receiver
+   checksum is 0.
+
+   There are a number of much faster algorithms for calculating the two
+   octets of the 8-bit Fletcher checksum.  For more information see
+   [Sklower89], [Nakassis88] and [Fletcher82].  Naturally, any
+   computation which computes the same number as would be calculated by
+   the loop above may be used to calculate the checksum.  One advantage
+   of the Fletcher algorithms over the standard TCP checksum algorithm
+   is the ability to detect the transposition of octets/words of any
+   size within a datagram.
+
+APPENDIX II:  The 16-bit Fletcher Checksum Algorithm
+
+   The 16-bit Fletcher Checksum algorithm proceeds in precisely the same
+   manner as the 8-bit checksum algorithm,, except that A, B and the
+   D[i] are 16-bit quantities.  It is necessary (as it is with the
+   standard TCP checksum algorithm) to pad a datagram containing an odd
+   number of octets with a zero octet.
+
+   Result A should be placed in the TCP header checksum field and Result
+   B should appear in an TCP Alternate Checksum Data option.  This
+   option must be present in every TCP header. The two bytes reserved
+   for B should be set to zero during the calculation of the checksum.
+
+   The checksum field of the TCP header shall contain the contents of A
+   at the end of the loop.  The TCP Alternate Checksum Data option must
+   be present and contain the contents of B at the end of the loop.
+
+BIBLIOGRAPHY:
+
+   [BrBoPa89]     Braden, R., Borman, D., and C. Partridge, "Computing
+                  the Internet Checksum", ACM Computer Communication
+                  Review, Vol. 19, No. 2, pp. 86-101, April 1989.
+                  [Note that this includes Plummer, W. "IEN-45: TCP
+                  Checksum Function Design" (1978) as an appendix.]
+
+   [Fletcher82]   Fletcher, J., "An Arithmetic Checksum for Serial
+                  Transmissions", IEEE Transactions on Communication,
+
+
+
+Zweig & Partridge                                               [Page 4]
+
+RFC 1146             TCP Alternate Checksum Options           March 1990
+
+
+                  Vol. COM-30, No. 1, pp. 247-252, January 1982.
+
+   [Nakassis88]   Nakassis, T., "Fletcher's Error Detection Algorithm:
+                  How to implement it efficiently and how to avoid the
+                  most common pitfalls", ACM Computer Communication
+                  Review, Vol. 18, No. 5, pp. 86-94, October 1988.
+
+   [Sklower89]    Sklower, K., "Improving the Efficiency of the OSI
+                  Checksum Calculation", ACM Computer Communication
+                  Review, Vol. 19, No. 5, pp. 32-43, October 1989.
+
+Security Considerations
+
+   Security issues are not addressed in this memo.
+
+Authors' Addresses
+
+   Johnny Zweig
+   Digital Computer Lab
+   University of Illinois (UIUC)
+   1304 West Springfield Avenue
+   CAMPUS MC 258
+   Urbana, IL 61801
+
+   Phone:  (217) 333-7937
+
+   EMail:  zweig@CS.UIUC.EDU
+
+
+   Craig Partridge
+   Bolt Beranek and Newman Inc.
+   50 Moulton Street
+   Cambridge, MA 02138
+
+   Phone: (617) 873-2459
+
+   EMail: craig@BBN.COM
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Zweig & Partridge                                               [Page 5]
+
--- a/kernel/picotcp/RFC/rfc1156.txt
+++ b/kernel/picotcp/RFC/rfc1156.txt
--- a/kernel/picotcp/RFC/rfc1180.txt
+++ b/kernel/picotcp/RFC/rfc1180.txt
--- a/kernel/picotcp/RFC/rfc1185.txt
+++ b/kernel/picotcp/RFC/rfc1185.txt
--- a/kernel/picotcp/RFC/rfc1213.txt
+++ b/kernel/picotcp/RFC/rfc1213.txt
--- a/kernel/picotcp/RFC/rfc1263.txt
+++ b/kernel/picotcp/RFC/rfc1263.txt
--- a/kernel/picotcp/RFC/rfc1323.txt
+++ b/kernel/picotcp/RFC/rfc1323.txt
--- a/kernel/picotcp/RFC/rfc1332.txt
+++ b/kernel/picotcp/RFC/rfc1332.txt
@ -0,0 +1,787 @@
+
+
+
+
+
+
+Network Working Group                                        G. McGregor
+Request for Comments: 1332                                         Merit
+Obsoletes: RFC 1172                                             May 1992
+
+
+
+           The PPP Internet Protocol Control Protocol (IPCP)
+
+
+
+Status of this Memo
+
+   This RFC specifies an IAB standards track protocol for the Internet
+   community, and requests discussion and suggestions for improvements.
+   Please refer to the current edition of the "IAB Official Protocol
+   Standards" for the standardization state and status of this protocol.
+   Distribution of this memo is unlimited.
+
+Abstract
+
+   The Point-to-Point Protocol (PPP) [1] provides a standard method of
+   encapsulating Network Layer protocol information over point-to-point
+   links.  PPP also defines an extensible Link Control Protocol, and
+   proposes a family of Network Control Protocols (NCPs) for
+   establishing and configuring different network-layer protocols.
+
+   This document defines the NCP for establishing and configuring the
+   Internet Protocol [2] over PPP, and a method to negotiate and use Van
+   Jacobson TCP/IP header compression [3] with PPP.
+
+   This RFC is a product of the Point-to-Point Protocol Working Group of
+   the Internet Engineering Task Force (IETF).
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+McGregor                                                        [Page i]
+
+RFC 1332                        PPP IPCP                        May 1992
+
+
+Table of Contents
+
+
+     1.     Introduction ..........................................    1
+
+     2.     A PPP Network Control Protocol (NCP) for IP ...........    2
+        2.1       Sending IP Datagrams ............................    2
+
+     3.     IPCP Configuration Options ............................    4
+        3.1       IP-Addresses ....................................    5
+        3.2       IP-Compression-Protocol .........................    6
+        3.3       IP-Address ......................................    8
+
+     4.     Van Jacobson TCP/IP header compression ................    9
+        4.1       Configuration Option Format .....................    9
+
+     APPENDICES ...................................................   11
+
+     A.     IPCP Recommended Options ..............................   11
+
+     SECURITY CONSIDERATIONS ......................................   11
+
+     REFERENCES ...................................................   11
+
+     ACKNOWLEDGEMENTS .............................................   11
+
+     CHAIR'S ADDRESS ..............................................   12
+
+     AUTHOR'S ADDRESS .............................................   12
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+McGregor                                                       [Page ii]
+
+RFC 1332                        PPP IPCP                        May 1992
+
+
+1.  Introduction
+
+   PPP has three main components:
+
+      1. A method for encapsulating datagrams over serial links.
+
+      2. A Link Control Protocol (LCP) for establishing, configuring,
+         and testing the data-link connection.
+
+      3. A family of Network Control Protocols (NCPs) for establishing
+         and configuring different network-layer protocols.
+
+   In order to establish communications over a point-to-point link, each
+   end of the PPP link must first send LCP packets to configure and test
+   the data link.  After the link has been established and optional
+   facilities have been negotiated as needed by the LCP, PPP must send
+   NCP packets to choose and configure one or more network-layer
+   protocols.  Once each of the chosen network-layer protocols has been
+   configured, datagrams from each network-layer protocol can be sent
+   over the link.
+
+   The link will remain configured for communications until explicit LCP
+   or NCP packets close the link down, or until some external event
+   occurs (an inactivity timer expires or network administrator
+   intervention).
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+McGregor                                                        [Page 1]
+
+RFC 1332                        PPP IPCP                        May 1992
+
+
+2.  A PPP Network Control Protocol (NCP) for IP
+
+   The IP Control Protocol (IPCP) is responsible for configuring,
+   enabling, and disabling the IP protocol modules on both ends of the
+   point-to-point link.  IPCP uses the same packet exchange machanism as
+   the Link Control Protocol (LCP).  IPCP packets may not be exchanged
+   until PPP has reached the Network-Layer Protocol phase.  IPCP packets
+   received before this phase is reached should be silently discarded.
+
+   The IP Control Protocol is exactly the same as the Link Control
+   Protocol [1] with the following exceptions:
+
+   Data Link Layer Protocol Field
+
+      Exactly one IPCP packet is encapsulated in the Information field
+      of PPP Data Link Layer frames where the Protocol field indicates
+      type hex 8021 (IP Control Protocol).
+
+   Code field
+
+      Only Codes 1 through 7 (Configure-Request, Configure-Ack,
+      Configure-Nak, Configure-Reject, Terminate-Request, Terminate-Ack
+      and Code-Reject) are used.  Other Codes should be treated as
+      unrecognized and should result in Code-Rejects.
+
+   Timeouts
+
+      IPCP packets may not be exchanged until PPP has reached the
+      Network-Layer Protocol phase.  An implementation should be
+      prepared to wait for Authentication and Link Quality Determination
+      to finish before timing out waiting for a Configure-Ack or other
+      response.  It is suggested that an implementation give up only
+      after user intervention or a configurable amount of time.
+
+   Configuration Option Types
+
+      IPCP has a distinct set of Configuration Options, which are
+      defined below.
+
+2.1.  Sending IP Datagrams
+
+   Before any IP packets may be communicated, PPP must reach the
+   Network-Layer Protocol phase, and the IP Control Protocol must reach
+   the Opened state.
+
+   Exactly one IP packet is encapsulated in the Information field of PPP
+   Data Link Layer frames where the Protocol field indicates type hex
+   0021 (Internet Protocol).
+
+
+
+McGregor                                                        [Page 2]
+
+RFC 1332                        PPP IPCP                        May 1992
+
+
+   The maximum length of an IP packet transmitted over a PPP link is the
+   same as the maximum length of the Information field of a PPP data
+   link layer frame.  Larger IP datagrams must be fragmented as
+   necessary.  If a system wishes to avoid fragmentation and reassembly,
+   it should use the TCP Maximum Segment Size option [4], and MTU
+   discovery [5].
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+McGregor                                                        [Page 3]
+
+RFC 1332                        PPP IPCP                        May 1992
+
+
+3.  IPCP Configuration Options
+
+IPCP Configuration Options allow negotiatiation of desirable Internet
+Protocol parameters.  IPCP uses the same Configuration Option format
+defined for LCP [1], with a separate set of Options.
+
+The most up-to-date values of the IPCP Option Type field are specified
+in the most recent "Assigned Numbers" RFC [6].  Current values are
+assigned as follows:
+
+   1       IP-Addresses
+   2       IP-Compression-Protocol
+   3       IP-Address
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+McGregor                                                        [Page 4]
+
+RFC 1332                        PPP IPCP                        May 1992
+
+
+3.1.  IP-Addresses
+
+   Description
+
+      The use of the Configuration Option IP-Addresses has been
+      deprecated.  It has been determined through implementation
+      experience that it is difficult to ensure negotiation convergence
+      in all cases using this option.  RFC 1172 [7] provides information
+      for implementations requiring backwards compatability.  The IP-
+      Address Configuration Option replaces this option, and its use is
+      preferred.
+
+      This option SHOULD NOT be sent in a Configure-Request if a
+      Configure-Request has been received which includes either an IP-
+      Addresses or IP-Address option.  This option MAY be sent if a
+      Configure-Reject is received for the IP-Address option, or a
+      Configure-Nak is received with an IP-Addresses option as an
+      appended option.
+
+      Support for this option MAY be removed after the IPCP protocol
+      status advances to Internet Draft Standard.
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+McGregor                                                        [Page 5]
+
+RFC 1332                        PPP IPCP                        May 1992
+
+
+3.2.  IP-Compression-Protocol
+
+   Description
+
+      This Configuration Option provides a way to negotiate the use of a
+      specific compression protocol.  By default, compression is not
+      enabled.
+
+   A summary of the IP-Compression-Protocol Configuration Option format
+   is shown below.  The fields are transmitted from left to right.
+
+    0                   1                   2                   3
+    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+   |     Type      |    Length     |     IP-Compression-Protocol   |
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+   |    Data ...
+   +-+-+-+-+
+
+   Type
+
+      2
+
+   Length
+
+      >= 4
+
+   IP-Compression-Protocol
+
+      The IP-Compression-Protocol field is two octets and indicates the
+      compression protocol desired.  Values for this field are always
+      the same as the PPP Data Link Layer Protocol field values for that
+      same compression protocol.
+
+      The most up-to-date values of the IP-Compression-Protocol field
+      are specified in the most recent "Assigned Numbers" RFC [6].
+      Current values are assigned as follows:
+
+         Value (in hex)          Protocol
+
+         002d                    Van Jacobson Compressed TCP/IP
+
+   Data
+
+      The Data field is zero or more octets and contains additional data
+      as determined by the particular compression protocol.
+
+
+
+
+
+McGregor                                                        [Page 6]
+
+RFC 1332                        PPP IPCP                        May 1992
+
+
+   Default
+
+      No compression protocol enabled.
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+McGregor                                                        [Page 7]
+
+RFC 1332                        PPP IPCP                        May 1992
+
+
+3.3.  IP-Address
+
+   Description
+
+      This Configuration Option provides a way to negotiate the IP
+      address to be used on the local end of the link.  It allows the
+      sender of the Configure-Request to state which IP-address is
+      desired, or to request that the peer provide the information.  The
+      peer can provide this information by NAKing the option, and
+      returning a valid IP-address.
+
+      If negotiation about the remote IP-address is required, and the
+      peer did not provide the option in its Configure-Request, the
+      option SHOULD be appended to a Configure-Nak.  The value of the
+      IP-address given must be acceptable as the remote IP-address, or
+      indicate a request that the peer provide the information.
+
+      By default, no IP address is assigned.
+
+   A summary of the IP-Address Configuration Option format is shown
+   below.  The fields are transmitted from left to right.
+
+    0                   1                   2                   3
+    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+   |     Type      |    Length     |           IP-Address
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+           IP-Address (cont)       |
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+
+   Type
+
+      3
+
+   Length
+
+      6
+
+   IP-Address
+
+      The four octet IP-Address is the desired local address of the
+      sender of a Configure-Request.  If all four octets are set to
+      zero, it indicates a request that the peer provide the IP-Address
+      information.
+
+   Default
+
+      No IP address is assigned.
+
+
+
+McGregor                                                        [Page 8]
+
+RFC 1332                        PPP IPCP                        May 1992
+
+
+4.  Van Jacobson TCP/IP header compression
+
+Van Jacobson TCP/IP header compression reduces the size of the TCP/IP
+headers to as few as three bytes.  This can be a significant improvement
+on slow serial lines, particularly for interactive traffic.
+
+The IP-Compression-Protocol Configuration Option is used to indicate the
+ability to receive compressed packets.  Each end of the link must
+separately request this option if bi-directional compression is desired.
+
+The PPP Protocol field is set to the following values when transmitting
+IP packets:
+
+   Value (in hex)
+
+   0021      Type IP.  The IP protocol is not TCP, or the packet is a
+             fragment, or cannot be compressed.
+
+   002d      Compressed TCP.  The TCP/IP headers are replaced by the
+             compressed header.
+
+   002f      Uncompressed TCP.  The IP protocol field is replaced by
+             the slot identifier.
+
+4.1.  Configuration Option Format
+
+   A summary of the IP-Compression-Protocol Configuration Option format
+   to negotiate Van Jacobson TCP/IP header compression is shown below.
+   The fields are transmitted from left to right.
+
+    0                   1                   2                   3
+    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+   |     Type      |    Length     |     IP-Compression-Protocol   |
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+   |  Max-Slot-Id  | Comp-Slot-Id  |
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+
+   Type
+
+      2
+
+   Length
+
+      6
+
+
+
+
+
+
+McGregor                                                        [Page 9]
+
+RFC 1332                        PPP IPCP                        May 1992
+
+
+   IP-Compression-Protocol
+
+      002d (hex) for Van Jacobson Compressed TCP/IP headers.
+
+   Max-Slot-Id
+
+      The Max-Slot-Id field is one octet and indicates the maximum slot
+      identifier.  This is one less than the actual number of slots; the
+      slot identifier has values from zero to Max-Slot-Id.
+
+         Note: There may be implementations that have problems with only
+         one slot (Max-Slot-Id = 0).  See the discussion in reference
+         [3].  The example implementation in [3] will only work with 3
+         through 254 slots.
+
+   Comp-Slot-Id
+
+      The Comp-Slot-Id field is one octet and indicates whether the slot
+      identifier field may be compressed.
+
+         0  The slot identifier must not be compressed.  All compressed
+            TCP packets must set the C bit in every change mask, and
+            must include the slot identifier.
+
+         1  The slot identifer may be compressed.
+
+      The slot identifier must not be compressed if there is no ability
+      for the PPP link level to indicate an error in reception to the
+      decompression module.  Synchronization after errors depends on
+      receiving a packet with the slot identifier.  See the discussion
+      in reference [3].
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+McGregor                                                       [Page 10]
+
+RFC 1332                        PPP IPCP                        May 1992
+
+
+A.  IPCP Recommended Options
+
+   The following Configurations Options are recommended:
+
+      IP-Compression-Protocol -- with at least 4 slots, usually 16
+      slots.
+
+      IP-Address -- only on dial-up lines.
+
+
+Security Considerations
+
+   Security issues are not discussed in this memo.
+
+
+References
+
+   [1]   Simpson, W., "The Point-to-Point Protocol", RFC 1331, May 1992.
+
+   [2]   Postel, J., "Internet Protocol", RFC 791, USC/Information
+         Sciences Institute, September 1981.
+
+   [3]   Jacobson, V., "Compressing TCP/IP Headers", RFC 1144, January
+         1990.
+
+   [4]   Postel, J., "The TCP Maximum Segment Size Option and Related
+         Topics", RFC 879, USC/Information Sciences Institute, November
+         1983.
+
+   [5]   Mogul, J., and S. Deering, "Path MTU Discovery", RFC 1191,
+         November 1990.
+
+   [6]   Reynolds, J., and J. Postel, "Assigned Numbers", RFC 1060,
+         USC/Information Sciences Institute, March 1990.
+
+   [7]   Perkins, D., and R. Hobby, "Point-to-Point Protocol (PPP)
+         initial configuration options", RFC 1172, August 1990.
+
+
+Acknowledgments
+
+   Some of the text in this document is taken from RFCs 1171 & 1172, by
+   Drew Perkins of Carnegie Mellon University, and by Russ Hobby of the
+   University of California at Davis.
+
+   Information leading to the expanded IP-Compression option provided by
+   Van Jacobson at SIGCOMM '90.
+
+
+
+
+McGregor                                                       [Page 11]
+
+RFC 1332                        PPP IPCP                        May 1992
+
+
+   Bill Simpson helped with the document formatting.
+
+
+Chair's Address
+
+   The working group can be contacted via the current chair:
+
+      Brian Lloyd
+      Lloyd & Associates
+      3420 Sudbury Road
+      Cameron Park, California 95682
+
+      Phone: (916) 676-1147
+
+      EMail: brian@ray.lloyd.com
+
+
+
+Author's Address
+
+   Questions about this memo can also be directed to:
+
+      Glenn McGregor
+      Merit Network, Inc.
+      1071 Beal Avenue
+      Ann Arbor, MI 48109-2103
+
+      Phone: (313) 763-1203
+
+      EMail: Glenn.McGregor@Merit.edu
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+McGregor                                                       [Page 12]
+
--- a/kernel/picotcp/RFC/rfc1334.txt
+++ b/kernel/picotcp/RFC/rfc1334.txt
@ -0,0 +1,899 @@
+
+
+
+
+
+
+Network Working Group                                           B. Lloyd
+Request for Comments: 1334                                           L&A
+                                                              W. Simpson
+                                                              Daydreamer
+                                                            October 1992
+
+
+                      PPP Authentication Protocols
+
+Status of this Memo
+
+   This RFC specifies an IAB standards track protocol for the Internet
+   community, and requests discussion and suggestions for improvements.
+   Please refer to the current edition of the "IAB Official Protocol
+   Standards" for the standardization state and status of this protocol.
+   Distribution of this memo is unlimited.
+
+Abstract
+
+   The Point-to-Point Protocol (PPP) [1] provides a standard method of
+   encapsulating Network Layer protocol information over point-to-point
+   links.  PPP also defines an extensible Link Control Protocol, which
+   allows negotiation of an Authentication Protocol for authenticating
+   its peer before allowing Network Layer protocols to transmit over the
+   link.
+
+   This document defines two protocols for Authentication: the Password
+   Authentication Protocol and the Challenge-Handshake Authentication
+   Protocol.  This memo is the product of the Point-to-Point Protocol
+   Working Group of the Internet Engineering Task Force (IETF).
+   Comments on this memo should be submitted to the ietf-ppp@ucdavis.edu
+   mailing list.
+
+Table of Contents
+
+   1.  Introduction ...............................................    2
+   1.1 Specification Requirements .................................    2
+   1.2 Terminology ................................................    3
+   2. Password Authentication Protocol ............................    3
+   2.1 Configuration Option Format ................................    4
+   2.2 Packet Format ..............................................    5
+   2.2.1 Authenticate-Request .....................................    5
+   2.2.2 Authenticate-Ack and Authenticate-Nak ....................    7
+   3. Challenge-Handshake Authentication Protocol..................    8
+   3.1 Configuration Option Format ................................    9
+   3.2 Packet Format ..............................................   10
+   3.2.1 Challenge and Response ...................................   11
+   3.2.2 Success and Failure ......................................   13
+
+
+
+Lloyd & Simpson                                                 [Page 1]
+
+RFC 1334                   PPP Authentication               October 1992
+
+
+   SECURITY CONSIDERATIONS ........................................   14
+   REFERENCES .....................................................   15
+   ACKNOWLEDGEMENTS ...............................................   16
+   CHAIR'S ADDRESS ................................................   16
+   AUTHOR'S ADDRESS ...............................................   16
+
+1.  Introduction
+
+   PPP has three main components:
+
+      1. A method for encapsulating datagrams over serial links.
+
+      2. A Link Control Protocol (LCP) for establishing, configuring,
+         and testing the data-link connection.
+
+      3. A family of Network Control Protocols (NCPs) for establishing
+         and configuring different network-layer protocols.
+
+   In order to establish communications over a point-to-point link, each
+   end of the PPP link must first send LCP packets to configure the data
+   link during Link Establishment phase.  After the link has been
+   established, PPP provides for an optional Authentication phase before
+   proceeding to the Network-Layer Protocol phase.
+
+   By default, authentication is not mandatory.  If authentication of
+   the link is desired, an implementation MUST specify the
+   Authentication-Protocol Configuration Option during Link
+   Establishment phase.
+
+   These authentication protocols are intended for use primarily by
+   hosts and routers that connect to a PPP network server via switched
+   circuits or dial-up lines, but might be applied to dedicated links as
+   well.  The server can use the identification of the connecting host
+   or router in the selection of options for network layer negotiations.
+
+   This document defines the PPP authentication protocols.  The Link
+   Establishment and Authentication phases, and the Authentication-
+   Protocol Configuration Option, are defined in The Point-to-Point
+   Protocol (PPP) [1].
+
+1.1.  Specification Requirements
+
+   In this document, several words are used to signify the requirements
+   of the specification.  These words are often capitalized.
+
+   MUST
+      This word, or the adjective "required", means that the definition
+      is an absolute requirement of the specification.
+
+
+
+Lloyd & Simpson                                                 [Page 2]
+
+RFC 1334                   PPP Authentication               October 1992
+
+
+   MUST NOT
+      This phrase means that the definition is an absolute prohibition
+      of the specification.
+
+   SHOULD
+      This word, or the adjective "recommended", means that there may
+      exist valid reasons in particular circumstances to ignore this
+      item, but the full implications should be understood and carefully
+      weighed before choosing a different course.
+
+   MAY
+      This word, or the adjective "optional", means that this item is
+      one of an allowed set of alternatives.  An implementation which
+      does not include this option MUST be prepared to interoperate with
+      another implementation which does include the option.
+
+1.2.  Terminology
+
+   This document frequently uses the following terms:
+
+   authenticator
+      The end of the link requiring the authentication.  The
+      authenticator specifies the authentication protocol to be used in
+      the Configure-Request during Link Establishment phase.
+
+   peer
+      The other end of the point-to-point link; the end which is being
+      authenticated by the authenticator.
+
+   silently discard
+      This means the implementation discards the packet without further
+      processing.  The implementation SHOULD provide the capability of
+      logging the error, including the contents of the silently
+      discarded packet, and SHOULD record the event in a statistics
+      counter.
+
+2.  Password Authentication Protocol
+
+   The Password Authentication Protocol (PAP) provides a simple method
+   for the peer to establish its identity using a 2-way handshake.  This
+   is done only upon initial link establishment.
+
+   After the Link Establishment phase is complete, an Id/Password pair
+   is repeatedly sent by the peer to the authenticator until
+   authentication is acknowledged or the connection is terminated.
+
+   PAP is not a strong authentication method.  Passwords are sent over
+   the circuit "in the clear", and there is no protection from playback
+
+
+
+Lloyd & Simpson                                                 [Page 3]
+
+RFC 1334                   PPP Authentication               October 1992
+
+
+   or repeated trial and error attacks.  The peer is in control of the
+   frequency and timing of the attempts.
+
+   Any implementations which include a stronger authentication method
+   (such as CHAP, described below) MUST offer to negotiate that method
+   prior to PAP.
+
+   This authentication method is most appropriately used where a
+   plaintext password must be available to simulate a login at a remote
+   host.  In such use, this method provides a similar level of security
+   to the usual user login at the remote host.
+
+      Implementation Note: It is possible to limit the exposure of the
+      plaintext password to transmission over the PPP link, and avoid
+      sending the plaintext password over the entire network.  When the
+      remote host password is kept as a one-way transformed value, and
+      the algorithm for the transform function is implemented in the
+      local server, the plaintext password SHOULD be locally transformed
+      before comparison with the transformed password from the remote
+      host.
+
+2.1.  Configuration Option Format
+
+   A summary of the Authentication-Protocol Configuration Option format
+   to negotiate the Password Authentication Protocol is shown below.
+   The fields are transmitted from left to right.
+
+    0                   1                   2                   3
+    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+   |     Type      |    Length     |     Authentication-Protocol   |
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+
+   Type
+
+      3
+
+   Length
+
+      4
+
+   Authentication-Protocol
+
+      c023 (hex) for Password Authentication Protocol.
+
+   Data
+
+      There is no Data field.
+
+
+
+Lloyd & Simpson                                                 [Page 4]
+
+RFC 1334                   PPP Authentication               October 1992
+
+
+2.2.  Packet Format
+
+   Exactly one Password Authentication Protocol packet is encapsulated
+   in the Information field of a PPP Data Link Layer frame where the
+   protocol field indicates type hex c023 (Password Authentication
+   Protocol).  A summary of the PAP packet format is shown below.  The
+   fields are transmitted from left to right.
+
+    0                   1                   2                   3
+    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+   |     Code      |  Identifier   |            Length             |
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+   |    Data ...
+   +-+-+-+-+
+
+   Code
+
+      The Code field is one octet and identifies the type of PAP packet.
+      PAP Codes are assigned as follows:
+
+         1       Authenticate-Request
+         2       Authenticate-Ack
+         3       Authenticate-Nak
+
+   Identifier
+
+      The Identifier field is one octet and aids in matching requests
+      and replies.
+
+   Length
+
+      The Length field is two octets and indicates the length of the PAP
+      packet including the Code, Identifier, Length and Data fields.
+      Octets outside the range of the Length field should be treated as
+      Data Link Layer padding and should be ignored on reception.
+
+   Data
+
+      The Data field is zero or more octets.  The format of the Data
+      field is determined by the Code field.
+
+2.2.1.  Authenticate-Request
+
+   Description
+
+      The Authenticate-Request packet is used to begin the Password
+      Authentication Protocol.  The link peer MUST transmit a PAP packet
+
+
+
+Lloyd & Simpson                                                 [Page 5]
+
+RFC 1334                   PPP Authentication               October 1992
+
+
+      with the Code field set to 1 (Authenticate-Request) during the
+      Authentication phase.  The Authenticate-Request packet MUST be
+      repeated until a valid reply packet is received, or an optional
+      retry counter expires.
+
+      The authenticator SHOULD expect the peer to send an Authenticate-
+      Request packet.  Upon reception of an Authenticate-Request packet,
+      some type of Authenticate reply (described below) MUST be
+      returned.
+
+         Implementation Note: Because the Authenticate-Ack might be
+         lost, the authenticator MUST allow repeated Authenticate-
+         Request packets after completing the Authentication phase.
+         Protocol phase MUST return the same reply Code returned when
+         the Authentication phase completed (the message portion MAY be
+         different).  Any Authenticate-Request packets received during
+         any other phase MUST be silently discarded.
+
+         When the Authenticate-Nak is lost, and the authenticator
+         terminates the link, the LCP Terminate-Request and Terminate-
+         Ack provide an alternative indication that authentication
+         failed.
+
+   A summary of the Authenticate-Request packet format is shown below.
+   The fields are transmitted from left to right.
+
+    0                   1                   2                   3
+    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+   |     Code      |  Identifier   |            Length             |
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+   | Peer-ID Length|  Peer-Id ...
+   +-+-+-+-+-+-+-+-+-+-+-+-+
+   | Passwd-Length |  Password  ...
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+
+
+   Code
+
+      1 for Authenticate-Request.
+
+   Identifier
+
+      The Identifier field is one octet and aids in matching requests
+      and replies.  The Identifier field MUST be changed each time an
+      Authenticate-Request packet is issued.
+
+
+
+
+
+
+Lloyd & Simpson                                                 [Page 6]
+
+RFC 1334                   PPP Authentication               October 1992
+
+
+   Peer-ID-Length
+
+      The Peer-ID-Length field is one octet and indicates the length of
+      the Peer-ID field.
+
+   Peer-ID
+
+      The Peer-ID field is zero or more octets and indicates the name of
+      the peer to be authenticated.
+
+   Passwd-Length
+
+      The Passwd-Length field is one octet and indicates the length of
+      the Password field.
+
+   Password
+
+      The Password field is zero or more octets and indicates the
+      password to be used for authentication.
+
+2.2.2.  Authenticate-Ack and Authenticate-Nak
+
+   Description
+
+      If the Peer-ID/Password pair received in an Authenticate-Request
+      is both recognizable and acceptable, then the authenticator MUST
+      transmit a PAP packet with the Code field set to 2 (Authenticate-
+      Ack).
+
+      If the Peer-ID/Password pair received in a Authenticate-Request is
+      not recognizable or acceptable, then the authenticator MUST
+      transmit a PAP packet with the Code field set to 3 (Authenticate-
+      Nak), and SHOULD take action to terminate the link.
+
+   A summary of the Authenticate-Ack and Authenticate-Nak packet format
+   is shown below.  The fields are transmitted from left to right.
+
+    0                   1                   2                   3
+    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+   |     Code      |  Identifier   |            Length             |
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+   |  Msg-Length   |  Message  ...
+   +-+-+-+-+-+-+-+-+-+-+-+-+-
+
+   Code
+
+      2 for Authenticate-Ack;
+
+
+
+Lloyd & Simpson                                                 [Page 7]
+
+RFC 1334                   PPP Authentication               October 1992
+
+
+      3 for Authenticate-Nak.
+
+   Identifier
+
+      The Identifier field is one octet and aids in matching requests
+      and replies.  The Identifier field MUST be copied from the
+      Identifier field of the Authenticate-Request which caused this
+      reply.
+
+   Msg-Length
+
+      The Msg-Length field is one octet and indicates the length of the
+      Message field.
+
+   Message
+
+      The Message field is zero or more octets, and its contents are
+      implementation dependent.  It is intended to be human readable,
+      and MUST NOT affect operation of the protocol.  It is recommended
+      that the message contain displayable ASCII characters 32 through
+      126 decimal.  Mechanisms for extension to other character sets are
+      the topic of future research.
+
+3.  Challenge-Handshake Authentication Protocol
+
+   The Challenge-Handshake Authentication Protocol (CHAP) is used to
+   periodically verify the identity of the peer using a 3-way handshake.
+   This is done upon initial link establishment, and MAY be repeated
+   anytime after the link has been established.
+
+   After the Link Establishment phase is complete, the authenticator
+   sends a "challenge" message to the peer.  The peer responds with a
+   value calculated using a "one-way hash" function.  The authenticator
+   checks the response against its own calculation of the expected hash
+   value.  If the values match, the authentication is acknowledged;
+   otherwise the connection SHOULD be terminated.
+
+   CHAP provides protection against playback attack through the use of
+   an incrementally changing identifier and a variable challenge value.
+   The use of repeated challenges is intended to limit the time of
+   exposure to any single attack.  The authenticator is in control of
+   the frequency and timing of the challenges.
+
+   This authentication method depends upon a "secret" known only to the
+   authenticator and that peer.  The secret is not sent over the link.
+   This method is most likely used where the same secret is easily
+   accessed from both ends of the link.
+
+
+
+
+Lloyd & Simpson                                                 [Page 8]
+
+RFC 1334                   PPP Authentication               October 1992
+
+
+      Implementation Note: CHAP requires that the secret be available in
+      plaintext form.  To avoid sending the secret over other links in
+      the network, it is recommended that the challenge and response
+      values be examined at a central server, rather than each network
+      access server.  Otherwise, the secret SHOULD be sent to such
+      servers in a reversably encrypted form.
+
+   The CHAP algorithm requires that the length of the secret MUST be at
+   least 1 octet.  The secret SHOULD be at least as large and
+   unguessable as a well-chosen password.  It is preferred that the
+   secret be at least the length of the hash value for the hashing
+   algorithm chosen (16 octets for MD5).  This is to ensure a
+   sufficiently large range for the secret to provide protection against
+   exhaustive search attacks.
+
+   The one-way hash algorithm is chosen such that it is computationally
+   infeasible to determine the secret from the known challenge and
+   response values.
+
+   The challenge value SHOULD satisfy two criteria: uniqueness and
+   unpredictability.  Each challenge value SHOULD be unique, since
+   repetition of a challenge value in conjunction with the same secret
+   would permit an attacker to reply with a previously intercepted
+   response.  Since it is expected that the same secret MAY be used to
+   authenticate with servers in disparate geographic regions, the
+   challenge SHOULD exhibit global and temporal uniqueness.  Each
+   challenge value SHOULD also be unpredictable, least an attacker trick
+   a peer into responding to a predicted future challenge, and then use
+   the response to masquerade as that peer to an authenticator.
+   Although protocols such as CHAP are incapable of protecting against
+   realtime active wiretapping attacks, generation of unique
+   unpredictable challenges can protect against a wide range of active
+   attacks.
+
+   A discussion of sources of uniqueness and probability of divergence
+   is included in the Magic-Number Configuration Option [1].
+
+3.1.  Configuration Option Format
+
+   A summary of the Authentication-Protocol Configuration Option format
+   to negotiate the Challenge-Handshake Authentication Protocol is shown
+   below.  The fields are transmitted from left to right.
+
+
+
+
+
+
+
+
+
+Lloyd & Simpson                                                 [Page 9]
+
+RFC 1334                   PPP Authentication               October 1992
+
+
+    0                   1                   2                   3
+    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+   |     Type      |    Length     |     Authentication-Protocol   |
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+   |   Algorithm   |
+   +-+-+-+-+-+-+-+-+
+
+   Type
+
+      3
+
+   Length
+
+      5
+
+   Authentication-Protocol
+
+      c223 (hex) for Challenge-Handshake Authentication Protocol.
+
+   Algorithm
+
+      The Algorithm field is one octet and indicates the one-way hash
+      method to be used.  The most up-to-date values of the CHAP
+      Algorithm field are specified in the most recent "Assigned
+      Numbers" RFC [2].  Current values are assigned as follows:
+
+         0-4     unused (reserved)
+         5       MD5 [3]
+
+3.2.  Packet Format
+
+   Exactly one Challenge-Handshake Authentication Protocol packet is
+   encapsulated in the Information field of a PPP Data Link Layer frame
+   where the protocol field indicates type hex c223 (Challenge-Handshake
+   Authentication Protocol).  A summary of the CHAP packet format is
+   shown below.  The fields are transmitted from left to right.
+
+    0                   1                   2                   3
+    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+   |     Code      |  Identifier   |            Length             |
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+   |    Data ...
+   +-+-+-+-+
+
+
+
+
+
+
+Lloyd & Simpson                                                [Page 10]
+
+RFC 1334                   PPP Authentication               October 1992
+
+
+   Code
+
+      The Code field is one octet and identifies the type of CHAP
+      packet.  CHAP Codes are assigned as follows:
+
+         1       Challenge
+         2       Response
+         3       Success
+         4       Failure
+
+   Identifier
+
+      The Identifier field is one octet and aids in matching challenges,
+      responses and replies.
+
+   Length
+
+      The Length field is two octets and indicates the length of the
+      CHAP packet including the Code, Identifier, Length and Data
+      fields.  Octets outside the range of the Length field should be
+      treated as Data Link Layer padding and should be ignored on
+      reception.
+
+   Data
+
+      The Data field is zero or more octets.  The format of the Data
+      field is determined by the Code field.
+
+3.2.1.  Challenge and Response
+
+   Description
+
+      The Challenge packet is used to begin the Challenge-Handshake
+      Authentication Protocol.  The authenticator MUST transmit a CHAP
+      packet with the Code field set to 1 (Challenge).  Additional
+      Challenge packets MUST be sent until a valid Response packet is
+      received, or an optional retry counter expires.
+
+      A Challenge packet MAY also be transmitted at any time during the
+      Network-Layer Protocol phase to ensure that the connection has not
+      been altered.
+
+      The peer SHOULD expect Challenge packets during the Authentication
+      phase and the Network-Layer Protocol phase.  Whenever a Challenge
+      packet is received, the peer MUST transmit a CHAP packet with the
+      Code field set to 2 (Response).
+
+      Whenever a Response packet is received, the authenticator compares
+
+
+
+Lloyd & Simpson                                                [Page 11]
+
+RFC 1334                   PPP Authentication               October 1992
+
+
+      the Response Value with its own calculation of the expected value.
+      Based on this comparison, the authenticator MUST send a Success or
+      Failure packet (described below).
+
+         Implementation Note: Because the Success might be lost, the
+         authenticator MUST allow repeated Response packets after
+         completing the Authentication phase.  To prevent discovery of
+         alternative Names and Secrets, any Response packets received
+         having the current Challenge Identifier MUST return the same
+         reply Code returned when the Authentication phase completed
+         (the message portion MAY be different).  Any Response packets
+         received during any other phase MUST be silently discarded.
+
+         When the Failure is lost, and the authenticator terminates the
+         link, the LCP Terminate-Request and Terminate-Ack provide an
+         alternative indication that authentication failed.
+
+   A summary of the Challenge and Response packet format is shown below.
+   The fields are transmitted from left to right.
+
+    0                   1                   2                   3
+    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+   |     Code      |  Identifier   |            Length             |
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+   |  Value-Size   |  Value ...
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+   |  Name ...
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+
+   Code
+
+      1 for Challenge;
+
+      2 for Response.
+
+   Identifier
+
+      The Identifier field is one octet.  The Identifier field MUST be
+      changed each time a Challenge is sent.
+
+      The Response Identifier MUST be copied from the Identifier field
+      of the Challenge which caused the Response.
+
+   Value-Size
+
+      This field is one octet and indicates the length of the Value
+      field.
+
+
+
+Lloyd & Simpson                                                [Page 12]
+
+RFC 1334                   PPP Authentication               October 1992
+
+
+   Value
+
+      The Value field is one or more octets.  The most significant octet
+      is transmitted first.
+
+      The Challenge Value is a variable stream of octets.  The
+      importance of the uniqueness of the Challenge Value and its
+      relationship to the secret is described above.  The Challenge
+      Value MUST be changed each time a Challenge is sent.  The length
+      of the Challenge Value depends upon the method used to generate
+      the octets, and is independent of the hash algorithm used.
+
+      The Response Value is the one-way hash calculated over a stream of
+      octets consisting of the Identifier, followed by (concatenated
+      with) the "secret", followed by (concatenated with) the Challenge
+      Value.  The length of the Response Value depends upon the hash
+      algorithm used (16 octets for MD5).
+
+   Name
+
+      The Name field is one or more octets representing the
+      identification of the system transmitting the packet.  There are
+      no limitations on the content of this field.  For example, it MAY
+      contain ASCII character strings or globally unique identifiers in
+      ASN.1 syntax.  The Name should not be NUL or CR/LF terminated.
+      The size is determined from the Length field.
+
+      Since CHAP may be used to authenticate many different systems, the
+      content of the name field(s) may be used as a key to locate the
+      proper secret in a database of secrets.  This also makes it
+      possible to support more than one name/secret pair per system.
+
+3.2.2.  Success and Failure
+
+   Description
+
+      If the Value received in a Response is equal to the expected
+      value, then the implementation MUST transmit a CHAP packet with
+      the Code field set to 3 (Success).
+
+      If the Value received in a Response is not equal to the expected
+      value, then the implementation MUST transmit a CHAP packet with
+      the Code field set to 4 (Failure), and SHOULD take action to
+      terminate the link.
+
+   A summary of the Success and Failure packet format is shown below.
+   The fields are transmitted from left to right.
+
+
+
+
+Lloyd & Simpson                                                [Page 13]
+
+RFC 1334                   PPP Authentication               October 1992
+
+
+    0                   1                   2                   3
+    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+   |     Code      |  Identifier   |            Length             |
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+   |  Message  ...
+   +-+-+-+-+-+-+-+-+-+-+-+-+-
+
+   Code
+
+      3 for Success;
+
+      4 for Failure.
+
+   Identifier
+
+      The Identifier field is one octet and aids in matching requests
+      and replies.  The Identifier field MUST be copied from the
+      Identifier field of the Response which caused this reply.
+
+   Message
+
+      The Message field is zero or more octets, and its contents are
+      implementation dependent.  It is intended to be human readable,
+      and MUST NOT affect operation of the protocol.  It is recommended
+      that the message contain displayable ASCII characters 32 through
+      126 decimal.  Mechanisms for extension to other character sets are
+      the topic of future research.  The size is determined from the
+      Length field.
+
+Security Considerations
+
+      Security issues are the primary topic of this RFC.
+
+      The interaction of the authentication protocols within PPP are
+      highly implementation dependent.  This is indicated by the use of
+      SHOULD throughout the document.
+
+      For example, upon failure of authentication, some implementations
+      do not terminate the link.  Instead, the implementation limits the
+      kind of traffic in the Network-Layer Protocols to a filtered
+      subset, which in turn allows the user opportunity to update
+      secrets or send mail to the network administrator indicating a
+      problem.
+
+      There is no provision for re-tries of failed authentication.
+      However, the LCP state machine can renegotiate the authentication
+      protocol at any time, thus allowing a new attempt.  It is
+
+
+
+Lloyd & Simpson                                                [Page 14]
+
+RFC 1334                   PPP Authentication               October 1992
+
+
+      recommended that any counters used for authentication failure not
+      be reset until after successful authentication, or subsequent
+      termination of the failed link.
+
+      There is no requirement that authentication be full duplex or that
+      the same protocol be used in both directions.  It is perfectly
+      acceptable for different protocols to be used in each direction.
+      This will, of course, depend on the specific protocols negotiated.
+
+      In practice, within or associated with each PPP server, there is a
+      database which associates "user" names with authentication
+      information ("secrets").  It is not anticipated that a particular
+      named user would be authenticated by multiple methods.  This would
+      make the user vulnerable to attacks which negotiate the least
+      secure method from among a set (such as PAP rather than CHAP).
+      Instead, for each named user there should be an indication of
+      exactly one method used to authenticate that user name.  If a user
+      needs to make use of different authentication method under
+      different circumstances, then distinct user names SHOULD be
+      employed, each of which identifies exactly one authentication
+      method.
+
+      Passwords and other secrets should be stored at the respective
+      ends such that access to them is as limited as possible.  Ideally,
+      the secrets should only be accessible to the process requiring
+      access in order to perform the authentication.
+
+      The secrets should be distributed with a mechanism that limits the
+      number of entities that handle (and thus gain knowledge of) the
+      secret.  Ideally, no unauthorized person should ever gain
+      knowledge of the secrets.  It is possible to achieve this with
+      SNMP Security Protocols [4], but such a mechanism is outside the
+      scope of this specification.
+
+      Other distribution methods are currently undergoing research and
+      experimentation.  The SNMP Security document also has an excellent
+      overview of threats to network protocols.
+
+References
+
+   [1] Simpson, W., "The Point-to-Point Protocol (PPP)", RFC 1331,
+       Daydreamer, May 1992.
+
+   [2] Reynolds, J., and J. Postel, "Assigned Numbers", RFC 1340,
+       USC/Information Sciences Institute, July 1992.
+
+
+
+
+
+
+Lloyd & Simpson                                                [Page 15]
+
+RFC 1334                   PPP Authentication               October 1992
+
+
+   [3] Rivest, R., and S. Dusse, "The MD5 Message-Digest Algorithm", MIT
+       Laboratory for Computer Science and RSA Data Security, Inc.  RFC
+       1321, April 1992.
+
+   [4] Galvin, J., McCloghrie, K., and J. Davin, "SNMP Security
+       Protocols", Trusted Information Systems, Inc., Hughes LAN
+       Systems, Inc., MIT Laboratory for Computer Science, RFC 1352,
+       July 1992.
+
+Acknowledgments
+
+   Some of the text in this document is taken from RFC 1172, by Drew
+   Perkins of Carnegie Mellon University, and by Russ Hobby of the
+   University of California at Davis.
+
+   Special thanks to Dave Balenson, Steve Crocker, James Galvin, and
+   Steve Kent, for their extensive explanations and suggestions.  Now,
+   if only we could get them to agree with each other.
+
+Chair's Address
+
+   The working group can be contacted via the current chair:
+
+      Brian Lloyd
+      Lloyd & Associates
+      3420 Sudbury Road
+      Cameron Park, California 95682
+
+      Phone: (916) 676-1147
+
+      EMail: brian@lloyd.com
+
+Author's Address
+
+   Questions about this memo can also be directed to:
+
+      William Allen Simpson
+      Daydreamer
+      Computer Systems Consulting Services
+      P O Box 6205
+      East Lansing, MI  48826-6205
+
+      EMail: Bill.Simpson@um.cc.umich.edu
+
+
+
+
+
+
+
+
+Lloyd & Simpson                                                [Page 16]
+
--- a/kernel/picotcp/RFC/rfc1337.txt
+++ b/kernel/picotcp/RFC/rfc1337.txt
@ -0,0 +1,619 @@
+
+
+
+
+
+
+Network Working Group                                          R. Braden
+Request for Comments: 1337                                           ISI
+                                                                May 1992
+
+
+                 TIME-WAIT Assassination Hazards in TCP
+
+Status of This Memo
+
+   This memo provides information for the Internet community.  It does
+   not specify an Internet standard.  Distribution of this memo is
+   unlimited.
+
+Abstract
+
+   This note describes some theoretically-possible failure modes for TCP
+   connections and discusses possible remedies.  In particular, one very
+   simple fix is identified.
+
+1. INTRODUCTION
+
+   Experiments to validate the recently-proposed TCP extensions [RFC-
+   1323] have led to the discovery of a new class of TCP failures, which
+   have been dubbed the "TIME-WAIT Assassination hazards".  This note
+   describes these hazards, gives examples, and discusses possible
+   prevention measures.
+
+   The failures in question all result from old duplicate segments.  In
+   brief, the TCP mechanisms to protect against old duplicate segments
+   are [RFC-793]:
+
+   (1)  The 3-way handshake rejects old duplicate initial <SYN>
+        segments, avoiding the hazard of replaying a connection.
+
+   (2)  Sequence numbers are used to reject old duplicate data and ACK
+        segments from the current incarnation of a given connection
+        (defined by a particular host and port pair).  Sequence numbers
+        are also used to reject old duplicate <SYN,ACK> segments.
+
+        For very high-speed connections, Jacobson's PAWS ("Protect
+        Against Wrapped Sequences") mechanism [RFC-1323] effectively
+        extends the sequence numbers so wrap-around will not introduce a
+        hazard within the same incarnation.
+
+   (3)  There are two mechanisms to avoid hazards due to old duplicate
+        segments from an earlier instance of the same connection; see
+        the Appendix to [RFC-1185] for details.
+
+
+
+
+Braden                                                          [Page 1]
+
+RFC 1337                 TCP TIME-WAIT Hazards                  May 1992
+
+
+        For "short and slow" connections [RFC-1185], the clock-driven
+        ISN (initial sequence number) selection prevents the overlap of
+        the sequence spaces of the old and new incarnations [RFC-793].
+        (The algorithm used by Berkeley BSD TCP for stepping ISN
+        complicates the analysis slightly but does not change the
+        conclusions.)
+
+   (4)  TIME-WAIT state removes the hazard of old duplicates for "fast"
+        or "long" connections, in which clock-driven ISN selection is
+        unable to prevent overlap of the old and new sequence spaces.
+        The TIME-WAIT delay allows all old duplicate segments time
+        enough to die in the Internet before the connection is reopened.
+
+   (5)  After a system crash, the Quiet Time at system startup allows
+        old duplicates to disappear before any connections are opened.
+
+   Our new observation is that (4) is unreliable: TIME-WAIT state can be
+   prematurely terminated ("assassinated") by an old duplicate data or
+   ACK segment from the current or an earlier incarnation of the same
+   connection.  We refer to this as "TIME-WAIT Assassination" (TWA).
+
+   Figure 1 shows an example of TIME-WAIT assassination.  Segments 1-5
+   are copied exactly from Figure 13 of RFC-793, showing a normal close
+   handshake.  Packets 5.1, 5.2, and 5.3 are an extension to this
+   sequence, illustrating TWA.   Here 5.1 is *any* old segment that is
+   unacceptable to TCP A.  It might be unacceptable because of its
+   sequence number or because of an old PAWS timestamp.  In either case,
+   TCP A sends an ACK segment 5.2 for its current SND.NXT and RCV.NXT.
+   Since it has no state for this connection, TCP B reflects this as RST
+   segment 5.3, which assassinates the TIME-WAIT state at A!
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Braden                                                          [Page 2]
+
+RFC 1337                 TCP TIME-WAIT Hazards                  May 1992
+
+
+
+       TCP A                                                TCP B
+
+   1.  ESTABLISHED                                          ESTABLISHED
+
+       (Close)
+   2.  FIN-WAIT-1  --> <SEQ=100><ACK=300><CTL=FIN,ACK>  --> CLOSE-WAIT
+
+   3.  FIN-WAIT-2  <-- <SEQ=300><ACK=101><CTL=ACK>      <-- CLOSE-WAIT
+
+                                                            (Close)
+   4.  TIME-WAIT   <-- <SEQ=300><ACK=101><CTL=FIN,ACK>  <-- LAST-ACK
+
+   5.  TIME-WAIT   --> <SEQ=101><ACK=301><CTL=ACK>      --> CLOSED
+
+  - - - - - - - - - - - - - - - - - - - - - - - - - - - -
+
+   5.1. TIME-WAIT   <--  <SEQ=255><ACK=33> ... old duplicate
+
+   5.2  TIME-WAIT   --> <SEQ=101><ACK=301><CTL=ACK>    -->  ????
+
+   5.3  CLOSED      <-- <SEQ=301><CTL=RST>             <--  ????
+      (prematurely)
+
+                         Figure 1.  TWA Example
+
+
+   Note that TWA is not at all an unlikely event if there are any
+   duplicate segments that may be delayed in the network.  Furthermore,
+   TWA cannot be prevented by PAWS timestamps; the event may happen
+   within the same tick of the timestamp clock.  TWA is a consequence of
+   TCP's half-open connection discovery mechanism (see pp 33-34 of
+   [RFC-793]), which is designed to clean up after a system crash.
+
+2. The TWA Hazards
+
+   2.1 Introduction
+
+      If the connection is immediately reopened after a TWA event, the
+      new incarnation will be exposed to old duplicate segments (except
+      for the initial <SYN> segment, which is handled by the 3-way
+      handshake).  There are three possible hazards that result:
+
+      H1.  Old duplicate data may be accepted erroneously.
+
+      H2.  The new connection may be de-synchronized, with the two ends
+           in permanent disagreement on the state.  Following the spec
+           of RFC-793, this desynchronization results in an infinite ACK
+
+
+
+Braden                                                          [Page 3]
+
+RFC 1337                 TCP TIME-WAIT Hazards                  May 1992
+
+
+           loop.  (It might be reasonable to change this aspect of RFC-
+           793 and kill the connection instead.)
+
+           This hazard results from acknowledging something that was not
+           sent.  This may result from an old duplicate ACK or as a
+           side-effect of hazard H1.
+
+      H3.  The new connection may die.
+
+           A duplicate segment (data or ACK) arriving in SYN-SENT state
+           may kill the new connection after it has apparently opened
+           successfully.
+
+      Each of these hazards requires that the seqence space of the new
+      connection overlap to some extent with the sequence space of the
+      previous incarnation.  As noted above, this is only possible for
+      "fast" or "long" connections.  Since these hazards all require the
+      coincidence of an old duplicate falling into a particular range of
+      new sequence numbers, they are much less probable than TWA itself.
+
+      TWA and the three hazards H1, H2, and H3 have been demonstrated on
+      a stock Sun OS 4.1.1 TCP running in an simulated environment that
+      massively duplicates segments.  This environment is far more
+      hazardous than most real TCP's must cope with, and the conditions
+      were carefully tuned to create the necessary conditions for the
+      failures.  However, these demonstrations are in effect an
+      existence proof for the hazards.
+
+      We now present example scenarios for each of these hazards.  Each
+      scenario is assumed to follow immediately after a TWA event
+      terminated the previous incarnation of the same connection.
+
+   2.2  HAZARD H1: Acceptance of erroneous old duplicate data.
+
+      Without the protection of the TIME-WAIT delay, it is possible for
+      erroneous old duplicate data from the earlier incarnation to be
+      accepted.  Figure 2 shows precisely how this might happen.
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Braden                                                          [Page 4]
+
+RFC 1337                 TCP TIME-WAIT Hazards                  May 1992
+
+
+
+           TCP A                                                 TCP B
+
+      1. ESTABL.  --> <SEQ=400><ACK=101><DATA=100><CTL=ACK> --> ESTABL.
+
+      2. ESTABL.  <--     <SEQ=101><ACK=500><CTL=ACK>     <--   ESTABL.
+
+      3.  (old dupl)...<SEQ=560><ACK=101><DATA=80><CTL=ACK> --> ESTABL.
+
+      4. ESTABL.  <--     <SEQ=101><ACK=500><CTL=ACK>     <--   ESTABL.
+
+      5. ESTABL.  --> <SEQ=500><ACK=101><DATA=100><CTL=ACK> --> ESTABL.
+
+      6.             ...  <SEQ=101><ACK=640><CTL=ACK>     <--   ESTABL.
+
+     - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
+
+      7a. ESTABL. --> <SEQ=600><ACK=101><DATA=100><CTL=ACK> --> ESTABL.
+
+      8a. ESTABL.  <--    <SEQ=101><ACK=640><CTL=ACK> ...
+
+      9a. ESTABL. --> <SEQ=700><ACK=101><DATA=100><CTL=ACK> --> ESTABL.
+
+                    Figure 2: Accepting Erroneous Data
+
+      The connection has already been successfully reopened after the
+      assumed TWA event.  Segment 1 is a normal data segment and segment
+      2 is the corresponding ACK segment.  Old duplicate data segment 3
+      from the earlier incarnation happens to fall within the current
+      receive window, resulting in a duplicate ACK segment #4.  The
+      erroneous data is queued and "lurks" in the TCP reassembly queue
+      until data segment 5 overlaps it.  At that point, either 80 or 40
+      bytes of erroneous data is delivered to the user B; the choice
+      depends upon the particulars of the reassembly algorithm, which
+      may accept the first or the last duplicate data.
+
+      As a result, B sends segment 6, an ACK for sequence = 640, which
+      is 40 beyond any data sent by A.  Assume for the present that this
+      ACK arrives at A *after* A has sent segment 7a, the next full data
+      segment.  In that case, the ACK segment 8a acknowledges data that
+      has been sent, and the error goes undetected.  Another possible
+      continuation after segment 6 leads to hazard H3, shown below.
+
+   2.3  HAZARD H2: De-synchronized Connection
+
+      This hazard may result either as a side effect of H1 or directly
+      from an old duplicate ACK that happens to be acceptable but
+      acknowledges something that has not been sent.
+
+
+
+Braden                                                          [Page 5]
+
+RFC 1337                 TCP TIME-WAIT Hazards                  May 1992
+
+
+      Referring to Figure 2 above, suppose that the ACK generated by the
+      old duplicate data segment arrived before the next data segment
+      had been sent.  The result is an infinite ACK loop, as shown by
+      the following alternate continuation of Figure 2.
+
+     - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
+      7b. ESTABL.  <--    <SEQ=101><ACK=640><CTL=ACK>   ...
+     (ACK something not yet
+      sent => send ACK)
+
+      8b. ESTABL.  -->    <SEQ=600><ACK101><CTL=ACK>       -->   ESTABL.
+                                                       (Below window =>
+                                                            send ACK)
+
+      9b. ESTABL.  <--    <SEQ=101><ACK=640><CTL=ACK>     <--    ESTABL.
+
+                               (etc.!)
+
+                     Figure 3: Infinite ACK loop
+
+
+   2.4  HAZARD H3:  Connection Failure
+
+      An old duplicate ACK segment may lead to an apparent refusal of
+      TCP A's next connection attempt, as illustrated in Figure 4.  Here
+      <W=...> indicates the TCP window field SEG.WIND.*
+
+        TCP A                                                     TCP B
+
+    1.  CLOSED                                                   LISTEN
+
+    2.  SYN-SENT    --> <SEQ=100><CTL=SYN>                 --> SYN-RCVD
+
+    3.         ... <SEQ=400><ACK=101><CTL=SYN,ACK><W=800>  <-- SYN-RCVD
+
+    4.  SYN-SENT    <-- <SEQ=300><ACK=123><CTL=ACK> ... (old duplicate)
+
+    5.  SYN-SENT    --> <SEQ=123><CTL=RST>                   --> LISTEN
+
+    6.  ESTABLISHED <-- <SEQ=400><ACK=101><CTL=SYN,ACK><W=900> ...
+
+    7.  ESTABLISHED --> <SEQ=101><ACK=401><CTL=ACK>          --> LISTEN
+
+    8.  CLOSED      <--  <SEQ=401><CTL=RST>                  <-- LISTEN
+
+
+           Figure 4: Connection Failure from Old Duplicate
+
+
+
+
+Braden                                                          [Page 6]
+
+RFC 1337                 TCP TIME-WAIT Hazards                  May 1992
+
+
+      The key to the failure in Figure 4 is that the RST segment 5 is
+      acceptable to TCP B in SYN-RECEIVED state, because the sequence
+      space of the earlier connection that produced this old duplicate
+      overlaps the new connection space.  Thus, <SEQ=123> in segment #5
+      falls within TCP B's receive window [101,900).  In experiments,
+      this failure mode was very easy to demonstrate.  (Kurt Matthys has
+      pointed out that this scenario is time-dependent:  if TCP A should
+      timeout and retransmit the initial SYN after segment 5 arrives and
+      before segment 6, then the open will complete successfully.)
+
+3. Fixes for TWA Hazards
+
+   We discuss three possible fixes to TCP to avoid these hazards.
+
+   (F1) Ignore RST segments in TIME-WAIT state.
+
+        If the 2 minute MSL is enforced, this fix avoids all three
+        hazards.
+
+        This is the simplest fix.  One could also argue that it is
+        formally the correct thing to do; since allowing time for old
+        duplicate segments to die is one of TIME-WAIT state's functions,
+        the state should not be truncated by a RST segment.
+
+   (F2) Use PAWS to avoid the hazards.
+
+        Suppose that the TCP ignores RST segments in TIME-WAIT state,
+        but only long enough to guarantee that the timestamp clocks on
+        both ends have ticked.  Then the PAWS mechanism [RFC-1323] will
+        prevent old duplicate data segments from interfering with the
+        new incarnation, eliminating hazard H1.  For reasons explained
+        below, however, it may not eliminate all old duplicate ACK
+        segments, so hazards H2 and H3 will still exist.
+
+        In the language of the TCP Extensions RFC [RFC-1323]:
+
+           When processing a RST bit in TIME-WAIT state:
+
+               If (Snd.TS.OK is off) or (Time.in.TW.state() >= W)
+                   then enter the CLOSED state, delete the TCB,
+                   drop the RST segment, and return.
+
+               else simply drop the RST segment and return.
+
+        Here "Time.in.TW.state()" is a function returning the elapsed
+        time since TIME-WAIT state was entered, and W is a constant that
+        is at least twice the longest possible period for timestamp
+        clocks, i.e., W = 2 secs [RFC-1323].
+
+
+
+Braden                                                          [Page 7]
+
+RFC 1337                 TCP TIME-WAIT Hazards                  May 1992
+
+
+        This assumes that the timestamp clock at each end continues to
+        advance at a constant rate whether or not there are any open
+        connections.  We do not have to consider what happens across a
+        system crash (e.g., the timestamp clock may jump randomly),
+        because of the assumed Quiet Time at system startup.
+
+        Once this change is in place, the initial timestamps that occur
+        on the SYN and {SYN,ACK} segments reopening the connection will
+        be larger than any timestamp on a segment from earlier
+        incarnations.  As a result, the PAWS mechanism operating in the
+        new connection incarnation will avoid the H1 hazard, ie.
+        acceptance of old duplicate data.
+
+        The effectiveness of fix (F2) in preventing acceptance of old
+        duplicate data segments, i.e., hazard H1, has been demonstrated
+        in the Sun OS TCP mentioned earlier.  Unfortunately, these tests
+        revealed a somewhat surprising fact:  old duplicate ACKs from
+        the earlier incarnation can still slip past PAWS, so that (F2)
+        will not prevent failures H2 or H3.  What happens is that TIME-
+        WAIT state effectively regenerates the timestamp of an old
+        duplicate ACK.  That is, when an old duplicate arrives in TIME-
+        WAIT state, an extended TCP will send out its own ACK with a
+        timestamp option containing its CURRENT timestamp clock value.
+        If this happens immediately before the TWA mechanism kills
+        TIME-WAIT state, the result will be a "new old duplicate"
+        segment with a current timestamp that may pass the PAWS test on
+        the reopened connection.
+
+        Whether H2 and H3 are critical depends upon how often they
+        happen and what assumptions the applications make about TCP
+        semantics.  In the case of the H3 hazard, merely trying the open
+        again is likely to succeed.  Furthermore, many production TCPs
+        have (despite the advice of the researchers who developed TCP)
+        incorporated a "keep-alive" mechanism, which may kill
+        connections unnecessarily.  The frequency of occurrence of H2
+        and H3 may well be much lower than keep-alive failures or
+        transient internet routing failures.
+
+   (F3) Use 64-bit Sequence Numbers
+
+        O'Malley and Peterson [RFC-1264] have suggested expansion of the
+        TCP sequence space to 64 bits as an alternative to PAWS for
+        avoiding the hazard of wrapped sequence numbers within the same
+        incarnation.  It is worthwhile to inquire whether 64-bit
+        sequence numbers could be used to avoid the TWA hazards as well.
+
+        Using 64 bit sequence numbers would not prevent TWA - the early
+        termination of TIME-WAIT state.  However, it appears that a
+
+
+
+Braden                                                          [Page 8]
+
+RFC 1337                 TCP TIME-WAIT Hazards                  May 1992
+
+
+        combination of 64-bit sequence numbers with an appropriate
+        modification of the TCP parameters could defeat all of the TWA
+        hazards H1, H2, and H3.  The basis for this is explained in an
+        appendix to this memo.  In summary, it could be arranged that
+        the same sequence space would be reused only after a very long
+        period of time, so every connection would be "slow" and "short".
+
+4.  Conclusions
+
+   Of the three fixes described in the previous section, fix (F1),
+   ignoring RST segments in TIME-WAIT state, seems like the best short-
+   term solution.  It is certainly the simplest.  It would be very
+   desirable to do an extended test of this change in a production
+   environment, to ensure there is no unexpected bad effect of ignoring
+   RSTs in TIME-WAIT state.
+
+   Fix (F2) is more complex and is at best a partial fix.  (F3), using
+   64-bit sequence numbers, would be a significant change in the
+   protocol, and its implications need to be thoroughly understood.
+   (F3) may turn out to be a long-term fix for the hazards discussed in
+   this note.
+
+APPENDIX: Using 64-bit Sequence Numbers
+
+   This appendix provides a justification of our statement that 64-bit
+   sequence numbers could prevent the TWA hazards.
+
+   The theoretical ISN calculation used by TCP is:
+
+       ISN = (R*T) mod 2**n.
+
+   where T is the real time in seconds (from an arbitrary origin, fixed
+   when the system is started), R is a constant, currently 250 KBps, and
+   n = 32 is the size of the sequence number field.
+
+   The limitations of current TCP are established by n, R, and the
+   maximum segment lifetime MSL = 4 minutes.  The shortest time Twrap to
+   wrap the sequence space is:
+
+       Twrap = (2**n)/r
+
+   where r is the maximum transfer rate.  To avoid old duplicate
+   segments in the same connection, we require that Twrap > MSL (in
+   practice, we need Twrap >> MSL).
+
+
+
+
+
+
+
+Braden                                                          [Page 9]
+
+RFC 1337                 TCP TIME-WAIT Hazards                  May 1992
+
+
+   The clock-driven ISN numbers wrap in time TwrapISN:
+
+       TwrapISN = (2**n)/R
+
+   For current TCP, TwrapISN = 4.55 hours.
+
+   The cases for old duplicates from previous connections can be divided
+   into four regions along two dimensions:
+
+   *    Slow vs. fast connections, corresponding to r < R or r >= R.
+
+   *    Short vs. long connections, corresponding to duration E <
+        TwrapISN or E >= TwrapISN.
+
+   On short slow connections, the clock-driven ISN selection rejects old
+   duplicates.  For all other cases, the TIME-WAIT delay of 2*MSL is
+   required so old duplicates can expire before they infect a new
+   incarnation.  This is discussed in detail in the Appendix to [RFC-
+   1185].
+
+   With this background, we can consider the effect of increasing n to
+   64.  We would like to increase both R and TwrapISN far enough that
+   all connections will be short and slow, i.e., so that the clock-
+   driven ISN selection will reject all old duplicates.  Put another
+   way, we want to every connection to have a unique chunk of the
+   seqence space.  For this purpose, we need R larger than the maximum
+   foreseeable rate r, and TwrapISN greater than the longest foreseeable
+   connection duration E.
+
+   In fact, this appears feasible with n = 64 bits.  Suppose that we use
+   R = 2**33 Bps; this is approximately 8 gigabytes per second, a
+   reasonable upper limit on throughput of a single TCP connection.
+   Then TwrapISN = 68 years, a reasonable upper limit on TCP connection
+   duration.  Note that this particular choice of R corresponds to
+   incrementing the ISN by 2**32 every 0.5 seconds, as would happen with
+   the Berkeley BSD implementation of TCP.  Then the low-order 32 bits
+   of a 64-bit ISN would always be exactly zero.
+
+   REFERENCES
+
+      [RFC-793]  Postel, J., "Transmission Control Protocol", RFC-793,
+      USC/Information Sciences Institute, September 1981.
+
+      [RFC-1185]  Jacobson, V., Braden, R., and Zhang, L., "TCP
+      Extension for High-Speed Paths", RFC-1185, Lawrence Berkeley Labs,
+      USC/Information Sciences Institute, and Xerox Palo Alto Research
+      Center, October 1990.
+
+
+
+
+Braden                                                         [Page 10]
+
+RFC 1337                 TCP TIME-WAIT Hazards                  May 1992
+
+
+      [RFC-1263]  O'Malley, S. and L. Peterson, "TCP Extensions
+      Considered Harmful", RFC-1263, University of Arizona, October
+      1991.
+
+      [RFC-1323]  Jacobson, V., Braden, R. and D. Borman "TCP Extensions
+      for High Performance", RFC-1323, Lawrence Berkeley Labs,
+      USC/Information Sciences Institute, and Cray Research, May 1992.
+
+Security Considerations
+
+   Security issues are not discussed in this memo.
+
+Author's Address:
+
+   Bob Braden
+   University of Southern California
+   Information Sciences Institute
+   4676 Admiralty Way
+   Marina del Rey, CA 90292
+
+   Phone: (213) 822-1511
+   EMail: Braden@ISI.EDU
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Braden                                                         [Page 11]
+
--- a/kernel/picotcp/RFC/rfc1350.txt
+++ b/kernel/picotcp/RFC/rfc1350.txt
@ -0,0 +1,619 @@
+
+
+
+
+
+
+Network Working Group                                         K. Sollins
+Request For Comments: 1350                                           MIT
+STD: 33                                                        July 1992
+Obsoletes: RFC 783
+
+
+                     THE TFTP PROTOCOL (REVISION 2)
+
+Status of this Memo
+
+   This RFC specifies an IAB standards track protocol for the Internet
+   community, and requests discussion and suggestions for improvements.
+   Please refer to the current edition of the "IAB Official Protocol
+   Standards" for the standardization state and status of this protocol.
+   Distribution of this memo is unlimited.
+
+Summary
+
+   TFTP is a very simple protocol used to transfer files.  It is from
+   this that its name comes, Trivial File Transfer Protocol or TFTP.
+   Each nonterminal packet is acknowledged separately.  This document
+   describes the protocol and its types of packets.  The document also
+   explains the reasons behind some of the design decisions.
+
+Acknowlegements
+
+   The protocol was originally designed by Noel Chiappa, and was
+   redesigned by him, Bob Baldwin and Dave Clark, with comments from
+   Steve Szymanski.  The current revision of the document includes
+   modifications stemming from discussions with and suggestions from
+   Larry Allen, Noel Chiappa, Dave Clark, Geoff Cooper, Mike Greenwald,
+   Liza Martin, David Reed, Craig Milo Rogers (of USC-ISI), Kathy
+   Yellick, and the author.  The acknowledgement and retransmission
+   scheme was inspired by TCP, and the error mechanism was suggested by
+   PARC's EFTP abort message.
+
+   The May, 1992 revision to fix the "Sorcerer's Apprentice" protocol
+   bug [4] and other minor document problems was done by Noel Chiappa.
+
+   This research was supported by the Advanced Research Projects Agency
+   of the Department of Defense and was monitored by the Office of Naval
+   Research under contract number N00014-75-C-0661.
+
+1. Purpose
+
+   TFTP is a simple protocol to transfer files, and therefore was named
+   the Trivial File Transfer Protocol or TFTP.  It has been implemented
+   on top of the Internet User Datagram protocol (UDP or Datagram) [2]
+
+
+
+Sollins                                                         [Page 1]
+
+RFC 1350                    TFTP Revision 2                    July 1992
+
+
+   so it may be used to move files between machines on different
+   networks implementing UDP.  (This should not exclude the possibility
+   of implementing TFTP on top of other datagram protocols.)  It is
+   designed to be small and easy to implement.  Therefore, it lacks most
+   of the features of a regular FTP.  The only thing it can do is read
+   and write files (or mail) from/to a remote server.  It cannot list
+   directories, and currently has no provisions for user authentication.
+   In common with other Internet protocols, it passes 8 bit bytes of
+   data.
+
+   Three modes of transfer are currently supported: netascii (This is
+   ascii as defined in "USA Standard Code for Information Interchange"
+   [1] with the modifications specified in "Telnet Protocol
+   Specification" [3].)  Note that it is 8 bit ascii.  The term
+   "netascii" will be used throughout this document to mean this
+   particular version of ascii.); octet (This replaces the "binary" mode
+   of previous versions of this document.) raw 8 bit bytes; mail,
+   netascii characters sent to a user rather than a file.  (The mail
+   mode is obsolete and should not be implemented or used.)  Additional
+   modes can be defined by pairs of cooperating hosts.
+
+   Reference [4] (section 4.2) should be consulted for further valuable
+   directives and suggestions on TFTP.
+
+2. Overview of the Protocol
+
+   Any transfer begins with a request to read or write a file, which
+   also serves to request a connection.  If the server grants the
+   request, the connection is opened and the file is sent in fixed
+   length blocks of 512 bytes.  Each data packet contains one block of
+   data, and must be acknowledged by an acknowledgment packet before the
+   next packet can be sent.  A data packet of less than 512 bytes
+   signals termination of a transfer.  If a packet gets lost in the
+   network, the intended recipient will timeout and may retransmit his
+   last packet (which may be data or an acknowledgment), thus causing
+   the sender of the lost packet to retransmit that lost packet.  The
+   sender has to keep just one packet on hand for retransmission, since
+   the lock step acknowledgment guarantees that all older packets have
+   been received.  Notice that both machines involved in a transfer are
+   considered senders and receivers.  One sends data and receives
+   acknowledgments, the other sends acknowledgments and receives data.
+
+   Most errors cause termination of the connection.  An error is
+   signalled by sending an error packet.  This packet is not
+   acknowledged, and not retransmitted (i.e., a TFTP server or user may
+   terminate after sending an error message), so the other end of the
+   connection may not get it.  Therefore timeouts are used to detect
+   such a termination when the error packet has been lost.  Errors are
+
+
+
+Sollins                                                         [Page 2]
+
+RFC 1350                    TFTP Revision 2                    July 1992
+
+
+   caused by three types of events: not being able to satisfy the
+   request (e.g., file not found, access violation, or no such user),
+   receiving a packet which cannot be explained by a delay or
+   duplication in the network (e.g., an incorrectly formed packet), and
+   losing access to a necessary resource (e.g., disk full or access
+   denied during a transfer).
+
+   TFTP recognizes only one error condition that does not cause
+   termination, the source port of a received packet being incorrect.
+   In this case, an error packet is sent to the originating host.
+
+   This protocol is very restrictive, in order to simplify
+   implementation.  For example, the fixed length blocks make allocation
+   straight forward, and the lock step acknowledgement provides flow
+   control and eliminates the need to reorder incoming data packets.
+
+3. Relation to other Protocols
+
+   As mentioned TFTP is designed to be implemented on top of the
+   Datagram protocol (UDP).  Since Datagram is implemented on the
+   Internet protocol, packets will have an Internet header, a Datagram
+   header, and a TFTP header.  Additionally, the packets may have a
+   header (LNI, ARPA header, etc.)  to allow them through the local
+   transport medium.  As shown in Figure 3-1, the order of the contents
+   of a packet will be: local medium header, if used, Internet header,
+   Datagram header, TFTP header, followed by the remainder of the TFTP
+   packet.  (This may or may not be data depending on the type of packet
+   as specified in the TFTP header.)  TFTP does not specify any of the
+   values in the Internet header.  On the other hand, the source and
+   destination port fields of the Datagram header (its format is given
+   in the appendix) are used by TFTP and the length field reflects the
+   size of the TFTP packet.  The transfer identifiers (TID's) used by
+   TFTP are passed to the Datagram layer to be used as ports; therefore
+   they must be between 0 and 65,535.  The initialization of TID's is
+   discussed in the section on initial connection protocol.
+
+   The  TFTP header consists of a 2 byte opcode field which indicates
+   the packet's type (e.g., DATA, ERROR, etc.)  These opcodes and  the
+   formats of  the various types of packets are discussed further in the
+   section on TFTP packets.
+
+
+
+
+
+
+
+
+
+
+
+Sollins                                                         [Page 3]
+
+RFC 1350                    TFTP Revision 2                    July 1992
+
+
+          ---------------------------------------------------
+         |  Local Medium  |  Internet  |  Datagram  |  TFTP  |
+          ---------------------------------------------------
+
+                      Figure 3-1: Order of Headers
+
+
+4. Initial Connection Protocol
+
+   A transfer is established by sending a request (WRQ to write onto a
+   foreign file system, or RRQ to read from it), and receiving a
+   positive reply, an acknowledgment packet for write, or the first data
+   packet for read.  In general an acknowledgment packet will contain
+   the block number of the data packet being acknowledged.  Each data
+   packet has associated with it a block number; block numbers are
+   consecutive and begin with one.  Since the positive response to a
+   write request is an acknowledgment packet, in this special case the
+   block number will be zero.  (Normally, since an acknowledgment packet
+   is acknowledging a data packet, the acknowledgment packet will
+   contain the block number of the data packet being acknowledged.)  If
+   the reply is an error packet, then the request has been denied.
+
+   In order to create a connection, each end of the connection chooses a
+   TID for itself, to be used for the duration of that connection.  The
+   TID's chosen for a connection should be randomly chosen, so that the
+   probability that the same number is chosen twice in immediate
+   succession is very low.  Every packet has associated with it the two
+   TID's of the ends of the connection, the source TID and the
+   destination TID.  These TID's are handed to the supporting UDP (or
+   other datagram protocol) as the source and destination ports.  A
+   requesting host chooses its source TID as described above, and sends
+   its initial request to the known TID 69 decimal (105 octal) on the
+   serving host.  The response to the request, under normal operation,
+   uses a TID chosen by the server as its source TID and the TID chosen
+   for the previous message by the requestor as its destination TID.
+   The two chosen TID's are then used for the remainder of the transfer.
+
+   As an example, the following shows the steps used to establish a
+   connection to write a file.  Note that WRQ, ACK, and DATA are the
+   names of the write request, acknowledgment, and data types of packets
+   respectively.  The appendix contains a similar example for reading a
+   file.
+
+
+
+
+
+
+
+
+
+Sollins                                                         [Page 4]
+
+RFC 1350                    TFTP Revision 2                    July 1992
+
+
+      1. Host A sends  a  "WRQ"  to  host  B  with  source=  A's  TID,
+         destination= 69.
+
+      2. Host  B  sends  a "ACK" (with block number= 0) to host A with
+         source= B's TID, destination= A's TID.
+
+   At this point the connection has been established and the first data
+   packet can be sent by Host A with a sequence number of 1.  In the
+   next step, and in all succeeding steps, the hosts should make sure
+   that the source TID matches the value that was agreed on in steps 1
+   and 2.  If a source TID does not match, the packet should be
+   discarded as erroneously sent from somewhere else.  An error packet
+   should be sent to the source of the incorrect packet, while not
+   disturbing the transfer.  This can be done only if the TFTP in fact
+   receives a packet with an incorrect TID.  If the supporting protocols
+   do not allow it, this particular error condition will not arise.
+
+   The following example demonstrates a correct operation of the
+   protocol in which the above situation can occur.  Host A sends a
+   request to host B. Somewhere in the network, the request packet is
+   duplicated, and as a result two acknowledgments are returned to host
+   A, with different TID's chosen on host B in response to the two
+   requests.  When the first response arrives, host A continues the
+   connection.  When the second response to the request arrives, it
+   should be rejected, but there is no reason to terminate the first
+   connection.  Therefore, if different TID's are chosen for the two
+   connections on host B and host A checks the source TID's of the
+   messages it receives, the first connection can be maintained while
+   the second is rejected by returning an error packet.
+
+5. TFTP Packets
+
+   TFTP supports five types of packets, all of which have been mentioned
+   above:
+
+          opcode  operation
+            1     Read request (RRQ)
+            2     Write request (WRQ)
+            3     Data (DATA)
+            4     Acknowledgment (ACK)
+            5     Error (ERROR)
+
+   The TFTP header of a packet contains the  opcode  associated  with
+   that packet.
+
+
+
+
+
+
+
+Sollins                                                         [Page 5]
+
+RFC 1350                    TFTP Revision 2                    July 1992
+
+
+            2 bytes     string    1 byte     string   1 byte
+            ------------------------------------------------
+           | Opcode |  Filename  |   0  |    Mode    |   0  |
+            ------------------------------------------------
+
+                       Figure 5-1: RRQ/WRQ packet
+
+
+   RRQ and WRQ packets (opcodes 1 and 2 respectively) have the format
+   shown in Figure 5-1.  The file name is a sequence of bytes in
+   netascii terminated by a zero byte.  The mode field contains the
+   string "netascii", "octet", or "mail" (or any combination of upper
+   and lower case, such as "NETASCII", NetAscii", etc.) in netascii
+   indicating the three modes defined in the protocol.  A host which
+   receives netascii mode data must translate the data to its own
+   format.  Octet mode is used to transfer a file that is in the 8-bit
+   format of the machine from which the file is being transferred.  It
+   is assumed that each type of machine has a single 8-bit format that
+   is more common, and that that format is chosen.  For example, on a
+   DEC-20, a 36 bit machine, this is four 8-bit bytes to a word with
+   four bits of breakage.  If a host receives a octet file and then
+   returns it, the returned file must be identical to the original.
+   Mail mode uses the name of a mail recipient in place of a file and
+   must begin with a WRQ.  Otherwise it is identical to netascii mode.
+   The mail recipient string should be of the form "username" or
+   "username@hostname".  If the second form is used, it allows the
+   option of mail forwarding by a relay computer.
+
+   The discussion above assumes that both the sender and recipient are
+   operating in the same mode, but there is no reason that this has to
+   be the case.  For example, one might build a storage server.  There
+   is no reason that such a machine needs to translate netascii into its
+   own form of text.  Rather, the sender might send files in netascii,
+   but the storage server might simply store them without translation in
+   8-bit format.  Another such situation is a problem that currently
+   exists on DEC-20 systems.  Neither netascii nor octet accesses all
+   the bits in a word.  One might create a special mode for such a
+   machine which read all the bits in a word, but in which the receiver
+   stored the information in 8-bit format.  When such a file is
+   retrieved from the storage site, it must be restored to its original
+   form to be useful, so the reverse mode must also be implemented.  The
+   user site will have to remember some information to achieve this.  In
+   both of these examples, the request packets would specify octet mode
+   to the foreign host, but the local host would be in some other mode.
+   No such machine or application specific modes have been specified in
+   TFTP, but one would be compatible with this specification.
+
+   It is also possible to define other modes for cooperating pairs of
+
+
+
+Sollins                                                         [Page 6]
+
+RFC 1350                    TFTP Revision 2                    July 1992
+
+
+   hosts, although this must be done with care.  There is no requirement
+   that any other hosts implement these.  There is no central authority
+   that will define these modes or assign them names.
+
+
+                   2 bytes     2 bytes      n bytes
+                   ----------------------------------
+                  | Opcode |   Block #  |   Data     |
+                   ----------------------------------
+
+                        Figure 5-2: DATA packet
+
+
+   Data is actually transferred in DATA packets depicted in Figure 5-2.
+   DATA packets (opcode = 3) have a block number and data field.  The
+   block numbers on data packets begin with one and increase by one for
+   each new block of data.  This restriction allows the program to use a
+   single number to discriminate between new packets and duplicates.
+   The data field is from zero to 512 bytes long.  If it is 512 bytes
+   long, the block is not the last block of data; if it is from zero to
+   511 bytes long, it signals the end of the transfer.  (See the section
+   on Normal Termination for details.)
+
+   All  packets other than duplicate ACK's and those used for
+   termination are acknowledged unless a timeout occurs [4].  Sending a
+   DATA packet is an acknowledgment for the first ACK packet of the
+   previous DATA packet. The WRQ and DATA packets are acknowledged by
+   ACK or ERROR packets, while RRQ
+
+
+                         2 bytes     2 bytes
+                         ---------------------
+                        | Opcode |   Block #  |
+                         ---------------------
+
+                         Figure 5-3: ACK packet
+
+
+   and ACK packets are acknowledged by  DATA  or ERROR packets.  Figure
+   5-3 depicts an ACK packet; the opcode is 4.  The  block  number  in
+   an  ACK echoes the block number of the DATA packet being
+   acknowledged.  A WRQ is acknowledged with an ACK packet having a
+   block number of zero.
+
+
+
+
+
+
+
+
+Sollins                                                         [Page 7]
+
+RFC 1350                    TFTP Revision 2                    July 1992
+
+
+               2 bytes     2 bytes      string    1 byte
+               -----------------------------------------
+              | Opcode |  ErrorCode |   ErrMsg   |   0  |
+               -----------------------------------------
+
+                        Figure 5-4: ERROR packet
+
+
+   An ERROR packet (opcode 5) takes the form depicted in Figure 5-4.  An
+   ERROR packet can be the acknowledgment of any other type of packet.
+   The error code is an integer indicating the nature of the error.  A
+   table of values and meanings is given in the appendix.  (Note that
+   several error codes have been added to this version of this
+   document.) The error message is intended for human consumption, and
+   should be in netascii.  Like all other strings, it is terminated with
+   a zero byte.
+
+6. Normal Termination
+
+   The end of a transfer is marked by a DATA packet that contains
+   between 0 and 511 bytes of data (i.e., Datagram length < 516).  This
+   packet is acknowledged by an ACK packet like all other DATA packets.
+   The host acknowledging the final DATA packet may terminate its side
+   of the connection on sending the final ACK.  On the other hand,
+   dallying is encouraged.  This means that the host sending the final
+   ACK will wait for a while before terminating in order to retransmit
+   the final ACK if it has been lost.  The acknowledger will know that
+   the ACK has been lost if it receives the final DATA packet again.
+   The host sending the last DATA must retransmit it until the packet is
+   acknowledged or the sending host times out.  If the response is an
+   ACK, the transmission was completed successfully.  If the sender of
+   the data times out and is not prepared to retransmit any more, the
+   transfer may still have been completed successfully, after which the
+   acknowledger or network may have experienced a problem.  It is also
+   possible in this case that the transfer was unsuccessful.  In any
+   case, the connection has been closed.
+
+7. Premature Termination
+
+   If a request can not be granted, or some error occurs during the
+   transfer, then an ERROR packet (opcode 5) is sent.  This is only a
+   courtesy since it will not be retransmitted or acknowledged, so it
+   may never be received.  Timeouts must also be used to detect errors.
+
+
+
+
+
+
+
+
+Sollins                                                         [Page 8]
+
+RFC 1350                    TFTP Revision 2                    July 1992
+
+
+I. Appendix
+
+Order of Headers
+
+                                                  2 bytes
+    ----------------------------------------------------------
+   |  Local Medium  |  Internet  |  Datagram  |  TFTP Opcode  |
+    ----------------------------------------------------------
+
+TFTP Formats
+
+   Type   Op #     Format without header
+
+          2 bytes    string   1 byte     string   1 byte
+          -----------------------------------------------
+   RRQ/  | 01/02 |  Filename  |   0  |    Mode    |   0  |
+   WRQ    -----------------------------------------------
+          2 bytes    2 bytes       n bytes
+          ---------------------------------
+   DATA  | 03    |   Block #  |    Data    |
+          ---------------------------------
+          2 bytes    2 bytes
+          -------------------
+   ACK   | 04    |   Block #  |
+          --------------------
+          2 bytes  2 bytes        string    1 byte
+          ----------------------------------------
+   ERROR | 05    |  ErrorCode |   ErrMsg   |   0  |
+          ----------------------------------------
+
+Initial Connection Protocol for reading a file
+
+   1. Host  A  sends  a  "RRQ"  to  host  B  with  source= A's TID,
+      destination= 69.
+
+   2. Host B sends a "DATA" (with block number= 1) to host  A  with
+      source= B's TID, destination= A's TID.
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Sollins                                                         [Page 9]
+
+RFC 1350                    TFTP Revision 2                    July 1992
+
+
+Error Codes
+
+   Value     Meaning
+
+   0         Not defined, see error message (if any).
+   1         File not found.
+   2         Access violation.
+   3         Disk full or allocation exceeded.
+   4         Illegal TFTP operation.
+   5         Unknown transfer ID.
+   6         File already exists.
+   7         No such user.
+
+Internet User Datagram Header [2]
+
+   (This has been included only for convenience.  TFTP need not be
+   implemented on top of the Internet User Datagram Protocol.)
+
+     Format
+
+    0                   1                   2                   3
+    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+   |          Source Port          |       Destination Port        |
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+   |            Length             |           Checksum            |
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+
+
+   Values of Fields
+
+
+   Source Port     Picked by originator of packet.
+
+   Dest. Port      Picked by destination machine (69 for RRQ or WRQ).
+
+   Length          Number of bytes in UDP packet, including UDP header.
+
+   Checksum        Reference 2 describes rules for computing checksum.
+                   (The implementor of this should be sure that the
+                   correct algorithm is used here.)
+                   Field contains zero if unused.
+
+   Note: TFTP passes transfer identifiers (TID's) to the Internet User
+   Datagram protocol to be used as the source and destination ports.
+
+
+
+
+
+
+Sollins                                                        [Page 10]
+
+RFC 1350                    TFTP Revision 2                    July 1992
+
+
+References
+
+   [1]  USA Standard Code for Information Interchange, USASI X3.4-1968.
+
+   [2]  Postel, J., "User Datagram  Protocol," RFC 768, USC/Information
+        Sciences Institute, 28 August 1980.
+
+   [3]  Postel, J., "Telnet Protocol Specification," RFC 764,
+        USC/Information Sciences Institute, June, 1980.
+
+   [4]  Braden, R., Editor, "Requirements for Internet Hosts --
+        Application and Support", RFC 1123, USC/Information Sciences
+        Institute, October 1989.
+
+Security Considerations
+
+   Since TFTP includes no login or access control mechanisms, care must
+   be taken in the rights granted to a TFTP server process so as not to
+   violate the security of the server hosts file system.  TFTP is often
+   installed with controls such that only files that have public read
+   access are available via TFTP and writing files via TFTP is
+   disallowed.
+
+Author's Address
+
+   Karen R. Sollins
+   Massachusetts Institute of Technology
+   Laboratory for Computer Science
+   545 Technology Square
+   Cambridge, MA 02139-1986
+
+   Phone: (617) 253-6006
+
+   EMail: SOLLINS@LCS.MIT.EDU
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Sollins                                                        [Page 11]
+
--- a/kernel/picotcp/RFC/rfc1379.txt
+++ b/kernel/picotcp/RFC/rfc1379.txt
--- a/kernel/picotcp/RFC/rfc1470.txt
+++ b/kernel/picotcp/RFC/rfc1470.txt
--- a/kernel/picotcp/RFC/rfc1624.txt
+++ b/kernel/picotcp/RFC/rfc1624.txt
@ -0,0 +1,339 @@
+
+
+
+
+
+
+Network Working Group                             A. Rijsinghani, Editor
+Request for Comments: 1624                 Digital Equipment Corporation
+Updates: 1141                                                   May 1994
+Category: Informational
+
+
+                  Computation of the Internet Checksum
+                         via Incremental Update
+
+Status of this Memo
+
+   This memo provides information for the Internet community.  This memo
+   does not specify an Internet standard of any kind.  Distribution of
+   this memo is unlimited.
+
+Abstract
+
+   This memo describes an updated technique for incremental computation
+   of the standard Internet checksum.  It updates the method described
+   in RFC 1141.
+
+Table of Contents
+
+   1. Introduction ..........................................  1
+   2. Notation and Equations ................................  2
+   3. Discussion ............................................  2
+   4. Examples ..............................................  3
+   5. Checksum verification by end systems ..................  4
+   6. Historical Note .......................................  4
+   7. Acknowledgments .......................................  5
+   8. Security Considerations ...............................  5
+   9. Conclusions ...........................................  5
+   10. Author's Address .....................................  5
+   11. References ...........................................  6
+
+1.  Introduction
+
+   Incremental checksum update is useful in speeding up several
+   types of operations routinely performed on IP packets, such as
+   TTL update, IP fragmentation, and source route update.
+
+   RFC 1071, on pages 4 and 5, describes a procedure to
+   incrementally update the standard Internet checksum.  The
+   relevant discussion, though comprehensive, was not complete.
+   Therefore, RFC 1141 was published to replace this description
+   on Incremental Update.  In particular, RFC 1141 provides a
+   more detailed exposure to the procedure described in RFC 1071.
+   However, it computes a result for certain cases that differs
+
+
+
+Rijsinghani                                                     [Page 1]
+
+RFC 1624             Incremental Internet Checksum              May 1994
+
+
+   from the one obtained from scratch (one's complement of one's
+   complement sum of the original fields).
+
+   For the sake of completeness, this memo briefly highlights key
+   points from RFCs 1071 and 1141.  Based on these discussions,
+   an updated procedure to incrementally compute the standard
+   Internet checksum is developed and presented.
+
+2.  Notation and Equations
+
+   Given the following notation:
+
+          HC  - old checksum in header
+          C   - one's complement sum of old header
+          HC' - new checksum in header
+          C'  - one's complement sum of new header
+          m   - old value of a 16-bit field
+          m'  - new value of a 16-bit field
+
+          RFC 1071 states that C' is:
+
+          C' = C + (-m) + m'    --    [Eqn. 1]
+             = C + (m' - m)
+
+   As RFC 1141 points out, the equation above is not useful for direct
+   use in incremental updates since C and C' do not refer to the actual
+   checksum stored in the header.  In addition, it is pointed out that
+   RFC 1071 did not specify that all arithmetic must be performed using
+   one's complement arithmetic.
+
+   Finally, complementing the above equation to get the actual checksum,
+   RFC 1141 presents the following:
+
+          HC' = ~(C + (-m) + m')
+              = HC + (m - m')
+              = HC + m + ~m'    --    [Eqn. 2]
+
+3.  Discussion
+
+   Although this equation appears to work, there are boundary conditions
+   under which it produces a result which differs from the one obtained
+   by checksum computation from scratch.  This is due to the way zero is
+   handled in one's complement arithmetic.
+
+   In one's complement, there are two representations of zero: the all
+   zero and the all one bit values, often referred to as +0 and -0.
+   One's complement addition of non-zero inputs can produce -0 as a
+   result, but never +0.  Since there is guaranteed to be at least one
+
+
+
+Rijsinghani                                                     [Page 2]
+
+RFC 1624             Incremental Internet Checksum              May 1994
+
+
+   non-zero field in the IP header, and the checksum field in the
+   protocol header is the complement of the sum, the checksum field can
+   never contain ~(+0), which is -0 (0xFFFF).  It can, however, contain
+   ~(-0), which is +0 (0x0000).
+
+   RFC 1141 yields an updated header checksum of -0 when it should be
+   +0.  This is because it assumed that one's complement has a
+   distributive property, which does not hold when the result is 0 (see
+   derivation of [Eqn. 2]).
+
+   The problem is avoided by not assuming this property.  The correct
+   equation is given below:
+
+          HC' = ~(C + (-m) + m')    --    [Eqn. 3]
+              = ~(~HC + ~m + m')
+
+4.  Examples
+
+   Consider an IP packet header in which a 16-bit field m = 0x5555
+   changes to m' = 0x3285.  Also, the one's complement sum of all other
+   header octets is 0xCD7A.
+
+   Then the header checksum would be:
+
+          HC = ~(0xCD7A + 0x5555)
+             = ~0x22D0
+             =  0xDD2F
+
+   The new checksum via recomputation is:
+
+          HC' = ~(0xCD7A + 0x3285)
+              = ~0xFFFF
+              =  0x0000
+
+   Using [Eqn. 2], as specified in RFC 1141, the new checksum is
+   computed as:
+
+          HC' = HC + m + ~m'
+              =  0xDD2F + 0x5555 + ~0x3285
+              =  0xFFFF
+
+   which does not match that computed from scratch, and moreover can
+   never obtain for an IP header.
+
+
+
+
+
+
+
+
+Rijsinghani                                                     [Page 3]
+
+RFC 1624             Incremental Internet Checksum              May 1994
+
+
+   Applying [Eqn. 3] to the example above, we get the correct result:
+
+          HC' = ~(C + (-m) + m')
+              = ~(0x22D0 + ~0x5555 + 0x3285)
+              = ~0xFFFF
+              =  0x0000
+
+5.  Checksum verification by end systems
+
+   If an end system verifies the checksum by including the checksum
+   field itself in the one's complement sum and then comparing the
+   result against -0, as recommended by RFC 1071, it does not matter if
+   an intermediate system generated a -0 instead of +0 due to the RFC
+   1141 property described here.  In the example above:
+
+          0xCD7A + 0x3285 + 0xFFFF = 0xFFFF
+          0xCD7A + 0x3285 + 0x0000 = 0xFFFF
+
+   However, implementations exist which verify the checksum by computing
+   it and comparing against the header checksum field.
+
+   It is recommended that intermediate systems compute incremental
+   checksum using the method described in this document, and end systems
+   verify checksum as per the method described in RFC 1071.
+
+   The method in [Eqn. 3] is slightly more expensive than the one in RFC
+   1141.  If this is a concern, the two additional instructions can be
+   eliminated by subtracting complements with borrow [see Sec. 7].  This
+   would result in the following equation:
+
+          HC' = HC - ~m - m'    --    [Eqn. 4]
+
+          In the example shown above,
+
+          HC' = HC - ~m - m'
+              =  0xDD2F - ~0x5555 - 0x3285
+              =  0x0000
+
+6.  Historical Note
+
+   A historical aside: the fact that standard one's complement
+   arithmetic produces negative zero results is one of its main
+   drawbacks; it makes for difficulty in interpretation.  In the CDC
+   6000 series computers [4], this problem was avoided by using
+   subtraction as the primitive in one's complement arithmetic (i.e.,
+   addition is subtraction of the complement).
+
+
+
+
+
+Rijsinghani                                                     [Page 4]
+
+RFC 1624             Incremental Internet Checksum              May 1994
+
+
+7.  Acknowledgments
+
+   The contribution of the following individuals to the work that led to
+   this document is acknowledged:
+
+          Manu Kaycee - Ascom Timeplex, Incorporated
+          Paul Koning - Digital Equipment Corporation
+          Tracy Mallory - 3Com Corporation
+          Krishna Narayanaswamy - Digital Equipment Corporation
+          Atul Pandya - Digital Equipment Corporation
+
+   The failure condition was uncovered as a result of IP testing on a
+   product which implemented the RFC 1141 algorithm.  It was analyzed,
+   and the updated algorithm devised.  This algorithm was also verified
+   using simulation.  It was also shown that the failure condition
+   disappears if the checksum verification is done as per RFC 1071.
+
+8.  Security Considerations
+
+   Security issues are not discussed in this memo.
+
+9.  Conclusions
+
+   It is recommended that either [Eqn. 3] or [Eqn. 4] be the
+   implementation technique used for incremental update of the standard
+   Internet checksum.
+
+10.  Author's Address
+
+   Anil Rijsinghani
+   Digital Equipment Corporation
+   550 King St
+   Littleton, MA 01460
+
+   Phone: (508) 486-6786
+   EMail: anil@levers.enet.dec.com
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Rijsinghani                                                     [Page 5]
+
+RFC 1624             Incremental Internet Checksum              May 1994
+
+
+11.  References
+
+   [1] Postel, J., "Internet Protocol - DARPA Internet Program Protocol
+       Specification", STD 5, RFC 791, DARPA, September 1981.
+
+   [2] Braden, R., Borman, D., and C. Partridge, "Computing the Internet
+       Checksum", RFC 1071, ISI, Cray Research, BBN Laboratories,
+       September 1988.
+
+   [3] Mallory, T., and A. Kullberg, "Incremental Updating of the
+       Internet Checksum", RFC 1141, BBN Communications, January 1990.
+
+   [4] Thornton, J., "Design of a Computer -- the Control
+       Data 6600", Scott, Foresman and Company, 1970.
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Rijsinghani                                                     [Page 6]
+
--- a/kernel/picotcp/RFC/rfc1644.txt
+++ b/kernel/picotcp/RFC/rfc1644.txt
--- a/kernel/picotcp/RFC/rfc1661.txt
+++ b/kernel/picotcp/RFC/rfc1661.txt
--- a/kernel/picotcp/RFC/rfc1662.txt
+++ b/kernel/picotcp/RFC/rfc1662.txt
--- a/kernel/picotcp/RFC/rfc1693.txt
+++ b/kernel/picotcp/RFC/rfc1693.txt
--- a/kernel/picotcp/RFC/rfc1877.txt
+++ b/kernel/picotcp/RFC/rfc1877.txt
@ -0,0 +1,339 @@
+
+
+
+
+
+
+Network Working Group                                            S. Cobb
+Request for Comments: 1877                                     Microsoft
+Category: Informational                                    December 1995
+
+
+         PPP Internet Protocol Control Protocol Extensions for
+                         Name Server Addresses
+
+Status of this Memo
+
+   This memo provides information for the Internet community.  This memo
+   does not specify an Internet standard of any kind.  Distribution of
+   this memo is unlimited.
+
+Abstract
+
+   The Point-to-Point Protocol (PPP) [1] provides a standard method for
+   transporting multi-protocol datagrams over point-to-point links.  PPP
+   defines an extensible Link Control Protocol and a family of Network
+   Control Protocols (NCPs) for establishing and configuring different
+   network-layer protocols.
+
+   This document extends the NCP for establishing and configuring the
+   Internet Protocol over PPP [2], defining the negotiation of primary
+   and secondary Domain Name System (DNS) [3] and NetBIOS Name Server
+   (NBNS) [4] addresses.
+
+Table of Contents
+
+     1.     Additional IPCP Configuration options .................    1
+        1.1         Primary DNS Server Address ....................    2
+        1.2         Primary NBNS Server Address ...................    3
+        1.3         Secondary DNS Server Address ..................    4
+        1.4         Secondary NBNS Server Address .................    5
+     REFRENCES ....................................................    6
+     SECURITY CONSIDERATIONS ......................................    6
+     CHAIR'S ADDRESS ..............................................    6
+     AUTHOR'S ADDRESS .............................................    6
+
+1.  Additional IPCP Configuration Options
+
+   The four name server address configuration options, 129 to 132,
+   provide a method of obtaining the addresses of Domain Name System
+   (DNS) servers and (NetBIOS Name Server (NBNS) nodes on the remote
+   network.
+
+
+
+
+
+
+Cobb                         Informational                      [Page 1]
+
+RFC 1877                  PPP IPCP Extensions              December 1995
+
+
+   Primary and secondary addresses are negotiated independently.  They
+   serve identical purposes, except that when both are present an
+   attempt SHOULD be made to resolve names using the primary address
+   before using the secondary address.
+
+   For implementational convenience, these options are designed to be
+   identical in format and behavior to option 3 (IP-Address) which is
+   already present in most IPCP implementations.
+
+   Since the usefulness of name server address information is dependent
+   on the topology of the remote network and local peer's application,
+   it is suggested that these options not be included in the list of
+   "IPCP Recommended Options".
+
+1.1.  Primary DNS Server Address
+
+   Description
+
+      This Configuration Option defines a method for negotiating with
+      the remote peer the address of the primary DNS server to be used
+      on the local end of the link.  If local peer requests an invalid
+      server address (which it will typically do intentionally) the
+      remote peer specifies the address by NAKing this option, and
+      returning the IP address of a valid DNS server.
+
+      By default, no primary DNS address is provided.
+
+   A summary of the Primary DNS Address Configuration Option format is
+   shown below.  The fields are transmitted from left to right.
+
+    0                   1                   2                   3
+    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+   |     Type      |    Length     |      Primary-DNS-Address
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+      Primary-DNS-Address (cont)   |
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+
+   Type
+
+      129
+
+   Length
+
+      6
+
+
+
+
+
+
+Cobb                         Informational                      [Page 2]
+
+RFC 1877                  PPP IPCP Extensions              December 1995
+
+
+   Primary-DNS-Address
+
+      The four octet Primary-DNS-Address is the address of the primary
+      DNS server to be used by the local peer.  If all four octets are
+      set to zero, it indicates an explicit request that the peer
+      provide the address information in a Config-Nak packet.
+
+   Default
+
+      No address is provided.
+
+1.2.  Primary NBNS Server Address
+
+   Description
+
+      This Configuration Option defines a method for negotiating with
+      the remote peer the address of the primary NBNS server to be used
+      on the local end of the link.  If local peer requests an invalid
+      server address (which it will typically do intentionally) the
+      remote peer specifies the address by NAKing this option, and
+      returning the IP address of a valid NBNS server.
+
+      By default, no primary NBNS address is provided.
+
+   A summary of the Primary NBNS Address Configuration Option format is
+   shown below.  The fields are transmitted from left to right.
+
+    0                   1                   2                   3
+    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+   |     Type      |    Length     |      Primary-NBNS-Address
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+      Primary-NBNS-Address (cont)  |
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+
+   Type
+
+      130
+
+   Length
+
+      6
+
+   Primary-NBNS-Address
+
+      The four octet Primary-NBNS-Address is the address of the primary
+      NBNS server to be used by the local peer.  If all four octets are
+      set to zero, it indicates an explicit request that the peer
+
+
+
+Cobb                         Informational                      [Page 3]
+
+RFC 1877                  PPP IPCP Extensions              December 1995
+
+
+      provide the address information in a Config-Nak packet.
+
+   Default
+
+      No address is provided.
+
+1.3.  Secondary DNS Server Address
+
+   Description
+
+      This Configuration Option defines a method for negotiating with
+      the remote peer the address of the secondary DNS server to be used
+      on the local end of the link.  If local peer requests an invalid
+      server address (which it will typically do intentionally) the
+      remote peer specifies the address by NAKing this option, and
+      returning the IP address of a valid DNS server.
+
+      By default, no secondary DNS address is provided.
+
+   A summary of the Secondary DNS Address Configuration Option format is
+   shown below.  The fields are transmitted from left to right.
+
+    0                   1                   2                   3
+    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+   |     Type      |    Length     |      Secondary-DNS-Address
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+      Secondary-DNS-Address (cont) |
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+
+   Type
+
+      131
+
+   Length
+
+      6
+
+   Secondary-DNS-Address
+
+      The four octet Secondary-DNS-Address is the address of the primary
+      NBNS server to be used by the local peer.  If all four octets are
+      set to zero, it indicates an explicit request that the peer
+      provide the address information in a Config-Nak packet.
+
+   Default
+
+      No address is provided.
+
+
+
+Cobb                         Informational                      [Page 4]
+
+RFC 1877                  PPP IPCP Extensions              December 1995
+
+
+1.4.  Secondary NBNS Server Address
+
+   Description
+
+      This Configuration Option defines a method for negotiating with
+      the remote peer the address of the secondary NBNS server to be
+      used on the local end of the link.  If local peer requests an
+      invalid server address (which it will typically do intentionally)
+      the remote peer specifies the address by NAKing this option, and
+      returning the IP address of a valid NBNS server.
+
+      By default, no secondary NBNS address is provided.
+
+   A summary of the Secondary NBNS Address Configuration Option format
+   is shown below.  The fields are transmitted from left to right.
+
+       0                   1                   2                   3
+       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+      |     Type      |    Length     |      Secondary-NBNS-Address
+      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+        Secondary-NBNS-Address (cont) |
+      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+
+      Type
+
+         132
+
+      Length
+
+         6
+
+      Secondary-NBNS-Address
+
+         The four octet Secondary-NBNS-Address is the address of the
+         secondary NBNS server to be used by the local peer.  If all
+         four octets are set to zero, it indicates an explicit request
+         that the peer provide the address information in a Config-Nak
+         packet.
+
+      Default
+
+         No address is provided.
+
+
+
+
+
+
+
+
+Cobb                         Informational                      [Page 5]
+
+RFC 1877                  PPP IPCP Extensions              December 1995
+
+
+References
+
+   [1] Simpson, W., Editor, "The Point-to-Point Protocol (PPP)", STD 51,
+       RFC 1661, Daydreamer, July 1994.
+
+   [2] McGregor, G., "PPP Internet Control Protocol", RFC 1332, Merit,
+       May 1992.
+
+   [3] Auerbach, K., and A. Aggarwal, "Protocol Standard for a NetBIOS
+       Service on a TCP/UDP Transport", STD 19, RFCs 1001 and 1002,
+       March 1987.
+
+   [4] Mockapetris, P., "Domain Names - Concepts and Facilities", STD
+       13, RFC 1034, USC/Information Sciences Institute, November 1987.
+
+   [5] Mockapetris, P., "Domain Names - Implementation and
+       Specification", STD 13, RFC 1035, USC/Information Sciences
+       Institute, November 1987.
+
+Security Considerations
+
+   Security issues are not discussed in this memo.
+
+Chair's Address
+
+   The working group can be contacted via the current chair:
+
+      Fred Baker
+      Cisco Systems
+      519 Lado Drive
+      Santa Barbara, California  93111
+
+      EMail: fred@cisco.com
+
+Author's Address
+
+   Questions about this memo can also be directed to:
+
+      Steve Cobb
+      Microsoft Corporation
+      One Microsoft Way
+      Redmond, WA  98052-6399
+
+      Phone: (206) 882-8080
+
+      EMail: stevec@microsoft.com
+
+
+
+
+
+Cobb                         Informational                      [Page 6]
+
--- a/kernel/picotcp/RFC/rfc1936.txt
+++ b/kernel/picotcp/RFC/rfc1936.txt
--- a/kernel/picotcp/RFC/rfc1948.txt
+++ b/kernel/picotcp/RFC/rfc1948.txt
@ -0,0 +1,339 @@
+
+
+
+
+
+
+Network Working Group                                        S. Bellovin
+Request for Comments: 1948                                 AT&T Research
+Category: Informational                                         May 1996
+
+
+               Defending Against Sequence Number Attacks
+
+Status of This Memo
+
+   This memo provides information for the Internet community.  This memo
+   does not specify an Internet standard of any kind.  Distribution of
+   this memo is unlimited.
+
+Abstract
+
+   IP spoofing attacks based on sequence number spoofing have become a
+   serious threat on the Internet (CERT Advisory CA-95:01).  While
+   ubiquitous crypgraphic authentication is the right answer, we propose
+   a simple modification to TCP implementations that should be a very
+   substantial block to the current wave of attacks.
+
+Overview and Rational
+
+   In 1985, Morris [1] described a form of attack based on guessing what
+   sequence numbers TCP [2] will use for new connections.  Briefly, the
+   attacker gags a host trusted by the target, impersonates the IP
+   address of the trusted host when talking to the target, and completes
+   the 3-way handshake based on its guess at the next initial sequence
+   number to be used.  An ordinary connection to the target is used to
+   gather sequence number state information.  This entire sequence,
+   coupled with address-based authentication, allows the attacker to
+   execute commands on the target host.
+
+   Clearly, the proper solution is cryptographic authentication [3,4].
+   But it will quite a long time before that is deployed.  It has
+   therefore been necessary for many sites to restrict use of protocols
+   that rely on address-based authentication, such as rlogin and rsh.
+   Unfortunately, the prevalence of "sniffer attacks" -- network
+   eavesdropping (CERT Advisory CA-94:01) -- has rendered ordinary
+   TELNET [5] very dangerous as well.  The Internet is thus left without
+   a safe, secure mechanism for remote login.
+
+   We propose a simple change to TCP implementations that will block
+   most sequence number guessing attacks.  More precisely, such attacks
+   will remain possible if and only if the Bad Guy already has the
+   ability to launch even more devastating attacks.
+
+
+
+
+
+Bellovin                     Informational                      [Page 1]
+
+RFC 1948                Sequence Number Attacks                 May 1996
+
+
+Details of the Attack
+
+   In order to understand the particular case of sequence number
+   guessing, one must look at the 3-way handshake used in the TCP open
+   sequence [2].  Suppose client machine A wants to talk to rsh server
+   B.  It sends the following message:
+
+           A->B: SYN, ISNa
+
+   That is, it sends a packet with the SYN ("synchronize sequence
+   number") bit set and an initial sequence number ISNa.
+
+   B replies with
+
+           B->A: SYN, ISNb, ACK(ISNa)
+
+   In addition to sending its own initial sequence number, it
+   acknowledges A's.  Note that the actual numeric value ISNa must
+   appear in the message.
+
+   A concludes the handshake by sending
+
+           A->B: ACK(ISNb)
+
+   The initial sequence numbers are intended to be more or less random.
+   More precisely, RFC 793 specifies that the 32-bit counter be
+   incremented by 1 in the low-order position about every 4
+   microseconds.  Instead, Berkeley-derived kernels increment it by a
+   constant every second, and by another constant for each new
+   connection.  Thus, if you open a connection to a machine, you know to
+   a very high degree of confidence what sequence number it will use for
+   its next connection.  And therein lies the attack.
+
+   The attacker X first opens a real connection to its target B -- say,
+   to the mail port or the TCP echo port.  This gives ISNb.  It then
+   impersonates A and sends
+
+        Ax->B: SYN, ISNx
+
+   where "Ax" denotes a packet sent by X pretending to be A.
+
+   B's response to X's original SYN (so to speak)
+
+        B->A: SYN, ISNb', ACK(ISNx)
+
+
+
+
+
+
+
+Bellovin                     Informational                      [Page 2]
+
+RFC 1948                Sequence Number Attacks                 May 1996
+
+
+   goes to the legitimate A, about which more anon.  X never sees that
+   message but can still send
+
+        Ax->B: ACK(ISNb')
+
+   using the predicted value for ISNb'.  If the guess is right -- and
+   usually it will be -- B's rsh server thinks it has a legitimate
+   connection with A, when in fact X is sending the packets.  X can't
+   see the output from this session, but it can execute commands as more
+   or less any user -- and in that case, the game is over and X has won.
+
+   There is a minor difficulty here.  If A sees B's message, it will
+   realize that B is acknowledging something it never sent, and will
+   send a RST packet in response to tear down the connection.  There are
+   a variety of ways to prevent this; the easiest is to wait until the
+   real A is down (possibly as a result of enemy action, of course).  In
+   actual practice, X can gag A by exploiting a very common
+   implementation bug; this is described below.
+
+The Fix
+
+   The choice of initial sequence numbers for a connection is not
+   random.  Rather, it must be chosen so as to minimize the probability
+   of old stale packets being accepted by new incarnations of the same
+   connection [6, Appendix A].  Furthermore, implementations of TCP
+   derived from 4.2BSD contain special code to deal with such
+   reincarnations when the server end of the original connection is
+   still in TIMEWAIT state [7, pp. 945].  Accordingly, simple
+   randomization, as suggested in [8], will not work well.
+
+   But duplicate packets, and hence the restrictions on the initial
+   sequence number for reincarnations, are peculiar to individual
+   connections.  That is, there is no connection, syntactic or semantic,
+   between the sequence numbers used for two different connections.  We
+   can prevent sequence number guessing attacks by giving each
+   connection -- that is, each 4-tuple of <localhost, localport,
+   remotehost, remoteport> -- a separate sequence number space.  Within
+   each space, the initial sequence number is incremented according to
+   [2]; however, there is no obvious relationship between the numbering
+   in different spaces.
+
+   The obvious way to do this is to maintain state for dead connections,
+   and the easiest way to do that is to change the TCP state transition
+   diagram so that both ends of all connections go to TIMEWAIT state.
+   That would work, but it's inelegant and consumes storage space.
+   Instead, we use the current 4 microsecond timer M and set
+
+           ISN = M + F(localhost, localport, remotehost, remoteport).
+
+
+
+Bellovin                     Informational                      [Page 3]
+
+RFC 1948                Sequence Number Attacks                 May 1996
+
+
+   It is vital that F not be computable from the outside, or an attacker
+   could still guess at sequence numbers from the initial sequence
+   number used for some other connection.  We therefore suggest that F
+   be a cryptographic hash function of the connection-id and some secret
+   data.  MD5 [9] is a good choice, since the code is widely available.
+   The secret data can either be a true random number [10], or it can be
+   the combination of some per-host secret and the boot time of the
+   machine.  The boot time is included to ensure that the secret is
+   changed on occasion.  Other data, such as the host's IP address and
+   name, may be included in the hash as well; this eases administration
+   by permitting a network of workstations to share the same secret data
+   while still giving them separate sequence number spaces.  Our
+   recommendation, in fact, is to use all three of these items: as
+   random a number as the hardware can generate, an administratively-
+   installed pass phrase, and the machine's IP address.  This allows for
+   local choice on how secure the secret is.
+
+   Note that the secret cannot easily be changed on a live machine.
+   Doing so would change the initial sequence numbers used for
+   reincarnated connections; to maintain safety, either dead connection
+   state must be kept or a quiet time observed for two maximum segment
+   lifetimes after such a change.
+
+A Common TCP Bug
+
+   As mentioned earlier, attackers using sequence number guessing have
+   to "gag" the trusted machine first.  While a number of strategies are
+   possible, most of the attacks detected thus far rely on an
+   implementation bug.
+
+   When SYN packets are received for a connection, the receiving system
+   creates a new TCB in SYN-RCVD state.  To avoid overconsumption of
+   resources, 4.2BSD-derived systems permit only a limited number of
+   TCBs in this state per connection.  Once this limit is reached,
+   future SYN packets for new connections are discarded; it is assumed
+   that the client will retransmit them as needed.
+
+   When a packet is received, the first thing that must be done is a
+   search for the TCB for that connection.  If no TCB is found, the
+   kernel searches for a "wild card" TCB used by servers to accept
+   connections from all clients.  Unfortunately, in many kernels this
+   code is invoked for any incoming packets, not just for initial SYN
+   packets.  If the SYN-RCVD queue is full for the wildcard TCB, any new
+   packets specifying just that host and port number will be discarded,
+   even if they aren't SYN packets.
+
+
+
+
+
+
+Bellovin                     Informational                      [Page 4]
+
+RFC 1948                Sequence Number Attacks                 May 1996
+
+
+   To gag a host, then, the attacker sends a few dozen SYN packets to
+   the rlogin port from different port numbers on some non-existent
+   machine.  This fills up the SYN-RCVD queue, while the SYN+ACK packets
+   go off to the bit bucket.  The attack on the target machine then
+   appears to come from the rlogin port on the trusted machine.  The
+   replies -- the SYN+ACKs from the target -- will be perceived as
+   packets belonging to a full queue, and will be dropped silently.
+   This could be avoided if the full queue code checked for the ACK bit,
+   which cannot legally be on for legitimate open requests.  If it is
+   on, RST should be sent in reply.
+
+Security Considerations
+
+   Good sequence numbers are not a replacement for cryptographic
+   authentication.  At best, they're a palliative measure.
+
+   An eavesdropper who can observe the initial messages for a connection
+   can determine its sequence number state, and may still be able to
+   launch sequence number guessing attacks by impersonating that
+   connection.  However, such an eavesdropper can also hijack existing
+   connections [11], so the incremental threat isn't that high.  Still,
+   since the offset between a fake connection and a given real
+   connection will be more or less constant for the lifetime of the
+   secret, it is important to ensure that attackers can never capture
+   such packets.  Typical attacks that could disclose them include both
+   eavesdropping and the variety of routing attacks discussed in [8].
+
+   If random numbers are used as the sole source of the secret, they
+   MUST be chosen in accordance with the recommendations given in [10].
+
+Acknowledgments
+
+   Matt Blaze and Jim Ellis contributed some crucial ideas to this RFC.
+   Frank Kastenholz contributed constructive comments to this memo.
+
+References
+
+   [1]  R.T. Morris, "A Weakness in the 4.2BSD UNIX TCP/IP Software",
+        CSTR 117, 1985, AT&T Bell Laboratories, Murray Hill, NJ.
+
+   [2]  Postel, J., "Transmission Control Protocol", STD 7, RFC 793,
+        September 1981.
+
+   [3]  Kohl, J., and C. Neuman, "The Kerberos Network Authentication
+        Service (V5)", RFC 1510, September 1993.
+
+   [4]  Atkinson, R., "Security Architecture for the Internet
+        Protocol", RFC 1825, August 1995.
+
+
+
+Bellovin                     Informational                      [Page 5]
+
+RFC 1948                Sequence Number Attacks                 May 1996
+
+
+   [5]  Postel, J., and J. Reynolds, "Telnet Protocol Specification",
+        STD 8, RFC 854, May 1983.
+
+   [6]  Jacobson, V., Braden, R., and L. Zhang, "TCP Extension for
+        High-Speed Paths", RFC 1885, October 1990.
+
+   [7]  G.R. Wright, W. R. Stevens, "TCP/IP Illustrated, Volume 2",
+        1995.  Addison-Wesley.
+
+   [8]  S. Bellovin, "Security Problems in the TCP/IP Protocol Suite",
+        April 1989, Computer Communications Review, vol. 19, no. 2, pp.
+        32-48.
+
+   [9]  Rivest, R., "The MD5 Message-Digest Algorithm", RFC 1321,
+        April 1992.
+
+   [10] Eastlake, D., Crocker, S., and J. Schiller, "Randomness
+        Recommendations for Security", RFC 1750, December 1994.
+
+   [11] L. Joncheray, "A Simple Active Attack Against TCP, 1995, Proc.
+        Fifth Usenix UNIX Security Symposium.
+
+Author's Address
+
+   Steven M. Bellovin
+   AT&T Research
+   600 Mountain Avenue
+   Murray Hill, NJ  07974
+
+   Phone: (908) 582-5886
+   EMail: smb@research.att.com
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Bellovin                     Informational                      [Page 6]
+
--- a/kernel/picotcp/RFC/rfc1994.txt
+++ b/kernel/picotcp/RFC/rfc1994.txt
@ -0,0 +1,732 @@
+
+
+
+
+
+
+Network Working Group                                         W. Simpson
+Request for Comments: 1994                                    DayDreamer
+Obsoletes: 1334                                              August 1996
+Category: Standards Track
+
+
+         PPP Challenge Handshake Authentication Protocol (CHAP)
+
+
+Status of this Memo
+
+   This document specifies an Internet standards track protocol for the
+   Internet community, and requests discussion and suggestions for
+   improvements.  Please refer to the current edition of the "Internet
+   Official Protocol Standards" (STD 1) for the standardization state
+   and status of this protocol.  Distribution of this memo is unlimited.
+
+Abstract
+
+   The Point-to-Point Protocol (PPP) [1] provides a standard method for
+   transporting multi-protocol datagrams over point-to-point links.
+
+   PPP also defines an extensible Link Control Protocol, which allows
+   negotiation of an Authentication Protocol for authenticating its peer
+   before allowing Network Layer protocols to transmit over the link.
+
+   This document defines a method for Authentication using PPP, which
+   uses a random Challenge, with a cryptographically hashed Response
+   which depends upon the Challenge and a secret key.
+
+Table of Contents
+
+     1.     Introduction ..........................................    1
+        1.1       Specification of Requirements ...................    1
+        1.2       Terminology .....................................    2
+     2.     Challenge-Handshake Authentication Protocol ...........    2
+        2.1       Advantages ......................................    3
+        2.2       Disadvantages ...................................    3
+        2.3       Design Requirements .............................    4
+     3.     Configuration Option Format ...........................    5
+     4.     Packet Format .........................................    6
+        4.1       Challenge and Response ..........................    7
+        4.2       Success and Failure .............................    9
+     SECURITY CONSIDERATIONS ......................................   10
+     ACKNOWLEDGEMENTS .............................................   11
+     REFERENCES ...................................................   12
+     CONTACTS .....................................................   12
+
+
+
+
+Simpson                                                         [Page i]
+
+RFC 1994                        PPP CHAP                     August 1996
+
+
+1.  Introduction
+
+   In order to establish communications over a point-to-point link, each
+   end of the PPP link must first send LCP packets to configure the data
+   link during Link Establishment phase.  After the link has been
+   established, PPP provides for an optional Authentication phase before
+   proceeding to the Network-Layer Protocol phase.
+
+   By default, authentication is not mandatory.  If authentication of
+   the link is desired, an implementation MUST specify the
+   Authentication-Protocol Configuration Option during Link
+   Establishment phase.
+
+   These authentication protocols are intended for use primarily by
+   hosts and routers that connect to a PPP network server via switched
+   circuits or dial-up lines, but might be applied to dedicated links as
+   well.  The server can use the identification of the connecting host
+   or router in the selection of options for network layer negotiations.
+
+   This document defines a PPP authentication protocol.  The Link
+   Establishment and Authentication phases, and the Authentication-
+   Protocol Configuration Option, are defined in The Point-to-Point
+   Protocol (PPP) [1].
+
+
+1.1.  Specification of Requirements
+
+   In this document, several words are used to signify the requirements
+   of the specification.  These words are often capitalized.
+
+   MUST      This word, or the adjective "required", means that the
+             definition is an absolute requirement of the specification.
+
+   MUST NOT  This phrase means that the definition is an absolute
+             prohibition of the specification.
+
+   SHOULD    This word, or the adjective "recommended", means that there
+             may exist valid reasons in particular circumstances to
+             ignore this item, but the full implications must be
+             understood and carefully weighed before choosing a
+             different course.
+
+   MAY       This word, or the adjective "optional", means that this
+             item is one of an allowed set of alternatives.  An
+             implementation which does not include this option MUST be
+             prepared to interoperate with another implementation which
+             does include the option.
+
+
+
+
+Simpson                                                         [Page 1]
+
+RFC 1994                        PPP CHAP                     August 1996
+
+
+1.2.  Terminology
+
+   This document frequently uses the following terms:
+
+   authenticator
+             The end of the link requiring the authentication.  The
+             authenticator specifies the authentication protocol to be
+             used in the Configure-Request during Link Establishment
+             phase.
+
+   peer      The other end of the point-to-point link; the end which is
+             being authenticated by the authenticator.
+
+   silently discard
+             This means the implementation discards the packet without
+             further processing.  The implementation SHOULD provide the
+             capability of logging the error, including the contents of
+             the silently discarded packet, and SHOULD record the event
+             in a statistics counter.
+
+
+
+
+2.  Challenge-Handshake Authentication Protocol
+
+   The Challenge-Handshake Authentication Protocol (CHAP) is used to
+   periodically verify the identity of the peer using a 3-way handshake.
+   This is done upon initial link establishment, and MAY be repeated
+   anytime after the link has been established.
+
+   1.    After the Link Establishment phase is complete, the
+         authenticator sends a "challenge" message to the peer.
+
+   2.    The peer responds with a value calculated using a "one-way
+         hash" function.
+
+   3.    The authenticator checks the response against its own
+         calculation of the expected hash value.  If the values match,
+         the authentication is acknowledged; otherwise the connection
+         SHOULD be terminated.
+
+   4.    At random intervals, the authenticator sends a new challenge to
+         the peer, and repeats steps 1 to 3.
+
+
+
+
+
+
+
+
+Simpson                                                         [Page 2]
+
+RFC 1994                        PPP CHAP                     August 1996
+
+
+2.1.  Advantages
+
+   CHAP provides protection against playback attack by the peer through
+   the use of an incrementally changing identifier and a variable
+   challenge value.  The use of repeated challenges is intended to limit
+   the time of exposure to any single attack.  The authenticator is in
+   control of the frequency and timing of the challenges.
+
+   This authentication method depends upon a "secret" known only to the
+   authenticator and that peer.  The secret is not sent over the link.
+
+   Although the authentication is only one-way, by negotiating CHAP in
+   both directions the same secret set may easily be used for mutual
+   authentication.
+
+   Since CHAP may be used to authenticate many different systems, name
+   fields may be used as an index to locate the proper secret in a large
+   table of secrets.  This also makes it possible to support more than
+   one name/secret pair per system, and to change the secret in use at
+   any time during the session.
+
+
+2.2.  Disadvantages
+
+   CHAP requires that the secret be available in plaintext form.
+   Irreversably encrypted password databases commonly available cannot
+   be used.
+
+   It is not as useful for large installations, since every possible
+   secret is maintained at both ends of the link.
+
+      Implementation Note: To avoid sending the secret over other links
+      in the network, it is recommended that the challenge and response
+      values be examined at a central server, rather than each network
+      access server.  Otherwise, the secret SHOULD be sent to such
+      servers in a reversably encrypted form.  Either case requires a
+      trusted relationship, which is outside the scope of this
+      specification.
+
+
+
+
+
+
+
+
+
+
+
+
+
+Simpson                                                         [Page 3]
+
+RFC 1994                        PPP CHAP                     August 1996
+
+
+2.3.  Design Requirements
+
+   The CHAP algorithm requires that the length of the secret MUST be at
+   least 1 octet.  The secret SHOULD be at least as large and
+   unguessable as a well-chosen password.  It is preferred that the
+   secret be at least the length of the hash value for the hashing
+   algorithm chosen (16 octets for MD5).  This is to ensure a
+   sufficiently large range for the secret to provide protection against
+   exhaustive search attacks.
+
+   The one-way hash algorithm is chosen such that it is computationally
+   infeasible to determine the secret from the known challenge and
+   response values.
+
+   Each challenge value SHOULD be unique, since repetition of a
+   challenge value in conjunction with the same secret would permit an
+   attacker to reply with a previously intercepted response.  Since it
+   is expected that the same secret MAY be used to authenticate with
+   servers in disparate geographic regions, the challenge SHOULD exhibit
+   global and temporal uniqueness.
+
+   Each challenge value SHOULD also be unpredictable, least an attacker
+   trick a peer into responding to a predicted future challenge, and
+   then use the response to masquerade as that peer to an authenticator.
+
+   Although protocols such as CHAP are incapable of protecting against
+   realtime active wiretapping attacks, generation of unique
+   unpredictable challenges can protect against a wide range of active
+   attacks.
+
+   A discussion of sources of uniqueness and probability of divergence
+   is included in the Magic-Number Configuration Option [1].
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Simpson                                                         [Page 4]
+
+RFC 1994                        PPP CHAP                     August 1996
+
+
+3.  Configuration Option Format
+
+   A summary of the Authentication-Protocol Configuration Option format
+   to negotiate the Challenge-Handshake Authentication Protocol is shown
+   below.  The fields are transmitted from left to right.
+
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+   |     Type      |    Length     |     Authentication-Protocol   |
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+   |   Algorithm   |
+   +-+-+-+-+-+-+-+-+
+
+   Type
+
+      3
+
+   Length
+
+      5
+
+   Authentication-Protocol
+
+      c223 (hex) for Challenge-Handshake Authentication Protocol.
+
+   Algorithm
+
+      The Algorithm field is one octet and indicates the authentication
+      method to be used.  Up-to-date values are specified in the most
+      recent "Assigned Numbers" [2].  One value is required to be
+      implemented:
+
+         5       CHAP with MD5 [3]
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Simpson                                                         [Page 5]
+
+RFC 1994                        PPP CHAP                     August 1996
+
+
+4.  Packet Format
+
+   Exactly one Challenge-Handshake Authentication Protocol packet is
+   encapsulated in the Information field of a PPP Data Link Layer frame
+   where the protocol field indicates type hex c223 (Challenge-Handshake
+   Authentication Protocol).  A summary of the CHAP packet format is
+   shown below.  The fields are transmitted from left to right.
+
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+   |     Code      |  Identifier   |            Length             |
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+   |    Data ...
+   +-+-+-+-+
+
+   Code
+
+      The Code field is one octet and identifies the type of CHAP
+      packet.  CHAP Codes are assigned as follows:
+
+         1       Challenge
+         2       Response
+         3       Success
+         4       Failure
+
+   Identifier
+
+      The Identifier field is one octet and aids in matching challenges,
+      responses and replies.
+
+   Length
+
+      The Length field is two octets and indicates the length of the
+      CHAP packet including the Code, Identifier, Length and Data
+      fields.  Octets outside the range of the Length field should be
+      treated as Data Link Layer padding and should be ignored on
+      reception.
+
+   Data
+
+      The Data field is zero or more octets.  The format of the Data
+      field is determined by the Code field.
+
+
+
+
+
+
+
+
+
+
+Simpson                                                         [Page 6]
+
+RFC 1994                        PPP CHAP                     August 1996
+
+
+4.1.  Challenge and Response
+
+   Description
+
+      The Challenge packet is used to begin the Challenge-Handshake
+      Authentication Protocol.  The authenticator MUST transmit a CHAP
+      packet with the Code field set to 1 (Challenge).  Additional
+      Challenge packets MUST be sent until a valid Response packet is
+      received, or an optional retry counter expires.
+
+      A Challenge packet MAY also be transmitted at any time during the
+      Network-Layer Protocol phase to ensure that the connection has not
+      been altered.
+
+      The peer SHOULD expect Challenge packets during the Authentication
+      phase and the Network-Layer Protocol phase.  Whenever a Challenge
+      packet is received, the peer MUST transmit a CHAP packet with the
+      Code field set to 2 (Response).
+
+      Whenever a Response packet is received, the authenticator compares
+      the Response Value with its own calculation of the expected value.
+      Based on this comparison, the authenticator MUST send a Success or
+      Failure packet (described below).
+
+         Implementation Notes: Because the Success might be lost, the
+         authenticator MUST allow repeated Response packets during the
+         Network-Layer Protocol phase after completing the
+         Authentication phase.  To prevent discovery of alternative
+         Names and Secrets, any Response packets received having the
+         current Challenge Identifier MUST return the same reply Code
+         previously returned for that specific Challenge (the message
+         portion MAY be different).  Any Response packets received
+         during any other phase MUST be silently discarded.
+
+         When the Failure is lost, and the authenticator terminates the
+         link, the LCP Terminate-Request and Terminate-Ack provide an
+         alternative indication that authentication failed.
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Simpson                                                         [Page 7]
+
+RFC 1994                        PPP CHAP                     August 1996
+
+
+   A summary of the Challenge and Response packet format is shown below.
+   The fields are transmitted from left to right.
+
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+   |     Code      |  Identifier   |            Length             |
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+   |  Value-Size   |  Value ...
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+   |  Name ...
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+
+   Code
+
+      1 for Challenge;
+
+      2 for Response.
+
+   Identifier
+
+      The Identifier field is one octet.  The Identifier field MUST be
+      changed each time a Challenge is sent.
+
+      The Response Identifier MUST be copied from the Identifier field
+      of the Challenge which caused the Response.
+
+   Value-Size
+
+      This field is one octet and indicates the length of the Value
+      field.
+
+   Value
+
+      The Value field is one or more octets.  The most significant octet
+      is transmitted first.
+
+      The Challenge Value is a variable stream of octets.  The
+      importance of the uniqueness of the Challenge Value and its
+      relationship to the secret is described above.  The Challenge
+      Value MUST be changed each time a Challenge is sent.  The length
+      of the Challenge Value depends upon the method used to generate
+      the octets, and is independent of the hash algorithm used.
+
+      The Response Value is the one-way hash calculated over a stream of
+      octets consisting of the Identifier, followed by (concatenated
+      with) the "secret", followed by (concatenated with) the Challenge
+      Value.  The length of the Response Value depends upon the hash
+      algorithm used (16 octets for MD5).
+
+
+
+
+Simpson                                                         [Page 8]
+
+RFC 1994                        PPP CHAP                     August 1996
+
+
+   Name
+
+      The Name field is one or more octets representing the
+      identification of the system transmitting the packet.  There are
+      no limitations on the content of this field.  For example, it MAY
+      contain ASCII character strings or globally unique identifiers in
+      ASN.1 syntax.  The Name should not be NUL or CR/LF terminated.
+      The size is determined from the Length field.
+
+
+4.2.  Success and Failure
+
+   Description
+
+      If the Value received in a Response is equal to the expected
+      value, then the implementation MUST transmit a CHAP packet with
+      the Code field set to 3 (Success).
+
+      If the Value received in a Response is not equal to the expected
+      value, then the implementation MUST transmit a CHAP packet with
+      the Code field set to 4 (Failure), and SHOULD take action to
+      terminate the link.
+
+   A summary of the Success and Failure packet format is shown below.
+   The fields are transmitted from left to right.
+
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+   |     Code      |  Identifier   |            Length             |
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+   |  Message  ...
+   +-+-+-+-+-+-+-+-+-+-+-+-+-
+
+   Code
+
+      3 for Success;
+
+      4 for Failure.
+
+   Identifier
+
+      The Identifier field is one octet and aids in matching requests
+      and replies.  The Identifier field MUST be copied from the
+      Identifier field of the Response which caused this reply.
+
+
+
+
+
+
+
+
+Simpson                                                         [Page 9]
+
+RFC 1994                        PPP CHAP                     August 1996
+
+
+   Message
+
+      The Message field is zero or more octets, and its contents are
+      implementation dependent.  It is intended to be human readable,
+      and MUST NOT affect operation of the protocol.  It is recommended
+      that the message contain displayable ASCII characters 32 through
+      126 decimal.  Mechanisms for extension to other character sets are
+      the topic of future research.  The size is determined from the
+      Length field.
+
+
+
+Security Considerations
+
+   Security issues are the primary topic of this RFC.
+
+   The interaction of the authentication protocols within PPP are highly
+   implementation dependent.  This is indicated by the use of SHOULD
+   throughout the document.
+
+   For example, upon failure of authentication, some implementations do
+   not terminate the link.  Instead, the implementation limits the kind
+   of traffic in the Network-Layer Protocols to a filtered subset, which
+   in turn allows the user opportunity to update secrets or send mail to
+   the network administrator indicating a problem.
+
+   There is no provision for re-tries of failed authentication.
+   However, the LCP state machine can renegotiate the authentication
+   protocol at any time, thus allowing a new attempt.  It is recommended
+   that any counters used for authentication failure not be reset until
+   after successful authentication, or subsequent termination of the
+   failed link.
+
+   There is no requirement that authentication be full duplex or that
+   the same protocol be used in both directions.  It is perfectly
+   acceptable for different protocols to be used in each direction.
+   This will, of course, depend on the specific protocols negotiated.
+
+   The secret SHOULD NOT be the same in both directions.  This allows an
+   attacker to replay the peer's challenge, accept the computed
+   response, and use that response to authenticate.
+
+   In practice, within or associated with each PPP server, there is a
+   database which associates "user" names with authentication
+   information ("secrets").  It is not anticipated that a particular
+   named user would be authenticated by multiple methods.  This would
+   make the user vulnerable to attacks which negotiate the least secure
+   method from among a set (such as PAP rather than CHAP).  If the same
+
+
+
+Simpson                                                        [Page 10]
+
+RFC 1994                        PPP CHAP                     August 1996
+
+
+   secret was used, PAP would reveal the secret to be used later with
+   CHAP.
+
+   Instead, for each user name there should be an indication of exactly
+   one method used to authenticate that user name.  If a user needs to
+   make use of different authentication methods under different
+   circumstances, then distinct user names SHOULD be employed, each of
+   which identifies exactly one authentication method.
+
+   Passwords and other secrets should be stored at the respective ends
+   such that access to them is as limited as possible.  Ideally, the
+   secrets should only be accessible to the process requiring access in
+   order to perform the authentication.
+
+   The secrets should be distributed with a mechanism that limits the
+   number of entities that handle (and thus gain knowledge of) the
+   secret.  Ideally, no unauthorized person should ever gain knowledge
+   of the secrets.  Such a mechanism is outside the scope of this
+   specification.
+
+
+Acknowledgements
+
+   David Kaufman, Frank Heinrich, and Karl Auerbach used a challenge
+   handshake at SDC when designing one of the protocols for a "secure"
+   network in the mid-1970s.  Tom Bearson built a prototype Sytek
+   product ("Poloneous"?) on the challenge-response notion in the 1982-
+   83 timeframe.  Another variant is documented in the various IBM SNA
+   manuals.  Yet another variant was implemented by Karl Auerbach in the
+   Telebit NetBlazer circa 1991.
+
+   Kim Toms and Barney Wolff provided useful critiques of earlier
+   versions of this document.
+
+   Special thanks to Dave Balenson, Steve Crocker, James Galvin, and
+   Steve Kent, for their extensive explanations and suggestions.  Now,
+   if only we could get them to agree with each other.
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Simpson                                                        [Page 11]
+
+RFC 1994                        PPP CHAP                     August 1996
+
+
+References
+
+   [1]   Simpson, W., Editor, "The Point-to-Point Protocol (PPP)", STD
+         51, RFC 1661, DayDreamer, July 1994.
+
+   [2]   Reynolds, J., and J. Postel, "Assigned Numbers", STD 2, RFC
+         1700, USC/Information Sciences Institute, October 1994.
+
+   [3]   Rivest, R., and S. Dusse, "The MD5 Message-Digest Algorithm",
+         MIT Laboratory for Computer Science and RSA Data Security,
+         Inc., RFC 1321, April 1992.
+
+
+
+Contacts
+
+   Comments should be submitted to the ietf-ppp@merit.edu mailing list.
+
+   This document was reviewed by the Point-to-Point Protocol Working
+   Group of the Internet Engineering Task Force (IETF).  The working
+   group can be contacted via the current chair:
+
+      Karl Fox
+      Ascend Communications
+      3518 Riverside Drive, Suite 101
+      Columbus, Ohio 43221
+
+          karl@MorningStar.com
+          karl@Ascend.com
+
+
+   Questions about this memo can also be directed to:
+
+      William Allen Simpson
+      DayDreamer
+      Computer Systems Consulting Services
+      1384 Fontaine
+      Madison Heights, Michigan  48071
+
+          wsimpson@UMich.edu
+          wsimpson@GreenDragon.com (preferred)
+
+
+
+
+
+
+
+
+
+
+Simpson                                                        [Page 12]
+
+
--- a/kernel/picotcp/RFC/rfc2012.txt
+++ b/kernel/picotcp/RFC/rfc2012.txt
@ -0,0 +1,563 @@
+
+
+
+
+
+
+Network Working Group                              K. McCloghrie, Editor
+Request for Comments: 2012                                 Cisco Systems
+Updates: 1213                                              November 1996
+Category: Standards Track
+
+
+                   SNMPv2 Management Information Base
+           for the Transmission Control Protocol using SMIv2
+
+Status of this Memo
+
+   This document specifies an Internet standards track protocol for the
+   Internet community, and requests discussion and suggestions for
+   improvements.  Please refer to the current edition of the "Internet
+   Official Protocol Standards" (STD 1) for the standardization state
+   and status of this protocol.  Distribution of this memo is unlimited.
+
+IESG Note:
+
+   The IP, UDP, and TCP MIB modules currently support only IPv4.  These
+   three modules use the IpAddress type defined as an OCTET STRING of
+   length 4 to represent the IPv4 32-bit internet addresses.  (See RFC
+   1902, SMI for SNMPv2.)  They do not support the new 128-bit IPv6
+   internet addresses.
+
+Table of Contents
+
+   1. Introduction ................................................    1
+   2. Definitions .................................................    2
+   2.1 The TCP Group ..............................................    3
+   2.2 Conformance Information ....................................    8
+   2.2.1 Compliance Statements ....................................    8
+   2.2.2 Units of Conformance .....................................    9
+   3. Acknowledgements ............................................   10
+   4. References ..................................................   10
+   5. Security Considerations .....................................   10
+   6. Editor's Address ............................................   10
+
+1.  Introduction
+
+   A management system contains: several (potentially many) nodes, each
+   with a processing entity, termed an agent, which has access to
+   management instrumentation; at least one management station; and, a
+   management protocol, used to convey management information between
+   the agents and management stations.  Operations of the protocol are
+   carried out under an administrative framework which defines
+   authentication, authorization, access control, and privacy policies.
+
+
+
+
+McCloghrie                  Standards Track                     [Page 1]
+
+RFC 2012                   SNMPv2 MIB for TCP              November 1996
+
+
+   Management stations execute management applications which monitor and
+   control managed elements.  Managed elements are devices such as
+   hosts, routers, terminal servers, etc., which are monitored and
+   controlled via access to their management information.
+
+   Management information is viewed as a collection of managed objects,
+   residing in a virtual information store, termed the Management
+   Information Base (MIB).  Collections of related objects are defined
+   in MIB modules.  These modules are written using a subset of OSI's
+   Abstract Syntax Notation One (ASN.1) [1], termed the Structure of
+   Management Information (SMI) [2].
+
+   This document is the MIB module which defines managed objects for
+   managing implementations of the Transmission Control Protocol (TCP)
+   [3].
+
+   The managed objects in this MIB module were originally defined using
+   the SNMPv1 framework as a part of MIB-II [4].  This document defines
+   the same objects for TCP using the SNMPv2 framework.
+
+2.  Definitions
+
+TCP-MIB DEFINITIONS ::= BEGIN
+
+IMPORTS
+    MODULE-IDENTITY, OBJECT-TYPE, Integer32, Gauge32,
+    Counter32, IpAddress, mib-2        FROM SNMPv2-SMI
+    MODULE-COMPLIANCE, OBJECT-GROUP    FROM SNMPv2-CONF;
+
+tcpMIB MODULE-IDENTITY
+    LAST-UPDATED "9411010000Z"
+    ORGANIZATION "IETF SNMPv2 Working Group"
+    CONTACT-INFO
+            "        Keith McCloghrie
+
+             Postal: Cisco Systems, Inc.
+                     170 West Tasman Drive
+                     San Jose, CA  95134-1706
+                     US
+
+             Phone:  +1 408 526 5260
+             Email:  kzm@cisco.com"
+
+
+
+
+
+
+
+
+
+McCloghrie                  Standards Track                     [Page 2]
+
+RFC 2012                   SNMPv2 MIB for TCP              November 1996
+
+
+    DESCRIPTION
+            "The MIB module for managing TCP implementations."
+    REVISION      "9103310000Z"
+    DESCRIPTION
+            "The initial revision of this MIB module was part of MIB-
+            II."
+    ::= { mib-2 49 }
+
+-- the TCP group
+
+tcp      OBJECT IDENTIFIER ::= { mib-2 6 }
+
+tcpRtoAlgorithm OBJECT-TYPE
+    SYNTAX      INTEGER {
+                    other(1),    -- none of the following
+                    constant(2), -- a constant rto
+                    rsre(3),     -- MIL-STD-1778, Appendix B
+                    vanj(4)      -- Van Jacobson's algorithm [5]
+                }
+    MAX-ACCESS  read-only
+    STATUS      current
+    DESCRIPTION
+            "The algorithm used to determine the timeout value used for
+            retransmitting unacknowledged octets."
+    ::= { tcp 1 }
+
+tcpRtoMin OBJECT-TYPE
+    SYNTAX      Integer32
+    UNITS       "milliseconds"
+    MAX-ACCESS  read-only
+    STATUS      current
+    DESCRIPTION
+            "The minimum value permitted by a TCP implementation for the
+            retransmission timeout, measured in milliseconds.  More
+            refined semantics for objects of this type depend upon the
+            algorithm used to determine the retransmission timeout.  In
+            particular, when the timeout algorithm is rsre(3), an object
+            of this type has the semantics of the LBOUND quantity
+            described in RFC 793."
+    ::= { tcp 2 }
+
+tcpRtoMax OBJECT-TYPE
+    SYNTAX      Integer32
+    UNITS       "milliseconds"
+    MAX-ACCESS  read-only
+    STATUS      current
+    DESCRIPTION
+            "The maximum value permitted by a TCP implementation for the
+
+
+
+McCloghrie                  Standards Track                     [Page 3]
+
+RFC 2012                   SNMPv2 MIB for TCP              November 1996
+
+
+            retransmission timeout, measured in milliseconds.  More
+            refined semantics for objects of this type depend upon the
+            algorithm used to determine the retransmission timeout.  In
+            particular, when the timeout algorithm is rsre(3), an object
+            of this type has the semantics of the UBOUND quantity
+            described in RFC 793."
+    ::= { tcp 3 }
+
+tcpMaxConn OBJECT-TYPE
+    SYNTAX      Integer32
+    MAX-ACCESS  read-only
+    STATUS      current
+    DESCRIPTION
+            "The limit on the total number of TCP connections the entity
+            can support.  In entities where the maximum number of
+            connections is dynamic, this object should contain the value
+            -1."
+    ::= { tcp 4 }
+
+tcpActiveOpens OBJECT-TYPE
+    SYNTAX      Counter32
+    MAX-ACCESS  read-only
+    STATUS      current
+    DESCRIPTION
+            "The number of times TCP connections have made a direct
+            transition to the SYN-SENT state from the CLOSED state."
+    ::= { tcp 5 }
+
+tcpPassiveOpens OBJECT-TYPE
+    SYNTAX      Counter32
+    MAX-ACCESS  read-only
+    STATUS      current
+    DESCRIPTION
+            "The number of times TCP connections have made a direct
+            transition to the SYN-RCVD state from the LISTEN state."
+    ::= { tcp 6 }
+
+tcpAttemptFails OBJECT-TYPE
+    SYNTAX      Counter32
+    MAX-ACCESS  read-only
+    STATUS      current
+    DESCRIPTION
+            "The number of times TCP connections have made a direct
+            transition to the CLOSED state from either the SYN-SENT
+            state or the SYN-RCVD state, plus the number of times TCP
+            connections have made a direct transition to the LISTEN
+            state from the SYN-RCVD state."
+    ::= { tcp 7 }
+
+
+
+McCloghrie                  Standards Track                     [Page 4]
+
+RFC 2012                   SNMPv2 MIB for TCP              November 1996
+
+
+tcpEstabResets OBJECT-TYPE
+    SYNTAX      Counter32
+    MAX-ACCESS  read-only
+    STATUS      current
+    DESCRIPTION
+            "The number of times TCP connections have made a direct
+            transition to the CLOSED state from either the ESTABLISHED
+            state or the CLOSE-WAIT state."
+    ::= { tcp 8 }
+
+tcpCurrEstab OBJECT-TYPE
+    SYNTAX      Gauge32
+    MAX-ACCESS  read-only
+    STATUS      current
+    DESCRIPTION
+            "The number of TCP connections for which the current state
+            is either ESTABLISHED or CLOSE- WAIT."
+    ::= { tcp 9 }
+
+
+tcpInSegs OBJECT-TYPE
+    SYNTAX      Counter32
+    MAX-ACCESS  read-only
+    STATUS      current
+    DESCRIPTION
+            "The total number of segments received, including those
+            received in error.  This count includes segments received on
+            currently established connections."
+    ::= { tcp 10 }
+
+tcpOutSegs OBJECT-TYPE
+    SYNTAX      Counter32
+    MAX-ACCESS  read-only
+    STATUS      current
+    DESCRIPTION
+            "The total number of segments sent, including those on
+            current connections but excluding those containing only
+            retransmitted octets."
+    ::= { tcp 11 }
+
+tcpRetransSegs OBJECT-TYPE
+    SYNTAX      Counter32
+    MAX-ACCESS  read-only
+    STATUS      current
+    DESCRIPTION
+            "The total number of segments retransmitted - that is, the
+            number of TCP segments transmitted containing one or more
+            previously transmitted octets."
+
+
+
+McCloghrie                  Standards Track                     [Page 5]
+
+RFC 2012                   SNMPv2 MIB for TCP              November 1996
+
+
+    ::= { tcp 12 }
+
+
+-- the TCP Connection table
+
+-- The TCP connection table contains information about this
+-- entity's existing TCP connections.
+
+tcpConnTable OBJECT-TYPE
+    SYNTAX      SEQUENCE OF TcpConnEntry
+    MAX-ACCESS  not-accessible
+    STATUS      current
+    DESCRIPTION
+            "A table containing TCP connection-specific information."
+    ::= { tcp 13 }
+
+tcpConnEntry OBJECT-TYPE
+    SYNTAX      TcpConnEntry
+    MAX-ACCESS  not-accessible
+    STATUS      current
+    DESCRIPTION
+            "A conceptual row of the tcpConnTable containing information
+            about a particular current TCP connection.  Each row of this
+            table is transient, in that it ceases to exist when (or soon
+            after) the connection makes the transition to the CLOSED
+            state."
+    INDEX   { tcpConnLocalAddress,
+              tcpConnLocalPort,
+              tcpConnRemAddress,
+              tcpConnRemPort }
+    ::= { tcpConnTable 1 }
+
+TcpConnEntry ::= SEQUENCE {
+        tcpConnState          INTEGER,
+        tcpConnLocalAddress   IpAddress,
+        tcpConnLocalPort      INTEGER,
+        tcpConnRemAddress     IpAddress,
+        tcpConnRemPort        INTEGER
+    }
+
+tcpConnState OBJECT-TYPE
+    SYNTAX      INTEGER {
+                    closed(1),
+                    listen(2),
+                    synSent(3),
+                    synReceived(4),
+                    established(5),
+                    finWait1(6),
+
+
+
+McCloghrie                  Standards Track                     [Page 6]
+
+RFC 2012                   SNMPv2 MIB for TCP              November 1996
+
+
+                    finWait2(7),
+                    closeWait(8),
+                    lastAck(9),
+                    closing(10),
+                    timeWait(11),
+                    deleteTCB(12)
+                }
+    MAX-ACCESS  read-write
+    STATUS      current
+    DESCRIPTION
+            "The state of this TCP connection.
+
+            The only value which may be set by a management station is
+            deleteTCB(12).  Accordingly, it is appropriate for an agent
+            to return a `badValue' response if a management station
+            attempts to set this object to any other value.
+
+            If a management station sets this object to the value
+            deleteTCB(12), then this has the effect of deleting the TCB
+            (as defined in RFC 793) of the corresponding connection on
+            the managed node, resulting in immediate termination of the
+            connection.
+
+            As an implementation-specific option, a RST segment may be
+            sent from the managed node to the other TCP endpoint (note
+            however that RST segments are not sent reliably)."
+    ::= { tcpConnEntry 1 }
+
+tcpConnLocalAddress OBJECT-TYPE
+    SYNTAX      IpAddress
+    MAX-ACCESS  read-only
+    STATUS      current
+    DESCRIPTION
+            "The local IP address for this TCP connection.  In the case
+            of a connection in the listen state which is willing to
+            accept connections for any IP interface associated with the
+            node, the value 0.0.0.0 is used."
+    ::= { tcpConnEntry 2 }
+
+tcpConnLocalPort OBJECT-TYPE
+    SYNTAX      INTEGER (0..65535)
+    MAX-ACCESS  read-only
+    STATUS      current
+    DESCRIPTION
+            "The local port number for this TCP connection."
+    ::= { tcpConnEntry 3 }
+
+tcpConnRemAddress OBJECT-TYPE
+
+
+
+McCloghrie                  Standards Track                     [Page 7]
+
+RFC 2012                   SNMPv2 MIB for TCP              November 1996
+
+
+    SYNTAX      IpAddress
+    MAX-ACCESS  read-only
+    STATUS      current
+    DESCRIPTION
+            "The remote IP address for this TCP connection."
+    ::= { tcpConnEntry 4 }
+
+tcpConnRemPort OBJECT-TYPE
+    SYNTAX      INTEGER (0..65535)
+    MAX-ACCESS  read-only
+    STATUS      current
+    DESCRIPTION
+            "The remote port number for this TCP connection."
+    ::= { tcpConnEntry 5 }
+
+tcpInErrs OBJECT-TYPE
+    SYNTAX      Counter32
+    MAX-ACCESS  read-only
+    STATUS      current
+    DESCRIPTION
+            "The total number of segments received in error (e.g., bad
+            TCP checksums)."
+    ::= { tcp 14 }
+
+tcpOutRsts OBJECT-TYPE
+    SYNTAX      Counter32
+    MAX-ACCESS  read-only
+    STATUS      current
+    DESCRIPTION
+            "The number of TCP segments sent containing the RST flag."
+    ::= { tcp 15 }
+
+-- conformance information
+
+tcpMIBConformance OBJECT IDENTIFIER ::= { tcpMIB 2 }
+
+tcpMIBCompliances OBJECT IDENTIFIER ::= { tcpMIBConformance 1 }
+tcpMIBGroups      OBJECT IDENTIFIER ::= { tcpMIBConformance 2 }
+
+
+-- compliance statements
+
+tcpMIBCompliance MODULE-COMPLIANCE
+    STATUS  current
+    DESCRIPTION
+            "The compliance statement for SNMPv2 entities which
+            implement TCP."
+    MODULE  -- this module
+
+
+
+McCloghrie                  Standards Track                     [Page 8]
+
+RFC 2012                   SNMPv2 MIB for TCP              November 1996
+
+
+        MANDATORY-GROUPS { tcpGroup
+                           }
+    ::= { tcpMIBCompliances 1 }
+
+-- units of conformance
+
+tcpGroup OBJECT-GROUP
+    OBJECTS   { tcpRtoAlgorithm, tcpRtoMin, tcpRtoMax,
+                tcpMaxConn, tcpActiveOpens,
+                tcpPassiveOpens, tcpAttemptFails,
+                tcpEstabResets, tcpCurrEstab, tcpInSegs,
+                tcpOutSegs, tcpRetransSegs, tcpConnState,
+                tcpConnLocalAddress, tcpConnLocalPort,
+                tcpConnRemAddress, tcpConnRemPort,
+                tcpInErrs, tcpOutRsts }
+    STATUS    current
+    DESCRIPTION
+            "The tcp group of objects providing for management of TCP
+            entities."
+    ::= { tcpMIBGroups 1 }
+
+END
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+McCloghrie                  Standards Track                     [Page 9]
+
+RFC 2012                   SNMPv2 MIB for TCP              November 1996
+
+
+3.  Acknowledgements
+
+   This document contains a modified subset of RFC 1213.
+
+4.  References
+
+   [1]  Information processing systems - Open Systems Interconnection -
+        Specification of Abstract Syntax Notation One (ASN.1),
+        International Organization for Standardization.  International
+        Standard 8824, (December, 1987).
+
+   [2]  McCloghrie, K., Editor, "Structure of Management Information
+        for version 2 of the Simple Network Management Protocol
+        (SNMPv2)", RFC 1902, Cisco Systems, January 1996.
+
+   [3]  Postel, J., "Transmission Control Protocol - DARPA Internet
+        Program Protocol Specification", STD 7, RFC 793, DARPA,
+        September 1981.
+
+   [4]  McCloghrie, K., and M. Rose, "Management Information Base for
+        Network Management of TCP/IP-based internets: MIB-II", STD 17,
+        RFC 1213, March 1991.
+
+   [5]  Jacobson, V., "Congestion Avoidance and Control", SIGCOMM 1988,
+        Stanford, California.
+
+5.  Security Considerations
+
+   Security issues are not discussed in this memo.
+
+6.  Editor's Address
+
+   Keith McCloghrie
+   Cisco Systems, Inc.
+   170 West Tasman Drive
+   San Jose, CA  95134-1706
+   US
+
+   Phone: +1 408 526 5260
+   EMail: kzm@cisco.com
+
+
+
+
+
+
+
+
+
+
+
+McCloghrie                  Standards Track                    [Page 10]
+
--- a/kernel/picotcp/RFC/rfc2018.txt
+++ b/kernel/picotcp/RFC/rfc2018.txt
@ -0,0 +1,675 @@
+
+
+
+
+
+
+Network Working Group                                        M. Mathis
+Request for Comments: 2018                                  J. Mahdavi
+Category: Standards Track                                          PSC
+                                                              S. Floyd
+                                                                  LBNL
+                                                            A. Romanow
+                                                      Sun Microsystems
+                                                          October 1996
+
+
+                  TCP Selective Acknowledgment Options
+
+Status of this Memo
+
+   This document specifies an Internet standards track protocol for the
+   Internet community, and requests discussion and suggestions for
+   improvements.  Please refer to the current edition of the "Internet
+   Official Protocol Standards" (STD 1) for the standardization state
+   and status of this protocol.  Distribution of this memo is unlimited.
+
+Abstract
+
+   TCP may experience poor performance when multiple packets are lost
+   from one window of data.   With the limited information available
+   from cumulative acknowledgments, a TCP sender can only learn about a
+   single lost packet per round trip time.  An aggressive sender could
+   choose to retransmit packets early, but such retransmitted segments
+   may have already been successfully received.
+
+   A Selective Acknowledgment (SACK) mechanism, combined with a
+   selective repeat retransmission policy, can help to overcome these
+   limitations.  The receiving TCP sends back SACK packets to the sender
+   informing the sender of data that has been received. The sender can
+   then retransmit only the missing data segments.
+
+   This memo proposes an implementation of SACK and discusses its
+   performance and related issues.
+
+Acknowledgements
+
+   Much of the text in this document is taken directly from RFC1072 "TCP
+   Extensions for Long-Delay Paths" by Bob Braden and Van Jacobson.  The
+   authors would like to thank Kevin Fall (LBNL), Christian Huitema
+   (INRIA), Van Jacobson (LBNL), Greg Miller (MITRE), Greg Minshall
+   (Ipsilon), Lixia Zhang (XEROX PARC and UCLA), Dave Borman (BSDI),
+   Allison Mankin (ISI) and others for their review and constructive
+   comments.
+
+
+
+
+Mathis, et. al.             Standards Track                     [Page 1]
+
+RFC 2018         TCP Selective Acknowledgement Options      October 1996
+
+
+1.  Introduction
+
+   Multiple packet losses from a window of data can have a catastrophic
+   effect on TCP throughput. TCP [Postel81] uses a cumulative
+   acknowledgment scheme in which received segments that are not at the
+   left edge of the receive window are not acknowledged.  This forces
+   the sender to either wait a roundtrip time to find out about each
+   lost packet, or to unnecessarily retransmit segments which have been
+   correctly received [Fall95].  With the cumulative acknowledgment
+   scheme, multiple dropped segments generally cause TCP to lose its
+   ACK-based clock, reducing overall throughput.
+
+   Selective Acknowledgment (SACK) is a strategy which corrects this
+   behavior in the face of multiple dropped segments.  With selective
+   acknowledgments, the data receiver can inform the sender about all
+   segments that have arrived successfully, so the sender need
+   retransmit only the segments that have actually been lost.
+
+   Several transport protocols, including NETBLT [Clark87], XTP
+   [Strayer92], RDP [Velten84], NADIR [Huitema81], and VMTP [Cheriton88]
+   have used selective acknowledgment.  There is some empirical evidence
+   in favor of selective acknowledgments -- simple experiments with RDP
+   have shown that disabling the selective acknowledgment facility
+   greatly increases the number of retransmitted segments over a lossy,
+   high-delay Internet path [Partridge87]. A recent simulation study by
+   Kevin Fall and Sally Floyd [Fall95], demonstrates the strength of TCP
+   with SACK over the non-SACK Tahoe and Reno TCP implementations.
+
+   RFC1072 [VJ88] describes one possible implementation of SACK options
+   for TCP.  Unfortunately, it has never been deployed in the Internet,
+   as there was disagreement about how SACK options should be used in
+   conjunction with the TCP window shift option (initially described
+   RFC1072 and revised in [Jacobson92]).
+
+   We propose slight modifications to the SACK options as proposed in
+   RFC1072.  Specifically, sending a selective acknowledgment for the
+   most recently received data reduces the need for long SACK options
+   [Keshav94, Mathis95].  In addition, the SACK option now carries full
+   32 bit sequence numbers.  These two modifications represent the only
+   changes to the proposal in RFC1072.  They make SACK easier to
+   implement and address concerns about robustness.
+
+   The selective acknowledgment extension uses two TCP options. The
+   first is an enabling option, "SACK-permitted", which may be sent in a
+   SYN segment to indicate that the SACK option can be used once the
+   connection is established.  The other is the SACK option itself,
+   which may be sent over an established connection once permission has
+   been given by SACK-permitted.
+
+
+
+Mathis, et. al.             Standards Track                     [Page 2]
+
+RFC 2018         TCP Selective Acknowledgement Options      October 1996
+
+
+   The SACK option is to be included in a segment sent from a TCP that
+   is receiving data to the TCP that is sending that data; we will refer
+   to these TCP's as the data receiver and the data sender,
+   respectively.  We will consider a particular simplex data flow; any
+   data flowing in the reverse direction over the same connection can be
+   treated independently.
+
+2.  Sack-Permitted Option
+
+   This two-byte option may be sent in a SYN by a TCP that has been
+   extended to receive (and presumably process) the SACK option once the
+   connection has opened.  It MUST NOT be sent on non-SYN segments.
+
+       TCP Sack-Permitted Option:
+
+       Kind: 4
+
+       +---------+---------+
+       | Kind=4  | Length=2|
+       +---------+---------+
+
+3.  Sack Option Format
+
+   The SACK option is to be used to convey extended acknowledgment
+   information from the receiver to the sender over an established TCP
+   connection.
+
+       TCP SACK Option:
+
+       Kind: 5
+
+       Length: Variable
+
+                         +--------+--------+
+                         | Kind=5 | Length |
+       +--------+--------+--------+--------+
+       |      Left Edge of 1st Block       |
+       +--------+--------+--------+--------+
+       |      Right Edge of 1st Block      |
+       +--------+--------+--------+--------+
+       |                                   |
+       /            . . .                  /
+       |                                   |
+       +--------+--------+--------+--------+
+       |      Left Edge of nth Block       |
+       +--------+--------+--------+--------+
+       |      Right Edge of nth Block      |
+       +--------+--------+--------+--------+
+
+
+
+Mathis, et. al.             Standards Track                     [Page 3]
+
+RFC 2018         TCP Selective Acknowledgement Options      October 1996
+
+
+   The SACK option is to be sent by a data receiver to inform the data
+   sender of non-contiguous blocks of data that have been received and
+   queued.  The data receiver awaits the receipt of data (perhaps by
+   means of retransmissions) to fill the gaps in sequence space between
+   received blocks.  When missing segments are received, the data
+   receiver acknowledges the data normally by advancing the left window
+   edge in the Acknowledgement Number Field of the TCP header.  The SACK
+   option does not change the meaning of the Acknowledgement Number
+   field.
+
+   This option contains a list of some of the blocks of contiguous
+   sequence space occupied by data that has been received and queued
+   within the window.
+
+   Each contiguous block of data queued at the data receiver is defined
+   in the SACK option by two 32-bit unsigned integers in network byte
+   order:
+
+   *    Left Edge of Block
+
+        This is the first sequence number of this block.
+
+   *    Right Edge of Block
+
+        This is the sequence number immediately following the last
+        sequence number of this block.
+
+   Each block represents received bytes of data that are contiguous and
+   isolated; that is, the bytes just below the block, (Left Edge of
+   Block - 1), and just above the block, (Right Edge of Block), have not
+   been received.
+
+   A SACK option that specifies n blocks will have a length of 8*n+2
+   bytes, so the 40 bytes available for TCP options can specify a
+   maximum of 4 blocks.  It is expected that SACK will often be used in
+   conjunction with the Timestamp option used for RTTM [Jacobson92],
+   which takes an additional 10 bytes (plus two bytes of padding); thus
+   a maximum of 3 SACK blocks will be allowed in this case.
+
+   The SACK option is advisory, in that, while it notifies the data
+   sender that the data receiver has received the indicated segments,
+   the data receiver is permitted to later discard data which have been
+   reported in a SACK option.  A discussion appears below in Section 8
+   of the consequences of advisory SACK, in particular that the data
+   receiver may renege, or drop already SACKed data.
+
+
+
+
+
+
+Mathis, et. al.             Standards Track                     [Page 4]
+
+RFC 2018         TCP Selective Acknowledgement Options      October 1996
+
+
+4.  Generating Sack Options: Data Receiver Behavior
+
+   If the data receiver has received a SACK-Permitted option on the SYN
+   for this connection, the data receiver MAY elect to generate SACK
+   options as described below.  If the data receiver generates SACK
+   options under any circumstance, it SHOULD generate them under all
+   permitted circumstances.  If the data receiver has not received a
+   SACK-Permitted option for a given connection, it MUST NOT send SACK
+   options on that connection.
+
+   If sent at all, SACK options SHOULD be included in all ACKs which do
+   not ACK the highest sequence number in the data receiver's queue.  In
+   this situation the network has lost or mis-ordered data, such that
+   the receiver holds non-contiguous data in its queue.  RFC 1122,
+   Section 4.2.2.21, discusses the reasons for the receiver to send ACKs
+   in response to additional segments received in this state.  The
+   receiver SHOULD send an ACK for every valid segment that arrives
+   containing new data, and each of these "duplicate" ACKs SHOULD bear a
+   SACK option.
+
+   If the data receiver chooses to send a SACK option, the following
+   rules apply:
+
+      * The first SACK block (i.e., the one immediately following the
+      kind and length fields in the option) MUST specify the contiguous
+      block of data containing the segment which triggered this ACK,
+      unless that segment advanced the Acknowledgment Number field in
+      the header.  This assures that the ACK with the SACK option
+      reflects the most recent change in the data receiver's buffer
+      queue.
+
+      * The data receiver SHOULD include as many distinct SACK blocks as
+      possible in the SACK option.  Note that the maximum available
+      option space may not be sufficient to report all blocks present in
+      the receiver's queue.
+
+      * The SACK option SHOULD be filled out by repeating the most
+      recently reported SACK blocks (based on first SACK blocks in
+      previous SACK options) that are not subsets of a SACK block
+      already included in the SACK option being constructed.  This
+      assures that in normal operation, any segment remaining part of a
+      non-contiguous block of data held by the data receiver is reported
+      in at least three successive SACK options, even for large-window
+      TCP implementations [RFC1323]).  After the first SACK block, the
+      following SACK blocks in the SACK option may be listed in
+      arbitrary order.
+
+
+
+
+
+Mathis, et. al.             Standards Track                     [Page 5]
+
+RFC 2018         TCP Selective Acknowledgement Options      October 1996
+
+
+   It is very important that the SACK option always reports the block
+   containing the most recently received segment, because this provides
+   the sender with the most up-to-date information about the state of
+   the network and the data receiver's queue.
+
+5.  Interpreting the Sack Option and Retransmission Strategy: Data
+   Sender Behavior
+
+   When receiving an ACK containing a SACK option, the data sender
+   SHOULD record the selective acknowledgment for future reference.  The
+   data sender is assumed to have a retransmission queue that contains
+   the segments that have been transmitted but not yet acknowledged, in
+   sequence-number order.  If the data sender performs re-packetization
+   before retransmission, the block boundaries in a SACK option that it
+   receives may not fall on boundaries of segments in the retransmission
+   queue; however, this does not pose a serious difficulty for the
+   sender.
+
+   One possible implementation of the sender's behavior is as follows.
+   Let us suppose that for each segment in the retransmission queue
+   there is a (new) flag bit "SACKed", to be used to indicate that this
+   particular segment has been reported in a SACK option.
+
+   When an acknowledgment segment arrives containing a SACK option, the
+   data sender will turn on the SACKed bits for segments that have been
+   selectively acknowledged.  More specifically, for each block in the
+   SACK option, the data sender will turn on the SACKed flags for all
+   segments in the retransmission queue that are wholly contained within
+   that block.  This requires straightforward sequence number
+   comparisons.
+
+   After the SACKed bit is turned on (as the result of processing a
+   received SACK option), the data sender will skip that segment during
+   any later retransmission.  Any segment that has the SACKed bit turned
+   off and is less than the highest SACKed segment is available for
+   retransmission.
+
+   After a retransmit timeout the data sender SHOULD turn off all of the
+   SACKed bits, since the timeout might indicate that the data receiver
+   has reneged.  The data sender MUST retransmit the segment at the left
+   edge of the window after a retransmit timeout, whether or not the
+   SACKed bit is on for that segment.  A segment will not be dequeued
+   and its buffer freed until the left window edge is advanced over it.
+
+
+
+
+
+
+
+
+Mathis, et. al.             Standards Track                     [Page 6]
+
+RFC 2018         TCP Selective Acknowledgement Options      October 1996
+
+
+5.1  Congestion Control Issues
+
+   This document does not attempt to specify in detail the congestion
+   control algorithms for implementations of TCP with SACK.  However,
+   the congestion control algorithms present in the de facto standard
+   TCP implementations MUST be preserved [Stevens94].  In particular, to
+   preserve robustness in the presence of packets reordered by the
+   network, recovery is not triggered by a single ACK reporting out-of-
+   order packets at the receiver.  Further, during recovery the data
+   sender limits the number of segments sent in response to each ACK.
+   Existing implementations limit the data sender to sending one segment
+   during Reno-style fast recovery, or to two segments during slow-start
+   [Jacobson88].  Other aspects of congestion control, such as reducing
+   the congestion window in response to congestion, must similarly be
+   preserved.
+
+   The use of time-outs as a fall-back mechanism for detecting dropped
+   packets is unchanged by the SACK option.  Because the data receiver
+   is allowed to discard SACKed data, when a retransmit timeout occurs
+   the data sender MUST ignore prior SACK information in determining
+   which data to retransmit.
+
+   Future research into congestion control algorithms may take advantage
+   of the additional information provided by SACK.  One such area for
+   future research concerns modifications to TCP for a wireless or
+   satellite environment where packet loss is not necessarily an
+   indication of congestion.
+
+6.  Efficiency and Worst Case Behavior
+
+   If the return path carrying ACKs and SACK options were lossless, one
+   block per SACK option packet would always be sufficient.  Every
+   segment arriving while the data receiver holds discontinuous data
+   would cause the data receiver to send an ACK with a SACK option
+   containing the one altered block in the receiver's queue.  The data
+   sender is thus able to construct a precise replica of the receiver's
+   queue by taking the union of all the first SACK blocks.
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Mathis, et. al.             Standards Track                     [Page 7]
+
+RFC 2018         TCP Selective Acknowledgement Options      October 1996
+
+
+   Since the return path is not lossless, the SACK option is defined to
+   include more than one SACK block in a single packet.  The redundant
+   blocks in the SACK option packet increase the robustness of SACK
+   delivery in the presence of lost ACKs.  For a receiver that is also
+   using the time stamp option [Jacobson92], the SACK option has room to
+   include three SACK blocks.  Thus each SACK block will generally be
+   repeated at least three times, if necessary, once in each of three
+   successive ACK packets.  However, if all of the ACK packets reporting
+   a particular SACK block are dropped, then the sender might assume
+   that the data in that SACK block has not been received, and
+   unnecessarily retransmit those segments.
+
+   The deployment of other TCP options may reduce the number of
+   available SACK blocks to 2 or even to 1.  This will reduce the
+   redundancy of SACK delivery in the presence of lost ACKs.  Even so,
+   the exposure of TCP SACK in regard to the unnecessary retransmission
+   of packets is strictly less than the exposure of current
+   implementations of TCP.  The worst-case conditions necessary for the
+   sender to needlessly retransmit data is discussed in more detail in a
+   separate document [Floyd96].
+
+   Older TCP implementations which do not have the SACK option will not
+   be unfairly disadvantaged when competing against SACK-capable TCPs.
+   This issue is discussed in more detail in [Floyd96].
+
+7.  Sack Option Examples
+
+   The following examples attempt to demonstrate the proper behavior of
+   SACK generation by the data receiver.
+
+   Assume the left window edge is 5000 and that the data transmitter
+   sends a burst of 8 segments, each containing 500 data bytes.
+
+      Case 1: The first 4 segments are received but the last 4 are
+      dropped.
+
+      The data receiver will return a normal TCP ACK segment
+      acknowledging sequence number 7000, with no SACK option.
+
+
+
+
+
+
+
+
+
+
+
+
+
+Mathis, et. al.             Standards Track                     [Page 8]
+
+RFC 2018         TCP Selective Acknowledgement Options      October 1996
+
+
+      Case 2:  The first segment is dropped but the remaining 7 are
+      received.
+
+         Upon receiving each of the last seven packets, the data
+         receiver will return a TCP ACK segment that acknowledges
+         sequence number 5000 and contains a SACK option specifying
+         one block of queued data:
+
+             Triggering    ACK      Left Edge   Right Edge
+             Segment
+
+             5000         (lost)
+             5500         5000     5500       6000
+             6000         5000     5500       6500
+             6500         5000     5500       7000
+             7000         5000     5500       7500
+             7500         5000     5500       8000
+             8000         5000     5500       8500
+             8500         5000     5500       9000
+
+
+      Case 3:  The 2nd, 4th, 6th, and 8th (last) segments are
+      dropped.
+
+      The data receiver ACKs the first packet normally.  The
+      third, fifth, and seventh packets trigger SACK options as
+      follows:
+
+          Triggering  ACK    First Block   2nd Block     3rd Block
+          Segment            Left   Right  Left   Right  Left   Right
+                             Edge   Edge   Edge   Edge   Edge   Edge
+
+          5000       5500
+          5500       (lost)
+          6000       5500    6000   6500
+          6500       (lost)
+          7000       5500    7000   7500   6000   6500
+          7500       (lost)
+          8000       5500    8000   8500   7000   7500   6000   6500
+          8500       (lost)
+
+
+
+
+
+
+
+
+
+
+
+Mathis, et. al.             Standards Track                     [Page 9]
+
+RFC 2018         TCP Selective Acknowledgement Options      October 1996
+
+
+      Suppose at this point, the 4th packet is received out of order.
+      (This could either be because the data was badly misordered in the
+      network, or because the 2nd packet was retransmitted and lost, and
+      then the 4th packet was retransmitted). At this point the data
+      receiver has only two SACK blocks to report.  The data receiver
+      replies with the following Selective Acknowledgment:
+
+          Triggering  ACK    First Block   2nd Block     3rd Block
+          Segment            Left   Right  Left   Right  Left   Right
+                             Edge   Edge   Edge   Edge   Edge   Edge
+
+          6500       5500    6000   7500   8000   8500
+
+      Suppose at this point, the 2nd segment is received.  The data
+      receiver then replies with the following Selective Acknowledgment:
+
+          Triggering  ACK    First Block   2nd Block     3rd Block
+          Segment            Left   Right  Left   Right  Left   Right
+                             Edge   Edge   Edge   Edge   Edge   Edge
+
+          5500       7500    8000   8500
+
+8.  Data Receiver Reneging
+
+   Note that the data receiver is permitted to discard data in its queue
+   that has not been acknowledged to the data sender, even if the data
+   has already been reported in a SACK option.  Such discarding of
+   SACKed packets is discouraged, but may be used if the receiver runs
+   out of buffer space.
+
+   The data receiver MAY elect not to keep data which it has reported in
+   a SACK option.  In this case, the receiver SACK generation is
+   additionally qualified:
+
+      * The first SACK block MUST reflect the newest segment.  Even if
+      the newest segment is going to be discarded and the receiver has
+      already discarded adjacent segments, the first SACK block MUST
+      report, at a minimum, the left and right edges of the newest
+      segment.
+
+      * Except for the newest segment, all SACK blocks MUST NOT report
+      any old data which is no longer actually held by the receiver.
+
+   Since the data receiver may later discard data reported in a SACK
+   option, the sender MUST NOT discard data before it is acknowledged by
+   the Acknowledgment Number field in the TCP header.
+
+
+
+
+
+Mathis, et. al.             Standards Track                    [Page 10]
+
+RFC 2018         TCP Selective Acknowledgement Options      October 1996
+
+
+9.  Security Considerations
+
+   This document neither strengthens nor weakens TCP's current security
+   properties.
+
+10. References
+
+   [Cheriton88]  Cheriton, D., "VMTP: Versatile Message Transaction
+   Protocol", RFC 1045, Stanford University, February 1988.
+
+   [Clark87] Clark, D., Lambert, M., and L. Zhang, "NETBLT: A Bulk Data
+   Transfer Protocol", RFC 998, MIT, March 1987.
+
+   [Fall95]  Fall, K. and Floyd, S., "Comparisons of Tahoe, Reno, and
+   Sack TCP", ftp://ftp.ee.lbl.gov/papers/sacks.ps.Z, December 1995.
+
+   [Floyd96]  Floyd, S.,  "Issues of TCP with SACK",
+   ftp://ftp.ee.lbl.gov/papers/issues_sa.ps.Z, January 1996.
+
+   [Huitema81] Huitema, C., and Valet, I., An Experiment on High Speed
+   File Transfer using Satellite Links, 7th Data Communication
+   Symposium, Mexico, October 1981.
+
+   [Jacobson88] Jacobson, V., "Congestion Avoidance and Control",
+   Proceedings of SIGCOMM '88, Stanford, CA., August 1988.
+
+   [Jacobson88}, Jacobson, V. and R. Braden, "TCP Extensions for Long-
+   Delay Paths", RFC 1072, October 1988.
+
+   [Jacobson92] Jacobson, V., Braden, R., and D. Borman, "TCP Extensions
+   for High Performance", RFC 1323, May 1992.
+
+   [Keshav94]  Keshav, presentation to the Internet End-to-End Research
+   Group, November 1994.
+
+   [Mathis95]  Mathis, M., and Mahdavi, J., TCP Forward Acknowledgment
+   Option, presentation to the Internet End-to-End Research Group, June
+   1995.
+
+   [Partridge87]  Partridge, C., "Private Communication", February 1987.
+
+   [Postel81]  Postel, J., "Transmission Control Protocol - DARPA
+   Internet Program Protocol Specification", RFC 793, DARPA, September
+   1981.
+
+   [Stevens94] Stevens, W., TCP/IP Illustrated, Volume 1: The Protocols,
+   Addison-Wesley, 1994.
+
+
+
+
+Mathis, et. al.             Standards Track                    [Page 11]
+
+RFC 2018         TCP Selective Acknowledgement Options      October 1996
+
+
+   [Strayer92] Strayer, T., Dempsey, B., and Weaver, A., XTP -- the
+   xpress transfer protocol. Addison-Wesley Publishing Company, 1992.
+
+   [Velten84] Velten, D., Hinden, R., and J. Sax, "Reliable Data
+   Protocol", RFC 908, BBN, July 1984.
+
+11. Authors' Addresses
+
+    Matt Mathis and Jamshid Mahdavi
+    Pittsburgh Supercomputing Center
+    4400 Fifth Ave
+    Pittsburgh, PA 15213
+    mathis@psc.edu
+    mahdavi@psc.edu
+
+    Sally Floyd
+    Lawrence Berkeley National Laboratory
+    One Cyclotron Road
+    Berkeley, CA 94720
+    floyd@ee.lbl.gov
+
+    Allyn Romanow
+    Sun Microsystems, Inc.
+    2550 Garcia Ave., MPK17-202
+    Mountain View, CA 94043
+    allyn@eng.sun.com
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Mathis, et. al.             Standards Track                    [Page 12]
+
--- a/kernel/picotcp/RFC/rfc2026.txt
+++ b/kernel/picotcp/RFC/rfc2026.txt
--- a/kernel/picotcp/RFC/rfc2131.txt
+++ b/kernel/picotcp/RFC/rfc2131.txt
--- a/kernel/picotcp/RFC/rfc2132.txt
+++ b/kernel/picotcp/RFC/rfc2132.txt
--- a/kernel/picotcp/RFC/rfc2140.txt
+++ b/kernel/picotcp/RFC/rfc2140.txt
@ -0,0 +1,619 @@
+
+
+
+
+
+
+Network Working Group                                           J. Touch
+Request for Comments: 2140                                           ISI
+Category: Informational                                       April 1997
+
+
+                   TCP Control Block Interdependence
+
+Status of this Memo
+
+   This memo provides information for the Internet community.  This memo
+   does not specify an Internet standard of any kind.  Distribution of
+   this memo is unlimited.
+
+
+Abstract
+
+   This memo makes the case for interdependent TCP control blocks, where
+   part of the TCP state is shared among similar concurrent connections,
+   or across similar connection instances. TCP state includes a
+   combination of parameters, such as connection state, current round-
+   trip time estimates, congestion control information, and process
+   information.  This state is currently maintained on a per-connection
+   basis in the TCP control block, but should be shared across
+   connections to the same host. The goal is to improve transient
+   transport performance, while maintaining backward-compatibility with
+   existing implementations.
+
+   This document is a product of the LSAM project at ISI.
+
+
+Introduction
+
+   TCP is a connection-oriented reliable transport protocol layered over
+   IP [9]. Each TCP connection maintains state, usually in a data
+   structure called the TCP Control Block (TCB). The TCB contains
+   information about the connection state, its associated local process,
+   and feedback parameters about the connection's transmission
+   properties. As originally specified and usually implemented, the TCB
+   is maintained on a per-connection basis. This document discusses the
+   implications of that decision, and argues for an alternate
+   implementation that shares some of this state across similar
+   connection instances and among similar simultaneous connections. The
+   resulting implementation can have better transient performance,
+   especially for numerous short-lived and simultaneous connections, as
+   often used in the World-Wide Web [1]. These changes affect only the
+   TCB initialization, and so have no effect on the long-term behavior
+   of TCP after a connection has been established.
+
+
+
+
+Touch                        Informational                      [Page 1]
+
+RFC 2140           TCP Control Block Interdependence          April 1997
+
+
+The TCP Control Block (TCB)
+
+   A TCB is associated with each connection, i.e., with each association
+   of a pair of applications across the network. The TCB can be
+   summarized as containing [9]:
+
+
+        Local process state
+
+            pointers to send and receive buffers
+            pointers to retransmission queue and current segment
+            pointers to Internet Protocol (IP) PCB
+
+        Per-connection shared state
+
+            macro-state
+
+                connection state
+                timers
+                flags
+                local and remote host numbers and ports
+
+            micro-state
+
+                send and receive window state (size*, current number)
+                round-trip time and variance
+                cong. window size*
+                cong. window size threshold*
+                max windows seen*
+                MSS#
+                round-trip time and variance#
+
+
+   The per-connection information is shown as split into macro-state and
+   micro-state, terminology borrowed from [5]. Macro-state describes the
+   finite state machine; we include the endpoint numbers and components
+   (timers, flags) used to help maintain that state. This includes the
+   protocol for establishing and maintaining shared state about the
+   connection. Micro-state describes the protocol after a connection has
+   been established, to maintain the reliability and congestion control
+   of the data transferred in the connection.
+
+   We further distinguish two other classes of shared micro-state that
+   are associated more with host-pairs than with application pairs. One
+   class is clearly host-pair dependent (#, e.g., MSS, RTT), and the
+   other is host-pair dependent in its aggregate (*, e.g., cong. window
+   info., curr. window sizes).
+
+
+
+
+Touch                        Informational                      [Page 2]
+
+RFC 2140           TCP Control Block Interdependence          April 1997
+
+
+TCB Interdependence
+
+   The observation that some TCB state is host-pair specific rather than
+   application-pair dependent is not new, and is a common engineering
+   decision in layered protocol implementations. A discussion of sharing
+   RTT information among protocols layered over IP, including UDP and
+   TCP, occurred in [8]. T/TCP uses caches to maintain TCB information
+   across instances, e.g., smoothed RTT, RTT variance, congestion
+   avoidance threshold, and MSS [3].  These values are in addition to
+   connection counts used by T/TCP to accelerate data delivery prior to
+   the full three-way handshake during an OPEN. The goal is to aggregate
+   TCB components where they reflect one association - that of the
+   host-pair, rather than artificially separating those components by
+   connection.
+
+   At least one current T/TCP implementation saves the MSS and
+   aggregates the RTT parameters across multiple connections, but omits
+   caching the congestion window information [4], as originally
+   specified in [2]. There may be other values that may be cached, such
+   as current window size, to permit new connections full access to
+   accumulated channel resources.
+
+   We observe that there are two cases of TCB interdependence. Temporal
+   sharing occurs when the TCB of an earlier (now CLOSED) connection to
+   a host is used to initialize some parameters of a new connection to
+   that same host. Ensemble sharing occurs when a currently active
+   connection to a host is used to initialize another (concurrent)
+   connection to that host. T/TCP documents considered the temporal
+   case; we consider both.
+
+An Example of Temporal Sharing
+
+   Temporal sharing of cached TCB data has been implemented in the SunOS
+   4.1.3 T/TCP extensions [4] and the FreeBSD port of same [7]. As
+   mentioned before, only the MSS and RTT parameters are cached, as
+   originally specified in [2]. Later discussion of T/TCP suggested
+   including congestion control parameters in this cache [3].
+
+   The cache is accessed in two ways: it is read to initialize new TCBs,
+   and written when more current per-host state is available. New TCBs
+   are initialized as follows; snd_cwnd reuse is not yet implemented,
+   although discussed in the T/TCP concepts [2]:
+
+
+
+
+
+
+
+
+
+Touch                        Informational                      [Page 3]
+
+RFC 2140           TCP Control Block Interdependence          April 1997
+
+
+               TEMPORAL SHARING - TCB Initialization
+
+             Cached TCB           New TCB
+             ----------------------------------------
+             old-MSS              old-MSS
+
+             old-RTT              old-RTT
+
+             old-RTTvar           old-RTTvar
+
+             old-snd_cwnd         old-snd_cwnd    (not yet impl.)
+
+
+   Most cached TCB values are updated when a connection closes.  An
+   exception is MSS, which is updated whenever the MSS option is
+   received in a TCP header.
+
+
+                 TEMPORAL SHARING - Cache Updates
+
+    Cached TCB   Current TCB     when?   New Cached TCB
+    ---------------------------------------------------------------
+    old-MSS      curr-MSS        MSSopt  curr-MSS
+
+    old-RTT      curr-RTT        CLOSE   old += (curr - old) >> 2
+
+    old-RTTvar   curr-RTTvar     CLOSE   old += (curr - old) >> 2
+
+    old-snd_cwnd curr-snd_cwnd   CLOSE   curr-snd_cwnd   (not yet impl.)
+
+   MSS caching is trivial; reported values are cached, and the most
+   recent value is used. The cache is updated when the MSS option is
+   received, so the cache always has the most recent MSS value from any
+   connection. The cache is consulted only at connection establishment,
+   and not otherwise updated, which means that MSS options do not affect
+   current connections. The default MSS is never saved; only reported
+   MSS values update the cache, so an explicit override is required to
+   reduce the MSS.
+
+   RTT values are updated by a more complicated mechanism [3], [8].
+   Dynamic RTT estimation requires a sequence of RTT measurements, even
+   though a single T/TCP transaction may not accumulate enough samples.
+   As a result, the cached RTT (and its variance) is an average of its
+   previous value with the contents of the currently active TCB for that
+   host, when a TCB is closed. RTT values are updated only when a
+   connection is closed. Further, the method for averaging the RTT
+   values is not the same as the method for computing the RTT values
+   within a connection, so that the cached value may not be appropriate.
+
+
+
+Touch                        Informational                      [Page 4]
+
+RFC 2140           TCP Control Block Interdependence          April 1997
+
+
+   For temporal sharing, the cache requires updating only when a
+   connection closes, because the cached values will not yet be used to
+   initialize a new TCB. For the ensemble sharing, this is not the case,
+   as discussed below.
+
+   Other TCB variables may also be cached between sequential instances,
+   such as the congestion control window information. Old cache values
+   can be overwritten with the current TCB estimates, or a MAX or MIN
+   function can be used to merge the results, depending on the optimism
+   or pessimism of the reused values. For example, the congestion window
+   can be reused if there are no concurrent connections.
+
+An Example of Ensemble Sharing
+
+   Sharing cached TCB data across concurrent connections requires
+   attention to the aggregate nature of some of the shared state.
+   Although MSS and RTT values can be shared by copying, it may not be
+   appropriate to copy congestion window information. At this point, we
+   present only the MSS and RTT rules:
+
+
+               ENSEMBLE SHARING - TCB Initialization
+
+               Cached TCB           New TCB
+               ----------------------------------
+               old-MSS              old-MSS
+
+               old-RTT              old-RTT
+
+               old-RTTvar           old-RTTvar
+
+
+
+                    ENSEMBLE SHARING - Cache Updates
+
+      Cached TCB   Current TCB     when?   New Cached TCB
+      -----------------------------------------------------------
+      old-MSS      curr-MSS        MSSopt  curr-MSS
+
+      old-RTT      curr-RTT        update  rtt_update(old,curr)
+
+      old-RTTvar   curr-RTTvar     update  rtt_update(old,curr)
+
+
+   For ensemble sharing, TCB information should be cached as early as
+   possible, sometimes before a connection is closed. Otherwise, opening
+   multiple concurrent connections may not result in TCB data sharing if
+   no connection closes before others open. An optimistic solution would
+
+
+
+Touch                        Informational                      [Page 5]
+
+RFC 2140           TCP Control Block Interdependence          April 1997
+
+
+   be to update cached data as early as possible, rather than only when
+   a connection is closing. Some T/TCP implementations do this for MSS
+   when the TCP MSS header option is received [4], although it is not
+   addressed specifically in the concepts or functional specification
+   [2][3].
+
+   In current T/TCP, RTT values are updated only after a CLOSE, which
+   does not benefit concurrent sessions. As mentioned in the temporal
+   case, averaging values between concurrent connections requires
+   incorporating new RTT measurements. The amount of work involved in
+   updating the aggregate average should be minimized, but the resulting
+   value should be equivalent to having all values measured within a
+   single connection. The function "rtt_update" in the ensemble sharing
+   table indicates this operation, which occurs whenever the RTT would
+   have been updated in the individual TCP connection. As a result, the
+   cache contains the shared RTT variables, which no longer need to
+   reside in the TCB [8].
+
+   Congestion window size aggregation is more complicated in the
+   concurrent case.  When there is an ensemble of connections, we need
+   to decide how that ensemble would have shared the congestion window,
+   in order to derive initial values for new TCBs. Because concurrent
+   connections between two hosts share network paths (usually), they
+   also share whatever capacity exists along that path.  With regard to
+   congestion, the set of connections might behave as if it were
+   multiplexed prior to TCP, as if all data were part of a single
+   connection. As a result, the current window sizes would maintain a
+   constant sum, presuming sufficient offered load. This would go beyond
+   caching to truly sharing state, as in the RTT case.
+
+   We pause to note that any assumption of this sharing can be
+   incorrect, including this one. In current implementations, new
+   congestion windows are set at an initial value of one segment, so
+   that the sum of the current windows is increased for any new
+   connection. This can have detrimental consequences where several
+   connections share a highly congested link, such as in trans-Atlantic
+   Web access.
+
+   There are several ways to initialize the congestion window in a new
+   TCB among an ensemble of current connections to a host, as shown
+   below. Current TCP implementations initialize it to one segment [9],
+   and T/TCP hinted that it should be initialized to the old window size
+   [3]. In the former, the assumption is that new connections should
+   behave as conservatively as possible. In the latter, no accommodation
+   is made to concurrent aggregate behavior.
+
+   In either case, the sum of window sizes can increase, rather than
+   remain constant. Another solution is to give each pending connection
+
+
+
+Touch                        Informational                      [Page 6]
+
+RFC 2140           TCP Control Block Interdependence          April 1997
+
+
+   its "fair share" of the available congestion window, and let the
+   connections balance from there. The assumption we make here is that
+   new connections are implicit requests for an equal share of available
+   link bandwidth which should be granted at the expense of current
+   connections. This may or may not be the appropriate function; we
+   propose that it be examined further.
+
+
+                ENSEMBLE SHARING - TCB Initialization
+                Some Options for Sharing Window-size
+
+    Cached TCB                           New TCB
+    -----------------------------------------------------------------
+    old-snd_cwnd         (current)       one segment
+
+                         (T/TCP hint)    old-snd_cwnd
+
+                         (proposed)      old-snd_cwnd/(N+1)
+                                         subtract old-snd_cwnd/(N+1)/N
+                                         from each concurrent
+
+
+                 ENSEMBLE SHARING - Cache Updates
+
+    Cached TCB   Current TCB     when?   New Cached TCB
+    ----------------------------------------------------------------
+    old-snd_cwnd curr-snd_cwnd   update  (adjust sum as appropriate)
+
+
+Compatibility Issues
+
+   Current TCP implementations do not use TCB caching, with the
+   exception of T/TCP variants [4][7]. New connections use the default
+   initial values of all non-instantiated TCB variables. As a result,
+   each connection calculates its own RTT measurements, MSS value, and
+   congestion information. Eventually these values are updated for each
+   connection.
+
+   For the congestion and current window information, the initial values
+   may not be consistent with the long-term aggregate behavior of a set
+   of concurrent connections. If a single connection has a window of 4
+   segments, new connections assume initial windows of 1 segment (the
+   minimum), although the current connection's window doesn't decrease
+   to accommodate this additional load. As a result, connections can
+   mutually interfere. One example of this has been seen on trans-
+   Atlantic links, where concurrent connections supporting Web traffic
+   can collide because their initial windows are too large, even when
+   set at one segment.
+
+
+
+Touch                        Informational                      [Page 7]
+
+RFC 2140           TCP Control Block Interdependence          April 1997
+
+
+   Because this proposal attempts to anticipate the aggregate steady-
+   state values of TCB state among a group or over time, it should avoid
+   the transient effects of new connections. In addition, because it
+   considers the ensemble and temporal properties of those aggregates,
+   it should also prevent the transients of short-lived or multiple
+   concurrent connections from adversely affecting the overall network
+   performance. We are performing analysis and experiments to validate
+   these assumptions.
+
+Performance Considerations
+
+   Here we attempt to optimize transient behavior of TCP without
+   modifying its long-term properties. The predominant expense is in
+   maintaining the cached values, or in using per-host state rather than
+   per-connection state. In cases where performance is affected,
+   however, we note that the per-host information can be kept in per-
+   connection copies (as done now), because with higher performance
+   should come less interference between concurrent connections.
+
+   Sharing TCB state can occur only at connection establishment and
+   close (to update the cache), to minimize overhead, optimize transient
+   behavior, and minimize the effect on the steady-state. It is possible
+   that sharing state during a connection, as in the RTT or window-size
+   variables, may be of benefit, provided its implementation cost is not
+   high.
+
+Implications
+
+   There are several implications to incorporating TCB interdependence
+   in TCP implementations. First, it may prevent the need for
+   application-layer multiplexing for performance enhancement [6].
+   Protocols like persistent-HTTP avoid connection reestablishment costs
+   by serializing or multiplexing a set of per-host connections across a
+   single TCP connection. This avoids TCP's per-connection OPEN
+   handshake, and also avoids recomputing MSS, RTT, and congestion
+   windows. By avoiding the so-called, "slow-start restart," performance
+   can be optimized. Our proposal provides the MSS, RTT, and OPEN
+   handshake avoidance of T/TCP, and the "slow-start restart avoidance"
+   of multiplexing, without requiring a multiplexing mechanism at the
+   application layer. This multiplexing will be complicated when
+   quality-of-service mechanisms (e.g., "integrated services
+   scheduling") are provided later.
+
+   Second, we are attempting to push some of the TCP implementation from
+   the traditional transport layer (in the ISO model [10]), to the
+   network layer. This acknowledges that some state currently maintained
+   as per-connection is in fact per-path, which we simplify as per-
+   host-pair. Transport protocols typically manage per-application-pair
+
+
+
+Touch                        Informational                      [Page 8]
+
+RFC 2140           TCP Control Block Interdependence          April 1997
+
+
+   associations (per stream), and network protocols manage per-path
+   associations (routing). Round-trip time, MSS, and congestion
+   information is more appropriately handled in a network-layer fashion,
+   aggregated among concurrent connections, and shared across connection
+   instances.
+
+   An earlier version of RTT sharing suggested implementing RTT state at
+   the IP layer, rather than at the TCP layer [8]. Our observations are
+   for sharing state among TCP connections, which avoids some of the
+   difficulties in an IP-layer solution. One such problem is determining
+   the associated prior outgoing packet for an incoming packet, to infer
+   RTT from the exchange. Because RTTs are still determined inside the
+   TCP layer, this is simpler than at the IP layer. This is a case where
+   information should be computed at the transport layer, but shared at
+   the network layer.
+
+   We also note that per-host-pair associations are not the limit of
+   these techniques. It is possible that TCBs could be similarly shared
+   between hosts on a LAN, because the predominant path can be LAN-LAN,
+   rather than host-host.
+
+   There may be other information that can be shared between concurrent
+   connections. For example, knowing that another connection has just
+   tried to expand its window size and failed, a connection may not
+   attempt to do the same for some period. The idea is that existing TCP
+   implementations infer the behavior of all competing connections,
+   including those within the same host or LAN. One possible
+   optimization is to make that implicit feedback explicit, via extended
+   information in the per-host TCP area.
+
+Security Considerations
+
+   These suggested implementation enhancements do not have additional
+   ramifications for direct attacks. These enhancements may be
+   susceptible to denial-of-service attacks if not otherwise secured.
+   For example, an application can open a connection and set its window
+   size to 0, denying service to any other subsequent connection between
+   those hosts.
+
+   TCB sharing may be susceptible to denial-of-service attacks, wherever
+   the TCB is shared, between connections in a single host, or between
+   hosts if TCB sharing is implemented on the LAN (see Implications
+   section).  Some shared TCB parameters are used only to create new
+   TCBs, others are shared among the TCBs of ongoing connections. New
+   connections can join the ongoing set, e.g., to optimize send window
+   size among a set of connections to the same host.
+
+
+
+
+
+Touch                        Informational                      [Page 9]
+
+RFC 2140           TCP Control Block Interdependence          April 1997
+
+
+   Attacks on parameters used only for initialization affect only the
+   transient performance of a TCP connection.  For short connections,
+   the performance ramification can approach that of a denial-of-service
+   attack.  E.g., if an application changes its TCB to have a false and
+   small window size, subsequent connections would experience
+   performance degradation until their window grew appropriately.
+
+   The solution is to limit the effect of compromised TCB values.  TCBs
+   are compromised when they are modified directly by an application or
+   transmitted between hosts via unauthenticated means (e.g., by using a
+   dirty flag). TCBs that are not compromised by application
+   modification do not have any unique security ramifications. Note that
+   the proposed parameters for TCB sharing are not currently modifiable
+   by an application.
+
+   All shared TCBs MUST be validated against default minimum parameters
+   before used for new connections. This validation would not impact
+   performance, because it occurs only at TCB initialization.  This
+   limits the effect of attacks on new connections, to reducing the
+   benefit of TCB sharing, resulting in the current default TCP
+   performance. For ongoing connections, the effect of incoming packets
+   on shared information should be both limited and validated against
+   constraints before use. This is a beneficial precaution for existing
+   TCP implementations as well.
+
+   TCBs modified by an application SHOULD not be shared, unless the new
+   connection sharing the compromised information has been given
+   explicit permission to use such information by the connection API. No
+   mechanism for that indication currently exists, but it could be
+   supported by an augmented API. This sharing restriction SHOULD be
+   implemented in both the host and the LAN. Sharing on a LAN SHOULD
+   utilize authentication to prevent undetected tampering of shared TCB
+   parameters. These restrictions limit the security impact of modified
+   TCBs both for connection initialization and for ongoing connections.
+
+   Finally, shared values MUST be limited to performance factors only.
+   Other information, such as TCP sequence numbers, when shared, are
+   already known to compromise security.
+
+Acknowledgements
+
+   The author would like to thank the members of the High-Performance
+   Computing and Communications Division at ISI, notably Bill Manning,
+   Bob Braden, Jon Postel, Ted Faber, and Cliff Neuman for their
+   assistance in the development of this memo.
+
+
+
+
+
+
+Touch                        Informational                     [Page 10]
+
+RFC 2140           TCP Control Block Interdependence          April 1997
+
+
+References
+
+   [1] Berners-Lee, T., et al., "The World-Wide Web," Communications of
+       the ACM, V37, Aug. 1994, pp. 76-82.
+
+   [2] Braden, R., "Transaction TCP -- Concepts," RFC-1379,
+       USC/Information Sciences Institute, September 1992.
+
+   [3] Braden, R., "T/TCP -- TCP Extensions for Transactions Functional
+       Specification," RFC-1644, USC/Information Sciences Institute,
+       July 1994.
+
+   [4] Braden, B., "T/TCP -- Transaction TCP: Source Changes for Sun OS
+       4.1.3,", Release 1.0, USC/ISI, September 14, 1994.
+
+   [5] Comer, D., and Stevens, D., Internetworking with TCP/IP, V2,
+       Prentice-Hall, NJ, 1991.
+
+   [6] Fielding, R., et al., "Hypertext Transfer Protocol -- HTTP/1.1,"
+       Work in Progress.
+
+   [7] FreeBSD source code, Release 2.10, <http://www.freebsd.org/>.
+
+   [8] Jacobson, V., (mail to public list "tcp-ip", no archive found),
+       1986.
+
+   [9] Postel, Jon, "Transmission Control Protocol," Network Working
+       Group RFC-793/STD-7, ISI, Sept. 1981.
+
+   [10] Tannenbaum, A., Computer Networks, Prentice-Hall, NJ, 1988.
+
+Author's Address
+
+   Joe Touch
+   University of Southern California/Information Sciences Institute
+   4676 Admiralty Way
+   Marina del Rey, CA 90292-6695
+   USA
+   Phone: +1 310-822-1511 x151
+   Fax:   +1 310-823-6714
+   URL:   http://www.isi.edu/~touch
+   Email: touch@isi.edu
+
+
+
+
+
+
+
+
+
+Touch                        Informational                     [Page 11]
+
--- a/kernel/picotcp/RFC/rfc2347.txt
+++ b/kernel/picotcp/RFC/rfc2347.txt
@ -0,0 +1,395 @@
+
+
+
+
+
+
+Network Working Group                                          G. Malkin
+Request for Commments: 2347                                 Bay Networks
+Updates: 1350                                                  A. Harkin
+Obsoletes: 1782                                      Hewlett Packard Co.
+Category: Standards Track                                       May 1998
+
+
+                         TFTP Option Extension
+
+Status of this Memo
+
+   This document specifies an Internet standards track protocol for the
+   Internet community, and requests discussion and suggestions for
+   improvements.  Please refer to the current edition of the "Internet
+   Official Protocol Standards" (STD 1) for the standardization state
+   and status of this protocol.  Distribution of this memo is unlimited.
+
+Copyright Notice
+
+   Copyright (C) The Internet Society (1998).  All Rights Reserved.
+
+Abstract
+
+   The Trivial File Transfer Protocol [1] is a simple, lock-step, file
+   transfer protocol which allows a client to get or put a file onto a
+   remote host.  This document describes a simple extension to TFTP to
+   allow option negotiation prior to the file transfer.
+
+Introduction
+
+   The option negotiation mechanism proposed in this document is a
+   backward-compatible extension to the TFTP protocol.  It allows file
+   transfer options to be negotiated prior to the transfer using a
+   mechanism which is consistent with TFTP's Request Packet format.  The
+   mechanism is kept simple by enforcing a request-respond-acknowledge
+   sequence, similar to the lock-step approach taken by TFTP itself.
+
+   While the option negotiation mechanism is general purpose, in that
+   many types of options may be negotiated, it was created to support
+   the Blocksize option defined in [2].  Additional options are defined
+   in [3].
+
+Packet Formats
+
+   TFTP options are appended to the Read Request and Write Request
+   packets.  A new type of TFTP packet, the Option Acknowledgment
+   (OACK), is used to acknowledge a client's option negotiation request.
+   A new error code, 8, is hereby defined to indicate that a transfer
+
+
+
+Malkin & Harkin             Standards Track                     [Page 1]
+
+RFC 2347                 TFTP Option Extension                  May 1998
+
+
+   should be terminated due to option negotiation.
+
+   Options are appended to a TFTP Read Request or Write Request packet
+   as follows:
+
+      +-------+---~~---+---+---~~---+---+---~~---+---+---~~---+---+-->
+      |  opc  |filename| 0 |  mode  | 0 |  opt1  | 0 | value1 | 0 | <
+      +-------+---~~---+---+---~~---+---+---~~---+---+---~~---+---+-->
+
+       >-------+---+---~~---+---+
+      <  optN  | 0 | valueN | 0 |
+       >-------+---+---~~---+---+
+
+      opc
+         The opcode field contains either a 1, for Read Requests, or 2,
+         for Write Requests, as defined in [1].
+
+      filename
+         The name of the file to be read or written, as defined in [1].
+         This is a NULL-terminated field.
+
+      mode
+         The mode of the file transfer: "netascii", "octet", or "mail",
+         as defined in [1].  This is a NULL-terminated field.
+
+      opt1
+         The first option, in case-insensitive ASCII (e.g., blksize).
+         This is a NULL-terminated field.
+
+      value1
+         The value associated with the first option, in case-
+         insensitive ASCII.  This is a NULL-terminated field.
+
+      optN, valueN
+         The final option/value pair.  Each NULL-terminated field is
+         specified in case-insensitive ASCII.
+
+   The options and values are all NULL-terminated, in keeping with the
+   original request format.  If multiple options are to be negotiated,
+   they are appended to each other.  The order in which options are
+   specified is not significant.  The maximum size of a request packet
+   is 512 octets.
+
+   The OACK packet has the following format:
+
+
+
+
+
+
+
+Malkin & Harkin             Standards Track                     [Page 2]
+
+RFC 2347                 TFTP Option Extension                  May 1998
+
+
+      +-------+---~~---+---+---~~---+---+---~~---+---+---~~---+---+
+      |  opc  |  opt1  | 0 | value1 | 0 |  optN  | 0 | valueN | 0 |
+      +-------+---~~---+---+---~~---+---+---~~---+---+---~~---+---+
+
+      opc
+         The opcode field contains a 6, for Option Acknowledgment.
+
+      opt1
+         The first option acknowledgment, copied from the original
+         request.
+
+      value1
+         The acknowledged value associated with the first option.  If
+         and how this value may differ from the original request is
+         detailed in the specification for the option.
+
+      optN, valueN
+         The final option/value acknowledgment pair.
+
+Negotiation Protocol
+
+   The client appends options at the end of the Read Request or Write
+   request packet, as shown above.  Any number of options may be
+   specified; however, an option may only be specified once.  The order
+   of the options is not significant.
+
+   If the server supports option negotiation, and it recognizes one or
+   more of the options specified in the request packet, the server may
+   respond with an Options Acknowledgment (OACK).  Each option the
+   server recognizes, and accepts the value for, is included in the
+   OACK.  Some options may allow alternate values to be proposed, but
+   this is an option specific feature.  The server must not include in
+   the OACK any option which had not been specifically requested by the
+   client; that is, only the client may initiate option negotiation.
+   Options which the server does not support should be omitted from the
+   OACK; they should not cause an ERROR packet to be generated.  If the
+   value of a supported option is invalid, the specification for that
+   option will indicate whether the server should simply omit the option
+   from the OACK, respond with an alternate value, or send an ERROR
+   packet, with error code 8, to terminate the transfer.
+
+   An option not acknowledged by the server must be ignored by the
+   client and server as if it were never requested.  If multiple options
+   were requested, the client must use those options which were
+   acknowledged by the server and must not use those options which were
+   not acknowledged by the server.
+
+
+
+
+
+Malkin & Harkin             Standards Track                     [Page 3]
+
+RFC 2347                 TFTP Option Extension                  May 1998
+
+
+   When the client appends options to the end of a Read Request packet,
+   three possible responses may be returned by the server:
+
+      OACK  - acknowledge of Read Request and the options;
+
+      DATA  - acknowledge of Read Request, but not the options;
+
+      ERROR - the request has been denied.
+
+   When the client appends options to the end of a Write Request packet,
+   three possible responses may be returned by the server:
+
+      OACK  - acknowledge of Write Request and the options;
+
+      ACK   - acknowledge of Write Request, but not the options;
+
+      ERROR - the request has been denied.
+
+   If a server implementation does not support option negotiation, it
+   will likely ignore any options appended to the client's request.  In
+   this case, the server will return a DATA packet for a Read Request
+   and an ACK packet for a Write Request establishing normal TFTP data
+   transfer.  In the event that a server returns an error for a request
+   which carries an option, the client may attempt to repeat the request
+   without appending any options.  This implementation option would
+   handle servers which consider extraneous data in the request packet
+   to be erroneous.
+
+   Depending on the original transfer request there are two ways for a
+   client to confirm acceptance of a server's OACK.  If the transfer was
+   initiated with a Read Request, then an ACK (with the data block
+   number set to 0) is sent by the client to confirm the values in the
+   server's OACK packet.  If the transfer was initiated with a Write
+   Request, then the client begins the transfer with the first DATA
+   packet, using the negotiated values.  If the client rejects the OACK,
+   then it sends an ERROR packet, with error code 8, to the server and
+   the transfer is terminated.
+
+   Once a client acknowledges an OACK, with an appropriate non-error
+   response, that client has agreed to use only the options and values
+   returned by the server.  Remember that the server cannot request an
+   option; it can only respond to them.  If the client receives an OACK
+   containing an unrequested option, it should respond with an ERROR
+   packet, with error code 8, and terminate the transfer.
+
+
+
+
+
+
+
+Malkin & Harkin             Standards Track                     [Page 4]
+
+RFC 2347                 TFTP Option Extension                  May 1998
+
+
+Examples
+
+   Read Request
+
+      client                                           server
+      -------------------------------------------------------
+      |1|foofile|0|octet|0|blksize|0|1432|0|  -->               RRQ
+                                    <--  |6|blksize|0|1432|0|   OACK
+      |4|0|  -->                                                ACK
+                             <--  |3|1| 1432 octets of data |   DATA
+      |4|1|  -->                                                ACK
+                             <--  |3|2| 1432 octets of data |   DATA
+      |4|2|  -->                                                ACK
+                             <--  |3|3|<1432 octets of data |   DATA
+      |4|3|  -->                                                ACK
+
+   Write Request
+
+      client                                           server
+      -------------------------------------------------------
+      |2|barfile|0|octet|0|blksize|0|2048|0|  -->               RRQ
+                                    <--  |6|blksize|0|2048|0|   OACK
+      |3|1| 2048 octets of data |  -->                          DATA
+                                                   <--  |4|1|   ACK
+      |3|2| 2048 octets of data |  -->                          DATA
+                                                   <--  |4|2|   ACK
+      |3|3|<2048 octets of data |  -->                          DATA
+                                                      <--  |4|3|   ACK
+
+Security Considerations
+
+   The basic TFTP protocol has no security mechanism.  This is why it
+   has no rename, delete, or file overwrite capabilities.  This document
+   does not add any security to TFTP; however, the specified extensions
+   do not add any additional security risks.
+
+References
+
+   [1] Sollins, K., "The TFTP Protocol (Revision 2)", STD 33, RFC 1350,
+       October 1992.
+
+   [2] Malkin, G., and A. Harkin, "TFTP Blocksize Option", RFC 2348,
+       May 1998.
+
+   [3] Malkin, G., and A. Harkin, "TFTP Timeout Interval and Transfer
+       Size Options", RFC 2349, May 1998.
+
+
+
+
+
+Malkin & Harkin             Standards Track                     [Page 5]
+
+RFC 2347                 TFTP Option Extension                  May 1998
+
+
+Authors' Addresses
+
+   Gary Scott Malkin
+   Bay Networks
+   8 Federal Street
+   Billerica, MA  01821
+
+   Phone:  (978) 916-4237
+   EMail:  gmalkin@baynetworks.com
+
+
+   Art Harkin
+   Internet Services Project
+   Information Networks Division
+   19420 Homestead Road MS 43LN
+   Cupertino, CA  95014
+
+   Phone: (408) 447-3755
+   EMail: ash@cup.hp.com
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Malkin & Harkin             Standards Track                     [Page 6]
+
+RFC 2347                 TFTP Option Extension                  May 1998
+
+
+Full Copyright Statement
+
+   Copyright (C) The Internet Society (1998).  All Rights Reserved.
+
+   This document and translations of it may be copied and furnished to
+   others, and derivative works that comment on or otherwise explain it
+   or assist in its implementation may be prepared, copied, published
+   and distributed, in whole or in part, without restriction of any
+   kind, provided that the above copyright notice and this paragraph are
+   included on all such copies and derivative works.  However, this
+   document itself may not be modified in any way, such as by removing
+   the copyright notice or references to the Internet Society or other
+   Internet organizations, except as needed for the purpose of
+   developing Internet standards in which case the procedures for
+   copyrights defined in the Internet Standards process must be
+   followed, or as required to translate it into languages other than
+   English.
+
+   The limited permissions granted above are perpetual and will not be
+   revoked by the Internet Society or its successors or assigns.
+
+   This document and the information contained herein is provided on an
+   "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
+   TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
+   BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
+   HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
+   MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Malkin & Harkin             Standards Track                     [Page 7]
+
--- a/kernel/picotcp/RFC/rfc2349.txt
+++ b/kernel/picotcp/RFC/rfc2349.txt
@ -0,0 +1,283 @@
+
+
+
+
+
+
+Network Working Group                                          G. Malkin
+Request for Commments: 2349                                 Bay Networks
+Updates: 1350                                                  A. Harkin
+Obsoletes: 1784                                      Hewlett Packard Co.
+Category: Standards Track                                       May 1998
+
+
+            TFTP Timeout Interval and Transfer Size Options
+
+Status of this Memo
+
+   This document specifies an Internet standards track protocol for the
+   Internet community, and requests discussion and suggestions for
+   improvements.  Please refer to the current edition of the "Internet
+   Official Protocol Standards" (STD 1) for the standardization state
+   and status of this protocol.  Distribution of this memo is unlimited.
+
+Copyright Notice
+
+   Copyright (C) The Internet Society (1998).  All Rights Reserved.
+
+Abstract
+
+   The Trivial File Transfer Protocol [1] is a simple, lock-step, file
+   transfer protocol which allows a client to get or put a file onto a
+   remote host.
+
+   This document describes two TFTP options. The first allows the client
+   and server to negotiate the Timeout Interval.  The second allows the
+   side receiving the file to determine the ultimate size of the
+   transfer before it begins.  The TFTP Option Extension mechanism is
+   described in [2].
+
+Timeout Interval Option Specification
+
+   The TFTP Read Request or Write Request packet is modified to include
+   the timeout option as follows:
+
+      +-------+---~~---+---+---~~---+---+---~~---+---+---~~---+---+
+      |  opc  |filename| 0 |  mode  | 0 | timeout| 0 |  #secs | 0 |
+      +-------+---~~---+---+---~~---+---+---~~---+---+---~~---+---+
+
+      opc
+         The opcode field contains either a 1, for Read Requests, or 2,
+         for Write Requests, as defined in [1].
+
+
+
+
+
+
+Malkin & Harkin             Standards Track                     [Page 1]
+
+RFC 2349    TFTP Timeout Interval and Transfer Size Options     May 1998
+
+
+      filename
+         The name of the file to be read or written, as defined in [1].
+         This is a NULL-terminated field.
+
+      mode
+         The mode of the file transfer: "netascii", "octet", or "mail",
+         as defined in [1].  This is a NULL-terminated field.
+
+      timeout
+         The Timeout Interval option, "timeout" (case in-sensitive).
+         This is a NULL-terminated field.
+
+      #secs
+         The number of seconds to wait before retransmitting, specified
+         in ASCII.  Valid values range between "1" and "255" seconds,
+         inclusive.  This is a NULL-terminated field.
+
+   For example:
+
+      +-------+--------+---+--------+---+--------+---+-------+---+
+      |   1   | foobar | 0 | octet  | 0 | timeout| 0 |   1   | 0 |
+      +-------+--------+---+--------+---+--------+---+-------+---+
+
+   is a Read Request, for the file named "foobar", in octet (binary)
+   transfer mode, with a timeout interval of 1 second.
+
+   If the server is willing to accept the timeout option, it sends an
+   Option Acknowledgment (OACK) to the client.  The specified timeout
+   value must match the value specified by the client.
+
+Transfer Size Option Specification
+
+   The TFTP Read Request or Write Request packet is modified to include
+   the tsize option as follows:
+
+      +-------+---~~---+---+---~~---+---+---~~---+---+---~~---+---+
+      |  opc  |filename| 0 |  mode  | 0 | tsize  | 0 |  size  | 0 |
+      +-------+---~~---+---+---~~---+---+---~~---+---+---~~---+---+
+
+      opc
+         The opcode field contains either a 1, for Read Requests, or 2,
+         for Write Requests, as defined in [1].
+
+      filename
+         The name of the file to be read or written, as defined in [1].
+         This is a NULL-terminated field.
+
+
+
+
+
+Malkin & Harkin             Standards Track                     [Page 2]
+
+RFC 2349    TFTP Timeout Interval and Transfer Size Options     May 1998
+
+
+      mode
+         The mode of the file transfer: "netascii", "octet", or "mail",
+         as defined in [1].  This is a NULL-terminated field.
+
+      tsize
+         The Transfer Size option, "tsize" (case in-sensitive).  This is
+         a NULL-terminated field.
+
+      size
+         The size of the file to be transfered.  This is a NULL-
+         terminated field.
+
+   For example:
+
+      +-------+--------+---+--------+---+--------+---+--------+---+
+      |   2   | foobar | 0 | octet  | 0 | tsize  | 0 | 673312 | 0 |
+      +-------+--------+---+--------+---+--------+---+--------+---+
+
+   is a Write Request, with the 673312-octet file named "foobar", in
+   octet (binary) transfer mode.
+
+   In Read Request packets, a size of "0" is specified in the request
+   and the size of the file, in octets, is returned in the OACK.  If the
+   file is too large for the client to handle, it may abort the transfer
+   with an Error packet (error code 3).  In Write Request packets, the
+   size of the file, in octets, is specified in the request and echoed
+   back in the OACK.  If the file is too large for the server to handle,
+   it may abort the transfer with an Error packet (error code 3).
+
+Security Considerations
+
+   The basic TFTP protocol has no security mechanism.  This is why it
+   has no rename, delete, or file overwrite capabilities.  This document
+   does not add any security to TFTP; however, the specified extensions
+   do not add any additional security risks.
+
+References
+
+   [1] Sollins, K., "The TFTP Protocol (Revision 2)", STD 33, RFC 1350,
+       October 92.
+
+   [2] Malkin, G., and A. Harkin, "TFTP Option Extension", RFC 2347,
+       May 1998.
+
+
+
+
+
+
+
+
+Malkin & Harkin             Standards Track                     [Page 3]
+
+RFC 2349    TFTP Timeout Interval and Transfer Size Options     May 1998
+
+
+Authors' Addresses
+
+   Gary Scott Malkin
+   Bay Networks
+   8 Federal Street
+   Billerica, MA  01821
+
+   Phone:  (978) 916-4237
+   EMail:  gmalkin@baynetworks.com
+
+
+   Art Harkin
+   Internet Services Project
+   Information Networks Division
+   19420 Homestead Road MS 43LN
+   Cupertino, CA  95014
+
+   Phone: (408) 447-3755
+   EMail: ash@cup.hp.com
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Malkin & Harkin             Standards Track                     [Page 4]
+
+RFC 2349    TFTP Timeout Interval and Transfer Size Options     May 1998
+
+
+Full Copyright Statement
+
+   Copyright (C) The Internet Society (1998).  All Rights Reserved.
+
+   This document and translations of it may be copied and furnished to
+   others, and derivative works that comment on or otherwise explain it
+   or assist in its implementation may be prepared, copied, published
+   and distributed, in whole or in part, without restriction of any
+   kind, provided that the above copyright notice and this paragraph are
+   included on all such copies and derivative works.  However, this
+   document itself may not be modified in any way, such as by removing
+   the copyright notice or references to the Internet Society or other
+   Internet organizations, except as needed for the purpose of
+   developing Internet standards in which case the procedures for
+   copyrights defined in the Internet Standards process must be
+   followed, or as required to translate it into languages other than
+   English.
+
+   The limited permissions granted above are perpetual and will not be
+   revoked by the Internet Society or its successors or assigns.
+
+   This document and the information contained herein is provided on an
+   "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
+   TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
+   BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
+   HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
+   MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Malkin & Harkin             Standards Track                     [Page 5]
+
--- a/kernel/picotcp/RFC/rfc2385.txt
+++ b/kernel/picotcp/RFC/rfc2385.txt
@ -0,0 +1,339 @@
+
+
+
+
+
+
+Network Working Group                                       A. Heffernan
+Request for Comments: 2385                                 cisco Systems
+Category: Standards Track                                    August 1998
+
+
+      Protection of BGP Sessions via the TCP MD5 Signature Option
+
+Status of this Memo
+
+   This document specifies an Internet standards track protocol for the
+   Internet community, and requests discussion and suggestions for
+   improvements.  Please refer to the current edition of the "Internet
+   Official Protocol Standards" (STD 1) for the standardization state
+   and status of this protocol.  Distribution of this memo is unlimited.
+
+Copyright Notice
+
+   Copyright (C) The Internet Society (1998).  All Rights Reserved.
+
+IESG Note
+
+   This document describes currrent existing practice for securing BGP
+   against certain simple attacks.  It is understood to have security
+   weaknesses against concerted attacks.
+
+Abstract
+
+   This memo describes a TCP extension to enhance security for BGP.  It
+   defines a new TCP option for carrying an MD5 [RFC1321] digest in a
+   TCP segment.  This digest acts like a signature for that segment,
+   incorporating information known only to the connection end points.
+   Since BGP uses TCP as its transport, using this option in the way
+   described in this paper significantly reduces the danger from certain
+   security attacks on BGP.
+
+1.0  Introduction
+
+   The primary motivation for this option is to allow BGP to protect
+   itself against the introduction of spoofed TCP segments into the
+   connection stream.  Of particular concern are TCP resets.
+
+   To spoof a connection using the scheme described in this paper, an
+   attacker would not only have to guess TCP sequence numbers, but would
+   also have had to obtain the password included in the MD5 digest.
+   This password never appears in the connection stream, and the actual
+   form of the password is up to the application.  It could even change
+
+
+
+
+
+Heffernan                   Standards Track                     [Page 1]
+
+RFC 2385                TCP MD5 Signature Option             August 1998
+
+
+   during the lifetime of a particular connection so long as this change
+   was synchronized on both ends (although retransmission can become
+   problematical in some TCP implementations with changing passwords).
+
+   Finally, there is no negotiation for the use of this option in a
+   connection, rather it is purely a matter of site policy whether or
+   not its connections use the option.
+
+2.0  Proposal
+
+   Every segment sent on a TCP connection to be protected against
+   spoofing will contain the 16-byte MD5 digest produced by applying the
+   MD5 algorithm to these items in the following order:
+
+       1. the TCP pseudo-header (in the order: source IP address,
+          destination IP address, zero-padded protocol number, and
+          segment length)
+       2. the TCP header, excluding options, and assuming a checksum of
+          zero
+       3. the TCP segment data (if any)
+       4. an independently-specified key or password, known to both TCPs
+          and presumably connection-specific
+
+   The header and pseudo-header are in network byte order.  The nature
+   of the key is deliberately left unspecified, but it must be known by
+   both ends of the connection.  A particular TCP implementation will
+   determine what the application may specify as the key.
+
+   Upon receiving a signed segment, the receiver must validate it by
+   calculating its own digest from the same data (using its own key) and
+   comparing the two digest.  A failing comparison must result in the
+   segment being dropped and must not produce any response back to the
+   sender.  Logging the failure is probably advisable.
+
+   Unlike other TCP extensions (e.g., the Window Scale option
+   [RFC1323]), the absence of the option in the SYN,ACK segment must not
+   cause the sender to disable its sending of signatures.  This
+   negotiation is typically done to prevent some TCP implementations
+   from misbehaving upon receiving options in non-SYN segments.  This is
+   not a problem for this option, since the SYN,ACK sent during
+   connection negotiation will not be signed and will thus be ignored.
+   The connection will never be made, and non-SYN segments with options
+   will never be sent.  More importantly, the sending of signatures must
+   be under the complete control of the application, not at the mercy of
+   the remote host not understanding the option.
+
+
+
+
+
+
+Heffernan                   Standards Track                     [Page 2]
+
+RFC 2385                TCP MD5 Signature Option             August 1998
+
+
+3.0  Syntax
+
+   The proposed option has the following format:
+
+             +---------+---------+-------------------+
+             | Kind=19 |Length=18|   MD5 digest...   |
+             +---------+---------+-------------------+
+             |                                       |
+             +---------------------------------------+
+             |                                       |
+             +---------------------------------------+
+             |                                       |
+             +-------------------+-------------------+
+             |                   |
+             +-------------------+
+
+   The MD5 digest is always 16 bytes in length, and the option would
+   appear in every segment of a connection.
+
+4.0  Some Implications
+
+4.1  Connectionless Resets
+
+   A connectionless reset will be ignored by the receiver of the reset,
+   since the originator of that reset does not know the key, and so
+   cannot generate the proper signature for the segment.  This means,
+   for example, that connection attempts by a TCP which is generating
+   signatures to a port with no listener will time out instead of being
+   refused.  Similarly, resets generated by a TCP in response to
+   segments sent on a stale connection will also be ignored.
+   Operationally this can be a problem since resets help BGP recover
+   quickly from peer crashes.
+
+4.2  Performance
+
+   The performance hit in calculating digests may inhibit the use of
+   this option.  Some measurements of a sample implementation showed
+   that on a 100 MHz R4600, generating a signature for simple ACK
+   segment took an average of 0.0268 ms, while generating a signature
+   for a data segment carrying 4096 bytes of data took 0.8776 ms on
+   average.  These times would be applied to both the input and output
+   paths, with the input path also bearing the cost of a 16-byte
+   compare.
+
+
+
+
+
+
+
+
+Heffernan                   Standards Track                     [Page 3]
+
+RFC 2385                TCP MD5 Signature Option             August 1998
+
+
+4.3  TCP Header Size
+
+   As with other options that are added to every segment, the size of
+   the MD5 option must be factored into the MSS offered to the other
+   side during connection negotiation.  Specifically, the size of the
+   header to subtract from the MTU (whether it is the MTU of the
+   outgoing interface or IP's minimal MTU of 576 bytes) is now at least
+   18 bytes larger.
+
+   The total header size is also an issue.  The TCP header specifies
+   where segment data starts with a 4-bit field which gives the total
+   size of the header (including options) in 32-byte words.  This means
+   that the total size of the header plus option must be less than or
+   equal to 60 bytes -- this leaves 40 bytes for options.
+
+   As a concrete example, 4.4BSD defaults to sending window-scaling and
+   timestamp information for connections it initiates.  The most loaded
+   segment will be the initial SYN packet to start the connection.  With
+   MD5 signatures, the SYN packet will contain the following:
+
+       -- 4 bytes MSS option
+       -- 4 bytes window scale option (3 bytes padded to 4 in 4.4BSD)
+       -- 12 bytes for timestamp (4.4BSD pads the option as recommended
+          in RFC 1323 Appendix A)
+       -- 18 bytes for MD5 digest
+       -- 2 bytes for end-of-option-list, to pad to a 32-bit boundary.
+
+       This sums to 40 bytes, which just makes it.
+
+4.4  MD5 as a Hashing Algorithm
+
+   Since this memo was first issued (under a different title), the MD5
+   algorithm has been found to be vulnerable to collision search attacks
+   [Dobb], and is considered by some to be insufficiently strong for
+   this type of application.
+
+   This memo still specifies the MD5 algorithm, however, since the
+   option has already been deployed operationally, and there was no
+   "algorithm type" field defined to allow an upgrade using the same
+   option number.  The original document did not specify a type field
+   since this would require at least one more byte, and it was felt at
+   the time that taking 19 bytes for the complete option (which would
+   probably be padded to 20 bytes in TCP implementations) would be too
+   much of a waste of the already limited option space.
+
+
+
+
+
+
+
+Heffernan                   Standards Track                     [Page 4]
+
+RFC 2385                TCP MD5 Signature Option             August 1998
+
+
+   This does not prevent the deployment of another similar option which
+   uses another hashing algorithm (like SHA-1).  Also, if most
+   implementations pad the 18 byte option as defined to 20 bytes anyway,
+   it would be just as well to define a new option which contains an
+   algorithm type field.
+
+   This would need to be addressed in another document, however.
+
+4.5 Key configuration
+
+   It should be noted that the key configuration mechanism of routers
+   may restrict the possible keys that may be used between peers.  It is
+   strongly recommended that an implementation be able to support at
+   minimum a key composed of a string of printable ASCII of 80 bytes or
+   less, as this is current practice.
+
+5.0 Security Considerations
+
+   This document defines a weak but currently practiced security
+   mechanism for BGP.  It is anticipated that future work will provide
+   different stronger mechanisms for dealing with these issues.
+
+6.0  References
+
+   [RFC1321] Rivest, R., "The MD5 Message-Digest Algorithm," RFC 1321,
+             April 1992.
+
+   [RFC1323] Jacobson, V., Braden, R., and D. Borman, "TCP Extensions
+             for High Performance", RFC 1323, May 1992.
+
+   [Dobb] H. Dobbertin, "The Status of MD5 After a Recent Attack", RSA
+             Labs' CryptoBytes, Vol. 2 No. 2, Summer 1996.
+             http://www.rsa.com/rsalabs/pubs/cryptobytes.html
+
+Author's Address
+
+   Andy Heffernan
+   cisco Systems
+   170 West Tasman Drive
+   San Jose, CA  95134  USA
+
+   Phone:  +1 408 526-8115
+   EMail:  ahh@cisco.com
+
+
+
+
+
+
+
+
+Heffernan                   Standards Track                     [Page 5]
+
+RFC 2385                TCP MD5 Signature Option             August 1998
+
+
+Full Copyright Statement
+
+   Copyright (C) The Internet Society (1998).  All Rights Reserved.
+
+   This document and translations of it may be copied and furnished to
+   others, and derivative works that comment on or otherwise explain it
+   or assist in its implementation may be prepared, copied, published
+   and distributed, in whole or in part, without restriction of any
+   kind, provided that the above copyright notice and this paragraph are
+   included on all such copies and derivative works.  However, this
+   document itself may not be modified in any way, such as by removing
+   the copyright notice or references to the Internet Society or other
+   Internet organizations, except as needed for the purpose of
+   developing Internet standards in which case the procedures for
+   copyrights defined in the Internet Standards process must be
+   followed, or as required to translate it into languages other than
+   English.
+
+   The limited permissions granted above are perpetual and will not be
+   revoked by the Internet Society or its successors or assigns.
+
+   This document and the information contained herein is provided on an
+   "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
+   TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
+   BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
+   HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
+   MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Heffernan                   Standards Track                     [Page 6]
+
--- a/kernel/picotcp/RFC/rfc2398.txt
+++ b/kernel/picotcp/RFC/rfc2398.txt
@ -0,0 +1,843 @@
+
+
+
+
+
+
+Network Working Group                                          S. Parker
+Request for Comments: 2398                                  C. Schmechel
+FYI: 33                                           Sun Microsystems, Inc.
+Category: Informational                                      August 1998
+
+
+                Some Testing Tools for TCP Implementors
+
+Status of this Memo
+
+   This memo provides information for the Internet community.  It does
+   not specify an Internet standard of any kind.  Distribution of this
+   memo is unlimited.
+
+Copyright Notice
+
+   Copyright (C) The Internet Society (1998).  All Rights Reserved.
+
+1. Introduction
+
+   Available tools for testing TCP implementations are catalogued by
+   this memo.  Hopefully disseminating this information will encourage
+   those responsible for building and maintaining TCP to make the best
+   use of available tests.  The type of testing the tool provides, the
+   type of tests it is capable of doing, and its availability is
+   enumerated.  This document lists only tools which can evaluate one or
+   more TCP implementations, or which can privde some specific results
+   which describe or evaluate the TCP being tested. A number of these
+   tools produce time-sequence plots, see
+
+   Tim Shepard's thesis [She91] for a general discussion of these plots.
+
+   Each tools is defined as follows:
+
+ Name
+
+   The name associated with the testing tool.
+
+ Category
+
+   One or more categories of tests which the tools are capable of
+   providing.  Categories used are: functional correctness, performance,
+   stress.  Functional correctness tests how stringent a TCP
+   implementation is to the RFC specifications.  Performance tests how
+
+
+
+
+
+
+
+Parker & Schmechel           Informational                      [Page 1]
+
+RFC 2398        Some Testing Tools for TCP Implementors      August 1998
+
+
+   quickly a TCP implementation can send and receive data, etc.  Stress
+   tests how a TCP implementation is effected under high load
+   conditions.
+
+ Description
+
+   A description of the tools construction, and the implementation
+   methodology of the tests.
+
+ Automation
+
+   What steps are required to complete the test?  What human
+   intervention is required?
+
+ Availability
+
+   How do you retrieve this tool and get more information about it?
+
+ Required Environment
+
+   Compilers, OS version, etc. required to build and/or run the
+   associated tool.
+
+ References
+
+   A list of publications relating to the tool, if any.
+
+2. Tools
+
+2.1.  Dbs
+
+ Author
+   Yukio Murayama
+
+ Category
+   Performance / Stress
+
+ Description
+   Dbs is a tool which allows multiple data transfers to be coordinated,
+   and the resulting TCP behavior to be reviewed.  Results are presented
+   as ASCII log files.
+
+ Automation
+   Command of execution is driven by a script file.
+
+
+
+
+
+
+
+Parker & Schmechel           Informational                      [Page 2]
+
+RFC 2398        Some Testing Tools for TCP Implementors      August 1998
+
+
+ Availability
+   See http://www.ai3.net/products/dbs for details of precise OS
+   versions supported, and for download of the source code.  Current
+   implementation supports BSDI BSD/OS, Linux, mkLinux, SunOS, IRIX,
+   Ultrix, NEWS OS, HP-UX. Other environments are likely easy to add.
+
+ Required Environment
+   C language compiler, UNIX-style socket API support.
+
+2.2.  Dummynet
+
+ Author
+   Luigi Rizzo
+
+ Category
+   Functional Correctness / Performance
+
+ Description
+   Dummynet is a tool which simulates the presence of finite size
+   queues, bandwidth limitations, and communication delays.  Dummynet
+   inserts between two layers of the protocol stack (in the current
+   implementation between TCP and IP), simulating the above effects in
+   an operational system.  This way experiments can be done using real
+   protocol implementations and real applications, even running on the
+   same host (dummynet also intercepts communications on the loopback
+   interface).  Reconfiguration of dummynet parameters (delay, queue
+   size, bandwidth) can be done on the fly by using a sysctl call. The
+   overhead of dummynet is extremely low.
+
+ Automation
+   Requires merging diff files with kernel source code.  Command-line
+   driven through the sysctl command to modify kernel variables.
+
+ Availability
+   See http://www.iet.unipi.it/~luigi/research.html or e-mail Luigi
+   Rizzo (l.rizzo@iet.unipi.it).  Source code is available for FreeBSD
+   2.1 and FreeBSD 2.2 (easily adaptable to other BSD-derived systems).
+
+ Required Environment
+   C language compiler, BSD-derived system, kernel source code.
+
+ References
+   [Riz97]
+
+
+
+
+
+
+
+
+Parker & Schmechel           Informational                      [Page 3]
+
+RFC 2398        Some Testing Tools for TCP Implementors      August 1998
+
+
+2.3.  Netperf
+
+ Author
+   Rick Jones
+
+ Category
+   Performance
+
+ Description
+   Single connection bandwidth or latency tests for TCP, UDP, and DLPI.
+   Includes provisions for CPU utilization measurement.
+
+ Automation
+   Requires compilation (K&R C sufficient for all but-DHISTOGRAM, may
+   require ANSI C in the future) if starting from source. Execution as
+   child of inetd requires editing of /etc/services and /etc/inetd.conf.
+   Scripts are provided for a quick look (snapshot_script), bulk
+   throughput of TCP and UDP, and latency for TCP and UDP.  It is
+   command-line driven.
+
+ Availability
+   See http://www.cup.hp.com/netperf/NetperfPage.html or e-mail Rick
+   Jones (raj@cup.hp.com). Binaries are available here for HP/UX Irix,
+   Solaris, and Win32.
+
+ Required Environment
+   C language compiler, POSIX.1, sockets.
+
+2.4.  NIST Net
+
+ Author
+   Mark Carson
+
+ Category
+   Functional Correctness / Performance
+
+ Description
+   NIST Net is a network emulator. The tool is packaged as a Linux
+   kernel patch, a kernel module, a set of programming APIs, and
+   command-line and X-based user interfaces.
+
+   NIST Net works by turning the system into a "selectively bad" router
+   - incoming packets may be delayed, dropped, duplicated, bandwidth-
+   constrained, etc.  Packet delays may be fixed or randomly
+   distributed, with loadable probability distributions.  Packet loss
+   may be uniformly distributed (constant loss probability) or
+   congestion-dependent (probability of loss increases with packet queue
+   lengths).  Explicit congestion notifications may optionally be sent
+
+
+
+Parker & Schmechel           Informational                      [Page 4]
+
+RFC 2398        Some Testing Tools for TCP Implementors      August 1998
+
+
+   in place of congestion-dependent loss.
+
+ Automation
+   To control the operation of the emulator, there is an interactive
+   user interface, a non-interactive command-line interface, and a set
+   of APIs.  Any or all of these may be used in concert.  The
+   interactive interface is suitable for simple, spur-of-the-moment
+   testing, while the command-line or APIs may be used to create
+   scripted, non-interactive tests.
+
+ Availability
+   NIST Net is available for public download from the NIST Net web site,
+   http://www.antd.nist.gov/itg/nistnet/.  The web site also has
+   installation instructions and documentation.
+
+ Required Environment
+   NIST Net requires a Linux installtion, with kernel version 2.0.27 -
+   2.0.33.  A kernel source tree and build tools are required to build
+   and install the NIST Net components.  Building the X interface
+   requires a version of XFree86 (Current Version is 3.3.2).  An
+   Athena-replacement widget set such as neXtaw
+   (http://www.inf.ufrgs.br/~kojima/nextaw/) is also desirable for an
+   improved user interface.
+
+   NIST Net should run on any i386-compatible machine capable of running
+   Linux, with one or more interfaces.
+
+2.5.  Orchestra
+
+ Author
+   Scott Dawson, Farnam Jahanian, and Todd Mitton
+
+ Category
+   Functional Correctness / Performance
+
+ Description
+   This tool is a library which provides the user with an ability to
+   build a protocol layer capable of performing fault injection on
+   protocols.  Several fault injection layers have been built using this
+   library, one of which has been used to test different vendor
+   implementations of TCP. This is accomplished by probing the vendor
+   implementation from one machine containing a protocol stack that has
+   been instrumented with Orchestra.  A connection is opened from the
+   vendor TCP implementation to the machine which has been instrumented.
+   Faults may then be injected at the Orchestra side of the connection
+   and the vendor TCP's response may be monitored.  The most recent
+   version of Orchestra runs inside the X-kernel protocol stack on the
+   OSF MK operating system.
+
+
+
+Parker & Schmechel           Informational                      [Page 5]
+
+RFC 2398        Some Testing Tools for TCP Implementors      August 1998
+
+
+   When using Orchestra to test a protocol, the fault injection layer is
+   placed below the target protocol in the protocol stack.  This can
+   either be done on one machine on the network, if protocol stacks on
+   the other machines cannot be modified (as in the case of testing
+   TCP), or can be done on all machines on the network (as in the case
+   of testing a protocol under development).  Once the fault injection
+   layer is in the protocol stack, all messages sent by and destined for
+   the target protocol pass through it on their way to/from the network.
+   The Orchestra fault injection layer can manipulate these messages.
+   In particular, it can drop, delay, re-order, duplicate, or modify
+   messages.  It can also introduce new messages into the system if
+   desired.
+
+   The actions of the Orchestra fault injection layer on each message
+   are determined by a script, written in Tcl.  This script is
+   interpreted by the fault injection layer when the message enters the
+   layer.  The script has access to the header information about the
+   message, and can make decisions based on header values.  It can also
+   keep information about previous messages, counters, or any other data
+   which the script writer deems useful.  Users of Orchestra may also
+   define their own actions to be taken on messages, written in C, that
+   may be called from the fault injection scripts.
+
+ Automation
+   Scripts can be specified either using a graphical user interface
+   which generates Tcl, or by writing Tcl directly.  At this time,
+   post-analysis of the results of the test must also be performed by
+   the user.  Essentially this consists of looking at a packet trace
+   that Orchestra generates for (in)correct behavior.  Must compile and
+   link fault generated layer with the protocol stack.
+
+ Availability
+   See http://www.eecs.umich.edu/RTCL/projects/orchestra/ or e-mail
+   Scott Dawson (sdawson@eecs.umich.edu).
+
+ Required Environment OSF MK operating system, or X-kernel like network
+   architecture, or adapted to network stack.
+
+ References
+   [DJ94], [DJM96a], [DJM96b]
+
+
+
+
+
+
+
+
+
+
+
+Parker & Schmechel           Informational                      [Page 6]
+
+RFC 2398        Some Testing Tools for TCP Implementors      August 1998
+
+
+2.6.  Packet Shell
+
+ Author
+   Steve Parker and Chris Schmechel
+
+ Category
+   Functional Correctness / Performance
+
+ Description
+   An extensible Tcl/Tk based software toolset for protocol development
+   and testing. Tcl (Tool Command Language) is an embeddable scripting
+   language and Tk is a graphical user interface toolkit based on Tcl.
+   The Packet Shell creates Tcl commands that allow you to create,
+   modify, send, and receive packets on networks.  The operations for
+   each protocol are supplied by a dynamic linked library called a
+   protocol library.  These libraries are silently linked in from a
+   special directory when the Packet Shell begins execution. The current
+   protocol libraries are: IP, IPv6, IPv6 extensions, ICMP, ICMPv6,
+   Ethernet layer, data layer, file layer (snoop and tcpdump support),
+   socket layer, TCP, TLI.
+
+   It includes harness, which is a Tk based graphical user interface for
+   creating test scripts within the Packet Shell.  It includes tests for
+   no initial slow start, and retain out of sequence data as TCP test
+   cases mentioned in [PADHV98].
+
+   It includes tcpgraph, which is used with a snoop or tcpdump capture
+   file to produce a TCP time-sequence plot using xplot.
+
+ Automation
+   Command-line driven through Tcl commands, or graphical user interface
+   models are available through the harness format.
+
+ Availability
+   See http://playground.sun.com/psh/ or e-mail owner-packet-
+   shell@sunroof.eng.sun.com.
+
+ Required Environment
+
+   Solaris 2.4 or higher.  Porting required for other operating systems.
+
+
+
+
+
+
+
+
+
+
+
+Parker & Schmechel           Informational                      [Page 7]
+
+RFC 2398        Some Testing Tools for TCP Implementors      August 1998
+
+
+2.7.  Tcpanaly
+
+ Author
+   Vern Paxson
+
+ Category
+   Functional Correctness / Performance
+
+ Description
+   This is a tool for automatically analyzing a TCP implementation's
+   behavior by inspecting packet traces of the TCP's activity. It does
+   so through packet filter traces produced by tcpdump.  It has coded
+   within it knowledge of a large number of TCP implementations.  Using
+   this, it can determine whether a given trace appears consistent with
+   a given implementation, and, if so, exactly why the TCP chose to
+   transmit each packet at the time it did.  If a trace is found
+   inconsistent with a TCP, tcpanaly either diagnoses a likely
+   measurement error present in the trace, or indicates exactly whether
+   the activity in the trace deviates from that of the TCP, which can
+   greatly aid in determining how the traced implementation behaves.
+
+   Tcpanaly's category is somewhat difficult to classify, since it
+   attempts to profile the behavior of an implementation, rather than to
+   explicitly test specific correctness or performance issues. However,
+   this profile identifies correctness and performance problems.
+
+   Adding new implementations of TCP behavior is possible with tcpanaly
+   through the use of C++ classes.
+
+ Automation
+   Command-line driven and only the traces of the TCP sending and
+   receiving bulk data transfers are needed as input.
+
+ Availability
+   Contact Vern Paxson (vern@ee.lbl.gov).
+
+ Required Environment
+   C++ compiler.
+
+ References
+   [Pax97a]
+
+
+
+
+
+
+
+
+
+
+Parker & Schmechel           Informational                      [Page 8]
+
+RFC 2398        Some Testing Tools for TCP Implementors      August 1998
+
+
+2.8.  Tcptrace
+
+ Author
+   Shawn Ostermann
+
+ Category
+   Functional Correctness / Performance
+
+ Description
+   This is a TCP trace file analysis tool. It reads output trace files
+   in the formats of : tcpdump, snoop, etherpeek, and netm.
+
+   For each connection, it keeps track of elapsed time, bytes/segments
+   sent and received, retransmissions, round trip times, window
+   advertisements, throughput, etc from simple to very detailed output.
+
+   It can also produce three different types of graphs:
+
+   Time Sequence Graph (shows the segments sent and ACKs returned as a
+   function of time)
+
+   Instantaneous Throughput (shows the instantaneous, averaged over a
+   few segments, throughput of the connection as a function of time).
+
+   Round Trip Times (shows the round trip times for the ACKs as a
+   function of time)
+
+ Automation
+   Command-line driven, and uses the xplot program to view the graphs.
+
+ Availability
+   Source code is available, and Solaris binary along with sample
+   traces. See http://jarok.cs.ohiou.edu/software/tcptrace/tcptrace.html
+   or e-mail Shawn Ostermann (ostermann@cs.ohiou.edu).
+
+ Required Environment
+   C compiler, Solaris, FreeBSD, NetBSD, HPUX, Linux.
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Parker & Schmechel           Informational                      [Page 9]
+
+RFC 2398        Some Testing Tools for TCP Implementors      August 1998
+
+
+2.9.  Tracelook
+
+ Author
+   Greg Minshall
+
+ Category
+   Functional Correctness / Performance
+
+ Description
+   This is a Tcl/Tk program for graphically viewing the contents of
+   tcpdump trace files.  When plotting a connection, a user can select
+   various variables to be plotted. In each direction of the connection,
+   the user can plot the advertised window in each packet, the highest
+   sequence number in each packet, the lowest sequence number in each
+   packet, and the acknowledgement number in each packet.
+
+ Automation
+   Command-line driven with a graphical user interface for the graph.
+
+ Availability
+   See http://www.ipsilon.com/~minshall/sw/tracelook/tracelook.html or
+   e-mail Greg Minshall (minshall@ipsilon.com).
+
+ Required Environment
+   A modern version of awk, and Tcl/Tk (Tk version 3.6 or higher).  The
+   program xgraph is required to view the graphs under X11.
+
+2.10.  TReno
+
+ Author
+   Matt Mathis and Jamshid Mahdavi
+
+ Category
+   Performance
+
+ Description
+   This is a TCP throughput measurement tool based on sending UDP or
+   ICMP packets in patterns that are controlled at the user-level so
+   that their timing reflects what would be sent by a TCP that observes
+   proper congestion control (and implements SACK).  This allows it to
+   measure throughput independent of the TCP implementation of end hosts
+   and serve as a useful platform for prototyping TCP changes.
+
+ Automation
+   Command-line driven.  No "server" is required, and it only requires a
+   single argument of the machine to run the test to.
+
+
+
+
+
+Parker & Schmechel           Informational                     [Page 10]
+
+RFC 2398        Some Testing Tools for TCP Implementors      August 1998
+
+
+ Availability
+   See http://www.psc.edu/networking/treno_info.html or e-mail Matt
+   Mathis (mathis@psc.edu) or Jamshid Mahdavi (mahdavi@psc.edu).
+
+ Required Environment
+   C compiler, POSIX.1, raw sockets.
+
+2.11.  Ttcp
+
+ Author
+   Unknown
+
+ Category
+   Performance
+
+ Description
+   Originally written to move files around, ttcp became the classic
+   throughput benchmark or load generator, with the addition of support
+   for sourcing to/from memory. It can also be used as a traffic
+   absorber. It has spawned many variants, recent ones include support
+   for UDP, data pattern generation, page alignment, and even alignment
+   offset control.
+
+ Automation
+   Command-line driven.
+
+ Availability
+   See ftp://ftp.arl.mil/pub/ttcp/ or e-mail ARL (ftp@arl.mil) which
+   includes the most common variants available.
+
+ Required Environment
+   C compiler, BSD sockets.
+
+2.12.  Xplot
+
+ Author
+   Tim Shepard
+
+ Category
+   Functional Correctness / Performance
+
+ Description
+   This is a fairly conventional graphing/plotting tool (xplot itself),
+   a script to turn tcpdump output into xplot input, and some sample
+   code to generate xplot commands to plot the TCP time-sequence graph).
+
+ Automation
+   Command-line driven with a graphical user interface for the plot.
+
+
+
+Parker & Schmechel           Informational                     [Page 11]
+
+RFC 2398        Some Testing Tools for TCP Implementors      August 1998
+
+
+ Availability
+   See ftp://mercury.lcs.mit.edu/pub/shep/xplot.tar.gz or e-mail Tim
+   Shepard (shep@lcs.mit.edu).
+
+ Required Environment
+   C compiler, X11.
+
+ References
+   [She91]
+
+3. Summary
+
+   This memo lists all TCP tests and testing tools reported to the
+   authors as part of TCP Implementer's working group and is not
+   exhaustive.  These tools have been verified as available by the
+   authors.
+
+4. Security Considerations
+
+   Network analysis tools are improving at a steady pace.  The
+   continuing improvement in these tools such as the ones described make
+   security concerns significant.
+
+   Some of the tools could be used to create rogue packets or denial-
+   of-service attacks against other hosts.  Also, some of the tools
+   require changes to the kernel (foreign code) and might require root
+   privileges to execute.  So you are trusting code that you have
+   fetched from some perhaps untrustworthy remote site.  This code could
+   contain malicious code that could present any kind of attack.
+
+   None of the listed tools evaluate security in any way or form.
+
+   There are privacy concerns when grabbing packets from the network in
+   that you are now able to read other people's mail, files, etc.  This
+   impacts more than just the host running the tool but all traffic
+   crossing the host's physical network.
+
+5. References
+
+   [DJ94]    Scott Dawson and Farnam Jahanian, "Probing and Fault
+             Injection of Distributed Protocol Implementations",
+             University of Michigan Technical Report CSE-TR-217-94, EECS
+             Department.
+
+   [DJM96a]  Scott Dawson, Farnam Jahanian, and Todd Mitton, "ORCHESTRA:
+             A Fault Injection Environment for Distributed Systems",
+             University of Michigan Technical Report CSE-TR-318-96, EECS
+             Department.
+
+
+
+Parker & Schmechel           Informational                     [Page 12]
+
+RFC 2398        Some Testing Tools for TCP Implementors      August 1998
+
+
+   [DJM96b]  Scott Dawson, Farnam Jahanian, and Todd Mitton,
+             "Experiments on Six Commercial TCP Implementations Using a
+             Software Fault Injection Tool", University of Michigan
+             Technical Report CSE-TR-298-96, EECS Department.
+
+   [Pax97a]  Vern Paxson, "Automated Packet Trace Analysis of TCP
+             Implementations", ACM SIGCOMM '97, September 1997, Cannes,
+             France.
+
+   [PADHV98] Paxson, V., Allman, M., Dawson, S., Heavens, I., and B.
+             Volz, "Known TCP Implementation Problems", Work In
+             Progress.
+
+   [Riz97]   Luigi Rizzo, "Dummynet: a simple approach to the evaluation
+             of network protocols", ACM Computer Communication Review,
+             Vol. 27, N. 1, January 1997, pp.  31-41.
+
+   [She91]   Tim Shepard, "TCP Packet Trace Analysis", MIT Laboratory
+             for Computer Science MIT-LCS-TR-494, February, 1991.
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Parker & Schmechel           Informational                     [Page 13]
+
+RFC 2398        Some Testing Tools for TCP Implementors      August 1998
+
+
+6. Authors' Addresses
+
+   Steve Parker
+   Sun Microsystems, Inc.
+   901 San Antonio Road, UMPK17-202
+   Palo Alto, CA 94043
+   USA
+
+   Phone: (650) 786-5176
+   EMail: sparker@eng.sun.com
+
+
+   Chris Schmechel
+   Sun Microsystems, Inc.
+   901 San Antonio Road, UMPK17-202
+   Palo Alto, CA, 94043
+   USA
+
+   Phone: (650) 786-4053
+   EMail: cschmec@eng.sun.com
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Parker & Schmechel           Informational                     [Page 14]
+
+RFC 2398        Some Testing Tools for TCP Implementors      August 1998
+
+
+7.  Full Copyright Statement
+
+   Copyright (C) The Internet Society (1998).  All Rights Reserved.
+
+   This document and translations of it may be copied and furnished to
+   others, and derivative works that comment on or otherwise explain it
+   or assist in its implementation may be prepared, copied, published
+   and distributed, in whole or in part, without restriction of any
+   kind, provided that the above copyright notice and this paragraph are
+   included on all such copies and derivative works.  However, this
+   document itself may not be modified in any way, such as by removing
+   the copyright notice or references to the Internet Society or other
+   Internet organizations, except as needed for the purpose of
+   developing Internet standards in which case the procedures for
+   copyrights defined in the Internet Standards process must be
+   followed, or as required to translate it into languages other than
+   English.
+
+   The limited permissions granted above are perpetual and will not be
+   revoked by the Internet Society or its successors or assigns.
+
+   This document and the information contained herein is provided on an
+   "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
+   TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
+   BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
+   HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
+   MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Parker & Schmechel           Informational                     [Page 15]
+
--- a/kernel/picotcp/RFC/rfc2415.txt
+++ b/kernel/picotcp/RFC/rfc2415.txt
@ -0,0 +1,619 @@
+
+
+
+
+
+
+Network Working Group                                            K. Poduri
+Request for Comments: 2415                                      K. Nichols
+Category: Informational                                       Bay Networks
+                                                            September 1998
+
+
+        Simulation Studies of Increased Initial TCP Window Size
+
+Status of this Memo
+
+   This memo provides information for the Internet community.  It does
+   not specify an Internet standard of any kind.  Distribution of this
+   memo is unlimited.
+
+Copyright Notice
+
+   Copyright (C) The Internet Society (1998).  All Rights Reserved.
+
+Abstract
+
+   An increase in the permissible initial window size of a TCP
+   connection, from one segment to three or four segments, has been
+   under discussion in the tcp-impl working group. This document covers
+   some simulation studies of the effects of increasing the initial
+   window size of TCP. Both long-lived TCP connections (file transfers)
+   and short-lived web-browsing style connections were modeled. The
+   simulations were performed using the publicly available ns-2
+   simulator and our custom models and files are also available.
+
+1. Introduction
+
+   We present results from a set of simulations with increased TCP
+   initial window (IW). The main objectives were to explore the
+   conditions under which the larger IW was a "win" and to determine the
+   effects, if any, the larger IW might have on other traffic flows
+   using an IW of one segment.
+
+   This study was inspired by discussions at the Munich IETF tcp-impl
+   and tcp-sat meetings. A proposal to increase the IW size to about 4K
+   bytes (4380 bytes in the case of 1460 byte segments) was discussed.
+   Concerns about both the utility of the increase and its effect on
+   other traffic were raised. Some studies were presented showing the
+   positive effects of increased IW on individual connections, but no
+   studies were shown with a wide variety of simultaneous traffic flows.
+   It appeared that some of the questions being raised could be
+   addressed in an ns-2 simulation. Early results from our simulations
+   were previously posted to the tcp-impl mailing list and presented at
+   the tcp-impl WG meeting at the December 1997 IETF.
+
+
+
+Poduri & Nichols             Informational                      [Page 1]
+
+RFC 2415                    TCP Window Size               September 1998
+
+
+2. Model and Assumptions
+
+   We simulated a network topology with a bottleneck link as shown:
+
+           10Mb,                                    10Mb,
+           (all 4 links)                          (all 4 links)
+
+      C   n2_________                               ______ n6     S
+      l   n3_________\                             /______ n7     e
+      i              \\              1.5Mb, 50ms   //             r
+      e               n0 ------------------------ n1              v
+      n   n4__________//                          \ \_____ n8     e
+      t   n5__________/                            \______ n9     r
+      s                                                           s
+
+                    URLs -->          <--- FTP & Web data
+
+   File downloading and web-browsing clients are attached to the nodes
+   (n2-n5) on the left-hand side. These clients are served by the FTP
+   and Web servers attached to the nodes (n6-n9) on the right-hand side.
+   The links to and from those nodes are at 10 Mbps. The bottleneck link
+   is between n1 and n0. All links are bi-directional, but only ACKs,
+   SYNs, FINs, and URLs are flowing from left to right. Some simulations
+   were also performed with data traffic flowing from right to left
+   simultaneously, but it had no effect on the results.
+
+   In the simulations we assumed that all ftps transferred 1-MB files
+   and that all web pages had exactly three embedded URLs. The web
+   clients are browsing quite aggressively, requesting a new page after
+   a random delay uniformly distributed between 1 and 5 seconds. This is
+   not meant to realistically model a single user's web-browsing
+   pattern, but to create a reasonably heavy traffic load whose
+   individual tcp connections accurately reflect real web traffic. Some
+   discussion of these models as used in earlier studies is available in
+   references [3] and [4].
+
+   The maximum tcp window was set to 11 packets, maximum packet (or
+   segment) size to 1460 bytes, and buffer sizes were set at 25 packets.
+   (The ns-2 TCPs require setting window sizes and buffer sizes in
+   number of packets. In our tcp-full code some of the internal
+   parameters have been set to be byte-oriented, but external values
+   must still be set in number of packets.)  In our simulations, we
+   varied the number of data segments sent into a new TCP connection (or
+   initial window) from one to four, keeping all segments at 1460 bytes.
+   A dropped packet causes a restart window of one segment to be used,
+   just as in current practice.
+
+
+
+
+
+Poduri & Nichols             Informational                      [Page 2]
+
+RFC 2415                    TCP Window Size               September 1998
+
+
+   For ns-2 users: The tcp-full code was modified to use an
+   "application" class and three application client-server pairs were
+   written: a simple file transfer (ftp), a model of http1.0 style web
+   connection and a very rough model of http1.1 style web connection.
+   The required files and scripts for these simulations are available
+   under the contributed code section on the ns-simulator web page at
+   the sites ftp://ftp.ee.lbl.gov/IW.{tar, tar.Z} or http://www-
+   nrg.ee.lbl.gov/floyd/tcp_init_win.html.
+
+   Simulations were run with 8, 16, 32 web clients and a number of ftp
+   clients ranging from 0 to 3. The IW was varied from 1 to 4, though
+   the 4-packet case lies beyond what is currently recommended. The
+   figures of merit used were goodput, the median page delay seen by the
+   web clients and the median file transfer delay seen by the ftp
+   clients. The simulated run time was rather large, 360 seconds, to
+   ensure an adequate sample. (Median values remained the same for
+   simulations with larger run times and can be considered stable)
+
+3. Results
+
+   In our simulations, we varied the number of file transfer clients in
+   order to change the congestion of the link. Recall that our ftp
+   clients continuously request 1 Mbyte transfers, so the link
+   utilization is over 90% when even a single ftp client is present.
+   When three file transfer clients are running simultaneously, the
+   resultant congestion is somewhat pathological, making the values
+   recorded stable. Though all connections use the same initial window,
+   the effect of increasing the IW on a 1 Mbyte file transfer is not
+   detectable, thus we focus on the web browsing connections.  (In the
+   tables, we use "webs" to indicate number of web clients and "ftps" to
+   indicate the number of file transfer clients attached.) Table 1 shows
+   the median delays experienced by the web transfers with an increase
+   in the TCP IW.  There is clearly an improvement in transfer delays
+   for the web connections with increase in the IW, in many cases on the
+   order of 30%.  The steepness of the performance improvement going
+   from an IW of 1 to an IW of 2 is mainly due to the distribution of
+   files fetched by each URL (see references [1] and [2]); the median
+   size of both primary and in-line URLs fits completely into two
+   packets. If file distributions change, the shape of this curve may
+   also change.
+
+
+
+
+
+
+
+
+
+
+
+Poduri & Nichols             Informational                      [Page 3]
+
+RFC 2415                    TCP Window Size               September 1998
+
+
+   Table 1. Median web page delay
+
+   #Webs   #FTPs   IW=1    IW=2    IW=3    IW=4
+                   (s)        (% decrease)
+   ----------------------------------------------
+     8      0      0.56    14.3  17.9   16.1
+     8      1      1.06    18.9  25.5   32.1
+     8      2      1.18    16.1  17.1   28.9
+     8      3      1.26    11.9  19.0   27.0
+    16      0      0.64    11.0  15.6   18.8
+    16      1      1.04    17.3  24.0   35.6
+    16      2      1.22    17.2  20.5   25.4
+    16      3      1.31    10.7  21.4   22.1
+    32      0      0.92    17.6  28.6   21.0
+    32      1      1.19    19.6  25.0   26.1
+    32      2      1.43    23.8  35.0   33.6
+    32      3      1.56    19.2  29.5   33.3
+
+   Table 2 shows the bottleneck link utilization and packet drop
+   percentage of the same experiment. Packet drop rates did increase
+   with IW, but in all cases except that of the single most pathological
+   overload, the increase in drop percentage was less than 1%. A
+   decrease in packet drop percentage is observed in some overloaded
+   situations, specifically when ftp transfers consumed most of the link
+   bandwidth and a large number of web transfers shared the remaining
+   bandwidth of the link. In this case, the web transfers experience
+   severe packet loss and some of the IW=4 web clients suffer multiple
+   packet losses from the same window, resulting in longer recovery
+   times than when there is a single packet loss in a window. During the
+   recovery time, the connections are inactive which alleviates
+   congestion and thus results in a decrease in the packet drop
+   percentage. It should be noted that such observations were made only
+   in extremely overloaded scenarios.
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Poduri & Nichols             Informational                      [Page 4]
+
+RFC 2415                    TCP Window Size               September 1998
+
+
+Table 2. Link utilization and packet drop rates
+
+         Percentage Link Utilization            |      Packet drop rate
+#Webs   #FTPs   IW=1    IW=2    IW=3  IW=4      |IW=1  IW=2  IW=3  IW=4
+-----------------------------------------------------------------------
+  8     0        34     37      38      39      | 0.0   0.0  0.0   0.0
+  8     1        95     92      93      92      | 0.6   1.2  1.4   1.3
+  8     2        98     97      97      96      | 1.8   2.3  2.3   2.7
+  8     3        98     98      98      98      | 2.6   3.0  3.5   3.5
+-----------------------------------------------------------------------
+ 16     0        67     69      69      67      | 0.1   0.5  0.8   1.0
+ 16     1        96     95      93      92      | 2.1   2.6  2.9   2.9
+ 16     2        98     98      97      96      | 3.5   3.6  4.2   4.5
+ 16     3        99     99      98      98      | 4.5   4.7  5.2   4.9
+-----------------------------------------------------------------------
+ 32     0        92     87      85      84      | 0.1   0.5  0.8   1.0
+ 32     1        98     97      96      96      | 2.1   2.6  2.9   2.9
+ 32     2        99     99      98      98      | 3.5   3.6  4.2   4.5
+ 32     3       100     99      99      98      | 9.3   8.4  7.7   7.6
+
+   To get a more complete picture of performance, we computed the
+   network power, goodput divided by median delay (in Mbytes/ms), and
+   plotted it against IW for all scenarios. (Each scenario is uniquely
+   identified by its number of webs and number of file transfers.) We
+   plot these values in Figure 1 (in the pdf version), illustrating a
+   general advantage to increasing IW. When a large number of web
+   clients is combined with ftps, particularly multiple ftps,
+   pathological cases result from the extreme congestion. In these
+   cases, there appears to be no particular trend to the results of
+   increasing the IW, in fact simulation results are not particularly
+   stable.
+
+   To get a clearer picture of what is happening across all the tested
+   scenarios, we normalized the network power values for the non-
+   pathological scenario by the network power for that scenario at IW of
+   one. These results are plotted in Figure 2. As IW is increased from
+   one to four, network power increased by at least 15%, even in a
+   congested scenario dominated by bulk transfer traffic. In simulations
+   where web traffic has a dominant share of the available bandwidth,
+   the increase in network power was up to 60%.
+
+   The increase in network power at higher initial window sizes is due
+   to an increase in throughput and a decrease in the delay. Since the
+   (slightly) increased drop rates were accompanied by better
+   performance, drop rate is clearly not an indicator of user level
+   performance.
+
+
+
+
+
+Poduri & Nichols             Informational                      [Page 5]
+
+RFC 2415                    TCP Window Size               September 1998
+
+
+   The gains in performance seen by the web clients need to be balanced
+   against the performance the file transfers are seeing. We computed
+   ftp network power and show this in Table 3.  It appears that the
+   improvement in network power seen by the web connections has
+   negligible effect on the concurrent file transfers. It can be
+   observed from the table that there is a small variation in the
+   network power of file transfers with an increase in the size of IW
+   but no particular trend can be seen. It can be concluded that the
+   network power of file transfers essentially remained the same.
+   However, it should be noted that a larger IW does allow web transfers
+   to gain slightly more bandwidth than with a smaller IW. This could
+   mean fewer bytes transferred for FTP applications or a slight
+   decrease in network power as computed by us.
+
+   Table 3. Network power of file transfers with an increase in the TCP
+            IW size
+
+   #Webs   #FTPs   IW=1    IW=2    IW=3    IW=4
+   --------------------------------------------
+     8      1      4.7     4.2     4.2     4.2
+     8      2      3.0     2.8     3.0     2.8
+     8      3      2.2     2.2     2.2     2.2
+    16      1      2.3     2.4     2.4     2.5
+    16      2      1.8     2.0     1.8     1.9
+    16      3      1.4     1.6     1.5     1.7
+    32      1      0.7     0.9     1.3     0.9
+    32      2      0.8     1.0     1.3     1.1
+    32      3      0.7     1.0     1.2     1.0
+
+   The above simulations all used http1.0 style web connections, thus, a
+   natural question is to ask how results are affected by migration to
+   http1.1. A rough model of this behavior was simulated by using one
+   connection to send all of the information from both the primary URL
+   and the three embedded, or in-line, URLs. Since the transfer size is
+   now made up of four web files, the steep improvement in performance
+   between an IW of 1 and an IW of two, noted in the previous results,
+   has been smoothed. Results are shown in Tables 4 & 5 and Figs. 3 & 4.
+   Occasionally an increase in IW from 3 to 4 decreases the network
+   power owing to a non-increase or a slight decrease in the throughput.
+   TCP connections opening up with a higher window size into a very
+   congested network might experience some packet drops and consequently
+   a slight decrease in the throughput. This indicates that increase of
+   the initial window sizes to further higher values (>4) may not always
+   result in a favorable network performance. This can be seen clearly
+   in Figure 4 where the network power shows a decrease for the two
+   highly congested cases.
+
+
+
+
+
+Poduri & Nichols             Informational                      [Page 6]
+
+RFC 2415                    TCP Window Size               September 1998
+
+
+   Table 4. Median web page delay for http1.1
+
+   #Webs   #FTPs   IW=1    IW=2    IW=3    IW=4
+                   (s)       (% decrease)
+   ----------------------------------------------
+     8      0      0.47   14.9   19.1   21.3
+     8      1      0.84   17.9   19.0   25.0
+     8      2      0.99   11.5   17.3   23.0
+     8      3      1.04   12.1   20.2   28.3
+    16      0      0.54   07.4   14.8   20.4
+    16      1      0.89   14.6   21.3   27.0
+    16      2      1.02   14.7   19.6   25.5
+    16      3      1.11   09.0   17.0   18.9
+    32      0      0.94   16.0   29.8   36.2
+    32      1      1.23   12.2   28.5   21.1
+    32      2      1.39   06.5   13.7   12.2
+    32      3      1.46   04.0   11.0   15.0
+
+
+   Table 5. Network power of file transfers with an increase in the
+            TCP IW size
+
+   #Webs   #FTPs   IW=1    IW=2    IW=3    IW=4
+   --------------------------------------------
+     8      1      4.2     4.2     4.2     3.7
+     8      2      2.7     2.5     2.6     2.3
+     8      3      2.1     1.9     2.0     2.0
+    16      1      1.8     1.8     1.5     1.4
+    16      2      1.5     1.2     1.1     1.5
+    16      3      1.0     1.0     1.0     1.0
+    32      1      0.3     0.3     0.5     0.3
+    32      2      0.4     0.3     0.4     0.4
+    32      3      0.4     0.3     0.4     0.5
+
+   For further insight, we returned to the http1.0 model and mixed some
+   web-browsing connections with IWs of one with those using IWs of
+   three. In this experiment, we first simulated a total of 16 web-
+   browsing connections, all using IW of one. Then the clients were
+   split into two groups of 8 each, one of which uses IW=1 and the other
+   used IW=3.
+
+   We repeated the simulations for a total of 32 and 64 web-browsing
+   clients, splitting those into groups of 16 and 32 respectively. Table
+   6 shows these results.  We report the goodput (in Mbytes), the web
+   page delays (in milli seconds), the percent utilization of the link
+   and the percent of packets dropped.
+
+
+
+
+
+Poduri & Nichols             Informational                      [Page 7]
+
+RFC 2415                    TCP Window Size               September 1998
+
+
+Table 6. Results for half-and-half scenario
+
+Median Page Delays and Goodput (MB)   | Link Utilization (%) & Drops (%)
+#Webs     IW=1    |     IW=3          |       IW=1    |    IW=3
+      G.put   dly |  G.put   dly      |  L.util  Drops| L.util   Drops
+------------------|-------------------|---------------|---------------
+16      35.5  0.64|  36.4   0.54      |   67      0.1 |   69       0.7
+8/8     16.9  0.67|  18.9   0.52      |   68      0.5 |
+------------------|-------------------|---------------|---------------
+32      48.9  0.91|  44.7   0.68      |   92      3.5 |   85       4.3
+16/16   22.8  0.94|  22.9   0.71      |   89      4.6 |
+------------------|-------------------|---------------|----------------
+64      51.9  1.50|  47.6   0.86      |   98     13.0 |   91       8.6
+32/32   29.0  1.40|  22.0   1.20      |   98     12.0 |
+
+   Unsurprisingly, the non-split experiments are consistent with our
+   earlier results, clients with IW=3 outperform clients with IW=1. The
+   results of running the 8/8 and 16/16 splits show that running a
+   mixture of IW=3 and IW=1 has no negative effect on the IW=1
+   conversations, while IW=3 conversations maintain their performance.
+   However, the 32/32 split shows that web-browsing connections with
+   IW=3 are adversely affected. We believe this is due to the
+   pathological dynamics of this extremely congested scenario. Since
+   embedded URLs open their connections simultaneously, very large
+   number of TCP connections are arriving at the bottleneck link
+   resulting in multiple packet losses for the IW=3 conversations. The
+   myriad problems of this simultaneous opening strategy is, of course,
+   part of the motivation for the development of http1.1.
+
+4. Discussion
+
+   The indications from these results are that increasing the initial
+   window size to 3 packets (or 4380 bytes) helps to improve perceived
+   performance. Many further variations on these simulation scenarios
+   are possible and we've made our simulation models and scripts
+   available in order to facilitate others' experiments.
+
+   We also used the RED queue management included with ns-2 to perform
+   some other simulation studies. We have not reported on those results
+   here since we don't consider the studies complete. We found that by
+   adding RED to the bottleneck link, we achieved similar performance
+   gains (with an IW of 1) to those we found with increased IWs without
+   RED. Others may wish to investigate this further.
+
+   Although the simulation sets were run for a T1 link, several
+   scenarios with varying levels of congestion and varying number of web
+   and ftp clients were analyzed. It is reasonable to expect that the
+   results would scale for links with higher bandwidth. However,
+
+
+
+Poduri & Nichols             Informational                      [Page 8]
+
+RFC 2415                    TCP Window Size               September 1998
+
+
+   interested readers could investigate this aspect further.
+
+   We also used the RED queue management included with ns-2 to perform
+   some other simulation studies. We have not reported on those results
+   here since we don't consider the studies complete. We found that by
+   adding RED to the bottleneck link, we achieved similar performance
+   gains (with an IW of 1) to those we found with increased IWs without
+   RED. Others may wish to investigate this further.
+
+5. References
+
+   [1] B. Mah, "An Empirical Model of HTTP Network Traffic", Proceedings
+       of INFOCOM '97, Kobe, Japan, April 7-11, 1997.
+
+   [2] C.R. Cunha, A. Bestavros, M.E. Crovella, "Characteristics of WWW
+       Client-based Traces", Boston University Computer Science
+       Technical Report BU-CS-95-010, July 18, 1995.
+
+   [3] K.M. Nichols and M. Laubach, "Tiers of Service for Data Access in
+       a HFC Architecture", Proceedings of SCTE Convergence Conference,
+       January, 1997.
+
+   [4] K.M. Nichols, "Improving Network Simulation with Feedback",
+       available from knichols@baynetworks.com
+
+6. Acknowledgements
+
+   This work benefited from discussions with and comments from Van
+   Jacobson.
+
+7. Security Considerations
+
+   This document discusses a simulation study of the effects of a
+   proposed change to TCP. Consequently, there are no security
+   considerations directly related to the document. There are also no
+   known security considerations associated with the proposed change.
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Poduri & Nichols             Informational                      [Page 9]
+
+RFC 2415                    TCP Window Size               September 1998
+
+
+8. Authors' Addresses
+
+   Kedarnath Poduri
+   Bay Networks
+   4401 Great America Parkway
+   SC01-04
+   Santa Clara, CA 95052-8185
+
+   Phone: +1-408-495-2463
+   Fax:   +1-408-495-1299
+   EMail: kpoduri@Baynetworks.com
+
+
+   Kathleen Nichols
+   Bay Networks
+   4401 Great America Parkway
+   SC01-04
+   Santa Clara, CA 95052-8185
+
+   EMail: knichols@baynetworks.com
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Poduri & Nichols             Informational                     [Page 10]
+
+RFC 2415                    TCP Window Size               September 1998
+
+
+Full Copyright Statement
+
+   Copyright (C) The Internet Society (1998).  All Rights Reserved.
+
+   This document and translations of it may be copied and furnished to
+   others, and derivative works that comment on or otherwise explain it
+   or assist in its implementation may be prepared, copied, published
+   and distributed, in whole or in part, without restriction of any
+   kind, provided that the above copyright notice and this paragraph are
+   included on all such copies and derivative works.  However, this
+   document itself may not be modified in any way, such as by removing
+   the copyright notice or references to the Internet Society or other
+   Internet organizations, except as needed for the purpose of
+   developing Internet standards in which case the procedures for
+   copyrights defined in the Internet Standards process must be
+   followed, or as required to translate it into languages other than
+   English.
+
+   The limited permissions granted above are perpetual and will not be
+   revoked by the Internet Society or its successors or assigns.
+
+   This document and the information contained herein is provided on an
+   "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
+   TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
+   BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
+   HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
+   MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Poduri & Nichols             Informational                     [Page 11]
+
--- a/kernel/picotcp/RFC/rfc2416.txt
+++ b/kernel/picotcp/RFC/rfc2416.txt
@ -0,0 +1,395 @@
+
+
+
+
+
+
+Network Working Group                                         T. Shepard
+Request for Comments: 2416                                  C. Partridge
+Category: Informational                                 BBN Technologies
+                                                          September 1998
+
+
+      When TCP Starts Up With Four Packets Into Only Three Buffers
+
+Status of this Memo
+
+   This memo provides information for the Internet community.  It does
+   not specify an Internet standard of any kind.  Distribution of this
+   memo is unlimited.
+
+Copyright Notice
+
+   Copyright (C) The Internet Society (1998).  All Rights Reserved.
+
+Abstract
+
+   This memo is to document a simple experiment.  The experiment showed
+   that in the case of a TCP receiver behind a 9600 bps modem link at
+   the edge of a fast Internet where there are only 3 buffers before the
+   modem (and the fourth packet of a four-packet start will surely be
+   dropped), no significant degradation in performance is experienced by
+   a TCP sending with a four-packet start when compared with a normal
+   slow start (which starts with just one packet).
+
+Background
+
+   Sally Floyd has proposed that TCPs start their initial slow start by
+   sending as many as four packets (instead of the usual one packet) as
+   a means of getting TCP up-to-speed faster.  (Slow starts instigated
+   due to timeouts would still start with just one packet.)  Starting
+   with more than one packet might reduce the start-up latency over
+   long-fat pipes by two round-trip times.  This proposal is documented
+   further in [1], [2], and in [3] and we assume the reader is familiar
+   with the details of this proposal.
+
+   On the end2end-interest mailing list, concern was raised that in the
+   (allegedly common) case where a slow modem is served by a router
+   which only allocates three buffers per modem (one buffer being
+   transmitted while two packets are waiting), that starting with four
+   packets would not be good because the fourth packet is sure to be
+   dropped.
+
+
+
+
+
+
+Shepard & Partridge          Informational                      [Page 1]
+
+RFC 2416        TCP with Four Packets into Three Buffers  September 1998
+
+
+   Vern Paxson replied with the comment (among other things) that the
+   four-packet start is no worse than what happens after two round trip
+   times in normal slow start, hence no new problem is introduced by
+   starting with as many as four packets.  If there is a problem with a
+   four-packet start, then the problem already exists in a normal slow-
+   start startup after two round trip times when the slow-start
+   algorithm will release into the net four closely spaced packets.
+
+   The experiment reported here confirmed Vern Paxson's reasoning.
+
+Scenario and experimental setup
+
+
+--------+  100 Mbps  +---+  1.5 Mbps   +---+  9600 bps    +----------+
+| source +------------+ R +-------------+ R +--------------+ receiver |
+--------+  no delay  +---+ 25 ms delay +---+ 150 ms delay +----------+
+
+              |                             |
+              |                             |
+          (we spy here)              (this router has only 3 buffers
+                                      to hold packets going into the
+                                      9600 bps link)
+
+   The scenario studied and simulated consists of three links between
+   the source and sink.  The first link is a 100 Mbps link with no
+   delay.  It connects the sender to a router.  (It was included to have
+   a means of logging the returning ACKs at the time they would be seen
+   by the sender.)  The second link is a 1.5 Mbps link with a 25 ms
+   one-way delay.  (This link was included to roughly model traversing
+   an un-congested, intra-continental piece of the terrestrial
+   Internet.) The third link is a 9600 bps link with a 150 ms one-way
+   delay.  It connects the edge of the net to a receiver which is behind
+   the 9600 bps link.
+
+   The queue limits for the queues at each end of the first two links
+   were set to 100 (a value sufficiently large that this limit was never
+   a factor).  The queue limits at each end of the 9600 bps link were
+   set to 3 packets (which can hold at most two packets while one is
+   being sent).
+
+   Version 1.2a2 of the the NS simulator (available from LBL) was used
+   to simulate both one-packet and four-packet starts for each of the
+   available TCP algorithms (tahoe, reno, sack, fack) and the conclusion
+   reported here is independent of which TCP algorithm is used (in
+   general, we believe).  In this memo, the "tahoe" module will be used
+   to illustrate what happens.  In the 4-packet start cases, the
+   "window-init" variable was set to 4, and the TCP implementations were
+   modified to use the value of the window-init variable only on
+
+
+
+Shepard & Partridge          Informational                      [Page 2]
+
+RFC 2416        TCP with Four Packets into Three Buffers  September 1998
+
+
+   connection start, but to set cwnd to 1 on other instances of a slow-
+   start. (The tcp.cc module as shipped with ns-1.2a2 would use the
+   window-init value in all cases.)
+
+   The packets in simulation are 1024 bytes long for purposes of
+   determining the time it takes to transmit them through the links.
+   (The TCP modules included with the LBL NS simulator do not simulate
+   the TCP sequence number mechanisms.  They use just packet numbers.)
+
+   Observations are made of all packets and acknowledgements crossing
+   the 100 Mbps no-delay link, near the sender.  (All descriptions below
+   are from this point of view.)
+
+What happens with normal slow start
+
+   At time 0.0 packet number 1 is sent.
+
+   At time 1.222 an ack is received covering packet number 1, and
+   packets 2 and 3 are sent.
+
+   At time 2.444 an ack is received covering packet number 2, and
+   packets 4 and 5 are sent.
+
+   At time 3.278 an ack is received covering packet number 3, and
+   packets 6 and 7 are sent.
+
+   At time 4.111 an ack is received covering packet number 4, and
+   packets 8 and 9 are sent.
+
+   At time 4.944 an ack is received covering packet number 5, and
+   packets 10 and 11 are sent.
+
+   At time 5.778 an ack is received covering packet number 6, and
+   packets 12 and 13 are sent.
+
+   At time 6.111 a duplicate ack is recieved (covering packet number 6).
+
+   At time 7.444 another duplicate ack is received (covering packet
+   number 6).
+
+   At time 8.278 a third duplicate ack is received (covering packet
+   number 6) and packet number 7 is retransmitted.
+
+   (And the trace continues...)
+
+What happens with a four-packet start
+
+   At time 0.0, packets 1, 2, 3, and 4 are sent.
+
+
+
+Shepard & Partridge          Informational                      [Page 3]
+
+RFC 2416        TCP with Four Packets into Three Buffers  September 1998
+
+
+   At time 1.222 an ack is received covering packet number 1, and
+   packets 5 and 6 are sent.
+
+   At time 2.055 an ack is received covering packet number 2, and
+   packets 7 and 8 are sent.
+
+   At time 2.889 an ack is received covering packet number 3, and
+   packets 9 and 10 are sent.
+
+   At time 3.722 a duplicate ack is received (covering packet number 3).
+
+   At time 4.555 another duplicate ack is received (covering packet
+   number 3).
+
+   At time 5.389 a third duplicate ack is received (covering packet
+   number 3) and packet number 4 is retransmitted.
+
+   (And the trace continues...)
+
+Discussion
+
+   At the point left off in the two traces above, the two different
+   systems are in almost identical states.  The two traces from that
+   point on are almost the same, modulo a shift in time of (8.278 -
+   5.389) = 2.889 seconds and a shift of three packets.  If the normal
+   TCP (with the one-packet start) will deliver packet N at time T, then
+   the TCP with the four-packet start will deliver packet N - 3 at time
+   T - 2.889 (seconds).
+
+   Note that the time to send three 1024-byte TCP segments through a
+   9600 bps modem is 2.66 seconds.  So at what time does the four-
+   packet-start TCP deliver packet N?  At time T - 2.889 + 2.66 = T -
+   0.229 in most cases, and in some cases earlier, in some cases later,
+   because different packets (by number) experience loss in the two
+   traces.
+
+   Thus the four-packet-start TCP is in some sense 0.229 seconds (or
+   about one fifth of a packet) ahead of where the one-packet-start TCP
+   would be.  (This is due to the extra time the modem sits idle while
+   waiting for the dally timer to go off in the receiver in the case of
+   the one-packet-start TCP.)
+
+   The states of the two systems are not exactly identical.  They differ
+   slightly in the round-trip-time estimators because the behavior at
+   the start is not identical. (The observed round trip times may differ
+   by a small amount due to dally timers and due to that the one-packet
+   start experiences more round trip times before the first loss.)  In
+   the cases where a retransmit timer did later go off, the additional
+
+
+
+Shepard & Partridge          Informational                      [Page 4]
+
+RFC 2416        TCP with Four Packets into Three Buffers  September 1998
+
+
+   difference in timing was much smaller than the 0.229 second
+   difference discribed above.
+
+Conclusion
+
+   In this particular case, the four-packet start is not harmful.
+
+Non-conclusions, opinions, and future work
+
+   A four-packet start would be very helpful in situations where a
+   long-delay link is involved (as it would reduce transfer times for
+   moderately-sized transfers by as much as two round-trip times).  But
+   it remains (in the authors' opinions at this time) an open question
+   whether or not the four-packet start would be safe for the network.
+
+   It would be nice to see if this result could be duplicated with real
+   TCPs, real modems, and real three-buffer limits.
+
+Security Considerations
+
+   This document discusses a simulation study of the effects of a
+   proposed change to TCP.  Consequently, there are no security
+   considerations directly related to the document.  There are also no
+   known security considerations associated with the proposed change.
+
+References
+
+   1.   S. Floyd, Increasing TCP's Initial Window (January 29, 1997).
+        URL ftp://ftp.ee.lbl.gov/papers/draft.jan29.
+
+   2.   S. Floyd and M. Allman, Increasing TCP's Initial Window (July,
+        1997). URL http://gigahertz.lerc.nasa.gov/~mallman/share/draft-
+        ss.txt
+
+   3.   Allman, M., Floyd, S., and C. Partridge, "Increasing TCP's
+        Initial Window", RFC 2414, September 1998.
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Shepard & Partridge          Informational                      [Page 5]
+
+RFC 2416        TCP with Four Packets into Three Buffers  September 1998
+
+
+Authors' Addresses
+
+   Tim Shepard
+   BBN Technologies
+   10 Moulton Street
+   Cambridge, MA 02138
+
+   EMail: shep@alum.mit.edu
+
+
+   Craig Partridge
+   BBN Technologies
+   10 Moulton Street
+   Cambridge, MA 02138
+
+   EMail: craig@bbn.com
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Shepard & Partridge          Informational                      [Page 6]
+
+RFC 2416        TCP with Four Packets into Three Buffers  September 1998
+
+
+Full Copyright Statement
+
+   Copyright (C) The Internet Society (1998).  All Rights Reserved.
+
+   This document and translations of it may be copied and furnished to
+   others, and derivative works that comment on or otherwise explain it
+   or assist in its implementation may be prepared, copied, published
+   and distributed, in whole or in part, without restriction of any
+   kind, provided that the above copyright notice and this paragraph are
+   included on all such copies and derivative works.  However, this
+   document itself may not be modified in any way, such as by removing
+   the copyright notice or references to the Internet Society or other
+   Internet organizations, except as needed for the purpose of
+   developing Internet standards in which case the procedures for
+   copyrights defined in the Internet Standards process must be
+   followed, or as required to translate it into languages other than
+   English.
+
+   The limited permissions granted above are perpetual and will not be
+   revoked by the Internet Society or its successors or assigns.
+
+   This document and the information contained herein is provided on an
+   "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
+   TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
+   BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
+   HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
+   MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Shepard & Partridge          Informational                      [Page 7]
+
--- a/kernel/picotcp/RFC/rfc2452.txt
+++ b/kernel/picotcp/RFC/rfc2452.txt
@ -0,0 +1,563 @@
+
+
+
+
+
+
+Network Working Group                                      M. Daniele
+Request for Comments: 2452                Compaq Computer Corporation
+Category: Standards Track                               December 1998
+
+
+               IP Version 6 Management Information Base
+                 for the Transmission Control Protocol
+
+Status of this Memo
+
+   This document specifies an Internet standards track protocol for the
+   Internet community, and requests discussion and suggestions for
+   improvements.  Please refer to the current edition of the "Internet
+   Official Protocol Standards" (STD 1) for the standardization state
+   and status of this protocol.  Distribution of this memo is unlimited.
+
+Copyright Notice
+
+   Copyright (C) The Internet Society (1998).  All Rights Reserved.
+
+Abstract
+
+   This document is one in the series of documents that define various
+   MIB objects for IPv6.  Specifically, this document is the MIB module
+   which defines managed objects for implementations of the Transmission
+   Control Protocol (TCP) over IP Version 6 (IPv6).
+
+   This document also recommends a specific policy with respect to the
+   applicability of RFC 2012 for implementations of IPv6.  Namely, that
+   most of managed objects defined in RFC 2012 are independent of which
+   IP versions underlie TCP, and only the TCP connection information is
+   IP version-specific.
+
+   This memo defines an experimental portion of the Management
+   Information Base (MIB) for use with network management protocols in
+   IPv6-based internets.
+
+1.  Introduction
+
+   A management system contains: several (potentially many) nodes, each
+   with a processing entity, termed an agent, which has access to
+   management instrumentation; at least one management station; and, a
+   management protocol, used to convey management information between
+   the agents and management stations.  Operations of the protocol are
+   carried out under an administrative framework which defines
+   authentication, authorization, access control, and privacy policies.
+
+
+
+
+
+Daniele                     Standards Track                     [Page 1]
+
+RFC 2452                    TCP MIB for IPv6               December 1998
+
+
+   Management stations execute management applications which monitor and
+   control managed elements.  Managed elements are devices such as
+   hosts, routers, terminal servers, etc., which are monitored and
+   controlled via access to their management information.
+
+   Management information is viewed as a collection of managed objects,
+   residing in a virtual information store, termed the Management
+   Information Base (MIB).  Collections of related objects are defined
+   in MIB modules.  These modules are written using a subset of OSI's
+   Abstract Syntax Notation One (ASN.1) [1], termed the Structure of
+   Management Information (SMI) [2].
+
+2.  Overview
+
+   This document is one in the series of documents that define various
+   MIB objects, and statements of conformance, for IPv6.  This document
+   defines the required instrumentation for implementations of TCP over
+   IPv6.
+
+3.  Transparency of IP versions to TCP
+
+   The fact that a particular TCP connection uses IPv6 as opposed to
+   IPv4, is largely invisible to a TCP implementation.  A "TCPng" did
+   not need to be defined, implementations simply need to support IPv6
+   addresses.
+
+   As such, the managed objects already defined in [TCP MIB] are
+   sufficient for managing TCP in the presence of IPv6.  These objects
+   are equally applicable whether the managed node supports IPv4 only,
+   IPv6 only, or both IPv4 and IPv6.
+
+   For example, tcpActiveOpens counts "The number of times TCP
+   connections have made a direct transition to the SYN-SENT state from
+   the CLOSED state", regardless of which version of IP is used between
+   the connection endpoints.
+
+   Stated differently, TCP implementations don't need separate counters
+   for IPv4 and for IPv6.
+
+4.  Representing TCP Connections
+
+   The exception to the statements in section 3 is the tcpConnTable.
+   Since IPv6 addresses cannot be represented with the IpAddress syntax,
+   not all TCP connections can be represented in the tcpConnTable
+   defined in [TCP MIB].
+
+
+
+
+
+
+Daniele                     Standards Track                     [Page 2]
+
+RFC 2452                    TCP MIB for IPv6               December 1998
+
+
+   This memo defines a new, separate table to represent only those TCP
+   connections between IPv6 endpoints.  TCP connections between IPv4
+   endpoints continue to be represented in tcpConnTable [TCP MIB].  (It
+   is not possible to establish a TCP connection between an IPv4
+    endpoint and an IPv6 endpoint.)
+
+   A different approach would have been to define a new table to
+   represent all TCP connections regardless of IP version.  This would
+   require changes to [TCP MIB] and hence to existing (IPv4-only) TCP
+   implementations.  The approach suggested in this memo has the
+   advantage of leaving IPv4-only implementations intact.
+
+   It is assumed that the objects defined in this memo will eventually
+   be defined in an update to [TCP MIB].  For this reason, the module
+   identity is assigned under the experimental portion of the MIB.
+
+5.  Conformance
+
+   This memo contains conformance statements to define conformance to
+   this MIB for TCP over IPv6 implementations.
+
+6.  Definitions
+
+IPV6-TCP-MIB DEFINITIONS ::= BEGIN
+
+IMPORTS
+   MODULE-COMPLIANCE, OBJECT-GROUP      FROM SNMPv2-CONF
+   MODULE-IDENTITY, OBJECT-TYPE,
+   mib-2, experimental                  FROM SNMPv2-SMI
+   Ipv6Address, Ipv6IfIndexOrZero       FROM IPV6-TC;
+
+ipv6TcpMIB MODULE-IDENTITY
+   LAST-UPDATED "9801290000Z"
+   ORGANIZATION "IETF IPv6 MIB Working Group"
+   CONTACT-INFO
+        "       Mike Daniele
+
+                Postal: Compaq Computer Corporation
+                        110 Spitbrook Rd
+                        Nashua, NH 03062.
+                        US
+
+                Phone:  +1 603 884 1423
+                Email:  daniele@zk3.dec.com"
+   DESCRIPTION
+        "The MIB module for entities implementing TCP over IPv6."
+   ::= { experimental 86 }
+
+
+
+
+Daniele                     Standards Track                     [Page 3]
+
+RFC 2452                    TCP MIB for IPv6               December 1998
+
+
+-- objects specific to TCP for IPv6
+
+tcp      OBJECT IDENTIFIER ::= { mib-2 6 }
+
+-- the TCP over IPv6 Connection table
+
+-- This connection table contains information about this
+-- entity's existing TCP connections between IPv6 endpoints.
+-- Only connections between IPv6 addresses are contained in
+-- this table.  This entity's connections between IPv4
+-- endpoints are contained in tcpConnTable.
+
+ipv6TcpConnTable OBJECT-TYPE
+   SYNTAX      SEQUENCE OF Ipv6TcpConnEntry
+   MAX-ACCESS  not-accessible
+   STATUS      current
+   DESCRIPTION
+        "A table containing TCP connection-specific information,
+         for only those connections whose endpoints are IPv6 addresses."
+   ::= { tcp 16 }
+
+ipv6TcpConnEntry OBJECT-TYPE
+   SYNTAX      Ipv6TcpConnEntry
+   MAX-ACCESS  not-accessible
+   STATUS      current
+   DESCRIPTION
+        "A conceptual row of the ipv6TcpConnTable containing
+         information about a particular current TCP connection.
+         Each row of this table is transient, in that it ceases to
+         exist when (or soon after) the connection makes the transition
+         to the CLOSED state.
+
+         Note that conceptual rows in this table require an additional
+         index object compared to tcpConnTable, since IPv6 addresses
+         are not guaranteed to be unique on the managed node."
+   INDEX   { ipv6TcpConnLocalAddress,
+             ipv6TcpConnLocalPort,
+             ipv6TcpConnRemAddress,
+             ipv6TcpConnRemPort,
+             ipv6TcpConnIfIndex }
+   ::= { ipv6TcpConnTable 1 }
+
+Ipv6TcpConnEntry ::=
+   SEQUENCE { ipv6TcpConnLocalAddress    Ipv6Address,
+              ipv6TcpConnLocalPort       INTEGER (0..65535),
+              ipv6TcpConnRemAddress      Ipv6Address,
+              ipv6TcpConnRemPort         INTEGER (0..65535),
+              ipv6TcpConnIfIndex         Ipv6IfIndexOrZero,
+
+
+
+Daniele                     Standards Track                     [Page 4]
+
+RFC 2452                    TCP MIB for IPv6               December 1998
+
+
+              ipv6TcpConnState           INTEGER }
+
+ipv6TcpConnLocalAddress OBJECT-TYPE
+   SYNTAX     Ipv6Address
+   MAX-ACCESS not-accessible
+   STATUS     current
+   DESCRIPTION
+        "The local IPv6 address for this TCP connection. In
+         the case of a connection in the listen state which
+         is willing to accept connections for any IPv6
+         address associated with the managed node, the value
+         ::0 is used."
+   ::= { ipv6TcpConnEntry 1 }
+
+ipv6TcpConnLocalPort OBJECT-TYPE
+   SYNTAX     INTEGER (0..65535)
+   MAX-ACCESS not-accessible
+   STATUS     current
+   DESCRIPTION
+        "The local port number for this TCP connection."
+   ::= { ipv6TcpConnEntry 2 }
+
+ipv6TcpConnRemAddress OBJECT-TYPE
+   SYNTAX     Ipv6Address
+   MAX-ACCESS not-accessible
+   STATUS     current
+   DESCRIPTION
+        "The remote IPv6 address for this TCP connection."
+   ::= { ipv6TcpConnEntry 3 }
+
+ipv6TcpConnRemPort OBJECT-TYPE
+   SYNTAX     INTEGER (0..65535)
+   MAX-ACCESS not-accessible
+   STATUS     current
+   DESCRIPTION
+        "The remote port number for this TCP connection."
+   ::= { ipv6TcpConnEntry 4 }
+
+ipv6TcpConnIfIndex OBJECT-TYPE
+   SYNTAX     Ipv6IfIndexOrZero
+   MAX-ACCESS not-accessible
+   STATUS     current
+   DESCRIPTION
+        "An index object used to disambiguate conceptual rows in
+         the table, since the connection 4-tuple may not be unique.
+
+         If the connection's remote address (ipv6TcpConnRemAddress)
+         is a link-local address and the connection's local address
+
+
+
+Daniele                     Standards Track                     [Page 5]
+
+RFC 2452                    TCP MIB for IPv6               December 1998
+
+
+         (ipv6TcpConnLocalAddress) is not a link-local address, this
+         object identifies a local interface on the same link as
+         the connection's remote link-local address.
+
+         Otherwise, this object identifies the local interface that
+         is associated with the ipv6TcpConnLocalAddress for this
+         TCP connection.  If such a local interface cannot be determined,
+         this object should take on the value 0.  (A possible example
+         of this would be if the value of ipv6TcpConnLocalAddress is ::0.)
+
+         The interface identified by a particular non-0 value of this
+         index is the same interface as identified by the same value
+         of ipv6IfIndex.
+
+         The value of this object must remain constant during the life
+         of the TCP connection."
+   ::= { ipv6TcpConnEntry 5 }
+
+ipv6TcpConnState OBJECT-TYPE
+   SYNTAX     INTEGER {
+        closed(1),
+        listen(2),
+        synSent(3),
+        synReceived(4),
+        established(5),
+        finWait1(6),
+        finWait2(7),
+        closeWait(8),
+        lastAck(9),
+        closing(10),
+        timeWait(11),
+        deleteTCB(12) }
+   MAX-ACCESS read-write
+   STATUS     current
+   DESCRIPTION
+        "The state of this TCP connection.
+
+         The only value which may be set by a management station is
+         deleteTCB(12).  Accordingly, it is appropriate for an agent
+         to return an error response (`badValue' for SNMPv1, 'wrongValue'
+         for SNMPv2) if a management station attempts to set this
+         object to any other value.
+
+         If a management station sets this object to the value
+         deleteTCB(12), then this has the effect of deleting the TCB
+         (as defined in RFC 793) of the corresponding connection on
+         the managed node, resulting in immediate termination of the
+         connection.
+
+
+
+Daniele                     Standards Track                     [Page 6]
+
+RFC 2452                    TCP MIB for IPv6               December 1998
+
+
+         As an implementation-specific option, a RST segment may be
+         sent from the managed node to the other TCP endpoint (note
+         however that RST segments are not sent reliably)."
+   ::= { ipv6TcpConnEntry 6 }
+
+--
+-- conformance information
+--
+
+ipv6TcpConformance OBJECT IDENTIFIER ::= { ipv6TcpMIB 2 }
+
+ipv6TcpCompliances OBJECT IDENTIFIER ::= { ipv6TcpConformance 1 }
+ipv6TcpGroups      OBJECT IDENTIFIER ::= { ipv6TcpConformance 2 }
+
+-- compliance statements
+
+ipv6TcpCompliance MODULE-COMPLIANCE
+   STATUS  current
+   DESCRIPTION
+        "The compliance statement for SNMPv2 entities which
+         implement TCP over IPv6."
+   MODULE  -- this module
+   MANDATORY-GROUPS { ipv6TcpGroup }
+   ::= { ipv6TcpCompliances 1 }
+
+ipv6TcpGroup OBJECT-GROUP
+   OBJECTS   { -- these are defined in this module
+               -- ipv6TcpConnLocalAddress (not-accessible)
+               -- ipv6TcpConnLocalPort (not-accessible)
+               -- ipv6TcpConnRemAddress (not-accessible)
+               -- ipv6TcpConnRemPort (not-accessible)
+               -- ipv6TcpConnIfIndex (not-accessible)
+               ipv6TcpConnState }
+   STATUS    current
+   DESCRIPTION
+        "The group of objects providing management of
+         TCP over IPv6."
+   ::= { ipv6TcpGroups 1 }
+
+END
+
+
+
+
+
+
+
+
+
+
+
+Daniele                     Standards Track                     [Page 7]
+
+RFC 2452                    TCP MIB for IPv6               December 1998
+
+
+7.  Acknowledgments
+
+   This memo is a product of the IPng work group, and benefited
+   especially from the contributions of the following working group
+   members:
+
+      Dimitry Haskin          Bay Networks
+      Margaret Forsythe       Epilogue
+      Tim Hartrick            Mentat
+      Frank Solensky          FTP
+      Jack McCann             DEC
+
+8.  References
+
+   [1]           Information processing systems - Open Systems
+                 Interconnection - Specification of Abstract Syntax
+                 Notation One (ASN.1), International Organization for
+                 Standardization.  International Standard 8824,
+                 (December, 1987).
+
+   [2]           McCloghrie, K., Editor, "Structure of Management
+                 Information for version 2 of the Simple Network
+                 Management Protocol (SNMPv2)", RFC 1902, January 1996.
+
+   [TCP MIB]     SNMPv2 Working Group, McCloghrie, K., Editor, "SNMPv2
+                 Management Information Base for the Transmission
+                 Control Protocol using SMIv2", RFC 2012, November 1996.
+
+   [IPV6 MIB TC] Haskin, D., and S. Onishi, "Management Information
+                 Base for IP Version 6: Textual Conventions and General
+                 Group", RFC 2465, December 1998.
+
+   [IPV6]        Deering, S., and R. Hinden, "Internet Protocol, Version
+                 6 (IPv6) Specification", RFC 2460, December 1998.
+
+   [RFC2274]     Blumenthal, U., and B. Wijnen, "The User-Based Security
+                 Model for Version 3 of the Simple Network Management
+                 Protocol (SNMPv3)", RFC 2274, January 1998.
+
+   [RFC2275]     Wijnen, B., Presuhn, R., and K. McCloghrie, "View-based
+                 Access Control Model for the Simple Network Management
+                 Protocol (SNMP)", RFC 2275, January 1998.
+
+9.  Security Considerations
+
+   This MIB contains a management object that has a MAX-ACCESS clause of
+   read-write and/or read-create.  In particular, it is possible to
+   delete individual TCP control blocks (i.e., connections).
+
+
+
+Daniele                     Standards Track                     [Page 8]
+
+RFC 2452                    TCP MIB for IPv6               December 1998
+
+
+   Consequently, anyone having the ability to issue a SET on this object
+   can impact the operation of the node.
+
+   There are a number of managed objects in this MIB that may be
+   considered to contain sensitive information in some environments.
+   For example, the MIB identifies the active TCP connections on the
+   node.  Although this information might be considered sensitive in
+   some environments (i.e., to identify ports on which to launch
+   denial-of-service or other attacks), there are already other ways of
+   obtaining similar information.  For example, sending a random TCP
+   packet to an unused port prompts the generation of a TCP reset
+   message.
+
+   Therefore, it may be important in some environments to control read
+   and/or write access to these objects and possibly to even encrypt the
+   values of these object when sending them over the network via SNMP.
+   Not all versions of SNMP provide features for such a secure
+   environment.  SNMPv1 by itself does not provide encryption or strong
+   authentication.
+
+   It is recommended that the implementors consider the security
+   features as provided by the SNMPv3 framework.  Specifically, the use
+   of the User-based Security Model [RFC2274] and the View-based Access
+   Control Model [RFC2275] is recommended.
+
+   It is then a customer/user responsibility to ensure that the SNMP
+   entity giving access to an instance of this MIB, is properly
+   configured to give access to those objects only to those principals
+   (users) that have legitimate rights to access them.
+
+10. Author's Address
+
+   Mike Daniele
+   Compaq Computer Corporation
+   110 Spit Brook Rd
+   Nashua, NH 03062
+
+   Phone: +1-603-884-1423
+   EMail: daniele@zk3.dec.com
+
+
+
+
+
+
+
+
+
+
+
+
+Daniele                     Standards Track                     [Page 9]
+
+RFC 2452                    TCP MIB for IPv6               December 1998
+
+
+11.  Full Copyright Statement
+
+   Copyright (C) The Internet Society (1998).  All Rights Reserved.
+
+   This document and translations of it may be copied and furnished to
+   others, and derivative works that comment on or otherwise explain it
+   or assist in its implementation may be prepared, copied, published
+   and distributed, in whole or in part, without restriction of any
+   kind, provided that the above copyright notice and this paragraph are
+   included on all such copies and derivative works.  However, this
+   document itself may not be modified in any way, such as by removing
+   the copyright notice or references to the Internet Society or other
+   Internet organizations, except as needed for the purpose of
+   developing Internet standards in which case the procedures for
+   copyrights defined in the Internet Standards process must be
+   followed, or as required to translate it into languages other than
+   English.
+
+   The limited permissions granted above are perpetual and will not be
+   revoked by the Internet Society or its successors or assigns.
+
+   This document and the information contained herein is provided on an
+   "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
+   TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
+   BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
+   HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
+   MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Daniele                     Standards Track                    [Page 10]
+
--- a/kernel/picotcp/RFC/rfc2460.txt
+++ b/kernel/picotcp/RFC/rfc2460.txt
--- a/kernel/picotcp/RFC/rfc2474.txt
+++ b/kernel/picotcp/RFC/rfc2474.txt
--- a/kernel/picotcp/RFC/rfc2488.txt
+++ b/kernel/picotcp/RFC/rfc2488.txt
--- a/kernel/picotcp/RFC/rfc2525.txt
+++ b/kernel/picotcp/RFC/rfc2525.txt
--- a/kernel/picotcp/RFC/rfc2581.txt
+++ b/kernel/picotcp/RFC/rfc2581.txt
@ -0,0 +1,787 @@
+
+
+
+
+
+
+Network Working Group                                          M. Allman
+Request for Comments: 2581                  NASA Glenn/Sterling Software
+Obsoletes: 2001                                                V. Paxson
+Category: Standards Track                                   ACIRI / ICSI
+                                                              W. Stevens
+                                                              Consultant
+                                                              April 1999
+
+
+                         TCP Congestion Control
+
+Status of this Memo
+
+   This document specifies an Internet standards track protocol for the
+   Internet community, and requests discussion and suggestions for
+   improvements.  Please refer to the current edition of the "Internet
+   Official Protocol Standards" (STD 1) for the standardization state
+   and status of this protocol.  Distribution of this memo is unlimited.
+
+Copyright Notice
+
+   Copyright (C) The Internet Society (1999).  All Rights Reserved.
+
+Abstract
+
+   This document defines TCP's four intertwined congestion control
+   algorithms: slow start, congestion avoidance, fast retransmit, and
+   fast recovery.  In addition, the document specifies how TCP should
+   begin transmission after a relatively long idle period, as well as
+   discussing various acknowledgment generation methods.
+
+1. Introduction
+
+   This document specifies four TCP [Pos81] congestion control
+   algorithms: slow start, congestion avoidance, fast retransmit and
+   fast recovery.  These algorithms were devised in [Jac88] and [Jac90].
+   Their use with TCP is standardized in [Bra89].
+
+   This document is an update of [Ste97].  In addition to specifying the
+   congestion control algorithms, this document specifies what TCP
+   connections should do after a relatively long idle period, as well as
+   specifying and clarifying some of the issues pertaining to TCP ACK
+   generation.
+
+   Note that [Ste94] provides examples of these algorithms in action and
+   [WS95] provides an explanation of the source code for the BSD
+   implementation of these algorithms.
+
+
+
+
+Allman, et. al.             Standards Track                     [Page 1]
+
+RFC 2581                 TCP Congestion Control               April 1999
+
+
+   This document is organized as follows.  Section 2 provides various
+   definitions which will be used throughout the document.  Section 3
+   provides a specification of the congestion control algorithms.
+   Section 4 outlines concerns related to the congestion control
+   algorithms and finally, section 5 outlines security considerations.
+
+   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
+   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
+   document are to be interpreted as described in [Bra97].
+
+2. Definitions
+
+   This section provides the definition of several terms that will be
+   used throughout the remainder of this document.
+
+   SEGMENT:
+      A segment is ANY TCP/IP data or acknowledgment packet (or both).
+
+   SENDER MAXIMUM SEGMENT SIZE (SMSS):  The SMSS is the size of the
+      largest segment that the sender can transmit.  This value can be
+      based on the maximum transmission unit of the network, the path
+      MTU discovery [MD90] algorithm, RMSS (see next item), or other
+      factors.  The size does not include the TCP/IP headers and
+      options.
+
+   RECEIVER MAXIMUM SEGMENT SIZE (RMSS):  The RMSS is the size of the
+      largest segment the receiver is willing to accept.  This is the
+      value specified in the MSS option sent by the receiver during
+      connection startup.  Or, if the MSS option is not used, 536 bytes
+      [Bra89].  The size does not include the TCP/IP headers and
+      options.
+
+   FULL-SIZED SEGMENT: A segment that contains the maximum number of
+      data bytes permitted (i.e., a segment containing SMSS bytes of
+      data).
+
+   RECEIVER WINDOW (rwnd) The most recently advertised receiver window.
+
+   CONGESTION WINDOW (cwnd):  A TCP state variable that limits the
+      amount of data a TCP can send.  At any given time, a TCP MUST NOT
+      send data with a sequence number higher than the sum of the
+      highest acknowledged sequence number and the minimum of cwnd and
+      rwnd.
+
+   INITIAL WINDOW (IW):  The initial window is the size of the sender's
+      congestion window after the three-way handshake is completed.
+
+
+
+
+
+Allman, et. al.             Standards Track                     [Page 2]
+
+RFC 2581                 TCP Congestion Control               April 1999
+
+
+   LOSS WINDOW (LW):  The loss window is the size of the congestion
+      window after a TCP sender detects loss using its retransmission
+      timer.
+
+   RESTART WINDOW (RW):  The restart window is the size of the
+      congestion window after a TCP restarts transmission after an idle
+      period (if the slow start algorithm is used; see section 4.1 for
+      more discussion).
+
+   FLIGHT SIZE:  The amount of data that has been sent but not yet
+      acknowledged.
+
+3. Congestion Control Algorithms
+
+   This section defines the four congestion control algorithms: slow
+   start, congestion avoidance, fast retransmit and fast recovery,
+   developed in [Jac88] and [Jac90].  In some situations it may be
+   beneficial for a TCP sender to be more conservative than the
+   algorithms allow, however a TCP MUST NOT be more aggressive than the
+   following algorithms allow (that is, MUST NOT send data when the
+   value of cwnd computed by the following algorithms would not allow
+   the data to be sent).
+
+3.1 Slow Start and Congestion Avoidance
+
+   The slow start and congestion avoidance algorithms MUST be used by a
+   TCP sender to control the amount of outstanding data being injected
+   into the network.  To implement these algorithms, two variables are
+   added to the TCP per-connection state.  The congestion window (cwnd)
+   is a sender-side limit on the amount of data the sender can transmit
+   into the network before receiving an acknowledgment (ACK), while the
+   receiver's advertised window (rwnd) is a receiver-side limit on the
+   amount of outstanding data.  The minimum of cwnd and rwnd governs
+   data transmission.
+
+   Another state variable, the slow start threshold (ssthresh), is used
+   to determine whether the slow start or congestion avoidance algorithm
+   is used to control data transmission, as discussed below.
+
+   Beginning transmission into a network with unknown conditions
+   requires TCP to slowly probe the network to determine the available
+   capacity, in order to avoid congesting the network with an
+   inappropriately large burst of data.  The slow start algorithm is
+   used for this purpose at the beginning of a transfer, or after
+   repairing loss detected by the retransmission timer.
+
+
+
+
+
+
+Allman, et. al.             Standards Track                     [Page 3]
+
+RFC 2581                 TCP Congestion Control               April 1999
+
+
+   IW, the initial value of cwnd, MUST be less than or equal to 2*SMSS
+   bytes and MUST NOT be more than 2 segments.
+
+   We note that a non-standard, experimental TCP extension allows that a
+   TCP MAY use a larger initial window (IW), as defined in equation 1
+   [AFP98]:
+
+      IW = min (4*SMSS, max (2*SMSS, 4380 bytes))           (1)
+
+   With this extension, a TCP sender MAY use a 3 or 4 segment initial
+   window, provided the combined size of the segments does not exceed
+   4380 bytes.  We do NOT allow this change as part of the standard
+   defined by this document.  However, we include discussion of (1) in
+   the remainder of this document as a guideline for those experimenting
+   with the change, rather than conforming to the present standards for
+   TCP congestion control.
+
+   The initial value of ssthresh MAY be arbitrarily high (for example,
+   some implementations use the size of the advertised window), but it
+   may be reduced in response to congestion.  The slow start algorithm
+   is used when cwnd < ssthresh, while the congestion avoidance
+   algorithm is used when cwnd > ssthresh.  When cwnd and ssthresh are
+   equal the sender may use either slow start or congestion avoidance.
+
+   During slow start, a TCP increments cwnd by at most SMSS bytes for
+   each ACK received that acknowledges new data.  Slow start ends when
+   cwnd exceeds ssthresh (or, optionally, when it reaches it, as noted
+   above) or when congestion is observed.
+
+   During congestion avoidance, cwnd is incremented by 1 full-sized
+   segment per round-trip time (RTT).  Congestion avoidance continues
+   until congestion is detected.  One formula commonly used to update
+   cwnd during congestion avoidance is given in equation 2:
+
+      cwnd += SMSS*SMSS/cwnd                     (2)
+
+   This adjustment is executed on every incoming non-duplicate ACK.
+   Equation (2) provides an acceptable approximation to the underlying
+   principle of increasing cwnd by 1 full-sized segment per RTT.  (Note
+   that for a connection in which the receiver acknowledges every data
+   segment, (2) proves slightly more aggressive than 1 segment per RTT,
+   and for a receiver acknowledging every-other packet, (2) is less
+   aggressive.)
+
+
+
+
+
+
+
+
+Allman, et. al.             Standards Track                     [Page 4]
+
+RFC 2581                 TCP Congestion Control               April 1999
+
+
+   Implementation Note: Since integer arithmetic is usually used in TCP
+   implementations, the formula given in equation 2 can fail to increase
+   cwnd when the congestion window is very large (larger than
+   SMSS*SMSS).  If the above formula yields 0, the result SHOULD be
+   rounded up to 1 byte.
+
+   Implementation Note: older implementations have an additional
+   additive constant on the right-hand side of equation (2).  This is
+   incorrect and can actually lead to diminished performance [PAD+98].
+
+   Another acceptable way to increase cwnd during congestion avoidance
+   is to count the number of bytes that have been acknowledged by ACKs
+   for new data.  (A drawback of this implementation is that it requires
+   maintaining an additional state variable.)  When the number of bytes
+   acknowledged reaches cwnd, then cwnd can be incremented by up to SMSS
+   bytes.  Note that during congestion avoidance, cwnd MUST NOT be
+   increased by more than the larger of either 1 full-sized segment per
+   RTT, or the value computed using equation 2.
+
+   Implementation Note: some implementations maintain cwnd in units of
+   bytes, while others in units of full-sized segments.  The latter will
+   find equation (2) difficult to use, and may prefer to use the
+   counting approach discussed in the previous paragraph.
+
+   When a TCP sender detects segment loss using the retransmission
+   timer, the value of ssthresh MUST be set to no more than the value
+   given in equation 3:
+
+      ssthresh = max (FlightSize / 2, 2*SMSS)            (3)
+
+   As discussed above, FlightSize is the amount of outstanding data in
+   the network.
+
+   Implementation Note: an easy mistake to make is to simply use cwnd,
+   rather than FlightSize, which in some implementations may
+   incidentally increase well beyond rwnd.
+
+   Furthermore, upon a timeout cwnd MUST be set to no more than the loss
+   window, LW, which equals 1 full-sized segment (regardless of the
+   value of IW).  Therefore, after retransmitting the dropped segment
+   the TCP sender uses the slow start algorithm to increase the window
+   from 1 full-sized segment to the new value of ssthresh, at which
+   point congestion avoidance again takes over.
+
+
+
+
+
+
+
+
+Allman, et. al.             Standards Track                     [Page 5]
+
+RFC 2581                 TCP Congestion Control               April 1999
+
+
+3.2 Fast Retransmit/Fast Recovery
+
+   A TCP receiver SHOULD send an immediate duplicate ACK when an out-
+   of-order segment arrives.  The purpose of this ACK is to inform the
+   sender that a segment was received out-of-order and which sequence
+   number is expected.  From the sender's perspective, duplicate ACKs
+   can be caused by a number of network problems.  First, they can be
+   caused by dropped segments.  In this case, all segments after the
+   dropped segment will trigger duplicate ACKs.  Second, duplicate ACKs
+   can be caused by the re-ordering of data segments by the network (not
+   a rare event along some network paths [Pax97]).  Finally, duplicate
+   ACKs can be caused by replication of ACK or data segments by the
+   network.  In addition, a TCP receiver SHOULD send an immediate ACK
+   when the incoming segment fills in all or part of a gap in the
+   sequence space.  This will generate more timely information for a
+   sender recovering from a loss through a retransmission timeout, a
+   fast retransmit, or an experimental loss recovery algorithm, such as
+   NewReno [FH98].
+
+   The TCP sender SHOULD use the "fast retransmit" algorithm to detect
+   and repair loss, based on incoming duplicate ACKs.  The fast
+   retransmit algorithm uses the arrival of 3 duplicate ACKs (4
+   identical ACKs without the arrival of any other intervening packets)
+   as an indication that a segment has been lost.  After receiving 3
+   duplicate ACKs, TCP performs a retransmission of what appears to be
+   the missing segment, without waiting for the retransmission timer to
+   expire.
+
+   After the fast retransmit algorithm sends what appears to be the
+   missing segment, the "fast recovery" algorithm governs the
+   transmission of new data until a non-duplicate ACK arrives.  The
+   reason for not performing slow start is that the receipt of the
+   duplicate ACKs not only indicates that a segment has been lost, but
+   also that segments are most likely leaving the network (although a
+   massive segment duplication by the network can invalidate this
+   conclusion).  In other words, since the receiver can only generate a
+   duplicate ACK when a segment has arrived, that segment has left the
+   network and is in the receiver's buffer, so we know it is no longer
+   consuming network resources.  Furthermore, since the ACK "clock"
+   [Jac88] is preserved, the TCP sender can continue to transmit new
+   segments (although transmission must continue using a reduced cwnd).
+
+   The fast retransmit and fast recovery algorithms are usually
+   implemented together as follows.
+
+   1.  When the third duplicate ACK is received, set ssthresh to no more
+       than the value given in equation 3.
+
+
+
+
+Allman, et. al.             Standards Track                     [Page 6]
+
+RFC 2581                 TCP Congestion Control               April 1999
+
+
+   2.  Retransmit the lost segment and set cwnd to ssthresh plus 3*SMSS.
+       This artificially "inflates" the congestion window by the number
+       of segments (three) that have left the network and which the
+       receiver has buffered.
+
+   3.  For each additional duplicate ACK received, increment cwnd by
+       SMSS.  This artificially inflates the congestion window in order
+       to reflect the additional segment that has left the network.
+
+   4.  Transmit a segment, if allowed by the new value of cwnd and the
+       receiver's advertised window.
+
+   5.  When the next ACK arrives that acknowledges new data, set cwnd to
+       ssthresh (the value set in step 1).  This is termed "deflating"
+       the window.
+
+       This ACK should be the acknowledgment elicited by the
+       retransmission from step 1, one RTT after the retransmission
+       (though it may arrive sooner in the presence of significant out-
+       of-order delivery of data segments at the receiver).
+       Additionally, this ACK should acknowledge all the intermediate
+       segments sent between the lost segment and the receipt of the
+       third duplicate ACK, if none of these were lost.
+
+   Note: This algorithm is known to generally not recover very
+   efficiently from multiple losses in a single flight of packets
+   [FF96].  One proposed set of modifications to address this problem
+   can be found in [FH98].
+
+4. Additional Considerations
+
+4.1 Re-starting Idle Connections
+
+   A known problem with the TCP congestion control algorithms described
+   above is that they allow a potentially inappropriate burst of traffic
+   to be transmitted after TCP has been idle for a relatively long
+   period of time.  After an idle period, TCP cannot use the ACK clock
+   to strobe new segments into the network, as all the ACKs have drained
+   from the network.  Therefore, as specified above, TCP can potentially
+   send a cwnd-size line-rate burst into the network after an idle
+   period.
+
+   [Jac88] recommends that a TCP use slow start to restart transmission
+   after a relatively long idle period.  Slow start serves to restart
+   the ACK clock, just as it does at the beginning of a transfer.  This
+   mechanism has been widely deployed in the following manner.  When TCP
+   has not received a segment for more than one retransmission timeout,
+   cwnd is reduced to the value of the restart window (RW) before
+
+
+
+Allman, et. al.             Standards Track                     [Page 7]
+
+RFC 2581                 TCP Congestion Control               April 1999
+
+
+   transmission begins.
+
+   For the purposes of this standard, we define RW = IW.
+
+   We note that the non-standard experimental extension to TCP defined
+   in [AFP98] defines RW = min(IW, cwnd), with the definition of IW
+   adjusted per equation (1) above.
+
+   Using the last time a segment was received to determine whether or
+   not to decrease cwnd fails to deflate cwnd in the common case of
+   persistent HTTP connections [HTH98].  In this case, a WWW server
+   receives a request before transmitting data to the WWW browser.  The
+   reception of the request makes the test for an idle connection fail,
+   and allows the TCP to begin transmission with a possibly
+   inappropriately large cwnd.
+
+   Therefore, a TCP SHOULD set cwnd to no more than RW before beginning
+   transmission if the TCP has not sent data in an interval exceeding
+   the retransmission timeout.
+
+4.2 Generating Acknowledgments
+
+   The delayed ACK algorithm specified in [Bra89] SHOULD be used by a
+   TCP receiver.  When used, a TCP receiver MUST NOT excessively delay
+   acknowledgments.  Specifically, an ACK SHOULD be generated for at
+   least every second full-sized segment, and MUST be generated within
+   500 ms of the arrival of the first unacknowledged packet.
+
+   The requirement that an ACK "SHOULD" be generated for at least every
+   second full-sized segment is listed in [Bra89] in one place as a
+   SHOULD and another as a MUST.  Here we unambiguously state it is a
+   SHOULD.  We also emphasize that this is a SHOULD, meaning that an
+   implementor should indeed only deviate from this requirement after
+   careful consideration of the implications.  See the discussion of
+   "Stretch ACK violation" in [PAD+98] and the references therein for a
+   discussion of the possible performance problems with generating ACKs
+   less frequently than every second full-sized segment.
+
+   In some cases, the sender and receiver may not agree on what
+   constitutes a full-sized segment.  An implementation is deemed to
+   comply with this requirement if it sends at least one acknowledgment
+   every time it receives 2*RMSS bytes of new data from the sender,
+   where RMSS is the Maximum Segment Size specified by the receiver to
+   the sender (or the default value of 536 bytes, per [Bra89], if the
+   receiver does not specify an MSS option during connection
+   establishment).  The sender may be forced to use a segment size less
+   than RMSS due to the maximum transmission unit (MTU), the path MTU
+   discovery algorithm or other factors.  For instance, consider the
+
+
+
+Allman, et. al.             Standards Track                     [Page 8]
+
+RFC 2581                 TCP Congestion Control               April 1999
+
+
+   case when the receiver announces an RMSS of X bytes but the sender
+   ends up using a segment size of Y bytes (Y < X) due to path MTU
+   discovery (or the sender's MTU size).  The receiver will generate
+   stretch ACKs if it waits for 2*X bytes to arrive before an ACK is
+   sent.  Clearly this will take more than 2 segments of size Y bytes.
+   Therefore, while a specific algorithm is not defined, it is desirable
+   for receivers to attempt to prevent this situation, for example by
+   acknowledging at least every second segment, regardless of size.
+   Finally, we repeat that an ACK MUST NOT be delayed for more than 500
+   ms waiting on a second full-sized segment to arrive.
+
+   Out-of-order data segments SHOULD be acknowledged immediately, in
+   order to accelerate loss recovery.  To trigger the fast retransmit
+   algorithm, the receiver SHOULD send an immediate duplicate ACK when
+   it receives a data segment above a gap in the sequence space.  To
+   provide feedback to senders recovering from losses, the receiver
+   SHOULD send an immediate ACK when it receives a data segment that
+   fills in all or part of a gap in the sequence space.
+
+   A TCP receiver MUST NOT generate more than one ACK for every incoming
+   segment, other than to update the offered window as the receiving
+   application consumes new data [page 42, Pos81][Cla82].
+
+4.3 Loss Recovery Mechanisms
+
+   A number of loss recovery algorithms that augment fast retransmit and
+   fast recovery have been suggested by TCP researchers.  While some of
+   these algorithms are based on the TCP selective acknowledgment (SACK)
+   option [MMFR96], such as [FF96,MM96a,MM96b], others do not require
+   SACKs [Hoe96,FF96,FH98].  The non-SACK algorithms use "partial
+   acknowledgments" (ACKs which cover new data, but not all the data
+   outstanding when loss was detected) to trigger retransmissions.
+   While this document does not standardize any of the specific
+   algorithms that may improve fast retransmit/fast recovery, these
+   enhanced algorithms are implicitly allowed, as long as they follow
+   the general principles of the basic four algorithms outlined above.
+
+   Therefore, when the first loss in a window of data is detected,
+   ssthresh MUST be set to no more than the value given by equation (3).
+   Second, until all lost segments in the window of data in question are
+   repaired, the number of segments transmitted in each RTT MUST be no
+   more than half the number of outstanding segments when the loss was
+   detected.  Finally, after all loss in the given window of segments
+   has been successfully retransmitted, cwnd MUST be set to no more than
+   ssthresh and congestion avoidance MUST be used to further increase
+   cwnd.  Loss in two successive windows of data, or the loss of a
+   retransmission, should be taken as two indications of congestion and,
+   therefore, cwnd (and ssthresh) MUST be lowered twice in this case.
+
+
+
+Allman, et. al.             Standards Track                     [Page 9]
+
+RFC 2581                 TCP Congestion Control               April 1999
+
+
+   The algorithms outlined in [Hoe96,FF96,MM96a,MM6b] follow the
+   principles of the basic four congestion control algorithms outlined
+   in this document.
+
+5.  Security Considerations
+
+   This document requires a TCP to diminish its sending rate in the
+   presence of retransmission timeouts and the arrival of duplicate
+   acknowledgments.  An attacker can therefore impair the performance of
+   a TCP connection by either causing data packets or their
+   acknowledgments to be lost, or by forging excessive duplicate
+   acknowledgments.  Causing two congestion control events back-to-back
+   will often cut ssthresh to its minimum value of 2*SMSS, causing the
+   connection to immediately enter the slower-performing congestion
+   avoidance phase.
+
+   The Internet to a considerable degree relies on the correct
+   implementation of these algorithms in order to preserve network
+   stability and avoid congestion collapse.  An attacker could cause TCP
+   endpoints to respond more aggressively in the face of congestion by
+   forging excessive duplicate acknowledgments or excessive
+   acknowledgments for new data.  Conceivably, such an attack could
+   drive a portion of the network into congestion collapse.
+
+6.  Changes Relative to RFC 2001
+
+   This document has been extensively rewritten editorially and it is
+   not feasible to itemize the list of changes between the two
+   documents. The intention of this document is not to change any of the
+   recommendations given in RFC 2001, but to further clarify cases that
+   were not discussed in detail in 2001. Specifically, this document
+   suggests what TCP connections should do after a relatively long idle
+   period, as well as specifying and clarifying some of the issues
+   pertaining to TCP ACK generation.  Finally, the allowable upper bound
+   for the initial congestion window has also been raised from one to
+   two segments.
+
+Acknowledgments
+
+   The four algorithms that are described were developed by Van
+   Jacobson.
+
+   Some of the text from this document is taken from "TCP/IP
+   Illustrated, Volume 1: The Protocols" by W. Richard Stevens
+   (Addison-Wesley, 1994) and "TCP/IP Illustrated, Volume 2: The
+   Implementation" by Gary R. Wright and W.  Richard Stevens (Addison-
+   Wesley, 1995).  This material is used with the permission of
+   Addison-Wesley.
+
+
+
+Allman, et. al.             Standards Track                    [Page 10]
+
+RFC 2581                 TCP Congestion Control               April 1999
+
+
+   Neal Cardwell, Sally Floyd, Craig Partridge and Joe Touch contributed
+   a number of helpful suggestions.
+
+References
+
+   [AFP98]  Allman, M., Floyd, S. and C. Partridge, "Increasing TCP's
+            Initial Window Size, RFC 2414, September 1998.
+
+   [Bra89]  Braden, R., "Requirements for Internet Hosts --
+            Communication Layers", STD 3, RFC 1122, October 1989.
+
+   [Bra97]  Bradner, S., "Key words for use in RFCs to Indicate
+            Requirement Levels", BCP 14, RFC 2119, March 1997.
+
+   [Cla82]  Clark, D., "Window and Acknowledgment Strategy in TCP", RFC
+            813, July 1982.
+
+   [FF96]   Fall, K. and S. Floyd, "Simulation-based Comparisons of
+            Tahoe, Reno and SACK TCP", Computer Communication Review,
+            July 1996. ftp://ftp.ee.lbl.gov/papers/sacks.ps.Z.
+
+   [FH98]   Floyd, S. and T. Henderson, "The NewReno Modification to
+            TCP's Fast Recovery Algorithm", RFC 2582, April 1999.
+
+   [Flo94]  Floyd, S., "TCP and Successive Fast Retransmits. Technical
+            report", October 1994.
+            ftp://ftp.ee.lbl.gov/papers/fastretrans.ps.
+
+   [Hoe96]  Hoe, J., "Improving the Start-up Behavior of a Congestion
+            Control Scheme for TCP", In ACM SIGCOMM, August 1996.
+
+   [HTH98]  Hughes, A., Touch, J. and J. Heidemann, "Issues in TCP
+            Slow-Start Restart After Idle", Work in Progress.
+
+   [Jac88]  Jacobson, V., "Congestion Avoidance and Control", Computer
+            Communication Review, vol. 18, no. 4, pp. 314-329, Aug.
+            1988.  ftp://ftp.ee.lbl.gov/papers/congavoid.ps.Z.
+
+   [Jac90]  Jacobson, V., "Modified TCP Congestion Avoidance Algorithm",
+            end2end-interest mailing list, April 30, 1990.
+            ftp://ftp.isi.edu/end2end/end2end-interest-1990.mail.
+
+   [MD90]   Mogul, J. and S. Deering, "Path MTU Discovery", RFC 1191,
+            November 1990.
+
+
+
+
+
+
+
+Allman, et. al.             Standards Track                    [Page 11]
+
+RFC 2581                 TCP Congestion Control               April 1999
+
+
+   [MM96a]  Mathis, M. and J. Mahdavi, "Forward Acknowledgment: Refining
+            TCP Congestion Control", Proceedings of SIGCOMM'96, August,
+            1996, Stanford, CA.  Available
+            fromhttp://www.psc.edu/networking/papers/papers.html
+
+   [MM96b]  Mathis, M. and J. Mahdavi, "TCP Rate-Halving with Bounding
+            Parameters", Technical report.  Available from
+            http://www.psc.edu/networking/papers/FACKnotes/current.
+
+   [MMFR96] Mathis, M., Mahdavi, J., Floyd, S. and A. Romanow, "TCP
+            Selective Acknowledgement Options", RFC 2018, October 1996.
+
+   [PAD+98] Paxson, V., Allman, M., Dawson, S., Fenner, W., Griner, J.,
+            Heavens, I., Lahey, K., Semke, J. and B. Volz, "Known TCP
+            Implementation Problems", RFC 2525, March 1999.
+
+   [Pax97]  Paxson, V., "End-to-End Internet Packet Dynamics",
+            Proceedings of SIGCOMM '97, Cannes, France, Sep. 1997.
+
+   [Pos81]  Postel, J., "Transmission Control Protocol", STD 7, RFC 793,
+            September 1981.
+
+   [Ste94]  Stevens, W., "TCP/IP Illustrated, Volume 1: The Protocols",
+            Addison-Wesley, 1994.
+
+   [Ste97]  Stevens, W., "TCP Slow Start, Congestion Avoidance, Fast
+            Retransmit, and Fast Recovery Algorithms", RFC 2001, January
+            1997.
+
+   [WS95]   Wright, G. and W. Stevens, "TCP/IP Illustrated, Volume 2:
+            The Implementation", Addison-Wesley, 1995.
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Allman, et. al.             Standards Track                    [Page 12]
+
+RFC 2581                 TCP Congestion Control               April 1999
+
+
+Authors' Addresses
+
+   Mark Allman
+   NASA Glenn Research Center/Sterling Software
+   Lewis Field
+   21000 Brookpark Rd.  MS 54-2
+   Cleveland, OH  44135
+   216-433-6586
+
+   EMail: mallman@grc.nasa.gov
+   http://roland.grc.nasa.gov/~mallman
+
+
+   Vern Paxson
+   ACIRI / ICSI
+   1947 Center Street
+   Suite 600
+   Berkeley, CA 94704-1198
+
+   Phone: +1 510/642-4274 x302
+   EMail: vern@aciri.org
+
+
+   W. Richard Stevens
+   1202 E. Paseo del Zorro
+   Tucson, AZ  85718
+   520-297-9416
+
+   EMail: rstevens@kohala.com
+   http://www.kohala.com/~rstevens
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Allman, et. al.             Standards Track                    [Page 13]
+
+RFC 2581                 TCP Congestion Control               April 1999
+
+
+Full Copyright Statement
+
+   Copyright (C) The Internet Society (1999).  All Rights Reserved.
+
+   This document and translations of it may be copied and furnished to
+   others, and derivative works that comment on or otherwise explain it
+   or assist in its implementation may be prepared, copied, published
+   and distributed, in whole or in part, without restriction of any
+   kind, provided that the above copyright notice and this paragraph are
+   included on all such copies and derivative works.  However, this
+   document itself may not be modified in any way, such as by removing
+   the copyright notice or references to the Internet Society or other
+   Internet organizations, except as needed for the purpose of
+   developing Internet standards in which case the procedures for
+   copyrights defined in the Internet Standards process must be
+   followed, or as required to translate it into languages other than
+   English.
+
+   The limited permissions granted above are perpetual and will not be
+   revoked by the Internet Society or its successors or assigns.
+
+   This document and the information contained herein is provided on an
+   "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
+   TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
+   BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
+   HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
+   MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Allman, et. al.             Standards Track                    [Page 14]
+
--- a/kernel/picotcp/RFC/rfc2675.txt
+++ b/kernel/picotcp/RFC/rfc2675.txt
@ -0,0 +1,507 @@
+
+
+
+
+
+
+Network Working Group                                          D. Borman
+Request for Comments: 2675                      Berkeley Software Design
+Obsoletes: 2147                                               S. Deering
+Category: Standards Track                                          Cisco
+                                                               R. Hinden
+                                                                   Nokia
+                                                             August 1999
+                            IPv6 Jumbograms
+
+Status of this Memo
+
+   This document specifies an Internet standards track protocol for the
+   Internet community, and requests discussion and suggestions for
+   improvements.  Please refer to the current edition of the "Internet
+   Official Protocol Standards" (STD 1) for the standardization state
+   and status of this protocol.  Distribution of this memo is unlimited.
+
+Copyright Notice
+
+   Copyright (C) The Internet Society (1999).  All Rights Reserved.
+
+Abstract
+
+   A "jumbogram" is an IPv6 packet containing a payload longer than
+   65,535 octets.  This document describes the IPv6 Jumbo Payload
+   option, which provides the means of specifying such large payload
+   lengths.  It also describes the changes needed to TCP and UDP to make
+   use of jumbograms.
+
+   Jumbograms are relevant only to IPv6 nodes that may be attached to
+   links with a link MTU greater than 65,575 octets, and need not be
+   implemented or understood by IPv6 nodes that do not support
+   attachment to links with such large MTUs.
+
+1. Introduction
+
+      jumbo (jum'bO),
+
+          n., pl. -bos, adj.
+          -n.
+          1. a person, animal, or thing very large of its kind.
+          -adj.
+          2. very large: the jumbo box of cereal.
+
+          [1800-10; orig. uncert.; popularized as the name of a large
+           elephant purchased and exhibited by P.T. Barnum in 1882]
+
+                                              -- www.infoplease.com
+
+
+
+Borman, et al.              Standards Track                     [Page 1]
+
+RFC 2675                    IPv6 Jumbograms                  August 1999
+
+
+   The IPv6 header [IPv6] has a 16-bit Payload Length field and,
+   therefore, supports payloads up to 65,535 octets long.  This document
+   specifies an IPv6 hop-by-hop option, called the Jumbo Payload option,
+   that carries a 32-bit length field in order to allow transmission of
+   IPv6 packets with payloads between 65,536 and 4,294,967,295 octets in
+   length.  Packets with such long payloads are referred to as
+   "jumbograms".
+
+   The Jumbo Payload option is relevant only for IPv6 nodes that may be
+   attached to links with a link MTU greater than 65,575 octets (that
+   is, 65,535 + 40, where 40 octets is the size of the IPv6 header).
+   The Jumbo Payload option need not be implemented or understood by
+   IPv6 nodes that do not support attachment to links with MTU greater
+   than 65,575.
+
+   On links with configurable MTUs, the MTU must not be configured to a
+   value greater than 65,575 octets if there are nodes attached to that
+   link that do not support the Jumbo Payload option and it can not be
+   guaranteed that the Jumbo Payload option will not be sent to those
+   nodes.
+
+   The UDP header [UDP] has a 16-bit Length field which prevents it from
+   making use of jumbograms, and though the TCP header [TCP] does not
+   have a Length field, both the TCP MSS option and the TCP Urgent field
+   are constrained to 16 bits.  This document specifies some simple
+   enhancements to TCP and UDP to enable them to make use of jumbograms.
+   An implementation of TCP or UDP on an IPv6 node that supports the
+   Jumbo Payload option must include the enhancements specified here.
+
+   Note: The 16 bit checksum used by UDP and TCP becomes less accurate
+   as the length of the data being checksummed is increased.
+   Application designers may want to take this into consideration.
+
+1.1 Document History
+
+   This document merges and updates material that was previously
+   published in two separate documents:
+
+   -  The specification of the Jumbo Payload option previously appeared
+      as part of the IPv6 specification in RFC 1883.  RFC 1883 has been
+      superseded by RFC 2460, which no longer includes specification of
+      the Jumbo Payload option.
+
+   -  The specification of TCP and UDP enhancements to support
+      jumbograms previously appeared as RFC 2147.  RFC 2147 is obsoleted
+      by this document.
+
+
+
+
+
+Borman, et al.              Standards Track                     [Page 2]
+
+RFC 2675                    IPv6 Jumbograms                  August 1999
+
+
+2. Format of the Jumbo Payload Option
+
+   The Jumbo Payload option is carried in an IPv6 Hop-by-Hop Options
+   header, immediately following the IPv6 header.  This option has an
+   alignment requirement of 4n + 2.  (See [IPv6, Section 4.2] for
+   discussion of option alignment.)  The option has the following
+   format:
+
+                                   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+                                   |  Option Type  |  Opt Data Len |
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+   |                     Jumbo Payload Length                      |
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+
+   Option Type           8-bit value C2 (hexadecimal).
+
+   Opt Data Len          8-bit value 4.
+
+   Jumbo Payload Length  32-bit unsigned integer.  Length of the IPv6
+                         packet in octets, excluding the IPv6 header
+                         but including the Hop-by-Hop Options header
+                         and any other extension headers present.
+                         Must be greater than 65,535.
+
+3. Usage of the Jumbo Payload Option
+
+   The Payload Length field in the IPv6 header must be set to zero in
+   every packet that carries the Jumbo Payload option.
+
+   If a node that understands the Jumbo Payload option receives a packet
+   whose IPv6 header carries a Payload Length of zero and a Next Header
+   value of zero (meaning that a Hop-by-Hop Options header follows), and
+   whose link-layer framing indicates the presence of octets beyond the
+   IPv6 header, the node must proceed to process the Hop-by-Hop Options
+   header in order to determine the actual length of the payload from
+   the Jumbo Payload option.
+
+   The Jumbo Payload option must not be used in a packet that carries a
+   Fragment header.
+
+   Higher-layer protocols that use the IPv6 Payload Length field to
+   compute the value of the Upper-Layer Packet Length field in the
+   checksum pseudo-header described in [IPv6, Section 8.1] must instead
+   use the Jumbo Payload Length field for that computation, for packets
+   that carry the Jumbo Payload option.
+
+
+
+
+
+
+Borman, et al.              Standards Track                     [Page 3]
+
+RFC 2675                    IPv6 Jumbograms                  August 1999
+
+
+   Nodes that understand the Jumbo Payload option are required to detect
+   a number of possible format errors, and if the erroneous packet was
+   not destined to a multicast address, report the error by sending an
+   ICMP Parameter Problem message [ICMPv6] to the packet's source.   The
+   following list of errors specifies the values to be used in the Code
+   and Pointer fields of the Parameter Problem message:
+
+      error: IPv6 Payload Length = 0 and
+             IPv6 Next Header = Hop-by-Hop Options and
+             Jumbo Payload option not present
+
+             Code: 0
+             Pointer: high-order octet of the IPv6 Payload Length
+
+      error: IPv6 Payload Length != 0 and
+             Jumbo Payload option present
+
+             Code: 0
+             Pointer: Option Type field of the Jumbo Payload option
+
+      error: Jumbo Payload option present and
+             Jumbo Payload Length < 65,536
+
+             Code: 0
+             Pointer: high-order octet of the Jumbo Payload Length
+
+      error: Jumbo Payload option present and
+             Fragment header present
+
+             Code: 0
+             Pointer: high-order octet of the Fragment header.
+
+   A node that does not understand the Jumbo Payload option is expected
+   to respond to erroneously-received jumbograms as follows, according
+   to the IPv6 specification:
+
+      error: IPv6 Payload Length = 0 and
+             IPv6 Next Header = Hop-by-Hop Options
+
+             Code: 0
+             Pointer: high-order octet of the IPv6 Payload Length
+
+      error: IPv6 Payload Length != 0 and
+             Jumbo Payload option present
+
+             Code: 2
+             Pointer: Option Type field of the Jumbo Payload option
+
+
+
+
+Borman, et al.              Standards Track                     [Page 4]
+
+RFC 2675                    IPv6 Jumbograms                  August 1999
+
+
+4. UDP Jumbograms
+
+   The 16-bit Length field of the UDP header limits the total length of
+   a UDP packet (that is, a UDP header plus data) to no greater than
+   65,535 octets.  This document specifies the following modification of
+   UDP to relax that limit: UDP packets longer than 65,535 octets may be
+   sent by setting the UDP Length field to zero, and letting the
+   receiver derive the actual UDP packet length from the IPv6 payload
+   length.  (Note that, prior to this modification, zero was not a legal
+   value for the UDP Length field, because the UDP packet length
+   includes the UDP header and therefore has a minimum value of 8.)
+
+   The specific requirements for sending a UDP jumbogram are as follows:
+
+      When sending a UDP packet, if and only if the length of the UDP
+      header plus UDP data is greater than 65,535, set the Length field
+      in the UDP header to zero.
+
+      The IPv6 packet carrying such a large UDP packet will necessarily
+      include a Jumbo Payload option in a Hop-by-Hop Options header; set
+      the Jumbo Payload Length field of that option to be the actual
+      length of the UDP header plus data, plus the length of all IPv6
+      extension headers present between the IPv6 header and the UDP
+      header.
+
+      For generating the UDP checksum, use the actual length of the UDP
+      header plus data, NOT zero, in the checksum pseudo-header [IPv6,
+      Section 8.1].
+
+   The specific requirements for receiving a UDP jumbogram are as
+   follows:
+
+      When receiving a UDP packet, if and only if the Length field in
+      the UDP header is zero, calculate the actual length of the UDP
+      header plus data from the IPv6 Jumbo Payload Length field minus
+      the length of all extension headers present between the IPv6
+      header and the UDP header.
+
+      In the unexpected case that the UDP Length field is zero but no
+      Jumbo Payload option is present (i.e., the IPv6 packet is not a
+      jumbogram), use the Payload Length field in the IPv6 header, in
+      place of the Jumbo Payload Length field, in the above calculation.
+
+      For verifying the received UDP checksum, use the calculated length
+      of the UDP header plus data, NOT zero, in the checksum pseudo-
+      header.
+
+
+
+
+
+Borman, et al.              Standards Track                     [Page 5]
+
+RFC 2675                    IPv6 Jumbograms                  August 1999
+
+
+5. TCP Jumbograms
+
+   Because there is no length field in the TCP header, there is nothing
+   limiting the length of an individual TCP packet.  However, the MSS
+   value that is negotiated at the beginning of the connection limits
+   the largest TCP packet that can be sent, and the Urgent Pointer
+   cannot reference data beyond 65,535 bytes.
+
+5.1 TCP MSS
+
+   When determining what MSS value to send, if the MTU of the directly
+   attached interface minus 60 [IPv6, Section 8.3] is greater than or
+   equal to 65,535, then set the MSS value to 65,535.
+
+   When an MSS value of 65,535 is received, it is to be treated as
+   infinity.  The actual MSS is determined by subtracting 60 from the
+   value learned by performing Path MTU Discovery [MTU-DISC] over the
+   path to the TCP peer.
+
+5.2 TCP Urgent Pointer
+
+   The Urgent Pointer problem could be fixed by adding a TCP Urgent
+   Pointer Option.  However, since it is unlikely that applications
+   using jumbograms will also use Urgent Pointers, a less intrusive
+   change similar to the MSS change will suffice.
+
+   When a TCP packet is to be sent with an Urgent Pointer (i.e., the URG
+   bit set), first calculate the offset from the Sequence Number to the
+   Urgent Pointer.  If the offset is less than 65,535, fill in the
+   Urgent field and continue with the normal TCP processing.  If the
+   offset is greater than 65,535, and the offset is greater than or
+   equal to the length of the TCP data, fill in the Urgent Pointer with
+   65,535 and continue with the normal TCP processing.  Otherwise, the
+   TCP packet must be split into two pieces.  The first piece contains
+   data up to, but not including the data pointed to by the Urgent
+   Pointer, and the Urgent field is set to 65,535 to indicate that the
+   Urgent Pointer is beyond the end of this packet.  The second piece
+   can then be sent with the Urgent field set normally.
+
+   Note: The first piece does not have to include all of the data up to
+   the Urgent Pointer.  It can be shorter, just as long as it ends
+   within 65,534 bytes of the Urgent Pointer, so that the offset to the
+   Urgent Pointer in the second piece will be less than 65,535 bytes.
+
+   For TCP input processing, when a TCP packet is received with the URG
+   bit set and an Urgent field of 65,535, the Urgent Pointer is
+   calculated using an offset equal to the length of the TCP data,
+   rather than the offset in the Urgent field.
+
+
+
+Borman, et al.              Standards Track                     [Page 6]
+
+RFC 2675                    IPv6 Jumbograms                  August 1999
+
+
+   It should also be noted that though the TCP window is only 16-bits,
+   larger windows can be used through use of the TCP Window Scale option
+   [TCP-EXT].
+
+6. Security Considerations
+
+   The Jumbo Payload option and TCP/UDP jumbograms do not introduce any
+   known new security concerns.
+
+7. Authors' Addresses
+
+   David A. Borman
+   Berkeley Software Design, Inc.
+   4719 Weston Hills Drive
+   Eagan, MN 55123
+   USA
+
+   Phone: +1 612 405 8194
+   EMail: dab@bsdi.com
+
+
+   Stephen E. Deering
+   Cisco Systems, Inc.
+   170 West Tasman Drive
+   San Jose, CA 95134-1706
+   USA
+
+   Phone: +1 408 527 8213
+   EMail: deering@cisco.com
+
+
+   Robert M. Hinden
+   Nokia
+   313 Fairchild Drive
+   Mountain View, CA 94043
+   USA
+
+   Phone: +1 650 625 2004
+   EMail: hinden@iprg.nokia.com
+
+
+
+
+
+
+
+
+
+
+
+
+Borman, et al.              Standards Track                     [Page 7]
+
+RFC 2675                    IPv6 Jumbograms                  August 1999
+
+
+8. References
+
+   [ICMPv6]   Conta, A. and S. Deering, "ICMP for the Internet Protocol
+              Version 6 (IPv6)", RFC 2463, December 1998.
+
+   [IPv6]     Deering, S. and R. Hinden, "Internet Protocol Version 6
+              (IPv6) Specification", RFC 2460, December 1998.
+
+   [MTU-DISC] McCann, J., Deering, S. and J. Mogul, "Path MTU Discovery
+              for IP Version 6", RFC 1981, August 1986.
+
+   [TCP]      Postel, J., "Transmission Control Protocol", STD 7, RFC
+              793, September 1981.
+
+   [TCP-EXT]  Jacobson, V., Braden, R. and D. Borman, "TCP Extensions
+              for High Performance", RFC 1323, May 1992.
+
+   [UDP]      Postel, J., "User Datagram Protocol", STD 6, RFC 768,
+              August 1980.
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Borman, et al.              Standards Track                     [Page 8]
+
+RFC 2675                    IPv6 Jumbograms                  August 1999
+
+
+9.  Full Copyright Statement
+
+   Copyright (C) The Internet Society (1999).  All Rights Reserved.
+
+   This document and translations of it may be copied and furnished to
+   others, and derivative works that comment on or otherwise explain it
+   or assist in its implementation may be prepared, copied, published
+   and distributed, in whole or in part, without restriction of any
+   kind, provided that the above copyright notice and this paragraph are
+   included on all such copies and derivative works.  However, this
+   document itself may not be modified in any way, such as by removing
+   the copyright notice or references to the Internet Society or other
+   Internet organizations, except as needed for the purpose of
+   developing Internet standards in which case the procedures for
+   copyrights defined in the Internet Standards process must be
+   followed, or as required to translate it into languages other than
+   English.
+
+   The limited permissions granted above are perpetual and will not be
+   revoked by the Internet Society or its successors or assigns.
+
+   This document and the information contained herein is provided on an
+   "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
+   TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
+   BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
+   HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
+   MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
+
+Acknowledgement
+
+   Funding for the RFC Editor function is currently provided by the
+   Internet Society.
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Borman, et al.              Standards Track                     [Page 9]
+
--- a/kernel/picotcp/RFC/rfc2757.txt
+++ b/kernel/picotcp/RFC/rfc2757.txt
--- a/kernel/picotcp/RFC/rfc2760.txt
+++ b/kernel/picotcp/RFC/rfc2760.txt
--- a/kernel/picotcp/RFC/rfc2861.txt
+++ b/kernel/picotcp/RFC/rfc2861.txt
@ -0,0 +1,619 @@
+
+
+
+
+
+
+Network Working Group                                         M. Handley
+Request for Comments: 2861                                     J. Padhye
+Category: Experimental                                          S. Floyd
+                                                                   ACIRI
+                                                               June 2000
+
+
+                    TCP Congestion Window Validation
+
+Status of this Memo
+
+   This memo defines an Experimental Protocol for the Internet
+   community.  It does not specify an Internet standard of any kind.
+   Discussion and suggestions for improvement are requested.
+   Distribution of this memo is unlimited.
+
+Copyright Notice
+
+   Copyright (C) The Internet Society (2000).  All Rights Reserved.
+
+Abstract
+
+   TCP's congestion window controls the number of packets a TCP flow may
+   have in the network at any time.  However, long periods when the
+   sender is idle or application-limited can lead to the invalidation of
+   the congestion window, in that the congestion window no longer
+   reflects current information about the state of the network.  This
+   document describes a simple modification to TCP's congestion control
+   algorithms to decay the congestion window cwnd after the transition
+   from a sufficiently-long application-limited period, while using the
+   slow-start threshold ssthresh to save information about the previous
+   value of the congestion window.
+
+   An invalid congestion window also results when the congestion window
+   is increased (i.e., in TCP's slow-start or congestion avoidance
+   phases) during application-limited periods, when the previous value
+   of the congestion window might never have been fully utilized.  We
+   propose that the TCP sender should not increase the congestion window
+   when the TCP sender has been application-limited (and therefore has
+   not fully used the current congestion window).  We have explored
+   these algorithms both with simulations and with experiments from an
+   implementation in FreeBSD.
+
+1.  Conventions and Acronyms
+
+   The keywords MUST, MUST NOT, REQUIRED, SHALL, SHALL NOT, SHOULD,
+   SHOULD NOT, RECOMMENDED, MAY, and OPTIONAL, when they appear in this
+   document, are to be interpreted as described in [B97].
+
+
+
+Handley, et al.               Experimental                      [Page 1]
+
+RFC 2861            TCP Congestion Window Validation           June 2000
+
+
+2. Introduction
+
+   TCP's congestion window controls the number of packets a TCP flow may
+   have in the network at any time.  The congestion window is set using
+   an Additive-Increase, Multiplicative-Decrease (AIMD) mechanism that
+   probes for available bandwidth, dynamically adapting to changing
+   network conditions.  This AIMD mechanism works well when the sender
+   continually has data to send, as is typically the case for TCP used
+   for bulk-data transfer.  In contrast, for TCP used with telnet
+   applications, the data sender often has little or no data to send,
+   and the sending rate is often determined by the rate at which data is
+   generated by the user.  With the advent of the web, including
+   developments such as TCP senders with dynamically-created data and
+   HTTP 1.1 with persistent-connection TCP, the interaction between
+   application-limited periods (when the sender sends less than is
+   allowed by the congestion or receiver windows) and network-limited
+   periods (when the sender is limited by the TCP window) becomes
+   increasingly important.  More precisely, we define a network-limited
+   period as any period when the sender is sending a full window of
+   data.
+
+   Long periods when the sender is application-limited can lead to the
+   invalidation of the congestion window.  During periods when the TCP
+   sender is network-limited, the value of the congestion window is
+   repeatedly "revalidated" by the successful transmission of a window
+   of data without loss.  When the TCP sender is network-limited, there
+   is an incoming stream of acknowledgements that "clocks out" new data,
+   giving concrete evidence of recent available bandwidth in the
+   network.  In contrast, during periods when the TCP sender is
+   application-limited, the estimate of available capacity represented
+   by the congestion window may become steadily less accurate over time.
+   In particular, capacity that had once been used by the network-
+   limited connection might now be used by other traffic.
+
+   Current TCP implementations have a range of behaviors for starting up
+   after an idle period.  Some current TCP implementations slow-start
+   after an idle period longer than the RTO estimate, as suggested in
+   [RFC2581] and in the appendix of [VJ88], while other implementations
+   don't reduce their congestion window after an idle period.  RFC 2581
+   [RFC2581] recommends the following: "a TCP SHOULD set cwnd to no more
+   than RW [the initial window] before beginning transmission if the TCP
+   has not sent data in an interval exceeding the retransmission
+   timeout."  A proposal for TCP's slow-start after idle has also been
+   discussed in [HTH98].  The issue of validation of congestion
+   information during idle periods has also been addressed in contexts
+   other than TCP and IP, for example in "Use-it or Lose-it" mechanisms
+   for ATM networks [J96,J95].
+
+
+
+
+Handley, et al.               Experimental                      [Page 2]
+
+RFC 2861            TCP Congestion Window Validation           June 2000
+
+
+   To address the revalidation of the congestion window after a
+   application-limited period, we propose a simple modification to TCP's
+   congestion control algorithms to decay the congestion window cwnd
+   after the transition from a sufficiently-long application-limited
+   period (i.e., at least one roundtrip time) to a network-limited
+   period.  In particular, we propose that after an idle period, the TCP
+   sender should reduce its congestion window by half for every RTT that
+   the flow has remained idle.
+
+   When the congestion window is reduced, the slow-start threshold
+   ssthresh remains as "memory" of the recent congestion window.
+   Specifically, ssthresh is never decreased when cwnd is reduced after
+   an application-limited period; before cwnd is reduced, ssthresh is
+   set to the maximum of its current value, and half-way between the old
+   and the new values of cwnd.  This use of ssthresh allows a TCP sender
+   increasing its sending rate after an application-limited period to
+   quickly slow-start to recover most of the previous value of the
+   congestion window.  To be more precise, if ssthresh is less than 3/4
+   cwnd when the congestion window is reduced after an application-
+   limited period, then ssthresh is increased to 3/4 cwnd before the
+   reduction of the congestion window.
+
+   An invalid congestion window also results when the congestion window
+   is increased (i.e., in TCP's slow-start or congestion avoidance
+   phases) during application-limited periods, when the previous value
+   of the congestion window might never have been fully utilized.  As
+   far as we know, all current TCP implementations increase the
+   congestion window when an acknowledgement arrives, if allowed by the
+   receiver's advertised window and the slow-start or congestion
+   avoidance window increase algorithm, without checking to see if the
+   previous value of the congestion window has in fact been used.  This
+   document proposes that the window increase algorithm not be invoked
+   during application-limited periods [MSML99].  In particular, the TCP
+   sender should not increase the congestion window when the TCP sender
+   has been application-limited (and therefore has not fully used the
+   current congestion window).  This restriction prevents the congestion
+   window from growing arbitrarily large, in the absence of evidence
+   that the congestion window can be supported by the network.  From
+   [MSML99, Section 5.2]: "This restriction assures that [cwnd] only
+   grows as long as TCP actually succeeds in injecting enough data into
+   the network to test the path."
+
+   A somewhat-orthogonal problem associated with maintaining a large
+   congestion window after an application-limited period is that the
+   sender, with a sudden large amount of data to send after a quiescent
+   period, might immediately send a full congestion window of back-to-
+   back packets.  This problem of sending large bursts of packets back-
+   to-back can be effectively handled using rate-based pacing (RBP,
+
+
+
+Handley, et al.               Experimental                      [Page 3]
+
+RFC 2861            TCP Congestion Window Validation           June 2000
+
+
+   [VH97]), or using a maximum burst size control [FF96].  We would
+   contend that, even with mechanisms for limiting the sending of back-
+   to-back packets or pacing packets out over the period of a roundtrip
+   time, an old congestion window that has not been fully used for some
+   time can not be trusted as an indication of the bandwidth currently
+   available for that flow.  We would contend that the mechanisms to
+   pace out packets allowed by the congestion window are largely
+   orthogonal to the algorithms used to determine the appropriate size
+   of the congestion window.
+
+3. Description
+
+   When a TCP sender has sufficient data available to fill the available
+   network capacity for that flow, cwnd and ssthresh get set to
+   appropriate values for the network conditions.  When a TCP sender
+   stops sending, the flow stops sampling the network conditions, and so
+   the value of the congestion window may become inaccurate.  We believe
+   the correct conservative behavior under these circumstances is to
+   decay the congestion window by half for every RTT that the flow
+   remains inactive.  The value of half is a very conservative figure
+   based on how quickly multiplicative decrease would have decayed the
+   window in the presence of loss.
+
+   Another possibility is that the sender may not stop sending, but may
+   become application-limited rather than network-limited, and offer
+   less data to the network than the congestion window allows to be
+   sent.  In this case the TCP flow is still sampling network
+   conditions, but is not offering sufficient traffic to be sure that
+   there is still sufficient capacity in the network for that flow to
+   send a full congestion window.  Under these circumstances we believe
+   the correct conservative behavior is for the sender to keep track of
+   the maximum amount of the congestion window used during each RTT, and
+   to decay the congestion window each RTT to midway between the current
+   cwnd value and the maximum value used.
+
+   Before the congestion window is reduced, ssthresh is set to the
+   maximum of its current value and 3/4 cwnd.  If the sender then has
+   more data to send than the decayed cwnd allows, the TCP will slow-
+   start (perform exponential increase) at least half-way back up to the
+   old value of cwnd.
+
+   The justification for this value of "3/4 cwnd" is that 3/4 cwnd is a
+   conservative estimate of the recent average value of the congestion
+   window, and the TCP should safely be able to slow-start at least up
+   to this point.  For a TCP in steady-state that has been reducing its
+   congestion window each time the congestion window reached some
+   maximum value `maxwin', the average congestion window has been 3/4
+   maxwin.  On average, when the connection becomes application-limited,
+
+
+
+Handley, et al.               Experimental                      [Page 4]
+
+RFC 2861            TCP Congestion Window Validation           June 2000
+
+
+   cwnd will be 3/4 maxwin, and in this case cwnd itself represents the
+   average value of the congestion window.  However, if the connection
+   happens to become application-limited when cwnd equals maxwin, then
+   the average value of the congestion window is given by 3/4 cwnd.
+
+   An alternate possibility would be to set ssthresh to the maximum of
+   the current value of ssthresh, and the old value of cwnd, allowing
+   TCP to slow-start all of the way back up to the old value of cwnd.
+   Further experimentation can be used to evaluate these two options for
+   setting ssthresh.
+
+   For the separate issue of the increase of the congestion window in
+   response to an acknowledgement, we believe the correct behavior is
+   for the sender to increase the congestion window only if the window
+   was full when the acknowledgment arrived.
+
+   We term this set of modifications to TCP Congestion Window Validation
+   (CWV) because they are related to ensuring the congestion window is
+   always a valid reflection of the current network state as probed by
+   the connection.
+
+3.1. The basic algorithm for reducing the congestion window
+
+   A key issue in the CWV algorithm is to determine how to apply the
+   guideline of reducing the congestion window once for every roundtrip
+   time that the flow is application-limited.  We use TCP's
+   retransmission timer (RTO) as a reasonable upper bound on the
+   roundtrip time, and reduce the congestion window roughly once per
+   RTO.
+
+   This basic algorithm could be implemented in TCP as follows: When TCP
+   sends a new packet it checks to see if more than RTO seconds have
+   elapsed since the previous packet was sent.  If RTO has elapsed,
+   ssthresh is set to the maximum of 3/4 cwnd and the current value of
+   ssthresh, and then the congestion window is halved for every RTO that
+   elapsed since the previous packet was sent.  In addition, T_prev is
+   set to the current time, and W_used is reset to zero.  T_prev will be
+   used to determine the elapsed time since the sender last was network-
+   limited or had reduced cwnd after an idle period.  When the sender is
+   application-limited, W_used holds the maximum congestion window
+   actually used since the sender was last network-limited.
+
+   The mechanism for determining the number of RTOs in the most recent
+   idle period could also be implemented by using a timer that expires
+   every RTO after the last packet was sent instead of a check per
+   packet - efficiency constraints on different operating systems may
+   dictate which is more efficient to implement.
+
+
+
+
+Handley, et al.               Experimental                      [Page 5]
+
+RFC 2861            TCP Congestion Window Validation           June 2000
+
+
+   After TCP sends a packet, it also checks to see if that packet filled
+   the congestion window.  If so, the sender is network-limited, and
+   sets the variable T_prev to the current TCP clock time, and the
+   variable W_used to zero.
+
+   When TCP sends a packet that does not fill the congestion window, and
+   the TCP send queue is empty, then the sender is application-limited.
+   The sender checks to see if the amount of unacknowledged data is
+   greater than W_used; if so, W_used is set to the amount of
+   unacknowledged data.  In addition TCP checks to see if the elapsed
+   time since T_prev is greater than RTO.  If so, then the TCP has not
+   just reduced its congestion window following an idle period.  The TCP
+   has been application-limited rather than network-limited for at least
+   an entire RTO interval, but for less than two RTO intervals.  In this
+   case, TCP sets ssthresh to the maximum of 3/4 cwnd and the current
+   value of ssthresh, and reduces its congestion window to
+   (cwnd+W_used)/2.  W_used is then set to zero, and T_prev is set to
+   the current time, so a further reduction will not take place until at
+   least another RTO period has elapsed.  Thus, during an application-
+   limited period the CWV algorithm reduces the congestion window once
+   per RTO.
+
+3.2.  Pseudo-code for reducing the congestion window
+
+   Initially:
+       T_last = tcpnow, T_prev = tcpnow, W_used = 0
+
+   After sending a data segment:
+       If tcpnow - T_last >= RTO
+           (The sender has been idle.)
+           ssthresh =  max(ssthresh, 3*cwnd/4)
+           For i=1  To (tcpnow - T_last)/RTO
+               win =  min(cwnd, receiver's declared max window)
+               cwnd =  max(win/2, MSS)
+           T_prev = tcpnow
+           W_used = 0
+
+       T_last = tcpnow
+
+       If window is full
+           T_prev = tcpnow
+           W_used = 0
+       Else
+           If no more data is available to send
+               W_used =  max(W_used, amount of unacknowledged data)
+               If tcpnow - T_prev >= RTO
+                   (The sender has been application-limited.)
+                   ssthresh =  max(ssthresh, 3*cwnd/4)
+
+
+
+Handley, et al.               Experimental                      [Page 6]
+
+RFC 2861            TCP Congestion Window Validation           June 2000
+
+
+                   win =  min(cwnd, receiver's declared max window)
+                   cwnd = (win + W_used)/2
+                   T_prev = tcpnow
+                   W_used = 0
+
+4. Simulations
+
+   The CWV proposal has been implemented as an option in the network
+   simulator NS [NS].  The simulations in the validation test suite for
+   CWV can be run with the command "./test-all-tcp" in the directory
+   "tcl/test".  The simulations show the use of CWV to reduce the
+   congestion window after a period when the TCP connection was
+   application-limited, and to limit the increase in the congestion
+   window when a transfer is application-limited.  As the simulations
+   illustrate, the use of ssthresh to maintain connection history is a
+   critical part of the Congestion Window Validation algorithm.  [HPF99]
+   discusses these simulations in more detail.
+
+5. Experiments
+
+   We have implemented the CWV mechanism in the TCP implementation in
+   FreeBSD 3.2.  [HPF99] discusses these experiments in more detail.
+
+   The first experiment examines the effects of the Congestion Window
+   Validation mechanisms for limiting cwnd increases during
+   application-limited periods.  The experiment used a real ssh
+   connection through a modem link emulated using Dummynet [Dummynet].
+   The link speed is 30Kb/s and the link has five packet buffers
+   available.  Today most modem banks have more buffering available than
+   this, but the more buffer-limited situation sometimes occurs with
+   older modems.  In the first half of the transfer, the user is typing
+   away over the connection.  About half way through the time, the user
+   lists a moderately large file, which causes a large burst of traffic
+   to be transmitted.
+
+   For the unmodified TCP, every returning ACK during the first part of
+   the transfer results in an increase in cwnd.  As a result, the large
+   burst of data arriving from the application to the transport layer is
+   sent as many back-to-back packets, most of which get lost and
+   subsequently retransmitted.
+
+   For the modified TCP with Congestion Window Validation, the
+   congestion window is not increased when the window is not full, and
+   has been decreased during application-limited periods closer to what
+   the user actually used.  The burst of traffic is now constrained by
+   the congestion window, resulting in a better-behaved flow with
+
+
+
+
+
+Handley, et al.               Experimental                      [Page 7]
+
+RFC 2861            TCP Congestion Window Validation           June 2000
+
+
+   minimal loss.  The end result is that the transfer happens
+   approximately 30% faster than the transfer without CWV, due to
+   avoiding retransmission timeouts.
+
+   The second experiment uses a real ssh connection over a real dialup
+   ppp connection, where the modem bank has much more buffering.  For
+   the unmodified TCP, the initial burst from the large file does not
+   cause loss, but does cause the RTT to increase to approximately 5
+   seconds, where the connection becomes bounded by the receiver's
+   window.
+
+   For the modified TCP with Congestion Window Validation, the flow is
+   much better behaved, and produces no large burst of traffic.  In this
+   case the linear increase for cwnd results in a slow increase in the
+   RTT as the buffer slowly fills.
+
+   For the second experiment, both the modified and the unmodified TCP
+   finish delivering the data at precisely the same time.  This is
+   because the link has been fully utilized in both cases due to the
+   modem buffer being larger than the receiver window.  Clearly a modem
+   buffer of this size is undesirable due to its effect on the RTT of
+   competing flows, but it is necessary with current TCP implementations
+   that produce bursts similar to those shown in the top graph.
+
+6. Conclusions
+
+   This document has presented several TCP algorithms for Congestion
+   Window Validation, to be employed after an idle period or a period in
+   which the sender was application-limited, and before an increase of
+   the congestion window.  The goal of these algorithms is for TCP's
+   congestion window to reflect recent knowledge of the TCP connection
+   about the state of the network path, while at the same time keeping
+   some memory (i.e., in ssthresh) about the earlier state of the path.
+   We believe that these modifications will be of benefit to both the
+   network and to the TCP flows themselves, by preventing unnecessary
+   packet drops due to the TCP sender's failure to update its
+   information (or lack of information) about current network
+   conditions.  Future work will document and investigate the benefit
+   provided by these algorithms, using both simulations and experiments.
+   Additional future work will describe a more complex version of the
+   CWV algorithm for TCP implementations where the sender does not have
+   an accurate estimate of the TCP roundtrip time.
+
+
+
+
+
+
+
+
+
+Handley, et al.               Experimental                      [Page 8]
+
+RFC 2861            TCP Congestion Window Validation           June 2000
+
+
+7. References
+
+   [FF96]     Fall, K., and Floyd, S., Simulation-based Comparisons of
+              Tahoe, Reno, and SACK TCP, Computer Communication Review,
+              V. 26 N. 3, July 1996, pp. 5-21.  URL
+              "http://www.aciri.org/floyd/papers.html".
+
+   [HPF99]    Mark Handley, Jitendra Padhye, Sally Floyd, TCP Congestion
+              Window Validation, UMass CMPSCI Technical Report 99-77,
+              September 1999.  URL "ftp://www-
+              net.cs.umass.edu/pub/Handley99-tcpq-tr-99-77.ps.gz".
+
+   [HTH98]    Amy Hughes, Joe Touch, John Heidemann, "Issues in TCP
+              Slow-Start Restart After Idle", Work in Progress.
+
+   [J88]      Jacobson, V., Congestion Avoidance and Control, Originally
+              from Proceedings of SIGCOMM '88 (Palo Alto, CA, Aug.
+              1988), and revised in 1992.  URL "http://www-
+              nrg.ee.lbl.gov/nrg-papers.html".
+
+   [JKBFL96]  Raj Jain, Shiv Kalyanaraman, Rohit Goyal, Sonia Fahmy, and
+              Fang Lu, Comments on "Use-it or Lose-it", ATM Forum
+              Document Number:  ATM Forum/96-0178, URL
+              "http://www.netlab.ohio-
+              state.edu/~jain/atmf/af_rl5b2.htm".
+
+   [JKGFL95]  R. Jain, S. Kalyanaraman, R. Goyal, S. Fahmy, and F. Lu, A
+              Fix for Source End System Rule 5, AF-TM 95-1660, December
+              1995, URL "http://www.netlab.ohio-
+              state.edu/~jain/atmf/af_rl52.htm".
+
+   [MSML99]   Matt Mathis, Jeff Semke, Jamshid Mahdavi, and Kevin Lahey,
+              The Rate-Halving Algorithm for TCP Congestion Control,
+              June 1999.  URL
+              "http://www.psc.edu/networking/ftp/papers/draft-
+              ratehalving.txt".
+
+   [NS]       NS, the UCB/LBNL/VINT Network Simulator.  URL
+              "http://www-mash.cs.berkeley.edu/ns/".
+
+   [RFC2581]  Allman, M., Paxson, V. and W. Stevens, TCP Congestion
+              Control, RFC 2581, April 1999.
+
+   [VH97]     Vikram Visweswaraiah and John Heidemann. Improving Restart
+              of Idle TCP Connections, Technical Report 97-661,
+              University of Southern California, November, 1997.
+
+
+
+
+
+Handley, et al.               Experimental                      [Page 9]
+
+RFC 2861            TCP Congestion Window Validation           June 2000
+
+
+   [Dummynet] Luigi Rizzo, "Dummynet and Forward Error Correction",
+              Freenix 98, June 1998, New Orleans.  URL
+              "http://info.iet.unipi.it/~luigi/ip_dummynet/".
+
+8. Security Considerations
+
+   General security considerations concerning TCP congestion control are
+   discussed in RFC 2581.  This document describes a algorithm for one
+   aspect of those congestion control procedures, and so the
+   considerations described in RFC 2581 apply to this algorithm also.
+   There are no known additional security concerns for this specific
+   algorithm.
+
+9. Authors' Addresses
+
+   Mark Handley
+   AT&T Center for Internet Research at ICSI (ACIRI)
+
+   Phone: +1 510 666 2946
+   EMail: mjh@aciri.org
+   URL: http://www.aciri.org/mjh/
+
+
+   Jitendra Padhye
+   AT&T Center for Internet Research at ICSI (ACIRI)
+
+   Phone: +1 510 666 2887
+   EMail: padhye@aciri.org
+   URL: http://www-net.cs.umass.edu/~jitu/
+
+
+   Sally Floyd
+   AT&T Center for Internet Research at ICSI (ACIRI)
+
+   Phone: +1 510 666 2989
+   EMail: floyd@aciri.org
+   URL:  http://www.aciri.org/floyd/
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Handley, et al.               Experimental                     [Page 10]
+
+RFC 2861            TCP Congestion Window Validation           June 2000
+
+
+10. Full Copyright Statement
+
+   Copyright (C) The Internet Society (2000).  All Rights Reserved.
+
+   This document and translations of it may be copied and furnished to
+   others, and derivative works that comment on or otherwise explain it
+   or assist in its implementation may be prepared, copied, published
+   and distributed, in whole or in part, without restriction of any
+   kind, provided that the above copyright notice and this paragraph are
+   included on all such copies and derivative works.  However, this
+   document itself may not be modified in any way, such as by removing
+   the copyright notice or references to the Internet Society or other
+   Internet organizations, except as needed for the purpose of
+   developing Internet standards in which case the procedures for
+   copyrights defined in the Internet Standards process must be
+   followed, or as required to translate it into languages other than
+   English.
+
+   The limited permissions granted above are perpetual and will not be
+   revoked by the Internet Society or its successors or assigns.
+
+   This document and the information contained herein is provided on an
+   "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
+   TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
+   BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
+   HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
+   MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
+
+Acknowledgement
+
+   Funding for the RFC Editor function is currently provided by the
+   Internet Society.
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Handley, et al.               Experimental                     [Page 11]
+
--- a/kernel/picotcp/RFC/rfc2873.txt
+++ b/kernel/picotcp/RFC/rfc2873.txt
@ -0,0 +1,451 @@
+
+
+
+
+
+
+Network Working Group                                            X. Xiao
+Request for Comments: 2873                               Global Crossing
+Category: Standards Track                                      A. Hannan
+                                                                    iVMG
+                                                               V. Paxson
+                                                              ACIRI/ICSI
+                                                               E. Crabbe
+                                                   Exodus Communications
+                                                               June 2000
+
+
+              TCP Processing of the IPv4 Precedence Field
+
+Status of this Memo
+
+   This document specifies an Internet standards track protocol for the
+   Internet community, and requests discussion and suggestions for
+   improvements.  Please refer to the current edition of the "Internet
+   Official Protocol Standards" (STD 1) for the standardization state
+   and status of this protocol.  Distribution of this memo is unlimited.
+
+Copyright Notice
+
+   Copyright (C) The Internet Society (2000).  All Rights Reserved.
+
+Abstract
+
+   This memo describes a conflict between TCP [RFC793] and DiffServ
+   [RFC2475] on the use of the three leftmost bits in the TOS octet of
+   an IPv4 header [RFC791]. In a network that contains DiffServ-capable
+   nodes, such a conflict can cause failures in establishing TCP
+   connections or can cause some established TCP connections to be reset
+   undesirably. This memo proposes a modification to TCP for resolving
+   the conflict.
+
+   Because the IPv6 [RFC2460] traffic class octet does not have any
+   defined meaning except what is defined in RFC 2474, and in particular
+   does not define precedence or security parameter bits, there is no
+   conflict between TCP and DiffServ on the use of any bits in the IPv6
+   traffic class octet.
+
+1. Introduction
+
+   In TCP, each connection has a set of states associated with it. Such
+   states are reflected by a set of variables stored in the TCP Control
+   Block (TCB) of both ends. Such variables may include the local and
+   remote socket number, precedence of the connection, security level
+
+
+
+
+Xiao, et al.                Standards Track                     [Page 1]
+
+RFC 2873           TCP and the IPv4 Precedence Field           June 2000
+
+
+   and compartment, etc.  Both ends must agree on the setting of the
+   precedence and security parameters in order to establish a connection
+   and keep it open.
+
+   There is no field in the TCP header that indicates the precedence of
+   a segment. Instead, the precedence field in the header of the IP
+   packet is used as the indication.  The security level and compartment
+   are likewise carried in the IP header, but as IP options rather than
+   a fixed header field.  Because of this difference, the problem with
+   precedence discussed in this memo does not apply to them.
+
+   TCP requires that the precedence (and security parameters) of a
+   connection must remain unchanged during the lifetime of the
+   connection. Therefore, for an established TCP connection with
+   precedence, the receipt of a segment with different precedence
+   indicates an error. The connection must be reset [RFC793, pp. 36, 37,
+   40, 66, 67, 71].
+
+   With the advent of DiffServ, intermediate nodes may modify the
+   Differentiated Services Codepoint (DSCP) [RFC2474] of the IP header
+   to indicate the desired Per-hop Behavior (PHB) [RFC2475, RFC2597,
+   RFC2598]. The DSCP includes the three bits formerly known as the
+   precedence field.  Because any modification to those three bits will
+   be considered illegal by endpoints that are precedence-aware, they
+   may cause failures in establishing connections, or may cause
+   established connections to be reset.
+
+2. Terminology
+
+   Segment: the unit of data that TCP sends to IP
+
+   Precedence Field: the three leftmost bits in the TOS octet of an IPv4
+   header. Note that in DiffServ, these three bits may or may not be
+   used to denote the precedence of the IP packet. There is no
+   precedence field in the traffic class octet in IPv6.
+
+   TOS Field: bits 3-6 in the TOS octet of IPv4 header [RFC 1349].
+
+   MBZ field: Must Be Zero
+
+   The structure of the TOS octet is depicted below:
+
+                   0     1     2     3     4     5     6     7
+                +-----+-----+-----+-----+-----+-----+-----+-----+
+                |   PRECEDENCE    |          TOS          | MBZ |
+                +-----+-----+-----+-----+-----+-----+-----+-----+
+
+
+
+
+
+Xiao, et al.                Standards Track                     [Page 2]
+
+RFC 2873           TCP and the IPv4 Precedence Field           June 2000
+
+
+   DS Field: the TOS octet of an IPv4 header is renamed the
+   Differentiated Services (DS) Field by DiffServ.
+
+   The structure of the DS field is depicted below:
+
+                  0   1   2   3   4   5   6   7
+                +---+---+---+---+---+---+---+---+
+                |         DSCP          |  CU   |
+                +---+---+---+---+---+---+---+---+
+
+   DSCP: Differentiated Service Code Point, the leftmost 6 bits in the
+   DS field.
+
+   CU:   currently unused.
+
+   Per-hop Behavior (PHB): a description of the externally observable
+   forwarding treatment applied at a differentiated services-compliant
+   node to a behavior aggregate.
+
+3. Problem Description
+
+   The manipulation of the DSCP to achieve the desired PHB by DiffServ-
+   capable nodes may conflict with TCP's use of the precedence field.
+   This conflict can potentially cause problems for TCP implementations
+   that conform to RFC 793.  First, page 36 of RFC 793 states:
+
+       If the connection is in any non-synchronized state (LISTEN, SYN-
+       SENT, SYN-RECEIVED), and the incoming segment acknowledges
+       something not yet sent (the segment carries an unacceptable ACK),
+       or if an incoming segment has a security level or compartment
+       which does not exactly match the level and compartment requested
+       for the connection, a reset is sent. If our SYN has not been
+       acknowledged and the precedence level of the incoming segment is
+       higher than the precedence level requested then either raise the
+       local precedence level (if allowed by the user and the system) or
+       send a reset; or if the precedence level of the incoming segment
+       is lower than the precedence level requested then continue as if
+       the precedence matched exactly (if the remote TCP cannot raise
+       the precedence level to match ours this will be detected in the
+       next segment it sends, and the connection will be terminated
+       then). If our SYN has been acknowledged (perhaps in this incoming
+       segment) the precedence level of the incoming segment must match
+       the local precedence level exactly, if it does not a reset must
+       be sent.
+
+   This leads to Problem #1:  For a precedence-aware TCP module, if
+   during TCP's synchronization process, the precedence fields of the
+   SYN and/or ACK packets are modified by the intermediate nodes,
+
+
+
+Xiao, et al.                Standards Track                     [Page 3]
+
+RFC 2873           TCP and the IPv4 Precedence Field           June 2000
+
+
+   resulting in the received ACK packet having a different precedence
+   from the precedence picked by this TCP module, the TCP connection
+   cannot be established, even if both modules actually agree on an
+   identical precedence for the connection.
+
+   Then, on page 37, RFC 793 states:
+
+       If the connection is in a synchronized state (ESTABLISHED, FIN-
+       WAIT-1, FIN-WAIT-2, CLOSE-WAIT, CLOSING, LAST-ACK, TIME-WAIT),
+       security level, or compartment, or precedence which does not
+       exactly match the level, and compartment, and precedence
+       requested for the connection, a reset is sent and connection goes
+       to the CLOSED state.
+
+   This leads to Problem #2:  For a precedence-aware TCP module, if the
+   precedence field of a received segment from an established TCP
+   connection has been changed en route by the intermediate nodes so as
+   to be different from the precedence specified during the connection
+   setup, the TCP connection will be reset.
+
+   Each of problems #1 and #2 has a mirroring problem. They cause TCP
+   connections that must be reset according to RFC 793 not to be reset.
+
+   Problem #3:  A TCP connection may be established between two TCP
+   modules that pick different precedence, because the precedence fields
+   of the SYN and ACK packets are modified by intermediate nodes,
+   resulting in both modules thinking that they are in agreement for the
+   precedence of the connection.
+
+   Problem #4:  A TCP connection has been established normally by two
+   TCP modules that pick the same precedence. But in the middle of the
+   data transmission, one of the TCP modules changes the precedence of
+   its segments. According to RFC 793, the TCP connection must be reset.
+   In a DiffServ-capable environment, if the precedence of the segments
+   is altered by intermediate nodes such that it retains the expected
+   value when arriving at the other TCP module, the connection will not
+   be reset.
+
+4. Proposed Modification to TCP
+
+   The proposed modification to TCP is that TCP must ignore the
+   precedence of all received segments. More specifically:
+
+   (1) In TCP's synchronization process, the TCP modules at both ends
+   must ignore the precedence fields of the SYN and SYN ACK packets. The
+   TCP connection will be established if all the conditions specified by
+   RFC 793 are satisfied except the precedence of the connection.
+
+
+
+
+Xiao, et al.                Standards Track                     [Page 4]
+
+RFC 2873           TCP and the IPv4 Precedence Field           June 2000
+
+
+   (2) After a connection is established, each end sends segments with
+   its desired precedence. The precedence picked by one end of the TCP
+   connection may be the same or may be different from the precedence
+   picked by the other end (because precedence is ignored during
+   connection setup time). The precedence fields may be changed by the
+   intermediate nodes too. In either case, the precedence of the
+   received packets will be ignored by the other end. The TCP connection
+   will not be reset in either case.
+
+   Problems #1 and #2 are solved by this proposed modification. Problems
+   #3 and #4 become non-issues because TCP must ignore the precedence.
+   In a DiffServ-capable environment, the two cases described in
+   problems #3 and #4 should be allowed.
+
+5. Security Considerations
+
+   A TCP implementation that terminates a connection upon receipt of any
+   segment with an incorrect precedence field, regardless of the
+   correctness of the sequence numbers in the segment's header, poses a
+   serious denial-of-service threat, as all an attacker must do to
+   terminate a connection is guess the port numbers and then send two
+   segments with different precedence values; one of them is certain to
+   terminate the connection.  Accordingly, the change to TCP processing
+   proposed in this memo would yield a significant gain in terms of that
+   TCP implementation's resilience.
+
+   On the other hand, the stricter processing rules of RFC 793 in
+   principle make TCP spoofing attacks more difficult, as the attacker
+   must not only guess the victim TCP's initial sequence number, but
+   also its precedence setting.
+
+   Finally, the security issues of each PHB group are addressed in the
+   PHB group's specification [RFC2597, RFC2598].
+
+6. Acknowledgments
+
+   Our thanks to Al Smith for his careful review and comments.
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Xiao, et al.                Standards Track                     [Page 5]
+
+RFC 2873           TCP and the IPv4 Precedence Field           June 2000
+
+
+7. References
+
+   [RFC791]  Postel, J., "Internet Protocol", STD 5, RFC 791, September
+             1981.
+
+   [RFC793]  Postel, J., "Transmission Control Protocol", STD 7, RFC
+             793, September 1981.
+
+   [RFC1349] Almquist, P., "Type of Service in the Internet Protocol
+             Suite", RFC 1349, July 1992.
+
+   [RFC2460] Deering, S. and R. Hinden, "Internet Protocol, Version 6
+             (IPv6) Specification", RFC 2460, December 1998.
+
+   [RFC2474] Nichols, K., Blake, S., Baker, F. and D. Black, "Definition
+             of the Differentiated Services Field (DS Field) in the IPv4
+             and IPv6 Headers", RFC 2474, December 1998.
+
+   [RFC2475] Blake, S., Black, D., Carlson, M., Davies, E., Wang, Z. and
+             W.  Weiss, "An Architecture for Differentiated Services",
+             RFC 2475, December 1998.
+
+   [RFC2597] Heinanen, J., Baker, F., Weiss, W. and J. Wroclawski,
+             "Assured Forwarding PHB Group", RFC 2587, June 1999.
+
+   [RFC2598] Jacobson, V., Nichols, K. and K. Poduri, "An Expedited
+             Forwarding PHB", RFC 2598, June 1999.
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Xiao, et al.                Standards Track                     [Page 6]
+
+RFC 2873           TCP and the IPv4 Precedence Field           June 2000
+
+
+8. Authors' Addresses
+
+   Xipeng Xiao
+   Global Crossing
+   141 Caspian Court
+   Sunnyvale, CA 94089
+   USA
+
+   Phone: +1 408-543-4801
+   EMail: xipeng@gblx.net
+
+
+   Alan Hannan
+   iVMG, Inc.
+   112 Falkirk Court
+   Sunnyvale, CA 94087
+   USA
+
+   Phone: +1 408-749-7084
+   EMail: alan@ivmg.net
+
+
+   Edward Crabbe
+   Exodus Communications
+   2650 San Tomas Expressway
+   Santa Clara, CA 95051
+   USA
+
+   Phone: +1 408-346-1544
+   EMail: edc@explosive.net
+
+
+   Vern Paxson
+   ACIRI/ICSI
+   1947 Center Street
+   Suite 600
+   Berkeley, CA 94704-1198
+   USA
+
+   Phone: +1 510-666-2882
+   EMail: vern@aciri.org
+
+
+
+
+
+
+
+
+
+
+Xiao, et al.                Standards Track                     [Page 7]
+
+RFC 2873           TCP and the IPv4 Precedence Field           June 2000
+
+
+9.  Full Copyright Statement
+
+   Copyright (C) The Internet Society (2000).  All Rights Reserved.
+
+   This document and translations of it may be copied and furnished to
+   others, and derivative works that comment on or otherwise explain it
+   or assist in its implementation may be prepared, copied, published
+   and distributed, in whole or in part, without restriction of any
+   kind, provided that the above copyright notice and this paragraph are
+   included on all such copies and derivative works.  However, this
+   document itself may not be modified in any way, such as by removing
+   the copyright notice or references to the Internet Society or other
+   Internet organizations, except as needed for the purpose of
+   developing Internet standards in which case the procedures for
+   copyrights defined in the Internet Standards process must be
+   followed, or as required to translate it into languages other than
+   English.
+
+   The limited permissions granted above are perpetual and will not be
+   revoked by the Internet Society or its successors or assigns.
+
+   This document and the information contained herein is provided on an
+   "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
+   TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
+   BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
+   HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
+   MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
+
+Acknowledgement
+
+   Funding for the RFC Editor function is currently provided by the
+   Internet Society.
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Xiao, et al.                Standards Track                     [Page 8]
+
--- a/kernel/picotcp/RFC/rfc2883.txt
+++ b/kernel/picotcp/RFC/rfc2883.txt
@ -0,0 +1,955 @@
+
+
+
+
+
+
+Network Working Group                                           S. Floyd
+Request for Comments: 2883                                         ACIRI
+Category: Standards Track                                     J. Mahdavi
+                                                                  Novell
+                                                               M. Mathis
+                                        Pittsburgh Supercomputing Center
+                                                             M. Podolsky
+                                                             UC Berkeley
+                                                               July 2000
+
+
+  An Extension to the Selective Acknowledgement (SACK) Option for TCP
+
+Status of this Memo
+
+   This document specifies an Internet standards track protocol for the
+   Internet community, and requests discussion and suggestions for
+   improvements.  Please refer to the current edition of the "Internet
+   Official Protocol Standards" (STD 1) for the standardization state
+   and status of this protocol.  Distribution of this memo is unlimited.
+
+Copyright Notice
+
+   Copyright (C) The Internet Society (2000).  All Rights Reserved.
+
+Abstract
+
+   This note defines an extension of the Selective Acknowledgement
+   (SACK) Option [RFC2018] for TCP.  RFC 2018 specified the use of the
+   SACK option for acknowledging out-of-sequence data not covered by
+   TCP's cumulative acknowledgement field.  This note extends RFC 2018
+   by specifying the use of the SACK option for acknowledging duplicate
+   packets.  This note suggests that when duplicate packets are
+   received, the first block of the SACK option field can be used to
+   report the sequence numbers of the packet that triggered the
+   acknowledgement.  This extension to the SACK option allows the TCP
+   sender to infer the order of packets received at the receiver,
+   allowing the sender to infer when it has unnecessarily retransmitted
+   a packet.  A TCP sender could then use this information for more
+   robust operation in an environment of reordered packets [BPS99], ACK
+   loss, packet replication, and/or early retransmit timeouts.
+
+1.  Conventions and Acronyms
+
+   The keywords MUST, MUST NOT, REQUIRED, SHALL, SHALL NOT, SHOULD,
+   SHOULD NOT, RECOMMENDED, MAY, and OPTIONAL, when they appear in this
+   document, are to be interpreted as described in [B97].
+
+
+
+
+Floyd, et al.               Standards Track                     [Page 1]
+
+RFC 2883                     SACK Extension                    July 2000
+
+
+2. Introduction
+
+   The Selective Acknowledgement (SACK) option defined in RFC 2018 is
+   used by the TCP data receiver to acknowledge non-contiguous blocks of
+   data not covered by the Cumulative Acknowledgement field.  However,
+   RFC 2018 does not specify the use of the SACK option when duplicate
+   segments are received.  This note specifies the use of the SACK
+   option when acknowledging the receipt of a duplicate packet [F99].
+   We use the term D-SACK (for duplicate-SACK) to refer to a SACK block
+   that reports a duplicate segment.
+
+   This document does not make any changes to TCP's use of the
+   cumulative acknowledgement field, or to the TCP receiver's decision
+   of *when* to send an acknowledgement packet.  This document only
+   concerns the contents of the SACK option when an acknowledgement is
+   sent.
+
+   This extension is compatible with current implementations of the SACK
+   option in TCP.  That is, if one of the TCP end-nodes does not
+   implement this D-SACK extension and the other TCP end-node does, we
+   believe that this use of the D-SACK extension by one of the end nodes
+   will not introduce problems.
+
+   The use of D-SACK does not require separate negotiation between a TCP
+   sender and receiver that have already negotiated SACK capability.
+   The absence of separate negotiation for D-SACK means that the TCP
+   receiver could send D-SACK blocks when the TCP sender does not
+   understand this extension to SACK.  In this case, the TCP sender will
+   simply discard any D-SACK blocks, and process the other SACK blocks
+   in the SACK option field as it normally would.
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Floyd, et al.               Standards Track                     [Page 2]
+
+RFC 2883                     SACK Extension                    July 2000
+
+
+3. The Sack Option Format as defined in RFC 2018
+
+   The SACK option as defined in RFC 2018 is as follows:
+
+                            +--------+--------+
+                            | Kind=5 | Length |
+          +--------+--------+--------+--------+
+          |      Left Edge of 1st Block       |
+          +--------+--------+--------+--------+
+          |      Right Edge of 1st Block      |
+          +--------+--------+--------+--------+
+          |                                   |
+          /            . . .                  /
+          |                                   |
+          +--------+--------+--------+--------+
+          |      Left Edge of nth Block       |
+          +--------+--------+--------+--------+
+          |      Right Edge of nth Block      |
+          +--------+--------+--------+--------+
+
+   The Selective Acknowledgement (SACK) option in the TCP header
+   contains a number of SACK blocks, where each block specifies the left
+   and right edge of a block of data received at the TCP receiver.  In
+   particular, a block represents a contiguous sequence space of data
+   received and queued at the receiver, where the "left edge" of the
+   block is the first sequence number of the block, and the "right edge"
+   is the sequence number immediately following the last sequence number
+   of the block.
+
+   RFC 2018 implies that the first SACK block specify the segment that
+   triggered the acknowledgement.  From RFC 2018, when the data receiver
+   chooses to send a SACK option, "the first SACK block ... MUST specify
+   the contiguous block of data containing the segment which triggered
+   this ACK, unless that segment advanced the Acknowledgment Number
+   field in the header."
+
+   However, RFC 2018 does not address the use of the SACK option when
+   acknowledging a duplicate segment.  For example, RFC 2018 specifies
+   that "each block represents received bytes of data that are
+   contiguous and isolated".  RFC 2018 further specifies that "if sent
+   at all, SACK options SHOULD be included in all ACKs which do not ACK
+   the highest sequence number in the data receiver's queue."  RFC 2018
+   does not specify the use of the SACK option when a duplicate segment
+   is received, and the cumulative acknowledgement field in the ACK
+   acknowledges all of the data in the data receiver's queue.
+
+
+
+
+
+
+Floyd, et al.               Standards Track                     [Page 3]
+
+RFC 2883                     SACK Extension                    July 2000
+
+
+4. Use of the SACK option for reporting a duplicate segment
+
+   This section specifies the use of SACK blocks when the SACK option is
+   used in reporting a duplicate segment.  When D-SACK is used, the
+   first block of the SACK option should be a D-SACK block specifying
+   the sequence numbers for the duplicate segment that triggers the
+   acknowledgement.  If the duplicate segment is part of a larger block
+   of non-contiguous data in the receiver's data queue, then the
+   following SACK block should be used to specify this larger block.
+   Additional SACK blocks can be used to specify additional non-
+   contiguous blocks of data, as specified in RFC 2018.
+
+   The guidelines for reporting duplicate segments are summarized below:
+
+   (1) A D-SACK block is only used to report a duplicate contiguous
+   sequence of data received by the receiver in the most recent packet.
+
+   (2) Each duplicate contiguous sequence of data received is reported
+   in at most one D-SACK block.  (I.e., the receiver sends two identical
+   D-SACK blocks in subsequent packets only if the receiver receives two
+   duplicate segments.)
+
+   (3) The left edge of the D-SACK block specifies the first sequence
+   number of the duplicate contiguous sequence, and the right edge of
+   the D-SACK block specifies the sequence number immediately following
+   the last sequence in the duplicate contiguous sequence.
+
+   (4) If the D-SACK block reports a duplicate contiguous sequence from
+   a (possibly larger) block of data in the receiver's data queue above
+   the cumulative acknowledgement, then the second SACK block in that
+   SACK option should specify that (possibly larger) block of data.
+
+   (5) Following the SACK blocks described above for reporting duplicate
+   segments, additional SACK blocks can be used for reporting additional
+   blocks of data, as specified in RFC 2018.
+
+   Note that because each duplicate segment is reported in only one ACK
+   packet, information about that duplicate segment will be lost if that
+   ACK packet is dropped in the network.
+
+4.1  Reporting Full Duplicate Segments
+
+   We illustrate these guidelines with three examples.  In each example,
+   we assume that the data receiver has first received eight segments of
+   500 bytes each, and has sent an acknowledgement with the cumulative
+   acknowledgement field set to 4000 (assuming the first sequence number
+   is zero).  The D-SACK block is underlined in each example.
+
+
+
+
+Floyd, et al.               Standards Track                     [Page 4]
+
+RFC 2883                     SACK Extension                    July 2000
+
+
+4.1.1.  Example 1: Reporting a duplicate segment.
+
+   Because several ACK packets are lost, the data sender retransmits
+   packet 3000-3499, and the data receiver subsequently receives a
+   duplicate segment with sequence numbers 3000-3499.  The receiver
+   sends an acknowledgement with the cumulative acknowledgement field
+   set to 4000, and the first, D-SACK block specifying sequence numbers
+   3000-3500.
+
+        Transmitted    Received    ACK Sent
+        Segment        Segment     (Including SACK Blocks)
+
+        3000-3499      3000-3499   3500 (ACK dropped)
+        3500-3999      3500-3999   4000 (ACK dropped)
+        3000-3499      3000-3499   4000, SACK=3000-3500
+                                              ---------
+4.1.2.  Example 2:  Reporting an out-of-order segment and a duplicate
+        segment.
+
+   Following a lost data packet, the receiver receives an out-of-order
+   data segment, which triggers the SACK option as specified in  RFC
+   2018.  Because of several lost ACK packets, the sender then
+   retransmits a data packet.  The receiver receives the duplicate
+   packet, and reports it in the first, D-SACK block:
+
+        Transmitted    Received    ACK Sent
+        Segment        Segment     (Including SACK Blocks)
+
+        3000-3499      3000-3499   3500 (ACK dropped)
+        3500-3999      3500-3999   4000 (ACK dropped)
+        4000-4499      (data packet dropped)
+        4500-4999      4500-4999   4000, SACK=4500-5000 (ACK dropped)
+        3000-3499      3000-3499   4000, SACK=3000-3500, 4500-5000
+                                                 ---------
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Floyd, et al.               Standards Track                     [Page 5]
+
+RFC 2883                     SACK Extension                    July 2000
+
+
+4.1.3.  Example 3:  Reporting a duplicate of an out-of-order segment.
+
+   Because of a lost data packet, the receiver receives two out-of-order
+   segments.  The receiver next receives a duplicate segment for one of
+   these out-of-order segments:
+
+        Transmitted    Received    ACK Sent
+        Segment        Segment     (Including SACK Blocks)
+
+        3500-3999      3500-3999   4000
+        4000-4499      (data packet dropped)
+        4500-4999      4500-4999   4000, SACK=4500-5000
+        5000-5499      5000-5499   4000, SACK=4500-5500
+                       (duplicated packet)
+                       5000-5499   4000, SACK=5000-5500, 4500-5500
+                                              ---------
+4.2.  Reporting Partial Duplicate Segments
+
+   It may be possible that a sender transmits a packet that includes one
+   or more duplicate sub-segments--that is, only part but not all of the
+   transmitted packet has already arrived at the receiver.  This can
+   occur when the size of the sender's transmitted segments increases,
+   which can occur when the PMTU increases in the middle of a TCP
+   session, for example.  The guidelines in Section 4 above apply to
+   reporting partial as well as full duplicate segments.  This section
+   gives examples of these guidelines when reporting partial duplicate
+   segments.
+
+   When the SACK option is used for reporting partial duplicate
+   segments, the first D-SACK block reports the first duplicate sub-
+   segment.  If the data packet being acknowledged contains multiple
+   partial duplicate sub-segments, then only the first such duplicate
+   sub-segment is reported in the SACK option.  We illustrate this with
+   the examples below.
+
+4.2.1.  Example 4:  Reporting a single duplicate subsegment.
+
+   The sender increases the packet size from 500 bytes to 1000 bytes.
+   The receiver subsequently receives a 1000-byte packet containing one
+   500-byte subsegment that has already been received and one which has
+   not.  The receiver reports only the already received subsegment using
+   a single D-SACK block.
+
+
+
+
+
+
+
+
+
+Floyd, et al.               Standards Track                     [Page 6]
+
+RFC 2883                     SACK Extension                    July 2000
+
+
+        Transmitted    Received    ACK Sent
+        Segment        Segment     (Including SACK Blocks)
+
+        500-999        500-999     1000
+        1000-1499      (delayed)
+        1500-1999      (data packet dropped)
+        2000-2499      2000-2499   1000, SACK=2000-2500
+        1000-2000      1000-1499   1500, SACK=2000-2500
+                       1000-2000   2500, SACK=1000-1500
+                                              ---------
+
+4.2.2.  Example 5:  Two non-contiguous duplicate subsegments covered by
+        the cumulative acknowledgement.
+
+   After the sender increases its packet size from 500 bytes to 1500
+   bytes, the receiver receives a packet containing two non-contiguous
+   duplicate 500-byte subsegments which are less than the cumulative
+   acknowledgement field.  The receiver reports the first such duplicate
+   segment in a single D-SACK block.
+
+         Transmitted    Received    ACK Sent
+         Segment        Segment     (Including SACK Blocks)
+
+         500-999        500-999     1000
+         1000-1499      (delayed)
+         1500-1999      (data packet dropped)
+         2000-2499      (delayed)
+         2500-2999      (data packet dropped)
+         3000-3499      3000-3499   1000, SACK=3000-3500
+         1000-2499      1000-1499   1500, SACK=3000-3500
+                        2000-2499   1500, SACK=2000-2500, 3000-3500
+                        1000-2499   2500, SACK=1000-1500, 3000-3500
+                                               ---------
+
+4.2.3.  Example 6:  Two non-contiguous duplicate subsegments not covered
+        by the cumulative acknowledgement.
+
+   This example is similar to Example 5, except that after the sender
+   increases the packet size, the receiver receives a packet containing
+   two non-contiguous duplicate subsegments which are above the
+   cumulative acknowledgement field, rather than below.  The first, D-
+   SACK block reports the first duplicate subsegment, and the second,
+   SACK block reports the larger block of non-contiguous data that it
+   belongs to.
+
+
+
+
+
+
+
+Floyd, et al.               Standards Track                     [Page 7]
+
+RFC 2883                     SACK Extension                    July 2000
+
+
+         Transmitted    Received    ACK Sent
+         Segment        Segment     (Including SACK Blocks)
+
+         500-999        500-999     1000
+         1000-1499      (data packet dropped)
+         1500-1999      (delayed)
+         2000-2499      (data packet dropped)
+         2500-2999      (delayed)
+         3000-3499      (data packet dropped)
+         3500-3999      3500-3999   1000, SACK=3500-4000
+         1000-1499      (data packet dropped)
+         1500-2999      1500-1999   1000, SACK=1500-2000, 3500-4000
+                        2000-2499   1000, SACK=2000-2500, 1500-2000,
+                                            3500-4000
+                        1500-2999   1000, SACK=1500-2000, 1500-3000,
+                                               ---------
+                                            3500-4000
+
+4.3.  Interaction Between D-SACK and PAWS
+
+   RFC 1323 [RFC1323] specifies an algorithm for Protection Against
+   Wrapped Sequence Numbers (PAWS).  PAWS gives a method for
+   distinguishing between sequence numbers for new data, and sequence
+   numbers from a previous cycle through the sequence number space.
+   Duplicate segments might be detected by PAWS as belonging to a
+   previous cycle through the sequence number space.
+
+   RFC 1323 specifies that for such packets, the receiver should do the
+   following:
+
+      Send an acknowledgement in reply as specified in RFC 793 page 69,
+      and drop the segment.
+
+   Since PAWS still requires sending an ACK, there is no harmful
+   interaction between PAWS and the use of D-SACK.  The D-SACK block can
+   be included in the SACK option of the ACK, as outlined in Section 4,
+   independently of the use of PAWS by the TCP receiver, and
+   independently of the determination by PAWS of the validity or
+   invalidity of the data segment.
+
+   TCP senders receiving D-SACK blocks should be aware that a segment
+   reported as a duplicate segment could possibly have been from a prior
+   cycle through the sequence number space.  This is independent of the
+   use of PAWS by the TCP data receiver.  We do not anticipate that this
+   will present significant problems for senders using D-SACK
+   information.
+
+
+
+
+
+Floyd, et al.               Standards Track                     [Page 8]
+
+RFC 2883                     SACK Extension                    July 2000
+
+
+5. Detection of Duplicate Packets
+
+   This extension to the SACK option enables the receiver to accurately
+   report the reception of duplicate data.  Because each receipt of a
+   duplicate packet is reported in only one ACK packet, the loss of a
+   single ACK can prevent this information from reaching the sender.  In
+   addition, we note that the sender can not necessarily trust the
+   receiver to send it accurate information [SCWA99].
+
+   In order for the sender to check that the first (D)SACK block of an
+   acknowledgement in fact acknowledges duplicate data, the sender
+   should compare the sequence space in the first SACK block to the
+   cumulative ACK which is carried IN THE SAME PACKET.  If the SACK
+   sequence space is less than this cumulative ACK, it is an indication
+   that the segment identified by the SACK block has been received more
+   than once by the receiver.  An implementation MUST NOT compare the
+   sequence space in the SACK block to the TCP state variable snd.una
+   (which carries the total cumulative ACK), as this may result in the
+   wrong conclusion if ACK packets are reordered.
+
+   If the sequence space in the first SACK block is greater than the
+   cumulative ACK, then the sender next compares the sequence space in
+   the first SACK block with the sequence space in the second SACK
+   block, if there is one.  This comparison can determine if the first
+   SACK block is reporting duplicate data that lies above the cumulative
+   ACK.
+
+   TCP implementations which follow RFC 2581 [RFC2581] could see
+   duplicate packets in each of the following four situations.  This
+   document does not specify what action a TCP implementation should
+   take in these cases.  The extension to the SACK option simply enables
+   the sender to detect each of these cases.  Note that these four
+   conditions are not an exhaustive list of possible cases for duplicate
+   packets, but are representative of the most common/likely cases.
+   Subsequent documents will describe experimental proposals for sender
+   responses to the detection of unnecessary retransmits due to
+   reordering, lost ACKS, or early retransmit timeouts.
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Floyd, et al.               Standards Track                     [Page 9]
+
+RFC 2883                     SACK Extension                    July 2000
+
+
+5.1.  Replication by the network
+
+   If a packet is replicated in the network, this extension to the SACK
+   option can identify this.  For example:
+
+             Transmitted    Received    ACK Sent
+             Segment        Segment     (Including SACK Blocks)
+
+             500-999        500-999     1000
+             1000-1499      1000-1499   1500
+                            (replicated)
+                            1000-1499   1500, SACK=1000-1500
+                                                   ---------
+
+   In this case, the second packet was replicated in the network.  An
+   ACK containing a D-SACK block which is lower than its ACK field and
+   is not identical to a previously retransmitted segment is indicative
+   of a replication by the network.
+
+   WITHOUT D-SACK:
+
+   If D-SACK was not used and the last ACK was piggybacked on a data
+   packet, the sender would not know that a packet had been replicated
+   in the network.  If D-SACK was not used and neither of the last two
+   ACKs was piggybacked on a data packet, then the sender could
+   reasonably infer that either some data packet *or* the final ACK
+   packet had been replicated in the network.  The receipt of the D-SACK
+   packet gives the sender positive knowledge that this data packet was
+   replicated in the network (assuming that the receiver is not lying).
+
+   RESEARCH ISSUES:
+
+   The current SACK option already allows the sender to identify
+   duplicate ACKs that do not acknowledge new data, but the D-SACK
+   option gives the sender a stronger basis for inferring that a
+   duplicate ACK does not acknowledge new data.  The knowledge that a
+   duplicate ACK does not acknowledge new data allows the sender to
+   refrain from using that duplicate ACKs to infer packet loss (e.g.,
+   Fast Retransmit) or to send more data (e.g., Fast Recovery).
+
+5.2.  False retransmit due to reordering
+
+   If packets are reordered in the network such that a segment arrives
+   more than 3 packets out of order, TCP's Fast Retransmit algorithm
+   will retransmit the out-of-order packet.  An example of this is shown
+   below:
+
+
+
+
+
+Floyd, et al.               Standards Track                    [Page 10]
+
+RFC 2883                     SACK Extension                    July 2000
+
+
+             Transmitted    Received    ACK Sent
+             Segment        Segment     (Including SACK Blocks)
+
+             500-999        500-999     1000
+             1000-1499      (delayed)
+             1500-1999      1500-1999   1000, SACK=1500-2000
+             2000-2499      2000-2499   1000, SACK=1500-2500
+             2500-2999      2500-2999   1000, SACK=1500-3000
+             1000-1499      1000-1499   3000
+                            1000-1499   3000, SACK=1000-1500
+                                                   ---------
+
+   In this case, an ACK containing a SACK block which is lower than its
+   ACK field and identical to a previously retransmitted segment is
+   indicative of a significant reordering followed by a false
+   (unnecessary) retransmission.
+
+   WITHOUT D-SACK:
+
+   With the use of D-SACK illustrated above, the sender knows that
+   either the first transmission of segment 1000-1499 was delayed in the
+   network, or the first transmission of segment 1000-1499 was dropped
+   and the second transmission of segment 1000-1499 was duplicated.
+   Given that no other segments have been duplicated in the network,
+   this second option can be considered unlikely.
+
+   Without the use of D-SACK, the sender would only know that either the
+   first transmission of segment 1000-1499 was delayed in the network,
+   or that either one of the data segments or the final ACK was
+   duplicated in the network.  Thus, the use of D-SACK allows the sender
+   to more reliably infer that the first transmission of segment
+   1000-1499 was not dropped.
+
+   [AP99], [L99], and [LK00] note that the sender could unambiguously
+   detect an unnecessary retransmit with the use of the timestamp
+   option.  [LK00] proposes a timestamp-based algorithm that minimizes
+   the penalty for an unnecessary retransmit.  [AP99] proposes a
+   heuristic for detecting an unnecessary retransmit in an environment
+   with neither timestamps nor SACK.  [L99] also proposes a two-bit
+   field as an alternate to the timestamp option for unambiguously
+   marking the first three retransmissions of a packet.  A similar idea
+   was proposed in [ISO8073].
+
+   RESEARCH ISSUES:
+
+   The use of D-SACK allows the sender to detect some cases (e.g., when
+   no ACK packets have been lost) when a a Fast Retransmit was due to
+   packet reordering instead of packet loss.  This allows the TCP sender
+
+
+
+Floyd, et al.               Standards Track                    [Page 11]
+
+RFC 2883                     SACK Extension                    July 2000
+
+
+   to adjust the duplicate acknowledgment threshold, to prevent such
+   unnecessary Fast Retransmits in the future.  Coupled with this, when
+   the sender determines, after the fact, that it has made an
+   unnecessary window reduction, the sender has the option of "undoing"
+   that reduction in the congestion window by resetting ssthresh to the
+   value of the old congestion window, and slow-starting until the
+   congestion window has reached that point.
+
+   Any proposal for "undoing" a reduction in the congestion window would
+   have to address the possibility that the TCP receiver could be lying
+   in its reports of received packets [SCWA99].
+
+5.3.  Retransmit Timeout Due to ACK Loss
+
+   If an entire window of ACKs is lost, a timeout will result.  An
+   example of this is given below:
+
+             Transmitted    Received    ACK Sent
+             Segment        Segment     (Including SACK Blocks)
+
+             500-999        500-999     1000 (ACK dropped)
+             1000-1499      1000-1499   1500 (ACK dropped)
+             1500-1999      1500-1999   2000 (ACK dropped)
+             2000-2499      2000-2499   2500 (ACK dropped)
+             (timeout)
+             500-999        500-999     2500, SACK=500-1000
+                                                   --------
+
+   In this case, all of the ACKs are dropped, resulting in a timeout.
+   This condition can be identified because the first ACK received
+   following the timeout carries a D-SACK block indicating duplicate
+   data was received.
+
+   WITHOUT D-SACK:
+
+   Without the use of D-SACK, the sender in this case would be unable to
+   decide that no data packets has been dropped.
+
+   RESEARCH ISSUES:
+
+   For a TCP that implements some form of ACK congestion control
+   [BPK97], this ability to distinguish between dropped data packets and
+   dropped ACK packets would be particularly useful.  In this case, the
+   connection could implement congestion control for the return (ACK)
+   path independently from the congestion control on the forward (data)
+   path.
+
+
+
+
+
+Floyd, et al.               Standards Track                    [Page 12]
+
+RFC 2883                     SACK Extension                    July 2000
+
+
+5.4.  Early Retransmit Timeout
+
+   If the sender's RTO is too short, an early retransmission timeout can
+   occur when no packets have in fact been dropped in the network.  An
+   example of this is given below:
+
+             Transmitted    Received    ACK Sent
+             Segment        Segment     (Including SACK Blocks)
+
+             500-999        (delayed)
+             1000-1499      (delayed)
+             1500-1999      (delayed)
+             2000-2499      (delayed)
+             (timeout)
+             500-999        (delayed)
+                            500-999     1000
+             1000-1499      (delayed)
+                            1000-1499   1500
+             ...
+                            1500-1999   2000
+                            2000-2499   2500
+                            500-999     2500, SACK=500-1000
+                                                   --------
+                            1000-1499   2500, SACK=1000-1500
+                                                   ---------
+                            ...
+
+   In this case, the first packet is retransmitted following the
+   timeout.  Subsequently, the original window of packets arrives at the
+   receiver, resulting in ACKs for these segments.  Following this, the
+   retransmissions of these segments arrive, resulting in ACKs carrying
+   SACK blocks which identify the duplicate segments.
+
+   This can be identified as an early retransmission timeout because the
+   ACK for byte 1000 is received after the timeout with no SACK
+   information, followed by an ACK which carries SACK information (500-
+   999) indicating that the retransmitted segment had already been
+   received.
+
+   WITHOUT D-SACK:
+
+   If D-SACK was not used and one of the duplicate ACKs was piggybacked
+   on a data packet, the sender would not know how many duplicate
+   packets had been received.  If D-SACK was not used and none of the
+   duplicate ACKs were piggybacked on a data packet, then the sender
+   would have sent N duplicate packets, for some N, and received N
+   duplicate ACKs.  In this case, the sender could reasonably infer that
+
+
+
+
+Floyd, et al.               Standards Track                    [Page 13]
+
+RFC 2883                     SACK Extension                    July 2000
+
+
+   some data or ACK packet had been replicated in the network, or that
+   an early retransmission timeout had occurred (or that the receiver is
+   lying).
+
+   RESEARCH ISSUES:
+
+   After the sender determines that an unnecessary (i.e., early)
+   retransmit timeout has occurred, the sender could adjust parameters
+   for setting the RTO, to prevent more unnecessary retransmit timeouts.
+   Coupled with this, when the sender determines, after the fact, that
+   it has made an unnecessary window reduction, the sender has the
+   option of "undoing" that reduction in the congestion window.
+
+6. Security Considerations
+
+   This document neither strengthens nor weakens TCP's current security
+   properties.
+
+7. Acknowledgements
+
+   We would like to thank Mark Handley, Reiner Ludwig, and Venkat
+   Padmanabhan for conversations on these issues, and to thank Mark
+   Allman for helpful feedback on this document.
+
+8. References
+
+   [AP99]    Mark Allman and Vern Paxson, On Estimating End-to-End
+             Network Path Properties, SIGCOMM 99, August 1999.  URL
+             "http://www.acm.org/sigcomm/sigcomm99/papers/session7-
+             3.html".
+
+   [BPS99]   J.C.R. Bennett, C. Partridge, and N. Shectman, Packet
+             Reordering is Not Pathological Network Behavior, IEEE/ACM
+             Transactions on Networking, Vol. 7, No. 6, December 1999,
+             pp. 789-798.
+
+   [BPK97]   Hari Balakrishnan, Venkata Padmanabhan, and Randy H. Katz,
+             The Effects of Asymmetry on TCP Performance, Third ACM/IEEE
+             Mobicom Conference, Budapest, Hungary, Sep 1997.  URL
+             "http://www.cs.berkeley.edu/~padmanab/
+             index.html#Publications".
+
+   [F99]     Floyd, S., Re: TCP and out-of-order delivery, Message ID
+             <199902030027.QAA06775@owl.ee.lbl.gov> to the end-to-end-
+             interest mailing list, February 1999.  URL
+             "http://www.aciri.org/floyd/notes/TCP_Feb99.email".
+
+
+
+
+
+Floyd, et al.               Standards Track                    [Page 14]
+
+RFC 2883                     SACK Extension                    July 2000
+
+
+   [ISO8073] ISO/IEC, Information-processing systems - Open Systems
+             Interconnection - Connection Oriented Transport Protocol
+             Specification, Internation Standard ISO/IEC 8073, December
+             1988.
+
+   [L99]     Reiner Ludwig, A Case for Flow Adaptive Wireless links,
+             Technical Report UCB//CSD-99-1053, May 1999.  URL
+             "http://iceberg.cs.berkeley.edu/papers/Ludwig-
+             FlowAdaptive/".
+
+   [LK00]    Reiner Ludwig and Randy H. Katz, The Eifel Algorithm:
+             Making TCP Robust Against Spurious Retransmissions, SIGCOMM
+             Computer Communication Review, V. 30, N. 1, January 2000.
+             URL "http://www.acm.org/sigcomm/ccr/archive/ccr-toc/ccr-
+             toc-2000.html".
+
+   [RFC1323] Jacobson, V., Braden, R. and D. Borman, "TCP Extensions for
+             High Performance", RFC 1323, May 1992.
+
+   [RFC2018] Mathis, M., Mahdavi, J., Floyd, S. and  A. Romanow, "TCP
+             Selective Acknowledgement Options", RFC 2018, April 1996.
+
+   [RFC2581] Allman, M., Paxson,V. and W. Stevens, "TCP Congestion
+             Control", RFC 2581, April 1999.
+
+   [SCWA99]  Stefan Savage, Neal Cardwell, David Wetherall, Tom
+             Anderson, TCP Congestion Control with a Misbehaving
+             Receiver, ACM Computer Communications Review, pp. 71-78, V.
+             29, N. 5, October, 1999.  URL
+             "http://www.acm.org/sigcomm/ccr/archive/ccr-toc/ccr-toc-
+             99.html".
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Floyd, et al.               Standards Track                    [Page 15]
+
+RFC 2883                     SACK Extension                    July 2000
+
+
+Authors' Addresses
+
+   Sally Floyd
+   AT&T Center for Internet Research at ICSI (ACIRI)
+
+   Phone: +1 510-666-6989
+   EMail: floyd@aciri.org
+   URL:  http://www.aciri.org/floyd/
+
+
+   Jamshid Mahdavi
+   Novell
+
+   Phone: 1-408-967-3806
+   EMail: mahdavi@novell.com
+
+
+   Matt Mathis
+   Pittsburgh Supercomputing Center
+
+   Phone: 412 268-3319
+   EMail: mathis@psc.edu
+   URL: http://www.psc.edu/~mathis/
+
+
+   Matthew Podolsky
+   UC Berkeley Electrical Engineering & Computer Science Dept.
+
+   Phone: 510-649-8914
+   EMail: podolsky@eecs.berkeley.edu
+   URL: http://www.eecs.berkeley.edu/~podolsky
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Floyd, et al.               Standards Track                    [Page 16]
+
+RFC 2883                     SACK Extension                    July 2000
+
+
+Full Copyright Statement
+
+   Copyright (C) The Internet Society (2000).  All Rights Reserved.
+
+   This document and translations of it may be copied and furnished to
+   others, and derivative works that comment on or otherwise explain it
+   or assist in its implementation may be prepared, copied, published
+   and distributed, in whole or in part, without restriction of any
+   kind, provided that the above copyright notice and this paragraph are
+   included on all such copies and derivative works.  However, this
+   document itself may not be modified in any way, such as by removing
+   the copyright notice or references to the Internet Society or other
+   Internet organizations, except as needed for the purpose of
+   developing Internet standards in which case the procedures for
+   copyrights defined in the Internet Standards process must be
+   followed, or as required to translate it into languages other than
+   English.
+
+   The limited permissions granted above are perpetual and will not be
+   revoked by the Internet Society or its successors or assigns.
+
+   This document and the information contained herein is provided on an
+   "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
+   TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
+   BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
+   HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
+   MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
+
+Acknowledgement
+
+   Funding for the RFC Editor function is currently provided by the
+   Internet Society.
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Floyd, et al.               Standards Track                    [Page 17]
+
--- a/kernel/picotcp/RFC/rfc2884.txt
+++ b/kernel/picotcp/RFC/rfc2884.txt
--- a/kernel/picotcp/RFC/rfc2914.txt
+++ b/kernel/picotcp/RFC/rfc2914.txt
@ -0,0 +1,955 @@
+
+
+
+
+
+
+Network Working Group                                         S. Floyd
+Request for Comments: 2914                                       ACIRI
+BCP: 41                                                 September 2000
+Category: Best Current Practice
+
+
+                     Congestion Control Principles
+
+Status of this Memo
+
+   This document specifies an Internet Best Current Practices for the
+   Internet Community, and requests discussion and suggestions for
+   improvements.  Distribution of this memo is unlimited.
+
+Copyright Notice
+
+   Copyright (C) The Internet Society (2000).  All Rights Reserved.
+
+Abstract
+
+   The goal of this document is to explain the need for congestion
+   control in the Internet, and to discuss what constitutes correct
+   congestion control.  One specific goal is to illustrate the dangers
+   of neglecting to apply proper congestion control.  A second goal is
+   to discuss the role of the IETF in standardizing new congestion
+   control protocols.
+
+1.  Introduction
+
+   This document draws heavily from earlier RFCs, in some cases
+   reproducing entire sections of the text of earlier documents
+   [RFC2309, RFC2357].  We have also borrowed heavily from earlier
+   publications addressing the need for end-to-end congestion control
+   [FF99].
+
+2.  Current standards on congestion control
+
+   IETF standards concerning end-to-end congestion control focus either
+   on specific protocols (e.g., TCP [RFC2581], reliable multicast
+   protocols [RFC2357]) or on the syntax and semantics of communications
+   between the end nodes and routers about congestion information (e.g.,
+   Explicit Congestion Notification [RFC2481]) or desired quality-of-
+   service (diff-serv)).  The role of end-to-end congestion control is
+   also discussed in an Informational RFC on "Recommendations on Queue
+   Management and Congestion Avoidance in the Internet" [RFC2309].  RFC
+   2309 recommends the deployment of active queue management mechanisms
+   in routers, and the continuation of design efforts towards mechanisms
+
+
+
+
+Floyd, ed.               Best Current Practice                  [Page 1]
+
+RFC 2914             Congestion Control Principles        September 2000
+
+
+   in routers to deal with flows that are unresponsive to congestion
+   notification.  We freely borrow from RFC 2309 some of their general
+   discussion of end-to-end congestion control.
+
+   In contrast to the RFCs discussed above, this document is a more
+   general discussion of the principles of congestion control.  One of
+   the keys to the success of the Internet has been the congestion
+   avoidance mechanisms of TCP.  While TCP is still the dominant
+   transport protocol in the Internet, it is not ubiquitous, and there
+   are an increasing number of applications that, for one reason or
+   another, choose not to use TCP.  Such traffic includes not only
+   multicast traffic, but unicast traffic such as streaming multimedia
+   that does not require reliability; and traffic such as DNS or routing
+   messages that consist of short transfers deemed critical to the
+   operation of the network.  Much of this traffic does not use any form
+   of either bandwidth reservations or end-to-end congestion control.
+   The continued use of end-to-end congestion control by best-effort
+   traffic is critical for maintaining the stability of the Internet.
+
+   This document also discusses the general role of the IETF in the
+   standardization of new congestion control protocols.
+
+   The discussion of congestion control principles for differentiated
+   services or integrated services is not addressed in this document.
+   Some categories of integrated or differentiated services include a
+   guarantee by the network of end-to-end bandwidth, and as such do not
+   require end-to-end congestion control mechanisms.
+
+3.  The development of end-to-end congestion control.
+
+3.1.  Preventing congestion collapse.
+
+   The Internet protocol architecture is based on a connectionless end-
+   to-end packet service using the IP protocol.  The advantages of its
+   connectionless design, flexibility and robustness, have been amply
+   demonstrated.  However, these advantages are not without cost:
+   careful design is required to provide good service under heavy load.
+   In fact, lack of attention to the dynamics of packet forwarding can
+   result in severe service degradation or "Internet meltdown".  This
+   phenomenon was first observed during the early growth phase of the
+   Internet of the mid 1980s [RFC896], and is technically called
+   "congestion collapse".
+
+   The original specification of TCP [RFC793] included window-based flow
+   control as a means for the receiver to govern the amount of data sent
+   by the sender.  This flow control was used to prevent overflow of the
+   receiver's data buffer space available for that connection.  [RFC793]
+
+
+
+
+Floyd, ed.               Best Current Practice                  [Page 2]
+
+RFC 2914             Congestion Control Principles        September 2000
+
+
+   reported that segments could be lost due either to errors or to
+   network congestion, but did not include dynamic adjustment of the
+   flow-control window in response to congestion.
+
+   The original fix for Internet meltdown was provided by Van Jacobson.
+   Beginning in 1986, Jacobson developed the congestion avoidance
+   mechanisms that are now required in TCP implementations [Jacobson88,
+   RFC 2581].  These mechanisms operate in the hosts to cause TCP
+   connections to "back off" during congestion.  We say that TCP flows
+   are "responsive" to congestion signals (i.e., dropped packets) from
+   the network.  It is these TCP congestion avoidance algorithms that
+   prevent the congestion collapse of today's Internet.
+
+   However, that is not the end of the story.  Considerable research has
+   been done on Internet dynamics since 1988, and the Internet has
+   grown.  It has become clear that the TCP congestion avoidance
+   mechanisms [RFC2581], while necessary and powerful, are not
+   sufficient to provide good service in all circumstances.  In addition
+   to the development of new congestion control mechanisms [RFC2357],
+   router-based mechanisms are in development that complement the
+   endpoint congestion avoidance mechanisms.
+
+   A major issue that still needs to be addressed is the potential for
+   future congestion collapse of the Internet due to flows that do not
+   use responsible end-to-end congestion control.  RFC 896 [RFC896]
+   suggested in 1984 that gateways should detect and `squelch'
+   misbehaving hosts: "Failure to  respond  to  an  ICMP  Source  Quench
+   message, though,  should be regarded as grounds for action by a
+   gateway to disconnect a host.  Detecting such failure is non-trivial
+   but  is a worthwhile area for further research."  Current papers
+   still propose that routers detect and penalize flows that are not
+   employing acceptable end-to-end congestion control [FF99].
+
+3.2.  Fairness
+
+   In addition to a concern about congestion collapse, there is a
+   concern about `fairness' for best-effort traffic.  Because TCP "backs
+   off" during congestion, a large number of TCP connections can share a
+   single, congested link in such a way that bandwidth is shared
+   reasonably equitably among similarly situated flows.  The equitable
+   sharing of bandwidth among flows depends on the fact that all flows
+   are running compatible congestion control algorithms.  For TCP, this
+   means congestion control algorithms conformant with the current TCP
+   specification [RFC793, RFC1122, RFC2581].
+
+   The issue of fairness among competing flows has become increasingly
+   important for several reasons.  First, using window scaling
+   [RFC1323], individual TCPs can use high bandwidth even over high-
+
+
+
+Floyd, ed.               Best Current Practice                  [Page 3]
+
+RFC 2914             Congestion Control Principles        September 2000
+
+
+   propagation-delay paths.  Second, with the growth of the web,
+   Internet users increasingly want high-bandwidth and low-delay
+   communications, rather than the leisurely transfer of a long file in
+   the background.  The growth of best-effort traffic that does not use
+   TCP underscores this concern about fairness between competing best-
+   effort traffic in times of congestion.
+
+   The popularity of the Internet has caused a proliferation in the
+   number of TCP implementations.  Some of these may fail to implement
+   the TCP congestion avoidance mechanisms correctly because of poor
+   implementation [RFC2525].  Others may deliberately be implemented
+   with congestion avoidance algorithms that are more aggressive in
+   their use of bandwidth than other TCP implementations; this would
+   allow a vendor to claim to have a "faster TCP".  The logical
+   consequence of such implementations would be a spiral of increasingly
+   aggressive TCP implementations, or increasingly aggressive transport
+   protocols, leading back to the point where there is effectively no
+   congestion avoidance and the Internet is chronically congested.
+
+   There is a well-known way to achieve more aggressive performance
+   without even changing the transport protocol, by changing the level
+   of granularity: open multiple connections to the same place, as has
+   been done in the past by some Web browsers.  Thus, instead of a
+   spiral of increasingly aggressive transport protocols, we would
+   instead have a spiral of increasingly aggressive web browsers, or
+   increasingly aggressive applications.
+
+   This raises the issue of the appropriate granularity of a "flow",
+   where we define a `flow' as the level of granularity appropriate for
+   the application of both fairness and congestion control.  From RFC
+   2309:  "There are a few `natural' answers: 1) a TCP or UDP connection
+   (source address/port, destination address/port); 2) a
+   source/destination host pair; 3) a given source host or a given
+   destination host.  We would guess that the source/destination host
+   pair gives the most appropriate granularity in many circumstances.
+   The granularity of flows for congestion management is, at least in
+   part, a policy question that needs to be addressed in the wider IETF
+   community."
+
+   Again borrowing from RFC 2309, we use the term "TCP-compatible" for a
+   flow that behaves under congestion like a flow produced by a
+   conformant TCP.  A TCP-compatible flow is responsive to congestion
+   notification, and in steady-state uses no more bandwidth than a
+   conformant TCP running under comparable conditions (drop rate, RTT,
+   MTU, etc.)
+
+
+
+
+
+
+Floyd, ed.               Best Current Practice                  [Page 4]
+
+RFC 2914             Congestion Control Principles        September 2000
+
+
+   It is convenient to divide flows into three classes: (1) TCP-
+   compatible flows, (2) unresponsive flows, i.e., flows that do not
+   slow down when congestion occurs, and (3) flows that are responsive
+   but are not TCP-compatible.  The last two classes contain more
+   aggressive flows that pose significant threats to Internet
+   performance, as we discuss below.
+
+   In addition to steady-state fairness, the fairness of the initial
+   slow-start is also a concern.  One concern is the transient effect on
+   other flows of a flow with an overly-aggressive slow-start procedure.
+   Slow-start performance is particularly important for the many flows
+   that are short-lived, and only have a small amount of data to
+   transfer.
+
+3.3.  Optimizing performance regarding throughput, delay, and loss.
+
+   In addition to the prevention of congestion collapse and concerns
+   about fairness, a third reason for a flow to use end-to-end
+   congestion control can be to optimize its own performance regarding
+   throughput, delay, and loss.  In some circumstances, for example in
+   environments of high statistical multiplexing, the delay and loss
+   rate experienced by a flow are largely independent of its own sending
+   rate.  However, in environments with lower levels of statistical
+   multiplexing or with per-flow scheduling, the delay and loss rate
+   experienced by a flow is in part a function of the flow's own sending
+   rate.  Thus, a flow can use end-to-end congestion control to limit
+   the delay or loss experienced by its own packets.  We would note,
+   however, that in an environment like the current best-effort
+   Internet, concerns regarding congestion collapse and fairness with
+   competing flows limit the range of congestion control behaviors
+   available to a flow.
+
+4.  The role of the standards process
+
+   The standardization of a transport protocol includes not only
+   standardization of aspects of the protocol that could affect
+   interoperability (e.g., information exchanged by the end-nodes), but
+   also standardization of mechanisms deemed critical to performance
+   (e.g., in TCP, reduction of the congestion window in response to a
+   packet drop).  At the same time, implementation-specific details and
+   other aspects of the transport protocol that do not affect
+   interoperability and do not significantly interfere with performance
+   do not require standardization.  Areas of TCP that do not require
+   standardization include the details of TCP's Fast Recovery procedure
+   after a Fast Retransmit [RFC2582].  The appendix uses examples from
+   TCP to discuss in more detail the role of the standards process in
+   the development of congestion control.
+
+
+
+
+Floyd, ed.               Best Current Practice                  [Page 5]
+
+RFC 2914             Congestion Control Principles        September 2000
+
+
+4.1.  The development of new transport protocols.
+
+   In addition to addressing the danger of congestion collapse, the
+   standardization process for new transport protocols takes care to
+   avoid a congestion control `arms race' among competing protocols.  As
+   an example, in RFC 2357 [RFC2357] the TSV Area Directors and their
+   Directorate outline criteria for the publication as RFCs of
+   Internet-Drafts on reliable multicast transport protocols.  From
+   [RFC2357]:  "A particular concern for the IETF is the impact of
+   reliable multicast traffic on other traffic in the Internet in times
+   of congestion, in particular the effect of reliable multicast traffic
+   on competing TCP traffic....  The challenge to the IETF is to
+   encourage research and implementations of reliable multicast, and to
+   enable the needs of applications for reliable multicast to be met as
+   expeditiously as possible, while at the same time protecting the
+   Internet from the congestion disaster or collapse that could result
+   from the widespread use of applications with inappropriate reliable
+   multicast mechanisms."
+
+   The list of technical criteria that must be addressed by RFCs on new
+   reliable multicast transport protocols include the following:  "Is
+   there a congestion control mechanism? How well does it perform? When
+   does it fail?  Note that congestion control mechanisms that operate
+   on the network more aggressively than TCP will face a great burden of
+   proof that they don't threaten network stability."
+
+   It is reasonable to expect that these concerns about the effect of
+   new transport protocols on competing traffic will apply not only to
+   reliable multicast protocols, but to unreliable unicast, reliable
+   unicast, and unreliable multicast traffic as well.
+
+4.2.  Application-level issues that affect congestion control
+
+   The specific issue of a browser opening multiple connections to the
+   same destination has been addressed by RFC 2616 [RFC2616], which
+   states in Section 8.1.4 that "Clients that use persistent connections
+   SHOULD limit the number of simultaneous connections that they
+   maintain to a given server.  A single-user client SHOULD NOT maintain
+   more than 2 connections with any server or proxy."
+
+4.3.  New developments in the standards process
+
+   The most obvious developments in the IETF that could affect the
+   evolution of congestion control are the development of integrated and
+   differentiated services [RFC2212, RFC2475] and of Explicit Congestion
+   Notification (ECN) [RFC2481].  However, other less dramatic
+   developments are likely to affect congestion control as well.
+
+
+
+
+Floyd, ed.               Best Current Practice                  [Page 6]
+
+RFC 2914             Congestion Control Principles        September 2000
+
+
+   One such effort is that to construct Endpoint Congestion Management
+   [BS00], to enable multiple concurrent flows from a sender to the same
+   receiver to share congestion control state.  By allowing multiple
+   connections to the same destination to act as one flow in terms of
+   end-to-end congestion control, a Congestion Manager could allow
+   individual connections slow-starting to take advantage of previous
+   information about the congestion state of the end-to-end path.
+   Further, the use of a Congestion Manager could remove the congestion
+   control dangers of multiple flows being opened between the same
+   source/destination pair, and could perhaps be used to allow a browser
+   to open many simultaneous connections to the same destination.
+
+5.  A description of congestion collapse
+
+   This section discusses congestion collapse from undelivered packets
+   in some detail, and shows how unresponsive flows could contribute to
+   congestion collapse in the Internet.  This section draws heavily on
+   material from [FF99].
+
+   Informally, congestion collapse occurs when an increase in the
+   network load results in a decrease in the useful work done by the
+   network.  As discussed in Section 3, congestion collapse was first
+   reported in the mid 1980s [RFC896], and was largely due to TCP
+   connections unnecessarily retransmitting packets that were either in
+   transit or had already been received at the receiver.  We call the
+   congestion collapse that results from the unnecessary retransmission
+   of packets classical congestion collapse.  Classical congestion
+   collapse is a stable condition that can result in throughput that is
+   a small fraction of normal [RFC896].  Problems with classical
+   congestion collapse have generally been corrected by the timer
+   improvements and congestion control mechanisms in modern
+   implementations of TCP [Jacobson88].
+
+   A second form of potential congestion collapse occurs due to
+   undelivered packets.  Congestion collapse from undelivered packets
+   arises when bandwidth is wasted by delivering packets through the
+   network that are dropped before reaching their ultimate destination.
+   This is probably the largest unresolved danger with respect to
+   congestion collapse in the Internet today.  Different scenarios can
+   result in different degrees of congestion collapse, in terms of the
+   fraction of the congested links' bandwidth used for productive work.
+   The danger of congestion collapse from undelivered packets is due
+   primarily to the increasing deployment of open-loop applications not
+   using end-to-end congestion control.  Even more destructive would be
+   best-effort applications that *increase* their sending rate in
+   response to an increased packet drop rate (e.g., automatically using
+   an increased level of FEC).
+
+
+
+
+Floyd, ed.               Best Current Practice                  [Page 7]
+
+RFC 2914             Congestion Control Principles        September 2000
+
+
+   Table 1 gives the results from a scenario with congestion collapse
+   from undelivered packets, where scarce bandwidth is wasted by packets
+   that never reach their destination.  The simulation uses a scenario
+   with three TCP flows and one UDP flow competing over a congested 1.5
+   Mbps link.  The access links for all nodes are 10 Mbps, except that
+   the access link to the receiver of the UDP flow is 128 Kbps, only 9%
+   of the bandwidth of shared link.  When the UDP source rate exceeds
+   128 Kbps, most of the UDP packets will be dropped at the output port
+   to that final link.
+
+        UDP
+        Arrival   UDP       TCP       Total
+        Rate      Goodput   Goodput   Goodput
+       --------------------------------------
+         0.7       0.7      98.5      99.2
+         1.8       1.7      97.3      99.1
+         2.6       2.6      96.0      98.6
+         5.3       5.2      92.7      97.9
+         8.8       8.4      87.1      95.5
+        10.5       8.4      84.8      93.2
+        13.1       8.4      81.4      89.8
+        17.5       8.4      77.3      85.7
+        26.3       8.4      64.5      72.8
+        52.6       8.4      38.1      46.4
+        58.4       8.4      32.8      41.2
+        65.7       8.4      28.5      36.8
+        75.1       8.4      19.7      28.1
+        87.6       8.4      11.3      19.7
+       105.2       8.4       3.4      11.8
+       131.5       8.4       2.4      10.7
+
+   Table 1.  A simulation with three TCP flows and one UDP flow.
+
+   Table 1 shows the UDP arrival rate from the sender, the UDP goodput
+   (defined as the bandwidth delivered to the receiver), the TCP goodput
+   (as delivered to the TCP receivers), and the aggregate goodput on the
+   congested 1.5 Mbps link.  Each rate is given as a fraction of the
+   bandwidth of the congested link.  As the UDP source rate increases,
+   the TCP goodput decreases roughly linearly, and the UDP goodput is
+   nearly constant.  Thus, as the UDP flow increases its offered load,
+   its only effect is to hurt the TCP and aggregate goodput.  On the
+   congested link, the UDP flow ultimately `wastes' the bandwidth that
+   could have been used by the TCP flow, and reduces the goodput in the
+   network as a whole down to a small fraction of the bandwidth of the
+   congested link.
+
+
+
+
+
+
+Floyd, ed.               Best Current Practice                  [Page 8]
+
+RFC 2914             Congestion Control Principles        September 2000
+
+
+   The simulations in Table 1 illustrate both unfairness and congestion
+   collapse.  As [FF99] discusses, compatible congestion control is not
+   the only way to provide fairness; per-flow scheduling at the
+   congested routers is an alternative mechanism at the routers that
+   guarantees fairness.  However, as discussed in [FF99], per-flow
+   scheduling can not be relied upon to prevent congestion collapse.
+
+   There are only two alternatives for eliminating the danger of
+   congestion collapse from undelivered packets.  The first alternative
+   for preventing congestion collapse from undelivered packets is the
+   use of effective end-to-end congestion control by the end nodes.
+   More specifically, the requirement would be that a flow avoid a
+   pattern of significant losses at links downstream from the first
+   congested link on the path.  (Here, we would consider any link a
+   `congested link' if any flow is using bandwidth that would otherwise
+   be used by other traffic on the link.) Given that an end-node is
+   generally unable to distinguish between a path with one congested
+   link and a path with multiple congested links, the most reliable way
+   for a flow to avoid a pattern of significant losses at a downstream
+   congested link is for the flow to use end-to-end congestion control,
+   and reduce its sending rate in the presence of loss.
+
+   A second alternative for preventing congestion collapse from
+   undelivered packets would be a guarantee by the network that packets
+   accepted at a congested link in the network will be delivered all the
+   way to the receiver [RFC2212, RFC2475].  We note that the choice
+   between the first alternative of end-to-end congestion control and
+   the second alternative of end-to-end bandwidth guarantees does not
+   have to be an either/or decision; congestion collapse can be
+   prevented by the use of effective end-to-end congestion by some of
+   the traffic, and the use of end-to-end bandwidth guarantees from the
+   network for the rest of the traffic.
+
+6.  Forms of end-to-end congestion control
+
+   This document has discussed concerns about congestion collapse and
+   about fairness with TCP for new forms of congestion control.  This
+   does not mean, however, that concerns about congestion collapse and
+   fairness with TCP necessitate that all best-effort traffic deploy
+   congestion control based on TCP's Additive-Increase Multiplicative-
+   Decrease (AIMD) algorithm of reducing the sending rate in half in
+   response to each packet drop.  This section separately discusses the
+   implications of these two concerns of congestion collapse and
+   fairness with TCP.
+
+
+
+
+
+
+
+Floyd, ed.               Best Current Practice                  [Page 9]
+
+RFC 2914             Congestion Control Principles        September 2000
+
+
+6.1.  End-to-end congestion control for avoiding congestion collapse.
+
+   The avoidance of congestion collapse from undelivered packets
+   requires that flows avoid a scenario of a high sending rate, multiple
+   congested links, and a persistent high packet drop rate at the
+   downstream link.  Because congestion collapse from undelivered
+   packets consists of packets that waste valuable bandwidth only to be
+   dropped downstream, this form of congestion collapse is not possible
+   in an environment where each flow traverses only one congested link,
+   or where only a small number of packets are dropped at links
+   downstream of the first congested link.  Thus, any form of congestion
+   control that successfully avoids a high sending rate in the presence
+   of a high packet drop rate should be sufficient to avoid congestion
+   collapse from undelivered packets.
+
+   We would note that the addition of Explicit Congestion Notification
+   (ECN) to the IP architecture would not, in and of itself, remove the
+   danger of congestion collapse for best-effort traffic.  ECN allows
+   routers to set a bit in packet headers as an indication of congestion
+   to the end-nodes, rather than being forced to rely on packet drops to
+   indicate congestion.  However, with ECN, packet-marking would replace
+   packet-dropping only in times of moderate congestion.  In particular,
+   when congestion is heavy, and a router's buffers overflow, the router
+   has no choice but to drop arriving packets.
+
+6.2.  End-to-end congestion control for fairness with TCP.
+
+   The concern expressed in [RFC2357] about fairness with TCP places a
+   significant though not crippling constraint on the range of viable
+   end-to-end congestion control mechanisms for best-effort traffic.  An
+   environment with per-flow scheduling at all congested links would
+   isolate flows from each other, and eliminate the need for congestion
+   control mechanisms to be TCP-compatible.  An environment with
+   differentiated services, where flows marked as belonging to a certain
+   diff-serv class would be scheduled in isolation from best-effort
+   traffic, could allow the emergence of an entire diff-serv class of
+   traffic where congestion control was not required to be TCP-
+   compatible.  Similarly, a pricing-controlled environment, or a diff-
+   serv class with its own pricing paradigm, could supercede the concern
+   about fairness with TCP.  However, for the current Internet
+   environment, where other best-effort traffic could compete in a FIFO
+   queue with TCP traffic, the absence of fairness with TCP could lead
+   to one flow `starving out' another flow in a time of high congestion,
+   as was illustrated in Table 1 above.
+
+   However, the list of TCP-compatible congestion control procedures is
+   not limited to AIMD with the same increase/ decrease parameters as
+   TCP.  Other TCP-compatible congestion control procedures include
+
+
+
+Floyd, ed.               Best Current Practice                 [Page 10]
+
+RFC 2914             Congestion Control Principles        September 2000
+
+
+   rate-based variants of AIMD; AIMD with different sets of
+   increase/decrease parameters that give the same steady-state
+   behavior; equation-based congestion control where the sender adjusts
+   its sending rate in response to information about the long-term
+   packet drop rate; layered multicast where receivers subscribe and
+   unsubscribe from layered multicast groups; and possibly other forms
+   that we have not yet begun to consider.
+
+7. Acknowledgements
+
+   Much of this document draws directly on previous RFCs addressing
+   end-to-end congestion control.  This attempts to be a summary of
+   ideas that have been discussed for many years, and by many people.
+   In particular, acknowledgement is due to the members of the End-to-
+   End Research Group, the Reliable Multicast Research Group, and the
+   Transport Area Directorate.  This document has also benefited from
+   discussion and feedback from the Transport Area Working Group.
+   Particular thanks are due to Mark Allman for feedback on an earlier
+   version of this document.
+
+8. References
+
+   [BS00]       Balakrishnan H. and S. Seshan, "The Congestion Manager",
+                Work in Progress.
+
+   [DMKM00]     Dawkins, S., Montenegro, G., Kojo, M. and V. Magret,
+                "End-to-end Performance Implications of Slow Links",
+                Work in Progress.
+
+   [FF99]       Floyd, S. and K. Fall, "Promoting the Use of End-to-End
+                Congestion Control in the Internet", IEEE/ACM
+                Transactions on Networking, August 1999.  URL
+                http://www.aciri.org/floyd/end2end-paper.html
+
+   [HPF00]      Handley, M., Padhye, J. and S. Floyd, "TCP Congestion
+                Window Validation", RFC 2861, June 2000.
+
+   [Jacobson88] V. Jacobson, Congestion Avoidance and Control, ACM
+                SIGCOMM '88, August 1988.
+
+   [RFC793]     Postel, J., "Transmission Control Protocol", STD 7, RFC
+                793, September 1981.
+
+   [RFC896]     Nagle, J., "Congestion Control in IP/TCP", RFC 896,
+                January 1984.
+
+   [RFC1122]    Braden, R., Ed., "Requirements for Internet Hosts --
+                Communication Layers", STD 3, RFC 1122, October 1989.
+
+
+
+Floyd, ed.               Best Current Practice                 [Page 11]
+
+RFC 2914             Congestion Control Principles        September 2000
+
+
+   [RFC1323]    Jacobson, V., Braden, R. and D. Borman, "TCP Extensions
+                for High Performance", RFC 1323, May 1992.
+
+   [RFC2119]    Bradner, S., "Key words for use in RFCs to Indicate
+                Requirement Levels", BCP 14, RFC 2119, March 1997.
+
+   [RFC2212]    Shenker, S., Partridge, C. and R. Guerin, "Specification
+                of Guaranteed Quality of Service", RFC 2212, September
+                1997.
+
+   [RFC2309]    Braden, R., Clark, D., Crowcroft, J., Davie, B.,
+                Deering, S., Estrin, D., Floyd, S., Jacobson, V.,
+                Minshall, G., Partridge, C., Peterson, L., Ramakrishnan,
+                K.K., Shenker, S., Wroclawski, J., and L. Zhang,
+                "Recommendations on Queue Management and Congestion
+                Avoidance in the Internet", RFC 2309, April 1998.
+
+   [RFC2357]    Mankin, A., Romanow, A., Bradner, S. and V. Paxson,
+                "IETF Criteria for Evaluating Reliable Multicast
+                Transport and Application Protocols", RFC 2357, June
+                1998.
+
+   [RFC2414]    Allman, M., Floyd, S. and C. Partridge, "Increasing
+                TCP's Initial Window", RFC 2414, September 1998.
+
+   [RFC2475]    Blake, S., Black, D., Carlson, M., Davies, E., Wang, Z.
+                and W.  Weiss, "An Architecture for Differentiated
+                Services", RFC 2475, December 1998.
+
+   [RFC2481]    Ramakrishnan K. and S. Floyd, "A Proposal to add
+                Explicit Congestion Notification (ECN) to IP", RFC 2481,
+                January 1999.
+
+   [RFC2525]    Paxson, V., Allman, M., Dawson, S., Fenner, W., Griner,
+                J., Heavens, I., Lahey, K., Semke, J. and B. Volz,
+                "Known TCP Implementation Problems", RFC 2525, March
+                1999.
+
+   [RFC2581]    Allman, M., Paxson, V. and W. Stevens, "TCP Congestion
+                Control", RFC 2581, April 1999.
+
+   [RFC2582]    Floyd, S. and T. Henderson, "The NewReno Modification to
+                TCP's Fast Recovery Algorithm", RFC 2582, April 1999.
+
+   [RFC2616]    Fielding, R., Gettys, J., Mogul, J., Frystyk, H.,
+                Masinter, L., Leach, P. and T. Berners-Lee, "Hypertext
+                Transfer Protocol -- HTTP/1.1", RFC 2616, June 1999.
+
+
+
+
+Floyd, ed.               Best Current Practice                 [Page 12]
+
+RFC 2914             Congestion Control Principles        September 2000
+
+
+   [SCWA99]     S. Savage, N. Cardwell, D. Wetherall, and T. Anderson,
+                TCP Congestion Control with a Misbehaving Receiver, ACM
+                Computer Communications Review, October 1999.
+
+   [TCPB98]     Hari Balakrishnan, Venkata N. Padmanabhan, Srinivasan
+                Seshan, Mark Stemm, and Randy H. Katz, TCP Behavior of a
+                Busy Internet Server: Analysis and Improvements, IEEE
+                Infocom, March 1998.  Available from:
+                "http://www.cs.berkeley.edu/~hari/papers/infocom98.ps.gz".
+
+   [TCPF98]     Dong Lin and H.T. Kung, TCP Fast Recovery Strategies:
+                Analysis and Improvements, IEEE Infocom, March 1998.
+                Available from:
+                "http://www.eecs.harvard.edu/networking/papers/infocom-
+                tcp-final-198.pdf".
+
+9.  TCP-Specific issues
+
+   In this section we discuss some of the particulars of TCP congestion
+   control, to illustrate a realization of the congestion control
+   principles, including some of the details that arise when
+   incorporating them into a production transport protocol.
+
+9.1.  Slow-start.
+
+   The TCP sender can not open a new connection by sending a large burst
+   of data (e.g., a receiver's advertised window) all at once.  The TCP
+   sender is limited by a small initial value for the congestion window.
+   During slow-start, the TCP sender can increase its sending rate by at
+   most a factor of two in one roundtrip time.  Slow-start ends when
+   congestion is detected, or when the sender's congestion window is
+   greater than the slow-start threshold ssthresh.
+
+   An issue that potentially affects global congestion control, and
+   therefore has been explicitly addressed in the standards process,
+   includes an increase in the value of the initial window
+   [RFC2414,RFC2581].
+
+   Issues that have not been addressed in the standards process, and are
+   generally considered not to require standardization, include such
+   issues as the use (or non-use) of rate-based pacing, and mechanisms
+   for ending slow-start early, before the congestion window reaches
+   ssthresh.  Such mechanisms result in slow-start behavior that is as
+   conservative or more conservative than standard TCP.
+
+
+
+
+
+
+
+Floyd, ed.               Best Current Practice                 [Page 13]
+
+RFC 2914             Congestion Control Principles        September 2000
+
+
+9.2.  Additive Increase, Multiplicative Decrease.
+
+   In the absence of congestion, the TCP sender increases its congestion
+   window by at most one packet per roundtrip time. In response to a
+   congestion indication, the TCP sender decreases its congestion window
+   by half.  (More precisely, the new congestion window is half of the
+   minimum of the congestion window and the receiver's advertised
+   window.)
+
+   An issue that potentially affects global congestion control, and
+   therefore would be likely to be explicitly addressed in the standards
+   process, would include a proposed addition of congestion control for
+   the return stream of `pure acks'.
+
+   An issue that has not been addressed in the standards process, and is
+   generally not considered to require standardization, would be a
+   change to the congestion window to apply as an upper bound on the
+   number of bytes presumed to be in the pipe, instead of applying as a
+   sliding window starting from the cumulative acknowledgement.
+   (Clearly, the receiver's advertised window applies as a sliding
+   window starting from the cumulative acknowledgement field, because
+   packets received above the cumulative acknowledgement field are held
+   in TCP's receive buffer, and have not been delivered to the
+   application.  However, the congestion window applies to the number of
+   packets outstanding in the pipe, and does not necessarily have to
+   include packets that have been received out-of-order by the TCP
+   receiver.)
+
+9.3.  Retransmit timers.
+
+   The TCP sender sets a retransmit timer to infer that a packet has
+   been dropped in the network.  When the retransmit timer expires, the
+   sender infers that a packet has been lost, sets ssthresh to half of
+   the current window, and goes into slow-start, retransmitting the lost
+   packet.  If the retransmit timer expires because no acknowledgement
+   has been received for a retransmitted packet, the retransmit timer is
+   also "backed-off", doubling the value of the next retransmit timeout
+   interval.
+
+   An issue that potentially affects global congestion control, and
+   therefore would be likely to be explicitly addressed in the standards
+   process, might include a modified mechanism for setting the
+   retransmit timer that could significantly increase the number of
+   retransmit timers that expire prematurely, when the acknowledgement
+   has not yet arrived at the sender, but in fact no packets have been
+   dropped.  This could be of concern to the Internet standards process
+
+
+
+
+
+Floyd, ed.               Best Current Practice                 [Page 14]
+
+RFC 2914             Congestion Control Principles        September 2000
+
+
+   because retransmit timers that expire prematurely could lead to an
+   increase in the number of packets unnecessarily transmitted on a
+   congested link.
+
+9.4.  Fast Retransmit and Fast Recovery.
+
+   After seeing three duplicate acknowledgements, the TCP sender infers
+   a packet loss.  The TCP sender sets ssthresh to half of the current
+   window, reduces the congestion window to at most half of the previous
+   window, and retransmits the lost packet.
+
+   An issue that potentially affects global congestion control, and
+   therefore would be likely to be explicitly addressed in the standards
+   process, might include a proposal (if there was one) for inferring a
+   lost packet after only one or two duplicate acknowledgements.  If
+   poorly designed, such a proposal could lead to an increase in the
+   number of packets unnecessarily transmitted on a congested path.
+
+   An issue that has not been addressed in the standards process, and
+   would not be expected to require standardization, would be a proposal
+   to send a "new" or presumed-lost packet in response to a duplicate or
+   partial acknowledgement, if allowed by the congestion window.  An
+   example of this would be sending a new packet in response to a single
+   duplicate acknowledgement, to keep the `ack clock' going in case no
+   further acknowledgements would have arrived.  Such a proposal is an
+   example of a beneficial change that does not involve interoperability
+   and does not affect global congestion control, and that therefore
+   could be implemented by vendors without requiring the intervention of
+   the IETF standards process.  (This issue has in fact been addressed
+   in [DMKM00], which suggests that "researchers may wish to experiment
+   with injecting new traffic into the network when duplicate
+   acknowledgements are being received, as described in [TCPB98] and
+   [TCPF98]."
+
+9.5.  Other aspects of TCP congestion control.
+
+   Other aspects of TCP congestion control that have not been discussed
+   in any of the sections above include TCP's recovery from an idle or
+   application-limited period [HPF00].
+
+10. Security Considerations
+
+   This document has been about the risks associated with congestion
+   control, or with the absence of congestion control.  Section 3.2
+   discusses the potentials for unfairness if competing flows don't use
+   compatible congestion control mechanisms, and Section 5 considers the
+   dangers of congestion collapse if flows don't use end-to-end
+   congestion control.
+
+
+
+Floyd, ed.               Best Current Practice                 [Page 15]
+
+RFC 2914             Congestion Control Principles        September 2000
+
+
+   Because this document does not propose any specific congestion
+   control mechanisms, it is also not necessary to present specific
+   security measures associated with congestion control.  However, we
+   would note that there are a range of security considerations
+   associated with congestion control that should be considered in IETF
+   documents.
+
+   For example, individual congestion control mechanisms should be as
+   robust as possible to the attempts of individual end-nodes to subvert
+   end-to-end congestion control [SCWA99].  This is a particular concern
+   in multicast congestion control, because of the far-reaching
+   distribution of the traffic and the greater opportunities for
+   individual receivers to fail to report congestion.
+
+   RFC 2309 also discussed the potential dangers to the Internet of
+   unresponsive flows, that is, flows that don't reduce their sending
+   rate in the presence of congestion, and describes the need for
+   mechanisms in the network to deal with flows that are unresponsive to
+   congestion notification.  We would note that there is still a need
+   for research, engineering, measurement, and deployment in these
+   areas.
+
+   Because the Internet aggregates very large numbers of flows, the risk
+   to the whole infrastructure of subverting the congestion control of a
+   few individual flows is limited.  Rather, the risk to the
+   infrastructure would come from the widespread deployment of many
+   end-nodes subverting end-to-end congestion control.
+
+AUTHOR'S ADDRESS
+
+   Sally Floyd
+   AT&T Center for Internet Research at ICSI (ACIRI)
+
+   Phone: +1 (510) 642-4274 x189
+   EMail: floyd@aciri.org
+   URL: http://www.aciri.org/floyd/
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Floyd, ed.               Best Current Practice                 [Page 16]
+
+RFC 2914             Congestion Control Principles        September 2000
+
+
+Full Copyright Statement
+
+   Copyright (C) The Internet Society (2000).  All Rights Reserved.
+
+   This document and translations of it may be copied and furnished to
+   others, and derivative works that comment on or otherwise explain it
+   or assist in its implementation may be prepared, copied, published
+   and distributed, in whole or in part, without restriction of any
+   kind, provided that the above copyright notice and this paragraph are
+   included on all such copies and derivative works.  However, this
+   document itself may not be modified in any way, such as by removing
+   the copyright notice or references to the Internet Society or other
+   Internet organizations, except as needed for the purpose of
+   developing Internet standards in which case the procedures for
+   copyrights defined in the Internet Standards process must be
+   followed, or as required to translate it into languages other than
+   English.
+
+   The limited permissions granted above are perpetual and will not be
+   revoked by the Internet Society or its successors or assigns.
+
+   This document and the information contained herein is provided on an
+   "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
+   TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
+   BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
+   HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
+   MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
+
+Acknowledgement
+
+   Funding for the RFC Editor function is currently provided by the
+   Internet Society.
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Floyd, ed.               Best Current Practice                 [Page 17]
+
--- a/kernel/picotcp/RFC/rfc2923.txt
+++ b/kernel/picotcp/RFC/rfc2923.txt
@ -0,0 +1,843 @@
+
+
+
+
+
+
+Network Working Group                                        K. Lahey
+Request for Comments: 2923                            dotRocket, Inc.
+Category: Informational                                September 2000
+
+
+                  TCP Problems with Path MTU Discovery
+
+Status of this Memo
+
+   This memo provides information for the Internet community.  It does
+   not specify an Internet standard of any kind.  Distribution of this
+   memo is unlimited.
+
+Copyright Notice
+
+   Copyright (C) The Internet Society (2000).  All Rights Reserved.
+
+Abstract
+
+   This memo catalogs several known Transmission Control Protocol (TCP)
+   implementation problems dealing with Path Maximum Transmission Unit
+   Discovery (PMTUD), including the long-standing black hole problem,
+   stretch acknowlegements (ACKs) due to confusion between Maximum
+   Segment Size (MSS) and segment size, and MSS advertisement based on
+   PMTU.
+
+1. Introduction
+
+   This memo catalogs several known TCP implementation problems dealing
+   with Path MTU Discovery [RFC1191], including the long-standing black
+   hole problem, stretch ACKs due to confusion between MSS and segment
+   size, and MSS advertisement based on PMTU.  The goal in doing so is
+   to improve conditions in the existing Internet by enhancing the
+   quality of current TCP/IP implementations.
+
+   While Path MTU Discovery (PMTUD) can be used with any upper-layer
+   protocol, it is most commonly used by TCP;  this document does not
+   attempt to treat problems encountered by other upper-layer protocols.
+   Path MTU Discovery for IPv6 [RFC1981] treats only IPv6-dependent
+   issues, but not the TCP issues brought up in this document.
+
+   Each problem is defined as follows:
+
+   Name of Problem
+      The name associated with the problem.  In this memo, the name is
+      given as a subsection heading.
+
+
+
+
+
+Lahey                        Informational                      [Page 1]
+
+RFC 2923          TCP Problems with Path MTU Discovery    September 2000
+
+
+   Classification
+      One or more problem categories for which the problem is
+      classified:  "congestion control", "performance", "reliability",
+      "non-interoperation -- connectivity failure".
+
+   Description
+      A definition of the problem, succinct but including necessary
+      background material.
+
+   Significance
+      A brief summary of the sorts of environments for which the problem
+      is significant.
+
+   Implications
+      Why the problem is viewed as a problem.
+
+   Relevant RFCs
+      The RFCs defining the TCP specification with which the problem
+      conflicts.  These RFCs often qualify behavior using terms such as
+      MUST, SHOULD, MAY, and others written capitalized.  See RFC 2119
+      for the exact interpretation of these terms.
+
+   Trace file demonstrating the problem
+      One or more ASCII trace files demonstrating the problem, if
+      applicable.
+
+   Trace file demonstrating correct behavior
+      One or more examples of how correct behavior appears in a trace,
+      if applicable.
+
+   References
+      References that further discuss the problem.
+
+   How to detect
+      How to test an implementation to see if it exhibits the problem.
+      This discussion may include difficulties and subtleties associated
+      with causing the problem to manifest itself, and with interpreting
+      traces to detect the presence of the problem (if applicable).
+
+   How to fix
+      For known causes of the problem, how to correct the
+      implementation.
+
+
+
+
+
+
+
+
+
+Lahey                        Informational                      [Page 2]
+
+RFC 2923          TCP Problems with Path MTU Discovery    September 2000
+
+
+2. Known implementation problems
+
+2.1.
+
+   Name of Problem
+      Black Hole Detection
+
+   Classification
+      Non-interoperation -- connectivity failure
+
+   Description
+      A host performs Path MTU Discovery by sending out as large a
+      packet as possible, with the Don't Fragment (DF) bit set in the IP
+      header.  If the packet is too large for a router to forward on to
+      a particular link, the router must send an ICMP Destination
+      Unreachable -- Fragmentation Needed message to the source address.
+      The host then adjusts the packet size based on the ICMP message.
+
+      As was pointed out in [RFC1435], routers don't always do this
+      correctly -- many routers fail to send the ICMP messages, for a
+      variety of reasons ranging from kernel bugs to configuration
+      problems.  Firewalls are often misconfigured to suppress all ICMP
+      messages.  IPsec [RFC2401] and IP-in-IP [RFC2003] tunnels
+      shouldn't cause these sorts of problems, if the implementations
+      follow the advice in the appropriate documents.
+
+      PMTUD, as documented in [RFC1191], fails when the appropriate ICMP
+      messages are not received by the originating host.  The upper-
+      layer protocol continues to try to send large packets and, without
+      the ICMP messages, never discovers that it needs to reduce the
+      size of those packets.  Its packets are disappearing into a PMTUD
+      black hole.
+
+   Significance
+      When PMTUD fails due to the lack of ICMP messages, TCP will also
+      completely fail under some conditions.
+
+   Implications
+      This failure is especially difficult to debug, as pings and some
+      interactive TCP connections to the destination host work.  Bulk
+      transfers fail with the first large packet and the connection
+      eventually times out.
+
+      These situations can almost always be blamed on a misconfiguration
+      within the network, which should be corrected.  However it seems
+      inappropriate for some TCP implementations to suffer
+
+
+
+
+
+Lahey                        Informational                      [Page 3]
+
+RFC 2923          TCP Problems with Path MTU Discovery    September 2000
+
+
+      interoperability failures over paths which do not affect other TCP
+      implementations (i.e. those without PMTUD).  This creates a market
+      disincentive for deploying TCP implementation with PMTUD enabled.
+
+   Relevant RFCs
+      RFC 1191 describes Path MTU Discovery.  RFC 1435 provides an early
+      description of these sorts of problems.
+
+   Trace file demonstrating the problem
+      Made using tcpdump [Jacobson89] recording at an intermediate host.
+
+      20:12:11.951321 A > B: S 1748427200:1748427200(0)
+           win 49152 <mss 1460>
+      20:12:11.951829 B > A: S 1001927984:1001927984(0)
+           ack 1748427201 win 16384 <mss 65240>
+      20:12:11.955230 A > B: . ack 1 win 49152 (DF)
+      20:12:11.959099 A > B: . 1:1461(1460) ack 1 win 49152 (DF)
+      20:12:13.139074 A > B: . 1:1461(1460) ack 1 win 49152 (DF)
+      20:12:16.188685 A > B: . 1:1461(1460) ack 1 win 49152 (DF)
+      20:12:22.290483 A > B: . 1:1461(1460) ack 1 win 49152 (DF)
+      20:12:34.491856 A > B: . 1:1461(1460) ack 1 win 49152 (DF)
+      20:12:58.896405 A > B: . 1:1461(1460) ack 1 win 49152 (DF)
+      20:13:47.703184 A > B: . 1:1461(1460) ack 1 win 49152 (DF)
+      20:14:52.780640 A > B: . 1:1461(1460) ack 1 win 49152 (DF)
+      20:15:57.856037 A > B: . 1:1461(1460) ack 1 win 49152 (DF)
+      20:17:02.932431 A > B: . 1:1461(1460) ack 1 win 49152 (DF)
+      20:18:08.009337 A > B: . 1:1461(1460) ack 1 win 49152 (DF)
+      20:19:13.090521 A > B: . 1:1461(1460) ack 1 win 49152 (DF)
+      20:20:18.168066 A > B: . 1:1461(1460) ack 1 win 49152 (DF)
+      20:21:23.242761 A > B: R 1461:1461(0) ack 1 win 49152 (DF)
+
+      The short SYN packet has no trouble traversing the network, due to
+      its small size.  Similarly, ICMP echo packets used to diagnose
+      connectivity problems will succeed.
+
+      Large data packets fail to traverse the network.  Eventually the
+      connection times out.  This can be especially confusing when the
+      application starts out with a very small write, which succeeds,
+      following up with many large writes, which then fail.
+
+   Trace file demonstrating correct behavior
+
+      Made using tcpdump recording at an intermediate host.
+
+      16:48:42.659115 A > B: S 271394446:271394446(0)
+           win 8192 <mss 1460> (DF)
+      16:48:42.672279 B > A: S 2837734676:2837734676(0)
+           ack 271394447 win 16384 <mss 65240>
+
+
+
+Lahey                        Informational                      [Page 4]
+
+RFC 2923          TCP Problems with Path MTU Discovery    September 2000
+
+
+      16:48:42.676890 A > B: . ack 1 win 8760 (DF)
+      16:48:42.870574 A > B: . 1:1461(1460) ack 1 win 8760 (DF)
+      16:48:42.871799 A > B: . 1461:2921(1460) ack 1 win 8760 (DF)
+      16:48:45.786814 A > B: . 1:1461(1460) ack 1 win 8760 (DF)
+      16:48:51.794676 A > B: . 1:1461(1460) ack 1 win 8760 (DF)
+      16:49:03.808912 A > B: . 1:537(536) ack 1 win 8760
+      16:49:04.016476 B > A: . ack 537 win 16384
+      16:49:04.021245 A > B: . 537:1073(536) ack 1 win 8760
+      16:49:04.021697 A > B: . 1073:1609(536) ack 1 win 8760
+      16:49:04.120694 B > A: . ack 1609 win 16384
+      16:49:04.126142 A > B: . 1609:2145(536) ack 1 win 8760
+
+      In this case, the sender sees four packets fail to traverse the
+      network (using a two-packet initial send window) and turns off
+      PMTUD.  All subsequent packets have the DF flag turned off, and
+      the size set to the default value of 536 [RFC1122].
+
+   References
+      This problem has been discussed extensively on the tcp-impl
+      mailing list;  the name "black hole" has been in use for many
+      years.
+
+   How to detect
+      This shows up as a TCP connection which hangs (fails to make
+      progress) until closed by timeout (this often manifests itself as
+      a connection that connects and starts to transfer, then eventually
+      terminates after 15 minutes with zero bytes transfered).  This is
+      particularly annoying with an application like ftp, which will
+      work perfectly while it uses small packets for control
+      information, and then fail on bulk transfers.
+
+      A series of ICMP echo packets will show that the two end hosts are
+      still capable of passing packets,  a series of MTU-sized ICMP echo
+      packets will show some fragmentation, and a series of MTU-sized
+      ICMP echo packets with DF set will fail.  This can be confusing
+      for network engineers trying to diagnose the problem.
+
+      There are several traceroute implementations that do PMTUD, and
+      can demonstrate the problem.
+
+   How to fix
+      TCP should notice that the connection is timing out.  After
+      several timeouts, TCP should attempt to send smaller packets,
+      perhaps turning off the DF flag for each packet.  If this
+      succeeds, it should continue to turn off PMTUD for the connection
+      for some reasonable period of time, after which it should probe
+      again to try to determine if the path has changed.
+
+
+
+
+Lahey                        Informational                      [Page 5]
+
+RFC 2923          TCP Problems with Path MTU Discovery    September 2000
+
+
+      Note that, under IPv6, there is no DF bit -- it is implicitly on
+      at all times.  Fragmentation is not allowed in routers, only at
+      the originating host.  Fortunately, the minimum supported MTU for
+      IPv6 is 1280 octets, which is significantly larger than the 68
+      octet minimum in IPv4.  This should make it more reasonable for
+      IPv6 TCP implementations to fall back to 1280 octet packets, when
+      IPv4 implementations will probably have to turn off DF to respond
+      to black hole detection.
+
+      Ideally, the ICMP black holes should be fixed when they are found.
+
+      If hosts start to implement black hole detection, it may be that
+      these problems will go unnoticed and unfixed.  This is especially
+      unfortunate, since detection can take several seconds each time,
+      and these delays could result in a significant, hidden degradation
+      of performance.  Hosts that implement black hole detection should
+      probably log detected black holes, so that they can be fixed.
+
+2.2.
+
+   Name of Problem
+      Stretch ACK due to PMTUD
+
+   Classification
+      Congestion Control / Performance
+
+   Description
+      When a naively implemented TCP stack communicates with a PMTUD
+      equipped stack, it will try to generate an ACK for every second
+      full-sized segment.  If it determines the full-sized segment based
+      on the advertised MSS, this can degrade badly in the face of
+      PMTUD.
+
+      The PMTU can wind up being a small fraction of the advertised MSS;
+      in this case, an ACK would be generated only very infrequently.
+
+   Significance
+
+      Stretch ACKs have a variety of unfortunate effects, more fully
+      outlined in [RFC2525].  Most of these have to do with encouraging
+      a more bursty connection, due to the infrequent arrival of ACKs.
+      They can also impede congestion window growth.
+
+   Implications
+
+      The complete implications of stretch ACKs are outlined in
+      [RFC2525].
+
+
+
+
+Lahey                        Informational                      [Page 6]
+
+RFC 2923          TCP Problems with Path MTU Discovery    September 2000
+
+
+   Relevant RFCs
+      RFC 1122 outlines the requirements for frequency of ACK
+      generation.  [RFC2581] expands on this and clarifies that delayed
+      ACK is a SHOULD, not a MUST.
+
+   Trace file demonstrating it
+
+      Made using tcpdump recording at an intermediate host.  The
+      timestamp options from all but the first two packets have been
+      removed for clarity.
+
+   18:16:52.976657 A > B: S 3183102292:3183102292(0) win 16384
+        <mss 4312,nop,wscale 0,nop,nop,timestamp 12128 0> (DF)
+   18:16:52.979580 B > A: S 2022212745:2022212745(0) ack 3183102293 win
+        49152 <mss 4312,nop,wscale 1,nop,nop,timestamp 1592957 12128> (DF)
+   18:16:52.979738 A > B: . ack 1 win 17248  (DF)
+   18:16:52.982473 A > B: . 1:4301(4300) ack 1 win 17248  (DF)
+   18:16:52.982557 C > A: icmp: B unreachable -
+        need to frag (mtu 1500)! (DF)
+   18:16:52.985839 B > A: . ack 1 win 32768  (DF)
+   18:16:54.129928 A > B: . 1:1449(1448) ack 1 win 17248  (DF)
+        .
+        .
+        .
+   18:16:58.507078 A > B: . 1463941:1465389(1448) ack 1 win 17248  (DF)
+   18:16:58.507200 A > B: . 1465389:1466837(1448) ack 1 win 17248  (DF)
+   18:16:58.507326 A > B: . 1466837:1468285(1448) ack 1 win 17248  (DF)
+   18:16:58.507439 A > B: . 1468285:1469733(1448) ack 1 win 17248  (DF)
+   18:16:58.524763 B > A: . ack 1452357 win 32768  (DF)
+   18:16:58.524986 B > A: . ack 1461045 win 32768  (DF)
+   18:16:58.525138 A > B: . 1469733:1471181(1448) ack 1 win 17248  (DF)
+   18:16:58.525268 A > B: . 1471181:1472629(1448) ack 1 win 17248  (DF)
+   18:16:58.525393 A > B: . 1472629:1474077(1448) ack 1 win 17248  (DF)
+   18:16:58.525516 A > B: . 1474077:1475525(1448) ack 1 win 17248  (DF)
+   18:16:58.525642 A > B: . 1475525:1476973(1448) ack 1 win 17248  (DF)
+   18:16:58.525766 A > B: . 1476973:1478421(1448) ack 1 win 17248  (DF)
+   18:16:58.526063 A > B: . 1478421:1479869(1448) ack 1 win 17248  (DF)
+   18:16:58.526187 A > B: . 1479869:1481317(1448) ack 1 win 17248  (DF)
+   18:16:58.526310 A > B: . 1481317:1482765(1448) ack 1 win 17248  (DF)
+   18:16:58.526432 A > B: . 1482765:1484213(1448) ack 1 win 17248  (DF)
+   18:16:58.526561 A > B: . 1484213:1485661(1448) ack 1 win 17248  (DF)
+   18:16:58.526671 A > B: . 1485661:1487109(1448) ack 1 win 17248  (DF)
+   18:16:58.537944 B > A: . ack 1478421 win 32768  (DF)
+   18:16:58.538328 A > B: . 1487109:1488557(1448) ack 1 win 17248  (DF)
+
+
+
+
+
+
+
+Lahey                        Informational                      [Page 7]
+
+RFC 2923          TCP Problems with Path MTU Discovery    September 2000
+
+
+   Note that the interval between ACKs is significantly larger than two
+   times the segment size;  it works out to be almost exactly two times
+   the advertised MSS.  This transfer was long enough that it could be
+   verified that the stretch ACK was not the result of lost ACK packets.
+
+   Trace file demonstrating correct behavior
+
+   Made using tcpdump recording at an intermediate host.  The timestamp
+   options from all but the first two packets have been removed for
+   clarity.
+
+   18:13:32.287965 A > B: S 2972697496:2972697496(0)
+        win 16384 <mss 4312,nop,wscale 0,nop,nop,timestamp 11326 0> (DF)
+   18:13:32.290785 B > A: S 245639054:245639054(0)
+        ack 2972697497 win 34496 <mss 4312> (DF)
+   18:13:32.290941 A > B: . ack 1 win 17248 (DF)
+   18:13:32.293774 A > B: . 1:4313(4312) ack 1 win 17248 (DF)
+   18:13:32.293856 C > A: icmp: B unreachable -
+        need to frag (mtu 1500)! (DF)
+   18:13:33.637338 A > B: . 1:1461(1460) ack 1 win 17248 (DF)
+        .
+        .
+        .
+   18:13:35.561691 A > B: . 1514021:1515481(1460) ack 1 win 17248 (DF)
+   18:13:35.561814 A > B: . 1515481:1516941(1460) ack 1 win 17248 (DF)
+   18:13:35.561938 A > B: . 1516941:1518401(1460) ack 1 win 17248 (DF)
+   18:13:35.562059 A > B: . 1518401:1519861(1460) ack 1 win 17248 (DF)
+   18:13:35.562174 A > B: . 1519861:1521321(1460) ack 1 win 17248 (DF)
+   18:13:35.564008 B > A: . ack 1481901 win 64680 (DF)
+   18:13:35.564383 A > B: . 1521321:1522781(1460) ack 1 win 17248 (DF)
+   18:13:35.564499 A > B: . 1522781:1524241(1460) ack 1 win 17248 (DF)
+   18:13:35.615576 B > A: . ack 1484821 win 64680 (DF)
+   18:13:35.615646 B > A: . ack 1487741 win 64680 (DF)
+   18:13:35.615716 B > A: . ack 1490661 win 64680 (DF)
+   18:13:35.615784 B > A: . ack 1493581 win 64680 (DF)
+   18:13:35.615856 B > A: . ack 1496501 win 64680 (DF)
+   18:13:35.615952 A > B: . 1524241:1525701(1460) ack 1 win 17248 (DF)
+   18:13:35.615966 B > A: . ack 1499421 win 64680 (DF)
+   18:13:35.616088 A > B: . 1525701:1527161(1460) ack 1 win 17248 (DF)
+   18:13:35.616105 B > A: . ack 1502341 win 64680 (DF)
+   18:13:35.616211 A > B: . 1527161:1528621(1460) ack 1 win 17248 (DF)
+   18:13:35.616228 B > A: . ack 1505261 win 64680 (DF)
+   18:13:35.616327 A > B: . 1528621:1530081(1460) ack 1 win 17248 (DF)
+   18:13:35.616349 B > A: . ack 1508181 win 64680 (DF)
+   18:13:35.616448 A > B: . 1530081:1531541(1460) ack 1 win 17248 (DF)
+   18:13:35.616565 A > B: . 1531541:1533001(1460) ack 1 win 17248 (DF)
+   18:13:35.616891 A > B: . 1533001:1534461(1460) ack 1 win 17248 (DF)
+
+
+
+
+Lahey                        Informational                      [Page 8]
+
+RFC 2923          TCP Problems with Path MTU Discovery    September 2000
+
+
+   In this trace, an ACK is generated for every two segments that
+   arrive.  (The segment size is slightly larger in this trace, even
+   though the source hosts are the same, because of the lack of
+   timestamp options in this trace.)
+
+   How to detect
+   This condition can be observed in a packet trace when the advertised
+   MSS is significantly larger than the actual PMTU of a connection.
+
+   How to fix Several solutions for this problem have been proposed:
+
+   A simple solution is to ACK every other packet, regardless of size.
+   This has the drawback of generating large numbers of ACKs in the face
+   of lots of very small packets;  this shows up with applications like
+   the X Window System.
+
+   A slightly more complex solution would monitor the size of incoming
+   segments and try to determine what segment size the sender is using.
+   This requires slightly more state in the receiver, but has the
+   advantage of making receiver silly window syndrome avoidance
+   computations more accurate [RFC813].
+
+2.3.
+
+   Name of Problem
+   Determining MSS from PMTU
+
+   Classification
+   Performance
+
+   Description
+   The MSS advertised at the start of a connection should be based on
+   the MTU of the interfaces on the system.  (For efficiency and other
+   reasons this may not be the largest MSS possible.)  Some systems use
+   PMTUD determined values to determine the MSS to advertise.
+
+   This results in an advertised MSS that is smaller than the largest
+   MTU the system can receive.
+
+   Significance
+   The advertised MSS is an indication to the remote system about the
+   largest TCP segment that can be received [RFC879].  If this value is
+   too small, the remote system will be forced to use a smaller segment
+   size when sending, purely because the local system found a particular
+   PMTU earlier.
+
+
+
+
+
+
+Lahey                        Informational                      [Page 9]
+
+RFC 2923          TCP Problems with Path MTU Discovery    September 2000
+
+
+   Given the asymmetric nature of many routes on the Internet
+   [Paxson97], it seems entirely possible that the return PMTU is
+   different from the sending PMTU.  Limiting the segment size in this
+   way can reduce performance and frustrate the PMTUD algorithm.
+
+   Even if the route was symmetric, setting this artificially lowered
+   limit on segment size will make it impossible to probe later to
+   determine if the PMTU has changed.
+
+   Implications
+   The whole point of PMTUD is to send as large a segment as possible.
+   If long-running connections cannot successfully probe for larger
+   PMTU, then potential performance gains will be impossible to realize.
+   This destroys the whole point of PMTUD.
+
+   Relevant RFCs RFC 1191.  [RFC879] provides a complete discussion of
+   MSS calculations and appropriate values.  Note that this practice
+   does not violate any of the specifications in these RFCs.
+
+   Trace file demonstrating it
+   This trace was made using tcpdump running on an intermediate host.
+   Host A initiates two separate consecutive connections, A1 and A2, to
+   host B.  Router C is the location of the MTU bottleneck.  As usual,
+   TCP options are removed from all non-SYN packets.
+
+   22:33:32.305912 A1 > B: S 1523306220:1523306220(0)
+        win 8760 <mss 1460> (DF)
+   22:33:32.306518 B > A1: S 729966260:729966260(0)
+        ack 1523306221 win 16384 <mss 65240>
+   22:33:32.310307 A1 > B: . ack 1 win 8760 (DF)
+   22:33:32.323496 A1 > B: P 1:1461(1460) ack 1 win 8760 (DF)
+   22:33:32.323569 C > A1: icmp: 129.99.238.5 unreachable -
+        need to frag (mtu 1024) (DF) (ttl 255, id 20666)
+   22:33:32.783694 A1 > B: . 1:985(984) ack 1 win 8856 (DF)
+   22:33:32.840817 B > A1: . ack 985 win 16384
+   22:33:32.845651 A1 > B: . 1461:2445(984) ack 1 win 8856 (DF)
+   22:33:32.846094 B > A1: . ack 985 win 16384
+   22:33:33.724392 A1 > B: . 985:1969(984) ack 1 win 8856 (DF)
+   22:33:33.724893 B > A1: . ack 2445 win 14924
+   22:33:33.728591 A1 > B: . 2445:2921(476) ack 1 win 8856 (DF)
+   22:33:33.729161 A1 > B: . ack 1 win 8856 (DF)
+   22:33:33.840758 B > A1: . ack 2921 win 16384
+
+   [...]
+
+   22:33:34.238659 A1 > B: F 7301:8193(892) ack 1 win 8856 (DF)
+   22:33:34.239036 B > A1: . ack 8194 win 15492
+   22:33:34.239303 B > A1: F 1:1(0) ack 8194 win 16384
+
+
+
+Lahey                        Informational                     [Page 10]
+
+RFC 2923          TCP Problems with Path MTU Discovery    September 2000
+
+
+   22:33:34.242971 A1 > B: . ack 2 win 8856 (DF)
+   22:33:34.454218 A2 > B: S 1523591299:1523591299(0)
+        win 8856 <mss 984> (DF)
+   22:33:34.454617 B > A2: S 732408874:732408874(0)
+        ack 1523591300 win 16384 <mss 65240>
+   22:33:34.457516 A2 > B: . ack 1 win 8856 (DF)
+   22:33:34.470683 A2 > B: P 1:985(984) ack 1 win 8856 (DF)
+   22:33:34.471144 B > A2: . ack 985 win 16384
+   22:33:34.476554 A2 > B: . 985:1969(984) ack 1 win 8856 (DF)
+   22:33:34.477580 A2 > B: P 1969:2953(984) ack 1 win 8856 (DF)
+
+   [...]
+
+   Notice that the SYN packet for session A2 specifies an MSS of 984.
+
+   Trace file demonstrating correct behavior
+
+   As before, this trace was made using tcpdump running on an
+   intermediate host.  Host A initiates two separate consecutive
+   connections, A1 and A2, to host B.  Router C is the location of the
+   MTU bottleneck.  As usual, TCP options are removed from all non-SYN
+   packets.
+
+   22:36:58.828602 A1 > B: S 3402991286:3402991286(0) win 32768
+        <mss 4312,wscale 0,nop,timestamp 1123370309 0,
+         echo 1123370309> (DF)
+   22:36:58.844040 B > A1: S 946999880:946999880(0)
+        ack 3402991287 win 16384
+        <mss 65240,nop,wscale 0,nop,nop,timestamp 429552 1123370309>
+   22:36:58.848058 A1 > B: . ack 1 win 32768  (DF)
+   22:36:58.851514 A1 > B: P 1:1025(1024) ack 1 win 32768  (DF)
+   22:36:58.851584 C > A1: icmp: 129.99.238.5 unreachable -
+        need to frag (mtu 1024) (DF)
+   22:36:58.855885 A1 > B: . 1:969(968) ack 1 win 32768  (DF)
+   22:36:58.856378 A1 > B: . 969:985(16) ack 1 win 32768  (DF)
+   22:36:59.036309 B > A1: . ack 985 win 16384
+   22:36:59.039255 A1 > B: FP 985:1025(40) ack 1 win 32768  (DF)
+   22:36:59.039623 B > A1: . ack 1026 win 16344
+   22:36:59.039828 B > A1: F 1:1(0) ack 1026 win 16384
+   22:36:59.043037 A1 > B: . ack 2 win 32768  (DF)
+   22:37:01.436032 A2 > B: S 3404812097:3404812097(0) win 32768
+        <mss 4312,wscale 0,nop,timestamp 1123372916 0,
+         echo 1123372916> (DF)
+   22:37:01.436424 B > A2: S 949814769:949814769(0)
+        ack 3404812098 win 16384
+        <mss 65240,nop,wscale 0,nop,nop,timestamp 429562 1123372916>
+   22:37:01.440147 A2 > B: . ack 1 win 32768  (DF)
+   22:37:01.442736 A2 > B: . 1:969(968) ack 1 win 32768  (DF)
+
+
+
+Lahey                        Informational                     [Page 11]
+
+RFC 2923          TCP Problems with Path MTU Discovery    September 2000
+
+
+   22:37:01.442894 A2 > B: P 969:985(16) ack 1 win 32768  (DF)
+   22:37:01.443283 B > A2: . ack 985 win 16384
+   22:37:01.446068 A2 > B: P 985:1025(40) ack 1 win 32768  (DF)
+   22:37:01.446519 B > A2: . ack 1025 win 16384
+   22:37:01.448465 A2 > B: F 1025:1025(0) ack 1 win 32768  (DF)
+   22:37:01.448837 B > A2: . ack 1026 win 16384
+   22:37:01.449007 B > A2: F 1:1(0) ack 1026 win 16384
+   22:37:01.452201 A2 > B: . ack 2 win 32768  (DF)
+
+   Note that the same MSS was used for both session A1 and session A2.
+
+   How to detect
+   This can be detected using a packet trace of two separate
+   connections;  the first should invoke PMTUD; the second should start
+   soon enough after the first that the PMTU value does not time out.
+
+   How to fix
+   The MSS should be determined based on the MTUs of the interfaces on
+   the system, as outlined in [RFC1122] and [RFC1191].
+
+3. Security Considerations
+
+   The one security concern raised by this memo is that ICMP black holes
+   are often caused by over-zealous security administrators who block
+   all ICMP messages.  It is vitally important that those who design and
+   deploy security systems understand the impact of strict filtering on
+   upper-layer protocols.  The safest web site in the world is worthless
+   if most TCP implementations cannot transfer data from it.  It would
+   be far nicer to have all of the black holes fixed rather than fixing
+   all of the TCP implementations.
+
+4. Acknowledgements
+
+   Thanks to Mark Allman, Vern Paxson, and Jamshid Mahdavi for generous
+   help reviewing the document, and to Matt Mathis for early suggestions
+   of various mechanisms that can cause PMTUD black holes, as well as
+   review.  The structure for describing TCP problems, and the early
+   description of that structure is from [RFC2525].  Special thanks to
+   Amy Bock, who helped perform the PMTUD tests which discovered these
+   bugs.
+
+
+
+
+
+
+
+
+
+
+
+Lahey                        Informational                     [Page 12]
+
+RFC 2923          TCP Problems with Path MTU Discovery    September 2000
+
+
+5. References
+
+   [RFC2581]    Allman, M., Paxson, V. and W. Stevens, "TCP Congestion
+                Control", RFC 2581, April 1999.
+
+   [RFC1122]    Braden, R., "Requirements for Internet Hosts --
+                Communication Layers", STD 3, RFC 1122, October 1989.
+
+   [RFC813]     Clark, D., "Window and Acknowledgement Strategy in TCP",
+                RFC 813, July 1982.
+
+   [Jacobson89] V. Jacobson, C. Leres, and S. McCanne, tcpdump, June
+                1989, ftp.ee.lbl.gov
+
+   [RFC1435]    Knowles, S., "IESG Advice from Experience with Path MTU
+                Discovery", RFC 1435, March 1993.
+
+   [RFC1191]    Mogul, J. and S. Deering, "Path MTU discovery", RFC
+                1191, November 1990.
+
+   [RFC1981]    McCann, J., Deering, S. and J. Mogul, "Path MTU
+                Discovery for IP version 6", RFC 1981, August 1996.
+
+   [Paxson96]   V. Paxson, "End-to-End Routing Behavior in the
+                Internet", IEEE/ACM Transactions on Networking (5),
+                pp.~601-615, Oct. 1997.
+
+   [RFC2525]    Paxon, V., Allman, M., Dawson, S., Fenner, W., Griner,
+                J., Heavens, I., Lahey, K., Semke, I. and B. Volz,
+                "Known TCP Implementation Problems", RFC 2525, March
+                1999.
+
+   [RFC879]     Postel, J., "The TCP Maximum Segment Size and Related
+                Topics", RFC 879, November 1983.
+
+   [RFC2001]    Stevens, W., "TCP Slow Start, Congestion Avoidance, Fast
+                Retransmit, and Fast Recovery Algorithms", RFC 2001,
+                January 1997.
+
+
+
+
+
+
+
+
+
+
+
+
+
+Lahey                        Informational                     [Page 13]
+
+RFC 2923          TCP Problems with Path MTU Discovery    September 2000
+
+
+6. Author's Address
+
+   Kevin Lahey
+   dotRocket, Inc.
+   1901 S. Bascom Ave., Suite 300
+   Campbell, CA 95008
+   USA
+
+   Phone:  +1 408-371-8977 x115
+   email:  kml@dotrocket.com
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Lahey                        Informational                     [Page 14]
+
+RFC 2923          TCP Problems with Path MTU Discovery    September 2000
+
+
+7.  Full Copyright Statement
+
+   Copyright (C) The Internet Society (2000).  All Rights Reserved.
+
+   This document and translations of it may be copied and furnished to
+   others, and derivative works that comment on or otherwise explain it
+   or assist in its implementation may be prepared, copied, published
+   and distributed, in whole or in part, without restriction of any
+   kind, provided that the above copyright notice and this paragraph are
+   included on all such copies and derivative works.  However, this
+   document itself may not be modified in any way, such as by removing
+   the copyright notice or references to the Internet Society or other
+   Internet organizations, except as needed for the purpose of
+   developing Internet standards in which case the procedures for
+   copyrights defined in the Internet Standards process must be
+   followed, or as required to translate it into languages other than
+   English.
+
+   The limited permissions granted above are perpetual and will not be
+   revoked by the Internet Society or its successors or assigns.
+
+   This document and the information contained herein is provided on an
+   "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
+   TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
+   BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
+   HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
+   MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
+
+Acknowledgement
+
+   Funding for the RFC Editor function is currently provided by the
+   Internet Society.
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Lahey                        Informational                     [Page 15]
+
--- a/kernel/picotcp/RFC/rfc2988.txt
+++ b/kernel/picotcp/RFC/rfc2988.txt
@ -0,0 +1,451 @@
+
+
+
+
+
+
+Network Working Group                                          V. Paxson
+Request for Comments: 2988                                         ACIRI
+Category: Standards Track                                      M. Allman
+                                                            NASA GRC/BBN
+                                                           November 2000
+
+
+                  Computing TCP's Retransmission Timer
+
+Status of this Memo
+
+   This document specifies an Internet standards track protocol for the
+   Internet community, and requests discussion and suggestions for
+   improvements.  Please refer to the current edition of the "Internet
+   Official Protocol Standards" (STD 1) for the standardization state
+   and status of this protocol.  Distribution of this memo is unlimited.
+
+Copyright Notice
+
+   Copyright (C) The Internet Society (2000).  All Rights Reserved.
+
+Abstract
+
+   This document defines the standard algorithm that Transmission
+   Control Protocol (TCP) senders are required to use to compute and
+   manage their retransmission timer.  It expands on the discussion in
+   section 4.2.3.1 of RFC 1122 and upgrades the requirement of
+   supporting the algorithm from a SHOULD to a MUST.
+
+1   Introduction
+
+   The Transmission Control Protocol (TCP) [Pos81] uses a retransmission
+   timer to ensure data delivery in the absence of any feedback from the
+   remote data receiver.  The duration of this timer is referred to as
+   RTO (retransmission timeout).  RFC 1122 [Bra89] specifies that the
+   RTO should be calculated as outlined in [Jac88].
+
+   This document codifies the algorithm for setting the RTO.  In
+   addition, this document expands on the discussion in section 4.2.3.1
+   of RFC 1122 and upgrades the requirement of supporting the algorithm
+   from a SHOULD to a MUST.  RFC 2581 [APS99] outlines the algorithm TCP
+   uses to begin sending after the RTO expires and a retransmission is
+   sent.  This document does not alter the behavior outlined in RFC 2581
+   [APS99].
+
+
+
+
+
+
+
+Paxson & Allman             Standards Track                     [Page 1]
+
+RFC 2988          Computing TCP's Retransmission Timer     November 2000
+
+
+   In some situations it may be beneficial for a TCP sender to be more
+   conservative than the algorithms detailed in this document allow.
+   However, a TCP MUST NOT be more aggressive than the following
+   algorithms allow.
+
+   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
+   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
+   document are to be interpreted as described in [Bra97].
+
+2   The Basic Algorithm
+
+   To compute the current RTO, a TCP sender maintains two state
+   variables, SRTT (smoothed round-trip time) and RTTVAR (round-trip
+   time variation).  In addition, we assume a clock granularity of G
+   seconds.
+
+   The rules governing the computation of SRTT, RTTVAR, and RTO are as
+   follows:
+
+   (2.1) Until a round-trip time (RTT) measurement has been made for a
+         segment sent between the sender and receiver, the sender SHOULD
+         set RTO <- 3 seconds (per RFC 1122 [Bra89]), though the
+         "backing off" on repeated retransmission discussed in (5.5)
+         still applies.
+
+            Note that some implementations may use a "heartbeat" timer
+            that in fact yield a value between 2.5 seconds and 3
+            seconds.  Accordingly, a lower bound of 2.5 seconds is also
+            acceptable, providing that the timer will never expire
+            faster than 2.5 seconds.  Implementations using a heartbeat
+            timer with a granularity of G SHOULD not set the timer below
+            2.5 + G seconds.
+
+   (2.2) When the first RTT measurement R is made, the host MUST set
+
+            SRTT <- R
+            RTTVAR <- R/2
+            RTO <- SRTT + max (G, K*RTTVAR)
+
+         where K = 4.
+
+   (2.3) When a subsequent RTT measurement R' is made, a host MUST set
+
+            RTTVAR <- (1 - beta) * RTTVAR + beta * |SRTT - R'|
+            SRTT <- (1 - alpha) * SRTT + alpha * R'
+
+
+
+
+
+
+Paxson & Allman             Standards Track                     [Page 2]
+
+RFC 2988          Computing TCP's Retransmission Timer     November 2000
+
+
+         The value of SRTT used in the update to RTTVAR is its value
+         before updating SRTT itself using the second assignment.  That
+         is, updating RTTVAR and SRTT MUST be computed in the above
+         order.
+
+         The above SHOULD be computed using alpha=1/8 and beta=1/4 (as
+         suggested in [JK88]).
+
+         After the computation, a host MUST update
+         RTO <- SRTT + max (G, K*RTTVAR)
+
+   (2.4) Whenever RTO is computed, if it is less than 1 second then the
+         RTO SHOULD be rounded up to 1 second.
+
+         Traditionally, TCP implementations use coarse grain clocks to
+         measure the RTT and trigger the RTO, which imposes a large
+         minimum value on the RTO.  Research suggests that a large
+         minimum RTO is needed to keep TCP conservative and avoid
+         spurious retransmissions [AP99].  Therefore, this
+         specification requires a large minimum RTO as a conservative
+         approach, while at the same time acknowledging that at some
+         future point, research may show that a smaller minimum RTO is
+         acceptable or superior.
+
+   (2.5) A maximum value MAY be placed on RTO provided it is at least 60
+         seconds.
+
+3   Taking RTT Samples
+
+   TCP MUST use Karn's algorithm [KP87] for taking RTT samples.  That
+   is, RTT samples MUST NOT be made using segments that were
+   retransmitted (and thus for which it is ambiguous whether the reply
+   was for the first instance of the packet or a later instance).  The
+   only case when TCP can safely take RTT samples from retransmitted
+   segments is when the TCP timestamp option [JBB92] is employed, since
+   the timestamp option removes the ambiguity regarding which instance
+   of the data segment triggered the acknowledgment.
+
+   Traditionally, TCP implementations have taken one RTT measurement at
+   a time (typically once per RTT).  However, when using the timestamp
+   option, each ACK can be used as an RTT sample.  RFC 1323 [JBB92]
+   suggests that TCP connections utilizing large congestion windows
+   should take many RTT samples per window of data to avoid aliasing
+   effects in the estimated RTT.  A TCP implementation MUST take at
+   least one RTT measurement per RTT (unless that is not possible per
+   Karn's algorithm).
+
+
+
+
+
+Paxson & Allman             Standards Track                     [Page 3]
+
+RFC 2988          Computing TCP's Retransmission Timer     November 2000
+
+
+   For fairly modest congestion window sizes research suggests that
+   timing each segment does not lead to a better RTT estimator [AP99].
+   Additionally, when multiple samples are taken per RTT the alpha and
+   beta defined in section 2 may keep an inadequate RTT history.  A
+   method for changing these constants is currently an open research
+   question.
+
+4   Clock Granularity
+
+   There is no requirement for the clock granularity G used for
+   computing RTT measurements and the different state variables.
+   However, if the K*RTTVAR term in the RTO calculation equals zero,
+   the variance term MUST be rounded to G seconds (i.e., use the
+   equation given in step 2.3).
+
+       RTO <- SRTT + max (G, K*RTTVAR)
+
+   Experience has shown that finer clock granularities (<= 100 msec)
+   perform somewhat better than more coarse granularities.
+
+   Note that [Jac88] outlines several clever tricks that can be used to
+   obtain better precision from coarse granularity timers.  These
+   changes are widely implemented in current TCP implementations.
+
+5   Managing the RTO Timer
+
+   An implementation MUST manage the retransmission timer(s) in such a
+   way that a segment is never retransmitted too early, i.e. less than
+   one RTO after the previous transmission of that segment.
+
+   The following is the RECOMMENDED algorithm for managing the
+   retransmission timer:
+
+   (5.1) Every time a packet containing data is sent (including a
+         retransmission), if the timer is not running, start it running
+         so that it will expire after RTO seconds (for the current value
+         of RTO).
+
+   (5.2) When all outstanding data has been acknowledged, turn off the
+         retransmission timer.
+
+   (5.3) When an ACK is received that acknowledges new data, restart the
+         retransmission timer so that it will expire after RTO seconds
+         (for the current value of RTO).
+
+
+
+
+
+
+
+Paxson & Allman             Standards Track                     [Page 4]
+
+RFC 2988          Computing TCP's Retransmission Timer     November 2000
+
+
+   When the retransmission timer expires, do the following:
+
+   (5.4) Retransmit the earliest segment that has not been acknowledged
+         by the TCP receiver.
+
+   (5.5) The host MUST set RTO <- RTO * 2 ("back off the timer").  The
+         maximum value discussed in (2.5) above may be used to provide an
+         upper bound to this doubling operation.
+
+   (5.6) Start the retransmission timer, such that it expires after RTO
+         seconds (for the value of RTO after the doubling operation
+         outlined in 5.5).
+
+   Note that after retransmitting, once a new RTT measurement is
+   obtained (which can only happen when new data has been sent and
+   acknowledged), the computations outlined in section 2 are performed,
+   including the computation of RTO, which may result in "collapsing"
+   RTO back down after it has been subject to exponential backoff
+   (rule 5.5).
+
+   Note that a TCP implementation MAY clear SRTT and RTTVAR after
+   backing off the timer multiple times as it is likely that the
+   current SRTT and RTTVAR are bogus in this situation.  Once SRTT and
+   RTTVAR are cleared they should be initialized with the next RTT
+   sample taken per (2.2) rather than using (2.3).
+
+6   Security Considerations
+
+   This document requires a TCP to wait for a given interval before
+   retransmitting an unacknowledged segment.  An attacker could cause a
+   TCP sender to compute a large value of RTO by adding delay to a
+   timed packet's latency, or that of its acknowledgment.  However,
+   the ability to add delay to a packet's latency often coincides with
+   the ability to cause the packet to be lost, so it is difficult to
+   see what an attacker might gain from such an attack that could cause
+   more damage than simply discarding some of the TCP connection's
+   packets.
+
+   The Internet to a considerable degree relies on the correct
+   implementation of the RTO algorithm (as well as those described in
+   RFC 2581) in order to preserve network stability and avoid
+   congestion collapse.  An attacker could cause TCP endpoints to
+   respond more aggressively in the face of congestion by forging
+   acknowledgments for segments before the receiver has actually
+   received the data, thus lowering RTO to an unsafe value.  But to do
+   so requires spoofing the acknowledgments correctly, which is
+   difficult unless the attacker can monitor traffic along the path
+   between the sender and the receiver.  In addition, even if the
+
+
+
+Paxson & Allman             Standards Track                     [Page 5]
+
+RFC 2988          Computing TCP's Retransmission Timer     November 2000
+
+
+   attacker can cause the sender's RTO to reach too small a value, it
+   appears the attacker cannot leverage this into much of an attack
+   (compared to the other damage they can do if they can spoof packets
+   belonging to the connection), since the sending TCP will still back
+   off its timer in the face of an incorrectly transmitted packet's
+   loss due to actual congestion.
+
+Acknowledgments
+
+   The RTO algorithm described in this memo was originated by Van
+   Jacobson in [Jac88].
+
+References
+
+   [AP99]  Allman, M. and V. Paxson, "On Estimating End-to-End Network
+           Path Properties", SIGCOMM 99.
+
+   [APS99] Allman, M., Paxson V. and W. Stevens, "TCP Congestion
+           Control", RFC 2581, April 1999.
+
+   [Bra89] Braden, R., "Requirements for Internet Hosts --
+           Communication Layers", STD 3, RFC 1122, October 1989.
+
+   [Bra97] Bradner, S., "Key words for use in RFCs to Indicate
+           Requirement Levels", BCP 14, RFC 2119, March 1997.
+
+   [Jac88] Jacobson, V., "Congestion Avoidance and Control", Computer
+           Communication Review, vol. 18, no. 4, pp. 314-329, Aug.  1988.
+
+   [JK88]  Jacobson, V. and M. Karels, "Congestion Avoidance and
+           Control", ftp://ftp.ee.lbl.gov/papers/congavoid.ps.Z.
+
+   [KP87]  Karn, P. and C. Partridge, "Improving Round-Trip Time
+           Estimates in Reliable Transport Protocols", SIGCOMM 87.
+
+   [Pos81] Postel, J., "Transmission Control Protocol", STD 7, RFC 793,
+           September 1981.
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Paxson & Allman             Standards Track                     [Page 6]
+
+RFC 2988          Computing TCP's Retransmission Timer     November 2000
+
+
+Author's Addresses
+
+   Vern Paxson
+   ACIRI / ICSI
+   1947 Center Street
+   Suite 600
+   Berkeley, CA 94704-1198
+
+   Phone: 510-666-2882
+   Fax:   510-643-7684
+   EMail: vern@aciri.org
+   http://www.aciri.org/vern/
+
+
+   Mark Allman
+   NASA Glenn Research Center/BBN Technologies
+   Lewis Field
+   21000 Brookpark Rd.  MS 54-2
+   Cleveland, OH  44135
+
+   Phone: 216-433-6586
+   Fax:   216-433-8705
+   EMail: mallman@grc.nasa.gov
+   http://roland.grc.nasa.gov/~mallman
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Paxson & Allman             Standards Track                     [Page 7]
+
+RFC 2988          Computing TCP's Retransmission Timer     November 2000
+
+
+Full Copyright Statement
+
+   Copyright (C) The Internet Society (2000).  All Rights Reserved.
+
+   This document and translations of it may be copied and furnished to
+   others, and derivative works that comment on or otherwise explain it
+   or assist in its implementation may be prepared, copied, published
+   and distributed, in whole or in part, without restriction of any
+   kind, provided that the above copyright notice and this paragraph are
+   included on all such copies and derivative works.  However, this
+   document itself may not be modified in any way, such as by removing
+   the copyright notice or references to the Internet Society or other
+   Internet organizations, except as needed for the purpose of
+   developing Internet standards in which case the procedures for
+   copyrights defined in the Internet Standards process must be
+   followed, or as required to translate it into languages other than
+   English.
+
+   The limited permissions granted above are perpetual and will not be
+   revoked by the Internet Society or its successors or assigns.
+
+   This document and the information contained herein is provided on an
+   "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
+   TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
+   BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
+   HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
+   MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
+
+Acknowledgement
+
+   Funding for the RFC Editor function is currently provided by the
+   Internet Society.
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Paxson & Allman             Standards Track                     [Page 8]
+
--- a/kernel/picotcp/RFC/rfc3042.txt
+++ b/kernel/picotcp/RFC/rfc3042.txt
@ -0,0 +1,507 @@
+
+
+
+
+
+
+Network Working Group                                          M. Allman
+Request for Comments: 3042                                  NASA GRC/BBN
+Category: Standards Track                                H. Balakrishnan
+                                                                     MIT
+                                                                S. Floyd
+                                                                   ACIRI
+                                                            January 2001
+
+
+          Enhancing TCP's Loss Recovery Using Limited Transmit
+
+Status of this Memo
+
+   This document specifies an Internet standards track protocol for the
+   Internet community, and requests discussion and suggestions for
+   improvements.  Please refer to the current edition of the "Internet
+   Official Protocol Standards" (STD 1) for the standardization state
+   and status of this protocol.  Distribution of this memo is unlimited.
+
+Copyright Notice
+
+   Copyright (C) The Internet Society (2001).  All Rights Reserved.
+
+Abstract
+
+   This document proposes a new Transmission Control Protocol (TCP)
+   mechanism that can be used to more effectively recover lost segments
+   when a connection's congestion window is small, or when a large
+   number of segments are lost in a single transmission window.  The
+   "Limited Transmit" algorithm calls for sending a new data segment in
+   response to each of the first two duplicate acknowledgments that
+   arrive at the sender.  Transmitting these segments increases the
+   probability that TCP can recover from a single lost segment using the
+   fast retransmit algorithm, rather than using a costly retransmission
+   timeout.  Limited Transmit can be used both in conjunction with, and
+   in the absence of, the TCP selective acknowledgment (SACK) mechanism.
+
+1   Introduction
+
+   A number of researchers have observed that TCP's loss recovery
+   strategies do not work well when the congestion window at a TCP
+   sender is small.  This can happen, for instance, because there is
+   only a limited amount of data to send, or because of the limit
+   imposed by the receiver-advertised window, or because of the
+   constraints imposed by end-to-end congestion control over a
+   connection with a small bandwidth-delay product
+   [Riz96,Mor97,BPS+98,Bal98,LK98].  When a TCP detects a missing
+   segment, it enters a loss recovery phase using one of two methods.
+
+
+
+Allman, et al.              Standards Track                     [Page 1]
+
+RFC 3042              Enhancing TCP Loss Recovery           January 2001
+
+
+   First, if an acknowledgment (ACK) for a given segment is not received
+   in a certain amount of time a retransmission timeout occurs and the
+   segment is resent [RFC793,PA00].  Second, the "Fast Retransmit"
+   algorithm resends a segment when three duplicate ACKs arrive at the
+   sender [Jac88,RFC2581].  However, because duplicate ACKs from the
+   receiver are also triggered by packet reordering in the Internet, the
+   TCP sender waits for three duplicate ACKs in an attempt to
+   disambiguate segment loss from packet reordering.  Once in a loss
+   recovery phase, a number of techniques can be used to retransmit lost
+   segments, including slow start-based recovery or Fast Recovery
+   [RFC2581], NewReno [RFC2582], and loss recovery based on selective
+   acknowledgments (SACKs) [RFC2018,FF96].
+
+   TCP's retransmission timeout (RTO) is based on measured round-trip
+   times (RTT) between the sender and receiver, as specified in [PA00].
+   To prevent spurious retransmissions of segments that are only delayed
+   and not lost, the minimum RTO is conservatively chosen to be 1
+   second.  Therefore, it behooves TCP senders to detect and recover
+   from as many losses as possible without incurring a lengthy timeout
+   when the connection remains idle.  However, if not enough duplicate
+   ACKs arrive from the receiver, the Fast Retransmit algorithm is never
+   triggered---this situation occurs when the congestion window is small
+   or if a large number of segments in a window are lost.  For instance,
+   consider a congestion window (cwnd) of three segments.  If one
+   segment is dropped by the network, then at most two duplicate ACKs
+   will arrive at the sender.  Since three duplicate ACKs are required
+   to trigger Fast Retransmit, a timeout will be required to resend the
+   dropped packet.
+
+   [BPS+97] found that roughly 56% of retransmissions sent by a busy web
+   server were sent after the RTO expires, while only 44% were handled
+   by Fast Retransmit.  In addition, only 4% of the RTO-based
+   retransmissions could have been avoided with SACK, which of course
+   has to continue to disambiguate reordering from genuine loss.  In
+   contrast, using the technique outlined in this document and in
+   [Bal98], 25% of the RTO-based retransmissions in that dataset would
+   have likely been avoided.
+
+   The next section of this document outlines small changes to TCP
+   senders that will decrease the reliance on the retransmission timer,
+   and thereby improve TCP performance when Fast Retransmit is not
+   triggered.  These changes do not adversely affect the performance of
+   TCP nor interact adversely with other connections, in other
+   circumstances.
+
+
+
+
+
+
+
+Allman, et al.              Standards Track                     [Page 2]
+
+RFC 3042              Enhancing TCP Loss Recovery           January 2001
+
+
+1.1 Terminology
+
+   In this document, he key words "MUST", "MUST NOT", "REQUIRED",
+   "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY",
+   AND "OPTIONAL" are to be interpreted as described in RFC 2119 [1] and
+   indicate requirement levels for protocols.
+
+2   The Limited Transmit Algorithm
+
+   When a TCP sender has previously unsent data queued for transmission
+   it SHOULD use the Limited Transmit algorithm, which calls for a TCP
+   sender to transmit new data upon the arrival of the first two
+   consecutive duplicate ACKs when the following conditions are
+   satisfied:
+
+     * The receiver's advertised window allows the transmission of the
+       segment.
+
+     * The amount of outstanding data would remain less than or equal
+       to the congestion window plus 2 segments.  In other words, the
+       sender can only send two segments beyond the congestion window
+       (cwnd).
+
+   The congestion window (cwnd) MUST NOT be changed when these new
+   segments are transmitted.  Assuming that these new segments and the
+   corresponding ACKs are not dropped, this procedure allows the sender
+   to infer loss using the standard Fast Retransmit threshold of three
+   duplicate ACKs [RFC2581].  This is more robust to reordered packets
+   than if an old packet were retransmitted on the first or second
+   duplicate ACK.
+
+   Note: If the connection is using selective acknowledgments [RFC2018],
+   the data sender MUST NOT send new segments in response to duplicate
+   ACKs that contain no new SACK information, as a misbehaving receiver
+   can generate such ACKs to trigger inappropriate transmission of data
+   segments.  See [SCWA99] for a discussion of attacks by misbehaving
+   receivers.
+
+   Limited Transmit follows the "conservation of packets" congestion
+   control principle [Jac88].  Each of the first two duplicate ACKs
+   indicate that a segment has left the network.  Furthermore, the
+   sender has not yet decided that a segment has been dropped and
+   therefore has no reason to assume that the current congestion control
+   state is inaccurate.  Therefore, transmitting segments does not
+   deviate from the spirit of TCP's congestion control principles.
+
+
+
+
+
+
+Allman, et al.              Standards Track                     [Page 3]
+
+RFC 3042              Enhancing TCP Loss Recovery           January 2001
+
+
+   [BPS99] shows that packet reordering is not a rare network event.
+   [RFC2581] does not provide for sending of data on the first two
+   duplicate ACKs that arrive at the sender.  This causes a burst of
+   segments to be sent when an ACK for new data does arrive following
+   packet reordering.  Using Limited Transmit, data packets will be
+   clocked out by incoming ACKs and therefore transmission will not be
+   as bursty.
+
+   Note: Limited Transmit is implemented in the ns simulator [NS].
+   Researchers wishing to investigate this mechanism further can do so
+   by enabling "singledup_" for the given TCP connection.
+
+3   Related Work
+
+   Deployment of Explicit Congestion Notification (ECN) [Flo94,RFC2481]
+   may benefit connections with small congestion window sizes [SA00].
+   ECN provides a method for indicating congestion to the end-host
+   without dropping segments.  While some segment drops may still occur,
+   ECN may allow TCP to perform better with small congestion window
+   sizes because the sender can avoid many of the Fast Retransmits and
+   Retransmit Timeouts that would otherwise have been needed to detect
+   dropped segments [SA00].
+
+   When ECN-enabled TCP traffic competes with non-ECN-enabled TCP
+   traffic, ECN-enabled traffic can receive up to 30% higher goodput.
+   For bulk transfers, the relative performance benefit of ECN is
+   greatest when on average each flow has 3-4 outstanding packets during
+   each round-trip time [ZQ00].  This should be a good estimate for the
+   performance impact of a flow using Limited Transmit, since both ECN
+   and Limited Transmit reduce the reliance on the retransmission timer
+   for signaling congestion.
+
+   The Rate-Halving congestion control algorithm [MSML99] uses a form of
+   limited transmit, as it calls for transmitting a data segment on
+   every second duplicate ACK that arrives at the sender.  The algorithm
+   decouples the decision of what to send from the decision of when to
+   send.  However, similar to Limited Transmit the algorithm will always
+   send a new data segment on the second duplicate ACK that arrives at
+   the sender.
+
+4   Security Considerations
+
+   The additional security implications of the changes proposed in this
+   document, compared to TCP's current vulnerabilities, are minimal.
+   The potential security issues come from the subversion of end-to-end
+   congestion control from "false" duplicate ACKs, where a "false"
+   duplicate ACK is a duplicate ACK that does not actually acknowledge
+   new data received at the TCP receiver.  False duplicate ACKs could
+
+
+
+Allman, et al.              Standards Track                     [Page 4]
+
+RFC 3042              Enhancing TCP Loss Recovery           January 2001
+
+
+   result from duplicate ACKs that are themselves duplicated in the
+   network, or from misbehaving TCP receivers that send false duplicate
+   ACKs to subvert end-to-end congestion control [SCWA99,RFC2581].
+
+   When the TCP data receiver has agreed to use the SACK option, the TCP
+   data sender has fairly strong protection against false duplicate
+   ACKs.  In particular, with SACK, a duplicate ACK that acknowledges
+   new data arriving at the receiver reports the sequence numbers of
+   that new data.  Thus, with SACK, the TCP sender can verify that an
+   arriving duplicate ACK acknowledges data that the TCP sender has
+   actually sent, and for which no previous acknowledgment has been
+   received, before sending new data as a result of that acknowledgment.
+   For further protection, the TCP sender could keep a record of packet
+   boundaries for transmitted data packets, and recognize at most one
+   valid acknowledgment for each packet (e.g., the first acknowledgment
+   acknowledging the receipt of all of the sequence numbers in that
+   packet).
+
+   One could imagine some limited protection against false duplicate
+   ACKs for a non-SACK TCP connection, where the TCP sender keeps a
+   record of the number of packets transmitted, and recognizes at most
+   one acknowledgment per packet to be used for triggering the sending
+   of new data.  However, this accounting of packets transmitted and
+   acknowledged would require additional state and extra complexity at
+   the TCP sender, and does not seem necessary.
+
+   The most important protection against false duplicate ACKs comes from
+   the limited potential of duplicate ACKs in subverting end-to-end
+   congestion control.  There are two separate cases to consider: when
+   the TCP sender receives less than a threshold number of duplicate
+   ACKs, and when the TCP sender receives at least a threshold number of
+   duplicate ACKs.  In the latter case a TCP with Limited Transmit will
+   behave essentially the same as a TCP without Limited Transmit in that
+   the congestion window will be halved and a loss recovery period will
+   be initiated.
+
+   When a TCP sender receives less than a threshold number of duplicate
+   ACKs a misbehaving receiver could send two duplicate ACKs after each
+   regular ACK.  One might imagine that the TCP sender would send at
+   three times its allowed sending rate.  However, using Limited
+   Transmit as outlined in section 2 the sender is only allowed to
+   exceed the congestion window by less than the duplicate ACK threshold
+   (of three segments), and thus would not send a new packet for each
+   duplicate ACK received.
+
+
+
+
+
+
+
+Allman, et al.              Standards Track                     [Page 5]
+
+RFC 3042              Enhancing TCP Loss Recovery           January 2001
+
+
+Acknowledgments
+
+   Bill Fenner, Jamshid Mahdavi and the Transport Area Working Group
+   provided valuable feedback on an early version of this document.
+
+References
+
+   [Bal98]   Hari Balakrishnan.  Challenges to Reliable Data Transport
+             over Heterogeneous Wireless Networks.  Ph.D. Thesis,
+             University of California at Berkeley, August 1998.
+
+   [BPS+97]  Hari Balakrishnan, Venkata Padmanabhan, Srinivasan Seshan,
+             Mark Stemm, and Randy Katz.  TCP Behavior of a Busy Web
+             Server:  Analysis and Improvements.  Technical Report
+             UCB/CSD-97-966, August 1997.  Available from
+             http://nms.lcs.mit.edu/~hari/papers/csd-97-966.ps.  (Also
+             in Proc. IEEE INFOCOM Conf., San Francisco, CA, March
+             1998.)
+
+   [BPS99]   Jon Bennett, Craig Partridge, Nicholas Shectman.  Packet
+             Reordering is Not Pathological Network Behavior.  IEEE/ACM
+             Transactions on Networking, December 1999.
+
+   [FF96]    Kevin Fall, Sally Floyd.  Simulation-based Comparisons of
+             Tahoe, Reno, and SACK TCP.  ACM Computer Communication
+             Review, July 1996.
+
+   [Flo94]   Sally Floyd.  TCP and Explicit Congestion Notification.
+             ACM Computer Communication Review, October 1994.
+
+   [Jac88]   Van Jacobson.  Congestion Avoidance and Control.  ACM
+             SIGCOMM 1988.
+
+   [LK98]    Dong Lin, H.T. Kung.  TCP Fast Recovery Strategies:
+             Analysis and Improvements.  Proceedings of InfoCom, March
+             1998.
+
+   [MSML99]  Matt Mathis, Jeff Semke, Jamshid Mahdavi, Kevin Lahey.  The
+             Rate Halving Algorithm, 1999. URL:
+             http://www.psc.edu/networking/rate_halving.html.
+
+   [Mor97]   Robert Morris.  TCP Behavior with Many Flows.  Proceedings
+             of the Fifth IEEE International Conference on Network
+             Protocols.  October 1997.
+
+   [NS]      Ns network simulator.  URL: http://www.isi.edu/nsnam/.
+
+
+
+
+
+Allman, et al.              Standards Track                     [Page 6]
+
+RFC 3042              Enhancing TCP Loss Recovery           January 2001
+
+
+   [PA00]    Paxson, V. and M. Allman, "Computing TCP's Retransmission
+             Timer", RFC 2988, November 2000.
+
+   [Riz96]   Luigi Rizzo.  Issues in the Implementation of Selective
+             Acknowledgments for TCP.  January, 1996.  URL:
+             http://www.iet.unipi.it/~luigi/selack.ps
+
+   [SA00]    Hadi Salim, J. and U. Ahmed, "Performance Evaluation of
+             Explicit Congestion Notification (ECN) in IP Networks", RFC
+             2884, July 2000.
+
+   [SCWA99]  Stefan Savage, Neal Cardwell, David Wetherall, Tom
+             Anderson.  TCP Congestion Control with a Misbehaving
+             Receiver.  ACM Computer Communications Review, October
+             1999.
+
+   [RFC793]  Postel, J., "Transmission Control Protocol", STD 7, RFC
+             793, September 1981.
+
+   [RFC2018] Mathis, M., Mahdavi, J., Floyd, S. and A. Romanow, "TCP
+             Selective Acknowledgement Options", RFC 2018, October 1996.
+
+   [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
+             Requirement Levels", BCP 14, RFC 2119, March 1997.
+
+   [RFC2481] Ramakrishnan, K. and S. Floyd, "A Proposal to Add Explicit
+             Congestion Notification (ECN) to IP", RFC 2481, January
+             1999.
+
+   [RFC2581] Allman, M., Paxson, V. and W. Stevens, "TCP Congestion
+             Control", RFC 2581, April 1999.
+
+   [RFC2582] Floyd, S. and T. Henderson, "The NewReno Modification to
+             TCP's Fast Recovery Algorithm", RFC 2582, April 1999.
+
+   [ZQ00]    Yin Zhang and Lili Qiu, Understanding the End-to-End
+             Performance Impact of RED in a Heterogeneous Environment,
+             Cornell CS Technical Report 2000-1802, July 2000.  URL
+             http://www.cs.cornell.edu/yzhang/papers.htm.
+
+
+
+
+
+
+
+
+
+
+
+
+Allman, et al.              Standards Track                     [Page 7]
+
+RFC 3042              Enhancing TCP Loss Recovery           January 2001
+
+
+Authors' Addresses
+
+   Mark Allman
+   NASA Glenn Research Center/BBN Technologies
+   Lewis Field
+   21000 Brookpark Rd.  MS 54-5
+   Cleveland, OH  44135
+
+   Phone: +1-216-433-6586
+   Fax:   +1-216-433-8705
+   EMail: mallman@grc.nasa.gov
+   http://roland.grc.nasa.gov/~mallman
+
+
+   Hari Balakrishnan
+   Laboratory for Computer Science
+   545 Technology Square
+   Massachusetts Institute of Technology
+   Cambridge, MA 02139
+
+   EMail: hari@lcs.mit.edu
+   http://nms.lcs.mit.edu/~hari/
+
+
+   Sally Floyd
+   AT&T Center for Internet Research at ICSI (ACIRI)
+   1947 Center St, Suite 600
+   Berkeley, CA 94704
+
+   Phone: +1-510-666-2989
+   EMail: floyd@aciri.org
+   http://www.aciri.org/floyd/
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Allman, et al.              Standards Track                     [Page 8]
+
+RFC 3042              Enhancing TCP Loss Recovery           January 2001
+
+
+Full Copyright Statement
+
+   Copyright (C) The Internet Society (2001).  All Rights Reserved.
+
+   This document and translations of it may be copied and furnished to
+   others, and derivative works that comment on or otherwise explain it
+   or assist in its implementation may be prepared, copied, published
+   and distributed, in whole or in part, without restriction of any
+   kind, provided that the above copyright notice and this paragraph are
+   included on all such copies and derivative works.  However, this
+   document itself may not be modified in any way, such as by removing
+   the copyright notice or references to the Internet Society or other
+   Internet organizations, except as needed for the purpose of
+   developing Internet standards in which case the procedures for
+   copyrights defined in the Internet Standards process must be
+   followed, or as required to translate it into languages other than
+   English.
+
+   The limited permissions granted above are perpetual and will not be
+   revoked by the Internet Society or its successors or assigns.
+
+   This document and the information contained herein is provided on an
+   "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
+   TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
+   BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
+   HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
+   MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
+
+Acknowledgement
+
+   Funding for the RFC Editor function is currently provided by the
+   Internet Society.
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Allman, et al.              Standards Track                     [Page 9]
+
--- a/kernel/picotcp/RFC/rfc3124.txt
+++ b/kernel/picotcp/RFC/rfc3124.txt
--- a/kernel/picotcp/RFC/rfc3135.txt
+++ b/kernel/picotcp/RFC/rfc3135.txt
--- a/kernel/picotcp/RFC/rfc3150.txt
+++ b/kernel/picotcp/RFC/rfc3150.txt
@ -0,0 +1,955 @@
+
+
+
+
+
+
+Network Working Group                                         S. Dawkins
+Request for Comments: 3150                                 G. Montenegro
+BCP: 48                                                         M . Kojo
+Category: Best Current Practice                                V. Magret
+                                                               July 2001
+
+
+           End-to-end Performance Implications of Slow Links
+
+Status of this Memo
+
+   This document specifies an Internet Best Current Practices for the
+   Internet Community, and requests discussion and suggestions for
+   improvements.  Distribution of this memo is unlimited.
+
+Copyright Notice
+
+   Copyright (C) The Internet Society (2001).  All Rights Reserved.
+
+Abstract
+
+   This document makes performance-related recommendations for users of
+   network paths that traverse "very low bit-rate" links.
+
+   "Very low bit-rate" implies "slower than we would like".  This
+   recommendation may be useful in any network where hosts can saturate
+   available bandwidth, but the design space for this recommendation
+   explicitly includes connections that traverse 56 Kb/second modem
+   links or 4.8 Kb/second wireless access links - both of which are
+   widely deployed.
+
+   This document discusses general-purpose mechanisms.  Where
+   application-specific mechanisms can outperform the relevant general-
+   purpose mechanism, we point this out and explain why.
+
+   This document has some recommendations in common with RFC 2689,
+   "Providing integrated services over low-bitrate links", especially in
+   areas like header compression.  This document focuses more on
+   traditional data applications for which "best-effort delivery" is
+   appropriate.
+
+
+
+
+
+
+
+
+
+
+
+Dawkins, et al.          Best Current Practice                  [Page 1]
+
+RFC 3150                   PILC - Slow Links                   July 2001
+
+
+Table of Contents
+
+   1.0 Introduction .................................................  2
+   2.0 Description of Optimizations .................................  3
+           2.1 Header Compression Alternatives ......................  3
+           2.2 Payload Compression Alternatives .....................  5
+           2.3 Choosing MTU sizes ...................................  5
+           2.4 Interactions with TCP Congestion Control [RFC2581] ...  6
+           2.5 TCP Buffer Auto-tuning ...............................  9
+           2.6 Small Window Effects ................................. 10
+   3.0 Summary of Recommended Optimizations ......................... 10
+   4.0 Topics For Further Work ...................................... 12
+   5.0 Security Considerations ...................................... 12
+   6.0 IANA Considerations .......................................... 13
+   7.0 Acknowledgements ............................................. 13
+   8.0 References ................................................... 13
+   Authors' Addresses ............................................... 16
+   Full Copyright Statement ......................................... 17
+
+1.0 Introduction
+
+   The Internet protocol stack was designed to operate in a wide range
+   of link speeds, and has met this design goal with only a limited
+   number of enhancements (for example, the use of TCP window scaling as
+   described in "TCP Extensions for High Performance" [RFC1323] for
+   very-high-bandwidth connections).
+
+   Pre-World Wide Web application protocols tended to be either
+   interactive applications sending very little data (e.g., Telnet) or
+   bulk transfer applications that did not require interactive response
+   (e.g., File Transfer Protocol, Network News).  The World Wide Web has
+   given us traffic that is both interactive and often "bulky",
+   including images, sound, and video.
+
+   The World Wide Web has also popularized the Internet, so that there
+   is significant interest in accessing the Internet over link speeds
+   that are much "slower" than typical office network speeds.  In fact,
+   a significant proportion of the current Internet users is connected
+   to the Internet over a relatively slow last-hop link.  In future, the
+   number of such users is likely to increase rapidly as various mobile
+   devices are foreseen to to be attached to the Internet over slow
+   wireless links.
+
+   In order to provide the best interactive response for these "bulky"
+   transfers, implementors may wish to minimize the number of bits
+   actually transmitted over these "slow" connections.  There are two
+
+
+
+
+
+Dawkins, et al.          Best Current Practice                  [Page 2]
+
+RFC 3150                   PILC - Slow Links                   July 2001
+
+
+   areas that can be considered - compressing the bits that make up the
+   overhead associated with the connection, and compressing the bits
+   that make up the payload being transported over the connection.
+
+   In addition, implementors may wish to consider TCP receive window
+   settings and queuing mechanisms as techniques to improve performance
+   over low-speed links.  While these techniques do not involve protocol
+   changes, they are included in this document for completeness.
+
+2.0 Description of Optimizations
+
+   This section describes optimizations which have been suggested for
+   use in situations where hosts can saturate their links.  The next
+   section summarizes recommendations about the use of these
+   optimizations.
+
+2.1 Header Compression Alternatives
+
+   Mechanisms for TCP and IP header compression defined in [RFC1144,
+   RFC2507, RFC2508, RFC2509, RFC3095] provide the following benefits:
+
+      -  Improve interactive response time
+
+      -  Decrease header overhead (for a typical dialup MTU of 296
+         bytes, the overhead of TCP/IP headers can decrease from about
+         13 percent with typical 40-byte headers to 1-1.5 percent with
+         with 3-5 byte compressed headers, for most packets).  This
+         enables use of small packets for delay-sensitive low data-rate
+         traffic and good line efficiency for bulk data even with small
+         segment sizes (for reasons to use a small MTU on slow links,
+         see section 2.3)
+
+      -  Many slow links today are wireless and tend to be significantly
+         lossy.  Header compression reduces packet loss rate over lossy
+         links (simply because shorter transmission times expose packets
+         to fewer events that cause loss).
+
+   [RFC1144] header compression is a Proposed Standard for TCP Header
+   compression that is widely deployed.  Unfortunately it is vulnerable
+   on lossy links, because even a single bit error results in loss of
+   synchronization between the compressor and decompressor.  It uses TCP
+   timeouts to detect a loss of such synchronization, but these errors
+   result in loss of data (up to a full TCP window), delay of a full
+   RTO, and unnecessary slow-start.
+
+
+
+
+
+
+
+Dawkins, et al.          Best Current Practice                  [Page 3]
+
+RFC 3150                   PILC - Slow Links                   July 2001
+
+
+   A more recent header compression proposal [RFC2507] includes an
+   explicit request for retransmission of an uncompressed packet to
+   allow resynchronization without waiting for a TCP timeout (and
+   executing congestion avoidance procedures).  This works much better
+   on links with lossy characteristics.
+
+   The above scheme ceases to perform well under conditions as extreme
+   as those of many cellular links (error conditions of 1e-3 or 1e-2 and
+   round trip times over 100 ms.).  For these cases, the 'Robust Header
+   Compression' working group has developed ROHC [RFC3095].  Extensions
+   of ROHC to support compression of TCP headers are also under
+   development.
+
+   [RFC1323] defines a "TCP Timestamp" option, used to prevent
+   "wrapping" of the TCP sequence number space on high-speed links, and
+   to improve TCP RTT estimates by providing unambiguous TCP roundtrip
+   timings.  Use of TCP timestamps prevents header compression, because
+   the timestamps are sent as TCP options.  This means that each
+   timestamped header has TCP options that differ from the previous
+   header, and headers with changed TCP options are always sent
+   uncompressed.  In addition, timestamps do not seem to have much of an
+   impact on RTO estimation [AlPa99].
+
+   Nevertheless, the ROHC working group is developing schemes to
+   compress TCP headers, including options such as timestamps and
+   selective acknowledgements.
+
+   Recommendation: Implement [RFC2507], in particular as it relates to
+   IPv4 tunnels and Minimal Encapsulation for Mobile IP, as well as TCP
+   header compression for lossy links and links that reorder packets.
+   PPP capable devices should implement "IP Header Compression over PPP"
+   [RFC2509].  Robust Header Compression [RFC3095] is recommended for
+   extremely slow links with very high error rates (see above), but
+   implementors should judge if its complexity is justified (perhaps by
+   the cost of the radio frequency resources).
+
+   [RFC1144] header compression should only be enabled when operating
+   over reliable "slow" links.
+
+   Use of TCP Timestamps [RFC1323] is not recommended with these
+   connections, because it complicates header compression.  Even though
+   the Robust Header Compression (ROHC) working group is developing
+   specifications to remedy this, those mechanisms are not yet fully
+   developed nor deployed, and may not be generally justifiable.
+   Furthermore, connections traversing "slow" links do not require
+   protection against TCP sequence-number wrapping.
+
+
+
+
+
+Dawkins, et al.          Best Current Practice                  [Page 4]
+
+RFC 3150                   PILC - Slow Links                   July 2001
+
+
+2.2 Payload Compression Alternatives
+
+   Compression of IP payloads is also desirable on "slow" network links.
+   "IP Payload Compression Protocol (IPComp)" [RFC2393] defines a
+   framework where common compression algorithms can be applied to
+   arbitrary IP segment payloads.
+
+   IP payload compression is something of a niche optimization.  It is
+   necessary because IP-level security converts IP payloads to random
+   bitstreams, defeating commonly-deployed link-layer compression
+   mechanisms which are faced with payloads that have no redundant
+   "information" that can be more compactly represented.
+
+   However, many IP payloads are already compressed (images, audio,
+   video, "zipped" files being transferred), or are already encrypted
+   above the IP layer (e.g., SSL [SSL]/TLS [RFC2246]).  These payloads
+   will not "compress" further, limiting the benefit of this
+   optimization.
+
+   For uncompressed HTTP payload types, HTTP/1.1 [RFC2616] also includes
+   Content-Encoding and Accept-Encoding headers, supporting a variety of
+   compression algorithms for common compressible MIME types like
+   text/plain.  This leaves only the HTTP headers themselves
+   uncompressed.
+
+   In general, application-level compression can often outperform
+   IPComp, because of the opportunity to use compression dictionaries
+   based on knowledge of the specific data being compressed.
+
+   Extensive use of application-level compression techniques will reduce
+   the need for IPComp, especially for WWW users.
+
+   Recommendation: IPComp may optionally be implemented.
+
+2.3 Choosing MTU Sizes
+
+   There are several points to keep in mind when choosing an MTU for
+   low-speed links.
+
+   First, if a full-length MTU occupies a link for longer than the
+   delayed ACK timeout (typically 200 milliseconds, but may be up to 500
+   milliseconds), this timeout will cause an ACK to be generated for
+   every segment, rather than every second segment, as occurs with most
+   implementations of the TCP delayed ACK algorithm.
+
+
+
+
+
+
+
+Dawkins, et al.          Best Current Practice                  [Page 5]
+
+RFC 3150                   PILC - Slow Links                   July 2001
+
+
+   Second, "relatively large" MTUs, which take human-perceptible amounts
+   of time to be transmitted into the network, create human-perceptible
+   delays in other flows using the same link.  [RFC1144] considers
+   100-200 millisecond delays as human-perceptible.  The convention of
+   choosing 296-byte MTUs (with header compression enabled) for dialup
+   access is a compromise that limits the maximum link occupancy delay
+   with full-length MTUs close to 200 milliseconds on 9.6 Kb/second
+   links.
+
+   Third, on last-hop links using a larger link MTU size, and therefore
+   larger MSS, would allow a TCP sender to increase its congestion
+   window faster in bytes than when using a smaller MTU size (and a
+   smaller MSS).  However, with a smaller MTU size, and a smaller MSS
+   size, the congestion window, when measured in segments, increases
+   more quickly than it would with a larger MSS size.  Connections using
+   smaller MSS sizes are more likely to be able to send enough segments
+   to generate three duplicate acknowledgements, triggering fast
+   retransmit/fast recovery when packet losses are encountered.  Hence,
+   a smaller MTU size is useful for slow links with lossy
+   characteristics.
+
+   Fourth, using a smaller MTU size also decreases the queuing delay of
+   a TCP flow (and thereby RTT) compared to use of larger MTU size with
+   the same number of packets in a queue.  This means that a TCP flow
+   using a smaller segment size and traversing a slow link is able to
+   inflate the congestion window (in number of segments) to a larger
+   value while experiencing the same queuing delay.
+
+   Finally, some networks charge for traffic on a per-packet basis, not
+   on a per-kilobyte basis.  In these cases, connections using a larger
+   MTU may be charged less than connections transferring the same number
+   of bytes using a smaller MTU.
+
+   Recommendation: If it is possible to do so, MTUs should be chosen
+   that do not monopolize network interfaces for human-perceptible
+   amounts of time, and implementors should not chose MTUs that will
+   occupy a network interface for significantly more than 100-200
+   milliseconds.
+
+2.4 Interactions with TCP Congestion Control [RFC2581]
+
+   In many cases, TCP connections that traverse slow links have the slow
+   link as an "access" link, with higher-speed links in use for most of
+   the connection path.  One common configuration might be a laptop
+   computer using dialup access to a terminal server (a last-hop
+   router), with an HTTP server on a high-speed LAN "behind" the
+   terminal server.
+
+
+
+
+Dawkins, et al.          Best Current Practice                  [Page 6]
+
+RFC 3150                   PILC - Slow Links                   July 2001
+
+
+   In this case, the HTTP server may be able to place packets on its
+   directly-attached high-speed LAN at a higher rate than the last-hop
+   router can forward them on the low-speed link.  When the last-hop
+   router falls behind, it will be unable to buffer the traffic intended
+   for the low-speed link, and will become a point of congestion and
+   begin to drop the excess packets.  In particular, several packets may
+   be dropped in a single transmission window when initial slow start
+   overshoots the last-hop router buffer.
+
+   Although packet loss is occurring, it isn't detected at the TCP
+   sender until one RTT time after the router buffer space is exhausted
+   and the first packet is dropped.  This late congestion signal allows
+   the congestion window to increase up to double the size it was at the
+   time the first packet was dropped at the router.
+
+   If the link MTU is large enough to take more than the delayed ACK
+   timeout interval to transmit a packet, an ACK is sent for every
+   segment and the congestion window is doubled in a single RTT.  If a
+   smaller link MTU is in use and delayed ACKs can be utilized, the
+   congestion window increases by a factor of 1.5 in one RTT.  In both
+   cases the sender continues transmitting packets well beyond the
+   congestion point of the last-hop router, resulting in multiple packet
+   losses in a single window.
+
+   The self-clocking nature of TCP's slow start and congestion avoidance
+   algorithms prevent this buffer overrun from continuing.  In addition,
+   these algorithms allow senders to "probe" for available bandwidth -
+   cycling through an increasing rate of transmission until loss occurs,
+   followed by a dramatic (50-percent) drop in transmission rate.  This
+   happens when a host directly connected to a low-speed link offers an
+   advertised window that is unrealistically large for the low-speed
+   link.  During the congestion avoidance phase the peer host continues
+   to probe for available bandwidth, trying to fill the advertised
+   window, until packet loss occurs.
+
+   The same problems may also exist when a sending host is directly
+   connected to a slow link as most slow links have some local buffer in
+   the link interface.  This link interface buffer is subject to
+   overflow exactly in the same way as the last-hop router buffer.
+
+   When a last-hop router with a small number of buffers per outbound
+   link is used, the first buffer overflow occurs earlier than it would
+   if the router had a larger number of buffers.  Subsequently with a
+   smaller number of buffers the periodic packet losses occur more
+   frequently during congestion avoidance, when the sender probes for
+   available bandwidth.
+
+
+
+
+
+Dawkins, et al.          Best Current Practice                  [Page 7]
+
+RFC 3150                   PILC - Slow Links                   July 2001
+
+
+   The most important responsibility of router buffers is to absorb
+   bursts.  Too few buffers (for example, only three buffers per
+   outbound link as described in [RFC2416]) means that routers will
+   overflow their buffer pools very easily and are unlikely to absorb
+   even a very small burst.  When a larger number of router buffers are
+   allocated per outbound link, the buffer space does not overflow as
+   quickly but the buffers are still likely to become full due to TCP's
+   default behavior.  A larger number of router buffers leads to longer
+   queuing delays and a longer RTT.
+
+   If router queues become full before congestion is signaled or remain
+   full for long periods of time, this is likely to result in "lock-
+   out", where a single connection or a few connections occupy the
+   router queue space, preventing other connections from using the link
+   [RFC2309], especially when a tail drop queue management discipline is
+   being used.
+
+   Therefore, it is essential to have a large enough number of buffers
+   in routers to be able to absorb data bursts, but keep the queues
+   normally small.  In order to achieve this it has been recommended in
+   [RFC2309] that an active queue management mechanism, like Random
+   Early Detection (RED) [RED93], should be implemented in all Internet
+   routers, including the last-hop routers in front of a slow link.  It
+   should also be noted that RED requires a sufficiently large number of
+   router buffers to work properly.  In addition, the appropriate
+   parameters of RED on a last-hop router connected to a slow link will
+   likely deviate from the defaults recommended.
+
+   Active queue management mechanism do not eliminate packet drops but,
+   instead, drop packets at earlier stage to solve the full-queue
+   problem for flows that are responsive to packet drops as congestion
+   signal.  Hosts that are directly connected to low-speed links may
+   limit the receive windows they advertise in order to lower or
+   eliminate the number of packet drops in a last-hop router.  When
+   doing so one should, however, take care that the advertised window is
+   large enough to allow full utilization of the last-hop link capacity
+   and to allow triggering fast retransmit, when a packet loss is
+   encountered.  This recommendation takes two forms:
+
+   -  Modern operating systems use relatively large default TCP receive
+      buffers compared to what is required to fully utilize the link
+      capacity of low-speed links.  Users should be able to choose the
+      default receive window size in use - typically a system-wide
+      parameter.  (This "choice" may be as simple as "dial-up access/LAN
+      access" on a dialog box - this would accommodate many environments
+      without requiring hand-tuning by experienced network engineers.)
+
+
+
+
+
+Dawkins, et al.          Best Current Practice                  [Page 8]
+
+RFC 3150                   PILC - Slow Links                   July 2001
+
+
+   -  Application developers should not attempt to manually manage
+      network bandwidth using socket buffer sizes.  Only in very rare
+      circumstances will an application actually know both the bandwidth
+      and delay of a path and be able to choose a suitably low (or high)
+      value for the socket buffer size to obtain good network
+      performance.
+
+   This recommendation is not a general solution for any network path
+   that might involve a slow link.  Instead, this recommendation is
+   applicable in environments where the host "knows" it is always
+   connected to other hosts via "slow links".  For hosts that may
+   connect to other host over a variety of links (e.g., dial-up laptop
+   computers with LAN-connected docking stations), buffer auto-tuning
+   for the receive buffer is a more reasonable recommendation, and is
+   discussed below.
+
+2.5 TCP Buffer Auto-tuning
+
+   [SMM98] recognizes a tension between the desire to allocate "large"
+   TCP buffers, so that network paths are fully utilized, and a desire
+   to limit the amount of memory dedicated to TCP buffers, in order to
+   efficiently support large numbers of connections to hosts over
+   network paths that may vary by six orders of magnitude.
+
+   The technique proposed is to dynamically allocate TCP buffers, based
+   on the current congestion window, rather than attempting to
+   preallocate TCP buffers without any knowledge of the network path.
+
+   This proposal results in receive buffers that are appropriate for the
+   window sizes in use, and send buffers large enough to contain two
+   windows of segments, so that SACK and fast recovery can recover
+   losses without forcing the connection to use lengthy retransmission
+   timeouts.
+
+   While most of the motivation for this proposal is given from a
+   server's perspective, hosts that connect using multiple interfaces
+   with markedly-different link speeds may also find this kind of
+   technique useful.  This is true in particular with slow links, which
+   are likely to dominate the end-to-end RTT.  If the host is connected
+   only via a single slow link interface at a time, it is fairly easy to
+   (dynamically) adjust the receive window (and thus the advertised
+   window) to a value appropriate for the slow last-hop link with known
+   bandwidth and delay characteristics.
+
+   Recommendation: If a host is sometimes connected via a slow link but
+   the host is also connected using other interfaces with markedly-
+   different link speeds, it may use receive buffer auto-tuning to
+   adjust the advertised window to an appropriate value.
+
+
+
+Dawkins, et al.          Best Current Practice                  [Page 9]
+
+RFC 3150                   PILC - Slow Links                   July 2001
+
+
+2.6 Small Window Effects
+
+   If a TCP connection stabilizes with a congestion window of only a few
+   segments (as could be expected on a "slow" link), the sender isn't
+   sending enough segments to generate three duplicate acknowledgements,
+   triggering fast retransmit and fast recovery.  This means that a
+   retransmission timeout is required to repair the loss - dropping the
+   TCP connection to a congestion window with only one segment.
+
+   [TCPB98] and [TCPF98] observe that (in studies of network trace
+   datasets) it is relatively common for TCP retransmission timeouts to
+   occur even when some duplicate acknowledgements are being sent.  The
+   challenge is to use these duplicate acknowledgements to trigger fast
+   retransmit/fast recovery without injecting traffic into the network
+   unnecessarily - and especially not injecting traffic in ways that
+   will result in instability.
+
+   The "Limited Transmit" algorithm [RFC3042] suggests sending a new
+   segment when the first and second duplicate acknowledgements are
+   received, so that the receiver is more likely to be able to continue
+   to generate duplicate acknowledgements until the TCP retransmit
+   threshold is reached, triggering fast retransmit and fast recovery.
+   When the congestion window is small, this is very useful in assisting
+   fast retransmit and fast recovery to recover from a packet loss
+   without using a retransmission timeout.  We note that a maximum of
+   two additional new segments will be sent before the receiver sends
+   either a new acknowledgement advancing the window or two additional
+   duplicate acknowledgements, triggering fast retransmit/fast recovery,
+   and that these new segments will be acknowledgement-clocked, not
+   back-to-back.
+
+   Recommendation: Limited Transmit should be implemented in all hosts.
+
+3.0 Summary of Recommended Optimizations
+
+   This section summarizes our recommendations regarding the previous
+   standards-track mechanisms, for end nodes that are connected via a
+   slow link.
+
+   Header compression should be implemented.  [RFC1144] header
+   compression can be enabled over robust network links.  [RFC2507]
+   should be used over network connections that are expected to
+   experience loss due to corruption as well as loss due to congestion.
+   For extremely lossy and slow links, implementors should evaluate ROHC
+   [RFC3095] as a potential solution.  [RFC1323] TCP timestamps must be
+   turned off because (1) their protection against TCP sequence number
+   wrapping is unjustified for slow links, and (2) they complicate TCP
+   header compression.
+
+
+
+Dawkins, et al.          Best Current Practice                 [Page 10]
+
+RFC 3150                   PILC - Slow Links                   July 2001
+
+
+   IP Payload Compression [RFC2393] should be implemented, although
+   compression at higher layers of the protocol stack (for example [RFC
+   2616]) may make this mechanism less useful.
+
+   For HTTP/1.1 environments, [RFC2616] payload compression should be
+   implemented and should be used for payloads that are not already
+   compressed.
+
+   Implementors should choose MTUs that don't monopolize network
+   interfaces for more than 100-200 milliseconds, in order to limit the
+   impact of a single connection on all other connections sharing the
+   network interface.
+
+   Use of active queue management is recommended on last-hop routers
+   that provide Internet access to host behind a slow link.  In
+   addition, number of router buffers per slow link should be large
+   enough to absorb concurrent data bursts from more than a single flow.
+   To absorb concurrent data bursts from two or three TCP senders with a
+   typical data burst of three back-to-back segments per sender, at
+   least six (6) or nine (9) buffers are needed.  Effective use of
+   active queue management is likely to require even larger number of
+   buffers.
+
+   Implementors should consider the possibility that a host will be
+   directly connected to a low-speed link when choosing default TCP
+   receive window sizes.
+
+   Application developers should not attempt to manually manage network
+   bandwidth using socket buffer sizes as only in very rare
+   circumstances an application will be able to choose a suitable value
+   for the socket buffer size to obtain good network performance.
+
+   Limited Transmit [RFC3042] should be implemented in all end hosts as
+   it assists in triggering fast retransmit when congestion window is
+   small.
+
+   All of the mechanisms described above are stable standards-track RFCs
+   (at Proposed Standard status, as of this writing).
+
+   In addition, implementors may wish to consider TCP buffer auto-
+   tuning, especially when the host system is likely to be used with a
+   wide variety of access link speeds.  This is not a standards-track
+   TCP mechanism but, as it is an operating system implementation issue,
+   it does not need to be standardized.
+
+   Of the above mechanisms, only Header Compression (for IP and TCP) may
+   cease to work in the presence of end-to-end IPSEC.  However,
+   [RFC3095] does allow compressing the ESP header.
+
+
+
+Dawkins, et al.          Best Current Practice                 [Page 11]
+
+RFC 3150                   PILC - Slow Links                   July 2001
+
+
+4.0 Topics For Further Work
+
+   In addition to the standards-track mechanisms discussed above, there
+   are still opportunities to improve performance over low-speed links.
+
+   "Sending fewer bits" is an obvious response to slow link speeds.  The
+   now-defunct HTTP-NG proposal [HTTP-NG] replaced the text-based HTTP
+   header representation with a binary representation for compactness.
+   However, HTTP-NG is not moving forward and HTTP/1.1 is not being
+   enhanced to include a more compact HTTP header representation.
+   Instead, the Wireless Application Protocol (WAP) Forum has opted for
+   the XML-based Wireless Session Protocol [WSP], which includes a
+   compact header encoding mechanism.
+
+   It would be nice to agree on a more compact header representation
+   that will be used by all WWW communities, not only the wireless WAN
+   community.  Indeed, general XML content encodings have been proposed
+   [Millau], although they are not yet widely adopted.
+
+   We note that TCP options which change from segment to segment
+   effectively disable header compression schemes deployed today,
+   because there's no way to indicate that some fields in the header are
+   unchanged from the previous segment, while other fields are not.  The
+   Robust Header Compression working group is developing such schemes
+   for TCP options such as timestamps and selective acknowledgements.
+   Hopefully, documents subsequent to [RFC3095] will define such
+   specifications.
+
+   Another effort worth following is that of 'Delta Encoding'.  Here,
+   clients that request a slightly modified version of some previously
+   cached resource would receive a succinct description of the
+   differences, rather than the entire resource [HTTP-DELTA].
+
+5.0 Security Considerations
+
+   All recommendations included in this document are stable standards-
+   track RFCs (at Proposed Standard status, as of this writing) or
+   otherwise do not suggest any changes to any protocol.  With the
+   exception of Van Jacobson compression [RFC1144] and [RFC2507,
+   RFC2508, RFC2509], all other mechanisms are applicable to TCP
+   connections protected by end-to-end IPSec.  This includes ROHC
+   [RFC3095], albeit partially, because even though it can compress the
+   outermost ESP header to some extent, encryption still renders any
+   payload data uncompressible (including any subsequent protocol
+   headers).
+
+
+
+
+
+
+Dawkins, et al.          Best Current Practice                 [Page 12]
+
+RFC 3150                   PILC - Slow Links                   July 2001
+
+
+6.0 IANA Considerations
+
+   This document is a pointer to other, existing IETF standards.  There
+   are no new IANA considerations.
+
+7.0 Acknowledgements
+
+   This recommendation has grown out of "Long Thin Networks" [RFC2757],
+   which in turn benefited from work done in the IETF TCPSAT working
+   group.
+
+8.0 References
+
+   [AlPa99]     Mark Allman and Vern Paxson, "On Estimating End-to-End
+                Network Path Properties", in ACM SIGCOMM 99 Proceedings,
+                1999.
+
+   [HTTP-DELTA] J. Mogul, et al., "Delta encoding in HTTP", Work in
+                Progress.
+
+   [HTTP-NG]    Mike Spreitzer, Bill Janssen, "HTTP 'Next Generation'",
+                9th International WWW Conference, May, 2000.  Also
+                available as:  http://www.www9.org/w9cdrom/60/60.html
+
+   [Millau]     Marc Girardot, Neel Sundaresan, "Millau: an encoding
+                format for efficient representation and exchange of XML
+                over the Web", 9th International WWW Conference, May,
+                2000.  Also available as:
+                http://www.www9.org/w9cdrom/154/154.html
+
+   [PAX97]      Paxson, V., "End-to-End Internet Packet Dynamics", 1997,
+                in SIGCOMM 97 Proceedings, available as:
+                http://www.acm.org/sigcomm/ccr/archive/ccr-toc/ccr-toc-
+                97.html
+
+   [RED93]      Floyd, S., and Jacobson, V., Random Early Detection
+                gateways for Congestion Avoidance, IEEE/ACM Transactions
+                on Networking, V.1 N.4, August 1993, pp. 397-413.  Also
+                available from http://ftp.ee.lbl.gov/floyd/red.html.
+
+   [RFC1144]    Jacobson, V., "Compressing TCP/IP Headers for Low-Speed
+                Serial Links", RFC 1144, February 1990.
+
+
+
+
+
+
+
+
+
+Dawkins, et al.          Best Current Practice                 [Page 13]
+
+RFC 3150                   PILC - Slow Links                   July 2001
+
+
+   [RFC1323]    Jacobson, V., Braden, R. and D. Borman, "TCP Extensions
+                for High Performance", RFC 1323, May 1992.
+
+   [RFC2246]    Dierks, T. and C. Allen, "The TLS Protocol: Version
+                1.0", RFC 2246, January 1999.
+
+   [RFC2309]    Braden, R., Clark, D., Crowcroft, J., Davie, B.,
+                Deering, S., Estrin, D., Floyd, S., Jacobson, V.,
+                Minshall, G., Partridge, C., Peterson, L., Ramakrishnan,
+                K., Shenker, S., Wroclawski, J. and L. Zhang,
+                "Recommendations on Queue Management and Congestion
+                Avoidance in the Internet", RFC 2309, April 1998.
+
+   [RFC2393]    Shacham, A., Monsour, R., Pereira, R. and M. Thomas, "IP
+                Payload Compression Protocol (IPComp)", RFC 2393,
+                December 1998.
+
+   [RFC2401]    Kent, S. and R. Atkinson, "Security Architecture for the
+                Internet Protocol", RFC 2401, November 1998.
+
+   [RFC2416]    Shepard, T. and C. Partridge, "When TCP Starts Up With
+                Four Packets Into Only Three Buffers", RFC 2416,
+                September 1998.
+
+   [RFC2507]    Degermark, M., Nordgren, B. and S. Pink, "IP Header
+                Compression", RFC 2507, February 1999.
+
+   [RFC2508]    Casner, S. and V. Jacobson. "Compressing IP/UDP/RTP
+                Headers for Low-Speed Serial Links", RFC 2508, February
+                1999.
+
+   [RFC2509]    Engan, M., Casner, S. and C. Bormann, "IP Header
+                Compression over PPP", RFC 2509, February 1999.
+
+   [RFC2581]    Allman, M., Paxson, V. and W. Stevens, "TCP Congestion
+                Control", RFC 2581, April 1999.
+
+   [RFC2616]    Fielding, R., Gettys, J., Mogul, J., Frystyk, H.,
+                Masinter, L., Leach, P. and T. Berners-Lee, "Hypertext
+                Transfer Protocol -- HTTP/1.1", RFC 2616, June 1999.
+
+   [RFC2757]    Montenegro, G., Dawkins, S., Kojo, M., Magret, V., and
+                N. Vaidya, "Long Thin Networks", RFC 2757, January 2000.
+
+   [RFC3042]    Allman, M., Balakrishnan, H. and S. Floyd, "Enhancing
+                TCP's Loss Recovery Using Limited Transmit", RFC 3042,
+                January 2001.
+
+
+
+
+Dawkins, et al.          Best Current Practice                 [Page 14]
+
+RFC 3150                   PILC - Slow Links                   July 2001
+
+
+   [RFC3095]    Bormann, C., Burmeister, C., Degermark, M., Fukushima,
+                H., Hannu, H., Jonsson, L-E., Hakenberg, R., Koren, T.,
+                Le, K., Liu, Z., Martensson, A., Miyazaki, A., Svanbro,
+                K., Wiebke, T., Yoshimura, T. and H. Zheng, "RObust
+                Header Compression (ROHC): Framework and four Profiles:
+                RTP, UDP ESP and uncompressed", RFC 3095, July 2001.
+
+   [SMM98]      Jeffrey Semke, Matthew Mathis, and Jamshid Mahdavi,
+                "Automatic TCP Buffer Tuning", in ACM SIGCOMM 98
+                Proceedings 1998. Available from
+                http://www.acm.org/sigcomm/sigcomm98/tp/abs_26.html.
+
+   [SSL]        Alan O. Freier, Philip Karlton, Paul C. Kocher, The SSL
+                Protocol: Version 3.0, March 1996.  (Expired Internet-
+                Draft, available from
+                http://home.netscape.com/eng/ssl3/ssl-toc.html)
+
+   [TCPB98]     Hari Balakrishnan, Venkata N. Padmanabhan, Srinivasan
+                Seshan, Mark Stemm, Randy H. Katz, "TCP Behavior of a
+                Busy Internet Server: Analysis and Improvements", IEEE
+                Infocom, March 1998. Available from:
+                http://www.cs.berkeley.edu/~hari/papers/infocom98.ps.gz
+
+   [TCPF98]     Dong Lin and H.T. Kung, "TCP Fast Recovery Strategies:
+                Analysis and Improvements", IEEE Infocom, March 1998.
+                Available from:
+                http://www.eecs.harvard.edu/networking/papers/ infocom-
+                tcp-final-198.pdf
+
+   [WSP]        Wireless Application Protocol Forum, "WAP Wireless
+                Session Protocol Specification", approved 4 May, 2000,
+                available from
+                http://www1.wapforum.org/tech/documents/WAP-203-WSP-
+                20000504-a.pdf.  (informative reference).
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Dawkins, et al.          Best Current Practice                 [Page 15]
+
+RFC 3150                   PILC - Slow Links                   July 2001
+
+
+Authors' Addresses
+
+   Questions about this document may be directed to:
+
+   Spencer Dawkins
+   Fujitsu Network Communications
+   2801 Telecom Parkway
+   Richardson, Texas 75082
+
+   Phone:  +1-972-479-3782
+   EMail: spencer.dawkins@fnc.fujitsu.com
+
+
+   Gabriel Montenegro
+   Sun Microsystems Laboratories, Europe
+   29, chemin du Vieux Chene
+   38240 Meylan, FRANCE
+
+   Phone:  +33 476 18 80 45
+   EMail: gab@sun.com
+
+
+   Markku Kojo
+   Department of Computer Science
+   University of Helsinki
+   P.O. Box 26 (Teollisuuskatu 23)
+   FIN-00014 HELSINKI
+   Finland
+
+   Phone: +358-9-1914-4179
+   Fax:   +358-9-1914-4441
+   EMail: kojo@cs.helsinki.fi
+
+
+   Vincent Magret
+   Alcatel Internetworking, Inc.
+   26801 W. Agoura road
+   Calabasas, CA, 91301
+
+   Phone: +1 818 878 4485
+   EMail: vincent.magret@alcatel.com
+
+
+
+
+
+
+
+
+
+
+Dawkins, et al.          Best Current Practice                 [Page 16]
+
+RFC 3150                   PILC - Slow Links                   July 2001
+
+
+Full Copyright Statement
+
+   Copyright (C) The Internet Society (2001).  All Rights Reserved.
+
+   This document and translations of it may be copied and furnished to
+   others, and derivative works that comment on or otherwise explain it
+   or assist in its implementation may be prepared, copied, published
+   and distributed, in whole or in part, without restriction of any
+   kind, provided that the above copyright notice and this paragraph are
+   included on all such copies and derivative works.  However, this
+   document itself may not be modified in any way, such as by removing
+   the copyright notice or references to the Internet Society or other
+   Internet organizations, except as needed for the purpose of
+   developing Internet standards in which case the procedures for
+   copyrights defined in the Internet Standards process must be
+   followed, or as required to translate it into languages other than
+   English.
+
+   The limited permissions granted above are perpetual and will not be
+   revoked by the Internet Society or its successors or assigns.
+
+   This document and the information contained herein is provided on an
+   "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
+   TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
+   BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
+   HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
+   MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
+
+Acknowledgement
+
+   Funding for the RFC Editor function is currently provided by the
+   Internet Society.
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Dawkins, et al.          Best Current Practice                 [Page 17]
+
--- a/kernel/picotcp/RFC/rfc3155.txt
+++ b/kernel/picotcp/RFC/rfc3155.txt
@ -0,0 +1,899 @@
+
+
+
+
+
+
+Network Working Group                                         S. Dawkins
+Request for Comments: 3155                                 G. Montenegro
+BCP: 50                                                          M. Kojo
+Category: Best Current Practice                                V. Magret
+                                                               N. Vaidya
+                                                             August 2001
+
+
+        End-to-end Performance Implications of Links with Errors
+
+Status of this Memo
+
+   This document specifies an Internet Best Current Practices for the
+   Internet Community, and requests discussion and suggestions for
+   improvements.  Distribution of this memo is unlimited.
+
+Copyright Notice
+
+   Copyright (C) The Internet Society (2001).  All Rights Reserved.
+
+Abstract
+
+   This document discusses the specific TCP mechanisms that are
+   problematic in environments with high uncorrected error rates, and
+   discusses what can be done to mitigate the problems without
+   introducing intermediate devices into the connection.
+
+Table of Contents
+
+   1.0 Introduction .............................................    2
+      1.1 Should you be reading this recommendation?  ...........    3
+      1.2 Relationship of this recommendation to PEPs ...........    4
+      1.3 Relationship of this recommendation to Link Layer
+          Mechanisms.............................................    4
+   2.0 Errors and Interactions with TCP Mechanisms ..............    5
+      2.1 Slow Start and Congestion Avoidance [RFC2581] .........    5
+      2.2 Fast Retransmit and Fast Recovery [RFC2581] ...........    6
+      2.3 Selective Acknowledgements [RFC2018, RFC2883] .........    7
+   3.0 Summary of Recommendations ...............................    8
+   4.0 Topics For Further Work ..................................    9
+      4.1 Achieving, and maintaining, large windows .............   10
+   5.0 Security Considerations ..................................   11
+   6.0 IANA Considerations ......................................   11
+   7.0 Acknowledgements .........................................   11
+   References ...................................................   11
+   Authors' Addresses ...........................................   14
+   Full Copyright Statement .....................................   16
+
+
+
+
+Dawkins, et al.          Best Current Practice                  [Page 1]
+
+RFC 3155                PILC - Links with Errors             August 2001
+
+
+1.0 Introduction
+
+   The rapidly-growing Internet is being accessed by an increasingly
+   wide range of devices over an increasingly wide variety of links.  At
+   least some of these links do not provide the degree of reliability
+   that hosts expect, and this expansion into unreliable links causes
+   some Internet protocols, especially TCP [RFC793], to perform poorly.
+
+   Specifically, TCP congestion control [RFC2581], while appropriate for
+   connections that lose traffic primarily because of congestion and
+   buffer exhaustion, interacts badly with uncorrected errors when TCP
+   connections traverse links with high uncorrected error rates.  The
+   result is that sending TCPs may spend an excessive amount of time
+   waiting for acknowledgement that do not arrive, and then, although
+   these losses are not due to congestion-related buffer exhaustion, the
+   sending TCP transmits at substantially reduced traffic levels as it
+   probes the network to determine "safe" traffic levels.
+
+   This document does not address issues with other transport protocols,
+   for example, UDP.
+
+   Congestion avoidance in the Internet is based on an assumption that
+   most packet losses are due to congestion.  TCP's congestion avoidance
+   strategy treats the absence of acknowledgement as a congestion
+   signal.  This has worked well since it was introduced in 1988 [VJ-
+   DCAC], because most links and subnets have relatively low error rates
+   in normal operation, and congestion is the primary cause of loss in
+   these environments.  However, links and subnets that do not enjoy low
+   uncorrected error rates are becoming more prevalent in parts of the
+   Internet.  In particular, these include terrestrial and satellite
+   wireless links.  Users relying on traffic traversing these links may
+   see poor performance because their TCP connections are spending
+   excessive time in congestion avoidance and/or slow start procedures
+   triggered by packet losses due to transmission errors.
+
+   The recommendations in this document aim at improving utilization of
+   available path capacity over such high error-rate links in ways that
+   do not threaten the stability of the Internet.
+
+   Applications use TCP in very different ways, and these have
+   interactions with TCP's behavior [RFC2861].  Nevertheless, it is
+   possible to make some basic assumptions about TCP flows.
+   Accordingly, the mechanisms discussed here are applicable to all uses
+   of TCP, albeit in varying degrees according to different scenarios
+   (as noted where appropriate).
+
+
+
+
+
+
+Dawkins, et al.          Best Current Practice                  [Page 2]
+
+RFC 3155                PILC - Links with Errors             August 2001
+
+
+   This recommendation is based on the explicit assumption that major
+   changes to the entire installed base of routers and hosts are not a
+   practical possibility.  This constrains any changes to hosts that are
+   directly affected by errored links.
+
+1.1 Should you be reading this recommendation?
+
+   All known subnetwork technologies provide an "imperfect" subnetwork
+   service - the bit error rate is non-zero.  But there's no obvious way
+   for end stations to tell the difference between packets discarded due
+   to congestion and losses due to transmission errors.
+
+   If a directly-attached subnetwork is reporting transmission errors to
+   a host, these reports matter, but we can't rely on explicit
+   transmission error reports to both hosts.
+
+   Another way of deciding if a subnetwork should be considered to have
+   a "high error rate" is by appealing to mathematics.
+
+   An approximate formula for the TCP Reno response function is given in
+   [PFTK98]:
+
+                                s
+   T = --------------------------------------------------
+       RTT*sqrt(2p/3) + tRTO*(3*sqrt(3p/8))*p*(1 + 32p**2)
+
+   where
+
+       T = the sending rate in bytes per second
+       s = the packet size in bytes
+       RTT = round-trip time in seconds
+       tRTO = TCP retransmit timeout value in seconds
+       p = steady-state packet loss rate
+
+   If one plugs in an observed packet loss rate, does the math and then
+   sees predicted bandwidth utilization that is greater than the link
+   speed, the connection will not benefit from recommendations in this
+   document, because the level of packet losses being encountered won't
+   affect the ability of TCP to utilize the link.  If, however, the
+   predicted bandwidth is less than the link speed, packet losses are
+   affecting the ability of TCP to utilize the link.
+
+   If further investigation reveals a subnetwork with significant
+   transmission error rates, the recommendations in this document will
+   improve the ability of TCP to utilize the link.
+
+
+
+
+
+
+Dawkins, et al.          Best Current Practice                  [Page 3]
+
+RFC 3155                PILC - Links with Errors             August 2001
+
+
+   A few caveats are in order, when doing this calculation:
+
+   (1)   the RTT is the end-to-end RTT, not the link RTT.
+   (2)   Max(1.0, 4*RTT) can be substituted as a simplification for
+         tRTO.
+   (3)   losses may be bursty - a loss rate measured over an interval
+         that includes multiple bursty loss events may understate the
+         impact of these loss events on the sending rate.
+
+1.2 Relationship of this recommendation to PEPs
+
+   This document discusses end-to-end mechanisms that do not require
+   TCP-level awareness by intermediate nodes.  This places severe
+   limitations on what the end nodes can know about the nature of losses
+   that are occurring between the end nodes.  Attempts to apply
+   heuristics to distinguish between congestion and transmission error
+   have not been successful [BV97, BV98, BV98a].  This restriction is
+   relaxed in an informational document on Performance Enhancing Proxies
+   (PEPs) [RFC3135]. Because PEPs can be placed on boundaries where
+   network characteristics change dramatically, PEPs have an additional
+   opportunity to improve performance over links with uncorrected
+   errors.
+
+   However, generalized use of PEPs contravenes the end-to-end principle
+   and is highly undesirable given their deleterious implications, which
+   include the following: lack of fate sharing (a PEP adds a third point
+   of failure besides the endpoints themselves), end-to-end reliability
+   and diagnostics, preventing end-to-end security (particularly network
+   layer security such as IPsec), mobility (handoffs are much more
+   complex because state must be transferred), asymmetric routing (PEPs
+   typically require being on both the forward and reverse paths of a
+   connection), scalability (PEPs add more state to maintain), QoS
+   transparency and guarantees.
+
+   Not every type of PEP has all the drawbacks listed above.
+   Nevertheless, the use of PEPs may have very serious consequences
+   which must be weighed carefully.
+
+1.3 Relationship of this recommendation to Link Layer Mechanisms
+
+   This recommendation is for use with TCP over subnetwork technologies
+   (link layers) that have already been deployed.  Subnetworks that are
+   intended to carry Internet protocols, but have not been completely
+   specified are the subject of a best common practices (BCP) document
+   which has been developed or is under development by the Performance
+
+
+
+
+
+
+Dawkins, et al.          Best Current Practice                  [Page 4]
+
+RFC 3155                PILC - Links with Errors             August 2001
+
+
+   Implications of Link Characteristics WG (PILC) [PILC-WEB].  This last
+   document is aimed at designers who still have the opportunity to
+   reduce the number of uncorrected errors TCP will encounter.
+
+2.0 Errors and Interactions with TCP Mechanisms
+
+   A TCP sender adapts its use of network path capacity based on
+   feedback from the TCP receiver.  As TCP is not able to distinguish
+   between losses due to congestion and losses due to uncorrected
+   errors, it is not able to accurately determine available path
+   capacity in the presence of significant uncorrected errors.
+
+2.1 Slow Start and Congestion Avoidance [RFC2581]
+
+   Slow Start and Congestion Avoidance [RFC2581] are essential to the
+   current stability of the Internet.  These mechanisms were designed to
+   accommodate networks that do not provide explicit congestion
+   notification.  Although experimental mechanisms such as [RFC2481] are
+   moving in the direction of explicit congestion notification, the
+   effect of ECN on ECN-aware TCPs is essentially the same as the effect
+   of implicit congestion notification through congestion-related loss,
+   except that ECN provides this notification before packets are lost,
+   and must then be retransmitted.
+
+   TCP connections experiencing high error rates on their paths interact
+   badly with Slow Start and with Congestion Avoidance, because high
+   error rates make the interpretation of losses ambiguous - the sender
+   cannot know whether detected losses are due to congestion or to data
+   corruption.  TCP makes the "safe" choice and assumes that the losses
+   are due to congestion.
+
+      -  Whenever sending TCPs receive three out-of-order
+         acknowledgement, they assume the network is mildly congested
+         and invoke fast retransmit/fast recovery (described below).
+
+      -  Whenever TCP's retransmission timer expires, the sender assumes
+         that the network is congested and invokes slow start.
+
+      -  Less-reliable link layers often use small link MTUs.  This
+         slows the rate of increase in the sender's window size during
+         slow start, because the sender's window is increased in units
+         of segments.  Small link MTUs alone don't improve reliability.
+         Path MTU discovery [RFC1191] must also be used to prevent
+         fragmentation.  Path MTU discovery allows the most rapid
+         opening of the sender's window size during slow start, but a
+         number of round trips may still be required to open the window
+         completely.
+
+
+
+
+Dawkins, et al.          Best Current Practice                  [Page 5]
+
+RFC 3155                PILC - Links with Errors             August 2001
+
+
+   Recommendation: Any standards-conformant TCP will implement Slow
+   Start and Congestion Avoidance, which are MUSTs in STD 3 [RFC1122].
+   Recommendations in this document will not interfere with these
+   mechanisms.
+
+2.2 Fast Retransmit and Fast Recovery [RFC2581]
+
+   TCP provides reliable delivery of data as a byte-stream to an
+   application, so that when a segment is lost (whether due to either
+   congestion or transmission loss), the receiver TCP implementation
+   must wait to deliver data to the receiving application until the
+   missing data is received.  The receiver TCP implementation detects
+   missing segments by segments arriving with out-of-order sequence
+   numbers.
+
+   TCPs should immediately send an acknowledgement when data is received
+   out-of-order [RFC2581], providing the next expected sequence number
+   with no delay, so that the sender can retransmit the required data as
+   quickly as possible and the receiver can resume delivery of data to
+   the receiving application.  When an acknowledgement carries the same
+   expected sequence number as an acknowledgement that has already been
+   sent for the last in-order segment received, these acknowledgement
+   are called "duplicate ACKs".
+
+   Because IP networks are allowed to reorder packets, the receiver may
+   send duplicate acknowledgments for segments that arrive out of order
+   due to routing changes, link-level retransmission, etc.  When a TCP
+   sender receives three duplicate ACKs, fast retransmit [RFC2581]
+   allows it to infer that a segment was lost.  The sender retransmits
+   what it considers to be this lost segment without waiting for the
+   full retransmission timeout, thus saving time.
+
+   After a fast retransmit, a sender halves its congestion window and
+   invokes the fast recovery [RFC2581] algorithm, whereby it invokes
+   congestion avoidance from a halved congestion window, but does not
+   invoke slow start from a one-segment congestion window as it would do
+   after a retransmission timeout.  As the sender is still receiving
+   dupacks, it knows the receiver is receiving packets sent, so the full
+   reduction after a timeout when no communication has been received is
+   not called for.  This relatively safe optimization also saves time.
+
+   It is important to be realistic about the maximum throughput that TCP
+   can have over a connection that traverses a high error-rate link.  In
+   general, TCP will increase its congestion window beyond the delay-
+   bandwidth product.  TCP's congestion avoidance strategy is additive-
+   increase, multiplicative-decrease, which means that if additional
+   errors are encountered before the congestion window recovers
+   completely from a 50-percent reduction, the effect can be a "downward
+
+
+
+Dawkins, et al.          Best Current Practice                  [Page 6]
+
+RFC 3155                PILC - Links with Errors             August 2001
+
+
+   spiral" of the congestion window due to additional 50-percent
+   reductions.  Even using Fast Retransmit/Fast Recovery, the sender
+   will halve the congestion window each time a window contains one or
+   more segments that are lost, and will re-open the window by one
+   additional segment for each congestion window's worth of
+   acknowledgement received.
+
+   If a connection's path traverses a link that loses one or more
+   segments during this recovery period, the one-half reduction takes
+   place again, this time on a reduced congestion window - and this
+   downward spiral will continue to hold the congestion window below
+   path capacity until the connection is able to recover completely by
+   additive increase without experiencing loss.
+
+   Of course, no downward spiral occurs if the error rate is constantly
+   high and the congestion window always remains small; the
+   multiplicative-increase "slow start" will be exited early, and the
+   congestion window remains low for the duration of the TCP connection.
+   In links with high error rates, the TCP window may remain rather
+   small for long periods of time.
+
+   Not all causes of small windows are related to errors.  For example,
+   HTTP/1.0 commonly closes TCP connections to indicate boundaries
+   between requested resources.  This means that these applications are
+   constantly closing "trained" TCP connections and opening "untrained"
+   TCP connections which will execute slow start, beginning with one or
+   two segments.  This can happen even with HTTP/1.1, if webmasters
+   configure their HTTP/1.1 servers to close connections instead of
+   waiting to see if the connection will be useful again.
+
+   A small window - especially a window of less than four segments -
+   effectively prevents the sender from taking advantage of Fast
+   Retransmits.  Moreover, efficient recovery from multiple losses
+   within a single window requires adoption of new proposals (NewReno
+   [RFC2582]).
+
+   Recommendation: Implement Fast Retransmit and Fast Recovery at this
+   time.  This is a widely-implemented optimization and is currently at
+   Proposed Standard level.  [RFC2488] recommends implementation of Fast
+   Retransmit/Fast Recovery in satellite environments.
+
+2.3 Selective Acknowledgements [RFC2018, RFC2883]
+
+   Selective Acknowledgements [RFC2018] allow the repair of multiple
+   segment losses per window without requiring one (or more) round-trips
+   per loss.
+
+
+
+
+
+Dawkins, et al.          Best Current Practice                  [Page 7]
+
+RFC 3155                PILC - Links with Errors             August 2001
+
+
+   [RFC2883] proposes a minor extension to SACK that allows receiving
+   TCPs to provide more information about the order of delivery of
+   segments, allowing "more robust operation in an environment of
+   reordered packets, ACK loss, packet replication, and/or early
+   retransmit timeouts".  Unless explicitly stated otherwise, in this
+   document, "Selective Acknowledgements" (or "SACK") refers to the
+   combination of [RFC2018] and [RFC2883].
+
+   Selective acknowledgments are most useful in LFNs ("Long Fat
+   Networks") because of the long round trip times that may be
+   encountered in these environments, according to Section 1.1 of
+   [RFC1323], and are especially useful if large windows are required,
+   because there is a higher probability of multiple segment losses per
+   window.
+
+   On the other hand, if error rates are generally low but occasionally
+   higher due to channel conditions, TCP will have the opportunity to
+   increase its window to larger values during periods of improved
+   channel conditions between bursts of errors.  When bursts of errors
+   occur, multiple losses within a window are likely to occur.  In this
+   case, SACK would provide benefits in speeding the recovery and
+   preventing unnecessary reduction of the window size.
+
+   Recommendation: Implement SACK as specified in [RFC2018] and updated
+   by [RFC2883], both Proposed Standards.  In cases where SACK cannot be
+   enabled for both sides of a connection, TCP senders may use NewReno
+   [RFC2582] to better handle partial ACKs and multiple losses within a
+   single window.
+
+3.0 Summary of Recommendations
+
+   The Internet does not provide a widely-available loss feedback
+   mechanism that allows TCP to distinguish between congestion loss and
+   transmission error.  Because congestion affects all traffic on a path
+   while transmission loss affects only the specific traffic
+   encountering uncorrected errors, avoiding congestion has to take
+   precedence over quickly repairing transmission errors.  This means
+   that the best that can be achieved without new feedback mechanisms is
+   minimizing the amount of time that is spent unnecessarily in
+   congestion avoidance.
+
+   The Fast Retransmit/Fast Recovery mechanism allows quick repair of
+   loss without giving up the safety of congestion avoidance.  In order
+   for Fast Retransmit/Fast Recovery to work, the window size must be
+   large enough to force the receiver to send three duplicate
+   acknowledgments before the retransmission timeout interval expires,
+   forcing full TCP slow-start.
+
+
+
+
+Dawkins, et al.          Best Current Practice                  [Page 8]
+
+RFC 3155                PILC - Links with Errors             August 2001
+
+
+   Selective Acknowledgements (SACK) extend the benefit of Fast
+   Retransmit/Fast Recovery to situations where multiple segment losses
+   in the window need to be repaired more quickly than can be
+   accomplished by executing Fast Retransmit for each segment loss, only
+   to discover the next segment loss.
+
+   These mechanisms are not limited to wireless environments.  They are
+   usable in all environments.
+
+4.0 Topics For Further Work
+
+   "Limited Transmit" [RFC3042] has been specified as an optimization
+   extending Fast Retransmit/Fast Recovery for TCP connections with
+   small congestion windows that will not trigger three duplicate
+   acknowledgments.  This specification is deemed safe, and it also
+   provides benefits for TCP connections that experience a large amount
+   of packet (data or ACK) loss.  Implementors should evaluate this
+   standards track specification for TCP in loss environments.
+
+   Delayed Duplicate Acknowledgements [MV97, VMPM99] attempts to prevent
+   TCP-level retransmission when link-level retransmission is still in
+   progress, adding additional traffic to the network.  This proposal is
+   worthy of additional study, but is not recommended at this time,
+   because we don't know how to calculate appropriate amounts of delay
+   for an arbitrary network topology.
+
+   It is not possible to use explicit congestion notification [RFC2481]
+   as a surrogate for explicit transmission error notification (no
+   matter how much we wish it was!).  Some mechanism to provide explicit
+   notification of transmission error would be very helpful.  This might
+   be more easily provided in a PEP environment, especially when the PEP
+   is the "first hop" in a connection path, because current checksum
+   mechanisms do not distinguish between transmission error to a payload
+   and transmission error to the header.  Furthermore, if the header is
+   damaged, sending explicit transmission error notification to the
+   right endpoint is problematic.
+
+   Losses that take place on the ACK stream, especially while a TCP is
+   learning network characteristics, can make the data stream quite
+   bursty (resulting in losses on the data stream, as well).  Several
+   ways of limiting this burstiness have been proposed, including TCP
+   transmit pacing at the sender and ACK rate control within the
+   network.
+
+   "Appropriate Byte Counting" (ABC) [ALL99], has been proposed as a way
+   of opening the congestion window based on the number of bytes that
+   have been successfully transfered to the receiver, giving more
+   appropriate behavior for application protocols that initiate
+
+
+
+Dawkins, et al.          Best Current Practice                  [Page 9]
+
+RFC 3155                PILC - Links with Errors             August 2001
+
+
+   connections with relatively short packets.  For SMTP [RFC2821], for
+   instance, the client might send a short HELO packet, a short MAIL
+   packet, one or more short RCPT packets, and a short DATA packet -
+   followed by the entire mail body sent as maximum-length packets.  An
+   ABC TCP sender would not use ACKs for each of these short packets to
+   increase the congestion window to allow additional full-length
+   packets.  ABC is worthy of additional study, but is not recommended
+   at this time, because ABC can lead to increased burstiness when
+   acknowledgments are lost.
+
+4.1 Achieving, and maintaining, large windows
+
+   The recommendations described in this document will aid TCPs in
+   injecting packets into ERRORed connections as fast as possible
+   without destabilizing the Internet, and so optimizing the use of
+   available bandwidth.
+
+   In addition to these TCP-level recommendations, there is still
+   additional work to do at the application level, especially with the
+   dominant application protocol on the World Wide Web, HTTP.
+
+   HTTP/1.0 (and earlier versions) closes TCP connections to signal a
+   receiver that all of a requested resource had been transmitted.
+   Because WWW objects tend to be small in size [MOGUL], TCPs carrying
+   HTTP/1.0 traffic experience difficulty in "training" on available
+   path capacity (a substantial portion of the transfer has already
+   happened by the time TCP exits slow start).
+
+   Several HTTP modifications have been introduced to improve this
+   interaction with TCP ("persistent connections" in HTTP/1.0, with
+   improvements in HTTP/1.1 [RFC2616]).  For a variety of reasons, many
+   HTTP interactions are still HTTP/1.0-style - relatively short-lived.
+
+   Proposals which reuse TCP congestion information across connections,
+   like TCP Control Block Interdependence [RFC2140], or the more recent
+   Congestion Manager [BS00] proposal, will have the effect of making
+   multiple parallel connections impact the network as if they were a
+   single connection, "trained" after a single startup transient.  These
+   proposals are critical to the long-term stability of the Internet,
+   because today's users always have the choice of clicking on the
+   "reload" button in their browsers and cutting off TCP's exponential
+   backoff - replacing connections which are building knowledge of the
+   available bandwidth with connections with no knowledge at all.
+
+
+
+
+
+
+
+
+Dawkins, et al.          Best Current Practice                 [Page 10]
+
+RFC 3155                PILC - Links with Errors             August 2001
+
+
+5.0 Security Considerations
+
+   A potential vulnerability introduced by Fast Retransmit/Fast Recovery
+   is (as pointed out in [RFC2581]) that an attacker may force TCP
+   connections to grind to a halt, or, more dangerously, behave more
+   aggressively.  The latter possibility may lead to congestion
+   collapse, at least in some regions of the network.
+
+   Selective acknowledgments is believed to neither strengthen nor
+   weaken TCP's current security properties [RFC2018].
+
+   Given that the recommendations in this document are performed on an
+   end-to-end basis, they continue working even in the presence of end-
+   to-end IPsec.  This is in direct contrast with mechanisms such as
+   PEP's which are implemented in intermediate nodes (section 1.2).
+
+6.0 IANA Considerations
+
+   This document is a pointer to other, existing IETF standards.  There
+   are no new IANA considerations.
+
+7.0 Acknowledgements
+
+   This recommendation has grown out of RFC 2757, "Long Thin Networks",
+   which was in turn based on work done in the IETF TCPSAT working
+   group.  The authors are indebted to the active members of the PILC
+   working group.  In particular, Mark Allman and Lloyd Wood gave us
+   copious and insightful feedback, and Dan Grossman and Jamshid Mahdavi
+   provided text replacements.
+
+References
+
+   [ALL99]    M. Allman, "TCP Byte Counting Refinements," ACM Computer
+              Communication Review, Volume 29, Number 3, July 1999.
+              http://www.acm.org/sigcomm/ccr/archive/ccr-toc/ccr-toc-
+              99.html
+
+   [BS00]     Balakrishnan, H. and S. Seshan, "The Congestion Manager",
+              RFC 3124, June 2001.
+
+   [BV97]     S. Biaz and N. Vaidya, "Using End-to-end Statistics to
+              Distinguish Congestion and Corruption Losses: A Negative
+              Result," Texas A&M University, Technical Report 97-009,
+              August 18, 1997.
+
+
+
+
+
+
+
+Dawkins, et al.          Best Current Practice                 [Page 11]
+
+RFC 3155                PILC - Links with Errors             August 2001
+
+
+   [BV98]     S. Biaz and N. Vaidya, "Sender-Based heuristics for
+              Distinguishing Congestion Losses from Wireless
+              Transmission Losses," Texas A&M University, Technical
+              Report 98-013, June 1998.
+
+   [BV98a]    S. Biaz and N. Vaidya, "Discriminating Congestion Losses
+              from Wireless Losses using Inter-Arrival Times at the
+              Receiver," Texas A&M University, Technical Report 98-014,
+              June 1998.
+
+   [MOGUL]    "The Case for Persistent-Connection HTTP", J. C. Mogul,
+              Research Report 95/4, May 1995. Available as
+              http://www.research.digital.com/wrl/techreports/abstracts/
+              95.4.html
+
+   [MV97]     M. Mehta and N. Vaidya, "Delayed Duplicate-
+              Acknowledgements:  A Proposal to Improve Performance of
+              TCP on Wireless Links," Texas A&M University, December 24,
+              1997.  Available at
+              http://www.cs.tamu.edu/faculty/vaidya/mobile.html
+
+   [PILC-WEB] http://pilc.grc.nasa.gov/
+
+   [PFTK98]   Padhye, J., Firoiu, V., Towsley, D. and J.Kurose, "TCP
+              Throughput: A simple model and its empirical validation",
+              SIGCOMM Symposium on Communications Architectures and
+              Protocols, August 1998.
+
+   [RFC793]   Postel, J., "Transmission Control Protocol", STD 7, RFC
+              793, September 1981.
+
+   [RFC2821]  Klensin, J., Editor, "Simple Mail Transfer Protocol", RFC
+              2821, April 2001.
+
+   [RFC1122]  Braden, R., "Requirements for Internet Hosts --
+              Communication Layers", STD 3, RFC 1122, October 1989.
+
+   [RFC1191]  Mogul J., and S. Deering, "Path MTU Discovery", RFC 1191,
+              November 1990.
+
+   [RFC1323]  Jacobson, V., Braden, R. and D. Borman. "TCP Extensions
+              for High Performance", RFC 1323, May 1992.
+
+   [RFC2018]  Mathis, M., Mahdavi, J., Floyd, S. and A. Romanow "TCP
+              Selective Acknowledgment Options", RFC 2018, October 1996.
+
+   [RFC2140]  Touch, J., "TCP Control Block Interdependence", RFC 2140,
+              April 1997.
+
+
+
+Dawkins, et al.          Best Current Practice                 [Page 12]
+
+RFC 3155                PILC - Links with Errors             August 2001
+
+
+   [RFC2309]  Braden, B., Clark, D., Crowcrfot, J., Davie, B., Deering,
+              S., Estrin, D., Floyd, S., Jacobson, V., Minshall, G.,
+              Partridge, C., Peterson, L., Ramakrishnan, K., Shecker,
+              S., Wroclawski, J. and L, Zhang, "Recommendations on Queue
+              Management and Congestion Avoidance in the Internet", RFC
+              2309, April 1998.
+
+   [RFC2481]  Ramakrishnan K. and S. Floyd, "A Proposal to add Explicit
+              Congestion Notification (ECN) to IP", RFC 2481, January
+              1999.
+
+   [RFC2488]  Allman, M., Glover, D. and L. Sanchez. "Enhancing TCP Over
+              Satellite Channels using Standard Mechanisms", BCP 28, RFC
+              2488, January 1999.
+
+   [RFC2581]  Allman, M., Paxson, V. and W. Stevens, "TCP Congestion
+              Control", RFC 2581, April 1999.
+
+   [RFC2582]  Floyd, S. and T. Henderson, "The NewReno Modification to
+              TCP's Fast Recovery Algorithm", RFC 2582, April 1999.
+
+   [RFC2616]  Fielding, R., Gettys, J., Mogul, J., Frystyk, H.,
+              Masinter, L., Leach P. and T. Berners-Lee, "Hypertext
+              Transfer Protocol -- HTTP/1.1", RFC 2616, June 1999.
+
+   [RFC2861]  Handley, H., Padhye, J. and S., Floyd, "TCP Congestion
+              Window Validation", RFC 2861, June 2000.
+
+   [RFC2883]  Floyd, S., Mahdavi, M., Mathis, M. and M. Podlosky, "An
+              Extension to the Selective Acknowledgement (SACK) Option
+              for TCP", RFC 2883, August 1999.
+
+   [RFC2923]  Lahey, K., "TCP Problems with Path MTU Discovery", RFC
+              2923, September 2000.
+
+   [RFC3042]  Allman, M., Balakrishnan, H. and S. Floyd, "Enhancing
+              TCP's Loss Recovery Using Limited Transmit", RFC 3042,
+              January, 2001.
+
+   [RFC3135]  Border, J., Kojo, M., Griner, J., Montenegro, G. and Z.
+              Shelby, "Performance Enhancing Proxies Intended to
+              Mitigate Link-Related Degradations", RFC 3135, June 2001.
+
+   [VJ-DCAC]  Jacobson, V., "Dynamic Congestion Avoidance / Control" e-
+              mail dated February 11, 1988, available from
+              http://www.kohala.com/~rstevens/vanj.88feb11.txt
+
+
+
+
+
+Dawkins, et al.          Best Current Practice                 [Page 13]
+
+RFC 3155                PILC - Links with Errors             August 2001
+
+
+   [VMPM99]   N. Vaidya, M. Mehta, C. Perkins, and G. Montenegro,
+              "Delayed Duplicate Acknowledgements: A TCP-Unaware
+              Approach to Improve Performance of TCP over Wireless,"
+              Technical Report 99-003, Computer Science Dept., Texas A&M
+              University, February 1999. Also, to appear in Journal of
+              Wireless Communications and Wireless Computing (Special
+              Issue on Reliable Transport Protocols for Mobile
+              Computing).
+
+Authors' Addresses
+
+   Questions about this document may be directed to:
+
+   Spencer Dawkins
+   Fujitsu Network Communications
+   2801 Telecom Parkway
+   Richardson, Texas 75082
+
+   Phone: +1-972-479-3782
+   EMail: spencer.dawkins@fnc.fujitsu.com
+
+
+   Gabriel E. Montenegro
+   Sun Microsystems
+   Laboratories, Europe
+   29, chemin du Vieux Chene
+   38240 Meylan
+   FRANCE
+
+   Phone: +33 476 18 80 45
+   EMail: gab@sun.com
+
+
+   Markku Kojo
+   Department of Computer Science
+   University of Helsinki
+   P.O. Box 26 (Teollisuuskatu 23)
+   FIN-00014 HELSINKI
+   Finland
+
+   Phone: +358-9-1914-4179
+   EMail: kojo@cs.helsinki.fi
+
+
+
+
+
+
+
+
+
+Dawkins, et al.          Best Current Practice                 [Page 14]
+
+RFC 3155                PILC - Links with Errors             August 2001
+
+
+   Vincent Magret
+   Alcatel Internetworking, Inc.
+   26801 W. Agoura road
+   Calabasas, CA, 91301
+
+   Phone: +1 818 878 4485
+   EMail: vincent.magret@alcatel.com
+
+
+   Nitin H. Vaidya
+   458 Coodinated Science Laboratory, MC-228
+   1308 West Main Street
+   Urbana, IL 61801
+
+   Phone: 217-265-5414
+   E-mail: nhv@crhc.uiuc.edu
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Dawkins, et al.          Best Current Practice                 [Page 15]
+
+RFC 3155                PILC - Links with Errors             August 2001
+
+
+Full Copyright Statement
+
+   Copyright (C) The Internet Society (2001).  All Rights Reserved.
+
+   This document and translations of it may be copied and furnished to
+   others, and derivative works that comment on or otherwise explain it
+   or assist in its implementation may be prepared, copied, published
+   and distributed, in whole or in part, without restriction of any
+   kind, provided that the above copyright notice and this paragraph are
+   included on all such copies and derivative works.  However, this
+   document itself may not be modified in any way, such as by removing
+   the copyright notice or references to the Internet Society or other
+   Internet organizations, except as needed for the purpose of
+   developing Internet standards in which case the procedures for
+   copyrights defined in the Internet Standards process must be
+   followed, or as required to translate it into languages other than
+   English.
+
+   The limited permissions granted above are perpetual and will not be
+   revoked by the Internet Society or its successors or assigns.
+
+   This document and the information contained herein is provided on an
+   "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
+   TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
+   BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
+   HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
+   MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
+
+Acknowledgement
+
+   Funding for the RFC Editor function is currently provided by the
+   Internet Society.
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Dawkins, et al.          Best Current Practice                 [Page 16]
+
--- a/kernel/picotcp/RFC/rfc3168.txt
+++ b/kernel/picotcp/RFC/rfc3168.txt
--- a/kernel/picotcp/RFC/rfc3360.txt
+++ b/kernel/picotcp/RFC/rfc3360.txt
--- a/kernel/picotcp/RFC/rfc3366.txt
+++ b/kernel/picotcp/RFC/rfc3366.txt
--- a/kernel/picotcp/RFC/rfc3390.txt
+++ b/kernel/picotcp/RFC/rfc3390.txt
@ -0,0 +1,843 @@
+
+
+
+
+
+
+Network Working Group                                          M. Allman
+Request for Comments: 3390                                  BBN/NASA GRC
+Obsoletes: 2414                                                 S. Floyd
+Updates: 2581                                                       ICIR
+Category: Standards Track                                   C. Partridge
+                                                        BBN Technologies
+                                                            October 2002
+
+
+                    Increasing TCP's Initial Window
+
+Status of this Memo
+
+   This document specifies an Internet standards track protocol for the
+   Internet community, and requests discussion and suggestions for
+   improvements.  Please refer to the current edition of the "Internet
+   Official Protocol Standards" (STD 1) for the standardization state
+   and status of this protocol.  Distribution of this memo is unlimited.
+
+Copyright Notice
+
+   Copyright (C) The Internet Society (2002).  All Rights Reserved.
+
+Abstract
+
+   This document specifies an optional standard for TCP to increase the
+   permitted initial window from one or two segment(s) to roughly 4K
+   bytes, replacing RFC 2414.  It discusses the advantages and
+   disadvantages of the higher initial window, and includes discussion
+   of experiments and simulations showing that the higher initial window
+   does not lead to congestion collapse.  Finally, this document
+   provides guidance on implementation issues.
+
+Terminology
+
+   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
+   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
+   document are to be interpreted as described in RFC 2119 [RFC2119].
+
+1.  TCP Modification
+
+   This document obsoletes [RFC2414] and updates [RFC2581] and specifies
+   an increase in the permitted upper bound for TCP's initial window
+   from one or two segment(s) to between two and four segments.  In most
+   cases, this change results in an upper bound on the initial window of
+   roughly 4K bytes (although given a large segment size, the permitted
+   initial window of two segments may be significantly larger than 4K
+   bytes).
+
+
+
+Allman, et. al.             Standards Track                     [Page 1]
+
+RFC 3390            Increasing TCP's Initial Window         October 2002
+
+
+   The upper bound for the initial window is given more precisely in
+   (1):
+
+         min (4*MSS, max (2*MSS, 4380 bytes))                        (1)
+
+   Note: Sending a 1500 byte packet indicates a maximum segment size
+   (MSS) of 1460 bytes (assuming no IP or TCP options).  Therefore,
+   limiting the initial window's MSS to 4380 bytes allows the sender to
+   transmit three segments initially in the common case when using 1500
+   byte packets.
+
+   Equivalently, the upper bound for the initial window size is based on
+   the MSS, as follows:
+
+       If (MSS <= 1095 bytes)
+           then win <= 4 * MSS;
+       If (1095 bytes < MSS < 2190 bytes)
+           then win <= 4380;
+       If (2190 bytes <= MSS)
+           then win <= 2 * MSS;
+
+   This increased initial window is optional: a TCP MAY start with a
+   larger initial window.  However, we expect that most general-purpose
+   TCP implementations would choose to use the larger initial congestion
+   window given in equation (1) above.
+
+   This upper bound for the initial window size represents a change from
+   RFC 2581 [RFC2581], which specified that the congestion window be
+   initialized to one or two segments.
+
+   This change applies to the initial window of the connection in the
+   first round trip time (RTT) of data transmission following the TCP
+   three-way handshake.  Neither the SYN/ACK nor its acknowledgment
+   (ACK) in the three-way handshake should increase the initial window
+   size above that outlined in equation (1).  If the SYN or SYN/ACK is
+   lost, the initial window used by a sender after a correctly
+   transmitted SYN MUST be one segment consisting of MSS bytes.
+
+   TCP implementations use slow start in as many as three different
+   ways: (1) to start a new connection (the initial window); (2) to
+   restart transmission after a long idle period (the restart window);
+   and (3) to restart transmission after a retransmit timeout (the loss
+   window).  The change specified in this document affects the value of
+   the initial window.  Optionally, a TCP MAY set the restart window to
+   the minimum of the value used for the initial window and the current
+   value of cwnd (in other words, using a larger value for the restart
+   window should never increase the size of cwnd).  These changes do NOT
+   change the loss window, which must remain 1 segment of MSS bytes (to
+
+
+
+Allman, et. al.             Standards Track                     [Page 2]
+
+RFC 3390            Increasing TCP's Initial Window         October 2002
+
+
+   permit the lowest possible window size in the case of severe
+   congestion).
+
+2.  Implementation Issues
+
+   When larger initial windows are implemented along with Path MTU
+   Discovery [RFC1191], and the MSS being used is found to be too large,
+   the congestion window `cwnd' SHOULD be reduced to prevent large
+   bursts of smaller segments.  Specifically, `cwnd' SHOULD be reduced
+   by the ratio of the old segment size to the new segment size.
+
+   When larger initial windows are implemented along with Path MTU
+   Discovery [RFC1191], alternatives are to set the "Don't Fragment"
+   (DF) bit in all segments in the initial window, or to set the "Don't
+   Fragment" (DF) bit in one of the segments.  It is an open question as
+   to which of these two alternatives is best; we would hope that
+   implementation experiences will shed light on this question.  In the
+   first case of setting the DF bit in all segments, if the initial
+   packets are too large, then all of the initial packets will be
+   dropped in the network.  In the second case of setting the DF bit in
+   only one segment, if the initial packets are too large, then all but
+   one of the initial packets will be fragmented in the network.  When
+   the second case is followed, setting the DF bit in the last segment
+   in the initial window provides the least chance for needless
+   retransmissions when the initial segment size is found to be too
+   large, because it minimizes the chances of duplicate ACKs triggering
+   a Fast Retransmit.  However, more attention needs to be paid to the
+   interaction between larger initial windows and Path MTU Discovery.
+
+   The larger initial window specified in this document is not intended
+   as encouragement for web browsers to open multiple simultaneous TCP
+   connections, all with large initial windows.  When web browsers open
+   simultaneous TCP connections to the same destination, they are
+   working against TCP's congestion control mechanisms [FF99],
+   regardless of the size of the initial window.  Combining this
+   behavior with larger initial windows further increases the unfairness
+   to other traffic in the network.  We suggest the use of HTTP/1.1
+   [RFC2068] (persistent TCP connections and pipelining) as a way to
+   achieve better performance of web transfers.
+
+3.  Advantages of Larger Initial Windows
+
+   1.  When the initial window is one segment, a receiver employing
+       delayed ACKs [RFC1122] is forced to wait for a timeout before
+       generating an ACK.  With an initial window of at least two
+       segments, the receiver will generate an ACK after the second data
+       segment arrives.  This eliminates the wait on the timeout (often
+       up to 200 msec, and possibly up to 500 msec [RFC1122]).
+
+
+
+Allman, et. al.             Standards Track                     [Page 3]
+
+RFC 3390            Increasing TCP's Initial Window         October 2002
+
+
+   2.  For connections transmitting only a small amount of data, a
+       larger initial window reduces the transmission time (assuming at
+       most moderate segment drop rates).  For many email (SMTP [Pos82])
+       and web page (HTTP [RFC1945, RFC2068]) transfers that are less
+       than 4K bytes, the larger initial window would reduce the data
+       transfer time to a single RTT.
+
+   3.  For connections that will be able to use large congestion
+       windows, this modification eliminates up to three RTTs and a
+       delayed ACK timeout during the initial slow-start phase.  This
+       will be of particular benefit for high-bandwidth large-
+       propagation-delay TCP connections, such as those over satellite
+       links.
+
+4.  Disadvantages of Larger Initial Windows for the Individual
+    Connection
+
+   In high-congestion environments, particularly for routers that have a
+   bias against bursty traffic (as in the typical Drop Tail router
+   queues), a TCP connection can sometimes be better off starting with
+   an initial window of one segment.  There are scenarios where a TCP
+   connection slow-starting from an initial window of one segment might
+   not have segments dropped, while a TCP connection starting with an
+   initial window of four segments might experience unnecessary
+   retransmits due to the inability of the router to handle small
+   bursts.  This could result in an unnecessary retransmit timeout.  For
+   a large-window connection that is able to recover without a
+   retransmit timeout, this could result in an unnecessarily-early
+   transition from the slow-start to the congestion-avoidance phase of
+   the window increase algorithm.  These premature segment drops are
+   unlikely to occur in uncongested networks with sufficient buffering
+   or in moderately-congested networks where the congested router uses
+   active queue management (such as Random Early Detection [FJ93,
+   RFC2309]).
+
+   Some TCP connections will receive better performance with the larger
+   initial window even if the burstiness of the initial window results
+   in premature segment drops.  This will be true if (1) the TCP
+   connection recovers from the segment drop without a retransmit
+   timeout, and (2) the TCP connection is ultimately limited to a small
+   congestion window by either network congestion or by the receiver's
+   advertised window.
+
+5.  Disadvantages of Larger Initial Windows for the Network
+
+   In terms of the potential for congestion collapse, we consider two
+   separate potential dangers for the network.  The first danger would
+   be a scenario where a large number of segments on congested links
+
+
+
+Allman, et. al.             Standards Track                     [Page 4]
+
+RFC 3390            Increasing TCP's Initial Window         October 2002
+
+
+   were duplicate segments that had already been received at the
+   receiver.  The second danger would be a scenario where a large number
+   of segments on congested links were segments that would be dropped
+   later in the network before reaching their final destination.
+
+   In terms of the negative effect on other traffic in the network, a
+   potential disadvantage of larger initial windows would be that they
+   increase the general packet drop rate in the network.  We discuss
+   these three issues below.
+
+   Duplicate segments:
+
+       As described in the previous section, the larger initial window
+       could occasionally result in a segment dropped from the initial
+       window, when that segment might not have been dropped if the
+       sender had slow-started from an initial window of one segment.
+       However, Appendix A shows that even in this case, the larger
+       initial window would not result in the transmission of a large
+       number of duplicate segments.
+
+   Segments dropped later in the network:
+
+       How much would the larger initial window for TCP increase the
+       number of segments on congested links that would be dropped
+       before reaching their final destination?  This is a problem that
+       can only occur for connections with multiple congested links,
+       where some segments might use scarce bandwidth on the first
+       congested link along the path, only to be dropped later along the
+       path.
+
+       First, many of the TCP connections will have only one congested
+       link along the path.  Segments dropped from these connections do
+       not "waste" scarce bandwidth, and do not contribute to congestion
+       collapse.
+
+       However, some network paths will have multiple congested links,
+       and segments dropped from the initial window could use scarce
+       bandwidth along the earlier congested links before ultimately
+       being dropped on subsequent congested links.  To the extent that
+       the drop rate is independent of the initial window used by TCP
+       segments, the problem of congested links carrying segments that
+       will be dropped before reaching their destination will be similar
+       for TCP connections that start by sending four segments or one
+       segment.
+
+
+
+
+
+
+
+Allman, et. al.             Standards Track                     [Page 5]
+
+RFC 3390            Increasing TCP's Initial Window         October 2002
+
+
+   An increased packet drop rate:
+
+       For a network with a high segment drop rate, increasing the TCP
+       initial window could increase the segment drop rate even further.
+       This is in part because routers with Drop Tail queue management
+       have difficulties with bursty traffic in times of congestion.
+       However, given uncorrelated arrivals for TCP connections, the
+       larger TCP initial window should not significantly increase the
+       segment drop rate.  Simulation-based explorations of these issues
+       are discussed in Section 7.2.
+
+   These potential dangers for the network are explored in simulations
+   and experiments described in the section below.  Our judgment is that
+   while there are dangers of congestion collapse in the current
+   Internet (see [FF99] for a discussion of the dangers of congestion
+   collapse from an increased deployment of UDP connections without
+   end-to-end congestion control), there is no such danger to the
+   network from increasing the TCP initial window to 4K bytes.
+
+6.  Interactions with the Retransmission Timer
+
+   Using a larger initial burst of data can exacerbate existing problems
+   with spurious retransmit timeouts on low-bandwidth paths, assuming
+   the standard algorithm for determining the TCP retransmission timeout
+   (RTO) [RFC2988].  The problem is that across low-bandwidth network
+   paths on which the transmission time of a packet is a large portion
+   of the round-trip time, the small packets used to establish a TCP
+   connection do not seed the RTO estimator appropriately.  When the
+   first window of data packets is transmitted, the sender's retransmit
+   timer could expire before the acknowledgments for those packets are
+   received.  As each acknowledgment arrives, the retransmit timer is
+   generally reset.  Thus, the retransmit timer will not expire as long
+   as an acknowledgment arrives at least once a second, given the one-
+   second minimum on the RTO recommended in RFC 2988.
+
+   For instance, consider a 9.6 Kbps link.  The initial RTT measurement
+   will be on the order of 67 msec, if we simply consider the
+   transmission time of 2 packets (the SYN and SYN-ACK), each consisting
+   of 40 bytes.  Using the RTO estimator given in [RFC2988], this yields
+   an initial RTO of 201 msec (67 + 4*(67/2)).  However, we round the
+   RTO to 1 second as specified in RFC 2988.  Then assume we send an
+   initial window of one or more 1500-byte packets (1460 data bytes plus
+   overhead).  Each packet will take on the order of 1.25 seconds to
+   transmit.  Therefore, the RTO will fire before the ACK for the first
+   packet returns, causing a spurious timeout.  In this case, a larger
+   initial window of three or four packets exacerbates the problems
+   caused by this spurious timeout.
+
+
+
+
+Allman, et. al.             Standards Track                     [Page 6]
+
+RFC 3390            Increasing TCP's Initial Window         October 2002
+
+
+   One way to deal with this problem is to make the RTO algorithm more
+   conservative.  During the initial window of data, for instance, the
+   RTO could be updated for each acknowledgment received.  In addition,
+   if the retransmit timer expires for some packet lost in the first
+   window of data, we could leave the exponential-backoff of the
+   retransmit timer engaged until at least one valid RTT measurement,
+   that involves a data packet, is received.
+
+   Another method would be to refrain from taking an RTT sample during
+   connection establishment, leaving the default RTO in place until TCP
+   takes a sample from a data segment and the corresponding ACK.  While
+   this method likely helps prevent spurious retransmits, it also may
+   slow the data transfer down if loss occurs before the RTO is seeded.
+   The use of limited transmit [RFC3042] to aid a TCP connection in
+   recovering from loss using fast retransmit rather than the RTO timer
+   mitigates the performance degradation caused by using the high
+   default RTO during the initial window of data transmission.
+
+   This specification leaves the decision about what to do (if anything)
+   with regards to the RTO, when using a larger initial window, to the
+   implementer.  However, the RECOMMENDED approach is to refrain from
+   sampling the RTT during the three-way handshake, keeping the default
+   RTO in place until an RTT sample involving a data packet is taken.
+   In addition, it is RECOMMENDED that TCPs use limited transmit
+   [RFC3042].
+
+7.  Typical Levels of Burstiness for TCP Traffic.
+
+   Larger TCP initial windows would not dramatically increase the
+   burstiness of TCP traffic in the Internet today, because such traffic
+   is already fairly bursty.  Bursts of two and three segments are
+   already typical of TCP [Flo97]; a delayed ACK (covering two
+   previously unacknowledged segments) received during congestion
+   avoidance causes the congestion window to slide and two segments to
+   be sent.  The same delayed ACK received during slow start causes the
+   window to slide by two segments and then be incremented by one
+   segment, resulting in a three-segment burst.  While not necessarily
+   typical, bursts of four and five segments for TCP are not rare.
+   Assuming delayed ACKs, a single dropped ACK causes the subsequent ACK
+   to cover four previously unacknowledged segments.  During congestion
+   avoidance this leads to a four-segment burst, and during slow start a
+   five-segment burst is generated.
+
+   There are also changes in progress that reduce the performance
+   problems posed by moderate traffic bursts.  One such change is the
+   deployment of higher-speed links in some parts of the network, where
+   a burst of 4K bytes can represent a small quantity of data.  A second
+   change, for routers with sufficient buffering, is the deployment of
+
+
+
+Allman, et. al.             Standards Track                     [Page 7]
+
+RFC 3390            Increasing TCP's Initial Window         October 2002
+
+
+   queue management mechanisms such as RED, which is designed to be
+   tolerant of transient traffic bursts.
+
+8.  Simulations and Experimental Results
+
+8.1 Studies of TCP Connections using that Larger Initial Window
+
+   This section surveys simulations and experiments that explore the
+   effect of larger initial windows on TCP connections.  The first set
+   of experiments explores performance over satellite links.  Larger
+   initial windows have been shown to improve the performance of TCP
+   connections over satellite channels [All97b].  In this study, an
+   initial window of four segments (512 byte MSS) resulted in throughput
+   improvements of up to 30% (depending upon transfer size).  [KAGT98]
+   shows that the use of larger initial windows results in a decrease in
+   transfer time in HTTP tests over the ACTS satellite system.  A study
+   involving simulations of a large number of HTTP transactions over
+   hybrid fiber coax (HFC) indicates that the use of larger initial
+   windows decreases the time required to load WWW pages [Nic98].
+
+   A second set of experiments explored TCP performance over dialup
+   modem links.  In experiments over a 28.8 bps dialup channel [All97a,
+   AHO98], a four-segment initial window decreased the transfer time of
+   a 16KB file by roughly 10%, with no accompanying increase in the drop
+   rate.  A simulation study [RFC2416] investigated the effects of using
+   a larger initial window on a host connected by a slow modem link and
+   a router with a 3 packet buffer.  The study concluded that for the
+   scenario investigated, the use of larger initial windows was not
+   harmful to TCP performance.
+
+   Finally, [All00] illustrates that the percentage of connections at a
+   particular web server that experience loss in the initial window of
+   data transmission increases with the size of the initial congestion
+   window.  However, the increase is in line with what would be expected
+   from sending a larger burst into the network.
+
+8.2 Studies of Networks using Larger Initial Windows
+
+   This section surveys simulations and experiments investigating the
+   impact of the larger window on other TCP connections sharing the
+   path.  Experiments in [All97a, AHO98] show that for 16 KB transfers
+   to 100 Internet hosts, four-segment initial windows resulted in a
+   small increase in the drop rate of 0.04 segments/transfer.  While the
+   drop rate increased slightly, the transfer time was reduced by
+   roughly 25% for transfers using the four-segment (512 byte MSS)
+   initial window when compared to an initial window of one segment.
+
+
+
+
+
+Allman, et. al.             Standards Track                     [Page 8]
+
+RFC 3390            Increasing TCP's Initial Window         October 2002
+
+
+   A simulation study in [RFC2415] explores the impact of a larger
+   initial window on competing network traffic.  In this investigation,
+   HTTP and FTP flows share a single congested gateway (where the number
+   of HTTP and FTP flows varies from one simulation set to another).
+   For each simulation set, the paper examines aggregate link
+   utilization and packet drop rates, median web page delay, and network
+   power for the FTP transfers.  The larger initial window generally
+   resulted in increased throughput, slightly-increased packet drop
+   rates, and an increase in overall network power.  With the exception
+   of one scenario, the larger initial window resulted in an increase in
+   the drop rate of less than 1% above the loss rate experienced when
+   using a one-segment initial window; in this scenario, the drop rate
+   increased from 3.5% with one-segment initial windows, to 4.5% with
+   four-segment initial windows.  The overall conclusions were that
+   increasing the TCP initial window to three packets (or 4380 bytes)
+   helps to improve perceived performance.
+
+   Morris [Mor97] investigated larger initial windows in a highly
+   congested network with transfers of 20K in size.  The loss rate in
+   networks where all TCP connections use an initial window of four
+   segments is shown to be 1-2% greater than in a network where all
+   connections use an initial window of one segment.  This relationship
+   held in scenarios where the loss rates with one-segment initial
+   windows ranged from 1% to 11%.  In addition, in networks where
+   connections used an initial window of four segments, TCP connections
+   spent more time waiting for the retransmit timer (RTO) to expire to
+   resend a segment than was spent using an initial window of one
+   segment.  The time spent waiting for the RTO timer to expire
+   represents idle time when no useful work was being accomplished for
+   that connection.  These results show that in a very congested
+   environment, where each connection's share of the bottleneck
+   bandwidth is close to one segment, using a larger initial window can
+   cause a perceptible increase in both loss rates and retransmit
+   timeouts.
+
+9.  Security Considerations
+
+   This document discusses the initial congestion window permitted for
+   TCP connections.  Changing this value does not raise any known new
+   security issues with TCP.
+
+10. Conclusion
+
+   This document specifies a small change to TCP that will likely be
+   beneficial to short-lived TCP connections and those over links with
+   long RTTs (saving several RTTs during the initial slow-start phase).
+
+
+
+
+
+Allman, et. al.             Standards Track                     [Page 9]
+
+RFC 3390            Increasing TCP's Initial Window         October 2002
+
+
+11. Acknowledgments
+
+   We would like to acknowledge Vern Paxson, Tim Shepard, members of the
+   End-to-End-Interest Mailing List, and members of the IETF TCP
+   Implementation Working Group for continuing discussions of these
+   issues and for feedback on this document.
+
+12. References
+
+   [AHO98]   Mark Allman, Chris Hayes, and Shawn Ostermann, An
+             Evaluation of TCP with Larger Initial Windows, March 1998.
+             ACM Computer Communication Review, 28(3), July 1998.  URL
+             "http://roland.lerc.nasa.gov/~mallman/papers/initwin.ps".
+
+   [All97a]  Mark Allman.  An Evaluation of TCP with Larger Initial
+             Windows.  40th IETF Meeting -- TCP Implementations WG.
+             December, 1997.  Washington, DC.
+
+   [All97b]  Mark Allman.  Improving TCP Performance Over Satellite
+             Channels.  Master's thesis, Ohio University, June 1997.
+
+   [All00]   Mark Allman. A Web Server's View of the Transport Layer.
+             ACM Computer Communication Review, 30(5), October 2000.
+
+   [FF96]    Fall, K., and Floyd, S., Simulation-based Comparisons of
+             Tahoe, Reno, and SACK TCP.  Computer Communication Review,
+             26(3), July 1996.
+
+   [FF99]    Sally Floyd, Kevin Fall.  Promoting the Use of End-to-End
+             Congestion Control in the Internet.  IEEE/ACM Transactions
+             on Networking, August 1999.  URL
+             "http://www.icir.org/floyd/end2end-paper.html".
+
+   [FJ93]    Floyd, S., and Jacobson, V., Random Early Detection
+             gateways for Congestion Avoidance. IEEE/ACM Transactions on
+             Networking, V.1 N.4, August 1993, p. 397-413.
+
+   [Flo94]   Floyd, S., TCP and Explicit Congestion Notification.
+             Computer Communication Review, 24(5):10-23, October 1994.
+
+   [Flo96]   Floyd, S., Issues of TCP with SACK. Technical report,
+             January 1996.  Available from http://www-
+             nrg.ee.lbl.gov/floyd/.
+
+   [Flo97]   Floyd, S., Increasing TCP's Initial Window.  Viewgraphs,
+             40th IETF Meeting - TCP Implementations WG. December, 1997.
+             URL "ftp://ftp.ee.lbl.gov/talks/sf-tcp-ietf97.ps".
+
+
+
+
+Allman, et. al.             Standards Track                    [Page 10]
+
+RFC 3390            Increasing TCP's Initial Window         October 2002
+
+
+   [KAGT98]  Hans Kruse, Mark Allman, Jim Griner, Diepchi Tran.  HTTP
+             Page Transfer Rates Over Geo-Stationary Satellite Links.
+             March 1998.  Proceedings of the Sixth International
+             Conference on Telecommunication Systems.  URL
+             "http://roland.lerc.nasa.gov/~mallman/papers/nash98.ps".
+
+   [Mor97]   Robert Morris.  Private communication, 1997.  Cited for
+             acknowledgement purposes only.
+
+   [Nic98]   Kathleen Nichols. Improving Network Simulation With
+             Feedback, Proceedings of LCN 98, October 1998. URL
+             "http://www.computer.org/proceedings/lcn/8810/8810toc.htm".
+
+   [Pos82]   Postel, J., "Simple Mail Transfer Protocol", STD 10, RFC
+             821, August 1982.
+
+   [RFC1122] Braden, R., "Requirements for Internet Hosts --
+             Communication Layers", STD 3, RFC 1122, October 1989.
+
+   [RFC1191] Mogul, J. and S. Deering, "Path MTU Discovery", RFC 1191,
+             November 1990.
+
+   [RFC1945] Berners-Lee, T., Fielding, R. and H. Nielsen, "Hypertext
+             Transfer Protocol -- HTTP/1.0", RFC 1945, May 1996.
+
+   [RFC2068] Fielding, R., Mogul, J., Gettys, J., Frystyk, H. and T.
+             Berners-Lee, "Hypertext Transfer Protocol -- HTTP/1.1", RFC
+             2616, January 1997.
+
+   [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
+             Requirement Levels", BCP 14, RFC 2119, March 1997.
+
+   [RFC2309] Braden, B., Clark, D., Crowcroft, J., Davie, B., Deering,
+             S., Estrin, D., Floyd, S., Jacobson, V., Minshall, G.,
+             Partridge, C., Peterson, L., Ramakrishnan, K., Shenker, S.,
+             Wroclawski, J. and L.  Zhang, "Recommendations on Queue
+             Management and Congestion Avoidance in the Internet", RFC
+             2309, April 1998.
+
+   [RFC2414] Allman, M., Floyd, S. and C. Partridge, "Increasing TCP's
+             Initial Window", RFC 2414, September 1998.
+
+   [RFC2415] Poduri, K. and K. Nichols, "Simulation Studies of Increased
+             Initial TCP Window Size", RFC 2415, September 1998.
+
+   [RFC2416] Shepard, T. and C. Partridge, "When TCP Starts Up With Four
+             Packets Into Only Three Buffers", RFC 2416, September 1998.
+
+
+
+
+Allman, et. al.             Standards Track                    [Page 11]
+
+RFC 3390            Increasing TCP's Initial Window         October 2002
+
+
+   [RFC2581] Allman, M., Paxson, V. and W. Stevens, "TCP Congestion
+             Control", RFC 2581, April 1999.
+
+   [RFC2821] Klensin, J., "Simple Mail Transfer Protocol", RFC 2821,
+             April 2001.
+
+   [RFC2988] Paxson, V. and M. Allman, "Computing TCP's Retransmission
+             Timer", RFC 2988, November 2000.
+
+   [RFC3042] Allman, M., Balakrishnan, H. and S. Floyd, "Enhancing TCP's
+             Loss Recovery Using Limited Transmit", RFC 3042, January
+             2001.
+
+   [RFC3168] Ramakrishnan, K.K., Floyd, S. and D. Black, "The Addition
+             of Explicit Congestion Notification (ECN) to IP", RFC 3168,
+             September 2001.
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Allman, et. al.             Standards Track                    [Page 12]
+
+RFC 3390            Increasing TCP's Initial Window         October 2002
+
+
+Appendix A - Duplicate Segments
+
+   In the current environment (without Explicit Congestion Notification
+   [Flo94] [RFC2481]), all TCPs use segment drops as indications from
+   the network about the limits of available bandwidth.  We argue here
+   that the change to a larger initial window should not result in the
+   sender retransmitting a large number of duplicate segments that have
+   already arrived at the receiver.
+
+   If one segment is dropped from the initial window, there are three
+   different ways for TCP to recover: (1) Slow-starting from a window of
+   one segment, as is done after a retransmit timeout, or after Fast
+   Retransmit in Tahoe TCP; (2) Fast Recovery without selective
+   acknowledgments (SACK), as is done after three duplicate ACKs in Reno
+   TCP; and (3) Fast Recovery with SACK, for TCP where both the sender
+   and the receiver support the SACK option [MMFR96].  In all three
+   cases, if a single segment is dropped from the initial window, no
+   duplicate segments (i.e., segments that have already been received at
+   the receiver) are transmitted.  Note that for a TCP sending four
+   512-byte segments in the initial window, a single segment drop will
+   not require a retransmit timeout, but can be recovered by using the
+   Fast Retransmit algorithm (unless the retransmit timer expires
+   prematurely).  In addition, a single segment dropped from an initial
+   window of three segments might be repaired using the fast retransmit
+   algorithm, depending on which segment is dropped and whether or not
+   delayed ACKs are used.  For example, dropping the first segment of a
+   three segment initial window will always require waiting for a
+   timeout, in the absence of Limited Transmit [RFC3042].  However,
+   dropping the third segment will always allow recovery via the fast
+   retransmit algorithm, as long as no ACKs are lost.
+
+   Next we consider scenarios where the initial window contains two to
+   four segments, and at least two of those segments are dropped.  If
+   all segments in the initial window are dropped, then clearly no
+   duplicate segments are retransmitted, as the receiver has not yet
+   received any segments.  (It is still a possibility that these dropped
+   segments used scarce bandwidth on the way to their drop point; this
+   issue was discussed in Section 5.)
+
+   When two segments are dropped from an initial window of three
+   segments, the sender will only send a duplicate segment if the first
+   two of the three segments were dropped, and the sender does not
+   receive a packet with the SACK option acknowledging the third
+   segment.
+
+   When two segments are dropped from an initial window of four
+   segments, an examination of the six possible scenarios (which we
+   don't go through here) shows that, depending on the position of the
+
+
+
+Allman, et. al.             Standards Track                    [Page 13]
+
+RFC 3390            Increasing TCP's Initial Window         October 2002
+
+
+   dropped packets, in the absence of SACK the sender might send one
+   duplicate segment.  There are no scenarios in which the sender sends
+   two duplicate segments.
+
+   When three segments are dropped from an initial window of four
+   segments, then, in the absence of SACK, it is possible that one
+   duplicate segment will be sent, depending on the position of the
+   dropped segments.
+
+   The summary is that in the absence of SACK, there are some scenarios
+   with multiple segment drops from the initial window where one
+   duplicate segment will be transmitted.  There are no scenarios in
+   which more than one duplicate segment will be transmitted.  Our
+   conclusion is than the number of duplicate segments transmitted as a
+   result of a larger initial window should be small.
+
+Author's Addresses
+
+   Mark Allman
+   BBN Technologies/NASA Glenn Research Center
+   21000 Brookpark Rd
+   MS 54-5
+   Cleveland, OH 44135
+   EMail: mallman@bbn.com
+   http://roland.lerc.nasa.gov/~mallman/
+
+   Sally Floyd
+   ICSI Center for Internet Research
+   1947 Center St, Suite 600
+   Berkeley, CA 94704
+   Phone: +1 (510) 666-2989
+   EMail: floyd@icir.org
+   http://www.icir.org/floyd/
+
+   Craig Partridge
+   BBN Technologies
+   10 Moulton St
+   Cambridge, MA 02138
+   EMail: craig@bbn.com
+
+
+
+
+
+
+
+
+
+
+
+
+Allman, et. al.             Standards Track                    [Page 14]
+
+RFC 3390            Increasing TCP's Initial Window         October 2002
+
+
+Full Copyright Statement
+
+   Copyright (C) The Internet Society (2002).  All Rights Reserved.
+
+   This document and translations of it may be copied and furnished to
+   others, and derivative works that comment on or otherwise explain it
+   or assist in its implementation may be prepared, copied, published
+   and distributed, in whole or in part, without restriction of any
+   kind, provided that the above copyright notice and this paragraph are
+   included on all such copies and derivative works.  However, this
+   document itself may not be modified in any way, such as by removing
+   the copyright notice or references to the Internet Society or other
+   Internet organizations, except as needed for the purpose of
+   developing Internet standards in which case the procedures for
+   copyrights defined in the Internet Standards process must be
+   followed, or as required to translate it into languages other than
+   English.
+
+   The limited permissions granted above are perpetual and will not be
+   revoked by the Internet Society or its successors or assigns.
+
+   This document and the information contained herein is provided on an
+   "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
+   TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
+   BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
+   HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
+   MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
+
+Acknowledgement
+
+   Funding for the RFC Editor function is currently provided by the
+   Internet Society.
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Allman, et. al.             Standards Track                    [Page 15]
+
--- a/kernel/picotcp/RFC/rfc3449.txt
+++ b/kernel/picotcp/RFC/rfc3449.txt
--- a/kernel/picotcp/RFC/rfc3465.txt
+++ b/kernel/picotcp/RFC/rfc3465.txt
@ -0,0 +1,563 @@
+
+
+
+
+
+
+Network Working Group                                          M. Allman
+Request for Comments: 3465                                  BBN/NASA GRC
+Category: Experimental                                     February 2003
+
+
+      TCP Congestion Control with Appropriate Byte Counting (ABC)
+
+Status of this Memo
+
+   This memo defines an Experimental Protocol for the Internet
+   community.  It does not specify an Internet standard of any kind.
+   Discussion and suggestions for improvement are requested.
+   Distribution of this memo is unlimited.
+
+Copyright Notice
+
+   Copyright (C) The Internet Society (2003).  All Rights Reserved.
+
+Abstract
+
+   This document proposes a small modification to the way TCP increases
+   its congestion window.  Rather than the traditional method of
+   increasing the congestion window by a constant amount for each
+   arriving acknowledgment, the document suggests basing the increase on
+   the number of previously unacknowledged bytes each ACK covers.  This
+   change improves the performance of TCP, as well as closes a security
+   hole TCP receivers can use to induce the sender into increasing the
+   sending rate too rapidly.
+
+Terminology
+
+   Much of the language in this document is taken from [RFC2581].
+
+   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
+   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
+   document are to be interpreted as described in [RFC2119].
+
+1   Introduction
+
+   This document proposes a modification to the algorithm for increasing
+   TCP's congestion window (cwnd) that improves both performance and
+   security.  Rather than increasing a TCP's congestion window based on
+   the number of acknowledgments (ACKs) that arrive at the data sender
+   (per the current specification [RFC2581]), the congestion window is
+   increased based on the number of bytes acknowledged by the arriving
+   ACKs.  The algorithm improves performance by mitigating the impact of
+   delayed ACKs on the growth of cwnd.  At the same time, the algorithm
+   provides cwnd growth in direct relation to the probed capacity of a
+
+
+
+Allman                        Experimental                      [Page 1]
+
+RFC 3465            TCP Congestion Control with ABC        February 2003
+
+
+   network path, therefore providing a more measured response to ACKs
+   that cover only small amounts of data (less than a full segment size)
+   than ACK counting.  This more appropriate cwnd growth can improve
+   both performance and can prevent inappropriate cwnd growth in
+   response to a misbehaving receiver.  On the other hand, in some cases
+   the modified cwnd growth algorithm causes larger bursts of segments
+   to be sent into the network.  In some cases this can lead to a non-
+   negligible increase in the drop rate and reduced performance (see
+   section 4 for a larger discussion of the issues).
+
+   This document is organized as follows.  Section 2 outlines the
+   modified algorithm for increasing TCP's congestion window.  Section 3
+   discusses the advantages of using the modified algorithm.  Section 4
+   discusses the disadvantages of the approach outlined in this
+   document.  Section 5 outlines some of the fairness issues that must
+   be considered for the modified algorithm.  Section 6 discusses
+   security considerations.
+
+   Statement of Intent
+
+      This specification contains an algorithm improving the performance
+      of TCP which is understood to be effective and safe, but which has
+      not been widely deployed.  One goal of publication as an
+      Experimental RFC is to be prudent, and encourage use and
+      deployment prior to publication in the standards track.  It is the
+      intent of the Transport Area to re-submit this specification as an
+      IETF Proposed Standard in the future, after more experience has
+      been gained.
+
+2   A Modified Algorithm for Increasing the Congestion Window
+
+   As originally outlined in [Jac88] and specified in [RFC2581], TCP
+   uses two algorithms for increasing the congestion window.  During
+   steady-state, TCP uses the Congestion Avoidance algorithm to linearly
+   increase the value of cwnd.  At the beginning of a transfer, after a
+   retransmission timeout or after a long idle period (in some
+   implementations), TCP uses the Slow Start algorithm to increase cwnd
+   exponentially.  According to RFC 2581, slow start bases the cwnd
+   increase on the number of incoming acknowledgments.  During
+   congestion avoidance RFC 2581 allows more latitude in increasing
+   cwnd, but traditionally implementations have based the increase on
+   the number of arriving ACKs.  In the following two subsections, we
+   detail modifications to these algorithms to increase cwnd based on
+   the number of bytes being acknowledged by each arriving ACK, rather
+   than by the number of ACKs that arrive.  We call these changes
+   "Appropriate Byte Counting" (ABC) [All99].
+
+
+
+
+
+Allman                        Experimental                      [Page 2]
+
+RFC 3465            TCP Congestion Control with ABC        February 2003
+
+
+2.1 Congestion Avoidance
+
+   RFC 2581 specifies that cwnd should be increased by 1 segment per
+   round-trip time (RTT) during the congestion avoidance phase of a
+   transfer.  Traditionally, TCPs have approximated this increase by
+   increasing cwnd by 1/cwnd for each arriving ACK.  This algorithm
+   opens cwnd by roughly 1 segment per RTT if the receiver ACKs each
+   incoming segment and no ACK loss occurs.  However, if the receiver
+   implements delayed ACKs [Bra89], the receiver returns roughly half as
+   many ACKs, which causes the sender to open cwnd more conservatively
+   (by approximately 1 segment every second RTT).  The approach that
+   this document suggests is to store the number of bytes that have been
+   ACKed in a "bytes_acked" variable in the TCP control block.  When
+   bytes_acked becomes greater than or equal to the value of the
+   congestion window, bytes_acked is reduced by the value of cwnd.
+   Next, cwnd is incremented by a full-sized segment (SMSS).  The
+   algorithm suggested above is specifically allowed by RFC 2581 during
+   congestion avoidance because it opens the window by at most 1 segment
+   per RTT.
+
+2.2 Slow Start
+
+   RFC 2581 states that the sender increments the congestion window by
+   at most, 1*SMSS bytes for each arriving acknowledgment during slow
+   start.  This document proposes that a TCP sender SHOULD increase cwnd
+   by the number of previously unacknowledged bytes ACKed by each
+   incoming acknowledgment, provided the increase is not more than L
+   bytes.  Choosing the limit on the increase, L, is discussed in the
+   next subsection.  When the number of previously unacknowledged bytes
+   ACKed is less than or equal to 1*SMSS bytes, or L is less than or
+   equal to 1*SMSS bytes, this proposal is no more aggressive (and
+   possibly less aggressive) than allowed by RFC 2581.  However,
+   increasing cwnd by more than 1*SMSS bytes in response to a single ACK
+   is more aggressive than allowed by RFC 2581.  The more aggressive
+   version of the slow start algorithm still falls within the spirit of
+   the principles outlined in [Jac88] (i.e., of no more than doubling
+   the cwnd per RTT), and this document proposes ABC for experimentation
+   in shared networks, provided an appropriate limit is applied (see
+   next section).
+
+2.3 Choosing the Limit
+
+   The limit, L, chosen for the cwnd increase during slow start,
+   controls the aggressiveness of the algorithm.  Choosing L=1*SMSS
+   bytes provides behavior that is no more aggressive than allowed by
+   RFC 2581.  However, ABC with L=1*SMSS bytes is more conservative in a
+
+
+
+
+
+Allman                        Experimental                      [Page 3]
+
+RFC 3465            TCP Congestion Control with ABC        February 2003
+
+
+   number of key ways (as discussed in the next section) and therefore,
+   this document suggests that even though with L=1*SMSS bytes TCP
+   stacks will see little performance change, ABC SHOULD be used.
+
+   A very large L could potentially lead to large line-rate bursts of
+   traffic in the face of a large amount of ACK loss or in the case when
+   the receiver sends "stretch ACKs" (ACKs for more than the two full-
+   sized segments allowed by the delayed ACK algorithm) [Pax97].
+
+   This document specifies that TCP implementations MAY use L=2*SMSS
+   bytes and MUST NOT use L > 2*SMSS bytes.  This choice balances
+   between being conservative (L=1*SMSS bytes) and being potentially
+   very aggressive.  In addition, L=2*SMSS bytes exactly balances the
+   negative impact of the delayed ACK algorithm (as discussed in more
+   detail in section 3.2).  Note that when L=2*SMSS bytes cwnd growth is
+   roughly the same as the case when the standard algorithms are used in
+   conjunction with a receiver that transmits an ACK for each incoming
+   segment [All98] (assuming no or small amounts of ACK loss in both
+   cases).
+
+   The exception to the above suggestion is during a slow start phase
+   that follows a retransmission timeout (RTO).  In this situation, a
+   TCP MUST use L=1*SMSS as specified in RFC 2581 since ACKs for large
+   amounts of previously unacknowledged data are common during this
+   phase of a transfer.  These ACKs do not necessarily indicate how much
+   data has left the network in the last RTT, and therefore ABC cannot
+   accurately determine how much to increase cwnd.  As an example, say
+   segment N is dropped by the network, and segments N+1 and N+2 arrive
+   successfully at the receiver.  The sender will receive only two
+   duplicate ACKs and therefore must rely on the retransmission timer
+   (RTO) to detect the loss.  When the RTO expires, segment N is
+   retransmitted.  The ACK sent in response to the retransmission will
+   be for segment N+2.  However, this ACK does not indicate that three
+   segments have left the network in the last RTT, but rather only a
+   single segment left the network.  Therefore, the appropriate cwnd
+   increment is at most 1*SMSS bytes.
+
+2.4 RTO Implications
+
+   [Jac88] shows that increases in cwnd of more than a factor of two in
+   succeeding RTTs can cause spurious retransmissions on slow links
+   where the bandwidth dominates the RTT, assuming the RTO estimator
+   given in [Jac88] and [RFC2988].  ABC stays within this limit of no
+   more than doubling cwnd in successive RTTs by capping the increase
+   (no matter what L is employed) by the number of previously
+   unacknowledged bytes covered by each incoming ACK.
+
+
+
+
+
+Allman                        Experimental                      [Page 4]
+
+RFC 3465            TCP Congestion Control with ABC        February 2003
+
+
+3   Advantages
+
+   This section outlines several advantages of using the ABC algorithm
+   to increase cwnd, rather than the standard ACK counting algorithm
+   given in [RFC2581].
+
+3.1 More Appropriate Congestion Window Increase
+
+   The ABC algorithm outlined in section 2 increases TCP's cwnd in
+   proportion to the amount of data actually sent into the network.  ACK
+   counting, on the other hand, increments cwnd by a constant upon the
+   arrival of each ACK.  For instance, consider an interactive telnet
+   connection (e.g., ssh or telnet) in which ACKs generally cover only a
+   few bytes of data, but cwnd is increased by 1*SMSS bytes for each ACK
+   received.  When a large amount of data needs to be transmitted (e.g.,
+   displaying a large file) the data is sent in one large burst because
+   the cwnd grows by 1*SMSS bytes per ACK rather than based on the
+   actual amount of capacity used.  Such a line-rate burst of data can
+   potentially cause a large amount of segment loss.
+
+   Congestion Window Validation (CWV) [RFC2861] addresses the above
+   problem as well.  CWV limits the amount of unused cwnd a TCP
+   connection can accumulate.  ABC can be used in conjunction with CWV
+   to obtain an accurate measure of the network path.
+
+3.2 Mitigate the Impact of Delayed ACKs and Lost ACKs
+
+   Delayed ACKs [RFC1122,RFC2581] allow a TCP receiver to refrain from
+   sending an ACK for each incoming segment.  However, a receiver SHOULD
+   send an ACK for every second full-sized segment that arrives.
+   Furthermore, a receiver MUST NOT withhold an ACK for more than 500
+   ms.  By reducing the number of ACKs sent to the data originator the
+   receiver is slowing the growth of the congestion window under an ACK
+   counting system.  Using ABC with L=2*SMSS bytes can roughly negate
+   the negative impact imposed by delayed ACKs by allowing cwnd to be
+   increased for ACKs that are withheld by the receiver.  This allows
+   the congestion window to grow in a manner similar to the case when
+   the receiver ACKs each incoming segment, but without adding extra
+   traffic to the network.  Simulation studies have shown increased
+   throughput when a TCP sender uses ABC when compared to the standard
+   ACK counting algorithm [All99], especially for short transfers that
+   never leave the initial slow start period.
+
+   Note that delayed ACKs should not be an issue during slow start-based
+   loss recovery, as RFC 2581 recommends that receivers should not delay
+   ACKs that cover out-of-order segments.  Therefore, as discussed
+   above, ABC with L > 1*SMSS bytes is inappropriate for such slow start
+   based loss recovery and MUST NOT be used.
+
+
+
+Allman                        Experimental                      [Page 5]
+
+RFC 3465            TCP Congestion Control with ABC        February 2003
+
+
+   Note: In the case when an entire window of data is lost, a TCP
+   receiver will likely generate delayed ACKs and an L > 1*SMSS bytes
+   would be safe.  However, detecting this scenario is difficult.
+   Therefore to keep ABC conservative, this document mandates that L
+   MUST NOT be > 1*SMSS bytes in any slow start-based loss recovery.
+
+   ACK loss can also retard the growth of a congestion window that
+   increases based on the number of ACKs that arrive.  When counting
+   ACKs, dropped ACKs represent forever-missed opportunities to increase
+   cwnd.  Using ABC with L > 1*SMSS bytes allows the sender to mitigate
+   the effect of lost ACKs.
+
+3.3 Prevents Attacks from Misbehaving Receivers
+
+   [SCWA99] outlines several methods for a receiver to induce a TCP
+   sender into violating congestion control and transmitting data at a
+   potentially inappropriate rate.  One of the outlined attacks is "ACK
+   Division".  This scheme involves the receiver sending multiple ACKs
+   for each incoming data segment, each ACKing only a small portion of
+   the original TCP data segment.  Since TCP senders have traditionally
+   used ACK counting to increase cwnd, ACK division causes
+   inappropriately rapid cwnd growth and, in turn, a potentially
+   inappropriate sending rate.  A TCP sender that uses ABC can prevent
+   this attack from being used to undermine standard congestion control
+   because the cwnd increase is based on the number of bytes ACKed,
+   rather than the number of ACKs received.
+
+   To prevent misbehaving receivers from inducing inappropriate sender
+   behavior, this document suggests TCP implementations use ABC, even if
+   L=1*SMSS bytes (i.e., not allowing ABC to provide more aggressive
+   cwnd growth than allowed by RFC 2581).
+
+4   Disadvantages
+
+   The main disadvantages of using ABC with L=2*SMSS bytes are an
+   increase in the burstiness of TCP and a small increase in the overall
+   loss rate.  [All98] discusses the two ways that ABC increases the
+   burstiness of the TCP sender.  First, the "micro burstiness" of the
+   connection is increased.  In other words, the number of segments sent
+   in response to each incoming ACK is increased by at most 1 segment
+   when using ABC with L=2*SMSS bytes in conjunction with a receiver
+   that is sending delayed ACKs.  During slow start this translates into
+   an increase from sending 2 back-to-back segments to sending 3 back-
+   to-back packets in response to an ACK for a single packet.  Or, an
+   increase from 3 packets to 4 packets when receiving a delayed ACK for
+   two outstanding packets.  Note that ACK loss can cause larger bursts.
+   However, ABC only increases the burst size by at most 1*SMSS bytes
+   per ACK received when compared to the standard behavior.  This slight
+
+
+
+Allman                        Experimental                      [Page 6]
+
+RFC 3465            TCP Congestion Control with ABC        February 2003
+
+
+   increase in the burstiness should only cause problems for devices
+   that have very small buffers.  In addition, ABC increases the "macro
+   burstiness" of the TCP sender in response to delayed ACKs in slow
+   start.  Rather than increasing cwnd by roughly 1.5 times per RTT, ABC
+   roughly doubles the congestion window every RTT.  However, doubling
+   cwnd every RTT fits within the spirit of slow start, as originally
+   outlined [Jac88].
+
+   With the increased burstiness comes a modest increase in the loss
+   rate for a TCP connection employing ABC (see the next section for a
+   short discussion on the fairness of ABC to non-ABC flows).  The
+   additional loss can be directly attributable to the increased
+   aggressiveness of ABC.  During slow start cwnd is increased more
+   rapidly.  Therefore when loss occurs cwnd is larger and more drops
+   are likely.  Similarly, a congestion avoidance cycle takes roughly
+   half, as long when using ABC and delayed ACKs when compared to an ACK
+   counting implementation.  In other words, a TCP sender reaches the
+   capacity of the network path, drops a packet and reduces the
+   congestion window by half roughly twice as often when using ABC.
+   However, as discussed above, in spite of the additional loss an ABC
+   TCP sender generally obtains better overall performance than a non-
+   ABC TCP [All99].
+
+   Due to the increase in the packet drop rate we suggest ABC be
+   implemented in conjunction with selective acknowledgments [RFC2018].
+
+5   Fairness Considerations
+
+   [All99] presents several simple simulations conducted to measure the
+   impact of ABC on competing traffic (both ABC and non-ABC).  The
+   experiments show that while ABC increases the drop rate for the
+   connection using ABC, competing traffic is not greatly effected.  The
+   experiments show that standard TCP and ABC both obtain roughly the
+   same throughput, regardless of the variant of the competing traffic.
+   The simulations also reaffirm that ABC outperforms non-ABC TCP in an
+   environment with varying types of TCP connections.  On the other
+   hand, the simulations presented in [All99] are not necessarily
+   realistic.  Therefore we are encouraging more experimentation in the
+   Internet.
+
+6   Security Considerations
+
+   As discussed in section 3.3, ABC protects a TCP sender from a
+   misbehaving receiver that induces the sender into transmitting at an
+   inappropriate rate with an "ACK division" attack.  This, in turn,
+   protects the network from an overly aggressive sender.
+
+
+
+
+
+Allman                        Experimental                      [Page 7]
+
+RFC 3465            TCP Congestion Control with ABC        February 2003
+
+
+7   Conclusions
+
+   This document RECOMMENDS that all TCP stacks be modified to use ABC
+   with L=1*SMSS bytes.  This change does not increase the
+   aggressiveness of TCP.  Furthermore, simulations of ABC with L=2*SMSS
+   bytes show a promising performance improvement that we encourage
+   researchers to experiment with in the Internet.
+
+Acknowledgments
+
+   This document has benefited from discussions with and encouragement
+   from Sally Floyd.  Van Jacobson and Reiner Ludwig provided valuable
+   input on the implications of byte counting on the RTO.  Reiner Ludwig
+   and Kostas Pentikousis provided valuable feedback on a draft of this
+   document.
+
+Normative References
+
+   [RFC1122] Braden, R., Ed., "Requirements for Internet Hosts --
+             Communication Layers", STD 3, RFC 1122, October 1989.
+
+   [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
+             Requirement Levels", BCP 14, RFC 2119, March 1997.
+
+   [RFC2581] Allman, M., Paxson, V. and W. Stevens, "TCP Congestion
+             Control", RFC 2581, April 1999.
+
+Informative References
+
+   [All98]   Mark Allman.  On the Generation and Use of TCP
+             Acknowledgments. ACM Computer Communication Review, 29(3),
+             July 1998.
+
+   [All99]   Mark Allman.  TCP Byte Counting Refinements. ACM Computer
+             Communication Review, 29(3), July 1999.
+
+   [Jac88]   Van Jacobson.  Congestion Avoidance and Control.  ACM
+             SIGCOMM 1988.
+
+   [Pax97]   Vern Paxson.  Automated Packet Trace Analysis of TCP
+             Implementations.  ACM SIGCOMM, September 1997.
+
+   [RFC2018] Mathis, M., Mahdavi, J., Floyd, S. and A. Romanow, "TCP
+             Selective Acknowledgment Options", RFC 2018, October 1996.
+
+   [RFC2861] Handley, M., Padhye, J. and S. Floyd, "TCP Congestion
+             Window Validation", RFC 2861, June 2000.
+
+
+
+
+Allman                        Experimental                      [Page 8]
+
+RFC 3465            TCP Congestion Control with ABC        February 2003
+
+
+   [SCWA99]  Stefan Savage, Neal Cardwell, David Wetherall, Tom
+             Anderson.  TCP Congestion Control with a Misbehaving
+             Receiver.  ACM Computer Communication Review, 29(5),
+             October 1999.
+
+Author's Address
+
+   Mark Allman
+   BBN Technologies/NASA Glenn Research Center
+   Lewis Field
+   21000 Brookpark Rd.  MS 54-5
+   Cleveland, OH  44135
+
+   Fax: 216-433-8705
+   Phone: 216-433-6586
+   EMail: mallman@bbn.com
+   http://roland.grc.nasa.gov/~mallman
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Allman                        Experimental                      [Page 9]
+
+RFC 3465            TCP Congestion Control with ABC        February 2003
+
+
+Full Copyright Statement
+
+   Copyright (C) The Internet Society (2003).  All Rights Reserved.
+
+   This document and translations of it may be copied and furnished to
+   others, and derivative works that comment on or otherwise explain it
+   or assist in its implementation may be prepared, copied, published
+   and distributed, in whole or in part, without restriction of any
+   kind, provided that the above copyright notice and this paragraph are
+   included on all such copies and derivative works.  However, this
+   document itself may not be modified in any way, such as by removing
+   the copyright notice or references to the Internet Society or other
+   Internet organizations, except as needed for the purpose of
+   developing Internet standards in which case the procedures for
+   copyrights defined in the Internet Standards process must be
+   followed, or as required to translate it into languages other than
+   English.
+
+   The limited permissions granted above are perpetual and will not be
+   revoked by the Internet Society or its successors or assigns.
+
+   This document and the information contained herein is provided on an
+   "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
+   TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
+   BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
+   HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
+   MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
+
+Acknowledgement
+
+   Funding for the RFC Editor function is currently provided by the
+   Internet Society.
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Allman                        Experimental                     [Page 10]
+
--- a/kernel/picotcp/RFC/rfc3481.txt
+++ b/kernel/picotcp/RFC/rfc3481.txt
--- a/kernel/picotcp/RFC/rfc3493.txt
+++ b/kernel/picotcp/RFC/rfc3493.txt
--- a/kernel/picotcp/RFC/rfc3517.txt
+++ b/kernel/picotcp/RFC/rfc3517.txt
@ -0,0 +1,731 @@
+
+
+
+
+
+
+Network Working Group                                         E. Blanton
+Request for Comments: 3517                             Purdue University
+Category: Standards Track                                      M. Allman
+                                                            BBN/NASA GRC
+                                                                 K. Fall
+                                                          Intel Research
+                                                                 L. Wang
+                                                  University of Kentucky
+                                                              April 2003
+
+
+         A Conservative Selective Acknowledgment (SACK)-based
+                    Loss Recovery Algorithm for TCP
+
+Status of this Memo
+
+   This document specifies an Internet standards track protocol for the
+   Internet community, and requests discussion and suggestions for
+   improvements.  Please refer to the current edition of the "Internet
+   Official Protocol Standards" (STD 1) for the standardization state
+   and status of this protocol.  Distribution of this memo is unlimited.
+
+Copyright Notice
+
+   Copyright (C) The Internet Society (2003).  All Rights Reserved.
+
+Abstract
+
+   This document presents a conservative loss recovery algorithm for TCP
+   that is based on the use of the selective acknowledgment (SACK) TCP
+   option.  The algorithm presented in this document conforms to the
+   spirit of the current congestion control specification (RFC 2581),
+   but allows TCP senders to recover more effectively when multiple
+   segments are lost from a single flight of data.
+
+Terminology
+
+   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
+   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
+   document are to be interpreted as described in BCP 14, RFC 2119
+   [RFC2119].
+
+
+
+
+
+
+
+
+
+
+Blanton, et al.             Standards Track                     [Page 1]
+
+RFC 3517            SACK-based Loss Recovery for TCP          April 2003
+
+
+1   Introduction
+
+   This document presents a conservative loss recovery algorithm for TCP
+   that is based on the use of the selective acknowledgment (SACK) TCP
+   option.  While the TCP SACK [RFC2018] is being steadily deployed in
+   the Internet [All00], there is evidence that hosts are not using the
+   SACK information when making retransmission and congestion control
+   decisions [PF01].  The goal of this document is to outline one
+   straightforward method for TCP implementations to use SACK
+   information to increase performance.
+
+   [RFC2581] allows advanced loss recovery algorithms to be used by TCP
+   [RFC793] provided that they follow the spirit of TCP's congestion
+   control algorithms [RFC2581, RFC2914].  [RFC2582] outlines one such
+   advanced recovery algorithm called NewReno.  This document outlines a
+   loss recovery algorithm that uses the SACK [RFC2018] TCP option to
+   enhance TCP's loss recovery.  The algorithm outlined in this
+   document, heavily based on the algorithm detailed in [FF96], is a
+   conservative replacement of the fast recovery algorithm [Jac90,
+   RFC2581].  The algorithm specified in this document is a
+   straightforward SACK-based loss recovery strategy that follows the
+   guidelines set in [RFC2581] and can safely be used in TCP
+   implementations.  Alternate SACK-based loss recovery methods can be
+   used in TCP as implementers see fit (as long as the alternate
+   algorithms follow the guidelines provided in [RFC2581]).  Please
+   note, however, that the SACK-based decisions in this document (such
+   as what segments are to be sent at what time) are largely decoupled
+   from the congestion control algorithms, and as such can be treated as
+   separate issues if so desired.
+
+2   Definitions
+
+   The reader is expected to be familiar with the definitions given in
+   [RFC2581].
+
+   The reader is assumed to be familiar with selective acknowledgments
+   as specified in [RFC2018].
+
+   For the purposes of explaining the SACK-based loss recovery algorithm
+   we define four variables that a TCP sender stores:
+
+      "HighACK" is the sequence number of the highest byte of data that
+      has been cumulatively ACKed at a given point.
+
+      "HighData" is the highest sequence number transmitted at a given
+      point.
+
+
+
+
+
+Blanton, et al.             Standards Track                     [Page 2]
+
+RFC 3517            SACK-based Loss Recovery for TCP          April 2003
+
+
+      "HighRxt" is the highest sequence number which has been
+      retransmitted during the current loss recovery phase.
+
+      "Pipe" is a sender's estimate of the number of bytes outstanding
+      in the network.  This is used during recovery for limiting the
+      sender's sending rate.  The pipe variable allows TCP to use a
+      fundamentally different congestion control than specified in
+      [RFC2581].  The algorithm is often referred to as the "pipe
+      algorithm".
+
+   For the purposes of this specification we define a "duplicate
+   acknowledgment" as a segment that arrives with no data and an
+   acknowledgment (ACK) number that is equal to the current value of
+   HighACK, as described in [RFC2581].
+
+   We define a variable "DupThresh" that holds the number of duplicate
+   acknowledgments required to trigger a retransmission.  Per [RFC2581]
+   this threshold is defined to be 3 duplicate acknowledgments.
+   However, implementers should consult any updates to [RFC2581] to
+   determine the current value for DupThresh (or method for determining
+   its value).
+
+   Finally, a range of sequence numbers [A,B] is said to "cover"
+   sequence number S if A <= S <= B.
+
+3   Keeping Track of SACK Information
+
+   For a TCP sender to implement the algorithm defined in the next
+   section it must keep a data structure to store incoming selective
+   acknowledgment information on a per connection basis.  Such a data
+   structure is commonly called the "scoreboard".  The specifics of the
+   scoreboard data structure are out of scope for this document (as long
+   as the implementation can perform all functions required by this
+   specification).
+
+   Note that this document refers to keeping account of (marking)
+   individual octets of data transferred across a TCP connection.  A
+   real-world implementation of the scoreboard would likely prefer to
+   manage this data as sequence number ranges.  The algorithms presented
+   here allow this, but require arbitrary sequence number ranges to be
+   marked as having been selectively acknowledged.
+
+
+
+
+
+
+
+
+
+
+Blanton, et al.             Standards Track                     [Page 3]
+
+RFC 3517            SACK-based Loss Recovery for TCP          April 2003
+
+
+4   Processing and Acting Upon SACK Information
+
+   For the purposes of the algorithm defined in this document the
+   scoreboard SHOULD implement the following functions:
+
+   Update ():
+
+      Given the information provided in an ACK, each octet that is
+      cumulatively ACKed or SACKed should be marked accordingly in the
+      scoreboard data structure, and the total number of octets SACKed
+      should be recorded.
+
+      Note: SACK information is advisory and therefore SACKed data MUST
+      NOT be removed from TCP's retransmission buffer until the data is
+      cumulatively acknowledged [RFC2018].
+
+   IsLost (SeqNum):
+
+      This routine returns whether the given sequence number is
+      considered to be lost.  The routine returns true when either
+      DupThresh discontiguous SACKed sequences have arrived above
+      'SeqNum' or (DupThresh * SMSS) bytes with sequence numbers greater
+      than 'SeqNum' have been SACKed.  Otherwise, the routine returns
+      false.
+
+   SetPipe ():
+
+      This routine traverses the sequence space from HighACK to HighData
+      and MUST set the "pipe" variable to an estimate of the number of
+      octets that are currently in transit between the TCP sender and
+      the TCP receiver.  After initializing pipe to zero the following
+      steps are taken for each octet 'S1' in the sequence space between
+      HighACK and HighData that has not been SACKed:
+
+      (a) If IsLost (S1) returns false:
+
+         Pipe is incremented by 1 octet.
+
+         The effect of this condition is that pipe is incremented for
+         packets that have not been SACKed and have not been determined
+         to have been lost (i.e., those segments that are still assumed
+         to be in the network).
+
+      (b) If S1 <= HighRxt:
+
+         Pipe is incremented by 1 octet.
+
+
+
+
+
+Blanton, et al.             Standards Track                     [Page 4]
+
+RFC 3517            SACK-based Loss Recovery for TCP          April 2003
+
+
+         The effect of this condition is that pipe is incremented for
+         the retransmission of the octet.
+
+      Note that octets retransmitted without being considered lost are
+      counted twice by the above mechanism.
+
+   NextSeg ():
+
+      This routine uses the scoreboard data structure maintained by the
+      Update() function to determine what to transmit based on the SACK
+      information that has arrived from the data receiver (and hence
+      been marked in the scoreboard).  NextSeg () MUST return the
+      sequence number range of the next segment that is to be
+      transmitted, per the following rules:
+
+      (1) If there exists a smallest unSACKed sequence number 'S2' that
+          meets the following three criteria for determining loss, the
+          sequence range of one segment of up to SMSS octets starting
+          with S2 MUST be returned.
+
+          (1.a) S2 is greater than HighRxt.
+
+          (1.b) S2 is less than the highest octet covered by any
+                received SACK.
+
+          (1.c) IsLost (S2) returns true.
+
+      (2) If no sequence number 'S2' per rule (1) exists but there
+          exists available unsent data and the receiver's advertised
+          window allows, the sequence range of one segment of up to SMSS
+          octets of previously unsent data starting with sequence number
+          HighData+1 MUST be returned.
+
+      (3) If the conditions for rules (1) and (2) fail, but there exists
+          an unSACKed sequence number 'S3' that meets the criteria for
+          detecting loss given in steps (1.a) and (1.b) above
+          (specifically excluding step (1.c)) then one segment of up to
+          SMSS octets starting with S3 MAY be returned.
+
+          Note that rule (3) is a sort of retransmission "last resort".
+          It allows for retransmission of sequence numbers even when the
+          sender has less certainty a segment has been lost than as with
+          rule (1).  Retransmitting segments via rule (3) will help
+          sustain TCP's ACK clock and therefore can potentially help
+          avoid retransmission timeouts.  However, in sending these
+          segments the sender has two copies of the same data considered
+          to be in the network (and also in the Pipe estimate).  When an
+          ACK or SACK arrives covering this retransmitted segment, the
+
+
+
+Blanton, et al.             Standards Track                     [Page 5]
+
+RFC 3517            SACK-based Loss Recovery for TCP          April 2003
+
+
+          sender cannot be sure exactly how much data left the network
+          (one of the two transmissions of the packet or both
+          transmissions of the packet).  Therefore the sender may
+          underestimate Pipe by considering both segments to have left
+          the network when it is possible that only one of the two has.
+
+          We believe that the triggering of rule (3) will be rare and
+          that the implications are likely limited to corner cases
+          relative to the entire recovery algorithm.  Therefore we leave
+          the decision of whether or not to use rule (3) to
+          implementors.
+
+      (4) If the conditions for each of (1), (2), and (3) are not met,
+          then NextSeg () MUST indicate failure, and no segment is
+          returned.
+
+   Note: The SACK-based loss recovery algorithm outlined in this
+   document requires more computational resources than previous TCP loss
+   recovery strategies.  However, we believe the scoreboard data
+   structure can be implemented in a reasonably efficient manner (both
+   in terms of computation complexity and memory usage) in most TCP
+   implementations.
+
+5   Algorithm Details
+
+   Upon the receipt of any ACK containing SACK information, the
+   scoreboard MUST be updated via the Update () routine.
+
+   Upon the receipt of the first (DupThresh - 1) duplicate ACKs, the
+   scoreboard is to be updated as normal.  Note: The first and second
+   duplicate ACKs can also be used to trigger the transmission of
+   previously unsent segments using the Limited Transmit algorithm
+   [RFC3042].
+
+   When a TCP sender receives the duplicate ACK corresponding to
+   DupThresh ACKs, the scoreboard MUST be updated with the new SACK
+   information (via Update ()).  If no previous loss event has occurred
+   on the connection or the cumulative acknowledgment point is beyond
+   the last value of RecoveryPoint, a loss recovery phase SHOULD be
+   initiated, per the fast retransmit algorithm outlined in [RFC2581].
+   The following steps MUST be taken:
+
+   (1) RecoveryPoint = HighData
+
+       When the TCP sender receives a cumulative ACK for this data octet
+       the loss recovery phase is terminated.
+
+
+
+
+
+Blanton, et al.             Standards Track                     [Page 6]
+
+RFC 3517            SACK-based Loss Recovery for TCP          April 2003
+
+
+   (2) ssthresh = cwnd = (FlightSize / 2)
+
+       The congestion window (cwnd) and slow start threshold (ssthresh)
+       are reduced to half of FlightSize per [RFC2581].
+
+   (3) Retransmit the first data segment presumed dropped -- the segment
+       starting with sequence number HighACK + 1.  To prevent repeated
+       retransmission of the same data, set HighRxt to the highest
+       sequence number in the retransmitted segment.
+
+   (4) Run SetPipe ()
+
+       Set a "pipe" variable  to the number of outstanding octets
+       currently "in the pipe"; this is the data which has been sent by
+       the TCP sender but for which no cumulative or selective
+       acknowledgment has been received and the data has not been
+       determined to have been dropped in the network.  It is assumed
+       that the data is still traversing the network path.
+
+   (5) In order to take advantage of potential additional available
+       cwnd, proceed to step (C) below.
+
+   Once a TCP is in the loss recovery phase the following procedure MUST
+   be used for each arriving ACK:
+
+   (A) An incoming cumulative ACK for a sequence number greater than
+       RecoveryPoint signals the end of loss recovery and the loss
+       recovery phase MUST be terminated.  Any information contained in
+       the scoreboard for sequence numbers greater than the new value of
+       HighACK SHOULD NOT be cleared when leaving the loss recovery
+       phase.
+
+   (B) Upon receipt of an ACK that does not cover RecoveryPoint the
+       following actions MUST be taken:
+
+       (B.1) Use Update () to record the new SACK information conveyed
+       by the incoming ACK.
+
+       (B.2) Use SetPipe () to re-calculate the number of octets still
+       in the network.
+
+   (C) If cwnd - pipe >= 1 SMSS the sender SHOULD transmit one or more
+       segments as follows:
+
+       (C.1) The scoreboard MUST be queried via NextSeg () for the
+       sequence number range of the next segment to transmit (if any),
+
+
+
+
+
+Blanton, et al.             Standards Track                     [Page 7]
+
+RFC 3517            SACK-based Loss Recovery for TCP          April 2003
+
+
+       and the given segment sent.  If NextSeg () returns failure (no
+       data to send) return without sending anything (i.e., terminate
+       steps C.1 -- C.5).
+
+       (C.2) If any of the data octets sent in (C.1) are below HighData,
+       HighRxt MUST be set to the highest sequence number of the
+       retransmitted segment.
+
+       (C.3) If any of the data octets sent in (C.1) are above HighData,
+       HighData must be updated to reflect the transmission of
+       previously unsent data.
+
+       (C.4) The estimate of the amount of data outstanding in the
+       network must be updated by incrementing pipe by the number of
+       octets transmitted in (C.1).
+
+       (C.5) If cwnd - pipe >= 1 SMSS, return to (C.1)
+
+5.1 Retransmission Timeouts
+
+   In order to avoid memory deadlocks, the TCP receiver is allowed to
+   discard data that has already been selectively acknowledged.  As a
+   result, [RFC2018] suggests that a TCP sender SHOULD expunge the SACK
+   information gathered from a receiver upon a retransmission timeout
+   "since the timeout might indicate that the data receiver has
+   reneged."  Additionally, a TCP sender MUST "ignore prior SACK
+   information in determining which data to retransmit."  However, a
+   SACK TCP sender SHOULD still use all SACK information made available
+   during the slow start phase of loss recovery following an RTO.
+
+   If an RTO occurs during loss recovery as specified in this document,
+   RecoveryPoint MUST be set to HighData.  Further, the new value of
+   RecoveryPoint MUST be preserved and the loss recovery algorithm
+   outlined in this document MUST be terminated.  In addition, a new
+   recovery phase (as described in section 5) MUST NOT be initiated
+   until HighACK is greater than or equal to the new value of
+   RecoveryPoint.
+
+   As described in Sections 4 and 5, Update () SHOULD continue to be
+   used appropriately upon receipt of ACKs.  This will allow the slow
+   start recovery period to benefit from all available information
+   provided by the receiver, despite the fact that SACK information was
+   expunged due to the RTO.
+
+   If there are segments missing from the receiver's buffer following
+   processing of the retransmitted segment, the corresponding ACK will
+   contain SACK information.  In this case, a TCP sender SHOULD use this
+   SACK information when determining what data should be sent in each
+
+
+
+Blanton, et al.             Standards Track                     [Page 8]
+
+RFC 3517            SACK-based Loss Recovery for TCP          April 2003
+
+
+   segment of the slow start.  The exact algorithm for this selection is
+   not specified in this document (specifically NextSeg () is
+   inappropriate during slow start after an RTO).  A relatively
+   straightforward approach to "filling in" the sequence space reported
+   as missing should be a reasonable approach.
+
+6   Managing the RTO Timer
+
+   The standard TCP RTO estimator is defined in [RFC2988].  Due to the
+   fact that the SACK algorithm in this document can have an impact on
+   the behavior of the estimator, implementers may wish to consider how
+   the timer is managed.  [RFC2988] calls for the RTO timer to be
+   re-armed each time an ACK arrives that advances the cumulative ACK
+   point.  Because the algorithm presented in this document can keep the
+   ACK clock going through a fairly significant loss event,
+   (comparatively longer than the algorithm described in [RFC2581]), on
+   some networks the loss event could last longer than the RTO.  In this
+   case the RTO timer would expire prematurely and a segment that need
+   not be retransmitted would be resent.
+
+   Therefore we give implementers the latitude to use the standard
+   [RFC2988] style RTO management or, optionally, a more careful variant
+   that re-arms the RTO timer on each retransmission that is sent during
+   recovery MAY be used.  This provides a more conservative timer than
+   specified in [RFC2988], and so may not always be an attractive
+   alternative.  However, in some cases it may prevent needless
+   retransmissions, go-back-N transmission and further reduction of the
+   congestion window.
+
+7   Research
+
+   The algorithm specified in this document is analyzed in [FF96], which
+   shows that the above algorithm is effective in reducing transfer time
+   over standard TCP Reno [RFC2581] when multiple segments are dropped
+   from a window of data (especially as the number of drops increases).
+   [AHKO97] shows that the algorithm defined in this document can
+   greatly improve throughput in connections traversing satellite
+   channels.
+
+8   Security Considerations
+
+   The algorithm presented in this paper shares security considerations
+   with [RFC2581].  A key difference is that an algorithm based on SACKs
+   is more robust against attackers forging duplicate ACKs to force the
+   TCP sender to reduce cwnd.  With SACKs, TCP senders have an
+   additional check on whether or not a particular ACK is legitimate.
+   While not fool-proof, SACK does provide some amount of protection in
+   this area.
+
+
+
+Blanton, et al.             Standards Track                     [Page 9]
+
+RFC 3517            SACK-based Loss Recovery for TCP          April 2003
+
+
+Acknowledgments
+
+   The authors wish to thank Sally Floyd for encouraging this document
+   and commenting on early drafts.  The algorithm described in this
+   document is loosely based on an algorithm outlined by Kevin Fall and
+   Sally Floyd in [FF96], although the authors of this document assume
+   responsibility for any mistakes in the above text.  Murali Bashyam,
+   Ken Calvert, Tom Henderson, Reiner Ludwig, Jamshid Mahdavi, Matt
+   Mathis, Shawn Ostermann, Vern Paxson and Venkat Venkatsubra provided
+   valuable feedback on earlier versions of this document.  We thank
+   Matt Mathis and Jamshid Mahdavi for implementing the scoreboard in ns
+   and hence guiding our thinking in keeping track of SACK state.
+
+   The first author would like to thank Ohio University and the Ohio
+   University Internetworking Research Group for supporting the bulk of
+   his work on this project.
+
+Normative References
+
+   [RFC793]  Postel, J., "Transmission Control Protocol", STD 7, RFC
+             793, September 1981.
+
+   [RFC2018] Mathis, M., Mahdavi, J., Floyd, S. and A. Romanow, "TCP
+             Selective Acknowledgment Options", RFC 2018, October 1996.
+
+   [RFC2026] Bradner, S., "The Internet Standards Process -- Revision
+             3", BCP 9, RFC 2026, October 1996.
+
+   [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
+             Requirement Levels", BCP 14, RFC 2119, March 1997.
+
+   [RFC2581] Allman, M., Paxson, V. and R. Stevens, "TCP Congestion
+             Control", RFC 2581, April 1999.
+
+Informative References
+
+   [AHKO97]  Mark Allman, Chris Hayes, Hans Kruse, Shawn Ostermann.  TCP
+             Performance Over Satellite Links.  Proceedings of the Fifth
+             International Conference on Telecommunications Systems,
+             Nashville, TN, March, 1997.
+
+   [All00]   Mark Allman.  A Web Server's View of the Transport Layer.
+             ACM Computer Communication Review, 30(5), October 2000.
+
+   [FF96]    Kevin Fall and Sally Floyd.  Simulation-based Comparisons
+             of Tahoe, Reno and SACK TCP.  Computer Communication
+             Review, July 1996.
+
+
+
+
+Blanton, et al.             Standards Track                    [Page 10]
+
+RFC 3517            SACK-based Loss Recovery for TCP          April 2003
+
+
+   [Jac90]   Van Jacobson.  Modified TCP Congestion Avoidance Algorithm.
+             Technical Report, LBL, April 1990.
+
+   [PF01]    Jitendra Padhye, Sally Floyd.  Identifying the TCP Behavior
+             of Web Servers, ACM SIGCOMM, August 2001.
+
+   [RFC2582] Floyd, S. and T. Henderson, "The NewReno Modification to
+             TCP's Fast Recovery Algorithm", RFC 2582, April 1999.
+
+   [RFC2914] Floyd, S., "Congestion Control Principles", BCP 41, RFC
+             2914, September 2000.
+
+   [RFC2988] Paxson, V. and M. Allman, "Computing TCP's Retransmission
+             Timer", RFC 2988, November 2000.
+
+   [RFC3042] Allman, M., Balakrishnan, H, and S. Floyd, "Enhancing TCP's
+             Loss Recovery Using Limited Transmit", RFC 3042, January
+             2001.
+
+Intellectual Property Rights Notice
+
+   The IETF takes no position regarding the validity or scope of any
+   intellectual property or other rights that might be claimed to
+   pertain to the implementation or use of the technology described in
+   this document or the extent to which any license under such rights
+   might or might not be available; neither does it represent that it
+   has made any effort to identify any such rights.  Information on the
+   IETF's procedures with respect to rights in standards-track and
+   standards-related documentation can be found in BCP-11.  Copies of
+   claims of rights made available for publication and any assurances of
+   licenses to be made available, or the result of an attempt made to
+   obtain a general license or permission for the use of such
+   proprietary rights by implementors or users of this specification can
+   be obtained from the IETF Secretariat.
+
+   The IETF invites any interested party to bring to its attention any
+   copyrights, patents or patent applications, or other proprietary
+   rights which may cover technology that may be required to practice
+   this standard.  Please address the information to the IETF Executive
+   Director.
+
+
+
+
+
+
+
+
+
+
+
+Blanton, et al.             Standards Track                    [Page 11]
+
+RFC 3517            SACK-based Loss Recovery for TCP          April 2003
+
+
+Authors' Addresses
+
+   Ethan Blanton
+   Purdue University Computer Sciences
+   1398 Computer Science Building
+   West Lafayette, IN  47907
+
+   EMail: eblanton@cs.purdue.edu
+
+
+   Mark Allman
+   BBN Technologies/NASA Glenn Research Center
+   Lewis Field
+   21000 Brookpark Rd.  MS 54-5
+   Cleveland, OH  44135
+
+   Phone: 216-433-6586
+   Fax: 216-433-8705
+   EMail: mallman@bbn.com
+   http://roland.grc.nasa.gov/~mallman
+
+
+   Kevin Fall
+   Intel Research
+   2150 Shattuck Ave., PH Suite
+   Berkeley, CA 94704
+
+   EMail: kfall@intel-research.net
+
+
+   Lili Wang
+   Laboratory for Advanced Networking
+   210 Hardymon Building
+   University of Kentucky
+   Lexington, KY 40506-0495
+
+   EMail: lwang0@uky.edu
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Blanton, et al.             Standards Track                    [Page 12]
+
+RFC 3517            SACK-based Loss Recovery for TCP          April 2003
+
+
+Full Copyright Statement
+
+   Copyright (C) The Internet Society (2003).  All Rights Reserved.
+
+   This document and translations of it may be copied and furnished to
+   others, and derivative works that comment on or otherwise explain it
+   or assist in its implementation may be prepared, copied, published
+   and distributed, in whole or in part, without restriction of any
+   kind, provided that the above copyright notice and this paragraph are
+   included on all such copies and derivative works.  However, this
+   document itself may not be modified in any way, such as by removing
+   the copyright notice or references to the Internet Society or other
+   Internet organizations, except as needed for the purpose of
+   developing Internet standards in which case the procedures for
+   copyrights defined in the Internet Standards process must be
+   followed, or as required to translate it into languages other than
+   English.
+
+   The limited permissions granted above are perpetual and will not be
+   revoked by the Internet Society or its successors or assigns.
+
+   This document and the information contained herein is provided on an
+   "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
+   TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
+   BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
+   HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
+   MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
+
+Acknowledgement
+
+   Funding for the RFC Editor function is currently provided by the
+   Internet Society.
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Blanton, et al.             Standards Track                    [Page 13]
+
--- a/kernel/picotcp/RFC/rfc3522.txt
+++ b/kernel/picotcp/RFC/rfc3522.txt
@ -0,0 +1,787 @@
+
+
+
+
+
+
+Network Working Group                                          R. Ludwig
+Request for Comments: 3522                                      M. Meyer
+Category: Experimental                                 Ericsson Research
+                                                              April 2003
+
+
+                 The Eifel Detection Algorithm for TCP
+
+Status of this Memo
+
+   This memo defines an Experimental Protocol for the Internet
+   community.  It does not specify an Internet standard of any kind.
+   Discussion and suggestions for improvement are requested.
+   Distribution of this memo is unlimited.
+
+Copyright Notice
+
+   Copyright (C) The Internet Society (2003).  All Rights Reserved.
+
+Abstract
+
+   The Eifel detection algorithm allows a TCP sender to detect a
+   posteriori whether it has entered loss recovery unnecessarily.  It
+   requires that the TCP Timestamps option defined in RFC 1323 be
+   enabled for a connection.  The Eifel detection algorithm makes use of
+   the fact that the TCP Timestamps option eliminates the retransmission
+   ambiguity in TCP.  Based on the timestamp of the first acceptable ACK
+   that arrives during loss recovery, it decides whether loss recovery
+   was entered unnecessarily.  The Eifel detection algorithm provides a
+   basis for future TCP enhancements.  This includes response algorithms
+   to back out of loss recovery by restoring a TCP sender's congestion
+   control state.
+
+Terminology
+
+   The keywords MUST, MUST NOT, REQUIRED, SHALL, SHALL NOT, SHOULD,
+   SHOULD NOT, RECOMMENDED, MAY, and OPTIONAL, when they appear in this
+   document, are to be interpreted as described in [RFC2119].
+
+   We refer to the first-time transmission of an octet as the 'original
+   transmit'.  A subsequent transmission of the same octet is referred
+   to as a 'retransmit'.  In most cases, this terminology can likewise
+   be applied to data segments as opposed to octets.  However, with
+   repacketization, a segment can contain both first-time transmissions
+   and retransmissions of octets.  In that case, this terminology is
+   only consistent when applied to octets.  For the Eifel detection
+   algorithm, this makes no difference as it also operates correctly
+   when repacketization occurs.
+
+
+
+Ludwig & Meyer                Experimental                      [Page 1]
+
+RFC 3522         The Eifel Detection Algorithm for TCP        April 2003
+
+
+   We use the term 'acceptable ACK' as defined in [RFC793].  That is an
+   ACK that acknowledges previously unacknowledged data.  We use the
+   term 'duplicate ACK', and the variable 'dupacks' as defined in
+   [WS95].  The variable 'dupacks' is a counter of duplicate ACKs that
+   have already been received by a TCP sender before the fast retransmit
+   is sent.  We use the variable 'DupThresh' to refer to the so-called
+   duplicate acknowledgement threshold, i.e., the number of duplicate
+   ACKs that need to arrive at a TCP sender to trigger a fast
+   retransmit.  Currently, DupThresh is specified as a fixed value of
+   three [RFC2581].  Future TCPs might implement an adaptive DupThresh.
+
+1. Introduction
+
+   The retransmission ambiguity problem [Zh86], [KP87] is a TCP sender's
+   inability to distinguish whether the first acceptable ACK that
+   arrives after a retransmit was sent in response to the original
+   transmit or the retransmit.  This problem occurs after a timeout-
+   based retransmit and after a fast retransmit.  The Eifel detection
+   algorithm uses the TCP Timestamps option defined in [RFC1323] to
+   eliminate the retransmission ambiguity.  It thereby allows a TCP
+   sender to detect a posteriori whether it has entered loss recovery
+   unnecessarily.
+
+   This added capability of a TCP sender is useful in environments where
+   TCP's loss recovery and congestion control algorithms may often get
+   falsely triggered.  This can be caused by packet reordering, packet
+   duplication, or a sudden delay increase in the data or the ACK path
+   that results in a spurious timeout.  For example, such sudden delay
+   increases can often occur in wide-area wireless access networks due
+   to handovers, resource preemption due to higher priority traffic
+   (e.g., voice), or because the mobile transmitter traverses through a
+   radio coverage hole (e.g., see [Gu01]).  In such wireless networks,
+   the often unnecessary go-back-N retransmits that typically occur
+   after a spurious timeout create a serious problem.  They decrease
+   end-to-end throughput, are useless load upon the network, and waste
+   transmission (battery) power.  Note that across such networks the use
+   of timestamps is recommended anyway [RFC3481].
+
+   Based on the Eifel detection algorithm, a TCP sender may then choose
+   to implement dedicated response algorithms.  One goal of such a
+   response algorithm would be to alleviate the consequences of a
+   falsely triggered loss recovery.  This may include restoring the TCP
+   sender's congestion control state, and avoiding the mentioned
+   unnecessary go-back-N retransmits.  Another goal would be to adapt
+   protocol parameters such as the duplicate acknowledgement threshold
+   [RFC2581], and the RTT estimators [RFC2988].  This is to reduce the
+   risk of falsely triggering TCP's loss recovery again as the
+   connection progresses.  However, such response algorithms are outside
+
+
+
+Ludwig & Meyer                Experimental                      [Page 2]
+
+RFC 3522         The Eifel Detection Algorithm for TCP        April 2003
+
+
+   the scope of this document.  Note: The original proposal, the "Eifel
+   algorithm" [LK00], comprises both a detection and a response
+   algorithm.  This document only defines the detection part.  The
+   response part is defined in [LG03].
+
+   A key feature of the Eifel detection algorithm is that it already
+   detects, upon the first acceptable ACK that arrives during loss
+   recovery, whether a fast retransmit or a timeout was spurious.  This
+   is crucial to be able to avoid the mentioned go-back-N retransmits.
+   Another feature is that the Eifel detection algorithm is fairly
+   robust against the loss of ACKs.
+
+   Also the DSACK option [RFC2883] can be used to detect a posteriori
+   whether a TCP sender has entered loss recovery unnecessarily [BA02].
+   However, the first ACK carrying a DSACK option usually arrives at a
+   TCP sender only after loss recovery has already terminated.  Thus,
+   the DSACK option cannot be used to eliminate the retransmission
+   ambiguity.  Consequently, it cannot be used to avoid the mentioned
+   unnecessary go-back-N retransmits.  Moreover, a DSACK-based detection
+   algorithm is less robust against ACK losses.  A recent proposal based
+   on neither the TCP timestamps nor the DSACK option does not have the
+   limitation of DSACK-based schemes, but only addresses the case of
+   spurious timeouts [SK03].
+
+2. Events that Falsely Trigger TCP Loss Recovery
+
+   The following events may falsely trigger a TCP sender's loss recovery
+   and congestion control algorithms.  This causes a so-called spurious
+   retransmit, and an unnecessary reduction of the TCP sender's
+   congestion window and slow start threshold [RFC2581].
+
+      -  Spurious timeout
+
+      -  Packet reordering
+
+      -  Packet duplication
+
+   A spurious timeout is a timeout that would not have occurred had the
+   sender "waited longer".  This may be caused by increased delay that
+   suddenly occurs in the data and/or the ACK path.  That in turn might
+   cause an acceptable ACK to arrive too late, i.e., only after a TCP
+   sender's retransmission timer has expired.  For the purpose of
+   specifying the algorithm in Section 3, we define this case as SPUR_TO
+   (equal 1).
+
+      Note: There is another case where a timeout would not have
+      occurred had the sender "waited longer": the retransmission timer
+      expires, and afterwards the TCP sender receives the duplicate ACK
+
+
+
+Ludwig & Meyer                Experimental                      [Page 3]
+
+RFC 3522         The Eifel Detection Algorithm for TCP        April 2003
+
+
+      that would have triggered a fast retransmit of the oldest
+      outstanding segment.  We call this a 'fast timeout', since in
+      competition with the fast retransmit algorithm the timeout was
+      faster.  However, a fast timeout is not spurious since apparently
+      a segment was in fact lost, i.e., loss recovery was initiated
+      rightfully.  In this document, we do not consider fast timeouts.
+
+   Packet reordering in the network may occur because IP [RFC791] does
+   not guarantee in-order delivery of packets.  Additionally, a TCP
+   receiver generates a duplicate ACK for each segment that arrives
+   out-of-order.  This results in a spurious fast retransmit if three or
+   more data segments arrive out-of-order at a TCP receiver, and at
+   least three of the resulting duplicate ACKs arrive at the TCP sender.
+   This assumes that the duplicate acknowledgement threshold is set to
+   three as defined in [RFC2581].
+
+   Packet duplication may occur because a receiving IP does not (cannot)
+   remove packets that have been duplicated in the network.  A TCP
+   receiver in turn also generates a duplicate ACK for each duplicate
+   segment.  As with packet reordering, this results in a spurious fast
+   retransmit if duplication of data segments or ACKs results in three
+   or more duplicate ACKs to arrive at a TCP sender.  Again, this
+   assumes that the duplicate acknowledgement threshold is set to three.
+
+   The negative impact on TCP performance caused by packet reordering
+   and packet duplication is commonly the same: a single spurious
+   retransmit (the fast retransmit), and the unnecessary halving of a
+   TCP sender's congestion window as a result of the subsequent fast
+   recovery phase [RFC2581].
+
+   The negative impact on TCP performance caused by a spurious timeout
+   is more severe.  First, the timeout event itself causes a single
+   spurious retransmit, and unnecessarily forces a TCP sender into slow
+   start [RFC2581].  Then, as the connection progresses, a chain
+   reaction gets triggered that further decreases TCP's performance.
+   Since the timeout was spurious, at least some ACKs for original
+   transmits typically arrive at the TCP sender before the ACK for the
+   retransmit arrives.  (This is unless severe packet reordering
+   coincided with the spurious timeout in such a way that the ACK for
+   the retransmit is the first acceptable ACK to arrive at the TCP
+   sender.)  Those ACKs for original transmits then trigger an implicit
+   go-back-N loss recovery at the TCP sender [LK00].  Assuming that none
+   of the outstanding segments and none of the corresponding ACKs were
+   lost, all outstanding segments get retransmitted unnecessarily.  In
+   fact, during this phase, a TCP sender violates the packet
+   conservation principle [Jac88].  This is because the unnecessary go-
+   back-N retransmits are sent during slow start.  Thus, for each packet
+   that leaves the network and that belongs to the first half of the
+
+
+
+Ludwig & Meyer                Experimental                      [Page 4]
+
+RFC 3522         The Eifel Detection Algorithm for TCP        April 2003
+
+
+   original flight, two useless retransmits are sent into the network.
+   In addition, some TCPs suffer from a spurious fast retransmit.  This
+   is because the unnecessary go-back-N retransmits arrive as duplicates
+   at the TCP receiver, which in turn triggers a series of duplicate
+   ACKs.  Note that this last spurious fast retransmit could be avoided
+   with the careful variant of 'bugfix' [RFC2582].
+
+   More detailed explanations, including TCP trace plots that visualize
+   the effects of spurious timeouts and packet reordering, can be found
+   in the original proposal [LK00].
+
+3. The Eifel Detection Algorithm
+
+3.1 The Idea
+
+   The goal of the Eifel detection algorithm is to allow a TCP sender to
+   detect a posteriori whether it has entered loss recovery
+   unnecessarily.  Furthermore, the TCP sender should be able to make
+   this decision upon the first acceptable ACK that arrives after the
+   timeout-based retransmit or the fast retransmit has been sent.  This
+   in turn requires extra information in ACKs by which the TCP sender
+   can unambiguously distinguish whether that first acceptable ACK was
+   sent in response to the original transmit or the retransmit.  Such
+   extra information is provided by the TCP Timestamps option [RFC1323].
+   Generally speaking, timestamps are monotonously increasing "serial
+   numbers" added into every segment that are then echoed within the
+   corresponding ACKs.  This is exploited by the Eifel detection
+   algorithm in the following way.
+
+   Given that timestamps are enabled for a connection, a TCP sender
+   always stores the timestamp of the retransmit sent in the beginning
+   of loss recovery, i.e., the timestamp of the timeout-based retransmit
+   or the fast retransmit.  If the timestamp of the first acceptable
+   ACK, that arrives after the retransmit was sent, is smaller then the
+   stored timestamp of that retransmit, then that ACK must have been
+   sent in response to an original transmit.  Hence, the TCP sender must
+   have entered loss recovery unnecessarily.
+
+   The fact that the Eifel detection algorithm decides upon the first
+   acceptable ACK is crucial to allow future response algorithms to
+   avoid the unnecessary go-back-N retransmits that typically occur
+   after a spurious timeout.  Also, if loss recovery was entered
+   unnecessarily, a window worth of ACKs are outstanding that all carry
+   a timestamp that is smaller than the stored timestamp of the
+   retransmit.  The arrival of any one of those ACKs is sufficient for
+   the Eifel detection algorithm to work.  Hence, the solution is fairly
+
+
+
+
+
+Ludwig & Meyer                Experimental                      [Page 5]
+
+RFC 3522         The Eifel Detection Algorithm for TCP        April 2003
+
+
+   robust against ACK losses.  Even the ACK sent in response to the
+   retransmit, i.e., the one that carries the stored timestamp, may get
+   lost without compromising the algorithm.
+
+3.2 The Algorithm
+
+   Given that the TCP Timestamps option [RFC1323] is enabled for a
+   connection, a TCP sender MAY use the Eifel detection algorithm as
+   defined in this subsection.
+
+   If the Eifel detection algorithm is used, the following steps MUST be
+   taken by a TCP sender, but only upon initiation of loss recovery,
+   i.e., when either the timeout-based retransmit or the fast retransmit
+   is sent.  The Eifel detection algorithm MUST NOT be reinitiated after
+   loss recovery has already started.  In particular, it must not be
+   reinitiated upon subsequent timeouts for the same segment, and not
+   upon retransmitting segments other than the oldest outstanding
+   segment, e.g., during selective loss recovery.
+
+      (1)     Set a "SpuriousRecovery" variable to FALSE (equal 0).
+
+      (2)     Set a "RetransmitTS" variable to the value of the
+              Timestamp Value field of the Timestamps option included in
+              the retransmit sent when loss recovery is initiated.  A
+              TCP sender must ensure that RetransmitTS does not get
+              overwritten as loss recovery progresses, e.g., in case of
+              a second timeout and subsequent second retransmit of the
+              same octet.
+
+      (3)     Wait for the arrival of an acceptable ACK.  When an
+              acceptable ACK has arrived, proceed to step (4).
+
+      (4)     If the value of the Timestamp Echo Reply field of the
+              acceptable ACK's Timestamps option is smaller than the
+              value of RetransmitTS, then proceed to step (5),
+
+              else proceed to step (DONE).
+
+      (5)     If the acceptable ACK carries a DSACK option [RFC2883],
+              then proceed to step (DONE),
+
+              else if during the lifetime of the TCP connection the TCP
+              sender has previously received an ACK with a DSACK option,
+              or the acceptable ACK does not acknowledge all outstanding
+              data, then proceed to step (6),
+
+              else proceed to step (DONE).
+
+
+
+
+Ludwig & Meyer                Experimental                      [Page 6]
+
+RFC 3522         The Eifel Detection Algorithm for TCP        April 2003
+
+
+      (6)     If the loss recovery has been initiated with a timeout-
+              based retransmit, then set
+                  SpuriousRecovery <- SPUR_TO (equal 1),
+
+              else set
+                  SpuriousRecovery <- dupacks+1
+
+      (RESP)  Do nothing (Placeholder for a response algorithm).
+
+      (DONE)  No further processing.
+
+   The comparison "smaller than" in step (4) is conservative.  In
+   theory, if the timestamp clock is slow or the network is fast,
+   RetransmitTS could at most be equal to the timestamp echoed by an ACK
+   sent in response to an original transmit.  In that case, it is
+   assumed that the loss recovery was not falsely triggered.
+
+   Note that the condition "if during the lifetime of the TCP connection
+   the TCP sender has previously received an ACK with a DSACK option" in
+   step (5) would be true in case the TCP receiver would signal in the
+   SYN that it is DSACK-enabled.  But unfortunately, this is not
+   required by [RFC2883].
+
+3.3 A Corner Case: "Timeout due to loss of all ACKs" (step 5)
+
+   Even though the oldest outstanding segment arrived at a TCP receiver,
+   the TCP sender is forced into a timeout if all ACKs are lost.
+   Although the resulting retransmit is unnecessary, such a timeout is
+   unavoidable.  It should therefore not be considered spurious.
+   Moreover, the subsequent reduction of the congestion window is an
+   appropriate response to the potentially heavy congestion in the ACK
+   path.  The original proposal [LK00] does not handle this case well.
+   It effectively disables this implicit form of congestion control for
+   the ACK path, which otherwise does not exist in TCP.  This problem is
+   fixed by step (5) of the Eifel detection algorithm as explained in
+   the remainder of this section.
+
+   If all ACKs are lost while the oldest outstanding segment arrived at
+   the TCP receiver, the retransmit arrives as a duplicate.  In response
+   to duplicates, RFC 1323 mandates that the timestamp of the last
+   segment that arrived in-sequence should be echoed.  That timestamp is
+   carried by the first acceptable ACK that arrives at the TCP sender
+   after loss recovery was entered, and is commonly smaller than the
+   timestamp carried by the retransmit.  Consequently, the Eifel
+   detection algorithm misinterprets such a timeout as being spurious,
+   unless the TCP receiver is DSACK-enabled [RFC2883].  In that case,
+   the acceptable ACK carries a DSACK option, and the Eifel algorithm is
+   terminated through the first part of step (5).
+
+
+
+Ludwig & Meyer                Experimental                      [Page 7]
+
+RFC 3522         The Eifel Detection Algorithm for TCP        April 2003
+
+
+      Note: Not all TCP implementations strictly follow RFC 1323.  In
+      response to a duplicate data segment, some TCP receivers echo the
+      timestamp of the duplicate.  With such TCP receivers, the corner
+      case discussed in this section does not apply.  The timestamp
+      carried by the retransmit would be echoed in the first acceptable
+      ACK, and the Eifel detection algorithm would be terminated through
+      step (4).  Thus, even though all ACKs were lost and independent of
+      whether the DSACK option was enabled for a connection, the Eifel
+      detection algorithm would have no effect.
+
+   With TCP receivers that are not DSACK-enabled, disabling the
+   mentioned implicit congestion control for the ACK path is not a
+   problem as long as data segments are lost, in addition to the entire
+   flight of ACKs.  The Eifel detection algorithm misinterprets such a
+   timeout as being spurious, and the Eifel response algorithm would
+   reverse the congestion control state.  Still, the TCP sender would
+   respond to congestion (in the data path) as soon as it finds out
+   about the first loss in the outstanding flight.  I.e., the TCP sender
+   would still halve its congestion window for that flight of packets.
+   If no data segment is lost while the entire flight of ACKs is lost,
+   the first acceptable ACK that arrives at the TCP sender after loss
+   recovery was entered acknowledges all outstanding data.  In that
+   case, the Eifel algorithm is terminated through the second part of
+   step (5).
+
+   Note that there is little concern about violating the packet
+   conservation principle when entering slow start after an unavoidable
+   timeout caused by the loss of an entire flight of ACKs, i.e., when
+   the Eifel detection algorithm was terminated through step (5).  This
+   is because in that case, the acceptable ACK corresponds to the
+   retransmit, which is a strong indication that the pipe has drained
+   entirely, i.e., that no more original transmits are in the network.
+   This is different with spurious timeouts as discussed in Section 2.
+
+3.4 Protecting Against Misbehaving TCP Receivers (the Safe Variant)
+
+   A TCP receiver can easily make a genuine retransmit appear to the TCP
+   sender as a spurious retransmit by forging echoed timestamps.  This
+   may pose a security concern.
+
+   Fortunately, there is a way to modify the Eifel detection algorithm
+   in a way that makes it robust against lying TCP receivers.  The idea
+   is to use timestamps as a segment's "secret" that a TCP receiver only
+   gets to know if it receives the segment.  Conversely, a TCP receiver
+   will not know the timestamp of a segment that was lost.  Hence, to
+   "prove" that it received the original transmit of a segment that a
+   TCP sender retransmitted, the TCP receiver would need to return the
+   timestamp of that original transmit.  The Eifel detection algorithm
+
+
+
+Ludwig & Meyer                Experimental                      [Page 8]
+
+RFC 3522         The Eifel Detection Algorithm for TCP        April 2003
+
+
+   could then be modified to only decide that loss recovery has been
+   unnecessarily entered if the first acceptable ACK echoes the
+   timestamp of the original transmit.
+
+   Hence, implementers may choose to implement the algorithm with the
+   following modifications.
+
+   Step (2) is replaced with step (2'):
+
+      (2')    Set a "RetransmitTS" variable to the value of the
+              Timestamp Value field of the Timestamps option that was
+              included in the original transmit corresponding to the
+              retransmit.  Note: This step requires that the TCP sender
+              stores the timestamps of all outstanding original
+              transmits.
+
+   Step (4) is replaced with step (4'):
+
+      (4')    If the value of the Timestamp Echo Reply field of the
+              acceptable ACK's Timestamps option is equal to the value
+              of the variable RetransmitTS, then proceed to step (5),
+
+              else proceed to step (DONE).
+
+   These modifications come at a cost: the modified algorithm is fairly
+   sensitive against ACK losses since it relies on the arrival of the
+   acceptable ACK that corresponds to the original transmit.
+
+      Note: The first acceptable ACK that arrives after loss recovery
+      has been unnecessarily entered should echo the timestamp of the
+      original transmit.  This assumes that the ACK corresponding to the
+      original transmit was not lost, that that ACK was not reordered in
+      the network, and that the TCP receiver does not forge timestamps
+      but complies with RFC 1323.  In case of a spurious fast
+      retransmit, this is implied by the rules for generating ACKs for
+      data segments that fill in all or part of a gap in the sequence
+      space (see section 4.2 of [RFC2581]) and by the rules for echoing
+      timestamps in that case (see rule (C) in section 3.4 of
+      [RFC1323]).  In case of a spurious timeout, it is likely that the
+      delay that has caused the spurious timeout has also caused the TCP
+      receiver's delayed ACK timer [RFC1122] to expire before the
+      original transmit arrives.  Also, in this case the rules for
+      generating ACKs and the rules for echoing timestamps (see rule (A)
+      in section 3.4 of [RFC1323]) ensure that the original transmit's
+      timestamp is echoed.
+
+
+
+
+
+
+Ludwig & Meyer                Experimental                      [Page 9]
+
+RFC 3522         The Eifel Detection Algorithm for TCP        April 2003
+
+
+   A remaining problem is that a TCP receiver might guess a lost
+   segment's timestamp from observing the timestamps of recently
+   received segments.  For example, if segment N was lost while segment
+   N-1 and N+1 have arrived, a TCP receiver could guess the timestamp
+   that lies in the middle of the timestamps of segments N-1 and N+1,
+   and echo it in the ACK sent in response to the retransmit of segment
+   N.  Especially if the TCP sender implements timestamps with a coarse
+   granularity, a misbehaving TCP receiver is likely to be successful
+   with such an approach.  In fact, with the 500 ms granularity
+   suggested in [WS95], it even becomes quite likely that the timestamps
+   of segments N-1, N, N+1 are identical.
+
+   One way to reduce this risk is to implement fine grained timestamps.
+   Note that the granularity of the timestamps is independent of the
+   granularity of the retransmission timer.  For example, some TCP
+   implementations run a timestamp clock that ticks every millisecond.
+   This should make it more difficult for a TCP receiver to guess the
+   timestamp of a lost segment.  Alternatively, it might be possible to
+   combine the timestamps with a nonce, as is done for the Explicit
+   Congestion Notification (ECN) [RFC3168].  One would need to take
+   care, though, that the timestamps of consecutive segments remain
+   monotonously increasing and do not interfere with the RTT timing
+   defined in [RFC1323].
+
+4. IPR Considerations
+
+   The IETF has been notified of intellectual property rights claimed in
+   regard to some or all of the specification contained in this
+   document.  For more information consult the online list of claimed
+   rights at http://www.ietf.org/ipr.
+
+   The IETF takes no position regarding the validity or scope of any
+   intellectual property or other rights that might be claimed to
+   pertain to the implementation or use of the technology described in
+   this document or the extent to which any license under such rights
+   might or might not be available; neither does it represent that it
+   has made any effort to identify any such rights.  Information on the
+   IETF's procedures with respect to rights in standards-track and
+   standards-related documentation can be found in BCP-11.  Copies of
+   claims of rights made available for publication and any assurances of
+   licenses to be made available, or the result of an attempt made to
+   obtain a general license or permission for the use of such
+   proprietary rights by implementors or users of this specification can
+   be obtained from the IETF Secretariat.
+
+
+
+
+
+
+
+Ludwig & Meyer                Experimental                     [Page 10]
+
+RFC 3522         The Eifel Detection Algorithm for TCP        April 2003
+
+
+5. Security Considerations
+
+   There do not seem to be any security considerations associated with
+   the Eifel detection algorithm.  This is because the Eifel detection
+   algorithm does not alter the existing protocol state at a TCP sender.
+   Note that the Eifel detection algorithm only requires changes to the
+   implementation of a TCP sender.
+
+   Moreover, a variant of the Eifel detection algorithm has been
+   proposed in Section 3.4 that makes it robust against lying TCP
+   receivers.  This may become relevant when the Eifel detection
+   algorithm is combined with a response algorithm such as the Eifel
+   response algorithm [LG03].
+
+Acknowledgments
+
+   Many thanks to Keith Sklower, Randy Katz, Stephan Baucke, Sally
+   Floyd, Vern Paxson, Mark Allman, Ethan Blanton, Andrei Gurtov, Pasi
+   Sarolahti, and Alexey Kuznetsov for useful discussions that
+   contributed to this work.
+
+Normative References
+
+   [RFC2581] Allman, M., Paxson, V. and W. Stevens, "TCP Congestion
+             Control", RFC 2581, April 1999.
+
+   [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
+             Requirement Levels", BCP 14, RFC 2119, March 1997.
+
+   [RFC2883] Floyd, S., Mahdavi, J., Mathis, M., Podolsky, M. and A.
+             Romanow, "An Extension to the Selective Acknowledgement
+             (SACK) Option for TCP", RFC 2883, July 2000.
+
+   [RFC1323] Jacobson, V., Braden, R. and D. Borman, "TCP Extensions for
+             High Performance", RFC 1323, May 1992.
+
+   [RFC2018] Mathis, M., Mahdavi, J., Floyd, S. and A. Romanow, "TCP
+             Selective Acknowledgement Options", RFC 2018, October 1996.
+
+   [RFC793]  Postel, J., "Transmission Control Protocol", STD 7, RFC
+             793, September 1981.
+
+
+
+
+
+
+
+
+
+
+Ludwig & Meyer                Experimental                     [Page 11]
+
+RFC 3522         The Eifel Detection Algorithm for TCP        April 2003
+
+
+Informative References
+
+   [BA02]    Blanton, E. and M. Allman, "Using TCP DSACKs and SCTP
+             Duplicate TSNs to Detect Spurious Retransmissions", Work in
+             Progress.
+
+   [RFC1122] Braden, R., "Requirements for Internet Hosts -
+             Communication Layers", STD 3, RFC 1122, October 1989.
+
+   [RFC2582] Floyd, S. and T. Henderson, "The NewReno Modification to
+             TCP's Fast Recovery Algorithm", RFC 2582, April 1999.
+
+   [Gu01]    Gurtov, A., "Effect of Delays on TCP Performance", In
+             Proceedings of IFIP Personal Wireless Communications,
+             August 2001.
+
+   [RFC3481] Inamura, H., Montenegro, G., Ludwig, R., Gurtov, A. and F.
+             Khafizov, "TCP over Second (2.5G) and Third (3G) Generation
+             Wireless Networks", RFC 3481, February 2003.
+
+   [Jac88]   Jacobson, V., "Congestion Avoidance and Control", In
+             Proceedings of ACM SIGCOMM 88.
+
+   [KP87]    Karn, P. and C. Partridge, "Improving Round-Trip Time
+             Estimates in Reliable Transport Protocols", In Proceedings
+             of ACM SIGCOMM 87.
+
+   [LK00]    Ludwig, R. and R. H. Katz, "The Eifel Algorithm: Making TCP
+             Robust Against Spurious Retransmissions", ACM Computer
+             Communication Review, Vol. 30, No. 1, January 2000.
+
+   [LG03]    Ludwig, R. and A. Gurtov, "The Eifel Response Algorithm for
+             TCP", Work in Progress.
+
+   [RFC2988] Paxson, V. and M. Allman, "Computing TCP's Retransmission
+             Timer", RFC 2988, November 2000.
+
+   [RFC791]  Postel, J., "Internet Protocol", STD 5, RFC 791, September
+             1981.
+
+   [RFC3168] Ramakrishnan, K., Floyd, S. and D. Black, "The Addition of
+             Explicit Congestion Notification (ECN) to IP", RFC 3168,
+             September 2001.
+
+   [SK03]    Sarolahti, P. and M. Kojo, "F-RTO: A TCP RTO Recovery
+             Algorithm for Avoiding Unnecessary Retransmissions", Work
+             in Progress.
+
+
+
+
+Ludwig & Meyer                Experimental                     [Page 12]
+
+RFC 3522         The Eifel Detection Algorithm for TCP        April 2003
+
+
+   [WS95]    Wright, G. R. and W. R. Stevens, "TCP/IP Illustrated,
+             Volume 2 (The Implementation)", Addison Wesley, January
+             1995.
+
+   [Zh86]    Zhang, L., "Why TCP Timers Don't Work Well", In Proceedings
+             of ACM SIGCOMM 86.
+
+Authors' Addresses
+
+   Reiner Ludwig
+   Ericsson Research
+   Ericsson Allee 1
+   52134 Herzogenrath, Germany
+
+   EMail: Reiner.Ludwig@eed.ericsson.se
+
+
+   Michael Meyer
+   Ericsson Research
+   Ericsson Allee 1
+   52134 Herzogenrath, Germany
+
+   EMail: Michael.Meyer@eed.ericsson.se
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Ludwig & Meyer                Experimental                     [Page 13]
+
+RFC 3522         The Eifel Detection Algorithm for TCP        April 2003
+
+
+Full Copyright Statement
+
+   Copyright (C) The Internet Society (2003).  All Rights Reserved.
+
+   This document and translations of it may be copied and furnished to
+   others, and derivative works that comment on or otherwise explain it
+   or assist in its implementation may be prepared, copied, published
+   and distributed, in whole or in part, without restriction of any
+   kind, provided that the above copyright notice and this paragraph are
+   included on all such copies and derivative works.  However, this
+   document itself may not be modified in any way, such as by removing
+   the copyright notice or references to the Internet Society or other
+   Internet organizations, except as needed for the purpose of
+   developing Internet standards in which case the procedures for
+   copyrights defined in the Internet Standards process must be
+   followed, or as required to translate it into languages other than
+   English.
+
+   The limited permissions granted above are perpetual and will not be
+   revoked by the Internet Society or its successors or assigns.
+
+   This document and the information contained herein is provided on an
+   "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
+   TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
+   BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
+   HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
+   MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
+
+Acknowledgement
+
+   Funding for the RFC Editor function is currently provided by the
+   Internet Society.
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Ludwig & Meyer                Experimental                     [Page 14]
+
--- a/kernel/picotcp/RFC/rfc3540.txt
+++ b/kernel/picotcp/RFC/rfc3540.txt
@ -0,0 +1,731 @@
+
+
+
+
+
+
+Network Working Group                                          N. Spring
+Request for Comments: 3540                                  D. Wetherall
+Category: Experimental                                            D. Ely
+                                                University of Washington
+                                                               June 2003
+
+
+             Robust Explicit Congestion Notification (ECN)
+                         Signaling with Nonces
+
+Status of this Memo
+
+   This memo defines an Experimental Protocol for the Internet
+   community.  It does not specify an Internet standard of any kind.
+   Discussion and suggestions for improvement are requested.
+   Distribution of this memo is unlimited.
+
+Copyright Notice
+
+   Copyright (C) The Internet Society (2003).  All Rights Reserved.
+
+Abstract
+
+   This note describes the Explicit Congestion Notification (ECN)-nonce,
+   an optional addition to ECN that protects against accidental or
+   malicious concealment of marked packets from the TCP sender.  It
+   improves the robustness of congestion control by preventing receivers
+   from exploiting ECN to gain an unfair share of network bandwidth.
+   The ECN-nonce uses the two ECN-Capable Transport (ECT)codepoints in
+   the ECN field of the IP header, and requires a flag in the TCP
+   header.  It is computationally efficient for both routers and hosts.
+
+1.  Introduction
+
+   Statement of Intent
+
+      This specification describes an optional addition to Explicit
+      Congestion Notification [RFC3168] improving its robustness against
+      malicious or accidental concealment of marked packets.  It has not
+      been deployed widely.  One goal of publication as an Experimental
+      RFC is to be prudent, and encourage use and deployment prior to
+      publication in the standards track.  Another consideration is to
+      give time for firewall developers to recognize and accept the
+      pattern presented by the nonce.  It is the intent of the Transport
+      Area Working Group to re-submit this specification as an IETF
+      Proposed Standard in the future after more experience has been
+      gained.
+
+
+
+
+Spring, et. al.               Experimental                      [Page 1]
+
+RFC 3540                  Robust ECN Signaling                 June 2003
+
+
+   The correct operation of ECN requires the cooperation of the receiver
+   to return Congestion Experienced signals to the sender, but the
+   protocol lacks a mechanism to enforce this cooperation.  This raises
+   the possibility that an unscrupulous or poorly implemented receiver
+   could always clear ECN-Echo and simply not return congestion signals
+   to the sender.  This would give the receiver a performance advantage
+   at the expense of competing connections that behave properly.  More
+   generally, any device along the path (NAT box, firewall, QOS
+   bandwidth shapers, and so forth) could remove congestion marks with
+   impunity.
+
+   The above behaviors may or may not constitute a threat to the
+   operation of congestion control in the Internet.  However, given the
+   central role of congestion control, it is prudent to design the ECN
+   signaling loop to be robust against as many threats as possible.  In
+   this way, ECN can provide a clear incentive for improvement over the
+   prior state-of-the-art without potential incentives for abuse.  The
+   ECN-nonce is a simple, efficient mechanism to eliminate the potential
+   abuse of ECN.
+
+   The ECN-nonce enables the sender to verify the correct behavior of
+   the ECN receiver and that there is no other interference that
+   conceals marked (or dropped) packets in the signaling path.  The ECN-
+   nonce protects against both implementation errors and deliberate
+   abuse.  The ECN-nonce:
+
+   -  catches a misbehaving receiver with a high probability, and never
+      implicates an innocent receiver.
+
+   -  does not change other aspects of ECN, nor does it reduce the
+      benefits of ECN for behaving receivers.
+
+   -  is cheap in both per-packet overhead (one TCP header flag) and
+      processing requirements.
+
+   -  is simple and, to the best of our knowledge, not prone to other
+      attacks.
+
+   We also note that use of the ECN-nonce has two additional benefits,
+   even when only drop-tail routers are used.  First, packet drops
+   cannot be concealed from the sender.  Second, it prevents optimistic
+   acknowledgements [Savage], in which TCP segments are acknowledged
+   before they have been received.  These benefits also serve to
+   increase the robustness of congestion control from attacks.  We do
+   not elaborate on these benefits in this document.
+
+   The rest of this document describes the ECN-nonce.  We present an
+   overview followed by detailed behavior at senders and receivers.
+
+
+
+Spring, et. al.               Experimental                      [Page 2]
+
+RFC 3540                  Robust ECN Signaling                 June 2003
+
+
+   The keywords MUST, MUST NOT, REQUIRED, SHALL, SHALL NOT, SHOULD,
+   SHOULD NOT, RECOMMENDED, MAY, and OPTIONAL, when they appear in this
+   document, are to be interpreted as described in [RFC2119].
+
+2.  Overview
+
+   The ECN-nonce builds on the existing ECN-Echo and Congestion Window
+   Reduced (CWR) signaling mechanism.  Familiarity with ECN [ECN] is
+   assumed.  For simplicity, we describe the ECN-nonce in one direction
+   only, though it is run in both directions in parallel.
+
+   The ECN protocol for TCP remains unchanged, except for the definition
+   of a new field in the TCP header.  As in [RFC3168], ECT(0) or ECT(1)
+   (ECN-Capable Transport) is set in the ECN field of the IP header on
+   outgoing packets.  Congested routers change this field to CE
+   (Congestion Experienced).  When TCP receivers notice CE, the ECE
+   (ECN-Echo) flag is set in subsequent acknowledgements until receiving
+   a CWR flag.  The CWR flag is sent on new data whenever the sender
+   reacts to congestion.
+
+   The ECN-nonce adds to this protocol, and enables the receiver to
+   demonstrate to the sender that segments being acknowledged were
+   received unmarked.  A random one-bit value (a nonce) is encoded in
+   the two ECT codepoints.  The one-bit sum of these nonces is returned
+   in a TCP header flag, the nonce sum (NS) bit.  Packet marking erases
+   the nonce value in the ECT codepoints because CE overwrites both ECN
+   IP header bits.  Since each nonce is required to calculate the sum,
+   the correct nonce sum implies receipt of only unmarked packets.  Not
+   only are receivers prevented from concealing marked packets, middle-
+   boxes along the network path cannot unmark a packet without
+   successfully guessing the value of the original nonce.
+
+   The sender can verify the nonce sum returned by the receiver to
+   ensure that congestion indications in the form of marked (or dropped)
+   packets are not being concealed.  Because the nonce sum is only one
+   bit long, senders have a 50-50 chance of catching a lying receiver
+   whenever an acknowledgement conceals a mark.  Because each
+   acknowledgement is an independent trial, cheaters will be caught
+   quickly if there are repeated congestion signals.
+
+   The following paragraphs describe aspects of the ECN-nonce protocol
+   in greater detail.
+
+
+
+
+
+
+
+
+
+Spring, et. al.               Experimental                      [Page 3]
+
+RFC 3540                  Robust ECN Signaling                 June 2003
+
+
+   Each acknowledgement carries a nonce sum, which is the one bit sum
+   (exclusive-or, or parity) of nonces over the byte range represented
+   by the acknowledgement.  The sum is used because not every packet is
+   acknowledged individually, nor are packets acknowledged reliably.  If
+   a sum were not used, the nonce in an unmarked packet could be echoed
+   to prove to the sender that the individual packet arrived unmarked.
+   However, since these acks are not reliably delivered, the sender
+   could not distinguish a lost ACK from one that was never sent in
+   order to conceal a marked packet.  The nonce sum prevents the
+   receiver from concealing individual marked packets by not
+   acknowledging them.  Because the nonce and nonce sum are both one bit
+   quantities, the sum is no easier to guess than the individual nonces.
+   We show the nonce sum calculation below in Figure 1.
+
+    Sender             Receiver
+                         initial sum = 1
+      -- 1:4 ECT(0)  --> NS = 1 + 0(1:4) = 1(:4)
+      <- ACK 4, NS=1 ---
+      -- 4:8 ECT(1)  --> NS = 1(:4) + 1(4:8) = 0(:8)
+      <- ACK 8, NS=0 ---
+      -- 8:12 ECT(1)  -> NS = 0(:8) + 1(8:12) = 1(:12)
+      <- ACK 12, NS=1 --
+      -- 12:16 ECT(1) -> NS = 1(:12) + 1(12:16) = 0(:16)
+      <- ACK 16, NS=0 --
+
+   Figure 1: The calculation of nonce sums at the receiver.
+
+   After congestion has occurred and packets have been marked or lost,
+   resynchronization of the sender and receiver nonce sums is needed.
+   When packets are marked, the nonce is cleared, and the sum of the
+   nonces at the receiver will no longer match the sum at the sender.
+   Once nonces have been lost, the difference between sender and
+   receiver nonce sums is constant until there is further loss.  This
+   means that it is possible to resynchronize the sender and receiver
+   after congestion by having the sender set its nonce sum to that of
+   the receiver.  Because congestion indications do not need to be
+   conveyed more frequently than once per round trip, the sender
+   suspends checking while the CWR signal is being delivered and resets
+   its nonce sum to the receiver's when new data is acknowledged.  This
+   has the benefit that the receiver is not explicitly involved in the
+   re-synchronization process.  The resynchronization process is shown
+   in Figure 2 below.  Note that the nonce sum returned in ACK 12 (NS=0)
+   differs from that in the previous example (NS=1), and it continues to
+   differ for ACK 16.
+
+
+
+
+
+
+
+Spring, et. al.               Experimental                      [Page 4]
+
+RFC 3540                  Robust ECN Signaling                 June 2003
+
+
+    Sender              Receiver
+                            initial sum = 1
+      -- 1:4 ECT(0)       -> NS = 1 + 0(1:4) = 1(:4)
+      <- ACK 4, NS=1      --
+      -- 4:8 ECT(1) -> CE -> NS = 1(:4) + ?(4:8) = 1(:8)
+      <- ACK 8, ECE NS=1  --
+      -- 8:12 ECT(1), CWR -> NS = 1(:8) + 1(8:12) = 0(:12)
+      <- ACK 12, NS=0     --
+      -- 12:16 ECT(1)     -> NS = 0(:12) + 1(12:16) = 1(:16)
+      <- ACK 16, NS=1     --
+
+   Figure 2: The calculation of nonce sums at the receiver when a
+      packet (4:8) is marked.  The receiver may calculate the wrong
+      nonce sum when the original nonce information is lost after a
+      packet is marked.
+
+   Third, we need to reconcile that nonces are sent with packets but
+   acknowledgements cover byte ranges.  Acknowledged byte boundaries
+   need not match the transmitted boundaries, and information can be
+   retransmitted in packets with different byte boundaries.  We discuss
+   the first issue, how a receiver sets a nonce when acknowledging part
+   of a segment, in Section 6.1. The second question, what nonce to send
+   when retransmitting smaller segments as a large segment, has a simple
+   answer: ECN is disabled for retransmissions, so can carry no nonce.
+   Because retransmissions are associated with congestion events, nonce
+   checking is suspended until after CWR is acknowledged and the
+   congestion event is over.
+
+   The next sections describe the detailed behavior of senders, routers
+   and receivers, starting with sender transmit behavior, then around
+   the ECN signaling loop, and finish with sender acknowledgement
+   processing.
+
+3.  Sender Behavior (Transmit)
+
+   Senders manage CWR and ECN-Echo as before.  In addition, they must
+   place nonces on packets as they are transmitted and check the
+   validity of the nonce sums in acknowledgments as they are received.
+   This section describes the transmit process.
+
+   To place a one bit nonce value on every ECN-capable IP packet, the
+   sender uses the two ECT codepoints: ECT(0) represents a nonce of 0,
+   and ECT(1) a nonce of 1.  As in ECN, retransmissions are not ECN-
+   capable, so carry no nonce.
+
+   The sender maintains a mapping from each packet's end sequence number
+   to the expected nonce sum (not the nonce placed in the original
+   transmission) in the acknowledgement bearing that sequence number.
+
+
+
+Spring, et. al.               Experimental                      [Page 5]
+
+RFC 3540                  Robust ECN Signaling                 June 2003
+
+
+4.  Router Behavior
+
+   Routers behave as specified in [RFC3168].  By marking packets to
+   signal congestion, the original value of the nonce, in ECT(0) or
+   ECT(1), is removed.  Neither the receiver nor any other party can
+   unmark the packet without successfully guessing the value of the
+   original nonce.
+
+5.  Receiver Behavior (Receive and Transmit)
+
+   ECN-nonce receivers maintain the nonce sum as in-order packets arrive
+   and return the current nonce sum in each acknowledgement.  Receiver
+   behavior is otherwise unchanged from [RFC3168].  Returning the nonce
+   sum is optional, but recommended, as senders are allowed to
+   discontinue sending ECN-capable packets to receivers that do not
+   support the ECN-nonce.
+
+   As packets are removed from the queue of out-of-order packets to be
+   acknowledged, the nonce is recovered from the IP header.  The nonce
+   is added to the current nonce sum as the acknowledgement sequence
+   number is advanced for the recent packet.
+
+   In the case of marked packets, one or more nonce values may be
+   unknown to the receiver.  In this case the missing nonce values are
+   ignored when calculating the sum (or equivalently a value of zero is
+   assumed) and ECN-Echo will be set to signal congestion to the sender.
+
+   Returning the nonce sum corresponding to a given acknowledgement is
+   straightforward.  It is carried in a single "NS" (Nonce Sum) bit in
+   the TCP header.  This bit is adjacent to the CWR and ECN-Echo bits,
+   set as Bit 7 in byte 13 of the TCP header, as shown below:
+
+     0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
+   +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
+   |               |           | N | C | E | U | A | P | R | S | F |
+   | Header Length | Reserved  | S | W | C | R | C | S | S | Y | I |
+   |               |           |   | R | E | G | K | H | T | N | N |
+   +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
+
+   Figure 3: The new definition of bytes 13 and 14 of the TCP Header.
+
+   The initial nonce sum is 1, and is included in the SYN/ACK and ACK of
+   the three way TCP handshake.  This allows the other endpoint to infer
+   nonce support, but is not a negotiation, in that the receiver of the
+   SYN/ACK need not check if NS is set to decide whether to set NS in
+   the subsequent ACK.
+
+
+
+
+
+Spring, et. al.               Experimental                      [Page 6]
+
+RFC 3540                  Robust ECN Signaling                 June 2003
+
+
+6.  Sender Behavior (Receive)
+
+   This section completes the description of sender behavior by
+   describing how senders check the validity of the nonce sums.
+
+   The nonce sum is checked when an acknowledgement of new data is
+   received, except during congestion recovery when additional ECN-Echo
+   signals would be ignored.  Checking consists of comparing the correct
+   nonce sum stored in a buffer to that carried in the acknowledgement,
+   with a correction described in the following subsection.
+
+   If ECN-Echo is not set, the receiver claims to have received no
+   marked packets, and can therefore compute and return the correct
+   nonce sum.  To conceal a mark, the receiver must successfully guess
+   the sum of the nonces that it did not receive, because at least one
+   packet was marked and the corresponding nonce was erased.  Provided
+   the individual nonces are equally likely to be 0 or 1, their sum is
+   equally likely to be 0 or 1.  In other words, any guess is equally
+   likely to be wrong and has a 50-50 chance of being caught by the
+   sender.  Because each new acknowledgement is an independent trial, a
+   cheating receiver is likely to be caught after a small number of
+   lies.
+
+   If ECN-Echo is set, the receiver is sending a congestion signal and
+   it is not necessary to check the nonce sum.  The congestion window
+   will be halved, CWR will be set on the next packet with new data
+   sent, and ECN-Echo will be cleared once the CWR signal is received,
+   as in [RFC3168].  During this recovery process, the sum may be
+   incorrect because one or more nonces were not received.  This does
+   not matter during recovery, because TCP invokes congestion mechanisms
+   at most once per RTT, whether there are one or more losses during
+   that period.
+
+6.1.  Resynchronization After Loss or Mark
+
+   After recovery, it is necessary to re-synchronize the sender and
+   receiver nonce sums so that further acknowledgments can be checked.
+   When the receiver's sum is incorrect, it will remain incorrect until
+   further loss.
+
+   This leads to a simple re-synchronization mechanism where the sender
+   resets its nonce sum to that of the receiver when it receives an
+   acknowledgment for new data sent after the congestion window was
+   reduced.  When responding to explicit congestion signals, this will
+   be the first acknowledgement without the ECN-Echo flag set: the
+   acknowledgement of the packet containing the CWR flag.
+
+
+
+
+
+Spring, et. al.               Experimental                      [Page 7]
+
+RFC 3540                  Robust ECN Signaling                 June 2003
+
+
+    Sender              Receiver
+                             initial sum = 1
+      -- 1:4 ECT(0)       -> NS = 1 + 0(1:4) = 1(:4)
+      <- ACK 4, NS=1      --
+      -- 4:8 ECT(1) -> LOST
+      -- 8:12 ECT(1)      -> nonce sum calculation deferred
+                               until in-order data received
+      <- ACK 4, NS=0      --
+      -- 12:16 ECT(1)     -> nonce sum calculation deferred
+      <- ACK 4, NS=0      --
+      -- 4:8 retransmit   -> NS = 1(:4) + ?(4:8) +
+                                  1(8:12) + 1(12:16) = 1(:16)
+      <- ACK 16, NS=1     --
+      -- 16:20 ECT(1) CWR ->
+      <- ACK 20, NS=0     -- NS = 1(:16) + 1(16:20) = 0(:20)
+
+   Figure 4: The calculation of nonce sums at the receiver when a
+      packet is lost, and resynchronization after loss.  The nonce sum
+      is not changed until the cumulative acknowledgement is advanced.
+
+   In practice, resynchronization can be accomplished by storing a bit
+   that has the value one if the expected nonce sum stored by the sender
+   and the received nonce sum in the acknowledgement of CWR differ, and
+   zero otherwise.  This synchronization offset bit can then be used in
+   the comparison between expected nonce sum and received nonce sum.
+
+   The sender should ignore the nonce sum returned on any
+   acknowledgements bearing the ECN-echo flag.
+
+   When an acknowledgment covers only a portion of a segment, such as
+   when a middlebox resegments at the TCP layer instead of fragmenting
+   IP packets, the sender should accept the nonce sum expected at the
+   next segment boundary.  In other words, an acknowledgement covering
+   part of an original segment will include the nonce sum expected when
+   the entire segment is acknowledged.
+
+   Finally, in ECN, senders can choose not to indicate ECN capability on
+   some packets for any reason.  An ECN-nonce sender must resynchronize
+   after sending such ECN-incapable packets, as though a CWR had been
+   sent with the first new data after the ECN-incapable packets.  The
+   sender loses protection for any unacknowledged packets until
+   resynchronization occurs.
+
+
+
+
+
+
+
+
+
+Spring, et. al.               Experimental                      [Page 8]
+
+RFC 3540                  Robust ECN Signaling                 June 2003
+
+
+6.2.  Sender Behavior - Incorrect Nonce Received
+
+   The sender's response to an incorrect nonce is a matter of policy.
+   It is separate from the checking mechanism and does not need to be
+   handled uniformly by senders.  Further, checking received nonce sums
+   at all is optional, and may be disabled.
+
+   If the receiver has never sent a non-zero nonce sum, the sender can
+   infer that the receiver does not understand the nonce, and rate limit
+   the connection, place it in a lower-priority queue, or cease setting
+   ECT in outgoing segments.
+
+   If the received nonce sum has been set in a previous acknowledgement,
+   the sender might infer that a network device has interfered with
+   correct ECN signaling between ECN-nonce supporting endpoints.  The
+   minimum response to an incorrect nonce is the same as the response to
+   a received ECE.  However, to compensate for hidden congestion
+   signals, the sender might reduce the congestion window to one segment
+   and cease setting ECT in outgoing segments.  An incorrect nonce sum
+   is a sign of misbehavior or error between ECN-nonce supporting
+   endpoints.
+
+6.2.1.  Using the ECN-nonce to Protect Against Other Misbehaviors
+
+   The ECN-nonce can provide robustness beyond checking that marked
+   packets are signaled to the sender.  It also ensures that dropped
+   packets cannot be concealed from the sender (because their nonces
+   have been lost).  Drops could potentially be concealed by a faulty
+   TCP implementation, certain attacks, or even a hypothetical TCP
+   accelerator.  Such an accelerator could gamble that it can either
+   successfully "fast start" to a preset bandwidth quickly, retry with
+   another connection, or provide reliability at the application level.
+   If robustness against these faults is also desired, then the ECN-
+   nonce should not be disabled.  Instead, reducing the congestion
+   window to one, or using a low-priority queue, would penalize faulty
+   operation while providing continued checking.
+
+   The ECN-nonce can also detect misbehavior in Eifel [Eifel], a
+   recently proposed mechanism for removing the retransmission ambiguity
+   to improve TCP performance.  A misbehaving receiver might claim to
+   have received only original transmissions to convince the sender to
+   undo congestion actions.  Since retransmissions are sent without ECT,
+   and thus no nonce, returning the correct nonce sum confirms that only
+   original transmissions were received.
+
+
+
+
+
+
+
+Spring, et. al.               Experimental                      [Page 9]
+
+RFC 3540                  Robust ECN Signaling                 June 2003
+
+
+7.  Interactions
+
+7.1.  Path MTU Discovery
+
+   As described in RFC3168, use of the Don't Fragment bit with ECN is
+   recommended.  Receivers that receive unmarked fragments can
+   reconstruct the original nonce to conceal a marked fragment.  The
+   ECN-nonce cannot protect against misbehaving receivers that conceal
+   marked fragments, so some protection is lost in situations where Path
+   MTU discovery is disabled.
+
+   When responding to a small path MTU, the sender will retransmit a
+   smaller frame in place of a larger one.  Since these smaller packets
+   are retransmissions, they will be ECN-incapable and bear no nonce.
+   The sender should resynchronize on the first newly transmitted
+   packet.
+
+7.2.  SACK
+
+   Selective acknowledgements allow receivers to acknowledge out of
+   order segments as an optimization.  It is not necessary to modify the
+   selective acknowledgment option to fit per-range nonce sums, because
+   SACKs cannot be used by a receiver to hide a congestion signal.  The
+   nonce sum corresponds only to the data acknowledged by the cumulative
+   acknowledgement.
+
+7.3.  IPv6
+
+   Although the IPv4 header is protected by a checksum, this is not the
+   case with IPv6, making undetected bit errors in the IPv6 header more
+   likely.  Bit errors that compromise the integrity of the congestion
+   notification fields may cause an incorrect nonce to be received, and
+   an incorrect nonce sum to be returned.
+
+8.  Security Considerations
+
+   The random one-bit nonces need not be from a cryptographic-quality
+   pseudo-random number generator.  A strong random number generator
+   would compromise performance.  Consequently, the sequence of random
+   nonces should not be used for any other purpose.
+
+   Conversely, the pseudo-random bit sequence should not be generated by
+   a linear feedback shift register [Schneier], or similar scheme that
+   would allow an adversary who has seen several previous bits to infer
+   the generation function and thus its future output.
+
+
+
+
+
+
+Spring, et. al.               Experimental                     [Page 10]
+
+RFC 3540                  Robust ECN Signaling                 June 2003
+
+
+   Although the ECN-nonce protects against concealment of congestion
+   signals and optimistic acknowledgement, it provides no additional
+   protection for the integrity of the connection.
+
+9.  IANA Considerations
+
+   The Nonce Sum (NS) is carried in a reserved TCP header bit that must
+   be allocated.  This document describes the use of Bit 7, adjacent to
+   the other header bits used by ECN.
+
+   The codepoint for the NS flag in the TCP header is specified by the
+   Standards Action of this RFC, as is required by RFC 2780.  The IANA
+   has added the following to the registry for "TCP Header Flags":
+
+   RFC 3540 defines bit 7 from the Reserved field to be used for the
+   Nonce Sum, as follows:
+
+     0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
+   +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
+   |               |           | N | C | E | U | A | P | R | S | F |
+   | Header Length | Reserved  | S | W | C | R | C | S | S | Y | I |
+   |               |           |   | R | E | G | K | H | T | N | N |
+   +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
+
+   TCP Header Flags
+
+   Bit      Name                                    Reference
+   ---      ----                                    ---------
+    7        NS (Nonce Sum)                         [RFC 3540]
+
+10.  Conclusion
+
+   The ECN-nonce is a simple modification to the ECN signaling mechanism
+   that improves ECN's robustness by preventing receivers from
+   concealing marked (or dropped) packets.  The intent of this work is
+   to help improve the robustness of congestion control in the Internet.
+   The modification retains the character and simplicity of existing ECN
+   signaling.  It is also practical for deployment in the Internet.  It
+   uses the ECT(0) and ECT(1) codepoints and one TCP header flag (as
+   well as CWR and ECN-Echo) and has simple processing rules.
+
+
+
+
+
+
+
+
+
+
+
+Spring, et. al.               Experimental                     [Page 11]
+
+RFC 3540                  Robust ECN Signaling                 June 2003
+
+
+11.  References
+
+   [ECN]      "The ECN Web Page", URL
+              "http://www.icir.org/floyd/ecn.html".
+
+   [RFC3168]  Ramakrishnan, K., Floyd, S. and D. Black, "The addition of
+              explicit congestion notification (ECN) to IP", RFC 3168,
+              September 2001.
+
+   [Eifel]    R. Ludwig and R. Katz. The Eifel Algorithm: Making TCP
+              Robust Against Spurious Retransmissions. Computer
+              Communications Review, January, 2000.
+
+   [B97]      Bradner, S., "Key words for use in RFCs to Indicate
+              Requirement Levels", BCP 14, RFC 2119, March 1997.
+
+   [Savage]   S. Savage, N. Cardwell, D. Wetherall, T. Anderson.  TCP
+              congestion control with a misbehaving receiver. SIGCOMM
+              CCR, October 1999.
+
+   [Schneier] Bruce Schneier. Applied Cryptography, 2nd ed., 1996
+
+12.  Acknowledgements
+
+   This note grew out of research done by Stefan Savage, David Ely,
+   David Wetherall, Tom Anderson and Neil Spring.  We are very grateful
+   for feedback and assistance from Sally Floyd.
+
+13.  Authors' Addresses
+
+   Neil Spring
+   EMail: nspring@cs.washington.edu
+
+
+   David Wetherall
+   Department of Computer Science and Engineering, Box 352350
+   University of Washington
+   Seattle WA 98195-2350
+   EMail: djw@cs.washington.edu
+
+
+   David Ely
+   Computer Science and Engineering, 352350
+   University of Washington
+   Seattle, WA 98195-2350
+   EMail: ely@cs.washington.edu
+
+
+
+
+
+Spring, et. al.               Experimental                     [Page 12]
+
+RFC 3540                  Robust ECN Signaling                 June 2003
+
+
+14.  Full Copyright Statement
+
+   Copyright (C) The Internet Society (2003).  All Rights Reserved.
+
+   This document and translations of it may be copied and furnished to
+   others, and derivative works that comment on or otherwise explain it
+   or assist in its implementation may be prepared, copied, published
+   and distributed, in whole or in part, without restriction of any
+   kind, provided that the above copyright notice and this paragraph are
+   included on all such copies and derivative works.  However, this
+   document itself may not be modified in any way, such as by removing
+   the copyright notice or references to the Internet Society or other
+   Internet organizations, except as needed for the purpose of
+   developing Internet standards in which case the procedures for
+   copyrights defined in the Internet Standards process must be
+   followed, or as required to translate it into languages other than
+   English.
+
+   The limited permissions granted above are perpetual and will not be
+   revoked by the Internet Society or its successors or assigns.
+
+   This document and the information contained herein is provided on an
+   "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
+   TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
+   BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
+   HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
+   MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
+
+Acknowledgement
+
+   Funding for the RFC Editor function is currently provided by the
+   Internet Society.
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Spring, et. al.               Experimental                     [Page 13]
+
--- a/kernel/picotcp/RFC/rfc3562.txt
+++ b/kernel/picotcp/RFC/rfc3562.txt
@ -0,0 +1,395 @@
+
+
+
+
+
+
+Network Working Group                                           M. Leech
+Request for Comments: 3562                               Nortel Networks
+Category:Informational                                         July 2003
+
+
+                   Key Management Considerations for
+                     the TCP MD5 Signature Option
+
+Status of this Memo
+
+   This memo provides information for the Internet community.  It does
+   not specify an Internet standard of any kind.  Distribution of this
+   memo is unlimited.
+
+Copyright Notice
+
+   Copyright (C) The Internet Society (2003).  All Rights Reserved.
+
+Abstract
+
+   The TCP MD5 Signature Option (RFC 2385), used predominantly by BGP,
+   has seen significant deployment in critical areas of Internet
+   infrastructure.  The security of this option relies heavily on the
+   quality of the keying material used to compute the MD5 signature.
+   This document addresses the security requirements of that keying
+   material.
+
+1. Introduction
+
+   The security of various cryptographic functions lies both in the
+   strength of the functions themselves against various forms of attack,
+   and also, perhaps more importantly, in the keying material that is
+   used with them.  While theoretical attacks against the simple MAC
+   construction used in RFC 2385 are possible [MDXMAC], the number of
+   text-MAC pairs required to mount a forgery make it vastly more
+   probable that key-guessing is the main threat against RFC 2385.
+
+   We show a quantitative approach to determining the security
+   requirements of keys used with [RFC2385], which tends to suggest the
+   following:
+
+      o  Key lengths SHOULD be between 12 and 24 bytes, with larger keys
+         having effectively zero additional computational costs when
+         compared to shorter keys.
+
+
+
+
+
+
+
+Leech                        Informational                      [Page 1]
+
+RFC 3562    Considerations for the TCP MD5 Signature Option    July 2003
+
+
+      o  Key sharing SHOULD be limited so that keys aren't shared among
+         multiple BGP peering arrangements.
+
+      o  Keys SHOULD be changed at least every 90 days.
+
+1.1. Requirements Keywords
+
+   The keywords "MUST", "MUST NOT", "REQUIRED", "SHOULD", "SHOULD NOT",
+   and "MAY" that appear in this document are to be interpreted as
+   described in [RFC2119].
+
+2. Performance assumptions
+
+   The most recent performance study of MD5 that this author was able to
+   find was undertaken by J. Touch at ISI.  The results of this study
+   were documented in [RFC1810].  The assumption is that Moores Law
+   applies to the data in the study, which at the time showed a
+   best-possible *software* performance for MD5 of 87Mbits/second.
+   Projecting this number forward to the ca 2002 timeframe of this
+   document, would suggest a number near 2.1Gbits/second.
+
+   For purposes of simplification, we will assume that our key-guessing
+   attacker will attack short packets only.  A likely minimal packet is
+   an ACK, with no data.  This leads to having to compute the MD5 over
+   about 40 bytes of data, along with some reasonable maximum number of
+   key bytes.  MD5 effectively pads its input to 512-bit boundaries (64
+   bytes) (it's actually more complicated than that, but this
+   simplifying assumption will suffice for this analysis).  That means
+   that a minimum MD5 "block" is 64 bytes, so for a ca 2002-scaled
+   software performance of 2.1Gbits/second, we get a single-CPU software
+   MD5 performance near 4.1e6 single-block MD5 operations per second.
+
+   These numbers are, of course, assuming that any key-guessing attacker
+   is resource-constrained to a single CPU.  In reality, distributed
+   cryptographic key-guessing attacks have been remarkably successful in
+   the recent past.
+
+   It may be instructive to look at recent Internet worm infections, to
+   determine what the probable maximum number of hosts that could be
+   surreptitiously marshalled for a key-guessing attack against MD5.
+   CAIDA [CAIDA2001] has reported that the Code Red worm infected over
+   350,000 Internet hosts in the first 14 hours of operation.  It seems
+   reasonable to assume that a worm whose "payload" is a mechanism for
+   quietly performing a key-guessing attack (perhaps using idle CPU
+   cycles of the infected host) could be at least as effective as Code
+   Red was.  If one assumes that such a worm were engineered to be
+   maximally stealthy, then steady-state infection could conceivably
+   reach 1 million hosts or more.  That changes our single-CPU
+
+
+
+Leech                        Informational                      [Page 2]
+
+RFC 3562    Considerations for the TCP MD5 Signature Option    July 2003
+
+
+   performance from 4.1e6 operations per second, to somewhere between
+   1.0e11 and 1.0e13 MD5 operations per second.
+
+   In 1997, John Gilmore, and the Electronic Frontier Foundation [EFF98]
+   developed a special-purpose machine, for an investment of
+   approximately USD$250,000.  This machine was able to mount a
+   key-guessing attack against DES, and compute a key in under 1 week.
+   Given Moores Law, the same investment today would yield a machine
+   that could do the same work approximately 8 times faster.  It seems
+   reasonable to assume that a similar hardware approach could be
+   brought to bear on key-guessing attacks against MD5, for similar key
+   lengths to DES, with somewhat-reduced performance (MD5 performance in
+   hardware may be as much as 2-3 times slower than DES).
+
+3. Key Lifetimes
+
+   Operational experience with RFC 2385 would suggest that keys used
+   with this option may have lifetimes on the order of months.  It would
+   seem prudent, then, to choose a minimum key length that guarantees
+   that key-guessing runtimes are some small multiple of the key-change
+   interval under best-case (for the attacker) practical attack
+   performance assumptions.
+
+   The keys used with RFC 2385 are intended only to provide
+   authentication, and not confidentiality.  Consequently, the ability
+   of an attacker to determine the key used for old traffic (traffic
+   emitted before a key-change event) is not considered a threat.
+
+3. Key Entropy
+
+   If we make an assumption that key-change intervals are 90 days, and
+   that the reasonable upper-bound for software-based attack performance
+   is 1.0e13 MD5 operations per second, then the minimum required key
+   entropy is approximately 68 bits.  It is reasonable to round this
+   number up to at least 80 bits, or 10 bytes.  If one assumes that
+   hardware-based attacks are likely, using an EFF-like development
+   process, but with small-country-sized budgets, then the minimum key
+   size steps up considerably to around 83 bits, or 11 bytes.  Since 11
+   is such an ugly number, rounding up to 12 bytes is reasonable.
+
+   In order to achieve this much entropy with an English-language key,
+   one needs to remember that English has an entropy of approximately
+   1.3 bits per character.  Other human languages are similar.  This
+   means that a key derived from a human language would need to be
+   approximately 61 bytes long to produce 80 bits of entropy, and 73
+   bytes to produce 96 bits of entropy.
+
+
+
+
+
+Leech                        Informational                      [Page 3]
+
+RFC 3562    Considerations for the TCP MD5 Signature Option    July 2003
+
+
+   A more reasonable approach would be to use the techniques described
+   in [RFC1750] to produce a high quality random key of 96 bits or more.
+
+   It has previously been noted that an attacker will tend to choose
+   short packets to mount an attack on, since that increases the
+   key-guessing performance for the attacker.  It has also been noted
+   that MD5 operations are effectively computed in blocks of 64 bytes.
+   Given that the shortest packet an attacker could reasonably use would
+   consist of 40 bytes of IP+TCP header data, with no payload, the
+   remaining 24 bytes of the MD5 block can reasonably be used for keying
+   material without added CPU cost for routers, but substantially
+   increase the burden on the attacker.  While this practice will tend
+   to increase the CPU burden for ordinary short BGP packets, since it
+   will tend to cause the MD5 calculations to overflow into a second MD5
+   block, it isn't currently seen to be a significant extra burden to
+   BGP routing machinery.
+
+   The most reasonable practice, then, would be to choose the largest
+   possible key length smaller than 25 bytes that is operationally
+   reasonable, but at least 12 bytes.
+
+   Some implementations restrict the key to a string of ASCII
+   characters, much like simple passwords, usually of 8 bytes or less.
+   The very real risk is that such keys are quite vulnerable to
+   key-guessing attacks, as outlined above.  The worst-case scenario
+   would occur when the ASCII key/password is a human-language word, or
+   pseudo-word.  Such keys/passwords contain, at most, 12 bits of
+   entropy.  In such cases, dictionary driven attacks can yield results
+   in a fraction of the time that a brute-force approach would take.
+   Such implementations SHOULD permit users to enter a direct binary key
+   using the command line interface.  One possible implementation would
+   be to establish a convention that an ASCII key beginning with the
+   prefix "0x" be interpreted as a string of bytes represented in
+   hexadecimal.  Ideally, such byte strings will have been derived from
+   a random source, as outlined in [RFC1750].  Implementations SHOULD
+   NOT limit the length of the key unnecessarily, and SHOULD allow keys
+   of at least 16 bytes, to allow for the inevitable threat from Moores
+   Law.
+
+4. Key management practices
+
+   In current operational use, TCP MD5 Signature keys [RFC2385] may be
+   shared among significant numbers of systems.  Conventional wisdom in
+   cryptography and security is that such sharing increases the
+   probability of accidental or deliberate exposure of keys.  The more
+   frequently such keying material is handled, the more likely it is to
+   be accidentally exposed to unauthorized parties.
+
+
+
+
+Leech                        Informational                      [Page 4]
+
+RFC 3562    Considerations for the TCP MD5 Signature Option    July 2003
+
+
+   Since it is possible for anyone in possession of a key to forge
+   packets as if they originated with any of the other keyholders, the
+   most reasonable security practice would be to limit keys to use
+   between exactly two parties.  Current implementations may make this
+   difficult, but it is the most secure approach when key lifetimes are
+   long.  Reducing key lifetimes can partially mitigate widescale
+   key-sharing, by limiting the window of opportunity for a "rogue"
+   keyholder.
+
+   Keying material is extremely sensitive data, and as such, should be
+   handled with reasonable caution.  When keys are transported
+   electronically, including when configuring network elements like
+   routers, secure handling techniques MUST be used.  Use of protocols
+   such as S/MIME [RFC2633], TLS [RFC2246], Secure Shell (SSH) SHOULD be
+   used where appropriate, to protect the transport of the key.
+
+5. Security Considerations
+
+   This document is entirely about security requirements for keying
+   material used with RFC 2385.
+
+   No new security exposures are created by this document.
+
+6. Acknowledgements
+
+   Steve Bellovin, Ran Atkinson, and Randy Bush provided valuable
+   commentary in the development of this document.
+
+7. References
+
+   [RFC1771]   Rekhter, Y. and T. Li, "A Border Gateway Protocol 4
+               (BGP-4)", RFC 1771, March 1995.
+
+   [RFC1810]   Touch, J., "Report on MD5 Performance", RFC 1810, June
+               1995.
+
+   [RFC2385]   Heffernan, A., "Protection of BGP Sessions via the TCP
+               MD5 Signature Option", RFC 2385, August 1998.
+
+   [RFC2119]   Bradner, S., "Key words for use in RFCs to Indicate
+               Requirement Levels", BCP 14, RFC 2119, March 1997.
+
+   [MDXMAC]    Van Oorschot, P. and B. Preneel, "MDx-MAC and Building
+               Fast MACs from Hash Functions".  Proceedings Crypto '95,
+               Springer-Verlag LNCS, August 1995.
+
+   [RFC1750]   Eastlake, D., Crocker, S. and J. Schiller, "Randomness
+               Recommendations for Security", RFC 1750, December 1994.
+
+
+
+Leech                        Informational                      [Page 5]
+
+RFC 3562    Considerations for the TCP MD5 Signature Option    July 2003
+
+
+   [EFF98]     "Cracking DES: Secrets of Encryption Research, Wiretap
+               Politics, and Chip Design".  Electronic Frontier
+               Foundation, 1998.
+
+   [RFC2633]   Ramsdell, B., "S/MIME Version 3 Message Specification",
+               RFC 2633, June 1999.
+
+   [RFC2246]   Dierks, T. and C. Allen, "The TLS Protocol Version 1.0",
+               RFC 2246, January 1999.
+
+   [CAIDA2001] "CAIDA Analysis of Code Red"
+               http://www.caida.org/analysis/security/code-red/
+
+8. Author's Address
+
+   Marcus D. Leech
+   Nortel Networks
+   P.O. Box 3511, Station C
+   Ottawa, ON
+   Canada, K1Y 4H7
+
+   Phone: +1 613-763-9145
+   EMail: mleech@nortelnetworks.com
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Leech                        Informational                      [Page 6]
+
+RFC 3562    Considerations for the TCP MD5 Signature Option    July 2003
+
+
+9.  Full Copyright Statement
+
+   Copyright (C) The Internet Society (2003).  All Rights Reserved.
+
+   This document and translations of it may be copied and furnished to
+   others, and derivative works that comment on or otherwise explain it
+   or assist in its implementation may be prepared, copied, published
+   and distributed, in whole or in part, without restriction of any
+   kind, provided that the above copyright notice and this paragraph are
+   included on all such copies and derivative works.  However, this
+   document itself may not be modified in any way, such as by removing
+   the copyright notice or references to the Internet Society or other
+   Internet organizations, except as needed for the purpose of
+   developing Internet standards in which case the procedures for
+   copyrights defined in the Internet Standards process must be
+   followed, or as required to translate it into languages other than
+   English.
+
+   The limited permissions granted above are perpetual and will not be
+   revoked by the Internet Society or its successors or assignees.
+
+   This document and the information contained herein is provided on an
+   "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
+   TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
+   BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
+   HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
+   MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
+
+Acknowledgement
+
+   Funding for the RFC Editor function is currently provided by the
+   Internet Society.
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Leech                        Informational                      [Page 7]
+
--- a/kernel/picotcp/RFC/rfc3649.txt
+++ b/kernel/picotcp/RFC/rfc3649.txt
--- a/kernel/picotcp/RFC/rfc3708.txt
+++ b/kernel/picotcp/RFC/rfc3708.txt
@ -0,0 +1,507 @@
+
+
+
+
+
+
+Network Working Group                                         E. Blanton
+Request for Comments: 3708                             Purdue University
+Category: Experimental                                         M. Allman
+                                                                    ICIR
+                                                           February 2004
+
+
+      Using TCP Duplicate Selective Acknowledgement (DSACKs) and
+         Stream Control Transmission Protocol (SCTP) Duplicate
+        Transmission Sequence Numbers (TSNs) to Detect Spurious
+                            Retransmissions
+
+Status of this Memo
+
+   This memo defines an Experimental Protocol for the Internet
+   community.  It does not specify an Internet standard of any kind.
+   Discussion and suggestions for improvement are requested.
+   Distribution of this memo is unlimited.
+
+Copyright Notice
+
+   Copyright (C) The Internet Society (2004).  All Rights Reserved.
+
+Abstract
+
+   TCP and Stream Control Transmission Protocol (SCTP) provide
+   notification of duplicate segment receipt through Duplicate Selective
+   Acknowledgement (DSACKs) and Duplicate Transmission Sequence Number
+   (TSN) notification, respectively.  This document presents
+   conservative methods of using this information to identify
+   unnecessary retransmissions for various applications.
+
+1.  Introduction
+
+   TCP [RFC793] and SCTP [RFC2960] provide notification of duplicate
+   segment receipt through duplicate selective acknowledgment (DSACK)
+   [RFC2883] and Duplicate TSN notifications, respectively.  Using this
+   information, a TCP or SCTP sender can generally determine when a
+   retransmission was sent in error.  This document presents two methods
+   for using duplicate notifications.  The first method is simple and
+   can be used for accounting applications.  The second method is a
+   conservative algorithm to disambiguate unnecessary retransmissions
+   from loss events for the purpose of undoing unnecessary congestion
+   control changes.
+
+
+
+
+
+
+
+Blanton & Allman              Experimental                      [Page 1]
+
+RFC 3708           TCP DSACKs and SCTP Duplicate TSNs      February 2004
+
+
+   This document is intended to outline reasonable and safe algorithms
+   for detecting spurious retransmissions and discuss some of the
+   considerations involved.  It is not intended to describe the only
+   possible method for achieving the goal, although the guidelines in
+   this document should be taken into consideration when designing
+   alternate algorithms.  Additionally, this document does not outline
+   what a TCP or SCTP sender may do after a spurious retransmission is
+   detected.  A number of proposals have been developed (e.g.,
+   [RFC3522], [SK03], [BDA03]), but it is not yet clear which of these
+   proposals are appropriate.  In addition, they all rely on detecting
+   spurious retransmits and so can share the algorithm specified in this
+   document.
+
+   Finally, we note that to simplify the text much of the following
+   discussion is in terms of TCP DSACKs, while applying to both TCP and
+   SCTP.
+
+   Terminology
+
+   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
+   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
+   document are to be interpreted as described in RFC 2119 [RFC2119].
+
+2.  Counting Duplicate Notifications
+
+   For certain applications a straight count of duplicate notifications
+   will suffice.  For instance, if a stack simply wants to know (for
+   some reason) the number of spuriously retransmitted segments,
+   counting all duplicate notifications for retransmitted segments
+   should work well.  Another application of this strategy is to monitor
+   and adapt transport algorithms so that the transport is not sending
+   large amounts of spurious data into the network.  For instance,
+   monitoring duplicate notifications could be used by the Early
+   Retransmit [AAAB03] algorithm to determine whether fast
+   retransmitting [RFC2581] segments with a lower than normal duplicate
+   ACK threshold is working, or if segment reordering is causing
+   spurious retransmits.
+
+   More speculatively, duplicate notification has been proposed as an
+   integral part of estimating TCP's total loss rate [AEO03] for the
+   purposes of mitigating the impact of corruption-based losses on
+   transport protocol performance.  [EOA03] proposes altering the
+   transport's congestion response to the fraction of losses that are
+   actually due to congestion by requiring the network to provide the
+   corruption-based loss rate and making the transport sender estimate
+   the total loss rate.  Duplicate notifications are a key part of
+   estimating the total loss rate accurately [AEO03].
+
+
+
+
+Blanton & Allman              Experimental                      [Page 2]
+
+RFC 3708           TCP DSACKs and SCTP Duplicate TSNs      February 2004
+
+
+3.  Congestion/Duplicate Disambiguation Algorithm
+
+   When the purpose of detecting spurious retransmissions is to "undo"
+   unnecessary changes made to the congestion control state, as
+   suggested in [RFC2883], the data sender ideally needs to determine:
+
+   (a) That spurious retransmissions in a particular window of data do
+       not mask real segment loss (congestion).
+
+       For example, assume segments N and N+1 are retransmitted even
+       though only segment N was dropped by the network (thus, segment
+       N+1 was needlessly retransmitted).  When the sender receives the
+       notification that segment N+1 arrived more than once it can
+       conclude that segment N+1 was needlessly resent.  However, it
+       cannot conclude that it is appropriate to revert the congestion
+       control state because the window of data contained at least one
+       valid congestion indication (i.e., segment N was lost).
+
+   (b) That network duplication is not the cause of the duplicate
+       notification.
+
+       Determining whether a duplicate notification is caused by network
+       duplication of a packet or a spurious retransmit is a nearly
+       impossible task in theory.  Since [Pax97] shows that packet
+       duplication by the network is rare, the algorithm in this section
+       simply ceases to function when network duplication is detected
+       (by receiving a duplication notification for a segment that was
+       not retransmitted by the sender).
+
+   The algorithm specified below gives reasonable, but not complete,
+   protection against both of these cases.
+
+   We assume the TCP sender has a data structure to hold selective
+   acknowledgment information (e.g., as outlined in [RFC3517]).  The
+   following steps require an extension of such a 'scoreboard' to
+   incorporate a slightly longer history of retransmissions than called
+   for in [RFC3517].  The following steps MUST be taken upon the receipt
+   of each DSACK or duplicate TSN notification:
+
+   (A) Check the corresponding sequence range or TSN to determine
+       whether the segment has been retransmitted.
+
+       (A.1) If the SACK scoreboard is empty (i.e., the TCP sender has
+             received no SACK information from the receiver) and the
+             left edge of the incoming DSACK is equal to SND.UNA,
+             processing of this DSACK MUST be terminated and the
+             congestion control state MUST NOT be reverted during the
+             current window of data.  This clause intends to cover the
+
+
+
+Blanton & Allman              Experimental                      [Page 3]
+
+RFC 3708           TCP DSACKs and SCTP Duplicate TSNs      February 2004
+
+
+             case when an entire window of acknowledgments have been
+             dropped by the network.  In such a case, the reverse path
+             seems to be in a congested state and so reducing TCP's
+             sending rate is the conservative approach.
+
+       (A.2) If the segment was retransmitted exactly one time, mark it
+             as a duplicate.
+
+       (A.3) If the segment was retransmitted more than once processing
+             of this DSACK MUST be terminated and the congestion control
+             state MUST NOT be reverted to its previous state during the
+             current window of data.
+
+       (A.4) If the segment was not retransmitted the incoming DSACK
+             indicates that the network duplicated the segment in
+             question.  Processing of this DSACK MUST be terminated.  In
+             addition, the algorithm specified in this document MUST NOT
+             be used for the remainder of the connection, as future
+             DSACK reports may be indicating network duplication rather
+             than unnecessary retransmission.  Note that some techniques
+             to further disambiguate network duplication from
+             unnecessary retransmission (e.g., the TCP timestamp option
+             [RFC1323]) may be used to refine the algorithm in this
+             document further.  Using such a technique in conjunction
+             with an algorithm similar to the one presented herein may
+             allow for the continued use of the algorithm in the face of
+             duplicated segments.  We do not delve into such an
+             algorithm in this document due the current rarity of
+             network duplication.  However, future work should include
+             tackling this problem.
+
+   (B) Assuming processing is allowed to continue (per the (A) rules),
+       check all retransmitted segments in the previous window of data.
+
+       (B.1) If all segments or chunks marked as retransmitted have also
+             been marked as acknowledged and duplicated, we conclude
+             that all retransmissions in the previous window of data
+             were spurious and no loss occurred.
+
+       (B.2) If any segment or chunk is still marked as retransmitted
+             but not marked as duplicate, there are outstanding
+             retransmissions that could indicate loss within this window
+             of data.  We can make no conclusions based on this
+             particular DSACK/duplicate TSN notification.
+
+   In addition to keeping the state mentioned in [RFC3517] (for TCP) and
+   [RFC2960] (for SCTP), an implementation of this algorithm must track
+
+
+
+
+Blanton & Allman              Experimental                      [Page 4]
+
+RFC 3708           TCP DSACKs and SCTP Duplicate TSNs      February 2004
+
+
+   all sequence numbers or TSNs that have been acknowledged as
+   duplicates.
+
+4.  Related Work
+
+   In addition to the mechanism for detecting spurious retransmits
+   outlined in this document, several other proposals for finding
+   needless retransmits have been developed.
+
+   [BA02] uses the algorithm outlined in this document as the basis for
+   investigating several methods to make TCP more robust to reordered
+   packets.
+
+   The Eifel detection algorithm [RFC3522] uses the TCP timestamp option
+   [RFC1323] to determine whether the ACK for a given retransmit is for
+   the original transmission or a retransmission.  More generally,
+   [LK00] outlines the benefits of detecting spurious retransmits and
+   reverting from needless congestion control changes using the
+   timestamp-based scheme or a mechanism that uses a "retransmit bit" to
+   flag retransmits (and ACKs of retransmits).  The Eifel detection
+   algorithm can detect spurious retransmits more rapidly than a DSACK-
+   based scheme.  However, the tradeoff is that the overhead of the 12-
+   byte timestamp option must be incurred in every packet transmitted
+   for Eifel to function.
+
+   The F-RTO scheme [SK03] slightly alters TCP's sending pattern
+   immediately following a retransmission timeout and then observes the
+   pattern of the returning ACKs.  This pattern can indicate whether the
+   retransmitted segment was needed.  The advantage of F-RTO is that the
+   algorithm only needs to be implemented on the sender side of the TCP
+   connection and that nothing extra needs to cross the network (e.g.,
+   DSACKs, timestamps, special flags, etc.).  The downside is that the
+   algorithm is a heuristic that can be confused by network pathologies
+   (e.g., duplication or reordering of key packets).  Finally, note that
+   F-RTO only works for spurious retransmits triggered by the
+   transport's retransmission timer.
+
+   Finally, [AP99] briefly investigates using the time between
+   retransmitting a segment via the retransmission timeout and the
+   arrival of the next ACK as an indicator of whether the retransmit was
+   needed.  The scheme compares this time delta with a fraction (f) of
+   the minimum RTT observed thus far on the connection.  If the time
+   delta is less than f*minRTT then the retransmit is labeled spurious.
+   When f=1/2 the algorithm identifies roughly 59% of the needless
+   retransmission timeouts and identifies needed retransmits only 2.5%
+   of the time.  As with F-RTO, this scheme only detects spurious
+   retransmits sent by the transport's retransmission timer.
+
+
+
+
+Blanton & Allman              Experimental                      [Page 5]
+
+RFC 3708           TCP DSACKs and SCTP Duplicate TSNs      February 2004
+
+
+5.  Security Considerations
+
+   It is possible for the receiver to falsely indicate spurious
+   retransmissions in the case of actual loss, potentially causing a TCP
+   or SCTP sender to inaccurately conclude that no loss took place (and
+   possibly cause inappropriate changes to the senders congestion
+   control state).
+
+   Consider the following scenario: A receiver watches every segment or
+   chunk that arrives and acknowledges any segment that arrives out of
+   order by more than some threshold amount as a duplicate, assuming
+   that it is a retransmission.  A sender using the above algorithm will
+   assume that the retransmission was spurious.
+
+   The ECN nonce sum proposal [RFC3540] could possibly help mitigate the
+   ability of the receiver to hide real losses from the sender with
+   modest extension.  In the common case of receiving an original
+   transmission and a spurious retransmit a receiver will have received
+   the nonce from the original transmission and therefore can "prove" to
+   the sender that the duplication notification is valid.  In the case
+   when the receiver did not receive the original and is trying to
+   improperly induce the sender into transmitting at an inappropriately
+   high rate, the receiver will not know the ECN nonce from the original
+   segment and therefore will probabilistically not be able to fool the
+   sender for long.  [RFC3540] calls for disabling nonce sums on
+   duplicate ACKs, which means that the nonce sum is not directly
+   suitable for use as a mitigation to the problem of receivers lying
+   about DSACK information.  However, future efforts may be able to use
+   [RFC3540] as a starting point for building protection should it be
+   needed.
+
+6.  Acknowledgments
+
+   Sourabh Ladha and Reiner Ludwig made several useful comments on an
+   earlier version of this document.  The second author thanks BBN
+   Technologies and NASA's Glenn Research Center for supporting this
+   work.
+
+7. References
+
+7.1.  Normative References
+
+   [RFC793]  Postel, J., "Transmission Control Protocol", STD 7, RFC
+             793, September 1981.
+
+   [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
+             Requirement Levels", BCP 14, RFC 2119, March 1997.
+
+
+
+
+Blanton & Allman              Experimental                      [Page 6]
+
+RFC 3708           TCP DSACKs and SCTP Duplicate TSNs      February 2004
+
+
+   [RFC2883] Floyd, S., Mahdavi, J., Mathis, M. and M. Podolsky, "An
+             Extension to the Selective Acknowledgement (SACK) Option
+             for TCP", RFC 2883, July 2000.
+
+   [RFC2960] Stewart, R., Xie, Q., Morneault, K., Sharp, C.,
+             Schwarzbauer, H., Taylor, T., Rytina, I., Kalla, M., Zhang,
+             L. and V. Paxson, "Stream Control Transmission Protocol",
+             RFC 2960, October 2000.
+
+7.2.  Informative References
+
+   [AAAB03]  Allman, M., Avrachenkov, K., Ayesta, U. and J. Blanton,
+             "Early Retransmit for TCP", Work in Progress, June 2003.
+
+   [AEO03]   Allman, M., Eddy, E. and S. Ostermann, "Estimating Loss
+             Rates With TCP", Work in Progress, August 2003.
+
+   [AP99]    Allman, M. and V. Paxson, "On Estimating End-to-End Network
+             Path Properties", SIGCOMM 99.
+
+   [BA02]    Blanton, E. and M. Allman.  On Making TCP More Robust to
+             Packet Reordering.  ACM Computer Communication Review,
+             32(1), January 2002.
+
+   [BDA03]   Blanton, E., Dimond, R. and M. Allman, "Practices for TCP
+             Senders in the Face of Segment Reordering", Work in
+             Progress, February 2003.
+
+   [EOA03]   Eddy, W., Ostermann, S. and M. Allman, "New Techniques for
+             Making Transport Protocols Robust to Corruption-Based
+             Loss", Work in Progress, July 2003.
+
+   [LK00]    R. Ludwig, R. H. Katz.  The Eifel Algorithm: Making TCP
+             Robust Against Spurious Retransmissions.  ACM Computer
+             Communication Review, 30(1), January 2000.
+
+   [Pax97]   V. Paxson.  End-to-End Internet Packet Dynamics.  In ACM
+             SIGCOMM, September 1997.
+
+   [RFC1323] Jacobson, V., Braden, R. and D. Borman,  "TCP Extensions
+             for High Performance", RFC 1323, May 1992.
+
+   [RFC3517] Blanton, E., Allman, M., Fall, K. and L. Wang, "A
+             Conservative Selective Acknowledgment (SACK)-based Loss
+             Recovery Algorithm for TCP", RFC 3517, April 2003.
+
+   [RFC3522] Ludwig, R. and M. Meyer, "The Eifel Detection Algorithm for
+             TCP," RFC 3522, April 2003.
+
+
+
+Blanton & Allman              Experimental                      [Page 7]
+
+RFC 3708           TCP DSACKs and SCTP Duplicate TSNs      February 2004
+
+
+   [RFC3540] Spring, N., Wetherall, D. and D. Ely, "Robust Explicit
+             Congestion Notification (ECN) Signaling with Nonces", RFC
+             3540, June 2003.
+
+   [SK03]    Sarolahti, P. and M. Kojo, "F-RTO: An Algorithm for
+             Detecting Spurious Retransmission Timeouts with TCP and
+             SCTP", Work in Progress, June 2003.
+
+8.  Authors' Addresses
+
+   Ethan Blanton
+   Purdue University Computer Sciences
+   1398 Computer Science Building
+   West Lafayette, IN  47907
+
+   EMail: eblanton@cs.purdue.edu
+
+
+   Mark Allman
+   ICSI Center for Internet Research
+   1947 Center Street, Suite 600
+   Berkeley, CA 94704-1198
+   Phone: 216-243-7361
+
+   EMail: mallman@icir.org
+   http://www.icir.org/mallman/
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Blanton & Allman              Experimental                      [Page 8]
+
+RFC 3708           TCP DSACKs and SCTP Duplicate TSNs      February 2004
+
+
+9.  Full Copyright Statement
+
+   Copyright (C) The Internet Society (2004).  This document is subject
+   to the rights, licenses and restrictions contained in BCP 78 and
+   except as set forth therein, the authors retain all their rights.
+
+   This document and the information contained herein are provided on an
+   "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE
+   REPRESENTS OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE
+   INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR
+   IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF
+   THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
+   WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
+
+Intellectual Property
+
+   The IETF takes no position regarding the validity or scope of any
+   Intellectual Property Rights or other rights that might be claimed
+   to pertain to the implementation or use of the technology
+   described in this document or the extent to which any license
+   under such rights might or might not be available; nor does it
+   represent that it has made any independent effort to identify any
+   such rights.  Information on the procedures with respect to
+   rights in RFC documents can be found in BCP 78 and BCP 79.
+
+   Copies of IPR disclosures made to the IETF Secretariat and any
+   assurances of licenses to be made available, or the result of an
+   attempt made to obtain a general license or permission for the use
+   of such proprietary rights by implementers or users of this
+   specification can be obtained from the IETF on-line IPR repository
+   at http://www.ietf.org/ipr.
+
+   The IETF invites any interested party to bring to its attention
+   any copyrights, patents or patent applications, or other
+   proprietary rights that may cover technology that may be required
+   to implement this standard.  Please address the information to the
+   IETF at ietf-ipr@ietf.org.
+
+Acknowledgement
+
+   Funding for the RFC Editor function is currently provided by the
+   Internet Society.
+
+
+
+
+
+
+
+
+
+Blanton & Allman              Experimental                      [Page 9]
+
--- a/kernel/picotcp/RFC/rfc3742.txt
+++ b/kernel/picotcp/RFC/rfc3742.txt
@ -0,0 +1,395 @@
+
+
+
+
+
+
+Network Working Group                                           S. Floyd
+Request for Comments: 3742                                          ICSI
+Category: Experimental                                        March 2004
+
+
+        Limited Slow-Start for TCP with Large Congestion Windows
+
+Status of this Memo
+
+   This memo defines an Experimental Protocol for the Internet
+   community.  It does not specify an Internet standard of any kind.
+   Discussion and suggestions for improvement are requested.
+   Distribution of this memo is unlimited.
+
+Copyright Notice
+
+   Copyright (C) The Internet Society (2004).  All Rights Reserved.
+
+Abstract
+
+   This document describes an optional modification for TCP's slow-start
+   for use with TCP connections with large congestion windows.  For TCP
+   connections that are able to use congestion windows of thousands (or
+   tens of thousands) of MSS-sized segments (for MSS the sender's
+   MAXIMUM SEGMENT SIZE), the current slow-start procedure can result in
+   increasing the congestion window by thousands of segments in a single
+   round-trip time.  Such an increase can easily result in thousands of
+   packets being dropped in one round-trip time.  This is often
+   counter-productive for the TCP flow itself, and is also hard on the
+   rest of the traffic sharing the congested link.  This note describes
+   Limited Slow-Start as an optional mechanism for limiting the number
+   of segments by which the congestion window is increased for one
+   window of data during slow-start, in order to improve performance for
+   TCP connections with large congestion windows.
+
+1.  Introduction
+
+   This note describes an optional modification for TCP's slow-start for
+   use with TCP connections with large congestion windows.  For TCP
+   connections that are able to use congestion windows of thousands (or
+   tens of thousands) of MSS-sized segments (for MSS the sender's
+   MAXIMUM SEGMENT SIZE), the current slow-start procedure can result in
+   increasing the congestion window by thousands of segments in a single
+   round-trip time.  Such an increase can easily result in thousands of
+   packets being dropped in one round-trip time.  This is often
+   counter-productive for the TCP flow itself, and is also hard on the
+   rest of the traffic sharing the congested link.  This note describes
+   Limited Slow-Start, limiting the number of segments by which the
+
+
+
+Floyd                         Experimental                      [Page 1]
+
+RFC 3742     TCP's Slow-Start with Large Congestion Windows   March 2004
+
+
+   congestion window is increased for one window of data during slow-
+   start, in order to improve performance for TCP connections with large
+   congestion windows.
+
+   When slow-start results in a large increase in the congestion window
+   in one round-trip time, a large number of packets might be dropped in
+   the network (even with carefully-tuned active queue management
+   mechanisms in the routers).  This drop of a large number of packets
+   in the network can result in unnecessary retransmit timeouts for the
+   TCP connection.  The TCP connection could end up in the congestion
+   avoidance phase with a very small congestion window, and could take a
+   large number of round-trip times to recover its old congestion
+   window.  This poor performance is illustrated in [F02].
+
+2.  The Proposal for Limited Slow-Start
+
+   Limited Slow-Start introduces a parameter, "max_ssthresh", and
+   modifies the slow-start mechanism for values of the congestion window
+   where "cwnd" is greater than "max_ssthresh".  That is, during Slow-
+   Start, when
+
+      cwnd <= max_ssthresh,
+
+   cwnd is increased by one MSS (MAXIMUM SEGMENT SIZE) for every
+   arriving ACK (acknowledgement) during slow-start, as is always the
+   case.  During Limited Slow-Start, when
+
+      max_ssthresh < cwnd <= ssthresh,
+
+   the invariant is maintained so that the congestion window is
+   increased during slow-start by at most max_ssthresh/2 MSS per round-
+   trip time.  This is done as follows:
+
+      For each arriving ACK in slow-start:
+        If (cwnd <= max_ssthresh)
+           cwnd += MSS;
+        else
+           K = int(cwnd/(0.5 max_ssthresh));
+           cwnd += int(MSS/K);
+
+   Thus, during Limited Slow-Start the window is increased by 1/K MSS
+   for each arriving ACK, for K = int(cwnd/(0.5 max_ssthresh)), instead
+   of by 1 MSS as in standard slow-start [RFC2581].
+
+
+
+
+
+
+
+
+Floyd                         Experimental                      [Page 2]
+
+RFC 3742     TCP's Slow-Start with Large Congestion Windows   March 2004
+
+
+   When
+
+     ssthresh < cwnd,
+
+   slow-start is exited, and the sender is in the Congestion Avoidance
+   phase.
+
+   Our recommendation would be for max_ssthresh to be set to 100 MSS.
+   (This is illustrated in the NS [NS] simulator, for snapshots after
+   May 1, 2002, in the tests "./test-all-tcpHighspeed tcp1A" and
+   "./test-all-tcpHighspeed tcpHighspeed1" in the subdirectory
+   "tcl/lib".  Setting max_ssthresh to Infinity causes the TCP
+   connection in NS not to use Limited Slow-Start.)
+
+   With Limited Slow-Start, when the congestion window is greater than
+   max_ssthresh, the window is increased by at most 1/2 MSS for each
+   arriving ACK; when the congestion window is greater than 1.5
+   max_ssthresh, the window is increased by at most 1/3 MSS for each
+   arriving ACK, and so on.
+
+   With Limited Slow-Start it takes:
+
+      log(max_ssthresh)
+
+   round-trip times to reach a congestion window of max_ssthresh, and it
+   takes:
+
+      log(max_ssthresh) + (cwnd - max_ssthresh)/(max_ssthresh/2)
+
+   round-trip times to reach a congestion window of cwnd, for a
+   congestion window greater than max_ssthresh.
+
+   Thus, with Limited Slow-Start with max_ssthresh set to 100 MSS, it
+   would take 836 round-trip times to reach a congestion window of
+   83,000 packets, compared to 16 round-trip times without Limited
+   Slow-Start (assuming no packet drops).  With Limited Slow-Start, the
+   largest transient queue during slow-start would be 100 packets;
+   without Limited Slow-Start, the transient queue during Slow-Start
+   would reach more than 32,000 packets.
+
+   By limiting the maximum increase in the congestion window in a
+   round-trip time, Limited Slow-Start can reduce the number of drops
+   during slow-start, and improve the performance of TCP connections
+   with large congestion windows.
+
+
+
+
+
+
+
+Floyd                         Experimental                      [Page 3]
+
+RFC 3742     TCP's Slow-Start with Large Congestion Windows   March 2004
+
+
+3.  Experimental Results
+
+   Tom Dunigan has added Limited Slow-Start to the Linux 2.4.16 Web100
+   kernel, and performed experiments comparing TCP with and without
+   Limited Slow-Start [D02].  Results so far show improved performance
+   for TCPs using Limited Slow-Start.  There are also several
+   experiments comparing different values for max_ssthresh.
+
+4.  Related Proposals
+
+   There has been considerable research on mechanisms for the TCP sender
+   to learn about the limitations of the available bandwidth, and to
+   exit slow-start before receiving a congestion indication from the
+   network [VEGAS,H96].  Other proposals set TCP's slow-start parameter
+   ssthresh based on information from previous TCP connections to the
+   same destination [WS95,G00].  This document proposes a simple
+   limitation on slow-start that can be effective in some cases even in
+   the absence of such mechanisms.  The max_ssthresh parameter does not
+   replace ssthresh, but is an additional parameter.  Thus, Limited
+   Slow-Start could be used in addition to mechanisms for setting
+   ssthresh.
+
+   Rate-based pacing has also been proposed to improve the performance
+   of TCP during slow-start [VH97,AD98,KCRP99,ASA00].  We believe that
+   rate-based pacing could be of significant benefit, and could be used
+   in addition to the Limited Slow-Start in this proposal.
+
+   Appropriate Byte Counting [RFC3465] proposes that TCP increase its
+   congestion window as a function of the number of bytes acknowledged,
+   rather than as a function of the number of ACKs received.
+   Appropriate Byte Counting is largely orthogonal to this proposal for
+   Limited Slow-Start.
+
+   Limited Slow-Start is also orthogonal to other proposals to change
+   mechanisms for exiting slow-start.  For example, FACK TCP includes an
+   overdamping mechanism to decrease the congestion window somewhat more
+   aggressively when a loss occurs during slow-start [MM96].  It is also
+   true that larger values for the MSS would reduce the size of the
+   congestion window in units of MSS needed to fill a given pipe, and
+   therefore would reduce the size of the transient queue in units of
+   MSS.
+
+5.  Acknowledgements
+
+   This proposal is part of a larger proposal for HighSpeed TCP for TCP
+   connections with large congestion windows, and resulted from
+   simulations done by Evandro de Souza, in joint work with Deb Agarwal.
+   This proposal for Limited Slow-Start draws in part from discussions
+
+
+
+Floyd                         Experimental                      [Page 4]
+
+RFC 3742     TCP's Slow-Start with Large Congestion Windows   March 2004
+
+
+   with Tom Kelly, who has used a similar modified slow-start in his own
+   research with congestion control for high-bandwidth connections.  We
+   also thank Tom Dunigan for his experiments with Limited Slow-Start.
+
+   We thank Andrei Gurtov, Reiner Ludwig, members of the End-to-End
+   Research Group, and members of the Transport Area Working Group, for
+   feedback on this document.
+
+6.  Security Considerations
+
+   This proposal makes no changes to the underlying security of TCP.
+
+7.  References
+
+7.1.  Normative References
+
+   [RFC2581] Allman, M., Paxson, V. and W. Stevens, "TCP Congestion
+             Control", RFC 2581, April 1999.
+
+   [RFC3465] Allman, M., "TCP Congestion Control with Appropriate Byte
+             Counting (ABC)", RFC 3465, February 2003.
+
+7.2.  Informative References
+
+   [AD98]    Mohit Aron and Peter Druschel, "TCP: Improving Start-up
+             Dynamics by Adaptive Timers and Congestion Control"",
+             TR98-318, Rice University, 1998.  URL "http://cs-
+             tr.cs.rice.edu/Dienst/UI/2.0/Describe/ncstrl.rice_cs/TR98-
+             318/".
+
+   [ASA00]   A. Aggarwal, S. Savage, and T. Anderson, "Understanding the
+             Performance of TCP Pacing", Proceedings of the 2000 IEEE
+             Infocom Conference, Tel-Aviv, Israel, March, 2000.  URL
+             "http://www.cs.ucsd.edu/~savage/".
+
+   [D02]     T. Dunigan, "Floyd's TCP slow-start and AIMD mods", 2002.
+             URL "http://www.csm.ornl.gov/~dunigan/net100/floyd.html".
+
+   [F02]     S. Floyd, "Performance Problems with TCP's Slow-Start",
+             2002.  URL "http://www.icir.org/floyd/hstcp/slowstart/".
+
+   [G00]     A. Gurtov, "TCP Performance in the Presence of Congestion
+             and Corruption Losses", Master's Thesis, University of
+             Helsinki, Department of Computer Science, Helsinki,
+             December 2000.  URL
+             "http://www.cs.helsinki.fi/u/gurtov/papers/ms_thesis.html".
+
+
+
+
+
+Floyd                         Experimental                      [Page 5]
+
+RFC 3742     TCP's Slow-Start with Large Congestion Windows   March 2004
+
+
+   [H96]     J. C. Hoe, "Improving the Start-up Behavior of a Congestion
+             Control Scheme for TCP", SIGCOMM 96, 1996.  URL
+             "http://www.acm.org/sigcomm/sigcomm96/program.html".
+
+   [KCRP99]  J. Kulik, R. Coulter, D. Rockwell, and C. Partridge, "A
+             Simulation Study of Paced TCP", BBN Technical Memorandum
+             No. 1218, 1999.  URL
+             "http://www.ir.bbn.com/documents/techmemos/index.html".
+
+   [MM96]    M. Mathis and J. Mahdavi, "Forward Acknowledgment: Refining
+             TCP Congestion Control", SIGCOMM, August 1996.
+
+   [NS]      The Network Simulator (NS). URL
+             "http://www.isi.edu/nsnam/ns/".
+
+   [VEGAS]   Vegas Web Page, University of Arizona.  URL
+             "http://www.cs.arizona.edu/protocols/".
+
+   [VH97]    Vikram Visweswaraiah and John Heidemann, "Rate Based Pacing
+             for TCP", 1997.  URL
+             "http://www.isi.edu/lsam/publications/rate_based_pacing/".
+
+   [WS95]    G. Wright and W. Stevens, "TCP/IP Illustrated", Volume 2,
+             Addison-Wesley Publishing Company, 1995.
+
+Authors' Address
+
+   Sally Floyd
+   ICIR (ICSI Center for Internet Research)
+
+   Phone: +1 (510) 666-2989
+   EMail: floyd@icir.org
+   URL: http://www.icir.org/floyd/
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Floyd                         Experimental                      [Page 6]
+
+RFC 3742     TCP's Slow-Start with Large Congestion Windows   March 2004
+
+
+Full Copyright Statement
+
+   Copyright (C) The Internet Society (2004).  This document is subject
+   to the rights, licenses and restrictions contained in BCP 78 and
+   except as set forth therein, the authors retain all their rights.
+
+   This document and the information contained herein are provided on an
+   "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE
+   REPRESENTS OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE
+   INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR
+   IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF
+   THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
+   WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
+
+Intellectual Property
+
+   The IETF takes no position regarding the validity or scope of any
+   Intellectual Property Rights or other rights that might be claimed
+   to pertain to the implementation or use of the technology
+   described in this document or the extent to which any license
+   under such rights might or might not be available; nor does it
+   represent that it has made any independent effort to identify any
+   such rights.  Information on the procedures with respect to
+   rights in RFC documents can be found in BCP 78 and BCP 79.
+
+   Copies of IPR disclosures made to the IETF Secretariat and any
+   assurances of licenses to be made available, or the result of an
+   attempt made to obtain a general license or permission for the use
+   of such proprietary rights by implementers or users of this
+   specification can be obtained from the IETF on-line IPR repository
+   at http://www.ietf.org/ipr.
+
+   The IETF invites any interested party to bring to its attention
+   any copyrights, patents or patent applications, or other
+   proprietary rights that may cover technology that may be required
+   to implement this standard.  Please address the information to the
+   IETF at ietf-ipr@ietf.org.
+
+Acknowledgement
+
+   Funding for the RFC Editor function is currently provided by the
+   Internet Society.
+
+
+
+
+
+
+
+
+
+Floyd                         Experimental                      [Page 7]
+
--- a/kernel/picotcp/RFC/rfc3782.txt
+++ b/kernel/picotcp/RFC/rfc3782.txt
--- a/kernel/picotcp/RFC/rfc3819.txt
+++ b/kernel/picotcp/RFC/rfc3819.txt
--- a/kernel/picotcp/RFC/rfc3927.txt
+++ b/kernel/picotcp/RFC/rfc3927.txt
--- a/kernel/picotcp/RFC/rfc4015.txt
+++ b/kernel/picotcp/RFC/rfc4015.txt
@ -0,0 +1,731 @@
+
+
+
+
+
+
+Network Working Group                                          R. Ludwig
+Request for Comments: 4015                             Ericsson Research
+Category: Standards Track                                      A. Gurtov
+                                                                    HIIT
+                                                           February 2005
+
+
+                  The Eifel Response Algorithm for TCP
+
+Status of This Memo
+
+   This document specifies an Internet standards track protocol for the
+   Internet community, and requests discussion and suggestions for
+   improvements.  Please refer to the current edition of the "Internet
+   Official Protocol Standards" (STD 1) for the standardization state
+   and status of this protocol.  Distribution of this memo is unlimited.
+
+Copyright Notice
+
+   Copyright (C) The Internet Society (2005).
+
+Abstract
+
+   Based on an appropriate detection algorithm, the Eifel response
+   algorithm provides a way for a TCP sender to respond to a detected
+   spurious timeout.  It adapts the retransmission timer to avoid
+   further spurious timeouts and (depending on the detection algorithm)
+   can avoid the often unnecessary go-back-N retransmits that would
+   otherwise be sent.  In addition, the Eifel response algorithm
+   restores the congestion control state in such a way that packet
+   bursts are avoided.
+
+1.  Introduction
+
+   The Eifel response algorithm relies on a detection algorithm such as
+   the Eifel detection algorithm, defined in [RFC3522].  That document
+   contains informative background and motivation context that may be
+   useful for implementers of the Eifel response algorithm, but it is
+   not necessary to read [RFC3522] in order to implement the Eifel
+   response algorithm.  Note that alternative response algorithms have
+   been proposed [BA02] that could also rely on the Eifel detection
+   algorithm, and alternative detection algorithms have been proposed
+   [RFC3708], [SK04] that could work together with the Eifel response
+   algorithm.
+
+   Based on an appropriate detection algorithm, the Eifel response
+   algorithm provides a way for a TCP sender to respond to a detected
+   spurious timeout.  It adapts the retransmission timer to avoid
+
+
+
+Ludwig & Gurtov             Standards Track                     [Page 1]
+
+RFC 4015          The Eifel Response Algorithm for TCP     February 2005
+
+
+   further spurious timeouts and (depending on the detection algorithm)
+   can avoid the often unnecessary go-back-N retransmits that would
+   otherwise be sent.  In addition, the Eifel response algorithm
+   restores the congestion control state in such a way that packet
+   bursts are avoided.
+
+      Note: A previous version of the Eifel response algorithm also
+      included a response to a detected spurious fast retransmit.
+      However, as a consensus was not reached about how to adapt the
+      duplicate acknowledgement threshold in that case, that part of the
+      algorithm was removed for the time being.
+
+1.1.  Terminology
+
+   The keywords MUST, MUST NOT, REQUIRED, SHALL, SHALL NOT, SHOULD,
+   SHOULD NOT, RECOMMENDED, MAY, and OPTIONAL, when they appear in this
+   document, are to be interpreted as described in [RFC2119].
+
+   We refer to the first-time transmission of an octet as the 'original
+   transmit'.  A subsequent transmission of the same octet is referred
+   to as a 'retransmit'.  In most cases, this terminology can also be
+   applied to data segments.  However, when repacketization occurs, a
+   segment can contain both first-time transmissions and retransmissions
+   of octets.  In that case, this terminology is only consistent when
+   applied to octets.  For the Eifel detection and response algorithms,
+   this makes no difference, as they also operate correctly when
+   repacketization occurs.
+
+   We use the term 'acceptable ACK' as defined in [RFC793].  That is an
+   ACK that acknowledges previously unacknowledged data.  We use the
+   term 'bytes_acked' to refer to the amount (in terms of octets) of
+   previously unacknowledged data that is acknowledged by the most
+   recently received acceptable ACK.  We use the TCP sender state
+   variables 'SND.UNA' and 'SND.NXT' as defined in [RFC793].  SND.UNA
+   holds the segment sequence number of the oldest outstanding segment.
+   SND.NXT holds the segment sequence number of the next segment the TCP
+   sender will (re-)transmit.  In addition, we define as 'SND.MAX' the
+   segment sequence number of the next original transmit to be sent.
+   The definition of SND.MAX is equivalent to the definition of
+   'snd_max' in [WS95].
+
+   We use the TCP sender state variables 'cwnd' (congestion window), and
+   'ssthresh' (slow-start threshold), and the term 'FlightSize' as
+   defined in [RFC2581].  FlightSize is the amount (in terms of octets)
+   of outstanding data at a given point in time.  We use the term
+   'Initial Window' (IW) as defined in [RFC3390].  The IW is the size of
+   the sender's congestion window after the three-way handshake is
+   completed.  We use the TCP sender state variables 'SRTT' and
+
+
+
+Ludwig & Gurtov             Standards Track                     [Page 2]
+
+RFC 4015          The Eifel Response Algorithm for TCP     February 2005
+
+
+   'RTTVAR', and the terms 'RTO' and 'G' as defined in [RFC2988].  G is
+   the clock granularity of the retransmission timer.  In addition, we
+   assume that the TCP sender maintains the value of the latest round-
+   trip time (RTT) measurement in the (local) variable 'RTT-SAMPLE'.
+
+   We use the TCP sender state variable 'T_last', and the term 'tcpnow'
+   as used in [RFC2861].  T_last holds the system time when the TCP
+   sender sent the last data segment, whereas tcpnow is the TCP sender's
+   current system time.
+
+2.  Appropriate Detection Algorithms
+
+   If the Eifel response algorithm is implemented at the TCP sender, it
+   MUST be implemented together with a detection algorithm that is
+   specified in a standards track or experimental RFC.
+
+   Designers of detection algorithms who want their algorithms to work
+   together with the Eifel response algorithm should reuse the variable
+   "SpuriousRecovery" with the semantics and defined values specified in
+   [RFC3522].  In addition, we define the constant LATE_SPUR_TO (set
+   equal to -1) as another possible value of the variable
+   SpuriousRecovery.  Detection algorithms should set the value of
+   SpuriousRecovery to LATE_SPUR_TO if the detection of a spurious
+   retransmit is based on the ACK for the retransmit (as opposed to an
+   ACK for an original transmit).  For example, this applies to
+   detection algorithms that are based on the DSACK option [RFC3708].
+
+3.  The Eifel Response Algorithm
+
+   The complete algorithm is specified in section 3.1.  In sections 3.2
+   - 3.6, we discuss the different steps of the algorithm.
+
+3.1.  The Algorithm
+
+   Given that a TCP sender has enabled a detection algorithm that
+   complies with the requirements set in Section 2, a TCP sender MAY use
+   the Eifel response algorithm as defined in this subsection.
+
+   If the Eifel response algorithm is used, the following steps MUST be
+   taken by the TCP sender, but only upon initiation of a timeout-based
+   loss recovery.  That is when the first timeout-based retransmit is
+   sent.  The algorithm MUST NOT be reinitiated after a timeout-based
+   loss recovery has already been started but not completed.  In
+   particular, it may not be reinitiated upon subsequent timeouts for
+   the same segment, or upon retransmitting segments other than the
+   oldest outstanding segment.
+
+
+
+
+
+Ludwig & Gurtov             Standards Track                     [Page 3]
+
+RFC 4015          The Eifel Response Algorithm for TCP     February 2005
+
+
+   (0)     Before the variables cwnd and ssthresh get updated when
+           loss recovery is initiated, set a "pipe_prev" variable as
+           follows:
+               pipe_prev <- max (FlightSize, ssthresh)
+
+           Set a "SRTT_prev" variable and a "RTTVAR_prev" variable as
+           follows:
+               SRTT_prev <- SRTT + (2 * G)
+               RTTVAR_prev <- RTTVAR
+
+   (DET)   This is a placeholder for a detection algorithm that must
+           be executed at this point, and that sets the variable
+           SpuriousRecovery as outlined in Section 2.  If
+           [RFC3522] is used as the detection algorithm, steps (1) -
+           (6) of that algorithm go here.
+
+   (7)     If SpuriousRecovery equals SPUR_TO, then
+               proceed to step (8);
+
+           else if SpuriousRecovery equals LATE_SPUR_TO, then
+               proceed to step (9);
+
+           else
+               proceed to step (DONE).
+
+   (8)     Resume the transmission with previously unsent data:
+
+           Set
+               SND.NXT <- SND.MAX
+
+   (9)     Reverse the congestion control state:
+
+           If the acceptable ACK has the ECN-Echo flag [RFC3168] set,
+           then
+               proceed to step (DONE);
+
+           else set
+               cwnd <- FlightSize + min (bytes_acked, IW)
+               ssthresh <- pipe_prev
+
+           Proceed to step (DONE).
+
+   (10)    Interworking with Congestion Window Validation:
+
+           If congestion window validation is implemented according
+           to [RFC2861], then set
+               T_last <- tcpnow
+
+
+
+
+Ludwig & Gurtov             Standards Track                     [Page 4]
+
+RFC 4015          The Eifel Response Algorithm for TCP     February 2005
+
+
+   (11)    Adapt the conservativeness of the retransmission timer:
+
+           Upon the first RTT-SAMPLE taken from new data; i.e., the
+           first RTT-SAMPLE that can be derived from an acceptable
+           ACK for data that was previously unsent when the spurious
+           timeout occurred,
+
+               if the retransmission timer is implemented according
+               to [RFC2988], then set
+                     SRTT   <- max (SRTT_prev, RTT-SAMPLE)
+                     RTTVAR <- max (RTTVAR_prev, RTT-SAMPLE/2)
+                     RTO    <- SRTT + max (G, 4*RTTVAR)
+
+                     Run the bounds check on the RTO (rules (2.4) and
+                     (2.5) in [RFC2988]), and restart the
+                     retransmission timer;
+
+               else
+                     appropriately adapt the conservativeness of the
+                     retransmission timer that is implemented.
+
+   (DONE)  No further processing.
+
+3.2.  Storing the Current Congestion Control State (Step 0)
+
+   The TCP sender stores in pipe_prev what is considered a safe slow-
+   start threshold (ssthresh) before loss recovery is initiated; i.e.,
+   before the loss indication is taken into account.  This is either the
+   current FlightSize, if the TCP sender is in congestion avoidance, or
+   the current ssthresh, if the TCP sender is in slow-start.  If the TCP
+   sender later detects that it has entered loss recovery unnecessarily,
+   then pipe_prev is used in step (9) to reverse the congestion control
+   state.  Thus, until the loss recovery phase is terminated, pipe_prev
+   maintains a memory of the congestion control state of the time right
+   before the loss recovery phase was initiated.  A similar approach is
+   proposed in [RFC2861], where this state is stored in ssthresh
+   directly after a TCP sender has become idle or application limited.
+
+   There had been debates about whether the value of pipe_prev should be
+   decayed over time; e.g., upon subsequent timeouts for the same
+   outstanding segment.  We do not require decaying pipe_prev for the
+   Eifel response algorithm and do not believe that such a conservative
+   approach should be in place.  Instead, we follow the idea of
+   revalidating the congestion window through slow-start, as suggested
+   in [RFC2861].  That is, in step (9), the cwnd is reset to a value
+   that avoids large packet bursts, and ssthresh is reset to the value
+   of pipe_prev.  Note that [RFC2581] and [RFC2861] also do not require
+
+
+
+
+Ludwig & Gurtov             Standards Track                     [Page 5]
+
+RFC 4015          The Eifel Response Algorithm for TCP     February 2005
+
+
+   a decaying of ssthresh after it has been reset in response to a loss
+   indication, or after a TCP sender has become idle or application
+   limited.
+
+3.3.  Suppressing the Unnecessary go-back-N Retransmits (Step 8)
+
+   Without the use of the TCP timestamps option [RFC1323], the TCP
+   sender suffers from the retransmission ambiguity problem [Zh86],
+   [KP87].  Therefore, when the first acceptable ACK arrives after a
+   spurious timeout, the TCP sender must assume that this ACK was sent
+   in response to the retransmit when in fact it was sent in response to
+   an original transmit.  Furthermore, the TCP sender must further
+   assume that all other segments that were outstanding at that point
+   were lost.
+
+      Note: Except for certain cases where original ACKs were lost, the
+      first acceptable ACK cannot carry a DSACK option [RFC2883].
+
+   Consequently, once the TCP sender's state has been updated after the
+   first acceptable ACK has arrived, SND.NXT equals SND.UNA.  This is
+   what causes the often unnecessary go-back-N retransmits.  From that
+   point on every arriving acceptable ACK that was sent in response to
+   an original transmit will advance SND.NXT.  But as long as SND.NXT is
+   smaller than the value that SND.MAX had when the timeout occurred,
+   those ACKs will clock out retransmits, whether or not the
+   corresponding original transmits were lost.
+
+   In fact, during this phase the TCP sender breaks 'packet
+   conservation' [Jac88].  This is because the go-back-N retransmits are
+   sent during slow-start.  For each original transmit leaving the
+   network, two retransmits are sent into the network as long as SND.NXT
+   does not equal SND.MAX (see [LK00] for more detail).
+
+   Once a spurious timeout has been detected (upon receipt of an ACK for
+   an original transmit), it is safe to let the TCP sender resume the
+   transmission with previously unsent data.  Thus, the Eifel response
+   algorithm changes the TCP sender's state by setting SND.NXT to
+   SND.MAX.  Note that this step is only executed if the variable
+   SpuriousRecovery equals SPUR_TO, which in turn requires a detection
+   algorithm such as the Eifel detection algorithm [RFC3522] or the F-
+   RTO algorithm [SK04] that detects a spurious retransmit based upon
+   receiving an ACK for an original transmit (as opposed to the ACK for
+   the retransmit [RFC3708]).
+
+
+
+
+
+
+
+
+Ludwig & Gurtov             Standards Track                     [Page 6]
+
+RFC 4015          The Eifel Response Algorithm for TCP     February 2005
+
+
+3.4.  Reversing the Congestion Control State (Step 9)
+
+   When a TCP sender enters loss recovery, it reduces cwnd and ssthresh.
+   However, once the TCP sender detects that the loss recovery has been
+   falsely triggered, this reduction proves unnecessary.  We therefore
+   believe that it is safe to revert to the previous congestion control
+   state, following the approach of revalidating the congestion window
+   as outlined below.  This is unless the acceptable ACK signals
+   congestion through the ECN-Echo flag [RFC3168].  In that case, the
+   TCP sender MUST refrain from reversing congestion control state.
+
+   If the ECN-Echo flag is not set, cwnd is reset to the sum of the
+   current FlightSize and the minimum of bytes_acked and IW.  In some
+   cases, this can mean that the first few acceptable ACKs that arrive
+   will not clock out any data segments.  Recall that bytes_acked is the
+   number of bytes that have been acknowledged by the acceptable ACK.
+   Note that the value of cwnd must not be changed any further for that
+   ACK, and that the value of FlightSize at this point in time may be
+   different from the value of FlightSize in step (0).  The value of IW
+   puts a limit on the size of the packet burst that the TCP sender may
+   send into the network after the Eifel response algorithm has
+   terminated.  The value of IW is considered an acceptable burst size.
+   It is the amount of data that a TCP sender may send into a yet
+   "unprobed" network at the beginning of a connection.
+
+   Then ssthresh is reset to the value of pipe_prev.  As a result, the
+   TCP sender either immediately resumes probing the network for more
+   bandwidth in congestion avoidance, or it slow-starts to what is
+   considered a safe operating point for the congestion window.
+
+3.5.  Interworking with the CWV Algorithm (Step 10)
+
+   An implementation of the Congestion Window Validation (CWV) algorithm
+   [RFC2861] could potentially misinterpret a delay spike that caused a
+   spurious timeout as a phase where the TCP sender had been idle.
+   Therefore, T_last is reset to prevent the triggering of the CWV
+   algorithm in this case.
+
+      Note: The term 'idle' implies that the TCP sender has no data
+      outstanding; i.e., all data sent has been acknowledged [Jac88].
+      According to this definition, a TCP sender is not idle while it is
+      waiting for an acceptable ACK after a timeout.  Unfortunately, the
+      pseudo-code in [RFC2861] does not include a check for the
+      condition "idle" (SND.UNA == SND.MAX).  We therefore had to add
+      step (10) to the Eifel response algorithm.
+
+
+
+
+
+
+Ludwig & Gurtov             Standards Track                     [Page 7]
+
+RFC 4015          The Eifel Response Algorithm for TCP     February 2005
+
+
+3.6.  Adapting the Retransmission Timer (Step 11)
+
+   There is currently only one retransmission timer standardized for TCP
+   [RFC2988].  We therefore only address that timer explicitly.  Future
+   standards that might define alternatives to [RFC2988] should propose
+   similar measures to adapt the conservativeness of the retransmission
+   timer.
+
+   A spurious timeout often results from a delay spike, which is a
+   sudden increase of the RTT that usually cannot be predicted.  After a
+   delay spike, the RTT may have changed permanently; e.g., due to a
+   path change, or because the available bandwidth on a bandwidth-
+   dominated path has decreased.  This may often occur with wide-area
+   wireless access links.  In this case, the RTT estimators (SRTT and
+   RTTVAR) should be reinitialized from the first RTT-SAMPLE taken from
+   new data according to rule (2.2) of [RFC2988].  That is, from the
+   first RTT-SAMPLE that can be derived from an acceptable ACK for data
+   that was previously unsent when the spurious timeout occurred.
+
+   However, a delay spike may only indicate a transient phase, after
+   which the RTT returns to its previous range of values, or even to
+   smaller values.  Also, a spurious timeout may occur because the TCP
+   sender's RTT estimators were only inaccurate enough that the
+   retransmission timer expires "a tad too early".  We believe that two
+   times the clock granularity of the retransmission timer (2 * G) is a
+   reasonable upper bound on "a tad too early".  Thus, when the new RTO
+   is calculated in step (11), we ensure that it is at least (2 * G)
+   greater (see also step (0)) than the RTO was before the spurious
+   timeout occurred.
+
+   Note that other TCP sender processing will usually take place between
+   steps (10) and (11).  During this phase (i.e., before step (11) has
+   been reached), the RTO is managed according to the rules of
+   [RFC2988].  We believe that this is sufficiently conservative for the
+   following reasons.  First, the retransmission timer is restarted upon
+   the acceptable ACK that was used to detect the spurious timeout.  As
+   a result, the delay spike is already implicitly factored in for
+   segments outstanding at that time.  This is discussed in more detail
+   in [EL04], where this effect is called the "RTO offset".
+   Furthermore, if timestamps are enabled, a new and valid RTT-SAMPLE
+   can be derived from that acceptable ACK.  This RTT-SAMPLE must be
+   relatively large, as it includes the delay spike that caused the
+   spurious timeout.  Consequently, the RTT estimators will be updated
+   rather conservatively.  Without timestamps the RTO will stay
+   conservatively backed-off due to Karn's algorithm [RFC2988] until the
+   first RTT-SAMPLE can be derived from an acceptable ACK for data that
+   was previously unsent when the spurious timeout occurred.
+
+
+
+
+Ludwig & Gurtov             Standards Track                     [Page 8]
+
+RFC 4015          The Eifel Response Algorithm for TCP     February 2005
+
+
+   For the new RTO to become effective, the retransmission timer has to
+   be restarted.  This is consistent with [RFC2988], which recommends
+   restarting the retransmission timer with the arrival of an acceptable
+   ACK.
+
+4.  Advanced Loss Recovery is Crucial for the Eifel Response Algorithm
+
+   We have studied environments where spurious timeouts and multiple
+   losses from the same flight of packets often coincide [GL02], [GL03].
+   In such a case, the oldest outstanding segment arrives at the TCP
+   receiver, but one or more packets from the remaining outstanding
+   flight are lost.  In those environments, end-to-end performance
+   suffers if the Eifel response algorithm is operated without an
+   advanced loss recovery scheme such as a SACK-based scheme [RFC3517]
+   or NewReno [RFC3782].  The reason is TCP-Reno's aggressiveness after
+   a spurious timeout.  Even though TCP-Reno breaks 'packet
+   conservation' (see Section 3.3) when blindly retransmitting all
+   outstanding segments, it usually recovers all packets lost from that
+   flight within a single round-trip time.  On the contrary, the more
+   conservative TCP-Reno-with-Eifel is often forced into another
+   timeout.  Thus, we recommend that the Eifel response algorithm always
+   be operated in combination with [RFC3517] or [RFC3782].  Additional
+   robustness is achieved with the Limited Transmit and Early Retransmit
+   algorithms [RFC3042], [AAAB04].
+
+      Note: The SACK-based scheme we used for our simulations in [GL02]
+      and [GL03] is different from the SACK-based scheme that later got
+      standardized [RFC3517].  The key difference is that [RFC3517] is
+      more robust to multiple losses from the same flight.  It is less
+      conservative in declaring that a packet has left the network, and
+      is therefore less dependent on timeouts to recover genuine packet
+      losses.
+
+   If the NewReno algorithm [RFC3782] is used in combination with the
+   Eifel response algorithm, step (1) of the NewReno algorithm SHOULD be
+   modified as follows, but only if SpuriousRecovery equals SPUR_TO:
+
+      (1)  Three duplicate ACKs:
+           When the third duplicate ACK is received and the sender is
+           not already in the Fast Recovery procedure, go to step 1A.
+
+   That is, the entire step 1B of the NewReno algorithm is obsolete
+   because step (8) of the Eifel response algorithm avoids the case
+   where three duplicate ACKs result from unnecessary go-back-N
+   retransmits after a timeout.  Step (8) of the Eifel response
+   algorithm avoids such unnecessary go-back-N retransmits in the first
+   place.  However, recall that step (8) is only executed if the
+   variable SpuriousRecovery equals SPUR_TO, which in turn requires a
+
+
+
+Ludwig & Gurtov             Standards Track                     [Page 9]
+
+RFC 4015          The Eifel Response Algorithm for TCP     February 2005
+
+
+   detection algorithm, such as the Eifel detection algorithm [RFC3522]
+   or the F-RTO algorithm [SK04], that detects a spurious retransmit
+   based upon receiving an ACK for an original transmit (as opposed to
+   the ACK for the retransmit [RFC3708]).
+
+5.  Security Considerations
+
+   There is a risk that a detection algorithm is fooled by spoofed ACKs
+   that make genuine retransmits appear to the TCP sender as spurious
+   retransmits.  When such a detection algorithm is run together with
+   the Eifel response algorithm, this could effectively disable
+   congestion control at the TCP sender.  Should this become a concern,
+   the Eifel response algorithm SHOULD only be run together with
+   detection algorithms that are known to be safe against such "ACK
+   spoofing attacks".
+
+   For example, the safe variant of the Eifel detection algorithm
+   [RFC3522], is a reliable method to protect against this risk.
+
+6.  Acknowledgements
+
+   Many thanks to Keith Sklower, Randy Katz, Michael Meyer, Stephan
+   Baucke, Sally Floyd, Vern Paxson, Mark Allman, Ethan Blanton, Pasi
+   Sarolahti, Alexey Kuznetsov, and Yogesh Swami for many discussions
+   that contributed to this work.
+
+7.  References
+
+7.1.  Normative References
+
+   [RFC2581] Allman, M., Paxson, V., and W. Stevens, "TCP Congestion
+             Control", RFC 2581, April 1999.
+
+   [RFC3390] Allman, M., Floyd, S., and C. Partridge, "Increasing TCP's
+             Initial Window", RFC 3390, October 2002.
+
+   [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
+             Requirement Levels", BCP 14, RFC 2119, March 1997.
+
+   [RFC3782] Floyd, S., Henderson, T., and A. Gurtov, "The NewReno
+             Modification to TCP's Fast Recovery Algorithm", RFC 3782,
+             April 2004.
+
+   [RFC2861] Handley, M., Padhye, J., and S. Floyd, "TCP Congestion
+             Window Validation", RFC 2861, June 2000.
+
+   [RFC3522] Ludwig, R. and M. Meyer, "The Eifel Detection Algorithm for
+             TCP", RFC 3522, April 2003.
+
+
+
+Ludwig & Gurtov             Standards Track                    [Page 10]
+
+RFC 4015          The Eifel Response Algorithm for TCP     February 2005
+
+
+   [RFC2988] Paxson, V. and M. Allman, "Computing TCP's Retransmission
+             Timer", RFC 2988, November 2000.
+
+   [RFC793]  Postel, J., "Transmission Control Protocol", STD 7, RFC
+             793, September 1981.
+
+   [RFC3168] Ramakrishnan, K., Floyd, S., and D. Black, "The Addition of
+             Explicit Congestion Notification (ECN) to IP", RFC 3168,
+             September 2001.
+
+7.2.  Informative References
+
+   [RFC3042] Allman, M., Balakrishnan, H., and S. Floyd, "Enhancing
+             TCP's Loss Recovery Using Limited Transmit", RFC 3042,
+             January 2001.
+
+   [AAAB04]  Allman, M., Avrachenkov, K., Ayesta, U., and J. Blanton,
+             Early Retransmit for TCP and SCTP, Work in Progress, July
+             2004.
+
+   [BA02]    Blanton, E. and M. Allman, On Making TCP More Robust to
+             Packet Reordering, ACM Computer Communication Review, Vol.
+             32, No. 1, January 2002.
+
+   [RFC3708] Blanton, E. and M. Allman, "Using TCP Duplicate Selective
+             Acknowledgement (DSACKs) and Stream Control Transmission
+             Protocol (SCTP) Duplicate Transmission Sequence Numbers
+             (TSNs) to Detect Spurious Retransmissions", RFC 3708,
+             February 2004.
+
+   [RFC3517] Blanton, E., Allman, M., Fall, K., and L. Wang, "A
+             Conservative Selective Acknowledgment (SACK)-based Loss
+             Recovery Algorithm for TCP", RFC 3517, April 2003.
+
+   [EL04]    Ekstrom, H. and R. Ludwig, The Peak-Hopper: A New End-to-
+             End Retransmission Timer for Reliable Unicast Transport, In
+             Proceedings of IEEE INFOCOM 04, March 2004.
+
+   [RFC2883] Floyd, S., Mahdavi, J., Mathis, M., and M. Podolsky, "An
+             Extension to the Selective Acknowledgement (SACK) Option
+             for TCP", RFC 2883, July 2000.
+
+   [GL02]    Gurtov, A. and R. Ludwig, Evaluating the Eifel Algorithm
+             for TCP in a GPRS Network, In Proceedings of the European
+             Wireless Conference, February 2002.
+
+   [GL03]    Gurtov, A. and R. Ludwig, Responding to Spurious Timeouts
+             in TCP, In Proceedings of IEEE INFOCOM 03, April 2003.
+
+
+
+Ludwig & Gurtov             Standards Track                    [Page 11]
+
+RFC 4015          The Eifel Response Algorithm for TCP     February 2005
+
+
+   [Jac88]   Jacobson, V., Congestion Avoidance and Control, In
+             Proceedings of ACM SIGCOMM 88.
+
+   [RFC1323] Jacobson, V., Braden, R., and D. Borman, "TCP Extensions
+             for High Performance", RFC 1323, May 1992.
+
+   [KP87]    Karn, P. and C. Partridge, Improving Round-Trip Time
+             Estimates in Reliable Transport Protocols, In Proceedings
+             of ACM SIGCOMM 87.
+
+   [LK00]    Ludwig, R. and R. H. Katz, The Eifel Algorithm: Making TCP
+             Robust Against Spurious Retransmissions, ACM Computer
+             Communication Review, Vol. 30, No. 1, January 2000.
+
+   [SK04]    Sarolahti, P. and M. Kojo, F-RTO: An Algorithm for
+             Detecting Spurious Retransmission Timeouts with TCP and
+             SCTP, Work in Progress, November 2004.
+
+   [WS95]    Wright, G. R. and W. R. Stevens, TCP/IP Illustrated, Volume
+             2 (The Implementation), Addison Wesley, January 1995.
+
+   [Zh86]    Zhang, L., Why TCP Timers Don't Work Well, In Proceedings
+             of ACM SIGCOMM 88.
+
+Authors' Addresses
+
+   Reiner Ludwig
+   Ericsson Research (EDD)
+   Ericsson Allee 1
+   52134 Herzogenrath, Germany
+
+   EMail: Reiner.Ludwig@ericsson.com
+
+
+   Andrei Gurtov
+   Helsinki Institute for Information Technology (HIIT)
+   P.O. Box 9800, FIN-02015
+   HUT, Finland
+
+   EMail: andrei.gurtov@cs.helsinki.fi
+   Homepage: http://www.cs.helsinki.fi/u/gurtov
+
+
+
+
+
+
+
+
+
+
+Ludwig & Gurtov             Standards Track                    [Page 12]
+
+RFC 4015          The Eifel Response Algorithm for TCP     February 2005
+
+
+Full Copyright Statement
+
+   Copyright (C) The Internet Society (2005).
+
+   This document is subject to the rights, licenses and restrictions
+   contained in BCP 78, and except as set forth therein, the authors
+   retain all their rights.
+
+   This document and the information contained herein are provided on an
+   "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
+   OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET
+   ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED,
+   INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE
+   INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
+   WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
+
+Intellectual Property
+
+   The IETF takes no position regarding the validity or scope of any
+   Intellectual Property Rights or other rights that might be claimed to
+   pertain to the implementation or use of the technology described in
+   this document or the extent to which any license under such rights
+   might or might not be available; nor does it represent that it has
+   made any independent effort to identify any such rights.  Information
+   on the IETF's procedures with respect to rights in IETF Documents can
+   be found in BCP 78 and BCP 79.
+
+   Copies of IPR disclosures made to the IETF Secretariat and any
+   assurances of licenses to be made available, or the result of an
+   attempt made to obtain a general license or permission for the use of
+   such proprietary rights by implementers or users of this
+   specification can be obtained from the IETF on-line IPR repository at
+   http://www.ietf.org/ipr.
+
+   The IETF invites any interested party to bring to its attention any
+   copyrights, patents or patent applications, or other proprietary
+   rights that may cover technology that may be required to implement
+   this standard.  Please address the information to the IETF at ietf-
+   ipr@ietf.org.
+
+
+Acknowledgement
+
+   Funding for the RFC Editor function is currently provided by the
+   Internet Society.
+
+
+
+
+
+
+Ludwig & Gurtov             Standards Track                    [Page 13]
+
--- a/kernel/picotcp/RFC/rfc4022.txt
+++ b/kernel/picotcp/RFC/rfc4022.txt
--- a/kernel/picotcp/RFC/rfc4138.txt
+++ b/kernel/picotcp/RFC/rfc4138.txt
--- a/kernel/picotcp/RFC/rfc4278.txt
+++ b/kernel/picotcp/RFC/rfc4278.txt
@ -0,0 +1,395 @@
+
+
+
+
+
+
+Network Working Group                                        S. Bellovin
+Request for Comments: 4278                            AT&T Labs Research
+Category: Informational                                         A. Zinin
+                                                                 Alcatel
+                                                            January 2006
+
+
+  Standards Maturity Variance Regarding the TCP MD5 Signature Option
+                 (RFC 2385) and the BGP-4 Specification
+
+Status of This Memo
+
+   This memo provides information for the Internet community.  It does
+   not specify an Internet standard of any kind.  Distribution of this
+   memo is unlimited.
+
+Copyright Notice
+
+   Copyright (C) The Internet Society (2006).
+
+Abstract
+
+   The IETF Standards Process requires that all normative references for
+   a document be at the same or higher level of standardization.  RFC
+   2026 section 9.1 allows the IESG to grant a variance to the standard
+   practices of the IETF.  This document explains why the IESG is
+   considering doing so for the revised version of the BGP-4
+   specification, which refers normatively to RFC 2385, "Protection of
+   BGP Sessions via the TCP MD5 Signature Option".  RFC 2385 will remain
+   at the Proposed Standard level.
+
+1.  Introduction
+
+   The IETF Standards Process [RFC2026] requires that all normative
+   references for a document be at the same or higher level of
+   standardization.  RFC 2026 section 9.1 allows the IESG to grant a
+   variance to the standard practices of the IETF.  Pursuant to that, it
+   is considering publishing the updated BGP-4 specification [RFC4271]
+   as Draft Standard, despite the normative reference to [RFC2385],
+   "Protection of BGP Sessions via the TCP MD5 Signature Option".  RFC
+   2385 will remain a Proposed Standard.  (Note that although the title
+   of [RFC2385] includes the word "signature", the technology described
+   in it is commonly known as a Message Authentication Code or MAC, and
+   should not be confused with digital signature technologies.)
+
+   [RFC2385], which is widely implemented, is the only transmission
+   security mechanism defined for BGP-4.  Other possible mechanisms,
+   such as IPsec [RFC2401] and TLS [RFC2246], are rarely, if ever, used
+
+
+
+Bellovin & Zinin             Informational                      [Page 1]
+
+RFC 4278    Standards Maturity Variance: RFC 2385 and BGP-4 January 2006
+
+
+   for this purpose.  Given the long-standing requirement for security
+   features in protocols, it is not possible to advance BGP-4 without a
+   mandated security mechanism.
+
+   The conflict of maturity levels between specifications would normally
+   be resolved by advancing the specification being referred to along
+   the standards track, to the level of maturity that the referring
+   specification needs to achieve.  However, in the particular case
+   considered here, the IESG believes that [RFC2385], though adequate
+   for BGP deployments at this moment, is not strong enough for general
+   use, and thus should not be progressed along the standards track.  In
+   this situation, the IESG believes that variance procedure should be
+   used to allow the updated BGP-4 specification to be published as
+   Draft Standard.
+
+   The following sections of the document give detailed explanations of
+   the statements above.
+
+2.  Draft Standard Requirements
+
+   The requirements for Proposed Standards and Draft Standards are given
+   in [RFC2026].  For Proposed Standards, [RFC2026] warns that:
+
+        Implementors should treat Proposed Standards as immature
+        specifications.  It is desirable to implement them in order to
+        gain experience and to validate, test, and clarify the
+        specification.  However, since the content of Proposed Standards
+        may be changed if problems are found or better solutions are
+        identified, deploying implementations of such standards into a
+        disruption-sensitive environment is not recommended.
+
+   In other words, it is considered reasonable for flaws to be
+   discovered in Proposed Standards.
+
+   The requirements for Draft Standards are higher:
+
+        A Draft Standard must be well-understood and known to be quite
+        stable, both in its semantics and as a basis for developing an
+        implementation.
+
+   In other words, any document that has known deficiencies should not
+   be promoted to Draft Standard.
+
+
+
+
+
+
+
+
+
+Bellovin & Zinin             Informational                      [Page 2]
+
+RFC 4278    Standards Maturity Variance: RFC 2385 and BGP-4 January 2006
+
+
+3.  The TCP MD5 Signature Option
+
+   [RFC2385], despite its 1998 publication date, describes a Message
+   Authentication Code (MAC) that is considerably older.  It utilizes a
+   technique known as a "keyed hash function", using MD5 [RFC1321] as
+   the hash function.  When the original code was developed, this was
+   believed to be a reasonable technique, especially if the key was
+   appended (rather than prepended) to the data being protected.  But
+   cryptographic hash functions were never intended for use as MACs, and
+   later cryptanalytic results showed that the construct was not as
+   strong as originally believed [PV1, PV2].  Worse yet, the underlying
+   hash function, MD5, has shown signs of weakness [Dobbertin, Wang].
+   Accordingly, the IETF community has adopted Hashed Message
+   Authentication Code (HMAC) [RFC2104], a scheme with provable security
+   properties, as its standard MAC.
+
+   Beyond that, [RFC2385] does not include any sort of key management
+   technique.  Common practice is to use a password as a shared secret
+   between pairs of sites, but this is not a good idea [RFC3562].
+
+   Other problems are documented in [RFC2385] itself, including the lack
+   of a type code or version number, and the inability of systems using
+   this scheme to accept certain TCP resets.
+
+   Despite the widespread deployment of [RFC2385] in BGP deployments,
+   the IESG has thus concluded that it is not appropriate for use in
+   other contexts.  [RFC2385] is not suitable for advancement to Draft
+   Standard.
+
+4.  Usage Patterns for RFC 2385
+
+   Given the above analysis, it is reasonable to ask why [RFC2385] is
+   still used for BGP.  The answer lies in the deployment patterns
+   peculiar to BGP.
+
+   BGP connections inherently tend to travel over short paths.  Indeed,
+   most external BGP links are one hop.  Although internal BGP sessions
+   are usually multi-hop, the links involved are generally inhabited
+   only by routers rather than general-purpose computers; general-
+   purpose computers are easier for attackers to use as TCP hijacking
+   tools [Joncheray].
+
+   Also, BGP peering associations tend to be long-lived and static.  By
+   contrast, many other security situations are more dynamic.
+
+   This is not to say that such attacks cannot happen.  (If they
+   couldn't happen at all, there would be no point to any security
+   measures.)  Attackers could divert links at layers 1 or 2, or they
+
+
+
+Bellovin & Zinin             Informational                      [Page 3]
+
+RFC 4278    Standards Maturity Variance: RFC 2385 and BGP-4 January 2006
+
+
+   could (in some situations) use Address Resolution Protocol (ARP)
+   spoofing at Ethernet-based exchange points.  Still, on balance, BGP
+   is employed in an environment that is less susceptible to this sort
+   of attack.
+
+   There is another class of attack against which BGP is extremely
+   vulnerable:  false route advertisements from more than one autonomous
+   system (AS) hop away.  However, neither [RFC2385] nor any other
+   transmission security mechanism can block such attacks.  Rather, a
+   scheme such as S-BGP [Kent] would be needed.
+
+5.  LDP
+
+   The Label Distribution Protocol (LDP) [RFC3036] also uses [RFC2385].
+   Deployment practices for LDP are very similar to those of BGP: LDP
+   connections are usually confined within a single autonomous system
+   and most frequently span a single link between two routers.  This
+   makes the LDP threat environment very similar to BGP's.  Given this,
+   and a considerable installed base of LDP in service provider
+   networks, we are not deprecating [RFC2385] for use with LDP.
+
+6.  Security Considerations
+
+   The IESG believes that the variance described here will not adversely
+   affect the security of the Internet.
+
+7.  Conclusions
+
+   Given the above analysis, the IESG is persuaded that waiving the
+   prerequisite requirement is the appropriate thing to do.  [RFC2385]
+   is clearly not suitable for Draft Standard.  Other existing
+   mechanisms, such as IPsec, would do its job better.  However, given
+   the current operational practices in service provider networks at the
+   moment -- and in particular the common use of long-lived standard
+   keys, [RFC3562] notwithstanding --  the marginal benefit of such
+   schemes in this situation would be low, and not worth the transition
+   effort.  We would prefer to wait for a security mechanism tailored to
+   the major threat environment for BGP.
+
+8.  Informative References
+
+   [Dobbertin]  H. Dobbertin, "The Status of MD5 After a Recent Attack",
+                RSA Labs' CryptoBytes, Vol. 2 No. 2, Summer 1996.
+
+   [Joncheray]  Joncheray, L.  "A Simple Active Attack Against TCP."
+                Proceedings of the Fifth Usenix Unix Security Symposium,
+                1995.
+
+
+
+
+Bellovin & Zinin             Informational                      [Page 4]
+
+RFC 4278    Standards Maturity Variance: RFC 2385 and BGP-4 January 2006
+
+
+   [Kent]       Kent, S., C. Lynn, and K. Seo.  "Secure Border Gateway
+                Protocol (Secure-BGP)."  IEEE Journal on Selected Areas
+                in Communications, vol. 18, no. 4, April, 2000, pp.
+                582-592.
+
+   [RFC3562]    Leech, M., "Key Management Considerations for the TCP
+                MD5 Signature Option", RFC 3562, July 2003.
+
+   [PV1]        B. Preneel and P. van Oorschot, "MD-x MAC and building
+                fast MACs from hash functions," Advances in Cryptology
+                --- Crypto 95 Proceedings, Lecture Notes in Computer
+                Science Vol. 963, D. Coppersmith, ed., Springer-Verlag,
+                1995.
+
+   [PV2]        B. Preneel and P. van Oorschot, "On the security of two
+                MAC algorithms," Advances in Cryptology --- Eurocrypt 96
+                Proceedings, Lecture Notes in Computer Science, U.
+                Maurer, ed., Springer-Verlag, 1996.
+
+   [RFC1321]    Rivest, R., "The MD5 Message-Digest Algorithm ", RFC
+                1321, April 1992.
+
+   [RFC2026]    Bradner, S., "The Internet Standards Process -- Revision
+                3", BCP 9, RFC 2026, October 1996.
+
+   [RFC2104]    Krawczyk, H., Bellare, M., and R. Canetti, "HMAC:
+                Keyed-Hashing for Message Authentication", RFC 2104,
+                February 1997.
+
+   [RFC2246]    Dierks, T. and C. Allen, "The TLS Protocol Version 1.0",
+                RFC 2246, January 1999.
+
+   [RFC2385]    Heffernan, A., "Protection of BGP Sessions via the TCP
+                MD5 Signature Option", RFC 2385, August 1998.
+
+   [RFC2401]    Kent, S. and R. Atkinson, "Security Architecture for the
+                Internet Protocol", RFC 2401, November 1998.
+
+   [RFC3036]    Andersson, L., Doolan, P., Feldman, N., Fredette, A.,
+                and B. Thomas, "LDP Specification", RFC 3036, January
+                2001.
+
+   [RFC4271]    Rekhter, Y., Li, T., and S. Hares, Eds., "A Border
+                Gateway Protocol 4 (BGP-4)", RFC 4271, January 2006.
+
+   [Wang]       Wang, X. and H. Yu, "How to Break MD5 and Other Hash
+                Functions."  Proceedings of Eurocrypt '05, 2005.
+
+
+
+
+Bellovin & Zinin             Informational                      [Page 5]
+
+RFC 4278    Standards Maturity Variance: RFC 2385 and BGP-4 January 2006
+
+
+Authors' Addresses
+
+   Steven M. Bellovin
+   Department of Computer Science
+   Columbia University
+   1214 Amsterdam Avenue, M.C. 0401
+   New York, NY 10027-7003
+
+   Phone: +1 212-939-7149
+   EMail: bellovin@acm.org
+
+
+   Alex Zinin
+   Alcatel
+   701 E Middlefield Rd
+   Mountain View, CA 94043
+
+   EMail: zinin@psg.com
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Bellovin & Zinin             Informational                      [Page 6]
+
+RFC 4278    Standards Maturity Variance: RFC 2385 and BGP-4 January 2006
+
+
+Full Copyright Statement
+
+   Copyright (C) The Internet Society (2006).
+
+   This document is subject to the rights, licenses and restrictions
+   contained in BCP 78, and except as set forth therein, the authors
+   retain all their rights.
+
+   This document and the information contained herein are provided on an
+   "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
+   OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET
+   ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED,
+   INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE
+   INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
+   WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
+
+Intellectual Property
+
+   The IETF takes no position regarding the validity or scope of any
+   Intellectual Property Rights or other rights that might be claimed to
+   pertain to the implementation or use of the technology described in
+   this document or the extent to which any license under such rights
+   might or might not be available; nor does it represent that it has
+   made any independent effort to identify any such rights.  Information
+   on the procedures with respect to rights in RFC documents can be
+   found in BCP 78 and BCP 79.
+
+   Copies of IPR disclosures made to the IETF Secretariat and any
+   assurances of licenses to be made available, or the result of an
+   attempt made to obtain a general license or permission for the use of
+   such proprietary rights by implementers or users of this
+   specification can be obtained from the IETF on-line IPR repository at
+   http://www.ietf.org/ipr.
+
+   The IETF invites any interested party to bring to its attention any
+   copyrights, patents or patent applications, or other proprietary
+   rights that may cover technology that may be required to implement
+   this standard.  Please address the information to the IETF at
+   ietf-ipr@ietf.org.
+
+Acknowledgement
+
+   Funding for the RFC Editor function is provided by the IETF
+   Administrative Support Activity (IASA).
+
+
+
+
+
+
+
+Bellovin & Zinin             Informational                      [Page 7]
+
--- a/kernel/picotcp/RFC/rfc4614.txt
+++ b/kernel/picotcp/RFC/rfc4614.txt
--- a/kernel/picotcp/RFC/rfc6762.txt
+++ b/kernel/picotcp/RFC/rfc6762.txt