Commit c980f800 authored by Ondřej Zajíček

Merge branch 'bgp-grace'

parents 2e84b4e8 227af309
......@@ -98,6 +98,7 @@ config_alloc(byte *name)
c->load_time = now;
c->tf_route = c->tf_proto = (struct timeformat){"%T", "%F", 20*3600};
c->tf_base = c->tf_log = (struct timeformat){"%F %T", NULL, 0};
c->gr_wait = DEFAULT_GR_WAIT;
return c;
}
......
......@@ -38,6 +38,7 @@ struct config {
struct timeformat tf_proto; /* Time format for 'show protocol' */
struct timeformat tf_log; /* Time format for the logfile */
struct timeformat tf_base; /* Time format for other purposes */
u32 gr_wait; /* Graceful restart wait timeout */
int cli_debug; /* Tracing of CLI connections and commands */
char *err_msg; /* Parser error message */
......
......@@ -157,6 +157,9 @@ options. The most important ones are:
<tag>-f</tag>
run bird in foreground.
<tag>-R</tag>
apply graceful restart recovery after start.
</descrip>
<p>BIRD writes messages about its work to log files or syslog (according to config).
......@@ -187,6 +190,7 @@ configuration, but it is generally easy -- BIRD needs just the
standard library, privileges to read the config file and create the
control socket and the CAP_NET_* capabilities.
<chapt>About routing tables
<p>BIRD has one or more routing tables which may or may not be
......@@ -242,6 +246,20 @@ using comparison and ordering). Minor advantage is that routes are
shown sorted in <cf/show route/, minor disadvantage is that it is
slightly more computationally expensive.
<sect>Graceful restart
<p>When BIRD is started after restart or crash, it repopulates routing tables in
an uncoordinated manner, like after clean start. This may be impractical in some
cases, because if the forwarding plane (i.e. kernel routing tables) remains
intact, then its synchronization with BIRD would temporarily disrupt packet
forwarding until protocols converge. Graceful restart is a mechanism that could
help with this issue. Generally, it works by starting protocols and letting them
repopulate routing tables while deferring route propagation until protocols
acknowledge their convergence. Note that graceful restart behavior has to be
configured for all relevant protocols and requires protocol-specific support
(currently implemented for the Kernel and BGP protocols). It is activated for
a particular boot by the option <cf/-R/.
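
The deferral described above can be modelled as a lock counter with a timeout. The following is a minimal sketch only: the struct and the `gr_*` helper names are hypothetical and are not taken from BIRD's sources (the nest/proto.c part of this commit is not shown here). It assumes each participating protocol takes a lock while it converges and that route export stays blocked until the last lock is released or the wait timeout expires.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of graceful restart recovery; names are illustrative. */
struct gr_state {
  bool active;        /* recovery in progress (daemon started with -R) */
  unsigned locks;     /* protocols that have not reported convergence yet */
  unsigned deadline;  /* absolute time at which the wait timeout expires */
};

/* Begin recovery; 'wait' plays the role of 'graceful restart wait'. */
static void gr_start(struct gr_state *gr, unsigned now, unsigned wait)
{
  gr->active = true;
  gr->locks = 0;
  gr->deadline = now + wait;
}

/* A protocol asks the recovery to wait for it. */
static void gr_lock(struct gr_state *gr)
{
  if (gr->active)
    gr->locks++;
}

/* A protocol reports convergence; the last unlock ends the recovery. */
static void gr_unlock(struct gr_state *gr)
{
  if (gr->active && gr->locks > 0 && --gr->locks == 0)
    gr->active = false;
}

/* Periodic check: give up waiting once the deadline has passed. */
static void gr_tick(struct gr_state *gr, unsigned now)
{
  if (gr->active && now >= gr->deadline)
  {
    gr->locks = 0;
    gr->active = false;
  }
}

/* Route export is deferred for as long as the recovery is active. */
static bool gr_export_allowed(const struct gr_state *gr)
{
  return !gr->active;
}
```

In this model a recovery with no locked protocol ends only at the deadline; a real implementation would presumably finish at once in that case.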
<chapt>Configuration
......@@ -371,6 +389,12 @@ protocol rip {
would accept IPv6 routes only). Such behavior was default in
older versions of BIRD.
<tag>graceful restart wait <m/number/</tag>
During graceful restart recovery, BIRD waits for convergence of routing
protocols. This option specifies a timeout for the recovery to
prevent waiting indefinitely if some protocols cannot converge. Default:
240 seconds.
<tag>timeformat route|protocol|base|log "<m/format1/" [<m/limit/ "<m/format2/"]</tag>
This option allows to specify a format of date/time used by
BIRD. The first argument specifies for which purpose such
......@@ -1493,6 +1517,8 @@ extended communities
(RFC 4360<htmlurl url="ftp://ftp.rfc-editor.org/in-notes/rfc4360.txt">),
route reflectors
(RFC 4456<htmlurl url="ftp://ftp.rfc-editor.org/in-notes/rfc4456.txt">),
graceful restart
(RFC 4724<htmlurl url="ftp://ftp.rfc-editor.org/in-notes/rfc4724.txt">),
multiprotocol extensions
(RFC 4760<htmlurl url="ftp://ftp.rfc-editor.org/in-notes/rfc4760.txt">),
4B AS numbers
......@@ -1502,9 +1528,7 @@ and 4B AS numbers in extended communities
For IPv6, it uses the standard multiprotocol extensions defined in
RFC 2283<htmlurl url="ftp://ftp.rfc-editor.org/in-notes/rfc2283.txt">
including changes described in the
latest draft<htmlurl url="ftp://ftp.rfc-editor.org/internet-drafts/draft-ietf-idr-bgp4-multiprotocol-v2-05.txt">
RFC 4760<htmlurl url="ftp://ftp.rfc-editor.org/in-notes/rfc4760.txt">
and applied to IPv6 according to
RFC 2545<htmlurl url="ftp://ftp.rfc-editor.org/in-notes/rfc2545.txt">.
......@@ -1716,6 +1740,26 @@ for each neighbor using the following configuration parameters:
capability and accepts such requests. Even when disabled, BIRD
can send route refresh requests. Default: on.
<tag>graceful restart <m/switch/|aware</tag>
When a BGP speaker restarts or crashes, neighbors will discard all
received paths from the speaker, which disrupts packet forwarding even
when the forwarding plane of the speaker remains intact. RFC 4724
specifies an optional graceful restart mechanism to alleviate this
issue. This option controls the mechanism. It has three states:
Disabled, when no support is provided. Aware, when the graceful restart
support is announced and the support for restarting neighbors is
provided, but no local graceful restart is allowed (i.e. receiving-only
role). Enabled, when the full graceful restart support is provided
(i.e. both restarting and receiving role). Note that proper support for
local graceful restart requires also configuration of other protocols.
Default: aware.
<tag>graceful restart time <m/number/</tag>
The restart time is announced in the BGP graceful restart capability
and specifies how long the neighbor would wait for the BGP session to
re-establish after a restart before deleting stale routes. Default:
120 seconds.
<tag>interpret communities <m/switch/</tag> RFC 1997 demands
that BGP speaker should process well-known communities like
no-export (65535, 65281) or no-advertise (65535, 65282). For
......@@ -2063,25 +2107,36 @@ overcome using another routing table and the pipe protocol.
<sect1>Configuration
<p><descrip>
<tag>persist <m/switch/</tag> Tell BIRD to leave all its routes in the
routing tables when it exits (instead of cleaning them up).
<tag>scan time <m/number/</tag> Time in seconds between two consecutive scans of the
kernel routing table.
<tag>learn <m/switch/</tag> Enable learning of routes added to the kernel
routing tables by other routing daemons or by the system administrator.
This is possible only on systems which support identification of route
authorship.
<tag>device routes <m/switch/</tag> Enable export of device
routes to the kernel routing table. By default, such routes
are rejected (with the exception of explicitly configured
device routes from the static protocol) regardless of the
export filter to protect device routes in kernel routing table
(managed by OS itself) from accidental overwriting or erasing.
<tag>kernel table <m/number/</tag> Select which kernel table should
this particular instance of the Kernel protocol work with. Available
only on systems supporting multiple routing tables.
<tag>persist <m/switch/</tag>
Tell BIRD to leave all its routes in the routing tables when it exits
(instead of cleaning them up).
<tag>scan time <m/number/</tag>
Time in seconds between two consecutive scans of the kernel routing
table.
<tag>learn <m/switch/</tag>
Enable learning of routes added to the kernel routing tables by other
routing daemons or by the system administrator. This is possible only on
systems which support identification of route authorship.
<tag>device routes <m/switch/</tag>
Enable export of device routes to the kernel routing table. By default,
such routes are rejected (with the exception of explicitly configured
device routes from the static protocol) regardless of the export filter
to protect device routes in kernel routing table (managed by OS itself)
from accidental overwriting or erasing.
<tag>kernel table <m/number/</tag>
Select which kernel table should this particular instance of the Kernel
protocol work with. Available only on systems supporting multiple
routing tables.
<tag>graceful restart <m/switch/</tag>
Participate in graceful restart recovery. If this option is enabled and
a graceful restart recovery is active, the Kernel protocol will defer
synchronization of routing tables until the end of the recovery. Note
that import of kernel routes to BIRD is not affected.
</descrip>
<sect1>Attributes
......
......@@ -32,6 +32,7 @@ Reply codes of BIRD command-line interface
0021 Undo requested
0022 Undo scheduled
0023 Evaluation of expression
0024 Graceful restart status report
1000 BIRD version
1001 Interface list
......
......@@ -36,6 +36,8 @@ typedef struct list { /* In fact two overlayed nodes */
#define NODE_NEXT(n) ((void *)((NODE (n))->next))
#define NODE_VALID(n) ((NODE (n))->next)
#define WALK_LIST(n,list) for(n=HEAD(list); NODE_VALID(n); n=NODE_NEXT(n))
#define WALK_LIST2(n,nn,list,pos) \
for(nn=(list).head; NODE_VALID(nn) && (n=SKIP_BACK(typeof(*n),pos,nn)); nn=nn->next)
#define WALK_LIST_DELSAFE(n,nxt,list) \
for(n=HEAD(list); nxt=NODE_NEXT(n); n=(void *) nxt)
/* WALK_LIST_FIRST supposes that called code removes each processed node */
......
......@@ -7,6 +7,7 @@
*/
#include "nest/bird.h"
#include "nest/protocol.h"
#include "nest/route.h"
#include "nest/cli.h"
#include "conf/conf.h"
......@@ -32,6 +33,8 @@ cmd_show_status(void)
tm_format_datetime(tim, &config->tf_base, config->load_time);
cli_msg(-1011, "Last reconfiguration on %s", tim);
graceful_restart_show_status();
if (shutting_down)
cli_msg(13, "Shutdown in progress");
else if (configuring)
......
......@@ -49,6 +49,7 @@ CF_KEYWORDS(PASSWORD, FROM, PASSIVE, TO, ID, EVENTS, PACKETS, PROTOCOLS, INTERFA
CF_KEYWORDS(PRIMARY, STATS, COUNT, FOR, COMMANDS, PREEXPORT, GENERATE, ROA, MAX, FLUSH, AS)
CF_KEYWORDS(LISTEN, BGP, V6ONLY, DUAL, ADDRESS, PORT, PASSWORDS, DESCRIPTION, SORTED)
CF_KEYWORDS(RELOAD, IN, OUT, MRTDUMP, MESSAGES, RESTRICT, MEMORY, IGP_METRIC, CLASS, DSCP)
CF_KEYWORDS(GRACEFUL, RESTART, WAIT)
CF_ENUM(T_ENUM_RTS, RTS_, DUMMY, STATIC, INHERIT, DEVICE, STATIC_DEVICE, REDIRECT,
RIP, OSPF, OSPF_IA, OSPF_EXT1, OSPF_EXT2, BGP, PIPE)
......@@ -110,6 +111,11 @@ listen_opt:
;
CF_ADDTO(conf, gr_opts)
gr_opts: GRACEFUL RESTART WAIT expr ';' { new_config->gr_wait = $4; } ;
/* Creation of routing tables */
tab_sorted:
......
......@@ -148,10 +148,13 @@ struct proto {
byte disabled; /* Manually disabled */
byte proto_state; /* Protocol state machine (PS_*, see below) */
byte core_state; /* Core state machine (FS_*, see below) */
byte core_goal; /* State we want to reach (FS_*, see below) */
byte export_state; /* Route export state (ES_*, see below) */
byte reconfiguring; /* We're shutting down due to reconfiguration */
byte refeeding; /* We are refeeding (valid only if core_state == FS_FEEDING) */
byte refeeding; /* We are refeeding (valid only if export_state == ES_FEEDING) */
byte flushing; /* Protocol is flushed in current flush loop round */
byte gr_recovery; /* Protocol should participate in graceful restart recovery */
byte gr_lock; /* Graceful restart mechanism should wait for this proto */
byte gr_wait; /* Route export to protocol is postponed until graceful restart */
byte down_sched; /* Shutdown is scheduled for later (PDS_*) */
byte down_code; /* Reason for shutdown (PDC_* codes) */
u32 hash_key; /* Random key used for hashing of neighbors */
......@@ -175,6 +178,7 @@ struct proto {
* reload_routes Request protocol to reload all its routes to the core
* (using rte_update()). Returns: 0=reload cannot be done,
* 1= reload is scheduled and will happen (asynchronously).
* feed_done Notify protocol about finish of route feeding.
*/
void (*if_notify)(struct proto *, unsigned flags, struct iface *i);
......@@ -185,6 +189,7 @@ struct proto {
void (*store_tmp_attrs)(struct rte *rt, struct ea_list *attrs);
int (*import_control)(struct proto *, struct rte **rt, struct ea_list **attrs, struct linpool *pool);
int (*reload_routes)(struct proto *);
void (*feed_done)(struct proto *);
/*
* Routing entry hooks (called only for routes belonging to this protocol):
......@@ -242,6 +247,13 @@ static inline void
proto_copy_rest(struct proto_config *dest, struct proto_config *src, unsigned size)
{ memcpy(dest + 1, src + 1, size - sizeof(struct proto_config)); }
void graceful_restart_recovery(void);
void graceful_restart_init(void);
void graceful_restart_show_status(void);
void proto_graceful_restart_lock(struct proto *p);
void proto_graceful_restart_unlock(struct proto *p);
#define DEFAULT_GR_WAIT 240
void proto_show_limit(struct proto_limit *l, const char *dsc);
void proto_show_basic_info(struct proto *p);
......@@ -343,10 +355,17 @@ void proto_notify_state(struct proto *p, unsigned state);
* as a result of received ROUTE-REFRESH request).
*/
#define FS_HUNGRY 0
#define FS_FEEDING 1
#define FS_HAPPY 2
#define FS_FLUSHING 3
#define FS_HUNGRY 0
#define FS_FEEDING 1 /* obsolete */
#define FS_HAPPY 2
#define FS_FLUSHING 3
#define ES_DOWN 0
#define ES_FEEDING 1
#define ES_READY 2
/*
* Debugging flags
......
......@@ -148,6 +148,10 @@ typedef struct rtable {
struct fib_iterator nhu_fit; /* Next Hop Update FIB iterator */
} rtable;
#define RPS_NONE 0
#define RPS_SCHEDULED 1
#define RPS_RUNNING 2
typedef struct network {
struct fib_node n; /* FIB flags reserved for kernel syncer */
struct rte *routes; /* Available routes for this network */
......@@ -222,6 +226,8 @@ typedef struct rte {
#define REF_COW 1 /* Copy this rte on write */
#define REF_FILTERED 2 /* Route is rejected by import filter */
#define REF_STALE 4 /* Route is stale in a refresh cycle */
#define REF_DISCARD 8 /* Route is scheduled for discard */
/* Route is valid for propagation (may depend on other flags in the future), accepts NULL */
static inline int rte_is_valid(rte *r) { return r && !(r->flags & REF_FILTERED); }
......@@ -257,6 +263,8 @@ void rte_update2(struct announce_hook *ah, net *net, rte *new, struct rte_src *s
static inline void rte_update(struct proto *p, net *net, rte *new) { rte_update2(p->main_ahook, net, new, p->main_source); }
void rte_discard(rtable *tab, rte *old);
int rt_examine(rtable *t, ip_addr prefix, int pxlen, struct proto *p, struct filter *filter);
void rt_refresh_begin(rtable *t, struct announce_hook *ah);
void rt_refresh_end(rtable *t, struct announce_hook *ah);
void rte_dump(rte *);
void rte_free(rte *);
rte *rte_do_cow(rte *);
......@@ -268,6 +276,15 @@ void rt_feed_baby_abort(struct proto *p);
int rt_prune_loop(void);
struct rtable_config *rt_new_table(struct symbol *s);
static inline void
rt_mark_for_prune(rtable *tab)
{
if (tab->prune_state == RPS_RUNNING)
fit_get(&tab->fib, &tab->prune_fit);
tab->prune_state = RPS_SCHEDULED;
}
struct rt_show_data {
ip_addr prefix;
unsigned pxlen;
......
......@@ -55,8 +55,10 @@ static void rt_free_hostcache(rtable *tab);
static void rt_notify_hostcache(rtable *tab, net *net);
static void rt_update_hostcache(rtable *tab);
static void rt_next_hop_update(rtable *tab);
static inline int rt_prune_table(rtable *tab);
static inline void rt_schedule_gc(rtable *tab);
static inline void rt_schedule_prune(rtable *tab);
static inline struct ea_list *
make_tmp_attrs(struct rte *rt, struct linpool *pool)
......@@ -570,7 +572,7 @@ rte_announce(rtable *tab, unsigned type, net *net, rte *new, rte *old, rte *befo
struct announce_hook *a;
WALK_LIST(a, tab->hooks)
{
ASSERT(a->proto->core_state == FS_HAPPY || a->proto->core_state == FS_FEEDING);
ASSERT(a->proto->export_state != ES_DOWN);
if (a->proto->accept_ra_types == type)
if (type == RA_ACCEPTED)
rt_notify_accepted(a, net, new, old, before_old, tmpa, 0);
......@@ -1108,6 +1110,69 @@ rt_examine(rtable *t, ip_addr prefix, int pxlen, struct proto *p, struct filter
return v > 0;
}
/**
* rt_refresh_begin - start a refresh cycle
* @t: related routing table
* @ah: related announce hook
*
* This function starts a refresh cycle for given routing table and announce
* hook. The refresh cycle is a sequence where the protocol sends all its valid
* routes to the routing table (by rte_update()). After that, all protocol
* routes (more precisely routes with @ah as @sender) not sent during the
* refresh cycle but still in the table from the past are pruned. This is
* implemented by marking all related routes as stale by REF_STALE flag in
* rt_refresh_begin(), then marking all related stale routes with REF_DISCARD
* flag in rt_refresh_end() and then removing such routes in the prune loop.
*/
void
rt_refresh_begin(rtable *t, struct announce_hook *ah)
{
net *n;
rte *e;
FIB_WALK(&t->fib, fn)
{
n = (net *) fn;
for (e = n->routes; e; e = e->next)
if (e->sender == ah)
e->flags |= REF_STALE;
}
FIB_WALK_END;
}
/**
* rt_refresh_end - end a refresh cycle
* @t: related routing table
* @ah: related announce hook
*
* This function ends a refresh cycle for given routing table and announce
* hook. See rt_refresh_begin() for description of refresh cycles.
*/
void
rt_refresh_end(rtable *t, struct announce_hook *ah)
{
int prune = 0;
net *n;
rte *e;
FIB_WALK(&t->fib, fn)
{
n = (net *) fn;
for (e = n->routes; e; e = e->next)
if ((e->sender == ah) && (e->flags & REF_STALE))
{
e->flags |= REF_DISCARD;
prune = 1;
}
}
FIB_WALK_END;
if (prune)
rt_schedule_prune(t);
}
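
The refresh cycle implemented by the two functions above is a mark-and-sweep over one sender's routes. The sketch below is a self-contained illustration, not BIRD code: the flat `toy_route` array and the `toy_*` helpers are hypothetical stand-ins for the FIB walk, rte_update() and the prune loop. It assumes that re-announcing a route replaces the old entry, so the stale mark does not survive an update.

```c
#include <assert.h>
#include <stddef.h>

#define REF_STALE   4   /* values as defined in the route.h hunk */
#define REF_DISCARD 8

/* Hypothetical flat route table; BIRD itself walks a FIB of nets. */
struct toy_route {
  int sender;       /* stands in for the announce hook */
  int prefix;
  unsigned flags;
  int present;      /* 0 once the route has been pruned */
};

/* Start of the cycle: mark every route of the sender as stale. */
static void toy_refresh_begin(struct toy_route *r, size_t n, int sender)
{
  for (size_t i = 0; i < n; i++)
    if (r[i].present && r[i].sender == sender)
      r[i].flags |= REF_STALE;
}

/* A re-announced route is fresh again (assumed effect of an update). */
static void toy_update(struct toy_route *r, size_t n, int sender, int prefix)
{
  for (size_t i = 0; i < n; i++)
    if (r[i].present && r[i].sender == sender && r[i].prefix == prefix)
      r[i].flags &= ~(unsigned) REF_STALE;
}

/* End of the cycle: schedule still-stale routes for discard.
   Returns 1 if a prune run is needed. */
static int toy_refresh_end(struct toy_route *r, size_t n, int sender)
{
  int prune = 0;
  for (size_t i = 0; i < n; i++)
    if (r[i].present && r[i].sender == sender && (r[i].flags & REF_STALE))
    {
      r[i].flags |= REF_DISCARD;
      prune = 1;
    }
  return prune;
}

/* Prune run: remove routes scheduled for discard; returns their count. */
static size_t toy_prune(struct toy_route *r, size_t n)
{
  size_t removed = 0;
  for (size_t i = 0; i < n; i++)
    if (r[i].present && (r[i].flags & REF_DISCARD))
    {
      r[i].present = 0;
      removed++;
    }
  return removed;
}
```

Separating REF_STALE from REF_DISCARD means a route that was merely not yet re-announced is never removed before the cycle is explicitly ended.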
/**
* rte_dump - dump a route
* @e: &rte to be dumped
......@@ -1169,6 +1234,13 @@ rt_dump_all(void)
rt_dump(t);
}
static inline void
rt_schedule_prune(rtable *tab)
{
rt_mark_for_prune(tab);
ev_schedule(tab->rt_event);
}
static inline void
rt_schedule_gc(rtable *tab)
{
......@@ -1199,6 +1271,7 @@ rt_schedule_nhu(rtable *tab)
tab->nhu_state |= 1;
}
static void
rt_prune_nets(rtable *tab)
{
......@@ -1242,6 +1315,14 @@ rt_event(void *ptr)
if (tab->nhu_state)
rt_next_hop_update(tab);
if (tab->prune_state)
if (!rt_prune_table(tab))
{
/* Table prune unfinished */
ev_schedule(tab->rt_event);
return;
}
if (tab->gc_scheduled)
{
rt_prune_nets(tab);
......@@ -1283,8 +1364,8 @@ rt_init(void)
}
static inline int
rt_prune_step(rtable *tab, int step, int *max_feed)
static int
rt_prune_step(rtable *tab, int step, int *limit)
{
static struct rate_limit rl_flush;
struct fib_iterator *fit = &tab->prune_fit;
......@@ -1294,13 +1375,13 @@ rt_prune_step(rtable *tab, int step, int *max_feed)
fib_check(&tab->fib);
#endif
if (tab->prune_state == 0)
if (tab->prune_state == RPS_NONE)
return 1;
if (tab->prune_state == 1)
if (tab->prune_state == RPS_SCHEDULED)
{
FIB_ITERATE_INIT(fit, &tab->fib);
tab->prune_state = 2;
tab->prune_state = RPS_RUNNING;
}
again:
......@@ -1312,9 +1393,10 @@ again:
rescan:
for (e=n->routes; e; e=e->next)
if (e->sender->proto->flushing ||
(e->flags & REF_DISCARD) ||
(step && e->attrs->src->proto->flushing))
{
if (*max_feed <= 0)
if (*limit <= 0)
{
FIB_ITERATE_PUT(fit, fn);
return 0;
......@@ -1325,7 +1407,7 @@ again:
n->n.prefix, n->n.pxlen, e->attrs->src->proto->name, tab->name);
rte_discard(tab, e);
(*max_feed)--;
(*limit)--;
goto rescan;
}
......@@ -1342,41 +1424,60 @@ again:
fib_check(&tab->fib);
#endif
tab->prune_state = 0;
tab->prune_state = RPS_NONE;
return 1;
}
/**
* rt_prune_table - prune a routing table
*
* This function scans the routing table @tab and removes routes belonging to
* flushing protocols, discarded routes and also stale network entries, in a
* similar fashion to rt_prune_loop(). Returns 1 when all such routes are
* pruned. Contrary to rt_prune_loop(), this function is not a part of the
* protocol flushing loop, but it is called from rt_event() for just one routing
* table.
*
* Note that rt_prune_table() and rt_prune_loop() share (for each table) the
* prune state (@prune_state) and also the pruning iterator (@prune_fit).
*/
static inline int
rt_prune_table(rtable *tab)
{
int limit = 512;
return rt_prune_step(tab, 0, &limit);
}
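
The budgeted, resumable pruning used by rt_prune_step() can be illustrated in isolation. The sketch below is a simplified model under stated assumptions: `toy_table`, `toy_prune_step()` and `toy_prune_rounds()` are hypothetical names, a plain array index stands in for the saved FIB iterator, and the RPS_* values mirror the ones added in the route.h hunk.

```c
#include <assert.h>
#include <stddef.h>

enum { RPS_NONE, RPS_SCHEDULED, RPS_RUNNING };  /* as in the route.h hunk */

struct toy_table {
  int discard[8];   /* 1 = entry should be pruned */
  size_t n;         /* number of entries in use */
  size_t pos;       /* stands in for the saved FIB iterator */
  int state;        /* RPS_* prune state */
};

/* One bounded prune step. Returns 1 when the table is fully pruned and
   0 when the budget ran out, in which case the position is kept so the
   next call resumes where this one stopped. */
static int toy_prune_step(struct toy_table *t, int *limit)
{
  if (t->state == RPS_NONE)
    return 1;

  if (t->state == RPS_SCHEDULED)
  {
    t->pos = 0;                 /* (re)start the walk from the beginning */
    t->state = RPS_RUNNING;
  }

  for (; t->pos < t->n; t->pos++)
    if (t->discard[t->pos])
    {
      if (*limit <= 0)
        return 0;               /* budget exhausted, resume here later */
      t->discard[t->pos] = 0;
      (*limit)--;
    }

  t->state = RPS_NONE;
  return 1;
}

/* Schedule a prune and count the rounds needed with a fixed per-round
   budget. The budget must be positive, otherwise no progress is made. */
static int toy_prune_rounds(struct toy_table *t, int budget)
{
  int rounds = 0;
  int limit;

  t->state = RPS_SCHEDULED;
  do
  {
    limit = budget;
    rounds++;
  }
  while (!toy_prune_step(t, &limit));

  return rounds;
}
```

Bounding each step keeps a large table from blocking the event loop; the caller simply reschedules itself until the step reports completion.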
/**
* rt_prune_loop - prune routing tables
*
* The prune loop scans routing tables and removes routes belonging to
* flushing protocols and also stale network entries. Returns 1 when
* all such routes are pruned. It is a part of the protocol flushing
* loop.
* The prune loop scans routing tables and removes routes belonging to flushing
* protocols, discarded routes and also stale network entries. Returns 1 when
* all such routes are pruned. It is a part of the protocol flushing loop.
*
* The prune loop runs in two steps. In the first step it prunes just
* the routes with flushing senders (in explicitly marked tables) so
* the route removal is propagated as usual. In the second step, all
* remaining relevant routes are removed. Ideally, there shouldn't be
* any, but it happens when pipe filters are changed.
* The prune loop runs in two steps. In the first step it prunes just the routes
* with flushing senders (in explicitly marked tables) so the route removal is
* propagated as usual. In the second step, all remaining relevant routes are
* removed. Ideally, there shouldn't be any, but it happens when pipe filters
* are changed.
*/
int
rt_prune_loop(void)
{
static int step = 0;
int max_feed = 512;
int limit = 512;
rtable *t;
again:
WALK_LIST(t, routing_tables)
if (! rt_prune_step(t, step, &max_feed))
if (! rt_prune_step(t, step, &limit))
return 0;
if (step == 0)
{
/* Prepare for the second step */
WALK_LIST(t, routing_tables)
t->prune_state = 1;
t->prune_state = RPS_SCHEDULED;
step = 1;
goto again;
......@@ -1721,7 +1822,7 @@ again:
(p->accept_ra_types == RA_ACCEPTED))
if (rte_is_valid(e))
{
if (p->core_state != FS_FEEDING)
if (p->export_state != ES_FEEDING)
return 1; /* In the meantime, the protocol fell down. */
do_feed_baby(p, p->accept_ra_types, h, n, e);
max_feed--;
......@@ -1730,7 +1831,7 @@ again:
if (p->accept_ra_types == RA_ANY)
for(e = n->routes; rte_is_valid(e); e = e->next)
{
if (p->core_state != FS_FEEDING)
if (p->export_state != ES_FEEDING)
return 1; /* In the meantime, the protocol fell down. */
do_feed_baby(p, RA_ANY, h, n, e);
max_feed--;
......@@ -2223,9 +2324,7 @@ rt_show_cont(struct cli *c)
cli_printf(c, 8004, "Stopped due to reconfiguration");
goto done;
}
if (d->export_protocol &&
d->export_protocol->core_state != FS_HAPPY &&
d->export_protocol->core_state != FS_FEEDING)
if (d->export_protocol && (d->export_protocol->export_state == ES_DOWN))
{
cli_printf(c, 8005, "Protocol is down");
goto done;
......
......@@ -51,6 +51,16 @@
* and bgp_encode_attrs() which does the converse. Both functions are built around a
* @bgp_attr_table array describing all important characteristics of all known attributes.
* Unknown transitive attributes are attached to the route as %EAF_TYPE_OPAQUE byte streams.
*
* BGP protocol implements graceful restart in both restarting (local restart)
* and receiving (neighbor restart) roles. The first is handled mostly by the
* graceful restart code in the nest, BGP protocol just handles capabilities,
* sets @gr_wait and locks graceful restart until end-of-RIB mark is received.
* The second is implemented by internal restart of the BGP state to %BS_IDLE
* and protocol state to %PS_START, but keeping the protocol up from the core
* point of view and therefore maintaining received routes. Routing table
* refresh cycle (rt_refresh_begin(), rt_refresh_end()) is used for removing
* stale routes after reestablishment of BGP session during graceful restart.
*/
#undef LOCAL_DEBUG
......@@ -319,6 +329,7 @@ bgp_decision(void *vp)
DBG("BGP: Decision start\n");
if ((p->p.proto_state == PS_START)
&& (p->outgoing_conn.state == BS_IDLE)
&& (p->incoming_conn.state != BS_OPENCONFIRM)
&& (!p->cf->passive))
bgp_active(p);
......@@ -363,7 +374,7 @@ bgp_conn_enter_established_state(struct bgp_conn *conn)
/* For multi-hop BGP sessions */
if (ipa_zero(p->source_addr))
p->source_addr = conn->sk->saddr;
p->source_addr = conn->sk->saddr;
p->conn = conn;
p->last_error_class = 0;
......@@ -371,6 +382,20 @@ bgp_conn_enter_established_state(struct bgp_conn *conn)
bgp_init_bucket_table(p);
bgp_init_prefix_table(p, 8);
int peer_gr_ready = conn->peer_gr_aware && !(conn->peer_gr_flags & BGP_GRF_RESTART);
if (p->p.gr_recovery && !peer_gr_ready)
proto_graceful_restart_unlock(&p->p);
if (p->p.gr_recovery && (p->cf->gr_mode == BGP_GR_ABLE) && peer_gr_ready)
p->p.gr_wait = 1;
if (p->gr_active)
tm_stop(p->gr_timer);
if (p->gr_active && (!conn->peer_gr_able || !(conn->peer_gr_aflags & BGP_GRF_FORWARDING)))
bgp_graceful_restart_done(p);
bgp_conn_set_state(conn, BS_ESTABLISHED);
proto_notify_state(&p->p, PS_UP);
}
......@@ -416,16 +441,86 @@ bgp_conn_enter_idle_state(struct bgp_conn *conn)
bgp_conn_leave_established_state(p);
}
/**
* bgp_handle_graceful_restart - handle detected BGP graceful restart
* @p: BGP instance
*
* This function is called when a BGP graceful restart of the neighbor is
* detected (when the TCP connection fails or when a new TCP connection
* appears). The function activates processing of the restart - starts routing
* table refresh cycle and activates BGP restart timer. The protocol state goes
* back to %PS_START, but changing BGP state back to %BS_IDLE is left for the
* caller.
*/
void
bgp_handle_graceful_restart(struct bgp_proto *p)
{
ASSERT(p->conn && (p->conn->state == BS_ESTABLISHED) && p->gr_ready);
BGP_TRACE(D_EVENTS, "Neighbor graceful restart detected%s",
p->gr_active ? " - already pending" : "");
proto_notify_state(&p->p, PS_START);
if (p->gr_active)
rt_refresh_end(p->p.main_ahook->table, p->p.main_ahook);
p->gr_active = 1;
bgp_start_timer(p->gr_timer, p->conn->peer_gr_time);
rt_refresh_begin(p->p.main_ahook->table, p->p.main_ahook);
}
/**
* bgp_graceful_restart_done - finish active BGP graceful restart
* @p: BGP instance
*
* This function is called when the active BGP graceful restart of the neighbor
* should be finished - either successfully (the neighbor sends all paths and
* reports end-of-RIB on the new session) or unsuccessfully (the neighbor does
* not support BGP graceful restart on the new session). The function ends
* routing table refresh cycle and stops BGP restart timer.
*/
void
bgp_graceful_restart_done(struct bgp_proto *p)
{
BGP_TRACE(D_EVENTS, "Neighbor graceful restart done");
p->gr_active = 0;
tm_stop(p->gr_timer);
rt_refresh_end(p->p.main_ahook->table, p->p.main_ahook);
}
/**
* bgp_graceful_restart_timeout - timeout of graceful restart 'restart timer'
* @t: timer
*
* This function is a timeout hook for @gr_timer, implementing BGP restart time
* limit for reestablishment of the BGP session after the graceful restart. When
* fired, we just proceed with the usual protocol restart.
*/
static void
bgp_graceful_restart_timeout(timer *t)
{
struct bgp_proto *p = t->data;
BGP_TRACE(D_EVENTS, "Neighbor graceful restart timeout");
bgp_stop(p, 0);
}
static void
bgp_send_open(struct bgp_conn *conn)
{
conn->start_state = conn->bgp->start_state;
// Default values, possibly changed by receiving capabilities.
conn->advertised_as = 0;
conn->peer_refresh_support = 0;
conn->peer_as4_support = 0;
conn->peer_add_path = 0;
conn->advertised_as = 0;
conn->peer_gr_aware = 0;
conn->peer_gr_able = 0;
conn->peer_gr_time = 0;
conn->peer_gr_flags = 0;
conn->peer_gr_aflags = 0;
DBG("BGP: Sending open\n");
conn->sk->rx_hook = bgp_rx;
......@@ -484,6 +579,9 @@ bgp_sock_err(sock *sk, int err)
else
BGP_TRACE(D_EVENTS, "Connection closed");
if ((conn->state == BS_ESTABLISHED) && p->gr_ready)
bgp_handle_graceful_restart(p);
bgp_conn_enter_idle_state(conn);