Changelog
* Wed Feb 09 2022 Open vSwitch CI <ovs-ci@redhat.com> - 2.16.0-51
- Merging upstream branch-2.16 [RH git: 7b6570c65f]
Commit list:
0954c2911d ofproto: Fix ipfix not always sampling on egress. (#2016346)
* Wed Feb 09 2022 Open vSwitch CI <ovs-ci@redhat.com> - 2.16.0-50
- Merging upstream branch-2.16 [RH git: c5ad7f71c5]
Commit list:
867e586b45 tc: Fix incorrect TC rule for decap+encap datapath flow.
* Tue Feb 08 2022 Open vSwitch CI <ovs-ci@redhat.com> - 2.16.0-49
- Merging upstream branch-2.16 [RH git: 4541c91b99]
Commit list:
418e6a0b8e dpif-netdev: fix vlan and ipv4 parsing in avx512
* Mon Feb 07 2022 Michael Santana <msantana@redhat.com> - 2.16.0-48
- Merging upstream branch-2.16 [RH git: 9d51785142]
Commit list:
1ec567a752 ci: Install wheel before installing any other python packages.
031a99cef0 odp-util: Fix tunnel key attr for GTP-U.
558699c73c ovsdb-idl: Only process successful txn in ovsdb_idl_loop_run.
* Wed Feb 02 2022 Open vSwitch CI <ovs-ci@redhat.com> - 2.16.0-47
- Merging upstream branch-2.16 [RH git: 6e6f66ffd0]
Commit list:
0276bdb30a ofproto-dpif-upcall: Fix n_revalidators on upcall show.
* Wed Feb 02 2022 Open vSwitch CI <ovs-ci@redhat.com> - 2.16.0-46
- Merging upstream branch-2.16 [RH git: 513117cbb0]
Commit list:
16575362dc acinclude: Detect avx512 vpopcntdq compiler support.
* Tue Feb 01 2022 Ilya Maximets <i.maximets@redhat.com> - 2.16.0-45
- ovsdb: transaction: Keep one entry in the transaction history. [RH git: 7665f42d12] (#2044621)
commit 6e13565dd32fb2cf5517f51ca06956e2052c4bba
Author: Ilya Maximets <i.maximets@ovn.org>
Date: Sun Dec 19 15:09:38 2021 +0100
ovsdb: transaction: Keep one entry in the transaction history.
If a single transaction exceeds the size of the whole database (e.g.,
a lot of rows got removed and new ones added), the transaction history
will be drained. This leads to sending UUID_ZERO to the clients as the
last transaction id in the next monitor update, because the monitor
doesn't know what the actual last transaction id was. In case of a
re-connect, that will cause re-downloading of the whole database, since
the client's last_id will be out of sync.
One solution would be to store the last transaction ID separately
from the actual transactions, but that would require careful
management in cases where the database gets reset and the history needs
to be cleared. Instead, keep the one last transaction to avoid
the problem. That should not be a big concern in terms of memory
consumption, because this last transaction will be removed from the
history once the next transaction appears. This is also not a concern
for a fast re-sync, because this last transaction will not be used
for the monitor reply: either the client already has it, so there is no
need to send it, or it's a history miss.
The test is updated to not check the number of atoms if there is only
one transaction in the history.
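The "keep one entry" rule can be pictured with a small C sketch. It is not the ovsdb implementation: `struct txn_history`, `history_add()` and the fixed-size array are hypothetical, and each transaction is reduced to its atom count. The only point it illustrates is that trimming stops at one entry, so the newest transaction id always survives even when that transaction alone is larger than the database.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define HISTORY_MAX 100   /* Hypothetical: the history depth of 100. */

/* Hypothetical history: oldest transaction at index 0, atom counts only. */
struct txn_history {
    size_t n_atoms[HISTORY_MAX];
    size_t n;             /* Number of transactions in the history. */
    size_t total_atoms;   /* Sum of n_atoms[0 .. n). */
};

/* Drops the oldest transaction. */
static void
history_pop_oldest(struct txn_history *h)
{
    assert(h->n > 0);
    h->total_atoms -= h->n_atoms[0];
    memmove(&h->n_atoms[0], &h->n_atoms[1],
            (h->n - 1) * sizeof h->n_atoms[0]);
    h->n--;
}

/* Appends a transaction, then trims the history so that it does not hold
 * more atoms than the database, but never below one entry: the newest
 * transaction is kept so its id can still be reported to clients. */
static void
history_add(struct txn_history *h, size_t txn_atoms, size_t db_atoms)
{
    if (h->n == HISTORY_MAX) {
        history_pop_oldest(h);
    }
    h->n_atoms[h->n++] = txn_atoms;
    h->total_atoms += txn_atoms;

    while (h->n > 1 && h->total_atoms > db_atoms) {
        history_pop_oldest(h);
    }
}
```

With a 100-atom database, adding a 500-atom transaction leaves exactly that one transaction in the history instead of draining it.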
Fixes: 317b1bfd7dd3 ("ovsdb: Don't let transaction history grow larger than the database.")
Acked-by: Mike Pattrick <mkp@redhat.com>
Acked-by: Han Zhou <hzhou@ovn.org>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Reported-at: https://bugzilla.redhat.com/2044621
Signed-off-by: Ilya Maximets <i.maximets@redhat.com>
* Mon Jan 31 2022 Open vSwitch CI <ovs-ci@redhat.com> - 2.16.0-44
- Merging upstream branch-2.16 [RH git: d202cd6da1]
Commit list:
34c830c540 ovsdb-idl: ovsdb_idl_loop_destroy must also destroy the committing txn.
13009736b2 ovsdb-cs: Clear last_id on reconnect if condition changes in-flight.
017e2ae50e ofp-flow: Skip flow reply if it exceeds the maximum message size.
e0c6f92a95 ovsdb-cs: Fix ignoring of the last id from the initial monitor reply. (#2044624)
* Fri Jan 28 2022 Ilya Maximets <i.maximets@redhat.com> - 2.16.0-43
- ovsdb: storage: Randomize should_snapshot checks when the minimum time passed. [RH git: abe61535ca] (#2044614)
commit 339f97044e3c2312fbb65b932fa14a181acf40d5
Author: Ilya Maximets <i.maximets@ovn.org>
Date: Mon Dec 13 16:43:33 2021 +0100
ovsdb: storage: Randomize should_snapshot checks when the minimum time passed.
Snapshots are scheduled every 10-20 minutes; the exact value is chosen
randomly in this interval for each server. Once the time is up, but the
maximum time (24 hours) has not been reached yet, ovsdb starts checking
on every iteration whether the log grew a lot. Once the growth is
detected, compaction is triggered.
OTOH, it's very common for an OVSDB cluster to not have the log growing
very fast. If the log didn't grow 2x in 20 minutes, the randomness of
the initial scheduled time is gone and all the servers are checking
whether they need to create a snapshot on every iteration. And since all
of them are part of the same cluster, their logs are growing at the same
speed. Once the critical mass is reached, all the servers will start
creating snapshots at the same time. If the database is big enough,
that might leave the cluster unresponsive for an extended period of
time (e.g., 10-15 seconds for the OVN_Southbound database in a
larger-scale OVN deployment) until the compaction completes.
Fix that by re-scheduling a quick retry if the minimal time has already
passed. Effectively, this works as a randomized 1-2 minute delay
between checks, so the servers will not synchronize.
The scheduling function is updated to not change the upper limit on
quick re-schedules, to avoid delaying the snapshot creation indefinitely.
Currently, quick re-schedules are only used for the error cases, and
there is always a 'slow' re-schedule after a successful compaction.
So, changing the scheduling function doesn't change the current
behavior much.
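The scheduling logic described above can be sketched in C as follows. This is an illustration under assumptions, not the ovsdb storage code: `struct snapshot_schedule`, `schedule_snapshot()` and `should_snapshot()` are hypothetical names, times are plain milliseconds, the "log grew 2x" test is passed in as a boolean, and `rand()` stands in for whatever random source the real code uses. The key property is that a quick re-schedule moves only `next`, never `limit`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

#define MIN_MS  (60LL * 1000)
#define HOUR_MS (60 * MIN_MS)

/* Hypothetical schedule state; times are in milliseconds. */
struct snapshot_schedule {
    long long next;    /* Earliest time to check the log growth again. */
    long long limit;   /* Time by which a snapshot is forced (24 hours). */
};

/* Returns a random value in [lo, hi]; modulo bias is ignored here. */
static long long
random_between(long long lo, long long hi)
{
    return lo + rand() % (hi - lo + 1);
}

static void
schedule_snapshot(struct snapshot_schedule *s, long long now, bool quick)
{
    if (quick) {
        /* Quick retry in 1-2 minutes; the upper limit is left untouched. */
        s->next = now + random_between(1 * MIN_MS, 2 * MIN_MS);
    } else {
        /* 'Slow' re-schedule, e.g. after a successful compaction. */
        s->next = now + random_between(10 * MIN_MS, 20 * MIN_MS);
        s->limit = now + 24 * HOUR_MS;
    }
}

/* Called on every iteration of the main loop. */
static bool
should_snapshot(struct snapshot_schedule *s, long long now, bool log_grew_2x)
{
    if (now >= s->limit) {
        return true;              /* Maximum time reached. */
    }
    if (now < s->next) {
        return false;             /* Minimum time not reached yet. */
    }
    if (log_grew_2x) {
        return true;
    }
    /* Minimum time passed but no growth: retry after a random short delay,
     * so servers of one cluster do not all check on every iteration. */
    schedule_snapshot(s, now, true);
    return false;
}
```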
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: Han Zhou <hzhou@ovn.org>
Acked-by: Dumitru Ceara <dceara@redhat.com>
Reported-at: https://bugzilla.redhat.com/2044614
Signed-off-by: Ilya Maximets <i.maximets@redhat.com>
* Fri Jan 28 2022 Ilya Maximets <i.maximets@redhat.com> - 2.16.0-42
- raft: Only allow followers to snapshot. [RH git: 915efc8c00] (#2044614)
commit bf07cc9cdb2f37fede8c0363937f1eb9f4cfd730
Author: Dumitru Ceara <dceara@redhat.com>
Date: Mon Dec 13 20:46:03 2021 +0100
raft: Only allow followers to snapshot.
Commit 3c2d6274bcee ("raft: Transfer leadership before creating
snapshots.") made it such that raft leaders transfer leadership before
snapshotting. However, there's still the case when the next
leader-to-be is in the process of snapshotting. To avoid delays in that
case too, we now explicitly allow snapshots only on followers. Cluster
members will have to wait until the current election is settled before
snapshotting.
Given the following logs taken from an OVN_Southbound 3-server cluster
during a scale test:
S1 (old leader):
19:07:51.226Z|raft|INFO|Transferring leadership to write a snapshot.
19:08:03.830Z|ovsdb|INFO|OVN_Southbound: Database compaction took 12601ms
19:08:03.940Z|raft|INFO|server 8b8d is leader for term 43
S2 (follower):
19:08:00.870Z|raft|INFO|server 8b8d is leader for term 43
S3 (new leader):
19:07:51.242Z|raft|INFO|received leadership transfer from f5c9 in term 42
19:07:51.244Z|raft|INFO|term 43: starting election
19:08:00.805Z|ovsdb|INFO|OVN_Southbound: Database compaction took 9559ms
19:08:00.869Z|raft|INFO|term 43: elected leader by 2+ of 3 servers
We see that the leader-to-be (S3) receives the leadership transfer,
initiates the election and immediately afterwards starts a snapshot that
takes ~9.5 seconds. During this time, S2 votes for S3, electing it
as cluster leader, but S3 doesn't effectively become leader until it
finishes snapshotting, essentially keeping the cluster without a
leader for up to ~9.5 seconds.
With the current change, S3 will delay compaction and snapshotting until
the election is finished.
The only exception is the case of single-node clusters for which we
allow the node to snapshot regardless of role.
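The resulting rule is small enough to state as a predicate. The sketch below is hypothetical (`enum raft_role` and `raft_may_snapshot()` are illustrative names, not the raft.c API): a candidate, such as S3 in the logs above while its election is still running, is not a follower and therefore has to wait.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical role enumeration for this sketch. */
enum raft_role { RAFT_FOLLOWER, RAFT_CANDIDATE, RAFT_LEADER };

/* Snapshots are allowed only on followers, except in a single-server
 * cluster, where the node may snapshot regardless of its role. */
static bool
raft_may_snapshot(enum raft_role role, size_t n_servers)
{
    return n_servers == 1 || role == RAFT_FOLLOWER;
}
```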
Acked-by: Han Zhou <hzhou@ovn.org>
Signed-off-by: Dumitru Ceara <dceara@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Reported-at: https://bugzilla.redhat.com/2044614
Signed-off-by: Ilya Maximets <i.maximets@redhat.com>
* Wed Jan 26 2022 Open vSwitch CI <ovs-ci@redhat.com> - 2.16.0-41
- Merging upstream branch-2.16 [RH git: f1ca7b8ac3]
Commit list:
2571b1a464 ofproto-dpif: Fix issue with non-reversible actions on a patch ports.
* Fri Jan 21 2022 Open vSwitch CI <ovs-ci@redhat.com> - 2.16.0-40
- Merging upstream branch-2.16 [RH git: 60b19f443c]
Commit list:
07a115f7d9 ovs-monitor-ipsec: Fix generated strongSwan ipsec.conf for IPv6.
* Thu Jan 20 2022 Open vSwitch CI <ovs-ci@redhat.com> - 2.16.0-39
- Merging upstream branch-2.16 [RH git: 349d687673]
Commit list:
f2ee013f73 datapath-windows: Pickup Ct tuple as CT lookup key in function OvsCtSetupLookupCtx
* Tue Jan 18 2022 Open vSwitch CI <ovs-ci@redhat.com> - 2.16.0-38
- Merging upstream branch-2.16 [RH git: e370e283cf]
Commit list:
bd8ebcd10c Documentation: Fix Rx/Tx queue configuration section.
* Mon Jan 17 2022 Open vSwitch CI <ovs-ci@redhat.com> - 2.16.0-37
- Merging upstream branch-2.16 [RH git: c9297f5ef7]
Commit list:
29936a853f ofproto-dpif: Fix memory leak in dpif/show-dp-features appctl.
* Thu Jan 13 2022 Open vSwitch CI <ovs-ci@redhat.com> - 2.16.0-36
- Merging upstream branch-2.16 [RH git: edae801e00]
Commit list:
ba7fffb832 dpif-netdev: Improve loading of packet data for undersized packets.
* Sat Dec 18 2021 Open vSwitch CI <ovs-ci@redhat.com> - 2.16.0-35
- Merging upstream branch-2.16 [RH git: 6ad0375ff5]
Commit list:
2595b7b3d1 Prepare for 2.16.3.
6caaae525c Set release date for 2.16.2.
443e3657d7 ofproto-dpif-xlate: Snoop ingress packets and update neigh cache if needed.
75d2ef9a60 tnl-neigh-cache: Do not refresh the entry while revalidating.
5d88836566 tnl-neigh-cache: Read/write expires atomically.
fb42c99c15 dpif-netdev: Improve handling of IP/TCP in avx512 mfex.
* Thu Dec 09 2021 Open vSwitch CI <ovs-ci@redhat.com> - 2.16.0-34
- Merging upstream branch-2.16 [RH git: 07b9bf085a]
Commit list:
f42c484445 compat: handle NF_REPEAT error on nf_conntrack_in.
* Mon Dec 06 2021 Open vSwitch CI <ovs-ci@redhat.com> - 2.16.0-33
- Merging upstream branch-2.16 [RH git: 8708b55152]
Commit list:
3e527f21cf flow: Consider dataofs when parsing TCP packets.
b537e049ad tests/flowgen: Fix packet data endianness.
35244b4980 ofproto: Fix resource usage explosion due to removal of large number of flows.
a201297639 ofproto: Fix resource usage explosion while processing bundled FLOW_MOD.
cd0133402c tests/flowgen: Fix length field of 802.2 data link header.
2d65b8ffd2 ovs-lib: Backup and remove existing DB when joining cluster.
ab01177637 docs/dpdk: Fix install doc.
38a2129524 ovs-save: Save igmp flows in ofp_parse syntax.
dc77857ce2 faq: Update OVS/DPDK version table for OVS 2.13/2.14.
* Thu Nov 18 2021 Open vSwitch CI <ovs-ci@redhat.com> - 2.16.0-32
- Merging upstream branch-2.16 [RH git: e90e06a818]
Commit list:
1d8e0f861f ofproto-dpif-xlate: Fix check_pkt_larger incomplete translation.
* Mon Nov 15 2021 Open vSwitch CI <ovs-ci@redhat.com> - 2.16.0-31
- Merging upstream branch-2.16 [RH git: 77a249d38b]
Commit list:
f8f2f7c9cb datapath-windows: Reset flow key after Ipv4 fragments are reassembled
* Wed Nov 10 2021 Timothy Redaelli <tredaelli@redhat.com> - 2.16.0-30
- python: Replace pyOpenSSL with ssl. [RH git: 0cd5867531] (#1988429)
Currently, pyOpenSSL is half-deprecated upstream and so it's removed on
some distributions (for example on CentOS Stream 9,
https://issues.redhat.com/browse/CS-336), but since OVS only
supports Python 3, it's possible to replace pyOpenSSL with the ssl
module ("import ssl") included in base Python 3.
Stream recv and send had to be split into _recv and _send, since SSLError
is a subclass of socket.error, and so it was not possible to catch
SSLWantReadError and SSLWantWriteError in the recv and send of SSLStream.
TCPStream._open cannot be used in SSLStream, since the Python ssl module
requires the SSL socket to be created before connecting it, so
SSLStream._open needs to create the socket, create the SSL socket and
then connect the SSL socket.
Reported-by: Timothy Redaelli <tredaelli@redhat.com>
Reported-at: https://bugzilla.redhat.com/1988429
Signed-off-by: Timothy Redaelli <tredaelli@redhat.com>
Acked-by: Terry Wilson <twilson@redhat.com>
Tested-by: Terry Wilson <twilson@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Signed-off-by: Timothy Redaelli <tredaelli@redhat.com>
* Wed Nov 10 2021 Timothy Redaelli <tredaelli@redhat.com> - 2.16.0-29
- python: socket-util: Split inet_open_active function and use connect_ex. [RH git: 2e704b371c]
In an upcoming patch, PyOpenSSL will be replaced with the Python ssl
module, but in order to do an async connection with the Python ssl
module, the ssl socket must be created when the socket is created, but
before the socket is connected.
So, the inet_open_active function is split into 3 parts:
- inet_create_socket_active: creates the socket and returns the family
and the socket, or (error, None) if some error needs to be returned.
- inet_connect_active: connects the socket and returns the errno (it
returns 0 if errno is EINPROGRESS or EWOULDBLOCK).
connect is replaced by connect_ex, since Python suggests using it for
asynchronous connects, and it's also cleaner since inet_connect_active
returns the errno that connect_ex already returns; moreover, due to a
Python limitation, connect cannot be used with the ssl module.
The inet_open_active function is changed to use the new functions
inet_create_socket_active and inet_connect_active.
Signed-off-by: Timothy Redaelli <tredaelli@redhat.com>
Acked-by: Terry Wilson <twilson@redhat.com>
Tested-by: Terry Wilson <twilson@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Signed-off-by: Timothy Redaelli <tredaelli@redhat.com>
* Wed Nov 10 2021 Timothy Redaelli <tredaelli@redhat.com> - 2.16.0-28
- redhat: remove mlx4 support [RH git: 4c846afd24] (#1998122)
Resolves: #1998122
* Tue Nov 09 2021 Ilya Maximets <i.maximets@redhat.com> - 2.16.0-27
- ovsdb: Don't let transaction history grow larger than the database. [RH git: 93d1fa0bdf] (#2012949)
commit 317b1bfd7dd315e241c158e6d4095002ff391ee3
Author: Ilya Maximets <i.maximets@ovn.org>
Date: Tue Sep 28 13:17:21 2021 +0200
ovsdb: Don't let transaction history grow larger than the database.
If a user frequently changes a lot of rows in a database, the
transaction history could grow way larger than the database itself.
This wastes a lot of memory and also makes monitor_cond_since slower
than the usual monitor_cond if the transaction id is old enough, because
re-constructing the changes from the history is slower than just
creating an initial database snapshot. This is also the case if a user
deleted a lot of data: the transaction history still holds all of it
while the database itself doesn't.
In the case of the current lb-per-service model in ovn-kubernetes, each
load-balancer is added to every logical switch/router. Such a
transaction touches more than half of an OVN_Northbound database.
And each of these transactions is added to the transaction history.
Since the transaction history depth is 100, in the worst case scenario
it will hold 100 copies of the database, increasing memory consumption
dramatically. In tests with 3000 LBs and 120 LSs, memory goes up
to 3 GB, while it holds at 30 MB if the transaction history is disabled
in the code.
Fix that by keeping count of the number of ovsdb_atoms in the
database and not allowing the total number of atoms in the transaction
history to grow larger than this value. Counting atoms is fairly
cheap because we don't need to iterate over them, so it doesn't have
a significant performance impact. It would be ideal to measure the
size of individual atoms, but that would hurt performance.
Counting cells instead of atoms is not sufficient, because OVN
users are adding hundreds or thousands of atoms to a single cell,
so cells differ greatly in size.
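Why counting atoms is cheap can be shown with a hypothetical C sketch: each cell already knows how many atoms it holds, so a row's atom count is a sum of per-cell counters and the atoms themselves are never visited. `struct cell`, `struct db_stats` and the helper functions are illustrative assumptions, not ovsdb data structures.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical cell: only its stored atom count matters here. */
struct cell {
    size_t n_atoms;
};

struct row {
    const struct cell *cells;
    size_t n_cells;
};

/* Sums per-cell counters; no atom is touched, so this is O(n_cells). */
static size_t
row_count_atoms(const struct row *row)
{
    size_t n = 0;
    for (size_t i = 0; i < row->n_cells; i++) {
        n += row->cells[i].n_atoms;
    }
    return n;
}

/* Database-wide atom counter, updated as rows are added and removed. */
struct db_stats {
    size_t n_atoms;
};

static void
db_add_row(struct db_stats *db, const struct row *row)
{
    db->n_atoms += row_count_atoms(row);
}

static void
db_remove_row(struct db_stats *db, const struct row *row)
{
    size_t n = row_count_atoms(row);
    assert(db->n_atoms >= n);
    db->n_atoms -= n;
}

/* The history must not hold more atoms than the database itself. */
static int
history_too_large(size_t history_atoms, const struct db_stats *db)
{
    return history_atoms > db->n_atoms;
}
```

A row with one 1000-atom cell weighs 1000 here, while a cell count would treat it the same as a row with a single 1-atom cell, which is why cells are not a good enough measure.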
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: Han Zhou <hzhou@ovn.org>
Acked-by: Dumitru Ceara <dceara@redhat.com>
Reported-at: https://bugzilla.redhat.com/2012949
Signed-off-by: Ilya Maximets <i.maximets@redhat.com>
* Tue Nov 09 2021 Ilya Maximets <i.maximets@redhat.com> - 2.16.0-26
- ovsdb: transaction: Incremental reassessment of weak refs. [RH git: e8a363db49] (#2005958)
commit 4dbff9f0a68579241ac1a040726be3906afb8fe9
Author: Ilya Maximets <i.maximets@ovn.org>
Date: Sat Oct 16 03:20:23 2021 +0200
ovsdb: transaction: Incremental reassessment of weak refs.
The main idea is to not store the list of weak references in the source
row, so they don't all need to be re-checked/updated on every
modification of that source row. The point is that the source row
already knows the UUIDs of all destination rows stored in the data, so
there is not much profit in storing this information somewhere else. If
needed, the destination row can be looked up, and the reference can be
looked up in the destination row. For fast lookup, the destination row
now stores references in a hash map.
The weak reference structure now contains the table and uuid of a source
row instead of a direct pointer. This allows replacing/updating the
source row without breaking any weak references stored in destination
rows.
The structure also now contains the key-value pair of atoms that
triggered the creation of this reference. These atoms can be used to
quickly subtract removed references from a source row. During
reassessment, ovsdb now only needs to care about newly added or removed
atoms, and atoms that got removed due to removal of the destination
rows, but these are marked for reassessment by the destination row.
ovsdb_datum_subtract() is used to remove atoms that point to removed
or incorrect rows, so there is no need to re-sort the datum in the end.
Results of an OVN load-balancer benchmark that adds 3K load-balancers
to each of 120 logical switches and 120 logical routers in the OVN
sandbox with clustered Northbound database and then removes them:
Before:
  %CPU   CPU Time   CMD
  86.8   00:16:05   ovsdb-server nb1.db
  44.1   00:08:11   ovsdb-server nb2.db
  43.2   00:08:00   ovsdb-server nb3.db
After:
  %CPU   CPU Time   CMD
  54.9   00:02:58   ovsdb-server nb1.db
  33.3   00:01:48   ovsdb-server nb2.db
  32.2   00:01:44   ovsdb-server nb3.db
So, on the cluster leader the processing time dropped by 5.4x, and on
followers by 4.5x. The more load-balancers, the larger the performance
difference. There is a slight increase in memory usage, because the new
reference structure is larger, but the difference is not significant.
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: Dumitru Ceara <dceara@redhat.com>
Reported-at: https://bugzilla.redhat.com/2005958
Signed-off-by: Ilya Maximets <i.maximets@redhat.com>
* Thu Oct 28 2021 Open vSwitch CI <ovs-ci@redhat.com> - 2.16.0-25
- Merging upstream branch-2.16 [RH git: f5366890c5]
Commit list:
c221c8e613 datapath-windows:Reset PseudoChecksum value only for TX direction offload case
* Wed Oct 27 2021 Open vSwitch CI <ovs-ci@redhat.com> - 2.16.0-24
- Merging upstream branch-2.16 [RH git: 4682b76694]
Commit list:
b79f0369f2 ci: Make linux-prepare trust system installs.
* Mon Oct 25 2021 Open vSwitch CI <ovs-ci@redhat.com> - 2.16.0-23
- Merging upstream branch-2.16 [RH git: cce913794e]
Commit list:
2a4c87f300 Prepare for 2.16.2.
aaa1439b8e Set release date for 2.16.1.
* Thu Oct 21 2021 Open vSwitch CI <ovs-ci@redhat.com> - 2.16.0-22
- Merging upstream branch-2.16 [RH git: 29f01c4fdb]
Commit list:
108176ab5a github: Stick to python 3.9.
* Tue Oct 19 2021 Open vSwitch CI <ovs-ci@redhat.com> - 2.16.0-21
- Merging upstream branch-2.16 [RH git: 2546fa9646]
Commit list:
5c5e34603b datapath-windows: add layers when adding the deferred actions
* Thu Oct 14 2021 Open vSwitch CI <ovs-ci@redhat.com> - 2.16.0-20
- Merging upstream branch-2.16 [RH git: d572c95f69]
Commit list:
458a4f75f3 ofproto-dpif-xlate: Fix zone set from non-frozen-metadata fields.
* Wed Oct 13 2021 Open vSwitch CI <ovs-ci@redhat.com> - 2.16.0-19
- Merging upstream branch-2.16 [RH git: 557ca689f7]
Commit list:
6d8190584a dpif-netdev: Fix use-after-free on PACKET_OUT of IP fragments.
44a66cc1d0 tunnel-push-pop.at: Mask source port in tunnel header.
* Tue Oct 12 2021 Open vSwitch CI <ovs-ci@redhat.com> - 2.16.0-18
- Merging upstream branch-2.16 [RH git: a6c4770398]
Commit list:
27a5848a33 ovs-ctl: Add missing description for --ovs-vswitchd-options and --ovsdb-server-options to usage().
0300d0c0c2 dpdk-stub: Change the ERR log to DBG.
cdd6dd821d dpif-netlink: Fix feature negotiation for older kernels.
c2682c42cb dpif-netdev: Fix pmd thread comments to include SMC.
9377f4a465 python: idl: Avoid sending transactions when the DB is not synced up.
* Tue Oct 12 2021 Open vSwitch CI <ovs-ci@redhat.com> - 2.16.0-17
- Merging upstream branch-2.16 [RH git: c1145b5236]
Commit list:
0fd17fbb09 ipf: release unhandled packets from the batch
* Thu Sep 30 2021 Open vSwitch CI <ovs-ci@redhat.com> - 2.16.0-16
- Merging upstream branch-2.16 [RH git: 5c05133179]
Commit list:
3f692fba98 datapath-windows:adjust Offset when processing packet in POP_VLAN action
* Wed Sep 29 2021 Dumitru Ceara <dceara@redhat.com> - 2.16.0-15
- ovsdb-data: Deduplicate string atoms. [RH git: 24e7d1140e] (#2006839)
commit 429b114c5aadee24ccfb16ad7d824f45cdcea75a
Author: Ilya Maximets <i.maximets@ovn.org>
Date: Wed Sep 22 09:28:50 2021 +0200
ovsdb-server spends a lot of time cloning atoms for various reasons,
e.g., to create a diff of two rows or to clone a row to the transaction.
All atoms, except for strings, contain a simple value that can be
copied efficiently, but duplicating strings every time has a
significant performance impact.
Introducing a new reference-counted structure 'ovsdb_atom_string'
that allows not copying strings every time, but just increasing a
reference counter.
This change allows increasing transaction throughput in benchmarks
(i.e., the number of transactions that ovsdb-server can handle per
second) up to 2x for standalone databases and 3x for clustered
databases.
It also noticeably reduces memory consumption of ovsdb-server.
Next step will be to consolidate this structure with json strings,
so we will not need to duplicate strings while converting database
objects to json and back.
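The reference-counting idea can be sketched in C as below. The struct and function names are hypothetical and deliberately differ from the real 'ovsdb_atom_string'; the sketch only shows that "cloning" a string atom becomes a counter increment and that the characters are freed when the last reference is dropped.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical reference-counted string (illustrative layout only). */
struct rc_string {
    size_t n_refs;
    char *s;
};

/* Creates a string with a single reference; returns NULL on failure. */
static struct rc_string *
rc_string_create(const char *s)
{
    size_t len = strlen(s);
    struct rc_string *rs = malloc(sizeof *rs);
    if (!rs) {
        return NULL;
    }
    rs->s = malloc(len + 1);
    if (!rs->s) {
        free(rs);
        return NULL;
    }
    memcpy(rs->s, s, len + 1);
    rs->n_refs = 1;
    return rs;
}

/* "Cloning" is a counter increment: no characters are copied. */
static struct rc_string *
rc_string_ref(struct rc_string *rs)
{
    rs->n_refs++;
    return rs;
}

/* Drops one reference and frees the string when the last one goes away. */
static void
rc_string_unref(struct rc_string *rs)
{
    if (rs) {
        assert(rs->n_refs > 0);
        if (--rs->n_refs == 0) {
            free(rs->s);
            free(rs);
        }
    }
}
```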
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: Dumitru Ceara <dceara@redhat.com>
Acked-by: Mark D. Gray <mark.d.gray@redhat.com>
Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=2006839
Signed-off-by: Dumitru Ceara <dceara@redhat.com>
* Wed Sep 29 2021 Dumitru Ceara <dceara@redhat.com> - 2.16.0-14
- ovsdb-data: Add function to apply diff in-place. [RH git: df0e4bda98] (#2006851)
commit 32b51326ef9c307b4acd0bacafb0218dd1372f3d
Author: Ilya Maximets <i.maximets@ovn.org>
Date: Thu Sep 23 01:47:24 2021 +0200
ovsdb_datum_apply_diff() is heavily used in ovsdb transactions, but
it's linear in terms of the number of comparisons. It also clones
all the atoms along the way. In most cases, the size of a diff is much
smaller than the size of the original datum, which allows performing
the same operation in-place with only O(diff->n * log2(old->n))
comparisons and O(old->n + diff->n) memory copies with memcpy.
Using this function while applying diffs read from the storage gives
a significant performance boost and allows executing many more
transactions per second.
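A C sketch of how a set diff can be applied with O(diff->n * log2(old->n)) comparisons and bulk memcpy. Assumptions, all labeled: ints stand in for atoms; both arrays are sorted and duplicate-free; every atom in the diff toggles membership (removed if present, added if absent); and the result is written to a caller-provided scratch buffer with room for n_old + n_diff elements, which a caller would then swap with the original array. `set_apply_diff()` and `lower_bound()` are hypothetical helpers, and the real in-place function may be organized differently.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Index of the first element in a[lo, hi) that is >= key. */
static size_t
lower_bound(const int *a, size_t lo, size_t hi, int key)
{
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (a[mid] < key) {
            lo = mid + 1;
        } else {
            hi = mid;
        }
    }
    return lo;
}

/* Applies 'diff' to 'old' and writes the result to 'result'.
 * Returns the number of elements in 'result'. */
static size_t
set_apply_diff(const int *old, size_t n_old,
               const int *diff, size_t n_diff, int *result)
{
    size_t copied = 0;   /* Elements of 'old' already handled. */
    size_t n = 0;        /* Elements written to 'result'. */

    for (size_t i = 0; i < n_diff; i++) {
        /* 'diff' is sorted, so the search can start at 'copied'. */
        size_t pos = lower_bound(old, copied, n_old, diff[i]);

        /* Bulk-copy the untouched run old[copied, pos). */
        memcpy(&result[n], &old[copied], (pos - copied) * sizeof *old);
        n += pos - copied;
        copied = pos;

        if (pos < n_old && old[pos] == diff[i]) {
            copied++;                 /* Present in 'old': remove it. */
        } else {
            result[n++] = diff[i];    /* Absent from 'old': add it. */
        }
    }
    /* Copy the tail of 'old' in one go. */
    memcpy(&result[n], &old[copied], (n_old - copied) * sizeof *old);
    return n + (n_old - copied);
}
```

Applying the diff {0, 3, 6, 9} to {1, 3, 5, 7} removes 3 and adds 0, 6 and 9, giving {0, 1, 5, 6, 7, 9}.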
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: Mark D. Gray <mark.d.gray@redhat.com>
Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=2006851
Signed-off-by: Dumitru Ceara <dceara@redhat.com>
* Wed Sep 29 2021 Dumitru Ceara <dceara@redhat.com> - 2.16.0-13
- ovsdb-data: Optimize subtraction of sets. [RH git: 5bace82405] (#2005483)
commit bb12b63176389e516ddfefce20dfa165f24430fb
Author: Ilya Maximets <i.maximets@ovn.org>
Date: Thu Sep 23 01:47:23 2021 +0200
Current algorithm for ovsdb_datum_subtract looks like this:
  for-each atom in a:
      if atom in b:
          swap(atom, <last atom in 'a'>)
          destroy(atom)
  quicksort(a)
Complexity:
  Na * log2(Nb)  +  (Na - Nb) * log2(Na - Nb)
     Search          Comparisons for quicksort
It's not optimal, especially because Nb << Na in the vast majority of
cases.
Reverse the search phase to look up atoms from 'b' in 'a', and close
the gaps left by deleted elements in 'a' with a plain memory copy to
avoid the quicksort.
Resulting complexity:
  Nb * log2(Na)  +  (Na - Nb)
     Search         Memory copies
Subtraction is heavily used while executing database transactions,
for example, to remove one port from a logical switch in OVN.
The complexity of such an operation, if the original logical switch had
100 ports, goes down from
  100 * 1        = 100 comparisons for search (one per lookup in a
                   single-element 'b') and
  99 * log2(99)  = 656 comparisons for quicksort
                   -------------------------------
                   756 comparisons in total
to only
  1 * log2(100)  =   7 comparisons for search
  + memory copy of 99 * sizeof (union ovsdb_atom) bytes.
We could use memmove to close the gaps after removing atoms, but
it would lead to 2 memory copies inside the call, while we can perform
only one, into the temporary 'result', and swap pointers.
Performance in cases where the sizes of 'a' and 'b' are comparable
should not change. Cases with Nb >> Na should not happen in practice.
All in all, this change allows ovsdb-server to perform several times
more transactions that remove elements from sets per second.
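The reversed subtraction can be sketched in C on sorted, duplicate-free int arrays standing in for atoms. `set_subtract()` and `lower_bound()` are hypothetical helpers, not ovsdb functions; the caller supplies a 'result' buffer with room for na elements and would swap it with 'a' afterwards, mirroring the single copy into a temporary 'result' mentioned in the text.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Index of the first element in a[lo, hi) that is >= key. */
static size_t
lower_bound(const int *a, size_t lo, size_t hi, int key)
{
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (a[mid] < key) {
            lo = mid + 1;
        } else {
            hi = mid;
        }
    }
    return lo;
}

/* Writes a - b into 'result'; returns the number of elements written.
 * Costs about nb * log2(na) comparisons plus bulk copies of the kept runs. */
static size_t
set_subtract(const int *a, size_t na, const int *b, size_t nb, int *result)
{
    size_t copied = 0;   /* Elements of 'a' already handled. */
    size_t n = 0;        /* Elements written to 'result'. */

    for (size_t i = 0; i < nb; i++) {
        size_t pos = lower_bound(a, copied, na, b[i]);
        if (pos < na && a[pos] == b[i]) {
            /* Keep the run before the removed atom, then skip the atom. */
            size_t len = pos - copied;
            memcpy(&result[n], &a[copied], len * sizeof *a);
            n += len;
            copied = pos + 1;
        }
    }
    size_t tail = na - copied;
    memcpy(&result[n], &a[copied], tail * sizeof *a);
    return n + tail;
}
```

Subtracting {2, 4, 9} from {1, 2, 3, 4, 5} yields {1, 3, 5}; the atom 9 is simply not found and costs only its binary search.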
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: Han Zhou <hzhou@ovn.org>
Acked-by: Mark D. Gray <mark.d.gray@redhat.com>
Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=2005483
Signed-off-by: Dumitru Ceara <dceara@redhat.com>
* Wed Sep 29 2021 Dumitru Ceara <dceara@redhat.com> - 2.16.0-12
- ovsdb-data: Optimize union of sets. [RH git: e2a4c7d794] (#2005483)
commit 51946d22274cd591dc061358fb507056fbd91420
Author: Ilya Maximets <i.maximets@ovn.org>
Date: Thu Sep 23 01:47:22 2021 +0200
Current algorithm of ovsdb_datum_union looks like this:
  for-each atom in b:
      if not bin_search(a, atom):
          push(a, clone(atom))
  quicksort(a)
So, the complexity looks like this:
  Nb * log2(Na)  +  Nb      +  (Na + Nb) * log2(Na + Nb)
  Comparisons       clones     Comparisons for quicksort
  for search
ovsdb_datum_union() is heavily used in database transactions when a
new element is added to a set, for example, if a new logical switch
port is added to a logical switch in OVN. This is a very common
use case where the CMS adds one new port to an existing switch that
already has, let's say, 100 ports. For this case ovsdb-server will
have to perform:
  1 * log2(100)  +  1 clone  +  101 * log2(101)
  Comparisons                   Comparisons for
  for search                    quicksort
  ~7                1           ~707 (rounding log2(101) up to 7)
Roughly 714 comparisons of atoms and 1 clone.
Since binary search can give us the position where the new atom should
go (it's the 'low' index after the search completes) for free, the
logic can be re-worked like this (a[ x : y ] denotes the half-open
range a[x] .. a[y - 1]):
  copied = 0
  for-each atom in b:
      desired_position = bin_search(a, atom)
      push(result, a[ copied : desired_position ])
      copied = desired_position
      if desired_position == Na or a[desired_position] != atom:
          push(result, clone(atom))
  push(result, a[ copied : Na ])
  swap(a, result)
Complexity of this schema:
  Nb * log2(Na)  +  Nb      +  Na
  Comparisons       clones     memory copy on push
  for search
'swap' is just a swap of a few pointers. 'push' is not a 'clone',
but a simple memory copy of 'union ovsdb_atom'.
In general, this schema substitutes the complexity of a quicksort
with the complexity of a memory copy of Na atom structures, where we're
not even copying the strings that these atoms point to.
Complexity in the example above goes down from 714 comparisons
to 7 comparisons and a memcpy of 100 * sizeof (union ovsdb_atom) bytes.
The general complexity of a memory copy should always be lower than
the complexity of a quicksort, especially because these copies are
usually performed in bulk, so this new schema should work faster for
any input.
All in all, this change allows executing several times more
transactions per second for transactions that add new entries to sets.
Alternatively, union could be implemented as a linear merge of two
sorted arrays, but this would result in O(Na) comparisons, which
is more than Nb * log2(Na) in the common case, since Na is usually
far bigger than Nb. A linear merge would also mean per-atom memory
copies instead of copying in bulk.
The 'replace' functionality of ovsdb_datum_union() had no users, so it
is just removed. But it can easily be added back if needed in the future.
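The reworked union schema translates into C roughly as follows. Assumptions: ints stand in for 'union ovsdb_atom', so 'clone' is a plain assignment; both inputs are sorted and duplicate-free; `set_union()` and `lower_bound()` are hypothetical helpers; and the caller provides a 'result' buffer with room for na + nb elements, to be swapped with 'a' afterwards. Because 'b' is sorted, each binary search starts at 'copied' rather than at 0.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Index of the first element in a[lo, hi) that is >= key. */
static size_t
lower_bound(const int *a, size_t lo, size_t hi, int key)
{
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (a[mid] < key) {
            lo = mid + 1;
        } else {
            hi = mid;
        }
    }
    return lo;
}

/* Writes a | b into 'result'; returns the number of elements written. */
static size_t
set_union(const int *a, size_t na, const int *b, size_t nb, int *result)
{
    size_t copied = 0;   /* Elements of 'a' already copied. */
    size_t n = 0;        /* Elements written to 'result'. */

    for (size_t i = 0; i < nb; i++) {
        size_t pos = lower_bound(a, copied, na, b[i]);

        /* push(result, a[copied : pos]) as one bulk copy. */
        size_t len = pos - copied;
        memcpy(&result[n], &a[copied], len * sizeof *a);
        n += len;
        copied = pos;

        /* Insert b[i] only if it is not already in 'a'. */
        if (pos == na || a[pos] != b[i]) {
            result[n++] = b[i];
        }
    }
    /* push(result, a[copied : na]) */
    size_t tail = na - copied;
    memcpy(&result[n], &a[copied], tail * sizeof *a);
    return n + tail;
}
```

For a = {2, 4, 6} and b = {1, 4, 7}, the result is {1, 2, 4, 6, 7}: the duplicate 4 is detected by the same binary search that finds its position.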
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: Han Zhou <hzhou@ovn.org>
Acked-by: Mark D. Gray <mark.d.gray@redhat.com>
Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=2005483
Signed-off-by: Dumitru Ceara <dceara@redhat.com>
* Wed Sep 29 2021 Dumitru Ceara <dceara@redhat.com> - 2.16.0-11
- ovsdb: transaction: Use diffs for strong reference counting. [RH git: 85da133eaa] (#2003203)
commit b2712d026eae2d9a5150c2805310eaf506e1f162
Author: Ilya Maximets <i.maximets@ovn.org>
Date: Tue Sep 14 00:19:57 2021 +0200
Currently, even if only one reference is added to or removed from the
set of strong references, ovsdb-server will walk through the whole set
and re-count references to other rows. These referenced rows will also
be added to the transaction in order to re-count their references.
For example, every time a Logical Switch Port is added to a Logical
Switch, the OVN Northbound database server will walk through all ports of
this Logical Switch, clone their rows, and re-count references. This is
not very efficient. Instead, it can just increase reference counters
for added references and decrease them for removed ones. In many cases
only one row in the Logical_Switch_Port table will be affected.
Introducing a new function that generates a diff of two datum objects,
but stores added and removed atoms separately, so they can be used
to increase or decrease row reference counters accordingly.
This change allows performing several times more transactions that
add or remove strong references to/from sets per second, because
ovsdb-server no longer clones and re-counts rows that are irrelevant
to the current transaction.
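The "added/removed diff, then adjust counters" approach can be sketched in C. This is a hypothetical illustration: row references are small ints that index directly into a `refcount` array, the two sets are sorted and duplicate-free, and the diff is computed with a simple linear merge chosen for clarity, which may not be how the actual ovsdb function works. Only rows whose references changed have their counters touched.

```c
#include <assert.h>
#include <stddef.h>

/* Splits the difference between two sorted, duplicate-free sets into
 * atoms only in 'new_set' (added) and only in 'old_set' (removed). */
static void
set_diff_split(const int *old_set, size_t n_old,
               const int *new_set, size_t n_new,
               int *added, size_t *n_added,
               int *removed, size_t *n_removed)
{
    size_t i = 0, j = 0;

    *n_added = *n_removed = 0;
    while (i < n_old || j < n_new) {
        if (j == n_new || (i < n_old && old_set[i] < new_set[j])) {
            removed[(*n_removed)++] = old_set[i++];
        } else if (i == n_old || new_set[j] < old_set[i]) {
            added[(*n_added)++] = new_set[j++];
        } else {
            i++;   /* Present in both: reference unchanged. */
            j++;
        }
    }
}

/* Adjusts reference counters only for rows that gained or lost a
 * reference; other referenced rows are left alone. */
static void
update_refcounts(unsigned *refcount,
                 const int *added, size_t n_added,
                 const int *removed, size_t n_removed)
{
    for (size_t k = 0; k < n_added; k++) {
        refcount[added[k]]++;
    }
    for (size_t k = 0; k < n_removed; k++) {
        assert(refcount[removed[k]] > 0);
        refcount[removed[k]]--;
    }
}
```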
Acked-by: Dumitru Ceara <dceara@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=2003203
Signed-off-by: Dumitru Ceara <dceara@redhat.com>
* Mon Sep 27 2021 Open vSwitch CI <ovs-ci@redhat.com> - 2.16.0-10
- Merging upstream branch-2.16 [RH git: 2114714012]
Commit list:
547371ecdb cirrus: Reduce memory requirements for FreeBSD VMs.
* Thu Sep 23 2021 Timothy Redaelli <tredaelli@redhat.com> - 2.16.0-9
- redhat: use hugetlbfs group for /var/log/openvswitch when dpdk is enabled [RH git: 4e5928b671] (#2004543)
Resolves: #2004543
* Thu Sep 16 2021 Open vSwitch CI <ovs-ci@redhat.com> - 2.16.0-8
- Merging upstream branch-2.16 [RH git: 7332b410fc]
Commit list:
facaf5bc71 netdev-linux: Fix a null pointer dereference in netdev_linux_notify_sock().
6e203d4873 pcap-file: Fix memory leak in ovs_pcap_open().
f50da0b267 odp-util: Fix a null pointer dereference in odp_flow_format().
7da752e43f odp-util: Fix a null pointer dereference in odp_nsh_key_from_attr__().
bc22b01459 netdev-dpdk: Fix RSS configuration for virtio.
81706c5d43 ipf: Fix only nat the first fragment in the reass process.
* Wed Sep 08 2021 Open vSwitch CI <ovs-ci@redhat.com> - 2.16.0-7
- Merging upstream branch-2.16 [RH git: e71f31dfd6]
Commit list:
242c280f0e dpif-netdev: Fix crash when PACKET_OUT is metered.
* Tue Aug 31 2021 Ilya Maximets <i.maximets@redhat.com> - 2.16.0-6
- ovsdb: monitor: Store serialized json in a json cache. [RH git: bc20330c85] (#1996152)
commit 43e66fc27659af2a5c976bdd27fe747b442b5554
Author: Ilya Maximets <i.maximets@ovn.org>
Date: Tue Aug 24 21:00:39 2021 +0200
The same json from the json cache is typically sent to all the clients,
e.g., in the case of an OVN deployment with ovn-monitor-all=true.
There could be hundreds or thousands of connected clients, and ovsdb
will serialize the same json object for each of them before sending.
Serialize it once before storing it into the json cache to speed up
processing.
This change saves a lot of CPU cycles and a bit of memory,
since we need to store in memory only a string and not the full json
object.
Testing with ovn-heater on 120 nodes using the density-heavy scenario
shows a reduction of the total CPU time used by Southbound DB processes
from 256 minutes to 147. The duration of unreasonably long poll intervals
also dropped dramatically, from 7 to 2 seconds (durations in ms):
          Count   Min    Max    Median   Mean     95 percentile
  ---------------------------------------------------------------
  Before  1934    1012   7480   4302.5   4875.3   7034.3
  After   1909    1004   2730   1453.0   1532.5   2053.6
Acked-by: Dumitru Ceara <dceara@redhat.com>
Acked-by: Han Zhou <hzhou@ovn.org>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=1996152
Signed-off-by: Ilya Maximets <i.maximets@redhat.com>
* Tue Aug 31 2021 Ilya Maximets <i.maximets@redhat.com> - 2.16.0-5
- raft: Don't keep full json objects in memory if no longer needed. [RH git: 4606423e8b] (#1990058)
commit 0de882954032aa37dc943bafd72c33324aa0c95a
Author: Ilya Maximets <i.maximets@ovn.org>
Date: Tue Aug 24 21:00:38 2021 +0200
raft: Don't keep full json objects in memory if no longer needed.
Raft log entries (and the raft database snapshot) contain json objects
of the data. A follower receives append requests with data that gets
parsed and added to the raft log. The leader receives execution
requests, parses data out of them and adds it to the log. In both
cases, ovsdb-server later reads the log with ovsdb_storage_read(),
constructs a transaction and updates the database. On followers, these
json objects are, in the common case, never used again. The leader may
use them to send append requests or snapshot installation requests to
followers. However, all these operations (except for
ovsdb_storage_read()) just serialize the json in order to send it over
the network.
Json objects are significantly larger than their serialized string
representation. For example, the snapshot of the database from one of
the ovn-heater scale tests takes 270 MB as a string, but 1.6 GB as
a json object, out of the total 3.8 GB consumed by the ovsdb-server
process.
ovsdb_storage_read() for a given raft entry happens only once in its
lifetime, so after this call we can serialize the json object, store
the string representation and free the actual json object that ovsdb
will never need again. This can save a lot of memory and can also
save serialization time, because each raft entry for append requests
and snapshot installation requests is serialized only once, instead of
every time such a request needs to be sent.
JSON_SERIALIZED_OBJECT can be used in order to seamlessly integrate
pre-serialized data into raft_header and similar json objects.
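The entry lifecycle described above can be sketched as follows. The parsed object is kept only until the single read, and after that only the serialized string remains for append and snapshot installation requests. `struct fake_json`, `raft_entry_read()` and `raft_entry_get_serialized()` are hypothetical names for illustration only. This is not the actual raft code.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a parsed json tree. */
struct fake_json {
    char key[16];
    int value;
};

/* Hypothetical raft log entry: parsed data until it is read once,
 * afterwards only the compact serialized string. */
struct raft_entry {
    struct fake_json *json;
    char *serialized;
};

static struct fake_json *fake_json_create(const char *key, int value)
{
    struct fake_json *j = calloc(1, sizeof *j);
    if (j) {
        snprintf(j->key, sizeof j->key, "%s", key);
        j->value = value;
    }
    return j;
}

static char *fake_json_to_string(const struct fake_json *j)
{
    int len = snprintf(NULL, 0, "{\"%s\":%d}", j->key, j->value);
    char *s = malloc((size_t) len + 1);
    if (s) {
        snprintf(s, (size_t) len + 1, "{\"%s\":%d}", j->key, j->value);
    }
    return s;
}

/* Mirrors the once-per-lifetime read: serializes the entry, hands the data
 * to the caller and frees the parsed object, which is never needed again. */
static int raft_entry_read(struct raft_entry *e, int *value)
{
    if (!e->json) {
        return -1;                      /* Already read. */
    }
    if (!e->serialized) {
        e->serialized = fake_json_to_string(e->json);
        if (!e->serialized) {
            return -1;                  /* Keep the json; nothing is lost. */
        }
    }
    *value = e->json->value;
    free(e->json);
    e->json = NULL;
    return 0;
}

/* Used when building append or snapshot requests: the entry is serialized
 * at most once, no matter how many followers it is sent to. */
static const char *raft_entry_get_serialized(struct raft_entry *e)
{
    if (!e->serialized && e->json) {
        e->serialized = fake_json_to_string(e->json);
    }
    return e->serialized;
}
```

After raft_entry_read() succeeds, `e.json` is NULL and only the string is kept. Later requests reuse that string instead of re-serializing.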
One major special case is the creation of a database snapshot.
A snapshot installation request received over the network will be
parsed and read by ovsdb-server just like any other raft log entry.
However, snapshots created locally with raft_store_snapshot() will
never be read back, because they reflect the current state of the
database and are hence already applied. For this case, we can free
the json object right after writing the snapshot to disk.
Tests performed with ovn-heater on the 60-node density-light scenario,
where the on-disk database grows up to 97 MB, show that the average
memory consumption of ovsdb-server Southbound DB processes decreased
by 58% (from 602 MB to 256 MB per process) and peak memory consumption
decreased by 40% (from 1288 MB to 771 MB).
A test with 120 nodes on the density-heavy scenario, with a 270 MB
on-disk database, shows a 1.5 GB decrease in memory consumption, as
expected. Also, the total CPU time consumed by the Southbound DB
process dropped from 296 to 256 minutes, and the number of
unreasonably long poll intervals dropped from 2896 to 1934.
Deserialization is also implemented just in case. I didn't see this
function being invoked in practice.
Acked-by: Dumitru Ceara <dceara@redhat.com>
Acked-by: Han Zhou <hzhou@ovn.org>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=1990058
Signed-off-by: Ilya Maximets <i.maximets@redhat.com>
* Tue Aug 31 2021 Ilya Maximets <i.maximets@redhat.com> - 2.16.0-4
- json: Add support for partially serialized json objects. [RH git: 885e5ce1b5] (#1990058)
commit b0bca6f27aae845c3ca8b48d66a7dbd3d978162a
Author: Ilya Maximets <i.maximets@ovn.org>
Date: Tue Aug 24 21:00:37 2021 +0200
json: Add support for partially serialized json objects.
Introduce a new json type, JSON_SERIALIZED_OBJECT. It is not an
actual type that can be seen in a json message on the wire, but an
internal type intended to hold a serialized version of some other
json object. For this reason, it is defined after JSON_N_TYPES, so as
not to confuse parsers and other parts of the code that rely on
compliance with RFC 4627.
With this JSON type internal users may construct large JSON objects,
parts of which are already serialized. This way, while serializing
the larger object, data from JSON_SERIALIZED_OBJECT can be added
directly to the result, without additional processing.
This will be used by the next commits to add pre-serialized JSON data
to the raft_header structure, which can be converted to JSON before
writing the file transaction to disk or sending it to other servers.
The same technique can also be used to pre-serialize the json_cache
for ovsdb monitors. This should avoid serializing the same data for
every client and save some more memory.
Since serialized JSON is just a string, the 'json->string' pointer is
reused for it.
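The following self-contained sketch shows how an internal-only type can sit after the N_TYPES sentinel and be copied verbatim during serialization. It uses a hypothetical `mini_json` model instead of the OVS `struct json`, and a hypothetical growable buffer `buf` instead of the OVS dynamic string. As the commit describes for 'json->string', the `string` field holds the serialized text.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical mini json model. The internal-only type is placed after the
 * N_TYPES sentinel, so code that iterates over wire types never sees it. */
enum mini_json_type {
    MJ_INTEGER,
    MJ_ARRAY,
    MJ_N_TYPES,                 /* Types that can appear on the wire end here. */
    MJ_SERIALIZED_OBJECT        /* Internal: holds already-serialized text. */
};

struct mini_json {
    enum mini_json_type type;
    long long integer;          /* MJ_INTEGER. */
    const char *string;         /* MJ_SERIALIZED_OBJECT: the serialized text. */
    struct mini_json **elems;   /* MJ_ARRAY. */
    size_t n;
};

/* Minimal growable string buffer (stand-in for a dynamic string). */
struct buf {
    char *data;
    size_t len, cap;
};

static void buf_put(struct buf *b, const char *s, size_t n)
{
    if (b->len + n + 1 > b->cap) {
        size_t cap = b->cap ? b->cap : 16;
        while (cap < b->len + n + 1) {
            cap *= 2;
        }
        char *p = realloc(b->data, cap);
        if (!p) {
            abort();
        }
        b->data = p;
        b->cap = cap;
    }
    memcpy(b->data + b->len, s, n);
    b->len += n;
    b->data[b->len] = '\0';
}

static void mini_json_serialize(const struct mini_json *j, struct buf *b)
{
    char tmp[32];

    switch (j->type) {
    case MJ_INTEGER: {
        int n = snprintf(tmp, sizeof tmp, "%lld", j->integer);
        buf_put(b, tmp, (size_t) n);
        break;
    }
    case MJ_ARRAY:
        buf_put(b, "[", 1);
        for (size_t i = 0; i < j->n; i++) {
            if (i) {
                buf_put(b, ",", 1);
            }
            mini_json_serialize(j->elems[i], b);
        }
        buf_put(b, "]", 1);
        break;
    case MJ_SERIALIZED_OBJECT:
        /* Already text: append verbatim, with no re-encoding. */
        buf_put(b, j->string, strlen(j->string));
        break;
    case MJ_N_TYPES:
    default:
        abort();                /* Not a real type. */
    }
}
```

An array that holds an integer and a pre-serialized object serializes to `[1,{"a":[2,3]}]`. The second element is copied as-is rather than being walked again.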
Acked-by: Dumitru Ceara <dceara@redhat.com>
Acked-by: Han Zhou <hzhou@ovn.org>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=1990058
Signed-off-by: Ilya Maximets <i.maximets@redhat.com>
* Tue Aug 31 2021 Ilya Maximets <i.maximets@redhat.com> - 2.16.0-3
- json: Optimize string serialization. [RH git: bb1654da63] (#1990069)
commit 748010ff304b7cd2c43f4eb98a554433f0df07f9
Author: Ilya Maximets <i.maximets@ovn.org>
Date: Tue Aug 24 23:07:22 2021 +0200
json: Optimize string serialization.
The current string serialization code puts characters one by one.
This is slow because the dynamic string needs to perform length checks
on every ds_put_char(), and it also doesn't allow the compiler to use
better memory copy operations, i.e., it doesn't allow copying several
bytes at once.
Special symbols are rare in a typical database. Quotes are frequent,
but not too frequent. In databases created by ovn-kubernetes, for
example, there are usually at least 10 to 50 chars between quotes.
So, it's better to count the characters that don't require escaping
and use a fast data copy for the whole sequential block.
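The following sketch shows the run-based escaping approach. It assumes a hypothetical growable buffer `buf` in place of the OVS dynamic string. Instead of one append per character, each run of characters that needs no escaping is copied with a single call. The escaped set follows RFC 4627: quote, backslash and control characters below 32. Using short forms such as `\n` where they exist is a choice of this sketch. `n_puts` is a hypothetical counter that makes the number of appends visible.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Minimal growable buffer; 'n_puts' counts append calls. */
struct buf {
    char *data;
    size_t len, cap;
};

static size_t n_puts;

static void buf_put(struct buf *b, const char *s, size_t n)
{
    n_puts++;
    if (b->len + n + 1 > b->cap) {
        size_t cap = b->cap ? b->cap : 16;
        while (cap < b->len + n + 1) {
            cap *= 2;
        }
        char *p = realloc(b->data, cap);
        if (!p) {
            abort();
        }
        b->data = p;
        b->cap = cap;
    }
    memcpy(b->data + b->len, s, n);
    b->len += n;
    b->data[b->len] = '\0';
}

/* Serializes 's' as a JSON string literal, copying each run of characters
 * that needs no escaping with a single buf_put() call. */
static void put_json_string(struct buf *b, const char *s)
{
    buf_put(b, "\"", 1);
    while (*s) {
        /* Measure the longest run that can be copied unchanged. */
        size_t run = 0;
        while (s[run] && s[run] != '"' && s[run] != '\\'
               && (unsigned char) s[run] >= 32) {
            run++;
        }
        if (run) {
            buf_put(b, s, run);
            s += run;
            continue;
        }

        /* '*s' is a quote, a backslash or a control character. */
        char esc[8];
        switch (*s) {
        case '"':  buf_put(b, "\\\"", 2); break;
        case '\\': buf_put(b, "\\\\", 2); break;
        case '\b': buf_put(b, "\\b", 2); break;
        case '\f': buf_put(b, "\\f", 2); break;
        case '\n': buf_put(b, "\\n", 2); break;
        case '\r': buf_put(b, "\\r", 2); break;
        case '\t': buf_put(b, "\\t", 2); break;
        default:
            snprintf(esc, sizeof esc, "\\u%04x", (unsigned int) (unsigned char) *s);
            buf_put(b, esc, 6);
            break;
        }
        s++;
    }
    buf_put(b, "\"", 1);
}
```

For "hello world", this performs three appends: two quotes and one block. The per-character approach would need one append for each character.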
Testing with a synthetic benchmark (included) on my laptop shows the
following performance improvement:
     Size   Q  S         Before        After      Diff
------------------------------------------------------
   100000   0  0  :    0.227 ms     0.142 ms   -37.4 %
   100000   2  1  :    0.277 ms     0.186 ms   -32.8 %
   100000  10  1  :    0.361 ms     0.309 ms   -14.4 %
 10000000   0  0  :   22.720 ms    12.160 ms   -46.4 %
 10000000   2  1  :   27.470 ms    19.300 ms   -29.7 %
 10000000  10  1  :   37.950 ms    31.250 ms   -17.6 %
100000000   0  0  :  239.600 ms   126.700 ms   -47.1 %
100000000   2  1  :  292.400 ms   188.600 ms   -35.4 %
100000000  10  1  :  387.700 ms   321.200 ms   -17.1 %
Here Q is the probability (%) of a character being a '\"' and
S is the probability (%) of it being a special character (< 32).
Testing with a scenario closer to the real world shows an overall
decrease of ~5-10 % in the time needed for database compaction. This
change also decreases CPU consumption in general, because string
serialization is used in many different places, including ovsdb
monitors and raft.
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: Numan Siddique <numans@ovn.org>
Acked-by: Dumitru Ceara <dceara@redhat.com>
Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=1990069
Signed-off-by: Ilya Maximets <i.maximets@redhat.com>
* Fri Aug 20 2021 Open vSwitch CI <ovs-ci@redhat.com> - 2.16.0-2
- Merging upstream branch-2.16 [RH git: 7d7567e339]
Commit list:
0991ea8d19 Prepare for 2.16.1.
* Wed Aug 18 2021 Flavio Leitner <fbl@redhat.com> - 2.16.0-1
- redhat: First 2.16.0 release. [RH git: 0a1c4276cc]
|