Changelog
* Mon Dec 06 2021 Open vSwitch CI <ovs-ci@redhat.com> - 2.16.0-33
- Merging upstream branch-2.16 [RH git: 8708b55152]
Commit list:
3e527f21cf flow: Consider dataofs when parsing TCP packets.
b537e049ad tests/flowgen: Fix packet data endianness.
35244b4980 ofproto: Fix resource usage explosion due to removal of large number of flows.
a201297639 ofproto: Fix resource usage explosion while processing bundled FLOW_MOD.
cd0133402c tests/flowgen: Fix length field of 802.2 data link header.
2d65b8ffd2 ovs-lib: Backup and remove existing DB when joining cluster.
ab01177637 docs/dpdk: Fix install doc.
38a2129524 ovs-save: Save igmp flows in ofp_parse syntax.
dc77857ce2 faq: Update OVS/DPDK version table for OVS 2.13/2.14.
* Thu Nov 18 2021 Open vSwitch CI <ovs-ci@redhat.com> - 2.16.0-32
- Merging upstream branch-2.16 [RH git: e90e06a818]
Commit list:
1d8e0f861f ofproto-dpif-xlate: Fix check_pkt_larger incomplete translation.
* Mon Nov 15 2021 Open vSwitch CI <ovs-ci@redhat.com> - 2.16.0-31
- Merging upstream branch-2.16 [RH git: 77a249d38b]
Commit list:
f8f2f7c9cb datapath-windows: Reset flow key after Ipv4 fragments are reassembled
* Wed Nov 10 2021 Timothy Redaelli <tredaelli@redhat.com> - 2.16.0-30
- python: Replace pyOpenSSL with ssl. [RH git: 0cd5867531] (#1988429)
Currently, pyOpenSSL is half-deprecated upstream and so it's removed on
some distributions (for example on CentOS Stream 9,
https://issues.redhat.com/browse/CS-336), but since OVS only
supports Python 3 it's possible to replace pyOpenSSL with "import ssl"
included in base Python 3.
Stream recv and send had to be split into _recv and _send, since SSLError
is a subclass of socket.error, which made it impossible to catch
SSLWantReadError and SSLWantWriteError in recv and send of SSLStream.
TCPstream._open cannot be used in SSLStream, since the Python ssl module
requires the SSL socket to be created before connecting it, so
SSLStream._open needs to create the socket, create the SSL socket, and
then connect the SSL socket.
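The wrap-before-connect order described above can be sketched with only the stdlib ssl module. This is an illustrative stand-in, not the actual SSLStream code; the function name and arguments are assumptions, and a real stream would verify the peer certificate:

```python
import socket
import ssl

def open_ssl_stream(host, port, ca_file=None):
    """Create the socket, wrap it for TLS, and only then connect.

    The Python ssl module requires the SSL socket to exist before the
    connection is made, so the plain-socket connect-then-wrap order
    that worked with pyOpenSSL does not apply here.
    """
    ctx = ssl.create_default_context(cafile=ca_file)
    if ca_file is None:
        # Illustrative only: skip verification when no CA is given.
        ctx.check_hostname = False
        ctx.verify_mode = ssl.CERT_NONE
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    ssl_sock = ctx.wrap_socket(sock, server_hostname=host)  # wrap first
    ssl_sock.connect((host, port))                          # then connect
    return ssl_sock
```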
Reported-by: Timothy Redaelli <tredaelli@redhat.com>
Reported-at: https://bugzilla.redhat.com/1988429
Signed-off-by: Timothy Redaelli <tredaelli@redhat.com>
Acked-by: Terry Wilson <twilson@redhat.com>
Tested-by: Terry Wilson <twilson@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Signed-off-by: Timothy Redaelli <tredaelli@redhat.com>
* Wed Nov 10 2021 Timothy Redaelli <tredaelli@redhat.com> - 2.16.0-29
- python: socket-util: Split inet_open_active function and use connect_ex. [RH git: 2e704b371c]
In an upcoming patch, pyOpenSSL will be replaced with the Python ssl
module, but in order to do an async connection with the ssl module, the
ssl socket must be created at socket creation time, before the socket
is connected.
So, the inet_open_active function is split into 3 parts:
- inet_create_socket_active: creates the socket and returns the family and
the socket, or (error, None) if some error needs to be returned.
- inet_connect_active: connects the socket and returns the errno (it
returns 0 if errno is EINPROGRESS or EWOULDBLOCK).
connect is replaced by connect_ex, since Python suggests using it for
asynchronous connects. It's also cleaner, since inet_connect_active
returns the errno that connect_ex already provides; moreover, due to a
Python limitation, connect cannot be used with the ssl module.
inet_open_active function is changed in order to use the new functions
inet_create_socket_active and inet_connect_active.
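The split above can be sketched in plain Python. This is an illustration of the scheme, not the actual ovs.socket_util code; signatures and the address handling are simplified assumptions:

```python
import errno
import socket

def inet_create_socket_active(style, address):
    """Create a non-blocking socket for (host, port); return
    (family, sock), or (error, None) on failure."""
    try:
        family = socket.AF_INET6 if ':' in address[0] else socket.AF_INET
        sock = socket.socket(family, style)
        sock.setblocking(False)
    except OSError as e:
        return e.errno, None
    return family, sock

def inet_connect_active(sock, address):
    """Start connecting; return 0 if the connection succeeded or is in
    progress.  connect_ex() already returns the errno value, so no
    exception handling is needed for the expected async cases."""
    error = sock.connect_ex(address)
    if error in (errno.EINPROGRESS, errno.EWOULDBLOCK):
        return 0
    return error
```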
Signed-off-by: Timothy Redaelli <tredaelli@redhat.com>
Acked-by: Terry Wilson <twilson@redhat.com>
Tested-by: Terry Wilson <twilson@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Signed-off-by: Timothy Redaelli <tredaelli@redhat.com>
* Wed Nov 10 2021 Timothy Redaelli <tredaelli@redhat.com> - 2.16.0-28
- redhat: remove mlx4 support [RH git: 4c846afd24] (#1998122)
Resolves: #1998122
* Tue Nov 09 2021 Ilya Maximets <i.maximets@redhat.com> - 2.16.0-27
- ovsdb: Don't let transaction history grow larger than the database. [RH git: 93d1fa0bdf] (#2012949)
commit 317b1bfd7dd315e241c158e6d4095002ff391ee3
Author: Ilya Maximets <i.maximets@ovn.org>
Date: Tue Sep 28 13:17:21 2021 +0200
ovsdb: Don't let transaction history grow larger than the database.
If a user frequently changes a lot of rows in a database, the transaction
history can grow far larger than the database itself. This wastes
a lot of memory and also makes monitor_cond_since slower than a
usual monitor_cond if the transaction id is old enough, because
re-constructing the changes from history is slower than just
creating an initial database snapshot. This is also the case if a
user deleted a lot of data: the transaction history still holds all of
it while the database itself doesn't.
In case of the current lb-per-service model in ovn-kubernetes, each
load-balancer is added to every logical switch/router. Such a
transaction touches more than half of an OVN_Northbound database.
And each of these transactions is added to the transaction history.
Since the transaction history depth is 100, in the worst-case scenario
it will hold 100 copies of the database, increasing memory consumption
dramatically. In tests with 3000 LBs and 120 LSs, memory goes up
to 3 GB, while holding at 30 MB if transaction history is disabled in
the code.
Fixing that by keeping count of the number of ovsdb_atom's in the
database and not allowing the total number of atoms in the transaction
history to grow larger than this value. Counting atoms is fairly
cheap because we don't need to iterate over them, so it doesn't have
a significant performance impact. It would be ideal to measure the
size of individual atoms, but that would hurt performance.
Counting cells instead of atoms is not sufficient, because OVN
users add hundreds or thousands of atoms to a single cell,
so cells vary widely in size.
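The bookkeeping described above can be sketched in Python. Names are illustrative stand-ins for the C implementation inside ovsdb-server:

```python
from collections import deque

class TxnHistory:
    """Cap transaction history by total atom count: once the atoms held
    in history exceed the atoms in the database itself, drop the oldest
    entries."""
    def __init__(self):
        self.entries = deque()   # (txn_id, n_atoms) pairs, oldest first
        self.history_atoms = 0

    def add(self, txn_id, n_atoms, db_atoms):
        self.entries.append((txn_id, n_atoms))
        self.history_atoms += n_atoms
        # Trim from the oldest side until history fits within the DB size.
        while self.entries and self.history_atoms > db_atoms:
            _, oldest = self.entries.popleft()
            self.history_atoms -= oldest
```

With a database of 100 atoms, two 60-atom transactions cannot both stay in history; the first is evicted when the second arrives.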
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: Han Zhou <hzhou@ovn.org>
Acked-by: Dumitru Ceara <dceara@redhat.com>
Reported-at: https://bugzilla.redhat.com/2012949
Signed-off-by: Ilya Maximets <i.maximets@redhat.com>
* Tue Nov 09 2021 Ilya Maximets <i.maximets@redhat.com> - 2.16.0-26
- ovsdb: transaction: Incremental reassessment of weak refs. [RH git: e8a363db49] (#2005958)
commit 4dbff9f0a68579241ac1a040726be3906afb8fe9
Author: Ilya Maximets <i.maximets@ovn.org>
Date: Sat Oct 16 03:20:23 2021 +0200
ovsdb: transaction: Incremental reassessment of weak refs.
The main idea is to not store the list of weak references in the source
row, so they don't all need to be re-checked/updated on every
modification of that source row. The point is that the source row already
knows the UUIDs of all destination rows stored in the data, so there is
not much profit in storing this information somewhere else. If needed,
the destination row can be looked up and the reference found in the
destination row. For fast lookup, the destination row now stores
references in a hash map.
The weak reference structure now contains the table and uuid of a source
row instead of a direct pointer. This allows replacing/updating the
source row without breaking any weak references stored in destination
rows.
Structure also now contains the key-value pair of atoms that triggered
creation of this reference. These atoms can be used to quickly
subtract removed references from a source row. During reassessment,
ovsdb now only needs to care about new added or removed atoms, and
atoms that got removed due to removal of the destination rows, but
these are marked for reassessment by the destination row.
ovsdb_datum_subtract() is used to remove atoms that point to removed
or incorrect rows, so there is no need to re-sort the datum in the end.
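The destination-side hash map described above can be sketched as follows; the row layout and helper names are hypothetical, purely to show why removing a source row becomes a single hash lookup:

```python
import uuid

class Row:
    """Each destination row keeps incoming weak references in a dict
    keyed by (source_table, source_uuid), so the source row can be
    replaced without breaking pointers stored elsewhere."""
    def __init__(self, table):
        self.uuid = uuid.uuid4()
        self.table = table
        # (src_table, src_uuid) -> set of (column, key_atom)
        self.incoming = {}

def add_weak_ref(src, dst, column, key_atom):
    dst.incoming.setdefault((src.table, src.uuid), set()).add((column, key_atom))

def drop_refs_from(src, dst):
    # Removing a source row touches one hash bucket, not every reference.
    return dst.incoming.pop((src.table, src.uuid), set())
```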
Results of an OVN load-balancer benchmark that adds 3K load-balancers
to each of 120 logical switches and 120 logical routers in the OVN
sandbox with clustered Northbound database and then removes them:
Before:
%CPU CPU Time CMD
86.8 00:16:05 ovsdb-server nb1.db
44.1 00:08:11 ovsdb-server nb2.db
43.2 00:08:00 ovsdb-server nb3.db
After:
%CPU CPU Time CMD
54.9 00:02:58 ovsdb-server nb1.db
33.3 00:01:48 ovsdb-server nb2.db
32.2 00:01:44 ovsdb-server nb3.db
So, on a cluster leader the processing time dropped by 5.4x; on
followers, by 4.5x. The more load-balancers, the larger the performance
difference. There is a slight increase in memory usage, because the new
reference structure is larger, but the difference is not significant.
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: Dumitru Ceara <dceara@redhat.com>
Reported-at: https://bugzilla.redhat.com/2005958
Signed-off-by: Ilya Maximets <i.maximets@redhat.com>
* Thu Oct 28 2021 Open vSwitch CI <ovs-ci@redhat.com> - 2.16.0-25
- Merging upstream branch-2.16 [RH git: f5366890c5]
Commit list:
c221c8e613 datapath-windows:Reset PseudoChecksum value only for TX direction offload case
* Wed Oct 27 2021 Open vSwitch CI <ovs-ci@redhat.com> - 2.16.0-24
- Merging upstream branch-2.16 [RH git: 4682b76694]
Commit list:
b79f0369f2 ci: Make linux-prepare trust system installs.
* Mon Oct 25 2021 Open vSwitch CI <ovs-ci@redhat.com> - 2.16.0-23
- Merging upstream branch-2.16 [RH git: cce913794e]
Commit list:
2a4c87f300 Prepare for 2.16.2.
aaa1439b8e Set release date for 2.16.1.
* Thu Oct 21 2021 Open vSwitch CI <ovs-ci@redhat.com> - 2.16.0-22
- Merging upstream branch-2.16 [RH git: 29f01c4fdb]
Commit list:
108176ab5a github: Stick to python 3.9.
* Tue Oct 19 2021 Open vSwitch CI <ovs-ci@redhat.com> - 2.16.0-21
- Merging upstream branch-2.16 [RH git: 2546fa9646]
Commit list:
5c5e34603b datapath-windows: add layers when adding the deferred actions
* Thu Oct 14 2021 Open vSwitch CI <ovs-ci@redhat.com> - 2.16.0-20
- Merging upstream branch-2.16 [RH git: d572c95f69]
Commit list:
458a4f75f3 ofproto-dpif-xlate: Fix zone set from non-frozen-metadata fields.
* Wed Oct 13 2021 Open vSwitch CI <ovs-ci@redhat.com> - 2.16.0-19
- Merging upstream branch-2.16 [RH git: 557ca689f7]
Commit list:
6d8190584a dpif-netdev: Fix use-after-free on PACKET_OUT of IP fragments.
44a66cc1d0 tunnel-push-pop.at: Mask source port in tunnel header.
* Tue Oct 12 2021 Open vSwitch CI <ovs-ci@redhat.com> - 2.16.0-18
- Merging upstream branch-2.16 [RH git: a6c4770398]
Commit list:
27a5848a33 ovs-ctl: Add missing description for --ovs-vswitchd-options and --ovsdb-server-options to usage().
0300d0c0c2 dpdk-stub: Change the ERR log to DBG.
cdd6dd821d dpif-netlink: Fix feature negotiation for older kernels.
c2682c42cb dpif-netdev: Fix pmd thread comments to include SMC.
9377f4a465 python: idl: Avoid sending transactions when the DB is not synced up.
* Tue Oct 12 2021 Open vSwitch CI <ovs-ci@redhat.com> - 2.16.0-17
- Merging upstream branch-2.16 [RH git: c1145b5236]
Commit list:
0fd17fbb09 ipf: release unhandled packets from the batch
* Thu Sep 30 2021 Open vSwitch CI <ovs-ci@redhat.com> - 2.16.0-16
- Merging upstream branch-2.16 [RH git: 5c05133179]
Commit list:
3f692fba98 datapath-windows:adjust Offset when processing packet in POP_VLAN action
* Wed Sep 29 2021 Dumitru Ceara <dceara@redhat.com> - 2.16.0-15
- ovsdb-data: Deduplicate string atoms. [RH git: 24e7d1140e] (#2006839)
commit 429b114c5aadee24ccfb16ad7d824f45cdcea75a
Author: Ilya Maximets <i.maximets@ovn.org>
Date: Wed Sep 22 09:28:50 2021 +0200
ovsdb-server spends a lot of time cloning atoms for various reasons,
e.g. to create a diff of two rows or to clone a row to the transaction.
All atoms, except for strings, contain a simple value that can be
copied efficiently, but duplicating strings every time has a
significant performance impact.
Introducing a new reference-counted structure 'ovsdb_atom_string'
that makes it possible to not copy strings every time, but just
increase a reference counter.
This change increases transaction throughput in benchmarks by up to
2x for standalone databases and 3x for clustered databases, i.e. the
number of transactions that ovsdb-server can handle per second.
It also noticeably reduces the memory consumption of ovsdb-server.
Next step will be to consolidate this structure with json strings,
so we will not need to duplicate strings while converting database
objects to json and back.
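The reference-counting idea can be sketched in a few lines; this is an illustration, not the actual ovsdb_atom_string struct:

```python
class RefString:
    """A reference-counted string atom: cloning bumps a counter instead
    of copying the bytes."""
    def __init__(self, s):
        self.s = s
        self.n_refs = 1

    def clone(self):
        # "Copying" the atom is now O(1) and shares the storage.
        self.n_refs += 1
        return self

    def unref(self):
        self.n_refs -= 1
        return self.n_refs  # caller frees the storage when this hits 0
```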
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: Dumitru Ceara <dceara@redhat.com>
Acked-by: Mark D. Gray <mark.d.gray@redhat.com>
Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=2006839
Signed-off-by: Dumitru Ceara <dceara@redhat.com>
* Wed Sep 29 2021 Dumitru Ceara <dceara@redhat.com> - 2.16.0-14
- ovsdb-data: Add function to apply diff in-place. [RH git: df0e4bda98] (#2006851)
commit 32b51326ef9c307b4acd0bacafb0218dd1372f3d
Author: Ilya Maximets <i.maximets@ovn.org>
Date: Thu Sep 23 01:47:24 2021 +0200
ovsdb_datum_apply_diff() is heavily used in ovsdb transactions, but
it's linear in terms of number of comparisons. And it also clones
all the atoms along the way. In most cases the size of a diff is much
smaller than the size of the original datum; this allows performing
the same operation in-place with only O(diff->n * log2(old->n))
comparisons and O(old->n + diff->n) memory copies with memcpy.
Using this function while applying diffs read from the storage gives
a significant performance boost and allows executing many more
transactions per second.
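The in-place application can be sketched in Python, with sorted lists standing in for datums (an illustration of the scheme, not the ovsdb C code). For sets, a diff atom toggles membership: present atoms are removed, absent ones inserted at the position binary search already found:

```python
from bisect import bisect_left

def datum_apply_diff_in_place(old, diff):
    """Apply a set diff using bisect lookups: O(diff_n * log2(old_n))
    comparisons; the list shifts stand in for the memcpy-style moves."""
    for atom in diff:
        i = bisect_left(old, atom)
        if i < len(old) and old[i] == atom:
            del old[i]           # atom was present: the diff removes it
        else:
            old.insert(i, atom)  # atom was absent: the diff adds it
```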
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: Mark D. Gray <mark.d.gray@redhat.com>
Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=2006851
Signed-off-by: Dumitru Ceara <dceara@redhat.com>
* Wed Sep 29 2021 Dumitru Ceara <dceara@redhat.com> - 2.16.0-13
- ovsdb-data: Optimize subtraction of sets. [RH git: 5bace82405] (#2005483)
commit bb12b63176389e516ddfefce20dfa165f24430fb
Author: Ilya Maximets <i.maximets@ovn.org>
Date: Thu Sep 23 01:47:23 2021 +0200
Current algorithm for ovsdb_datum_subtract looks like this:
for-each atom in a:
if atom in b:
swap(atom, <last atom in 'a'>)
destroy(atom)
quicksort(a)
Complexity:
Na * log2(Nb) + (Na - Nb) * log2(Na - Nb)
Search Comparisons for quicksort
It's not optimal, especially because Nb << Na in the vast majority of
cases.
Reversing the search phase to look up atoms from 'b' in 'a', and
closing gaps from deleted elements in 'a' by plain memory copy to
avoid quicksort.
Resulting complexity:
Nb * log2(Na) + (Na - Nb)
Search Memory copies
Subtraction is heavily used while executing database transactions.
For example, to remove one port from a logical switch in OVN.
Complexity of such operation if original logical switch had 100 ports
goes down from
100 * log2(1) = 100 comparisons for search and
99 * log2(99) = 656 comparisons for quicksort
------------------------------
756 comparisons in total
to only
1 * log2(100) = 7 comparisons for search
+ memory copy of 99 * sizeof (union ovsdb_atom) bytes.
We could use memmove to close the gaps after removing atoms, but
it will lead to 2 memory copies inside the call, while we can perform
only one to the temporary 'result' and swap pointers.
Performance in cases where the sizes of 'a' and 'b' are comparable
should not change. Cases with Nb >> Na should not happen in practice.
All in all, this change allows ovsdb-server to perform several times
more transactions per second that remove elements from sets.
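The reversed subtraction can be sketched in Python with sorted lists standing in for datums (illustrative, not the ovsdb C code). Each atom of the small 'b' is looked up in the large 'a', and the survivors are written out in one sequential pass, so 'a' stays sorted with no quicksort:

```python
from bisect import bisect_left

def datum_subtract(a, b):
    """Nb * log2(Na) comparisons for the search phase, then one linear
    copy into 'result' (standing in for the pointer-swapped temporary)."""
    remove = set()
    for atom in b:
        i = bisect_left(a, atom)
        if i < len(a) and a[i] == atom:
            remove.add(i)
    # Close the gaps in a single pass; order is preserved.
    return [atom for i, atom in enumerate(a) if i not in remove]
```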
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: Han Zhou <hzhou@ovn.org>
Acked-by: Mark D. Gray <mark.d.gray@redhat.com>
Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=2005483
Signed-off-by: Dumitru Ceara <dceara@redhat.com>
* Wed Sep 29 2021 Dumitru Ceara <dceara@redhat.com> - 2.16.0-12
- ovsdb-data: Optimize union of sets. [RH git: e2a4c7d794] (#2005483)
commit 51946d22274cd591dc061358fb507056fbd91420
Author: Ilya Maximets <i.maximets@ovn.org>
Date: Thu Sep 23 01:47:22 2021 +0200
Current algorithm of ovsdb_datum_union looks like this:
for-each atom in b:
if not bin_search(a, atom):
push(a, clone(atom))
quicksort(a)
So, the complexity looks like this:
Nb * log2(Na) + Nb + (Na + Nb) * log2(Na + Nb)
Comparisons clones Comparisons for quicksort
for search
ovsdb_datum_union() is heavily used in database transactions when a
new element is added to a set, for example, when a new logical switch
port is added to a logical switch in OVN. This is a very common
use case where the CMS adds one new port to an existing switch that
already has, let's say, 100 ports. For this case ovsdb-server will
have to perform:
1 * log2(100) + 1 clone + 101 * log2(101)
Comparisons Comparisons for
for search quicksort.
~7 1 ~707
Roughly 714 comparisons of atoms and 1 clone.
Since binary search can give us position, where new atom should go
(it's the 'low' index after the search completion) for free, the
logic can be re-worked like this:
copied = 0
for-each atom in b:
desired_position = bin_search(a, atom)
push(result, a[ copied : desired_position - 1 ])
copied = desired_position
push(result, clone(atom))
push(result, a[ copied : Na ])
swap(a, result)
Complexity of this schema:
Nb * log2(Na) + Nb + Na
Comparisons clones memory copy on push
for search
'swap' is just a swap of a few pointers. 'push' is not a 'clone',
but a simple memory copy of 'union ovsdb_atom'.
In general, this schema replaces the complexity of a quicksort
with the complexity of a memory copy of Na atom structures, where we're
not even copying the strings these atoms point to.
Complexity in the example above goes down from 714 comparisons
to 7 comparisons and a memcpy of 100 * sizeof (union ovsdb_atom) bytes.
The general complexity of a memory copy should always be lower than
that of a quicksort, especially because these copies are usually
performed in bulk, so this new schema should work faster for any input.
All in all, this change allows executing several times more
transactions per second for transactions that add new entries to sets.
Alternatively, union can be implemented as a linear merge of two
sorted arrays, but this will result in O(Na) comparisons, which
is more than Nb * log2(Na) in common case, since Na is usually
far bigger than Nb. Linear merge will also mean per-atom memory
copies instead of copying in bulk.
The 'replace' functionality of ovsdb_datum_union() had no users, so it
is simply removed. It can easily be added back if needed in the future.
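The reworked pseudocode above can be sketched in Python with sorted lists standing in for datums (illustrative, not the ovsdb C code); list slicing stands in for the bulk memcpy of atom structures:

```python
from bisect import bisect_left

def datum_union(a, b):
    """Binary search yields each insertion point; untouched stretches of
    'a' are copied in bulk instead of re-sorting the whole result."""
    result = []
    copied = 0
    for atom in sorted(b):
        pos = bisect_left(a, atom)
        if pos < len(a) and a[pos] == atom:
            continue                  # already present; sets keep one copy
        result.extend(a[copied:pos])  # bulk copy up to the insertion point
        copied = pos
        result.append(atom)
    result.extend(a[copied:])         # copy the tail of 'a'
    return result
```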
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: Han Zhou <hzhou@ovn.org>
Acked-by: Mark D. Gray <mark.d.gray@redhat.com>
Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=2005483
Signed-off-by: Dumitru Ceara <dceara@redhat.com>
* Wed Sep 29 2021 Dumitru Ceara <dceara@redhat.com> - 2.16.0-11
- ovsdb: transaction: Use diffs for strong reference counting. [RH git: 85da133eaa] (#2003203)
commit b2712d026eae2d9a5150c2805310eaf506e1f162
Author: Ilya Maximets <i.maximets@ovn.org>
Date: Tue Sep 14 00:19:57 2021 +0200
Currently, even if one reference is added to the set of strong
references or removed from it, ovsdb-server will walk through the whole
set and re-count references to other rows. These referenced rows will
also be added to the transaction in order to re-count their references.
For example, every time a Logical Switch Port is added to a Logical
Switch, the OVN Northbound database server will walk through all ports
of this Logical Switch, clone their rows, and re-count references. This
is not very efficient. Instead, it's enough to only increase reference
counters for added references and decrease them for removed ones. In
many cases this will affect only one row in the Logical_Switch_Port table.
Introducing a new function that generates a diff of two datum objects
but stores added and removed atoms separately, so they can be used
to increase or decrease row reference counters accordingly.
This change allows performing several times more transactions per
second that add or remove strong references to/from sets, because
ovsdb-server no longer clones and re-counts rows that are irrelevant
to the current transaction.
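The diff-based counting can be sketched with sets of UUID strings standing in for datums (illustrative helper names, not the ovsdb API):

```python
def diff_strong_refs(old_refs, new_refs):
    """Split the change into added and removed references, so only
    those rows need their counters touched."""
    return new_refs - old_refs, old_refs - new_refs

def apply_ref_diff(ref_counts, added, removed):
    """Adjust counters only for the rows in the diff; untouched rows
    are never cloned or re-counted."""
    for u in added:
        ref_counts[u] = ref_counts.get(u, 0) + 1
    for u in removed:
        ref_counts[u] -= 1
    return ref_counts
```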
Acked-by: Dumitru Ceara <dceara@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=2003203
Signed-off-by: Dumitru Ceara <dceara@redhat.com>
* Mon Sep 27 2021 Open vSwitch CI <ovs-ci@redhat.com> - 2.16.0-10
- Merging upstream branch-2.16 [RH git: 2114714012]
Commit list:
547371ecdb cirrus: Reduce memory requirements for FreeBSD VMs.
* Thu Sep 23 2021 Timothy Redaelli <tredaelli@redhat.com> - 2.16.0-9
- redhat: use hugetlbfs group for /var/log/openvswitch when dpdk is enabled [RH git: 4e5928b671] (#2004543)
Resolves: #2004543
* Thu Sep 16 2021 Open vSwitch CI <ovs-ci@redhat.com> - 2.16.0-8
- Merging upstream branch-2.16 [RH git: 7332b410fc]
Commit list:
facaf5bc71 netdev-linux: Fix a null pointer dereference in netdev_linux_notify_sock().
6e203d4873 pcap-file: Fix memory leak in ovs_pcap_open().
f50da0b267 odp-util: Fix a null pointer dereference in odp_flow_format().
7da752e43f odp-util: Fix a null pointer dereference in odp_nsh_key_from_attr__().
bc22b01459 netdev-dpdk: Fix RSS configuration for virtio.
81706c5d43 ipf: Fix only nat the first fragment in the reass process.
* Wed Sep 08 2021 Open vSwitch CI <ovs-ci@redhat.com> - 2.16.0-7
- Merging upstream branch-2.16 [RH git: e71f31dfd6]
Commit list:
242c280f0e dpif-netdev: Fix crash when PACKET_OUT is metered.
* Tue Aug 31 2021 Ilya Maximets <i.maximets@redhat.com> - 2.16.0-6
- ovsdb: monitor: Store serialized json in a json cache. [RH git: bc20330c85] (#1996152)
commit 43e66fc27659af2a5c976bdd27fe747b442b5554
Author: Ilya Maximets <i.maximets@ovn.org>
Date: Tue Aug 24 21:00:39 2021 +0200
The same json from the json cache is typically sent to all the clients,
e.g., in case of an OVN deployment with ovn-monitor-all=true.
There could be hundreds or thousands of connected clients, and ovsdb
will serialize the same json object for each of them before sending.
Serializing it once before storing it into the json cache speeds up
processing.
This change saves a lot of CPU cycles and a bit of memory,
since we only need to store a string in memory and not the full json
object.
Testing with ovn-heater on 120 nodes using the density-heavy scenario
shows a reduction of the total CPU time used by Southbound DB processes
from 256 minutes to 147. The duration of unreasonably long poll
intervals also dropped dramatically, from 7 to 2 seconds:
Count Min Max Median Mean 95 percentile
-------------------------------------------------------------
Before 1934 1012 7480 4302.5 4875.3 7034.3
After 1909 1004 2730 1453.0 1532.5 2053.6
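The serialize-once idea can be sketched with json.dumps and a plain dict standing in for the ovsdb json cache (illustrative names only):

```python
import json

class JsonCache:
    """Store the serialized string in the cache, so the same update can
    be sent to every client without re-serializing the json object."""
    def __init__(self):
        self.cache = {}

    def get_update(self, key, make_update):
        if key not in self.cache:
            # Serialize exactly once; keep only the string in memory.
            self.cache[key] = json.dumps(make_update(), sort_keys=True)
        return self.cache[key]
```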
Acked-by: Dumitru Ceara <dceara@redhat.com>
Acked-by: Han Zhou <hzhou@ovn.org>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=1996152
Signed-off-by: Ilya Maximets <i.maximets@redhat.com>
* Tue Aug 31 2021 Ilya Maximets <i.maximets@redhat.com> - 2.16.0-5
- raft: Don't keep full json objects in memory if no longer needed. [RH git: 4606423e8b] (#1990058)
commit 0de882954032aa37dc943bafd72c33324aa0c95a
Author: Ilya Maximets <i.maximets@ovn.org>
Date: Tue Aug 24 21:00:38 2021 +0200
raft: Don't keep full json objects in memory if no longer needed.
Raft log entries (and the raft database snapshot) contain json objects
of the data. A follower receives append requests with data that gets
parsed and added to the raft log. The leader receives execution
requests, parses data out of them and adds it to the log. In both
cases, ovsdb-server later reads the log with ovsdb_storage_read(),
constructs a transaction and updates the database. On followers, these
json objects are in the common case never used again. The leader may
use them to send append requests or snapshot installation requests to
followers.
However, all these operations (except for ovsdb_storage_read()) are
just serializing the json in order to send it over the network.
Json objects are significantly larger than their serialized string
representation. For example, the snapshot of the database from one of
the ovn-heater scale tests takes 270 MB as a string, but 1.6 GB as
a json object from the total 3.8 GB consumed by ovsdb-server process.
ovsdb_storage_read() for a given raft entry happens only once in its
lifetime, so after this call we can serialize the json object, store
the string representation and free the actual json object that ovsdb
will never need again. This can save a lot of memory and can also
save serialization time, because each raft entry for append requests
and snapshot installation requests is serialized only once instead of
every time such a request needs to be sent.
JSON_SERIALIZED_OBJECT can be used in order to seamlessly integrate
pre-serialized data into raft_header and similar json objects.
One major special case is creation of a database snapshot.
A snapshot installation request received over the network will be
parsed and read by ovsdb-server just like any other raft log entry.
However, snapshots created locally with raft_store_snapshot() will
never be read back, because they reflect the current state of the
database and hence are already applied. For this case we can free the
json object right after writing the snapshot to disk.
Tests performed with ovn-heater on the 60-node density-light scenario,
where the on-disk database grows up to 97 MB, show the average memory
consumption of ovsdb-server Southbound DB processes decreased by 58%
(from 602 MB to 256 MB per process) and the peak memory consumption
decreased by 40% (from 1288 MB to 771 MB).
A test with 120 nodes on the density-heavy scenario with a 270 MB
on-disk database shows the expected 1.5 GB decrease in memory
consumption. Also, total CPU time consumed by the Southbound DB process
dropped from 296 to 256 minutes, and the number of unreasonably long
poll intervals dropped from 2896 to 1934.
Deserialization is also implemented just in case. I didn't see this
function being invoked in practice.
Acked-by: Dumitru Ceara <dceara@redhat.com>
Acked-by: Han Zhou <hzhou@ovn.org>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=1990058
Signed-off-by: Ilya Maximets <i.maximets@redhat.com>
* Tue Aug 31 2021 Ilya Maximets <i.maximets@redhat.com> - 2.16.0-4
- json: Add support for partially serialized json objects. [RH git: 885e5ce1b5] (#1990058)
commit b0bca6f27aae845c3ca8b48d66a7dbd3d978162a
Author: Ilya Maximets <i.maximets@ovn.org>
Date: Tue Aug 24 21:00:37 2021 +0200
json: Add support for partially serialized json objects.
Introducing a new json type JSON_SERIALIZED_OBJECT. It's not an
actual type that can be seen in a json message on the wire, but an
internal type intended to hold a serialized version of
some other json object. For this reason it's defined after
JSON_N_TYPES so as not to confuse parsers and other parts of the code
that rely on compliance with RFC 4627.
With this JSON type internal users may construct large JSON objects,
parts of which are already serialized. This way, while serializing
the larger object, data from JSON_SERIALIZED_OBJECT can be added
directly to the result, without additional processing.
This will be used by the next commits to add pre-serialized JSON data
to the raft_header structure, which can be converted to JSON
before writing the file transaction to disk or sending it to other
servers. The same technique can also be used to pre-serialize the
json_cache for ovsdb monitors; this should make it possible to skip
serialization for every client and save some more memory.
Since serialized JSON is just a string, the 'json->string'
pointer is reused for it.
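The splice-in behavior can be sketched with a tiny serializer; this is a pure-Python stand-in, not the actual ovs json library, and the dict handling is deliberately simplified (keys assumed to need no escaping):

```python
import json

class SerializedNode:
    """A JSON_SERIALIZED_OBJECT-style node: holds an already serialized
    string that is emitted verbatim into the output."""
    def __init__(self, s):
        self.s = s

def serialize(obj):
    if isinstance(obj, SerializedNode):
        return obj.s                  # splice the raw string directly
    if isinstance(obj, dict):
        return "{" + ",".join('"%s":%s' % (k, serialize(v))
                              for k, v in obj.items()) + "}"
    return json.dumps(obj)
```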
Acked-by: Dumitru Ceara <dceara@redhat.com>
Acked-by: Han Zhou <hzhou@ovn.org>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=1990058
Signed-off-by: Ilya Maximets <i.maximets@redhat.com>
* Tue Aug 31 2021 Ilya Maximets <i.maximets@redhat.com> - 2.16.0-3
- json: Optimize string serialization. [RH git: bb1654da63] (#1990069)
commit 748010ff304b7cd2c43f4eb98a554433f0df07f9
Author: Ilya Maximets <i.maximets@ovn.org>
Date: Tue Aug 24 23:07:22 2021 +0200
json: Optimize string serialization.
The current string serialization code puts all characters one by one.
This is slow because the dynamic string needs to perform length checks
on every ds_put_char(), and it also doesn't allow the compiler to use
better memory copy operations, i.e. doesn't allow copying a few bytes
at once.
Special symbols are rare in a typical database. Quotes are frequent,
but not too frequent. In databases created by ovn-kubernetes, for
example, there are usually at least 10 to 50 chars between quotes.
So, it's better to count the characters that don't require escaping
and use a fast data copy for the whole sequential block.
Testing with a synthetic benchmark (included) on my laptop shows
following performance improvement:
Size Q S Before After Diff
-----------------------------------------------------
100000 0 0 : 0.227 ms 0.142 ms -37.4 %
100000 2 1 : 0.277 ms 0.186 ms -32.8 %
100000 10 1 : 0.361 ms 0.309 ms -14.4 %
10000000 0 0 : 22.720 ms 12.160 ms -46.4 %
10000000 2 1 : 27.470 ms 19.300 ms -29.7 %
10000000 10 1 : 37.950 ms 31.250 ms -17.6 %
100000000 0 0 : 239.600 ms 126.700 ms -47.1 %
100000000 2 1 : 292.400 ms 188.600 ms -35.4 %
100000000 10 1 : 387.700 ms 321.200 ms -17.1 %
Here Q - probability (%) for a character to be a '\"' and
S - probability (%) to be a special character ( < 32).
Testing with a closer-to-real-world scenario shows an overall decrease
of the time needed for database compaction by ~5-10%. This
change also decreases CPU consumption in general, because string
serialization is used in many different places, including ovsdb
monitors and raft.
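The run-based copying can be sketched in Python (an illustration of the idea, not the C ds_put_* code; escape forms are simplified to backslash and \uXXXX):

```python
import json

def serialize_string(s):
    """Scan for the next character that needs escaping and copy the
    whole safe run at once, instead of appending one char at a time."""
    out = ['"']
    i = 0
    n = len(s)
    while i < n:
        start = i
        # Advance over the run of characters that need no escaping.
        while i < n and s[i] not in '"\\' and ord(s[i]) >= 32:
            i += 1
        out.append(s[start:i])        # bulk copy of the safe run
        if i < n:
            c = s[i]
            out.append('\\u%04x' % ord(c) if ord(c) < 32 else '\\' + c)
            i += 1
    out.append('"')
    return ''.join(out)
```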
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: Numan Siddique <numans@ovn.org>
Acked-by: Dumitru Ceara <dceara@redhat.com>
Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=1990069
Signed-off-by: Ilya Maximets <i.maximets@redhat.com>
* Fri Aug 20 2021 Open vSwitch CI <ovs-ci@redhat.com> - 2.16.0-2
- Merging upstream branch-2.16 [RH git: 7d7567e339]
Commit list:
0991ea8d19 Prepare for 2.16.1.
* Wed Aug 18 2021 Flavio Leitner <fbl@redhat.com> - 2.16.0-1
- redhat: First 2.16.0 release. [RH git: 0a1c4276cc]