Age | Commit message | Author |
|
Add the RTE_EVENT_DEV_CAP_IMPLICIT_RELEASE_DISABLE capability to the
DSW event device.
This feature may be used by an EAL thread to pull more work from the
work scheduler, without giving up the option to forward events
originating from a previous dequeue batch. This in turn allows an EAL
thread to be productive while waiting for a hardware accelerator to
complete some operation.
Prior to this change, DSW didn't make any distinction between
RTE_EVENT_OP_FORWARD and RTE_EVENT_OP_NEW type events, other than that
new events would be backpressured earlier.
After this change, DSW tracks the number of released events (i.e.,
events of type RTE_EVENT_OP_FORWARD and RTE_EVENT_OP_RELEASE) that have
been enqueued.
For efficiency reasons, DSW does not track the identity of individual
events. This in turn implies that, at a certain stage in the flow
migration process, DSW must wait for all pending releases (on the
migration source port, only) to be received from the application, to
ensure that no event pertaining to any of the to-be-migrated flows is
being processed.
With this change, DSW starts making a distinction between forward and
new type events for credit allocation purposes. Only RTE_EVENT_OP_NEW
events need credits. All events marked as RTE_EVENT_OP_FORWARD must
have a corresponding dequeued event from a previous dequeue batch.
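The credit rule above can be sketched as follows. This is a minimal
illustration under stated assumptions, not DSW's implementation: the
struct, its fields and both functions are hypothetical names, and a
real driver would manage credits in batches against a shared pool.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical op types standing in for RTE_EVENT_OP_NEW/FORWARD/RELEASE. */
enum op_sketch { OP_NEW, OP_FORWARD, OP_RELEASE };

/* Hypothetical per-port state; not DSW's actual layout. */
struct port_sketch {
    uint32_t credits;          /* credits available for new events */
    uint32_t pending_releases; /* dequeued events not yet forwarded/released */
};

/* Each dequeued event must later be forwarded or released. */
void on_dequeue(struct port_sketch *p, uint32_t nb_events)
{
    p->pending_releases += nb_events;
}

/* Returns true if the event may be enqueued, and updates the counters. */
bool enqueue_allowed(struct port_sketch *p, enum op_sketch op)
{
    switch (op) {
    case OP_NEW:
        /* Only new events consume a credit. */
        if (p->credits == 0)
            return false;
        p->credits--;
        return true;
    case OP_FORWARD:
    case OP_RELEASE:
        /* Must match an event from a previous dequeue batch. */
        if (p->pending_releases == 0)
            return false;
        p->pending_releases--;
        return true;
    }
    return false;
}
```

With pending_releases tracked this way, a migration source port can
tell when the application holds no more events by checking for zero.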
Flow migration for flows on RTE_SCHED_TYPE_PARALLEL queues remains
unaffected by this change.
A side-effect of the tweaked DSW migration logic is that the migration
latency is reduced, regardless of whether implicit release is enabled
or not.
Another side-effect is that migrated flows are now not processed
during any part of the migration procedure. An upside of this change
is that it reduces the load of the overloaded port. A downside is that
it introduces slightly more jitter for the migrated flows.
This patch also contains various minor refactorings, improved
formatting, fixed spelling, and the removal of unnecessary memory
barriers.
Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
|
|
Use DMA ops to store metadata, remove use of completion pool.
Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
Acked-by: Vamsi Attunuru <vattunuru@marvell.com>
|
|
Re-organize event DMA ops structure to allow holding
source and destination pointers without the need for
additional memory. The mempool allocating memory for
rte_event_dma_adapter_ops can size the structure to
accommodate all the needed source and destination
pointers.
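As a rough sketch of the idea, a flexible array member lets one
allocation hold the fixed fields plus all segment pointers. The type
and function names below are hypothetical and do not reproduce the
actual adapter op layout.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical segment descriptor (address plus length). */
struct seg_sketch {
    uint64_t addr;
    uint32_t length;
};

/* Hypothetical op: sources come first, destinations follow. */
struct dma_op_sketch {
    uint16_t nb_src;
    uint16_t nb_dst;
    struct seg_sketch src_dst_seg[]; /* flexible array member */
};

/* Element size a mempool would need for the given maximum counts. */
size_t dma_op_sketch_size(uint16_t max_src, uint16_t max_dst)
{
    return sizeof(struct dma_op_sketch) +
           ((size_t)max_src + max_dst) * sizeof(struct seg_sketch);
}

/* First destination segment sits right after the last source. */
struct seg_sketch *dma_op_sketch_dst(struct dma_op_sketch *op)
{
    return &op->src_dst_seg[op->nb_src];
}
```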
Add multiple words for holding user metadata, adapter
implementation specific metadata and event metadata.
Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
Acked-by: Amit Prakash Shukla <amitprakashs@marvell.com>
|
|
For session-less crypto operations, the event info for each event is
contained in the crypto op metadata, and is restored in the event from
the crypto op metadata response info.
For session based crypto operations, the crypto op contains per session
event info in the crypto op metadata. If a PMD passes any
implementation specific data in "struct rte_event::impl_opaque"
on each event, it is not restored.
This patch stores "struct rte_event::impl_opaque" in mbuf dynamic
field before enqueueing to cryptodev and restores
"struct rte_event::impl_opaque" from mbuf dynamic field after
dequeueing crypto op from cryptodev for session based crypto operations.
Fixes: 7901eac3409a ("eventdev: add crypto adapter implementation")
Cc: stable@dpdk.org
Signed-off-by: Ganapati Kundapura <ganapati.kundapura@intel.com>
Acked-by: Abhinandan Gujjar <abhinandan.gujjar@intel.com>
|
|
With GCC-14, this warning is generated:
drivers/event/sw/sw_evdev.c:263:3: warning:
'snprintf' will always be truncated;
specified size is 12, but format string expands to at least 13
snprintf(buf, sizeof(buf), "sw%d_iq_%d_rob", dev_id, i);
^
Yet the whole printf to the buf is unnecessary. The type string argument
has never been implemented, and should just be NULL. Removing the
unnecessary snprintf then means IQ_ROB_NAMESIZE can be removed.
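For illustration only, the arithmetic behind the warning can be checked
with standard snprintf(), which returns the length the output would
need. The helper names are hypothetical and are not part of the driver.

```c
#include <stddef.h>
#include <stdio.h>

/* Length (excluding the NUL) the name would need; a NULL buffer with
 * size 0 makes snprintf() measure without writing. */
int rob_name_len(int dev_id, int i)
{
    return snprintf(NULL, 0, "sw%d_iq_%d_rob", dev_id, i);
}

/* Returns 1 if the name does not fit in a buffer of bufsize bytes. */
int rob_name_truncated(size_t bufsize, int dev_id, int i)
{
    int len = rob_name_len(dev_id, i);

    return len < 0 || (size_t)len >= bufsize;
}
```

Even the shortest name, "sw0_iq_0_rob", is 12 characters, so a 12-byte
buffer has no room for the terminating NUL.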
Fixes: 5ffb2f142d95 ("event/sw: support event queues")
Cc: stable@dpdk.org
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Harry van Haaren <harry.van.haaren@intel.com>
|
|
Minor cosmetic log change.
No functional impact.
Signed-off-by: Hernan Vargas <hernan.vargas@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
|
|
Remove dead code for error and update description of one error print.
Signed-off-by: Hernan Vargas <hernan.vargas@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
|
|
HARQ pruning is not an ACC100 feature. Remove what is in effect dead code.
Signed-off-by: Hernan Vargas <hernan.vargas@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
|
|
Remove dead code and unused function in ACC100 driver.
Signed-off-by: Hernan Vargas <hernan.vargas@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
|
|
Move the memory barrier so that the dequeue thread can be in sync with
the enqueue thread.
Fixes: 32e8b7ea35dd ("baseband/acc100: refactor to segregate common code")
Cc: stable@dpdk.org
Signed-off-by: Hernan Vargas <hernan.vargas@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
|
|
Don't send NULL MAC addresses in MAC table update.
Fixes: 1b306359e58c ("virtio: suport multiple MAC addresses")
Cc: stable@dpdk.org
Signed-off-by: Satha Rao <skoteshwar@marvell.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
|
|
Switch to epoll so that the concern over the poll() fd array
is removed.
Add a simple list of used entries and track the next free entry.
epoll() is thread safe, so we no longer need a synchronization
mechanism and can remove the notification pipe.
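A minimal, Linux-only sketch of the epoll calls involved is shown
below. It is not the vhost fdset code: wait_readable() is a
hypothetical helper, and it creates a throwaway epoll instance per call
only to stay self-contained, whereas a dispatcher would keep one
instance and add or remove fds with epoll_ctl() as needed.

```c
#include <sys/epoll.h>
#include <unistd.h>

/* Returns 1 if fd becomes readable within timeout_ms, 0 on timeout,
 * -1 on error. */
int wait_readable(int fd, int timeout_ms)
{
    struct epoll_event ev = { .events = EPOLLIN, .data.fd = fd };
    struct epoll_event out;
    int epfd, n;

    epfd = epoll_create1(0);
    if (epfd < 0)
        return -1;

    /* Registering an fd needs no array resize and no wake-up pipe. */
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev) < 0) {
        close(epfd);
        return -1;
    }

    n = epoll_wait(epfd, &out, 1, timeout_ms);
    close(epfd);
    if (n < 0)
        return -1;
    return n == 1 && out.data.fd == fd;
}
```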
Signed-off-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: David Marchand <david.marchand@redhat.com>
|
|
This patch heavily reworks fdset initialization:
- fdsets are now dynamically allocated by the FD manager
- the event dispatcher is now created by the FD manager
- struct fdset is now opaque to VDUSE and Vhost
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: David Marchand <david.marchand@redhat.com>
|
|
This patch forces synchronization for all FD additions
or deletions in the FD set. With that, it is no longer
necessary for the user to know about the FD set pipe, so
hide its initialization in the FD manager.
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: David Marchand <david.marchand@redhat.com>
|
|
Instead of statically initializing the fdset, this patch
converts VDUSE and Vhost-user to use the fdset_init() function,
which now also initializes the mutexes.
This is preliminary rework to hide the FD manager's pipe from
its users.
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: David Marchand <david.marchand@redhat.com>
|
|
This trivial patch fixes a typo in the FD manager's polling
mutex name.
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
|
|
It is possible to have the control queue without the
device advertising VIRTIO_NET_F_MQ.
Rely on the VIRTIO_NET_F_CTRL_VQ feature being advertised
instead.
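A small sketch of the check, assuming the feature bit numbers from the
Virtio specification (VIRTIO_NET_F_CTRL_VQ is bit 17, VIRTIO_NET_F_MQ
is bit 22). The helper names are hypothetical and this is not the
virtio-user code itself.

```c
#include <stdbool.h>
#include <stdint.h>

#define VIRTIO_NET_F_CTRL_VQ 17 /* control channel available */
#define VIRTIO_NET_F_MQ      22 /* multiqueue with automatic steering */

/* Test one bit of the negotiated feature set. */
bool has_feature(uint64_t features, unsigned int bit)
{
    return (features >> bit) & 1;
}

/* The control queue exists whenever CTRL_VQ is set, with or without MQ. */
bool has_ctrl_queue(uint64_t features)
{
    return has_feature(features, VIRTIO_NET_F_CTRL_VQ);
}
```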
Fixes: 6fdf32d1e318 ("net/virtio-user: remove max queues limitation")
Cc: stable@dpdk.org
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: David Marchand <david.marchand@redhat.com>
|
|
The Virtio-user control queue kick and call FDs were not
uninitialized at device stop time.
This patch fixes this using the queues iterator helper for
both initialization and uninitialization.
Fixes: 90966e8e5b67 ("net/virtio-user: send shadow virtqueue info to the backend")
Cc: stable@dpdk.org
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: David Marchand <david.marchand@redhat.com>
|
|
This patch uses the freshly renamed iterator to destroy
queues at stop time. Doing this, we fix the missing
control queue destruction.
Fixes: 90966e8e5b67 ("net/virtio-user: send shadow virtqueue info to the backend")
Cc: stable@dpdk.org
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: David Marchand <david.marchand@redhat.com>
|
|
This is a preliminary rework to prepare for iterating
over queues for non-setup operations.
Also, remove the error log that does not provide much
information given the callbacks already provide one.
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: David Marchand <david.marchand@redhat.com>
|
|
VIRTIO_F_ORDER_PLATFORM is a needed feature when working with
real HW platforms that expose virtio-net devices
via the vDPA framework. This feature helps enforce
real ordering requirements between descriptor updates and
notification data updates. Hence enable it if the
device supports the feature.
Signed-off-by: Nithin Dabilpuram <ndabilpuram@marvell.com>
|
|
vhost_user_get_protocol_features() does not need to know
about Virtio features, but only about Vhost-user protocol
features.
Signed-off-by: Yuan Zhiyuan <yuanzhiyuan0928@outlook.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
|
|
This patch introduces a new flag RTE_VHOST_USER_ASYNC_CONNECT,
which in combination with the flag RTE_VHOST_USER_CLIENT makes
rte_vhost_driver_start connect asynchronously to the vhost server.
Signed-off-by: Daniil Ushkov <daniil.ushkov@yandex.ru>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
|
|
This patch fixes a potential VM hang when the VM reboots after
vhost live recovery, caused by missing cleanup of the virtqueue
resubmit info.
Specifically, if inflight IO that should be resubmitted during
the latest vhost reconnection has not been submitted yet while the
VM is rebooting, GET_VRING_BASE does not wait for that inflight
IO, and the resubmit info is still set at this point. When the VM
restarts, SET_VRING_KICK will resubmit the inflight IO (if resubmit
info is not null, function set_vring_kick returns without updating
resubmit info).
This is an error: stale inflight IO should not be resubmitted
after the VM restarts.
The solution is to clean up the virtqueue resubmit info in function
set_inflight_fd, before function set_vring_kick is called.
Fixes: ad0a4ae491fe ("vhost: checkout resubmit inflight information")
Cc: stable@dpdk.org
Signed-off-by: Haoqian He <haoqian.he@smartx.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
|
|
This patch resolves a build error with GCC 13 and arm/aarch32 as
targets:
In function ‘mbuf_to_desc’,
inlined from ‘vhost_enqueue_async_packed’ at
../lib/vhost/virtio_net.c:1828:6,
inlined from ‘virtio_dev_rx_async_packed’ at
../lib/vhost/virtio_net.c:1842:6,
inlined from ‘virtio_dev_rx_async_submit_packed’ at
../lib/vhost/virtio_net.c:1900:7:
../lib/vhost/virtio_net.c:1159:18: error: ‘buf_vec[0].buf_addr’ may
be used uninitialized [-Werror=maybe-uninitialized]
1159 | buf_addr = buf_vec[vec_idx].buf_addr;
| ~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~
<snip>
../lib/vhost/virtio_net.c:1160:18: error: ‘buf_vec[0].buf_iova’ may
be used uninitialized [-Werror=maybe-uninitialized]
1160 | buf_iova = buf_vec[vec_idx].buf_iova;
| ~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~
<snip>
../lib/vhost/virtio_net.c:1161:35: error: ‘buf_vec[0].buf_len’ may
be used uninitialized [-Werror=maybe-uninitialized]
1161 | buf_len = buf_vec[vec_idx].buf_len;
| ~~~~~~~~~~~~~~~~^~~~~~~~
GCC complains about the possible runtime path where the while loop
which fills buf_vec (in vhost_enqueue_async_packed) is not run. As a
consequence it correctly concludes that buf_vec may be accessed while
uninitialized.
This scenario is actually very unlikely as the only way this can occur
is if size has overflowed to 0. Meaning that the total packet length
would be close to UINT64_MAX (or actually UINT32_MAX). At first glance,
the code suggests that this may never happen as the type of size has
been changed to 64-bit. For a 32-bit architecture such as arm
(e.g. armv7-a) and aarch32, this still happens because the operand types
(pkt->pkt_len and sizeof) are 32-bit wide, performing 32-bit arithmetic
first (where the overflow can happen) and widening to 64-bit later.
The proposed fix simply guarantees to the compiler that the scope which
fills buf_vec is entered at least once, while not disrupting the actual
logic. This is based on the assumption that size will always be greater
than 0, as suggested by the sizeof, and that the packet length will never
be as big as UINT32_MAX and so will never cause an overflow.
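The widening issue can be reproduced in isolation. In this sketch the
function names are hypothetical, both operands are forced to uint32_t
to mimic a 32-bit target (where sizeof also yields a 32-bit value), and
int is assumed to be at most 32 bits wide.

```c
#include <stdint.h>

/* The addition is done in 32 bits and may wrap; the result is widened
 * to 64 bits only afterwards. */
uint64_t total_size_narrow(uint32_t pkt_len, uint32_t hdr_len)
{
    uint64_t size = pkt_len + hdr_len;

    return size;
}

/* Widening one operand first makes the addition 64-bit. */
uint64_t total_size_wide(uint32_t pkt_len, uint32_t hdr_len)
{
    return (uint64_t)pkt_len + hdr_len;
}
```

A packet length of UINT32_MAX - 11 with a 12-byte header gives 0 in the
narrow version and 4294967296 in the wide one.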
Fixes: 873e8dad6f49 ("vhost: support packed ring in async datapath")
Cc: stable@dpdk.org
Signed-off-by: Luca Vizzarro <luca.vizzarro@arm.com>
Reviewed-by: Paul Szczepanek <paul.szczepanek@arm.com>
Reviewed-by: Nick Connolly <nick.connolly@arm.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
|
|
Currently virtio_dev_tx_packed() always allocates requested @count of
packets, no matter how many packets are really available on the virtio
Tx ring. Later it has to free all packets it didn't use and if, for
example, there were zero available packets on the ring, then all @count
mbufs would be allocated just to be freed afterwards.
This wastes CPU cycles since rte_pktmbuf_alloc_bulk() /
rte_pktmbuf_free_bulk() do quite a lot of work.
Optimize it by using the same idea as the virtio_dev_tx_split() uses on
the Tx split path: estimate the number of available entries on the ring
and allocate only that number of mbufs.
On the split path it's pretty easy to estimate.
On the packed path it's more work since it requires checking flags for
up to @count of descriptors. Still it's much less expensive than the
alloc/free pair.
The new get_nb_avail_entries_packed() function doesn't change how
virtio_dev_tx_packed() works with regard to memory barriers since the
barrier between checking flags and other descriptor fields is still in
place later in virtio_dev_tx_batch_packed() and
virtio_dev_tx_single_packed().
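The flag check on the packed path can be sketched as below. This is a
simplified stand-in, not the actual get_nb_avail_entries_packed(): the
types and names are hypothetical, memory barriers and descriptor
chains are ignored, and the flag bits (AVAIL is bit 7, USED is bit 15)
are taken from the Virtio packed ring layout.

```c
#include <stdbool.h>
#include <stdint.h>

#define DESC_F_AVAIL_SKETCH (1u << 7)
#define DESC_F_USED_SKETCH  (1u << 15)

/* Hypothetical descriptor reduced to its flags field. */
struct desc_sketch {
    uint16_t flags;
};

/* Available when AVAIL matches the wrap counter and USED does not. */
static bool desc_is_avail(const struct desc_sketch *d, bool wrap)
{
    bool avail = (d->flags & DESC_F_AVAIL_SKETCH) != 0;
    bool used = (d->flags & DESC_F_USED_SKETCH) != 0;

    return avail == wrap && used != wrap;
}

/* Count up to max available descriptors starting at idx. */
uint16_t count_avail_sketch(const struct desc_sketch *ring, uint16_t size,
                            uint16_t idx, bool wrap, uint16_t max)
{
    uint16_t n = 0;

    while (n < max && desc_is_avail(&ring[idx], wrap)) {
        n++;
        if (++idx >= size) {
            idx = 0;
            wrap = !wrap; /* the wrap counter flips at ring end */
        }
    }
    return n;
}
```

The caller would then allocate only the returned number of mbufs
instead of the full requested count.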
The difference for a guest transmitting ~17Gbps with MTU 1500, as seen
with `perf record` / `perf report` (at lower pps the savings will be
bigger):
* Before the change:
Samples: 18K of event 'cycles:P', Event count (approx.): 19206831288
Children Self Pid:Command
- 100.00% 100.00% 798808:dpdk-worker1
<... skip ...>
- 99.09% pkt_burst_io_forward
- 90.26% common_fwd_stream_receive
- 90.04% rte_eth_rx_burst
- 75.53% eth_vhost_rx
- 74.29% rte_vhost_dequeue_burst
- 71.48% virtio_dev_tx_packed_compliant
+ 17.11% rte_pktmbuf_alloc_bulk
+ 11.80% rte_pktmbuf_free_bulk
+ 2.11% vhost_user_inject_irq
0.75% rte_pktmbuf_reset
0.53% __rte_pktmbuf_free_seg_via_array
0.88% vhost_queue_stats_update
+ 13.66% mlx5_rx_burst_vec
+ 8.69% common_fwd_stream_transmit
* After:
Samples: 18K of event 'cycles:P', Event count (approx.): 19225310840
Children Self Pid:Command
- 100.00% 100.00% 859754:dpdk-worker1
<... skip ...>
- 98.61% pkt_burst_io_forward
- 86.29% common_fwd_stream_receive
- 85.84% rte_eth_rx_burst
- 61.94% eth_vhost_rx
- 60.05% rte_vhost_dequeue_burst
- 55.98% virtio_dev_tx_packed_compliant
+ 3.43% rte_pktmbuf_alloc_bulk
+ 2.50% vhost_user_inject_irq
1.17% vhost_queue_stats_update
0.76% rte_rwlock_read_unlock
0.54% rte_rwlock_read_trylock
+ 22.21% mlx5_rx_burst_vec
+ 12.00% common_fwd_stream_transmit
It can be seen that virtio_dev_tx_packed_compliant() goes from 71.48% to
55.98% with rte_pktmbuf_alloc_bulk() going from 17.11% to 3.43% and
rte_pktmbuf_free_bulk() going away completely.
Signed-off-by: Andrey Ignatov <rdna@apple.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
|
|
_mm_prefetch does not take a volatile qualified pointer, cast it away.
Additionally the pointer type should be char * not void * so adjust the
cast to match.
_mm_cldemote does not take a volatile qualified pointer, cast it away.
Fixes: 28a5e0b9c7f0 ("eal/x86: implement prefetch for MSVC")
Signed-off-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
|
|
MSVC does not have an equivalent of __builtin_prefetch that accepts a
read or read-write parameter. Introduce conditional compile expansion of
rte_prefetch[0-2] inline functions when building with MSVC.
Signed-off-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
|
|
Applying __rte_unused to a variable has no effect with the MS
Windows compiler. The temporary variable used only when debug
is enabled can just be eliminated.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
|
|
Building mempool with MSVC generates a warning
because of this pragma (same with clang when debug is enabled).
The issue the pragma was working around can be better solved
by using an additional cast.
Fixes: af75078fece3 ("first public release")
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
|
|
Added information about the memory chunks holding the objects in the
mempool when dumping the status of the mempool to a file.
Signed-off-by: Morten Brørup <mb@smartsharesystems.com>
Acked-by: Paul Szczepanek <paul.szczepanek@arm.com>
Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
Acked-by: Konstantin Ananyev <konstantin.ananyev@huawei.com>
Acked-by: Huisong Li <lihuisong@huawei.com>
|
|
Add a NULL pointer check for params->name, which is later
copied into the hash data structure. Without this check
the code segfaults on the strlcpy() of a NULL pointer.
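The shape of such a guard is sketched below with hypothetical types and
names. It is not the rte_hash code, and snprintf() stands in for
strlcpy() to keep the example portable.

```c
#include <errno.h>
#include <stddef.h>
#include <stdio.h>

#define NAME_SIZE_SKETCH 32

/* Hypothetical creation parameters and table object. */
struct params_sketch {
    const char *name;
};

struct table_sketch {
    char name[NAME_SIZE_SKETCH];
};

/* Reject a missing name before copying it; returns 0 or -EINVAL. */
int table_set_name(struct table_sketch *t, const struct params_sketch *params)
{
    if (t == NULL || params == NULL || params->name == NULL)
        return -EINVAL;

    /* Bounded copy that always NUL-terminates. */
    snprintf(t->name, sizeof(t->name), "%s", params->name);
    return 0;
}
```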
Fixes: 48a399119619 ("hash: replace with cuckoo hash implementation")
Cc: stable@dpdk.org
Signed-off-by: Conor Fogarty <conor.fogarty@intel.com>
Acked-by: Vladimir Medvedkin <vladimir.medvedkin@intel.com>
|
|
The rte_hash lookup can return ZERO which is not a positive value.
Fixes: af75078fece3 ("first public release")
Cc: stable@dpdk.org
Signed-off-by: Chenming Chang <ccm@ccm.ink>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Vladimir Medvedkin <vladimir.medvedkin@intel.com>
|
|
Extend rte_flow_conv() to support working only on item's mask.
This allows drivers to get only the mask's size when working on pattern
templates and duplicate items having only the mask in a generic way.
Signed-off-by: Dariusz Sosnowski <dsosnowski@nvidia.com>
Acked-by: Ori Kam <orika@nvidia.com>
|
|
Add support for extended stats for the flower firmware, including the
stats for each queue.
Signed-off-by: Chaoyong He <chaoyong.he@corigine.com>
Reviewed-by: Long Wu <long.wu@corigine.com>
Reviewed-by: Peng Zhang <peng.zhang@corigine.com>
|
|
When using multi PF firmware, the other ports always get the
xstats of the first port.
Fix it by adding the offset for other ports.
Fixes: 8ad2cc8fec37 ("net/nfp: add flag for multiple PFs support")
Cc: stable@dpdk.org
Signed-off-by: Chaoyong He <chaoyong.he@corigine.com>
Reviewed-by: Long Wu <long.wu@corigine.com>
Reviewed-by: Peng Zhang <peng.zhang@corigine.com>
|
|
With modern CPUs, it is possible to have a higher
CPU count, and thus a higher RTE_MAX_LCORES.
In the testpmd application, the current config forwarding
cores option "--nb-cores" is hard limited to 255.
The patch fixes this constraint and also adjusts the lcore
data structure to 32-bit to align with rte lcore APIs.
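As an illustration of parsing such an option into a 32-bit value, the
sketch below uses a hypothetical parse_nb_cores() helper whose
max_lcores argument stands in for the build-time lcore limit. It is
not the testpmd parser.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Parse a core count into a 32-bit value; returns 0 on success and
 * -1 on malformed or out-of-range input. */
int parse_nb_cores(const char *arg, uint32_t max_lcores, uint32_t *out)
{
    char *end = NULL;
    unsigned long v;

    errno = 0;
    v = strtoul(arg, &end, 10);
    if (errno != 0 || end == arg || *end != '\0')
        return -1;
    /* The bound comes from the lcore limit, not from an 8-bit type. */
    if (v == 0 || v > max_lcores)
        return -1;

    *out = (uint32_t)v;
    return 0;
}
```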
Fixes: af75078fece3 ("first public release")
Cc: stable@dpdk.org
Signed-off-by: Sivaprasad Tummala <sivaprasad.tummala@amd.com>
Acked-by: Ferruh Yigit <ferruh.yigit@amd.com>
|
|
If requesting an inner (L3/L4 checksum or L4 segmentation) offload,
when the hardware does not support recomputing outer UDP checksum,
automatically disable it in the common helper.
Signed-off-by: David Marchand <david.marchand@redhat.com>
Tested-by: Ali Alnubani <alialnu@nvidia.com>
|
|
According to the X710 datasheet, X710 devices do not support outer
checksum offload.
"""
8.4.4.2 Transmit L3 and L4 Integrity Offload
Tunneling UDP headers and GRE header are not offloaded while the
X710/XXV710/XL710 leaves their checksum field as is.
If a checksum is required, software should provide it as well as the inner
checksum value(s) that are required for the outer checksum.
"""
Fix Tx offload capabilities depending on the VF type.
Bugzilla ID: 1406
Fixes: f7c8c36fdeb7 ("net/iavf: enable inner and outer Tx checksum offload")
Cc: stable@dpdk.org
Signed-off-by: David Marchand <david.marchand@redhat.com>
Tested-by: Ali Alnubani <alialnu@nvidia.com>
|
|
According to the X710 datasheet (and confirmed in the field), X710
devices do not support outer checksum offload.
"""
8.4.4.2 Transmit L3 and L4 Integrity Offload
Tunneling UDP headers and GRE header are not offloaded while the
X710/XXV710/XL710 leaves their checksum field as is.
If a checksum is required, software should provide it as well as the inner
checksum value(s) that are required for the outer checksum.
"""
Fix Tx offload capabilities according to the hardware.
X722 may support such offload by setting I40E_TXD_CTX_QW0_L4T_CS_MASK.
Bugzilla ID: 1406
Fixes: 8cc79a1636cd ("net/i40e: fix forward outer IPv6 VXLAN")
Cc: stable@dpdk.org
Reported-by: Jun Wang <junwang01@cestc.cn>
Signed-off-by: David Marchand <david.marchand@redhat.com>
Tested-by: Ali Alnubani <alialnu@nvidia.com>
|
|
Setting a pseudo header checksum in the outer UDP checksum is an Intel
(and some other vendors) requirement.
Applications (like OVS) requesting outer UDP checksum without doing this
extra setup have broken outer UDP checksums.
Move this specific setup from testpmd to the "common" helper
rte_net_intel_cksum_flags_prepare().
net/hns3 can then be adjusted.
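For reference, an IPv4 pseudo header sum can be computed as in the
sketch below. The function name is hypothetical, all inputs and the
result are in host byte order for simplicity, and the sum is left
non-complemented on the assumption that the hardware continues the
checksum over the L4 header and payload.

```c
#include <stdint.h>

/* One's complement sum of the IPv4 pseudo header: source address,
 * destination address, protocol and L4 length, as 16-bit words. */
uint16_t ipv4_phdr_sum_sketch(uint32_t src, uint32_t dst,
                              uint8_t proto, uint16_t l4_len)
{
    uint32_t sum = 0;

    sum += src >> 16;
    sum += src & 0xffff;
    sum += dst >> 16;
    sum += dst & 0xffff;
    sum += proto;
    sum += l4_len;

    /* Fold the carries back into the low 16 bits. */
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);

    return (uint16_t)sum;
}
```

A driver or helper would convert the result to network byte order
before writing it into the outer UDP checksum field.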
Bugzilla ID: 1406
Fixes: d8e5e69f3a9b ("app/testpmd: add GTP parsing and Tx checksum offload")
Cc: stable@dpdk.org
Signed-off-by: David Marchand <david.marchand@redhat.com>
Tested-by: Ali Alnubani <alialnu@nvidia.com>
|
|
Resetting the outer IP checksum to 0 is not something mandated by the
mbuf API and is done by rte_eth_tx_prepare(), or per driver if needed.
Fixes: 4fb7e803eb1a ("ethdev: add Tx preparation")
Cc: stable@dpdk.org
Signed-off-by: David Marchand <david.marchand@redhat.com>
Acked-by: Ferruh Yigit <ferruh.yigit@amd.com>
Tested-by: Ali Alnubani <alialnu@nvidia.com>
|
|
At the moment, if the driver sets an incorrect Tx descriptor, the HW
will raise an MDD event reported as:
ice_interrupt_handler(): OICR: MDD event
Add some debug info for this case.
Signed-off-by: David Marchand <david.marchand@redhat.com>
Tested-by: Ali Alnubani <alialnu@nvidia.com>
|
|
ICE_TX_CTX_EIPT_NONE == 0.
There is a good chance that !(anything & 0) is true :-).
While removing this noop check is doable, let's check that the
descriptor does contain an outer IP type.
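The difference between the two checks can be seen in isolation. The
macro and function names below are hypothetical stand-ins, and the
mask value is an assumption made only for this sketch.

```c
#include <stdbool.h>
#include <stdint.h>

#define EIPT_NONE_SKETCH 0x0u /* stands in for ICE_TX_CTX_EIPT_NONE */
#define EIPT_MASK_SKETCH 0x3u /* assumed width of the outer IP type field */

/* ANDing with 0 always gives 0, so this returns true for any input. */
bool noop_check(uint64_t qw0)
{
    return !(qw0 & EIPT_NONE_SKETCH);
}

/* Testing against the field mask detects whether a type is set. */
bool has_outer_ip_type(uint64_t qw0)
{
    return (qw0 & EIPT_MASK_SKETCH) != 0;
}
```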
Fixes: 2ed011776334 ("net/ice: fix outer UDP Tx checksum offload")
Cc: stable@dpdk.org
Signed-off-by: David Marchand <david.marchand@redhat.com>
Reviewed-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Morten Brørup <mb@smartsharesystems.com>
Tested-by: Ali Alnubani <alialnu@nvidia.com>
|
|
VXLAN extensions (VXLAN-GPE and VXLAN-GBP) are unified
in a single VXLAN flow item.
It is the user's responsibility to explicitly match VXLAN-GPE with its
UDP port.
Below are examples to match standard VXLAN, VXLAN-GPE and VXLAN-GBP.
To match standard VXLAN:
  ... / udp dst is 4789 / vxlan ... / ...
To match VXLAN-GBP, where the group policy ID is 4321:
  ... / udp dst is 4789 / vxlan flag_g is 1 group_policy_id is 4321 ... / ...
To match VXLAN-GPE, where the next protocol is IPv6:
  ... / udp dst is 4790 / vxlan flag_p is 1 protocol is 2 ... / ...
Signed-off-by: Gavin Li <gavinl@nvidia.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Ori Kam <orika@nvidia.com>
|
|
VXLAN and VXLAN-GPE were supported with similar header structures.
In order to add VXLAN-GBP, which is another extension to VXLAN, both
extensions are merged in the original VXLAN header structure for an easier
usage. More VXLAN extensions may be added in the future in the same single
structure.
VXLAN and VXLAN-GBP use the same UDP port (4789), while VXLAN-GPE uses a
different port (4790).
The three protocols have the same header length and overall a similar
header structure as below.
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|R|R|R|R|I|R|R|R|            Reserved                           |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                VXLAN Network Identifier (VNI) |   Reserved    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Figure 1: VXLAN Header
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|R|R|Ver|I|P|B|O|       Reserved                |Next Protocol  |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                VXLAN Network Identifier (VNI) |   Reserved    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Figure 2: VXLAN-GPE Header
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|G|R|R|R|I|R|R|R|R|D|R|R|A|R|R|R|        Group Policy ID        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                VXLAN Network Identifier (VNI) |   Reserved    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Figure 3: VXLAN-GBP Extension
Both GPE and GBP are extending VXLAN by using some reserved bits.
It means the packets can be processed with the same pattern and most of
the code can be reused.
The old field names are kept with the use of anonymous unions.
The Group Policy ID (GBP) and the Next Protocol (GPE) fields are
overlapping so they are in a union as well.
Another improvement is defining and documenting each bit.
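The overlap can be sketched with C11 anonymous unions as below. This
is an illustration under stated assumptions: the struct, its field
names and the flag macros are hypothetical and do not reproduce the
actual header definition. Only the byte positions follow the three
header layouts of VXLAN, VXLAN-GPE and VXLAN-GBP.

```c
#include <stddef.h>
#include <stdint.h>

/* Bit masks within the first byte (bit 0 is the most significant). */
#define VXLAN_FLAG_G_SKETCH 0x80 /* GBP: group policy ID present */
#define VXLAN_FLAG_I_SKETCH 0x08 /* VNI is valid */
#define VXLAN_FLAG_P_SKETCH 0x04 /* GPE: next protocol present */

struct vxlan_hdr_sketch {
    uint8_t flags;                /* byte 0 */
    union {                       /* bytes 1-3, one view per variant */
        uint8_t rsvd0[3];         /* plain VXLAN: reserved */
        struct {
            uint8_t gpe_rsvd[2];
            uint8_t next_proto;   /* GPE: byte 3 */
        };
        struct {
            uint8_t gbp_flags;    /* GBP: D and A bits, byte 1 */
            uint8_t policy_id[2]; /* GBP: bytes 2-3, big endian */
        };
    };
    uint8_t vni[3];               /* bytes 4-6 */
    uint8_t rsvd1;                /* byte 7 */
};

/* GBP is recognized by the G bit; GPE needs the UDP port as well. */
int vxlan_is_gbp(const struct vxlan_hdr_sketch *h)
{
    return (h->flags & VXLAN_FLAG_G_SKETCH) != 0;
}
```

Because every member is a byte array or a single byte, the struct has
no padding and stays 8 bytes long in all three views.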
Instead of adding flow items, a single VXLAN flow item is more flexible
as it uses the same header anyway.
GBP can be matched with the G bit.
GPE can be matched with the UDP port number.
VXLAN-GPE flow item and specific header are marked as deprecated.
A removal of the deprecated structures and macros may be proposed later.
Signed-off-by: Gavin Li <gavinl@nvidia.com>
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
|
|
Implement the VXLAN last reserved byte modification.
Following the RFC, the field is only 1 byte, so the field_length
must be set to 8 instead of the real dst_field->size.
Signed-off-by: Rongwei Liu <rongweil@nvidia.com>
Acked-by: Dariusz Sosnowski <dsosnowski@nvidia.com>
|
|
Add "uint8_t last_rsvd" in a union with the original rsvd1.
Add RTE_FLOW_FIELD_VXLAN_LAST_RSVD to the rte flow packet
fields.
The new union is used by the testpmd VXLAN matching item
"last_rsvd" and by the modify target RTE_FLOW_FIELD_VXLAN_LAST_RSVD.
Signed-off-by: Rongwei Liu <rongweil@nvidia.com>
Acked-by: Dariusz Sosnowski <dsosnowski@nvidia.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
|
|
The field strings should be in the same order as the rte_flow_field_id
enumeration definitions.
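One way to keep such a table from drifting out of order is to use
designated initializers, as in the sketch below. The enum and names
are hypothetical and far shorter than the real field list, and this is
not how testpmd defines its table.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical field IDs. */
enum field_id_sketch {
    FIELD_START,
    FIELD_MAC_DST,
    FIELD_MAC_SRC,
    FIELD_COUNT
};

/* Each string is tied to its enum value, so source order is irrelevant. */
static const char *const field_names_sketch[FIELD_COUNT] = {
    [FIELD_MAC_SRC] = "mac_src",
    [FIELD_START]   = "start",
    [FIELD_MAC_DST] = "mac_dst",
};

/* Returns the name for an ID, or NULL if the ID is out of range. */
const char *field_name(enum field_id_sketch id)
{
    if ((unsigned int)id >= FIELD_COUNT)
        return NULL;
    return field_names_sketch[id];
}

/* Returns the ID for a name, or -1 if the name is unknown. */
int field_from_name(const char *name)
{
    for (int i = 0; i < FIELD_COUNT; i++)
        if (strcmp(field_names_sketch[i], name) == 0)
            return i;
    return -1;
}
```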
Fixes: bfc007802da7 ("ethdev: allow modifying IPv6 FL and TC fields")
Fixes: d66aa38f431d ("ethdev: allow modifying IPsec fields")
Fixes: b160da13b398 ("ethdev: allow modifying IPv4 next protocol field")
Cc: stable@dpdk.org
Signed-off-by: Rongwei Liu <rongweil@nvidia.com>
Acked-by: Dariusz Sosnowski <dsosnowski@nvidia.com>
Acked-by: Ori Kam <orika@nvidia.com>
|
|
Modify debug messages to get better information from debug logs.
Signed-off-by: Venkat Kumar Ande <venkatkumar.ande@amd.com>
Acked-by: Selwin Sebastian <selwin.sebastian@amd.com>
|