moomendemol! @moomendemol

1 Beitrag1 Beteiligte*r0 Beiträge heute

**Lukas Weidinger** @lukasweidinger@gruene.social · 21 Std.

Lukas Weidinger @lukasweidinger@gruene.social

I’m thinking of #compiling #darktable from source so that it’s better optimized for my processor.
Anybody experience with its potential? #question #followerpower

I’m generally ok with how fast the flatpak runs on my i7-1255 laptop. However, with such an iterative workflow, I feel that one has much to gain with slight improvements via #opencl and AVX.

**Giuseppe Bilotta** @giuseppebilotta@fediscience.org · 25. März *

25. März *

Giuseppe Bilotta @giuseppebilotta@fediscience.org

I'm liking the class this year. Students are attentive and participating, and the discussion is always productive.

We were discussing the rounding up of the launch grid in #OpenCL to avoid the catastrophic performance drops that come from the inability to divide the “actual” work size by anything smaller than the maximum device local work size, and were discussing on how to compute the “rounded up” work size.

The idea is this: given the worksize N and the local size L, we have to round N to the smallest multiple of L that is not smaller than N. This effectively means computing D = ceili(N/L) and then using D*L.

There are several ways to compute D, but on the computer, working only with integers and knowing that integer division always rounded down, what is the “best way”?

D = N/L + 1 works well if N is not a multiple of L, but gives us 1 more than the intended result if N *is* a multiple of L. So we want to add the extra 1 only if N is not a multiple. This can be achieved for example with

D = N/L + !!(N % L)

which leverages the fact that !! (double logical negation) turns any non-zero value into 1, leaving zero as zero. So we round *down* (which is what the integer division does) and then add 1 if (and only if) there is a reminder to the division.

This is ugly not so much because of the !!, but because the modulus operation % is slow.

1/n

**mirror::box::milo** @mirrorboxmilo@mstdn.social · 24. März

24. März

mirror::box::milo @mirrorboxmilo@mstdn.social

GIF

#generativeart #artificiallife #reactiondiffusion

Fortgeführter Thread

**GPUOpen** @gpuopen@mastodon.gamedev.place · 11. März

11. März

GPUOpen @gpuopen@mastodon.gamedev.place

AMD Radeon GPU Analyzer (RGA) is our performance analysis tool for #DirectX, #Vulkan, SPIR-V, #OpenGL, & #OpenCL.

As well as updates for AMD RDNA 4, there's enhancements to the ISA view UI, using the same updated UI as RGP

More detail: https://gpuopen.com/learn/rdna-cdna-architecture-disassembly-radeon-gpu-analyzer-2-12/?utm_source=mastodon&utm_medium=social&utm_campaign=rdts
(5/7)

AMD GPUOpenReading AMD RDNA™ and CDNA™ Architecture Disassembly is a Breeze with AMD Radeon™ GPU Analyzer v2.12Explore Radeon GPU Analyzer v2.12's enhanced ISA disassembly view, making shader and kernel analysis easier with new tooltips and automatic highlighting. Optimize your GPU workflows effortlessly!

Fortgeführter Thread

**Dr. Moritz Lehmann** @ProjectPhysX@mast.hpc.social · 9. März *

9. März *

Dr. Moritz Lehmann @ProjectPhysX@mast.hpc.social

Here's my #OpenCL implementation: https://github.com/ProjectPhysX/FluidX3D/blob/master/src/kernel.cpp#L1924-L1993

GitHubFluidX3D/src/kernel.cpp at master · ProjectPhysX/FluidX3DThe fastest and most memory efficient lattice Boltzmann CFD software, running on all GPUs and CPUs via OpenCL. Free for non-commercial use. - ProjectPhysX/FluidX3D

**Dr. Moritz Lehmann** @ProjectPhysX@mast.hpc.social · 9. März

9. März

Dr. Moritz Lehmann @ProjectPhysX@mast.hpc.social

#FluidX3D #CFD v3.2 is out! I've implemented the much requested #GPU summation for object force/torque; it's ~20x faster than #CPU #multithreading.
Horizontal sum in #OpenCL was a nice exercise - first local memory reduction and then hardware-supported atomic floating-point add in VRAM, in a single-stage kernel. Hammering atomics isn't too bad as each of the ~10-340 workgroups dispatched at a time does only a single atomic add.
Also improved volumetric #raytracing!
https://github.com/ProjectPhysX/FluidX3D/releases/tag/v3.2

FluidX3D simulation of the X-wing with velocity raytracing visualization

FluidX3D simulation of the X-wing with density raytracing visualization

**Dr. Moritz Lehmann** @ProjectPhysX@mast.hpc.social · 22. Feb.

22. Feb.

Dr. Moritz Lehmann @ProjectPhysX@mast.hpc.social

My OpenCL-Benchmark now uses the dp4a instruction on supported hardware (#Nvidia Pascal, #Intel #Arc, #AMD RDNA, or newer) to benchmark INT8 tghroughput.
dp4a is not exposed in #OpenCL C, but can still be used via inline PTX assembly and compiler pattern recognition. Even Nvidia's compiler will turn the emulation implementation into dp4a, but in some cases does so with a bunch of unnecessary shifts/permutations on inputs, so better use inline PTX directly.
https://github.com/ProjectPhysX/OpenCL-Benchmark/releases/tag/v1.8

dp4a implementation in OpenCL, using either inline PTX assembly on Nvidia GPUs with at least compute capability 6.1, or fallback emulatuion which compilers may turn into dp4a via pattern recognition.

INT8 benchmark on Nvidia H100 SXM5 80GB HBM3. dp4a ~quadruples INT8 throughput over char4 multiplication/addition.

**Dr. Moritz Lehmann** @ProjectPhysX@mast.hpc.social · 8. Feb.

8. Feb.

Dr. Moritz Lehmann @ProjectPhysX@mast.hpc.social

#FluidX3D #CFD v3.1 is out! I have updated the #OpenCL headers for better device specs detection via device ID and Nvidia compute capability, fixed broken voxelization on some #GPUs and added a workaround for a CPU compiler bug that corrupted rendering. Also AMD GPUs will now show up with their correct name (no idea why they can't report it as CL_DEVICE_NAME like every other sane vendor and instead need CL_DEVICE_BOARD_NAME_AMD extension...)
Have fun!
https://github.com/ProjectPhysX/FluidX3D/releases/tag/v3.1

GitHubRelease FluidX3D v3.1 (more bug fixes) · ProjectPhysX/FluidX3DThank you for using FluidX3D! Update v3.1 brings two critical bug fixes/workarounds and various small improvements under the hood: Improvements faster enqueueReadBuffer() on modern CPUs with 64-B...

**Benjamin Carr, Ph.D.** @BenjaminHCCarr@hachyderm.io · 28. Jan.

28. Jan.

Benjamin Carr, Ph.D. @BenjaminHCCarr@hachyderm.io

#NVIDIA #GeForce #RTX5090 #Linux #GPU Compute Performance #Benchmarks
When taking geo mean across 60+ benchmarks of #CUDA / #OptiX / #OpenCL / #Vulkan Compute, the GeForce RTX 5090 was delivering 1.42x the performance of GeForce #RTX4090. On performance-per-Watt GeForce RTX 5090 tended to deliver similar power efficiency to the RTX 4080/4090 graphics cards.
GeForce RTX 5090 Founders Edition was running cooler than many of the other Founders Edition graphics cards tested.
https://www.phoronix.com/review/nvidia-geforce-rtx5090-linux

www.phoronix.comNVIDIA GeForce RTX 5090 Linux GPU Compute Performance Benchmarks Review

**Käsekuchen** @Kaesekuchen@social.anoxinon.de · 24. Jan. *

24. Jan. *

Käsekuchen @Kaesekuchen@social.anoxinon.de

How can I install #AMD iGPU #OpenCL drivers on #tuxedo_os I tried the deb from AMD but I get:

Unsupported OS: /etc/os-release ID 'tuxedo'

Any ideas? @tuxedocomputers #Linux #ubuntu

Boosts welcome!

**HGPU group** @hgpu@mast.hpc.social · 6. Jan.

6. Jan.

HGPU group @hgpu@mast.hpc.social

A comparison of HPC-based quantum computing simulators using Quantum Volume

#CUDA #OpenCL #QuantumComputing #Overview

https://hgpu.org/?p=29643

hgpu.org · 6. Jan.A comparison of HPC-based quantum computing simulators using Quantum VolumeThis paper compares quantum computing simulators running on a single CPU or GPU-based HPC node using the Quantum Volume benchmark commonly proposed for comparing NISQ systems. As simulators do not …

**Gilberto Ficara** @zompetto@mastodon.art · 26. Dez. 2024

26. Dez. 2024

Gilberto Ficara @zompetto@mastodon.art

Published another article about #Darktable with #OpenCL for #AMD #ROCm on #Debian (SID)... I finally got it working (again) :)

https://www.stranatesta.eu/tech/darktable-opencl-debian-sid-december-2024/

strana/testaDarktable with AMD ROCm/OpenCL support on Debian SID, December 2024How I finally got Darktable to enable OpenCL for my AMD 6800 card on Debian SID

#linux #amdgpu #gpu

Fortgeführter Thread

**Dr. Moritz Lehmann** @ProjectPhysX@mast.hpc.social · 13. Dez. 2024

13. Dez. 2024

Dr. Moritz Lehmann @ProjectPhysX@mast.hpc.social

#Intel Arc B580 #OpenCL specs:
- Windows: https://opencl.gpuinfo.org/displayreport.php?id=4564
- Linux: https://opencl.gpuinfo.org/displayreport.php?id=4562

opencl.gpuinfo.orgIntel(R) Arc(TM) B580 Graphics - OpenCL Hardware Database by Sascha Willems

**Dr. Moritz Lehmann** @ProjectPhysX@mast.hpc.social · 13. Dez. 2024 *

13. Dez. 2024 *

Dr. Moritz Lehmann @ProjectPhysX@mast.hpc.social

Dual #Intel Arc B580 go brrrr #OpenCL

**Dr. Moritz Lehmann** @ProjectPhysX@mast.hpc.social · 18. Nov. 2024 *

18. Nov. 2024 *

Dr. Moritz Lehmann @ProjectPhysX@mast.hpc.social

This is the largest #CFD simulation ever on a single computer, the #NASA X-59 at 117 Billion grid cells. This video visualizes 7.6 PetaByte if volumetric data.

I did this simulation on 2x #Intel Xeon 6980P #HPC CPUs with 6TB MRDIMM memory at massive 1.7TB/s bandwidth. No #GPUs required!

https://www.youtube.com/watch?v=K5eKxzklXDA

As a little gift to you all: #FluidX3D v3.0 is out now, enabling 31% larger resolution on CPUs/iGPUs with #OpenCL zero-copy buffers:
https://github.com/ProjectPhysX/FluidX3D/releases/tag/v3.0

YouTubeLargest CFD sim ever on a single node: NASA X-59 at 117 Billion cells with FluidX3D on Intel Xeon 6Von Dr. Moritz Lehmann

**karolherbst** @karolherbst@chaos.social · 28. Okt. 2024

28. Okt. 2024

karolherbst @karolherbst@chaos.social

*thinks about to which device to bring #OpenCL support next*

**David Heidelberg** @okias@floss.social · 28. Okt. 2024

28. Okt. 2024

David Heidelberg @okias@floss.social

I'm thrilled to announce that #OpenCL support for #Qualcomm Adreno 600 and later landed and will be available in upcoming #Mesa3D 24.3! I really enjoyed being part of this effort and work on it with Dmitry, @karolherbst, and @robclark https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30835

GitLabfreedreno: support OpenCL via rusticl (!30835) · Merge Requests · Mesa / mesa · GitLabPrerequisites (for applying on top of the 24.2 release): clear_buffer implementation:

**♡ Eva Winterschön ♡** @winterschon@bsd.cafe · 6. Okt. 2024

6. Okt. 2024

♡ Eva Winterschön ♡ @winterschon@bsd.cafe

@dexter Here's the AMD w/ CUDA support/library/api/shim/translation/thing that came up the other day:

> ZLUDA is a drop-in replacement for CUDA on non-NVIDIA GPU. ZLUDA allows to run unmodified CUDA applications using non-NVIDIA GPUs with near-native performance.

- https://vosen.github.io/ZLUDA/blog/zludas-third-life/
- https://github.com/vosen/ZLUDA

Individual benchmarks for ZLUDA, normalized to OpenCL (higher is better)

https://github.com/vosen/ZLUDA

#forDiscussion #gpu #ai

**Sikorski Arkadiusz** @sikorski@floss.social · 4. Okt. 2024

4. Okt. 2024

Sikorski Arkadiusz @sikorski@floss.social

clang-20: argument '-Ofast' is deprecated; use '-O3 -ffast-math' for the same behavior, or '-O3' to enable only conforming optimizations
#clang #programming #compiler #cli #optimization #level #c #cpp #o3 #opencl #llvm

**Troels** @athas@freeradical.zone · 23. Sept. 2024

23. Sept. 2024

Troels @athas@freeradical.zone

Finding myself once again staring at a black screen, I remind myself: never trust AMDs #OpenCL implementation, and test your #GPU code on a remote machine.

Frühere Suchanfragen

Suchoptionen

Verwaltet von:

Serverstatistik:

#opencl