OpenCL error “parameter may not be qualified with an address space”

I have the following OpenCL kernel code:
kernel void mandelbrot(global write_only image2d_t output_image)
{
    int2 pos = { get_global_id(0), get_global_id(1) };
    write_imageui(output_image, (int2)(pos.x, pos.y), (uint4)(254, 0, 0, 254));
}
When the program gets built, I get the following error message:
:1:52: error: parameter may not be qualified with an address space
:1:31: warning: Access qualifiers should only be applied to image types
OpenCL program build error code: -11
Can output images only be created as 1D arrays?

You don't need any address space qualifier for an image type.
OpenCL C specification 6.5.1:
As image objects are always allocated from the global address space, the __global or global qualifier should not be specified for image types.
https://www.khronos.org/registry/OpenCL/specs/opencl-2.0-openclc.pdf
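Concretely, drop the qualifier and keep only the access qualifier on the image:
kernel void mandelbrot(write_only image2d_t output_image)
{
    int2 pos = { get_global_id(0), get_global_id(1) };
    write_imageui(output_image, (int2)(pos.x, pos.y), (uint4)(254, 0, 0, 254));
}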

Related

Setting __local float in OpenCL

I'm trying to set up a new __local float array, but I get an error when I pass its size as an argument.
This code gives error:
int TILE_DIM = get_local_size(0)*get_local_size(1); //local size
__local float buffer[TILE_DIM];
This code does not:
int TILE_DIM = get_local_size(0)*get_local_size(1); //local size
__local float buffer[512];
Local memory must always be allocated before the kernel runs. Therefore, no arrays with kernel runtime length are possible. However, you can pass a pointer to (uninitialised) __local memory as an argument to the kernel. The length of this can be set in the clSetKernelArg() call. (Check the linked documentation for details on local memory kernel arguments.) So it's variable-length per enqueued kernel, but not per workgroup.
The size of the __local array can be passed to clBuildProgram in the options argument: "-DTILE_DIM=512"
For example:
clBuildProgram(program, 1, &device.device_id, "-DTILE_DIM=512", NULL, NULL);
This way the size of the local array can be decided at kernel build time.
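As a sketch of the variable-size route mentioned above (my_kernel, data, buffer and the host variables are placeholder names, not from the question): the kernel receives a __local pointer parameter, and the host passes NULL as the value and only the byte size in clSetKernelArg:
/* Kernel side: the local buffer arrives as an uninitialised __local pointer. */
__kernel void my_kernel(__global float* data, __local float* buffer)
{
    int lid = get_local_id(1) * get_local_size(0) + get_local_id(0);
    buffer[lid] = data[get_global_id(1) * get_global_size(0) + get_global_id(0)];
    barrier(CLK_LOCAL_MEM_FENCE);
    /* ... work with buffer ... */
}

/* Host side: arg_value is NULL, arg_size is the local allocation in bytes. */
size_t tile_dim = local_size_x * local_size_y;
clSetKernelArg(kernel, 1, tile_dim * sizeof(cl_float), NULL);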

JCuda’s JCublas2.cublasSdot: failed to use a device Pointer for the result Pointer parameter

In the source code's comments of JCublas2.cublasSdot, it's commented that the 'result' parameter can be a 'host or device pointer'.
public static int cublasSdot(
    cublasHandle handle,
    int n,
    Pointer x,
    int incx,
    Pointer y,
    int incy,
    Pointer result) /** host or device pointer */
{
    return checkResult(cublasSdotNative(handle, n, x, incx, y, incy, result));
}
However, I can only use a host pointer such as Pointer.to(fs) with float[] fs = {0}. If I use a device pointer, for example CUdeviceptr devicePtr = new CUdeviceptr(); JCudaDriver.cuMemAlloc(devicePtr, 100 * Sizeof.FLOAT);, the program crashes with console messages like:
#
# A fatal error has been detected by the Java Runtime Environment:
#
# EXCEPTION_ACCESS_VIOLATION (0xc0000005) at pc=0x000007fed93af2a3, pid=9376, tid=0x0000000000003a7c
# .....
Minimizing data transfer between host and device saves time. How can I use a device Pointer as the 'result' argument for this method, and for other JCuda methods whose result Pointer is commented with /** host or device pointer */?
CUBLAS can write the results of certain computations (like the dot product) either to host or to device memory. The target memory type has to be set explicitly, using cublasSetPointerMode.
An example of how this can be used is shown in the JCublas2PointerModes sample.
It first writes the result of the dot product computation to host memory (which is also the default when no pointer mode is set explicitly):
// Set the pointer mode to HOST
cublasSetPointerMode(handle, CUBLAS_POINTER_MODE_HOST);
// Prepare the pointer for the result in HOST memory
float hostResult[] = { -1.0f };
Pointer hostResultPointer = Pointer.to(hostResult);
// Execute the 'dot' function
cublasSdot(handle, n, deviceData, 1, deviceData, 1, hostResultPointer);
And then changes the pointer mode and calls the function again, this time writing the result to device memory:
cublasSetPointerMode(handle, CUBLAS_POINTER_MODE_DEVICE);
// Prepare the pointer for the result in DEVICE memory
Pointer deviceResultPointer = new Pointer();
cudaMalloc(deviceResultPointer, Sizeof.FLOAT);
// Execute the 'dot' function
cublasSdot(handle, n, deviceData, 1, deviceData, 1, deviceResultPointer);

OpenCL2.0 kernel giving access violation error at clBuildProgram

I have written a simple kernel that uses the new OpenCL 2.0 feature of Clang blocks.
int multiplier = 7;
__kernel void clang_blocks_ocl(__global int* input_array, __global int* output_array)
{
    int global_id = get_global_id(0);
    int ^MultiplayByConstant(int) = ^int (int num) {return multiplier*num;};
    output_array[global_id] = MultiplayByConstant(input_array[global_id]);
}
I am passing each element of the input array to the block and getting it multiplied by a constant, as you can see in the code.
This is my configuration.
OS : Win7 64bit
Graphics Card : AMD
Driver Version : Crimson 15.30
From debugging, the application hangs at clBuildProgram and then crashes without any return code. I have passed all the correct arguments to clBuildProgram, including the compile option "-cl-std=CL2.0".
Clang blocks were introduced in OpenCL 2.0 for the device-side enqueue feature; they can't be used in other contexts the way you are trying to.
If you meant to use device-side enqueue, then you need to create a device-side queue on the host and also modify your kernel; see the AMD tutorial.
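For reference, a minimal sketch of the intended pattern; this is not the original kernel, and it assumes the host created an on-device queue (CL_QUEUE_ON_DEVICE | CL_QUEUE_ON_DEVICE_DEFAULT) and built with -cl-std=CL2.0:
__kernel void parent_kernel(__global int* input_array, __global int* output_array, int n)
{
    if (get_global_id(0) == 0) {
        /* A block is only legal here: as the payload of a device-side enqueue. */
        enqueue_kernel(get_default_queue(),
                       CLK_ENQUEUE_FLAGS_WAIT_KERNEL,
                       ndrange_1D((size_t)n),
                       ^{
                           int gid = get_global_id(0);
                           output_array[gid] = 7 * input_array[gid];
                       });
    }
}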

Returning values from vector-type memory in OpenCL?

I am trying to use vectors in my OpenCL code. Prior to this, I was mapping the memory back and forth as follows:
cmDevSrc= clCreateBuffer(cxGPUContext,CL_MEM_READ_WRITE,sizeof(cl_char) * (row_info->width) * bpp,NULL,&ciErr);
cmDevDest=clCreateBuffer(cxGPUContext,CL_MEM_READ_WRITE,sizeof(cl_char) * (row_info->width) * bpp,NULL,&ciErr);
I am using cmDevSrc as my source array of unsigned chars and cmDevDest as the destination.
When I am trying to implement the same using vectors, I am passing the kernel argument as
clSetKernelArg(ckKernel,1,sizeof(cl_uchar4 )*row_info->rowbytes*bpp,&cmDevDest);
with cmDevDest being declared as cl_uchar4 cmDevDest.
But now I cannot read back my data using mapping; I get the following error:
incompatible type for argument 2 of ‘clEnqueueMapBuffer’
/usr/include/CL/cl.h:1066: note: expected ‘cl_mem’ but argument is of type ‘cl_uchar4’
I don't know how to resolve this compile-time error, and while I keep searching the net, any help would be very welcome.
Thanks
Piyush
The clCreateBuffer function returns a cl_mem object, not a cl_uchar4 (or anything else), so cmDevSrc and cmDevDest should be declared as cl_mem variables. This is also what is causing the compiler error for your call to clEnqueueMapBuffer.
Additionally, the arg_size argument of clSetKernelArg should be sizeof(cl_mem) when you are passing memory object arguments, not the size of the buffer:
clSetKernelArg(ckKernel, 1, sizeof(cl_mem), &cmDevDest);
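With cmDevDest declared as cl_mem, mapping it back could then look roughly like this (queue and buffer_size are assumed names, not from the question):
cl_int err;
cl_uchar *mapped = (cl_uchar *)clEnqueueMapBuffer(
    queue, cmDevDest, CL_TRUE, CL_MAP_READ,
    0, buffer_size,            // offset and size in bytes
    0, NULL, NULL, &err);
// ... read the destination data through 'mapped' ...
clEnqueueUnmapMemObject(queue, cmDevDest, mapped, 0, NULL, NULL);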

mmap() of arrays or malloced memory

I am trying to create a memory map of an array, or of some memory allocated with malloc(), using mmap(), but it fails with "invalid argument".
#include <stdio.h>
#include <sys/mman.h>
#include <stdlib.h>
#include <err.h>   /* for err() */

int main()
{
    int *var1 = NULL;
    size_t size = 0;
    size = 1000 * sizeof(int);
    var1 = (int *)malloc(size);
    int i = 0;
    for (i = 0; i < 999; i++)
    {
        var1[i] = 1;
    }
    printf("%p\n", var1);
    void *addr = NULL;
    addr = mmap((void *)var1, size, PROT_EXEC | PROT_READ | PROT_WRITE,
                MAP_SHARED | MAP_ANONYMOUS | MAP_FIXED, -1, 0); /* to create memory map of var1 */
    err(1, NULL); /* to print the error */
    return 0;
}
Error:
a.out: Invalid argument
Please help me.
Thank you in advance.
Proximate cause: mmap fails because you asked it to create a new memory mapping, you asked for the mapping to be placed at a specific address (var1's address), that address is already occupied (by the heap from which malloc got its memory), and you told the operating system it was not allowed to choose an alternate address in case var1 was not a suitable address (MAP_FIXED).
Analysis: What are you trying to do here? What does "find the memory map of an array" mean? Do you want to have your array of integers located in heap memory (returned by malloc()) or in an anonymous memory mapping created by mmap()? By the way, unless you fork() (create a child process) there is little functional difference: both are areas of memory that are private to your process. But they are not the same thing and you can't manipulate the heap with mmap() nor can you manage mapped memory with malloc().
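If what you want is the array in an anonymous mapping rather than on the heap, a minimal sketch (same size and fill as your program, but letting the kernel choose the address so MAP_FIXED is not needed) would be:
#include <stdio.h>
#include <sys/mman.h>
#include <err.h>

int main()
{
    size_t size = 1000 * sizeof(int);
    /* Ask for a fresh anonymous mapping; NULL lets the kernel pick the address. */
    int *var1 = mmap(NULL, size, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (var1 == MAP_FAILED)
        err(1, "mmap");
    for (int i = 0; i < 1000; i++)
        var1[i] = 1;
    printf("%p\n", (void *)var1);
    munmap(var1, size);
    return 0;
}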

I have the following example code:

int compute_stuff(int *array)
{
    /* do stuff with array */
    ...
    return x;
}

__kernel void my_kernel()
{
    __local int local_mem_block[LENGTH*MY_LOCAL_WORK_SIZE];
    int result;

    /* do stuff with local memory block */
    result = compute_stuff(local_mem_block + (LENGTH*get_local_id(0)));
    ...
}

The above example compiles and runs fine on my NVIDIA card (RTX 2080).
But when I try to compile it on a MacBook with an AMD card, I get the following error:

error: passing '__local int *' to parameter of type '__private int *' changes address space of pointer

OK, so then I change the compute_stuff function to the following:

int compute_stuff(__local int *array)

Now both NVIDIA and AMD compile it fine, no problems... But then I have one more test, compiling it on the same MacBook under WINE (rather than booting Windows through Boot Camp), and that gives the following error:

error: parameter may not be qualified with an address space

So it seems that one should not qualify a function parameter with an address space. Fair enough. But if I don't, then AMD on native Windows thinks I'm trying to change the pointer's address space to private (I assume because all function arguments are presumed to be private?).

What is a good way to handle this so that all three environments are happy to compile it? As a last resort, I'm thinking of having the program simply check whether the build failed without the qualifier and, if so, insert the __local qualifier and build a second time... It feels like a hack, but it might work.

2 Answers

I agree with ProjectPhysX that this is a bug in the WINE implementation. I also found that the following seems to satisfy all three environments:

int compute_stuff(__local int * __private array)
{
    ...
}

__kernel void my_kernel()
{
    __local int local_mem_block[LENGTH*MY_LOCAL_WORK_SIZE];
    __local int * __private samples;
    int result;

    samples = local_mem_block + (LENGTH*get_local_id(0));

    result = compute_stuff(samples);
}

The above explicitly states that the pointer itself is private, while the memory it points to lives in the local address space. This removes any ambiguity.


AsmCoder8088, 17 Apr 2021 at 21:50

The int* in int compute_stuff(int *array) is in the __generic address space. The call result = compute_stuff(local_mem_block+...); implicitly converts it to __local, which is allowed according to the Khronos OpenCL 2.0 specification.

Possibly AMD defaults to OpenCL 1.2. Try explicitly setting -cl-std=CL2.0 in clBuildProgram() or clCompileProgram().
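A host-side sketch of requesting the 2.0 compiler explicitly (program and device are placeholder names):
cl_int status = clBuildProgram(program, 1, &device, "-cl-std=CL2.0", NULL, NULL);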

To keep the code compatible with OpenCL 1.2, you can explicitly give the function parameter the __local address space: int compute_stuff(__local int *array). OpenCL allows function parameters to be in the __global and __local address spaces. WINE appears to have a bug here. Possibly inlining the function can work around it: int __attribute__((always_inline)) compute_stuff(__local int *array).

As a last resort, the workaround you proposed will do. You can detect whether you are running under a WINE system, for example like this. That way you can switch between the two code variants without compiling twice and catching the error.


marc_s, 27 May 2021 at 21:56

Hi, this is the program in the .cl file:

__kernel void convolutional_feed_forward ( __global float* input, __local float* kernel, __private int input_i, __private int input_j, __private int kernel_i, __private int kernel_j, __local float bias, __private int channels, __global float* output, __private int stride, __private int padding, __private int output_depth_offset, __private int output_rows_offset, __private int output_cols_offset, __private int n_kernels) {
    int oi,oj,i,j,c,n;
    int output_i = (input_i-kernel_i)/stride + 1 + 2*padding;
    int output_j = (input_j-kernel_j)/stride + 1 + 2*padding;
    int rows,cols,depth;
    depth = get_global_id(2)+output_depth_offset;
    rows = get_global_id(1)+output_rows_offset;
    cols = get_global_id(0)+output_cols_offset;
    if (depth >= 0 && depth < n_kernels && rows >= padding && rows < output_i-padding && cols < output_j-padding){

            for(c = 0; c < channels; c++){
                for(i = 0; i < kernel_i; i++){
                    for(j = 0; j < kernel_j; j++){
                        output[depth*output_i*output_j+rows*output_j+cols] += kernel[c*kernel_i*kernel_j + i*kernel_j + j]*input[c*input_i*input_j + i*input_j + j+(cols-padding)*stride+(rows-padding)*stride*input_j];
                    }
                }
            }
            output[depth*output_i*output_j+rows*output_j+cols] += bias;    
        }       
}

And this is the error log that appears when I try to build the program:

<kernel>:1:199: error: parameter may not be qualified with an address space
__kernel void convolutional_feed_forward ( __global float* input, __local float* kernels, __private int input_i, __private int input_j, __private int kernel_i, __private int kernel_j, __local float bias, __private int channels, __global float* output, __private int stride, __private int padding, __private int output_depth_offset, __private int output_rows_offset, __private int output_cols_offset, __private int n_kernels) {
                                                                                                                                                                                                      ^

Please help.
