Fatal user error 1002: for loop executed with inconsistent parameters between threads

I am trying to parallelize a for loop with OpenMP. Usually that should be fairly straightforward. However, I need to perform thread-specific initializations before executing the for loop.

Specifically, I have the following problem: I have a random number generator that is not thread-safe, so I need to create one RNG instance per thread. But I want to make sure that the threads do not all produce the same random numbers.

So I tried the following:

#pragma omp parallel
{
    int rndseed = 42;
#ifdef _OPENMP
    rndseed += omp_get_thread_num();
#endif

    // initialize random number generator

    #pragma omp for
    for (int sampleid = 0; sampleid < numsamples; ++sampleid)
    {
        // do stuff
    }
}

If I use this construct, I get the following error message at runtime:

Fatal User Error 1002: '#pragma omp for' improperly nested in a work-sharing construct

So, is there a way to do thread-specific initialization?

Thanks


Solution

The error you are getting:

Fatal User Error 1002: '#pragma omp for' improperly nested in a work-sharing construct

refers to the illegal nesting of worksharing constructs. In fact, the OpenMP 3.1 standard gives the following restrictions in section 2.5:

  • Each worksharing region must be encountered by all threads in a team or by none at all.
  • The sequence of worksharing regions and barrier regions encountered must be the same for every thread in a team.

From the lines above it follows that nesting different worksharing constructs inside the same parallel region is non-conforming.

Even though the illegal nesting is not visible in your snippet, I assume it was hidden by the simplification of the post with respect to the actual code. Just to give you a hint, the most common cases are:

  • a loop worksharing construct nested inside a single construct (similar to the example here)
  • a loop worksharing construct nested inside another loop construct

If you are interested, this answer discusses the latter case in more detail.
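To make the first case concrete, here is a minimal sketch of that kind of illegal nesting (the single block and the loop bounds are hypothetical, not taken from the poster's actual code):

#pragma omp parallel
{
    #pragma omp single  // a worksharing construct
    {
        // Non-conforming: a loop worksharing construct nested inside
        // another worksharing region of the same parallel region.
        #pragma omp for
        for (int i = 0; i < n; ++i)
        {
            // do stuff
        }
    }
}

The snippet shown in the question is, by itself, conforming: thread-local setup code followed by a '#pragma omp for' directly inside a parallel region is allowed.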


Other solutions

I think there is a design error here.

A parallel for loop is not simply N threads where N is, say, the number of cores, but potentially N * X threads with 1 <= N * X < numsamples.

If you need an "iteration-private" variable, declare it inside the loop body (but you already know that); declaring a thread-private variable for use inside a parallel for loop is probably hard to justify.
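That said, the per-thread initialization the asker wants is expressible, and the pattern in the question is legal OpenMP. A minimal sketch, assuming a C++ <random> engine stands in for the asker's unnamed RNG and that OpenMP is enabled:

#include <omp.h>
#include <random>

void sample_all(int numsamples)
{
    #pragma omp parallel
    {
        // One engine per thread, seeded differently per thread,
        // so the threads do not produce identical sequences.
        std::mt19937 rng(42 + omp_get_thread_num());
        std::uniform_real_distribution<double> dist(0.0, 1.0);

        #pragma omp for
        for (int sampleid = 0; sampleid < numsamples; ++sampleid)
        {
            double r = dist(rng); // thread-local draw
            // do stuff with r
        }
    }
}

Note that seeding with seed + thread number gives distinct but possibly correlated streams; for statistically independent streams, consider std::seed_seq or a dedicated parallel RNG library.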


Comments

@HannH

System information

  • Have I written custom code (as opposed to using a stock example script provided in TensorFlow):
no custom code (original code cloned from GitHub)
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04):
    Windows 10
  • TensorFlow installed from (source or binary):
source (cmake from tensorflow/contrib/cmake)
  • TensorFlow version (use command below):
    r1.3
  • Visual studio version:
    vs2015
  • CUDA/cuDNN version:
no GPU / CUDA 8.0 + cuDNN 5.1
  • GPU model and memory:
NVIDIA Titan X (12 GB)
  • Exact command to reproduce:
tf_cc.vcxproj -> A:\C++\tensorflow-1.3.0\Source_GPU\tf_cc.dir\Debug\tf_cc.lib
    65>a:\c++\tensorflow-1.3.0\source_gpu\external\eigen_archive\eigen\src\core\products\generalblockpanelkernel.h(1977): fatal error C1002:

Describe the problem

Hello, I get error C1002 when compiling TensorFlow r1.3 with VS2015 in tf_core_kernels, using DEBUG mode without GPU; the error also occurred during compilation of the GPU version. I believe I have enough memory for the compilation. However, I compiled successfully in RELEASE mode without error.

Same issue in tensorflow-r1.2.

@reedwm

Can you post the entire error output that occurs?

@snnn

I believe the debug build is not supported, especially the GPU build. You may try to replace /DEBUG:FULL with /DEBUG:FASTLINK in your linker flags. You may also need to split tf_core_kernels into multiple static libraries. And you would need a debug build of Python, CUDA, … If I were you, I'd give up.

@HannH

@reedwm the error output has just two sentences, which I wrote above. I'll post part of the build log for you:
1> count_extremely_random_stats_op.cc
1> finished_nodes_op.cc
1> grow_tree_op.cc
1> reinterpret_string_to_float_op.cc
1> sample_inputs_op.cc
1> scatter_add_ndim_op.cc
1> tree_predictions_op.cc
1> tree_utils.cc
1> update_fertile_slots_op.cc
1> hard_routing_function_op.cc
1> k_feature_gradient_op.cc
1> k_feature_routing_function_op.cc
1> routing_function_op.cc
1> routing_gradient_op.cc
1> stochastic_hard_routing_function_op.cc
1> stochastic_hard_routing_gradient_op.cc
1> unpack_path_op.cc
1> utils.cc
1> skip_gram_kernels.cc
1> skip_gram_ops.cc
1> cross_replica_ops.cc
1> infeed_ops.cc
1> outfeed_ops.cc
1> replication_ops.cc
1> tpu_configuration_ops.cc
1> tpu_sendrecv_ops.cc
1>a:\c++\tensorflow-1.3.0\source_gpu\external\eigen_archive\eigen\src\core\products\generalblockpanelkernel.h(1989): fatal error C1002: 在第 2 遍中编译器的堆空间不足 (compiler is out of heap space in pass 2)
1>cl : 命令行 error D8040: 创建子进程或与子进程通讯时出错 (error creating or communicating with child process)
========== 生成: 成功 (success) 0 个, 失败 (failure) 1 个, 最新 0 个, 跳过 0 个 ==========
Sorry for the Chinese output; I have translated the messages into English above.

@HannH

@snnn I agree with you, there are some problems compiling in Debug mode right now, so I'm posting the error for later TensorFlow development.
It seems there are also problems in MinSizeRel and RelWithDebInfo modes, because I cannot compile TF in those modes either.

@snnn

Did you use the 32-bit cl.exe or the 64-bit cl.exe?

set PreferredToolArchitecture=x64

Run this command before opening tensorflow.sln.

Or, add "-T host=x64" to the cmake command line args.

@HannH

@snnn Thank you for your suggestion! Because compiling is time-consuming, I will tell you the result later.
Another question: I found it is necessary to compile a TensorFlow project statically in VS, which bloats the result. For example, the project 'tf_label_image_example' is 164 MB in release mode, but there is only one file named 'main.cc' from 'examples\label_image' in this project. Is there any way to compile it dynamically to squeeze the file size?

@HannH

@snnn OK, your suggestion works, thank you!

@HannH

I found it is easy to build the TensorFlow project dynamically in VS2015 for tf-r1.3. Closing the issue.

@yuyijie1995

@HannH I have the same problem. Which solution worked for you? Was it "set PreferredToolArchitecture=x64"? I tried adding "-T host=x64" to the cmake command line args, but another error happened. Can I have your QQ or WeChat number?

I added OpenMP code to some serial code in a simulator application. When I run a program that uses this application, the program exits unexpectedly with the output "The thread 'Win32 Thread' (0x1828) has exited with code 1 (0x1)". This happens in the parallel region where I added the OpenMP code.
Here's a code sample:
#pragma omp parallel for private (curr_proc_info, current_writer, method_h) shared (exceptionOccured) schedule(dynamic, 1)
for (i = 0; i < method_process_num; i++)
{
    current_writer = 0;
    // we need to add protection before we can dequeue a method from the methods queue,
    #pragma omp critical(dequeueMethod)
    method_h = pop_runnable_method(curr_proc_info, current_writer);
    if (method_h != 0 && exceptionOccured == false) {
        try {
            method_h->semantics();
        }
        catch (const sc_report& ex) {
            ::std::cout << "\n" << ex.what() << ::std::endl;
            m_error = true;
            exceptionOccured = true; // we cannot jump outside the loop, so instead of return we use a flag and return somewhere else
        }
    }
}
The scheduling was static before I made it dynamic. After I switched to dynamic with a chunk size of 1, the application proceeded a little further before it exited. Can this be an indication of what is happening inside the parallel region?
Thanks

As I read it (and I'm more of a Fortran programmer than C/C++), your private variable curr_proc_info is not defined before it first appears in the call to pop_runnable_method. But private variables are undefined on entry to the parallel region; see the sketch below.
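A minimal sketch of that fix (the proc_info_t type and its initializer are hypothetical, since the question doesn't show them): use firstprivate so each thread starts from the initialized value rather than an undefined private copy.

// hypothetical type and initializer -- not shown in the question
proc_info_t curr_proc_info = init_proc_info();

// firstprivate gives each thread a private copy initialized from the
// value above; plain private would leave the copy undefined on entry.
#pragma omp parallel for firstprivate(curr_proc_info) schedule(dynamic, 1)
for (int i = 0; i < method_process_num; i++)
{
    // ... same loop body as above ...
}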
I also think your sharing of exceptionOccured is a little fishy, since it suggests that an exception on any thread should be noticed by every thread, not just the thread in which it occurred. Of course, that may be your intent.
Cheers
Mark

Related

Eclipse Plugin: How to run Launch-Configurations in a for-loop synchronously?

Here is a simplified version of my code. configurations is an array of the type ILaunchConfiguration.
for (int j = 0; j < configurations.length; j++) {
    configurations[j].launch("debug", null);
}
I want every ILaunchConfiguration to launch only when the prior one has terminated. With my current code I get concurrent behaviour: all configurations start simultaneously.
What should I change?
You can't really do this in a simple loop, as you will have to use an IDebugEventSetListener to listen for the termination of each process created by the launch.
When you call ILaunchConfiguration.launch you get back an ILaunch object. You can then call ILaunch.getProcesses to get an array of IProcess objects that were created by the launch (there may be several processes created).
Set up an IDebugEventSetListener using:
DebugPlugin.getDefault().addDebugEventListener(listener);
In the listener handleDebugEvents you can check for the processes finishing with something like:
public void handleDebugEvents(DebugEvent[] events)
{
    for (DebugEvent event : events) {
        Object source = event.getSource();
        if (source instanceof IProcess &&
            event.getKind() == DebugEvent.TERMINATE) {
            // TODO check if the process terminating is one you are interested in
        }
    }
}
Once all the processes for a launch have terminated you can do the next launch.

C# : this.Invoke((MethodInvoker)delegate

Can somebody please explain the following code to me:
this.Invoke((MethodInvoker)delegate
{
    lblNCK.Text = cncType;
});
Here is where it comes from:
string cncType;
if (objDMainCncData != null)
{
    int rc = objDMainCncData.Init(objDGroupManager.Handle);
    if (rc == 0)
    {
        cncType = objDMainCncData.GetCncIdentifier();
        if (cncType != string.Empty)
        {
            if (cncType.ToUpper().IndexOf("+") != -1)
                _bFXplus = true;
            this.Invoke((MethodInvoker)delegate
            {
                lblNCK.Text = cncType;
            });
        }
    }
    else
    {
        DisplayMessage("objDMainCncData.Init() failed ! error : " + rc.ToString());
    }
}
I don't get the use of "this.Invoke((MethodInvoker)delegate".
Thank you in advance.
Peter.
Strange that no one has answered this.
Let's take it in pieces:
this.Invoke: This is a synchronization mechanism, available on all controls. All graphic/GUI updates must be executed only from the GUI thread (most likely the main thread). So if you have other threads (e.g. worker threads, async functions, etc.) that will result in GUI updates, you need to use Invoke. Otherwise the program will blow up.
delegate { ... }: This is an anonymous function. You can think of it as "creating a function on the fly", instead of finding a place in the code and creating a function name, arguments, etc.
(MethodInvoker): MethodInvoker is just the name of the delegate type that Invoke is expecting. I.e. Invoke expects to be given a function with the same signature as the MethodInvoker delegate.
What happens is that Invoke is given a function pointer. It wakes up the GUI thread through a mutex and tells it to execute the function (through the function pointer). The calling thread then waits for the GUI thread to finish the execution. And it's done.

What happens if I don't call ev_loop_fork in the child?

I thought that if I didn't call ev_loop_fork in the child, then the watchers in the child wouldn't be triggered.
This is my code; I build the ev_loop with the EVBACKEND_EPOLL and EVFLAG_NOENV flags.
So there is no EVFLAG_FORKCHECK flag.
Then I commented out the ev_loop_fork call in the child.
If everything went as I expected, the child would not trigger the timeout callback function.
But actually, the output is something like this:
$ 4980 fork 4981
$ time out at 4980
$ time out at 4981
It seems that the watcher is still triggered in the child; it behaves the same as if ev_loop_fork had been called.
So what is going on? Thank you.
#include <ev.h>
#include <stdio.h>
#include <unistd.h>

void timeout_cb(EV_P_ ev_timer *w, int revents)
{
    printf("time out at %d\n", getpid());
    ev_break(EV_A_ EVBREAK_ONE);
}

int main()
{
    int ret;
    ev_timer timeout_watcher;
    struct ev_loop *loop = ev_default_loop(EVBACKEND_EPOLL | EVFLAG_NOENV);
    ev_timer_init(&timeout_watcher, timeout_cb, 5.5, 0.);
    ev_timer_start(loop, &timeout_watcher);
    ret = fork();
    if (ret > 0) printf("%d fork %d\n", getpid(), ret);
    else if (ret == 0)
    {
        //ev_loop_fork(EV_DEFAULT);
    }
    else return -1;
    ev_run(loop, 0);
    return 0;
}
The libev manual does not say that an event loop will be stopped after a fork. All it says is that to be sure the event loop will work properly in the child, you need to call ev_loop_fork(). What actually happens depends on the backend.
And technically, timers are even more resilient against forks in most backends: select(), poll(), epoll() and kqueue all allow specifying a timeout value after which the call returns if there was no event. libev uses this feature to trigger timeouts when they are supposed to be triggered, so there is no need to re-register any file descriptors for timeouts to work.
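For completeness, here is a sketch of the child branch of the program above with the manual's advice applied. The ev_timer_stop call is an assumption about intent: it is one way for the child to opt out of the parent's timer, not something the manual mandates.

else if (ret == 0)
{
    // Tell libev the default loop was inherited across fork(), so
    // backends like epoll can re-create their kernel state.
    ev_loop_fork(EV_DEFAULT);

    // If the child should NOT run the parent's timer, stop the
    // inherited watcher explicitly; forking alone does not stop it.
    ev_timer_stop(loop, &timeout_watcher);
}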

NSJSONSerialization.JSONObjectWithData leaks memory

I have a function which uses NSJSONSerialization.JSONObjectWithData, but some memory is not released.
So I tracked down where the leak occurred and tested it with the following function:
private func test() {
    for var i = 0; i < 100000; i++ {
        let toParse = NSString(string: "{ \"Test\" : [ \"Super mega long JSON-string which is super long because it should be super long because it is easier to see the differences in memory usage with a super long JSON-string than with a short one.\" ] }").dataUsingEncoding(NSUTF8StringEncoding)!
        let json = try! NSJSONSerialization.JSONObjectWithData(toParse, options: NSJSONReadingOptions(rawValue: 0))
    }
}
The memory usage of my app before I called test() was 11 MB; the memory usage afterwards was 74.4 MB (even after I did some other things in my app to give the system time to release the memory)...
Why is json not released?
Mundi pointed me to autoreleasepool which I hadn't tried yet (insert facepalm here)... so I changed the code to:
autoreleasepool {
    self.test()
}
This didn't make any difference, and because Xcode suggested it, I also tried:
autoreleasepool({ () -> () in
    self.test()
})
But this also didn't work...
P.S.: Maybe I should add that I'm using Swift 2.0 in Xcode 7 GM.
P.P.S.: The test() function is called from within a
dispatch_async(dispatch_get_global_queue(QOS_CLASS_USER_INITIATED, 0), {
    // ... my code ...
    self.test()
})
but this shouldn't make any difference...
You are misunderstanding how an autorelease pool works. An autorelease pool keeps hold of allocated memory until the pool is released. Running a loop 100,000 times inside an autorelease pool means the pool has no chance to release anything, so the memory builds up. Eventually it goes away, when the code has finished running and the autorelease pool is released, but meanwhile your memory usage goes up.
Correct way (with the pool inside the loop, wrapping one iteration's worth of work, e.g. the JSON parse from the question):
private func test() {
    for var i = 0; i < 100000; i++ {
        autoreleasepool {
            let toParse = NSString(string: "{ \"Test\" : [ \"...\" ] }").dataUsingEncoding(NSUTF8StringEncoding)!
            let json = try! NSJSONSerialization.JSONObjectWithData(toParse, options: NSJSONReadingOptions(rawValue: 0))
        }
    }
}
As you point out in your question, the app releases the memory at its own pace, so the fact that it is not yet released does not mean it would cause a tight-memory condition.
You can try enclosing your test routine in an autoreleasepool, similar to Objective-C.
func test() {
    autoreleasepool {
        // do the test
    }
}

gracefully shutdown (multi-threaded) gSOAP service with http-keepalive enabled

I have a multi-threaded gSOAP service running with http-keepalive enabled. How can I gracefully shut down the service when there are still clients connected?
A similar question was asked in gSoap: how to gracefully shutdown the webservice application?, but the answers do not cover the http-keepalive aspect: the soap_serve function simply will not return until the http-keepalive session is closed by the client. Thus, step 2 in the accepted answer will block until the client decides to close the connection (or the receive timeout expires, but a short timeout would break the desired http-keepalive behaviour here).
The examples from the gSOAP documentation suffer from the same problem.
What I have tried so far is to call soap_done() from the main thread for all soap structs that are stuck in a soap_serve call, to interrupt the connections waiting for http-keepalive. This works most of the time but crashes under rare conditions (maybe a race condition), so it is no solution for me.
I just ran into the very same problem, and I think I've got a solution for you.
As you said, the problem is that gSOAP hangs in soap_serve. This happens because gSOAP generates an internal loop for you that waits for the arrival of further keep-alive requests OR for a server-side timeout.
What I've done is grab the soap_serve function from the automatically generated service stub. I'm going to list the original soap_serve function so that you can find it in your service stub file:
SOAP_FMAC5 int SOAP_FMAC6 soap_serve(struct soap *soap)
{
#ifndef WITH_FASTCGI
    unsigned int k = soap->max_keep_alive;
#endif
    do
    {
#ifdef WITH_FASTCGI
        if (FCGI_Accept() < 0)
        {
            soap->error = SOAP_EOF;
            return soap_send_fault(soap);
        }
#endif
        soap_begin(soap);
#ifndef WITH_FASTCGI
        if (soap->max_keep_alive > 0 && !--k)
            soap->keep_alive = 0;
#endif
        if (soap_begin_recv(soap))
        {
            if (soap->error < SOAP_STOP)
            {
#ifdef WITH_FASTCGI
                soap_send_fault(soap);
#else
                return soap_send_fault(soap);
#endif
            }
            soap_closesock(soap);
            continue;
        }
        if (soap_envelope_begin_in(soap)
         || soap_recv_header(soap)
         || soap_body_begin_in(soap)
         || soap_serve_request(soap)
         || (soap->fserveloop && soap->fserveloop(soap)))
        {
#ifdef WITH_FASTCGI
            soap_send_fault(soap);
#else
            return soap_send_fault(soap);
#endif
        }
#ifdef WITH_FASTCGI
        soap_destroy(soap);
        soap_end(soap);
    } while (1);
#else
    } while (soap->keep_alive);
#endif
    return SOAP_OK;
}
You should extract the body of this function and replace your old soap_serve(mySoap) call inside your thread (the thread that performs the requests and hangs because of the keep-alive) with the following:
unsigned int k = mySoap->max_keep_alive; // same counter as in the generated stub
do
{
    if (Server::mustShutdown()) {
        break;
    }
    soap_begin(mySoap);
    // If we reached max_keep_alive we'll exit
    if (mySoap->max_keep_alive > 0 && !--k)
        mySoap->keep_alive = 0;
    if (soap_begin_recv(mySoap))
    {
        if (mySoap->error < SOAP_STOP)
        {
            soap_send_fault(mySoap);
            break;
        }
        soap_closesock(mySoap);
        continue;
    }
    if (soap_envelope_begin_in(mySoap)
     || soap_recv_header(mySoap)
     || soap_body_begin_in(mySoap)
     || soap_serve_request(mySoap)
     || (mySoap->fserveloop && mySoap->fserveloop(mySoap)))
    {
        soap_send_fault(mySoap);
        break;
    }
} while (mySoap->keep_alive);
Note the following:
Server::mustShutdown() acts as a flag that will be set to true (externally) to end all the threads. When you want to stop the server from handling new requests, this function returns true and the loop ends (see the sketch after these notes).
I've removed the WITH_FASTCGI ifdefs; they are not of interest to us here.
When you close the connection like this, any clients still connected to the server will raise an exception. Clients written in C#, for instance, will throw something like "The underlying connection was closed: A connection that was expected to be kept alive was closed by the server", which makes perfect sense for us.
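A minimal sketch of that flag (Server::mustShutdown() is this answer's own helper, so the implementation below is an assumption; a std::atomic<bool> is one reasonable choice):

#include <atomic>

class Server {
    static std::atomic<bool> shuttingDown;
public:
    // Called once, e.g. from an admin thread, to ask every
    // serve loop to stop at its next iteration.
    static void requestShutdown() { shuttingDown.store(true); }

    // Polled at the top of each serve-loop iteration.
    static bool mustShutdown() { return shuttingDown.load(); }
};

std::atomic<bool> Server::shuttingDown{false};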
But we are not done yet: as AudioComplex pointed out, the system still waits for requests in soap_begin_recv. I've got a solution for that too ;)
Each of the threads in the connection-handling pool creates a copy of the main soap context (via soap_copy); these are the threads that actually serve the requests.
I store each of these contexts as an element of an array that lives in the main connection-handling thread.
When terminating the main connection-handling thread (the one that accepts the requests), it goes through all soap contexts and 'manually' closes each connection using:
for (int i = 0; i < soaps.size(); ++i) {
    soaps[i]->fclose(soaps[i]);
}
This forces the soap_serve loop to finish. It actually stops the internal loop near line 921 of stdsoap2.cpp:
r = select((int)soap->socket + 1, &fd, NULL, &fd, &timeout);
It is not the cleanest solution (I haven't found a cleaner one), but it will definitely stop the service.
