Error when executing custom object detection

Greetings everyone. I followed the custom object detection tutorial on Google Colab with my own dataset, and it is honestly a good tutorial. However, I am now trying to run the program in a Jupyter Notebook on my own laptop, which has an NVIDIA GTX 1060 graphics card. I have installed CUDA 10.0 and cuDNN, and I installed tensorflow-gpu 1.13.1.
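For reference, this is roughly the training script from the tutorial as I am running it; the dataset directory name is a placeholder for my own folder, while the setTrainConfig() arguments match the traceback below.

```python
from imageai.Detection.Custom import DetectionModelTrainer

trainer = DetectionModelTrainer()
trainer.setModelTypeAsYOLOv3()
# "streetlight_dataset" is a placeholder for my dataset folder
# (with train/ and validation/ subfolders as in the tutorial).
trainer.setDataDirectory(data_directory="streetlight_dataset")
trainer.setTrainConfig(object_names_array=["Streetlight"], batch_size=2,
                       num_experiments=100,
                       train_from_pretrained_model="pretrained-yolov3.h5")
trainer.trainModel()
```

Calling trainer.trainModel() fails with this error: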

---------------------------------------------------------------------------

ResourceExhaustedError Traceback (most recent call last)
<ipython-input-…> in <module>
1 trainer.setTrainConfig(object_names_array=['Streetlight'],batch_size=2, num_experiments=100, train_from_pretrained_model="pretrained-yolov3.h5")
----> 2 trainer.trainModel()

~\Anaconda3\lib\site-packages\imageai\Detection\Custom\__init__.py in trainModel(self)
289 callbacks=callbacks,
290 workers=4,
--> 291 max_queue_size=8
292 )
293

~\Anaconda3\lib\site-packages\keras\legacy\interfaces.py in wrapper(*args, **kwargs)
89 warnings.warn('Update your ' + object_name + ' call to the ' +
90 'Keras 2 API: ' + signature, stacklevel=2)
---> 91 return func(*args, **kwargs)
92 wrapper._original_function = func
93 return wrapper

~\Anaconda3\lib\site-packages\keras\engine\training.py in fit_generator(self, generator, steps_per_epoch, epochs, verbose, callbacks, validation_data, validation_steps, validation_freq, class_weight, max_queue_size, workers, use_multiprocessing, shuffle, initial_epoch)
1730 use_multiprocessing=use_multiprocessing,
1731 shuffle=shuffle,
-> 1732 initial_epoch=initial_epoch)
1733
1734 @interfaces.legacy_generator_methods_support

~\Anaconda3\lib\site-packages\keras\engine\training_generator.py in fit_generator(model, generator, steps_per_epoch, epochs, verbose, callbacks, validation_data, validation_steps, validation_freq, class_weight, max_queue_size, workers, use_multiprocessing, shuffle, initial_epoch)
218 sample_weight=sample_weight,
219 class_weight=class_weight,
--> 220 reset_metrics=False)
221
222 outs = to_list(outs)

~\Anaconda3\lib\site-packages\keras\engine\training.py in train_on_batch(self, x, y, sample_weight, class_weight, reset_metrics)
1512 ins = x + y + sample_weights
1513 self._make_train_function()
-> 1514 outputs = self.train_function(ins)
1515
1516 if reset_metrics:

~\Anaconda3\lib\site-packages\tensorflow\python\keras\backend.py in __call__(self, inputs)
3074
3075 fetched = self._callable_fn(*array_vals,
-> 3076 run_metadata=self.run_metadata)
3077 self._call_fetch_callbacks(fetched[-len(self._fetches):])
3078 return nest.pack_sequence_as(self._outputs_structure,

~\Anaconda3\lib\site-packages\tensorflow\python\client\session.py in __call__(self, *args, **kwargs)
1437 ret = tf_session.TF_SessionRunCallable(
1438 self._session._session, self._handle, args, status,
-> 1439 run_metadata_ptr)
1440 if run_metadata:
1441 proto_data = tf_session.TF_GetBuffer(run_metadata_ptr)

~\Anaconda3\lib\site-packages\tensorflow\python\framework\errors_impl.py in __exit__(self, type_arg, value_arg, traceback_arg)
526 None, None,
527 compat.as_text(c_api.TF_Message(self.status.status)),
--> 528 c_api.TF_GetCode(self.status.status))
529 # Delete the underlying status object from memory otherwise it stays alive
530 # as there is a reference to status from this from the traceback due to

ResourceExhaustedError: OOM when allocating tensor with shape[1,26,26,512] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[{{node replica_0_1/model_4/leaky_37/LeakyRelu}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

 [[{{node training_1/Adam/gradients/replica_1_1/model_4/bnorm_17/cond/FusedBatchNorm_grad/FusedBatchNormGrad}}]]

Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

@aqiff12 It seems the error is happening because your GPU ran out of memory. That is strange, though, since you are using a batch_size of 2, which should be small enough for a 1060.

Do you have other processes using the GPU at the time you attempted to train?
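If nothing else shows up in nvidia-smi, you could also try letting TensorFlow grow its GPU memory on demand instead of reserving it all up front. A minimal sketch for Keras on the TF 1.x backend, to run before creating the trainer (this is a general TensorFlow setting, not something specific to ImageAI):

```python
import tensorflow as tf
from keras import backend as K

# Allocate GPU memory incrementally instead of reserving the whole
# 6 GB of the GTX 1060 at startup.
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
K.set_session(tf.Session(config=config))
```

If it still runs out of memory after that, dropping batch_size to 1 is the other obvious lever.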

I will get back to you later, since I am currently running your code on my own CPU: 40 epochs with a batch size of 16, which takes about 45 hours. Also, I do not think any other program is using my GPU.


This is the error I am getting now:

InvalidArgumentError: Cannot assign a device for operation replica_1/model_1/yolo_layer_3/Variable: node replica_1/model_1/yolo_layer_3/Variable (defined at C:\Users\user\Anaconda3\envs\executewithGPU\lib\site-packages\imageai\Detection\Custom\yolo.py:43) was explicitly assigned to /device:GPU:1 but available devices are [ /job:localhost/replica:0/task:0/device:CPU:0, /job:localhost/replica:0/task:0/device:GPU:0 ]. Make sure the device specification refers to a valid device.
[[replica_1/model_1/yolo_layer_3/Variable]]

From this error, I can see that the graph is being set up to use two GPUs, while I only have one GPU and one CPU.
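A quick way to double-check what TensorFlow can actually see (TF 1.x):

```python
from tensorflow.python.client import device_lib

# Prints one entry per device TensorFlow can use; on my machine this
# should list only CPU:0 and GPU:0, matching the error message above.
print(device_lib.list_local_devices())
```

The full colocation warnings follow: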

2019-12-04 10:39:23.543430: W tensorflow/core/common_runtime/colocation_graph.cc:983] Failed to place the graph without changing the devices of some resources. Some of the operations (that had to be colocated with resource generating operations) are not supported on the resources' devices. Current candidate devices are [
/job:localhost/replica:0/task:0/device:GPU:0
/job:localhost/replica:0/task:0/device:CPU:0].
See below for details of this colocation group:
Colocation Debug Info:
Colocation group had the following types and supported devices:
Root Member(assigned_device_name_index_=-1 requested_device_name_='/device:GPU:1' assigned_device_name_='' resource_device_name_='/device:GPU:1' supported_device_types_=[GPU, CPU] possible_devices_=[]
VariableV2: GPU CPU
Assign: GPU CPU
Identity: GPU CPU
AssignAdd: GPU CPU

Colocation members, user-requested devices, and framework assigned devices, if any:
replica_1_1/model_4/yolo_layer_6/Variable (VariableV2) /device:GPU:1
replica_1_1/model_4/yolo_layer_6/Variable/Assign (Assign) /device:GPU:1
replica_1_1/model_4/yolo_layer_6/Variable/read (Identity) /device:GPU:1
replica_1_1/model_4/yolo_layer_6/AssignAdd (AssignAdd) /device:GPU:1

2019-12-04 10:39:23.611797: W tensorflow/core/common_runtime/colocation_graph.cc:983] Failed to place the graph without changing the devices of some resources. Some of the operations (that had to be colocated with resource generating operations) are not supported on the resources' devices. Current candidate devices are [
/job:localhost/replica:0/task:0/device:GPU:0
/job:localhost/replica:0/task:0/device:CPU:0].
See below for details of this colocation group:
Colocation Debug Info:
Colocation group had the following types and supported devices:
Root Member(assigned_device_name_index_=-1 requested_device_name_='/device:GPU:1' assigned_device_name_='' resource_device_name_='/device:GPU:1' supported_device_types_=[GPU, CPU] possible_devices_=[]
VariableV2: GPU CPU
Assign: GPU CPU
Identity: GPU CPU
AssignAdd: GPU CPU

Colocation members, user-requested devices, and framework assigned devices, if any:
replica_1_1/model_4/yolo_layer_5/Variable (VariableV2) /device:GPU:1
replica_1_1/model_4/yolo_layer_5/Variable/Assign (Assign) /device:GPU:1
replica_1_1/model_4/yolo_layer_5/Variable/read (Identity) /device:GPU:1
replica_1_1/model_4/yolo_layer_5/AssignAdd (AssignAdd) /device:GPU:1

2019-12-04 10:39:23.667680: W tensorflow/core/common_runtime/colocation_graph.cc:983] Failed to place the graph without changing the devices of some resources. Some of the operations (that had to be colocated with resource generating operations) are not supported on the resources' devices. Current candidate devices are [
/job:localhost/replica:0/task:0/device:GPU:0
/job:localhost/replica:0/task:0/device:CPU:0].
See below for details of this colocation group:
Colocation Debug Info:
Colocation group had the following types and supported devices:
Root Member(assigned_device_name_index_=-1 requested_device_name_='/device:GPU:1' assigned_device_name_='' resource_device_name_='/device:GPU:1' supported_device_types_=[GPU, CPU] possible_devices_=[]
VariableV2: GPU CPU
Assign: GPU CPU
Identity: GPU CPU
AssignAdd: GPU CPU

Colocation members, user-requested devices, and framework assigned devices, if any:
replica_1_1/model_4/yolo_layer_4/Variable (VariableV2) /device:GPU:1
replica_1_1/model_4/yolo_layer_4/Variable/Assign (Assign) /device:GPU:1
replica_1_1/model_4/yolo_layer_4/Variable/read (Identity) /device:GPU:1
replica_1_1/model_4/yolo_layer_4/AssignAdd (AssignAdd) /device:GPU:1
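
Since the ops are hard-assigned to /device:GPU:1, one possible workaround is enabling soft placement so TensorFlow falls back to a device that exists. This is a sketch of a general TF 1.x setting, and I am not sure whether it addresses why ImageAI is replicating the model across two GPUs in the first place:

```python
import tensorflow as tf
from keras import backend as K

# Let TensorFlow fall back to an available device when an op is
# explicitly pinned to one that does not exist (e.g. /device:GPU:1).
config = tf.ConfigProto(allow_soft_placement=True)
K.set_session(tf.Session(config=config))
```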