Implementing a Flutter plugin with native OpenCV support via dart:ffi – Part 2/2

In this article, we are going to finish what we started in part 1. So far, we have created an empty Flutter app and linked it with precompiled native binaries to use via the dart:ffi foreign function interface.


Our ultimate goal is to use our new FFI bindings with OpenCV to detect shapes in a camera stream and point them out in an overlay. To make that possible, we will now add the camera plugin and wrap it with additional functionality for live detection.

Let’s get into it.

Add further Flutter dependencies

First, we need to add the following dependencies to our flutter_ffi_demo project: camera, permission_handler, logging, and ffi. Open the file pubspec.yaml (you can find the repository of our example app here) and add the dependencies as follows:

environment:
  sdk: ">=2.14.0 <3.0.0"
  flutter: ">=1.20.0"

dependencies:
  flutter:
    sdk: flutter
  camera: 0.9.4+1 # plugin for camera support
  permission_handler: ^8.2.0 # to handle platform permissions
  logging: ^1.0.2 # for logging
  ffi: ^1.1.2 # dart ffi dependency itself

After adding new dependencies, make sure to install them via Android Studio or run flutter pub get in the terminal.

Implement the camera preview

Next, let’s implement a camera preview in our Flutter example app. You can find the complete code in the file lib/camera_preview.dart. To follow along with the steps, you can create the same file in your project.

CameraPreview is a widget provided by the camera plugin that we added as a dependency in the previous step. It works in tandem with a component called CameraController.

Let’s start with the widget:

@override
Widget build(BuildContext context) {
 if (!_initialized) {
   return Container();
 }
 final camera = controller.value;

 [...]

 var combinedOverlay = Center(
   child: Stack(
     children: [debugOverlay, overlay ?? Container()],
   ),
 );
 return Center(
     child: CameraPreview(
   controller,
   child: combinedOverlay,
 ));
}

The CameraController provides control of most camera aspects, like selecting a suitable camera (front, back …), zooming, or enabling frame streaming. 

Wrap the camera preview with aspect ratio handling

To work with the camera preview, we need to initialize the CameraController first: 

@override
void initState() {
 super.initState();

 controller.initialize().then((_) {
   setState(() {
     _initialized = true;
   });
   if (!mounted) {
     return;
   }
   if (detectHandler != null) {
     controller.startImageStream((image) {
       if (!_isDetecting && this.mounted) {
         callFrameDetection(image, finder);
       }
     });
   }
 });
}
@override
void dispose() {
 controller.dispose();
 super.dispose();
}

The main part here is that we start frame streaming with controller.startImageStream and run the detection on incoming image frames via callFrameDetection(image, finder). Since detecting on a frame might take some time, we ignore any other incoming frames while a detection is in progress.

Before we call the actual detection, we need to calculate the region of interest (ROI) based on the current image orientation and the finder configuration:

void callFrameDetection(CameraImage image, FinderConfig? finder) async {
 try {
   _isDetecting = true;
   Rect? roi; // region of interest
   const rotation = 90;
   // calculate ROI based on image orientation/rotation and finder
   if (finder is AspectRatioFinderConfig) {
     roi = calculateRoiFromAspectRatio(image, finder, rotation);
   }
   // Fixed-size finder handling is omitted in this demo
   if (finder is FixedSizeFinderConfig) {}
   // and run the actual detection
   await detectHandler?.detect(image, roi, rotation);
 } catch (e) {
   // todo: error handling
 } finally {
   _isDetecting = false;
 }
}

Note: There is no information about the frame orientation when it comes from the camera image stream. For simplicity’s sake, we are handling only vertical device orientation by turning frames by 90 degrees. In a real-world app, you will need to properly handle other device orientations. 
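The helper calculateRoiFromAspectRatio is not shown in this article. As a rough illustration only, here is what such a helper could look like; the aspectRatio field on AspectRatioFinderConfig is an assumed name and not necessarily the API used in the example project:

// Illustrative sketch only: compute a centered ROI in (rotated) image
// coordinates. AspectRatioFinderConfig.aspectRatio is an assumed field name.
Rect calculateRoiFromAspectRatio(
   CameraImage image, AspectRatioFinderConfig finder, int rotation) {
 // With a 90° or 270° rotation, width and height are swapped relative
 // to what the user sees in the finder.
 final rotated = rotation == 90 || rotation == 270;
 final frameWidth = (rotated ? image.height : image.width).toDouble();
 final frameHeight = (rotated ? image.width : image.height).toDouble();

 // Fit the largest centered rectangle with the finder's aspect ratio.
 final ratio = finder.aspectRatio; // width / height, e.g. 1.0 for a square
 var roiWidth = frameWidth;
 var roiHeight = roiWidth / ratio;
 if (roiHeight > frameHeight) {
   roiHeight = frameHeight;
   roiWidth = roiHeight * ratio;
 }
 return Rect.fromCenter(
   center: Offset(frameWidth / 2, frameHeight / 2),
   width: roiWidth,
   height: roiHeight,
 );
}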

In callFrameDetection, detectHandler is an instance of the abstract class FrameHandler, whose implementation runs the detection and pushes the result into a stream, so that any subscribers can receive it. This is where our OpenCV implementation via dart:ffi finally comes into play.

abstract class FrameHandler<T> {
 abstract StreamController<T> detectionResultStreamController;

 Future<void> detect(CameraImage image, Rect? roi, int rotation);
}

class OpenCvFramesHandler extends FrameHandler<ProcessingResult> {
 OpenCvShapeDetector frameProcessor;

 @override
 StreamController<ProcessingResult> detectionResultStreamController;

 OpenCvFramesHandler(
     this.frameProcessor, this.detectionResultStreamController);

 @override
 Future<void> detect(CameraImage image, Rect? roi, int rotation) async {
   print("frame aspect ratio ${image.width / image.height}");
   final ProcessingResult result =
       await frameProcessor.processFrame(image, rotation, roi);
   detectionResultStreamController.add(result);
 }
}

Since we are going to add the camera to the screen, we also need a second layer on top of our current camera view. Let’s call this new layer LiveDetection. In this layer, we need to initialize the camera controller and handle the camera permissions.

Permissions handling

For the permissions handling, we will use the permission_handler plugin. This plugin provides functionality to request and handle permissions on Android and iOS. In order to make this plugin work properly, please look at its readme and check the iOS platform installation details. 

The overall permissions workflow looks like this: a widget property called permissionGranted describes whether the camera permission has been granted. If it has, we show the camera widget; if not, we display a blank widget or a placeholder. On initialization of the state, we run the permission check. If the permission has not been granted yet, this brings up the platform’s permission dialog:

@override
void initState() {
 checkPermission();
 super.initState();
}

void checkPermission() async {
 final permissionResult = await [Permission.camera].request();
 setState(() {
   permissionGranted =
       permissionResult[Permission.camera]?.isGranted ?? false;
 });
}

Initialize the CameraController

Once we have the permissions, we can start with the CameraController initialization. A FutureBuilder widget helps us build the UI from the Future returned by availableCameras().

When the cameras are ready, we check if the CameraController was already initialized. If not, we can use cameraData to do so now: 

@override
 Widget build(BuildContext context) {
   late Widget cameraPlaceholder;
   if (permissionGranted) {
     cameraPlaceholder = FutureBuilder(
       future: availableCameras(),
       builder: (BuildContext context,
           AsyncSnapshot<List<CameraDescription>> snapshot) {
         final data = snapshot.data;
          if (data != null && data.isNotEmpty) {
            final cameraData = data.first;
            var resolutionPreset = ResolutionPreset.max;
            if (Platform.isIOS) {
              resolutionPreset = ResolutionPreset.medium;
            }
            controller ??= CameraController(cameraData, resolutionPreset,
                imageFormatGroup: ImageFormatGroup.yuv420);

            return ScanbotCameraWidget(
              const Key('Camera'),
              controller!,
              finderConfig: aspectRatioFinderConfig,
              detectHandler: handler,
              overlay: overlay,
            );
          } else {
            return Container();
          }
       },
     );
   } else {
     cameraPlaceholder = Container();
   }

   return cameraPlaceholder;
 }
}

@override
void dispose() {
 controller?.dispose();
 controller = null;
 super.dispose();
}

Make sure to use imageFormatGroup: ImageFormatGroup.yuv420, because it’s the only image format group that works on both native platforms.

Now that we have all the basic code for the live detection, let’s proceed with the detection flow. There are a few catches that we need to consider.

Once we get a frame in the CameraImage format, we need to create a native object that represents the camera image in native memory. The iOS frame arrives as a single plane, while Android delivers its frames in three separate planes that we need to merge later (we requested the yuv420 format group for both platforms above).

Because of the difference in how the image data is represented in iOS and Android, we need to create some generic structures:

class SdkImage extends Struct {
 external Pointer<SdkImagePlane> plane;
 @Int32()
 external int platform; // 0 ios, 1 android
 @Int32()
 external int width;
 @Int32()
 external int height;
 @Int32()
 external int rotation;
}

class SdkImagePlane extends Struct {
 external Pointer<Uint8> planeData;
 @Int32()
 external int length;
 @Int32()
 external int bytesPerRow;
 external Pointer<SdkImagePlane> nextPlane;
}

This is how we describe the dart:ffi structures for images that contain planes with byte data. The layout is similar to the CameraImage class from the camera plugin.


Frame preparation for detection

Future<ProcessingResult> processFrameAsync(_FrameData detect) async {
 try {
   final stopwatch = Stopwatch()..start();
   ffi.Pointer<SdkImage> image =
       detect.image.toSdkImagePointer(detect.rotation);
   final scanner = ffi.Pointer.fromAddress(detect.scanner);
   ffi.Pointer<_ShapeNative> result;
   var roi = detect.roi;
   if (roi != null) {
     result = _processFrameWithRoi(scanner, image, roi.left.toInt(),
         roi.top.toInt(), roi.right.toInt(), roi.bottom.toInt());
   } else {
     result = _processFrame(scanner, image);
   }
   print('recognise() detect in ${stopwatch.elapsedMilliseconds}');
   stopwatch.stop();
   final shapes = _mapNativeItems(result);
   image.release();
   print("shapes total found ${shapes.length}");
   return ProcessingResult(shapes);
 } catch (e) {
   print(e);
 }

 return ProcessingResult([]);
}

This is the main method of the shape detector. It prepares an image and calls the detection on the native layer. Let’s look at image.toSdkImagePointer(detect.rotation): it’s an extension method on CameraImage that converts it into a data structure we can use from our C++ code. All the extension methods we are using can be found here.

extension CameraImageExtention on CameraImage {
 bool isEmpty() => planes.any((element) => element.bytes.isEmpty);

 Pointer<SdkImage> toSdkImagePointer(int rotation) {
   var pointer = createImageFrame();
   final image = pointer.ref;
   image.width = width;
   image.height = height;
   image.rotation = rotation;

   if (Platform.isIOS) {
     image.platform = 0;
     final plane = planes[0];
     final bytesPerRow = planes[0].bytesPerRow;
     final pLength = plane.bytes.length;
     final p = malloc.allocate<Uint8>(pLength);
     // Assign the planes data to the pointers of the image
     final pointerList0 = p.asTypedList(pLength);
     pointerList0.setRange(0, pLength, plane.bytes);
     final sdkPlanePointer = createImagePlane();
     final sdkPlane = sdkPlanePointer.ref;
     sdkPlane.bytesPerRow = bytesPerRow;
     sdkPlane.length = pLength;
     sdkPlane.planeData = p;
     sdkPlane.nextPlane = nullptr;
     image.plane = sdkPlanePointer;
   }

   if (Platform.isAndroid) {
     image.platform = 1;
     final plane0 = planes[0];
     final pLength0 = plane0.bytes.length;
     final plane1 = planes[1];
     final pLength1 = plane1.bytes.length;
     final plane2 = planes[2];
     final pLength2 = plane2.bytes.length;
     final bytesPerRow0 = planes[0].bytesPerRow;
     final bytesPerRow1 = planes[1].bytesPerRow;
     final bytesPerRow2 = planes[2].bytesPerRow;

     final p0 = malloc.allocate<Uint8>(pLength0);
     final p1 = malloc.allocate<Uint8>(pLength1);
     final p2 = malloc.allocate<Uint8>(pLength2);

     // Assign the planes data to the pointers of the image
     final pointerList0 = p0.asTypedList(pLength0);
     final pointerList1 = p1.asTypedList(pLength1);
     final pointerList2 = p2.asTypedList(pLength2);
     pointerList0.setRange(0, pLength0, plane0.bytes);
     pointerList1.setRange(0, pLength1, plane1.bytes);
     pointerList2.setRange(0, pLength2, plane2.bytes);

     final sdkPlanePointer0 = createImagePlane();
     final sdkPlanePointer1 = createImagePlane();
     final sdkPlanePointer2 = createImagePlane();
     final sdkPlane0 = sdkPlanePointer0.ref;
     final sdkPlane1 = sdkPlanePointer1.ref;
     final sdkPlane2 = sdkPlanePointer2.ref;

     sdkPlane2.bytesPerRow = bytesPerRow2;
     sdkPlane2.nextPlane = nullptr;
     sdkPlane2.length = pLength2;
     sdkPlane2.planeData = p2;
     sdkPlane1.nextPlane = sdkPlanePointer2;

     sdkPlane1.bytesPerRow = bytesPerRow1;
     sdkPlane1.length = pLength1;
     sdkPlane1.planeData = p1;
     sdkPlane0.nextPlane = sdkPlanePointer1;

     sdkPlane0.bytesPerRow = bytesPerRow0;
     sdkPlane0.length = pLength0;
     sdkPlane0.planeData = p0;
     image.plane = sdkPlanePointer0;
   }
   return pointer;
 }
}

This method shows how to create objects in native memory and fill them with the byte data of our image. Here we also see the difference between the iOS and Android frame structures: iOS images have just one plane, while Android images have three.

We use native methods to allocate the memory for these structures and then fill the allocated memory with the image data. Here is how we describe the dart:ffi interfaces for these allocation methods; the implementation is done in C++ as part of this file:

final createImageFrame =
   sdkNative.lookupFunction<_CreateImageFrameNative, _CreateImageFrame>(
       'MathUtils_createImageFrame');

final createImagePlane =
   sdkNative.lookupFunction<_CreateImagePlaneNative, _CreateImagePlane>(
       'MathUtils_createPlane');

typedef _CreateImageFrameNative = ffi.Pointer<SdkImage> Function();
typedef _CreateImageFrame = ffi.Pointer<SdkImage> Function();

typedef _CreateImagePlaneNative = ffi.Pointer<SdkImagePlane> Function();
typedef _CreateImagePlane = ffi.Pointer<SdkImagePlane> Function();

Here, MathUtils_createPlane and MathUtils_createImageFrame are native methods that allocate the structs in native memory and return pointers to them:

#ifdef __cplusplus
extern "C" {
#endif

flutter::Plane *MathUtils_createPlane() {
   return (struct flutter::Plane *) malloc(sizeof(struct flutter::Plane));
}

flutter::ImageForDetect *MathUtils_createImageFrame() {
   return (struct flutter::ImageForDetect *) malloc(sizeof(struct flutter::ImageForDetect));
}

#ifdef __cplusplus
}
#endif

After we get the pointers and fill all the data into the structs, we can call the detection with the methods _processFrame and _processFrameWithRoi. See this dart:ffi part:

final _processFrame = sdkNative
   .lookupFunction<_ProcessFrameNative, _ProcessFrame>('processFrame');

typedef _ProcessFrameNative = ffi.Pointer<_ShapeNative> Function(
   ffi.Pointer<ffi.NativeType>, ffi.Pointer<SdkImage>);
typedef _ProcessFrame = ffi.Pointer<_ShapeNative> Function(
   ffi.Pointer<ffi.NativeType>, ffi.Pointer<SdkImage>);

final _processFrameWithRoi =
   sdkNative.lookupFunction<_ProcessFrameWithRoiNative, _ProcessFrameWithRoi>(
       'processFrameWithRoi');

typedef _ProcessFrameWithRoiNative = ffi.Pointer<_ShapeNative> Function(
 ffi.Pointer<ffi.NativeType>,
 ffi.Pointer<SdkImage>,
 ffi.Int32,
 ffi.Int32,
 ffi.Int32,
 ffi.Int32,
);
typedef _ProcessFrameWithRoi = ffi.Pointer<_ShapeNative> Function(
   ffi.Pointer<ffi.NativeType>, ffi.Pointer<SdkImage>, int, int, int, int);

And this is their native implementation (you can find the full code here):

flutter::Shape *processFrame(ShapeDetector *scanner, flutter::ImageForDetect *image) {
   auto img = flutter::prepareMat(image);
   auto shapes = scanner->detectShapes(img);
    // we need to map the result as a linked list of items to return multiple results
   flutter::Shape *first = mapShapesFFiResultStruct(shapes);
   return first;
}

flutter::Shape *processFrameWithRoi(ShapeDetector *scanner, flutter::ImageForDetect *image, int areaLeft,
                   int areaTop, int areaRight, int areaBottom) {
   auto areaWidth = areaRight - areaLeft;
   auto areaHeight = areaBottom - areaTop;
   auto img = flutter::prepareMat(image);
   if (areaLeft >= 0 && areaTop >= 0 && areaWidth > 0 && areaHeight > 0) {
       cv::Rect mrzRoi(areaLeft, areaTop, areaWidth, areaHeight);
       img = img(mrzRoi);
   }
   auto shapes = scanner->detectShapes(img);
    // we need to map the result as a linked list of items to return multiple results
   flutter::Shape *first = mapShapesFFiResultStruct(shapes);
   return first;
}

Next, we need to prepare a cv::Mat instance for OpenCV. Because we have different image formats, we need a different logic for iOS and Android. Let’s look inside the method flutter::prepareMat in MatUtils.h:

  static cv::Mat prepareMat(flutter::ImageForDetect *image) {
       if (image->platform == 0) {
           auto *plane = image->plane;
           return flutter::prepareMatIos(plane->planeData,
                                         plane->bytesPerRow,
                                         image->width,
                                         image->height,
                                         image->orientation);
       }
       if (image->platform == 1) {
           auto *plane0 = image->plane;
           auto *plane1 = plane0->nextPlane;
           auto *plane2 = plane1->nextPlane;
           return flutter::prepareMatAndroid(plane0->planeData,
                                             plane0->bytesPerRow,
                                             plane1->planeData,
                                             plane1->length,
                                             plane1->bytesPerRow,
                                             plane2->planeData,
                                             plane2->length,
                                             plane2->bytesPerRow,
                                             image->width,
                                             image->height,
                                             image->orientation);
       }
       throw "Can't parse image data due to the unknown platform";
   }

The iOS image conversion is pretty straightforward because the image comes in one plane:

static cv::Mat prepareMatIos(uint8_t *plane,
                            int bytesPerRow,
                            int width,
                            int height,
                            int orientation) {
   uint8_t *yPixel = plane;

   cv::Mat mYUV = cv::Mat(height, width, CV_8UC4, yPixel, bytesPerRow);

   fixMatOrientation(orientation, mYUV);

   return mYUV;

}

The Android conversion is a bit more complex, as we need to merge three planes into one:

static cv::Mat prepareMatAndroid(
       uint8_t *plane0,
       int bytesPerRow0,
       uint8_t *plane1,
       int lenght1,
       int bytesPerRow1,
       uint8_t *plane2,
       int lenght2,
       int bytesPerRow2,
       int width,
       int height,
       int orientation) {

   uint8_t *yPixel = plane0;
   uint8_t *uPixel = plane1;
   uint8_t *vPixel = plane2;

   int32_t uLen = lenght1;
   int32_t vLen = lenght2;

   cv::Mat _yuv_rgb_img;
   assert(bytesPerRow0 == bytesPerRow1 && bytesPerRow1 == bytesPerRow2);
    uint8_t *uv = new uint8_t[uLen + vLen];
    memcpy(uv, uPixel, uLen);        // copy the U plane ...
    memcpy(uv + uLen, vPixel, vLen); // ... and the V plane right after it
   cv::Mat mYUV = cv::Mat(height, width, CV_8UC1, yPixel, bytesPerRow0);
   cv::copyMakeBorder(mYUV, mYUV, 0, height >> 1, 0, 0, BORDER_CONSTANT, 0);

   cv::Mat mUV = cv::Mat((height >> 1), width, CV_8UC1, uv, bytesPerRow0);
    cv::Mat dst_roi = mYUV(Rect(0, height, width, height >> 1));
   mUV.copyTo(dst_roi);

   cv::cvtColor(mYUV, _yuv_rgb_img, COLOR_YUV2RGBA_NV21, 3);

   fixMatOrientation(orientation, _yuv_rgb_img);

   return _yuv_rgb_img;
}

We won’t go into the details of the detection algorithm itself in this article. 

Note that after a successful (or unsuccessful) detection, we need to convert our internal objects into structs that we can access via dart:ffi. Because of that, they have to be allocated with malloc and declared as extern "C" (no C++ vector objects, all strings represented as char[], and so on).

You can check out the full implementation here.
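For orientation, here is a rough sketch of what the result structs could look like on the Dart side, inferred from the fields read in _mapNativeItems in the next section; the actual definitions live in the example repository and may differ:

// Sketch only: linked-list result structures as seen from Dart.
// Field names and types are inferred from how they are read in _mapNativeItems.
class _PointNative extends Struct {
 @Double()
 external double x;
 @Double()
 external double y;
 external Pointer<_PointNative> next; // next point or nullptr
}

class _ShapeNative extends Struct {
 @Int32()
 external int corners; // number of corners of the detected shape
 external Pointer<_PointNative> point; // first point of the shape
 external Pointer<_ShapeNative> next; // next detected shape or nullptr
}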

Memory cleanup

Another challenge that we should mention in this article is memory management: every native memory allocation must be freed after use.
Let’s take a look at the method processFrameAsync in shape_detector.dart. There are two places where memory is cleaned up. The first is in _mapNativeItems, which maps the native result structures into Dart objects and then releases the native memory.

List<Shape> _mapNativeItems(ffi.Pointer<_ShapeNative> result) {
 final shapes = <Shape>[];
 var currentShapeNative = result;
 while (currentShapeNative != ffi.nullptr) {
   try {
     final item = currentShapeNative.ref;
     final points = <Point<double>>[];
     var currentPointNative = item.point;
     _mapNativePoints(currentPointNative, points);
     shapes.add(Shape(item.corners, points));
     final tempItem = currentShapeNative;
     currentShapeNative = item.next;
     malloc.free(tempItem); // need to deallocate pointer to the object
   } catch (e) {
     print(e);
   }
 }
 return shapes;
}

void _mapNativePoints(
   ffi.Pointer<_PointNative> currentPointNative, List<Point<double>> points) {
 while (currentPointNative != ffi.nullptr) {
   points.add(Point(currentPointNative.ref.x, currentPointNative.ref.y));
   final tempItem = currentPointNative;
   currentPointNative = currentPointNative.ref.next;
   malloc.free(tempItem); // need to deallocate pointer to the object
 }
}

The other one is the image.release() extension method, which frees all frame-related data.

extension SdkImagePoinerExtention on Pointer<SdkImage> {
 void release() {
   var plane = ref.plane;
   while (plane != nullptr) {
     if (plane.ref.planeData != nullptr) {
       malloc.free(plane.ref.planeData);
     }
     final tmpPlane = plane;
     plane = plane.ref.nextPlane;
     malloc.free(tmpPlane);
   }
   malloc.free(this);
 }
}

Here we are releasing all internal byte arrays and other objects using their pointers to the native memory. 

Basically, we need to release all pointers into native memory once we are done with them. This is especially important for live detection; otherwise we’ll quickly run out of memory.
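One way to make this less error-prone is to guard the cleanup with try/finally, so the frame memory is freed even if the native call throws. Here is a minimal sketch of processFrameAsync restructured this way (not the repository’s exact code; ROI handling is omitted for brevity):

// Sketch only: same flow as processFrameAsync above, but with the native
// cleanup guarded by try/finally.
Future<ProcessingResult> processFrameAsync(_FrameData detect) async {
 final ffi.Pointer<SdkImage> image =
     detect.image.toSdkImagePointer(detect.rotation);
 try {
   final scanner = ffi.Pointer.fromAddress(detect.scanner);
   final result = _processFrame(scanner, image);
   return ProcessingResult(_mapNativeItems(result));
 } catch (e) {
   print(e);
   return ProcessingResult([]);
 } finally {
   image.release(); // frees the plane buffers and the image struct itself
 }
}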

Threading

The next big challenge is threading. On native platforms, live detection usually runs on a background thread so that processing the frames doesn’t block the UI. In Flutter, however, threading presents a few problems.

The first problem is that asynchronous functions run on the same thread that renders the UI (the main isolate). So simply using an async function is not an option, because heavy processing inside it would still freeze the UI. The official tutorials advise using isolates instead.

The main issue with isolates is that all objects passed to them are copied into the other isolate’s memory heap, which means every frame is duplicated at some point. So should we use isolates anyway? The answer is yes! Isolates are the only proper way to do heavy processing in Flutter.

In earlier versions, Flutter isolates had memory leak issues with frame data: frames simply were not cleaned up after the compute method finished. Make sure to use at least Flutter SDK 1.20.0 and Dart SDK 2.14.0, which fixed this issue (see pubspec.yaml).

The overall isolate API is quite complex, but Flutter provides a helper called compute, which handles spawning the isolate, processing the data, and shutting the isolate down again.

So let’s wrap our processFrameAsync with some threading:

Future<ProcessingResult> processFrame(
   CameraImage image, int rotation, Rect? roi) async {
 // make sure we have valid image data (flutter camera plugin might provide an empty image)
 if (!image.isEmpty() && scanner != ffi.nullptr) {
   return compute(processFrameAsync,
       _FrameData(scanner.address, image, rotation, roi: roi));
 } else {
   return ProcessingResult([]);
 }
}

/// We need to pass serializable data to the isolate to process the frame in another thread and unblock the UI thread
class _FrameData {
 CameraImage image;
 int rotation;
 int scanner;
 Rect? roi;

 _FrameData(this.scanner, this.image, this.rotation, {this.roi});
}

When calling the compute method, we can only pass serializable data to it. That’s why we created the _FrameData class: it represents a serializable object that contains all the metadata of the image and the address of the scanner pointer in native memory.

Presenting results 

We have now covered most of what you need to implement native live detection on camera streams in Flutter. The last step is displaying the results. In the class FrameHandler, we have StreamController<T> detectionResultStreamController. We can subscribe to that stream and get the detection results. Also, if you are going to draw the results on top of the preview, make sure your widget is the same size as the preview.

@override
void initState() {
 notifier = ValueNotifier([]);
 startListenStream(_stream);
 super.initState();
}

void startListenStream(Stream<ProcessingResult> stream) async {
 await for (final result in stream) {
  //todo do something with the result.
 }
}

In this example, we are using a canvas (the ShapesResultOverlay widget) to draw circles on top of the preview.
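The ShapesResultOverlay implementation is not reproduced here, but as a rough sketch, such an overlay can be built with a CustomPainter that repaints whenever the ValueNotifier from initState above receives new shapes (the painter below is illustrative and assumes that Shape exposes its list of points):

// Illustrative sketch; the real ShapesResultOverlay may differ. Note that the
// shape coordinates are in image space and must be scaled to the size of the
// preview widget before painting.
class ShapesPainter extends CustomPainter {
 ShapesPainter(this.notifier) : super(repaint: notifier);

 // Updated from the detection result stream, see startListenStream above.
 final ValueNotifier<List<Shape>> notifier;

 @override
 void paint(Canvas canvas, Size size) {
   final circlePaint = Paint()
     ..color = const Color(0xFF00FF00)
     ..style = PaintingStyle.stroke
     ..strokeWidth = 2;
   for (final shape in notifier.value) {
     // Assumes Shape exposes its detected points.
     for (final point in shape.points) {
       canvas.drawCircle(Offset(point.x, point.y), 6, circlePaint);
     }
   }
 }

 @override
 bool shouldRepaint(covariant ShapesPainter oldDelegate) => true;
}

The painter can then be wrapped in a CustomPaint widget and passed as the overlay of the camera widget.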

Summary

As you can see, it’s possible to implement a live detection feature in Flutter using C++ code via dart:ffi, and to apply it across different platforms. Native code can be built not only for iOS and Android, but also for Windows or macOS. If you want to build Flutter applications that target multiple platforms and need native live detection on camera frames, this article and our example project should give you a solid starting point.

Would you like to start? Try our Flutter Document or Flutter Barcode Scanner SDKs today.
