Using OCR, it's possible to read the text from an image. However, if the customer has scanned the image in a rotated orientation, the text might not be read correctly. Therefore, a function that returns the orientation of the text in the image—similar to what Tesseract provides—would be highly desirable.
Here's an example using the Tesseract .NET wrapper:
using (var ocr = new TesseractEngine(this.tesseractDataPath, this.tesseractLanguage, EngineMode.Default))
// OSD (Orientation and Script Detection) requires training data for orientation recognition (osd.traineddata).
using (var page = ocr.Process(image, PageSegMode.AutoOsd))
{
    using (var pageIter = page.AnalyseLayout())
    {
        pageIter.Begin();
        var pageProps = pageIter.GetProperties();
        // Get page orientation (PageUp, PageRight, PageDown, PageLeft)
        var orientation = pageProps.Orientation;
        Console.WriteLine($"Orientation: {orientation}");
        Console.WriteLine($"DeskewAngle: {pageProps.DeskewAngle}");
        // Rotate image based on DeskewAngle (in radians)
        // pix.Rotate(pageProps.DeskewAngle);
        return orientation;
    }
}
It would be great to see this feature available in GemBox soon!
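Once the orientation is known, it could be translated into a corrective rotation before the image is passed to OCR again. The following is only a minimal sketch of that idea, not part of Tesseract or GemBox: PageOrientation is a hypothetical stand-in for the wrapper's Orientation enum so that the snippet compiles on its own, OrientationHelper and its methods are made-up names, and the mapping assumes that PageRight means the top of the text points to the right of the image (that is, the scan was rotated 90 degrees clockwise). That assumption should be checked against a few sample scans.

```csharp
using System;

// Hypothetical stand-in for the wrapper's Orientation enum (assumption, for illustration only).
public enum PageOrientation { PageUp, PageRight, PageDown, PageLeft }

public static class OrientationHelper
{
    // Returns the clockwise rotation, in degrees, that is assumed to bring the page upright.
    // Assumption: PageRight means the top of the text points to the right of the image.
    public static int GetCorrectionDegrees(PageOrientation orientation)
    {
        switch (orientation)
        {
            case PageOrientation.PageUp: return 0;      // already upright
            case PageOrientation.PageRight: return 270; // same as 90 degrees counter-clockwise
            case PageOrientation.PageDown: return 180;  // upside down
            case PageOrientation.PageLeft: return 90;
            default: throw new ArgumentOutOfRangeException(nameof(orientation));
        }
    }

    // Converts an angle in radians (such as the deskew angle) to degrees.
    public static double ToDegrees(double radians)
    {
        return radians * 180.0 / Math.PI;
    }
}
```

For example, OrientationHelper.GetCorrectionDegrees(PageOrientation.PageDown) yields 180, and the small deskew angle can be applied separately after converting it with ToDegrees.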
Official response
Mario - GemBox
Hi,
This feature request has been implemented and is available in the latest versions of GemBox.Pdf.
Regards,
Mario