Automating Support with OCR & Semantic Search

Customer support teams often receive the same screenshot-based issues over and over.

“Why did my payment fail?”
“Is this transaction successful?”
“Why is my app showing this error?”

In our support flow, we noticed:

A huge portion of tickets were image-based
Many issues were repeated daily

This created:

⏳ Slow response time
🔁 Repetitive manual work
📉 Business operations that scale poorly

So I built a small service to automate that flow:

1. The user sends a screenshot to a Telegram bot.
2. A FastAPI webhook receives it.
3. OCR extracts text from the image.
4. Semantic matching finds the closest known issue.
5. The bot replies with the right support message.
6. For transaction screenshots, it validates the payment with the API.

In this article, I will focus on steps 4 and 6: semantic matching and transaction validation.

Semantic matching finds the closest known issue.

I started off by defining a small dataset of pattern–intent pairs.

pattern_intent_pairs = [
    {
        "pattern": "transaction receipt with reference and amount",
        "intent": "intent_transaction_image",
    },
    {
        "pattern": "pin input error session invalid",
        "intent": "intent_device_time_issue",
    },
    {
        "pattern": "session expired error code",
        "intent": "intent_device_time_issue",
    },
    {
        "pattern": "payment confirmation with enterprise reference",
        "intent": "intent_transaction_image",
    },
    {
        "pattern": "otp sent but requires manual input",
        "intent": "intent_otp_autofill",
    },
    {
        "pattern": "otp not automatically filling in app",
        "intent": "intent_otp_autofill",
    },
]

After that, to enable semantic understanding, I used a lightweight sentence embedding model.
Each predefined pattern is converted into a vector representation, allowing the system to compare meanings instead of relying on exact text matches.

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")
patterns = [q["pattern"] for q in pattern_intent_pairs]
embeddings = model.encode(patterns)

# What the embeddings array looks like when printed (values are illustrative):
# [
#   [0.123, -0.456, 0.231, ..., 0.789],  # pattern_1
#   [-0.087, 0.512, -0.334, ..., 0.102], # pattern_2
#   [0.442, 0.019, -0.275, ..., -0.661], # pattern_3
#   [-0.298, 0.733, 0.145, ..., 0.058],  # pattern_4
#   ...
# ]
# Each row is the vector representation of one pattern

Next, the user’s screenshot is processed through OCR using pytesseract to extract readable text.

import pytesseract
from PIL import Image

def extract_text_from_image(image_path):
    return pytesseract.image_to_string(Image.open(image_path))

I then encode the extracted text (ocr_result, the string returned by extract_text_from_image) into a vector using the same model.

query_embedding = model.encode([ocr_result])[0]

To find the most relevant match, I compute the similarity between the user input and all predefined patterns using a dot product.
This produces a score for each pattern, where higher values indicate closer semantic similarity.

import numpy as np

# For normalized vectors, this is equivalent to cosine similarity
scores = np.dot(embeddings, query_embedding)

Why does this measure similarity?

The embedding model is trained so that:

Similar sentences → vectors point in similar directions
Different sentences → vectors point in different directions

So:

If two vectors point in the same direction → score closer to 1
If unrelated → score closer to 0
If opposite → score closer to -1
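To make the three cases concrete, here is a minimal sketch with toy 2-D vectors. The vectors and the `unit` helper are made up for illustration; real embeddings from the model have hundreds of dimensions, and scores near -1 are rare in practice.

```python
import numpy as np

def unit(v):
    """Scale a vector to length 1 so the dot product equals cosine similarity."""
    v = np.asarray(v, dtype=float)
    return v / np.linalg.norm(v)

# Toy 2-D vectors standing in for sentence embeddings (illustrative values only)
query = unit([1.0, 2.0])
same_direction = unit([2.0, 4.0])   # same direction, different length
unrelated = unit([-2.0, 1.0])       # perpendicular to the query
opposite = unit([-1.0, -2.0])       # points the other way

toy_scores = {
    "same_direction": float(np.dot(query, same_direction)),
    "unrelated": float(np.dot(query, unrelated)),
    "opposite": float(np.dot(query, opposite)),
}
print(toy_scores)
```

The three scores come out at approximately 1, 0, and -1. Note that vector length does not matter once the vectors are normalized; only direction does.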

def match_intent(scores, threshold=0.7):
    # Index of the highest-scoring pattern
    best_idx = int(np.argmax(scores))
    best_score = scores[best_idx]

    # Reject weak matches instead of guessing
    if best_score < threshold:
        return "Unknown Issue"

    # Intent that belongs to the best-matching pattern
    return pattern_intent_pairs[best_idx]["intent"]

potential_intent = match_intent(scores)

From this point on, the rest is straightforward. You can map potential_intent to a predefined response or trigger an API call, depending on your logic. For example, if potential_intent == "intent_transaction_image", the flow hands the user input to a dedicated transaction handler, which extracts the necessary details (hash, amount, currency) and validates the transaction against the payment API before returning a user-friendly result.
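A rough sketch of that routing step could look like the following. Everything here is an assumption for illustration: the function names, the reply texts, and the regular expressions (which assume the OCR text contains something like "12.50 USD" and "Hash a1b2c3d4"). The actual API validation call is not shown, since its shape depends on your payment provider.

```python
import re
from typing import Dict, Optional

def parse_transaction_text(text: str) -> Dict[str, Optional[str]]:
    """Pull hash, amount and currency out of OCR text (hypothetical receipt layout)."""
    hash_match = re.search(r"Hash[:\s]+([0-9a-fA-F]{6,64})", text)
    amount_match = re.search(r"(\d[\d,]*(?:\.\d+)?)\s*(USD|KHR)", text)
    return {
        "hash": hash_match.group(1) if hash_match else None,
        "amount": amount_match.group(1).replace(",", "") if amount_match else None,
        "currency": amount_match.group(2) if amount_match else None,
    }

def handle_transaction(ocr_text: str) -> str:
    """Extract the details, then (in a real service) validate them with the API."""
    details = parse_transaction_text(ocr_text)
    if details["hash"] is None:
        return "I could not read the transaction hash. Please send a clearer screenshot."
    # A real handler would call the payment API here and report its result.
    return (
        f"Checking transaction {details['hash']} "
        f"for {details['amount']} {details['currency']}."
    )

# Placeholder replies; replace with your own support messages.
STATIC_REPLIES = {
    "intent_device_time_issue": "Please check that your device date and time are set automatically.",
    "intent_otp_autofill": "If the OTP does not fill in automatically, please type it in manually.",
}

def route_intent(intent: str, ocr_text: str) -> str:
    """Send transaction screenshots to the handler, everything else to a canned reply."""
    if intent == "intent_transaction_image":
        return handle_transaction(ocr_text)
    return STATIC_REPLIES.get(intent, "Unknown Issue")
```

Keeping the canned replies in a dictionary means a new issue type only needs a new pattern and a new reply, with no change to the routing code.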

Additional Tips:

Normalize common patterns to improve consistency and matching accuracy.

For example:

pattern_intent_pairs = [
    {
        "pattern": "USD Payment to <RECIPIENT_NAME> <SENDER_ACCOUNT> <RECEIVER_ACCOUNT> <RECEIVER_ALIAS> <RECIPIENT_NAME> <AMOUNT> KHR <DATE_TIME> <IROHA_ID> Bank <RECEIVER_ALIAS> Hash <SHORT_HASH> Screenshot",
        "intent": "TXN_SCREENSHOT",
    },
]

This approach applies normalization through placeholder masking, transforming dynamic values into a consistent template. While this doesn’t directly change the model, it reduces input variability and helps with generalization. If needed, you can also include realistic values in the pattern_intent_pairs to better reflect real-world data.
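The same masking can be applied to the OCR text before it is encoded, so that the query looks like the template. Below is a minimal sketch using regular expressions; the rules, the `mask_dynamic_values` name, and the assumed formats (dates like 12/03/2024 14:05, amounts with two decimals, a hex string after the word "Hash") are illustrative guesses and would need adjusting to your real screenshots.

```python
import re

# Each rule is (compiled regex, replacement). Order matters:
# dates are masked first so their digits are not mistaken for amounts.
MASK_RULES = [
    (re.compile(r"\b\d{1,2}[/-]\d{1,2}[/-]\d{2,4}(?:\s+\d{1,2}:\d{2})?\b"), "<DATE_TIME>"),
    (re.compile(r"\b\d[\d,]*\.\d{2}\b"), "<AMOUNT>"),
    (re.compile(r"\bHash\s+[0-9a-fA-F]{6,}\b"), "Hash <SHORT_HASH>"),
]

def mask_dynamic_values(text: str) -> str:
    """Replace dates, amounts and hashes with placeholders, then tidy whitespace."""
    for pattern, placeholder in MASK_RULES:
        text = pattern.sub(placeholder, text)
    # OCR output often contains line breaks and repeated spaces
    return " ".join(text.split())

masked = mask_dynamic_values("Payment 1,250.00 KHR\n12/03/2024 14:05 Hash a1b2c3d4")
print(masked)  # Payment <AMOUNT> KHR <DATE_TIME> Hash <SHORT_HASH>
```

Text without any dynamic values passes through unchanged apart from the whitespace cleanup, so the function is safe to run on every screenshot.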

Feedback on this article

Feedback is always welcome! Feel free to share your thoughts, improvements, or additional insights in the comments.
