
I'm trying to give an image as an input and find the best match for that input from a small database of pictures (≈100 pics).

The problem itself is the following: for my homework, I get assigned a series of graphs (X-ray diffraction pattern graphs, in case anyone's interested) and I have to match those graphs with the corresponding chemical element or compound. So, the solution I thought of is giving the graph as an input and letting the code find the graph with the most similarities. Here's an example of a graph: X-ray diffraction pattern graph.

I only know the basics of Python, and I have no idea what to look up or where to start. Can anybody help guide my research?

  • Do the x-axis and y-axis always have the same range? Are the pictures always the same height/width/colour? – Mark Setchell Aug 26 '21 at 16:55
  • Do you need to literally compare pictures pixel by pixel or do you have the datapoints of the graph available? – iLuvLogix Aug 26 '21 at 16:56
  • @MarkSetchell beat me to the minute ;) – iLuvLogix Aug 26 '21 at 16:56
  • @MarkSetchell In the case of the input, yes. In the case of the database, I could try looking up a standardized database, but there's no guarantee I'll find one. Let's assume I do if it makes the question easier. – Nicolás Derbez Aug 26 '21 at 16:57
  • @iLuvLogix Both could work, given I find datapoints for the database. Let's assume I do for the sake of simplicity. – Nicolás Derbez Aug 26 '21 at 16:59
  • Matching up datapoint structures with some offset threshold should be fairly simple if the ticks of the x- and y-axes correspond. But processing images to retrieve those datapoints (the blue line in your image) and then comparing them to other processed images in the database, without knowing that their x- and y-axis ranges and their resolutions match, would take heaps more code and processing. – iLuvLogix Aug 26 '21 at 17:04
  • I don't personally know anything about x-ray diffraction so you might consider saying what characterises a match and show some nearly matching and some definitely not matching examples. Is the position of the main peak enough to differentiate? Or could there be multiple peaks and their number or their spacing (or maybe relative heights) will be key? – Mark Setchell Aug 26 '21 at 17:05
  • In X-ray diffraction, the amount of certain chemicals/compounds also has an impact on the amplitude, similar to quantitative analysis. – iLuvLogix Aug 26 '21 at 17:07
  • This MIT paper may help you to get some insights: http://prism.mit.edu/xray/introduction%20to%20xrpd%20data%20analysis.pdf – iLuvLogix Aug 26 '21 at 17:14
  • @iLuvLogix Thanks! The math seems interesting, but translating it into code will require some thought. Definitely helpful! – Nicolás Derbez Aug 26 '21 at 17:19
  • @iLuvLogix Fascinating reading - thank you. – Mark Setchell Aug 26 '21 at 17:20
  • Pleasure, that's the beauty of programming: problem -> brainstorming -> implementation -> solution ;) – iLuvLogix Aug 26 '21 at 17:20
  • @MarkSetchell X-ray diffraction works this way: every chemical element or compound has a pattern consisting of multiple peaks in the graph. So, when you get a graph from analyzing a sample in the lab, it should resemble an already existing one. XRD machines are pretty accurate, but downscaling the quality of the image could result in slight variations. Analyzing the 3-4 biggest peaks and checking for a few similarities should be enough. – Nicolás Derbez Aug 26 '21 at 17:24
  • Please edit the question to limit it to a specific problem with enough detail to identify an adequate answer. – Community Aug 26 '21 at 17:59

1 Answer


Rather than looking at shapes, heights, widths and numbers of peaks in images, I think you would do better to "digitize" the curves in your database so that each one becomes a list of 80 numbers or so (one for each angle). Then you can do statistical analysis on those numbers (e.g. with pandas or scipy) rather than morphological (or other) analysis of images.

So, I would make everything that isn't blue in your plots become white (say) and then find the min/max/mean y-coordinate of the blue pixels at every x-coordinate.
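
As a rough sketch of that per-column measurement, assuming a binary mask is already available in which the curve's pixels are non-zero, NumPy alone could do it along these lines. The helper name `column_stats` and the tiny hand-made mask are purely illustrative. Columns with no curve pixels are reported as NaN so they cannot be confused with a curve that touches the top row.

```python
import numpy as np

def column_stats(mask):
    """Return (top, bottom, mean) row index of the non-zero pixels in each column.

    Hypothetical helper: columns without any non-zero pixel yield NaN.
    Row indices grow downwards, so "top" is the smallest y-coordinate.
    """
    on = mask != 0                              # boolean array, shape (rows, cols)
    n_rows = mask.shape[0]
    rows = np.arange(n_rows)[:, None]           # row indices as a column vector
    count = on.sum(axis=0)                      # number of curve pixels per column
    has_curve = count > 0

    # First non-zero row from the top, and from the bottom (via a flipped view)
    top = np.where(has_curve, on.argmax(axis=0), np.nan)
    bottom = np.where(has_curve, n_rows - 1 - on[::-1].argmax(axis=0), np.nan)
    # Mean row index; np.maximum avoids dividing by zero in empty columns
    mean = np.where(has_curve, (on * rows).sum(axis=0) / np.maximum(count, 1), np.nan)
    return top, bottom, mean

# Tiny made-up mask: 4 rows x 3 columns, 255 marks a curve pixel
demo_mask = np.array([[0,   0, 0],
                      [255, 0, 0],
                      [255, 0, 255],
                      [0,   0, 0]], dtype=np.uint8)

top, bottom, mean = column_stats(demo_mask)
print(top, bottom, mean)
```

Which of the three values to store is a judgment call; for a thin plotted line they should be close to each other except at steep peaks.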

You can do this ahead of time for all the entries in your database, then when a new sample comes along, you just calculate its characteristics and match it to the pre-calculated curves.


So, in concrete terms I am suggesting:


#!/usr/bin/env python3

import numpy as np
import cv2

# Load the image as BGR
im = cv2.imread('A8aHv.png')

# Convert to HSV colourspace to find blues
hsv = cv2.cvtColor(im, cv2.COLOR_BGR2HSV) 

# Set low and high limit for the tones we want to identify
lo = np.uint8([116,100,100])
hi = np.uint8([124,255,255]) 

# Mask all blue pixels
mask = cv2.inRange(hsv,lo,hi)

# Save mask
cv2.imwrite('result.png',mask)

# Find y-value (row index) of first non-zero element in each column.
# Note: argmax also returns 0 for columns with no blue pixels at all.
firstNonZero = (mask!=0).argmax(axis=0)
print(firstNonZero)


And here are the y-values, which is what I was suggesting storing in your database for statistical analysis:

array([  0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
         0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
         0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
         0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
         0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
         0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
         0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
         0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
         0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
       275, 276, 275, 277, 279, 276, 275, 279, 281, 284, 286, 290, 294,
       293, 293, 294, 295, 295, 294, 293, 296, 296, 298, 298, 298, 299,
       301, 299, 297, 300, 299, 298, 298, 295, 292, 280, 279, 281, 283,
       287, 297, 299, 301, 302, 303, 308, 312, 312, 308, 309, 312, 312,
       310, 311, 313, 313, 309, 308, 308, 309, 311, 313, 311, 310, 311,
       312, 313, 312, 310, 310, 312, 310, 309, 309, 310, 312, 310, 310,
       311, 309, 308, 307, 309, 309, 307, 308, 308, 310, 311, 307, 307,
       308, 307, 309, 310, 312, 309, 310, 312, 312, 311, 311, 310, 311,
       310, 308, 308, 311, 311, 308, 309, 309, 311, 309, 308, 307, 309,
       309, 306, 306, 305, 302, 301, 302, 304, 304, 307, 303, 303, 304,
       304, 301, 303, 303, 302, 301, 301, 299, 298, 290, 285, 281, 282,
       284, 290, 291, 292, 289, 287, 287, 266, 261, 260, 259, 279, 288,
       294, 291, 292, 293, 292, 292, 293, 293, 297, 294, 293, 294, 288,
       289, 292, 295, 293, 292, 290, 290, 288, 290, 288, 287, 288, 291,
       292, 291, 293, 294, 293, 295, 297, 297, 295, 295, 294, 288, 285,
       284, 285, 286, 287, 289, 292, 294, 293, 284, 278, 222,   0,  78,
        75,  93, 283, 284, 284, 285, 286, 288, 288, 290, 291, 298, 295,
       296, 297, 298, 298, 298, 301, 302, 300, 300, 300, 301, 302, 300,
       301, 303, 303, 303, 303, 303, 303, 303, 306, 306, 306, 307, 306,
       304, 306, 307, 310, 307, 307, 305, 308, 307, 304, 306, 307, 307,
       309, 308, 308, 309, 310, 307, 305, 304, 304, 303, 301, 301, 304,
       306, 308, 311, 311, 309, 308, 310, 311, 310, 309, 308, 306, 305,
       305, 307, 309, 305, 307, 303, 296, 293, 293, 296, 303, 305, 304,
       310, 311, 296, 295, 294, 305, 308, 308, 309, 308, 309, 310, 310,
       310, 312, 312, 313, 313, 316, 316, 316, 316, 316, 317, 318, 318,
       318, 319, 313, 311, 310, 309, 303, 308, 310, 313, 314, 313, 310,
       311, 312, 314, 315, 318, 314, 315, 316, 316, 313, 319, 320, 314,
       315, 316, 316, 316, 316, 316, 315, 313, 312, 314, 314, 313, 315,
       318, 319, 319, 319, 318, 318, 317, 315, 316, 321, 319, 318, 319,
       320, 317, 316, 317, 318, 321, 318, 314, 316, 317, 318, 318, 310,
       309, 309, 315, 319, 320, 321, 319, 320, 320, 321, 322, 322, 319,
       321, 322, 322, 322, 321, 324, 322, 322, 322, 322, 322, 321, 323,
       324, 325, 323, 322, 322, 322, 323, 322, 321, 320, 320, 319, 318,
       317, 315, 298, 296, 300, 311, 320, 320, 320, 321, 322, 323, 323,
       326, 324, 322, 325, 325, 324, 325, 325, 324, 324, 325, 322, 322,
       323, 323, 323, 323, 324, 324, 323, 324, 322, 321, 322, 322, 319,
       319, 320, 320, 318, 317, 319, 320, 320, 317, 316, 313, 312, 315,
       315, 314, 315, 316, 321, 322, 322, 322, 320, 320, 321, 321, 323,
       323, 324, 325, 321, 320, 321, 322, 322, 324, 323, 322, 323, 324,
       324, 322, 322, 323, 323, 322, 322, 323, 321, 320, 320, 323, 324,
       323, 322, 323, 320, 319, 301, 297, 302, 305, 317, 318, 318, 317,
       315, 319, 318, 319, 318, 319, 320, 320, 321, 322, 320, 320, 318,
       320, 320, 318, 317, 315, 316, 316, 316, 315, 318, 319, 319, 321,
       321, 321, 320, 318, 318, 318, 316, 316, 317, 317, 318, 319, 318,
       319, 320, 321, 321, 320, 320, 320, 320, 318, 318, 319, 319, 318,
       318, 317, 319, 320, 317, 317, 321, 322, 319, 318, 318, 317, 317,
       318, 311, 310, 309, 309, 307, 284, 288, 299, 302, 318, 319, 314,
       315, 322, 321, 321, 321, 322, 322, 322, 322, 324, 323, 322,   0,
         0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
         0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
         0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
         0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
         0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
         0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
         0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
         0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
         0,   0,   0,   0,   0])
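
One possible way to do the matching step, sketched under assumptions: every database entry has been reduced to an array of y-values of the same length as the sample (same image width and axis ranges), and Pearson correlation serves as the similarity score. The correlation measure, the `best_match` helper, and the made-up compound names and curves are illustrative choices of mine, not something established above.

```python
import numpy as np

def best_match(sample, database):
    """Return (name, score) of the database curve most correlated with sample.

    Hypothetical helper; assumes all curves have the same length.
    """
    scores = {}
    for name, curve in database.items():
        # Pearson correlation is unaffected by a constant offset or overall scaling
        scores[name] = np.corrcoef(sample, curve)[0, 1]
    name = max(scores, key=scores.get)
    return name, scores[name]

# Made-up curves: a flat baseline with peaks at different positions per entry
x = np.arange(200)

def peak(center):
    """Gaussian-shaped bump centred on the given x position (illustration only)."""
    return np.exp(-0.5 * ((x - center) / 3.0) ** 2)

database = {
    "compound_A": peak(50) + 0.5 * peak(120),
    "compound_B": peak(80) + 0.5 * peak(150),
}

# A noisy, rescaled copy of compound_B stands in for a new measurement
rng = np.random.default_rng(0)
sample = 2.0 * database["compound_B"] + 0.05 * rng.standard_normal(x.size)

print(best_match(sample, database))
```

With y-values taken straight from images, it would probably make sense to drop the columns that contain no curve (the runs of zeros at both ends) before correlating, since they would otherwise dominate the score.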
Mark Setchell