3

I have several thousand bar graphs to convert, all in the following format:

Example Bar Graph

I need to convert these to actual data + date. My plan has been to use something, perhaps ImageMagick, to extract the date, pass it through OCR, and then slice up the bars in some fashion so as to get the values. The X axis is in 4-hour increments (so each tick or bar represents 4 hours of a day). The red bars in the example change color at certain thresholds, so detecting the bars is more a matter of white vs. non-white.

Example output desired:

1996-11-27 000000 UTC, 3.0
1996-11-27 040000 UTC, 3.0
1996-11-27 080000 UTC, 2.0
1996-11-27 120000 UTC, 2.0
1996-11-27 160000 UTC, 1.0

What might be a solution for extracting these bars and assigning values based on height?

ylluminate
  • 12,102
  • 17
  • 78
  • 152
  • Can you add what kind of output value you're looking for? I read this three times and I'm not sure what kind of output you want, i.e. do you want 11/27 12am:3, 11/27 4am:3, 11/27 8am:2? Also, do you have the data for this graph in something like an array, or is it an HTML5 graph? I.e. do I have a key-value pair for each column, or an array and a start date? In short, what's the input and what's the desired output? – Frank Visaggio Nov 12 '13 at 19:15
  • How similar are the graphs? Do they all cover the same date range and 0-9? It simplifies things if they do. It seems like it'll be harder to get the numbers correct than to extract the data from the bar graphs, so it might be easier and faster to just take the time to make up a table with the numbers (max value, min value, scale (linear/log)) for each graph by hand; an algorithm can then handle extracting the data from each. @BobSinclar: It seems like the issue is that ylluminate only has the images of the bar graphs, and wants to extract the bar graph data from the images. – Nuclearman Nov 12 '13 at 19:20
  • @BobSinclar added some example output. Just looking for a concise expression of the date-time + value for each bar. – ylluminate Nov 12 '13 at 19:28
  • @Nuclearman yes, the graphs are essentially all identical (there are a few thousand that follow this format, then another several thousand that are slightly modified, but they are very similar and conducive to just modifying slightly after the first set is working). It APPEARS (from a visual sampling) that the data values are recorded at WHOLE numbers right now. I thought that there would be some additional accuracy involved, but I don't believe so now. If there is, I will eventually find it and deal with it, but for now let's assume whole numbers. – ylluminate Nov 12 '13 at 19:30
  • And what format is the bar graph in? Or do you have a data representation for the bar graph? – Frank Visaggio Nov 12 '13 at 19:40
  • @BobSinclar not sure I'm following you on the format of the bar graph. I am taking the data from the graph and turning it into numerical data output. The graphs themselves are simply images in folders that I will process. – ylluminate Nov 12 '13 at 19:43
  • Gotcha, so exactly like the one you posted. I was hoping that it might have been constructed using some API or something. – Frank Visaggio Nov 12 '13 at 19:57

2 Answers

0

If it's only integers, that makes reading the values out a bit easier.
Load the image.
Slice it up vertically so that you get 1 bar per slice.
Read a pixel at the middle of the slice's width and halfway between 0 and 1 on the value axis (point (0.5, 0.5)); if the pixel is non-white, counter += 1.
Repeat for point (0.5, 1.5) by moving up one unit's worth of vertical pixels,
then read point (0.5, 2.5), etc. until you hit a white pixel.
value1 = counter.

Then load the next slice and repeat.
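
The steps above can be sketched in Python roughly as follows. This is only an illustration under stated assumptions, not a tested solution for the real graphs: the image is taken to be already loaded into a row-major grid of RGB tuples (for example with an imaging library), and every name and number here (WHITE_THRESHOLD, plot_left, plot_bottom, bar_width, unit_height, max_value) is hypothetical and would have to be measured on an actual graph. The date is assumed to come from a separate OCR step.

```python
from datetime import datetime, timedelta

WHITE_THRESHOLD = 240  # assumption: a pixel is "white" when every channel is at least this


def is_white(pixel):
    """Treat a pixel as background when all of its RGB channels are near 255."""
    return all(channel >= WHITE_THRESHOLD for channel in pixel[:3])


def read_bar_values(pixels, plot_left, plot_bottom, bar_width,
                    unit_height, n_bars, max_value):
    """Count the filled integer units of each bar.

    pixels[y][x] is an RGB tuple and y grows downward. plot_bottom is the
    first row below the plot area (the baseline, exclusive). All geometry
    arguments are hypothetical and must be measured on a real graph.
    """
    values = []
    for i in range(n_bars):
        # Middle of the i-th vertical slice.
        x = plot_left + i * bar_width + bar_width // 2
        counter = 0
        for unit in range(max_value):
            # Row halfway between value `unit` and value `unit + 1`.
            y = plot_bottom - unit * unit_height - (unit_height + 1) // 2
            if is_white(pixels[y][x]):
                break  # first white sample: the bar ends below this unit
            counter += 1
        values.append(counter)
    return values


def format_rows(start, values, step_hours=4):
    """Pair each value with its timestamp, one bar every step_hours hours."""
    return [
        f"{(start + timedelta(hours=step_hours * i)):%Y-%m-%d %H%M%S} UTC, {value:.1f}"
        for i, value in enumerate(values)
    ]
```

With the date recovered by OCR, format_rows(datetime(1996, 11, 27), values) produces one line per bar in the format requested in the question. Sampling only the middle of each unit cell is what lets the approach ignore the bar's color: any non-white sample counts.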

derek
  • 1
  • 1
    I agree with the vertical slicing, but wouldn't it be more flexible to use a binary search horizontally rather than an incremental one? This would also probably avoid needing to assume integers, and would instead be based on precision (either pre-defined or based on the number of pixels). However, it does still require going through the graphs and recording the min/max values and the scale (if any of those change). – Nuclearman Nov 13 '13 at 18:27
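
One way to read the binary-search suggestion is as a search along each bar's height for the row where the bar starts, followed by a linear mapping from that row to a value. The sketch below is an illustration only. It assumes the sampled column is pure background above the bar and solid non-white from the bar's top down to the baseline (gridlines or labels crossing the bar would break this), that the scale is linear, and that plot_top, plot_bottom, min_value and max_value have been recorded by hand for the graph; all of these names are hypothetical.

```python
def is_white(pixel, threshold=240):
    """Assumed background test: every RGB channel is at least `threshold`."""
    return all(channel >= threshold for channel in pixel[:3])


def bar_top_row(pixels, x, plot_top, plot_bottom):
    """Binary-search column x for the first non-white row.

    pixels[y][x] is an RGB tuple and y grows downward. The plot area spans
    rows plot_top (inclusive) to plot_bottom (exclusive). Returns plot_bottom
    when the column holds no bar at all.
    """
    lo, hi = plot_top, plot_bottom
    while lo < hi:
        mid = (lo + hi) // 2
        if is_white(pixels[mid][x]):
            lo = mid + 1  # still background: the bar top is further down
        else:
            hi = mid      # inside the bar: the top is here or further up
    return lo


def row_to_value(top_row, plot_top, plot_bottom, min_value, max_value):
    """Map a bar-top row to a value, assuming a linear scale."""
    filled_rows = plot_bottom - top_row
    total_rows = plot_bottom - plot_top
    return min_value + filled_rows * (max_value - min_value) / total_rows
```

The precision here is one pixel row, so the result is not restricted to whole numbers, and each bar needs only about log2 of the plot height in pixel reads.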
-2

Somehow evaluate the height of the bars in pixels. A script to do so? That's where you have to get creative.

ilarsona
  • 436
  • 4
  • 13
  • Well that was the first impression I had as well, but I started thinking that I might need to expand my horizons to other more well established options vs grinding out an entire algorithm from scratch for processing images and assigning data. – ylluminate Nov 12 '13 at 18:51