2

The title of this doesn't quite make sense, so I'll do my best to explain.

I have a very large dataset (1000's of rows) in a single table. The data in this table relates to GPS tracking of vehicles. When the vehicle is stationary (Speed=0), the latitude and longitude can vary quite dramatically over a period of 12 hours.

My current SELECT query is this:

$query = "SELECT UUID, UNITID, Truncate(LONGITUDE,6) AS LONGITUDE, Truncate(LATITUDE,6) AS LATITUDE, SPEED, TRACKINGTIME FROM trackpoint_au WHERE SPEED > -1 Order By UnitID, TRACKINGTIME";

The query returns an XML page, via PHP. Built like so:

header("Content-type: text/xml");
// Start XML file, echo parent node
echo '<markers>';
// Iterate through the rows, printing XML nodes for each
while ($row = @mysql_fetch_assoc($result)){ 
// ADD TO XML DOCUMENT NODE
echo '<marker ';
echo 'unitid="' . $row['UNITID'] . '" ';
echo 'lat="' . $row['LATITUDE'] . '" ';
echo 'lng="' . $row['LONGITUDE'] . '" ';
echo 'spd="' . $row['SPEED'] . '" ';
echo 'time="' . $row['TRACKINGTIME'] . '" ';
echo '/>';
}
// End XML file
echo '</markers>';

The output looks like:

<marker unitid="7711010426" lat="-32.080402" lng="115.854890" spd="0" time="2011-11-30 06:15:00" />
<marker unitid="7711010426" lat="-32.080376" lng="115.854880" spd="0" time="2011-11-30 06:16:00" />
<marker unitid="7711010426" lat="-32.080364" lng="115.854880" spd="0" time="2011-11-30 06:17:00" />
<marker unitid="7711010426" lat="-32.080330" lng="115.854836" spd="0" time="2011-11-30 06:18:00" />
<marker unitid="7711010426" lat="-32.080326" lng="115.854860" spd="1.85" time="2011-11-30 06:20:00" />
<marker unitid="7711010426" lat="-32.080265" lng="115.854890" spd="0" time="2011-11-30 06:21:00" /> 
<marker unitid="7711010426" lat="-32.080276" lng="115.854920" spd="0" time="2011-11-30 06:22:00" /> 
<marker unitid="7711010426" lat="-32.080315" lng="115.854900" spd="0" time="2011-11-30 06:23:00" /> 
<marker unitid="7711010426" lat="-32.080296" lng="115.854866" spd="0" time="2011-11-30 06:24:00" />

My question is this: How can I use PHP OR MYSQL to return the average latitude/longitude of the rows with spd=0?

My resulting data should be like this:

<marker unitid="7711010426" lat="-32.080367" lng="115.8548715" spd="0" time="2011-11-30 06:18:00" />
<marker unitid="7711010426" lat="-32.080326" lng="115.854860" spd="1.85" time="2011-11-30 06:20:00" />
<marker unitid="7711010426" lat="-32.080288" lng="115.854894" spd="0" time="2011-11-30 06:24:00" />

Note, the 'average' row has the LAST timestamp of the rows that have been averaged out.

I have tried to use a 'Group By Speed'. However, this fails to do what I need as it groups ALL the records with an identical speed, not just the ones with a value of 0.

EDIT

Grouping by the UUID as suggested by macek does not help, as the UUID is unique for each row.

<marker time="2011-11-30 06:15:00" spd="0" lng="115.854890" lat="-32.080402" unitid="7711010426" uuid="c6d50454-aa5b-4069-8756-72c787923173"/>
<marker time="2011-11-30 06:16:00" spd="0" lng="115.854880" lat="-32.080376" unitid="7711010426" uuid="be6f9052-ab00-430a-8cec-6abf5051cad1"/>

ANSWER

After posting the question and reading some of the answers below, I managed to put this PHP code together. It loops through all the rows, checks the speed, if the speed is 0, check the next row (until speed<>0) and average out the lat/lng of those points.

for($i=0;$i<$num;$i++){
    mysql_data_seek($result,$i); 
    $row = mysql_fetch_assoc($result); 
    if ($row['SPEED']==0){
    //echo $i . ' spd: '.$row['SPEED'] . '<br />';
    $spd0 = true;
    $counter = 1;
    $lat = $row['LATITUDE'];
    $lng = $row['LONGITUDE'];
    $i++;
    while (($spd0==true) && ($i<$num)){
        //echo ' + ' . $i;
        mysql_data_seek($result,$i); 
        $row2 = mysql_fetch_assoc($result);
        if (($row2['UNITID']==$row['UNITID']) && ($row2['SPEED']==0)){
            $counter++;
            $lat = $lat + $row2['LATITUDE'];
            $lng = $lng + $row2['LONGITUDE'];
            //echo $i . ' spd: '.$row2['SPEED'] . '<br />';
            $i++;
        }
        else{
            $spd0=false;
            $i--;
        }
    }
    $lat = $lat/$counter;
    $lng = $lng/$counter;

    // ADD TO XML DOCUMENT NODE
    echo '<marker ';
    echo 'uuid ="' . $row['UUID'] . '" ';
    echo 'unitid="' . $row['UNITID'] . '" ';
    echo 'lat="' . $lat . '" ';
    echo 'lng="' . $lng . '" ';
    echo 'spd="' . $row['SPEED'] . '" ';
    echo 'time="' . $row['TRACKINGTIME'] . '" ';
    echo '/>';
}
else {
    //echo $i;
    // ADD TO XML DOCUMENT NODE
    echo '<marker ';
    echo 'uuid ="' . $row['UUID'] . '" ';
    echo 'unitid="' . $row['UNITID'] . '" ';
    echo 'lat="' . $row['LATITUDE'] . '" ';
    echo 'lng="' . $row['LONGITUDE'] . '" ';
    echo 'spd="' . $row['SPEED'] . '" ';
    echo 'time="' . $row['TRACKINGTIME'] . '" ';
    echo '/>';
}
} 

If someone has a more elegant way of checking the next rows, please post it, as always looking for ways to improve my code.

Thanks all!

4 Answers4

1
while ($row = @mysql_fetch_assoc($result)){ 
    if( $row['SPEED']!=0){

        echo 'list average';
        clear list;
        // ADD TO XML DOCUMENT NODE
        echo '<marker ';
        echo 'unitid="' . $row['UNITID'] . '" ';
        echo 'lat="' . $row['LATITUDE'] . '" ';
        echo 'lng="' . $row['LONGITUDE'] . '" ';
        echo 'spd="' . $row['SPEED'] . '" ';
        echo 'time="' . $row['TRACKINGTIME'] . '" ';
        echo '/>';
    } else {
        //put data to a list 
    }
}
stealthyninja
  • 10,343
  • 11
  • 51
  • 59
  • This is one idea. It needs iterating through the resulting rows. So you are effectively fetching all the records and doing the averaging in your code. This can be the last option. – Kangkan Dec 02 '11 at 07:56
1

in addition to the normal use of GROUP BY and AVG() you may be interested in Quassnoi's answer to my question here:

GROUP BY for continuous rows in SQL

He posted a very nice solution that also performs very well with many rows.

Think of the speed as a state, and you want to aggregate all continous rows within a time period that have the same speed.

Here is my attempt on rewriting your query using this method:

SELECT 
        UNITID,
        /* we aggregate multiple rows, maybe you want to know which ones..
           this one is optional */
        CAST(GROUP_CONCAT(UUID SEPARATOR ', ') AS CHAR) AS UUIDS, 
        /* is group field in the inner subquery, we can just use it 
           in our select without an aggregate function */
        SPEED, 
        /* very important to select the lowest timestamp - 
           this is the time when your unit has stopped moving ..
           first row with speed=0 */
        MIN(TRACKINGTIME) AS TRACKINGTIME, 
        /* we calc the average on latitude here */
        TRUNCATE(AVG(LATITUDE),6) AS LATITUDE, 
        /* same for longitude */
        TRUNCATE(AVG(LONGITUDE),6) AS LONGITUDE, 
        /* maybe you want to know how many rows with speed 0 
           are grouped together */
        COUNT(UUID) AS AGGREGATE_COUNT 

FROM    (
        SELECT
                /* this increases the counter variable @r each time
                   the state has changed.. when speed of the previous row
                   was also "0" and is "0" in the current row, 
                   the counter is not increased. -- this is a virtual field 
                   we will use for GROUPing.

                   @speed is used to remember the speed of the previous
                   row for comparison in @r to determine if the speed has changed
                */
                @r := @r + (@prev_unit != UNITID 
                              OR @prev_speed != 0 
                              OR SPEED != 0) AS gn,  
                @prev_speed := SPEED AS a_speed,
                @prev_unit := UNITID AS a_unit,
                tp.*
        FROM    (
                SELECT  @r := 0,
                        @prev_speed := 1,
                        @prev_unit := ''
                ) vars,
                trackpoint_au tp
        ORDER BY
                UNITID, TRACKINGTIME
        ) q
GROUP BY
        gn
ORDER BY
        UNITID

Test data:

CREATE TABLE `trackpoint_au` (
 `uuid` int(11) NOT NULL AUTO_INCREMENT,
 `latitude` decimal(10,0) NOT NULL,
 `longitude` decimal(10,0) NOT NULL,
 `speed` int(11) NOT NULL,
 `unitid` int(11) NOT NULL,
 `trackingtime` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
 PRIMARY KEY (`uuid`)
) ENGINE=MyISAM;

INSERT INTO trackpoint_au (unitid, speed, latitude, longitude, trackingtime) VALUES
(1, 0, 10, 10, NOW()),
(1, 0, 20, 20, NOW()),
(1, 1, 10, 10, NOW()),
(1, 0, 10, 10, NOW()),
(1, 0, 30, 30, NOW()),
(2, 0, 10, 10, NOW()),
(2, 0, 20, 20, NOW()),
(3, 1, 10, 10, NOW()),
(4, 0, 10, 10, NOW()),
(4, 0, 20, 20, NOW()),
(4, 1, 30, 30, NOW()),
(4, 0, 60, 60, NOW()),
(4, 0, 60, 60, NOW());

Result:

+--------+--------+-------+---------------------+-----------+-----------+-----------------+
| UNITID | UUIDS  | SPEED | TRACKINGTIME        | LATITUDE  | LONGITUDE | AGGREGATE_COUNT |
+--------+--------+-------+---------------------+-----------+-----------+-----------------+
|      1 | 2, 1   |     0 | 2011-12-05 09:34:13 | 15.000000 | 15.000000 |               2 |
|      1 | 3      |     1 | 2011-12-05 09:34:13 | 10.000000 | 10.000000 |               1 |
|      1 | 4, 5   |     0 | 2011-12-05 09:34:13 | 20.000000 | 20.000000 |               2 |
|      2 | 6, 7   |     0 | 2011-12-05 09:34:13 | 15.000000 | 15.000000 |               2 |
|      3 | 8      |     1 | 2011-12-05 09:34:13 | 10.000000 | 10.000000 |               1 |
|      4 | 9, 10  |     0 | 2011-12-05 09:34:13 | 15.000000 | 15.000000 |               2 |
|      4 | 11     |     1 | 2011-12-05 09:34:13 | 30.000000 | 30.000000 |               1 |
|      4 | 12, 13 |     0 | 2011-12-05 09:34:13 | 60.000000 | 60.000000 |               2 |
+--------+--------+-------+---------------------+-----------+-----------+-----------------+
Community
  • 1
  • 1
Kaii
  • 20,122
  • 3
  • 38
  • 60
  • just a remark: using UUIDs to identify the rows in the mysql table is a bad idea in terms of performance. use an integer with auto_increment instead. – Kaii Dec 02 '11 at 08:02
  • Interesting. I took a look at the original question, and modified your code here a little, but I can't get it to work consistantly. I get some groups with up to 34 rows, and others I get groups of 1/2 rows, all straight after each other... – Adrian van der Wal Dec 05 '11 at 01:06
  • find the fixed and tested code above. I forgot to include changing UNITIDs in the calculation - because of this the previous query grouped trackpoints of different units together. – Kaii Dec 05 '11 at 22:59
  • 1
    Should be Order By UNITID, TRACKINGTIME to make it just the way i needed. Thanks for the MySQL solution. Will change the accepted answer, as this one is more complete than the other. – Adrian van der Wal Dec 08 '11 at 01:37
  • when you look at the inner query you see the data is already ordered by UNITID, TRACKINGTIME in that order. you could also easily remove the GROUP statement from the outer query and still get the desired result with less computational overhead. – Kaii Dec 08 '11 at 08:45
0

This should get you on the right track

-- get average lat/lng for each unitid where speed is 0
select uuid, unitid, avg(lat), avg(lng)
from trackpoint_au
where speed=0
group by uuid, unitid

When using group by, any selected fields that do not have a composite function being used on them should be added to the group by statement

EDIT

added uuid to SELECT and GROUP BY

maček
  • 76,434
  • 37
  • 167
  • 198
  • This is flawed, as it will only return me a single row for all of the rows with a speed value of 0. Not for each block of rows with a speed of 0. – Adrian van der Wal Dec 02 '11 at 06:11
  • Adrian, I added `uuid` to provide for additional grouping. This query isn't perfect, but it demonstrates how to capture an average based on a grouping. You should be able to adapt it to meet your specific needs. – maček Dec 02 '11 at 06:15
  • I think I understand what you are trying to get at. In this situation, however, grouping by UUID does not help, as the UUID is the unique identifier for each row. See Edit in first post. – Adrian van der Wal Dec 02 '11 at 06:25
0

I shall rather propose something different. If the change in the Lat and Long are not significant when speed = 0, why are you trying to get the average? You might truncate the last digit or round it up by 5 or 10 for such values.

This will not however give you fewer rows. But you should be able to get consistent values for Lat and Long.

Kangkan
  • 15,267
  • 10
  • 70
  • 113
  • I did think of this possibility, sorry, should have included it in my first post. The problem with rounding the lat/long values is that the plotting of the rounded values will be incorrect (sometimes by several hundred meters). Also, when plotting (sometimes 100's of) points on the one location onto a Google Map, the speed of the page decreses dramatically as it has an icon for each point. – Adrian van der Wal Dec 02 '11 at 06:49