-1

I'm trying to make a database of students using Google Sheets. It contains info about students, groups and orders; orders can change students membership in groups (taken in a group, moved up to a new group, graduated, on leave, sent down). Here are sample database sheets and here is a detailed description of my DB structure (the sheet report_Groups is slightly changed, its previous variant, described on the link, is now named old_report_Groups).

I need a query that would select a list of present members of given group on the given date. That means that for each student I have to select the name, the latter status before given date and corresponding group. And from this result select student names, where statuses are "Taken in" or "Moved Up" and group is the same as given one.

The problem is to select the latter status. It should be MAX(status), whose "since" date ≤ given date, but there's a well-known problem of selecting more than one field together with aggregate function. Here is a question which is very close to, but query from its "best" answer gives me error "QUERY:NO_COLUMN". I've even copied the sheet Raw from there and tried to perform proposed query (with the onliest modification — replacing commas with semicolons according to my locale restrictions) on the data it was reported to work on — same error (check Raw and report_Raw sheets in my DB). Other variant (via MMULT and TRANSPOSE) works, but it's perfomance is very poor.

What can you suggest me? Thanks in advance.

Update: I've found the solution with an issue (described in my answer). To solve the issue I need to know an answer for a different question.

Community
  • 1
  • 1
Vercetti
  • 437
  • 1
  • 6
  • 17
  • 1
    Hi Vercetti, Let's take this one step at the time. Maybe start by tellling us what data you want to query (what sheet/tab) ? Please be specific. What is the expected output ? Maybe also show that in your spreadsheet and explain the logics behind it. e.g: if you want to filter the data on sheet 'Raw' to only show the latest entry per person, you can use =ArrayFormula(iferror(vlookup(unique(Raw!E2:E); sort({Raw!E2:E\Raw!A2:J}; 2; 0); {2\3\4\5\6\7\8\9\10\11}; 0))) – JPV Sep 04 '15 at 12:04
  • Hello, JPV! I have to query _Orders_Students_ and to display results on _report_Groups_. I'd like to set **date** and **group** (as input data for the query) and to get list of students being members of the **group** on the **date**. Current content of _report_Groups_ illustrates the expected results. E. g. on 2012-12-01 Group2 consists of Scott, Westfield, Grimm and Johnson. On 2013-02-06 Fitzgerald was taken in and on 2013-09-01 Westfield was sent down. So on 2014-02-01 Scott, Grimm, Johnson and Fitzgerald are members of Group2 (everything according _Orders_Students_ info). – Vercetti Sep 04 '15 at 14:48
  • _Raw_ does not concern to my problem. This sheet contains data and _report_Raw_ query that is very similar to what I need. BTW after replacing semicolons inside curly brackets by backslashes, like in your formula, that query started to work. So it was some locale-specific issue. – Vercetti Sep 04 '15 at 14:51
  • For group 1, try: =transpose(unique(filter(Orders_Students!B2:B16; Orders_Students!D2:D16=B2; Orders_Students!C2:C16="Taken in"; Orders_Students!E2:E16<=A2)))... – JPV Sep 04 '15 at 17:28
  • Well, this is not what I actually mean. It displays original membership of Group1, but if we take a date after Smith became "on leave" (2012-12-12) and his membership was suspended, e. g. 2013-01-01, the formula will still display Smith in the list of members. Same trouble is with other groups. Please read the second paragraph of my original post. There's a description of what is actually required from the query. – Vercetti Sep 04 '15 at 17:54
  • Let me write desired query in _informal_ SQL-like syntax. There should be one nested query. SELECT name FROM (SELECT name, status, group FROM Orders_Students WHERE since = MAX(since) <= "2013-01-01") WHERE (status = "Taken in" OR status = "Moved Up") AND (group = "Group1") Should return "Danvers" and "Barlow", but not "Smith" (who is "On leave") – Vercetti Sep 04 '15 at 18:06

1 Answers1

0

Here's the solution (with an issue described below).

A. Orders_Students is filtered for selecting rows, having "since" cell value ≤ given date (report_Groups!A2):

=QUERY(Orders_Students!B:E;"select E, B, C, D where E <= date '" & TEXT(report_Groups!A2;"yyyy-MM-dd") & "'";1)

This interim result is stored at the inner_report_Groups tab (it will be referenced few times in the next query).

B. inner_report_Groups is filtered for selecting MAX("since") values and corresponding row cell values for each student:

ARRAYFORMULA(VLOOKUP(QUERY({ROW(inner_report_Groups!A$2:A)\SORT(inner_report_Groups!A$2:D)};"select max(Col1) group by Col3 label max(Col1)''";0);{ROW(inner_report_Groups!A$2:A)\SORT(inner_report_Groups!A$2:D)};{3\4\5};0)

The formula above is used as inner query in report_Groups!D2 (also in D3, D4—with appropriate indeces).

C. The second query result is filtered to get students whose status is either "Taken in" or "Moved Up" and corresponding group is equal to the given group (report_Groups!B2 (also in B3, B4—with appropriate indeces)):

=TRANSPOSE(IFERROR(QUERY(<here is the formula from step B>);"select Col1 where Col3 = '" & B2 & "' and (Col2='Taken in' or Col2='Moved Up')";0)))

The formula above is used as outer query in report_Groups!D2 (also in D3, D4—with appropriate indeces). IFERROR is intended to display nothing if query result is #N/A.

That query displays the needed results as you can see in report_Groups tab. But as the query on step B searches the whole columns of inner_report_Groups, there's only a single given date can be analysed (or the query interim results for other given dates should be placed in different columns of inner_report_Groups or at the different tab. Is there any way to give an alias for an interim result to refer it in a single cell formula instead of keeping it on different tab?

Vercetti
  • 437
  • 1
  • 6
  • 17