I am running SonarQube 5.3 on Windows with an MSSQL backend.

When creating new issues, SonarQube queries its Elasticsearch user index to resolve the login of the author that "git blame" reports for the line containing the issue.

The following happens in /server/sonar-server/src/main/java/org/sonar/server/computation/issue/IssueAssigner.java:

=> The "git blame" information returns the author of the affected line, in my example (anonymized):

steve smith@ca5553f7-9c36-c34d-916b-b330600317e9

=> This value is looked up in ScmAccountToUser, which lazily queries the ElasticSearch index "users". I added some debug output to print the ES query, which is:

{
  "size": 3,
  "query": {
    "filtered": {
      "query": {
        "match_all": {}
      },
      "filter": {
        "bool": {
          "must": {
            "term": {
              "active": true
            }
          },
          "should": [
            {
              "term": {
                "login": "steve smith@ca5553f7-9c36-c34d-916b-b330600317e9"
              }
            },
            {
              "term": {
                "email": "steve smith@ca5553f7-9c36-c34d-916b-b330600317e9"
              }
            },
            {
              "term": {
                "scmAccounts": "steve smith@ca5553f7-9c36-c34d-916b-b330600317e9"
              }
            }
          ]
        }
      }
    }
  }
}

This query returns 0 results.

In contrast, when I enumerate the whole index, I get a hit that should match this user:

{
  "took": 4,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 39,
    "max_score": 1,
    "hits": [
      {
        // snip
      },
      // snip
      {
        "_index": "users",
        "_type": "user",
        "_id": "steve.smith",
        "_score": 1,
        "_source": {
          "createdAt": 1442988141642,
          "name": "Steve Smith",
          "active": true,
          "login": "steve.smith",
          "scmAccounts": [
            "
",
            "steve smith@ca5553f7-9c36-c34d-916b-b330600317e9
",
            "steve.smith@ca5553f7-9c36-c34d-916b-b330600317e9
"
          ],
          "email": "steve.smith@globodex.ch",
          "updatedAt": 1450088380632
        }
      },
      // snip
    ]
  }
}

This issue is currently preventing my SonarQube instance from auto-assigning a lot of issues. I am in the process of figuring out when/how this broke, as some auto-assigning has previously succeeded.

Is this an error in the query or in the data? Can I work around this issue somehow?

G. Ann - SonarSource Team
2v0mjdrl
  • What is the mapping of the `scmAccounts` field? If it is not a `not_analyzed` string field, then that's the reason. – Val Apr 22 '16 at 10:55
  • The mapping does specify: // ... "scmAccounts": { "index": "not_analyzed", "type": "string" }, //... which is different from the other fields, e.g. login: "login": { "index": "not_analyzed", "type": "string", "fields": { "ngrams": { "search_analyzer": "search_ngrams", "index_analyzer": "index_ngrams", "type": "string" } } }. Is this a misconfiguration of the mapping then? I have already fully restored the index, but the issue keeps popping up. – 2v0mjdrl Apr 22 '16 at 10:57
  • The root cause seems to be the whitespace in the SCM account. Can you confirm? – Simon Brandhof Apr 26 '16 at 12:59
  • @SimonBrandhof-SonarSource The problem seems to be the newlines at the beginning and end of the scmAccounts. I have removed and re-added the "steve smith@..." scm account in the SonarQube GUI, and now the ES data no longer contains these newlines, and the query succeeds. I have copied in the users table from a previous instance of SonarQube running 5.2 - this might be a compatibility issue. I will try to re-add the SCM accounts manually for all users, and will report back with results. This also explains why 10% of assignments succeed - those assignees have had SCM accounts manually added. – 2v0mjdrl Apr 26 '16 at 13:38
  • @SimonBrandhof-SonarSource Removing the newlines has resolved this issue. I will add an answer. – 2v0mjdrl Apr 27 '16 at 06:15

1 Answer

It turns out that the problem was due to the newlines in the "scmAccounts" field entries.
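
To illustrate why the lookup returned nothing: a `term` filter on a `not_analyzed` field compares the stored value verbatim, so a value with a trailing newline is a different term from the same text without it. The following is a minimal Python sketch of that comparison, not SonarQube or Elasticsearch code. The function name `term_matches` is hypothetical, and it assumes the stray characters are plain `\n` (they could also be `\r\n`).

```python
def term_matches(indexed_values, query_value):
    """Mimic an exact-match term lookup: no trimming, no analysis."""
    return any(value == query_value for value in indexed_values)


blame_author = "steve smith@ca5553f7-9c36-c34d-916b-b330600317e9"

# Values as they appeared in the index after the restore
# (assumption: the line breaks are "\n").
broken = ["\n", blame_author + "\n"]

# Values after re-adding the SCM account through the UI.
fixed = [blame_author]

print(term_matches(broken, blame_author))  # False
print(term_matches(fixed, blame_author))   # True
```

The strings look identical when printed in a JSON viewer, which is why the mismatch was easy to miss.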

By manually re-adding the SCM accounts in the SonarQube UI, these fields were updated to

"scmAccounts": [
  "steve smith@ca5553f7-9c36-c34d-916b-b330600317e9",
  "steve.smith@ca5553f7-9c36-c34d-916b-b330600317e9"
],

after which both the query and the issue assignment succeeded.
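
Since only some users were affected, it can help to list them before fixing anything. The sketch below is one possible way to scan a dump of the "users" index (such as the match-all response shown in the question) for SCM accounts with surrounding whitespace. It assumes the response has already been parsed into a Python dict; the function name `find_dirty_scm_accounts`, the shortened account values, and the second user `jane.doe` are made up for illustration.

```python
def find_dirty_scm_accounts(search_response):
    """Return {login: [offending values]} for users whose scmAccounts
    entries are blank or carry leading/trailing whitespace."""
    dirty = {}
    for hit in search_response.get("hits", {}).get("hits", []):
        source = hit.get("_source", {})
        bad = [
            account
            for account in source.get("scmAccounts", [])
            if account != account.strip() or not account.strip()
        ]
        if bad:
            # Fall back to the document id if the login is missing.
            dirty[source.get("login", hit.get("_id"))] = bad
    return dirty


# Hypothetical, shortened sample data in the shape of the index dump.
response = {
    "hits": {
        "hits": [
            {
                "_id": "steve.smith",
                "_source": {
                    "login": "steve.smith",
                    "scmAccounts": ["\n", "steve smith@example\n"],
                },
            },
            {
                "_id": "jane.doe",
                "_source": {"login": "jane.doe", "scmAccounts": ["jane@example"]},
            },
        ]
    }
}

report = find_dirty_scm_accounts(response)
print(report)
```

Only users that appear in the report need their SCM accounts re-entered.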

The newlines got into the fields in the first place because I manually restored the table "users" on the SQL server from a backup SQL INSERT script.
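
If the values have to be cleaned in bulk rather than re-entered by hand, the normalization itself is simple. This is a hedged sketch of what such a cleanup could look like, using the three entries from the broken index document as input. `clean_scm_accounts` is a hypothetical helper; how the cleaned values are written back (UI, web service, or SQL) is outside this sketch.

```python
def clean_scm_accounts(accounts):
    """Strip surrounding whitespace (including "\r" and "\n"), drop
    blank entries, and remove duplicates while keeping order."""
    cleaned = []
    for account in accounts:
        value = account.strip()
        if value and value not in cleaned:
            cleaned.append(value)
    return cleaned


# Entries as they appeared in the index after the restore
# (assumption: the line breaks are "\n").
restored = [
    "\n",
    "steve smith@ca5553f7-9c36-c34d-916b-b330600317e9\n",
    "steve.smith@ca5553f7-9c36-c34d-916b-b330600317e9\n",
]

cleaned = clean_scm_accounts(restored)
print(cleaned)
```

The result contains the same two entries that the index held after the accounts were re-added through the UI.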

2v0mjdrl
  • Good to know. Who introduced the newlines? You or the SQL Server backup tool? In the latter case SonarQube should cover this corner case and sanitize the SCM accounts when indexing into Elasticsearch. – Simon Brandhof Apr 27 '16 at 08:09
  • @SimonBrandhof-SonarSource I performed a table-wise export of the DB to .sql INSERT scripts (using MSSQL Server 2012), which exported the newlines. There might be an issue because of the Windows line endings - I can send you a (censored) copy of users.sql to reproduce the issue. Please get in contact. – 2v0mjdrl Apr 27 '16 at 14:59
  • Thanks but it's quite easy to reproduce. As the problem is the db backup, nothing needs to be fixed in SonarQube. – Simon Brandhof Apr 28 '16 at 07:00