Why does CA1819 prefer collections over arrays in properties when the caller can modify them

Question

Why is this code (both sets of code taken straight from CA1819: Properties should not return arrays):

public class Book
{
    private string[] _Pages;

    public Book(string[] pages)
    {
        _Pages = pages;
    }

    public string[] Pages
    {
        get { return _Pages; }
        set { _Pages = value; }
    }
}

Worse than this code:

using System.Collections.ObjectModel;

public class Book
{
    private Collection<string> _Pages;

    public Book(string[] pages)
    {
        _Pages = new Collection<string>(pages);
    }

    public Collection<string> Pages
    {
        get { return _Pages; }
    }
}
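Here is a minimal sketch that makes the comparison concrete. The names ArrayBook, CollectionBook and MutationDemo are hypothetical. Both shapes let a caller change the book's data: the array through its indexer, and the Collection<T> through its indexer and also through Add. One assumption differs from the docs sample. The pages are copied into a List<string> first, because the Collection<T>(IList<T>) constructor wraps the list it receives rather than copying it. A wrapped array is fixed-size, so Add on it would throw NotSupportedException.

```csharp
using System.Collections.Generic;
using System.Collections.ObjectModel;

// Hypothetical stand-in for the first Book: the property hands out the internal array.
public class ArrayBook
{
    public ArrayBook(string[] pages) => Pages = pages;

    public string[] Pages { get; set; }
}

// Hypothetical stand-in for the second Book: get-only property, mutable collection.
public class CollectionBook
{
    // Copy into a List<string> so the wrapped list can grow (an array cannot).
    public CollectionBook(string[] pages) =>
        Pages = new Collection<string>(new List<string>(pages));

    public Collection<string> Pages { get; }
}

public static class MutationDemo
{
    public static (string ArrayFirst, string CollectionFirst, int CollectionCount) Run()
    {
        var arrayBook = new ArrayBook(new[] { "one", "two" });
        arrayBook.Pages[0] = "changed";      // writes straight into the book's array

        var collectionBook = new CollectionBook(new[] { "one", "two" });
        collectionBook.Pages[0] = "changed"; // also allowed
        collectionBook.Pages.Add("three");   // the collection can also grow

        return (arrayBook.Pages[0], collectionBook.Pages[0], collectionBook.Pages.Count);
    }
}
```

When mutation is allowed, the visible difference in this sketch is mainly shape. The array has a fixed length, while Collection<T> also supports Add, Insert and Remove.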

I saw Jon Skeet's answer to a previous question, which is similar but not exactly the same. I understand that the ReadOnlyCollection<T> wraps the array and refers to it rather than making a full copy.
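The wrapping behaviour can be checked with a small sketch. ReadOnlyBook and WrapperDemo are hypothetical names. Array.AsReadOnly returns a ReadOnlyCollection<T> over the existing array, so no elements are copied. Changes the owner makes to the array show up through the wrapper, and the wrapper offers no public way to write.

```csharp
using System;
using System.Collections.ObjectModel;

// Hypothetical class: exposes its array only through a read-only wrapper.
public class ReadOnlyBook
{
    private readonly string[] _pages;

    public ReadOnlyBook(string[] pages) => _pages = pages;

    // Array.AsReadOnly wraps _pages; no elements are copied.
    public ReadOnlyCollection<string> Pages => Array.AsReadOnly(_pages);

    // The owner can still change its own array.
    public void RenameFirstPage(string title) => _pages[0] = title;
}

public static class WrapperDemo
{
    public static (string Before, string After) Run()
    {
        var book = new ReadOnlyBook(new[] { "one", "two" });
        ReadOnlyCollection<string> view = book.Pages;

        string before = view[0];
        book.RenameFirstPage("renamed"); // change the underlying array
        // view[0] = "x";                // does not compile: the public indexer is get-only

        return (before, view[0]);        // the existing wrapper sees the change
    }
}
```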

I also read Eric Lippert's article on why arrays are considered somewhat harmful.

However, both Jon's answer and Eric's answer seem to be based on the fact that you don't want the caller to be able to change the values in the array.

And in this Microsoft article Choosing Between Properties and Methods, they say:

Use a method where the operation returns an array because to preserve the internal array, you would have to return a deep copy of the array, not a reference to the array used by the property.

Which seems to confirm that everyone wants to preserve the internal array.
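Below is a minimal sketch of the method-based alternative from the quote. It assumes string pages, and CopyingBook and CopyDemo are hypothetical names. Each call returns a new array, so edits to it never reach the object. Array.Clone makes a shallow copy. That is enough here only because strings are immutable. An array of mutable objects would also need its elements copied to get the deep copy the article mentions.

```csharp
// Hypothetical class: never hands out its internal array.
public class CopyingBook
{
    private readonly string[] _pages;

    // Copy on the way in, too, so the caller's original array is not shared.
    public CopyingBook(string[] pages) => _pages = (string[])pages.Clone();

    // A method rather than a property signals that each call allocates a new array.
    public string[] GetPages() => (string[])_pages.Clone();
}

public static class CopyDemo
{
    public static string Run()
    {
        var book = new CopyingBook(new[] { "one", "two" });
        string[] copy = book.GetPages();
        copy[0] = "changed";       // only the caller's copy changes
        return book.GetPages()[0]; // the book still holds "one"
    }
}
```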

In the CA1819 article, they demonstrate two problematic usages, fixed first with a method and then with a ReadOnlyCollection<T>, which is consistent with both Jon's and Eric's writings. However, they then say that if you want the caller to be able to modify the property, you should fix the rule violation by using a Collection<T>.

I want to know why using a Collection<T> is better than using an array when you don't care about preserving the internal array (and in fact want it to be changed). What am I missing?

In this answer by Jon he says:

if you're happy with callers mutating your data, then an array will work fine...

EDIT: So what is the catch? Why does Microsoft still flag using an array as bad (causes a rule violation) and recommend using a collection to fix it when you want to be able to change the data?

Personally, neither looks quite right to me. I would tend towards the first one, but with the `set` marked as **private**. Note that even with the set marked private, callers can still change individual elements in the array, because that means first you _get_ a reference to the array, and then set on the indexer property within the array's object itself, which is still public. The only thing they can't do is swap the entire array out from under you with another array or null. — Joel Coehoorn, Feb 16 '22 at 22:21
I agree totally with Mr. Skeet: if you are happy with your caller mutating the internals of the object, then go for it. (That's not sarcasm, BTW.) — pm100, Feb 16 '22 at 22:23
As to the preference towards Collection... that also seems wrong. `IList` or `ICollection` **interfaces** make a little more sense, because then you can potentially refactor for completely different underlying types as needed without impacting the caller. — Joel Coehoorn, Feb 16 '22 at 22:25
Thanks for the feedback @JoelCoehoorn. You're right about preferring interfaces. What I should have said is "Why does Microsoft still flag using an array as bad (causes a rule violation) and recommend using a collection to fix it when you want to be able to change the data?" I still don't understand what's better about it in this case. — dceuinton, Feb 17 '22 at 21:44
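Joel Coehoorn's two suggestions can be sketched as follows. PrivateSetBook, InterfaceBook and SetterDemo are hypothetical names. A private setter stops callers from replacing the whole array, but it does not stop them from writing individual elements. An IList<string> property hides whether the backing store is a List<string> or some other type.

```csharp
using System.Collections.Generic;

// Hypothetical class: array property with a private setter.
public class PrivateSetBook
{
    public PrivateSetBook(string[] pages) => Pages = pages;

    public string[] Pages { get; private set; }
}

// Hypothetical class: callers see only IList<string>, so the backing
// List<string> could later be swapped for another IList<string> implementation.
public class InterfaceBook
{
    private readonly List<string> _pages;

    public InterfaceBook(IEnumerable<string> pages) => _pages = new List<string>(pages);

    public IList<string> Pages => _pages;
}

public static class SetterDemo
{
    public static (string ArrayFirst, int ListCount) Run()
    {
        var arrayBook = new PrivateSetBook(new[] { "one", "two" });
        arrayBook.Pages[0] = "changed";      // compiles: the array's indexer is public
        // arrayBook.Pages = new string[0];  // does not compile: the setter is private

        var listBook = new InterfaceBook(new[] { "one", "two" });
        listBook.Pages.Add("three");         // mutation through the interface

        return (arrayBook.Pages[0], listBook.Pages.Count);
    }
}
```

If the backing store were an array instead of a List<string>, IList<string>.Add would throw NotSupportedException. The interface type alone therefore does not promise that callers can add items.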

score 1 · Answer 1 · answered Jul 31 '22 at 23:04

For others looking into this rule: I posted this question on a few other forums, including the Microsoft support forum, and finally raised an issue on the docs GitHub repository.

I received a response from @mavasani that helped clarify the goal of the rule:

The core idea of this rule is to discourage exposing mutable or cloned data structures through properties. It is a good API design principle to keep properties lightweight in terms of execution performance for its getter/setter AND return immutable data to justify it being a property of the object. However, as you have mentioned, there are obviously corner cases where you still want to expose mutable data through properties, especially for internal consumers such as in tests. For such cases, it is best to return the data type that best suits your need - it could be an array or a Collection<T> type, as you prefer. If you choose an array, you will need to add a source suppression for each such property, which can be avoided by using the Collection<T>, but honestly either should be fine given that you have already made an explicit design decision for the API to return mutable data.

So, to answer my own question: it seems I'm not missing anything. What I'm asking about is an edge case, and I should use a source suppression or something like Collection<T>, as @mavasani suggested.
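Here is a minimal sketch of the source-suppression route using SuppressMessageAttribute. EditableBook is a hypothetical name. The category and rule strings match how CA1819 is documented (category Performance), and the Justification text is only an example.

```csharp
using System.Diagnostics.CodeAnalysis;

// Hypothetical class that deliberately exposes a mutable array.
public class EditableBook
{
    // CA1819 is suppressed for this one member only; other properties are still checked.
    [SuppressMessage("Performance", "CA1819:Properties should not return arrays",
        Justification = "Callers are meant to edit pages in place.")]
    public string[] Pages { get; set; }
}
```

Another option is to place a #pragma warning disable CA1819 / #pragma warning restore CA1819 pair around the property.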

score 0 · Answer 2 · answered Aug 01 '22 at 00:08

Why shouldn't a property return an internal array?

The reason that returning an internal array is discouraged is that doing so allows the caller to violate the data integrity of the object. Consider this example:

using System;
using System.Linq;

public class Book
{
    private readonly Page[] _pages;

    public Book(params Page[] pages)
    {
        if (pages == null || pages.Any(page => page == null) || pages.Distinct().Count() != pages.Length)
            throw new ArgumentException("A Book must contain distinct Page objects.", nameof(pages));
        _pages = pages;
    }

    public Page[] Pages => _pages;
}

public class Page
{
}

Let's say that, as a business rule, a Book cannot contain null or duplicate Page references. This rule is enforced by the Book constructor:

var page1 = new Page();
var page2 = new Page();
//var book = new Book(null, page1, page1); // throws ArgumentException
var book = new Book(page1, page2); // OK

But because the Pages property returns the internal Page array, a caller can bypass the validation logic and corrupt the Book, and there is no way for the Book class to prevent it:

book.Pages[0] = null;

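If instead the Book class stores its pages in a class derived from Collection<T>, then it can enforce the business rule, because Collection<T> routes every change through protected virtual methods that the derived class can override: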

using System;
using System.Collections.ObjectModel;
using System.Linq;

public class Book
{
    private readonly PageCollection _pages;

    public Book(params Page[] pages)
    {
        _pages = new PageCollection(pages);
    }

    public PageCollection Pages => _pages;
}

public class PageCollection : Collection<Page>
{
    public PageCollection(Page[] pages)
    {
        if (pages == null || pages.Any(page => page == null) || pages.Distinct().Count() != pages.Length)
            throw new ArgumentException("A Book must contain distinct Page objects.", nameof(pages));
        foreach (Page page in pages)
            this.Add(page);
    }

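    // The indexer setter (Pages[i] = x) is routed through SetItem.
    protected override void SetItem(int index, Page page)
    {
        if (this[index] != page)
        {
            if (page == null || this.Any(existingPage => existingPage == page))
                throw new ArgumentException("A Book must contain distinct Page objects.", nameof(page));
        }

        base.SetItem(index, page);
    }

    // Add and Insert are routed through InsertItem, so they need the same check.
    protected override void InsertItem(int index, Page page)
    {
        if (page == null || this.Contains(page))
            throw new ArgumentException("A Book must contain distinct Page objects.", nameof(page));

        base.InsertItem(index, page);
    }
}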

Now the Book class can guarantee its own data integrity:

book.Pages[0] = null; // throws ArgumentException
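The SetItem override above is one of four protected virtual hooks on Collection<T>. The others are InsertItem, RemoveItem and ClearItems. The same hooks also work when the owner wants to allow changes rather than block them, because the owner still gets to observe each change, which an array cannot offer. Here is a minimal sketch with hypothetical names (TrackedPages, ChangeCount, TrackingDemo) that simply counts mutations:

```csharp
using System.Collections.ObjectModel;

// Hypothetical collection: every caller mutation passes through one of these overrides.
public class TrackedPages : Collection<string>
{
    public int ChangeCount { get; private set; }

    protected override void InsertItem(int index, string item)
    {
        base.InsertItem(index, item); // reached via Add and Insert
        ChangeCount++;
    }

    protected override void SetItem(int index, string item)
    {
        base.SetItem(index, item);    // reached via the indexer setter
        ChangeCount++;
    }

    protected override void RemoveItem(int index)
    {
        base.RemoveItem(index);       // reached via Remove and RemoveAt
        ChangeCount++;
    }

    protected override void ClearItems()
    {
        base.ClearItems();            // reached via Clear
        ChangeCount++;
    }
}

public static class TrackingDemo
{
    public static (int PageCount, int ChangeCount) Run()
    {
        var pages = new TrackedPages { "one", "two" }; // initializer calls Add twice
        pages[0] = "changed";                         // SetItem
        pages.RemoveAt(1);                            // RemoveItem
        pages.Add("three");                           // InsertItem
        return (pages.Count, pages.ChangeCount);
    }
}
```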

Thanks for the answer @MichaelLiu. In my opinion, this answers a different question, but it's a valuable answer nonetheless. My question was more about the case where it's given that you do want to "violate the data integrity of the object". In this situation, what is the advantage of using a collection over an array? Those examples I pulled from the docs were under the heading "Allow users to modify a property". — dceuinton, Aug 02 '22 at 03:32
