List实现高效批量删除指定区域的元素(等效于removeRange)

KaelLi 2018年12月14日14:44:00
评论
9,5148

在上一篇RecyclerView多级树的实现里,在折叠的时候需要从List里remove掉一定数量的元素,这就促使我去想一个高效的批量从List种删除元素的途径。

我们都知道,Java中的List提供了2个addAll方法(包括其父接口Collection提供的),它们分别是:

addAll(Collection<? extends E> c);
addAll(int index, Collection<? extends E> c)

而removeAll方面则只提供了一个方法:

removeAll(Collection<?> c);

其中,最后的addAll方法让我产生了一个疑问:既然提供了在指定位置增加一个集合,那么为什么remove方法却没有提供一个在指定位置直接删除相应数量的方法呢?嘿,你别说,还真有这样一个方法呢。

查看一下ArrayList的源码,里面有这样的一个方法:

/**
 * Removes from this list all of the elements whose index is between
 * {@code fromIndex}, inclusive, and {@code toIndex}, exclusive.
 * Shifts any succeeding elements to the left (reduces their index).
 * This call shortens the list by {@code (toIndex - fromIndex)} elements.
 * (If {@code toIndex==fromIndex}, this operation has no effect.)
 *
 * @throws IndexOutOfBoundsException if {@code fromIndex} or
 *         {@code toIndex} is out of range
 *         ({@code fromIndex < 0 ||
 *          fromIndex >= size() ||
 *          toIndex > size() ||
 *          toIndex < fromIndex})
 */
protected void removeRange(int fromIndex, int toIndex) {
    // Android-changed: Throw an IOOBE if toIndex < fromIndex as documented.
    // All the other cases (negative indices, or indices greater than the size
    // will be thrown by System#arrayCopy.
    if (toIndex < fromIndex) {
        throw new IndexOutOfBoundsException("toIndex < fromIndex");
    }

    modCount++;
    int numMoved = size - toIndex;
    System.arraycopy(elementData, toIndex, elementData, fromIndex,
                     numMoved);

    // clear to let GC do its work
    int newSize = size - (toIndex-fromIndex);
    for (int i = newSize; i < size; i++) {
        elementData[i] = null;
    }
    size = newSize;
}

显然该方法可以满足需求了,但是它不是public的,它是protected的!!!这就意味着我们根本不能用这个方法,可是这个方法为什么不对外暴露出来呢?这个疑问在StackOverFlow上有人给出了不错的答案:

Why is Java's AbstractList's removeRange() method protected?

在这里我也把相应的答案拿过来了,因为答案出自于《Effective Java (2nd Edition)》:

There are three techniques for shortening overly long parameter lists. One is to break the method up into multiple methods, each of which requires only a subset of the parameters. If done carelessly, this can lead to too many methods, but it can also help reduce the method count by increasing orthogonality. For example, consider the java.util.List interface. It does not provide methods to find the first or last index of an element in a sublist, both of which would require three parameters. Instead it provides the subList method, which takes two parameters and returns a view of a sublist. This method can be combined with the indexOf or lastIndexOfmethods, each of which has a single parameter, to yield the desired functionality. Moreover, the subList method can be combined with any method that operates on a List instance to perform arbitrary computations on sublists. The resulting API has a very high power-to-weight ratio.

显然《Effective Java》的解释是为了让List接口更简洁,不去为一些功能的实现提供太多方法,所以removeRange方法由ArrayList自行实现且仅供内部使用,List本身不会对外暴露这么多的方法,而只会暴露一些短小简单而基础的方法,需要一些高级一点的功能就要自行实现。其他接口的设计也应该遵循同样的原则。

好啦,反正removeRange是不让我们用的,那我们该如何从List里直接删除一段区间呢?这个问题讲的标准一点就是:

要求从第m(我们这里设定第m个是从0开始计数的)个元素开始删除,一直删除到第n(n>m)呢?或者说从第m个元素开始删除,共删除10个元素,该怎么操作?

“聪明”的人马上会写出如下代码:

for (int i = m; i <= n; i ++) {
    list.remove(m);
}

OK,这样的代码确实能满足需求,但是它的效率高吗?你可能会想到,反正我的代码实现了需求啊。确实实现了,但是我们知道,无论是ArrayList还是LinkedList,频繁删除元素对性能都有着不小的损耗——ArrayList删除一个元素后,后续元素需要依次把位置提前导致性能损耗,而LinkedList虽然删除元素容易但是每次定位(即get(i))的损耗不小。

有没有更高效的办法呢?让我们继续在List的源码里寻找,好的,我们找到了这样一个方法:

List<E> subList(int fromIndex, int toIndex);

熟悉的人应该知道,这个方法是获取List的一个子List,即从List的第fromIndex位置开始,到toIndex为止(不包括toIndex这个位置的元素)的子集。然后我们再来看一下它的注释:

/**
 * Returns a view of the portion of this list between the specified
 * <tt>fromIndex</tt>, inclusive, and <tt>toIndex</tt>, exclusive.  (If
 * <tt>fromIndex</tt> and <tt>toIndex</tt> are equal, the returned list is
 * empty.)  The returned list is backed by this list, so non-structural
 * changes in the returned list are reflected in this list, and vice-versa.
 * The returned list supports all of the optional list operations supported
 * by this list.<p>
 *
 * This method eliminates the need for explicit range operations (of
 * the sort that commonly exist for arrays).  Any operation that expects
 * a list can be used as a range operation by passing a subList view
 * instead of a whole list.  For example, the following idiom
 * removes a range of elements from a list:
 * <pre>{@code
 *      list.subList(from, to).clear();
 * }</pre>
 * Similar idioms may be constructed for <tt>indexOf</tt> and
 * <tt>lastIndexOf</tt>, and all of the algorithms in the
 * <tt>Collections</tt> class can be applied to a subList.<p>
 *
 * The semantics of the list returned by this method become undefined if
 * the backing list (i.e., this list) is <i>structurally modified</i> in
 * any way other than via the returned list.  (Structural modifications are
 * those that change the size of this list, or otherwise perturb it in such
 * a fashion that iterations in progress may yield incorrect results.)
 */

在注释里有着非常重要的一句话:

The returned list is backed by this list, so non-structural changes in the returned list are reflected in this list, and vice-versa.

简单来说,对子List的任何操作,都会同步影响到原来的父List数据。而对父List的元素做了操作,同样会subList获取的子List里的元素产生影响。说白了,subList获取到的子List本质上就是原来父List一部分元素的映射,双方的子元素实际上是同一批对象。而在注释中也很明白的说到,我们可以用

list.subList(from, to).clear();

方法来批量从List里删除一个区域的元素,好啦,这不就满足我们的需求了?

最后,我们来用代码验证一下,看看这样的办法是不是效率更高:

List<String> arrayList1 = new ArrayList<>();
List<String> arrayList2 = new ArrayList<>();
List<String> linkedList1 = new LinkedList<>();
List<String> linkedList2 = new LinkedList<>();

for (int i = 0;  i < 1000; i ++) {
    arrayList1.add(i + "");
    arrayList2.add(i + "");
    linkedList1.add(i + "");
    linkedList2.add(i + "");
}

long arrayListTime1 = System.currentTimeMillis();
for (int i = 100; i < 500; i++) {
    arrayList1.remove(100);
}
Log.d("kaelli","arrayList Time1 : " + (System.currentTimeMillis() - arrayListTime1));

long arrayListTime2 = System.currentTimeMillis();
arrayList2.subList(100, 501).clear();
Log.d("kaelli", "arrayList Time2 : " + (System.currentTimeMillis() - arrayListTime2));

long linkedListTime1 = System.currentTimeMillis();
for (int i = 100; i < 500; i++) {
    linkedList1.remove(100);
}
Log.d("kaelli", "linkedList Time1 : " + (System.currentTimeMillis() - linkedListTime1));

long linkedListTime2 = System.currentTimeMillis();
linkedList2.subList(100, 501).clear();
Log.d("kaelli", "linkedList Time2 : " + (System.currentTimeMillis() - linkedListTime2));

2个ArrayList,2个LinkedList,全部插入1千条、1万条或10万条String数据,然后分别删除100-500、100-5000及100-50000条数据,对比循环remove与subList(fromIndex, toIndex).clear的时间:

List size 1000 10000 100000
ArrayList.remove(i) 2ms 43ms 3296ms
ArrayList.subList().clear() 0ms 1ms 4ms
LinkedList.remove(i) 4ms 14ms 16ms
LinkedList.subList().clear() 1ms 15ms 15ms

结果很明显,对于ArrayList来说,通过subList.clear()的办法去删除一块区域的元素,效率会远远高于不断的remove操作。而对于LinkedList来说,两种方法的效率没有明显差异。其中的原因也很简单,ArrayList每次删除一个元素,所有后面位置的元素都要重新排位置,即每个元素的下标都要往前移动1位,频繁操作就会十分考验性能了,而对于LinkedList来说,耗时的时间仅仅在于get(i)即定位这里,而删除元素后并不需要重组。

KaelLi
  • 本文由 发表于 2018年12月14日14:44:00
  • 转载请务必保留本文链接:https://www.kaelli.com/23.html
匿名

发表评论

匿名网友 填写信息

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen: