Chapter9 Collections

9.1 The Java Collections Framework

Java最初只提供了很少但最有用的一系列数据结构：Vector, Stack, Hashtable, BitSet, Enumeration （Interfaces）

Java1.2，设计者认为是时候推出full-fledged的数据结构，不像C++的STL那么复杂但又要想STL一样generic

9.1.1 Separating Collection Interfaces and Implementation

1. queue

FIFO：先进先出

public interface Queue<E>
{
	void add(E element);   // add at the tail
	E remove();   // remove at the head
	int size();
}

两种实现：

循环数组
链表（LinkedList）

You don’t need to know which implementation is actually used. Therefore, it makes sense to use the concrete class only when you construct the collection object. Use the interface type to hold the collection reference.

Queue staff = new LinkedList<>();

循环数组一般来说会稍微更有效，但循环数组是一个有界限的集合——有限的容量，所以当不确定队列的具体上限时，推荐使用链表

还有一些命名以Abstract开头的集合，是为了library implementers更好的实现自己的集合类

9.1.2 The Collection Interface

最基本的接口是Collection，有两个基本方法

public interface Collection<E>
{
    boolean add(E element);   // return true if the collection actually changes
    Iterator<E> iterator();
}

9.1.3 Iterators

四个方法

public interface Iterator<E>
{
    E next();
    boolean hasNext();
    void remove();
    default void forEachRemaining(... action);
}

reach the end, next() throws a NoSuchElementException

call hasNext() before next()

Collection<String> c = ...
Iterator<String> iter = c.iterator();
while (iter.hasNext()) {
    String element = iter.next();
    // do something with element
}

// "for each" loop
for (String element : c) {
    // do something with element
}

编译器会将”for each”循环用iterator翻译为一个普通的循环，所有实现了Iterable接口的都能使用”for each”循环

还可以调用forEachRemaining() + 匿名函数

1	iterator.forEachRemaining(element -> do something with element)

Instead, think of Java iterators as being between elements. When you call next, the iterator jumps over the next element, and it returns a reference to the element that it just passed

**The remove method of the Iterator interface removes the element that was returned by the last call to next. In many situations, that makes sense—you need to see the element before you can decide that it is the one that should be removed.

// how to remove the first element
Iterator<String> iter = c.iterator();
iter.next();	// skip over the first element
iter.remove();	// now remove it

// 不能在不调用next()的情况下连续使用remove()
iter.remove();
iter.remove();	// ERROR

9.1.4 Generic Utility Methods

Collection和Iterator都被称为generic, 意味着你能用该方法去操作任意一个collection

9.2 Interfaces in the Collections Framework

最基本的是两个接口: Collection和Map

List

List是有序的Collection, 两种方式访问元素: by an iterator / by an integer index(random access)

LinkedList不支持random access, 所以通过一个tagging interface: RandomAccess去判断是否是linked list

if (c instanceof RandomAccess) {
    use random access algorithms
}
else {
    use sequential access algorithms
}

Set

add方法需要避免重复, equals方法忽略顺序, hashCode方法必须使得相同的两个sets生成相同的hash码

TreeSet: 保留顺序的set

9.3 Concrete Collections

9.3.1 Linked Lists

ArrayList在中间插入或删除一个元素成本太高

a linked list stores each object in a separate link. Each link also stores a reference to the next link in the sequence

实际上是双向链表

删除或添加操作成本大大降低, 只需要修改link

ListIterator接口有previous()和hasPrevious()方法可供倒着遍历

add方法在iterator位置的前面加入新元素

set方法用以替换目前iterator调用next()或previous()指向的元素

当一个迭代器在遍历, 另一个在修改时可能会出问题

Concurrent modification detection is done in a simple way. The collection keeps track of the number of mutating operations (such as adding and removing elements). Each iterator keeps a separate count of the number of mutating operations that it was responsible for. At the beginning of each iterator method, the iterator simply checks whether its own mutation count equals that of the collection. If not, it throws a ConcurrentModificationException

一些常用的方法:

toString(): 将对象中的元素以一定的格式转为String
contains(E element): 是否存在某个元素
offerFirst(E element)
offerLast(E element)
E pollFirst(): 返回并删除链表的第一个元素
E pollLast()

一般用于作为队列的底层数据结构

1	Queue<T> queue = new LinkedList<>();

9.3.2 Array Lists

An ArrayList encapsulates a dynamically reallocated array of objects

为什么不用Vector

All methods of the Vector class are synchronized

9.3.3 Hash Sets

哈希表, 对每个对象计算一个哈希值, 放入相应的”桶”中, 所以自定义类时, 要实现相应的hashCode方法

Java中哈希表的底层实现是链表数组, 每个链表称为一个桶

从Java8开始, 当一个链表中元素达到一个阈值(默认8)时将会转为红黑树

一般设置桶个数为75% - 150%的预期元素个数, 标准桶的个数应该是2的幂次, 默认是16

当桶装满时, 就需要重新计算哈希(rehash), 设置承载因子:

The load factor determines when a hash table is rehashed. For example, if the load factor is 0.75 (which is the default) and the table is more than 75% full, it is automatically rehashed with twice as many buckets.

9.3.4 Tree Sets

sorted collection

加入时任意顺序, 但是访问时是按照一定的排序规则排序好的

目前的底层实现是红黑树

使用tree set时, 对象一定要实现了Comparable接口, 或者为set提供一个Comparator

9.3.5 Queues and Deques

对于Queue, 添加删除以及查看队首都有两个方法, 区别在于出现问题时抛出异常或是返回false/null

add offer 队列满时add抛出IllegalStateException, offer返回false

remove poll 队列空时remove抛出NoSuchElementException, poll返回null

element peek 队列空时element抛出NoSuchElementException, peek返回null

9.3.6 Priority Queues

每次取队首时, 得到的是按照一定规则排序的第一个(比如降序排列, 队首就是最大值)

但是本身并不是完全顺序的, 底层实现是堆

同样需要对象实现Comparable或者提供Comparator

9.4 Maps

9.4.1 Basic Map Operations

HashMap和TreeMap, 与set类似, HashMap没有顺序, TreeMap有顺序

HashMap<String, Integer> map = new HashMap<>();
map.put("Robin", 22);
String name = "";
int age = map.get("Robin");
int age2 = map.getOrDefault(name, 0);

键值必须唯一, 对相同键重复赋值会覆盖, 同时返回了前一个值

remove() 删除给定键的元素
size() 返回个数

forEach() 配合匿名函数做相应处理

1	scores.forEach((k, v) -> System.out.println("key=" + k + "value=" + v));

putAll() 将另一个map的entries全部加入到当前map
containsKey()
containsValue()

9.4.2 Updating Map Entries

counts.put(word, counts.get(word) + 1);   // 键不存在时get会返回null

counts.put(word, counts.getOrDefault(word, 0) + 1);

counts.putIfAbsent(word, 0);
counts.put(word, counts.get(word) + 1);

counts.merge(word, 1, Integer::sum);

9.4.3 Map Views

keySet()

不是HashSet或TreeSet, 而是一个实现了Set接口的另一个类, 可以当做一般的Collection

values()

entrySet()

for (Map.Entry<String, Employee> entry : staff.entrySet())
{
    String k = entry.getKey();
    Employee v = entry.getValue();
    // do something with k, v
}

在keySet view中remove相当于删除了map的一个项, 但是无法添加

9.4.4 Weak Hash Maps

存在垃圾回收问题, 当map中某个值对应的键不会再被使用, 也不会被回收

As long as the map object is live, all buckets in it are live and won’t be reclaimed. Thus, your program should take care to remove unused values from long-lived maps. Or, you can use a WeakHashMap instead. This data structure cooperates with the garbage collector to remove key/value pairs when the only reference to the key is the one from the hash table entry.

WeakHashMap内部使用weak references保存键

However, if the object is reachable only by a WeakReference, the garbage collector still reclaims the object, but places the weak reference that led to it into a queue. The operations of the WeakHashMap periodically check that queue for newly arrived weak references. The arrival of a weak reference in the queue signifies that the key was no longer used by anyone and has been collected. The WeakHashMap
then removes the associated entry.

对于一般不再被引用的对象, 垃圾收集器直接回收, 对于weak reference, 同样回收, 但也会放入一个队列中, 然后map去移除相应的项

9.4.5 Linked Hash Sets and Maps

记录了加入的顺序, 可用于实现LRU

var lru = new LinkedHashMap<K, V>(128, 0.75F, true){
		protected boolean removeEldestEntry(Map.Entry<K, V> eldest)
		{
			return size() > 100;
		}
	};

9.4.6 Enumeration Sets and Maps

9.4.7 Identity Hash Maps

键不是由hashCode算出而是System.identityHashCode

相同键对象尽管内容相同也被认为是不同的

9.5 Views and Wrappers

9.5.1 Small Collections

static函数, 直接给定若干元素形成一个Collection

List<String> names = List.of("n1", "n2");
Set<Integer> numbers = Set.of(2, 3);
Map<String, Integer> scores = Map.of("n1", 2, "n2", 3);
Map<String, Integer> scores = ofEntries(
    entry("Peter", 2),
    entry("Paul", 3),
    entry("Mary", 5));
List<String> settings = Collections.nCopies(100, "DEFAULT");

Arrays.asList()返回的list无法调整大小, 即只能使用set而不能add或remove

9.5.2 Subranges

subList()左闭右开

1	List<Employee> group = staff.subList(10, 20); // extract 10 to 19

subSet(E from, E to) headSet(E to) tailSet(E from)

subMap(K from, K to) headMap(K to) tailMap(K from)

9.5.6 A Note on Optional Operations

在API说明中, 很多接口中的方法被称为”optional operations”, 看上去与接口本身的定义是相悖的, 但是如果接口中每一个方法都要被实现的话, 接口的数目可能就会太多了

9.6 Algorithms

9.6.1 Why Generic Algorithms?

方便重用

比如求最大值, 对于几乎所有数据结构(数组, 列表, 链表)几乎都是遍历所有元素通过比较得出最大值, 那么可定义为泛型函数

public static <T extends Comparable> T max(Collection<T> c)
{
	if (c.isEmpty()) throw new NoSuchElementException();
    Iterator<T> iter = c.iterator();
    T largest = iter.next();
    while (iter.hasNext())
    {
        T next = iter.next();
        if (largest.compareTo(next) < 0)
        	largest = next;
    }
    return largest;
}

9.6.2 Sorting and Shuffling

sort

var staff = new LinkedList<Employee>();
Collections.sort(staff);
staff.sort(Comparator.comparingDouble(Employee::getSalary));   // 通过雇员工资排序
staff.sort(Comparator.reverseOrder);   // 按b.compareTo(a)排序
staff.sort(Comparator.comparingDouble(Employee::getSalary).reversed());   //按工资逆序

Java中链表的排序方法是: 先将所有元素放入一个数组, 数组排序后再重建回链表

modifiable, resizable

modifiable: 支持set方法
resizable: 支持add和remove

能排序的列表必须modifiable但不需要resizable

shuffle

1 2	ArrayList<Card> cards = . . .; Collections.shuffle(cards);

如果列表本身不支持RandomAccess, 那么shuffle会先将所有元素丢入数组, shuffle后再放回去

9.6.3 Binary Search

1 2	i = Collections.binarySearch(c, element); i = Collections.binarySearch(c, element, comparator);

如果找到了, 返回的index为非负整数, 通过c.get(index)可访问该元素

如果没找到, 返回的index是负数, 同时也是有意义的, -index-1表示的是将该元素插入数组并保持有序的下标, 不是-i的原因是下标0会是模糊的

1
2
3

if (i < 0) {
    c.add(-i-1, element);
}

二分查找要求RandomAccess切排序, 否则没有任何优势了

9.6.4 Simple Algorithms

Collections.replaceAll(words, "C++", "Java")
words.removeIf(w -> w.length() <= 3)
words.replaceAll(String::toLowerCase)
Collections.fill(List<? super T> l, T value)
Collections.swap(List<?> l, int i, int j)
Collections.reverse(List<?> l)
Collections.frequency(list, "2")
Collections.rotate(List<?> l, int d)
Collections.copy(List<? super T> to, List<T> from)   // target list must be at least as long as the source list
static <T> min(Collection<T> elements, Comparator<? super T> c)
static <T> max(Collection<T> elements, Comparator<? super T> c)

最后更新： 2020年06月04日 20:12

原始链接： http://roooooobin.github.io/2020/05/31/Core-Java-Chapter9/

赏

Robin's Blog

Welcome~

Core Java Chapter9