Chapter9 Collections
9.1 The Java Collections Framework
Java最初只提供了很少但最有用的一系列数据结构:Vector, Stack, Hashtable, BitSet, Enumeration
(Interfaces)
Java1.2,设计者认为是时候推出full-fledged的数据结构,不像C++的STL那么复杂但又要想STL一样generic
9.1.1 Separating Collection Interfaces and Implementation
1. queue
FIFO:先进先出
1 | public interface Queue<E> |
两种实现:
- 循环数组
- 链表(LinkedList)
You don’t need to know which implementation is actually used. Therefore, it makes sense to use the concrete class only when you construct the collection object. Use the interface type to hold the collection reference.
Queue
staff = new LinkedList<>();
循环数组一般来说会稍微更有效,但循环数组是一个有界限的集合——有限的容量,所以当不确定队列的具体上限时,推荐使用链表
还有一些命名以Abstract开头的集合,是为了library implementers更好的实现自己的集合类
9.1.2 The Collection Interface
最基本的接口是Collection,有两个基本方法
1 | public interface Collection<E> |
9.1.3 Iterators
四个方法
1 | public interface Iterator<E> |
reach the end, next()
throws a NoSuchElementException
call hasNext()
before next()
1 | Collection<String> c = ... |
编译器会将”for each”循环用iterator翻译为一个普通的循环,所有实现了Iterable接口的都能使用”for each”循环
还可以调用forEachRemaining()
+ 匿名函数
1 | iterator.forEachRemaining(element -> do something with element) |
Instead, think of Java iterators as being between elements. When you call next, the iterator jumps over the next element, and it returns a reference to the element that it just passed
**The remove method of the Iterator interface removes the element that was returned by the last call to next. In many situations, that makes sense—you need to see the element before you can decide that it is the one that should be removed.
1 | // how to remove the first element |
9.1.4 Generic Utility Methods
Collection和Iterator都被称为generic, 意味着你能用该方法去操作任意一个collection
9.2 Interfaces in the Collections Framework
最基本的是两个接口: Collection和Map
List
List是有序的Collection, 两种方式访问元素: by an iterator / by an integer index(random access)
LinkedList不支持random access, 所以通过一个tagging interface: RandomAccess去判断是否是linked list
1 | if (c instanceof RandomAccess) { |
Set
add方法需要避免重复, equals方法忽略顺序, hashCode方法必须使得相同的两个sets生成相同的hash码
TreeSet: 保留顺序的set
9.3 Concrete Collections
9.3.1 Linked Lists
ArrayList在中间插入或删除一个元素成本太高
a linked list stores each object in a separate link. Each link also stores a reference to the next link in the sequence
实际上是双向链表
删除或添加操作成本大大降低, 只需要修改link
ListIterator接口有previous()
和hasPrevious()
方法可供倒着遍历
add方法在iterator位置的前面加入新元素
set方法用以替换目前iterator调用next()或previous()指向的元素
当一个迭代器在遍历, 另一个在修改时可能会出问题
Concurrent modification detection is done in a simple way. The collection keeps track of the number of mutating operations (such as adding and removing elements). Each iterator keeps a separate count of the number of mutating operations that it was responsible for. At the beginning of each iterator method, the iterator simply checks whether its own mutation count equals that of the collection. If not, it throws a ConcurrentModificationException
一些常用的方法:
- toString(): 将对象中的元素以一定的格式转为String
- contains(E element): 是否存在某个元素
- offerFirst(E element)
- offerLast(E element)
- E pollFirst(): 返回并删除链表的第一个元素
- E pollLast()
一般用于作为队列的底层数据结构
1 | Queue<T> queue = new LinkedList<>(); |
9.3.2 Array Lists
An ArrayList encapsulates a dynamically reallocated array of objects
为什么不用Vector
All methods of the Vector class are synchronized
9.3.3 Hash Sets
哈希表, 对每个对象计算一个哈希值, 放入相应的”桶”中, 所以自定义类时, 要实现相应的hashCode
方法
Java中哈希表的底层实现是链表数组, 每个链表称为一个桶
从Java8开始, 当一个链表中元素达到一个阈值(默认8)时将会转为红黑树
一般设置桶个数为75% - 150%的预期元素个数, 标准桶的个数应该是2的幂次, 默认是16
当桶装满时, 就需要重新计算哈希(rehash), 设置承载因子:
The load factor determines when a hash table is rehashed. For example, if the load factor is 0.75 (which is the default) and the table is more than 75% full, it is automatically rehashed with twice as many buckets.
9.3.4 Tree Sets
sorted collection
加入时任意顺序, 但是访问时是按照一定的排序规则排序好的
目前的底层实现是红黑树
使用tree set时, 对象一定要实现了Comparable接口, 或者为set提供一个Comparator
9.3.5 Queues and Deques
对于Queue, 添加删除以及查看队首都有两个方法, 区别在于出现问题时抛出异常或是返回false/null
add offer
队列满时add
抛出IllegalStateException, offer
返回false
remove poll
队列空时remove
抛出NoSuchElementException, poll
返回null
element peek
队列空时element
抛出NoSuchElementException, peek
返回null
9.3.6 Priority Queues
每次取队首时, 得到的是按照一定规则排序的第一个(比如降序排列, 队首就是最大值)
但是本身并不是完全顺序的, 底层实现是堆
同样需要对象实现Comparable或者提供Comparator
9.4 Maps
9.4.1 Basic Map Operations
HashMap和TreeMap, 与set类似, HashMap没有顺序, TreeMap有顺序
1 | HashMap<String, Integer> map = new HashMap<>(); |
键值必须唯一, 对相同键重复赋值会覆盖, 同时返回了前一个值
remove() 删除给定键的元素
size() 返回个数
forEach() 配合匿名函数做相应处理
1
scores.forEach((k, v) -> System.out.println("key=" + k + "value=" + v));
putAll() 将另一个map的entries全部加入到当前map
containsKey()
containsValue()
9.4.2 Updating Map Entries
1 | counts.put(word, counts.get(word) + 1); // 键不存在时get会返回null |
9.4.3 Map Views
keySet()
不是HashSet或TreeSet, 而是一个实现了Set接口的另一个类, 可以当做一般的Collection
values()
entrySet()
1 | for (Map.Entry<String, Employee> entry : staff.entrySet()) |
在keySet view中remove
相当于删除了map的一个项, 但是无法添加
9.4.4 Weak Hash Maps
存在垃圾回收问题, 当map中某个值对应的键不会再被使用, 也不会被回收
As long as the map object is live, all buckets in it are live and won’t be reclaimed. Thus, your program should take care to remove unused values from long-lived maps. Or, you can use a WeakHashMap instead. This data structure cooperates with the garbage collector to remove key/value pairs when the only reference to the key is the one from the hash table entry.
WeakHashMap内部使用weak references保存键
However, if the object is reachable only by a WeakReference, the garbage collector still reclaims the object, but places the weak reference that led to it into a queue. The operations of the WeakHashMap periodically check that queue for newly arrived weak references. The arrival of a weak reference in the queue signifies that the key was no longer used by anyone and has been collected. The WeakHashMap
then removes the associated entry.
对于一般不再被引用的对象, 垃圾收集器直接回收, 对于weak reference, 同样回收, 但也会放入一个队列中, 然后map去移除相应的项
9.4.5 Linked Hash Sets and Maps
记录了加入的顺序, 可用于实现LRU
1 | var lru = new LinkedHashMap<K, V>(128, 0.75F, true){ |
9.4.6 Enumeration Sets and Maps
9.4.7 Identity Hash Maps
键不是由hashCode
算出而是System.identityHashCode
相同键对象尽管内容相同也被认为是不同的
9.5 Views and Wrappers
9.5.1 Small Collections
static函数, 直接给定若干元素形成一个Collection
1 | List<String> names = List.of("n1", "n2"); |
Arrays.asList()
返回的list无法调整大小, 即只能使用set
而不能add
或remove
9.5.2 Subranges
subList()
左闭右开
1 | List<Employee> group = staff.subList(10, 20); // extract 10 to 19 |
subSet(E from, E to)
headSet(E to)
tailSet(E from)
subMap(K from, K to)
headMap(K to)
tailMap(K from)
9.5.6 A Note on Optional Operations
在API说明中, 很多接口中的方法被称为”optional operations”, 看上去与接口本身的定义是相悖的, 但是如果接口中每一个方法都要被实现的话, 接口的数目可能就会太多了
9.6 Algorithms
9.6.1 Why Generic Algorithms?
方便重用
比如求最大值, 对于几乎所有数据结构(数组, 列表, 链表)几乎都是遍历所有元素通过比较得出最大值, 那么可定义为泛型函数
1 | public static <T extends Comparable> T max(Collection<T> c) |
9.6.2 Sorting and Shuffling
sort
1 | var staff = new LinkedList<Employee>(); |
Java中链表的排序方法是: 先将所有元素放入一个数组, 数组排序后再重建回链表
modifiable, resizable
- modifiable: 支持
set
方法 - resizable: 支持
add
和remove
能排序的列表必须modifiable但不需要resizable
shuffle
1 | ArrayList<Card> cards = . . .; |
如果列表本身不支持RandomAccess, 那么shuffle
会先将所有元素丢入数组, shuffle后再放回去
9.6.3 Binary Search
1 | i = Collections.binarySearch(c, element); |
如果找到了, 返回的index为非负整数, 通过c.get(index)
可访问该元素
如果没找到, 返回的index是负数, 同时也是有意义的, -index-1
表示的是将该元素插入数组并保持有序的下标, 不是-i的原因是下标0会是模糊的
1 | if (i < 0) { |
二分查找要求RandomAccess切排序, 否则没有任何优势了
9.6.4 Simple Algorithms
1 | Collections.replaceAll(words, "C++", "Java") |
最后更新: 2020年06月04日 20:12
原始链接: http://roooooobin.github.io/2020/05/31/Core-Java-Chapter9/