File system IO
File system IO and memory IO
Plain Java IO and buffered IO
Plain IO
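In the test directory, run the script:

    ./mysh 0

The argument 0 selects the most basic file-write path. In a second shell window, watch how fast the generated out.txt grows with ll -h. Growth is visibly slow, on the order of KB at a time. Then open the trace files that strace produced and find the largest one, which belongs to the main thread: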
    -rw-r--r-- 1 root root 4.1K Jun 27 12:12 OSFileIO.class
    -rw-r--r-- 1 root root 4.4K Jun 27 11:37 OSFileIO.java
    -rwxr-xr-x 1 root root  123 Jun 27 11:11 mysh*
    -rw-r--r-- 1 root root  14K Jun 27 12:12 out.7754
    -rw-r--r-- 1 root root 4.4M Jun 27 12:15 out.7755
    -rw-r--r-- 1 root root 1.2K Jun 27 12:12 out.7756
    -rw-r--r-- 1 root root 1.3K Jun 27 12:12 out.7757
    -rw-r--r-- 1 root root 1.1K Jun 27 12:12 out.7758
    -rw-r--r-- 1 root root 1.4K Jun 27 12:12 out.7759
    -rw-r--r-- 1 root root 506K Jun 27 12:15 out.7760
    -rw-r--r-- 1 root root  41K Jun 27 12:15 out.7761
    -rw-r--r-- 1 root root 1.2K Jun 27 12:12 out.7762
    -rw-r--r-- 1 root root 1.4K Jun 27 12:12 out.7763
    -rw-r--r-- 1 root root 1.3K Jun 27 12:12 out.7764
    -rw-r--r-- 1 root root 1.2K Jun 27 12:12 out.7765
    -rw-r--r-- 1 root root  41K Jun 27 12:15 out.7766
    -rw-r--r-- 1 root root  12K Jun 27 12:15 out.7767
    -rw-r--r-- 1 root root  13K Jun 27 12:15 out.7768
    -rw-r--r-- 1 root root 1.2K Jun 27 12:12 out.7769
    -rw-r--r-- 1 root root 1.2K Jun 27 12:12 out.7770
    -rw-r--r-- 1 root root 794K Jun 27 12:15 out.7771
    -rw-r--r-- 1 root root 1.9K Jun 27 12:15 out.7772
    -rw-r--r-- 1 root root 183K Jun 27 12:15 out.txt
    # The main thread's trace file is the largest; here it is out.7755
Open it with vim out.7755 and run :set nu to show line numbers. Each write system call writes only 10 bytes:

    1307 futex(0x7f0980023928, FUTEX_WAKE_PRIVATE, 1) = 0
    1308 write(4, "123456789\n", 10) = 10
    1309 futex(0x7f0980023978, FUTEX_WAIT_BITSET_PRIVATE, 0, {tv_sec=12089, tv_nsec=691940400}, FUTEX_BITSET_MATCH_ANY) = -1 ETIMEDOUT (Connection timed out)
    1310 futex(0x7f0980023928, FUTEX_WAKE_PRIVATE, 1) = 0
    1311 write(4, "123456789\n", 10) = 10
    1312 futex(0x7f0980023978, FUTEX_WAIT_BITSET_PRIVATE, 0, {tv_sec=12089, tv_nsec=702383900}, FUTEX_BITSET_MATCH_ANY) = -1 ETIMEDOUT (Connection timed out)
    1313 futex(0x7f0980023928, FUTEX_WAKE_PRIVATE, 1) = 0
    1314 write(4, "123456789\n", 10) = 10
    1315 futex(0x7f0980023978, FUTEX_WAIT_BITSET_PRIVATE, 0, {tv_sec=12089, tv_nsec=712889000}, FUTEX_BITSET_MATCH_ANY) = -1 ETIMEDOUT (Connection timed out)
    1316 futex(0x7f0980023928, FUTEX_WAKE_PRIVATE, 1) = 0
    1317 write(4, "123456789\n", 10) = 10
    1318 futex(0x7f0980023978, FUTEX_WAIT_BITSET_PRIVATE, 0, {tv_sec=12089, tv_nsec=723286200}, FUTEX_BITSET_MATCH_ANY) = -1 ETIMEDOUT (Connection timed out)
    1319 futex(0x7f0980023928, FUTEX_WAKE_PRIVATE, 1) = 0
    1320 write(4, "123456789\n", 10) = 10
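Buffered IO
In the test directory, run the script:

    ./mysh 1

The argument 1 selects the buffered IO path. Again watch out.txt with ll -h in a second shell window. The file now grows much faster, on the order of MB at a time. The strace output shows that a single write system call now carries up to 8190 bytes:

    7420 futex(0x7f4d80023928, FUTEX_WAKE_PRIVATE, 1) = 0
    7421 write(4, "123456789\n123456789\n123456789\n12"..., 8190) = 8190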
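Summary
Buffered IO collects the written content in an in-memory array and, once the array is full, hands the whole batch to the kernel with a single write system call. Plain IO issues one system call for every write. Each system call requires a switch from user mode to kernel mode, which is expensive. In the traces above each call carries 8190 bytes instead of 10, so buffered IO needs roughly 819 times fewer write calls for the same data, and the throughput of the two differs by orders of magnitude.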
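The source of OSFileIO.java is not shown here, so the following is only a minimal sketch of what the two modes might look like. The class name WriteModes and its methods are made up for illustration. It assumes the default BufferedOutputStream buffer of 8192 bytes, which holds 819 whole 10-byte lines and would explain the 8190-byte writes in the trace.

```java
import java.io.BufferedOutputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical sketch; this is not the OSFileIO.java used above.
final class WriteModes {
    private static final byte[] LINE = "123456789\n".getBytes(StandardCharsets.US_ASCII);

    // Plain IO: FileOutputStream has no user-space buffer, so every
    // write(byte[]) call is handed to the OS as its own 10-byte write.
    static long writePlain(int lines) {
        return writeLines(lines, false);
    }

    // Buffered IO: BufferedOutputStream collects bytes in an in-memory array
    // (8192 bytes by default) and passes them to the OS in larger batches.
    static long writeBuffered(int lines) {
        return writeLines(lines, true);
    }

    // Writes `lines` copies of LINE to a temporary file and returns its final size.
    private static long writeLines(int lines, boolean buffered) {
        try {
            Path file = Files.createTempFile("write-modes", ".txt");
            try {
                OutputStream raw = new FileOutputStream(file.toFile());
                try (OutputStream out = buffered ? new BufferedOutputStream(raw) : raw) {
                    for (int i = 0; i < lines; i++) {
                        out.write(LINE);
                    }
                } // close() flushes whatever is still sitting in the buffer
                return Files.size(file);
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int lines = 200_000;
        long t0 = System.nanoTime();
        long plainBytes = writePlain(lines);
        long t1 = System.nanoTime();
        long bufferedBytes = writeBuffered(lines);
        long t2 = System.nanoTime();
        System.out.printf("plain:    %d bytes in %d ms%n", plainBytes, (t1 - t0) / 1_000_000);
        System.out.printf("buffered: %d bytes in %d ms%n", bufferedBytes, (t2 - t1) / 1_000_000);
    }
}
```

Both methods produce the same file content; only the number of system calls differs, which can be checked by running the program under strace as in the experiment above.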
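ByteBuffer in the java.nio package
API usage
Main fields

    // Marked position; -1 means no mark is set
    private int mark = -1;
    // Current position of the cursor
    private int position = 0;
    // Read/write limit; after flip() it marks the end of the data written so far
    private int limit;
    // Maximum capacity
    private int capacity;
    // For an off-heap (direct) buffer, the address of the memory
    long address;

Main methods

    // Returns the maximum capacity of the buffer
    public final int capacity() { return capacity; }

    // Returns the current position
    public final int position() { return position; }

    // Returns the current read/write limit
    public final int limit() { return limit; }

    // Marks the current position
    public final Buffer mark() {
        mark = position;
        return this;
    }

    // Restores position to the previously marked position
    public final Buffer reset() {
        int m = mark;
        if (m < 0)
            throw new InvalidMarkException();
        position = m;
        return this;
    }

    // Clears the buffer. The data is not erased; only the cursors are
    // reset, so later writes simply overwrite the old content
    public final Buffer clear() {
        position = 0;
        limit = capacity;
        mark = -1;
        return this;
    }

    // Switches from writing to reading: limit becomes the old position
    public final Buffer flip() {
        limit = position;
        position = 0;
        mark = -1;
        return this;
    }

    // Starts over from the beginning: resets position and discards the mark (limit is unchanged)
    public final Buffer rewind() {
        position = 0;
        mark = -1;
        return this;
    }

    // Number of elements still available to read or write
    public final int remaining() { return limit - position; }

    // Whether anything is left to read or write
    public final boolean hasRemaining() { return position < limit; }

    // Whether the buffer is read-only
    public abstract boolean isReadOnly();

    // Whether the buffer is backed by an accessible array
    public abstract boolean hasArray();

    // Returns the backing array (only when hasArray() returns true)
    public abstract Object array();

    // Whether this is an off-heap buffer, that is, a direct buffer
    public abstract boolean isDirect();

    // Discards the mark
    final void discardMark() { mark = -1; }

    // Moves the unread bytes (between position and limit) to the start of the buffer,
    // then sets position just past the last moved byte and limit to capacity
    public abstract ByteBuffer compact();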
Test case

    @Test
    public void whatByteBuffer() {
        // ByteBuffer buffer = ByteBuffer.allocate(1024); // heap memory
        ByteBuffer buffer = ByteBuffer.allocateDirect(1024); // off-heap memory, allocated through the Unsafe and VM classes via JNI

        System.out.println("position: " + buffer.position());
        System.out.println("limit: " + buffer.limit());
        System.out.println("capacity: " + buffer.capacity());
        System.out.println("mark: " + buffer);

        buffer.put("123".getBytes()); // what is stored are the ASCII values of '1', '2', '3'
        System.out.println("-------------put:123......");
        System.out.println("mark: " + buffer);

        buffer.flip(); // switch from writing to reading
        System.out.println("-------------flip......");
        System.out.println("mark: " + buffer);

        buffer.get();
        System.out.println("-------------get......");
        System.out.println("mark: " + buffer);

        buffer.compact();
        System.out.println("-------------compact......");
        System.out.println("mark: " + buffer);

        buffer.clear();
        System.out.println("-------------clear......");
        System.out.println("mark: " + buffer);
    }

Output:

    position: 0
    limit: 1024
    capacity: 1024
    mark: java.nio.DirectByteBuffer[pos=0 lim=1024 cap=1024]
    -------------put:123......
    mark: java.nio.DirectByteBuffer[pos=3 lim=1024 cap=1024]
    -------------flip......
    mark: java.nio.DirectByteBuffer[pos=0 lim=3 cap=1024]
    -------------get......
    mark: java.nio.DirectByteBuffer[pos=1 lim=3 cap=1024]
    -------------compact......
    mark: java.nio.DirectByteBuffer[pos=2 lim=1024 cap=1024]
    -------------clear......
    mark: java.nio.DirectByteBuffer[pos=0 lim=1024 cap=1024]

P.S. put("123".getBytes()) stores the ASCII codes of the characters, that is, the bytes 49, 50 and 51.
Walkthrough of the example
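The same sequence of steps can be traced on a small heap buffer, where the contents are easy to follow. This is a sketch for illustration only: the class name ByteBufferWalkthrough and the 8-byte capacity are arbitrary choices, not part of the test case above.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

final class ByteBufferWalkthrough {
    // Formats the cursors the same way Buffer.toString() reports them.
    private static String state(ByteBuffer b) {
        return "pos=" + b.position() + " lim=" + b.limit() + " cap=" + b.capacity();
    }

    // Replays put -> flip -> get -> compact -> clear and records the state after each step.
    static List<String> trace() {
        List<String> steps = new ArrayList<>();
        ByteBuffer buffer = ByteBuffer.allocate(8);

        buffer.put("123".getBytes(StandardCharsets.US_ASCII)); // stores 49, 50, 51
        steps.add("put     " + state(buffer)); // pos=3 lim=8 cap=8

        buffer.flip(); // limit = old position, position = 0
        steps.add("flip    " + state(buffer)); // pos=0 lim=3 cap=8

        byte first = buffer.get(); // reads 49 and advances position
        steps.add("get=" + first + "  " + state(buffer)); // pos=1 lim=3 cap=8

        buffer.compact(); // moves the two unread bytes (50, 51) to index 0 and 1
        steps.add("compact " + state(buffer)
                + " [0]=" + buffer.get(0) + " [1]=" + buffer.get(1)); // pos=2 lim=8 cap=8 [0]=50 [1]=51

        buffer.clear(); // only resets the cursors; the bytes stay where they are
        steps.add("clear   " + state(buffer) + " [0]=" + buffer.get(0)); // pos=0 lim=8 cap=8 [0]=50
        return steps;
    }

    public static void main(String[] args) {
        trace().forEach(System.out::println);
    }
}
```

The last step shows that clear() leaves the old bytes in place: index 0 still holds 50 after the cursors are reset.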
DirectByteBuffer

    ByteBuffer buffer = ByteBuffer.allocateDirect(1024);

    //
    public static ByteBuffer allocateDirect(int capacity) {
        return new DirectByteBuffer(capacity);
    }

The off-heap memory is allocated mainly through the Unsafe class.
Off-heap memory lies outside the memory that the JVM manages. Java operates on it through the native methods that Unsafe provides.
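Reasons for using off-heap memory
- Shorter garbage collection pauses. Off-heap memory is managed by the operating system rather than by the JVM, so moving data off-heap keeps the heap small, which reduces the impact of GC pauses on the application.
- Better I/O performance. I/O usually involves copying data from heap memory to off-heap memory. Short-lived staging data that has to be copied between the two frequently is better kept off-heap in the first place.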
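To make the second reason concrete, here is a minimal sketch that writes through a FileChannel from a buffer supplied by the caller. With a direct buffer the channel can hand the off-heap memory to the OS as it is; with a heap buffer, the JDK's channel implementation generally copies the bytes into a temporary direct buffer first. The class name DirectBufferWrite and its helper methods are invented for this example.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical sketch: the same write loop works with heap and direct buffers.
final class DirectBufferWrite {
    private static final byte[] LINE = "123456789\n".getBytes(StandardCharsets.US_ASCII);

    // Fills `buffer` with lines and drains it to a temporary file whenever it is full.
    // Returns the final file size. The buffer must be able to hold at least one line.
    static long writeLines(ByteBuffer buffer, int lines) {
        try {
            Path file = Files.createTempFile("direct-buffer", ".txt");
            try (FileChannel channel = FileChannel.open(file, StandardOpenOption.WRITE)) {
                buffer.clear();
                for (int i = 0; i < lines; i++) {
                    if (buffer.remaining() < LINE.length) {
                        drain(channel, buffer);
                    }
                    buffer.put(LINE);
                }
                drain(channel, buffer);
                return channel.size();
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Switches the buffer to read mode, writes everything out, then resets it for filling.
    private static void drain(FileChannel channel, ByteBuffer buffer) throws IOException {
        buffer.flip();
        while (buffer.hasRemaining()) {
            channel.write(buffer);
        }
        buffer.clear();
    }

    public static void main(String[] args) {
        System.out.println("direct: " + writeLines(ByteBuffer.allocateDirect(8192), 100_000) + " bytes");
        System.out.println("heap:   " + writeLines(ByteBuffer.allocate(8192), 100_000) + " bytes");
    }
}
```

Whether the direct buffer is measurably faster depends on the workload; direct buffers cost more to allocate, so they pay off mainly when one buffer is reused for many I/O operations.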
    // Primary constructor
    //
    DirectByteBuffer(int cap) {                   // package-private

        super(-1, 0, cap, cap);
        boolean pa = VM.isDirectMemoryPageAligned();
        int ps = Bits.pageSize();
        long size = Math.max(1L, (long)cap + (pa ? ps : 0));
        Bits.reserveMemory(size, cap);

        long base = 0;
        try {
            base = unsafe.allocateMemory(size);
        } catch (OutOfMemoryError x) {
            Bits.unreserveMemory(size, cap);
            throw x;
        }
        unsafe.setMemory(base, size, (byte) 0);
        if (pa && (base % ps != 0)) {
            // Round up to page boundary
            address = base + ps - (base & (ps - 1));
        } else {
            address = base;
        }
        cleaner = Cleaner.create(this, new Deallocator(base, size, cap));
        att = null;
    }
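The constructor above ends by registering a Cleaner together with a Deallocator. The Cleaner used there is JDK-internal, so the sketch below swaps in the public java.lang.ref.Cleaner (Java 9 and later) to show the same pattern: a cleanup action is registered next to an object and runs at most once, either when it is invoked explicitly or after the object becomes phantom reachable. The names CleanerSketch, Resource and Release are invented, and the boolean flag merely stands in for freeing native memory.

```java
import java.lang.ref.Cleaner;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical sketch using the public java.lang.ref.Cleaner, not the internal one.
final class CleanerSketch {
    private static final Cleaner CLEANER = Cleaner.create();

    // Plays the role of Deallocator. It must not hold a reference to the owner,
    // otherwise the owner could never become phantom reachable.
    private static final class Release implements Runnable {
        private final AtomicBoolean released;

        Release(AtomicBoolean released) {
            this.released = released;
        }

        @Override
        public void run() {
            released.set(true); // real code would free the native memory here
        }
    }

    // Plays the role of DirectByteBuffer: registers its cleanup action when constructed.
    static final class Resource {
        private final Cleaner.Cleanable cleanable;

        Resource(AtomicBoolean released) {
            this.cleanable = CLEANER.register(this, new Release(released));
        }

        // Explicit release; the action will not run again later.
        void free() {
            cleanable.clean();
        }
    }

    // Creates a resource, frees it explicitly and reports whether the action ran.
    static boolean releasedAfterFree() {
        AtomicBoolean released = new AtomicBoolean(false);
        Resource resource = new Resource(released);
        resource.free();
        return released.get();
    }

    public static void main(String[] args) {
        System.out.println("released after free(): " + releasedAfterFree());
    }
}
```

The explicit free() is used here only because GC timing is not deterministic; if it is never called, the cleaner's own thread runs the action some time after the Resource has become unreachable.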
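Cleaner extends PhantomReference, one of Java's four reference types. The referent cannot be obtained through a phantom reference, and an object that is reachable only through phantom references can be reclaimed at any GC. PhantomReference is usually combined with a ReferenceQueue so that the program can be notified and release resources when the referent is garbage collected. When an object tracked by a Cleaner is about to be reclaimed, the garbage collector puts its reference on the pending list in Reference, where it waits for the Reference-Handler thread. Reference-Handler is a daemon thread with the highest priority; it loops over the pending list and calls the clean method of each Cleaner it finds there. So once a DirectByteBuffer is referenced only by its Cleaner (that is, only through a phantom reference), it can be reclaimed at any GC. When the DirectByteBuffer instance is reclaimed, the Reference-Handler thread calls the Cleaner's clean method, which runs the Deallocator passed in when the Cleaner was created and frees the off-heap memory.
RandomAccessFile random read/write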
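RandomAccessFile can both read the content of a file and write data to it. It also supports random access: the program can jump directly to any position in the file and read or write there.
RandomAccessFile lets you set the file pointer freely, so output does not have to start at the beginning of the file. This makes it a natural choice when a program needs to append to an existing file: move the pointer to the end of the file and write from there.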
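Appending with RandomAccessFile comes down to moving the file pointer to the current length before writing. The following is a minimal sketch; the class name and the temporary file are assumptions made for the example.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical sketch: append to an existing file by seeking to its end.
final class AppendWithRandomAccessFile {
    // Creates a file with one line, appends a second line and returns the full content.
    static String appendDemo() {
        try {
            Path file = Files.createTempFile("raf-append", ".txt");
            try {
                Files.write(file, "hello world\n".getBytes(StandardCharsets.US_ASCII));
                try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "rw")) {
                    raf.seek(raf.length()); // the pointer starts at 0; move it to the end
                    raf.write("hello java\n".getBytes(StandardCharsets.US_ASCII));
                }
                return new String(Files.readAllBytes(file), StandardCharsets.US_ASCII);
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.print(appendDemo());
    }
}
```

Without the seek call the second write would start at offset 0 and overwrite the first line instead of appending.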
Common methods

    /**
     * Returns the unique {@link java.nio.channels.FileChannel FileChannel}
     * object associated with this file.
     *
     * <p> The {@link java.nio.channels.FileChannel#position()
     * position} of the returned channel will always be equal to
     * this object's file-pointer offset as returned by the {@link
     * #getFilePointer getFilePointer} method.  Changing this object's
     * file-pointer offset, whether explicitly or by reading or writing bytes,
     * will change the position of the channel, and vice versa.  Changing the
     * file's length via this object will change the length seen via the file
     * channel, and vice versa.
     *
     * @return  the file channel associated with this file
     *
     * @since 1.4
     * @spec JSR-51
     */
    public final FileChannel getChannel() {
        synchronized (this) {
            if (channel == null) {
                channel = FileChannelImpl.open(fd, path, true, rw, this);
            }
            return channel;
        }
    }

    /**
     * Sets the file-pointer offset, measured from the beginning of this
     * file, at which the next read or write occurs.  The offset may be
     * set beyond the end of the file. Setting the offset beyond the end
     * of the file does not change the file length.  The file length will
     * change only by writing after the offset has been set beyond the end
     * of the file.
     *
     * @param      pos   the offset position, measured in bytes from the
     *                   beginning of the file, at which to set the file
     *                   pointer.
     * @exception  IOException  if {@code pos} is less than
     *                          {@code 0} or if an I/O error occurs.
     */
    public void seek(long pos) throws IOException {
        if (pos < 0) {
            throw new IOException("Negative seek offset");
        } else {
            seek0(pos);
        }
    }
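The two behaviours described in the seek Javadoc can be checked with a short sketch: seeking past the end does not change the length until something is written, and seeking into existing data overwrites it in place. The class name SeekSemantics and its methods are made up for illustration.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical sketch of the seek() behaviour described in the Javadoc.
final class SeekSemantics {
    // Returns {length after seek(100) on an empty file, length after writing one byte there}.
    static long[] lengthsAroundSeekPastEnd() {
        try {
            Path file = Files.createTempFile("raf-seek", ".bin");
            try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "rw")) {
                raf.seek(100);                      // beyond the end of an empty file
                long afterSeek = raf.length();      // still 0: seeking alone does not grow the file
                raf.write('x');                     // one byte at offset 100
                long afterWrite = raf.length();     // 101
                return new long[] { afterSeek, afterWrite };
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Writes a line, then overwrites four bytes starting at offset 4 and returns the content.
    static String overwriteAtOffset4() {
        try {
            Path file = Files.createTempFile("raf-overwrite", ".txt");
            try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "rw")) {
                raf.write("hello world\n".getBytes(StandardCharsets.US_ASCII));
                raf.seek(4);
                raf.write("ooxx".getBytes(StandardCharsets.US_ASCII)); // replaces "o wo"
                byte[] content = new byte[(int) raf.length()];
                raf.seek(0);
                raf.readFully(content);
                return new String(content, StandardCharsets.US_ASCII);
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        long[] lengths = lengthsAroundSeekPastEnd();
        System.out.println(lengths[0] + " -> " + lengths[1]); // 0 -> 101
        System.out.print(overwriteAtOffset4());                // hellooxxrld
    }
}
```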
Example

    // Test file NIO
    public static void testRandomAccessFileWrite() throws Exception {
        RandomAccessFile raf = new RandomAccessFile(path, "rw");

        raf.write("hello world\n".getBytes());
        raf.write("hello java\n".getBytes());
        System.out.println("write------------");
        System.in.read();

        // Write at offset 4 from the start of the file
        raf.seek(4);
        raf.write("ooxx".getBytes());

        System.out.println("seek---------");
        System.in.read();

        FileChannel rafchannel = raf.getChannel();
        // mmap: off-heap memory mapped to the file; it holds bytes, not objects
        MappedByteBuffer map = rafchannel.map(FileChannel.MapMode.READ_WRITE, 0, 4096);

        map.put("@@@".getBytes()); // not a system call, but the data still reaches the kernel page cache
        // Previously a system call such as out.write() was needed to get the program's data into the kernel page cache
        // That always meant a switch between user mode and kernel mode
        // The mmap mapping is still governed by the kernel's page cache!
        // In other words, data can still be lost
        // On GitHub there are JNI extension libraries written by C programmers that use the Linux kernel's Direct IO
        // Direct IO bypasses the Linux page cache
        // The program takes over the page cache's job: it allocates its own byte array as a cache
        // and maintains consistency, dirty pages and a series of other complex issues in code

        System.out.println("map--put--------");
        System.in.read();

        // map.force(); // flush

        raf.seek(0);

        ByteBuffer buffer = ByteBuffer.allocate(8192);
        // ByteBuffer buffer = ByteBuffer.allocateDirect(1024);

        int read = rafchannel.read(buffer); // fills the ByteBuffer, equivalent to buffer.put()
        System.out.println(buffer);
        buffer.flip();
        System.out.println(buffer);

        for (int i = 0; i < buffer.limit(); i++) {
            Thread.sleep(200);
            System.out.print(((char) buffer.get(i)));
        }
    }
Running the script
The first read blocks. At this point the content has already been written to the page cache:

    root@Code:~/develop/test# ./mysh* 2
    write------------

    root@Code:~/develop/test# cat out.txt && pcstat out.txt
    hello world
    hello java
    +---------+----------------+------------+-----------+---------+
    | Name    | Size (bytes)   | Pages      | Cached    | Percent |
    |---------+----------------+------------+-----------+---------|
    | out.txt | 31             | 1          | 1         | 100.000 |
    +---------+----------------+------------+-----------+---------+
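Type any line to release the blocked read. System.in.read() consumes a single byte, which is presumably why one typed line gets past all three reads here (啊 plus a newline is four bytes in UTF-8):

    root@Code:~/develop/test# ./mysh* 2
    write------------
    啊
    seek---------
    map--put--------
    java.nio.HeapByteBuffer[pos=4096 lim=8192 cap=8192]
    java.nio.HeapByteBuffer[pos=0 lim=4096 cap=8192]
    @@@looxxrld
    hello java

    root@Code:~/develop/test# cat out.txt && pcstat out.txt
    @@@looxxshibing
    hello java
    +---------+----------------+------------+-----------+---------+
    | Name    | Size (bytes)   | Pages      | Cached    | Percent |
    |---------+----------------+------------+-----------+---------|
    | out.txt | 4096           | 1          | 1         | 100.000 |
    +---------+----------------+------------+-----------+---------+

The file is now 4096 bytes long because map(..., 0, 4096) extended it to the mapped size, which is also why the channel read returns 4096 bytes. The shibing fragment in the cat output and the 31-byte size reported earlier do not match the strings in the listing; they appear to come from a run with different test strings.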
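mmap memory mapping
The example above uses FileChannel.map to map the file directly into memory. After the mmap system call the mapping shows up in lsof with mem in the FD column. From then on the file can be modified through the mapped buffer without going through read and write system calls: a store into the mapping lands directly in the corresponding page cache page.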
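A stripped-down version of the mapping step might look like the sketch below. It writes through the mapping and then reads the file back through the ordinary read path to show that both see the same data. The class name MmapSketch and the temporary file are assumptions; the force() call is included to be conservative and is not what makes the data visible to the read.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical sketch: write through a mapping, read back through read().
final class MmapSketch {
    // Returns {file length after mapping 4096 bytes, first 11 bytes of the file as text}.
    static String[] mapAndReadBack() {
        try {
            Path file = Files.createTempFile("mmap", ".txt");
            file.toFile().deleteOnExit(); // a mapped file cannot always be deleted right away
            Files.write(file, "hello world\n".getBytes(StandardCharsets.US_ASCII));
            try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "rw");
                 FileChannel channel = raf.getChannel()) {
                // Mapping 4096 bytes of a 12-byte file in READ_WRITE mode extends the file.
                MappedByteBuffer map = channel.map(FileChannel.MapMode.READ_WRITE, 0, 4096);
                map.put("@@@".getBytes(StandardCharsets.US_ASCII)); // a memory store, no write system call
                map.force(); // asks the OS to write the dirty pages to the storage device

                byte[] head = new byte[11];
                raf.seek(0);
                raf.readFully(head); // the ordinary read path sees the mapped write
                return new String[] {
                    Long.toString(raf.length()),
                    new String(head, StandardCharsets.US_ASCII)
                };
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String[] result = mapAndReadBack();
        System.out.println("length: " + result[0]); // length: 4096
        System.out.println("head:   " + result[1]); // head:   @@@lo world
    }
}
```

Without force() the modified page stays dirty in the page cache until the kernel writes it back, so a power failure in between can still lose the data, as the comments in the example above point out.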