Understanding I/O Models in Depth

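Contents: File System I/O; Memory I/O; Plain Java I/O vs. Buffered I/O; ByteBuffer in the java.nio Package; API Usage; DirectByteBuffer; Why Use Off-Heap Memory; RandomAccessFile: Random Reads and Writes; mmap Memory Mapping; Network I/O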

File System I/O

Memory I/O

Plain Java I/O vs. Buffered I/O

Plain I/O

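In the test directory, run the script ./mysh 0 (0 selects the most basic file-write path). In a second shell window, watch the size of the generated out.txt with ll -h. The file grows visibly slowly, at a rate on the order of KB.

Open the trace files generated by strace. The largest one belongs to the main thread.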


-rw-r--r-- 1 root root 4.1K Jun 27 12:12 OSFileIO.class
-rw-r--r-- 1 root root 4.4K Jun 27 11:37 OSFileIO.java
-rwxr-xr-x 1 root root  123 Jun 27 11:11 mysh*
-rw-r--r-- 1 root root  14K Jun 27 12:12 out.7754
-rw-r--r-- 1 root root 4.4M Jun 27 12:15 out.7755
-rw-r--r-- 1 root root 1.2K Jun 27 12:12 out.7756
-rw-r--r-- 1 root root 1.3K Jun 27 12:12 out.7757
-rw-r--r-- 1 root root 1.1K Jun 27 12:12 out.7758
-rw-r--r-- 1 root root 1.4K Jun 27 12:12 out.7759
-rw-r--r-- 1 root root 506K Jun 27 12:15 out.7760
-rw-r--r-- 1 root root  41K Jun 27 12:15 out.7761
-rw-r--r-- 1 root root 1.2K Jun 27 12:12 out.7762
-rw-r--r-- 1 root root 1.4K Jun 27 12:12 out.7763
-rw-r--r-- 1 root root 1.3K Jun 27 12:12 out.7764
-rw-r--r-- 1 root root 1.2K Jun 27 12:12 out.7765
-rw-r--r-- 1 root root  41K Jun 27 12:15 out.7766
-rw-r--r-- 1 root root  12K Jun 27 12:15 out.7767
-rw-r--r-- 1 root root  13K Jun 27 12:15 out.7768
-rw-r--r-- 1 root root 1.2K Jun 27 12:12 out.7769
-rw-r--r-- 1 root root 1.2K Jun 27 12:12 out.7770
-rw-r--r-- 1 root root 794K Jun 27 12:15 out.7771
-rw-r--r-- 1 root root 1.9K Jun 27 12:15 out.7772
-rw-r--r-- 1 root root 183K Jun 27 12:15 out.txt
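# The main thread's trace file is the largest; here it is out.7755

Open out.7755 in vim and run :set nu to show line numbers. Each write system call writes only 10 bytes of data.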


 1307 futex(0x7f0980023928, FUTEX_WAKE_PRIVATE, 1) = 0
 1308 write(4, "123456789\n", 10)             = 10
 1309 futex(0x7f0980023978, FUTEX_WAIT_BITSET_PRIVATE, 0, {tv_sec=12089, tv_nsec=691940400}, FUTEX_BITSET_MATCH_ANY) = -1 ETIMEDOUT (Connection timed out)
 1310 futex(0x7f0980023928, FUTEX_WAKE_PRIVATE, 1) = 0
 1311 write(4, "123456789\n", 10)             = 10
 1312 futex(0x7f0980023978, FUTEX_WAIT_BITSET_PRIVATE, 0, {tv_sec=12089, tv_nsec=702383900}, FUTEX_BITSET_MATCH_ANY) = -1 ETIMEDOUT (Connection timed out)
 1313 futex(0x7f0980023928, FUTEX_WAKE_PRIVATE, 1) = 0
 1314 write(4, "123456789\n", 10)             = 10
 1315 futex(0x7f0980023978, FUTEX_WAIT_BITSET_PRIVATE, 0, {tv_sec=12089, tv_nsec=712889000}, FUTEX_BITSET_MATCH_ANY) = -1 ETIMEDOUT (Connection timed out)
 1316 futex(0x7f0980023928, FUTEX_WAKE_PRIVATE, 1) = 0
 1317 write(4, "123456789\n", 10)             = 10
 1318 futex(0x7f0980023978, FUTEX_WAIT_BITSET_PRIVATE, 0, {tv_sec=12089, tv_nsec=723286200}, FUTEX_BITSET_MATCH_ANY) = -1 ETIMEDOUT (Connection timed out)
 1319 futex(0x7f0980023928, FUTEX_WAKE_PRIVATE, 1) = 0
 1320 write(4, "123456789\n", 10)             = 10

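Buffered I/O

In the test directory, run the script ./mysh 1 (1 selects the buffered I/O path). In a second shell window, watch out.txt with ll -h: the file now grows much faster, on the order of MB. Each write system call now writes up to 8190 bytes.

strace output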


7420 futex(0x7f4d80023928, FUTEX_WAKE_PRIVATE, 1) = 0
 7421 write(4, "123456789\n123456789\n123456789\n12"..., 8190) = 8190

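Summary

Buffered I/O stores the written content in an in-memory array and, once the array is nearly full, hands the whole batch to the kernel with a single write system call. Plain I/O issues one system call for every write. Each system call requires a switch from user mode to kernel mode, which is expensive, so the throughput of the two approaches differs by orders of magnitude.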
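The comparison can be reproduced with a short program. This is a minimal sketch, not the OSFileIO program used above: the class name WriteComparison and its methods are made up for illustration, and it assumes the buffered path uses BufferedOutputStream with its default 8192-byte buffer. Under that assumption, 819 records of 10 bytes (8190 bytes) fit before the next record would overflow the buffer, which would be consistent with the 8190-byte write seen in the strace output.

```java
import java.io.BufferedOutputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical demo class; not the OSFileIO program from the text.
final class WriteComparison {
    static final byte[] RECORD = "123456789\n".getBytes(StandardCharsets.US_ASCII);

    // Every write() on a bare FileOutputStream is passed straight to the OS.
    static void writePlain(Path file, int records) throws IOException {
        try (OutputStream out = new FileOutputStream(file.toFile())) {
            for (int i = 0; i < records; i++) {
                out.write(RECORD);
            }
        }
    }

    // BufferedOutputStream collects records in an in-memory array (8192 bytes by default)
    // and calls the underlying write() only when the next record would not fit.
    static void writeBuffered(Path file, int records) throws IOException {
        try (OutputStream out = new BufferedOutputStream(new FileOutputStream(file.toFile()))) {
            for (int i = 0; i < records; i++) {
                out.write(RECORD);
            }
        } // close() flushes whatever is still in the buffer
    }

    public static void main(String[] args) throws IOException {
        int records = 100_000;
        Path plain = Files.createTempFile("plain", ".txt");
        Path buffered = Files.createTempFile("buffered", ".txt");

        long t0 = System.nanoTime();
        writePlain(plain, records);
        long t1 = System.nanoTime();
        writeBuffered(buffered, records);
        long t2 = System.nanoTime();

        System.out.printf("plain:    %d ms%n", (t1 - t0) / 1_000_000);
        System.out.printf("buffered: %d ms%n", (t2 - t1) / 1_000_000);
        Files.delete(plain);
        Files.delete(buffered);
    }
}
```

Both methods produce byte-identical files; only the number of system calls differs. Running the program under strace should show the same 10-byte versus multi-kilobyte write pattern, although the exact timings depend on the machine.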

ByteBuffer in the java.nio Package

API Usage

Main Fields


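// Marked position (-1 when no mark is set)
private int mark = -1;
// Current position of the read/write cursor
private int position = 0;
// Limit: the first index that must not be read or written (set by flip())
private int limit;
// Maximum capacity
private int capacity;
// Memory address, used only when the buffer is off-heap
long address;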

Main Methods


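// Returns the maximum capacity of this buffer
public final int capacity() {return capacity;}
// Returns the current position
public final int position() {return position;}
// Returns the current read/write limit
public final int limit() {return limit;}
// Marks the current position
public final Buffer mark() {
    mark = position;
    return this;
}
// Restores the position to the previously marked position
public final Buffer reset() {
    int m = mark;
    if (m < 0)
        throw new InvalidMarkException();
    position = m;
    return this;
}
// Clears the buffer. The data itself is not erased; only the indices are
// reset, so subsequent writes simply overwrite the old content
public final Buffer clear() {
    position = 0;
    limit = capacity;
    mark = -1;
    return this;
}
// Switches from writing to reading: the limit becomes the current position
// and the position returns to 0
public final Buffer flip() {
    limit = position;
    position = 0;
    mark = -1;
    return this;
}
// Restarts reading or writing from the beginning: resets the position and
// discards the mark, leaving the limit unchanged
public final Buffer rewind() {
    position = 0;
    mark = -1;
    return this;
}
// Number of elements that can still be read or written
public final int remaining() {return limit - position;}
// Whether anything can still be read or written
public final boolean hasRemaining() {return position < limit;}
// Whether the buffer is read-only
public abstract boolean isReadOnly();
// Whether the buffer is backed by an accessible array
public abstract boolean hasArray();
// Returns the backing array (only when hasArray() returns true)
public abstract Object array();
// Whether this is an off-heap buffer, that is, a direct buffer
public abstract boolean isDirect();
// Discards the mark
final void discardMark() {mark = -1;}
// Compacts the buffer: copies the unread bytes to the start of the buffer
// and sets the position just past the last copied byte
public abstract ByteBuffer compact();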

Test Case


// Requires: import java.nio.ByteBuffer; import org.junit.Test;
@Test
public void whatByteBuffer() {

    // ByteBuffer buffer = ByteBuffer.allocate(1024); // heap memory
    // Off-heap memory, implemented through JNI calls made by the Unsafe and VM classes
    ByteBuffer buffer = ByteBuffer.allocateDirect(1024);

    System.out.println("position: " + buffer.position());
    System.out.println("limit: " + buffer.limit());
    System.out.println("capacity: " + buffer.capacity());
    System.out.println("buffer: " + buffer);

    buffer.put("123".getBytes()); // stores the ASCII codes of '1', '2', '3'

    System.out.println("-------------put:123......");
    System.out.println("buffer: " + buffer);

    buffer.flip();   // switch from writing to reading

    System.out.println("-------------flip......");
    System.out.println("buffer: " + buffer);

    buffer.get();    // read one byte

    System.out.println("-------------get......");
    System.out.println("buffer: " + buffer);

    buffer.compact(); // move the two unread bytes to the front, ready for writing

    System.out.println("-------------compact......");
    System.out.println("buffer: " + buffer);

    buffer.clear();  // reset the indices; the data is not erased

    System.out.println("-------------clear......");
    System.out.println("buffer: " + buffer);

}
// Output:
position: 0
limit: 1024
capacity: 1024
buffer: java.nio.DirectByteBuffer[pos=0 lim=1024 cap=1024]
-------------put:123......
buffer: java.nio.DirectByteBuffer[pos=3 lim=1024 cap=1024]
-------------flip......
buffer: java.nio.DirectByteBuffer[pos=0 lim=3 cap=1024]
-------------get......
buffer: java.nio.DirectByteBuffer[pos=1 lim=3 cap=1024]
-------------compact......
buffer: java.nio.DirectByteBuffer[pos=2 lim=1024 cap=1024]
-------------clear......
buffer: java.nio.DirectByteBuffer[pos=0 lim=1024 cap=1024]

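Note: put("123".getBytes()) stores the ASCII codes of the characters '1', '2', and '3' (49, 50, 51), not the characters themselves.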
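The test case does not exercise mark(), reset(), or writing after compact(). The sketch below fills that gap with a small heap buffer. The class BufferStateDemo and its helper pending() are hypothetical names introduced only for this illustration; the position and limit values in the comments follow from the method bodies listed under Main Methods.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class illustrating mark/reset and compact.
final class BufferStateDemo {
    // Returns the bytes between position and limit as text without moving the position.
    static String pending(ByteBuffer buffer) {
        ByteBuffer view = buffer.duplicate(); // shares content, has independent indices
        byte[] bytes = new byte[view.remaining()];
        view.get(bytes);
        return new String(bytes, StandardCharsets.US_ASCII);
    }

    // Runs the walkthrough and returns the pending content at two checkpoints.
    static String[] demo() {
        ByteBuffer buffer = ByteBuffer.allocate(16);
        buffer.put("12345".getBytes(StandardCharsets.US_ASCII)); // pos=5 lim=16
        buffer.flip();                                           // pos=0 lim=5

        buffer.get();      // reads '1', pos=1
        buffer.mark();     // mark=1
        buffer.get();      // reads '2', pos=2
        buffer.get();      // reads '3', pos=3
        buffer.reset();    // back to the mark, pos=1
        String afterReset = pending(buffer);   // "2345"

        buffer.get();      // reads '2' again, pos=2
        buffer.compact();  // copies "345" to index 0..2, pos=3 lim=16
        buffer.put((byte) '6'); // appended right after the kept bytes, pos=4
        buffer.flip();     // pos=0 lim=4
        String afterCompact = pending(buffer); // "3456"

        return new String[] {afterReset, afterCompact};
    }

    public static void main(String[] args) {
        String[] result = demo();
        System.out.println("after reset:   " + result[0]);
        System.out.println("after compact: " + result[1]);
    }
}
```

compact() is the usual choice when a buffer has been only partly consumed and more data needs to be appended, whereas clear() discards whatever was left unread.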

Walkthrough of the Test Case

DirectByteBuffer


ByteBuffer buffer = ByteBuffer.allocateDirect(1024);
// allocateDirect is implemented as follows:
public static ByteBuffer allocateDirect(int capacity) {
        return new DirectByteBuffer(capacity);
    }

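Off-heap memory is allocated mainly through the Unsafe class.

Off-heap memory lies outside the memory region managed by the JVM. Java operates on it through the native methods that Unsafe provides for this purpose.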

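Why Use Off-Heap Memory

Shorter garbage-collection pauses. Off-heap memory is managed directly by the operating system rather than by the JVM, so using it keeps the heap small, which reduces the impact of GC pauses on the application.

Better I/O performance. I/O usually involves copying data from heap memory to off-heap memory, so short-lived staging data that is copied between memory regions frequently is best kept off-heap.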
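The I/O point can be illustrated by writing to a FileChannel from both kinds of buffer. This is a sketch under one stated assumption: in OpenJDK, as far as I know, a heap buffer handed to FileChannel.write is first copied into a temporary direct buffer, while a direct buffer is passed to the OS as is. That copy is an implementation detail and is not visible in the code. The class name DirectVsHeap is hypothetical.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical demo class: the same write path for heap and direct buffers.
final class DirectVsHeap {
    // Fills the buffer with text and writes it to the file, replacing the old content.
    static void write(Path file, ByteBuffer buffer, String text) throws IOException {
        buffer.clear();
        buffer.put(text.getBytes(StandardCharsets.US_ASCII));
        buffer.flip(); // switch from filling the buffer to draining it
        try (FileChannel channel = FileChannel.open(
                file, StandardOpenOption.WRITE, StandardOpenOption.TRUNCATE_EXISTING)) {
            while (buffer.hasRemaining()) {
                channel.write(buffer); // may write fewer bytes than remaining, hence the loop
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("buffers", ".txt");
        ByteBuffer heap = ByteBuffer.allocate(1024);
        ByteBuffer direct = ByteBuffer.allocateDirect(1024);

        // A heap buffer is backed by a byte[]; a direct buffer has no accessible array.
        System.out.println("heap:   isDirect=" + heap.isDirect() + " hasArray=" + heap.hasArray());
        System.out.println("direct: isDirect=" + direct.isDirect() + " hasArray=" + direct.hasArray());

        write(file, heap, "from heap\n");
        write(file, direct, "from direct\n");
        System.out.print(new String(Files.readAllBytes(file), StandardCharsets.US_ASCII));
        Files.delete(file);
    }
}
```

Direct buffers cost more to allocate and free than heap buffers, so they tend to pay off when they are long-lived or reused across many I/O operations, as in this sketch where one buffer could serve many writes.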


// Primary constructor
    //
    DirectByteBuffer(int cap) {                   // package-private

        super(-1, 0, cap, cap);
        boolean pa = VM.isDirectMemoryPageAligned();
        int ps = Bits.pageSize();
        long size = Math.max(1L, (long)cap + (pa ? ps : 0));
        Bits.reserveMemory(size, cap);

        long base = 0;
        try {
            base = unsafe.allocateMemory(size);
        } catch (OutOfMemoryError x) {
            Bits.unreserveMemory(size, cap);
            throw x;
        }
        unsafe.setMemory(base, size, (byte) 0);
        if (pa && (base % ps != 0)) {
            // Round up to page boundary
            address = base + ps - (base & (ps - 1));
        } else {
            address = base;
        }
        cleaner = Cleaner.create(this, new Deallocator(base, size, cap));
        att = null;

    }

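Cleaner extends PhantomReference, one of Java's four reference types. A phantom reference cannot be used to obtain the object it refers to, and an object that is reachable only through phantom references can be reclaimed at any GC. PhantomReference is usually combined with a ReferenceQueue, so that the program can be notified and release resources when the referent is garbage collected. When an object tracked by a Cleaner is about to be reclaimed, the garbage collector places the corresponding reference on the pending list of the Reference class, where it waits for the Reference-Handler thread. Reference-Handler is a daemon thread with the highest priority; it loops over the pending list and calls the clean method of each Cleaner to perform the cleanup.

So once a DirectByteBuffer is referenced only by its Cleaner (that is, only through a phantom reference), it can be reclaimed at any GC. When the DirectByteBuffer instance is reclaimed, the Reference-Handler thread calls the Cleaner's clean method, which runs the Deallocator passed in when the Cleaner was created and frees the off-heap memory.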
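The same pattern is available to application code through the public java.lang.ref.Cleaner API (Java 9 and later). Note that this is a different class from the internal Cleaner that DirectByteBuffer uses, so the sketch below is an analogy rather than the JDK's own code. The names CleanerDemo, Resource, and Release are hypothetical; Release plays the role of Deallocator. To keep the result deterministic, the sketch triggers the cleanup explicitly through close() instead of waiting for a GC.

```java
import java.lang.ref.Cleaner;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical demo of the cleaner pattern using the public Cleaner API (Java 9+).
final class CleanerDemo {
    private static final Cleaner CLEANER = Cleaner.create();

    // Plays the role of Deallocator. It must not reference the owning object,
    // otherwise the owner could never become phantom reachable.
    static final class Release implements Runnable {
        private final AtomicBoolean released;
        Release(AtomicBoolean released) { this.released = released; }
        @Override public void run() {
            released.set(true); // a real implementation would free native memory here
        }
    }

    // Stands in for DirectByteBuffer: registers its cleanup action on construction.
    static final class Resource implements AutoCloseable {
        private final Cleaner.Cleanable cleanable;
        Resource(AtomicBoolean released) {
            this.cleanable = CLEANER.register(this, new Release(released));
        }
        // Explicit release; the cleanup action runs at most once.
        @Override public void close() { cleanable.clean(); }
    }

    // Returns {flag while the resource is open, flag after it is closed}.
    static boolean[] demo() {
        AtomicBoolean released = new AtomicBoolean(false);
        boolean whileOpen;
        try (Resource resource = new Resource(released)) {
            whileOpen = released.get();
        }
        return new boolean[] {whileOpen, released.get()};
    }

    public static void main(String[] args) {
        boolean[] flags = demo();
        System.out.println("released while open:  " + flags[0]);
        System.out.println("released after close: " + flags[1]);
    }
}
```

If close() is never called, the registered action still runs some time after the Resource becomes unreachable and a GC notices it, which mirrors how an abandoned DirectByteBuffer eventually has its off-heap memory freed.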

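RandomAccessFile: Random Reads and Writes

RandomAccessFile can both read the contents of a file and write data to it. It also supports random access: the program can jump directly to any position in the file to read or write.

RandomAccessFile lets the program position the file pointer freely, so output does not have to start at the beginning of the file. This makes it a natural choice when a program needs to append content to an existing file or overwrite part of it in place.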
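Appending and in-place overwriting can be sketched as follows. The class AppendDemo and its two helper methods are hypothetical names used only for this illustration; the strings are the same ones used in the example later in this section.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical demo class: append to and overwrite inside an existing file.
final class AppendDemo {
    // Moves the file pointer to the end of the file and writes there.
    static void append(Path file, String text) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "rw")) {
            raf.seek(raf.length());
            raf.write(text.getBytes(StandardCharsets.US_ASCII));
        }
    }

    // Overwrites bytes starting at the given offset; the rest of the file is untouched.
    static void overwrite(Path file, long offset, String text) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "rw")) {
            raf.seek(offset);
            raf.write(text.getBytes(StandardCharsets.US_ASCII));
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("raf", ".txt");
        Files.write(file, "hello world\n".getBytes(StandardCharsets.US_ASCII));

        append(file, "hello java\n");
        overwrite(file, 4, "ooxx");

        // Prints "hellooxxrld" followed by "hello java"
        System.out.print(new String(Files.readAllBytes(file), StandardCharsets.US_ASCII));
        Files.delete(file);
    }
}
```

Writing at an offset replaces exactly as many bytes as are written; it never inserts, so the file length changes only when the write runs past the current end of the file.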

Common Methods


/**
     * Returns the unique {@link java.nio.channels.FileChannel FileChannel}
     * object associated with this file.
     *
     * <p> The {@link java.nio.channels.FileChannel#position()
     * position} of the returned channel will always be equal to
     * this object's file-pointer offset as returned by the {@link
     * #getFilePointer getFilePointer} method.  Changing this object's
     * file-pointer offset, whether explicitly or by reading or writing bytes,
     * will change the position of the channel, and vice versa.  Changing the
     * file's length via this object will change the length seen via the file
     * channel, and vice versa.
     *
     * @return  the file channel associated with this file
     *
     * @since 1.4
     * @spec JSR-51
     */
    public final FileChannel getChannel() {
        synchronized (this) {
            if (channel == null) {
                channel = FileChannelImpl.open(fd, path, true, rw, this);
            }
            return channel;
        }
    }


/**
     * Sets the file-pointer offset, measured from the beginning of this
     * file, at which the next read or write occurs.  The offset may be
     * set beyond the end of the file. Setting the offset beyond the end
     * of the file does not change the file length.  The file length will
     * change only by writing after the offset has been set beyond the end
     * of the file.
     *
     * @param      pos   the offset position, measured in bytes from the
     *                   beginning of the file, at which to set the file
     *                   pointer.
     * @exception  IOException  if {@code pos} is less than
     *                          {@code 0} or if an I/O error occurs.
     */
    public void seek(long pos) throws IOException {
        if (pos < 0) {
            throw new IOException("Negative seek offset");
        } else {
            seek0(pos);
        }
    }

Example


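// Test file NIO
// Requires: java.io.RandomAccessFile, java.nio.ByteBuffer,
//           java.nio.MappedByteBuffer, java.nio.channels.FileChannel
// `path` is defined elsewhere in the class (not shown).
    public static void testRandomAccessFileWrite() throws Exception {
        RandomAccessFile raf = new RandomAccessFile(path, "rw");
        raf.write("hello world\n".getBytes());
        raf.write("hello java\n".getBytes());
        System.out.println("write------------");
        System.in.read();
        // Seek to offset 4 from the start of the file and write there
        raf.seek(4);
        raf.write("ooxx".getBytes());
        System.out.println("seek---------");
        System.in.read();
        FileChannel rafchannel = raf.getChannel();
        // mmap: off-heap memory mapped to the file; it holds raw bytes, not objects
        MappedByteBuffer map = rafchannel.map(FileChannel.MapMode.READ_WRITE, 0, 4096);
        map.put("@@@".getBytes());  // not a system call, but the data still reaches the kernel page cache
        // Previously, a system call such as out.write() was needed to get the
        // program's data into the kernel page cache, which always required a
        // switch between user mode and kernel mode.
        // A memory mapping is still governed by the kernel page cache,
        // so data that has not been flushed can still be lost.
        // To use the Linux kernel's Direct I/O, look on GitHub for JNI
        // extension libraries written by C programmers.
        // Direct I/O bypasses the Linux page cache: the program allocates its
        // own byte array to act as the cache and must handle consistency,
        // dirty pages, and a series of related problems in its own code.
        System.out.println("map--put--------");
        System.in.read();
//        map.force(); //  flush
        raf.seek(0);
        ByteBuffer buffer = ByteBuffer.allocate(8192);
//        ByteBuffer buffer = ByteBuffer.allocateDirect(1024);
        int read = rafchannel.read(buffer);   // reads from the channel into the ByteBuffer; equivalent to buffer.put()
        System.out.println(buffer);
        buffer.flip();
        System.out.println(buffer);
        for (int i = 0; i < buffer.limit(); i++) {
            Thread.sleep(200);
            System.out.print(((char)buffer.get(i)));
        }
    }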

Running the Script

The program blocks at the first read. By this point the content has already been written to the page cache.


root@Code:~/develop/test# ./mysh*  2
write------------


root@Code:~/develop/test# cat out.txt && pcstat out.txt
hello world
hello java
+---------+----------------+------------+-----------+---------+
| Name    | Size (bytes)   | Pages      | Cached    | Percent |
|---------+----------------+------------+-----------+---------|
| out.txt | 31             | 1          | 1         | 100.000 |
+---------+----------------+------------+-----------+---------+

Type any line to release the blocked read.


root@Code:~/develop/test# ./mysh*  2
write------------
啊
seek---------
map--put--------
java.nio.HeapByteBuffer[pos=4096 lim=8192 cap=8192]
java.nio.HeapByteBuffer[pos=0 lim=4096 cap=8192]
@@@looxxrld
hello java


root@Code:~/develop/test# cat out.txt && pcstat out.txt
@@@looxxshibing
hello java
+---------+----------------+------------+-----------+---------+
| Name    | Size (bytes)   | Pages      | Cached    | Percent |
|---------+----------------+------------+-----------+---------|
| out.txt | 4096           | 1          | 1         | 100.000 |
+---------+----------------+------------+-----------+---------+

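mmap Memory Mapping

The example above uses FileChannel.map to create a direct memory mapping. After the mmap system call, the mapped file appears as a mem entry among the process's open files (the FD column in lsof). From then on the file can be modified through the mapped buffer without any read or write system calls: the program operates directly on the corresponding page cache pages through the mapping.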
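A minimal, self-contained version of the mapping step might look like the sketch below. The class MmapDemo and the method mapAndModify are hypothetical names. Unlike the example above, it maps exactly channel.size() bytes: mapping a READ_WRITE region larger than the file extends the file to the mapped size, which would explain the 4096-byte size reported by pcstat. It also calls force(), which the original example leaves commented out, to ask the OS to write the dirty pages to the storage device.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical demo class: modify a file through a memory mapping.
final class MmapDemo {
    // Overwrites the first three bytes through the mapping and returns the file content.
    static String mapAndModify(Path file) throws IOException {
        try (FileChannel channel = FileChannel.open(
                file, StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            // Map the whole file; the mapping stays valid even after the channel is closed.
            MappedByteBuffer map = channel.map(FileChannel.MapMode.READ_WRITE, 0, channel.size());
            map.put("@@@".getBytes(StandardCharsets.US_ASCII)); // plain memory write, no write() call
            map.force(); // request that the modified pages be written to the storage device
        }
        // An ordinary read goes through the same page cache and sees the change.
        return new String(Files.readAllBytes(file), StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("mmap", ".txt");
        file.toFile().deleteOnExit();
        Files.write(file, "hello world\n".getBytes(StandardCharsets.US_ASCII));
        System.out.print(mapAndModify(file)); // the first three bytes are now "@@@"
    }
}
```

Without force(), the change still reaches the page cache immediately, but when it reaches the disk is up to the kernel, which is the data-loss window described in the example's comments.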