Publications

[21] Yao E, Tan G. Bit Flipping Errors in High Performance Linpack at Exascale and Beyond[C]//Parallel Processing (ICPP), 2015 44th International Conference on. IEEE, 2015: 420-429.

[22] Luo Y, Tan G, Mo Z, et al. FAST: A fast stencil autotuning framework based on an optimal-solution space model[C]//Proceedings of the 29th ACM on International Conference on Supercomputing. ACM, 2015: 187-196.

[23] Zhang C, Tang W, Tan G. Accelerating massive short reads mapping for next generation sequencing[C]//Proceedings of the 2014 ACM/SIGDA international symposium on Field-programmable gate arrays. ACM, 2014: 246.

[24] Lu H, Tan G, Chen M, et al. Reducing Communication in Parallel Breadth-First Search on Distributed Memory Systems[C]//Computational Science and Engineering (CSE), 2014 IEEE 17th International Conference on. IEEE, 2014: 1261-1268.

[25] Luo Y L, Tan G M. Optimizing stencil code via locality of computation[C]//Proceedings of the 23rd international conference on Parallel architectures and compilation. ACM, 2014: 477-478.

[26] Su Y, Cao Z, Fan Z, et al. Building a large-scale direct network with low-radix routers[C]//Parallel and Distributed Systems (ICPADS), 2014 20th IEEE International Conference on. IEEE, 2014: 368-375.

[27] Fan Z, Cao Z, Su Y, et al. HiNetSim: A parallel simulator for large-scale hierarchical direct networks[C]//IFIP International Conference on Network and Parallel Computing. Springer Berlin Heidelberg, 2014: 120-131.

[28] Cao Z, Chen F, An X, et al. Accelerating synchronization communications for high-density blade enclosure[C]//Proceedings of the 11th ACM Conference on Computing Frontiers. ACM, 2014: 14.

[29] Yan J, Tan G, Sun N. Exploiting fine-grained parallelism in graph traversal algorithms via lock virtualization on multi-core architecture[J]. The Journal of Supercomputing, 2014, 69(3): 1462-1490.

[30] Zang D, Cao Z, Wang Z, et al. Decentralized NIC-Switching Architecture Using SR-IOV PCI Express Network Device[J]. IEEE Micro, 2014, 34(5): 42-50.