Springer Science and Business Media LLC
Featured scientific publications
Direct private query in location-based services with GPU run time analysis
Springer Science and Business Media LLC - Volume 71 - Pages 537-573 - 2014
Private query in location-based services allows users to request and receive the nearest point of interest (POI) without revealing their location or the object received. However, since the service is customized, it requires user-specific information. Problems arise when a user, due to privacy or security concerns, is unwilling to disclose this information. Previous solutions for hiding it have been found to be deficient and sometimes inefficient. In this paper, we propose a novel idea that partitions objects into neighborhoods, supported by a database design that allows a user to retrieve the exact nearest POI without revealing the user's location or the object retrieved. The paper is organized into two parts. In the first part, we adopt the concept of topological space to generalize object space. To help limit the information disclosed and minimize transmission cost, we create disjoint neighborhoods such that each neighborhood contains no more than one object. We organize the database matrix to align with object locations in the area. For optimization, we introduce the concept of a kernel in the graphics processing unit (GPU), and we then develop a parallel implementation of our algorithm by utilizing the computing power of the streaming multiprocessors of the GPU and the parallel computing platform and programming model of Compute Unified Device Architecture (CUDA). In the second part, we study the serial implementation of our algorithm with respect to execution time and complexity. Our experiment shows a scalable design that is suitable for any population size with minimal impact on user experience. We also study the GPU–CUDA parallel implementation and compare its performance with CPU serial processing. The results show a 23.9× improvement of GPU over CPU. To help determine the optimal size for the parameters in our design or a similar scalable algorithm, we provide an analysis and a model for predicting GPU execution time based on the size of the chosen parameter.
PARCSIM: a parallel computing simulator for scalable software optimization
Springer Science and Business Media LLC - Volume 78 - Pages 17231-17246 - 2022
PARCSIM is a parallel software simulator that allows a user to capture, through a graphical interface, matrix algorithm schemes that solve scientific problems. With this tool, the user can analyse the execution times that would be obtained by using different spatio-temporal mappings of computational tasks onto the available computational units, parallelism parameters and computational libraries. Furthermore, for complex problem models, the self-optimization engine incorporated in this tool analyses the huge tree of possible calculation grouping and mapping strategies in search of the choice that makes the best use of the available hardware resources. This tool also offers polyalgorithmic resolution by automatically making the best decision between different software approaches to solve a given problem on the available hardware system. This work shows the usefulness of this simulator for efficiently solving hierarchical problems constructed from previously modelled subproblems. This task is performed by reusing, in a scalable way, the optimization information of these subproblems to establish the best execution configuration for the composite problem.
Communication optimization for RDMA-based science data transmission tools
Springer Science and Business Media LLC - Volume 72 - Pages 3312-3327 - 2015
Big data has raised new challenges for data communication and transmission capacity. The RDMA (Remote Direct Memory Access) transport protocol can reduce the communication delay of big data through kernel memory bypass and zero-copy technology. This paper introduces RDMA technology for the network performance optimization of BBCP (Babar Copy Program), a big data network file copy tool. In addition, a transmission queue-oriented Tetris Scheduler algorithm, I/O performance optimization and a buffer memory optimization technique are introduced to realize an RDMA-based file transfer tool, OBCP (Optimized Babar Copy Program). The experimental results verify that the optimization method can decrease the file transmission time to one-tenth of that of the original BBCP and improve performance by 50% compared with related works.
Adjusting ECN marking threshold in multi-queue DCNs with deep learning
Springer Science and Business Media LLC - Volume 79 - Pages 5443-5468 - 2022
Explicit Congestion Notification (ECN) is designed for single queues. Today, however, data center networks (DCNs) need multiple queues on each switch port. If some of the switch queues in multi-queue scenarios exceed the ECN marking threshold, all packets on the same port can receive the ECN mark. To solve this problem, we propose mapping-ECN as a systematic answer to the wrong marking problem. First, we differentiate mice and elephant flows with a learning algorithm. Then, we prioritize mice flows while keeping in mind the deadlines of other flows so as not to sacrifice them. Second, if a packet is marked, it needs the privilege of using a faster path than other packets for early notification of network status. This gives a complete picture of the instant requests from all senders. In the worst case, if there is no capacity in the buffer to transmit the packets that exceed the buffer threshold, mapping-ECN uses Cut Payload (CP), where CP drops the payloads of the packets, rather than the metadata, when a queue reaches the threshold. Consequently, just one bit that carries the information of the packet is transmitted. The sender therefore retransmits that packet immediately, without waiting for a time-out as in TCP. This retransmission can arrive within a millisecond, giving an extremely low-latency network. Last but not least, mapping-ECN explores different kinds of neural network techniques to avoid mis-marking in the output port buffer. If any packet is marked within the queue buffer, it is not considered again for marking within the output port buffer. Mapping-ECN improves the overall Flow-Completion Time (FCT) for short flows by around 7%, the 99th percentile by around 52%, and FCT for short flows by around 8% in comparison with MQ-ECN. Moreover, compared with MQ-ECN, mapping-ECN improves the FCT for large flows, cache flows and mice (web search) flows by 4%, 15% and 6%, respectively. This improvement is also evident in comparison with DemePro and Priority-ECN.
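The wrong marking problem described in this abstract can be illustrated with a small Python sketch. This is not mapping-ECN itself; it only contrasts a simplified port-level rule (mark everything once any queue crosses the threshold) with a per-queue rule. The function names, queue names and numbers are hypothetical.

```python
from typing import Dict, List


def port_level_marks(queue_lengths: Dict[str, int], threshold: int) -> Dict[str, bool]:
    """Simplified port-level rule: if any queue exceeds the threshold,
    packets from every queue on the port are marked."""
    congested = any(length > threshold for length in queue_lengths.values())
    return {name: congested for name in queue_lengths}


def per_queue_marks(queue_lengths: Dict[str, int], threshold: int) -> Dict[str, bool]:
    """Per-queue rule: a queue is marked only if it exceeds the threshold itself."""
    return {name: length > threshold for name, length in queue_lengths.items()}


def wrongly_marked(queue_lengths: Dict[str, int], threshold: int) -> List[str]:
    """Queues marked by the port-level rule although they are below the threshold."""
    port = port_level_marks(queue_lengths, threshold)
    own = per_queue_marks(queue_lengths, threshold)
    return sorted(name for name in queue_lengths if port[name] and not own[name])


if __name__ == "__main__":
    # Hypothetical occupancy in packets for two queues sharing one port.
    queues = {"mice": 5, "elephant": 80}
    print(wrongly_marked(queues, threshold=50))
```

In this toy setting the short "mice" queue is marked only because the "elephant" queue is congested, which is the kind of unfairness a multi-queue marking scheme tries to avoid.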
Improving the availability of P2P-based network management systems by provisioning fault tolerance property
Springer Science and Business Media LLC - Volume 61 - Pages 912-934 - 2011
In this paper we propose a 3-tier hierarchical architecture based on the peer-to-peer model for network management purposes. The main focus of the proposed architecture is provisioning the fault tolerance property, which in turn increases the availability of the Network Management System (NMS). In each tier of the architecture we use redundancy to achieve this goal. However, we do not use redundant peers, so no peer redundancy is imposed on the system. Instead, we use some selected peers in several roles and therefore add only some software redundancy, which is easily tolerated by the advanced processors of the NMS’s peers. Due to the hierarchical structure, failure of nodes in each tier may affect the NMS’s availability differently. Therefore, by means of an extensive simulation study, we examined how the failure of peers that play different roles in the architecture affects the availability of the system. The results show that the proposed architecture offers higher availability than previously proposed peer-to-peer NMSs. It also offers lower vulnerability to node failures when nodes are repairable.
Toward a general framework for jointly processor-workload empirical modeling
Springer Science and Business Media LLC - Volume 77 - Pages 5319-5353 - 2020
The complexity of state-of-the-art processor architectures and their consequent vast design spaces have made it difficult and time-consuming to explore the best configuration for them. Design space exploration (DSE) refers to the systematic analysis and pruning of unwanted design points based on parameters of interest. DSE requires analysis and estimation of the performance criteria of design points. A more accurate estimation produces a more efficient target design. A typical estimation method is a machine learning approach based on statistical inference, also known as empirical modeling, which requires only a limited number of simulations. An empirical model finds the optima much faster than cycle-accurate simulations and is much more accurate than analytical models. For that purpose, our paper proposes a general methodology and a framework to find an appropriate and most accurate empirical model to estimate the performance of general-purpose or embedded multiprocessors running multithreaded workloads. This framework consists of three main steps: (1) workload characterization and clustering, (2) finding the optimal model, and (3) estimating the performance of a new workload outside the training set. These optimal performance prediction models could be utilized in the process of exploring the architectural design space. An experimental case is also tested using this framework for feasibility purposes. Validation experiments show MAEs less than 10% for this case.
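As a rough illustration of the validation metric mentioned above, the following Python sketch computes a mean absolute error expressed as a percentage of the measured value. The abstract does not define its MAE, so the relative form used here is an assumption, and the function name and sample numbers are hypothetical.

```python
from typing import Sequence


def mean_absolute_error_percent(predicted: Sequence[float], measured: Sequence[float]) -> float:
    """Mean of |predicted - measured| / |measured|, as a percentage.

    Assumption: the error is taken relative to the measured value; the
    source abstract only reports "MAEs less than 10%".
    """
    if len(predicted) != len(measured) or not measured:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs(p - m) / abs(m) for p, m in zip(predicted, measured))
    return 100.0 * total / len(measured)


if __name__ == "__main__":
    # Hypothetical model predictions versus simulator measurements.
    print(mean_absolute_error_percent([110.0, 90.0], [100.0, 100.0]))
```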
Dynamic hybrid replication effectively combining tree and grid topology
Springer Science and Business Media LLC - Volume 59 - Pages 1289-1311 - 2010
Effective data management is an important issue in large-scale distributed environments such as distributed DBMSs, Peer-to-Peer (P2P) systems, data grids, and the World Wide Web (WWW). It can be achieved by using a replication protocol, which efficiently decreases the communication cost and increases the data availability. The Tree Quorum protocol is one of the representative replication protocols; it allows low read cost in the best case, but it has drawbacks: the number of replicas grows rapidly as the level increases, and the root replica is a bottleneck. The Grid protocol requires a fixed operation cost regardless of the failure condition. In this paper we propose a new replication protocol called the Dynamic Hybrid protocol, which efficiently improves on the existing protocols. The proposed protocol effectively combines the grid and tree structures so that the overall topology can be flexibly adjusted using three configuration parameters: tree height, number of descendants and grid depth. For high read availability, the height of the tree and the number of descendants are decreased and the depth of the grid is increased. For high write availability, the height of the tree and the depth of the grid are decreased, while the number of descendants is increased. We present an analytical model of read/write availability and the average number of nodes accessed for each operation. We also employ computer simulation to estimate the throughput and communication overhead. The proposed protocol always allows much smaller communication and operation cost than earlier protocols.
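For context on the kind of availability model the abstract refers to, here is a minimal Python sketch for the classical Grid protocol only, not for the proposed Dynamic Hybrid protocol. It assumes the usual quorum rule (a read needs one live replica in every column; a write needs that plus one fully live column) and independent replica failures with a common availability `p`. The function names are hypothetical.

```python
def grid_read_availability(rows: int, cols: int, p: float) -> float:
    """Probability that every column has at least one live replica."""
    column_has_live_replica = 1.0 - (1.0 - p) ** rows
    return column_has_live_replica ** cols


def grid_write_availability(rows: int, cols: int, p: float) -> float:
    """Probability that every column has a live replica AND at least one
    column is fully live."""
    read = grid_read_availability(rows, cols, p)
    # Every column has a live replica, yet no column is fully live.
    covered_but_no_full_column = (1.0 - (1.0 - p) ** rows - p ** rows) ** cols
    return read - covered_but_no_full_column


if __name__ == "__main__":
    # Hypothetical 2 x 2 grid of replicas, each available half of the time.
    print(grid_read_availability(2, 2, 0.5))
    print(grid_write_availability(2, 2, 0.5))
```

Under these assumptions, adding rows raises read availability while making a fully live column less likely, which is the read/write trade-off that tunable topologies aim to balance.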
Implementation of scalable bidomain-based 3D cardiac simulations on a graphics processing unit cluster
Springer Science and Business Media LLC - Volume 75 Issue 8 - Pages 5475-5506 - 2019
Sign prediction in sparse social networks using clustering and collaborative filtering
Springer Science and Business Media LLC - Volume 78 - Pages 596-615 - 2021
Today, social networks have created a wide variety of relationships between users. Friendships on Facebook and trust in the Epinions network are examples of these relationships. Most social media research has focused on positive interpersonal relationships, such as friendships. However, in many real-world applications, there are also networks of negative relationships in which communication between users is distrustful or hostile in nature. Such networks are called signed networks. In this work, sign prediction is made based on existing links between nodes. However, in real signed networks, links between nodes are usually sparse and sometimes absent. Therefore, existing methods are not appropriate to address the challenges of accurate sign prediction. To address the sparsity problem, this work proposes a method to predict the sign of positive and negative links based on clustering and collaborative filtering methods. Network clustering is done in such a way that the number of negative links between the clusters and the number of positive links within the clusters are as large as possible. As a result, the clusters are as close as possible to social balance. The main contribution of this work is using clustering and collaborative filtering methods, as well as proposing a new similarity criterion, to overcome the data sparseness problem and predict the unknown sign of links. Evaluations on the Epinions network have shown that the prediction accuracy of the proposed method has improved by 8% compared to previous studies.
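The clustering objective stated in this abstract (many positive links inside clusters, many negative links between clusters) can be sketched in Python as a simple count. This is only an illustration of the objective, not the paper's clustering or collaborative filtering method; the function name, edge format and sample data are hypothetical.

```python
from typing import Dict, Hashable, Iterable, Tuple

Edge = Tuple[Hashable, Hashable, int]  # (node_u, node_v, sign), sign is +1 or -1


def balance_score(edges: Iterable[Edge], cluster_of: Dict[Hashable, int]) -> int:
    """Count links consistent with the clustering: positive links whose
    endpoints share a cluster plus negative links whose endpoints do not."""
    score = 0
    for u, v, sign in edges:
        same_cluster = cluster_of[u] == cluster_of[v]
        if (sign > 0 and same_cluster) or (sign < 0 and not same_cluster):
            score += 1
    return score


if __name__ == "__main__":
    # Hypothetical signed network with three users and two clusters.
    edges = [("a", "b", 1), ("b", "c", -1), ("a", "c", 1)]
    clusters = {"a": 0, "b": 0, "c": 1}
    print(balance_score(edges, clusters))
```

A clustering procedure in this spirit would search for the assignment that maximizes such a score; how the paper actually performs that search is not stated in the abstract.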
Dense Mesh RCNN: assessment of human skin burn and burn depth severity
Springer Science and Business Media LLC - 2024
Total: 4,547