Springer Science and Business Media LLC

  0920-8542

  1573-0484

 

Publisher: Springer Netherlands, SPRINGER

Subject areas:
Information Systems, Hardware and Architecture, Theoretical Computer Science, Software


Featured articles

Macro benchmarking edge devices using enhanced super-resolution generative adversarial networks (ESRGANs)
Volume 79 - Pages 5360-5373 - 2022
Jing-Ru C. Cheng, Corwin Stanford, Steven R. Glandon, Anthony L. Lam, Warren R. Williams
In standard machine learning implementations, training and inference take place on servers located remotely from where data is gathered. With the advent of the Internet of Things (IoT), the groundwork is laid to shift half of that computing burden (inferencing) closer to where data is gathered. This paradigm shift to edge computing can significantly decrease the latency and cost of these tasks. Many small, powerful devices have been developed in recent years with the potential to fulfill that goal. In this paper, we analyze two such devices, the NVIDIA Jetson AGX Xavier Developer Kit and the Microsoft Azure Stack Edge Pro (2 GPUs). In addition, the NVIDIA DGX-1 system containerized in a ruggedized case is also used to run the inference model at the edge. The performance of these devices is compared to that of more common inferencing devices, including a laptop, desktop, and high performance computing (HPC) system. The inferencing model used for testing is the Enhanced Super-Resolution Generative Adversarial Network (ESRGAN), which was developed using techniques borrowed primarily from other GAN designs, most notably SRGANs and Relativistic average GANs (RaGANs), along with some novel techniques. Metrics chosen for benchmarking were inferencing time, GPU power consumption, and GPU temperature. We found that inferencing using ESRGANs was approximately 10 to 20 times slower on the Jetson edge device, but used approximately 100 to 300 times less power, and was approximately 2 times cooler than any of the other devices tested. Inferencing using ESRGANs performed very similarly on the Azure device as on the more traditional methods. The Azure device performed with slightly slower speeds and equivalent temperatures to the other devices, but with slightly less power consumption.
UPSRVNet: Ultralightweight, Privacy preserved, and Secure RFID-based authentication protocol for VIoT Networks
Pages 1-28 - 2023
Rakesh Kumar, Sunil K. Singh, D. K. Lobiyal
Vehicular Internet of Things (VIoT) refers to integrating Internet of Things (IoT) technology into the transportation sector, specifically vehicles. It aims to enhance transportation safety, efficiency, and sustainability by using connected devices and real-time data exchange between vehicles, road infrastructure, and transportation management systems. RFID can provide valuable insights into the functioning and performance of vehicles, enabling the development of intelligent transportation systems and contributing to the growth of the VIoT. Securing essential driving data, tag location, data privacy, and road safety are the primary concerns in the VIoT system. An attacker can perform a variety of attacks on the tag and compromise its privacy through the wireless channel between it and the reader. To ensure secure communication for the various applications of the VIoT system, an RFID-based protocol is proposed, referred to as UPSRVNet (Ultralightweight, Privacy preserved, and Secure RFID-based authentication protocol for VIoT Networks). The proposed protocol guarantees secure authentication in VIoT networks using ultralightweight, privacy-preserving RFID tags by substituting bit formation and right-shift rotation for high-computing operations like hash functions and encryption/decryption algorithms. It also reduces RFID tags' storage and communication overhead and aims to secure the driver's location and personal information and to resist known attacks. According to the experimental results, the proposed UPSRVNet protocol reduced the communication overhead between tag and reader by 50% and the storage cost on the tag by 33.33% compared to existing protocols. As a result, the certification and authentication processes can be performed quickly. The protocol was shown to meet the security requirements by informal analysis using the Scyther tool.
A comprehensive analysis of the proposed protocol reveals its substantial advantages over the existing protocol concerning computation complexity, communication efficacy, and storage expenditure.
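The abstract names right-shift rotation as one of the lightweight operations that replace hashing and encryption on the tag. A minimal Python sketch of a circular right rotation on a fixed-width value follows. The function name, the 96-bit default width, and the example values are assumptions for illustration only; the abstract does not specify how UPSRVNet combines rotation with its other operations.

```python
def rotate_right(value: int, shift: int, width: int = 96) -> int:
    """Circularly rotate a `width`-bit value to the right by `shift` bits.

    The 96-bit default is an assumed tag word size, not taken from the paper.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    mask = (1 << width) - 1          # keeps the result within `width` bits
    value &= mask
    shift %= width                   # rotating by `width` is the identity
    # Bits shifted out on the right re-enter on the left.
    return ((value >> shift) | (value << (width - shift))) & mask


if __name__ == "__main__":
    # 4-bit example: 1011 rotated right by 2 positions.
    print(format(rotate_right(0b1011, 2, width=4), "04b"))
```

Rotation only needs shifts, a mask, and an OR, which is why operations of this kind suit tags that cannot afford hash functions.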
Image-Space Decomposition Algorithms for Sort-First Parallel Volume Rendering of Unstructured Grids
Volume 15 - Pages 51-93 - 2000
Hüseyin Kutluca, Tahsin M. Kurç, Cevdet Aykanat
Twelve adaptive image-space decomposition algorithms are presented for sort-first parallel direct volume rendering (DVR) of unstructured grids on distributed-memory architectures. The algorithms are presented under a novel taxonomy based on the dimension of the screen decomposition, the dimension of the workload arrays used in the decomposition, and the scheme used for workload-array creation and querying the workload of a region. For the 2D decomposition schemes using 2D workload arrays, a novel scheme is proposed to query the exact number of screen-space bounding boxes of the primitives in a screen region in constant time. A probe-based chains-on-chains partitioning algorithm is exploited for load balancing in optimal 1D decomposition and iterative 2D rectilinear decomposition (RD). A new probe-based optimal 2D jagged decomposition (OJD) is proposed which is much faster than the dynamic-programming based OJD scheme proposed in the literature. The summed-area table is successfully exploited to query the workload of a rectangular region in constant time in both OJD and RD schemes for the subdivision of general 2D workload arrays. Two orthogonal recursive bisection (ORB) variants are adapted to relax the straight-line division restriction in conventional ORB through using the medians-of-medians approach on regular mesh and quadtree superimposed on the screen. Two approaches based on the Hilbert space-filling curve and graph-partitioning are also proposed. An efficient primitive classification scheme is proposed for redistribution in 1D, and 2D rectilinear and jagged decompositions. The performance comparison of the decomposition algorithms is modeled by establishing appropriate quality measures for load-balancing, amount of primitive replication and parallel execution time. The experimental results on a Parsytec CC system using a set of benchmark volumetric datasets verify the validity of the proposed performance models. 
The performance evaluation of the decomposition algorithms is also carried out through the sort-first parallelization of an efficient DVR algorithm.
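The constant-time rectangular workload query mentioned in the abstract relies on a summed-area table. The sketch below shows the standard summed-area table technique in Python, assuming a small 2D workload array of per-cell primitive counts. The function names and sample data are hypothetical, and this is not the paper's separate scheme for counting exact bounding boxes.

```python
def build_summed_area_table(workload):
    """Return a (rows+1) x (cols+1) table where sat[r][c] is the sum of
    workload[0:r][0:c]. The extra zero row and column avoid edge checks."""
    rows, cols = len(workload), len(workload[0])
    sat = [[0] * (cols + 1) for _ in range(rows + 1)]
    for r in range(rows):
        row_sum = 0
        for c in range(cols):
            row_sum += workload[r][c]
            sat[r + 1][c + 1] = sat[r][c + 1] + row_sum
    return sat


def region_workload(sat, top, left, bottom, right):
    """Sum of the region rows [top, bottom) and columns [left, right),
    computed with four table lookups regardless of region size."""
    return sat[bottom][right] - sat[top][right] - sat[bottom][left] + sat[top][left]


if __name__ == "__main__":
    # Hypothetical workload array: cell (r, c) holds a primitive count.
    workload = [[1, 2, 3],
                [4, 5, 6],
                [7, 8, 9]]
    sat = build_summed_area_table(workload)
    print(region_workload(sat, 1, 1, 3, 3))  # lower-right 2x2 block
```

Building the table costs one pass over the array; after that, a decomposition algorithm can probe many candidate regions at a fixed cost per probe.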
Nonlinear characterization and complexity analysis of cardiotocographic examinations using entropy measures
Volume 76 - Pages 1305-1320 - 2018
João Alexandre Lobo Marques, Paulo C. Cortez, João P. V. Madeiro, Victor Hugo C. de Albuquerque, Simon James Fong, Fernando S. Schlindwein
The nonlinear analysis of biological time series provides new possibilities to improve computer aided diagnostic systems, traditionally based on linear techniques. The cardiotocography (CTG) examination simultaneously records the fetal heart rate (FHR) and the maternal uterine contractions. This paper first shows that both signals present nonlinear components, based on the surrogate data analysis technique and exploratory data analysis with the return (lag) plot. After that, a nonlinear complexity analysis is proposed considering two databases, intrapartum (CTG-I) and antepartum (CTG-A), with previously identified normal and suspicious/pathological groups. Approximate Entropy (ApEn) and Sample Entropy (SampEn), which are signal complexity measures, are calculated. The results show that low entropy values are found when the whole examination is considered, $\mathrm{ApEn}=0.3244\pm 0.1078$ and $\mathrm{SampEn}=0.2351\pm 0.0758$ (average $\pm$ standard deviation). Moreover, no significant difference was found between the normal ($\mathrm{ApEn}=0.3366\pm 0.1250$ and $\mathrm{SampEn}=0.2532\pm 0.0818$) and suspicious/pathological ($\mathrm{ApEn}=0.3420\pm 0.1220$ and $\mathrm{SampEn}=0.2457\pm 0.0850$) groups for the CTG-A database. For a better analysis, this work proposes a windowed entropy calculation using a 5-min window. The windowed entropies presented higher average values ($\mathrm{ApEn}=0.6505\pm 0.2301$ and $\mathrm{SampEn}=0.5290\pm 0.1188$) for the CTG-A and ($\mathrm{ApEn}=0.5611\pm 0.1970$ and $\mathrm{SampEn}=0.4909\pm 0.1782$) for the CTG-I.
The changes during specific long-term events show that entropy can be considered as a first-level indicator for strong FHR decelerations ($\mathrm{ApEn}=0.1487\pm 0.0341$ and $\mathrm{SampEn}=0.1289\pm 0.0301$), FHR accelerations ($\mathrm{ApEn}=0.1830\pm 0.1078$ and $\mathrm{SampEn}=0.1501\pm 0.0703$) and also for pathological behavior such as sinusoidal FHR ($\mathrm{ApEn}=0.1808\pm 0.0445$ and $\mathrm{SampEn}=0.1621\pm 0.0381$).
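To make the entropy measure concrete, here is a minimal pure-Python sketch of the textbook Sample Entropy definition, plus a windowed variant. The embedding dimension `m`, the absolute tolerance `r`, the non-overlapping windows, and the toy signals are all assumptions; the abstract does not state the parameters or the windowing details the authors used.

```python
import math


def sample_entropy(signal, m=2, r=0.2):
    """Textbook SampEn: -ln(A / B), where B counts pairs of length-m templates
    within Chebyshev distance r, and A counts the same for length m + 1.
    `r` is an absolute tolerance here (often chosen relative to the signal's
    standard deviation in practice)."""
    n = len(signal)
    if n <= m + 1:
        raise ValueError("signal too short for the chosen m")

    def count_matches(length):
        count = 0
        # Use the same n - m template start positions for both lengths.
        for i in range(n - m):
            for j in range(i + 1, n - m):
                dist = max(abs(signal[i + k] - signal[j + k]) for k in range(length))
                if dist <= r:
                    count += 1
        return count

    b = count_matches(m)
    a = count_matches(m + 1)
    if a == 0 or b == 0:
        return math.inf  # undefined ratio: no matching templates found
    return -math.log(a / b)


def windowed_sample_entropy(signal, window_len, m=2, r=0.2):
    """SampEn of consecutive non-overlapping windows (overlap is an assumption)."""
    return [sample_entropy(signal[s:s + window_len], m, r)
            for s in range(0, len(signal) - window_len + 1, window_len)]


if __name__ == "__main__":
    regular = [1, 2] * 10                   # perfectly periodic toy signal
    irregular = [1, 2, 1, 3, 1, 2, 5, 1]    # less predictable toy signal
    print(sample_entropy(regular, m=2, r=0.1))
    print(sample_entropy(irregular, m=1, r=0.5))
    print(windowed_sample_entropy(regular, window_len=10, m=2, r=0.1))
```

A perfectly repeating signal yields an entropy of zero, while less predictable signals yield larger values, which matches the abstract's use of low entropy as an indicator of strongly patterned events. The quadratic loop is fine for a sketch but slow on long recordings.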
HPC node performance and energy modeling with the co-location of applications
Volume 72, Issue 12 - Pages 4771-4809 - 2016
Daniel Dauwe, Eric Jonardi, Ryan D. Friese, Sudeep Pasricha, Anthony A. Maciejewski, David A. Bader, Howard Jay Siegel
Multicore processors have become an integral part of modern large-scale and high-performance parallel and distributed computing systems. Unfortunately, applications co-located on multicore processors can suffer from decreased performance and increased dynamic energy use as a result of interference in shared resources, such as memory. As this interference is difficult to characterize, assumptions about application execution time and energy usage can be misleading in the presence of co-location. Consequently, it is important to accurately characterize the performance and energy usage of applications that execute in a co-located manner on these architectures. This work investigates some of the disadvantages of co-location, and presents a methodology for building models capable of utilizing varying amounts of information about a target application and its co-located applications to make predictions about the target application’s execution time and the system’s energy use under arbitrary co-locations of a wide range of application types. The proposed methodology is validated on three different server class Intel Xeon multicore processors using eleven applications from two scientific benchmark suites. The model’s utility for scheduling is also demonstrated in a simulated large-scale high-performance computing environment through the creation of a co-location aware scheduling heuristic. This heuristic demonstrates that scheduling using information generated with the proposed modeling methodology is capable of making significant improvements over a scheduling heuristic that is oblivious to co-location interference.
CU++: an object oriented framework for computational fluid dynamics applications using graphics processing units
Volume 67 - Pages 47-68 - 2013
Dominic D. J. Chandar, Jayanarayanan Sitaraman, Dimitri Mavriplis
The application of graphics processing units (GPUs) to solve partial differential equations is gaining popularity with the advent of improved computer hardware. Various lower level interfaces exist that allow the user to access GPU specific functions. One such interface is NVIDIA's Compute Unified Device Architecture (CUDA) library. However, porting existing codes to run on the GPU requires the user to write kernels that execute on multiple cores, in the form of Single Instruction Multiple Data (SIMD). In the present work, a higher level framework, termed CU++, has been developed that uses object oriented programming techniques available in C++ such as polymorphism, operator overloading, and template meta programming. Using this approach, CUDA kernels can be generated automatically at compile time. In brief, CU++ allows a code developer with just C/C++ knowledge to write computer programs that will execute on the GPU without any knowledge of specific programming techniques in CUDA. This approach is highly beneficial for Computational Fluid Dynamics (CFD) code development because it reduces the need to create hundreds of GPU kernels for various purposes. In its current form, CU++ provides a framework for parallel array arithmetic, simplified data structures to interface with the GPU, and smart array indexing. An implementation of heterogeneous parallelism, i.e., utilizing multiple GPUs to simultaneously process a partitioned grid system with communication at the interfaces using Message Passing Interface (MPI), has been developed and tested.
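The idea of letting operator overloading turn array arithmetic into a single generated kernel can be illustrated outside C++. The Python sketch below is only an analogy: the overloaded operators build an expression tree, and one fused loop evaluates it on assignment, standing in for one generated kernel. All class and method names are hypothetical and are not CU++'s API. CU++ itself does this with C++ templates at compile time and targets CUDA, neither of which this sketch does.

```python
import operator


class Expr:
    """Base class: arithmetic builds an expression tree instead of computing."""

    def __add__(self, other):
        return BinaryOp(operator.add, self, as_expr(other))

    def __radd__(self, other):
        return BinaryOp(operator.add, as_expr(other), self)

    def __sub__(self, other):
        return BinaryOp(operator.sub, self, as_expr(other))

    def __mul__(self, other):
        return BinaryOp(operator.mul, self, as_expr(other))

    def __rmul__(self, other):
        return BinaryOp(operator.mul, as_expr(other), self)


class Scalar(Expr):
    def __init__(self, value):
        self.value = value

    def at(self, i):
        return self.value  # a scalar broadcasts to every index


class BinaryOp(Expr):
    def __init__(self, op, lhs, rhs):
        self.op, self.lhs, self.rhs = op, lhs, rhs

    def at(self, i):
        return self.op(self.lhs.at(i), self.rhs.at(i))


class Field(Expr):
    """A 1D array of values; stands in for a device array."""

    def __init__(self, values):
        self.values = list(values)

    def at(self, i):
        return self.values[i]

    def assign(self, expr):
        # One fused loop over all indices: the analogue of a single kernel,
        # with no temporary arrays for intermediate sub-expressions.
        self.values = [expr.at(i) for i in range(len(self.values))]


def as_expr(x):
    return x if isinstance(x, Expr) else Scalar(x)


if __name__ == "__main__":
    u = Field([1, 2, 3])
    v = Field([10, 20, 30])
    out = Field([0, 0, 0])
    out.assign(2 * u + v * v - 1)  # written as plain arithmetic
    print(out.values)
```

The user writes ordinary arithmetic and never spells out the loop, which is the convenience the abstract attributes to the framework.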
Energy-efficient polyglot persistence database live migration among heterogeneous clouds
2023
Kiranbir Kaur, Salil Bharany, Sumit Badotra, Karan Aggarwal, Anand Nayyar, Sandeep Sharma
Multithreaded computing in evolutionary design and in artificial life simulations
Volume 73 - Pages 2214-2228 - 2016
Maciej Komosinski, Szymon Ulatowski
This article investigates low-level and high-level multithreaded performance of evolutionary processes that are typically employed in evolutionary design and artificial life. Computations performed in these areas are specific because evaluation of each genotype usually involves time-consuming simulation of virtual environments and physics. Computational experiments have been conducted using the Framsticks simulator running a multithreaded version of a standard evolutionary experiment. Tests carried out on five diverse machines and two operating systems demonstrated how low-level performance depends on the number of physical and logical CPU cores and on the number of threads. Two string implementations have been compared, and their raw performance turned out to fundamentally differ in a multithreading setup. To improve high-level performance of parallel evolutionary algorithms, i.e. the quality of optimized solutions, a new distribution scheme that is especially useful and efficient for complex representations of solutions—the convection distribution—has been introduced. This new distribution scheme has been compared against a random distribution of genotypes among threads that carry out evolutionary processes.