Awards

SIGBED Research and Career Awards

ACM SIGBED Student Research Competition

This year, for the first time, Embedded Systems Week (ESWEEK) invited participation in the ACM Student Research Competition (SRC). Sponsored by ACM and Microsoft Research, the SRC is a forum for undergraduate and graduate students to share their research results, exchange ideas, and improve their communication skills while competing for prizes.

Thanks to the additional support provided by SIGBED, students accepted to participate in the SRC were entitled to a travel grant (up to $1200) to help cover travel expenses. The top 3 undergraduate and top 3 graduate winners will receive all of the following prizes:

  1. Monetary prizes of $500, $300, and $200, respectively.
  2. An award medal (gold, silver, or bronze) and a one-year complimentary ACM membership with a subscription to ACM’s Digital Library.
  3. The names of the winners and their placement will be posted on the ACM SRC web site (https://src.acm.org/winners/2019).
  4. In addition, the first place winner in each category (undergraduate, graduate) will receive an invitation to participate in the SRC Grand Finals, an on-line round of competitions among the first place winners of individual conference-hosted SRCs.
  5. The top three graduate and undergraduate Grand Finalists will receive an additional $500, $300, and $200, respectively, along with Grand Finalist medals (gold, silver, bronze). Grand Finalists and their advisors will be invited on an all-expenses-paid trip to the Annual ACM Awards Banquet, where they will be recognized for their accomplishments alongside other prestigious ACM award winners, including the winner of the Turing Award.

The SRC is structured in two rounds: a poster session and a presentation session. A panel of judges selects a number of finalists from the poster session, who are invited to the SRC presentation session at ESWEEK 2019 to compete for the prizes. The evaluation focuses on the quality of the visual and oral presentations, the research methods, and the significance of the contribution. You can find more information on the ACM Student Research Competition site (https://src.acm.org/). You can also find all the details of this year’s SRC @ ESWEEK here: https://esweek.org/src

Timeline

Abstract Submission: Sunday, July 21, 2019, 11:59pm UTC-12

Acceptance Notification: Sunday, August 11, 2019

SRC First Round (Posters): Monday, October 14, 2019

SRC Second Round (Presentations): Wednesday, October 16, 2019

Organizers

Renato Mancuso, Boston University (Co-organizer & Judge)

Hyoseung Kim, University of California Riverside (Co-organizer & Judge)

Bryan Ward, MIT Lincoln Laboratory (Judge)

Wanli Chang, University of York (Judge)

Borzoo Bonakdarpour, Iowa State University (Judge)

SRC @ ESWEEK'19 Winners

Graduate Projects

1st Place: Hasindu Gamaarachchi, University of New South Wales, Australia

Real-time, portable and lightweight Nanopore DNA sequence analysis using System-on-Chip

DNA sequence analysis is the key to precision medicine. Over the last two decades, DNA sequencing machines have evolved from >500kg machines to pocket-sized devices such as the 87g Oxford Nanopore MinION. However, the software tools that analyse the terabytes of data produced by sequencing machines still depend on high-performance or cloud computers, which limits the utility of portable sequencers. State-of-the-art DNA analysis software tools are typically designed and developed by biologists with access to near-unlimited computational and memory resources, and are thus considerably unoptimised. These tools are extremely complex: collections of dozens of algorithms and numerous heuristically determined parameters. Previous research on accelerating and optimising such tools has focused on sub-components, which are just a fraction of a tool. Consequently, such accelerations contribute minimally to overall efficiency when integrated into an actual software tool.

For the first time, we optimise a complete Nanopore DNA analysis work-flow (a collection of software tools run sequentially) to execute on portable and lightweight embedded systems. We analyse the work-flow and identify the nature of the workloads (CPU-intensive, memory-intensive, I/O-intensive) in its different portions. Then we systematically re-structure the software and optimise bottlenecks to execute on a lightweight System-on-Chip equipped with an embedded GPU. We synergistically use the characteristics of the biological data, the associated algorithms, and the computer software and hardware architecture for re-structuring and optimising. Major bottlenecks are resolved via CPU optimisations, parallelisation for GPU architectures, GPU optimisations (exploiting data access patterns for better cache usage and memory coalescing), and heterogeneous CPU-GPU work-load balancing. The work-load balancing strategy determines at runtime whether a given piece of DNA data is better suited to the CPU or the GPU and assigns it accordingly. Importantly, our re-structuring and optimisations do not alter the accuracy of the results.
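To make the dispatch idea concrete, here is a minimal Python sketch of a runtime CPU-GPU dispatcher of the kind described above. The length-based heuristic, the threshold value, and all names are illustrative assumptions, not details taken from the abstract.

```python
from queue import Queue

# Hypothetical cutoff: ultra-long reads cause load imbalance across GPU
# threads, so they are better handled by the CPU (an illustrative value,
# not one from the paper).
MAX_GPU_READ_LEN = 100_000

cpu_queue = Queue()  # drained by CPU worker threads
gpu_queue = Queue()  # drained by the GPU batch processor

def dispatch(read_id, raw_signal):
    """Assign one Nanopore read to the CPU or the GPU at runtime."""
    if len(raw_signal) > MAX_GPU_READ_LEN:
        cpu_queue.put((read_id, raw_signal))
    else:
        gpu_queue.put((read_id, raw_signal))
```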

2nd Place: Sumit K. Mandal, Arizona State University

Analytical Performance Models for NoCs with Multiple Priority Traffic Classes

Networks-on-Chip (NoCs) play an important role in the system performance and power consumption of many-core chip-multiprocessors. Therefore, pre-silicon evaluation environments include cycle-accurate NoC simulators. However, cycle-accurate NoC simulators consume a significant portion of full-system simulation time. To reduce simulation time, it is common practice to replace cycle-accurate NoC simulators with equivalent analytical models. However, existing analytical models in the literature cannot model a large class of industrial NoCs, for two reasons. First, they treat the NoC as a queuing network in continuous time. This assumption does not hold in reality, as each NoC transaction happens in discrete clock cycles. Second, industrial NoCs employ priority schedulers and multiple priority classes, whereas existing analytical models assume fair arbitration.

To address these challenges, we first model the NoC as a discrete-time queuing network. Then we propose a systematic approach to construct priority-aware analytical performance models from micro-architecture specifications and NoC traffic. Our approach decomposes the given NoC into individual queues with modified service times to enable accurate and scalable latency computations. Specifically, we introduce two novel transformations, along with an algorithm that iteratively applies these transformations to decompose the queuing network.
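For readers unfamiliar with priority-aware queuing analysis, the sketch below computes per-class mean waiting times with the classical Cobham formula for a continuous-time M/G/1 non-preemptive priority queue. It only illustrates the flavor of such models; the models proposed in this work are discrete-time and built via the decomposition described above.

```python
def priority_wait_times(lams, es, es2):
    """Mean waiting time per class in an M/G/1 non-preemptive priority
    queue (Cobham's formula), classes listed highest priority first.

    lams: arrival rate of each class
    es:   mean service time of each class
    es2:  second moment of the service time of each class
    Assumes total utilization below 1.
    """
    # mean residual service time seen by an arriving packet
    r = 0.5 * sum(l * s2 for l, s2 in zip(lams, es2))
    rhos = [l * s for l, s in zip(lams, es)]
    waits = []
    for i in range(len(lams)):
        higher = sum(rhos[:i])          # load of strictly higher classes
        higher_eq = sum(rhos[:i + 1])   # load including class i
        waits.append(r / ((1.0 - higher) * (1.0 - higher_eq)))
    return waits

# e.g., priority_wait_times([0.2, 0.3], [1.0, 1.0], [2.0, 2.0])
```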

3rd Place: Kasra Moazzemi, UC Irvine

Learning Coordination for Runtime Resource Management of Heterogeneous Systems

Dynamic resource management for many-core systems is becoming more and more challenging. The complexity stems from diverse workload characteristics with conflicting demands on limited shared resources such as bandwidth, computing resources, and power. Resource management strategies for heterogeneous systems therefore need to distribute shared resources fairly while coordinating high-level system goals at runtime in a scalable and efficient manner. To address this complexity, state-of-the-art techniques based on heuristics, control theory, or machine learning have been proposed. To adapt better to dynamic changes at runtime, the resource management of modern embedded systems needs a learning mechanism. In heterogeneous systems with multiple types of processing elements (CPU, GPU, DSP, etc.), the resource manager is responsible for tracking the status and limitations of each unit and coordinating between them.

In this work, we propose a hierarchical management scheme that coordinates the learning mechanisms used to manage the compute units. The learning mechanism for each compute unit can take the form of adaptive control, reinforcement learning, or other adaptive methods. It is important to note that these methods require a large amount of computation and need tailoring to operate on embedded devices at runtime. Modern heterogeneous platforms offer multiple compute units with varying computational power and programming models. This heterogeneity often calls for an individual runtime management scheme for each compute unit (e.g., GPU, DSP) or cluster of compute units (e.g., CPU clusters). Coordination between these mechanisms is needed to ensure the system can achieve the runtime goals dictated by the user or the operating environment.
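As a hedged illustration of the hierarchical coordination idea (not the authors' actual scheme), the sketch below shows a top-level coordinator that redistributes a fixed platform power budget across per-unit managers based on their reported slack; the budget-shifting rule and all names are assumptions.

```python
class UnitManager:
    """Wraps one compute unit's local learning policy (e.g., RL or adaptive control)."""
    def __init__(self, name, budget_w):
        self.name = name
        self.budget_w = budget_w   # power budget in watts
        self.slack = 0.0           # goal minus achieved performance, set by the local policy

def coordinate(managers, step_w=0.5):
    """Top-level coordinator: each epoch, shift a slice of the power budget
    from the unit with the most slack to the unit with the least, keeping
    the platform-wide total constant."""
    donor = max(managers, key=lambda m: m.slack)
    taker = min(managers, key=lambda m: m.slack)
    transfer = min(step_w, donor.budget_w)
    donor.budget_w -= transfer
    taker.budget_w += transfer

# e.g., coordinate([UnitManager("cpu", 4.0), UnitManager("gpu", 3.0), UnitManager("dsp", 1.0)])
```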

Undergraduate Projects

1st Place: Margaret Steiner, George Washington University

Improving the Reliability of Medical Smart Alarms Using Confidence Calibration

Hospitalized infants are commonly monitored with pulse oximetry to detect intermittent hypoxemia events. Yet low peripheral oxygen saturation (SpO2) alarms often do not indicate clinically actionable events and thus contribute to alarm fatigue, a phenomenon in which life-threatening events may be missed due to excessive false alarms. Smart alarms that distinguish between clinically valid and invalid alarms have the potential to mitigate the risks associated with pulse oximetry monitoring and improve outcomes. While smart alarm technologies have been developed, estimating the confidence of alarm suppression remains an open challenge. Post-processing calibration methods correct the probability estimates of a classifier so that the model output is a trustworthy representation of the likelihood of correctness.

The goal of this work is to apply confidence calibration – specifically the temperature scaling method – to the classification of low SpO2 alarms in a smart alarm for intermittent hypoxemia, in order to increase the clinical utility of such a system. Specifically, the contributions are (i) the development of a low SpO2 smart alarm with calibrated confidence and (ii) an evaluation of the maximum and expected calibration errors (MCE/ECE) on real patient data. Towards this aim, the AdaBoost with Reject algorithm was implemented on a dataset of 9,547 low SpO2 alarms from 100 hospitalized children at the Children’s Hospital of Philadelphia. This classifier is designed to optimize performance in determining alarm validity while ensuring that no clinically significant alarms are silenced. The temperature scaling method – an extension of Platt scaling previously shown to be effective in calibration tasks – is implemented via Newton’s method on the class probabilities predicted by the classifier on validation data during patient-wise cross-validation.
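For readers unfamiliar with the technique, below is a minimal sketch of temperature scaling fitted with Newton's method, assuming access to the classifier's validation-set scores in logit form (log-probabilities behave identically); the finite-difference Newton step and all names are illustrative, not taken from the paper.

```python
import numpy as np

def nll(logits, labels, T):
    """Average negative log-likelihood of the temperature-scaled softmax."""
    z = logits / T
    z = z - z.max(axis=1, keepdims=True)  # numerical stability
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

def fit_temperature(logits, labels, T=1.0, iters=50, eps=1e-4):
    """Fit the scalar temperature by Newton's method on the validation NLL,
    using finite-difference derivatives for brevity."""
    for _ in range(iters):
        g = (nll(logits, labels, T + eps) - nll(logits, labels, T - eps)) / (2 * eps)
        h = (nll(logits, labels, T + eps) - 2 * nll(logits, labels, T)
             + nll(logits, labels, T - eps)) / eps ** 2
        step = g / h if h > 0 else g   # fall back to a gradient step
        T = max(T - step, 1e-2)        # keep the temperature positive
        if abs(step) < 1e-6:
            break
    return T  # calibrated probabilities: softmax(logits / T)
```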

2nd Place: Abhay Sheel Anand and Bhawana Chhaglani, Bharati Vidyapeeth’s College of Engineering, India

Gauntlet: A light and compact wearable for capturing precise finger and hand gestures

In this work, we propose the design and development of a lightweight and compact wearable sensing glove that is capable of capturing the movement of every joint of the hand. The glove includes a total of 18 SMD rotary position potentiometers, one at each joint of the hand, and an IMU placed at the back of the hand to capture motions like waving. The glove is intricately designed to ensure unconstrained movements that allow one to make uncontrived hand gestures. The motivation behind this innovation was to develop a precise and low-cost system that enables deaf and mute people to communicate seamlessly with others. The glove has been tested on converting different sign languages (American Sign Language, Pidgin Signed English, and Signed Exact English) into speech, and it outperforms existing systems that employ linear potentiometers, flex sensors, or computer vision in terms of accuracy and cost-effectiveness, as SMD potentiometers are comparatively cheap. Most existing methods for measuring finger movements assume that the DIP (distal interphalangeal) and PIP (proximal interphalangeal) joints of a finger move simultaneously. There are fewer, but important, scenarios where these movements occur separately and need to be analyzed. Our system is capable of identifying and analyzing those cases.

The present system consists of a network of potentiometers, a microcontroller, Bluetooth, and an Android application. Decoding a hand gesture from the raw voltage values of the potentiometers is a challenging task, since multiple gestures must be identified across a wide range of users. Therefore, a machine learning algorithm seemed like a viable approach. There are two ways to recognize the gesture. One is to read the potentiometer values, send them to the smartphone, and identify the gesture there using a classification algorithm. The other is to classify the gesture on the microcontroller itself and send the result to the smartphone. Although in the first approach the smartphone can provide more RAM for running the algorithm, it results in a constantly high battery consumption rate. In the second approach, there are memory issues, as microcontrollers have limited RAM. We therefore use ProtoNN – a k-NN based algorithm specifically made to run on resource-constrained devices. The microcontroller reads the potentiometer values, decodes them, and sends the result to the phone, which converts this data into speech. Currently, the system is successfully trained to identify five gestures. In this poster, we compare the accuracy of our system with existing state-of-the-art techniques and examine its sensitivity in recognizing gestures precisely. The poster also presents design decisions and challenges encountered in the development of the system.
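A minimal sketch of ProtoNN's prediction step, which is the part that would run on the microcontroller: the projection matrix, prototypes, per-prototype class scores, and RBF width are learned offline, and all shapes and names here are illustrative assumptions.

```python
import numpy as np

def protonn_predict(x, W, B, Z, gamma):
    """ProtoNN prediction rule.

    x:     raw sensor vector (e.g., 18 potentiometer readings plus IMU values)
    W:     learned projection matrix, shape (d_proj, d_raw)
    B:     learned prototypes, shape (m, d_proj)
    Z:     per-prototype class score matrix, shape (m, n_classes)
    gamma: learned RBF width
    """
    z = W @ x                                                  # project to low dimension
    sim = np.exp(-(gamma ** 2) * ((B - z) ** 2).sum(axis=1))   # RBF similarity to prototypes
    return (sim @ Z).argmax()                                  # highest-scoring gesture class
```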

3rd Place: Nima Shoghi, Georgia Tech

SLAM Performance on Embedded Robots

We explore whether it is possible to run the prevalent ORB-SLAM2 simultaneous localization and mapping algorithm in real-time on the Raspberry Pi 3B+ for use in embedded robots. We use a modified version of ORB-SLAM2 on the Pi and a laptop to measure the performance and accuracy of the algorithm on the EuRoC MAV dataset. We see similar accuracy between the two machines, but the Pi is about ten times slower. We explore optimizations that can be applied to speed up execution on the Pi. Finally, we conclude that with our optimizations, we can speed up ORB-SLAM2 by about five times with a minor impact on accuracy, allowing us to run ORB-SLAM2 in real-time.