DC20

UMD Data Challenge 2020

Data Exploration for a Sustainable Planet

February 22 – 29, 2020


Awards

Grand Prize Award

Team DC20035: Yanzhi Shen and Zhenyang Wang
Mentor: Alex Baker, Booz Allen Hamilton
Dataset: New Pollution, Provided by Booz Allen Hamilton

Best Global Sustainability Project

Team DC20006: Nisha Dayananda, Ankit Dhall, Kankshi Dhar, Monalisa Swami
Mentor: Jamey Hanson, Amazon Web Services
Dataset: Ocean Cleanup, Provided by Booz Allen Hamilton

Most Innovative Project

Team DC20065: Maksim Eren, Nick Solovyev, Aminat Alabi
Mentor: Michael Behrens, Amazon Web Services
Dataset: Ocean Cleanup, Provided by Booz Allen Hamilton

Highest Quality Project

Team DC20036: Wenshan Cao, Yannan Liu, Shaoli Qian
Dataset: Campus Traffic Count Sensor Data, Provided by UMD Department of Transportation Services

Best Expression of Results

Team DC20015: Malav Parekh, Emmet Ryan, Andrew Carroll
Mentor: Jan-Mou Li, Metropolitan Washington Council of Governments
Dataset: Regional Traffic Count, Provided by Metropolitan Washington Council of Governments

Best Team Presentation

Team DC20074: Jen Sun, Alex Wayne
Mentor: Douglas VanDerwerken, USNA
Dataset: HUD’s Programs, Provided by U.S. Department of Housing and Urban Development

Outstanding Undergraduate Project

Team DC20045: Luke Gibson, Joshua May
Mentor: Goeff Burstein, Merkle, Inc.
Dataset: Campus Traffic Count Sensor Data, Provided by UMD Department of Transportation Services

Outstanding Graduate Project

Team DC20052: Bojing Fan, Wanyun Yang, Ziai Wang
Mentor: Jennifer Chaffee, JHU/APL
Dataset: New Pollution, Provided by Booz Allen Hamilton

Outstanding Combined Undergraduate and Graduate Project

Team DC20030: Wei-Hsuan Hung, Ruthwik Kuppachi, Chuyu Zhang, Parth Boricha
Mentor: Alexandra Hansma, Booz Allen Hamilton
Dataset: Ocean Cleanup, Provided by Booz Allen Hamilton

Outstanding UMD Project

Team DC20035: Yanzhi Shen and Zhenyang Wang
Mentor: Alex Baker, Booz Allen Hamilton
Dataset: New Pollution, Provided by Booz Allen Hamilton

Outstanding UMBC Project

Team DC20065: Maksim Eren, Nick Solovyev, Aminat Alabi
Mentor: Michael Behrens, Amazon Web Services
Dataset: Ocean Cleanup, Provided by Booz Allen Hamilton

Outstanding USNA Project

Team DC20080: Ryan Eilers, Jason Henry, Deon Odom, Chesley Krug
Mentor: Will Traves, USNA
Dataset: Regional Traffic Count, Provided by Metropolitan Washington Council of Governments

People’s Choice Award

Team DC20027: Angela Chou, Varun Goenka, Dhruv Popat, Siting Yang
Mentor: Jeff Henrikson, UMD, Atmospheric and Oceanic Science
Dataset: Campus Traffic Count Sensor Data, Provided by UMD Department of Transportation Services


Datasets

  1. Organization: Maryland Small Business Development Center (SBDC)
    • Project Name: Training Program
    • Description: Maryland Small Business Development Center (SBDC) provides free training and individual business consulting to entrepreneurs with existing and pre-venture small businesses. The dataset lists entrepreneurs who participated in at least one group training during the last five years. The dataset contains 13 variables and 25,736 rows. The organization is interested in identifying the historical market of its training attendees and understanding which training topics interest them.
  2. Organization: University of Maryland Alumni Association
    • Project Name: Outreach Program
    • Description: The Alumni Association strives to make data-driven decisions to continuously improve the alumni outreach programs and events for students, alumni, faculty, and staff. Through an extensive coding system, alumni participation is recorded in our database by a three-part, alphanumeric activity code. The Alumni Association would like help in identifying the variables that are correlated with our desired outcomes of higher attendance by first-time attendees and major gift prospect attendees. The dataset contains 635 records across 7 fiscal years.
  3. Organization: Booz Allen Hamilton
    • Project Name: Recycling Diversion
    • Description: In New York City, the Recycling Diversion Rate and the Capture Rate are collected for each zone and district. The capture rate measures how much of the targeted material is actually being recycled, which indicates how successfully such materials are recycled. To understand a district's participation rate, that is, the rate at which the district recycles, we can explore the recycling collection data. This creates opportunities to target education at specific zones and specific types of recycling. We can also use this dataset to predict recycling behavior. This is a time series dataset containing 9 columns and 2,832 rows.
  4. Organization: U.S. Department of Housing and Urban Development
    • Project Name: HUD’s Programs
    • Description: The U.S. Department of Housing and Urban Development (HUD) provides annual rental subsidy to 9.5 million individuals in 4.6 million households. There are multiple subsidy programs, but they serve the same basic function of providing a monthly subsidy to make housing affordable for primarily extremely low-income households. HUD’s subsidy programs can be broadly lumped into project-based housing and tenant-based housing. For programs that can be lumped into project-based housing, including Public Housing (PH), Project Based Vouchers (PBV) and a variety of Multifamily Housing (MH) programs (Project Based Section 8, Section 811, and Section 202), the subsidy is tied to a physical unit. In contrast, the Housing Choice Voucher (HCV) program, considered tenant-based housing, has its subsidy tied to the household. The three datasets provided each contain a random sample of more than 40,000 households in HUD’s PH, MH and HCV programs in 2009, 2014 and 2018, respectively.
  5. Organization: University of Maryland Department of Architecture
    • Project Name: D.C. Building Energy Benchmarks
    • Description: The Clean and Affordable Energy Act of 2008 established that all private buildings over 50,000 gross square feet within the District of Columbia, including multifamily residences, must annually measure and disclose their energy and water consumption to the Department of Energy and Environment (DOEE). Benchmarking is defined as tracking a building’s energy and water use and using a standard metric to compare the building’s performance against past performance and to its peers nationwide. These comparisons have been shown to drive energy efficiency upgrades and increase occupancy rates and property values. This dataset contains more than 3,800 geocoded building records from 2013 to 2017.
  6. Organization: U.S. Department of Housing and Urban Development
    • Project Name: The Low-Income Housing Tax Credit (LIHTC) Program
    • Description: Created by the Tax Reform Act of 1986, the Low-Income Housing Tax Credit program (LIHTC) gives State and local LIHTC-allocating agencies the equivalent of nearly $8 billion in annual budget authority to issue tax credits for the acquisition, rehabilitation, or new construction of rental housing targeted to lower-income households. Although some data about the program have been made available by various sources, HUD’s database is the only complete national source of information on the size, unit mix, and location of individual projects. With the continued support of the national LIHTC database, HUD hopes to enable researchers to learn more about the effects of the tax credit program. This dataset contains 34,234 rows of location and housing attributes of LIHTC properties nationwide. HUD is interested in exploring the spatial distribution of the 100% low-income properties and mixed-income properties and other unique patterns.
  7. Organization: Metropolitan Washington Council of Governments (MWCOG)
    • Project Name: Regional Traffic Count
    • Description: MWCOG has organized and published annual average daily traffic by vehicle classification for the National Capital Region. The suggested dataset for this Challenge represents the average hourly traffic counts between 6 and 9 A.M. on a typical 2017 weekday. As the regional transportation planning organization, we are interested in knowing what can be told about the traffic conditions of the morning rush hours in the Capital Region. Tell a story about the AM rush hour traffic in the region using data analyses and visualizations. The dataset contains 893 rows.
  8. Organization: The U.S. National Cancer Institute (NCI)
    • Project Name: Health Information National Trends Survey (HINTS5, Cycle 2)
    • Description: The U.S. National Cancer Institute (NCI) has been conducting the Health Information National Trends Survey (HINTS) since 2003 to learn about U.S. adults’ cancer-related perceptions and knowledge, their health behaviors, and their health-related information access, needs, seeking, and use. This survey is administered every few years to civilian, non-institutionalized adults in the U.S. The most recent HINTS dataset available is HINTS 5, Cycle 2. This dataset is dated November 2018, but the data was actually collected between January 26 and May 2, 2018. The file includes data from a total of 3,434 completed questionnaires and 70 partially completed questionnaires, for a total of 3,504 questionnaires.
  9. Organization: University of Maryland Department of Transportation Services (DOTS)
    • Project Name: Campus Traffic Count Sensor Data
    • Description: Over the summer of 2019, DOTS installed Numina sensors at 5 intersections on campus to count bikes, pedestrians, cars, buses, and trucks and to identify paths of travel. The purpose of collecting traffic count sensor data is to improve DOTS’ ability to monitor traffic and understand travel behavior by pedestrians, bicyclists, and drivers over time. This simplified dataset consists of traffic count information by mode (walk, bike, car, bus, and truck) at 15-minute intervals over a 2-week period (Oct 28 – Nov 10) for all 10 sensors on campus. The dataset has 7 columns and 13,480 rows.
  10. Organization: Booz Allen Hamilton
    • Project Name: New Pollution
    • Description: Air pollution is a mixture of particles and gases in the atmosphere referred to as pollutants. Pollutants are harmful particles produced as byproducts of human industrial advancement, from sources such as cars, factories, agriculture, and mining operations. Air with unhealthy concentrations of these pollutants causes a variety of diseases, undrinkable water, and an uninhabitable ecosystem. From this dataset, one can create a model to forecast the concentrations of different pollutants. Possible risks and effects of current industrial processes can be assessed using the information from these models. This is a time series dataset identifying the location and time at which certain pollutants are produced. This dataset contains 43,895 rows.
  11. Organization: Booz Allen Hamilton
    • Project Name: Ocean Cleanup
    • Description: We care about keeping our oceans clean because they produce over half the world’s oxygen and absorb 50 times more carbon dioxide than our atmosphere. The ocean transports heat from the equator to the poles, regulating our climate and weather patterns. How many resources do we need to clean up the oceans? Where do we optimize resource allocation? TIDES is a public data system containing the world’s largest ocean trash dataset, all collected by volunteers. The dataset is filtered to U.S. data only and includes variables such as zone, clean-up type, clean-up date, pounds of trash, and all the different types of trash collected. It contains 37,904 rows (records) and 61 columns (variables).


