Datasets

1. Organization: Montgomery County Commuter Services (MCCS)

    • Project Name: Transportation Management
    • Description: In Montgomery County, Transportation Demand Management (TDM) programs have focused on major activity centers where high-density commercial and residential land uses are located, and where traffic congestion has been most significant. TDM programs are used to reduce traffic congestion, improve air quality, and promote sustainable land use and transportation patterns. The County hopes to understand how its efforts to reduce transportation impacts, using TDM programs to achieve a certain percentage of Non-Auto Driver Mode Share (NADMS), translate into savings in Vehicle Miles of Travel (VMT). The dataset contains 350 variables representing survey questions and 11,400 rows. There are seven years of survey data to choose from for comparison.

Team Projects 

Team DC21026: Khawaja Adeel Ahmad, Melanie Veliz, Franklin Dasho, Michael Prebble | Data set 1 – Data Challenge | https://github.com/adeelahmad1998/Data-set-1—Data-Challenge-.git

Team DC21047: Nathaniel Schreiner, Gregory Smith, Moses Tauteoli | The Hitch | https://www.github.com/gregbsmith/DC21047

Team DC21052: Cecilia Ballesteros, Daniel Bartosik, Sean Doherty | DC21052-Lvl1-Analysis | https://github.com/m241494/DC21052-Lvl1-Analysis



2. Organization: UMD National Center for Smart Growth (NCSG)

      • Project Name: Behavioral Changes during COVID-19
      • Description: Researchers from UMD’s National Center for Smart Growth (NCSG) designed a survey to investigate the influence of the pandemic and statewide stay-at-home orders on the way people travel to meet their essential needs, do their routine daily exercises, and get to work if employed in an essential service sector. Monitoring and analyzing the changes in travel behavior and the level of physical activity would be beneficial in many ways: it would support a more efficient and comprehensive analysis of the various impacts the current situation has on transportation needs and on physical and mental health, and it would help us better plan for and manage future situations. The dataset can be used to analyze the influence of physical activity on physical and mental health, the influence of socio-demographic characteristics on general behavior changes, and other topics of interest. Between April 10th, 2020 and July 8th, 2020, 587 responses were collected from all around the country, while the final dataset contains 564 rows with 29 variables representing the 26 survey questions.

Team Projects 

Award Winner: Outstanding Undergraduate Project
Team DC21001: Sakib Sarwar, Rachel Macairan, Eder Ramirez, Grant Buttrey | Behavior Changed During CVID-19 | https://github.com/sakib-sarwar/Team1DataChallenge.git

Team DC21004: Believe Aklaku, Li-Chih Wang, Stephanie Umutoni | Behavior Changes during COVID19 lockdowns | https://github.com/sumutoni/DC2021_Team21004

Team DC21005: Nancy Nguyen, Anna Nguyen, Linda Quach | COVID-19 Behavioral Changes | https://github.com/nnguyen-7/DC21-Team5

Award Winner: Highest Quality Project
Team DC21006: Adam Levav, Junrong Liu, Ting-Yu Liu, Walesia Robinson II | Applying an Intersectional Lens on Behavioral Changes During COVID-19 | https://github.com/Aurathic/UMD-Data-Challenge-2021

Team DC21009: Marie Brodsky, Liron Karpati, Joey Kim, Amanda Liu | Pandemic Parenting and Lifestyle Changes | https://github.com/mariebrodsky/data-challenge

Team DC21014: Annie Hu, Hsin-Pei Yang, Farheen Nabi | COVID-19 BEHAVIORAL CHANGES | https://github.com/ahu2020-hu/DC21014

Team DC21025: Nathan Bezualem, James Pham, Alex Ortunio | Behavioral changes due to COVID19 | https://github.com/vladodio/dataChal

Team DC21027: Christopher Mantzouranis, Olivier Amoussou, Kathya Soto, Sebastian Polanco | Behavioral Change due to COVID-19  | https://github.com/CMantzouranis/Team27_GitHubRepo

Team DC21028: Kiara Raab, Chibuikem Oparaoji | Exercise Analysis on the Pandemic | https://github.com/kiaraab/DC21

Award Winner: Best Expression of How to Cultivate Better Living
Team DC21030: Rohan Cowlagi, Ruthwik Kuppachi, Iskander Lou, Jacquelyn Smith | Life In Lockdown | https://github.com/rohanc18/UMD-Data-Challenge-21-Team-30

Team DC21038: Rachel Rowe, Frederick Sell | Behavior Changes During COVID-19: Who People Are and How That Affects Their Activity Level | https://github.com/rachelrowe898/dc21_team38

Team DC21040: Neviya Prakash, Whilmina Dsouza | Covid- 19 Behavior Changes | https://github.com/WhilminaD/DataChallenge2021_BehaviorChangesCovid2019



3. Organization: United States Department of Agriculture (USDA)

    • Project Name: Packaged Meals
    • Description: Over the past several decades, Americans have grown to rely on the convenience of foods prepared outside of the home. Since 2018, USDA has released a branded food product database (BFPD), which is the result of a Public-Private Partnership whose goal is to enhance public health and the sharing of open data by complementing USDA Food Composition Databases with nutrient composition of branded foods and private label data provided by the food industry. A subset of data was prepared from BFPD to include categories that relate to packaged meals. The Department of Agriculture would like help in identifying the most popular ingredients and the combinations of those ingredients. Agencies can use this information for nutrition stability of these products while manufacturers can develop plans for future production. There are 13 variables and 4,438 rows in the dataset.
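
As a rough illustration of the ingredient question above, the following sketch counts how many products contain each ingredient and each pair of ingredients. The sample strings and the comma-separated ingredient format are assumptions made for illustration; the actual column names and formatting of the USDA subset are not described here.

```python
from collections import Counter
from itertools import combinations


def parse_ingredients(text):
    """Split a comma-separated ingredient string into sorted, unique, lowercase names."""
    return sorted({part.strip().lower() for part in text.split(",") if part.strip()})


def count_ingredients_and_pairs(ingredient_lists):
    """Count the products containing each ingredient and each ingredient pair."""
    singles = Counter()
    pairs = Counter()
    for text in ingredient_lists:
        items = parse_ingredients(text)
        singles.update(items)
        # Sorting in parse_ingredients keeps each pair in a consistent order.
        pairs.update(combinations(items, 2))
    return singles, pairs


# Hypothetical sample rows, not taken from the USDA dataset.
sample = [
    "Water, Salt, Wheat Flour",
    "Salt, Water, Tomato Paste",
    "Wheat Flour, Salt",
]
singles, pairs = count_ingredients_and_pairs(sample)
print(singles.most_common(3))
print(pairs.most_common(3))
```

Counting each ingredient once per product (via the set) measures how widespread an ingredient is rather than how often it is repeated on a single label; real ingredient lists with nested parentheses would need a more careful parser.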

Team Projects 

Team DC21007: James Wang, Amol Agrawal | Analysis of the USDA Packaged Meals Dataset | https://github.com/Akameki/UMD-DC21

Team DC21016: Jonathan Chen, Samuel Shen, Jonathan Chen | Packaged Food: What is it? | https://github.com/jonathanchen985/UMD-Data-Challenge-Team-16.git

Award Winner: Grand Prize, Best Overall Project
Team DC21021: Allison Cahanin, Katherine Toren | Improving Quality of Life through Nutrition | https://github.com/allisoncahanin/packaged-meals-DC-21

Team DC21022: Abel Tilahun, Josue Mejia, Lidya Admasu | USDA Packaged Meal | https://github.com/atilahu6/USDA-package-meal—Team22

Team DC21033: Apurva Prakash Dixit, Stevan Sunny Thomas | USDA Packaged Meal | https://github.com/purpleblack7/UMDDC21

Team DC21056: Zachary Bell, Jen Sun | Add it to Our Grocery List! | https://github.com/acsociety/data-challenge-2021.git

Team DC21067: Eashwar Sathyamurthy | Packaged Foods Data Analysis | https://github.com/Eashwar-S/DC21-Data-Challenge



4. Organization: Social Data Science Center (SoDA)

    • Project Name: COVID-19 Global Symptoms Tracker
    • Description: The University of Maryland’s Joint Program in Survey Methodology and Carnegie Mellon University’s Delphi Research Group collaborated with Facebook to invite people to participate in surveys that ask about how they are feeling, including any symptoms they or members of their household have experienced and their risk factors for contracting COVID-19. The surveys are designed to provide valuable information to help monitor and forecast how COVID-19 may be spreading. The organization is interested in learning what insights policymakers can use to determine where to send resources and what patterns are present in changing attitudes toward COVID and physical distancing. All datasets include 20 variables, while row counts vary from 27 to 65,000 depending on month and area. Surveys range from April to November of 2020 and are separated by country and region.

Team Projects 

Award Winner: Best Team Presentation
Team DC21003: Jaganathan Velraj, Kaveh Vakili, Fekedeselassie Kinfu | DC2021 Covid Tracker | https://github.com/jv232/DC2021_CovidTracker

Team DC21013: Marilyn Pothen, Chika Chuku, Yi-hsuan Chen, Rankin D’Souza | COVID-19 Indicators Tracker | https://github.com/tgaidis/UMDdataChallenge21

Award Winner: Best Expression of Results
Team DC21042: Theodore Gaidis, Gabriel Sestieri, Manar Al-badarneh, Brendan Goodhue | COVID-19 Indicator Analysis | https://github.com/tgaidis/UMDdataChallenge21

Team DC21057: Eduardo Gomez, Paul Rahner, Jenny Luo, Isaac Cho | The Affects of Socioeconomic Factors and Organizational Trust on Vaccinations | https://github.com/eduardogomez12/UMD_DC_DC21057

Team DC21079: Mark Jung | Insights from COVID-19 US Symptoms Data | https://github.com/markojungo/DC21



5. Organization: UMD Department of Transportation Services (DOTS)

      • Project Name: VeoRide E-scooter Transportation
      • Description: The company that provides the e-scooter service in the UMD area, VeoRide Inc., is a mobility service provider that manages its micromobility fleet through the use of the Mobility Data Standard (MDS). Data generated from individual e-scooters and stored in VeoRide’s computers contain the information necessary for this study. The dataset comprises 62 days (October 2020 and October 2019) of historical data, including ride counts, event times, trips, routes, and status changes. The Department of Transportation Services would like to know how the publicly available e-scooter service that VeoRide Inc. provides on campus supplements the campus transportation network. Within the timeframe of the data, there are 14 variables and 40,325 rows of recorded rides.

Team Projects 

Team DC21010: Ashish Bachavala, Matt Braun | Veo Revenue and Network Efficiency Analysis | https://github.com/ashish0810/dc21010

Team DC21012: Harsh Pundir, Aayush Shah, Aseem Baji, Navina Kaur Sethi | Data Challenge – VeO Ride Inc. | https://github.com/HARSHPUNDIR/Data-Challenge-2021-DOTS

Award Winner: Best Team Presentation
Team DC21015: Claire Mytelka, Zachary Dorris, Bryan Rezende, Ethan Levy | VeoRide E-Scooter Usage within the UMD Transportation System | https://github.com/cmytelka/Team15-DC21

Team DC21018: Shanil Kothari, Siddhesh Gupta | Data_Challenge_21 | https://github.com/ThatGamingFella/data_challenge_21

Award Winner: Most Innovative Project
Team DC21029: William Grunow, Savia Gordon, Irfaan Jamarussadiq, Tanishq Kaushik | How to increase the adoption of e-scooters in College Park | https://github.com/EntropicEffect/UMD_DC_2021_Team29_Presentation

Team DC21031: Miranda Clay, Chengzi Tian | Voeride Trends: 2019 vs. 2020 | https://github.com/mclay11/DC21-Team31

Award Winner: Outstanding Combined Undergraduate and Graduate Project
Team DC21035: James Boggs, Madeline Raith, Ziqin Ni | Veoride E-Scooter at MD College Park | https://github.com/jboggs1/DataChallenge2021

Team DC21036: Arfa Sheikh, Naila Naeem, Ehaab Basil | MD E-Scooter Analysis | https://github.com/ehaabbasil/TEAM36-Data-Challenge-E-Scooter-lvl_3

Team DC21055: Elana Kozak, Michael White, Jennifer Jung | Electronic Scooters on Campus: How They Are Used | https://github.com/ekozak7/DC21055

Team DC21058: Jeff Peters | Student Transportation in the COVID World | https://github.com/Jeff-Peters2158/DC21058

Team DC21065: Abhinav Modugula | Fast, Convenient, and Safe: An Analysis of E-Scoters Rides at UMD | https://github.com/abhinavmodugula/E-Scooter-Analysis

Team DC21068: Allison Lu, Anne Zappas | VeoRide x UMD | https://github.com/allisonluu/DC21-Team68



6. Organization: The U.S. National Cancer Institute (NCI)

    • Project Name: Health Information National Trends Survey (HINTS 5, Cycle 3)
    • Description: The U.S. National Cancer Institute (NCI) has been conducting the Health Information National Trends Survey (HINTS) since 2003 to learn about U.S. adults’ cancer-related perceptions and knowledge, their health behaviors, and their health-related information access, needs, seeking, and use. This dataset was released in January 2021 and contains data that was collected between February and June 2020. NCI is looking for analysis of respondents’ internet use, use of social networks, knowledge of clinical trials, political viewpoints, and other factors compared to feelings of frustration, anxiety/depression, or their confidence in their ability to take good care of their health. The dataset includes data from a total of 3,792 completed questionnaires and 73 partially completed questionnaires, for a total of 3,865 questionnaires (rows) and 438 variables of interest.

Team Projects 

Team DC21002: Emmet Ryan, Ryan Pindale, Michael Zhang, Ayman Fatima | Chronic Disease and Healthcare | https://github.com/DC21002/ChronicDisease

Team DC21051: MIDN 2/C Chase Lee, MIDN 2/C Brigitta Szepesi | Identifying Key Factors in Determining Opportunity Zones | https://github.com/m223804/umd_datachallenge_team51



7. Organization: United States Department of Housing and Urban Development (HUD)

    • Project Name: Opportunity Zones
    • Description: Opportunity Zones are a place-based incentive that was created by the Tax Cuts and Jobs Act of 2017. This incentive allows investors to allocate unrealized capital gains into Qualified Opportunity Funds to invest in Opportunity Zones. Governors of each state were able to propose up to 25% of low-income census tracts to be Opportunity Zones. States were also eligible to nominate contiguous census tracts adjacent to low-income census tracts if the adjacent census tract had a median family income within 125% of the eligible low-income census tract and the eligible low-income census tract was selected. Students are expected to be able to link/join/merge other datasets to the Opportunity Zone Eligible Census Tracts dataset and identify any trends using Census and/or American Community Survey data. The Opportunity Zone dataset contains 17 variables and 72,882 rows. There are 3 American Community Surveys, each with 73,764 rows and between 127 and 248 variables.
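
The link/join/merge step and the contiguous-tract income rule described above can be sketched roughly as follows. The field names (`geoid`, `median_family_income`, `designated`) and the sample records are hypothetical, and reading "within 125%" as "does not exceed 1.25 times the low-income tract's median family income" is an assumption.

```python
def left_join(tracts, acs_rows, key="geoid"):
    """Left-join ACS attributes onto tract records using a shared key."""
    acs_by_key = {row[key]: row for row in acs_rows}
    joined = []
    for tract in tracts:
        merged = dict(tract)
        # Tracts without a matching ACS row keep only their original fields.
        merged.update(acs_by_key.get(tract[key], {}))
        joined.append(merged)
    return joined


def contiguous_tract_qualifies(tract_income, low_income_tract_income,
                               low_income_tract_selected):
    """Income test for a contiguous tract (assumed reading of the 125% rule)."""
    return low_income_tract_selected and tract_income <= 1.25 * low_income_tract_income


# Hypothetical records; field names and values are illustrative only.
oz_tracts = [
    {"geoid": "T001", "designated": True},
    {"geoid": "T002", "designated": False},
]
acs = [
    {"geoid": "T001", "median_family_income": 40000},
]
joined = left_join(oz_tracts, acs)
print(joined)
print(contiguous_tract_qualifies(48000, 40000, True))
```

A left join keeps every tract even when a survey has no matching row, which makes missing matches visible instead of silently dropping tracts; this matters here because the tract and survey files have different row counts.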

Team Projects 

Award Winner: Outstanding Graduate Project
Team DC21061: James Maslek | Machine Learning approach to Opportunity Zone Designation | https://github.com/jmaslek/UMD_DataChallenge



Dataset Levels


Level 1: Participants with little to no knowledge in data science. The problem statement is straightforward about what the final product may look like. The dataset contains enough information to answer the questions in the problem statement. Start from the basics and create an interesting story.


Level 2: Participants with basic data analysis knowledge. The problem statement is open-ended yet straightforward. The dataset has a standard structure suitable for beginners. Creative and interdisciplinary solutions are welcomed.


Level 3: Participants with some data analysis background. The problem statement is open-ended about what the final product may look like. The dataset may contain many variables of interest. Analyses from different angles by various techniques are encouraged.


Level 4: Participants with advanced data analysis skills. The problem statement is open-ended and requires multitudes of analytical perspectives and visualizations. Statistical modeling is highly recommended.


Level 5: Participants with advanced data analysis knowledge and skills. The problem statement includes requirements of modeling. The dataset has a complex structure, numerous variables of interest, and spatial-temporal dimensions.

