• RESUME

  • Education

    MSc. in Data Science ·

    Columbia University

    2015 - 2016

    New York, USA


    BSc. in Applied Mathematics ·

    ITAM

    2010 - 2012

    Mexico City, Mexico


  • Work Experience

    Data Scientist ·

    Sinnia

    2012 - 2015

    Mexico City, Mexico


    Data Science for Social Good Fellow ·

    University of Washington

    Summer 2016


  • Courses and Certifications

  • Key Courses

    Columbia University

    • Neural Networks and Deep Learning (2016)
    • Machine Learning for Data Science (2016)
    • Computer Systems for Data Science (2016)
    • Exploratory Data Analysis and Visualization (2016)
    • Social Network Analysis (2016)
    • GIS Spatial Data Analysis (2015)
    • Q&A for IBM Watson (2015)

    ITAM

    • Machine Learning (2012)
    • Time Series Analysis (2012)
    • Survival Analysis (2012)
    • Numerical Optimization (2012)
    • Bayesian Statistics (2012)
    • Simulation (2012)
    • Statistical Learning (2011)
  • Certifications

    Data Science Specialization ·

    Johns Hopkins University via Coursera

    2015

    Machine Learning Summer School ·

    Carnegie Mellon University

    2014

    Tackling the Challenges of Big Data ·

    MITx

    2014
  • Projects

  • MSc. in Data Science Capstone Project at Columbia University

    Recommendation systems depend heavily on user reviews and other proprietary data to personalize their suggestions for news, movies and music. However, user data is usually not publicly available. Drawing on 30,000 movie descriptions stored in the Internet Movie Database (IMDb), we built two recommendation models: one based on movie plot, and the other on movie features such as actors, genre and the size of the production staff. In the accompanying app we built, a user picks a movie and receives two sets of recommendations.
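
    The description above does not say how plot similarity was computed. As a minimal sketch, assuming a TF-IDF representation of each plot and cosine similarity between plots (a common choice, not necessarily the one used in the capstone), a plot-based recommender could look like this. The function names, movie titles and plot texts are all made up for illustration.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict of term -> weight) per document."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: the number of plots in which each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({
            # Smoothed inverse document frequency (assumed weighting scheme).
            term: (count / len(tokens))
            * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in tf.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b[term] for term, weight in a.items() if term in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend(titles, plots, query_title, k=2):
    """Return the k titles whose plots are most similar to the query's plot."""
    vectors = tfidf_vectors(plots)
    query = titles.index(query_title)
    scored = [
        (cosine(vectors[query], vectors[i]), titles[i])
        for i in range(len(titles))
        if i != query
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [title for _, title in scored[:k]]


# Hypothetical toy catalogue, not real IMDb data.
titles = ["Space Quest", "Star Voyage", "Love in Paris", "Paris Nights"]
plots = [
    "astronauts travel through space to save the earth",
    "a crew of astronauts explores deep space",
    "two strangers fall in love in paris",
    "a romance blooms in paris between two artists",
]

print(recommend(titles, plots, "Space Quest", k=1))
```

    The second model, based on actors, genre and staff size, would need a different feature encoding and is not sketched here.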

    Neural Networks and Deep Learning Project at Columbia University

    We implemented the style transfer algorithm designed by Gatys et al., which generates new synthesized images by combining the style of one image with the content of another. We explain the authors' methodology and implement the solution using Theano. We find that results depend heavily on the hyperparameter values and that training is quite time-intensive due to the large number of parameters required. In addition, we successfully generated new images combining other pictures with the style of some well-known painters.

    Explore the US flight network and its routes by carrier and date, with a centrality analysis of the airports.

    Exploratory Data Analysis Course 2016 at Columbia University

    Visualize the relationship between atmospheric pressure variations and catastrophic events.

    Exploratory Data Analysis Course 2016 at Columbia University

    Visualize NYC amenities along with demographic and census indicators.


    Data Science for Social Good Fellowship at the University of Washington

    As a Data Science for Social Good Fellow at the University of Washington eScience Institute, I worked on the CrowdSensing the Census project. Our goal was to develop a reliable, general model that predicts the socio-economic levels of a city from real-time data such as OpenStreetMap data or Call Detail Record (CDR) data. This work is meaningful because it provides an up-to-date, cost-efficient alternative to large-scale censuses, which cannot capture the pace of global urbanization. Specifically, we built a prediction model from available datasets for Milan and Mexico City. We chose Milan because the Big Data Challenge initiative made Telecom Italia CDR data available to the public. We chose Mexico City because of the availability of OpenStreetMap data, and to include diverse urban fabrics in our study. As ground-truth estimates of census-level variables, we used census data that included statistics on illiteracy, educational attainment, percent foreign population, renter status, unemployment and workforce participation.

    Map of distribution of wealth, Mexico City

    From OpenStreetMap, we extracted point-of-interest and urban amenity features, which were included in our prediction model as indicators of access to resources and of social or economic wealth in a given part of the city. We also extracted street network data and computed two centrality measures (closeness and betweenness), which help describe the distribution of connectivity and access in cities.

    Map of betweenness centrality in Mexico City.
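
    To illustrate the two centrality measures mentioned above, here is a minimal sketch for an unweighted, undirected graph using only the standard library. It is not the project's code: the toy `streets` graph and the function names are hypothetical, and a real street network would normally use edge lengths as weights.

```python
from collections import deque


def bfs_distances(graph, source):
    """Shortest hop counts from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in graph[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist


def closeness(graph):
    """Closeness = number of reachable nodes / sum of distances to them."""
    result = {}
    for node in graph:
        dist = bfs_distances(graph, node)
        total = sum(dist.values())
        result[node] = (len(dist) - 1) / total if total > 0 else 0.0
    return result


def betweenness(graph):
    """Unnormalized betweenness via Brandes' algorithm (unweighted graph)."""
    score = {node: 0.0 for node in graph}
    for source in graph:
        order = []                            # nodes in BFS visiting order
        preds = {node: [] for node in graph}  # shortest-path predecessors
        sigma = {node: 0 for node in graph}   # number of shortest paths
        sigma[source] = 1
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            order.append(node)
            for nbr in graph[node]:
                if nbr not in dist:
                    dist[nbr] = dist[node] + 1
                    queue.append(nbr)
                if dist[nbr] == dist[node] + 1:
                    sigma[nbr] += sigma[node]
                    preds[nbr].append(node)
        # Accumulate path dependencies from the farthest nodes inward.
        delta = {node: 0.0 for node in graph}
        for node in reversed(order):
            for pred in preds[node]:
                delta[pred] += sigma[pred] / sigma[node] * (1 + delta[node])
            if node != source:
                score[node] += delta[node]
    # Each undirected pair was counted once from each endpoint.
    return {node: value / 2 for node, value in score.items()}


# Hypothetical toy network: five intersections along a single street.
streets = {
    "A": ["B"],
    "B": ["A", "C"],
    "C": ["B", "D"],
    "D": ["C", "E"],
    "E": ["D"],
}

print(closeness(streets))
print(betweenness(streets))
```

    In this toy example the middle intersection scores highest on both measures, since every trip between the two halves of the street passes through it.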

    For Milan, we included Call Detail Record data, which covered the raw volume of calls into and out of areas of the city, as well as network features between areas. With these data, we developed a model and a visualization tool, which we hope to expand into a useful means of predicting wealth distributions in cities worldwide from open data.

    Screenshot of dashboard created to visualize results

    We also presented our findings at the University of Chicago Data Science for Social Good Conference 2016, and our project poster can be seen below.

    DevFest 2016 Hackathon at Columbia University

    Explore and analyze the variation in presidential candidates' emotions during debates. As a sample, we used the debate between Hillary Clinton and Bernie Sanders that took place on February 4th, 2016. This project relies heavily on the following APIs:
    · Microsoft® Project Oxford Video API
    · Microsoft® Project Oxford Emotion API
    · Alchemy® Entity Extraction API

    Awards and impact


    · Microsoft Prize Winner at the DevFest 2016 Hackathon at Columbia University
    · Honorable Mention at NYC Media Lab 2016 at Columbia University
    · Featured in Quartz.com

    Q&A for IBM

    Developing Data Products Course by Johns Hopkins via Coursera

    Analysis of social exclusion (marginalization) in Mexico.
    Visualize · contains tools to visualize the marginalization variables on the map.
    Create Index · contains tools to create a marginalization index using variables chosen by the user.
    Clusterize · contains tools to cluster similar states into groups.

    Finalist in the National Data Science Competition 2014 in Mexico.


    Link: dataton.datos.gob.mx/
    Analyse tourism flows in Mexico using data from 9 million tweets.
    Provide an overview of the tourist experience using sentiment analysis.
    Create engaging and interesting interactive visualisations.

  • Skills

  • Informatic

    R
    90%
    Python
    80%
    MySQL
    75%
    Spark
    70%
    Scala
    70%
    MongoDB
    60%
    RoR
    60%
    HTML
    60%
    Javascript
    50%
  • CONTACT

  • Contact Info