Data Sources of Interest:

  • DFP Log Data
  • Location Data (Users)
  • Geographical Mapping (Misc.)
  • Retail Stores Location and MetaData
  • Demographics & Lifestyle Attributes
  • Triggers
  • Moat

Tools:

  • Redshift/PostgreSQL
  • MySQL (RDS)
  • R Studio (EC2)
  • Python
    • Boto (Github)
    • Anaconda
    • DB APIs
  • Jupyter (EC2)
  • EC2 (Single Machine: In Memory)
  • DFP Gsutil
  • Mapping Software (Experimental)
    • CartoDB
    • MapLarge
  • Custom UI (Internal)
  • Java (GCS -> S3 -> Redshift)

In [21]:
%%bash
bash /etl/dfp/gcs/gcs_list_files.sh
In [22]:
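import pandas as pd
# Keep the 4th '/'-separated component of each listed path, then the text before its first '_' (the log type).
dfp_logs = pd.read_table('/ebs/dfp_logs_gcs.txt', header=None)
dfp_logs[0].apply(lambda x: x.split('/')[3].split('_')[0]).drop_duplicates()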
Out[22]:
0                         NetworkActiveViews
1455                       NetworkActivities
2901              NetworkBackfillActiveViews
4356                   NetworkBackfillClicks
5811              NetworkBackfillImpressions
7266     NetworkBackfillRichMediaConversions
8667         NetworkBackfillVideoConversions
10122                          NetworkClicks
11577                     NetworkImpressions
13032                NetworkVideoConversions
Name: 0, dtype: object
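
The index gaps in the output above suggest roughly 1,400 to 1,455 files per log type. A minimal sketch of counting files per type directly, assuming each listed path has the form gs://<bucket>/<LogType>_<suffix>. The bucket and file names below are made up for illustration, and only the standard library is used so the snippet runs outside the notebook environment:

```python
from collections import Counter


def log_type(path):
    """Return the 4th '/'-separated component of a path, up to its first '_'."""
    return path.split('/')[3].split('_')[0]


# Hypothetical paths; the real listing is the one read from /ebs/dfp_logs_gcs.txt.
sample_paths = [
    'gs://example-bucket/NetworkImpressions_000000_20160101_00.gz',
    'gs://example-bucket/NetworkImpressions_000000_20160101_01.gz',
    'gs://example-bucket/NetworkClicks_000000_20160101_00.gz',
]

# Count how many listed files belong to each log type.
files_per_type = Counter(log_type(p) for p in sample_paths)

for name, count in sorted(files_per_type.items()):
    print(name, count)
```

A type whose count differs sharply from the others may point to missing or late files, which is worth checking before loading into Redshift.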

What are we doing: Using the location data to create insights and products that can be monetized with businesses (both advertising and non-advertising)

· Understanding and exploring location data

· Building a places database

· Understanding patterns and segmenting customers based on the types of businesses/places visited

· Predicting which users will visit a new business, based on places visited and demographics or other third-party data

· Visualization for sales (movement of users across the day, e.g. traffic to a Walmart or coffee shop by time of day and day of week) and for the analytics we are currently doing
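
The time-of-day and day-of-week traffic view mentioned in the list above could start from a simple aggregation. This is only a sketch: the record layout (place name plus a local ISO timestamp) and the sample visits are hypothetical, not the actual location data schema.

```python
from collections import Counter
from datetime import datetime

DAYS = ['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun']

# Hypothetical visit records: (place name, local timestamp in ISO format).
visits = [
    ('coffee shop', '2016-03-07T08:15:00'),
    ('coffee shop', '2016-03-07T08:40:00'),
    ('coffee shop', '2016-03-08T17:05:00'),
    ('Walmart', '2016-03-12T14:30:00'),
]


def traffic_profile(records):
    """Count visits per (place, day of week, hour of day)."""
    counts = Counter()
    for place, timestamp in records:
        t = datetime.fromisoformat(timestamp)
        counts[(place, DAYS[t.weekday()], t.hour)] += 1
    return counts


profile = traffic_profile(visits)
for key, n in sorted(profile.items()):
    print(key, n)
```

The resulting counts can be pivoted into a day-by-hour grid per place, which is the shape most heatmap-style charts expect.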

Data we use:

· Location Data

· Places Data

· Exploring the use of other data sources (within and outside the company)

  o Factual/Lotame Segments

  o User Information that we collect

Tools:

· Python

· ArcGIS

Some Challenges: We are currently performing these analyses using one day of location data. Some capabilities that we need to develop:

· Scaling these analyses to larger data sets

· Learning and using geospatial libraries in Python instead of standalone geospatial tools (ArcGIS Integration, QGIS)

· Integrating other data sources with the location and places data
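
As one illustration of doing geospatial work in plain Python rather than in a standalone tool, here is a minimal sketch that matches a location ping to the nearest place within a radius, using the haversine great-circle distance. The place names, coordinates, and the 100 m radius are hypothetical, and the linear scan over places is for clarity only; scaling beyond one day of data would likely need a spatial index.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def nearest_place(lat, lon, places, max_dist_m=100.0):
    """Return (name, distance_m) of the closest place within max_dist_m, else None."""
    best_name, best_dist = None, float('inf')
    for name, plat, plon in places:
        d = haversine_m(lat, lon, plat, plon)
        if d < best_dist:
            best_name, best_dist = name, d
    if best_name is None or best_dist > max_dist_m:
        return None
    return best_name, best_dist


# Hypothetical places table: (name, latitude, longitude).
places = [
    ('coffee shop', 40.7500, -73.9900),
    ('Walmart', 40.7600, -73.9800),
]

print(nearest_place(40.7503, -73.9901, places))  # a ping a few tens of metres from the coffee shop
print(nearest_place(41.0000, -74.5000, places))  # a ping far from every place
```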