earthmover

earthmover transforms collections of tabular source data (flat files, FTP files, database tables/queries) into text-based (JSONL, XML) data via YAML configuration.

Quick-start

  1. Install earthmover with pip:

    pip install earthmover
    

  2. Create an earthmover.yml configuration file that defines your project config, data sources, transformations, and destinations:

    earthmover.yml
    version: 2
    
    config:
      output_dir: ./output/
    
    sources:
      csv_source:
        file: ./data/file.csv
        header_rows: 1
      sql_source:
        connection: "postgresql://user:pass@host/database"
        query: >
          select column1, column2, column3
          from myschema.mytable
    
    transformations:
      stacked:
        source: $sources.csv_source
        operations:
          - operation: union
            sources:
              - $sources.sql_source
    
    destinations:
      mydata:
        source: $transformations.stacked
        template: ./json_templates/mydata.jsont
        extension: jsonl
        linearize: True
    

  3. Create the ./json_templates/mydata.jsont template file (which may use Jinja) to use when rendering the data for your mydata destination:

    ./json_templates/mydata.jsont
    {
        "column1": "{{column1}}",
        "column2": "{{column2}}",
        "column3": "{{column3}}"
    }
    
  4. Now run earthmover

    earthmover run
    
    and look for the output file output/mydata.jsonl (named after the mydata destination and its extension setting).
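To make the role of the template concrete, here is a minimal Python sketch of how one record per source row could be produced from the mydata.jsont template above. It is not earthmover's implementation: the `render_row` helper is hypothetical, a regular expression stands in for Jinja, and it assumes that `linearize: True` means each rendered record is collapsed onto a single line.

```python
import json
import re

# The template from step 3, copied verbatim.
TEMPLATE = """{
    "column1": "{{column1}}",
    "column2": "{{column2}}",
    "column3": "{{column3}}"
}"""


def render_row(template: str, row: dict) -> str:
    """Fill {{name}} placeholders from `row`, then collapse to one line.

    Hypothetical helper: a regex substitution stands in for Jinja here,
    and values are not JSON-escaped.
    """
    def substitute(match: re.Match) -> str:
        return str(row[match.group(1)])

    filled = re.sub(r"\{\{\s*(\w+)\s*\}\}", substitute, template)
    # Parsing and re-serializing checks that the rendered text is valid JSON
    # and removes the template's line breaks (the assumed linearize behavior).
    return json.dumps(json.loads(filled))


# Made-up rows standing in for the stacked transformation's output.
rows = [
    {"column1": "a", "column2": "b", "column3": "c"},
    {"column1": 1, "column2": 2, "column3": 3},
]

lines = [render_row(TEMPLATE, row) for row in rows]
print("\n".join(lines))
```

Because the template wraps every placeholder in double quotes, all values come out as JSON strings in this sketch, including the numeric ones in the second row.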

How it works

earthmover is similar to dbt, but it executes data transformations locally using dataframes rather than in a SQL engine. Like dbt, it builds a data dependency DAG from your earthmover.yml configuration and materializes output data in dependency order.
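As a rough illustration of dependency ordering, the sketch below transcribes the `$sources...` and `$transformations...` references from the quick-start earthmover.yml by hand and sorts them with Python's standard-library `graphlib` (Python 3.9+). The `$destinations.mydata` label and the `dependencies` dictionary are assumptions made for this example; how earthmover builds and walks its DAG internally is not shown here.

```python
from graphlib import TopologicalSorter

# Each node maps to the set of nodes it reads from.
# Transcribed by hand from the quick-start configuration, not parsed from YAML.
dependencies = {
    "$sources.csv_source": set(),
    "$sources.sql_source": set(),
    "$transformations.stacked": {"$sources.csv_source", "$sources.sql_source"},
    "$destinations.mydata": {"$transformations.stacked"},
}

# static_order() yields every node after all of the nodes it depends on.
order = list(TopologicalSorter(dependencies).static_order())

for node in order:
    print(node)
```

Both sources come out before the stacked transformation, which in turn precedes the mydata destination. The relative order of the two sources is not fixed, since neither depends on the other.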

Read more

Above is a simple quick-start example. Please read the documentation for more information about earthmover's many features, including: