DataStage Basic Training Tutorial.ppt
DataStage Basic Training
Jerry, 2006.03

Agenda
- Hello World
- DataStage Components
- Define Parameter & Table
- Hash File, Transformer, Aggregator
- Director & Monitor
- Administrator & Manager
- Routine & Control

Demo: Hello World

DataStage Architecture
- Data flow: Data Sources (database or file) -> ODBC/Native -> DataStage Server (WinNT, Win2000 or UNIX) -> ODBC/Native -> Target (database or file)
- DataStage Connect API (labeled four times in the diagram)

DataStage Components
- Manager: metadata collection and management
- Designer: design the process flow
- Director: run jobs, check logs and set schedules
- Administrator: create and edit projects

Global Variables and Job Variables
- Global variables
  - Lifetime: the whole project
  - Defined in Administrator
- Job variables
  - Lifetime: a single job
  - Defined in Designer or Manager

Demo: Define a Job Variable
- Define the parameter in Designer

Metadata Definition
- An important part of metadata management
- Defined in Manager or Designer
- Demo:
  - Import from a flat file in .txt format
  - Import from a DBMS table

Demo: Table Definition
- Define the table in Manager

Demo: Build a Fact Table
- Detail table -> lookup (join) -> aggregation -> fact table

Hash File
- Uses:
  - Serves as the secondary (reference) table in a left join
  - Holds data sets that are read many times
  - Stores other temporary data
- Key points:
  - A key must be specified
  - Column positions on the output must match those on the input

Transformer
- Uses:
  - Provides a rich set of operators and functions
  - Data cleansing and conversion
  - Joining multiple data sources
- Key points:
  - Every key column of the secondary table must be mapped from a field of the primary table
  - Avoid linking two Transformers directly to each other where possible

Aggregator
- Uses:
  - Aggregate functions such as Sum, Max, Min and Average
  - Typically used to build fact tables

Debug and Tuning
- View status and logs
  - Several views are available: status, log, detail
  - Use them together with Monitor to find errors and tune performance

Job Status
- Not Compiled
- Compiled
- Reset
- Running
- Finished
- Finished (with warning)
- Aborted

Schedule
- Menu: Job > Add to Schedule

Administrator
- Add a new project
- Modify project properties
  - Character set
  - Number of days to keep logs
  - Hash file and write cache
- Define environment variables

Manager
- Import and export projects or jobs
  - Two file formats: .dsx and .xml
  - Scope: the whole project, or by category
- Table definitions
- Manage routines

Demo: Back Up a Project

Routine
- A user-defined function, written in DataStage BASIC (the syntax resembles VB)
  - Transformer Routine
  - Before/After Subroutine
- The system ships with a rich set of built-in routines
- Demo: define a Transformer Routine

Job Control
- Schedule (run) other jobs from within a job

Q&A
Thanks!
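
Appendix sketch: the fact-table demo (detail table, lookup, aggregation) can be illustrated outside DataStage with plain Python. This is only a rough analogy under stated assumptions, not DataStage code: the dictionary stands in for a Hash File keyed on one column, the join function mimics a Transformer doing a left join against it, and the summing function plays the role of an Aggregator. All table names, column names and sample rows below are hypothetical.

```python
from collections import defaultdict


def build_lookup(rows, key):
    """Index reference rows by key, similar in spirit to loading a Hash File.

    In this sketch a later row with the same key overwrites an earlier one.
    """
    lookup = {}
    for row in rows:
        lookup[row[key]] = row
    return lookup


def left_join(detail_rows, lookup, key, fields):
    """Left join: keep every detail row; unmatched lookup fields become None."""
    joined = []
    for row in detail_rows:
        ref = lookup.get(row[key])
        out = dict(row)  # copy so the input rows are not modified
        for field in fields:
            out[field] = ref[field] if ref is not None else None
        joined.append(out)
    return joined


def aggregate_sum(rows, group_by, measure):
    """Sum one measure per group, like an Aggregator with a Sum function."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[group_by]] += row[measure]
    return dict(totals)


# Hypothetical sample data: a detail table and a small reference table.
detail = [
    {"order_id": 1, "product_id": "P1", "amount": 10.0},
    {"order_id": 2, "product_id": "P2", "amount": 5.0},
    {"order_id": 3, "product_id": "P1", "amount": 2.5},
    {"order_id": 4, "product_id": "P9", "amount": 1.0},  # no match in products
]
products = [
    {"product_id": "P1", "category": "books"},
    {"product_id": "P2", "category": "toys"},
]

lookup = build_lookup(products, "product_id")
joined = left_join(detail, lookup, "product_id", ["category"])
fact = aggregate_sum(joined, "category", "amount")
```

The unmatched detail row (P9) is kept with a category of None, which is the left-join behavior the Hash File slide describes for a secondary table. The lookup needs a key for the same reason a Hash File does: without one there is nothing to match the primary table's field against.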