Friday, 17 April 2015

BRIEF INFO ABOUT BLOG

The Ab Initio course was launched by Mr. Viswa at Sri Vinay Tech House and batches are running successfully.

SRI VINAY TECH HOUSE OFFERS THE BELOW COURSES, BEST IN QUALITY:

MICROSOFT BUSINESS INTELLIGENCE
INFORMATICA
QLIKVIEW
TERADATA DEVELOPMENT AND DBA
DATASTAGE



AB INITIO has been launched and is running successfully.
This blog is intended to help job hunters and seekers prepare with ease for interviews. It mainly covers RESPONSIBILITIES, RESUMES & SOME FAQs.
FOR MORE, ATTEND VISWA SIR'S CLASSES AT THE INSTITUTE.

OTHER OFFERINGS IN THE  INSTITUTE:


TERADATA

INFORMATICA

DATASTAGE

MICROSOFT BUSINESS INTELLIGENCE

QLIKVIEW


Regards,
Moderator Team.

Thursday, 16 April 2015

Most Important INTERVIEW QUESTIONS


What are the partition components? Tell me in detail.
What is the purpose of the override-key parameter in the Join component?
What is the difference between the Join and Merge components?
If table A contains 10 records and table B contains 0 records, what is the total number of output records if we do a cross join?
What is a generic graph?
How will you write the code for a Reformat component as a generic graph?
What is a lookup template?
What is a compressed lookup?
What is AB unit?
What are graph-level parameters? How do you define them in your project?
What are the measures of the fact table?
I have a 2-way partitioned flow file and we need to generate a surrogate key for the output file. How do you generate it?
What is a wrapper script?
Tell me about Rollup and its parameters.
What is the difference between Rollup and Scan?
What is the purpose of m_rollup?
What are check-in and check-out? Where do we do them, and in how many ways?
What are .psets?
Tell me the most-used air commands in your project.
Write any two sample scripts previously used in your project.

Tell me about yourself and your current project?
What are the components you have used in your graph?
Explain the multi file, multi file system and multi file directory?
What are m_commands you have used?
What is the difference between m_dump and m_expand?
How do you connect to the database?
How do you test whether your .dbc file is correct or not?
How do you change a 2-way multifile to a 4-way multifile?
Have you worked with the EME? If yes, tell me all the air commands you have used.
How do you find out the version of the EME?
Tell me the difference between API and utility modes?
What is output_index and output_indexes?
If I give key null in join component will the component work?
What are the parameters in join component?
What is override key parameter in join component?
What is layout?
What are the errors you have faced?
When will you get the fatal error?
What are the UNIX commands you have used?
What is SED and tell me one example where you have used in your project?
What AWK tell me one example?
How do you find out and delete the files which were created 30 days before?
How do you find out the nth highest salary in the table?
What is skew?
Which parameters are resolved first, graph level or sandbox level?
How do you connect to Oracle?
If you select "In memory: Input need not be sorted" in the Join component, how will it work?

Which database are you using?
What is the difference between .dbc and .cfg file
What is a Surrogate Key. How do you Create it.
What is conditional dml.
What is Lookup file
Tell me briefly about the parallelism.
Which components cannot support the pipeline parallelism
What are the continuous components
What is the difference between Scan and Rollup?
What is a .pset?
Tell me the importance of the EME in Ab Initio.
How do you run a graph in the Unix environment?
Tell me about air commands.
What is AB_LOCAL and where will you use it?
How do you schedule graphs in Ab Initio?
How do you open a multifile in Unix?
What is the difference between the sandbox and the EME?
What are check-in and check-out from the GDE and from Unix?
I have some 100 records; fetch the records containing the word "delhi" from lines 56 to 89.
How do you improve the performance of previously built graphs?
Tell me about version control in Ab Initio.

What is your role in the current project?
Explain the flow of your current project.
What database is used for your project?
How do you connect to your database?
What permissions do you have to access the database?
What does the .dbc file contain?
What is the difference between truncate and delete?
How do you take data from flat files?
What does the requirement document contain, exactly?
How do you prepare the analysis document and what does it contain?
How does the Assign Keys component work?
How do you get the history file? What are primary keys and foreign keys?
Explain Reformat with its parameters and how it is used in real time.
Explain the functionality of Rollup and how to filter records in it.
I have 100 records, of which 10 records need to be aggregated, leaving the rest. How will you handle this?
Which components have you worked with most?
What is a view?
Create a view using a table, where I need only 100 rows in my view instead of all records.
What is the m_dump command?
How do you execute a script from the GDE?
What is a wrapper script?
How many graphs have you developed up to now?

Which version are you using?
Which components are you working with?
What scheduling tool have you used?
Tell me about Join.
What are the parameters of Join? Explain all of them.
What is a semi-join?
What are Cartesian joins?
What is a lookup?
Which partition components have you worked with?
What is the difference between Merge and Concatenate?
What is the difference between Scan and Rollup?
How will you test a .dbc file from the command prompt?
What is a generic graph?
What is a wrapper script?
What are m_commands?
What is the purpose of m_rollup?
How do you write a conditional DML if the file has a huge amount of data?
How do you convert a 4-way MFS to a 16-way MFS?
How would you do performance tuning for an already built graph?
Tell me the sed and awk commands with some suitable examples.
Write two sample scripts you previously wrote.

INTRODUCTION TO ABINITIO

Ab Initio is a fourth-generation, GUI-based, parallel-processing data analysis and batch data manipulation product, commonly used to extract, transform, and load (ETL) data. The Ab Initio product also allows processing of real-time data.

Ab Initio has a two-tier architecture in which the Graphical Development Environment (GDE) and the Co>Operating System couple together to form a client-server-like architecture.

Core products of Ab Initio Corporation

Graphical Development Environment (GDE):-

The GDE is a graphical application for developers, used for designing and running Ab Initio graphs. Key points:
- The ETL process in Ab Initio is represented by graphs. Graphs are formed from components (from the standard component library or custom components), flows (data streams) and parameters.
- A user-friendly front end for designing Ab Initio ETL graphs
- The ability to run and debug Ab Initio jobs and trace execution logs
- Compiling a graph in the GDE generates a UNIX shell script which may be executed on a machine without the GDE installed

Co>Operating System

The Co>Operating System is a program provided by Ab Initio which operates on top of the operating system and is the base for all Ab Initio processes. It provides additional features known as air commands, and it can be installed on a variety of system environments such as Unix, HP-UX, Linux, IBM AIX and Windows. The Co>Operating System provides the following features:
- Manages and runs Ab Initio graphs and controls the ETL processes
- Provides Ab Initio extensions to the operating system
- ETL process monitoring and debugging
- Metadata management and interaction with the EME

ABINITIO EME:--

The Enterprise Meta>Environment (EME) is the Ab Initio repository and environment for storing and managing metadata. It provides the capability to store both business and technical metadata. EME metadata can be accessed from the Ab Initio GDE, a web browser, or the Co>Operating System command line (air commands).

Thursday, 2 April 2015

Data Warehousing Questions


Hi All,

We would like to share data-warehousing-related questions. This topic covers the subject from scratch to the end at a high level.


Need for a Data Warehouse
To analyse data and maintain history.
Companies require strategic information to face the competition in the market.
The operational systems are not designed for strategic information.
To maintain the history of data for the whole organization and to have a single place where the entire data is stored.


What is data warehousing and Explain Approaches?

Many companies follow either the characteristics defined by W. H. Inmon or those defined by Sean Kelly.
Inmon definition:
Subject Oriented, Integrated, Non Volatile, Time Variant.

Sean Kelly definition:
Separate, Available, Integrated, Time Stamped, Subject Oriented, Non Volatile, Accessible.

DWH Approaches
There are two approaches:
1. Top Down, by Inmon

2. Bottom Up, by Ralph Kimball

Inmon approach --> The enterprise data warehouse is structured first and the data marts are created next (Top Down).
Ralph Kimball --> The data marts are designed first; later the data marts are combined into the data warehouse (Bottom Up).

What are the responsibilities of a data warehouse consultant/professional?

The basic responsibility of a data warehouse consultant is to ‘publish the right data’.
Some of the other responsibilities of a data warehouse consultant are:

1. Understand the end users by their business area, job responsibilities, and computer
tolerance.

2. Find out the decisions the end users want to make with the help of the data warehouse.

3. Identify the ‘best’ users who will make effective decisions using the data warehouse

4. Find the potential new users and make them aware of the data warehouse.

5. Determine the grain of the data.

6. Make the end user screens and applications much simpler and more template driven.

What are fundamental stages of Data Warehousing?

Offline Operational Databases - Data warehouses in this initial stage are developed by simply copying the database of an operational system to an off-line server where the processing load of reporting does not impact the operational system's performance.

Offline Data Warehouse - Data warehouses in this stage of evolution are updated on a regular time cycle (usually daily, weekly or monthly) from the operational systems and the data is stored in an integrated reporting-oriented data structure.


Real Time Data Warehouse - Data warehouses at this stage are updated on a transaction or event basis, every time an operational system performs a transaction (e.g. an order or a delivery or a booking etc.)

Integrated Data Warehouse - Data warehouses at this stage are used to generate activity or transactions that are passed back into the operational systems for use in the daily activity of the organization.


What is Datamart Explain Types?

It is a specific subject area, functionality or task.
It is designed to facilitate end-user analysis.

Wrong answer -- "It is a subset of the warehouse" -- please don't use this wrong answer.
Types of Data Marts
Dependent, Independent, Logical.
Dependent ---> The warehouse is created first and the data mart is created next.
Independent --> The data mart is created directly from the source systems without depending on the warehouse.
Logical ---> It is a backup or replica of any other data mart.


How do you create a data warehouse and a data mart?

DWH -----> By applying the data warehouse approach on any database.

DM ------> It is created by using either views or complex tables.

What is Dimensional Modeling?

It provides the relationship between dimensions and facts with the help of a particular model (Star, Snowflake, etc.).

What do you mean by a dimension table? Explain the dimension types.

A dimension table is a collection of attributes which defines a functionality or task.

Features:
1. It contains textual or descriptive information.
2. It does not contain any measurable information.
3. It answers the what, where, when and why questions.
4. These tables are master tables and also maintain history.

Types of Dimensions
a. Conformed
b. Degenerated
c. Junk
d. Role Playing
e. SCD
f. Dirty
What is a fact table, and what are the types of measures?

The fact table is the main table in the dimensional model. It contains two sections:
a. Foreign keys to the dimensions
b. Measures or facts.

Features
1. The fact table contains measurable or numerical information.
2. It answers "how many" and "how much" related questions.
3. These tables are child or transactional tables and also contain history.

Types of Measures

Additive Measure,Semi Additive Measure, Non Additive Measure.

What is Factless Fact Table?

A fact table which does not contain any meaningful or additive measures; it records only events or coverage through its dimension keys.


What is a surrogate key? How do we generate it?

It is a key that contains unique values, like a primary key.
A surrogate key is an artificial or synthetic key that is used as a substitute for a natural key.
It is just a unique identifier or number for each row that can be used as the primary key of the table.

We may generate this key in two ways:

System generated (for example, via a database sequence; see the sketch below)
Manual sequence
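As a rough sketch of the "system generated" option, the following shell snippet creates and uses an Oracle sequence through SQL*Plus. The sequence, table and connection variables (cust_sk_seq, dim_customer, DB_USER, etc.) are illustrative assumptions, not part of any specific project.

#!/bin/ksh
# Illustrative sketch only: surrogate keys via a database sequence (Oracle syntax).
# Sequence, table and connection details below are hypothetical.
sqlplus -s "$DB_USER/$DB_PASS@$DB_SID" <<'EOF'
CREATE SEQUENCE cust_sk_seq START WITH 1 INCREMENT BY 1 NOCACHE;

INSERT INTO dim_customer (cust_sk, cust_natural_key, cust_name)
VALUES (cust_sk_seq.NEXTVAL, '4711', 'Sample Customer');
EOF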

What is the necessity of having surrogate keys?

1.Production may reuse keys that it has purged but that you are still maintaining.

2.Production might legitimately overwrite some part of a product description or a
customer description with new values but not change the product key or the customer
key to a new value. We might be wondering what to do about the revised attribute
values (slowly changing dimension crisis)

3. Production may generalize its key format to handle some new situation in the
transaction system,
e.g. changing the production keys from integers to alphanumeric,
or the 12-byte keys you are used to may become 20-byte keys.

4.Acquisition of companies

What are the advantages of using Surrogate Keys?

1. We can save substantial storage space with integer valued surrogate keys.

2.Eliminate administrative surprises coming from production.

3.Potentially adapt to big surprises like a merger or an acquisition.

4.Have a flexible mechanism for handling slowly changing dimensions.

What is SCD? Explain the SCD types.

SCD ---> Slowly Changing Dimension
A dimension maintains the history of its data. Changes arrive into these dimensions in low volume, so we call them Slowly Changing Dimensions. The process we follow to handle these changes is called the SCD process.

SCD Types
Type 1 ---> No history
The new record replaces the original record. Only one record exists in the database - the current data.

Type 2 ---> History maintained ---> 1. Current/Expired record method
                                    2. Effective date range method
A new record is added into the customer dimension table.
Two records exist in the database - the current data and the previous history data.

Type 3 ---> History maintained.
The original record is modified to include the new data. One record exists in the database - the new information is attached to the old information in the same row.
(See the SQL sketch after this list for how Type 1 and Type 2 changes are typically applied.)
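To make the Type 1 and Type 2 handling concrete, here is a hedged sketch in SQL, issued from a small shell script; the dim_customer table, its columns and the connection variables are hypothetical.

#!/bin/ksh
# Rough SCD sketch (hypothetical dim_customer table and columns).
sqlplus -s "$DB_USER/$DB_PASS@$DB_SID" <<'EOF'
-- Type 1: overwrite in place, no history kept
UPDATE dim_customer
SET    city = 'Hyderabad'
WHERE  cust_natural_key = '4711';

-- Type 2 (effective date range method): expire the current row, insert a new one
UPDATE dim_customer
SET    eff_end_date = SYSDATE, current_flag = 'N'
WHERE  cust_natural_key = '4711' AND current_flag = 'Y';

INSERT INTO dim_customer
       (cust_sk, cust_natural_key, city, eff_start_date, eff_end_date, current_flag)
VALUES (cust_sk_seq.NEXTVAL, '4711', 'Hyderabad', SYSDATE, NULL, 'Y');
EOF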

What are the techniques for handling SCD’s?

Overwriting
Creating another dimension record
Creating a current value field

What are the Different methods of loading Dimension tables?

There are two different ways to load data in dimension tables.

Conventional (Slow):
All the constraints and keys are validated against the data before it is
loaded; this way data integrity is maintained.

Direct (Fast) :
All the constraints and keys are disabled before the data is loaded.
Once data is loaded, it is validated against all the constraints and keys.
If data is found invalid or dirty it is not included in index and all future
processes are skipped on this data.

What is OLTP?

OLTP is an abbreviation of On-Line Transaction Processing. This system is
an application that modifies data the instant it receives it and has a
large number of concurrent users.

What is OLAP?

OLAP is an abbreviation of Online Analytical Processing. This system is an
application that collects, manages, processes and presents
multidimensional data for analysis and management purposes.

What is the difference between OLTP and OLAP?

Data Source
OLTP: Operational data, from the original source of the data.

OLAP: Consolidated data, from various sources.

Process Goal
OLTP: Snapshot of business processes which does fundamental business tasks.


OLAP: Multi-dimensional views of business activities of planning and decision making.

Queries and Process Scripts
OLTP: Simple quick running queries ran by users.

OLAP: Complex long running queries by system to update the aggregated data.

Database Design
OLTP: Normalized, small database. Speed will not be an issue due to the
smaller database, and normalization will not degrade performance.
This adopts the entity relationship (ER) model and an application-oriented
database design.

OLAP: De-normalized, large database. Speed is an issue due to the larger database; de-normalizing will improve performance, as there will be fewer tables to scan while performing tasks.
This adopts the star, snowflake or fact constellation model of subject-oriented database
design.

Back up and System Administration

OLTP: Regular Database backup and system administration can do the job.

OLAP: Reloading the OLTP data is considered a good backup option.


Describe the foreign key columns in the fact table and dimension tables.

Foreign keys of dimension tables are primary keys of the entity tables.
Foreign keys of fact tables are primary keys of the dimension tables.

What is Data Mining?

Data Mining is the process of analyzing data from different perspectives and summarizing
it into useful information.

What is the difference between view and materialized view?

A view takes the output of a query and makes it appear like a virtual
table and it can be used in place of tables.

A materialized view provides indirect access to table data by storing
the results of a query in a separate schema object.
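A small hedged example of the two objects, using Oracle syntax run through SQL*Plus from a shell script; the object names, columns and query are purely illustrative.

#!/bin/ksh
# Illustrative only: a plain view versus a materialized view (Oracle syntax).
sqlplus -s "$DB_USER/$DB_PASS@$DB_SID" <<'EOF'
CREATE VIEW v_recent_orders AS
  SELECT * FROM orders WHERE order_date > SYSDATE - 30;

CREATE MATERIALIZED VIEW mv_sales_by_region
  BUILD IMMEDIATE
  REFRESH COMPLETE ON DEMAND
AS
  SELECT region, SUM(amount) AS total_amount
  FROM   orders
  GROUP  BY region;
EOF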


What is ODS?

ODS is an abbreviation of Operational Data Store: a database structure that is a repository
for near real-time operational data rather than long-term trend data.
The ODS may further become the enterprise's shared operational database,
allowing operational systems that are being re-engineered to use the ODS as their operational database.

What is VLDB?

VLDB is an abbreviation of Very Large Database. A one-terabyte database would normally be considered a VLDB. Typically, these are decision support systems or transaction processing applications serving large numbers of users.

Is an OLTP database design optimal for a data warehouse?

No. OLTP database tables are normalized, and this will add additional time to queries to return results. Additionally, an OLTP database is smaller and does not contain data over a long period (many years), which needs to be analyzed.

An OLTP system basically follows the ER model, not the dimensional model.
If a complex query is executed on an OLTP system, it may cause a heavy overhead on the OLTP server that will affect the normal business processes.

If de-normalization improves data warehouse processes, why is the fact table in normal form?

The foreign keys of fact tables are the primary keys of the dimension tables. It is clear that the fact table contains columns which are primary keys of other tables; that in itself makes it a normal-form table.


What are lookup tables?

A lookup table is a table used against the target table, based upon the primary key of the target;
it is used to update the target by allowing only modified (new or updated) records through, based on the lookup condition.

What are Aggregate tables?

An aggregate table contains a summary of existing warehouse data, grouped to certain levels of dimensions. It is always easier to retrieve data from an aggregate table than to visit the original table, which has millions of records.
Aggregate tables reduce the load on the database server, increase the performance of queries and return results more quickly.
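As a hedged illustration, an aggregate table can be built with a simple summary query; the fact table and column names below are hypothetical.

#!/bin/ksh
# Illustrative only: build a monthly aggregate from a detailed sales fact table.
sqlplus -s "$DB_USER/$DB_PASS@$DB_SID" <<'EOF'
CREATE TABLE sales_fact_monthly AS
SELECT product_key,
       TO_CHAR(sale_date, 'YYYYMM') AS month_key,
       SUM(sales_amount)            AS sales_amount,
       SUM(units_sold)              AS units_sold
FROM   sales_fact
GROUP  BY product_key, TO_CHAR(sale_date, 'YYYYMM');
EOF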



What is real time data-warehousing?

Data warehousing captures business activity data. Real-time data warehousing captures business activity data as it occurs. As soon as the business activity is complete and there is data about it, the completed activity data flows into the data warehouse and becomes
available instantly.

What are conformed dimensions?

Conformed dimensions mean the exact same thing with every possible fact table to which they are joined. They are common to the cubes.


What is conformed fact?

Conformed facts are facts (measures) that are defined identically, with the same name, definition and units, across multiple fact tables or data marts, so that they can be compared and combined accordingly.

How do you load the time dimension?

Time dimensions are usually loaded by a program that loops through all possible dates that may appear in the data. 100 years may be represented in a time dimension, with one row per day.
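A minimal sketch of such a loader, written as a shell loop (it assumes GNU date is available); the start date, field layout and file name are illustrative only.

#!/bin/ksh
# Rough sketch: emit one row per calendar day for a time dimension load.
# Assumes GNU date (date -d); dates, layout and file name are illustrative.
start="2000-01-01"
days=36525                       # roughly 100 years of days
out=time_dim.dat

: > "$out"
i=0
while [ $i -lt $days ]; do
  # time_key|full_date|day_of_week|month|year
  date -d "$start +$i days" '+%Y%m%d|%Y-%m-%d|%A|%m|%Y' >> "$out"
  i=$((i + 1))
done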

What is a level of Granularity of a fact table?

Level of granularity means the level of detail that you put into the fact table in a data warehouse. It specifies how much detail you are willing to keep for each transactional fact.

What are non-additive facts?

Non-additive facts are facts that cannot be summed up across any of the dimensions present in the fact table. However, they are not considered useless; if there are changes in the dimensions, the same facts can be
useful.

What are Additive Facts? Or what is meant by Additive Fact?

Fact tables are mostly very huge, and we almost never fetch a single record into our answer set.
We fetch a very large number of records on which we then do adding, counting, averaging, or
taking the min or max. The most common of these is adding. Applications are simpler if they store facts in an additive format as often as possible.
Thus, in the grocery example, we don't need to store the unit price.
We compute the unit price by dividing the dollar sales by the unit sales whenever necessary.


What are the 3 important fundamental themes in a data warehouse?

The 3 most important fundamental themes are:
1. Drilling Down
2. Drilling Across and
3. Handling Time

What is meant by Drilling Down?

Drilling down means nothing more than “give me more detail”.
Drilling Down in a relational database means “adding a row header” to an existing SELECT
statement.

For instance, if you are analyzing the sales of products at a manufacturer level, the
select list of the query reads:

SELECT MANUFACTURER, SUM(SALES).

If you wish to drill down on the list of manufacturers to show the brand sold, you add the BRAND row header:

SELECT MANUFACTURER, BRAND, SUM(SALES).

Now each manufacturer row expands into multiple rows listing all the brands sold. This is the
essence of drilling down.

We often call a row header a “grouping column” because everything in the list that’s not
aggregated with an operator such as SUM must be mentioned in the SQL GROUP BY clause.
So the GROUP BY clause in the second query reads, GROUP BY MANUFACTURER, BRAND.


What is meant by Drilling Across?

Drilling across adds more data to an existing row. If drilling down is requesting ever finer and
more granular data from the same fact table, then drilling across is the process of linking two or more
fact tables at the same granularity, or, in other words, tables with the same set of grouping
columns and dimensional constraints.

A drill across report can be created by using grouping columns that apply to all the fact tables
used in the report.

The new fact table called for in the drill-across operation must share certain dimensions with the
fact table in the original query. All fact tables in a drill-across query must use conformed
dimensions.

What is the significance of handling time?

For example, when a customer moves from a property, we might want to know:

1. who the new customer is
2. when did the old customer move out
3. when did the new customer move in
4. how long was the property empty etc


What are the important fields in a recommended Time dimension table?

Time_key
Day_of_week
Day_number_in_month
Day_number_overall
Month
Month_number_overall
Quarter
Fiscal_period
Season
Holiday_flag
Weekday_flag
Last_day_in_month_flag

What is the main difference between Data Warehousing and Business Intelligence?


The differences are:

DW - is a way of storing data and creating information through leveraging data marts.
DMs are segments or categories of information and/or data that are grouped together to provide 'information' on that segment or category.
A DW does not require BI to work; reporting tools can generate reports from the DW.


BI - is the leveraging of the DW to help make business decisions and recommendations.
Information and data rules engines are leveraged here to help make these decisions, along with statistical analysis tools and data mining tools.

What is a Physical data model?

During the physical design process, you convert the data gathered during the logical design
phase into a description of the physical database, including tables and constraints.


What is a Logical data model?

A logical design is a conceptual and abstract design. We do not deal with the physical
implementation details yet;
we deal only with defining the types of information that we need.
The process of logical design involves arranging data into a series of logical relationships called
entities and attributes.


What are an Entity, Attribute and Relationship?

An entity represents a chunk of information. In relational databases, an entity often maps to a
table.
An attribute is a component of an entity and helps define the uniqueness of the entity. In relational databases, an attribute maps to a column.
The entities are linked together using relationships.


What is junk dimension?

A number of very small dimensions might be lumped together to form a single dimension,
a junk dimension; the attributes are not closely related.
The grouping of random flags and text attributes in a dimension and moving them to a separate sub-dimension is known as a junk dimension.

HR DISCUSSION SECTION


Our team gathered the top 10 HR questions which are often asked during interviews, and we believe these answers can really help you get through.

1.Tell me about yourself?

Try to introduce some of your most important employment-oriented skills as well as your education and accomplishments to the interviewer.
The answer to this question is very important because it positions you for the rest of the interview. That's why this statement is often called the "positioning statement".

One should take the opportunity to show his/her communication skills by speaking clearly and concisely in an organized manner.
Since there is no right or wrong answer to this question, it is important to appear friendly.

Answers can be:

1. I completed my Engineering at IIT Delhi in Electronics & Communication Engineering. Afterwards, I started my career in Mumbai as a software engineer at LnT Infotech. I've been there for 3 years now. I love solving riddles and puzzles, and I also enjoy jogging, reading and watching movies.

2. I am a person with strong interpersonal skills and I have the ability to get along well with people. I enjoy challenges and look for creative solutions to problems.


2. What are your strengths?

This is a simple and popular interview question.
Everyone will state their strengths like "I am young, dynamic, intelligent, smart" and so on.

Do not simply state your strengths. Everyone has some strengths; you need to convert them into benefits.
In short, try to explain yourself by converting your features into strengths.

Your answers can be:

1) I am a hard worker, and because of this ability I can work additional hours to accomplish my tasks.
I am commitment oriented and hence I always enjoy the trust and confidence of my team mates, which enables me to perform my duties very easily.

2) I've always been a great team player and I can work efficiently to produce quality work in a team environment.
I can accomplish a large amount of work within a short period of time, hence I get things done on time.

3) I am a quick learner, so I can learn any subject quickly, analyze my job and add value to it, as well as identify problems and solve them faster and better.


3. Difference between a smart worker and a hard worker?

"A hard worker always does the right things, but a smart worker always puts things right."


4. Why should I hire you?

Reasons to hire me:

a. I am a perfect fit for this position. I have three years of experience in this technology, and my skills enable me to develop better products in less time.
On top of that, I am a great team player who gets along with everyone.

b. My qualifications and work experience match your needs perfectly.
Even though I realize other candidates also have the ability to do this job, I bring an additional quality that makes me the best person for this job - my passion for excellence.


5. Why do you want to leave your current job/organisation/company?


This question looks very simple, but it is not that easy to answer.
I would like to give both wrong answers and right answers to this question.

Wrong answers
1. I hate my job, my company and my boss, and I need more money.
2. My company makes me work additional hours and pays on a low scale.
3. My co-workers never supported me and were jealous of my work.

Do not give this type of wrong answer or bad-mouth your previous company, superiors and co-workers, because it will definitely sound negative on your part.
The interviewer may think you will talk badly about his company the next time you're looking for another job.

Correct answers:

1. I was looking for a position like this, which is an excellent match for my skills and experience; I am not able to fully utilize them in my present job, as there is very limited scope for growth.

2. I am interested in a new challenge and an opportunity to use my technical skills and experience in a different capacity than I have in the past.

6. What are your short-term goals?

Short-term goals depend upon where you stand right now.
A person with 5 years of experience will have different short-term goals than a person with no work experience.

You can answer like this:

I want to see myself as a senior software developer or team leader in your esteemed organization, where with all my skills and enhanced learning I shall be able to make a valuable and meaningful contribution to your organization.

7. What are your long-term goals?

HR will ask this question to check how serious you are about your career.

Correct answer:

Always give an ambitious answer that shows you really love your career, like this:
1. My long-term goal is to be an instructor. I have always loved to teach, and I would like to grow newer employees and help co-workers wherever I can.

2. After a successful career, I look forward to writing a book on a programming language.

3. My long-term goal is to become a partner in a consulting firm. I know a lot of hard work, determination and patience is required to become a partner.

4. My long-term goal is to become a director of a company. I know it sounds a little too ambitious, but I'm smart and willing to work very hard.


Wrong answer
Do not say things like "To become rich and retire early".

8. Where do you see yourself five years from now?

This question is similar to the short-term-goal question, but you need to answer it differently, like:

1. In five years, I want to be a senior analyst (architect/manager/lead). I want my expertise to directly impact the company in a positive way.

2. I want to become a manager. I want to continue gaining experience, and after learning many different aspects, I see myself in management.

9. Do you prefer to work alone or as a team player?

Even if you have a strong preference to work alone, the best answer is to say both.
The reality is that most jobs require us to work both independently and in teams.
Most employers want someone who can work well in a team and work well alone.

You can answer like this:

1. I would like to work in an environment where there is a blend of both.
It is great to work in a team, sharing and learning ideas with each other, but it is also great to sit at my own desk and work hard productively.

2. I believe both are two sides of the same coin. They can never be isolated. A person has to work individually and also as a team player. The value of teamwork is the emergence of new ideas and creative solutions, as well as sharing of the workload.


10. What kind of salary are you looking for?

Always allow the interviewer to bring up a figure first. Do not give a figure right away.

After HR replies, you can answer like this:

1. I hope I'll get a salary according to the company standards and the designation to which I am being posted. More important to me is the opportunity to work in this organization and enhance my skills. Thank you.

2. Never lie about your current package; you can ask for 25% - 50% more than your present salary.


11. What do you know about this company/organisation?

Your answer should reflect your knowledge about the company and your passion for the job.

I learned that this company provides a strong core competency, very strong value systems and best practices, so I believe I have a strong vision of myself as a member of this company. It also has one of the fastest growth rates and turnover figures in the industry, and that would mean a faster growth rate for me as a professional.

ABINITIO_FREQUENTLY ASKED_INTERVIEW_QUESTIONS



1. What is the difference between rollup and scan?

Ans:
Rollup produces one summary record per key group; it cannot generate cumulative (running) summary records. For cumulative summaries we use scan.

What is the difference between partitioning with key and round robin?
Ans:
PARTITION BY KEY: In this, we have to specify the key based on which the partition will occur. Since it is key based, it results in well-balanced data only when the key values are evenly distributed. It is useful for key-dependent parallelism.
PARTITION BY ROUND ROBIN: In this, the records are partitioned in a sequential way, distributing data evenly in blocksize chunks across the output partitions. It is not key based and results in well-balanced data, especially with a blocksize of 1. It is useful for record-independent parallelism.

2.How do you truncate a table 

ans: There are many ways to do it.

1. Probably the easiest way is to use the Truncate Table component.
2. Run SQL or Update Table can be used to do the same thing.
3. Run Program (for example, calling a small SQL script, as sketched below).
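For option 3, a hedged sketch of the kind of small script a Run Program component (or a wrapper) might call; the connection variables and the default table name are hypothetical.

#!/bin/ksh
# Illustrative only: truncate a staging table from a script.
# Connection details and the default table name are hypothetical.
TABLE=${1:-stg_customer}

sqlplus -s "$DB_USER/$DB_PASS@$DB_SID" <<EOF
TRUNCATE TABLE $TABLE;
EXIT;
EOF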

3.What is the difference between a DB config and a CFG file?

Ans:
A .dbc file has the information required for Ab Initio to connect to the database to extract or load tables or views, while a .cfg file is the table configuration file created by db_config when using components like Load DB Table.

4.Types of parallelism in detail. 

ans:
There are 3 types of parallelism in ab-initio. 

1) Data parallelism: data is processed on different partitions/servers at the same time. A graph that deals with data divided into segments and operates on each segment simultaneously uses data parallelism. Nearly all commercial data processing tasks can use data parallelism. To support this form of parallelism, Ab Initio provides partition components to segment data, and departition components to merge segmented data back together.
2) Pipeline parallelism: the records are processed in a pipeline, i.e. the components do not have to wait for all the records to be processed; records that have been processed are passed to the next component in the pipeline. Each component in the pipeline continuously reads from upstream components, processes data, and writes to downstream components. Since a downstream component can process records previously written by an upstream component, both components can operate in parallel. NOTE: To limit the number of components running simultaneously, set phases in the graph.
3) Component parallelism: two or more components process records in parallel. A graph with multiple processes running simultaneously on separate data uses component parallelism.

5. What is the function you would use to convert a string into a decimal?

Ans:
For converting a string to a decimal we need to typecast it using the following syntax: out.decimal_field :: (decimal(size_of_decimal)) string_field; The above statement converts the string to a decimal and assigns it to the decimal field in the output.

6. How do you execute a graph from start to end? And how do you run a graph on a non-Ab Initio system?

Ans: 
There are many ways to do this; here is one example. You can run the components phase by phase, according to the phases you have defined. You can also run a graph outside the GDE by deploying it as a ksh/sh script and executing that script.

7.What is data mapping and data modelling? 
Ans; 
Data mapping deals with the transformation of the extracted data at the FIELD level, i.e. the transformation of a source field to a target field is specified by the mapping defined on the target field. The data mapping is specified during the cleansing of the data to be loaded. For example:
source: string(35) name = "Siva Krishna ";
target: string("01") nm = NULL("");  /* maximum length is string(35) */
Then we can have a mapping like: straight move; trim the leading or trailing spaces. The above mapping specifies the transformation of the field nm.

8.What is the difference between sandbox and EME, can we perform checkin and checkout through sandbox/ Can anybody explain checkin and checkout? 

Ans; 
Sandboxes are work areas used to develop, test or run code associated with a given project. Only one version of the code can be held within the sandbox at any time. The EME Datastore contains all versions of the code that have been checked into it. A particular sandbox is associated with only one project, whereas a project can be checked out to a number of sandboxes. Check-out copies code from the EME into a sandbox; check-in saves a new version of that code from the sandbox back into the EME.

9. Explain the environment variables with an example.
ans:
Environment variables serve as global variables in the Unix environment. They are used for passing values from one shell/process to another. They are inherited by Ab Initio as sandbox variables/graph parameters, such as AI_SORT_MAX_CORE, AI_HOME, AI_SERIAL, AI_MFS etc. To know which variables exist in your Unix shell, find out the naming convention and type a command like "env | grep AI". This will give you a list of the matching variables set in the shell. You can refer to the graph parameters/components to see how these variables are used inside Ab Initio.
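A quick hedged sketch of inspecting and setting such variables from a Unix shell; the directory values below are made up for illustration.

# List the AI_* variables already set in the shell
env | grep '^AI_'

# Set a couple of them for the current session (illustrative paths only)
export AI_SERIAL=/data/myproj/serial
export AI_MFS=/data/myproj/mfs
echo "serial directory is $AI_SERIAL"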

10. What are the graph parameters?

ans:
There are 2 types of graph parameters in Ab Initio: 1. local parameters and 2. formal parameters (parameters whose values are supplied at runtime).

11. How do you improve the performance of graphs in Ab Initio? Give some examples or tips.

Ans:
There are many ways to improve the performance of graphs in Ab Initio. Here are a few points from my side: 1. Use an MFS system, partitioning with Partition by Round-robin. 2. If needed, use Lookup local rather than Lookup when there is a large amount of data. 3. Take out unnecessary components like Filter by Expression; instead provide the condition in a Reformat/Join/Rollup. 4. Use Gather instead of Concatenate. 5. Tune max-core for optimal performance. 6. Try to avoid too many phases.

12. What are the most commonly used components in an Ab Initio graph? Give an example of a transformation of data, say customer data in a credit card company, into meaningful output based on business rules.
Ans: 
The most commonly used components in any Ab Initio project are: Input File/Output File, Input Table/Output Table, Lookup File, Reformat, Gather, Join, Run SQL, Join with DB, compress components, Sort, Trash, Partition by Expression, Partition by Key, and Concatenate.

13. What is the difference between conventional loading and direct loading? When is each used in real time?
ans:
Conventional load: before loading the data, all the table constraints are checked against the data.
Direct load (faster loading): all the constraints are disabled and the data is loaded directly; later the data is checked against the table constraints and the bad data is not indexed. In the database components, API mode corresponds to conventional loading and utility mode to direct loading.

14.How to find the number of arguments defined in graph? 

Ans: $# - the number of positional parameters; $? - the exit status of the last executed command.
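A small hedged sketch showing $# and $? in a wrapper around a deployed graph script; the script name and argument are hypothetical.

#!/bin/ksh
# Sketch only: $# checks the argument count, $? checks the last command's status.
if [ $# -lt 1 ]; then
  echo "Usage: $0 <run_date>" >&2
  exit 1
fi

./load_customers.ksh "$1"          # hypothetical deployed graph script
if [ $? -ne 0 ]; then
  echo "graph failed" >&2
  exit 1
fi
echo "graph finished successfully"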

15.What is the difference between .dbc and .cfg file?

Ans:
The .cfg file is for the remote connection and the .dbc file is for connecting to the database.
A .cfg file contains: 1. the name of the remote machine, 2. the username/password to be used while connecting to the database, 3. the location of the operating system on the remote machine, 4. the connection method.
A .dbc file contains: 1. the database name, 2. the database version, 3. the userid/password, 4. the database character set, and some more.

16. How do we run sequences of jobs, e.g. the output of job A is the input to job B? How do we coordinate the jobs?

Ans: By writing wrapper scripts we can control the sequence of execution of more than one job.
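A minimal wrapper sketch along those lines: run job A, and start job B (which reads A's output) only if A succeeds. The script and file names are hypothetical.

#!/bin/ksh
# Sketch only: sequence two deployed graph scripts with a dependency between them.
./job_a.ksh                              # hypothetical: writes /data/out/a_output.dat
if [ $? -ne 0 ]; then
  echo "job A failed; not starting job B" >&2
  exit 1
fi

./job_b.ksh /data/out/a_output.dat       # hypothetical: consumes A's output
exit $?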

17• How would you do performance tuning for an already built graph? Can you give me some examples?

Ans:
Examples: 1) Suppose a Sort is used in front of a Merge component; there is no use for that Sort, because sorting is built into Merge. 2) Use a lookup instead of a Join/Merge component where appropriate. 3) Suppose we want to join the data coming from 2 files and we don't want duplicates; we can use the union function instead of adding an additional component for duplicate removal.

18•What is semi-join

ans: 
In Ab Initio there are 3 types of join: 1. inner join, 2. outer join and 3. semi-join. For an inner join the 'record_requiredn' parameter is true for all in ports. For an outer join it is false for all the in ports. If you want a semi-join, you set 'record_requiredn' to true for the required in port and false for the other in ports.

19•How to get DML using Utilities in UNIX? 
Ans: If your source is a COBOL copybook, then we have a command in Unix which generates the required DML in Ab Initio. Here it is: cobol-to-dml.

20•what is local and formal parameter? 

Ans: 
Both are graph-level parameters, but for a local parameter you need to initialize the value at the time of declaration, whereas for a formal parameter there is no need to initialize a value; it will prompt for it at the time of running the graph.

21• What are BROADCAST and REPLICATE?
ans:
Broadcast - takes data from multiple inputs, combines it and sends it to all the output ports. E.g. you have 2 incoming flows (this can be data parallelism or component parallelism) into a Broadcast component, one with 10 records and the other with 20 records. Then all the outgoing flows (it can be any number of flows) will have 10 + 20 = 30 records.
Replicate - replicates the data for a particular partition and sends it out to multiple out ports of the component, but maintains the partition integrity. E.g. your incoming flow to Replicate has a data parallelism level of 2, with one partition having 10 records and the other having 20 records. Now suppose you have 3 output flows from Replicate. Then each flow will have 2 data partitions with 10 and 20 records respectively.

22• What is m_dump?
Ans: The m_dump command prints the data in a formatted way: m_dump <dml> <file.dat>
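For example (the record-format and data file names here are hypothetical):

m_dump customer.dml customers.dat          # print records using the given record format
m_dump customer.dml customers.dat | head   # peek at just the first few formatted lines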

23• An example of a real-time start script in a graph?

Ans:
Here is a simple example of using a start script in a graph. In the start script we can give: export DT=$(date '+%m%d%y'). Now this variable DT will hold today's date before the graph is run. Somewhere in the graph transform we can then use this variable as out.process_dt :: $DT;, which provides the value from the shell.

24•How to run the graph without GDE? 

Ans: 
In the GDE, Run ==> Deploy >> As script creates a .bat file in your host directory; then run that .bat file from the command prompt.

25• How does MAX-CORE work?
Ans: Max-core is a value (it will be in KB). Whenever a component is executed, it will take that much memory, as we specified, for execution.

26• What is $mpjret? Where is it used in Ab Initio?

ans:
You can use $mpjret in the end script, for example: if [ $mpjret -eq 0 ]; then echo "success"; else mailx -s "[graphname] failed" mailid; fi

27•How do you convert 4-way MFS to 8-way mfs? 

Ans: 
To convert a 4-way to an 8-way partition we need to change the layout in the partitioning component. There will be separate parameters for each type of partitioning, e.g. AI_MFS_HOME, AI_MFS_MEDIUM_HOME, AI_MFS_WIDE_HOME etc. The appropriate parameter needs to be selected in the component layout for the type of partitioning.

28•What is AB_LOCAL expression where do you use it in ab-initio? 

ans: 
ablocal_expr is a parameter of the Input Table component of Ab Initio. ABLOCAL() is replaced by the contents of ablocal_expr, which we can make use of in parallel unloads. There are two forms of the ABLOCAL() construct: one with no arguments and one with a single argument, a table name (the driving table). The ABLOCAL() construct is used because some complex SQL statements contain grammar that is not recognized by the Ab Initio parser when unloading in parallel. You can use the ABLOCAL() construct in this case to prevent the Input Table component from parsing the SQL (it will get passed through to the database). It also specifies which table to use for the parallel clause.

29• What is meant by the Co>Operating System and why is it special for Ab Initio?

ans: 
It converts the Ab Initio specific code into a format which UNIX/Windows can understand and feeds it to the native operating system, which carries out the task.

30•How will you test a dbc file from command prompt ?

ans: 
try "m_db test myfile.dbc"

31•Which one is faster for processing fixed length dmls or delimited dmls and why ? 

ans:
Fixed-length DMLs are faster because the data can be read directly using the known length without any comparisons, but with delimited DMLs every character has to be compared (looking for the delimiter), and hence they are slower.

32•.What are the continuous components in Abinitio? 

ans: 
Continuous components are used to create graphs that produce useful output while running continuously. Examples: Continuous Rollup, Continuous Update, Batch Subscribe.

33• How do you retrieve data from a database to a source file? Which component is used for this?

ans:
To unload (retrieve) data from a database such as DB2, Informix, or Oracle, we have components like Input Table and Unload DB Table; by using these two components we can unload data from the database.

34• What is the relation between EME , GDE and Co-operating system ? 

ans:
EME stands for Enterprise Meta>Environment, GDE for Graphical Development Environment, and the Co>Operating System can be thought of as the Ab Initio server. The relation between them is as follows: the Co>Operating System is the Ab Initio server, installed on a particular OS platform, which is called the native OS. The EME holds the metadata, transformations, db config files, and source and target information. The GDE is the end-user environment where we develop the graphs (mappings, just like in Informatica); the designer uses the GDE to design graphs and saves them to the EME or the sandbox. The sandbox is on the user side, whereas the EME is on the server side.

35•What are kinds of layouts does ab initio supports 

ans:
Basically there are serial and parallel layouts supported by Ab Initio. A graph can have both at the same time. The parallel one depends on the degree of data parallelism. If the multifile system is 4-way parallel, then a component in a graph can run 4-way parallel if the layout is defined to match the degree of parallelism.

36•Do you know what a local lookup is? 

ans: 
A Lookup File consists of data records which can be held in main memory. This allows the transform function to retrieve the records much faster than retrieving them from disk, and it allows the transform component to process the data records of multiple files quickly. A local lookup (lookup_local) is used when the lookup file is partitioned: each partition of the component looks up only the data held in its own local partition.

37•How many components in your most complicated graph? 

ans:
This is a tricky question; the number of components in a graph has nothing to do with the level of knowledge a person has. On the contrary, a properly standardized and modular parametric approach will reduce the number of components to very few. In a well thought out modular and parametric design, most graphs will have 3-4 components, each doing a particular task, and will then call other graphs to do the next task, and so on. This way the total number of distinct graphs will come down drastically, and support and maintenance will be much more simplified. The bottom line is, there are a lot more things to plan than simply adding components.

38•How to handle if DML changes dynamically in abinitio ?

ans:
If the DML changes dynamically, then both the dml and the xfr have to be passed as graph-level parameters at runtime.

40•Have you worked with packages? 

Ans: 
Packages are nothing but reusable blocks of objects like transforms, user-defined functions, dmls etc. These packages are to be included in the transform where you use them. For example, consider a user-defined function like:
/*string_trim.xfr*/
out :: trim(input_string) =
begin
  let string(35) trimmed_string = string_lrtrim(input_string);
  out :: trimmed_string;
end
Now, the above xfr can be included in the transform where you call the above function, as: include "~/xfr/string_trim.xfr"; But this should be included ABOVE your transform function. For more details see the help file on "packages".

41•What are primary keys and foreign keys? 

Ans: 
In an RDBMS the relationship between two tables is represented as a primary key and foreign key relationship, where the primary key table is the parent table and the foreign key table is the child table. The criterion for both tables is that there should be a matching column.

42•What are Cartesian joins?

Ans:
A Cartesian join will get you a Cartesian product: you join every row of one table to every row of another table. You can also get one by joining every row of a table to every row of itself.

43• Explain the difference between the "truncate" and "delete" commands?

ans:
Truncate: it is a DDL command, used to remove all rows from tables or clusters. Since it is a DDL command it auto-commits and a rollback can't be performed. It is faster than delete.
Delete: it is a DML command; it can take a WHERE clause to remove selected rows, it can be rolled back before commit, and it is slower than truncate.

44• How can I run the 2 merged GUI map files?
Ans:
If you mean merging GUI map files in WinRunner: merging GUI map files in the GUI map editor won't create a corresponding test script, and without a test script you can't run a file. So it is impossible to run a file just by merging 2 GUI map files.