Sponsored Links


Results 1 to 2 of 2

Thread: CS614 Assignment No. 03 Spring 2011 idea solution

  1. #1
    Administrator Vuhelper's Avatar
    Join Date
    Apr 2011
    Posts
    9,578

    Icon19 CS614 Assignment No. 03 Spring 2011 idea solution

    Sponsored Links1


    Assignment No. 03
    SEMESTER Spring 2011
    CS614- Data Warehousing
    Question # 1 [15 marks]
    Read and Understand the research paper and then give answer to the following Question.
    1) How the approach of parallelism is used in Data warehouse and OLAP.
    2) How the parallel star join works that is discussed in this research article.
    Note:
    i) Do not cut copy and paste from research paper. Write in your own words. If you will cut, copy and paste from research paper then you will get zero marks with no leniency.
    ii) Explain your answer briefly and precisely. If you will give extra i.e. un required explanation will lead to deduction of marks. Write your answer in points form.


    Question # 2 [5 marks]
    Give some suggestions of your own how to improve the proposed solution in research paper?
    Solution
    Email This


    Sponsored Links

  2. #2
    Administrator Vuhelper's Avatar
    Join Date
    Apr 2011
    Posts
    9,578
    Question # 1:-
    How the approach of parallelism is used in Data warehouse and OLAP?

    Ans:-
    By partitioning a data among a set of processors OLAP queries can be executed in parallel potentially achieving linear speedup and thus significantly improving query response time…..

    In recent years the databssase community has experienced a tremendous increase in the availability of new technologies to support efficient storage and retrieval of large volumes of data, namely data warehousing and On-Line Analytical Processing (OLAP) products. Efficient query processing is critical in such an environment, yet achieving quick response times with OLAP queries is still largely an open issue. In this paper we propose a solution approach to this problem by applying parallel processing techniques to a warehouse environment. We suggest an efficient partitioning strategy based on the relational representation of a data warehouse (i.e., star schema). Furthermore, we incorporate a particular indexing strategy, Data Indexes, to further improve query processing times and parallel resource utilization, and propose a preliminary parallel star-join strategy.
    A parallel aproch for the data ware house is strong due to inherent nature of such an environment.

    Still another appeal for parallelism in datawarehouse in the OLAP system is the logical is a logical design for the warehouse Data warehouses often contain large tables and require techniques both for managing these large tables and for providing good query performance across these large tables. This chapter discusses two key methodologies for addressing these needs: parallelism and partitioning.

    Parallelism is certainly an issue if you're using DTS. If you don't specifically set up the package to use parallelism then your only choice is to run multiple packages at once to get parallel processes running. If you
    have abstracted the data load in such a way that a single package is used to load multiple files dynamically, then parallelism needs to be considered in order to get the package running multiple threads within the same package.

    There are different degrees of how and where parallelism can and should be used. Ensuring that the query processor is setup correctly to handle parallel processing of the data load is also a consideration. One can also have multiple servers running to get the data loaded across federated servers. This is also considered to be parallelism

    Question #2:-

    1) How the parallel star join works that is discussed in this research article.

    Ans:-
    To compute the star join, we first find out the rows of the fact table that will be participating in the final
    cube grouping. Since restriction predicates reduce the number of rows in the final result table, we utilise
    them first. The following query shows how restriction predicates limit the number of rows in the result
    table:

    SELECT Year, State, Product_Name, SUM (Quantity) AS ‘Total Quantity’
    FROM Product P, Date D, Sales S, Region R
    WHERE P.Product_No = S. Product_No AND D.Date_key = S.Date_Key AND
    R.Region_ID = S. Region_ID AND
    D.Year IN (1998, 1999) AND P.Category IN (‘Printer’, ‘Scanner’)
    GROUP BY CUBE Year, State, Product_Name;

    Assuming the data warehouse (Figure 1) maintains sales records of 10 years, we can straight away notice that only the last two years’ sales rows will be picked from the fact table. The second restriction predicate on product category effects a further reduction.

    We use a simple method viz. to form rowsets (of the fact table) for one restriction predicate. The restriction predicates are applied to their corresponding dimensions in parallel. The JVIs that satisfy each predicate are passed on to a coordinator node where an intersection of the RIDs of all the restricted dimensions is performed. The resulting rowset (set of row IDs) thus formed is the result of the star join. The dimensions for which no restriction predicates are present do not participate in this operation as they annot cause any reduction to the resulting rowset. While operating on the fact table we try to distribute equally to each node available as much processing as is possible, so that I/O and computation could be performed efficiently in parallel. Algorithm 2 shows how the parallel star join is performed.

    The schema models the activities of a world-wide wholesale supplier over a period of seven years. The fact table is the SALES table, and the dimension tables are the PART, SUPPLIER, CUSTOMER, and TIME tables. The fact table contains foreign keys to each of the dimension tables. This schema suggests an eÆcient data partitioning as we will soon show.

    A common type of query in OLAP systems is the star-join query. In a star-join, one or more dimension tables are joined with the fact table.


    Query 1
    SELECT U.Name, SUM(S.ExtPrice)
    FROM SALES S, TIME T, CUSTOMER C, SUPPLIER U
    WHERE T.Year BETWEEN 1996 AND 1998
    AND U.Nation='United States' AND C.Nation='United States'
    AND S.ShipDate = T.TimeKey AND S.CustKey = C.CustKey
    AND S.SuppKey = U.SuppKey
    GROUP BY U.Name


    A set of attributes that is frequently used in join predicates can be readily identi_ed in the structure of a star schema. In the example star schema, ShipDate, CustKey, SuppKey, and PartKey of the SALES table can be identi_ed as attributes that will often participate in joins with the corresponding dimension tables. We can thus use this information to apply a vertical partitioning method on these attributes to achieve the bene_ts of parallelism. This paper shows, in fact, that one can use a combination of vertical and horizontal partitioning techniques to extract the parallelism inherent in star schemas. Speci_cally, we propose a declustering strategy which incorporates both task and data parti- tioning and present the Parallel Star Join (PSJ) algorithm,
    which provides a means to perform a star join in parallel using eÆcient operations involving only rowsets and projection columns.


    The Parallel Star Join Algorithm:-
    In this section we present our algorithm to perform star joins in parallel.
    We represent a general k-dimensional star-join query as follows.

    Query 3
    SELECT Ad
    P , Am
    P FROM F, D1, : : :, Dk WHERE P./ AND P_
    Here D1; : : : ;Dk are the k dimensional tables participating in the join. P_ and P./ denote a set of
    restriction and join predicates respectively. We assume that each individual restriction predicate
    in P_ only concerns one table and is of the form (aj hopi constant), where aj is any attribute in the
    warehouse schema and hopi denotes a comparison operator (e.g., =;_;_). We assume each join
    predicate in P./ is of the form al = at where at is any dimensional key attribute and al is the foreign
    key referenced by at in the fact table.

Thread Information

Users Browsing this Thread

There are currently 1 users browsing this thread. (0 members and 1 guests)

Similar Threads

  1. Replies: 3
    Last Post: 01-20-2012, 10:58 PM
  2. CS 201 Assignment # 2 Solution idea Spring 2011
    By Pari ch in forum Get Solution In 24 Hour
    Replies: 23
    Last Post: 12-27-2011, 05:38 PM
  3. Replies: 2
    Last Post: 11-01-2011, 03:19 AM
  4. Replies: 1
    Last Post: 07-01-2011, 05:03 AM
  5. Replies: 0
    Last Post: 04-25-2011, 08:41 PM

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •  
-: Vuhelp Disclaimer :-
None of the files shown here are hosted or transmitted by this server. The links are provided solely by this site's users. The administrator's or staff of Vuhelp.net cannot be held responsible for what its users post, or any other actions of its users. You may not use this site to distribute or download any material when you do not have the legal rights to do so. It is your own responsibility to adhere to these terms. If you have any doubts about legality of content or you have any suspicions, feel free to contact us.
Online Education | JhelumSoft