Sunday, September 30, 2012

Interview Questions BB 1

1. How many sockets can you have?
You can listen on up to 65,536 different ports (0 through 65535). Some of these may be in use or reserved, so fewer ports may actually be available.
What will a server do after getting a request from a client?
2. Multicast vs Broadcast?
When a device sends a message to all possible receivers, it is called broadcasting. The message is received by all other devices, and the destination address of the message has all bits set to 1. When a device sends a message to a group of devices, it is called multicasting; the message is received only by the devices that belong to the group mentioned in the address field. An example of broadcast is FM radio; an example of multicast is sending group mails.
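As a rough Java sketch of the multicast case (the group address 230.0.0.1, the port 4446 and the class name are illustrative), a receiver joins a multicast group and only group members see datagrams sent to that group:

import java.net.DatagramPacket;
import java.net.InetAddress;
import java.net.MulticastSocket;

public class MulticastReceiver {
    public static void main(String[] args) throws Exception {
        // Join an IPv4 multicast group (group address and port are illustrative).
        try (MulticastSocket socket = new MulticastSocket(4446)) {
            InetAddress group = InetAddress.getByName("230.0.0.1");
            socket.joinGroup(group);                 // only group members receive these datagrams
            byte[] buf = new byte[1024];
            DatagramPacket packet = new DatagramPacket(buf, buf.length);
            socket.receive(packet);                  // blocks until a message is multicast to the group
            System.out.println(new String(packet.getData(), 0, packet.getLength()));
            socket.leaveGroup(group);
        }
    }
}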
3. atoi() function in Java and C++?
public class Atoi {
    public static void main(String[] args) {
        String input = "786786";
        int length = input.length();
        int result = 0;
        int convertedDig = 0; // converted digit from char
        int startIndex = 0;   // start index based on sign
        int sign = 1;         // default sign is +ve

        // Check for a -ve number.
        if (input.charAt(0) == '-') {
            sign = -1;
            startIndex = 1;
        }

        // Iterate over the String starting from the first digit (startIndex).
        for (int i = startIndex; i < length; i++) {
            switch (input.charAt(i)) {
                case '0': convertedDig = 0; break;
                case '1': convertedDig = 1; break;
                case '2': convertedDig = 2; break;
                case '3': convertedDig = 3; break;
                case '4': convertedDig = 4; break;
                case '5': convertedDig = 5; break;
                case '6': convertedDig = 6; break;
                case '7': convertedDig = 7; break;
                case '8': convertedDig = 8; break;
                case '9': convertedDig = 9; break;
            }
            // Accumulate left to right; this avoids the floating-point Math.pow call.
            result = result * 10 + convertedDig;
        }
        System.out.println("The atoi conversion is " + result * sign);
    }
}

4. Differences between pipes, FIFOs, message queues, shared memory, and sockets

Pipe
  • In computer programming, especially in Unix operating systems, a pipe is a technique for passing information from one program process to another.
  •  Unlike other forms of interprocess communication (IPC), a pipe is one-way communication only.
  • A pipe is fixed in size and is usually at least 4,096 bytes.
  •  Basically, a pipe passes a parameter such as the output of one process to another process which accepts it as input. The system temporarily holds the piped information until it is read by the receiving process.
  • Using a UNIX shell (the UNIX interactive command interface), a pipe is specified in a command line as a simple vertical bar (|) between two command sequences. 
  • The output or result of the first command sequence is used as the input to the second command sequence. The pipe system call is used in a similar way within a program.
For two-way communication between processes, two pipes can be set up, one for each direction. A limitation of pipes for interprocess communication is that the processes using pipes must have a common parent process (that is, share a common open or initiation process and exist as the result of a fork system call from a parent process).
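As a rough Java illustration of the same idea (Java 9+; the class name is illustrative, and grep and wc are assumed to be on the PATH), ProcessBuilder.startPipeline connects the standard output of one process to the standard input of the next, much like the shell's | operator:

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.util.List;

public class PipeDemo {
    public static void main(String[] args) throws Exception {
        // Equivalent of: grep "test" my_file.txt | wc -l
        List<Process> procs = ProcessBuilder.startPipeline(List.of(
                new ProcessBuilder("grep", "test", "my_file.txt"),
                new ProcessBuilder("wc", "-l")));
        Process last = procs.get(procs.size() - 1);
        try (BufferedReader r = new BufferedReader(new InputStreamReader(last.getInputStream()))) {
            System.out.println(r.readLine());   // number of matching lines
        }
    }
}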
Named pipe (FIFO Pipes)

  • In computer programming, a named pipe is a method for passing information from one computer process to other processes using a pipe or message holding place that is given a specific name. 
  • Unlike a regular pipe, a named pipe can be used by processes that do not have to share a common process origin and the message sent to the named pipe can be read by any authorized process that knows the name of the named pipe.
  • A named pipe is sometimes called a "FIFO" (first in, first out) because the first data written to the pipe is the first data that is read from it.
  • While ordinary pipes only connect the output stream of one process to the input stream of another (stdin and stdout), FIFOs let you use filenames rather than just stdin and stdout.

    example:

    mkfifo pipe_test
    grep "test" my_file.txt > pipe_test &
    wc -l < pipe_test

    (The & runs the writer in the background, since a process writing to a FIFO blocks until another process opens it for reading.)
Message queuing

  • In programming, message queuing is a method by which processes (or program instances) can exchange or pass data using an interface to a system-managed queue of messages.
  • Messages can vary in length and be assigned different types or usages. 
  • A message queue can be created by one process and used by multiple processes that read and/or write messages to the queue
  • For example, a server process can read and write messages from and to a message queue created for client processes. 
  • The message type can be used to associate a message with a particular client process even though all messages are on the same queue.
  • The message queue is managed by the operating system (or kernel)
  • Application programs (or their processes) create message queues and send and receive messages using an application program interface (API)
  • In Unix systems, the C programming language msgget function is used with various parameters specifying the action requested, message queue ID, message type, and so forth.
  • The maximum size of a message in a queue is limited by the operating system and is typically 8,192 bytes.
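System V message queues are not directly exposed in the Java standard library, but as a rough in-process analogy (all names below are illustrative), a BlockingQueue of typed messages captures the idea of several senders sharing one system-managed queue, with a type field used to tell the messages apart:

import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class MessageQueueDemo {
    // A typed message, loosely mirroring the "message type" field used with msgget/msgsnd.
    static class Message {
        final long type; final String body;
        Message(long type, String body) { this.type = type; this.body = body; }
    }

    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<Message> queue = new LinkedBlockingQueue<>();

        // Two "client" threads post messages with different types onto the same queue.
        new Thread(() -> queue.add(new Message(1, "request from client 1"))).start();
        new Thread(() -> queue.add(new Message(2, "request from client 2"))).start();

        // The "server" takes messages off the queue and uses the type to tell them apart.
        for (int i = 0; i < 2; i++) {
            Message m = queue.take();          // blocks until a message is available
            System.out.println("type=" + m.type + " body=" + m.body);
        }
    }
}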
Shared memory
  •  In computer programming, shared memory is a method by which program processes can exchange data more quickly than by reading and writing using the regular operating system services. 
  • For example, a client process may have data to pass to a server process that the server process is to modify and return to the client. Ordinarily, this would require the client writing to an output file (using the buffers of the operating system) and the server then reading that file as input from the buffers to its own work space. Using a designated area of shared memory, the data can be made directly accessible to both processes without having to use the system services. To put the data in shared memory, the client gets access to shared memory after checking a semaphore value, writes the data, and then resets the semaphore to signal to the server (which periodically checks shared memory for possible input) that data is waiting. In turn, the server process writes data back to the shared memory area, using the semaphore to indicate that data is ready to be read.
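As a rough Java sketch of this idea (the file name, size and class name are illustrative), a memory-mapped file gives cooperating processes a region of memory they can both read and write; a real client/server pair would map the same backing file from two processes, and would still need a semaphore or file lock for the signalling described above:

import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

public class SharedMemoryDemo {
    public static void main(String[] args) throws Exception {
        // Both the "client" and the "server" process would map the same backing file.
        try (FileChannel ch = FileChannel.open(Paths.get("/tmp/shm_demo"),
                StandardOpenOption.CREATE, StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            MappedByteBuffer shared = ch.map(FileChannel.MapMode.READ_WRITE, 0, 4096);
            shared.putInt(0, 42);                  // writer: place data in the shared region
            System.out.println(shared.getInt(0));  // reader: sees the same bytes
        }
    }
}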

Other forms of interprocess communication (IPC) include message queueing, semaphores, and sockets.
A good source for programming in C for shell and synchronization : http://www.cs.cf.ac.uk/Dave/C/
Sockets are a method for communication between a client program and a server program in a network. A socket is defined as "the endpoint in a connection." Sockets are created and used with a set of programming requests or "function calls" sometimes called the sockets application programming interface (API). The most common sockets API is the Berkeley Unix C interface for sockets. Sockets can also be used for communication between processes within the same computer.

This is the typical sequence of sockets requests from a server application in the "connectionless" context of the Internet in which a server handles many client requests and does not maintain a connection longer than the serving of the immediate request:

    socket()
    |
    bind()
    |
    recvfrom()
    |
    (wait for a sendto request from some client)
    |
    (process the sendto request)
    |
    sendto (in reply to the request from the client...for example, send an HTML file)

A corresponding client sequence of sockets requests would be:

    socket()
    |
    bind()
    |
    sendto()
    |
    recvfrom()

Sockets can also be used for "connection-oriented" transactions with a somewhat different sequence of C language system calls or functions.
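As a rough Java sketch of the connectionless server sequence above (the port number, message and class name are illustrative), DatagramSocket's constructor plays the role of socket() plus bind(), receive() of recvfrom(), and send() of sendto():

import java.net.DatagramPacket;
import java.net.DatagramSocket;

public class UdpServer {
    public static void main(String[] args) throws Exception {
        // socket() + bind(): create a UDP socket bound to a local port (5000 is illustrative)
        try (DatagramSocket socket = new DatagramSocket(5000)) {
            byte[] buf = new byte[1024];
            DatagramPacket request = new DatagramPacket(buf, buf.length);
            socket.receive(request);                        // recvfrom(): wait for a client's sendto()
            byte[] reply = "got it".getBytes();             // process the request
            DatagramPacket response = new DatagramPacket(reply, reply.length,
                    request.getAddress(), request.getPort());
            socket.send(response);                          // sendto(): reply to the client
        }
    }
}

A matching client would create an unbound DatagramSocket, send() a datagram to the server's address and port, and then receive() the reply.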


Saturday, September 29, 2012

OS Concepts : Race Conditions

race condition or race hazard 

  • is a type of flaw in an electronic or software system where 
  • the output is dependent on the sequence or timing of other uncontrollable events
  • The term originates with the idea of two signals racing each other to influence the output first.
  • Race conditions can occur in electronics systems, especially logic circuits, and in computer software, especially multithreaded or distributed programs.
  • Race conditions arise in software when separate processes or threads of execution depend on some shared state

  • Mutually exclusive operations are those that cannot be interrupted while accessing some resource such as a memory location.

  • Operations upon shared states are critical sections that must be mutually exclusive in order to avoid harmful collision between processes or threads that share those states.

As a simple example let us assume that two tasks each want to increment the value of a global integer by one. Ideally, the following sequence of operations would take place:
Task 1               Task 2               New value of "i"
                                          0
read value of i                           0
increase value                            0
write value to i                          1
                     read value of i      1
                     increase value       1
                     write value to i     2
In the case shown above, the final value of i is 2, as expected. However, if the two threads run simultaneously without locking or synchronization, the outcome of the operation could be wrong. The alternative sequence of operations below demonstrates this scenario:
Task 1               Task 2               New value of "i"
                                          0
read value of i                           0
                     read value of i      0
increase value                            0
                     increase value       0
write value to i                          1
                     write value to i     1
The final value of i is 1 instead of the expected result of 2. This occurs because the increment operations of the second case are not mutually exclusive. Mutually exclusive operations are those that cannot be interrupted while accessing some resource such as a memory location. In the first case, T1 was not interrupted while accessing the variable i, so its operation was mutually exclusive.
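A minimal Java sketch of this lost-update race (class and field names are illustrative): two threads each increment a shared counter many times; the unsynchronized counter can lose updates, while a synchronized increment (a mutex-protected critical section) always produces the expected total:

public class RaceDemo {
    static int unsafeCounter = 0;
    static int safeCounter = 0;

    static synchronized void safeIncrement() { safeCounter++; }  // mutual exclusion via a lock

    public static void main(String[] args) throws InterruptedException {
        Runnable work = () -> {
            for (int k = 0; k < 100_000; k++) {
                unsafeCounter++;     // read, increase, write back: the steps can interleave badly
                safeIncrement();     // protected critical section
            }
        };
        Thread t1 = new Thread(work), t2 = new Thread(work);
        t1.start(); t2.start();
        t1.join();  t2.join();
        System.out.println("unsafe: " + unsafeCounter);  // often less than 200000
        System.out.println("safe:   " + safeCounter);    // always 200000
    }
}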
For another example, consider the following two tasks, in pseudocode:
global int A = 0
 
// increments the value of A and print "RX"
// activated whenever an interrupt is received from the serial controller
task Received()
    A = A + 1
    print "RX"
end task
 
// prints out only the even numbers
// is activated every second
task Timeout()
    if (A is evenly divisible by 2)
        print A
    end if
end task
Output would look something like:
0
0
0
RX
RX
2
RX
RX
4
4
Now consider this chain of events, which might occur next:
  1. timeout occurs, activating task Timeout
  2. task Timeout evaluates A and finds it is divisible by 2, so elects to execute the "print A" next.
  3. data is received on the serial port, causing an interrupt and a switch to task Received
  4. task Received runs to completion, incrementing A and printing "RX"
  5. control returns to task Timeout
  6. task Timeout executes print A, using the current value of A, which is 5, so an odd number is printed even though the check for evenness had succeeded.
Mutexes are used to address this problem in concurrent programming.

File systems

In file systems, two or more programs may "collide" in their attempts to modify or access a file, which could result in data corruption. 
File locking provides a commonly used solution. 
A more cumbersome remedy involves organizing the system in such a way that one unique process (running a daemon or the like) has exclusive access to the file, and all other processes that need to access the data in that file do so only via interprocess communication with that one process (which of course requires synchronization at the process level).
A different form of race condition exists in file systems, where unrelated programs may affect each other by suddenly using up available resources such as disk space, memory, or processor cycles.
Software not carefully designed to anticipate and handle this race situation may then become quite fragile and unpredictable. Such a risk may be overlooked for a long time in a system that seems very reliable. But eventually enough data may accumulate or enough other software may be added to critically destabilize many parts of a system. Probably the best known example of this occurred with the near-loss of the Mars Rover "Spirit" not long after landing, but this is a commonly overlooked hazard in many computer systems. A solution is for software to request and reserve all the resources it will need before beginning a task; if this request fails then the task is postponed, avoiding the many points where failure could have occurred. (Alternatively, each of those points can be equipped with error handling, or the success of the entire task can be verified afterwards, before continuing.) A more common but incorrect approach is to simply verify that enough disk space (for example) is available before starting a task; this is not adequate because in complex systems the actions of other running programs can be unpredictable.

Friday, September 28, 2012

Dynamic Programming Practice Problems : Genes and Sequences



Here we look at a problem from computational biology. You can think of a DNA sequence as a sequence of the characters "a", "c", "g", "t". Suppose you are given a DNA sequence D1 of n1 characters and a DNA sequence D2 of n2 characters. You might want to know whether these sequences appear to be from the same object. However, in obtaining the sequences, laboratory errors could cause reversed, repeated or missing characters. This leads to the following sequence alignment problem.


An alignment is defined by inserting any number of spaces in D1 and D2 so that the resulting strings D_1 and D_2 both have the same length (with the spaces included as part of the sequence). Each character of D_1 (including each space as a character) has a corresponding character (matching or non-matching) in the same position in D_2.
For a particular alignment A we say cost(A) is the number of mismatches (where you can think of a space as just another character, so a space matches a space but does not match one of the other 4 characters). To be sure this problem is clear, suppose that D1 is ctatg and D2 is ttaagc. One possible alignment is given by:

ct at g
 tta agc

In the above, both D_1 and D_2 have length 8. The cost is 5 (there are mismatches in positions 1, 3, 5, 6 and 8).
Give the most efficient algorithm you can (analyzed as a function of n1 and n2) to compute the alignment of minimum cost.

Solution :

  • D1 is a DNA sequence with n1 characters and D2 is a DNA sequence with n2 characters.
  • The general form of the subproblem we solve is: find the best alignment for the first i characters of D1 and the first j characters of D2, for 0 <= i <= n1 and 0 <= j <= n2.
  • Let D(i) be the ith character in string D.
  • Let c[i, j] be the cost of an optimal alignment for D1(1), . . . , D1(i) and D2(1), . . . , D2(j). We can define c[i, j] recursively as below (where c[n1, n2] gives the optimal cost for the original problem):

c[i, j] = i                                              if j = 0
        = j                                              if i = 0
        = c[i-1, j-1]                                    if D1(i) = D2(j)
        = 1 + min( c[i-1, j-1], c[i-1, j], c[i, j-1] )   if D1(i) != D2(j)

  • As usual we take i = n1 and j = n2 and start from the back of the sequences, solving subproblems toward the front.

PROOF

We now argue this recursive definition is correct.

  • In an optimal alignment, either the last character of D_1 is a space or it is the last character (character i) of D1, and either the last character of D_2 is a space or it is the last character (character j) of D2.
  • If D1(i) = D2(j) then clearly it is best to align them (so no space needs to be added).
  • However, if D1(i) != D2(j) then a space could be added to neither string or to exactly one of them. In all three cases one mismatch is caused by the last characters.
  • Notice that there would never be any benefit in ending both D_1 and D_2 with a space. Hence the recursive definition considers all possible cases that the optimal alignment could have. Since the solution to the original problem is either the value of the subproblem solution (if D1(i) = D2(j)) or otherwise one plus a subproblem solution, the optimal substructure property clearly holds.
  • Thus the solution output is correct.

You can form D_1 and D_2 (and hence the alignment) for the subproblem from right to left.
For the time complexity, it is clearly O(n1 · n2) since there are n1 · n2 subproblems, each of which is solved in constant time. Finally, the c[i, j] matrix can be computed in row-major order and, just as in the LCS problem, a second matrix that records which of the above cases was applied can also be stored and then used to construct an optimal alignment.
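A minimal Java sketch of the recurrence above (the method name alignCost is illustrative); it fills c[i, j] in row-major order and returns c[n1, n2], the minimum number of mismatches over all alignments:

// Returns the minimum number of mismatches over all alignments of d1 and d2.
static int alignCost(String d1, String d2) {
    int n1 = d1.length(), n2 = d2.length();
    int[][] c = new int[n1 + 1][n2 + 1];
    for (int i = 0; i <= n1; i++) c[i][0] = i;   // align i characters against spaces
    for (int j = 0; j <= n2; j++) c[0][j] = j;
    for (int i = 1; i <= n1; i++) {
        for (int j = 1; j <= n2; j++) {
            if (d1.charAt(i - 1) == d2.charAt(j - 1)) {
                c[i][j] = c[i - 1][j - 1];               // matching characters: no cost
            } else {
                c[i][j] = 1 + Math.min(c[i - 1][j - 1],  // mismatch of the two last characters
                              Math.min(c[i - 1][j],      // space added to D2
                                       c[i][j - 1]));    // space added to D1
            }
        }
    }
    return c[n1][n2];
}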











Dynamic Programming Practice Problems : Strings and Interleaving



For bit strings X = x1 . . . xm, Y = y1 . . . yn and Z = z1 . . . zm+n, we say that Z is an interleaving of X and Y if it can be obtained by interleaving the bits in X and Y in a way that maintains the left-to-right order of the bits in X and Y .

For example if X = 101 and Y = 01 then x1x2y1x3y2 = 10011 is an interleaving of X and Y, whereas 11010 is not. Give the most efficient algorithm you can to determine if Z is an interleaving of X and Y. Prove your algorithm is correct and analyze its time complexity as a function of m = |X| and n = |Y|.

Solution :

It is a simple problem and one can easily write a program to solve it. However, the dynamic programming formulation and its proof are the important parts here.

  • Let c[i, j] be true if and only if z(1), . . . , z(i+j) is an interleaving of x(1), . . . , x(i) and y(1), . . . , y(j), for 0 <= i <= m and 0 <= j <= n.
  • If i = 0 then x(1), . . . , x(i) is the empty string, and if j = 0 then y(1), . . . , y(j) is the empty string.
  • The subproblem c[i, j] can be recursively defined as below (where c[m, n] gives the answer to the original problem):

c[i, j] = true                          if i = 0 and j = 0
        = c[i-1, j]                     if x(i) == z(i+j) and (j = 0 or y(j) != z(i+j))
        = c[i, j-1]                     if y(j) == z(i+j) and (i = 0 or x(i) != z(i+j))
        = c[i-1, j] OR c[i, j-1]        if i > 0, j > 0 and x(i) == y(j) == z(i+j)
        = false                         otherwise

  • We fill the table for i from 0 to m and j from 0 to n; equivalently, one can think of starting at position i+j of the interleaving and moving toward position 0, checking that both strings are eventually consumed.

PROOF

We now argue this recursive definition is correct.

  • The case i = j = 0 is when both X and Y are empty, and then by definition Z (which is also empty) is a valid interleaving of X and Y.
  • If x(i) != z(i+j) and y(j) == z(i+j), then a valid interleaving can only end with y(j), and hence c[i, j] is true exactly when z(1), . . . , z(i+j-1) is a valid interleaving of x(1), . . . , x(i) and y(1), . . . , y(j-1), which is given by c[i, j-1].
  • Similarly, when x(i) == z(i+j) and y(j) != z(i+j), then c[i, j] = c[i-1, j].
  • If x(i) == y(j) == z(i+j), the interleaving (if it exists) must either end with x(i) (in which case c[i-1, j] is true) or end with y(j) (in which case c[i, j-1] is true). Thus returning c[i-1, j] OR c[i, j-1] gives the correct answer.
  • If neither x(i) nor y(j) matches z(i+j), no valid interleaving exists and c[i, j] is false.
  • Finally, since in all cases the value of c[i, j] comes directly from the answers to subproblems, we have the optimal substructure property.
  • The time complexity is clearly O(mn), since there are m · n subproblems, each of which is solved in constant time.
  • Finally, the c[i, j] matrix can be computed in row-major order.
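A minimal Java sketch of this recurrence (the method name isInterleaving is illustrative); c[i][j] is true iff z(1..i+j) is an interleaving of x(1..i) and y(1..j):

// Returns true iff z is an interleaving of x and y that preserves their left-to-right order.
static boolean isInterleaving(String x, String y, String z) {
    int m = x.length(), n = y.length();
    if (z.length() != m + n) return false;           // lengths must add up
    boolean[][] c = new boolean[m + 1][n + 1];
    c[0][0] = true;                                   // empty X and Y interleave to empty Z
    for (int i = 0; i <= m; i++) {
        for (int j = 0; j <= n; j++) {
            if (i == 0 && j == 0) continue;
            char zc = z.charAt(i + j - 1);
            boolean fromX = i > 0 && x.charAt(i - 1) == zc && c[i - 1][j];
            boolean fromY = j > 0 && y.charAt(j - 1) == zc && c[i][j - 1];
            c[i][j] = fromX || fromY;                 // last character of z comes from X or Y
        }
    }
    return c[m][n];
}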

Dynamic Programming Practice Problems : Canoes and Posts



You are traveling by canoe down a river and there are n trading posts along the way. Before starting your journey, you are given, for each 1 <= i < j <= n, the fee f(i,j) for renting a canoe from post i to post j. These fees are arbitrary. For example, it is possible that f(1,3) = 10 and f(1,4) = 5. You begin at trading post 1 and must end at trading post n (using rented canoes). Your goal is to minimize the rental cost. Give the most efficient algorithm you can for this problem. Be sure to prove that your algorithm yields an optimal solution, and analyze the time complexity.


Solution :

Let m[i] be the rental cost of the best solution to go from post i to post n, for 1 <= i <= n.
The final answer is in m[1].

We can recursively define m[i] as follows:

m[i] = 0                                              if i = n
     = min over all i < j <= n of ( f(i,j) + m[j] )   otherwise

Proof of correctness :
The canoe must be rented starting at post i (the starting location) and then returned next at a post among i + 1, . . . , n. In the recurrence we try all possibilities (with j being the post where the canoe is next returned).

Furthermore, since f(i,j) is independent from how the subproblem of going from post
j, . . . , n is solved, we have the optimal substructure property.

For the time complexity there are n subproblems to be solved each of which takes
O(n) time. These subproblems can be computed in the order m[n],m[n−1], . . . ,m[1].
Hence the overall time complexity is O(n^2).

NOTE: One of several alternative general subproblem forms that also leads to an O(n^2) algorithm is to find the best solution to get from post i to post n where the canoe cannot be exchanged until post j is reached (for 1 <= i < j <= n).
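A minimal Java sketch of the recurrence above (the method and parameter names are illustrative; fee is assumed to be a 1-based (n+1) x (n+1) array with fee[i][j] defined for i < j):

// m[i] = cheapest rental cost to get from post i to post n; answer is m[1].
static int minRentalCost(int[][] fee, int n) {
    int[] m = new int[n + 1];
    m[n] = 0;                                          // already at the last post
    for (int i = n - 1; i >= 1; i--) {
        m[i] = Integer.MAX_VALUE;
        for (int j = i + 1; j <= n; j++) {
            m[i] = Math.min(m[i], fee[i][j] + m[j]);   // return the canoe next at post j
        }
    }
    return m[1];
}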



Dynamic Programming Practice Problems : Cents and Coins



Many times it is difficult to find solutions to dynamic programming problems. In this series of posts, I will try to post a few problems on dynamic programming and provide solutions in a way that you can understand and write up your own solutions easily.

However, I would suggest that you try solving each problem on your own before jumping into the solution directly.

1. Suppose we want to make change for n cents, using the least number of coins of denominations 1, 10, and 25 cents. [For example, suppose you have to give n cents (say, 89, 56, 39, 12 or 30 cents) to a shopkeeper for purchasing a small item. In your wallet you have only coins of 1, 10 and 25 cents. Decide which coins you will give.] Describe an O(n) dynamic programming algorithm to find an optimal solution. (There is also an easy O(1) algorithm, but the idea here is to illustrate dynamic programming.)

Solution :

  • There is a very straightforward O(1)-time solution. It can be shown that if n >= 50 then any optimal solution includes a set of coins that adds to exactly 50 cents. Hence it can be shown that an optimal solution uses 2 · floor(n/50) quarters (if we were also given 50-cent coins, we could have taken just floor(n/50) of them) along with an optimal solution for making n − 50 · floor(n/50) cents, which can be looked up in a table of size 50.
  • The following dynamic programming algorithm does not use the fact that an optimal solution can be proven to use 2 · floor(n/50) quarters, and hence is not as efficient. The general subproblem will be to make change for i cents, for 1 <= i <= n. Let c[i] denote the fewest coins needed to make i cents. Then we can define c[i] recursively by:

c[i] = i                                       if i < 10   (use i pennies)
     = c[i − 10] + 1                           if 10 <= i < 25
     = min( c[i − 10] + 1, c[i − 25] + 1 )     if i >= 25

  • Note that c[n] is the optimal number of coins needed for the original problem.
  • Clearly when i < 10 the optimal solution can use only pennies (since that is the only coin available that small). Otherwise, the optimal solution uses either a dime or a quarter, and both of these choices are considered.
  • Finally, the optimal substructure property clearly holds, since the final solution just adds one coin to a subproblem solution. There are n subproblems, each of which takes O(1) time to solve, and hence the overall time complexity is O(n).
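A minimal Java sketch of the O(n) algorithm above (the method name fewestCoins is illustrative):

// c[i] = fewest coins (of denominations 1, 10, 25) needed to make i cents.
static int fewestCoins(int n) {
    int[] c = new int[n + 1];
    for (int i = 1; i <= n; i++) {
        if (i < 10) {
            c[i] = i;                                        // only pennies are possible
        } else if (i < 25) {
            c[i] = c[i - 10] + 1;                            // use a dime
        } else {
            c[i] = Math.min(c[i - 10] + 1, c[i - 25] + 1);   // use a dime or a quarter
        }
    }
    return c[n];
}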

Thursday, September 27, 2012

Paper Reading and Critiques : A Comprehensive approach for the Development of Modular Software Architecture Description Languages

A Comprehensive approach for the Development of Modular Software Architecture Description Languages


In light of the need for an easily extensible ADL that is easy to support and build, the paper presents a perfect solution. Following are some points regarding xADL:

  • While the extension of features looks attractive, since xADL addresses syntactic compatibility and extensibility but leaves semantic compatibility entirely as the developer's responsibility, propagating semantics across development teams and ensuring their correctness and uniqueness is an additional burden for the developer. As such, a developer may choose to go with a domain-specific ADL that provides the required features out of the box, in order to save time, effort and money.
  • xADL is an effort to move away from proprietary, one-size-fits-all ADL solutions by utilizing a standard and extensible XML-based representation for software architectures. But it lacks support for behavioral descriptions as well as any accompanying analysis tools, and it (xADL 2.0) lacks support for distributed and dynamic architectures.
  • It does not solve the feature interaction problem: a feature interaction is some way in which a feature or features modify or influence another feature in defining overall system behavior, and a bad feature interaction is one that causes the specification to be incomplete, inconsistent, or unimplementable.
  • xADL is best for believers in a "one size fits all" approach, as well as those who know the advantages of XML over other data formats as a standardized means for transferring and relating information. Also, its tool base (ArchEdit, Visio for xADL, XML Spy, Apache Xerces, etc.) is strong, which makes it a strong candidate for ADL specification and extension.

Paper Reading and Critiques : Specifying Distributed Software Architecture

Specifying Distributed Software Architecture


This paper tries to justify (using pi-calculus) Darwin as a choice for modeling Distributed Software Architecture.
  • Distributed systems require dynamism which Darwin supports as it enables dynamic analysis of architectures by instantiating parameters and components to enact "what if" scenarios. It also provides run-time replication of components which is an essential feature of distributed software architecture.
  • Distributed systems require their constituents to be individual entities. As such, analysis of the correctness of individual systems as well as of the composite systems formed from them is required. The paper supports this notion by specifying that components are "context independent," which is a fair notion in textual form.
  • But if an architect uses Darwin's tools for building distributed systems, then its compilers, constraint checkers, and runtime systems will raise exceptions when faced with unplanned architectural changes such as the addition of a new component or alterations in the connections/services of components. This particular limitation in Darwin's toolset can be traced back to its limitations in specification-time evolution, due to its "in-line" connector properties.
  • Issues such as reliability, efficiency and real-time analysis are missing from this paper, while the design of distributed systems requires proper scalability, performance and reliability analysis, and may also require fault-tolerance and bandwidth-usage modeling. In the absence of these modeling features for distributed systems, it is hard to judge Darwin as a prime candidate for modeling distributed systems.

Paper reading and Critiques : The Architecture Analysis & design language (AADL): An introduction

The Architecture Analysis & design language (AADL): An introduction


AADL is designed for the specification, analysis, and automated integration of real-time performance-critical (timing, safety, schedulability, fault tolerant, security, etc.) distributed computer systems. It allows analysis of system designs (and system of systems) prior to development. Also, it supports a model-based, model-driven development approach throughout the system life cycle.

  • The paper lacks reasoning for providing support for embedded real-time systems; i.e., if a reader has no knowledge of the initial "need" (history) for having a formal description language for embedded systems, he might not fully understand why AADL offers non-functional support such as schedulability and performance, and so might not be able to fully exploit the language for his own system's model.
  • Eventually, when I searched the Internet, I found that the basic motivation behind AADL is modeling time-critical, performance-oriented electronic systems in which the software is tightly bound to its hardware.
  • In terms of mapping the software model to its execution counterpart, there is no relevant information, so again one has to dig deep into the AADL supporting websites.
  • The authors describe AADL's basic support and features in textual and graphical forms with relevant examples, which help us relate usage to significance and presentation. For example, the significance of annexes, components, processes, flows, calls, connections, etc. helps an architect decide when and under what circumstances to incorporate those features.
  • AADL supports MetaH's features, such as extensibility exposed via annexes, dynamic support via mode switching, and non-functional properties of both the hardware and software counterparts via features like schedulability (periodic/aperiodic tasks, etc.). It is one of the first architecture description languages supporting real-time critical embedded systems; even Boeing is using it for its product line. I hope to read more papers on AADL and perhaps on the architectures of some real systems built on it.

Paper Reading and Critiques : A Classification and Comparison Framework for Software Architecture Description Language

 A Classification and Comparison Framework for Software Architecture Description Language

This paper is about a "Classification and Comparison Framework for ADLs," which definitely provides a fair list of categories for comparing the features, usability and usefulness of ADLs, in light of both general-purpose requirements and special needs.

  • However, I seriously like the idea that the framework compares characteristics that perhaps do not exist in many ADLs but have been identified in the literature as useful assets to architecture-based development, such as traceability and refinement.
  • Besides all the positive points mentioned by the author about the framework, its criticism of "Clements's strategy of using tool support" as a classification criterion contradicts one of the proposed framework's own features, i.e. "Tool Support".
  • The paper clearly differentiates ADLs from other languages such as UML, formal semantic theories like Z notation, state charts, etc., and thereby clearly justifies the need for ADLs.
      o For example, one can use extensions in UML, but it would be highly unrealistic to model every aspect of an ADL in UML, since by the definition of an ADL, components and connectors should be top-level abstractions and the modeling should stay at the system level.
      o State charts do not model architectural configuration, and the "topology" of a state chart can only be determined by studying its constituent components.
  • Overall, the way the paper is organized to include every aspect of ADL specification and properties (such as connectors), along with the reasons for including and supporting those categories (e.g., in-line connectors do not support evolution and semantics), makes it an abundant source of information for an architect, who can pick and choose among ADLs according to the problem domain's requirements.
  • The paper can act as a quick reference chart for a mid-level to experienced architect, and as a starting point for igniting the thought process of evaluating existing ADLs to choose from.
  • The author might have made a distinction between essential and desired criteria for an ADL, as that would give more clarity to the basic goal of writing this framework: "What is an ADL?"

Wednesday, September 26, 2012

Architectural Mismatch: Why Reuse is so Hard?


Well, the paper was written in 1994, almost two decades ago, so compared with present-day integration strategies, design principles and architectural decisions it may or may not give us a complete solution. For example, if we want to adopt the four suggestions given in this paper, we may or may not find them that useful for research in the "software integration" domain, because architectures like SOA reduce the need for such dependencies (all components talk through web-based messaging technology like SOAP), and REST likewise eliminates much of the architectural dependency in various ways.
Problems/challenges of integrating reusable components to build a system (e.g. Aesop -> an environment-generating system)
Components studied ->
OBST -> object-oriented DB
InterViews -> GUI toolkit (Stanford University)
SoftBench -> event-based tool integration (HP)
Mach RPC Interface Generator (MIG) -> RPC stub generator (CMU)
Performance of resultant system ->
delayed development
sluggish software

Problems in Integration


+ Excessive code. The binary code of our user interface alone was more than 3 Mbytes after stripping. The binary code of our database server was 2.3 Mbytes after stripping. Even small tools (of, say, 20 lines of code) interacting with our system were more than 600,000 lines after stripping! In an operating system without shared libraries, running the central components plus the supporting tools (such as external structure editors, specification checkers, and compilers) overwhelmed the resources of a midsize workstation.

+ Poor performance. The system operated much more slowly than we wished. Some of the problems occurred because of overhead from tool-to-database communication. For example, saving the state of a simple architectural diagram (containing, say, 20 design objects) took several minutes when we first tried it out. Even with performance tuning, it still took many seconds to perform such an operation. The excessive code also contributed to the performance problem. Under the Andrew File System, which we were using, files are cached at the local workstation in total when they are opened. When tools are large, the start-up overhead is also large. For example, the start-up time of an Aesop environment with even a minimal tool configuration took approximately three minutes.

+ Need to modify external packages. Even though the reusable packages seemed to run "out of the box" in our initial tests, we discovered that once we combined them in a complete system they needed major modifications to work together at all. For example, we had to significantly modify the SoftBench client-event loop (a critical piece of the functionality) for it to work with the InterViews event mechanism. We also had to reverse-engineer the memory-allocation routines for OBST to communicate object handles to external tools.

+ Need to reinvent existing functions. In some cases, modifying the packages was not enough. We also had to augment the packages with different versions of the functions they already supplied. For example, we were forced to bypass InterViews' support for hierarchical data structures because it did not allow direct, external access to hierarchically nested subvisualizations. Similarly, we ended up building our own separate transaction mechanism that acted as a server on top of a version of the OBST database software, even though the original version supported transactions. We did this so that we could share transactions across multiple address spaces, a capability not in the original version.

+ Unnecessarily complicated tools. Many of the architectural tools we wanted to develop on top of the infrastructure were logically simple sequential programs. However, in many cases it was difficult to build them as such because the standard interface to their environment required them to handle multiple, independent threads of computation simultaneously.

+ Error-prone construction process. As we built the system, modifications became increasingly costly. The time to recompile a new version of the system became quite long, and seemingly simple modifications (such as the introduction of a new procedure call) would break the automated build routines. The recompilation time was due in part to the code size. But more significantly, it was also because of interlocking code dependencies that required minor changes to propagate (in the form of required recompilations) throughout most of the system.

Excessive code -> GUI (3 MB), DB server (2.3 MB), (editors + spec checkers + compilers) = overwhelmed workstation.
Poor performance -> tool-to-DB communication overhead + start-up overhead.
Modify external packages -> SoftBench's client-event loop, OBST's memory-allocation routines.
Reinvent existing functions -> OBST's transaction mechanism (multiple address-space access).
Unnecessarily complicated tools -> handling a multithreaded environment where it is not required.
Error-prone construction process -> costly modifications, broken build routines, dependencies requiring changes to propagate throughout the system.

Causes in nutshell - Assumptions

Nature of Components
    –Infrastructure
    –Control Model
    –Data Model
Nature of connectors
    Protocols
    Data model
Global architectural structure -> OBST's transaction server rebuilt
Construction process -> package-wise code integration

Solutions

Make architectural assumptions explicit -> documentation, 3-D interfaces
Orthogonal Components ->
    substitute sub-modules to play with architectural assumptions
     modularization
Bridging techniques
    wrappers, negotiated interfaces, etc., should be standardized
Sources of design guidance ->
    Reuse expertise and guidance from design patterns and tools