Thursday, 24 August 2017

Introduction to predictive modelling

What is Predictive modeling?



  • A set of methods for arriving at quantitative solutions to problems of business interest
  • It is a part of Data Science or Statistical Learning
  • Examples
  1. Predict whether a patient, hospitalized due to a heart attack, will have a second heart attack. The prediction is to be based on demographic, diet and clinical measurements for that patient.
  2. Predict the price of a stock 6 months from now, on the basis of company performance measures and economic data.
  3. Identify the numbers in a handwritten ZIP code, from a digitized image.
  4. Estimate the amount of glucose in the blood of a diabetic person, from the infrared absorption spectrum of that person’s blood.



Predictive modeling process





Types of predictive model learning


The learning problems that we consider can be roughly categorized as either supervised or unsupervised.



  • Supervised learning

         In supervised learning, the goal is to predict the value of an outcome measure based on a number of input measures.

         Examples
            Predict whether a patient, hospitalized due to a heart attack, will have a second heart attack. The prediction is to be based on demographic, diet and clinical measurements for that patient.

            Predict the price of a stock 6 months from now, on the basis of company performance measures and economic data.

            Identify the numbers in a handwritten ZIP code, from a digitized image.

            Estimate the amount of glucose in the blood of a diabetic person, from the infrared absorption spectrum of that person’s blood.


                
  • Unsupervised learning
           In unsupervised learning, there is no outcome measure, and the goal is to describe the associations and patterns among a set of input measures.


         Examples

           Identifying the products that are usually sold together
            
           Identifying the typical profile of employees who quit quickly



Variable Types and Terminology

  • X -----> Set of inputs / independent variables / predictors
  • Y -----> Set of outputs / dependent variables / responses
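
To make the X/Y terminology concrete, here is a minimal sketch of a supervised learning problem: a straight line fitted by least squares to known (X, Y) pairs and then used to predict Y for a new X. The data values and the function names `fit_line` and `predict` are made up for illustration and do not come from any particular library.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict(model, x):
    """Apply the fitted line to a new input value."""
    intercept, slope = model
    return intercept + slope * x


# Hypothetical training data: X is the predictor, Y is the response.
# The values are made up and follow y = 1 + 2x exactly.
X = [1.0, 2.0, 3.0, 4.0]
Y = [3.0, 5.0, 7.0, 9.0]

model = fit_line(X, Y)
print(predict(model, 5.0))  # prints 11.0
```

In an unsupervised problem there would be no Y column at all; only the X values would be available and the goal would be to describe their structure.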

  

Wednesday, 21 June 2017

program to capture and validate argument from command line

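#!/bin/bash
# Print the help text and exit; the optional first argument is the exit status (default 1).
usage() {
        echo "usage: $0 [options]" >&2
        cat >&2 <<"EOF"
Options:
  -h, --help                        show this help message and exit
  -t, --timeinterval=TIMEINTERVAL   time interval between dumps
  -i, --instance=INSTANCE           WebSphere instance name
  -v, --verbose                     enable verbose output

EOF
        exit "${1:-1}"
}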
# Leading ":" puts getopts in silent mode; "-:" lets long options arrive as "-" plus OPTARG.
# -v is a flag, so it must not be followed by ":" in the option string.
optspec=":hvt:i:-:"
while getopts "$optspec" optchar; do
    case "${optchar}" in
        -)
            case "${OPTARG}" in
                timeinterval=*)
                    sleep_time=${OPTARG#*=}
                    ;;
                instance=*)
                    instance_name=${OPTARG#*=}
                    ;;
                verbose)
                    VERBOSE=true
                    ;;
                help)
                    usage 0
                    ;;
                *)
                    echo "Unknown option --${OPTARG}" >&2
                    usage
                    ;;
            esac
            ;;
        t)
            sleep_time=${OPTARG}
            ;;
        i)
            instance_name=${OPTARG}
            ;;
        v)
            VERBOSE=true
            ;;
        h)
            usage 0
            ;;
        :)
            # Silent mode reports a missing argument as ":" with the option letter in OPTARG.
            echo "Option -${OPTARG} requires an argument" >&2
            usage
            ;;
        *)
            echo "Unknown option -${OPTARG}" >&2
            usage
            ;;
    esac
done
shift $((OPTIND-1))

if [ -z "$sleep_time" ]
then
        echo "need to set sleep_time" >&2
        usage
fi
if [ -z "$instance_name" ]
then
        echo "need to provide the instance name for which dumps need to be collected" >&2
        usage
fi

echo "$instance_name"
echo "$sleep_time"

Wednesday, 17 May 2017

SQL Benchmark

This can be helpful if you want to test the performance of a SQL statement.

Monday, 20 March 2017

Analyzing AWR Report

The AWR report can help in analysing the issues below.

The AWR can be used to identify:
  • SQLs or Modules with heavy loads or potential performance issues. These could be from other processes than the one with reported issues.
  • Symptoms of those heavy loads (e.g. logical I/O (buffer gets), Physical I/O, contention, waits).
  • SQLs that could be using sub-optimal execution plans (e.g. buffer gets, segment statistics).
  • Numbers of executions.
  • Parsing issues.
  • General performance issues, e.g. system capacity (I/O, memory, CPU), system/DB configuration.
  • SGA (shared pool/buffer cache) and PGA sizing advice.
Basic steps to look into the AWR
  1. Check that DB time is not much greater than the available DB time

Here Available DB Time = Number of CPUs * Elapsed Time
                       = 72 * 60 = 4320 min
Here the DB time is 17,435, which is much greater than the available DB time, so there is an issue.

  2. Check whether the number of sessions has increased sharply

In the above example the number of sessions increased from 2k to 4k, which also points to an issue.

  3. Load Profile

This is also an important section to look at: it shows whether there are high physical reads/writes, hard parses or a high rate of SQL executions.

  4. Check the top 10 foreground events for any suspicious activity

This is a very useful section of the report. In the example below we can see that ~80% of DB time is spent on 2 events.

  5. Check SQL ordered by Elapsed Time

The above two queries are suspicious; we can look into their execution plans for more details.
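
The DB time check from step 1 can be sketched as a small calculation. This is only an illustration of the arithmetic: it assumes the DB time of 17,435 is in minutes (the same unit as the available DB time), and the function names are made up.

```python
def available_db_time_minutes(cpu_count, elapsed_minutes):
    """Available DB time = number of CPUs * elapsed time of the snapshot."""
    return cpu_count * elapsed_minutes


def db_time_ratio(db_time_minutes, cpu_count, elapsed_minutes):
    """How many times larger the reported DB time is than the available DB time."""
    return db_time_minutes / available_db_time_minutes(cpu_count, elapsed_minutes)


# Figures from the example above; DB time is assumed to be in minutes.
available = available_db_time_minutes(72, 60)
ratio = db_time_ratio(17435, 72, 60)

print(available)        # prints 4320
print(round(ratio, 2))  # prints 4.04
```

A ratio well above 1, as here, means sessions spent far more time in the database than the CPUs could have serviced, which is the warning sign described in step 1.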

Thursday, 16 February 2017

How RMAN utility works internally ?

Recovery Manager, better known as RMAN, is an Oracle client utility that comes pre-installed with the Enterprise or Standard edition.
The RMAN executable uses a file called recover.bsq, located in $ORACLE_HOME/rdbms/admin. The executable interprets the commands you give it, directs server sessions to execute those commands, and records its activity in the control file of the TARGET database that is being backed up.

The RMAN client directs the server sessions to execute the commands through channels. A channel represents one stream of data to a device and corresponds to one database server session. The channel reads data into PGA memory, processes it, and writes it to the output device.
The work of each channel, whether of type disk or System Backup to Tape (SBT), is subdivided into the following distinct phases:

1.    Read Phase
A channel reads blocks from disk into input I/O buffers. The allocation of these buffers depends on the number of data files being read simultaneously from disk and written to the same backup piece. One way to control the number of files is the backup parameter FILESPERSET.
2.    Copy Phase
A channel copies blocks from input buffers to output buffers and performs additional processing on the blocks. This includes validating the data blocks, to verify that it is not backing up corrupt blocks. It is also the phase where binary compression and backup encryption are done.

3.    Write Phase
A channel writes the blocks from output buffers to storage media. The write phase can be either to SBT or to disk, and these are mutually exclusive, meaning you write to one or the other, not both.

 Architecture:



Monday, 13 February 2017

What are the different types of GC?

Java has four types of garbage collectors,
  1. Serial Garbage Collector
  2. Parallel Garbage Collector
  3. CMS Garbage Collector
  4. G1 Garbage Collector
Default is the Parallel Garbage Collector (up to Java 8; from Java 9 the default is G1)

  1. Serial Garbage Collector

The serial garbage collector works by pausing all the application threads. It is designed for single-threaded environments and uses just a single thread for garbage collection. Because it freezes all the application threads while collecting, it may not be suitable for a server environment. It is best suited for simple command-line programs.
Turn on the -XX:+UseSerialGC JVM argument to use the serial garbage collector.

2. Parallel Garbage Collector


The parallel garbage collector is also called the throughput collector. It is the default garbage collector of the JVM up to Java 8. Unlike the serial garbage collector, it uses multiple threads for garbage collection. Like the serial garbage collector, it also freezes all the application threads while performing garbage collection.
Turn on the -XX:+UseParallelGC JVM argument to use the parallel garbage collector.

3. CMS Garbage Collector

The Concurrent Mark Sweep (CMS) garbage collector uses multiple threads to scan the heap memory, mark instances for eviction and then sweep the marked instances. The CMS garbage collector pauses all the application threads in the following two scenarios only:
  1. while marking the referenced objects in the tenured generation space (initial mark).
  2. if there is a change in heap memory in parallel while doing the garbage collection (remark).
In comparison with the parallel garbage collector, the CMS collector uses more CPU in order to keep application pauses short. If we can allocate more CPU for better responsiveness, then the CMS garbage collector is the preferred choice over the parallel collector.
Turn on the -XX:+UseConcMarkSweepGC JVM argument to use the CMS garbage collector (-XX:+UseParNewGC only selects the parallel young-generation collector that is paired with it).

4. G1 Garbage Collector

The G1 garbage collector is used for large heap memory areas. It separates the heap memory into regions and collects within them in parallel. G1 also compacts the free heap space on the go, just after reclaiming the memory, whereas the CMS garbage collector compacts memory only in stop-the-world (STW) situations. The G1 collector prioritizes regions by most garbage first.
Turn on the -XX:+UseG1GC JVM argument to use the G1 garbage collector.

>> G1 GC is the long-term replacement for CMS

Differences
  1. It is a compacting collector
  • It marks the objects eligible for eviction, then first reclaims the regions that will release the most memory (the regions containing the most objects eligible for reclamation).
  • After marking the live objects in the heap in the same fashion as the mark-sweep algorithm, the heap will often be fragmented. The goal of mark-compact algorithms is to shift the live objects in memory together so the fragmentation is eliminated. The challenge is to correctly update all pointers to the moved objects, most of which will have new memory addresses after the compaction. The issue of handling pointer updates is handled in different ways.
  2. In older collectors the sizes of the young/eden/perm areas are fixed; here they are dynamic

Full garbage collections are still single threaded (up to Java 9), but if tuned properly your applications should avoid full GCs.

Collecting GC details from running java process without adding the GC monitoring argument

If you’ve forgotten to enable GC logging, or want to monitor GC in the middle of a load test, there is a good substitute for watching how GC operates over time.

jstat is the tool of choice. jstat can provide good visibility into GC for a live program. jstat provides nine options to print different information about the heap; jstat -options will provide the full list.
One useful option is -gcutil, which displays the time spent in GC as well as the percentage of each GC area that is currently filled. Other options to jstat will display the GC sizes in terms of KB.

jstat takes an optional argument (the interval in milliseconds at which to repeat the command), so it can monitor the effect of GC in an application over time.

Syntax : jstat -gcutil <pid> <interval in milliseconds>

Here is some sample output, repeated roughly every second (every 1223 ms):
> jstat -gcutil 9076 1223

   S0     S1      E      O      M    CCS    YGC     YGCT   FGC     FGCT      GCT
80.52   0.00  12.98  45.07  95.85  91.20   4808  296.955    42   18.608  315.563
80.52   0.00  12.98  45.07  95.85  91.20   4808  296.955    42   18.608  315.563
80.52   0.00  12.98  45.07  95.85  91.20   4808  296.955    42   18.608  315.563

When monitoring started, the program had already performed 4808 collections of the young generation (YGC), which took a total of 296.955 seconds (YGCT). It had also performed 42 full GCs (FGC) requiring 18.608 seconds (FGCT); hence the total time in GC (GCT) was 315.563 seconds.

Since the sample was taken after the load test, when there was no active load, the readings stayed the same.


All three sections of the young generation are displayed here: the two survivor spaces (S0 and S1) and eden (E). Then come the old generation (O), Metaspace (M) and the compressed class space (CCS).
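
As a rough sketch of how such output can be post-processed, the snippet below parses `jstat -gcutil` text into dictionaries and derives the average pause per collection. The helper name `parse_gcutil` is made up for illustration, and the sketch assumes every column holds a number (some JVM versions print "-" for columns that do not apply, which this code does not handle).

```python
def parse_gcutil(text):
    """Turn `jstat -gcutil` output into a list of {column name: value} dicts.

    Assumes the first non-empty line is the header and all values are numeric.
    """
    lines = [line.split() for line in text.strip().splitlines()]
    header, rows = lines[0], lines[1:]
    return [dict(zip(header, map(float, row))) for row in rows]


# Two of the sample rows shown above.
sample = """
   S0     S1      E      O      M    CCS    YGC     YGCT   FGC     FGCT      GCT
80.52   0.00  12.98  45.07  95.85  91.20   4808  296.955    42   18.608  315.563
80.52   0.00  12.98  45.07  95.85  91.20   4808  296.955    42   18.608  315.563
"""

samples = parse_gcutil(sample)
last = samples[-1]

# Average pause = accumulated time / number of collections.
avg_young_pause = last["YGCT"] / last["YGC"]
avg_full_pause = last["FGCT"] / last["FGC"]

print(f"average young GC pause: {avg_young_pause * 1000:.1f} ms")
print(f"average full GC pause: {avg_full_pause * 1000:.1f} ms")
```

Comparing the first and last rows of a longer capture in the same way gives the number of collections and the GC time spent during the monitored window.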

Monday, 6 February 2017

Address Resolution Protocol

Address Resolution Protocol (ARP)

ARP is used to translate an IP address into a MAC address.

Suppose Computer1 wants to communicate with Computer2 on a LAN. At Layer 2 (the Data Link Layer), computers identify each other by MAC address.

When Comp1 gets the IP of Comp2:

• It looks in its own cache to see if it has the MAC address for that IP

• If present, it addresses the message to that MAC address and sends it. Otherwise, it broadcasts a request to all the systems in the network asking for the MAC address

• The ARP request is received by all the systems, but only the computer with the target IP responds to it

• Now, since both Comp1 & Comp2 have each other's IP and MAC, they can communicate
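
The lookup sequence above can be sketched as a toy simulation. This is not real networking code: the `Host` class, its method names and the addresses are invented for illustration, and the "broadcast" is simply a loop over the hosts on the LAN.

```python
class Host:
    def __init__(self, ip, mac):
        self.ip = ip
        self.mac = mac
        self.arp_cache = {}  # IP address -> MAC address

    def handle_arp_request(self, target_ip):
        """Reply with our MAC only if the request is for our IP."""
        return self.mac if target_ip == self.ip else None

    def resolve(self, target_ip, lan):
        """Return the MAC for target_ip, broadcasting on a cache miss."""
        if target_ip in self.arp_cache:
            return self.arp_cache[target_ip]
        for host in lan:  # broadcast: every other host sees the request
            if host is self:
                continue
            reply = host.handle_arp_request(target_ip)
            if reply is not None:
                self.arp_cache[target_ip] = reply  # remember it for next time
                return reply
        return None  # nobody on the LAN owns that IP


# Hypothetical hosts and addresses.
comp1 = Host("192.168.1.10", "aa:aa:aa:aa:aa:01")
comp2 = Host("192.168.1.20", "aa:aa:aa:aa:aa:02")
comp3 = Host("192.168.1.30", "aa:aa:aa:aa:aa:03")
lan = [comp1, comp2, comp3]

mac = comp1.resolve("192.168.1.20", lan)
print(mac)  # prints aa:aa:aa:aa:aa:02
```

A second call to `comp1.resolve("192.168.1.20", lan)` is answered from `arp_cache` without any broadcast.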

Routing Information Protocol

• RIP is a distance vector protocol. Using RIP, each router sends its entire routing table to its closest neighbors every 30 seconds
• The neighbors in turn pass the information on to their nearest neighbors, and so on
• If a router crashes or a network connection is severed, the network discovers this because that router stops sending updates to its neighbors, or stops sending and receiving updates along the severed connection
• If a given route in the routing table isn't updated across six successive update cycles (that is, for 180 seconds), a RIP router will drop that route, let the rest of the network know about the problem via its own updates, and begin the process of reconverging on a new network topology.


• When a router receives routing updates, it compares them with the routes it already has in its routing table.

• If an update has information about a route that is not in its routing table, the router treats that route as a new route.

• The router adds all new routes to the routing table before updating existing ones.

• If an update has better information for an existing route, the router replaces the old entry with the new route.

• If an update has exactly the same information about an existing route, the router resets the timer for that entry in the routing table.
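
The update rules above can be sketched as follows. This is a simplified illustration, not a RIP implementation: the table layout and function names are invented, new and existing routes are handled in a single pass rather than in two steps, and an expired route is simply deleted. The metric of 16 meaning "unreachable" is standard RIP behaviour; the 180-second timeout comes from the text above.

```python
INFINITY = 16  # RIP treats a metric of 16 hops as unreachable


def process_update(table, neighbor, advertised, now):
    """Merge a neighbor's advertised routes into our routing table.

    table:      destination -> {"metric", "next_hop", "updated"}
    advertised: destination -> metric as seen by the neighbor
    """
    for dest, metric in advertised.items():
        new_metric = min(metric + 1, INFINITY)  # one extra hop to reach the neighbor
        entry = table.get(dest)
        if entry is None:
            if new_metric < INFINITY:  # new route: add it
                table[dest] = {"metric": new_metric, "next_hop": neighbor, "updated": now}
        elif new_metric < entry["metric"]:  # better route: replace the old entry
            table[dest] = {"metric": new_metric, "next_hop": neighbor, "updated": now}
        elif entry["next_hop"] == neighbor and new_metric == entry["metric"]:
            entry["updated"] = now  # same information: just reset the timer


def expire_routes(table, now, timeout=180):
    """Drop routes that have not been refreshed for `timeout` seconds."""
    stale = [dest for dest, entry in table.items() if now - entry["updated"] >= timeout]
    for dest in stale:
        del table[dest]


# Hypothetical neighbors R2 and R3 advertising the same destination.
table = {}
process_update(table, "R2", {"10.0.0.0/8": 1}, now=0)   # new route, metric 2 via R2
process_update(table, "R3", {"10.0.0.0/8": 0}, now=10)  # better route, metric 1 via R3
process_update(table, "R3", {"10.0.0.0/8": 0}, now=40)  # same route, timer reset
print(table["10.0.0.0/8"])
```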



What is the basic approach to look into an out-of-memory issue?

  • Identify the nature of the memory issue
  1. Is it due to a spike in usage?
It is possible that the application runs fine up to 100 users, but when the user count increases to 200 there is a spike in memory usage and the system throws an out-of-memory error.


  2. Is it a memory leak?
It is possible that badly written code makes the application hold on to memory it no longer needs, so that the memory cannot be freed. In this kind of issue we will see that GCs are happening in the application, but its memory usage stays almost constant: even though GC runs, very little or no memory is freed.

  • Once you have identified the problem, you can open the memory dump in MAT (Eclipse Memory Analyzer) and analyse the usage further for RCA

Saturday, 4 February 2017

Types of Dynamic Routing Protocols





IGP – Interior Gateway Protocol (OSPF, RIP, EIGRP): Used to find network path information within a single autonomous system (AS)

1.   DISTANCE VECTOR –

Distance vector routing is so named because it involves two factors: the distance, or metric, of a destination, and the vector, or direction to take to get there.

Routing information is only exchanged between directly connected neighbors. This means a router knows from which neighbor a route was learned, but it does not know where that neighbor learned the route; a router can't see beyond its own neighbors.

2.   LINK STATE –

Link-state routing, in contrast, requires that all routers know about the paths reachable by all other routers in the network.

Link-state information is flooded throughout the link-state domain to ensure all routers possess a synchronized copy of the area's link-state database.

From this common database, each router constructs its own relative shortest-path tree, with itself as the root, for all known routes.


EGP – Exterior Gateway Protocol  – Used to find network path information between different autonomous systems.

BGP is the only EGP in use currently.

Commonly used terminologies:

HOP COUNT - Hop count is the number of network devices between the starting node and the destination node


AUTONOMOUS SYSTEM – Internetwork under the control of a single organization. Ex: AT&T, University Network

Few Basic C programs in Loadrunner


Q: Which one will be printed in the LR output?
  1. printf("Hello World");
  2. lr_output_message("Hello World");
Answer: Both will be printed if logging is enabled; otherwise only the 2nd will be printed
Q. Write C code to sum two numbers and print the result on the console
int i = 0;
int j = 5;
int sum = i + j;
lr_output_message("the sum of two numbers is %d", sum);

Q. Write C code to print a string variable in LoadRunner
char str[] = "hello";
lr_output_message("%s", str);

Q. Write c code to copy a string to another string and print in loadrunner
char str1[] = "Hello World" ;
char str2[30] ;

strcpy(str2, str1);
lr_output_message("New string is %s", str2);

Q. Write code to copy part of a string and print it in LoadRunner using C
char str1[] = "Hello World";
char str2[30];
strncpy(str2, str1, 5);
str2[5] = '\0';   // strncpy does not add the terminator when it copies exactly 5 characters
lr_output_message("New string is %s", str2);

Q. Write code to merge two strings in LoadRunner using C

char str1[30] = "Hello";   // sized to leave room for the appended text
char str2[] = "World";
strcat(str1, str2);
lr_output_message("New string is %s", str1);

Q. Write code to find the length of a C string variable and print it in LR

char str1[] = "Hello";

lr_output_message("Length of string is %d", (int) strlen(str1));

Q. Write code to find the first and last occurrence of a character in a given string

char str1[] = "Hello World";
char * first;
char * last;

// finding the first occurrence of 'o'

first = (char *) strchr(str1, 'o');
lr_output_message("%s", first);

// finding the last occurrence of 'o'

last = (char *) strrchr(str1, 'o');
lr_output_message("%s", last);

Q write code to find first occurrence of a string in  given string ?

char str1[] = "I am a leader but i am not a boss";
char search[] = "am";
char * position;
int offset;

position = (char *) strstr(str1, search);
offset = (int) (position - str1 + 1);
lr_output_message("first occurrence of search string is at %d", offset);

Q. Which function is used to compare two string
Ans: strcmp

How to handle a dynamic drop-down list? E.g. while creating users, a country needs to be picked from a drop-down list

Ans:
Insert the line below before the request that returns the country details
web_reg_save_param_ex("ParamName=CountryName", "LB=>", "RB=</option>", "Ordinal=ALL", SEARCH_FILTERS, LAST);


It will store all the country details in the CountryName array; we can access the values starting from CountryName_1
The number of elements is available in CountryName_count
It is an LR parameter, so we can access it using the lr_eval_string function
lr_output_message(lr_eval_string("{CountryName_1}"));

OSPF Protocol

OSPF – Open Shortest Path First Protocol

Routers connect networks through IP. OSPF is used to find the best path for packets as they pass through a set of connected networks.

OSPF is a link-state protocol. Link-state protocols exchange the state of their links and the costs associated with them.

OSPF, when configured, will listen to its neighbors and gather all link state data available to build a topology map of all available paths in the network & save this information in the Topology Database.

With this information, it calculates the shortest path using the Dijkstra algorithm.
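
As an illustration of that last step, here is a minimal sketch of Dijkstra's algorithm run over a small link-state map. The router names and link costs are invented, and the topology dictionary is only a stand-in for the topology database described above.

```python
import heapq


def shortest_paths(graph, source):
    """Dijkstra's algorithm: lowest total cost from source to every reachable node.

    graph: node -> {neighbor: link cost}
    """
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry; a cheaper path was already found
        for neighbor, cost in graph.get(node, {}).items():
            nd = d + cost
            if nd < dist.get(neighbor, float("inf")):
                dist[neighbor] = nd
                heapq.heappush(heap, (nd, neighbor))
    return dist


# Hypothetical topology: four routers with symmetric link costs.
topology = {
    "R1": {"R2": 10, "R3": 1},
    "R2": {"R1": 10, "R4": 1},
    "R3": {"R1": 1, "R4": 5},
    "R4": {"R2": 1, "R3": 5},
}

costs = shortest_paths(topology, "R1")
print(costs["R2"])  # prints 7 (R1 -> R3 -> R4 -> R2 beats the direct link of cost 10)
```

Each router runs the same calculation with itself as the source, which is why all routers need a synchronized copy of the link-state data.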

OSPF Areas:

A number of routers are grouped together into routing areas to simplify administration and optimize the available resources.

Having many routers in a single area can flood the network with link-state traffic and reduce efficiency. Hence, resource optimization is especially important for large enterprise systems.

An area is a logical collection of routers that carry the same Area ID (number) inside an OSPF network. The OSPF network itself can contain multiple areas. The first and main area is the backbone area, "Area 0"; all other areas must connect to Area 0.



Friday, 3 February 2017

IPV4 v/s IPV6

IPv4 (Internet Protocol version 4) is the most widely used Internet Protocol for connecting to the Internet.

It uses 32-bit addresses, for a total of 2^32 (about 4.3 billion) addresses. With the growth of the Internet, the available addresses will eventually run out.

IPv6, the newest version, increases the pool of addresses and has many other advantages over the previous version:

- Auto-configuration
- No more private address collisions
- Better multicast routing
- Simpler header format
- Simplified, more efficient routing

An IP address is a binary number but can be stored as text for human readers.  For example, a 32-bit numeric address (IPv4) is written in decimal as four numbers separated by periods. Each number can be zero to 255. For example, 1.160.10.240 could be an IP address.

IPv6 addresses are 128-bit addresses written in hexadecimal, in groups separated by colons. An example IPv6 address could be written like this: 3ffe:1900:4545:3:200:f8ff:fe21:67cf
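
The two notations can be explored with Python's standard `ipaddress` module. The sketch below uses the two example addresses from the text; it shows that the dotted and colon-separated forms are just human-readable spellings of a 32-bit and a 128-bit number.

```python
import ipaddress

v4 = ipaddress.ip_address("1.160.10.240")
v6 = ipaddress.ip_address("3ffe:1900:4545:3:200:f8ff:fe21:67cf")

# Address family and width in bits.
print(v4.version, v4.max_prefixlen)  # prints 4 32
print(v6.version, v6.max_prefixlen)  # prints 6 128

# The same IPv4 address as a plain integer: 1*2^24 + 160*2^16 + 10*2^8 + 240.
v4_as_int = int(v4)
print(v4_as_int)  # prints 27265776

# Size of each address space.
print(2 ** 32)    # prints 4294967296
print(2 ** 128)

# The IPv6 address with every group written out in full (leading zeros restored).
print(v6.exploded)
```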