L layer NN - with ReLu

2 분 소요

L layer NN - with ReLu

목적

앞선 단층신경망에서 벗어나, 다층신경망을 직접 구현한다.

1 layer NN

모델은 ReLu를 활성화함수로 사용하며, 출력층에서는 sigmoid를 이용하며, 손실함수는 binary cross entropy를 사용합니다.

Untitled

파라미터 초기화

  def initialize_parameters(n_x,n_h,n_y):
    
  	np.random.seed(1) # seed 고정
    	
  	W1 = np.random.randn(n_h,n_x) * 0.01
  	b1 = np.zeros((n_h,1))
  	W2 = np.random.randn(n_y,n_h) * 0.01
  	b2 = np.zeros((n_y,1))
    	
  	parameters = {"W1": W1,
  	                  "b1": b1,
  	                  "W2": W2,
  	                  "b2": b2}
    	    
  	return parameters
    

L layer NN

L레이어를 가진 NN의 shape을 나타내면 다음과 같다.

W[n] = ( c[n] , c[n-1]) 을 가지며, b[n] = (c[n] , 1) ( c는 각 레이어당 노드개수이다.)

Untitled 1

파라미터 초기화

  def initialize_parameters_deep(layer_dims):
    
  	np.random.seed(3)
  	parameters = {}
  	L = len(layer_dims)
    	
  	for i in range(1, L):
  		parameters['W' + str(i)] = np.random.randn(layerdims[i],layerdims[i-1]) * 0.01
  		parameters['b' + str(i)] = np.zeros((layerdims[i],1))
    	
  	return parameters

전향파( Forwarding)

Z[L] = W[L] dot A[L-1] + b[L] 이다.

def linear_forward ( A, W , b):

	Z = np.dot(W,A) + b
	
	cache = (A, W, b)
	
	return Z,cache

전향파 활성화함수

Untitled 2

이모델에서는 활성화함수를 ReLu를 사용하기때문에, ReLu(Z) = max(0,Z)로 구현한다.

  def linear_activation_forward(A_prev, W, b, activation):
    
  	if activation == "sigmoid":
          Z, linear_cache = linear_forward(A_prev,W,b)
          A, activation_cache = sigmoid(Z)
            
      elif activation == "relu":
          Z,linear_cache =linear_forward(A_prev,W,b)
          A, activation_cache = relu(Z)
            
      cache = (linear_cache, activation_cache)
    
      return A, cache

전향파 구현

  def L_model_forward(X, parameters):
    
  	caches = []
  	A = X
  	L = len(parameters) // 2 # w,b 두개의 파라미터가 각 레이어마다 존재하므로 //2
    	
  	for i in range( 1, L) :
  		A_prev = A
  		A, cache = linear_activation_forward(A_prev , parameters['W' + str(i)],parameters['b' + str(i)],'relu')
  		caches.append(cache)
    	
  	AL , cache = linear_activation_forward(A,parameters['W'+str(L)],parameters['b' + str(L)],'sigmoid')
  	caches.append(cache)
    	
  	return AL, caches

비용함수

크로스엔트로피를 손실함수로 지정하고, 이에대한 모든 데이터샘플(m)에 대한 합산 및 평균을 내면 비용함수를 도출할 수 있다.

Untitled 3

  def compute_cost(AL, Y):
    
  	m = Y.shape[1]
  	cost = -(1/m) * np.sum((Y*np.log(AL))+
  	                          ((1-Y)*np.log(1-AL)))
  	cost = np.squeeze(cost)
  	return cost

역전파(BackPropagation)

Untitled 4

Untitled 5

def linear_backward(dZ, cache):
	A_prev, W, b = cache
	m = A_prev.shape[1]
	dW = (1/m) * np.dot(dZ, A_prev.T)
	db = (1/m) * np.sum(dZ,axis=1,keepdims=True)
	dA_prev = np.dot(W.T, dZ)
	return dA_prev, dW, db

RELU 미분

  def linear_activation_backward(dA, cache, activation):
        
      linear_cache, activation_cache = cache
        
      if activation == "relu":
            
          dZ = relu_backward(dA, activation_cache)
          dA_prev, dW, db =linear_backward(dZ,linear_cache)
            
      elif activation == "sigmoid":
            
          dZ = sigmoid_backward(dA, activation_cache)
          dA_prev, dW, db =linear_backward(dZ,linear_cache)
            
      return dA_prev, dW, db

역전파 구현

Untitled 6

  def L_model_backward(AL, Y, caches):
    
  	grads = {}
  	L = len(caches) 
  	m = AL.shape[1]
  	Y = Y.reshape(AL.shape)
    	
  	dAL = - (np.divide(Y, AL) - np.divide(1 - Y, 1 - AL))
  	current_cache = caches[L-1]
  	dA_prev_temp, dW_temp, db_temp=linear_activation_backward(dAL, current_cache, activation='sigmoid')
  	grads['dA'+str(L-1)]=dA_prev_temp
  	grads['dW'+str(L)]=dW_temp
  	grads['db'+str(L)]=db_temp
    	
  	for l in reversed(range(L-1)):
    	
  			current_cache = caches[l]
  	    dA_prev_temp, dW_temp, db_temp=linear_activation_backward(dA_prev_temp, current_cache, activation='relu')
  	    grads['dA'+str(l)]=dA_prev_temp
  	    grads['dW'+str(l+1)]=dW_temp
  	    grads['db'+str(l+1)]=db_temp
  	return grads

파라미터 업데이트

def update_parameters(params, grads, learning_rate):

	parameters = copy.deepcopy(params)
	L = len(parameters) // 2
	
	for l in range(L):
	        
	    parameters["W" + str(l+1)] = parameters["W" + str(l+1)] - (learning_rate*grads["dW" + str(l+1)])
	    parameters["b" + str(l+1)] =parameters["b" + str(l+1)] - (learning_rate*grads["db" + str(l+1)])
	return parameters

Twitter Facebook LinkedIn

최종원

L layer NN - with ReLu

L layer NN - with ReLu

공유하기

댓글남기기

참고

NMT - with Attention

L layer NN - Cat Recognition

Transformers Architecture with TensorFlow

RNN - Jazz Solo with LSTM